More on Signals in Bash
Have you ever logged in to a shell, launched a long-running script and then quit the shell or closed the terminal, unsure whether the script (and/or its child processes) would keep running? Have you tried to catch and handle signals in a script, only to find it didn't work as you expected? Below are some tests to help you understand these things.

     1: #!/usr/local/bin/bash
     2:
     3: for each in "SIGHUP" "SIGINT" "SIGTERM"; do
     4:     # shellcheck disable=SC2064
     5:     trap "echo Received $each && exit 1" $each
     6: done
     7:
     8: python3 -c '
     9: import datetime, time
    10: while True:
    11:     print("Python loop - %s" % datetime.datetime.now(), flush=True)
    12:     time.sleep(2)
    13: '
    14:
    15: while True; do
    16:     echo "Script loop - $(date -u)"; sleep 2
    17: done
    18:
    19: echo "end of the script"

The above is the script (test.sh) I use for the tests:
- Lines 3~6: set up traps that print the signal name and then exit the script if any of the three signals is received.
- Lines 8~13: spawn a Python process that loops forever. This mimics a long-running external command invoked in the script.
- Lines 15~17: an infinite shell loop.
Now, let's start the tests. In each test, I'll run the test script (test.sh) and then monitor (tail) the output file to check whether the processes have been terminated.
Firstly, the simplest case: run the script in the foreground.

    ./test.sh >test.out 2>&1

When you press Ctrl-c, the terminal sends SIGINT to every process in the foreground process group. When you close the terminal, the shell receives SIGHUP and resends it to its jobs before exiting. Therefore, in both cases, the parent process (test.sh) and the child process (python3) get the signal and exit:
Press Ctrl-c:

    Python loop - 2020-02-11 14:51:04.795263
    ...
    Python loop - 2020-02-11 14:51:12.806468
    Traceback (most recent call last):
      File "<string>", line 5, in <module>
    KeyboardInterrupt          <=== from the child process: python3
    Received SIGINT            <=== from the parent process: test.sh

Close the terminal:

    Python loop - 2020-02-11 14:52:24.618521
    Python loop - 2020-02-11 14:52:26.620370
    Python loop - 2020-02-11 14:52:28.622768
    Hangup: 1                  <=== from the child
    Received SIGHUP            <=== from the parent

Secondly, run the script in the background:

    ./test.sh >test.out 2>&1 &

Obviously, the processes will keep running when you press Ctrl-c. The SIGINT generated by Ctrl-c only goes to the terminal's foreground process group, and a background job is not part of it. But when the terminal is closed, they will still receive SIGHUP and abort.
NOTE
- "Closing the terminal" means closing the terminal window directly. Running exit to quit the shell (after which the terminal emulator closes the window automatically) does not count. In that case, the shell quits on its own and does not receive SIGHUP, so it won't send SIGHUP to the script or to the script's child processes.
- You may not observe this behaviour with some terminal emulators. My guess is that some terminal emulators manage to communicate with the shell session and close it gracefully when a terminal window is closed. As a result, the shell (as well as the script and its child processes) does not receive SIGHUP. In my tests on OS X, SIGHUP is sent when a terminal is closed in tmux or iTerm2, but that is not the case with the built-in Terminal app.
The 3rd and 4th tests run the script using nohup. As the name suggests, nohup makes the spawned process ignore SIGHUP, but it does not make it ignore SIGINT. As a result:
- If nohup is run in the foreground (nohup ./test.sh), Ctrl-c interrupts the script. However, closing the terminal does not.
- In comparison, if nohup is run in the background (nohup ./test.sh &), neither Ctrl-c nor closing the terminal interrupts the script.
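The SIGHUP part can be checked without closing a real terminal. Below is a minimal sketch that sends SIGHUP directly with kill as a stand-in for the hangup. It assumes bash and uses `sleep 30` in place of test.sh. The variable names are only illustrative. The exit status reported by wait (128 + signal number) tells us which signal actually ended each process.

```shell
#!/usr/bin/env bash
# Sketch: a plain background process vs. one started via nohup, both hit by
# SIGHUP. `sleep 30` stands in for the long-running script.

sleep 30 >/dev/null 2>&1 &
plain_pid=$!
nohup sleep 30 >/dev/null 2>&1 &
nohup_pid=$!

sleep 1                                # let nohup set SIGHUP to "ignore" and exec sleep
kill -HUP "$plain_pid" "$nohup_pid"    # stand-in for closing the terminal
sleep 1
kill -TERM "$nohup_pid"                # the nohup'd process is still alive; stop it

# wait reports 128 + signal number for a process that was killed by a signal
plain_status=0; wait "$plain_pid" || plain_status=$?
nohup_status=0; wait "$nohup_pid" || nohup_status=$?

echo "plain: $plain_status (129 = SIGHUP), nohup: $nohup_status (143 = SIGTERM)"
```

The plain process dies of SIGHUP. The nohup'd one survives it and only stops when we send SIGTERM.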
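NOTE: nohup only sets SIGHUP to be ignored before it starts the command. If the process then registers its own SIGHUP handler, nohup cannot override that handler, and the process will handle SIGHUP instead of ignoring it.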
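Here is a small sketch of the NOTE. It assumes both bash and python3 are installed, and each process sends SIGHUP to itself instead of us closing a terminal. The Python process installs its own handler, which replaces the "ignore" disposition inherited from nohup. Bash behaves differently. According to the Bash Manual, signals ignored upon entry to a non-interactive shell cannot be trapped, so a SIGHUP trap like the one in test.sh is silently dropped when the script runs under nohup.

```shell
#!/usr/bin/env bash
# Sketch: does a handler registered after nohup win over nohup's "ignore"?

# Python: signal.signal() replaces the inherited "ignore" disposition,
# so the handler runs when the process sends itself SIGHUP.
py_out=$(nohup python3 -c '
import os, signal
signal.signal(signal.SIGHUP, lambda signum, frame: print("python handler ran", flush=True))
os.kill(os.getpid(), signal.SIGHUP)
' 2>/dev/null)

# Bash: a non-interactive shell cannot trap a signal that was ignored on entry,
# so the trap is silently dropped and SIGHUP stays ignored.
bash_out=$(nohup bash -c 'trap "echo bash trap ran" HUP; kill -HUP $$; echo "bash survived"' 2>/dev/null)

echo "python: $py_out"
echo "bash:   $bash_out"
```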
What if we edit test.sh to execute the Python process in the background, i.e. run python3 -c '...' & (note the additional & at the end)?
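In this case, the Python process is started as an asynchronous (background) command of the script, no matter whether the script itself runs in the foreground or not. Job control is not active in a non-interactive script, so Bash starts such asynchronous commands with SIGINT and SIGQUIT ignored. Therefore, we'd expect the following: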
Ctrl-c does not affect the Python process but may interrupt the parent script.
Below is the output of ./test.sh >test.out 2>&1. Initially, both the parent process and the child process were printing timestamps. Once I pressed Ctrl-c, the parent process (the loop in the script) was interrupted, but the Python process continued:

    Python loop - 2020-02-11 23:12:02.196811
    Script loop - Tue Feb 11 12:12:04 UTC 2020
    Python loop - 2020-02-11 23:12:04.198417
    Script loop - Tue Feb 11 12:12:06 UTC 2020
    Python loop - 2020-02-11 23:12:06.201232
    Received SIGINT
    Python loop - 2020-02-11 23:12:08.201730
    Python loop - 2020-02-11 23:12:10.206854
    Python loop - 2020-02-11 23:12:12.209047

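The reason Ctrl-c doesn't stop the Python process can be checked without a terminal. A command started with & from a non-interactive script runs with SIGINT ignored. Below is a minimal sketch, assuming bash, with `sleep 30` standing in for the Python loop. We send SIGINT first and SIGTERM a second later. The status from wait then shows which of the two actually ended the child.

```shell
#!/usr/bin/env bash
# Sketch: with job control off (the default in a script), a command started
# with & runs with SIGINT ignored. `sleep 30` stands in for the Python loop.

sleep 30 &
child=$!
sleep 1                  # let the child start up and exec sleep

kill -INT "$child"       # ignored by the asynchronous child
sleep 1
kill -TERM "$child"      # not ignored: this is the signal that ends it

int_status=0
wait "$child" || int_status=$?
echo "child exit status: $int_status"   # 143 (128 + SIGTERM), not 130 (128 + SIGINT)
```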
On the other hand, closing the terminal still delivers SIGHUP to both the parent and the child process and interrupts them, regardless of whether the script is executed in the foreground or not:

    Script loop - Tue Feb 11 12:15:10 UTC 2020
    Python loop - 2020-02-11 23:15:10.597609
    Python loop - 2020-02-11 23:15:12.600922
    Script loop - Tue Feb 11 12:15:12 UTC 2020
    Python loop - 2020-02-11 23:15:14.601010
    Script loop - Tue Feb 11 12:15:14 UTC 2020
    Hangup: 1
    Received SIGHUP

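SIGHUP behaves differently because Bash only ignores SIGINT and SIGQUIT for asynchronous commands, so SIGHUP keeps its default action. Here is a short sketch under the same assumptions (bash, `sleep 30` in place of the Python loop), with a direct kill -HUP standing in for closing the terminal:

```shell
#!/usr/bin/env bash
# Sketch: SIGHUP is not on the ignore list for asynchronous commands,
# so a background child of a script still dies when it receives SIGHUP.

sleep 30 &
child=$!
sleep 1

kill -HUP "$child"       # stand-in for closing the terminal
hup_status=0
wait "$child" || hup_status=$?
echo "child exit status: $hup_status"   # 129 = 128 + SIGHUP (1)
```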
What if we remove the while True: loop from the shell script, i.e. what if the parent script finishes before the child? Does that change the test results at all?
Below is the modified script:

    #!/usr/local/bin/bash

    for each in "SIGHUP" "SIGINT" "SIGTERM"; do
        # shellcheck disable=SC2064
        trap "echo Received $each && exit 1" $each
    done

    python3 -c '
    import datetime, time
    while True:
        print("Python loop - %s" % datetime.datetime.now(), flush=True)
        time.sleep(2)
    ' &

    echo "end of the script"

Launch the above modified script in the foreground and then check the processes. From the output of ps below, we can see that:
- The parent process (the script) has finished.
- The child process has been adopted by the init process (its ppid is 1).

    $ ./test.sh >test.out 2>&1
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    55317 1 55306 0 Python -c \012import datetime, time\012while True:\012 print("Python loop - %s" % datetime.datetime.now(), flush=True)\012 time.sleep(2)\012 ...

In comparison, when both the script and the Python process keep running, the current shell is the parent of the script, and the script is the parent of the spawned process:

    $ echo $$
    53064
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    53488 53064 53488 1 test.sh
    53494 53488 53488 1 Python -c \012import datetime, time\012while True:\012 print("Python loop - %s" % datetime.datetime.now(), flush=True)\012 time.sleep(2)\012 ...

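This is actually a very important difference. Once the script has exited and the spawned process has become a child of the init process, the shell no longer treats it as part of a running job. Its process group is also no longer the terminal's foreground group. So neither Ctrl-c nor the SIGHUP that the shell forwards to its jobs will reach it. Hence, it is common practice to start a daemon like this: a parent script spawns the long-running daemon process in the background and then quits.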
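Here is a minimal sketch of that pattern, assuming bash. An intermediate launcher shell starts `sleep 30` in the background as a stand-in for the daemon, prints its own PID and the child's PID, and exits. The helper parent_of is hypothetical: it reads /proc on Linux and falls back to ps elsewhere. On some systems the new parent is not PID 1 but the nearest "subreaper" process (for example a container's init), so the sketch only checks that the parent has changed.

```shell
#!/usr/bin/env bash
# Sketch: a background child outlives the shell that started it and is re-parented.

# Hypothetical helper: print the parent PID of process $1.
parent_of() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^PPid:/ { print $2 }' "/proc/$1/status"   # Linux
    else
        ps -o ppid= -p "$1" | tr -d ' '                 # macOS / BSD
    fi
}

# The launcher redirects the child's output so that $(...) returns
# as soon as the launcher itself exits.
ids=$(bash -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
launcher=${ids% *}
daemon=${ids#* }

sleep 1                                  # let the kernel finish re-parenting
new_parent=$(parent_of "$daemon")
echo "launcher: $launcher, daemon: $daemon, daemon's parent now: $new_parent"

kill "$daemon"                           # clean up the stand-in daemon
```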
trap
Finally, let's have a closer look at trap. As you probably already know, trap catches the specified signals and runs the corresponding command(s).
But if you run the original script and then send SIGINT to the script using kill -SIGINT pid_of_script, you'll notice that the script does not respond to the signal.
Below is how I ran the test:

    $ ./test.sh >test.out 2>&1 &    # <=== run the script
    [1] 75519
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    75519 56180 75519 1 test.sh
    75525 75519 75519 1 Python -c \012import datetime, time\012while True:\012 print("Python loop - %s" % datetime.datetime.now(), flush=True)\012 time.sleep(2)\012 ...
    $ kill -SIGINT 75519            # <=== send SIGINT to the parent process
    $ kill -SIGINT 75525            # <=== send SIGINT to the child process
    [1]+ Exit 1 ./test.sh > test.out 2>&1
    $

The test.out file indicates that the script didn't respond to the signal until I sent SIGINT to the child (Python) process. Once the child aborted because of SIGINT, the parent ran the trap:

    Python loop - 2020-02-13 16:09:59.676853
    Python loop - 2020-02-13 16:10:01.677777
    Python loop - 2020-02-13 16:10:03.679282
    Python loop - 2020-02-13 16:10:05.680427
    Traceback (most recent call last):
      File "<string>", line 5, in <module>
    KeyboardInterrupt
    Received SIGINT

So the parent did receive the signal but would not run the trap until the child process completed? Exactly! In fact, if you review the test.out of the previous tests, you'll find that the child process always quit before the parent.
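The deferral can be reproduced in a few seconds. Below is a minimal sketch, assuming bash. A stand-in for test.sh traps SIGTERM while running `sleep 3` in the foreground, and we signal only the script, not its child. It uses SIGTERM rather than SIGINT for a reason. A script started with & from another non-interactive script has SIGINT ignored, and a non-interactive Bash cannot trap a signal that was ignored on entry.

```shell
#!/usr/bin/env bash
# Sketch: a trap is deferred while the script waits for a foreground command.
# The signal arrives after ~1s, but the trap only runs after `sleep 3` ends.
log=$(mktemp)
start=$(date +%s)

# Stand-in for test.sh: same trap style, `sleep 3` instead of the Python loop.
bash -c 'trap "echo Received SIGTERM && exit 1" TERM; sleep 3; echo "end of the script"' >"$log" &
script_pid=$!

sleep 1                      # let the script install its trap and start sleep 3
kill -TERM "$script_pid"     # signal the script only, not its child

script_status=0
wait "$script_pid" || script_status=$?
elapsed=$(( $(date +%s) - start ))
output=$(cat "$log")
rm -f "$log"

echo "output: $output | exit status: $script_status | elapsed: ${elapsed}s"
```

If the trap ran as soon as the signal arrived, the elapsed time would be about one second instead of about three.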
Why? This is clearly documented in the Bash Manual:
If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
The Bash Manual also says:
When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
How does wait change the behaviour of the parent process and the child process? I'll leave that for you to figure out.
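If you'd like to check your answer, here is a minimal sketch, assuming bash, with `sleep 30` standing in for the Python loop. The script starts the long command with & and then calls wait. Its trap also forwards the signal to the child so that nothing is left running. The variable name child is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: with `cmd & wait`, the trap runs as soon as the signal arrives.
log=$(mktemp)
start=$(date +%s)

bash -c '
  trap "echo Received SIGTERM; kill \$child; exit 1" TERM
  sleep 30 &            # long-running command, now asynchronous
  child=$!
  wait "$child"         # interrupted by the signal, status > 128
  echo "end of the script"
' >"$log" &
script_pid=$!

sleep 1
kill -TERM "$script_pid"

script_status=0
wait "$script_pid" || script_status=$?
elapsed=$(( $(date +%s) - start ))
output=$(cat "$log")
rm -f "$log"

echo "output: $output | exit status: $script_status | elapsed: ${elapsed}s"
```

The script exits about one second after it starts, long before `sleep 30` would have finished.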
Why doesn't kill -SIGINT pid_of_parent work the same as pressing Ctrl-c?
Now we understand why the parent process didn't exit immediately when we sent it SIGINT via kill -SIGINT pid. But why, in the first test, were we able to interrupt both the parent and the child by pressing Ctrl-c?
The reason is that Ctrl-c sends SIGINT not to a single process but to a process group (the terminal's foreground process group), i.e. to all the processes within that group. Therefore, when we press Ctrl-c, both the child and the parent receive the signal, and the child and then the parent exit in turn. To achieve the same with kill, send the signal to the process group instead, using kill -SIGINT -- -pgid. In the following example, both the script and the Python process belong to process group 77210, so kill -SIGINT -- -77210 did the trick.

    $ ./test.sh >test.out 2>&1 &
    [1] 77210
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    77210 56180 77210 1 test.sh
    77216 77210 77210 1 Python -c \012import datetime, time\012while True:\012 print("Python loop - %s" % datetime.datetime.now(), flush=True)\012 time.sleep(2)\012 ..
    $ kill -SIGINT -- -77210
    [1]+ Exit 1 ./test.sh > test.out 2>&1

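To reproduce this without a terminal, the script needs its own process group. Below is a minimal sketch, assuming bash. `set -m` turns on job control in a non-interactive script, so the next background job gets its own process group, whose ID equals the job leader's PID. A `sleep 30` stands in for the Python loop, and the trap mirrors the one in test.sh.

```shell
#!/usr/bin/env bash
# Sketch: signalling a whole process group, as Ctrl-c does, ends both the
# script and its foreground child, so the script's trap runs at once.
log=$(mktemp)
start=$(date +%s)

set -m   # job control on: the next background job becomes a process group leader
bash -c 'trap "echo Received SIGINT && exit 1" INT; sleep 30; echo "end of the script"' >"$log" 2>&1 &
pgid=$!  # leader PID == process group ID
set +m

sleep 1
kill -INT -- "-$pgid"    # negative PID: send SIGINT to every process in the group

group_status=0
wait "$pgid" || group_status=$?
elapsed=$(( $(date +%s) - start ))
output=$(cat "$log")
rm -f "$log"

echo "output: $output | exit status: $group_status | elapsed: ${elapsed}s"
```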