13 February 2020

Have you ever logged in to a shell, launched a long-running script and then quit the shell/terminal, without being sure whether the script (and/or its child processes) would keep running? Have you tried to catch and handle signals in a script, only to find it didn't work as you expected? Below are some tests to help you understand these things.

 1: #!/usr/local/bin/bash
 2: 
 3: for each in "SIGHUP" "SIGINT" "SIGTERM"; do
 4:     # shellcheck disable=SC2064
 5:     trap "echo Received $each && exit 1" $each
 6: done
 7: 
 8: python3 -c '
 9: import datetime, time
10: while True:
11:     print("Python loop - %s" % datetime.datetime.now(), flush=True)
12:     time.sleep(2)
13: '
14: 
15: while True; do
16:     echo "Script loop - $(date -u)"; sleep 2
17: done
18: 
19: echo "end of the script"

The above is the script (test.sh) I use for the tests:

3~6
set up traps that print the signal name and then exit the script if any of the three signals is received.
8~13
spawn a Python process that loops forever. This mimics a long-running external command invoked in the script.
15~17
an infinite shell loop.

Now, let's start the tests. In each test, I'll run the test script (test.sh) and then monitor (tail) the output file to check if the processes are terminated or not.

Firstly, the simplest case: run the script in the foreground with ./test.sh >test.out 2>&1

When you press Ctrl-c, the terminal sends SIGINT to every process in its foreground process group; when you close the terminal, the interactive bash shell receives SIGHUP and resends it to its jobs. Therefore, in these two cases, the parent process (test.sh) and the child process (python3) both get the signal and exit:

  • Press Ctrl-c

    Python loop - 2020-02-11 14:51:04.795263
    ...
    Python loop - 2020-02-11 14:51:12.806468
    Traceback (most recent call last):            
      File "<string>", line 5, in <module>
    KeyboardInterrupt                             <=== from the child process: python3
    Received SIGINT                               <=== from the parent process: test.sh
    
  • Close the terminal

    Python loop - 2020-02-11 14:52:24.618521
    Python loop - 2020-02-11 14:52:26.620370
    Python loop - 2020-02-11 14:52:28.622768
    Hangup: 1                                     <=== from the child
    Received SIGHUP                               <=== from the parent
    

Secondly, run the script in the background: ./test.sh >test.out 2>&1 &

Obviously, the processes will continue running when Ctrl-c is pressed, because Ctrl-c only signals the foreground process group and a background job is not part of it. But when the terminal is closed, they will still receive SIGHUP and abort.

NOTE

  • Closing the terminal means closing the terminal window directly. Running exit to quit the shell (so that the terminal emulator closes the window automatically) does not count. In that case the shell quits on its own and does not receive SIGHUP, so it won't send SIGHUP to the script or the script's child processes.
  • You may not observe this behaviour with some terminal emulators. My guess is that some terminal emulators manage to communicate with the shell session and close it gracefully when a terminal window is closed, so the shell (as well as the script and its child processes) does not receive SIGHUP. In my tests on OS X, SIGHUP is sent when a terminal is closed in tmux or iTerm2, but that is not the case with the built-in Terminal app.

The 3rd and 4th tests run the script using nohup

As the name suggests, nohup makes the spawned process ignore SIGHUP. It does not make it ignore SIGINT, though. As a result:

  • If nohup is run in the foreground (nohup ./test.sh), Ctrl-c interrupts the script. However, closing the terminal will not interrupt the script.
  • In comparison, if nohup is run in the background (nohup ./test.sh &), neither Ctrl-c nor closing the terminal interrupts the script.
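One way to check the nohup behaviour without closing a terminal is to let a child shell send SIGHUP to itself. This is a minimal sketch, assuming a POSIX `sh` and `nohup` are on `PATH`; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Without nohup: the child shell sends SIGHUP to itself and is killed by it,
# so "survived" is never printed and the exit status is 128 + 1 = 129.
plain_out=$(sh -c 'kill -HUP $$; echo survived' 2>/dev/null)
plain_status=$?

# With nohup: SIGHUP is already ignored when the child starts, so the
# self-sent signal has no effect and the shell carries on.
nohup_out=$(nohup sh -c 'kill -HUP $$; echo survived' 2>/dev/null)
nohup_status=$?

echo "plain: status=$plain_status output='$plain_out'"
echo "nohup: status=$nohup_status output='$nohup_out'"
```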

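NOTE nohup only sets SIGHUP to "ignored" before starting the command. If the process then registers its own SIGHUP handler, that handler replaces nohup's setting. Bash scripts are an exception: a non-interactive Bash silently refuses to trap a signal that was ignored when it started, so under nohup the SIGHUP trap in test.sh has no effect.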
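A sketch to check this note, assuming `python3` and `bash` are on `PATH`: under nohup, a Python child installs its own SIGHUP handler and signals itself, while a Bash child tries the same with `trap`. I would expect the Python handler to run and the Bash trap never to fire; the handler and variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Python child: installs its own SIGHUP handler after nohup has set the
# signal to "ignored", then signals itself.
py_out=$(nohup python3 -c '
import os, signal

def on_hup(signum, frame):
    print("Python handler ran", flush=True)

signal.signal(signal.SIGHUP, on_hup)  # replaces the inherited SIG_IGN
os.kill(os.getpid(), signal.SIGHUP)
print("Python finished", flush=True)
' 2>/dev/null)

# Bash child: tries to trap SIGHUP, but a non-interactive Bash does not
# trap a signal that was already ignored when it started.
bash_out=$(nohup bash -c 'trap "echo Bash trap ran" HUP; kill -HUP $$; echo "Bash finished"' 2>/dev/null)

printf 'Python child:\n%s\nBash child:\n%s\n' "$py_out" "$bash_out"
```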

What if we edit test.sh to execute the Python process in the background?

I.e. edit test.sh to run python3 -c '...' & (note the additional & at the end).

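In this case, the Python process runs asynchronously regardless of whether the parent script itself is in the foreground or the background. Because job control is off in a non-interactive script, Bash starts such asynchronous commands with SIGINT and SIGQUIT ignored. Therefore, we'd expect the following: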

  • Ctrl-c does not affect the Python process but may interrupt the parent script.

    Below is the output of ./test.sh >test.out 2>&1. Initially both the parent process and child process were printing timestamps. Once I pressed Ctrl-c, the parent process (the loop in the script) was interrupted but the Python process continued.

    Python loop - 2020-02-11 23:12:02.196811
    Script loop - Tue Feb 11 12:12:04 UTC 2020
    Python loop - 2020-02-11 23:12:04.198417
    Script loop - Tue Feb 11 12:12:06 UTC 2020
    Python loop - 2020-02-11 23:12:06.201232
    Received SIGINT
    Python loop - 2020-02-11 23:12:08.201730
    Python loop - 2020-02-11 23:12:10.206854
    Python loop - 2020-02-11 23:12:12.209047
    
  • On the other hand, closing the terminal still delivers SIGHUP to both the parent and the child process and interrupts them, regardless of whether the script is executed in the foreground or not.

    Script loop - Tue Feb 11 12:15:10 UTC 2020
    Python loop - 2020-02-11 23:15:10.597609
    Python loop - 2020-02-11 23:15:12.600922
    Script loop - Tue Feb 11 12:15:12 UTC 2020
    Python loop - 2020-02-11 23:15:14.601010
    Script loop - Tue Feb 11 12:15:14 UTC 2020
    Hangup: 1
    Received SIGHUP
    
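This can be reproduced without a terminal. In the sketch below, `sleep` stands in for the Python loop and SIGINT is sent to it directly with kill; it assumes a non-interactive Bash, where job control is off by default, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Job control is off in a non-interactive script, so Bash starts this
# asynchronous command with SIGINT and SIGQUIT ignored.
sleep 30 &
bg_pid=$!
sleep 0.2                      # give the child time to finish starting up

kill -INT "$bg_pid"            # ignored by the background sleep
sleep 0.5
if kill -0 "$bg_pid" 2>/dev/null; then
    after_int="still running"
else
    after_int="terminated"
fi

kill -TERM "$bg_pid"           # SIGTERM is not ignored, so this ends it
wait "$bg_pid" 2>/dev/null
term_status=$?                 # 128 + 15 = 143

echo "after SIGINT: $after_int; after SIGTERM: exit status $term_status"
```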

What if we remove the while True: loop from the shell script?

I.e. what if the parent script finishes before the child? Does it change the test results at all?

Below is the modified script:

#!/usr/local/bin/bash

for each in "SIGHUP" "SIGINT" "SIGTERM"; do
    # shellcheck disable=SC2064
    trap "echo Received $each && exit 1" $each
done

python3 -c '
import datetime, time
while True:
    print("Python loop - %s" % datetime.datetime.now(), flush=True)
    time.sleep(2)
' &

echo "end of the script"

Launch the above modified script in the foreground and then check the processes. From the output of ps below, we can see:

  • The parent process (the script) has finished.
  • The child process has been adopted by the init process (its ppid is 1).
$ ./test.sh >test.out 2>&1

$ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
55317     1 55306    0 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
...

In comparison, when both the script and the Python process keep running, the current shell is the parent of the script and the script is the parent of the spawned process:

$ echo $$
53064

$ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
53488 53064 53488    1 test.sh
53494 53488 53488    1 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
...

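This is actually a very important difference. Once the script has exited and the spawned process has become a child of the init process, the interactive shell no longer tracks it as part of a job, and its process group is no longer the terminal's foreground group. So Ctrl-c no longer reaches it, and the shell does not forward SIGHUP to it when the terminal is closed. Hence, it is common practice to start a daemon like this: a parent script spawns the long-running daemon process in the background and then quits.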
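The reparenting can also be observed without ps. In this sketch an intermediate `bash -c` (a stand-in for the modified test.sh) starts a background child and exits at once; the child waits a second and then records its parent PID in a temporary file. On Linux the new parent may be a "subreaper" process (for example a systemd user manager or a container's init) rather than PID 1, so the sketch only checks that the parent has changed.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)

# The intermediate shell prints its own PID, starts a background child and
# exits immediately. The child sleeps for a second, then execs a fresh bash
# whose $PPID is taken from getppid() at startup, i.e. after the
# intermediate shell is gone.
parent_pid=$(bash -c '
    ( sleep 1; exec bash -c "echo \$PPID" ) >"$1" 2>/dev/null &
    echo $$
' _ "$tmp")

sleep 2                                  # let the child finish
new_ppid=$(cat "$tmp")
rm -f "$tmp"

echo "original parent: $parent_pid, parent after it exited: $new_ppid"
```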

trap

Finally, let's have a closer look at trap. As you probably already know, trap catches the specified signals and runs the corresponding command(s).

But, if you run the original script and then send SIGINT to the script using kill -SIGINT pid_of_script, you'll notice the script does not respond to the signal.

Below is how I ran the test:

$ ./test.sh >test.out 2>&1 &        # <=== run the script
[1] 75519


$ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
75519 56180 75519    1 test.sh
75525 75519 75519    1 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
...

$ kill -SIGINT 75519                # <=== send SIGINT to the parent process

$ kill -SIGINT 75525                # <=== send SIGINT to the child
[1]+  Exit 1                  ./test.sh > test.out 2>&1

$

The content of test.out shows that the script didn't respond to the signal until I sent SIGINT to the child (Python) process. Once the child aborted because of SIGINT, the parent ran the trap.

Python loop - 2020-02-13 16:09:59.676853
Python loop - 2020-02-13 16:10:01.677777
Python loop - 2020-02-13 16:10:03.679282
Python loop - 2020-02-13 16:10:05.680427
Traceback (most recent call last):
  File "<string>", line 5, in <module>
KeyboardInterrupt
Received SIGINT

So the parent did receive the signal but would not run the trap until the child process completed? Exactly! In fact, if you review the test.out of the previous tests, you'll find the child process always quit before the parent. Why? This is clearly documented in the Bash Manual:

If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
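A small sketch of this rule, independent of test.sh. It uses SIGUSR1 rather than SIGINT (an assumption, to keep Bash's special handling of SIGINT for foreground children out of the picture) and Bash's `SECONDS` counter, which only has one-second resolution. If the manual is right, the trap should only record a time once the 2-second foreground `sleep` has finished.

```shell
#!/usr/bin/env bash
start=$SECONDS
trap 'trap_at=$((SECONDS - start))' USR1   # record when the trap runs

# Helper: send SIGUSR1 to this script about half a second from now.
( sleep 0.5; kill -USR1 $$ ) &

sleep 2                                    # foreground command
done_at=$((SECONDS - start))
wait                                       # reap the helper
echo "trap ran at ${trap_at}s; foreground sleep finished at ${done_at}s"
```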

The Bash Manual also says:

When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
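The same kind of sketch, with the long-running command started in the background and waited for with the wait builtin, should show this second rule: wait returns well before the command finishes, with a status above 128, and the trap runs right away. Again, SIGUSR1 and the timings are arbitrary choices for the example.

```shell
#!/usr/bin/env bash
start=$SECONDS
trap 'trap_at=$((SECONDS - start))' USR1   # record when the trap runs

( sleep 0.5; kill -USR1 $$ ) &             # helper: signal this script soon
sleep 3 &                                  # long-running command, now asynchronous
long_pid=$!

wait "$long_pid"                           # interrupted by SIGUSR1
wait_status=$?                             # 128 + signal number
returned_at=$((SECONDS - start))

kill "$long_pid" 2>/dev/null               # tidy up the background sleep
wait 2>/dev/null
echo "wait returned at ${returned_at}s with status $wait_status; trap ran at ${trap_at}s"
```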

How does wait change the behaviour of the parent process and the child process? I'll leave that for you to figure out.

Why doesn't kill -SIGINT pid_of_parent work the same as pressing Ctrl-c?

Now we understand why the parent process didn't exit immediately when we sent it SIGINT via kill -SIGINT pid. But why, in the first test, were we able to interrupt both the parent and the child by pressing Ctrl-c?

The reason is that Ctrl-c sends SIGINT not to a single process but to a process group (the terminal's foreground process group), i.e. to all the processes within the group. Therefore, when we press Ctrl-c, both the child and the parent receive the signal, causing the child and then the parent to exit. To achieve the same with kill, send the signal to the process group instead, using kill -SIGINT -- -pgid. In the following example, both the script and the Python process belong to process group 77210, so kill -SIGINT -- -77210 did the trick.

$ ./test.sh >test.out 2>&1 &
[1] 77210

$ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
77210 56180 77210    1 test.sh
77216 77210 77210    1 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
...

$ kill -SIGINT -- -77210
[1]+  Exit 1                  ./test.sh > test.out 2>&1
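Without an interactive terminal, the Ctrl-c case can be approximated by giving a job its own process group and signalling the whole group. This sketch assumes that `set -m` (job control) in a script makes Bash put the next background job in a process group whose ID equals the job's PID; the inner `bash -c` is a cut-down, time-limited stand-in for test.sh, and the file and variable names are illustrative.

```shell
#!/usr/bin/env bash
out=$(mktemp)
start=$SECONDS

set -m                        # job control: the next job gets its own process group
bash -c '
    trap "echo Received SIGINT; exit 1" INT
    sleep 5                   # foreground child in the same process group
    echo "sleep finished"
' >"$out" 2>&1 &
job_pid=$!                    # PID of the job leader, which is also its PGID
set +m

sleep 0.5
kill -INT -- "-$job_pid"      # signal every process in the group, like Ctrl-c
wait "$job_pid"
job_status=$?
elapsed=$((SECONDS - start))

job_out=$(cat "$out")
rm -f "$out"
echo "status=$job_status after ${elapsed}s, output: $job_out"
```

Sending the same signal with kill -INT "$job_pid" instead should only reach the inner shell, which would then run its trap after the 5-second sleep ends, much like the earlier experiment with test.sh.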

