13 February 2020

You log in to a shell and launch a long-running script, then quit the shell/terminal, but you are not sure whether the script (and its child processes) will keep running? You try to catch and handle signals in the script, but it doesn't work as you expected? Below are some tests to help you understand these things.

 1: #!/usr/local/bin/bash
 2: 
 3: for each in "SIGHUP" "SIGINT" "SIGTERM"; do
 4:     # shellcheck disable=SC2064
 5:     trap "echo Received $each && exit 1" $each
 6: done
 7: 
 8: python3 -c '
 9: import datetime, time
10: while True:
11:     print("Python loop - %s" % datetime.datetime.now(), flush=True)
12:     time.sleep(2)
13: '
14: 
 15: while true; do
16:     echo "Script loop - $(date -u)"; sleep 2
17: done
18: 
19: echo "end of the script"

The above is the script (test.sh) I use for the tests:

3~6
set up traps to print the signal name before exiting the script if one of the three signals is received.
8~13
spawn a Python process that loops infinitely. This mimics a long-running external command invoked in the script.
15~17
an infinite shell loop.

Now, let's start the tests. In each test, I'll run the test script (test.sh) and then monitor (tail) the output file to check if the processes are terminated or not.

  • Firstly, the simplest case, run the script in the foreground: ./test.sh >test.out 2>&1

    When the script runs in the foreground, pressing Ctrl-c delivers SIGINT, and closing the terminal delivers SIGHUP, to all foreground processes. Therefore, in these two cases, both the parent process (test.sh) and the child process (python3) will get the signal and exit:

    • Press Ctrl-c

      Python loop - 2020-02-11 14:51:04.795263
      ...
      Python loop - 2020-02-11 14:51:12.806468
      Traceback (most recent call last):            
        File "<string>", line 5, in <module>
      KeyboardInterrupt                             <=== from the child process: python3
      Received SIGINT                               <=== from the parent process: test.sh
      
    • Close the terminal

      Python loop - 2020-02-11 14:52:24.618521
      Python loop - 2020-02-11 14:52:26.620370
      Python loop - 2020-02-11 14:52:28.622768
      Hangup: 1                                     <=== from the child
      Received SIGHUP                               <=== from the parent
      
  • Secondly, run the script in the background: ./test.sh >test.out 2>&1 &

    The processes will continue running upon Ctrl-c because the SIGINT generated by Ctrl-c is not delivered to background processes. But when the terminal is closed, they will still receive SIGHUP and abort.

    NOTE

    • Closing the terminal means closing the terminal window directly. Running exit to quit the shell (so that the terminal emulator closes the window automatically) does not count. In this case, the shell quits on its own initiative and does not receive SIGHUP. Hence it won't send SIGHUP to the script and the child processes of the script (unless the huponexit shell option has been enabled).
    • You may not observe this behaviour with some terminal emulators. My guess is that some terminal emulators manage to communicate with the shell session and close it gracefully when a terminal window is closed. Therefore, the shell (as well as the script and its child processes) does not receive SIGHUP. According to my tests on OS X, SIGHUP is sent when a terminal is closed in tmux or iTerm2, but that is not the case with the built-in Terminal app.
  • Run the script using nohup

    As the name suggests, nohup makes the spawned process ignore SIGHUP. It does not make the process ignore SIGINT. As a result:

    • If nohup is run in the foreground (nohup ./test.sh), Ctrl-c interrupts the script. However, closing the terminal will not interrupt the script.
    • In comparison, if nohup is run in the background (nohup ./test.sh &), neither Ctrl-c nor closing the terminal interrupts the script.

    NOTE If a process registers its own SIGHUP handler, that handler replaces the "ignore" setting applied by nohup, so the process will handle SIGHUP instead of ignoring it.
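The effect of nohup can be checked without closing a terminal by sending SIGHUP by hand. The following is a minimal sketch, not part of the original tests: it uses plain sleep commands as stand-ins for test.sh, and the variable names are made up for illustration. It assumes the shell running it does not itself start with SIGHUP ignored.

```shell
#!/usr/bin/env bash
# Sketch: compare a plain background child with one started through nohup.

sleep 30 &                           # plain child, default SIGHUP action
plain_pid=$!
nohup sleep 30 >/dev/null 2>&1 &     # child started with SIGHUP ignored
nohup_pid=$!

sleep 1                              # give nohup time to exec sleep
kill -HUP "$plain_pid" "$nohup_pid"  # send SIGHUP to both children

wait "$plain_pid"                    # reap the plain child
plain_status=$?                      # 128 + signal number if a signal killed it
sleep 1

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$nohup_pid" 2>/dev/null; then nohup_alive=yes; else nohup_alive=no; fi

echo "plain child exit status: $plain_status"
echo "nohup child still alive: $nohup_alive"

kill "$nohup_pid" 2>/dev/null        # clean up the survivor
```

The plain child is expected to be killed by the signal, while the one started through nohup should still be running when it is checked.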

  • What if we edit test.sh to execute the Python process in the background (i.e. python3 -c '...' &)?

    In this case, the Python process will be run in the background no matter if the parent process (i.e. the script) is in the background or not. Therefore, we'd expect the following:

    • Ctrl-c does not affect the Python process but may interrupt the parent script.

      Below is the output of ./test.sh >test.out 2>&1. Initially both the parent process and child process were printing timestamps. Once I pressed Ctrl-c, the parent process (the loop in the script) was interrupted but the Python process continued.

      Python loop - 2020-02-11 23:12:02.196811
      Script loop - Tue Feb 11 12:12:04 UTC 2020
      Python loop - 2020-02-11 23:12:04.198417
      Script loop - Tue Feb 11 12:12:06 UTC 2020
      Python loop - 2020-02-11 23:12:06.201232
      Received SIGINT
      Python loop - 2020-02-11 23:12:08.201730
      Python loop - 2020-02-11 23:12:10.206854
      Python loop - 2020-02-11 23:12:12.209047
      
    • On the other hand, closing the terminal still delivers SIGHUP to both the parent and the child process and interrupts them, regardless of whether the script is executed in the foreground or not.

      Script loop - Tue Feb 11 12:15:10 UTC 2020
      Python loop - 2020-02-11 23:15:10.597609
      Python loop - 2020-02-11 23:15:12.600922
      Script loop - Tue Feb 11 12:15:12 UTC 2020
      Python loop - 2020-02-11 23:15:14.601010
      Script loop - Tue Feb 11 12:15:14 UTC 2020
      Hangup: 1
      Received SIGHUP
      
  • What if we remove the while loop from the shell script? Does it change the test results at all when compared with the previous test?

    Below is the modified script:

    #!/usr/local/bin/bash
    
    for each in "SIGHUP" "SIGINT" "SIGTERM"; do
        # shellcheck disable=SC2064
        trap "echo Received $each && exit 1" $each
    done
    
    python3 -c '
    import datetime, time
    while True:
        print("Python loop - %s" % datetime.datetime.now(), flush=True)
        time.sleep(2)
    ' &
    
    echo "end of the script"
    

    Launch the above modified script in the foreground and then check the processes. From the output of ps below, we can see:

    • The parent process (the script) has finished (not in the output).
    • The child process has been adopted by the init process (its ppid is 1).
    $ ./test.sh >test.out 2>&1
    
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    55317     1 55306    0 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
    ...
    

    In comparison, when both the script and the python process keep running, the current shell is the parent of the script and the script is the parent of the spawned process:

    $ echo $$
    53064
    
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    53488 53064 53488    1 test.sh
    53494 53488 53488    1 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
    ...
    

    This is actually a very important difference. Many processes register their own SIGHUP handlers, which defeats the purpose of nohup: upon receiving SIGHUP, these processes do not ignore the signal but call their registered handlers instead. In this situation, the common practice to keep such a process running after its terminal is closed is for the parent script to spawn the long-running process in the background and then quit. As shown above, this makes the spawned process a child of the init process, meaning the current session will not dispatch SIGINT, SIGHUP etc. to it.
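The re-parenting described above can be reproduced in a few lines. This is only a sketch under stated assumptions: launcher.sh is a hypothetical stand-in for the modified test.sh, sleep replaces the Python loop, and ps is assumed to accept the -o ppid= and -p options. On many systems the reported parent is PID 1, but where a "subreaper" process exists it can be a different PID.

```shell
#!/usr/bin/env bash
# Sketch: a background child outlives its launcher and gets a new parent.
workdir=$(mktemp -d)

# Hypothetical launcher: start the child in the background, report both PIDs, quit.
cat > "$workdir/launcher.sh" <<'EOF'
sleep 30 >/dev/null 2>&1 &     # stand-in for the python3 loop
echo "$$ $!"                   # launcher PID and child PID
EOF

pids=$(bash "$workdir/launcher.sh")
launcher_pid=${pids% *}
child_pid=${pids#* }

# The launcher has exited by now, so ask ps who the child's parent is.
child_ppid=$(ps -o ppid= -p "$child_pid" | tr -d ' ')
echo "launcher: $launcher_pid, child: $child_pid, child's ppid now: $child_ppid"

kill "$child_pid"              # clean up
rm -rf "$workdir"
```

The point to look for is that the child's ppid is no longer the launcher's PID once the launcher has quit.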

  • trap

    Finally, let's have a closer look at trap. As you probably already know, trap catches the specified signals and runs the corresponding command(s).

    But, if you run the original script and then send SIGINT to the script using kill -SIGINT pid_of_script, you'll notice the script does not respond to the signal.

    Below is how I ran the test:

    $ ./test.sh >test.out 2>&1 &        # <=== run the script
    [1] 75519
    
    
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    75519 56180 75519    1 test.sh
    75525 75519 75519    1 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
    ...
    
    $ kill -SIGINT 75519                # <=== send SIGINT to the parent process
    
    $ kill -SIGINT 75525                # <=== send SIGINT to the child
    [1]+  Exit 1                  ./test.sh > test.out 2>&1
    
    $
    

    The test.out indicates the script didn't respond to the signal until I sent SIGINT to the child (python) process. Once the child aborted because of SIGINT, the parent ran the trap.

    Python loop - 2020-02-13 16:09:59.676853
    Python loop - 2020-02-13 16:10:01.677777
    Python loop - 2020-02-13 16:10:03.679282
    Python loop - 2020-02-13 16:10:05.680427
    Traceback (most recent call last):
      File "<string>", line 5, in <module>
    KeyboardInterrupt
    Received SIGINT
    

    So, the parent did receive the signal but would not run the trap until the child process completed? Exactly! In fact, if you review the test.out of the previous tests, you'll find the child process always quit before the parent. Why? This is clearly documented in the Bash Manual:

    If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
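The deferral described in this quote can be observed with a small timing test. The sketch below is not the original test.sh: it assumes a hypothetical helper script (deferred_trap.sh) whose foreground child is a plain sleep 3 rather than the Python loop. It uses SIGTERM instead of SIGINT, because a script started with & from a non-interactive shell begins with SIGINT ignored and so cannot trap it.

```shell
#!/usr/bin/env bash
# Sketch: a trapped signal is only handled after the foreground child finishes.
workdir=$(mktemp -d)

# Hypothetical helper script: traps SIGTERM, then blocks on a foreground child.
cat > "$workdir/deferred_trap.sh" <<'EOF'
trap 'echo "Received SIGTERM"; exit 1' TERM
sleep 3            # foreground child, stand-in for the python3 loop
echo "only printed if no SIGTERM arrived during the sleep"
EOF

bash "$workdir/deferred_trap.sh" > "$workdir/out.txt" 2>/dev/null &
script_pid=$!
sleep 1                         # let the script install its trap
start=$(date +%s)
kill -TERM "$script_pid"        # signal the parent only, not the sleep child
wait "$script_pid"
status=$?
elapsed=$(( $(date +%s) - start ))
output=$(cat "$workdir/out.txt")

echo "exit status: $status"
echo "seconds between kill and exit: $elapsed"
echo "output: $output"
rm -rf "$workdir"
```

The script should exit only once the remaining part of the sleep has elapsed, which is the same delay seen in the kill -SIGINT test above.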

    The Bash Manual also says:

    When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

    How does wait change the behaviour of the parent process and the child process? I'll leave that for you to figure out.
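As a starting point for that exercise, here is a minimal sketch of the wait variant. It rests on a few assumptions of mine rather than anything from the tests above: wait_trap.sh is a hypothetical helper script, sleep 30 stands in for the Python loop, the trap also kills the child so nothing is left behind, and SIGTERM is used because a script backgrounded from a non-interactive shell starts with SIGINT ignored.

```shell
#!/usr/bin/env bash
# Sketch: with the child in the background and the script blocked in `wait`,
# a trapped signal is handled right away.
workdir=$(mktemp -d)

# Hypothetical helper script: background child plus wait.
cat > "$workdir/wait_trap.sh" <<'EOF'
trap 'echo "Received SIGTERM"; kill "$child" 2>/dev/null; exit 1' TERM
sleep 30 &         # stand-in for the python3 loop, now in the background
child=$!
wait "$child"      # a trapped signal makes wait return immediately
echo "child finished on its own"
EOF

bash "$workdir/wait_trap.sh" > "$workdir/out.txt" 2>/dev/null &
script_pid=$!
sleep 1                         # let the script install its trap
start=$(date +%s)
kill -TERM "$script_pid"        # signal the parent only
wait "$script_pid"
status=$?
elapsed=$(( $(date +%s) - start ))
output=$(cat "$workdir/out.txt")

echo "exit status: $status"
echo "seconds between kill and exit: $elapsed"
echo "output: $output"
rm -rf "$workdir"
```

Unlike the foreground case, the script does not have to sit out the child's 30 seconds before running its trap. Note that the child would keep running if the trap did not kill it explicitly.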

  • Why does kill -SIGINT pid_of_parent not work the same as pressing Ctrl-c?

    Now we understand why the parent process didn't exit immediately when we sent it a SIGINT via kill -SIGINT pid. But why, in the first test, were we able to interrupt both the parent and the child by pressing Ctrl-c?

    The reason is that Ctrl-c sends SIGINT not to a single process but to a process group. Therefore, when we press Ctrl-c, both the child and the parent receive the signal, causing the child and then the parent to exit in turn. To achieve the same using kill, send the signal to the process group instead using kill -SIGINT -- -pgid. In the following example, both the script and the python process belong to process group 77210, therefore kill -SIGINT -- -77210 did the trick.

    $ ./test.sh >test.out 2>&1 &
    [1] 77210
    
    $ ps -eo "pid,ppid,pgid,jobc,command" | egrep -i '(test|python)' | sed 's/ \/.*\// /'
    77210 56180 77210    1 test.sh
    77216 77210 77210    1 Python -c \012import datetime, time\012while True:\012    print("Python loop - %s" % datetime.datetime.now(), flush=True)\012    time.sleep(2)\012
    ..
    
    $ kill -SIGINT -- -77210
    [1]+  Exit 1                  ./test.sh > test.out 2>&1
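The same group-wide delivery can be scripted. The sketch below makes several assumptions that are not in the transcript above: group_test.sh is a hypothetical helper script, sleep 30 replaces the Python loop, SIGTERM is used instead of SIGINT, and set -m is assumed to give the background job its own process group even when no terminal is attached.

```shell
#!/usr/bin/env bash
# Sketch: signal a whole process group by passing a negative PID to kill.
workdir=$(mktemp -d)

# Hypothetical helper script: traps SIGTERM and blocks on a foreground child.
cat > "$workdir/group_test.sh" <<'EOF'
trap 'echo "Received SIGTERM"; exit 1' TERM
sleep 30           # foreground child, same process group as the script
echo "only printed if no SIGTERM arrived"
EOF

set -m             # job control on: the next background job gets its own group
bash "$workdir/group_test.sh" > "$workdir/out.txt" 2>/dev/null &
script_pid=$!      # the job's leader, so this is also its process group ID
set +m

sleep 1                          # let the script install its trap
start=$(date +%s)
kill -TERM -- "-$script_pid"     # negative PID: every process in the group
wait "$script_pid"
status=$?
elapsed=$(( $(date +%s) - start ))
output=$(cat "$workdir/out.txt")

echo "exit status: $status"
echo "seconds between kill and exit: $elapsed"
echo "output: $output"
rm -rf "$workdir"
```

Because the sleep child receives the signal too, it ends at once and the script's trap runs without the long delay.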
    
