Use tee
to Stream to a Process and Handle Ctrl+c
Using the tee
utility is often a nice method to save and view the output of a certain command at the same time. I use tee
for creating logfiles of long running commands. The problem with this approach is that these files might grow to a gigantic size. Since logfiles are usually just plain text they often compress very well.
The naive approch to accomplish compressed files with tee
is something like this:
$ command | tee logfile.log
$ zstd logfile.log
The problem with this approach is: logfile.log
occupies disk space and the subsequent compression command might takes a lot of time. We can do better by using the bash
process substitution feature. With this feature it is possible to instruct tee
to write to a command instead of a file. Therefore, it is possible to easily implement a streaming compression and write compressed data to logfile.log
directly. Internally, bash
creates a named pipe (= FIFO) and calls tee
with the path to the temporary FIFO instead. This FIFO is connected to the command which is specified in the process substitution invocation. The two lines from above condense to this one liner:
$ command | tee >(zstd - -o logfile.log)
There is a problem! When the pipeline is terminated via ctrl+c
the created logfile.log
will be truncated. As far as I understand the problem, the SIGINT signal is sent to all members of the process group at the same time. Thus, there might be unread data in the pipeline while zstd
terminates the compressed logfile.log
correctly according to the zst
file format.
To solve this problem and ensure that no data is lost on ctrl+c
the SIGINT signal must be blocked on all processes in the pipeline but the first one. This can be done via a subshell where SIGINT is trapped.
$ command | (trap '' SIGINT; tee >(zstd - -o logfile.json))
Or when multiple commands are involved:
$ mask_sigint() (trap '' SIGINT; "$@")
$ command | mask_sigint tee >(zstd - -o logfile.json) | mask_sigint next_command | mask_sigint further_command
Using this pipeline command
receives SIGINT, terminates properly, and the remaining pipeline is flushed. No data is truncated any more.