Bash Redirection and Paging

Part of the book: Bash: The Linux Command Line

One of the most powerful features of Bash (and Linux/Unix shells in general) is the ability to pass data to and from files and between commands. Passing data between a file and a command is called redirection. Passing data between commands is called pipelining.

On their own these features may not seem very interesting, but because every command is designed to work with input and output, commands can be simple and single-purpose. This may sound backwards, but consider that these simple commands can be strung together to create a complex result. This is the power of Bash: all commands are building blocks.

Standard Input, Output and Error

To fully understand redirection and pipelining you have to understand the standard input, output and error channels that all programs have by default. Inside the program each channel looks and feels like a file. When a program starts, its input channel is connected to your keyboard unless redirection or pipelining says otherwise. The output and error channels are both connected to your terminal emulator (i.e. your console or your xterm window).

These are often called STDIN, STDOUT and STDERR, and in the shell they are assigned the numeric values 0, 1 and 2 respectively. Remember these short forms and these numbers, because I will use both later.

Well-designed programs take their input data from STDIN, send the modified data to STDOUT, and report any warnings or errors to STDERR.

The reason for STDERR is that it keeps errors out of the output data. If errors were mixed in, the data could become corrupt. Consider a program that converts JPEG images to BMP format. If error text appeared in the stream the BMP file would be corrupt.

Redirection and pipelining change where STDIN, STDOUT and STDERR are connected. With redirection, STDIN comes from a file instead of your keyboard, or STDOUT goes to a file instead of your terminal (or both). With pipelining, they are connected to another program instead.

Redirection

Redirection uses the greater than (>) and less than (<) symbols to change where data goes to or comes from. Let's look at an example.

Let's use image conversion tools in these examples. Here is a simple one that converts a JPEG image file to a generic PNM image file:

jpegtopnm image.jpg > image.pnm

The jpegtopnm command takes a file and converts it to PNM format and sends it to STDOUT (i.e. sends it to the screen). If we didn't redirect the output we would see binary data on the screen and it would likely put the terminal emulator in a weird state. In the above command we see the > symbol and it "points" to the image.pnm file. This command creates or overwrites the image.pnm file.

If an error were to occur it would display on the screen. Note that the simple > only redirects STDOUT to a file, not STDERR.

Rather than replace a file we can append to a file using the (>>) symbols.

ls -l >> list.txt

The above creates the list.txt file if it doesn't exist. If the file does exist the output of ls -l is appended to the file.
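To make the difference between > and >> concrete, here is a small sketch. The scratch directory and the file contents are made up for illustration; nothing here comes from a real listing.

```shell
#!/usr/bin/env bash
# Scratch directory so the example does not touch real files (illustrative only).
tmpdir=$(mktemp -d)
list="$tmpdir/list.txt"

echo "first"  > "$list"    # creates the file
echo "second" > "$list"    # overwrites it: "first" is gone
echo "third"  >> "$list"   # appends: the file now holds two lines

contents=$(cat "$list")
echo "$contents"

rm -rf "$tmpdir"
```

Only the last overwrite and the append survive, which is why >> is the safer choice for log-style files.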

Most commands take file names as arguments, so there is less reason to use the < redirection. But we will show how it works using the same command. This time we will redirect both input and output:

jpegtopnm < image.jpg > image.pnm

The order isn't important; we could just as easily have written jpegtopnm > image.pnm < image.jpg. Both commands take input from image.jpg and put the result in image.pnm.
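The same idea can be tried without any image tools. The sketch below uses tr, which reads only from STDIN, so input redirection is the natural way to feed it a file. The file names and contents are assumptions made up for the example.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
echo "hello" > "$tmpdir/in.txt"

# Input first, then output.
tr 'a-z' 'A-Z' < "$tmpdir/in.txt" > "$tmpdir/out1.txt"

# Output first, then input: the result is the same.
tr 'a-z' 'A-Z' > "$tmpdir/out2.txt" < "$tmpdir/in.txt"

out1=$(cat "$tmpdir/out1.txt")
out2=$(cat "$tmpdir/out2.txt")
echo "$out1 / $out2"

rm -rf "$tmpdir"
```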

We can also redirect error messages. To do this we prefix the > symbol with the number of the STDERR channel which, as you may recall from above, is 2. So the symbol we use is 2>.

jpegtopnm image.jpg > image.pnm 2> errors.txt

This way you can store the errors in a file and refer to them later. Perhaps you're Googling to find out why the error is happening.
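Here is a minimal sketch of the two channels being split into separate files. The report function is a hypothetical stand-in for a real command; it simply writes one line to each channel.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line to STDOUT (1), one line to STDERR (2).
report() {
    echo "data line"
    echo "warning line" >&2
}

tmpdir=$(mktemp -d)

# STDOUT goes to one file, STDERR to another.
report > "$tmpdir/out.txt" 2> "$tmpdir/errors.txt"

out=$(cat "$tmpdir/out.txt")
err=$(cat "$tmpdir/errors.txt")
echo "STDOUT file holds: $out"
echo "STDERR file holds: $err"

rm -rf "$tmpdir"
```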

Perhaps you want both STDERR and STDOUT to go to the same place. We can do that too. Sometimes we don't want to see any output at all. This is often done in scripts to hide odd-looking errors from users.

wget -O - http://drupal.com/cron.php > /dev/null 2>&1

Above we are using the wget command to poll a server, but we don't want to create a file or show error messages. We use > to direct STDOUT to /dev/null; anything written to /dev/null is discarded. The symbol 2>&1 then redirects STDERR (2>) to wherever STDOUT currently points (&1). The &1 must follow the > without a space.

The order of these is important. Redirection operations are processed from left to right. So first STDOUT is redirected to /dev/null, then STDERR is directed to the same place as STDOUT, which has already been redirected to /dev/null. So both go to /dev/null.

If we had specified this in reverse order (i.e. 2>&1 > /dev/null) then STDERR would be redirected to wherever STDOUT points at that moment (the screen), and only then would STDOUT be redirected to /dev/null. The result is that STDERR still goes to the screen.
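The effect of the ordering can be checked with a small experiment. In this sketch a command substitution stands in for the screen, so whatever would have reached the terminal ends up in a variable instead. The report function is a hypothetical helper, not a real command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line to STDOUT, one line to STDERR.
report() {
    echo "data line"
    echo "warning line" >&2
}

# STDOUT to /dev/null first, then STDERR follows it: nothing is captured.
silent=$(report > /dev/null 2>&1)

# STDERR copies STDOUT's current target first (the capture),
# then STDOUT alone moves to /dev/null: the warning is still captured.
leaky=$(report 2>&1 > /dev/null)

echo "first order captured:  '$silent'"
echo "second order captured: '$leaky'"
```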

Redirection can also be used to write your own error messages to STDERR. This is common in shell scripts. To send output to the screen we commonly use echo, and adding >&2 sends that output to STDERR instead of STDOUT:

echo "Danger Will Robinson, Danger!" >&2

That echo sends the error message to STDERR. This way the users of the script can expect its output to be consistent with standard Linux commands.
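In a script this is often wrapped in a small function. The sketch below is one way to do it; the name warn and the message text are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a message on STDERR, leaving STDOUT clean.
warn() {
    echo "warning: $*" >&2
}

# With STDERR discarded nothing is left on STDOUT to capture.
on_stdout=$(warn "disk almost full" 2> /dev/null)

# With STDERR folded into STDOUT the message shows up.
on_stderr=$(warn "disk almost full" 2>&1)

echo "STDOUT saw: '$on_stdout'"
echo "STDERR saw: '$on_stderr'"
```

Because the message never touches STDOUT, the script's real output can be redirected or piped without the warning getting mixed in.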

Special Files

When you use certain file names in a redirection, Bash behaves differently than you might expect. There are several special names, but the most interesting ones are /dev/tcp/host/port and /dev/udp/host/port. These files don't actually exist; Bash creates a socket to the host on the specified port.

To be really useful, network sockets need to be bidirectional: you send a request and receive a response. That's not simple to do with one single command, because that's not how most Linux commands work. To make this work bidirectionally we use an interesting trick: the exec command, which opens a numbered channel that stays open in the current shell.

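# Open channel 3 for both reading and writing on the socket
exec 3<> /dev/tcp/www.google.com/80
# HTTP/1.1 requires a Host header; "Connection: close" lets cat finish
echo -e "GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n" >&3
cat <&3
exec 3>&-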
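The exec trick can also be tried without a network connection. The following sketch applies the same idea to an ordinary scratch file: a numbered channel is opened once, used by several commands, and then closed. The channel numbers 3 and 4 and the file name are arbitrary choices for the example.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
scratch="$tmpdir/scratch.txt"

exec 3> "$scratch"       # open channel 3 for writing; it stays open
echo "first line"  >&3
echo "second line" >&3
exec 3>&-                # close channel 3

exec 4< "$scratch"       # open channel 4 for reading
read -r first  <&4       # each read continues where the last one stopped
read -r second <&4
exec 4<&-                # close channel 4

echo "$first, then $second"

rm -rf "$tmpdir"
```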

Pipelining

Passing output from one command into the input of another command is called pipelining. This is a way to string commands together to achieve a more complex result. It allows commands to be good at performing simple functions without being encumbered by lots of unrelated features. It also gives the user the flexibility of choosing different programs.

Consider paging, that is, displaying output one page at a time. This is one example of the choice that pipelining allows: users can pick between more and less, two paging commands.

ls -l | less

In the above example the output from ls -l is "connected" to the input of less which displays a page at a time.

We can connect more than two commands in this way. Let's use grep, which filters lines based on regular expressions (i.e. patterns). If a line matches the pattern it is printed; if not, it is dropped. So let's list processes using ps, look for a specific user's processes (john's in this case) and show the output one page at a time using less:

ps -efw | grep john | less

In the above example the output of ps is passed to grep and its output is passed to less. Internally this is done by connecting the STDOUT of one program to the STDIN of the next.
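A three-stage pipeline can be tried with fixed input so the result is predictable. In this sketch printf stands in for ps, and the "user command" lines are invented for the example.

```shell
#!/usr/bin/env bash
# printf plays the role of ps; grep keeps john's lines; sort orders them.
result=$(printf '%s\n' "john less" "mary vim" "john bash" "root init" \
    | grep john \
    | sort)

echo "$result"
```

Each stage only sees what the previous stage wrote to its STDOUT, so sort never sees the lines grep dropped.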

Standard error (STDERR) can be piped to another command along with STDOUT using the |& symbol (available in Bash 4 and later). It is handy when the error output is long and needs to be paged or grepped. Compiling a C program can easily produce that much error output:

gcc myprogram.c |& less
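The difference between | and |& can be seen by counting the lines that reach the next command. This is a sketch that assumes Bash 4 or later for |&; the report function is a hypothetical stand-in for a noisy program such as a compiler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line to STDOUT, one line to STDERR.
report() {
    echo "data line"
    echo "warning line" >&2
}

# Plain pipe: only STDOUT reaches grep (STDERR is discarded here to keep the demo quiet).
plain=$( { report | grep -c "line"; } 2> /dev/null )

# |& (Bash 4+): STDOUT and STDERR both reach grep.
both=$(report |& grep -c "line")

echo "lines through |:  $plain"
echo "lines through |&: $both"
```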

Combining Redirection and Pipelining

Redirection and pipelining can be combined. Redirection is applied to every command individually, so each command in a pipeline can have its output or input redirected. In some ways this can defeat pipelining. Here is an example:

ps -efw | grep john > johns_procs.txt

That was a simple example where the output of the last command in the pipeline is saved to a file. Now let's look at a silly example. We will redirect the input of the last command:

ls -l | less < /etc/hosts

What you will see is the contents of /etc/hosts. The output of ls -l is lost because the input of less comes from the redirection.
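The lost pipe input is easy to reproduce with harmless commands. In this sketch cat plays the role of less, and the file name and text are made up for the example.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
echo "from the file" > "$tmpdir/input.txt"

# cat's STDIN is first connected to the pipe, then replaced by the
# < redirection, so the text from echo is never read.
result=$(echo "from the pipe" | cat < "$tmpdir/input.txt")

echo "$result"

rm -rf "$tmpdir"
```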

Now back to a normal example. This does the same thing as the |& symbol, written out in full: it combines the STDOUT and STDERR output and pages them together.

gcc myprogram.c 2>&1 | less