Bash Pattern Matching

Part of the book: Bash: The Linux Command Line

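Pattern matching in Bash is also called globbing. It sounds all bloaty and gooey, but it's really quite plain, not sticky at all, and very useful. The name comes from glob, an early Unix program that expanded patterns on behalf of the shell. glob() is also the name of the C library function that does the same job, although Bash ships its own implementation.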

File pattern matching is usually about selecting groups of files, but it can also save you from typing long file names. Rather than type out a full file name, just type a pattern that contains a unique part of it and you've matched the file.

How it Works

When you issue a command and the command or one of its arguments contains a pattern, the shell first expands the pattern to one or more file names and then runs the command. If the pattern matches, it is replaced with the matching file names, so the command never sees the pattern, only the matches. If the pattern fails to match, then by default the command is given the pattern itself, unchanged.
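Here is a small sketch of both cases. It assumes Bash with its default options, and the scratch directory and file names (a.txt, b.txt) are made up for the illustration. printf is used instead of a real command so you can see exactly which arguments it was handed.

```shell
#!/usr/bin/env bash
# Sketch only: works in a scratch directory created by mktemp.
shopt -u nullglob failglob      # make sure the default behaviour is in effect
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt

# The shell expands *.txt before printf runs, so printf gets two arguments.
matched=$(printf '<%s>' *.txt)
echo "$matched"                 # <a.txt><b.txt>

# Nothing here ends in .pdf, so printf is handed the pattern itself.
unmatched=$(printf '<%s>' *.pdf)
echo "$unmatched"               # <*.pdf>
```

Quoting the pattern ('*.txt') stops the expansion, which is what you want when the command itself should receive the pattern.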

By default the shell doesn't match hidden file names, i.e. file and directory names that begin with a dot (.) won't get matched. This behaviour can be changed (see Configuring Pattern Matching below).

Case Sensitive

By default file patterns are also case sensitive, meaning that the files "upper" and "UPPER" are different.
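A quick way to convince yourself, sketched with two made-up files in a scratch directory. The last part uses nocaseglob, a Bash shell option not otherwise covered in this chapter, which switches file name matching to ignore case.

```shell
#!/usr/bin/env bash
# Sketch only: readme and NOTES are hypothetical file names.
shopt -u nullglob failglob nocaseglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch readme NOTES

lower=$(echo r*)        # readme
upper=$(echo R*)        # R*   (no match, so the pattern is left alone)

# With nocaseglob set, R* is allowed to match readme.
shopt -s nocaseglob
either=$(echo R*)       # readme
shopt -u nocaseglob

echo "$lower | $upper | $either"
```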

Asterisk

The most used pattern is the asterisk (*). It matches zero or more of any character. It can be used at the beginning, middle or end of a pattern.

Some examples are:

Pattern    Matches
*.pdf      Anything.pdf, but not .pdf on its own (a hidden file) unless dotglob is set
img*jpg    img0001.jpg or imgjpg
index.*    index.html or index.php or index.html.bak
*          anything or really or Anything (but not hidden files)
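The patterns in the table can be tried out like this. It is only a sketch: the files are invented, and LC_ALL=C is set so that the matches come back in plain byte order whatever your locale is.

```shell
#!/usr/bin/env bash
# Sketch only: all file names below are hypothetical.
LC_ALL=C
shopt -u nullglob failglob dotglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch report.pdf .pdf img0001.jpg imgjpg index.html index.html.bak

pdfs=$(echo *.pdf)      # report.pdf   (.pdf is hidden, so * skips it)
imgs=$(echo img*jpg)    # img0001.jpg imgjpg
idx=$(echo index.*)     # index.html index.html.bak
all=$(echo *)           # every file except the hidden .pdf

printf '%s\n' "$pdfs" "$imgs" "$idx" "$all"
```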

Question Mark

A less used but still useful pattern is the question mark (?). It matches exactly one character, whatever that character is.

Pattern       Matches
?ndex.html    Index.html or index.html
file.??       file.01 or file.js
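The "exactly one" part is what sets ? apart from *. A short sketch with made-up files shows names that are one character too short or too long being left out:

```shell
#!/usr/bin/env bash
# Sketch only: all file names below are hypothetical.
LC_ALL=C
shopt -u nullglob failglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch index.html ndex.html file.01 file.js file.c file.txt

one=$(echo ?ndex.html)  # index.html   (ndex.html has nothing for ? to match)
two=$(echo file.??)     # file.01 file.js   (file.c is too short, file.txt too long)

printf '%s\n' "$one" "$two"
```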

More complicated

Rather than matching any character at all, you can specify a list of characters, a range or a class of characters. You put the characters to match inside square brackets ([ and ]). Although the bracket expression itself is several characters long, it only ever matches one character.

Pattern                             Matches
messages.[123]                      messages.1, messages.2 or messages.3
page[a-z].txt                       pagea.txt, pageb.txt ... pagez.txt
page[-a-z].txt                      As above, but also matches page-.txt
page[^m-z].txt or page[!m-z].txt    Doesn't match files pagem.txt through pagez.txt but matches all others (i.e. page?.txt)
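The following sketch runs each of those bracket patterns against a handful of invented files. LC_ALL=C is an assumption made here so that a-z means exactly the lowercase ASCII letters; in other locales a range can take in more than you expect.

```shell
#!/usr/bin/env bash
# Sketch only: all file names below are hypothetical.
LC_ALL=C
shopt -u nullglob failglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch messages.1 messages.2 messages.4 pagea.txt pagem.txt page-.txt page1.txt

list=$(echo messages.[123])      # messages.1 messages.2
range=$(echo page[a-z].txt)      # pagea.txt pagem.txt
dash=$(echo page[-a-z].txt)      # page-.txt pagea.txt pagem.txt
negated=$(echo page[!m-z].txt)   # page-.txt page1.txt pagea.txt

printf '%s\n' "$list" "$range" "$dash" "$negated"
```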

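Classes can be used as well. These are short forms for full sets of characters. The following classes can be used: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit. A class is written as [:name:] and has to sit inside a bracket expression, so you end up with two pairs of brackets. Use a class like this:

ls img[[:digit:]].jpg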

That matches img0.jpg through img9.jpg. Next to [0-9] it seems useless, but consider that classes work independent of language: which characters belong to a class is decided by the locale. POSIX fixes digit at 0 through 9 in every locale, but classes such as alpha, upper and lower follow the locale, so [[:alpha:]] also matches accented letters wherever the locale counts them as letters.
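The double brackets are easy to get wrong, so here is a sketch (invented file names, C locale assumed) comparing the class inside a bracket expression with the same text in single brackets. Written with single brackets, [:digit:] is just an ordinary list of the characters :, d, i, g and t.

```shell
#!/usr/bin/env bash
# Sketch only: all file names below are hypothetical.
LC_ALL=C
shopt -u nullglob failglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch img0.jpg img9.jpg imgx.jpg imgd.jpg

digits=$(echo img[[:digit:]].jpg)    # img0.jpg img9.jpg
wrong=$(echo img[:digit:].jpg)       # imgd.jpg   (d is in the list ":digt")
letters=$(echo img[[:alpha:]].jpg)   # imgd.jpg imgx.jpg

printf '%s\n' "$digits" "$wrong" "$letters"
```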

Extended Patterns

Extended patterns can be handy in niche situations, but you'll probably find that they are disabled by default in your bash. To enable them you would have to run:

shopt -s extglob

And like all the settings I've mentioned, if you want it to be turned on every time you start a shell you need to add it to your .bashrc file or the system-wide /etc/bashrc file (/etc/bash.bashrc on Debian and Ubuntu).

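With extended patterns you can match something in between a single character and any string at all: a whole sub-pattern, or a choice of sub-patterns, repeated a certain number of times. In the list below, pattern-list is one or more patterns separated by a vertical bar (|).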

?(pattern-list)    Matches zero or one occurrence of the given patterns
*(pattern-list)    Matches zero or more occurrences of the given patterns
+(pattern-list)    Matches one or more occurrences of the given patterns
@(pattern-list)    Matches one of the given patterns
!(pattern-list)    Matches anything except one of the given patterns

The patterns can be any of the regular patterns, so +([[:digit:]]) matches one or more digits.
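A sketch of four of the operators at work, using invented file names in a scratch directory and the C locale for a predictable order. Note that shopt -s extglob sits on its own line before any line that uses an extended pattern. Bash has to have the option switched on by the time it parses the pattern, so enabling it on the same line, or inside the same function body, is too late.

```shell
#!/usr/bin/env bash
# Sketch only: all file names below are hypothetical.
shopt -s extglob
shopt -u nullglob failglob
LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch img1.jpg img12.jpg img.jpg photo.png photo.gif notes.txt

numbered=$(echo img+([[:digit:]]).jpg)   # img1.jpg img12.jpg
optional=$(echo img?([[:digit:]]).jpg)   # img.jpg img1.jpg
pictures=$(echo *.@(png|gif))            # photo.gif photo.png
not_jpg=$(echo !(*.jpg))                 # notes.txt photo.gif photo.png

printf '%s\n' "$numbered" "$optional" "$pictures" "$not_jpg"
```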

Configuring Pattern Matching

There are shell options that allow control over how the shell matches patterns and how it reacts to failed patterns.

dotglob     If set, bash includes filenames beginning with a ‘.’ in the results of pathname expansion.
extglob     If set, the extended pattern matching features described above under Extended Patterns are enabled.
failglob    If set, patterns which fail to match filenames during pathname expansion result in an expansion error.
globstar    If set, the pattern ** used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.

To see whether one of these options is set, run shopt with its name:

shopt dotglob

To set it:

shopt -s dotglob

To unset it:

shopt -u dotglob

As you have come to expect, to make these changes permanent you need to add the shopt commands to your .bashrc file or the system-wide /etc/bashrc file.
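To see what two of these options actually change, here is a sketch that toggles dotglob and globstar around the same patterns. The directory layout is invented, and globstar assumes Bash 4.0 or later (the Bash 3.2 shipped with macOS doesn't have it).

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout below is hypothetical.
LC_ALL=C
shopt -u nullglob failglob dotglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p docs/old
touch .hidden visible.txt docs/a.txt docs/old/b.txt

plain=$(echo *)                 # docs visible.txt
shopt -s dotglob
with_dot=$(echo *)              # .hidden docs visible.txt
shopt -q dotglob && state=set   # -q prints nothing; the exit status says if it is set
shopt -u dotglob

shopt -s globstar
deep=( **/*.txt )               # visible.txt plus both files under docs/
shopt -u globstar

echo "$plain | $with_dot | $state | ${#deep[@]} files"
```

Even with dotglob set, the special entries . and .. are never matched by *, so a command like rm -r * stays inside the current directory.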

Seeing File Expansion Work

The easy way to see file expansion working is to use the echo command. Just give it a pattern and it will print the results.

echo *.jpg

Another way is to position your cursor at the end of the pattern and press Tab twice; the shell will list the matching files below. You can also expand the pattern in place by positioning your cursor at the end of the pattern and pressing Ctrl-x then *; the matching files then replace the pattern on the command line.
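One caveat with echo is that it separates the names with spaces, so a file name that itself contains a space looks like two files. A sketch of an alternative, using printf to print one name per line and an array to count the matches (the file names are made up):

```shell
#!/usr/bin/env bash
# Sketch only: both file names below are hypothetical.
LC_ALL=C
shopt -u nullglob failglob
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.jpg "my photo.jpg"

flat=$(echo *.jpg)
echo "$flat"                # a.jpg my photo.jpg   (looks like three files)

printf '%s\n' *.jpg         # one file name per line

files=( *.jpg )             # each match becomes one array element
count=${#files[@]}
echo "$count"               # 2
```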