There is a classic bit of computer wisdom that states “If you’ve got a problem, and decide to solve it with regular expressions, now you’ve got two problems.” This of course stems from the perception that regular expressions are a complicated mix of magic characters and voodoo. Regular expressions can let you write elegant and concise program logic quickly, but only once you understand how they work and why. Just about any Linux or Mac system comes with a powerful regex tool called grep, and learning grep is an essential task for any power user or system administrator. Today, we’ll explore some of what you can do with grep and how it can be one of the most powerful tools in your geek arsenal.
How It Works
In short, grep’s job is to search through a block of input. That’s pretty vague, so it’s best described by example. Let’s say you’ve got a text file called distros.txt that has a list of Linux distributions, such as the one below.
Debian – Stable server distribution
Ubuntu – Desktop distro originally based on Debian
Kubuntu – Uses KDE desktop instead of Gnome
Fedora – Continuation of the free Red Hat desktop system
Gentoo – A fast, source-based Linux system for pro users
SuSE – Commercial Linux owned by Novell
Mint – Ubuntu-derived distro with additional restricted software
Grep can be used to read through the text and filter it to show only the parts you want. If you wanted to see only the lines that contain the word “Ubuntu”, you’d run the following command:
grep Ubuntu distros.txt
(Your version of grep may or may not highlight the matching text in color.)
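If you'd like to follow along, here is a minimal sketch that recreates the sample list and runs the search. The scratch directory and the plain hyphens (standing in for the dashes in the list above) are assumptions made for convenience, not part of the original file.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Recreate the sample list (plain hyphens stand in for the dashes).
cat > distros.txt <<'EOF'
Debian - Stable server distribution
Ubuntu - Desktop distro originally based on Debian
Kubuntu - Uses KDE desktop instead of Gnome
Fedora - Continuation of the free Red Hat desktop system
Gentoo - A fast, source-based Linux system for pro users
SuSE - Commercial Linux owned by Novell
Mint - Ubuntu-derived distro with additional restricted software
EOF

# Case-sensitive search: prints the Ubuntu and Mint lines.
grep Ubuntu distros.txt
```

The Mint line shows up because it mentions “Ubuntu-derived”; grep prints every line that contains the pattern anywhere in it.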
You may have noticed that our last search did not return Kubuntu. Unless told otherwise, grep will assume that you entered your expression exactly the way you wanted it, and this applies to upper and lower case. If you search for “ubuntu” but your text file contains “Ubuntu”, your search will find nothing. To make your search case-insensitive, use the -i switch, as in
grep -i ubuntu distros.txt
With the previous search, you included all capitalization variants of the word “Ubuntu”. It included Kubuntu because it contains the word you searched for. You may want to only include the standard version, not Kubuntu or Edubuntu, etc. If that’s the case, you can tell grep to match the whole word only by passing the -w option.
grep -iw ubuntu distros.txt
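To see what whole-word matching changes, the sketch below runs a case-insensitive search with and without -w on a shortened copy of the list. The three-line file and the scratch directory are assumptions for illustration only.

```shell
# Work in a scratch directory with a shortened copy of the list.
cd "$(mktemp -d)"
printf '%s\n' \
  'Ubuntu - Desktop distro originally based on Debian' \
  'Kubuntu - Uses KDE desktop instead of Gnome' \
  'Mint - Ubuntu-derived distro with additional restricted software' \
  > distros.txt

# -i alone: all three lines match, because "Kubuntu" contains "ubuntu".
grep -i ubuntu distros.txt

# -i plus -w: the Kubuntu line drops out, since "ubuntu" there is only
# part of a longer word. "Ubuntu-derived" still matches because the
# hyphen is not a word character.
grep -iw ubuntu distros.txt
```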
Much as you can use grep to show only matching entries, you can also use it to show everything BUT the matching entries. To expand on our previous searches, we can now use the -v option to reverse our results and only show the lines that don’t match.
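As a rough sketch of an inverted search, the commands below build a shortened copy of the list (an assumption for illustration) and print only the lines that do not mention Ubuntu in any capitalization.

```shell
# Work in a scratch directory with a shortened copy of the list.
cd "$(mktemp -d)"
printf '%s\n' \
  'Debian - Stable server distribution' \
  'Ubuntu - Desktop distro originally based on Debian' \
  'Kubuntu - Uses KDE desktop instead of Gnome' \
  'Fedora - Continuation of the free Red Hat desktop system' \
  > distros.txt

# -v inverts the match; combined with -i it hides every line
# containing "ubuntu" in any case, leaving Debian and Fedora.
grep -iv ubuntu distros.txt
```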
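Grep supports wildcard-style matching through regular expressions. When using these and other special characters, you want to make sure your search pattern is in quotes, so the Linux shell doesn’t try to interpret them before grep can. The most common are . to represent a single unknown character and * to repeat whatever comes right before it zero or more times. Unlike the shell’s own * wildcard, grep’s * does not stand for a group of characters by itself; combine the two as .* when you want to match any run of characters.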
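Here is a small sketch of these special characters in action. The three-line sample file is an assumption for illustration; the patterns are quoted so the shell passes them to grep untouched.

```shell
# Work in a scratch directory with a shortened copy of the list.
cd "$(mktemp -d)"
printf '%s\n' \
  'Gentoo - A fast, source-based Linux system for pro users' \
  'SuSE - Commercial Linux owned by Novell' \
  'Debian - Stable server distribution' \
  > distros.txt

# "." matches any single character, so this finds "SuSE".
grep "S.SE" distros.txt

# ".*" matches any run of characters (including none), so this finds
# lines where "Linux" is followed somewhere later by "Novell".
grep "Linux.*Novell" distros.txt

# "*" on its own repeats the item before it: "u*" means zero or more
# "u" characters, so this pattern would match "SSE" as well as "SuSE".
grep "Su*SE" distros.txt
```

Each of the three searches prints only the SuSE line from this sample.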
If the wildcards are a little too broad for you, you can specify individual characters or a range to include in your search. Characters within square brackets will be included in your search pattern. For example, if you had a file with a list such as
Item 1 - apples
Item 2 - bananas
Item 3 - coconuts
Item 4 - peaches
Item 5 - Grapes
Item 6 - Apricot
You can choose a particular range by using something like
grep "Item [2-4]" items.txt
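For a runnable version, the sketch below recreates the list as items.txt in a scratch directory (the directory is an assumption) and tries both a range and a set of individual characters.

```shell
# Work in a scratch directory and recreate the item list.
cd "$(mktemp -d)"
cat > items.txt <<'EOF'
Item 1 - apples
Item 2 - bananas
Item 3 - coconuts
Item 4 - peaches
Item 5 - Grapes
Item 6 - Apricot
EOF

# A range: matches items 2, 3 and 4 (bananas, coconuts, peaches).
grep "Item [2-4]" items.txt

# Individual characters: matches items 1, 3 and 5 (apples, coconuts, Grapes).
grep "Item [135]" items.txt
```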
Grep is an immensely powerful tool, and learning it thoroughly can pay off in all kinds of ways. Understanding grep also makes it much simpler to move on to other powerful console tools like sed and awk. Between those three tools, an amazing amount of console and script magic can be done with far less effort than seems possible. If you’re a fan of grep, or would like to see other tools like sed and awk covered here, please drop a note in the comments.