grep is a command-line tool for searching for lines that match a regular expression pattern. In Unix-like systems, such as Linux or macOS, everything is a file or, more precisely, a stream of bytes. A file is a collection of bytes that you can read and/or write. A reference to such a file is called a file descriptor (fd). This approach allows us to use the same set of system calls to access any such resource and, as a result, the same set of tools, like grep.
Write programs to handle text streams, because that is a universal interface.
— Doug McIlroy, Unix pioneer
If you know how to handle text streams, you can make your Linux life a lot easier.
Of these great text-processing tools, grep is one of the oldest.
Everything it does is based on finding regular patterns in lines of
text, and printing them. Yet, despite its age and its simplicity, grep
provides you a large amount of power and flexibility.
As you learn to use grep, you’ll find more and more problems that have
easy grep solutions.
Here are some typical uses of grep:
- Finding where an expression occurs in a file or directory
- Counting how many times an expression occurs
- Filtering output from a different program.
grep combines very well with other programs. The last part of this
how-to section gives you an example of how to use grep to make a
simple reporting tool.
Tutorials: searching for text patterns in Romeo and Juliet
Finding text patterns is probably the most common use of grep.
First, we need a source file to work with. Any text that has multiple lines is a good place to start. How about a play by Shakespeare? (Server logs are a more traditional example).
If you want to follow along, you can use
Project Gutenberg’s public-domain version of Romeo and Juliet. You can
download the file from your shell using curl, like this:
curl -o romeo-and-juliet.txt https://www.gutenberg.org/files/1513/1513-h/1513-h.htm
Now, let’s define a goal. In the play, “Nurse” is an important person. Our task is to find the answer to these questions:
- How many times does the word “nurse” get spoken in the play?
- How many times does the person, “Nurse,” speak?
Tutorial 1: Find all lines where the word “nurse” is spoken
Inspect the file for patterns.
To make precise queries, it helps to know something about the structure of the file. A quick look at romeo-and-juliet.txt tells us that:
- A person’s name is introduced in all capitals
- A person’s speech begins with their name in all-caps followed by a period, e.g. ROMEO.
- All dialogue is separated by a blank line
- Stage direction is written in brackets with underscores, e.g. [_Exit Nurse._]
Find a simple expression.
Let’s look for the expression nurse:
grep "nurse" romeo-and-juliet.txt
Surprisingly, this yields only two results, a very small number for a major character!
But the results make sense. In this play, “Nurse” is a proper name. So, the character’s name will start with a capital N.
Search again with proper capitalization:
grep "Nurse" romeo-and-juliet.txt
Now we’ve got many more results.
Search for multiple case patterns
If you want to find all times that the word “Nurse” appears in the play’s dialogue, the first obvious thing to do would be a case-insensitive search.
grep has an -i flag, which lets you search for all cases.
grep -i "Nurse" romeo-and-juliet.txt
But wait, now we have too many results! Many of the output lines just say NURSE. These lines indicate who’s speaking the dialogue.
For now, we want to find only lines where the word “nurse” is spoken inside the text. That is, we want to print lines with the expressions “Nurse” and “nurse”.
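As a minimal sketch of the difference between a case-sensitive and a case-insensitive search, the block below runs both on a few sample lines. The sample lines are invented to mimic the play's layout; they are not taken from the Gutenberg file.

```shell
# Hypothetical sample lines that mimic the play's layout (not the real text).
sample=$(mktemp)
cat > "$sample" <<'EOF'
NURSE.
Now, by my troth, I bade her come.
JULIET.
Good Nurse, speak quickly.
Go, nurse, and wait below.
EOF

# Case-sensitive: only the exact spelling "Nurse" matches.
exact=$(grep "Nurse" "$sample")

# Case-insensitive: "NURSE", "Nurse", and "nurse" all match.
any_case=$(grep -i "Nurse" "$sample")

printf 'case-sensitive:\n%s\n\ncase-insensitive:\n%s\n' "$exact" "$any_case"
rm -f "$sample"
```

The case-insensitive search also returns the speaker label, which is why the next step narrows the pattern instead of widening it.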
Use bracket expressions to find multiple combinations
To search for Nurse and nurse together, the simplest way is with a bracket expression. A bracket expression matches any single character listed in the brackets.
grep "[nN]urse" romeo-and-juliet.txt
Almost there! But there are still the stage directions, like this:
[_Exit Nurse_]
Filter output using regex, pipes, and the -v option.
We already have most of the output we need. We just need to filter out lines where Nurse appears between the expressions [_ and _]. To do this, we can use a regular expression: \[_.*Nurse.*_]. The opening bracket is escaped so that grep does not read it as the start of a bracket expression, and .* means zero or more of any character.
A simple way to narrow further is to pipe our output into another grep command. The -v flag inverts matches: it prints only lines that do not contain the expression.
grep "[nN]urse" romeo-and-juliet.txt | grep -v "\[_.*Nurse.*_]"
Great! We have successfully…oh, wait.
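The two-stage filter can be tried on a small sample before running it on the whole play. This is a sketch under the assumption that stage directions look like the bracketed example above; the sample lines are invented.

```shell
# Hypothetical sample lines (not the real text).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Good Nurse, speak quickly.
Go, nurse, and wait below.
[_Exit Nurse._]
NURSE.
EOF

# Stage 1: [nN]urse matches "Nurse" or "nurse", but not "NURSE".
# Stage 2: -v drops bracketed stage directions; \[ is a literal bracket.
spoken=$(grep "[nN]urse" "$sample" | grep -v "\[_.*Nurse.*_]")

printf '%s\n' "$spoken"
rm -f "$sample"
```

Only the two lines of spoken dialogue remain.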
It seems that Gutenberg uses a special format for stage directions when a person enters. So, we also need to filter out lines that look like this:
Re-enter Nurse.
Enter Nurse.
Use the -e flag for multiple expressions
grep "[Nn]urse" romeo-and-juliet.txt \
| grep -v -e " Re-enter\| Enter" -e "\[_.*Nurse.*_]"
👉 In grep’s default basic regular expressions, we must escape the | character to use it as an “or” operator. Otherwise, grep will interpret it literally.
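The escaping rule can be checked side by side. The sketch below assumes GNU grep, where \| is the alternation operator in basic regular expressions and a plain | does the same job once -E switches to extended regular expressions. The sample lines are invented.

```shell
# Hypothetical sample lines (not the real text).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Good Nurse, speak quickly.
 Enter Nurse.
 Re-enter Nurse.
[_Exit Nurse._]
EOF

# Basic regex: alternation is written \| (a GNU extension).
basic=$(grep "[Nn]urse" "$sample" \
  | grep -v -e " Re-enter\| Enter" -e "\[_.*Nurse.*_]")

# Extended regex (-E): alternation is a plain |, no backslash needed.
extended=$(grep "[Nn]urse" "$sample" \
  | grep -Ev -e " Re-enter| Enter" -e "\[_.*Nurse.*_]")

printf 'basic:    %s\nextended: %s\n' "$basic" "$extended"
rm -f "$sample"
```

Both forms keep only the line of dialogue, so the choice between them is mostly about readability.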
Congrats! Now we’ve really found every line where the word “nurse” is spoken in Romeo and Juliet. To do this, we used a combination of search, simple regex, and filters. This script is not very efficient or robust, but, for the purpose of our task, it’s acceptable.
The text is not going to change, and it’s not very long. Once our search prints what we need, we probably don’t need to run it many more times.
If the text were long and dynamically changing, and the script needed to be run often, we’d probably need a more robust solution. See When grep is not so great.
Tutorial 2: Find all times Nurse speaks
This task is simpler. We’ve already discovered that the text introduces dialogue by printing the speaker’s name in all caps, followed by a period.
Use grep to print all broad matches
grep "NURSE" romeo-and-juliet.txt
Inspect output for false positives
Everything looks good, except the first printed line.
NURSE to Juliet.
We need to include the period after the character’s name.
Use grep to print more specific matches
In this case, escape the . character. If you don’t, grep will match any character that follows the expression NURSE.
grep "NURSE\." romeo-and-juliet.txt
Use the -c option to count lines.
grep -c "NURSE\." romeo-and-juliet.txt
Now you know: the Nurse speaks 90 times in the play.
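To see why the escaped period matters, here is a small sketch that counts three variants of the pattern on invented sample lines. Note that -c counts matching lines, not individual occurrences.

```shell
# Hypothetical sample lines (not the real text).
sample=$(mktemp)
cat > "$sample" <<'EOF'
NURSE.
Peace, I have done.
NURSE to Juliet.
NURSE.
Yes, madam.
EOF

broad=$(grep -c "NURSE" "$sample")     # every line containing NURSE
loose=$(grep -c "NURSE." "$sample")    # . matches any character, even a space
exact=$(grep -c "NURSE\." "$sample")   # \. matches only a literal period

echo "broad=$broad loose=$loose exact=$exact"
rm -f "$sample"
```

The unescaped pattern still counts the "NURSE to Juliet." line, because the space after NURSE satisfies the `.` wildcard.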
How to grep
To become a grep master, there are two things you need to memorize:
- The command options
- The regex patterns
Learning these is mostly a matter of memorization and practice.
After you’ve memorized these, you’ll develop your own method for using
grep effectively. The steps in the
second tutorial
demonstrate a common process for problem-solving with grep (and all
regex).
- First, make a general search. Don’t get too complicated!
- Inspect your output for false positives and for missing lines.
- Make a more precise query, inspect again.
- Repeat until grep prints exactly what you need
Now that you’ve used grep to solve some specific problems, it’s time
to look at how to use grep in general cases.
How to grep over multiple files
There are a few ways to grep over multiple files:
- With multiple arguments
- With file globbing
- With recursive search.
To grep with multiple files, just pass the files as arguments.
grep "bash" file1.sh file2.sh
You can also grep using file globs. This command checks for the
expression bash across all shell files in the directory.
grep "bash" *.sh
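A quick way to try this is in a throwaway directory. The file names and contents below are hypothetical. When grep searches more than one file, it prefixes each matching line with the name of the file it came from.

```shell
# Hypothetical files in a temporary directory.
dir=$(mktemp -d)
printf '#!/bin/bash\necho one\n' > "$dir/file1.sh"
printf '#!/bin/sh\necho two\n'   > "$dir/file2.sh"
printf 'bash notes\n'            > "$dir/notes.txt"

# The shell expands *.sh to file1.sh file2.sh before grep runs,
# so notes.txt is never searched.
matches=$(cd "$dir" && grep "bash" *.sh)

echo "$matches"
rm -rf "$dir"
```

The output has the form `file1.sh:#!/bin/bash`: the file name, a colon, then the matching line.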
The -r option searches recursively through directories. It is one of
the handiest options.
This command searches every file under your scripts directory for lines
that invoke a shell:
grep -r "bin/.*sh" ~/scripts
This should match all shells invoked, including /bin/bash,
/bin/dash, bin/env bash, etc.
To add a little context to the output, use the -A option. Here, -A 2
prints the two lines that follow each match.
grep -rA 2 "bin/.*sh" ~/scripts
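The sketch below builds a small, hypothetical scripts tree to show both options separately. It also uses -l, which prints only the names of files that contain a match.

```shell
# Hypothetical directory tree with two scripts.
dir=$(mktemp -d)
mkdir -p "$dir/scripts/tools"
printf '#!/bin/bash\nset -e\necho deploy\necho done\n' > "$dir/scripts/deploy.sh"
printf '#!/usr/bin/env bash\necho lint\n' > "$dir/scripts/tools/lint.sh"

# -r descends into subdirectories; -l lists only the matching file names.
found=$(cd "$dir" && grep -rl "bin/.*sh" scripts | sort)

# -A 2 prints each matching line plus the two lines after it.
context=$(grep -A 2 "bin/.*sh" "$dir/scripts/deploy.sh")

printf 'files:\n%s\n\ncontext:\n%s\n' "$found" "$context"
rm -rf "$dir"
```

The recursive search finds the script in the nested tools directory as well, and the context output stops two lines after the shebang.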
How to exclude results
There are multiple ways to exclude results.
You might want to use the -v option to exclude lines. This was
demonstrated in the
first tutorial.
In a recursive search, you might want to skip files with a certain
extension. In these cases, you can use the --exclude option.
For example, to exclude YAML files, do something like this:
grep -rA 2 --exclude="*.yaml" "bin/.*sh" ~/scripts
Or, to exclude all .git directories, use --exclude-dir:
grep -rA 2 --exclude-dir=".git" "bin/.*sh" ~/scripts
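Both options can be combined in one command. The sketch below assumes a grep that supports --exclude and --exclude-dir (GNU grep does) and uses a hypothetical directory layout.

```shell
# Hypothetical tree: one script, one YAML file, one file inside .git.
dir=$(mktemp -d)
mkdir -p "$dir/scripts/.git"
printf '#!/bin/bash\n'      > "$dir/scripts/run.sh"
printf 'shell: /bin/bash\n' > "$dir/scripts/ci.yaml"
printf '#!/bin/sh\n'        > "$dir/scripts/.git/hook.sh"

# Without exclusions, all three files match.
all_count=$(cd "$dir" && grep -rl "bin/.*sh" scripts | wc -l)

# Skip *.yaml files and anything inside a .git directory.
filtered=$(cd "$dir" && grep -rl --exclude="*.yaml" --exclude-dir=".git" "bin/.*sh" scripts)

echo "all: $all_count files, filtered: $filtered"
rm -rf "$dir"
```

Quoting the glob in --exclude="*.yaml" keeps the shell from expanding it before grep sees it.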
How to combine grep with other programs
grep can take input from standard input. It’s often handy to use grep
to filter output from another program. For example, to print all Firefox
processes, you could run this command.
ps -ax | grep "firefox"
You can also pipe grep output to another program. For example, if a
command’s output is too large to fit on your screen, you might want to
pipe grep to less.
grep -rA 2 "bin/.*sh" ~/scripts | less
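Here is a deterministic sketch of grep in the middle of a pipeline. The list_processes function is a hypothetical stand-in for a command like ps, so that the result does not depend on what is running on your machine.

```shell
# Hypothetical stand-in for the output of a command such as ps.
list_processes() {
  printf '%s\n' \
    " 101 ??  /usr/bin/firefox" \
    " 102 ??  /usr/bin/firefox -contentproc" \
    " 230 ??  /usr/sbin/sshd"
}

# With no file argument, grep reads standard input.
# Its output then feeds the next program, here wc -l.
count=$(list_processes | grep "firefox" | wc -l)

echo "firefox processes:" $count
```

With the real ps, the grep process itself may also appear in the results, because its own command line contains the search term.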
How to use grep with regex
Knowing how to use regex patterns can be really useful for grepping.
Here’s the last Shakespeare example: how would you search for every
derivative form of the word love in Romeo and Juliet?
A match should print lines with forms like “lovers” or “loving,” but not lines with only the base word, “love.”
First, you can make your search case-insensitive.
grep -i "love" romeo-and-juliet.txt
However, this prints false positives, like “glove”.
Add the word boundary character, \b.
grep -i "\blove\b" romeo-and-juliet.txt
Now you’ve got all matches of love, including the word itself. But the task requires only derivatives of the word.
Extend the base pattern with the . character.
The base expression is lov. This expression is in all derivatives, like “loving” and “lover”. To match any single character, use the . character.
grep -i "\blov.\b" romeo-and-juliet.txt
Unfortunately, this command prints exactly what we don’t want: only lines with the word “love”.
Use the \w word character to search for other word characters.
\w matches any word character, i.e. a letter, a digit, or an underscore.
grep -i "\blov.\w\b" romeo-and-juliet.txt
Much better! But this matches only 5-letter derivations, like “lover”.
Expand the pattern with \+
The \+ operator matches one or more instances of the preceding item. In this instance, it searches for one or more instances of a \w character (i.e., any word character).
grep -i "\blov.\w\+\b" romeo-and-juliet.txt
Use the -E flag to avoid escaping special characters, like +.
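As a sketch of the -E form, the block below runs the final pattern on a few sample lines. It assumes GNU grep, which supports \b and \w, and adds -o, which prints only the matched text instead of the whole line. The sample lines are short stand-ins, not the Gutenberg file.

```shell
# Short stand-in lines (not the Gutenberg file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
My only love sprung from my only hate.
A pair of star-cross'd lovers take their life.
This is a glove.
Thou art lov'd and loving still.
EOF

# -E: extended regex, so + needs no backslash.
# -o: print each match on its own line, not the whole line.
derived=$(grep -ioE "\blov.\w+\b" "$sample")

printf '%s\n' "$derived"
rm -f "$sample"
```

Neither "love" nor "glove" is printed: the first has nothing after its fourth letter, and the second fails the leading word boundary.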
Bingo! Let’s look at the first five results:
lovers
loving
lovers
lovers
lov’d
Besides predictable forms, like “loving”, there are also surprising
antiquated forms, like “lov’d”. A good grep can be pleasantly
surprising.
How to use grep with command substitution
Maybe you want to search for a regular expression that changes
dynamically. In these cases, you can use command substitution, written
$(...), and variables to pass expressions to grep.
For example, consider a log file where each line starts with a date, in
the format YYYY-MM-DD. Something like this:
2021-12-02 <more-text>
2021-12-02 <more-text>
2021-12-01 <more-text>
2021-11-29 <more-text>
...
1970-01-01 <more-text>
Perhaps you want to know how many times an event was logged in the
current month. With command substitution, you could use grep to create a
dynamic report:
#!/bin/bash
# Print how many times an event occurred this month
month=$(date +%B)      # name of the current month
search=$(date +%Y-%m)  # search term in YYYY-MM format
file=long-text.log
count=$(grep -c "^$search" "$file")  # count lines in $file that start with the search term
echo "This month, $month, has logged $count events."
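To try the report without a real log, the sketch below generates a throwaway log file first. The log entries are invented; two of them are dated in the current month, so the count should be 2 whenever you run it.

```shell
#!/bin/bash
# Build a throwaway log: two entries from this month, one from 1970.
search=$(date +%Y-%m)
log=$(mktemp)
printf '%s\n' \
  "$search-02 disk check" \
  "$search-01 disk check" \
  "1970-01-01 epoch" > "$log"

# ^ anchors the search term to the start of the line,
# so a date that appears later in a line is not counted.
count=$(grep -c "^$search" "$log")

echo "This month has logged $count events."
rm -f "$log"
```

Because the search term is built from `date` at run time, the same script keeps working next month without any edits.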
When is grep not so great?
The beauty of grep is its simplicity. Don’t get too complicated.
In the examples of the tutorial and how-to, the data is well-structured, and the queries are relatively simple.
For example, the Romeo and Juliet text has a very precise way of marking dialogue and stage direction. The log file might be a very long file, but it also has very regular patterns. Every line begins with a date in one format.
For simple searches, or for matches across a large set of files, grep
is very powerful.
However, if you want to manipulate text, or work with specific fields of
a file, you’ll probably want to use a more specialized tool, like sed or
awk.
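As a rough illustration of field-based work, the sketch below uses awk on an invented log with a hypothetical layout of date, level, and message. grep can tell you which lines contain ERROR, but awk can test one field and print another.

```shell
# Hypothetical log layout: date, level, message.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-12-02 ERROR disk full
2021-12-02 INFO backup done
2021-12-01 ERROR disk full
EOF

# Print the date (field 1) of every line whose second field is ERROR.
error_dates=$(awk '$2 == "ERROR" { print $1 }' "$log")

printf '%s\n' "$error_dates"
rm -f "$log"
```

Because the test applies to the second field only, a message that happens to contain the word ERROR would not cause a false match.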
For advanced text searching, like with natural language processing, it’s probably time to use a language with dedicated libraries to help you achieve your task.
Supplementary links
Want more grep? Here are some grep-related links:
- Video: Brian Kernighan talks about the origins of grep. One night in 1971.
- The GNU grep manual. Everything that’s possible with GNU’s implementation of grep.
- Why is GNU grep so fast? A technical discussion of an implementation.
The whole point with “everything is a file” is not that you have some random filename (indeed, sockets and pipes show that “file” and “filename” have nothing to do with each other), but the fact that you can use common tools to operate on different things.
[…]
The UNIX philosophy is often quoted as “everything is a file”, but that really means “everything is a stream of bytes”.
— Linus Torvalds, Newsgroups: fa.linux.kernel




