grep is a command-line tool for searching for lines that match a regular expression pattern. In Unix-like systems, like Linux or macOS, everything is a file, and more precisely everything is a stream of bytes. A file is a collection of bytes that you can read and/or write. A reference to such a file is called a file descriptor (fd). This approach allows us to use the same set of system calls to access any given resource, and consequently to use the same set of tools, like grep.
Write programs to handle text streams, because that is a universal interface.
– Doug McIlroy, Unix pioneer
If you know how to handle text streams, you can make your Linux life a lot easier.
Of the many great text-processing tools, grep is one of the oldest. Everything it does is based on finding regular patterns in lines of text and printing them. Yet, despite its age and its simplicity, it provides a great deal of power and flexibility.
As you learn to use grep, you’ll find more and more problems that it can solve.
Here are some typical uses of grep:
- Finding where an expression occurs in a file or directory
- Counting how many times an expression occurs
- Filtering output from a different program
grep combines very well with other programs. The last part of this how-to section gives you an example of how to use grep to make a simple reporting tool.
Tutorials: searching for text patterns in Romeo and Juliet
Finding text patterns is probably the most common use of grep.
First, we need a source file to work with. Any text that has multiple lines is a good place to start. How about a play by Shakespeare? (Server logs are a more traditional example.)
If you want to follow along, you can use Project Gutenberg’s public-domain version of Romeo and Juliet. You can download the file from your shell using curl, like this:
curl -o romeo-and-juliet.txt https://www.gutenberg.org/files/1513/1513-h/1513-h.htm
Now, let’s define a goal. In the play, “Nurse” is an important person. Our task is to find the answer to these questions:
- How many times does the word “nurse” get spoken in the play?
- How many times does the person, “Nurse,” speak?
Tutorial 1. Find all lines where the word “nurse” is spoken
Inspect the file for patterns.
To make precise queries, it helps to know something about the structure of the file. A quick look at romeo-and-juliet.txt tells us that:
- A person’s name is introduced in all capitals
- A person’s speech begins with their name in all caps followed by a period, e.g. NURSE.
- All dialogue is separated by a blank line
- Stage directions are written in brackets with underscores, starting with [_ and ending with _]
Find a simple expression.
Let’s look for the expression nurse:
grep "nurse" romeo-and-juliet.txt
Surprisingly, this yields only two results, a very small number for a major character!
But the results make sense. In this play, “Nurse” is a proper name, so the character’s name starts with a capital N.
Search again with proper capitalization:
grep "Nurse" romeo-and-juliet.txt
Now we’ve got many more results.
Search for multiple case patterns
If you want to find all times that the word “Nurse” appears in the play’s dialogue, the first obvious thing to do would be a case-insensitive search.
Use the -i flag, which makes the search case-insensitive:
grep -i "Nurse" romeo-and-juliet.txt
But wait: now we have too many results! Many of the output lines just say NURSE. These lines indicate who’s speaking the dialogue.
For now, we want to find only lines where the word “nurse” is spoken inside the text. That is, we want to print lines with the expressions “Nurse” and “nurse”, but not “NURSE”.
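To see the difference between these searches without downloading the play, here is a minimal sketch that runs them on a few hypothetical sample lines. The lines are written to a temporary file and are not quoted from the play. It assumes bash, mktemp, and grep.

```shell
# Hypothetical sample lines in the play's layout (not quoted from the play)
sample=$(mktemp)
printf '%s\n' 'NURSE.' 'Ask the nurse.' 'Nurse, come here.' 'Go now.' > "$sample"

# grep is case-sensitive by default, so each pattern matches only its exact case
lower=$(grep -c "nurse" "$sample")    # -c prints the number of matching lines
proper=$(grep -c "Nurse" "$sample")
# -i ignores case, so the all-caps speaker line NURSE. matches too
any=$(grep -ci "nurse" "$sample")

echo "nurse: $lower, Nurse: $proper, -i: $any"   # prints: nurse: 1, Nurse: 1, -i: 3
rm -f "$sample"
```

The -i count includes the speaker line, which is exactly the unwanted extra output described above.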
Use bracket expressions to find multiple combinations
To search for Nurse and nurse together, the simplest way is with a bracket expression. A bracket expression matches any one of the characters inside the brackets, so [nN] matches either n or N.
grep "[nN]urse" romeo-and-juliet.txt
Almost there! But the output still includes stage directions, which are wrapped in [_ and _].
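Here is a small sketch of the bracket expression on hypothetical sample lines, assuming bash and grep. The [_Exit Nurse._] line is an assumed example of the stage-direction format. It shows that NURSE. is now skipped, but the stage direction still gets through.

```shell
# Hypothetical sample: a speaker line, two lines of dialogue, and a stage direction
sample=$(mktemp)
printf '%s\n' 'NURSE.' 'Ask the nurse.' 'Nurse, come here.' '[_Exit Nurse._]' > "$sample"

# [nN] matches exactly one character, n or N, and "urse" must follow in lowercase,
# so NURSE. is skipped but the stage direction still matches
matches=$(grep "[nN]urse" "$sample")
echo "$matches"
rm -f "$sample"
```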
Filter output using regex, pipes, and the -v flag
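We already have most of the output we need. We just need to filter out lines where Nurse appears between the expressions [_ and _]. To do this, we can use the regular expression \[_.*Nurse.*_], where .* means zero or more of any character. The backslash in \[ makes grep treat the bracket literally instead of as the start of a bracket expression.
A simple way to narrow further is to pipe our output into another grep command with the -v flag. The -v flag inverts matches: it prints only lines that do not match the expression.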
grep "[nN]urse" romeo-and-juliet.txt | grep -v "\[_.*Nurse.*_]"
Great! We have successfully…oh, wait.
It seems that Gutenberg uses a special format for stage directions when a person enters. So, we also need to filter out lines that look like this:
Re-enter Nurse.
Enter Nurse.
Use the -e flag for multiple expressions
grep "[Nn]urse" romeo-and-juliet.txt \
  | grep -v -e " Re-enter\| Enter" -e "\[_.*Nurse.*_]"
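👉 We must escape the | as \|. Otherwise, grep will interpret it literally. In grep’s default (basic) regular expressions, \| means “or”, which is a GNU extension.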
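As a sanity check, here is a sketch that runs the full two-stage pipeline on hypothetical sample lines containing both stage-direction formats. It assumes bash and GNU grep, because \| alternation in a basic regex is a GNU extension. The Enter/Re-enter lines start with a space only because the filter pattern expects one; the real file’s spacing may differ.

```shell
# Hypothetical sample mixing dialogue with both stage-direction formats
sample=$(mktemp)
printf '%s\n' 'NURSE.' 'Ask the nurse.' 'Nurse, come here.' \
  '[_Exit Nurse._]' ' Enter Nurse.' ' Re-enter Nurse.' > "$sample"

# Stage 1 keeps every line containing Nurse or nurse.
# Stage 2 (-v) drops lines matching either -e pattern:
# an entrance, or text between [_ and _].
spoken=$(grep "[Nn]urse" "$sample" \
  | grep -v -e " Re-enter\| Enter" -e "\[_.*Nurse.*_]")
echo "$spoken"
rm -f "$sample"
```

Only the two lines of real dialogue survive both stages.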
Congrats! Now we’ve really found every line where the word “nurse” is spoken in Romeo and Juliet. To do this, we used a combination of search, simple regex, and filters. This script is not very efficient or robust, but for the purposes of our task, it’s acceptable.
The text is not going to change, and it’s not very long. Once our search prints what we need, we probably don’t need to run it many more times.
If the text were long and dynamically changing, and the script needed to be run often, we’d probably need a more robust solution. See “When is grep not so great?”
Tutorial 2: Find all times Nurse speaks
This task is simpler. We’ve already discovered that the text introduces dialogue by printing the speaker’s name in all caps, followed by a period.
Use grep to print all broad matches
grep "NURSE" romeo-and-juliet.txt
Inspect output for false positives
Everything looks good, except the first printed line.
NURSE to Juliet.
We need to include the period after the character’s name.
Use grep to print more specific matches
In this case, escape the . character. If you don’t, grep will match any character that follows the expression, because an unescaped . matches any single character, including the space in “NURSE to Juliet.”
grep "NURSE\." romeo-and-juliet.txt
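To see why the backslash matters, here is a minimal sketch on hypothetical sample lines, assuming bash and grep. It compares the broad search, an unescaped period, and an escaped period.

```shell
# Hypothetical sample: the false positive from above, a speaker line, and dialogue
sample=$(mktemp)
printf '%s\n' 'NURSE to Juliet.' 'NURSE.' 'Go now.' > "$sample"

broad=$(grep -c "NURSE" "$sample")       # both NURSE lines match
unescaped=$(grep -c "NURSE." "$sample")  # . matches the space in "NURSE to", still 2
escaped=$(grep -c "NURSE\." "$sample")   # \. matches only a literal period
echo "broad: $broad, unescaped: $unescaped, escaped: $escaped"
rm -f "$sample"
```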
Use the -c option to count matching lines.
grep -c "NURSE\." romeo-and-juliet.txt
Now you know: the Nurse speaks 90 times in the play.
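Note that -c counts matching lines, not individual matches. That works for counting speeches, because each speech starts with its own NURSE. line. However, it can undercount a word that appears twice on one line, which matters for the first question about how often “nurse” is spoken. Here is a sketch on hypothetical lines that compares -c with -o, which prints each match on its own line. It assumes bash, grep, and wc.

```shell
# Hypothetical sample: one line mentions the Nurse twice
sample=$(mktemp)
printf '%s\n' 'Nurse! Nurse!' 'Ask the nurse.' 'Go now.' > "$sample"

lines=$(grep -c "[Nn]urse" "$sample")   # matching lines: 2
# -o prints every match on its own line, so counting those lines counts matches;
# tr strips the padding some wc implementations add
hits=$(grep -o "[Nn]urse" "$sample" | wc -l | tr -d ' ')   # matches: 3
echo "lines: $lines, matches: $hits"
rm -f "$sample"
```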
How to grep
To become a grep master, there are two things you need to memorize:
- The command options
- The regex patterns
Learning these is mostly a matter of memorization and practice.
After you’ve memorized these, you’ll develop your own method for using grep effectively. The steps in the tutorials demonstrate a common process for problem-solving with grep:
- First, make a general search. Don’t get too complicated!
- Inspect your output for false positives and for missing lines.
- Make a more precise query, inspect again.
- Repeat until grep prints exactly what you need
Now that you’ve used grep to solve some specific problems, it’s time to look at how to use grep in general cases.
grep over multiple files
There are a few ways to grep over multiple files:
- With multiple arguments
- With file globbing
- With recursive search.
To grep over multiple files, just pass the files as arguments.
grep "bash" file1.sh file2.sh
You can also grep using file globs. This command searches for the expression bash across all shell files in the current directory. The shell, not grep, expands *.sh into the list of matching file names.
grep "bash" *.sh
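Here is a sketch of both approaches in a throwaway directory with hypothetical files, assuming bash, mktemp, and grep. When grep searches more than one file, it prefixes each matching line with the file name.

```shell
# Throwaway directory with two hypothetical scripts and a non-script file
dir=$(mktemp -d)
printf '#!/bin/bash\necho one\n' > "$dir/file1.sh"
printf '#!/bin/sh\necho two\n' > "$dir/file2.sh"
printf 'bash is mentioned here\n' > "$dir/notes.txt"

# Explicit arguments: each match is prefixed with its file name
args_out=$(cd "$dir" && grep "bash" file1.sh file2.sh)
# Glob: the shell expands *.sh to "file1.sh file2.sh" before grep runs,
# so notes.txt is never searched
glob_out=$(cd "$dir" && grep "bash" *.sh)

echo "$args_out"
echo "$glob_out"
rm -rf "$dir"
```

Both commands print file1.sh:#!/bin/bash, because only file1.sh contains “bash”.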
The -r option searches recursively through directories. It is one of the handiest options.
This command searches every file under ~/scripts for lines that contain bin/ followed later by sh, such as #!/bin/bash:
grep -r "bin/.*sh" ~/scripts
This should match most ways a shell is invoked, including bin/env bash, because .* allows any characters between bin/ and sh.
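Here is a sketch of the recursive search on a hypothetical directory tree standing in for ~/scripts, assuming bash, mktemp, and GNU grep. The output is piped through sort because grep -r does not guarantee a traversal order.

```shell
# Hypothetical tree: a bash script, an env-style script in a subdirectory, a Python script
dir=$(mktemp -d)
mkdir -p "$dir/scripts/sub"
printf '#!/bin/bash\necho a\n' > "$dir/scripts/a.sh"
printf '#!/usr/bin/env bash\necho b\n' > "$dir/scripts/sub/b.sh"
printf '#!/usr/bin/python3\nprint("c")\n' > "$dir/scripts/c.py"

# -r descends into sub/; bin/.*sh matches both shell shebangs but not python3
rec_out=$(cd "$dir" && grep -r "bin/.*sh" scripts | sort)
echo "$rec_out"
rm -rf "$dir"
```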
To add a little context to the output, use the -A option. This command prints the two lines after each matching line:
grep -rA 2 "bin/.*sh" ~/scripts
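Here is a minimal sketch of -A on a single hypothetical script, assuming bash and grep. GNU grep also has -B (lines before a match) and -C (lines on both sides).

```shell
# Hypothetical four-line script
sample=$(mktemp)
printf '%s\n' '#!/bin/bash' 'set -eu' 'echo start' 'echo done' > "$sample"

# -A 2 prints the matching line plus the 2 lines that follow it
ctx=$(grep -A 2 "bin/.*sh" "$sample")
echo "$ctx"
rm -f "$sample"
```

The last line, echo done, is not printed because it is three lines after the match.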
How to exclude results
There are multiple ways to exclude results from a search.
You might want to use the -v option to exclude lines. This was demonstrated in the first tutorial.
In a recursive search, you might want to search only some files, for example by skipping files with a certain extension. In these cases, you can use the --exclude option.
For example, to exclude YAML files, do something like this:
grep -rA 2 --exclude="*.yaml" "bin/.*sh" ~/scripts
Or, to exclude all .git directories, use the --exclude-dir option:
grep -rA 2 --exclude-dir=".git" "bin/.*sh" ~/scripts
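Here is a sketch of both exclusion options on a hypothetical tree, assuming bash, mktemp, and GNU grep. It uses -l (print only file names) to keep the output short.

```shell
# Hypothetical tree: a script, a YAML file, and a file inside a .git directory
dir=$(mktemp -d)
mkdir -p "$dir/scripts/.git"
printf '#!/bin/bash\n' > "$dir/scripts/run.sh"
printf 'shell: /bin/bash\n' > "$dir/scripts/ci.yaml"
printf '#!/bin/sh\n' > "$dir/scripts/.git/hook"

# Without exclusions, all three files match (grep -r also enters dot directories)
all_count=$(cd "$dir" && grep -rl "bin/.*sh" scripts | wc -l | tr -d ' ')
# --exclude skips files whose base name matches the glob;
# --exclude-dir skips whole directories
kept=$(cd "$dir" && grep -rl --exclude="*.yaml" --exclude-dir=".git" "bin/.*sh" scripts)

echo "without exclusions: $all_count files; with exclusions: $kept"
rm -rf "$dir"
```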
How to combine grep with other programs
grep can take input from standard input. It’s often handy to use grep to filter output from another program. For example, to print all Firefox processes, you could run this command:
ps -ax | grep "firefox"
You can also pipe grep output to another program. For example, if a command’s output is too large to fit on your screen, you might want to pipe it to less:
grep -rA 2 "bin/.*sh" ~/scripts | less
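Because ps output differs on every machine, this sketch uses printf as a stand-in for a command’s output. The process lines are hypothetical, and it assumes bash, grep, and wc. It also shows a known quirk of the real ps example: grep’s own process can appear in the listing because its command line contains “firefox”. A bracket expression avoids that.

```shell
# printf stands in for a command's output; these process lines are hypothetical
n=$(printf '%s\n' '101 firefox' '102 bash' '103 firefox -contentproc' \
  | grep "firefox" | wc -l | tr -d ' ')
echo "$n firefox processes"   # grep reads stdin; wc -l counts the lines that survive

# In a real ps listing, grep's own command line can match too
naive=$(printf '%s\n' '101 firefox' '104 grep firefox' | grep -c "firefox")
# [f]irefox still matches "firefox", but the literal text "[f]irefox" does not
trick=$(printf '%s\n' '101 firefox' '104 grep [f]irefox' | grep -c "[f]irefox")
echo "naive: $naive, trick: $trick"
```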
How to use grep with regex
Knowing how to use regex patterns can be really useful for grepping.
Here’s the last Shakespeare example: how would you search for every derivative form of the word love in Romeo and Juliet?
A match should print lines with forms like “lovers” or “loving,” but not lines with only the base word, “love.”
First, you can make your search case-insensitive.
grep -i "love" romeo-and-juliet.txt
However, this prints false positives, like “glove”.
Add the word boundary anchor, \b, on both sides of the expression:
grep -i "\blove\b" romeo-and-juliet.txt
Now false positives like “glove” are gone, but the only matches left are the base word love itself. The task requires only derivatives of the word.
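Here is a minimal sketch of the word boundary on hypothetical sample lines, assuming bash and GNU grep (\b is a GNU extension).

```shell
# Hypothetical sample lines
sample=$(mktemp)
printf '%s\n' 'A glove upon that hand' 'My love is deep' 'Two lovers meet' > "$sample"

plain=$(grep -ci "love" "$sample")       # 3: glove, love, and lovers all contain "love"
whole=$(grep -i "\blove\b" "$sample")    # only the line with the standalone word
echo "plain: $plain"
echo "whole: $whole"
rm -f "$sample"
```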
Extend the base pattern with the . character
The base expression is lov. This expression is in all derivatives, like “loving” and “lover”. To match any single character after it, use the . character:
grep -i "\blov.\b" romeo-and-juliet.txt
Unfortunately, this command prints exactly what we don’t want: lines with the base word “love”.
Use the \w word character to search for other word characters
\w matches any word-like character, i.e. letters, digits, and the underscore.
grep -i "\blov.\w\b" romeo-and-juliet.txt
Much better! But this matches only 5-letter derivations, like “lover”.
Expand the pattern with \+
The \+ character matches one or more instances of the preceding item. In this instance, it searches for one or more instances of a \w character (i.e., any word character).
grep -i "\blov.\w\+\b" romeo-and-juliet.txt
Tip: use the -E flag (extended regular expressions) to avoid escaping special characters, like +.
Bingo! Let’s look at the first five results:
lovers
loving
lovers
lovers
lov’d
Besides predictable forms, like “loving”, there are also surprising antiquated forms, like “lov’d”. A good grep can be pleasantly surprising.
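Here is a sketch of the last two steps on hypothetical sample lines, assuming bash and GNU grep (\b, \w, and \+ are GNU extensions). It uses -o so only the matched words are printed. The sample uses a plain ASCII apostrophe in lov'd; the real text may use a curly one.

```shell
# Hypothetical sample lines
sample=$(mktemp)
printf '%s\n' 'Did my heart love till now?' 'Two lovers meet' 'This loving hate' \
  "She lov'd him" 'A lover sighs' > "$sample"

# lov + any one character + exactly one word character: five-character forms only
five=$(grep -oi "\blov.\w\b" "$sample")
# \+ lets the word run on but still needs at least five characters,
# so the base word "love" is excluded
derived=$(grep -oi "\blov.\w\+\b" "$sample")
# The same search with -E (extended regex), where + needs no backslash
derived_ere=$(grep -Eoi "\blov.\w+\b" "$sample")

printf 'five:\n%s\nderived:\n%s\n' "$five" "$derived"
rm -f "$sample"
```

The first pattern finds only lov'd and lover. The second finds lovers, loving, lov'd, and lover, and the -E form returns the same list.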
How to use grep with command substitution
Maybe you want to search for a regular expression that changes dynamically. In these cases, you can use command substitution and variables to pass expressions to grep.
For example, consider a log file where each line starts with a date in YYYY-MM-DD format. Something like this:
2021-12-02 <more-text>
2021-12-02 <more-text>
2021-12-01 <more-text>
2021-11-29 <more-text>
...
1970-01-01 <more-text>
Perhaps you want to know how many times an event was logged in the current month. With command substitution, you could use grep to create a simple reporting script:
#!/bin/bash
# Print out how many times an event occurred this month
month=$(date +%B)                    # gets name of month
search=$(date +%Y-%m)                # makes a search term from date, in YYYY-MM format
file=long-text.log
count=$(grep -c "^$search" "$file")  # uses expression to count events in $file
echo "This month, $month, has logged $count events."
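Because date +%Y-%m changes every month, the script is hard to check against fixed data. This sketch passes the month prefix as a parameter instead, so it can run against the sample log above. count_events is a hypothetical helper name, and the sketch assumes bash and grep. The ^ anchor makes the date match only at the start of a line.

```shell
# count_events PREFIX FILE (hypothetical helper): count lines starting with PREFIX
count_events() {
  grep -c "^$1" "$2"
}

# The sample log from above, minus the "..." placeholder
log=$(mktemp)
printf '%s\n' '2021-12-02 <more-text>' '2021-12-02 <more-text>' '2021-12-01 <more-text>' \
  '2021-11-29 <more-text>' '1970-01-01 <more-text>' > "$log"

dec=$(count_events "2021-12" "$log")   # 3
nov=$(count_events "2021-11" "$log")   # 1
# Command substitution supplies the current month, as in the script above
now=$(count_events "$(date +%Y-%m)" "$log")
echo "December 2021: $dec, November 2021: $nov, this month: $now"
rm -f "$log"
```

When nothing matches, grep -c still prints 0 but exits with status 1. A script running under set -e would stop at that line.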
When is grep not so great?
The beauty of grep is its simplicity. Don’t get too complicated.
In the examples of the tutorial and how-to, the data is well-structured, and the queries are relatively simple.
For example, the Romeo and Juliet text has a very precise way of defining when dialogue happens and how stage directions happen. The log file might be a very long file, but it also has very regular patterns: every line begins with a date in one format.
For simple searches, or for matches across a large set of files, grep is very powerful.
However, if you want to manipulate text, or work with specific fields of a file, you’ll probably want to use a more specialized tool.
For advanced text searching, like natural language processing, it’s probably time to use a language with dedicated libraries to help you achieve your task.
Want to learn more about grep? Here are some resources:
- Video: Brian Kernighan talks about the origins of grep. One night in 1971.
- The GNU grep manual. Everything that’s possible with GNU’s implementation of grep.
- Why is GNU grep so fast? A technical discussion of an implementation.
The whole point with “everything is a file” is not that you have some random filename (indeed, sockets and pipes show that “file” and “filename” have nothing to do with each other), but the fact that you can use common tools to operate on different things.
The UNIX philosophy is often quoted as “everything is a file”, but that really means “everything is a stream of bytes”
— Linus Torvalds, Newsgroups: fa.linux.kernel