Detailed explanation of grep usage grep and regular expressions

高洛峰
2016-12-13 14:28:03

A regular expression is just a notation. Any tool that supports this notation can process regular expression strings. vim, grep, awk, and sed all support regular expressions, and that support is a large part of what makes them powerful. At a company I worked for previously, which ran a web-based service (nginx), the demand for regular expressions was fairly high, so I spent some time studying them and would like to share what I learned:

1 Basic regular expressions
The grep tool has been introduced before.
grep -[acinv] 'search string' filename
-a Treat a binary file as text when searching
-c Count the number of matching lines
-i Ignore case
-n Also output the line number
-v Invert the selection, that is, show the lines that do not contain the search string
The search string can be a regular expression!
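
The options above can be tried on a throwaway file. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the sample lines and variable names are made up for illustration and are not part of regular_express.txt.

```shell
# Build a small sample file (contents are hypothetical).
sample=$(mktemp)
printf '%s\n' 'The first line' 'the second line' 'a third line' > "$sample"

# -n prefixes each matching line with its line number.
numbered=$(grep -n 'the' "$sample")
# -c prints only the number of matching lines.
count=$(grep -c 'the' "$sample")
# -i ignores case, so "The" is counted as well.
count_nocase=$(grep -ci 'the' "$sample")
# -v inverts the match: lines that do NOT contain "the".
inverted=$(grep -v 'the' "$sample")

printf '%s\n' "$numbered" "$count" "$count_nocase" "$inverted"
rm -f "$sample"
```

Because the search is case-sensitive by default, only the second line matches until -i is added.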

1 Search for a specific string
Search for the lines containing 'the' and output the line numbers
$ grep -n 'the' regular_express.txt
Search for the lines that do not contain 'the' and output the line numbers
$ grep -nv 'the' regular_express.txt

2 Use [] to search for set characters
[] stands for exactly one character out of the set inside it; for example, [ade] matches a, d, or e.
woody@xiaoc:~/tmp$ grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! the soup taste good!

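Placing ^ as the first character inside [] negates the set: it matches any one character that is not listed in [].
For example, to search for lines containing oo that is not preceded by g, use '[^g]oo' as the search string. Lines 18 and 19 below still match because they contain other occurrences, such as the too in tools and an ooo where the character before oo is an o.
woody@xiaoc:~/tmp$ grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.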
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
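
To see why a line that begins with google is still reported, it can help to print only the matched text. This is a small sketch that assumes the -o option, which GNU and BSD grep provide but POSIX does not require; the variable name is made up for illustration.

```shell
# -o prints only the part of the line that matched the pattern.
matched=$(printf '%s\n' 'google is the best tools for search keyword.' | grep -o '[^g]oo')
printf '%s\n' "$matched"
```

The match comes from tools, not from google, so the line is selected even though its first oo follows a g.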

[] can also hold a range: [a-z] matches lowercase letters, [0-9] matches the digits 0 to 9, and [A-Z] matches uppercase letters. [a-zA-Z0-9] matches any English letter or digit. You can also use ^ to exclude a range.
Search for lines containing digits
woody@xiaoc:~/tmp$ grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is menu you are the no.1.

Start-of-line and end-of-line anchors ^ and $. ^ stands for the beginning of the line and $ for the end of the line (a position, not a character), so '^$' matches a blank line, because it has nothing between the beginning and the end of the line.
The meaning of ^ here is different from the ^ used inside []. Here it means that the string after ^ must be at the beginning of the line.
For example, search for lines starting with the
woody@xiaoc:~/tmp$ grep -n '^the' regular_express.txt
12:the symbol '*' is represented as star.

Search for lines starting with lowercase letters
woody@xiaoc:~/tmp$ grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as star.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
woody@xiaoc:~/tmp$

Search for lines that do not start with English letters
woody@xiaoc:~/tmp$ grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird
woody@xiaoc:~/tmp$
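
The two meanings of ^ can be compared side by side. The sketch below uses a made-up sample file and variable names; setting LC_ALL=C is an assumption added here so that ranges such as [a-z] cover only ASCII letters regardless of the system locale.

```shell
sample=$(mktemp)
printf '%s\n' 'the start' 'At the end' '#comment' '42 answers' > "$sample"

# ^ outside []: the pattern must begin at the start of the line.
starts_with_the=$(LC_ALL=C grep -n '^the' "$sample")
# ^ outside [] followed by a range: lines whose first character is a lowercase letter.
starts_lower=$(LC_ALL=C grep -c '^[a-z]' "$sample")
# First ^ anchors the line, second ^ (inside []) negates the set:
# lines whose first character is not an English letter.
starts_nonletter=$(LC_ALL=C grep -c '^[^a-zA-Z]' "$sample")

printf '%s\n' "$starts_with_the" "$starts_lower" "$starts_nonletter"
rm -f "$sample"
```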

$ means that the string before it is at the end of the line; for example, '\.$' matches a . at the end of a line.
Search for lines ending with .
woody@xiaoc:~/tmp$ grep -n '\.$' regular_express.txt # . is a special symbol in regular expressions, so escape it
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
.....

Note that text files created on MS systems end each line with an extra ^M character (a carriage return) before the line break, so the last character of every line is a hidden ^M. Pay special attention to this when processing text files from Windows!
You can use cat dos_file | tr -d '\r' > unix_file to delete the ^M characters, since ^M is \r.
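
The effect of the hidden carriage return on an end-of-line search can be reproduced with a small experiment. This is a sketch under the assumption of a Unix-like grep that does not strip carriage returns itself; the file contents and variable names are made up for illustration.

```shell
dos_file=$(mktemp)
unix_file=$(mktemp)
# Two lines with Windows-style CRLF endings.
printf 'first line.\r\nsecond line.\r\n' > "$dos_file"

# The carriage return sits between "." and the newline, so '\.$' finds nothing.
# grep exits with status 1 when nothing matches, hence the "|| true".
before=$(grep -c '\.$' "$dos_file" || true)

# Strip the carriage returns, then search again.
tr -d '\r' < "$dos_file" > "$unix_file"
after=$(grep -c '\.$' "$unix_file")

printf 'matches before: %s, after: %s\n' "$before" "$after"
rm -f "$dos_file" "$unix_file"
```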

'^$' matches a line whose beginning is immediately followed by its end, that is, an empty line.
Search for empty lines
woody@xiaoc:~/tmp$ grep -n '^$' regular_express.txt
22:
23:
woody@xiaoc:~/tmp$

Search for non-empty lines
woody@xiaoc:~/tmp$ grep -vn '^$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
..........

Any character . and repeated character *

In bash, * is a wildcard that stands for any number of arbitrary characters, but in a regular expression its meaning is different: * means zero or more repetitions of the preceding character.
For example, in oo* the first o must be present, and the second o may appear zero, one, or many times, so the pattern matches one or more o's.

The dot . stands for exactly one arbitrary character, which must be present. The shell pattern g??d can be written as the regular expression 'g..d', which matches good, gxxd, gabd, and so on.

woody@xiaoc:~/tmp$ grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! the soup taste good!
16:The world is the same with 'glad'.
woody@xiaoc:~/tmp$
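
The difference between the shell wildcard * and the regular expression * is easy to check directly. The following sketch uses made-up input words and variable names; it compares a shell case pattern with grep patterns.

```shell
# Shell pattern: * stands for any run of characters, so g*d matches "good".
case good in
  g*d) glob_result=match ;;
  *)   glob_result=no-match ;;
esac

# Regular expression: * repeats the previous item, so 'g*d' means
# "zero or more g, then d" and even matches a line that has no g at all.
regex_hits=$(printf '%s\n' 'good' 'd' 'go' | grep -c 'g*d')

# 'oo*' needs at least one o.
oo_hits=$(printf '%s\n' 'o' 'oooo' 'x' | grep -c 'oo*')
# 'g..d' needs exactly two characters between g and d.
dot_hits=$(printf '%s\n' 'good' 'glad' 'gd' 'gooood' | grep -c 'g..d')

printf '%s\n' "$glob_result" "$regex_hits" "$oo_hits" "$dot_hits"
```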

Search for strings with two or more o's
woody@xiaoc:~/tmp$ grep -n 'ooo*' regular_express.txt # The first two o's must be present; the third o may be absent or repeated any number of times.
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! the soup taste good!
18:google is the best tools for search keyword.
19:goooooogle yes!

Search for a string that starts and ends with g and has at least one o in between, that is, gog, goog, gooog, and so on.
woody@xiaoc:~/tmp$ grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!

Search for lines containing a string that starts and ends with g, with anything in between
woody@xiaoc:~/tmp$ grep -n 'g.*g' regular_express.txt # .* stands for zero or more arbitrary characters
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.


Limit the number of consecutive repetitions with { }
* can only express zero or more repetitions. To limit the number of repetitions precisely, use {range}. The range is given as numbers: 2,5 means 2 to 5 times, 2 means exactly 2 times, and 2, means 2 or more times.
Note that in a basic regular expression the braces must be escaped as \{ and \}; an unescaped { or } is treated as an ordinary character.

Search for lines containing a string with two o's.
woody@xiaoc:~/tmp$ grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! the soup taste good!
18:google is the best tools for search keyword.
19:goooooogle yes!

Search for lines containing g followed by 2 to 5 o's, followed by a g.
woody@xiaoc:~/tmp$ grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.


Search for lines containing g followed by 2 or more o's, followed by a g.
woody@xiaoc:~/tmp$ grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
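
The three interval forms can be compared on a few made-up words. This is a minimal sketch: in a basic regular expression the braces are written \{ and \}, while with grep -E they are written without backslashes. The sample words and variable names are assumptions for illustration.

```shell
sample=$(mktemp)
# Words with 1, 2, 5, and 6 o's between the two g's.
printf '%s\n' 'gog' 'goog' 'gooooog' 'goooooog' > "$sample"

# Exactly two o's.
bre_exact=$(grep -c 'go\{2\}g' "$sample")
# Two to five o's.
bre_range=$(grep -c 'go\{2,5\}g' "$sample")
# Two or more o's.
bre_open=$(grep -c 'go\{2,\}g' "$sample")
# The same range written as an extended regular expression.
ere_range=$(grep -Ec 'go{2,5}g' "$sample")

printf '%s\n' "$bre_exact" "$bre_range" "$bre_open" "$ere_range"
rm -f "$sample"
```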


Note that inside [], ^ and - have no special meaning when they are placed after the other contents: ^ anywhere but first, and - last.
'[^a-z.!^ -]' matches one character that is not a lowercase letter, a ., a !, a ^, a space, or a -. Note that there is a space inside the [].

In addition, negation is written [!range] in shell patterns and [^range] in regular expressions.
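
The two negation syntaxes and the literal use of ^ and - inside [] can be tried as follows. This is a small sketch with made-up inputs and variable names.

```shell
# Regular expression: [^0-9] is "one character that is not a digit".
non_digit_lines=$(printf '%s\n' '12345' '12a45' | grep -c '[^0-9]')

# Shell pattern: the same negation is written [!0-9].
case a in
  [!0-9]) shell_result=non-digit ;;
  *)      shell_result=digit ;;
esac

# Inside [], a ^ that is not first and a - that is last are literal,
# so [x^-] matches an x, a ^, or a -.
literal_hits=$(printf '%s\n' 'a^b' 'a-b' 'ab' | grep -c '[x^-]')

printf '%s\n' "$non_digit_lines" "$shell_result" "$literal_hits"
```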


2 Extended regular expressions

Extended regular expressions add several special components to the basic regular expressions.
It makes certain operations more convenient.
For example, if we want to remove blank lines and lines starting with #, we would use this:
woody@xiaoc:~/tmp$ grep -v '^$' regular_express.txt | grep -v '^#'
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
............

However, it is much more convenient to use egrep, which supports extended regular expressions and the extended special symbol |.
Note that grep uses basic regular expressions by default, while egrep uses extended regular expressions. In fact, egrep is equivalent to grep -E, so grep -E also supports extended regular expressions.
Then:
woody@xiaoc:~/tmp$ egrep -v '^$|^#' regular_express.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
........................
Here | means or, so the pattern matches a line that satisfies ^$ or ^#.
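
That the single alternation gives the same result as the two-step pipeline can be checked on a small file. The sketch below uses grep -E, the option form of egrep; the sample contents and variable names are made up for illustration.

```shell
sample=$(mktemp)
# A comment line, two empty lines, and two lines worth keeping.
printf '%s\n' '# a comment' '' 'keep me' '' 'keep me too' > "$sample"

# Two passes with basic regular expressions.
two_step=$(grep -v '^$' "$sample" | grep -v '^#')
# One pass with an extended regular expression and |.
one_step=$(grep -Ev '^$|^#' "$sample")

printf '%s\n' "$one_step"
rm -f "$sample"
```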

Here are several extended special symbols:
+ works like * but matches one or more repetitions of the preceding character.
? works like * but matches zero or one occurrence of the preceding character.
| means or; for example, 'gd|good|dog' matches a string containing gd, good, or dog.
() groups part of the pattern into a unit. For example, to search for glad or good, you can use 'g(la|oo)d'.
The advantage of () is that you can apply +, ?, *, and so on to the whole group.
For example, to search for strings that start with A, end with C, and have at least one xyz in between, you can use 'A(xyz)+C'.
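
Each of the extended symbols above can be exercised with grep -E. This is a minimal sketch on a made-up word list; the file contents and variable names are assumptions for illustration only.

```shell
sample=$(mktemp)
printf '%s\n' 'gd' 'god' 'good' 'glad' 'dog' 'AC' 'AxyzC' 'AxyzxyzC' > "$sample"

# + : one or more o's between g and d (god, good).
plus_hits=$(grep -Ec 'go+d' "$sample")
# ? : zero or one o between g and d (gd, god).
opt_hits=$(grep -Ec 'go?d' "$sample")
# | : any of the alternatives (gd, good, dog).
alt_hits=$(grep -Ec 'gd|good|dog' "$sample")
# () with | : g, then la or oo, then d (good, glad).
group_hits=$(grep -Ec 'g(la|oo)d' "$sample")
# () with + : one or more repetitions of the group xyz (AxyzC, AxyzxyzC).
repeat_hits=$(grep -Ec 'A(xyz)+C' "$sample")

printf '%s\n' "$plus_hits" "$opt_hits" "$alt_hits" "$group_hits" "$repeat_hits"
rm -f "$sample"
```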
