
Summary of regular expression characters

小云云
2018-02-22 13:14:03

Basic regular expressions

Matching a single character

To match a single digit, use "[0-9]" or "\d".

To match a single non-digit character, use the uppercase "\D".

To match any one of the 26 letters in either case, use "[a-zA-Z]".

To match any single character, use the period ".". By default it does not match a line terminator such as "\n".
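Here is a minimal sketch in Python, using the standard `re` module, that shows these single-character patterns side by side. Python is an assumption, because the article does not tie itself to one language. `re.ASCII` is passed so that "\d" and "\D" keep the ASCII-only meaning given above. The sample text is invented.

```python
import re

text = "Room 42b, floor 7!"

# "\d" and "[0-9]" each match one digit; re.ASCII restricts "\d" to 0-9.
digits = re.findall(r"\d", text, flags=re.ASCII)
same_digits = re.findall(r"[0-9]", text)

# "\D" matches every character that is not a digit.
non_digits = re.findall(r"\D", text, flags=re.ASCII)

# "[a-zA-Z]" matches one ASCII letter, upper or lower case.
letters = re.findall(r"[a-zA-Z]", text)

# "." matches any character except a line break (by default).
any_char = re.findall(r".", "a\nb")

print(digits)               # ['4', '2', '7']
print("".join(non_digits))  # Room b, floor !
print(letters)
print(any_char)             # ['a', 'b']
```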

To match specific characters, write them directly: "abcd" matches the text "abcd". Special characters must be escaped, and the escape character is the backslash "\".
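The period is the special character that most often needs escaping. This Python sketch, with invented strings, compares an unescaped dot with an escaped one. It then uses the standard `re.escape()` helper, which adds the backslashes for you when the literal text comes from elsewhere.

```python
import re

# Unescaped ".": a wildcard, so "3x50" is accepted too.
loose = re.compile(r"3.50")
# Escaped "\.": only a literal dot is accepted.
strict = re.compile(r"3\.50")

for candidate in ["3.50", "3x50"]:
    print(candidate, bool(loose.fullmatch(candidate)), bool(strict.fullmatch(candidate)))

# Plain characters such as "abcd" simply match themselves.
plain = re.search(r"abcd", "xxabcdxx")

# re.escape() backslash-escapes the special characters in arbitrary text.
literal = re.escape("1+1=2?")
exact = re.fullmatch(literal, "1+1=2?")
print(literal, exact is not None)
```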

Square brackets define a "character set", which matches any one character in the set. For example, "[0-9a-fA-F]" matches one hexadecimal digit. Inside a character set the dot stands for a literal dot, but some special characters, such as the backslash, still need to be escaped.
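This short Python check covers the hexadecimal set and the two escaping rules inside a set. The input strings are made up for illustration.

```python
import re

# "[0-9a-fA-F]" matches exactly one hexadecimal digit.
hex_digit = re.compile(r"[0-9a-fA-F]")
hex_found = hex_digit.findall("0x1F, zz")
print(hex_found)  # ['0', '1', 'F']

# Inside a set, "." is a literal dot, so "[.]" does not match "a".
dots = re.findall(r"[.]", "a.b")
print(dots)  # ['.']

# A backslash inside a set still needs escaping: "[.\\]" means "dot or backslash".
separators = re.findall(r"[.\\]", r"C:\dir\file.txt")
print(separators)  # ['\\', '\\', '.']
```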

Use quantifiers

Greedy matching

To express repetition of a rule, use a quantifier. Curly braces give the number of repetitions. For example, 8 digits can be written as "\d{8}".

The number in the braces can also be a range: 7 to 8 digits is written "\d{7,8}". The upper limit may be omitted, so "{0,}" is legal and means "0 or more". Giving only the upper limit, as in "{,10}", is not portable, because many engines reject it or treat it as literal text. Write at least "{0,10}" instead.
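You can try the counted quantifiers in Python as well. This sketch uses `re.fullmatch()` so that the whole string must satisfy the quantifier. The digit strings are arbitrary examples.

```python
import re

eight = re.compile(r"\d{8}")             # exactly 8 digits
seven_or_eight = re.compile(r"\d{7,8}")  # 7 or 8 digits
three_plus = re.compile(r"\d{3,}")       # upper limit omitted: 3 or more
up_to_ten = re.compile(r"\d{0,10}")      # portable "at most 10"

for s in ["1234567", "12345678", "123456789"]:
    print(s, bool(eight.fullmatch(s)), bool(seven_or_eight.fullmatch(s)))
# 1234567 False True
# 12345678 True True
# 123456789 False False
```

Python's `re` happens to accept "{,10}" as shorthand for "{0,10}", but other engines do not all agree. Spelling out the lower bound, as above, stays portable.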

The plus sign "+" means "one or more" of the element to its left, the same as "{1,}". This is why the plus sign is a special character.

The asterisk "*" means "zero or more" of the element to its left, that is, "{0,}".

The question mark "?" means "zero or one", which is equivalent to "{0,1}".
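Each shorthand quantifier should accept the same strings as its brace form. This Python sketch compares them on a handful of sample strings, which illustrates the equivalence but does not prove it. The helper `accepted` is a hypothetical name introduced here.

```python
import re

samples = ["", "5", "55", "555"]

# Each shorthand quantifier next to its curly-brace equivalent.
pairs = [
    (r"5+", r"5{1,}"),   # one or more
    (r"5*", r"5{0,}"),   # zero or more
    (r"5?", r"5{0,1}"),  # zero or one
]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

for short, braces in pairs:
    print(short, accepted(short), accepted(short) == accepted(braces))
```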

Lazy matching

Quantifiers such as "+" and "*" are "greedy" by default, which means they match as many characters as possible. For example, "5+" applied to the string "55555" matches the longest run it can find, "55555".

Adding a question mark after the quantifier makes it "lazy", so it matches as few characters as possible. For example, "5+?" matches only a single "5".

The available lazy quantifiers are "+?", "*?", "??", "{n,}?" and "{m,n}?".
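You can compare greedy and lazy matching directly in Python. Besides the "55555" example from the text, this sketch uses a small made-up markup string, where the difference is easier to see.

```python
import re

text = "55555"
greedy = re.search(r"5+", text).group()          # as many 5s as possible
lazy = re.search(r"5+?", text).group()           # as few as possible: one 5
lazy_range = re.search(r"5{2,}?", text).group()  # lazy with a lower bound: two 5s
print(greedy, lazy, lazy_range)  # 55555 5 55

# The difference shows up clearly with delimiters.
markup = "<b>bold</b>"
print(re.findall(r"<.+>", markup))   # ['<b>bold</b>']
print(re.findall(r"<.+?>", markup))  # ['<b>', '</b>']
```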

Capture groups (similar to macro definitions)

You can "capture" part of an expression and refer to it later, much like a macro. Wrap the part in parentheses to capture it, then refer to it with "\1". The second capture is "\2", and so on. Unlike a macro, "\1" matches the exact text the group captured, not the group's pattern again.
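A common use of backreferences is finding doubled words. This Python sketch uses invented sentences. Its second half demonstrates the point above: "\1" repeats the captured text, so "(\d)\1" accepts "77" but not "78".

```python
import re

# Group 1 captures a word; "\1" then requires that same word again.
doubled = re.compile(r"\b(\w+) \1\b")

hit = doubled.search("this is is a test")
miss = doubled.search("this is it")
print(hit.group())  # is is
print(miss)         # None

# "\1" repeats the captured text, not the pattern "\d".
same_digit = re.compile(r"(\d)\1")
print(bool(same_digit.fullmatch("77")), bool(same_digit.fullmatch("78")))  # True False
```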

Groups are captured by default. In long expressions it can be necessary to say explicitly that a group should not be captured. In "(?:THE|The|the)", the "?:" marks the group as non-capturing, so it gets no number and does not shift the numbering of later groups.
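In Python, the effect of "?:" on group numbering shows up through `Match.groups()`. The sentence below is an invented example.

```python
import re

sentence = "Read the manual"

capturing = re.compile(r"(THE|The|the) (\w+)")
non_capturing = re.compile(r"(?:THE|The|the) (\w+)")

print(capturing.search(sentence).groups())      # ('the', 'manual')
print(non_capturing.search(sentence).groups())  # ('manual',)

# With "?:", the word is group 1, so "\1" refers to it directly.
repeated = re.compile(r"(?:THE|The|the) (\w+) \1")
print(repeated.search("the cat cat") is not None)  # True
```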

"OR" logic

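Use "|" between two alternatives to express "OR" logic. Note the use of parentheses: "|" has the lowest precedence, so "gra|ey" means "gra" or "ey", while "gr(a|e)y" matches "gray" or "grey".

"Not" logic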

When "^" is the first character inside a set "[...]", it means "not". For example, "[^0-9]" is equivalent to "\D".
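This Python sketch checks the precedence of "|" and the behaviour of the negated set. The words and strings are invented for illustration.

```python
import re

# "|" binds loosest: "gra|ey" means "gra" OR "ey".
no_parens = re.compile(r"gra|ey")
# Parentheses confine the alternation to one letter.
with_parens = re.compile(r"gr(a|e)y")

for word in ["gray", "grey", "key"]:
    print(word, with_parens.fullmatch(word) is not None, no_parens.search(word) is not None)
# gray True True
# grey True True
# key False True

# "[^0-9]" behaves like "\D" (ASCII digits assumed).
negated = re.findall(r"[^0-9]", "a1b2")
print(negated, negated == re.findall(r"\D", "a1b2", flags=re.ASCII))  # ['a', 'b'] True

# "^" negates only in first position; elsewhere it is a literal caret.
print(re.findall(r"[0-9^]", "2^3"))  # ['2', '^', '3']
```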


Simple pattern matching

The following is a list of commonly used single character matches:

  • Digit: "\d", equivalent to "[0-9]"
  • Letter, digit or underscore: "\w", equivalent to "[_a-zA-Z0-9]"
  • Non-digit: "\D"
  • Non-word character (anything other than a letter, digit or underscore): "\W"
  • Tab: "\t"
  • Null character: "\0"
  • Whitespace: "\s", equivalent to "[ \t\n\r]" (most engines also include the form feed and vertical tab)
  • Carriage return: "\r"
  • Line feed (line break): "\n"
  • Backspace: "[\b]" (only inside a set does "\b" mean backspace)
  • Word boundary: "\b", which only matches the beginning or end of a word and consumes no characters
  • Any character: ".", which cannot match a line terminator

Boundaries

This section introduces the concept of an assertion, also called a "zero-width assertion". An assertion matches a position in the string rather than a character.

Start and end of line

  • Use "^" to indicate the beginning of a line

  • Use "$" to indicate the end of a line

Without multiline mode, most engines apply "^" and "$" to the start and end of the whole string. With multiline mode enabled, they match at the start and end of every line.

Word boundaries and non-word boundaries

For example, to match the whole word "the", write "\bthe\b". To match an "e" inside a word, such as the "e" in "brother", write "\Be\B".

You can also use "\<" to match the beginning of a word and "\>" to match the end of a word. However, these two are not recommended, because many engines, including JavaScript's, do not support them.

Unicode characters and other characters

Regular expressions accept Unicode values written as "\u00e9". The "\u" escape must be followed by exactly four hexadecimal digits, in upper or lower case. JavaScript also supports the two-digit form "\xe9", but "\x00e9" is wrong. Because "\x" takes exactly two digits, "\x00e9" is read as "\x00" followed by the literal characters "e9".
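The character classes listed earlier, the position assertions and the Unicode escapes can be tried together in one Python sketch. It assumes Python's `re` module, whose defaults differ from JavaScript in places. `re.ASCII` is passed where the ASCII definitions matter, and `re.MULTILINE` switches "^" and "$" to per-line matching. All input strings are invented.

```python
import re

sample = "Tab:\there\r\nnext_line 42"

# \w = [_a-zA-Z0-9] under re.ASCII; \s finds the whitespace characters.
words = re.findall(r"\w+", sample, flags=re.ASCII)
spaces = re.findall(r"\s", sample)
print(words)   # ['Tab', 'here', 'next_line', '42']
print(spaces)  # ['\t', '\r', '\n', ' ']

# Inside a set, "\b" is the backspace character (U+0008).
backspaces = re.findall(r"[\b]", "a\bb")

# "^" and "$": whole string by default, every line with re.MULTILINE.
lines = "first line\nsecond line"
starts_default = re.findall(r"^\w+", lines)
starts_multi = re.findall(r"^\w+", lines, flags=re.MULTILINE)
ends_multi = re.findall(r"\w+$", lines, flags=re.MULTILINE)
print(starts_default, starts_multi)  # ['first'] ['first', 'second']

# Word boundaries: "\bthe\b" skips the "the" inside "other" and "theme".
whole_the = re.findall(r"\bthe\b", "the other theme, the end")
# Non-word boundaries: the "e" in "brother" sits between two letters.
inner_e = re.search(r"\Be\B", "brother")
print(whole_the, inner_e.start())  # ['the', 'the'] 5

# Assertions are zero-width: "\b" alone matches an empty string.
empty = re.search(r"\b", "hi").group()

# Python's re treats "\<" as a literal "<", not as a word start.
angle = re.findall(r"\<the", "<the the")

# Unicode and hex escapes: "\u" takes four digits, "\x" exactly two.
cafe = re.fullmatch(r"caf\u00e9", "caf\u00e9")
short_hex = re.fullmatch(r"\xe9", "\u00e9")
not_e_acute = re.fullmatch(r"\x00e9", "\x00e9")  # NUL followed by "e9"
```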
