The meaning of special characters in PHP regular expressions
An article about the meaning of special characters in regular expressions. I hope it will be helpful to everyone.
Character \
Meaning: Before a character that is normally literal, the backslash indicates that the character is special and should not be interpreted literally.
For example: /b/ matches the character 'b'. Adding a backslash in front of the b, giving /\b/, turns it into a special character that matches a word boundary.
Or:
Before a character that is normally special, the backslash indicates that the character is not special and should be interpreted literally.
For example: * is a special character that matches the preceding character any number of times (including 0 times); /a*/ matches 0 or more a's. To match a literal *, precede it with a backslash; for example, /a\*/ matches 'a*'.
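The difference between an escaped and an unescaped * can be sketched as follows. This is a minimal illustration that assumes PHP's PCRE functions preg_match() and preg_quote(), which the original text does not name; the variable names are made up for the example.

```php
<?php
// Unescaped: * is a quantifier, so /a*/ matches a run of zero or more a's.
preg_match('/a*/', 'aaa*', $asQuantifier);

// Escaped: \* is a literal asterisk, so /a\*/ matches the text "a*".
preg_match('/a\*/', 'aaa*', $asLiteral);

// preg_quote() inserts the backslashes when the text to match is not known in advance.
$quoted = preg_quote('a*');

var_dump($asQuantifier[0]); // string(3) "aaa"
var_dump($asLiteral[0]);    // string(2) "a*"
var_dump($quoted);          // string(3) "a\*"
```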
Character ^
Meaning: Matches the start of the input; the character that follows must be the first character.
For example: /^A/ does not match the 'A' in "an A", but matches the first 'A' in "An A".
Character $
Meaning: Similar to ^, but matches the end of the input; the preceding character must be the last character.
For example: /t$/ does not match the 't' in "eater", but does match the 't' in "eat".
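The two anchors can be checked with a short sketch. It assumes PHP's preg_match(), which returns 1 when the pattern matches and 0 when it does not; the variable names are illustrative.

```php
<?php
// ^ anchors the match at the start of the subject string.
$startsWithA = preg_match('/^A/', 'An A');  // 1: the first 'A' is at the start
$noLeadingA  = preg_match('/^A/', 'an A');  // 0: the only 'A' is not at the start

// $ anchors the match at the end of the subject string.
$endsWithT   = preg_match('/t$/', 'eat');   // 1: 't' is the last character
$noTrailingT = preg_match('/t$/', 'eater'); // 0: the 't' is followed by "er"

var_dump($startsWithA, $noLeadingA, $endsWithT, $noTrailingT);
```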
Character *
Meaning: Matches the character before * 0 or more times.
For example: /bo*/ matches 'boooo' in "A ghost booooed" and 'b' in "A bird warbled", but nothing in "A goat grunted".
Character +
Meaning: Matches the character before + 1 or more times. Equivalent to {1,}.
For example: /a+/ matches the 'a' in "candy" and all the 'a's in "caaaaaaandy".
Character ?
Meaning: Matches the character before ? 0 or 1 times.
For example: /e?le?/ matches 'el' in "angel" and 'le' in "angle".
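The three quantifiers *, + and ? can be tried side by side. The sketch below assumes PHP's preg_match(), whose third argument receives the matched text in element [0]; the subject "caaandy" is a shortened variant of the example above, chosen so the count of a's is easy to see.

```php
<?php
// * : 'b' followed by zero or more 'o'.
preg_match('/bo*/', 'A ghost booooed', $star);
preg_match('/bo*/', 'A bird warbled', $starNoO);  // zero 'o' still counts as a match

// + : one or more 'a'.
preg_match('/a+/', 'caaandy', $plus);

// ? : an optional 'e' on either side of 'l'.
preg_match('/e?le?/', 'angel', $inAngel);
preg_match('/e?le?/', 'angle', $inAngle);

var_dump($star[0], $starNoO[0], $plus[0], $inAngel[0], $inAngle[0]);
// "boooo", "b", "aaa", "el", "le"
```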
Character .
Meaning: (decimal point) Matches any single character except a newline character.
For example: /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but does not match 'nay'.
Character (x)
Meaning: Matches 'x' and remembers the match.
For example: /(foo)/ matches and remembers 'foo' in "foo bar." The matched substrings are available as elements [1], ..., [n] of the result array, or as the properties $1, ..., $9 of the RegExp object.
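In PHP the "result array" is the third argument of preg_match(). The following is a minimal sketch under that assumption; preg_replace() and the $2 $1 replacement syntax are PHP conventions that the original text does not mention, and the pattern with two groups is made up for illustration.

```php
<?php
// Each pair of parentheses captures the text it matched.
preg_match('/(foo) (bar)/', 'foo bar', $m);
// $m[0] holds the whole match, $m[1] and $m[2] hold the two groups.

// In a replacement string the groups are referenced as $1, $2, ...
$swapped = preg_replace('/(foo) (bar)/', '$2 $1', 'foo bar');

print_r($m);        // [0] => foo bar, [1] => foo, [2] => bar
var_dump($swapped); // string(7) "bar foo"
```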
Character x|y
Meaning: Matches 'x' or 'y'.
For example: /green|red/ matches the 'green' in "green apple" and the 'red' in "red apple."
Character {n}
Meaning: n here is a positive integer. Matches exactly n occurrences of the preceding character.
For example: /a{2}/ does not match the 'a' in "candy", but matches all the 'a's in "caandy" and the first two 'a's in "caaandy".
Character {n,}
Meaning: n here is a positive integer. Matches at least n occurrences of the preceding character.
For example: /a{2,}/ does not match the 'a' in "candy", but matches all the 'a's in "caandy" and all the 'a's in "caaaaaaandy."
Character {n,m}
Meaning: n and m here are both positive integers. Matches at least n and at most m occurrences of the preceding character.
For example: /a{1,3}/ does not match anything in "cndy", but matches the 'a' in "candy", the first two 'a's in "caandy", and the first three 'a's in "caaaaaaandy". Note: even though "caaaaaaandy" contains many 'a's, only the first three, that is "aaa", are matched.
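The three counted forms can be compared on one subject. This sketch assumes PHP's preg_match(); the subject is built with str_repeat() so that it contains exactly five a's, which is an illustrative choice rather than one of the strings used above. The braces are written without inner spaces, since a quantifier such as {2} is normally written that way.

```php
<?php
$subject = 'c' . str_repeat('a', 5) . 'ndy';   // "caaaaandy"

preg_match('/a{2}/', $subject, $exactlyTwo);   // stops after two a's
preg_match('/a{2,}/', $subject, $twoOrMore);   // takes all five a's
preg_match('/a{1,3}/', $subject, $oneToThree); // takes at most three a's
$tooFew = preg_match('/a{2}/', 'candy');       // "candy" has a single 'a'

var_dump($exactlyTwo[0], $twoOrMore[0], $oneToThree[0], $tooFew);
// "aa", "aaaaa", "aaa", 0
```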
Character [xyz]
Meaning: A character set; matches any one of the listed characters. You can specify a range of characters using the hyphen -.
For example: [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "ache".
Character [^xyz]
Meaning: A negated character set; it matches any single character except the listed ones. You can use a hyphen to indicate a range of characters.
For example: [^abc] and [^a-c] are equivalent. They first match the 'r' in "brisket" and the 'h' in "chop".
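A character set and its negation can be checked against the same words. The sketch assumes PHP's preg_match(), which reports the first match found when scanning from the left; variable names are illustrative.

```php
<?php
// [a-d] matches the first character that is a, b, c or d.
preg_match('/[a-d]/', 'brisket', $inSet);
preg_match('/[abcd]/', 'ache', $listed);       // same set, written out in full

// [^a-c] matches the first character that is NOT a, b or c.
preg_match('/[^a-c]/', 'brisket', $notInSet);  // skips 'b'
preg_match('/[^a-c]/', 'chop', $notInSetChop); // skips 'c'

var_dump($inSet[0], $listed[0], $notInSet[0], $notInSetChop[0]);
// "b", "a", "r", "h"
```

Note that on "ache" the first match is the 'a'; the 'c' is found only by a later match.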
Character [\b]
Meaning: Matches a backspace (not to be confused with \b).
Character \b
Meaning: Matches a word boundary, such as the position next to a space (not to be confused with [\b]).
For example: /\bn\w/ matches 'no' in "noonday", and /\wy\b/ matches 'ly' in "possibly yesterday."
Character \B
Meaning: Matches a non-word boundary, that is, a position that is not at the edge of a word.
For example: /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday."
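The word-boundary examples can be reproduced directly. This is a small sketch assuming PHP's preg_match(); single-quoted PHP strings are used so the backslashes reach the regex engine unchanged.

```php
<?php
// \b: the match must touch the edge of a word.
preg_match('/\bn\w/', 'noonday', $atWordStart);          // 'n' at the start of the word
preg_match('/\wy\b/', 'possibly yesterday', $atWordEnd); // 'y' at the end of a word

// \B: the match must NOT sit on the edge of a word.
preg_match('/\w\Bn/', 'noonday', $insideWord);           // an 'n' inside the word
preg_match('/y\B\w/', 'possibly yesterday', $yInside);   // a 'y' followed by a word character

var_dump($atWordStart[0], $atWordEnd[0], $insideWord[0], $yInside[0]);
// "no", "ly", "on", "ye"
```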
Character \cX
Meaning: X here is a control letter. Matches a control character in a string.
For example: /\cM/ matches control-M in a string.
Character \d
Meaning: Matches a digit, equivalent to [0-9].
For example: /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
Character \D
Meaning: Matches any non-digit, equivalent to [^0-9].
For example: /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
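A quick sketch of \d and \D, assuming PHP's preg_match() and preg_replace(). The phone-number string is a made-up example of a common use: stripping everything that is not a digit.

```php
<?php
$text = 'B2 is the suite number.';

preg_match('/\d/', $text, $firstDigit);    // first digit in the text
preg_match('/\D/', $text, $firstNonDigit); // first character that is not a digit

// Removing every non-digit leaves only the digits.
$digitsOnly = preg_replace('/\D/', '', 'Tel: 555-0100');

var_dump($firstDigit[0], $firstNonDigit[0], $digitsOnly);
// "2", "B", "5550100"
```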
Character \f
Meaning: Matches a form feed character.
Character \n
Meaning: Matches a newline character.
Character \r
Meaning: Matches a carriage return.
Character \s
Meaning: Matches a single white space character, including space, tab, form feed, and newline, equivalent to [ \f\n\r\t\v].
For example: /\s\w*/ matches ' bar' in "foo bar."
Character \S
Meaning: Matches a single character other than white space, equivalent to [^ \f\n\r\t\v].
For example: /\S\w*/ matches 'foo' in "foo bar."
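The white-space classes can be sketched like this, assuming PHP's preg_match() and preg_split(). The string passed to preg_split() is a made-up example that mixes spaces, a tab and a newline.

```php
<?php
// \s followed by word characters: the match includes the leading space.
preg_match('/\s\w*/', 'foo bar', $afterSpace);

// \S followed by word characters: the match starts at the first non-space.
preg_match('/\S\w*/', 'foo bar', $firstWord);

// Splitting on runs of white space yields the individual words.
$words = preg_split('/\s+/', "foo \t bar\nbaz");

var_dump($afterSpace[0], $firstWord[0]); // " bar", "foo"
print_r($words);                         // foo, bar, baz
```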
Character \t
Meaning: Matches a tab character.
Character \v
Meaning: Matches a vertical tab character.
Character \w
Meaning: Matches any digit, letter, or underscore, equivalent to [A-Za-z0-9_].
For example: /\w/ matches the 'a' in "apple", the '5' in "$5.28", and the '3' in "3D".
Character \W
Meaning: Matches any character other than digits, letters, and underscores, equivalent to [^A-Za-z0-9_].
For example: /\W/ or /[^A-Za-z0-9_]/ matches the '%' in "50%".
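The \w and \W examples can be verified as below. The sketch assumes PHP's preg_match(); the subjects are in single quotes so that PHP does not treat the $ sign as the start of a variable.

```php
<?php
// \w skips the '$' because it is not a letter, digit or underscore.
preg_match('/\w/', '$5.28', $wordChar);

// \W finds the first character that is not a letter, digit or underscore.
preg_match('/\W/', '50%', $nonWordChar);

// The explicit classes behave the same way on these inputs.
preg_match('/[A-Za-z0-9_]/', '$5.28', $wordCharExplicit);
preg_match('/[^A-Za-z0-9_]/', '50%', $nonWordCharExplicit);

var_dump($wordChar[0], $nonWordChar[0]); // "5", "%"
```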
Character \n
Meaning: n here is a positive integer. A back reference to the substring matched by the nth parenthesized group in the regular expression (counting left parentheses).
For example: /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach."
Note: If the number of left parentheses is smaller than n, \n is instead taken as an octal escape, as described in the next entry.
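A back reference can be observed with the example above. This is a minimal sketch assuming PHP's preg_match(); the second pattern, which finds a doubled character in "book", is an added illustration.

```php
<?php
// \1 must repeat exactly what group 1 captured, here a comma.
preg_match('/apple(,)\sorange\1/', 'apple, orange, cherry, peach.', $m);

// (.)\1 matches any character followed by the same character again.
preg_match('/(.)\1/', 'book', $doubled);

var_dump($m[0], $m[1]); // "apple, orange,", ","
var_dump($doubled[0]);  // "oo"
```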
Characters \ooctal and \xhex
Meaning: ooctal here is an octal escape value and xhex is a hexadecimal escape value; they allow ASCII codes to be embedded in a regular expression.
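Both escape forms can be tried on the letter "A", whose ASCII code is 65 (hexadecimal 41, octal 101). The sketch assumes PHP's preg_match() and the PCRE rule that, in a pattern with no capture groups, a three-digit sequence such as \101 is read as an octal code rather than a back reference.

```php
<?php
// Hexadecimal escape: \x41 is character code 0x41, the letter 'A'.
$hexMatches = preg_match('/\x41/', 'A');

// Octal escape: \101 is character code 65, also 'A'.
// There are no capture groups here, so it cannot be a back reference.
$octMatches = preg_match('/\101/', 'A');

// Neither escape matches a different letter.
$hexOnB = preg_match('/\x41/', 'B');

var_dump($hexMatches, $octMatches, $hexOnB); // 1, 1, 0
```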
PS: The following table is a complete list of metacharacters and their behavior in the context of regular expressions:
Character Description
\
Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^
Matches the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$
Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*
Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+
Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?
Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}
n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but it can match the two o's in "food".
{n,}
n is a nonnegative integer. Match at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but it matches all o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}
Both m and n are non-negative integers, where n <= m. Matches at least n and at most m times.
?
When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much of the searched string as possible. For example, for the string "oooo", 'o+?' will match a single "o", while 'o+' will match all the o's.
.
Matches any single character except "\n". To match any character including '\n', use a pattern like '[\s\S]'.
(pattern)
Matches pattern and captures the match. The captured match can be retrieved from the generated Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)
Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful when using the "or" character (|) to combine parts of a pattern. For example, 'industr(?:y|ies)' is a shorter expression than 'industry|industries'.
(?=pattern)
Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)
Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y
Match x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]
Character set. Matches any one of the enclosed characters. For example, '[abc]' matches 'a' in "plain".
[^xyz]
Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches 'p' in "plain".
[a-z]
Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' to 'z'.
[^a-z]
Negative character range. Matches any character not within the specified range. For example, '[^a-z]' matches any character not in the range 'a' to 'z'.
\b
Matches a word boundary, which is the position between a word and a space. For example, 'er\b' matches 'er' in "never" but not 'er' in "verb".
\B
Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx
Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be one of A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d
Matches a numeric character. Equivalent to [0-9].
\D
Matches a non-numeric character. Equivalent to [^0-9].
\f
Matches a form feed. Equivalent to \x0c and \cL.
\n
Matches a newline character. Equivalent to \x0a and \cJ.
\r
Matches a carriage return character. Equivalent to \x0d and \cM.
\s
Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S
Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t
Matches a tab character. Equivalent to \x09 and \cI.
\v
Matches a vertical tab character. Equivalent to \x0b and \cK.
\w
Matches any word character including an underscore. Equivalent to '[A-Za-z0-9_]'.
\W
Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn
Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num
Matches num, where num is a positive integer. A back reference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n
Identifies an octal escape value or a backreference. \n is a backreference if it is preceded by at least n captured subexpressions. Otherwise, if n is an octal digit (0-7), then \n is an octal escape value.
\nm
Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, then n is a backreference followed by the literal m. If neither condition is met, \nm matches the octal escape value nm, provided n and m are both octal digits (0-7).
\nml
If n is an octal digit (0-3), and m and l are both octal digits (0-7), then the octal escape value nml is matched.
\un
Matches n, where n is a Unicode character represented as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
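Three rows of the table, the non-greedy ?, the lookahead assertions and the non-capturing group, are easier to follow with concrete results. The sketch below assumes PHP's preg_match(); the variable names are illustrative and the patterns are the ones quoted in the table.

```php
<?php
// Greedy versus non-greedy: o+ takes every 'o', o+? takes as few as possible.
preg_match('/o+/', 'oooo', $greedy);
preg_match('/o+?/', 'oooo', $lazy);

// Positive lookahead: "Windows " matches only when a listed version follows,
// and the version itself is not part of the match.
preg_match('/Windows (?=95|98|NT|2000)/', 'Windows 2000', $ahead);
$aheadFails = preg_match('/Windows (?=95|98|NT|2000)/', 'Windows 3.1');

// Negative lookahead: matches only when none of the listed versions follows.
$negativeAhead = preg_match('/Windows (?!95|98|NT|2000)/', 'Windows 3.1');

// Non-capturing group: the alternatives are grouped but nothing is stored,
// so the result array holds only the whole match.
preg_match('/industr(?:y|ies)/', 'industries', $nonCapturing);

var_dump($greedy[0], $lazy[0]);            // "oooo", "o"
var_dump($ahead[0]);                       // "Windows " (with the trailing space)
var_dump($aheadFails, $negativeAhead);     // 0, 1
var_dump(count($nonCapturing));            // 1
```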
Excerpted from qeenoo’s blog