Regular expressions - metacharacters
The following table contains the complete list of metacharacters and their behavior in the context of regular expressions:
Character | Description |
---|---|
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(". |
^ | Matches the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'. |
$ | Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. |
* | Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
? | Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but matches the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
{n,m} | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers. |
? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much of the searched string as possible. For example, for the string "oooo", 'o+?' matches a single 'o', while 'o+' matches all the 'o's. |
. | Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'; inside brackets '.' is a literal period, so '[.\n]' matches only a period or a newline. |
(pattern) | Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'. |
(?:pattern) | Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'. |
(?=pattern) | Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
(?!pattern) | Negative lookahead. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
x|y | Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food". |
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain". |
[^xyz] | Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches 'p', 'l', 'i', and 'n' in "plain". |
[a-z] | Character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'. |
[^a-z] | Negated character range. Matches any character not within the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' through 'z'. |
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space (or the start or end of the string). For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb". |
\B | Matches a position that is not a word boundary. For example, 'er\B' matches the 'er' in "verb" but not the 'er' in "never". |
\cx | Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character. |
\d | Matches a numeric character. Equivalent to [0-9]. |
\D | Matches a non-numeric character. Equivalent to [^0-9]. |
\f | Matches a form feed character. Equivalent to \x0c and \cL. |
\n | Matches a newline character. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return character. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab character. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab character. Equivalent to \x0b and \cK. |
\w | Matches any word character including an underscore. Equivalent to '[A-Za-z0-9_]'. |
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'. |
\xn | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions. |
\num | Matches num, where num is a positive integer. This is a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters. |
\n | Identifies an octal escape value or a backreference, where n is a single digit. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value. |
\nm | Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither of these conditions holds, \nm matches the octal escape value nm, provided n and m are both octal digits (0-7). |
\nml | Matches the octal escape value nml when n is an octal digit (0-3) and m and l are both octal digits (0-7). |
\un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). |
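
A few rows of the table can be checked with a short script. The following is a minimal sketch in JavaScript, assuming a modern JavaScript engine rather than the JScript or VBScript RegExp object the table describes; the m flag stands in for the Multiline property, and all variable names are illustrative only.

```javascript
// Greedy vs. non-greedy quantifiers on the string "oooo".
const greedy = "oooo".match(/o+/)[0];   // takes as many 'o's as possible
const lazy = "oooo".match(/o+?/)[0];    // takes as few 'o's as possible

// Bounded repetition: at least 1 and at most 3 'o's.
const bounded = "fooooood".match(/o{1,3}/)[0];

// Capturing group: the submatch is stored at index 1 of the result.
const captured = "does".match(/do(es)?/);

// Non-capturing group: only the overall match is stored.
const nonCaptured = "industries".match(/industr(?:y|ies)/);

// Positive and negative lookahead; the looked-ahead text is not consumed.
const positive = /Windows (?=95|98|NT|2000)/;
const negative = /Windows (?!95|98|NT|2000)/;
const beforeVersion = "Windows 2000".match(positive)[0];

// Backreference: \1 repeats whatever group 1 captured.
const doubled = "hello".match(/(.)\1/)[0];

// Word boundary and non-word boundary.
const atBoundary = /er\b/.test("never");
const insideWord = /er\B/.test("verb");

// Multiline flag: ^ and $ also match next to line breaks.
const withFlag = /^b$/m.test("a\nb");
const withoutFlag = /^b$/.test("a\nb");

// '.' does not match a newline, while [\s\S] matches any character.
const dotNewline = /a.b/.test("a\nb");
const anyChar = /a[\s\S]b/.test("a\nb");

console.log({ greedy, lazy, bounded, beforeVersion, doubled });
```

Note that the lookahead match returns only the text before the version number, which is what "does not consume characters" means in practice.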