This article summarizes the metacharacters of C# regular expressions.

A regular expression is an expression made up of characters, each of which stands for a rule. The characters fall into two kinds: ordinary characters and metacharacters. An ordinary character keeps its literal meaning and matches the text exactly, while a metacharacter has a special meaning and stands for a class of characters.

Think of the text as a stream of characters in which every character occupies one position. Take the regular expression "Room\d\d\d": the first four characters, Room, are ordinary characters, and the next character, \, is the escape character. Together with the following d it forms the metacharacter \d, which means that any single digit may appear at that position. In regular-expression terms, "Room\d\d\d" matches 7 characters in total and describes the kind of string that starts with Room and ends with three digits. Such a kind of string is called a pattern, or regular pattern.

1. Escape characters

The escape character is \. It turns the ordinary character that follows it into a metacharacter with a special meaning. Commonly used escapes are:

\t: horizontal tab
\v: vertical tab
\r: carriage return
\n: line feed
\\: the character \ itself, that is, the escape character \ escaped back into an ordinary \
\": the character ". This is an escape of C# string literals rather than of the regular expression itself: C# delimits strings with double quotes, so a double quote inside a regular string literal is written \".

2. Character classes

Treat the input text as an ordered stream of characters. A character class metacharacter matches a character and consumes (captures) it: a character consumed by one metacharacter is not matched again by another metacharacter, and the metacharacters that follow can only match against the remaining text.
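The "Room\d\d\d" pattern from the introduction and the idea of consumed characters can be tried out with a short sketch. It assumes C# 9 or later (top-level statements) and the standard System.Text.RegularExpressions namespace; the sample input strings are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A verbatim string (@"...") keeps the backslash as-is, so \d reaches the regex engine unchanged.
string roomPattern = @"Room\d\d\d";
Match room = Regex.Match("Meet me in Room204 at noon", roomPattern);
Console.WriteLine(room.Value);   // Room204
Console.WriteLine(room.Length);  // 7

// In a regular string literal the backslash itself must be escaped as \\.
bool same = roomPattern == "Room\\d\\d\\d";
Console.WriteLine(same);         // True

// Consumed characters are not matched again:
// \d\d on "1234" yields "12" and "34", never "23".
string[] pairs = Regex.Matches("1234", @"\d\d")
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToArray();
Console.WriteLine(string.Join(",", pairs)); // 12,34
```

Verbatim strings are used here because they avoid doubling every backslash, which keeps the pattern identical to the way it is written in the text.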
Commonly used character class metacharacters:

[char_group]: matches any single character in the group
[^char_group]: matches any single character that is not in the group
[first-last]: matches any single character in the range from first to last, both ends included
.: wildcard, matches any character except \n
\w: matches any word character. Word characters are usually A-Z, a-z, 0-9 and the underscore _; by default .NET also counts Unicode letters and digits
\W: matches any non-word character, that is, any character \w does not match
\s: matches any whitespace character
\S: matches any non-whitespace character
\d: matches any digit
\D: matches any non-digit character

Note that escape characters also count as character class metacharacters: they consume characters during matching too.

3. Anchors

An anchor (locator) matches a position rather than a character. It decides whether the match succeeds according to where in the text that position lies. An anchor consumes no characters and is zero-width (its width is 0). Commonly used anchors are:

^: by default, matches the start of the string; in multiline mode, matches the start of every line
$: by default, matches the end of the string, or the position before a \n at the end of the string; in multiline mode, matches the end of every line (the position before each \n) as well as the end of the string
\A: matches the start of the string
\Z: matches the end of the string, or the position before a \n at the end of the string
\z: matches only the very end of the string
\G: matches the position where the previous match ended
\b: matches a word boundary, that is, the start or end of a word
\B: matches any position that is not a word boundary, for example the middle of a word

4. Quantifiers, greedy and lazy matching

A quantifier limits how many times the preceding pattern may occur. Quantifiers work in one of two modes: greedy and lazy.
Greedy mode matches as many characters as possible, while lazy mode matches as few as possible. Quantifiers are greedy by default; appending ? to a quantifier makes it lazy.

*: occurs 0 or more times
+: occurs 1 or more times
?: occurs 0 or 1 times
{n}: occurs exactly n times
{n,}: occurs at least n times
{n,m}: occurs between n and m times

Note that a quantifier repeats the preceding metacharacter, not the text it matched. For example, \d{2} is equivalent to \d\d: two digits must appear, but they need not be the same digit. To require the same digit twice, a group must be used.

5. Groups and capturing

Parentheses () not only delimit the scope of an expression but also create a group: the expression inside () is a group, and referencing a group later in the pattern means that the text at the reference must be exactly the same as the text the group matched. The basic syntax for defining a group is:

(pattern)

A group of this kind captures characters. As before, capturing means that characters consumed by one metacharacter are not matched again by other metacharacters; the metacharacters that follow can only match against the remaining text.

1) Group numbers and names

By default every group is assigned a number automatically. The rule is: groups are numbered from left to right in the order in which their opening parentheses appear. The first group is number 1, the second is number 2, and so on.

A group can also be given a name, in which case it is called a named group. Named groups are numbered automatically as well; in .NET they receive the numbers that follow those of the unnamed groups. The syntax for naming a group is:

(?<name>pattern)

In general, then, groups are either named groups or numbered groups.
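The quantifier modes and group numbering described above can be checked with a small sketch. It assumes C# 9 or later (top-level statements); the input strings and the group name month are illustrative only. The \1 in the third pattern is a reference back to group 1, which is how "the same digit twice" is expressed.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<b>bold</b>";

// Greedy: .* takes as much as it can, so the match runs to the last '>'.
string greedy = Regex.Match(html, "<.*>").Value;    // <b>bold</b>
// Lazy: .*? takes as little as it can, so the match stops at the first '>'.
string lazy = Regex.Match(html, "<.*?>").Value;     // <b>

// \d{2} accepts any two digits; (\d)\1 requires the same digit twice.
bool anyTwo = Regex.IsMatch("47", @"^\d{2}$");      // true
bool sameTwo = Regex.IsMatch("44", @"^(\d)\1$");    // true
bool notSame = Regex.IsMatch("47", @"^(\d)\1$");    // false

// One unnamed group followed by one named group.
Match date = Regex.Match("2024-05", @"(\d{4})-(?<month>\d{2})");
string year = date.Groups[1].Value;           // by number: 2024
string month = date.Groups["month"].Value;    // by name: 05
string monthByNumber = date.Groups[2].Value;  // the named group also has a number: 05

Console.WriteLine($"{greedy} | {lazy} | {anyTwo} {sameTwo} {notSame}");
Console.WriteLine($"{year} | {month} | {monthByNumber}");
```

Reading groups through Match.Groups works the same way for both kinds of group: an integer index selects by number and a string index selects by name.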
A group can be referenced in two ways:

By group name: \k<name>
By group number: \number

Note that a group can only be referenced after it has been defined: reading the regular expression from left to right, the definition of the group must come before the reference. The syntax for referencing a numbered group is "\number": "\1" stands for the substring matched by group 1, "\2" for the substring matched by group 2, and so on. For example, "<(.*?)>.*?</\1>" can match a string such as "<b>valid</b>", whose opening and closing tag names are identical; at the point of the reference, the text must be exactly the same as the text the group matched.

2) Grouping constructs

The grouping constructs are:

(pattern): captures the matched subexpression and assigns a number to the group
(?<name>pattern): captures the matched subexpression into a named group
(?:pattern): non-capturing group; no number is assigned to the group
(?>pattern): greedy group

3) Greedy groups

A greedy group is also called a non-backtracking (atomic) group. It disables backtracking: the engine lets the subexpression match as much of the input as it can and, if the rest of the pattern then fails, does not go back into the group to try other ways of matching it.

(?>pattern)

4) Alternation

| means "or": it matches either one of two alternatives. Note that | splits the expression into a left part and a right part.

pattern1|pattern2

6. Zero-width assertions

Zero-width means that the width is 0: what is matched is a position, so the text examined by the assertion does not appear in the match result. Assertion means a test whose outcome decides the match: the match can succeed only if the assertion is true.

Anchors can match the start and end of a string or line (^ and $) or the start and end of a word (\b).
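That anchors match a position and contribute no characters to the result can be observed directly. The following is a minimal sketch, assuming C# 9 or later (top-level statements); the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "cat\n";

// An anchor matches a position, so a successful match on ^ alone has length 0.
Match start = Regex.Match(text, "^");
Console.WriteLine($"{start.Success} {start.Index} {start.Length}"); // True 0 0

// $ and \Z also match just before a trailing \n; \z matches only at the very end.
bool dollar = Regex.IsMatch(text, @"cat$");   // true
bool bigZ = Regex.IsMatch(text, @"cat\Z");    // true
bool smallZ = Regex.IsMatch(text, @"cat\z");  // false

// The two \b anchors add nothing to the match: only "cat" (3 characters) is returned.
Match word = Regex.Match("a cat sat", @"\bcat\b");
Console.WriteLine($"{word.Value} {word.Index} {word.Length}");      // cat 2 3

// In multiline mode ^ matches at the start of every line.
int lineStarts = Regex.Matches("one\ntwo\nthree", "^", RegexOptions.Multiline).Count;
Console.WriteLine(lineStarts);                                      // 3
```

Match.Length is the simplest way to see the zero width: it counts only the characters that were consumed, and anchors consume none.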
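Anchors such as ^, $ and \b match only a position and require that the position satisfy some condition; they do not match any characters, which is why they are called zero-width assertions. Zero-width means that they match a position rather than characters; assertion means a test, and the regular expression goes on matching only if the test is true. The zero-width assertions described in this section can pin down an exact position by an arbitrary pattern, instead of only the start or end of a line or word.

A regular expression treats the text as a stream of characters running from left to right. Checking the text to the right of the current position is called lookahead (forward), and checking the text to the left is called lookbehind (backward). An assertion that is true when the specified pattern matches is called positive; one that is true when the pattern does not match is called negative. Combining direction and polarity gives four kinds of zero-width assertion:

(?=pattern): positive lookahead
(?!pattern): negative lookahead
(?<=pattern): positive lookbehind
(?<!pattern): negative lookbehind

1) Positive lookahead

A positive lookahead requires that a pattern be present at the end (to the right) of the matched text, but the substring matched by that pattern does not appear in the match result. A lookahead usually sits at the right-hand end of a regular expression and states that the text to the right must fit a particular pattern:

(?=subexpression)

With a positive lookahead you can define a fuzzy match whose suffix must contain specific characters:

\b\w+(?=\sis\b)

Breaking the expression down:

\b: a word boundary
\w+: one or more word characters
(?=\sis\b): a positive lookahead, where \s is one whitespace character, is consists of ordinary characters matched exactly, and \b is a word boundary

It follows that text matching this expression must contain the word is as a separate word, not as part of another word. For example, Sunday is a weekend day matches, and the matched value is Sunday, whereas The island has beautiful birds does not match.

2) Positive lookbehind

A positive lookbehind requires that a pattern be present at the start (to the left) of the matched text, but the substring matched by that pattern does not appear in the match result. A lookbehind usually sits at the left-hand end of a regular expression and states that the text to the left must fit a particular pattern:

(?<=subexpression)

With a positive lookbehind you can define a fuzzy match whose prefix must contain specific characters:

(?<=\b20)\d{2}\b

Breaking the expression down:

(?<=\b20): a positive lookbehind, where \b is the start of a word and 20 consists of ordinary characters
\d{2}: two digits, which need not be the same
\b: a word boundary

Text that fits this expression starts with 20 and ends with two digits; only the two trailing digits appear in the match result.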
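The two worked patterns above can be run as written. The sketch below assumes C# 9 or later (top-level statements); the sentence containing years and the negative-lookbehind variant at the end are illustrative additions, not taken from the original examples.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Positive lookahead: a word followed by the separate word "is".
string ahead = @"\b\w+(?=\sis\b)";
Match sunday = Regex.Match("Sunday is a weekend day", ahead);
Console.WriteLine(sunday.Value);  // Sunday (" is" is checked but not included)
bool island = Regex.IsMatch("The island has beautiful birds", ahead);
Console.WriteLine(island);        // False: "is" is only the start of "island"

// Positive lookbehind: two digits at the end of a word that starts with 20.
string behind = @"(?<=\b20)\d{2}\b";
string[] years = Regex.Matches("Released in 2019 and 2024, not 1999", behind)
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToArray();
Console.WriteLine(string.Join(",", years)); // 19,24

// Negative lookbehind (illustrative): two trailing digits NOT preceded by a leading 20.
string notBehind = @"(?<!\b20)\d{2}\b";
string other = Regex.Match("2019 1999", notBehind).Value;
Console.WriteLine(other);         // 99
```

In each case the text examined by the assertion is absent from Match.Value, which is the practical meaning of zero-width.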