C# Tutorial
Author: php.cn  Updated: 2022-04-11 14:06:23

C# regular expressions



A regular expression is a pattern that can be matched against input text. The .NET Framework provides a regular expression engine that performs such matching. A pattern consists of one or more character literals, operators, or constructs.

Defining regular expressions

Listed below are the categories of characters, operators, and constructs used to define regular expressions.

  • Character escapes

  • Character classes

  • Anchors

  • Grouping constructs

  • Quantifiers

  • Backreference constructs

  • Alternation constructs

  • Substitutions

  • Miscellaneous constructs

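Character escapes

The backslash character (\) in a regular expression indicates that the character that follows it either is a special character or should be interpreted literally.

The following table lists the escape characters:

Escape character | Description | Pattern | Match
\a | Matches the bell (alarm) character, \u0007. | \a | "\u0007" in "Warning!" + '\u0007'
\b | In a character class, matches the backspace character, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b"
\t | Matches the tab character, \u0009. | (\w+)\t | "Name\t" and "Addr\t" in "Name\tAddr\t"
\r | Matches the carriage return character, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld."
\v | Matches the vertical tab character, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v"
\f | Matches the form feed character, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f"
\n | Matches the newline character, \u000A. | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld."
\e | Matches the escape character, \u001B. | \e | "\x001B" in "\x001B"
\nnn | Specifies a character in octal notation (nnn consists of two or three digits). | \w\040\w | "a b" and "c d" in "a bc d"
\xnn | Specifies a character in hexadecimal notation (nn consists of exactly two digits). | \w\x20\w | "a b" and "c d" in "a bc d"
\cX, \cx | Matches the ASCII control character specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C)
\unnnn | Matches a Unicode character in hexadecimal notation (exactly four digits, represented by nnnn). | \w\u0020\w | "a b" and "c d" in "a bc d"
\ | When followed by a character that is not recognized as an escape character, matches that character. | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9"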
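
Because the backslash is also the escape character in ordinary C# string literals, patterns are usually written as verbatim strings (@"..."). The following is a minimal sketch of the \t row from the table above; the class and method names (EscapeDemo, WordsBeforeTab) are illustrative and not part of the original tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Returns the words that are immediately followed by a tab character.
    public static string[] WordsBeforeTab(string input)
    {
        // Verbatim string: the regex engine, not the C# compiler, interprets \w and \t.
        MatchCollection matches = Regex.Matches(input, @"(\w+)\t");
        string[] words = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            words[i] = matches[i].Groups[1].Value;
        return words;
    }

    public static void Main()
    {
        // In a regular (non-verbatim) string, \t is an actual tab character.
        foreach (string word in WordsBeforeTab("Name\tAddr\t"))
            Console.WriteLine(word);   // prints Name, then Addr

        // Regex.Escape backslash-escapes metacharacters so that text is matched literally.
        Console.WriteLine(Regex.Escape("1.5*2"));   // prints 1\.5\*2
    }
}
```

Regex.Escape is useful when part of a pattern comes from user input and must not be interpreted as regular expression syntax.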

Character classes

A character class matches any one character from a set of characters.

The following table lists the character classes:

Character class | Description | Pattern | Match
[character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [mn] | "m" in "mat"; "m" and "n" in "moon"
[^character_group] | Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "v" and "l" in "avail"
[first-last] | Character range: matches any single character in the range from first to last. | [A-Z] | "A" and "B" in "AB123"
. | Wildcard: matches any single character except \n. To match a literal period (. or \u002E), precede it with the escape character (\.). | a.e | "ave" in "have"; "ate" in "mate"
\p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu} | "C" and "L" in "City Lights"
\P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu} | "i", "t", and "y" in "City"
\w | Matches any word character. | \w | "R", "o", "o", "m", and "1" in "Room#1"
\W | Matches any non-word character. | \W | "#" in "Room#1"
\s | Matches any white-space character. | \w\s | "D " in "ID A1.3"
\S | Matches any non-white-space character. | \s\S | " _" in "int __ctr"
\d | Matches any decimal digit. | \d | "4" in "4 = IV"
\D | Matches any character other than a decimal digit. | \D | " ", "=", " ", "I", and "V" in "4 = IV"
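
A few rows of the character class table can be checked with a short program. This is a sketch only; CharClassDemo and JoinMatches are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Joins every match of pattern in input with commas, for easy display.
    public static string JoinMatches(string input, string pattern)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            parts.Add(m.Value);
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(JoinMatches("Room#1", @"\w"));            // R,o,o,m,1
        Console.WriteLine(JoinMatches("Room#1", @"\W"));            // #
        Console.WriteLine(JoinMatches("City Lights", @"\p{Lu}"));   // C,L
        Console.WriteLine(JoinMatches("avail", @"[^aei]"));         // v,l
        Console.WriteLine(JoinMatches("AB123", @"[A-Z]"));          // A,B
    }
}
```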

Anchors

Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters.

The following table lists the anchors:

Assertion | Description | Pattern | Match
^ | The match must start at the beginning of the string or line. | ^\d{3} | "567" in "567-777-"
$ | The match must occur at the end of the string, or before \n at the end of the line or string. | -\d{4}$ | "-2012" in "8-12-2012"
\A | The match must occur at the beginning of the string. | \A\w{4} | "Code" in "Code-007-"
\Z | The match must occur at the end of the string, or before \n at the end of the string. | -\d{3}\Z | "-007" in "Bond-901-007"
\z | The match must occur at the end of the string. | -\d{3}\z | "-333" in "-901-333"
\G | The match must occur at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", and "(5)" in "(1)(3)(5)[7](9)"
\b | The match must occur on a boundary between a \w (alphanumeric) and a \W (non-alphanumeric) character. | \b\w+\s\w+\b | "them theme" and "them them" in "them theme them them"
\B | The match must not occur on a \b boundary. | \Bend\w*\b | "ends" and "ender" in "end sends endure lender"
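
The "string or line" wording for ^ depends on the RegexOptions.Multiline option. The sketch below illustrates the difference, together with the \B row from the table; AnchorDemo and CountMatches are illustrative names, and the two-line input string is an assumption made for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts the matches of pattern in input under the given options.
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string input = "567-777\n890-111";

        // Without Multiline, ^ matches only at the start of the string.
        Console.WriteLine(CountMatches(input, @"^\d{3}", RegexOptions.None));        // prints 1

        // With Multiline, ^ also matches immediately after each \n.
        Console.WriteLine(CountMatches(input, @"^\d{3}", RegexOptions.Multiline));   // prints 2

        // \B requires that "end" does not begin at a word boundary.
        foreach (Match m in Regex.Matches("end sends endure lender", @"\Bend\w*\b"))
            Console.WriteLine(m.Value);   // prints ends, then ender
    }
}
```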

Grouping constructs

Grouping constructs delineate subexpressions of a regular expression and are typically used to capture substrings of the input string.

The following table lists the grouping constructs:

Grouping construct | Description | Pattern | Match
(subexpression) | Captures the matched subexpression and assigns it an ordinal number. Numbering starts at 1; group 0 is the entire match. | (\w)\1 | "ee" in "deep"
(?<name>subexpression) | Captures the matched subexpression into a named group. | (?<double>\w)\k<double> | "ee" in "deep"
(?<name1-name2>subexpression) | Defines a balancing group definition. | (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))"
(?:subexpression) | Defines a non-capturing group. | Write(?:Line)? | "WriteLine" in "Console.WriteLine()"
(?imnsx-imnsx:subexpression) | Applies or disables the specified options within subexpression. | A\d{2}(?i:\w+)\b | "A12xl" and "A12XL" in "A12xl A12XL a12xl"
(?=subexpression) | Zero-width positive lookahead assertion. | \w+(?=\.) | "is", "ran", and "out" in "He is. The dog ran. The sun is out."
(?!subexpression) | Zero-width negative lookahead assertion. | \b(?!un)\w+\b | "sure" and "used" in "unsure sure unity used"
(?<=subexpression) | Zero-width positive lookbehind assertion. | (?<=19)\d{2}\b | "99", "50", and "05" in "1851 1999 1950 1905 2003"
(?<!subexpression) | Zero-width negative lookbehind assertion. | (?<!19)\d{2}\b | "51" and "03" in "1851 1999 1950 1905 2003"
(?>subexpression) | Non-backtracking (atomic) subexpression. | [13579](?>A+B+) | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC"

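Example

The following example uses a positive lookbehind assertion to print the last two digits of each year that begins with 19: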

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "1851 1999 1950 1905 2003";
      string pattern = @"(?<=19)\d{2}\b";

      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
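
Named and non-capturing groups from the table can be sketched in the same way. The date pattern, the class name GroupDemo, and the method MonthSlashYear are assumptions chosen for illustration, not part of the original tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Extracts a yyyy-MM date fragment and returns it as MM/yyyy,
    // or an empty string when the input contains no such fragment.
    public static string MonthSlashYear(string input)
    {
        Match m = Regex.Match(input, @"(?<year>\d{4})-(?<month>\d{2})");
        if (!m.Success)
            return string.Empty;
        // Named groups are read back by name through the Groups collection.
        return m.Groups["month"].Value + "/" + m.Groups["year"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(MonthSlashYear("Updated 2022-04-11"));   // prints 04/2022

        // (?:Line) groups without capturing, so only group 0 (the whole match) exists.
        Match w = Regex.Match("Console.WriteLine()", @"Write(?:Line)?");
        Console.WriteLine(w.Value);          // prints WriteLine
        Console.WriteLine(w.Groups.Count);   // prints 1
    }
}
```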


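Quantifiers

A quantifier specifies how many instances of the preceding element (a character, a group, or a character class) must be present in the input string for a match to occur.

The following table lists the quantifiers:

Quantifier | Description | Pattern | Match
* | Matches the previous element zero or more times. | \d*\.\d | ".0", "19.9", "219.9"
+ | Matches the previous element one or more times. | be+ | "bee" in "been"; "be" in "bent"
? | Matches the previous element zero or one time. | rai?n | "ran", "rain"
{n} | Matches the previous element exactly n times. | ,\d{3} | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210"
{n,} | Matches the previous element at least n times. | \d{2,} | "166", "29", "1930"
{n,m} | Matches the previous element at least n times, but no more than m times. | \d{3,5} | "166", "17668"; "19302" in "193024"
*? | Matches the previous element zero or more times, but as few times as possible. | \d*?\.\d | ".0", "19.9", "219.9"
+? | Matches the previous element one or more times, but as few times as possible. | be+? | "be" in "been"; "be" in "bent"
?? | Matches the previous element zero or one time, but as few times as possible. | rai??n | "ran", "rain"
{n}? | Matches the previous element exactly n times. | ,\d{3}? | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210"
{n,}? | Matches the previous element at least n times, but as few times as possible. | \d{2,}? | "16" in "166"; "29"; "19" and "30" in "1930"
{n,m}? | Matches the previous element between n and m times, but as few times as possible. | \d{3,5}? | "166"; "176" in "17668"; "193" and "024" in "193024"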
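
The difference between greedy and lazy quantifiers is easiest to see side by side. The following sketch assumes a small HTML-like input chosen for illustration; QuantifierDemo and FirstMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the text of the first match of pattern in input ("" if there is none).
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string html = "<b>one</b><b>two</b>";

        // Greedy: .* takes as much as it can, so the match runs to the last </b>.
        Console.WriteLine(FirstMatch(html, @"<b>.*</b>"));    // prints <b>one</b><b>two</b>

        // Lazy: .*? takes as little as it can, so the match stops at the first </b>.
        Console.WriteLine(FirstMatch(html, @"<b>.*?</b>"));   // prints <b>one</b>

        // The same contrast with a bounded quantifier.
        Console.WriteLine(FirstMatch("17668", @"\d{3,5}"));    // prints 17668
        Console.WriteLine(FirstMatch("17668", @"\d{3,5}?"));   // prints 176
    }
}
```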

Backreference constructs

A backreference allows a previously matched subexpression to be identified later in the same regular expression.

The following table lists the backreference constructs:

Backreference construct | Description | Pattern | Match
\number | Backreference. Matches the value of a numbered subexpression. | (\w)\1 | "ee" in "seek"
\k<name> | Named backreference. Matches the value of a named expression. | (?<char>\w)\k<char> | "ee" in "seek"
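
A common use of backreferences is finding accidentally repeated words. The pattern, the sample sentence, and the names BackreferenceDemo and RepeatedWords below are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Returns each run of an immediately repeated word, such as "is is".
    public static List<string> RepeatedWords(string input)
    {
        var found = new List<string>();
        // \k<word> must match the same text that the group named "word" captured.
        foreach (Match m in Regex.Matches(input, @"\b(?<word>\w+)\s+\k<word>\b"))
            found.Add(m.Value);
        return found;
    }

    public static void Main()
    {
        foreach (string pair in RepeatedWords("This is is a test test case"))
            Console.WriteLine(pair);   // prints "is is", then "test test"

        // The numbered form \1 refers to the first capturing group.
        Console.WriteLine(Regex.Match("seek", @"(\w)\1").Value);   // prints ee
    }
}
```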

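Alternation constructs

Alternation constructs modify a regular expression to enable either/or matching.

The following table lists the alternation constructs:

Alternation construct | Description | Pattern | Match
| (vertical bar) | Matches any one of the elements separated by the vertical bar (|) character. | th(e|is|at) | "the" and "this" in "this is the day."
(?(expression)yes|no) | Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. | (?(A)A\d{2}\b|\b\d{3}\b) | "A10" and "910" in "A10 C103 910"
(?(name)yes|no) | Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no part. | (?<quoted>")?(?(quoted).+?"|\S+\s) | "Dogs.jpg " and "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\""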
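
The first two rows of the alternation table can be run directly. This is a sketch; AlternationDemo and JoinMatches are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Joins every match of pattern in input with commas, for easy display.
    public static string JoinMatches(string input, string pattern)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            parts.Add(m.Value);
        return string.Join(",", parts);
    }

    public static void Main()
    {
        // Plain alternation: "th" followed by "e", "is", or "at".
        Console.WriteLine(JoinMatches("this is the day.", @"th(e|is|at)"));   // prints this,the

        // Conditional: if the next character is "A", match A plus two digits;
        // otherwise match a standalone three-digit number.
        Console.WriteLine(JoinMatches("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"));   // prints A10,910
    }
}
```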

Substitutions

Substitutions are regular expression language elements that are used in replacement patterns.

The following table lists the substitutions:

Character | Description | Pattern | Replacement pattern | Input string | Result string
$number | Substitutes the substring matched by group number. | \b(\w+)(\s)(\w+)\b | $3$2$1 | "one two" | "two one"
${name} | Substitutes the substring matched by the named group name. | \b(?<word1>\w+)(\s)(?<word2>\w+)\b | ${word2} ${word1} | "one two" | "two one"
$$ | Substitutes a literal "$". | \b(\d+)\s?USD | $$$1 | "103 USD" | "$103"
$& | Substitutes a copy of the whole match. | (\$*(\d*(\.+\d+)?){1}) | **$& | "$1.30" | "**$1.30**"
$` | Substitutes all the text of the input string before the match. | B+ | $` | "AABBCC" | "AAAACC"
$' | Substitutes all the text of the input string after the match. | B+ | $' | "AABBCC" | "AACCCC"
$+ | Substitutes the last group that was captured. | B+(C+) | $+ | "AABBCCDD" | "AACCDD"
$_ | Substitutes the entire input string. | B+ | $_ | "AABBCC" | "AAAABBCCCC"
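
Substitutions take effect in the replacement argument of Regex.Replace. The sketch below runs several rows of the table; SubstitutionDemo and SwapWords are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Swaps two adjacent words by referring to the named groups in the replacement pattern.
    public static string SwapWords(string input)
    {
        return Regex.Replace(input, @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}");
    }

    public static void Main()
    {
        Console.WriteLine(SwapWords("one two"));                                         // prints two one

        // The same swap with numbered groups.
        Console.WriteLine(Regex.Replace("one two", @"\b(\w+)(\s)(\w+)\b", "$3$2$1"));    // prints two one

        // $$ emits a literal dollar sign; $1 is the captured amount.
        Console.WriteLine(Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1"));           // prints $103

        // $` inserts the text that precedes the match ("AA").
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$`"));                          // prints AAAACC
    }
}
```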

Miscellaneous constructs

The following table lists the miscellaneous constructs:

Construct | Description | Example
(?imnsx-imnsx) | Sets or disables options, such as case insensitivity, in the middle of a pattern. | \bA(?i)b\w+\b matches "ABA" and "Able" in "ABA Able Act"
(?#comment) | Inline comment. The comment ends at the first closing parenthesis. | \bA(?#Matches words starting with A)\w+\b
# [to end of line] | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | (?x)\bA\w+\b#Matches words starting with A
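
The inline option and the end-of-line comment can be sketched as follows. The example passes RegexOptions.IgnorePatternWhitespace instead of writing (?x) in the pattern; MiscConstructDemo and CountMatches are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiscConstructDemo
{
    // Counts the matches of pattern in input under the given options.
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string input = "ABA Able Act";

        // (?i) turns on case-insensitive matching for the rest of the pattern,
        // so the literal b also matches "B".
        Console.WriteLine(CountMatches(input, @"\bA(?i)b\w+\b", RegexOptions.None));   // prints 2

        // With IgnorePatternWhitespace, unescaped white space in the pattern is
        // ignored and # starts a comment that runs to the end of the line.
        string commented = @"
            \b A     # a word that starts with a capital A
            \w+ \b   # followed by one or more word characters
        ";
        Console.WriteLine(CountMatches(input, commented, RegexOptions.IgnorePatternWhitespace));   // prints 3
    }
}
```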

Regex class

The Regex class is used to represent a regular expression.

The following table lists some commonly used methods in the Regex class:

No. | Method | Description
1 | public bool IsMatch(string input) | Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string.
2 | public bool IsMatch(string input, int startat) | Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string.
3 | public static bool IsMatch(string input, string pattern) | Indicates whether the specified regular expression finds a match in the specified input string.
4 | public MatchCollection Matches(string input) | Searches the specified input string for all occurrences of the regular expression.
5 | public string Replace(string input, string replacement) | In the specified input string, replaces all strings that match the regular expression pattern with the specified replacement string.
6 | public string[] Split(string input) | Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor.

For a complete list of the methods and properties of the Regex class, see Microsoft's .NET documentation.
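
The six methods listed above can be exercised with one Regex instance. This is a sketch with assumed sample inputs; RegexMethodsDemo is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    public static void Main()
    {
        // One compiled pattern reused for every instance method call.
        Regex digits = new Regex(@"\d+");

        Console.WriteLine(digits.IsMatch("123 abc"));          // prints True
        Console.WriteLine(digits.IsMatch("123 abc", 3));       // prints False: the search starts after the digits
        Console.WriteLine(Regex.IsMatch("abc", @"\d+"));       // prints False (static overload)

        Console.WriteLine(digits.Matches("a1b22c333").Count);  // prints 3
        Console.WriteLine(digits.Replace("a1b22c333", "#"));   // prints a#b#c#

        // Split cuts the input at each match of the pattern.
        Console.WriteLine(string.Join("|", digits.Split("a1b22c")));   // prints a|b|c
    }
}
```

Creating the Regex once and reusing it avoids re-parsing the pattern on every call; the static methods are convenient for one-off checks.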

Example 1

The following example matches words starting with 'S':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "A Thousand Splendid Suns";

         Console.WriteLine("Matching words that start with 'S': ");
         showMatch(str, @"\bS\S*");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following results:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example matches words starting with 'm' and ending with 'e':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "make maze and manage to measure it";

         Console.WriteLine("Matching words start with 'm' and ends with 'e':");
         showMatch(str, @"\bm\S*e\b");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following results:

Matching words start with 'm' and ends with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3

The following example replaces each run of white space with a single space:

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      static void Main(string[] args)
      {
         string input = "Hello   World   ";
         string pattern = @"\s+";   // verbatim string: "\s" is not a valid C# escape sequence
         string replacement = " ";
         Regex rgx = new Regex(pattern);
         string result = rgx.Replace(input, replacement);

         Console.WriteLine("Original String: {0}", input);
         Console.WriteLine("Replacement String: {0}", result);    
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following result (the trailing run of spaces also collapses to a single space):

Original String: Hello   World   
Replacement String: Hello World
