
[C# Tutorial] C# Regular Expressions

黄舟
2016-12-24 13:37:52

C# Regular Expression

A regular expression is a pattern that is matched against input text. The .NET Framework provides a regular expression engine that performs this matching. A pattern consists of one or more character literals, operators, or constructs.

Define Regular Expressions

The categories of characters, operators, and constructs used to define regular expressions are:

Character escapes

Character classes

Anchors

Grouping constructs

Quantifiers

Backreference constructs

Alternation constructs

Replacements

Miscellaneous constructs

Character escapes

In a regular expression, the backslash character (\) indicates that the character following it either is a special character or should be interpreted literally.

The following table lists the escape characters:

\a: Matches the bell (alarm) character, \u0007. Pattern \a matches "\u0007" in "Warning!" + '\u0007'

\b: In a character class, matches a backspace, \u0008 (outside a character class, \b is the word-boundary anchor). Pattern [\b]{3,} matches "\b\b\b\b" in "\b\b\b\b"

\t: Matches a tab, \u0009. Pattern (\w+)\t matches "Name\t" and "Addr\t" in "Name\tAddr\t"

\r: Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) Pattern \r\n(\w+) matches "\r\nHello" in "\r\nHello\nWorld."

\v: Matches a vertical tab, \u000B. Pattern [\v]{2,} matches "\v\v\v" in "\v\v\v"

\f: Matches a form feed, \u000C. Pattern [\f]{2,} matches "\f\f\f" in "\f\f\f"

\n: Matches a newline, \u000A. Pattern \r\n(\w+) matches "\r\nHello" in "\r\nHello\nWorld."

\e: Matches an escape, \u001B. Pattern \e matches "\x001B" in "\x001B"

\nnn: Specifies a character by its octal representation (nnn consists of two or three digits). Pattern \w\040\w matches "a b" and "c d" in "a bc d"
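The snippet below is a minimal sketch of two points from this table, assuming a C# 7 or later compiler; EscapeDemo, TabFields and ContainsLiteral are hypothetical names chosen for illustration. It shows that an escape such as \t matches a real control character, and that Regex.Escape adds the backslashes needed to make metacharacters match literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EscapeDemo
{
    // Applies the table's (\w+)\t example and returns group 1 (the word without its tab).
    public static string[] TabFields(string input)
    {
        MatchCollection mc = Regex.Matches(input, @"(\w+)\t");
        var fields = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            fields[i] = mc[i].Groups[1].Value;
        }
        return fields;
    }

    // Regex.Escape backslash-escapes metacharacters such as + and ?, so they match literally.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }
}

public static class Program
{
    public static void Main()
    {
        // In an ordinary C# string, "\t" is already a real tab character;
        // in the verbatim pattern @"(\w+)\t", \t is the regex escape for that same character.
        foreach (string field in EscapeDemo.TabFields("Name\tAddr\t"))
        {
            Console.WriteLine(field); // Name, then Addr
        }

        Console.WriteLine(Regex.Escape("1+1=2?"));                         // 1\+1=2\?
        Console.WriteLine(EscapeDemo.ContainsLiteral("Is 1+1=2?", "1+1")); // True
        Console.WriteLine(Regex.IsMatch("Is 1+1=2?", "1+1"));              // False: unescaped + is a quantifier
    }
}
```

Using verbatim strings (@"...") for patterns keeps regex backslashes from being interpreted a second time by the C# compiler.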

Character classes

A character class matches any one of a set of characters.

The following table lists the character classes:

\p{name}: Matches any single character in the Unicode general category or named block specified by name. Pattern \p{Lu} matches "C" and "L" in "City Lights"

\P{name}: Matches any single character that is not in the Unicode general category or named block specified by name. Pattern \P{Lu} matches "i", "t" and "y" in "City"

\w: Matches any word character. Pattern \w matches "R", "o", "o", "m" and "1" in "Room#1"

\W: Matches any non-word character. Pattern \W matches "#" in "Room#1"

\s: Matches any whitespace character. Pattern \w\s matches "D " in "ID A1.3"

\S: Matches any non-whitespace character. Pattern \s\S matches " _" in "int __ctr"

\d: Matches any decimal digit. Pattern \d matches "4" in "4 = IV"

\D: Matches any character that is not a decimal digit. Pattern \D matches " ", "=", " ", "I" and "V" in "4 = IV"
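A minimal sketch of the character classes in use, assuming a C# 7 or later compiler; CharClassDemo and All are hypothetical names. It also illustrates one detail worth knowing: by default .NET's \d matches any Unicode decimal digit (for example the Arabic-Indic digit U+0663), while RegexOptions.ECMAScript restricts it to 0-9.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CharClassDemo
{
    // Collects every match of pattern in input, in order.
    public static List<string> All(string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            values.Add(m.Value);
        }
        return values;
    }
}

public static class Program
{
    public static void Main()
    {
        // \p{Lu}: uppercase letters
        Console.WriteLine(string.Join(",", CharClassDemo.All("City Lights", @"\p{Lu}"))); // C,L

        // \w\s: a word character followed by a whitespace character
        Console.WriteLine("[" + string.Join(",", CharClassDemo.All("ID A1.3", @"\w\s")) + "]"); // [D ]

        // "7" plus ARABIC-INDIC DIGIT THREE (U+0663)
        string mixed = "7\u0663";
        Console.WriteLine(CharClassDemo.All(mixed, @"\d").Count);                          // 2
        Console.WriteLine(CharClassDemo.All(mixed, @"\d", RegexOptions.ECMAScript).Count); // 1
    }
}
```

If a pattern must accept only ASCII digits, [0-9] or the ECMAScript option is the safer choice.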

Anchors

Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters.

The following table lists the anchors:

^: The match must start at the beginning of the string (or of a line, in multiline mode). Pattern ^\d{3} matches "567" in "567-777-"

$: The match must occur at the end of the string or before \n at the end of the string (or at the end of a line, in multiline mode). Pattern -\d{4}$ matches "-2012" in "8-12-2012"

\A: The match must occur at the start of the string. Pattern \A\w{4} matches "Code" in "Code-007-"

\Z: The match must occur at the end of the string or before \n at the end of the string. Pattern -\d{3}\Z matches "-007" in "Bond-901-007"

\z: The match must occur at the end of the string. Pattern -\d{3}\z matches "-333" in "-901-333"

\G: The match must occur at the point where the previous match ended. Pattern \G\(\d\) matches "(1)", "(3)" and "(5)" in "(1)(3)(5)[7](9)"

\b: The match must occur on a boundary between a \w (alphanumeric) and a \W (non-alphanumeric) character. Pattern \b\w+\s\w+\b matches "them theme" and "them them" in "them theme them them"

\B: The match must not occur on a \b boundary. Pattern \Bend\w*\b matches "ends" and "ender" in "end sends endure lender"
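A minimal sketch contrasting some of these anchors, assuming a C# 7 or later compiler; AnchorDemo and CountLeadingDigits are hypothetical names. It shows how RegexOptions.Multiline changes ^, how $ differs from \z when the input ends in \n, and how \G stops at the first gap.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorDemo
{
    // Counts matches of ^\d{3}; with RegexOptions.Multiline, ^ also matches after each \n.
    public static int CountLeadingDigits(string input, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(input, @"^\d{3}", options).Count;
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "567-777-\n123-456-";
        Console.WriteLine(AnchorDemo.CountLeadingDigits(text, false)); // 1 ("567")
        Console.WriteLine(AnchorDemo.CountLeadingDigits(text, true));  // 2 ("567" and "123")

        // $ may match just before a final \n; \z only at the true end of the string.
        Console.WriteLine(Regex.IsMatch("8-12-2012\n", @"-\d{4}$"));  // True
        Console.WriteLine(Regex.IsMatch("8-12-2012\n", @"-\d{4}\z")); // False

        // \G makes each match start where the previous one ended, so matching stops at "[7]".
        Console.WriteLine(Regex.Matches("(1)(3)(5)[7](9)", @"\G\(\d\)").Count); // 3
    }
}
```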

Grouping constructs

Grouping constructs delineate the subexpressions of a regular expression and typically capture substrings of the input string.

The following table lists the grouping constructs:

(subexpression): Captures the matched subexpression and assigns it a number, counting from 1 (group 0 is always the entire match). Pattern (\w)\1 matches "ee" in "deep"

(?<name>subexpression): Captures the matched subexpression into a named group. Pattern (?<double>\w)\k<double> matches "ee" in "deep"

(?<name1-name2>subexpression): Defines a balancing group: it removes the most recent name2 capture and stores the text between that capture and the current one in name1. Pattern (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ matches "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))"

(?:subexpression): Defines a non-capturing group. Pattern Write(?:Line)? matches "WriteLine" in "Console.WriteLine()"

(?imnsx-imnsx:subexpression): Applies or disables the specified options within subexpression. Pattern A\d{2}(?i:\w+)\b matches "A12xl" and "A12XL" in "A12xl A12XL a12xl"

(?=subexpression): Zero-width positive lookahead assertion. Pattern \w+(?=\.) matches "is", "ran" and "out" in "He is. The dog ran. The sun is out."

(?!subexpression): Zero-width negative lookahead assertion. Pattern \b(?!un)\w+\b matches "sure" and "used" in "unsure sure unity used"

(?>subexpression): Non-backtracking (also called "greedy" or atomic) subexpression. Pattern [13579](?>A+B+) matches "1ABB", "3ABB" and "5AB" in "1ABB 3ABBC 5AB 5AC"
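A minimal sketch of named groups, a non-capturing group, and a lookahead, assuming a C# 7 or later compiler; GroupDemo, TryParseDate and WordsBeforePeriod are hypothetical names, and the yyyy-mm-dd date format is an assumption chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupDemo
{
    // Named groups let the code read each part of the match by name rather than by number.
    // [0-9] is used instead of \d because \d also accepts non-ASCII digits that int.Parse rejects.
    public static bool TryParseDate(string input, out int year, out int month, out int day)
    {
        year = month = day = 0;
        Match m = Regex.Match(input, @"^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$");
        if (!m.Success)
        {
            return false;
        }
        year = int.Parse(m.Groups["year"].Value);
        month = int.Parse(m.Groups["month"].Value);
        day = int.Parse(m.Groups["day"].Value);
        return true;
    }

    // The lookahead (?=\.) requires a following period but leaves it out of the match.
    public static string[] WordsBeforePeriod(string input)
    {
        MatchCollection mc = Regex.Matches(input, @"\w+(?=\.)");
        var words = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            words[i] = mc[i].Value;
        }
        return words;
    }
}

public static class Program
{
    public static void Main()
    {
        if (GroupDemo.TryParseDate("2016-12-24", out int y, out int mo, out int d))
        {
            Console.WriteLine($"{y}/{mo}/{d}"); // 2016/12/24
        }

        // (?:Line) groups without capturing, so Groups holds only group 0 (the whole match).
        Match w = Regex.Match("Console.WriteLine()", @"Write(?:Line)?");
        Console.WriteLine(w.Value + " " + w.Groups.Count); // WriteLine 1

        Console.WriteLine(string.Join(",", GroupDemo.WordsBeforePeriod("He is. The dog ran. The sun is out."))); // is,ran,out
    }
}
```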

Quantifiers

A quantifier specifies how many instances of the previous element (a character, a group, or a character class) must be present in the input string for a match to occur.

The following table lists the quantifiers:

*: Matches the previous element zero or more times. Pattern \d*\.\d matches ".0", "19.9" and "219.9"

+: Matches the previous element one or more times. Pattern be+ matches "bee" in "been" and "be" in "bent"

?: Matches the previous element zero or one time. Pattern rai?n matches "ran" and "rain"

{n}: Matches the previous element exactly n times. Pattern ,\d{3} matches ",043" in "1,043.6", and ",876", ",543" and ",210" in "9,876,543,210"

{n,}: Matches the previous element at least n times. Pattern \d{2,} matches "166", "29" and "1930"

{n,m}: Matches the previous element at least n times, but no more than m times. Pattern \d{3,5} matches "166", "17668", and "19302" in "193024"

*?: Matches the previous element zero or more times, but as few times as possible. Pattern \d*?\.\d matches ".0", "19.9" and "219.9"

+?: Matches the previous element one or more times, but as few times as possible. Pattern be+? matches "be" in "been" and "be" in "bent"

??: Matches the previous element zero or one time, but as few times as possible. Pattern rai??n matches "ran" and "rain"

{n}?: Matches the previous element exactly n times. Pattern ,\d{3}? matches ",043" in "1,043.6", and ",876", ",543" and ",210" in "9,876,543,210"

{n,}?: Matches the previous element at least n times, but as few times as possible. Pattern \d{2,}? matches "16" in "166", "29" in "29", and "19" and "30" in "1930"

{n,m}?: Matches the previous element between n and m times, but as few times as possible. Pattern \d{3,5}? matches "166" in "166", "176" in "17668", and "193" and "024" in "193024"
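A minimal sketch comparing greedy and lazy quantifiers, assuming a C# 7 or later compiler; QuantifierDemo, First and All are hypothetical names. The HTML-like input is only a convenient illustration, not a recommendation to parse HTML with regular expressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QuantifierDemo
{
    public static string First(string input, string pattern) => Regex.Match(input, pattern).Value;

    // Joins every match with commas so greedy and lazy results are easy to compare.
    public static string All(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return string.Join(",", values);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<b>bold</b>";
        Console.WriteLine(QuantifierDemo.First(html, "<.+>"));  // <b>bold</b> (greedy: as many characters as possible)
        Console.WriteLine(QuantifierDemo.First(html, "<.+?>")); // <b> (lazy: as few characters as possible)

        Console.WriteLine(QuantifierDemo.All("1930", @"\d{2,}"));     // 1930
        Console.WriteLine(QuantifierDemo.All("1930", @"\d{2,}?"));    // 19,30
        Console.WriteLine(QuantifierDemo.All("193024", @"\d{3,5}"));  // 19302
        Console.WriteLine(QuantifierDemo.All("193024", @"\d{3,5}?")); // 193,024
    }
}
```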

Backreference constructs

A backreference allows a previously matched subexpression to be identified again later in the same regular expression.

The following table lists the backreference constructs:

\number: Backreference. Matches the value of a numbered subexpression. Pattern (\w)\1 matches "ee" in "seek"

\k<name>: Named backreference. Matches the value of a named subexpression. Pattern (?<char>\w)\k<char> matches "ee" in "seek"
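A minimal sketch that uses backreferences to find accidentally doubled words, assuming a C# 7 or later compiler; BackrefDemo, DoubledWords, DoubledWordsNamed and Collect are hypothetical names. The numbered form \1 and the named form \k<word> behave the same way here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class BackrefDemo
{
    // \1 must repeat exactly the text captured by group 1, so this finds doubled words.
    public static string[] DoubledWords(string input) => Collect(input, @"\b(\w+)\s+\1\b");

    // The same idea with a named group and \k<word>.
    public static string[] DoubledWordsNamed(string input) => Collect(input, @"\b(?<word>\w+)\s+\k<word>\b");

    private static string[] Collect(string input, string pattern)
    {
        MatchCollection mc = Regex.Matches(input, pattern);
        var result = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Value;
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "This is is a test test.";
        Console.WriteLine(string.Join(" | ", BackrefDemo.DoubledWords(text)));      // is is | test test
        Console.WriteLine(string.Join(" | ", BackrefDemo.DoubledWordsNamed(text))); // is is | test test
    }
}
```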

Alternation constructs

Alternation constructs modify a regular expression to enable either/or matching.

The following table lists the alternation constructs:

|: Matches any one of the elements separated by the vertical bar (|) character. Pattern th(e|is|at) matches "the" and "this" in "this is the day."

(?(expression)yes|no): Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion. Pattern (?(A)A\d{2}\b|\b\d{3}\b) matches "A10" and "910" in "A10 C103 910"

(?(name)yes|no): Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no part. Pattern (?<quoted>")?(?(quoted).+?"|\S+\s) matches "Dogs.jpg " and "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\""
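A minimal sketch that runs the three alternation examples from the table, assuming a C# 7 or later compiler; AlternationDemo and All are hypothetical names. Note that in a C# verbatim string, "" stands for a single quote character inside the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AlternationDemo
{
    public static List<string> All(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }
}

public static class Program
{
    public static void Main()
    {
        // Alternation inside a group: "th" followed by "e", "is" or "at".
        foreach (string s in AlternationDemo.All("this is the day.", "th(e|is|at)"))
            Console.WriteLine(s); // this, then the

        // (?(A)yes|no): if a lookahead for "A" succeeds, match A\d{2}\b, otherwise \b\d{3}\b.
        foreach (string s in AlternationDemo.All("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"))
            Console.WriteLine(s); // A10, then 910

        // (?(quoted)yes|no): branch on whether the optional "quoted" group captured a quote.
        string pattern = @"(?<quoted>"")?(?(quoted).+?""|\S+\s)";
        foreach (string s in AlternationDemo.All("Dogs.jpg \"Yiska playing.jpg\"", pattern))
            Console.WriteLine("[" + s + "]"); // [Dogs.jpg ], then ["Yiska playing.jpg"]
    }
}
```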

Replacements

Replacements are the regular expression language elements supported in replacement patterns.

The following table lists the replacement elements:

$number: Substitutes the substring matched by group number. Pattern \b(\w+)(\s)(\w+)\b with replacement pattern "$3$2$1" turns input "one two" into "two one"

${name}: Substitutes the substring matched by the named group name. Pattern \b(?<word1>\w+)(\s)(?<word2>\w+)\b with replacement pattern "${word2} ${word1}" turns input "one two" into "two one"

$$: Substitutes a literal "$". Pattern \b(\d+)\s?USD with replacement pattern "$$$1" turns input "103 USD" into "$103"

$&: Substitutes a copy of the entire match. Pattern (\$*(\d*(\.+\d+)?){1}) with replacement pattern "**$&" turns input "$1.30" into "**$1.30**"

$`: Substitutes all the text of the input string before the match. Pattern B+ with replacement pattern "$`" turns input "AABBCC" into "AAAACC"

$': Substitutes all the text of the input string after the match. Pattern B+ with replacement pattern "$'" turns input "AABBCC" into "AACCCC"

$+: Substitutes the last group that was captured. Pattern B+(C+) with replacement pattern "$+" turns input "AABBCCDD" into "AACCDD"

$_: Substitutes the entire input string. Pattern B+ with replacement pattern "$_" turns input "AABBCC" into "AAAABBCCCC"
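A minimal sketch of several replacement elements with Regex.Replace, assuming a C# 7 or later compiler; ReplaceDemo and its method names are hypothetical. The last method shows the MatchEvaluator overload of Regex.Replace, which computes each replacement in code instead of from a replacement pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ReplaceDemo
{
    // $3$2$1 rebuilds the match from numbered groups in reverse order.
    public static string SwapNumbered(string input) =>
        Regex.Replace(input, @"\b(\w+)(\s)(\w+)\b", "$3$2$1");

    // ${word2} ${word1} does the same with named groups.
    public static string SwapNamed(string input) =>
        Regex.Replace(input, @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}");

    // $$ emits a literal dollar sign; $1 then inserts the captured amount.
    public static string UsdToDollar(string input) =>
        Regex.Replace(input, @"\b(\d+)\s?USD", "$$$1");

    // $' inserts all the text that follows the match.
    public static string AfterMatch(string input) =>
        Regex.Replace(input, "B+", "$'");

    // A lambda converted to a MatchEvaluator delegate computes each replacement.
    public static string Shout(string input) =>
        Regex.Replace(input, @"\w+", m => m.Value.ToUpperInvariant());
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ReplaceDemo.SwapNumbered("one two")); // two one
        Console.WriteLine(ReplaceDemo.SwapNamed("one two"));    // two one
        Console.WriteLine(ReplaceDemo.UsdToDollar("103 USD"));  // $103
        Console.WriteLine(ReplaceDemo.AfterMatch("AABBCC"));    // AACCCC
        Console.WriteLine(ReplaceDemo.Shout("one two"));        // ONE TWO
    }
}
```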

Miscellaneous constructs

The following table lists various miscellaneous constructs:

(?imnsx-imnsx): Sets or disables options such as case insensitivity in the middle of a pattern. Pattern \bA(?i)b\w+\b matches "ABA" and "Able" in "ABA Able Act"

(?#comment): Inline comment. The comment ends at the first closing parenthesis. Example: \bA(?#Matches words starting with A)\w+\b

# [to end of line]: X-mode comment. The comment starts at an unescaped # and continues to the end of the line. Example: (?x)\bA\w+\b#Matches words starting with A
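A minimal sketch of inline options and comments, assuming a C# 7 or later compiler; MiscDemo and its members are hypothetical names. It uses RegexOptions.IgnorePatternWhitespace, the option equivalent to (?x), so that a pattern can be laid out over several lines with # comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class MiscDemo
{
    // (?i) turns on case-insensitive matching from that point in the pattern onward.
    public static int InlineOptionCount(string input) =>
        Regex.Matches(input, @"\bA(?i)b\w+\b").Count;

    // With RegexOptions.IgnorePatternWhitespace, unescaped whitespace in the pattern is
    // ignored and an unescaped # starts a comment that runs to the end of the line.
    private const string VerbosePattern = @"
        \b A   # word boundary, then a capital A
        \w+    # the rest of the word
        \b     # closing word boundary";

    public static int VerboseCount(string input) =>
        Regex.Matches(input, VerbosePattern, RegexOptions.IgnorePatternWhitespace).Count;

    // (?#...) is an inline comment; the engine skips it.
    public static bool InlineCommentMatches(string input) =>
        Regex.IsMatch(input, @"\bA(?#words starting with A)\w+\b");
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(MiscDemo.InlineOptionCount("ABA Able Act")); // 2 ("ABA" and "Able")
        Console.WriteLine(MiscDemo.VerboseCount("ABA Able Act"));      // 3
        Console.WriteLine(MiscDemo.InlineCommentMatches("Apple"));     // True
    }
}
```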


Regex class

The Regex class represents a regular expression.

The following table lists some commonly used methods of the Regex class:

No.

Method & Description

1 public bool IsMatch(string input)
Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string.

2 public bool IsMatch(string input, int startat)
Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string.

3 public static bool IsMatch(string input, string pattern)
Indicates whether the specified regular expression finds a match in the specified input string.

4 public MatchCollection Matches(string input)
Searches the specified input string for all occurrences of the regular expression.

5 public string Replace(string input, string replacement)
In the specified input string, replaces all strings that match the regular expression pattern with the specified replacement string.

6 public string[] Split(string input)
Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor.

For the complete list of Regex members, including its properties, see Microsoft's .NET documentation for the Regex class.
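A minimal sketch exercising the instance methods listed above, assuming a C# 7 or later compiler; RegexClassDemo and its method names are hypothetical. Constructing a Regex parses its pattern, so this sketch creates each instance once and reuses it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexClassDemo
{
    private static readonly Regex Digits = new Regex(@"\d+");
    private static readonly Regex CommaSeparator = new Regex(@"\s*,\s*");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // IsMatch(input, startat): the search begins at index startat.
    public static bool HasDigitsFrom(string input, int startat) => Digits.IsMatch(input, startat);

    // Matches(input): every non-overlapping match.
    public static int CountNumbers(string input) => Digits.Matches(input).Count;

    // Split(input): the matches act as separators.
    public static string[] SplitList(string input) => CommaSeparator.Split(input);

    // Replace(input, replacement): every match is replaced.
    public static string Collapse(string input) => Whitespace.Replace(input, " ");
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(RegexClassDemo.HasDigitsFrom("abc123", 3));              // True
        Console.WriteLine(RegexClassDemo.HasDigitsFrom("123abc", 3));              // False
        Console.WriteLine(RegexClassDemo.CountNumbers("7 apples, 12 pears"));      // 2
        Console.WriteLine(string.Join("|", RegexClassDemo.SplitList("a , b,c")));  // a|b|c
        Console.WriteLine("[" + RegexClassDemo.Collapse("Hello   World   ") + "]"); // [Hello World ]
    }
}
```

The last line also shows that collapsing whitespace leaves a single trailing space when the input ends in whitespace.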

Example 1

The following example matches words starting with 'S':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "A Thousand Splendid Suns";

         Console.WriteLine("Matching words that start with 'S': ");
         // \b = word boundary, S = a literal 'S', \S* = any run of non-whitespace characters
         showMatch(str, @"\bS\S*");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following results:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example matches words that start with 'm' and end with 'e':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "make maze and manage to measure it";

         Console.WriteLine("Matching words start with 'm' and ends with 'e':");
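         // \bm = 'm' at the start of a word, \S* = non-whitespace characters, e\b = 'e' at the end of a word
         showMatch(str, @"\bm\S*e\b");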
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following result:

Matching words start with 'm' and ends with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3

The following example replaces runs of extra whitespace with a single space:

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      static void Main(string[] args)
      {
         string input = "Hello   World   ";
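         // "\\s+" in an ordinary C# string is the regex \s+ (one or more whitespace characters);
         // the verbatim string @"\s+" is equivalent
         string pattern = "\\s+";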
         string replacement = " ";
         Regex rgx = new Regex(pattern);
         string result = rgx.Replace(input, replacement);

         Console.WriteLine("Original String: {0}", input);
         Console.WriteLine("Replacement String: {0}", result);    
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following result:

Original String: Hello   World   
Replacement String: Hello World
