Introduction to regular expressions in C#-C#.Net Tutorial-php.cn

Home

Backend Development

C#.Net Tutorial

Introduction to regular expressions in C#

青灯夜游

Nov 23, 2019 pm 05:15 PM

c#regular expression

This article organizes the metacharacters of C# regular expressions. Regular expressions are expressions composed of characters. Each character represents a rule. The characters in the expression are divided into two types: ordinary characters and metacharacters. Ordinary characters refer to characters whose literal meaning remains unchanged and match text in an exact match manner, while metacharacters have special meanings and represent a type of character.

Introduction to regular expressions in C#

Treat text as a stream of characters, with each character placed in a position. For example, the regular expression "Room\d\d\d", the first four The character Room is an ordinary character, and the following character \ is an escape character. It forms a metacharacter \d with the following character d, which means there is any number at that position.

Described in the language of regular expressions: the regular expression "Room\d\d\d" captures a total of 7 characters, which means "starting with Room and ending with A type of string ending with three digits. We call this type of string a pattern, also called a regular pattern.

1. Escape characters

The escape character is \, which escapes ordinary characters into metacharacters with special meanings. Commonly used The escape characters are:

\t: horizontal tab character
\v: vertical tab character
\r: Carriage return
\n: Line feed
\\: Represents the character \, that is Say, escape the escape character \ into the ordinary character \
\": represents the character". In C#, double quotes are used to define strings. The double quotes contained in the string Quotes use \" to represent

. Second, the character class

treats the input text as having In a sequential character stream, character class metacharacters match characters and capture characters. The so-called capture characters mean that characters captured by one metacharacter will not be matched by other metacharacters, and subsequent metacharacters can only be matched from the remaining metacharacters. Match again in the text.

Commonly used character class metacharacters:

[char_group]: Match any character in the character group
[^char_group]: Match any character except character group
[first-last]: Match the character range from first to last Any character, the character range includes first and last.
.: Wildcard, matches any character except \n
\w: Match any word character, word characters usually refer to A-Z, a-z and 0-9
\W: Match any non-word character, except A-Z Characters other than , a-z and 0-9
\s: Matches any whitespace character
\S: Matches any non-whitespace character Characters
\d: Matches any numeric character
\D: Matches any non-numeric character

Note , escape characters also belong to character class metacharacters, and characters will also be captured when performing regular matching.

Three, locator

The object of locator matching (or capture) is position. It determines whether the pattern matching is successful based on the position of the character. The locator does not capture characters and is zero-width (width is 0). Commonly used locators are:

^: By default, matches the beginning of the string; in multi-line mode, matches the beginning of each line;
$: By default, match the end position of the string, or the position before the \n at the end of the string; in multi-line mode, match the position before the end of each line, or the position before the \n at the end of each line.
\A: matches the beginning position of the string;
\Z: matches the end position of the string, or \n at the end of the string Previous position;
\z: Match the end position of the string;
\G: Match the end position of the previous match;
\b: Matches the beginning or end of a word;
\B: Matches the middle position of a word;

4. Quantifiers, greed and laziness

Quantifiers refer to limiting the number of occurrences of a previous regular pattern. Quantifiers are divided into There are two modes: greedy mode and lazy mode. Greedy mode means matching as many characters as possible, while lazy mode means matching as few characters as possible. By default, quantifiers are in greedy mode. Add ? after the quantifier to enable lazy mode.

*: Occurs 0 or more times
: Occurs 1 or more times
?: Appears 0 or 1 times
{n}: Appears n times
{n,}: Appears at least n times
{n,m}: appears n to m times

Note that multiple occurrences means that the preceding metacharacter appears multiple times, for example, \d {2} is equivalent to \d\d, except that two numbers appear, and the two numbers are not required to be the same. To represent the same two numbers, grouping must be used.

5. Grouping and capturing characters

() Parentheses not only determine the scope of the expression, but also To create a group, the expression within () is a group, and the reference group means that the text matched by the two groups is exactly the same. The basic syntax for defining a grouping:

(pattern)

This type of grouping will capture characters. The so-called capturing characters refer to: a element Characters captured by characters will not be matched by other metacharacters, and subsequent metacharacters can only be rematched from the remaining text.

1, group number and naming

By default, each group is automatically assigned a group number. The rules are: from left to right, according to the left bracket of the group They are numbered in order of appearance. The first group has a group number of 1, the second group has a group number of 2, and so on. You can also specify a name for the group. This group is called a named group. The named group will also be automatically numbered. The number starts from 1 and increases by 1 one by one. The syntax for specifying a name for the group is:

(? <em>name</em> <code>> pattern)

## Generally speaking, Groups are divided into named groups and numbered groups. The ways to reference a group are:

Reference a group by group number:\ number

Note that groups can only be referenced backward, that is, starting from the left side of the regular expression text, the group must be defined before it can be referenced later.

The syntax for referencing groups in regular expressions is "\number". For example, "\1" represents the substring matching group 1, and "\2" represents the string matching group 2. In this way analogy.

For example, ".*?\1>" can match

valid

, when referencing the group, the text corresponding to the group are exactly the same.

2, grouping constructor

The grouping construction method is as follows:

(pattern): Capture matching subexpression, and assign a group number to the group
(? pattern): Capture the matching subexpression into a named group
(?:pattern): Non-capture grouping, a group number is not assigned to the group
(?> pattern): Greedy grouping

3, greedy grouping

Greedy grouping is also called non-backtracking grouping. This grouping disables backtracking. The regular expression engine will match as many characters in the input text as possible. character. If no further matches can be made, there is no backtracking to try additional pattern matches.

(?> pattern )

4, choose one of the two

| means Is or, matches either one of the two. Note that | divides the expressions on the left and right into two parts.

pattern1 ｜ pattern2

Six, zero-width assertion

Zero-width means that the width is 0, and the matching is position, so the matched substring will not Appears in the matching result, and assertion refers to the result of judgment. Only when the assertion is true can the match be considered successful.

For locators, you can match the beginning and end of a sentence (^ $) or the beginning and end of a word (\b). These metacharacters only match one position, specifying that this position satisfies Certain conditions, rather than matching certain characters, are therefore called
Zero-width assertions. The so-called zero-width means that they do not match any characters, but match a position; the so-called assertion refers to a judgment, and the regular expression will continue to match only when the assertion is true. Zero-width assertions can match an exact position, rather than simply specifying a sentence or word.

Regular expressions treat text as a flow of characters from left to right. To the right is called backward (Look behind), and to the left is called forward (Look ahead). For regular expressions, only when the specified pattern (Pattern) is matched, the assertion is True, which is called a positive expression, and the unmatched pattern is True, which is called a negative expression.

According to the direction of matching and the qualitative nature of matching, zero-width assertions are divided into four types:

(?= pattern)：前向、肯定断言
(?! pattern)：前向、否定断言
(? pattern<code>)：后向、肯定断言
(? pattern<code>)：后向、否定断言

1，前向肯定断言

前向肯定断言定义一个模式必须存在于文本的末尾（或右侧），但是该模式匹配的子串不会出现在匹配的结果中，前向断言通常出现在正则表达式的右侧，表示文本的右侧必须满足特定的模式：

(?= subexpression )

使用前向肯定断言可以定一个模糊匹配，后缀必须包含特定的字符：

\b\w+(?=\sis\b)

对正则表达式进行分析：

\b：表示单词的边界
\w+：表示单词至少出现一次
(?=\sis\b)：前向肯定断言，\s 表示一个空白字符， is 是普通字符，完全匹配，\b 是单词的边界。

从分析中，可以得出，匹配该正则表达式的文本中必须包含 is 单词，is是一个单独的单词，不是某一个单词的一个部分。举个例子

Sunday is a weekend day 匹配该正则，匹配的值是Sunday，而The island has beautiful birds 不匹配该正则。

2，后向肯定断言

后向肯定断言定义一个模式必须存在于文本的开始（或左侧），但是该模式匹配的子串不会出现在匹配的结果中，后向断言通常出现在正则表达式的左侧，表示文本的左侧必须满足特定的模式：

(?<p>使用后向肯定断言可以定一个模糊匹配，前缀必须包含特定的字符：</p><pre class="brush:php;toolbar:false">(?<p>对正则表达式进行分析：</p>

(?：后向断言，\b表示单词的开始，20是普通字符
\d{2}：表示两个数字，数字不要求相同
\b：单词的边界

该正则表达式匹配的文本具备的模式是：文本以20开头、以两个数字结尾。

推荐学习：C#.Net教程

The above is the detailed content of Introduction to regular expressions in C#. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:cnblogs. If there is any infringement, please contact admin@php.cn delete

Mastering C# .NET Design Patterns: From Singleton to Dependency InjectionMay 09, 2025 am 12:15 AM

Design patterns in C#.NET include Singleton patterns and dependency injection. 1.Singleton mode ensures that there is only one instance of the class, which is suitable for scenarios where global access points are required, but attention should be paid to thread safety and abuse issues. 2. Dependency injection improves code flexibility and testability by injecting dependencies. It is often used for constructor injection, but it is necessary to avoid excessive use to increase complexity.

C# .NET in the Modern World: Applications and IndustriesMay 08, 2025 am 12:08 AM

C#.NET is widely used in the modern world in the fields of game development, financial services, the Internet of Things and cloud computing. 1) In game development, use C# to program through the Unity engine. 2) In the field of financial services, C#.NET is used to develop high-performance trading systems and data analysis tools. 3) In terms of IoT and cloud computing, C#.NET provides support through Azure services to develop device control logic and data processing.

C# .NET Framework vs. .NET Core/5/6: What's the Difference?May 07, 2025 am 12:06 AM

.NETFrameworkisWindows-centric,while.NETCore/5/6supportscross-platformdevelopment.1).NETFramework,since2002,isidealforWindowsapplicationsbutlimitedincross-platformcapabilities.2).NETCore,from2016,anditsevolutions(.NET5/6)offerbetterperformance,cross-

The Community of C# .NET Developers: Resources and SupportMay 06, 2025 am 12:11 AM

The C#.NET developer community provides rich resources and support, including: 1. Microsoft's official documents, 2. Community forums such as StackOverflow and Reddit, and 3. Open source projects on GitHub. These resources help developers improve their programming skills from basic learning to advanced applications.

The C# .NET Advantage: Features, Benefits, and Use CasesMay 05, 2025 am 12:01 AM

The advantages of C#.NET include: 1) Language features, such as asynchronous programming simplifies development; 2) Performance and reliability, improving efficiency through JIT compilation and garbage collection mechanisms; 3) Cross-platform support, .NETCore expands application scenarios; 4) A wide range of practical applications, with outstanding performance from the Web to desktop and game development.

Is C# Always Associated with .NET? Exploring AlternativesMay 04, 2025 am 12:06 AM

C# is not always tied to .NET. 1) C# can run in the Mono runtime environment and is suitable for Linux and macOS. 2) In the Unity game engine, C# is used for scripting and does not rely on the .NET framework. 3) C# can also be used for embedded system development, such as .NETMicroFramework.

The .NET Ecosystem: C#'s Role and BeyondMay 03, 2025 am 12:04 AM

C# plays a core role in the .NET ecosystem and is the preferred language for developers. 1) C# provides efficient and easy-to-use programming methods, combining the advantages of C, C and Java. 2) Execute through .NET runtime (CLR) to ensure efficient cross-platform operation. 3) C# supports basic to advanced usage, such as LINQ and asynchronous programming. 4) Optimization and best practices include using StringBuilder and asynchronous programming to improve performance and maintainability.

C# as a .NET Language: The Foundation of the EcosystemMay 02, 2025 am 12:01 AM

C# is a programming language released by Microsoft in 2000, aiming to combine the power of C and the simplicity of Java. 1.C# is a type-safe, object-oriented programming language that supports encapsulation, inheritance and polymorphism. 2. The compilation process of C# converts the code into an intermediate language (IL), and then compiles it into machine code execution in the .NET runtime environment (CLR). 3. The basic usage of C# includes variable declarations, control flows and function definitions, while advanced usages cover asynchronous programming, LINQ and delegates, etc. 4. Common errors include type mismatch and null reference exceptions, which can be debugged through debugger, exception handling and logging. 5. Performance optimization suggestions include the use of LINQ, asynchronous programming, and improving code readability.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055523 fails to install in Windows 11?

4 weeks agoByDDD

How to fix KB5055518 fails to install in Windows 10?

4 weeks agoByDDD

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

How to fix KB5055612 fails to install in Windows 10?

3 weeks agoByDDD

Hot Tools

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Dreamweaver Mac version

Visual web development tools

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.