Why is `\d` Less Efficient Than `[0-9]` for Digit Matching in C# Regex?
In C# regular expressions, the `\d` character class performs unexpectedly worse than the `[0-9]` and `[0123456789]` character sets. This finding raises two questions.
Is the range `[0-9]` more efficient than the set `[0123456789]`?
One might expect the range to be more efficient, because it describes a single narrow span of characters, whereas the set explicitly lists all ten digits. However, test results show that the performance difference between the two forms is minimal.
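A comparison of this kind can be sketched with a simple `Stopwatch` timing loop. This is a rough illustration, not the benchmark behind the results described here: the `DigitBenchmark` class, the `Measure` helper, the random seed, and the input size are all assumptions made for the example, and timings will vary by machine and .NET version.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class DigitBenchmark
{
    // Counts the matches of a pattern in the input and reports the elapsed time.
    // (Hypothetical helper, introduced only for this sketch.)
    public static (int Matches, TimeSpan Elapsed) Measure(string pattern, string input)
    {
        var regex = new Regex(pattern);
        var stopwatch = Stopwatch.StartNew();
        // Reading Count forces the whole input to be scanned.
        int matches = regex.Matches(input).Count;
        stopwatch.Stop();
        return (matches, stopwatch.Elapsed);
    }

    public static void Main()
    {
        // Assumed test data: 1,000,000 random printable ASCII characters.
        var random = new Random(42);
        var chars = new char[1_000_000];
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = (char)random.Next(32, 127);
        }
        string input = new string(chars);

        foreach (string pattern in new[] { @"\d", "[0-9]", "[0123456789]" })
        {
            var (matches, elapsed) = Measure(pattern, input);
            Console.WriteLine($"{pattern,-14} {matches} matches in {elapsed.TotalMilliseconds:F1} ms");
        }
    }
}
```

Because the input is pure ASCII, all three patterns should report the same match count, so only the timings differ. For more reliable numbers, each pattern would need warm-up runs and repeated measurements.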
Why is `\d` less efficient than `[0-9]`?
Tests show that `\d` is significantly less efficient than `[0-9]`. This is particularly puzzling, because `\d` is usually considered shorthand for `[0-9]`.
`\d` matches Unicode digits, `[0-9]` matches ASCII digits
Whereas `[0-9]` matches only the ASCII digits (0-9), `\d` matches all Unicode digits. This includes digits from other languages and scripts, such as Persian numerals (۱۲۳۴۵۶۷۸۹) and Devanagari numerals (०१२३४५६७८९). To demonstrate this, the following code generates a string containing all Unicode digits:
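<code class="language-csharp">
using System.Text;

// Collect every UTF-16 code unit that .NET classifies as a decimal digit.
// A char is 16 bits, so this loop covers U+0000 to U+FFFF only.
var sb = new StringBuilder();
for (int i = 0; i <= char.MaxValue; i++)
{
    if (char.IsDigit((char)i))
    {
        sb.Append((char)i);
    }
}
string unicodeDigits = sb.ToString();
</code>
The generated string shows the variety of characters that `\d` identifies as digits, including characters from the Arabic, Thai, Khmer, and other scripts.
Performance impact
Checking against this much wider set of characters (all Unicode digits rather than only the ten ASCII digits) explains the performance difference between `\d` and `[0-9]`. When only ASCII digits need to be matched, `[0-9]` will produce better performance than `\d`.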
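The difference in what the two patterns accept can be checked directly. The sketch below is an illustration rather than part of the original tests: the `DigitSemantics` class and `IsDigitMatch` helper are hypothetical names, and the sample character is the Persian digit seven (U+06F7). It also shows `RegexOptions.ECMAScript`, a .NET option not discussed above, under which `\d` is restricted to `[0-9]`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitSemantics
{
    // Thin wrapper around Regex.IsMatch (hypothetical helper for this sketch).
    public static bool IsDigitMatch(string pattern, string input,
                                    RegexOptions options = RegexOptions.None)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    public static void Main()
    {
        string persianSeven = "\u06F7"; // EXTENDED ARABIC-INDIC DIGIT SEVEN

        // By default, \d matches any Unicode decimal digit.
        Console.WriteLine(IsDigitMatch(@"\d", persianSeven));      // True

        // The explicit range only accepts ASCII 0-9.
        Console.WriteLine(IsDigitMatch("[0-9]", persianSeven));    // False

        // With ECMAScript-compliant behavior, \d is equivalent to [0-9].
        Console.WriteLine(IsDigitMatch(@"\d", persianSeven,
                                       RegexOptions.ECMAScript));  // False
    }
}
```

Choosing `[0-9]` therefore changes what is accepted, not just how fast the match runs, which matters when validating input that must be ASCII-only.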