
Why is `\d` Less Efficient Than `[0-9]` for Digit Matching in C# Regex?

Barbara Streisand
2025-01-31 18:26:09


C# regular expressions: why `\d` is less efficient than `[0-9]`

A recent performance test showed that, in the C# regular expression engine, `\d` is unexpectedly less efficient at matching digits than either the range `[0-9]` or the character set `[0123456789]`. This finding raises two questions:

1. Is the range `[0-9]` more efficient than the set `[0123456789]`?

One might expect the range to be more efficient, since it describes a single narrow span of characters, whereas the set lists all ten digits explicitly. However, the test results show that the performance difference between the two is negligible.
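Because `[0-9]` and `[0123456789]` describe exactly the same ten characters, a similar cost for both is not surprising. The sketch below checks the equivalence directly by running both patterns against every 16-bit `char` value. It is only an empirical illustration under that assumption; it does not show how the .NET engine represents either class internally.

```csharp
using System;
using System.Text.RegularExpressions;

// Compare the range [0-9] with the explicit set [0123456789] on every 16-bit char value.
var range = new Regex("[0-9]");
var explicitSet = new Regex("[0123456789]");

int rangeMatches = 0;   // chars matched by [0-9]
int disagreements = 0;  // chars matched by one pattern but not the other
for (int i = 0; i <= char.MaxValue; i++)
{
    string s = ((char)i).ToString();
    bool inRange = range.IsMatch(s);
    bool inSet = explicitSet.IsMatch(s);
    if (inRange) rangeMatches++;
    if (inRange != inSet) disagreements++;
}

// prints: [0-9] matches 10 chars; disagreements: 0
Console.WriteLine($"[0-9] matches {rangeMatches} chars; disagreements: {disagreements}");
```

Since the two classes accept the same characters, any remaining timing difference between them in a benchmark is more likely noise than a real cost of one notation over the other.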

2. Why is `\d` less efficient than `[0-9]`?

The test found that `\d` is significantly slower than `[0-9]`. This is particularly puzzling, because `\d` is usually thought of as shorthand for `[0-9]`. Further investigation reveals a fundamental difference:

`\d` matches Unicode digits, `[0-9]` matches ASCII digits

`[0-9]` matches only the ten ASCII digits (0-9), while `\d` matches every Unicode decimal digit. This includes digits from other scripts, such as the Persian digits (۱۲۳۴۵۶۷۸۹) and the Devanagari digits (०१२३४५६७८९). To demonstrate this, the following code builds a string containing every character that .NET classifies as a digit.

The resulting string contains digits from Arabic, Thai, and many other scripts, not just the ten ASCII digits:

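<code class="language-csharp">using System.Text;

// Collect every UTF-16 char for which char.IsDigit is true (Unicode category Nd).
var sb = new StringBuilder();
// int counter covering every 16-bit char value, 0x0000..0xFFFF;
// a char cannot represent code points above 0xFFFF.
for (int i = 0; i <= char.MaxValue; i++)
{
    if (char.IsDigit((char)i))
    {
        sb.Append((char)i);
    }
}
string unicodeDigits = sb.ToString();</code>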
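To tie the generated string back to the regex itself, the sketch below checks that `\d` (with the default `RegexOptions.None`, not `RegexOptions.ECMAScript`) matches exactly the characters for which `char.IsDigit` returns true, and that the Persian and Devanagari digits quoted above match `\d` but not `[0-9]`. The exact number of digit characters depends on the Unicode version of the runtime, so it is printed rather than hard-coded.

```csharp
using System;
using System.Text.RegularExpressions;

// Default options: \d matches Unicode category Nd (decimal digit number).
var unicodeDigit = new Regex(@"\d");

int digitCount = 0;   // chars matched by \d
int mismatches = 0;   // chars where \d and char.IsDigit disagree
for (int i = 0; i <= char.MaxValue; i++)
{
    char c = (char)i;
    bool matched = unicodeDigit.IsMatch(c.ToString());
    if (matched) digitCount++;
    if (matched != char.IsDigit(c)) mismatches++;
}
Console.WriteLine($"\\d matches {digitCount} chars; disagreements with char.IsDigit: {mismatches}");

// Persian and Devanagari digits quoted in the text
string persian = "۱۲۳۴۵۶۷۸۹";
string devanagari = "०१२३४५६७८९";
bool persianWithD = Regex.IsMatch(persian, @"^\d+$");
bool persianWithRange = Regex.IsMatch(persian, "^[0-9]+$");
bool devanagariWithD = Regex.IsMatch(devanagari, @"^\d+$");
bool devanagariWithRange = Regex.IsMatch(devanagari, "^[0-9]+$");

// prints: Persian:    \d True, [0-9] False
Console.WriteLine($"Persian:    \\d {persianWithD}, [0-9] {persianWithRange}");
// prints: Devanagari: \d True, [0-9] False
Console.WriteLine($"Devanagari: \\d {devanagariWithD}, [0-9] {devanagariWithRange}");
```

The first count is the number of characters `\d` has to recognize; `[0-9]` only ever accepts ten of them.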

Performance impact

The much wider set of characters that `\d` must recognize (all Unicode decimal digits rather than just the ten ASCII digits) explains the performance difference between `\d` and `[0-9]`. When only ASCII digits need to be matched, `[0-9]` will perform better than `\d`.
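The timing difference can be checked on your own machine with a rough Stopwatch micro-benchmark like the sketch below. The input (100,000 random ASCII digits and spaces from a fixed seed), the `+` quantifier, and the 50 iterations are arbitrary assumptions, not the setup of the original test, and a dedicated harness such as BenchmarkDotNet would give more reliable numbers. Absolute timings, and the size of the gap, will vary with hardware and .NET version.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Assumed test input: 100,000 chars of ASCII digits and spaces, built from a fixed seed.
var rng = new Random(42);
var sb = new StringBuilder();
for (int i = 0; i < 100_000; i++)
{
    // roughly one char in three is a space, the rest are ASCII digits '0'..'9'
    sb.Append(rng.Next(3) == 0 ? ' ' : (char)('0' + rng.Next(10)));
}
string input = sb.ToString();

// Runs the pattern over the text several times; returns the match count and elapsed time.
static (int Count, long Ms) Measure(string pattern, string text, int iterations)
{
    var regex = new Regex(pattern);
    _ = regex.Matches(text).Count;          // warm-up run, not timed
    int count = 0;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        count = regex.Matches(text).Count;  // Count forces every match to be found
    }
    sw.Stop();
    return (count, sw.ElapsedMilliseconds);
}

var rangeResult = Measure("[0-9]+", input, 50);
var setResult = Measure("[0123456789]+", input, 50);
var digitResult = Measure(@"\d+", input, 50);

Console.WriteLine($"[0-9]+        {rangeResult.Ms,6} ms  ({rangeResult.Count} matches)");
Console.WriteLine($"[0123456789]+ {setResult.Ms,6} ms  ({setResult.Count} matches)");
Console.WriteLine($@"\d+           {digitResult.Ms,6} ms  ({digitResult.Count} matches)");
```

On ASCII-only input all three patterns find the same matches, so the comparison isolates the cost of testing each character class, plus measurement noise.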
