Why is `\d` Less Efficient Than `[0-9]` in C# Regex?

Barbara Streisand
2025-01-31 18:41:08

`\d` is less efficient than `[0-9]`: exploring the performance difference in C# regular expressions

It was recently discovered, somewhat unexpectedly, that in the C# regular expression engine the `\d` character class is less efficient than the range `[0-9]`. One would normally expect a range or a shorthand digit class to be at least as efficient as an explicit character set, yet `\d` is even slower than the longer set `[0123456789]`. To explain this surprising result, let us explore some possible reasons:
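
The benchmark behind this observation is not shown, so the following is only a rough sketch of how such a comparison could be set up. It assumes a randomly generated input of ASCII digits and letters, and the class name `DigitPatternBenchmark` is hypothetical. It times `Regex.Matches` over the same input for each pattern; absolute numbers will vary with the .NET version, regex options and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical micro-benchmark helper: times each digit pattern on the same input.
// Results are indicative only; they depend on runtime version, options and hardware.
public static class DigitPatternBenchmark
{
    // Returns the match count of the last pass and the total time for all passes.
    public static (int Matches, TimeSpan Elapsed) Measure(string pattern, string input, int iterations)
    {
        var regex = new Regex(pattern);

        // Warm-up pass, so one-time setup cost is excluded from the timing.
        int matches = regex.Matches(input).Count;

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Count enumerates the whole MatchCollection, forcing a full scan.
            matches = regex.Matches(input).Count;
        }
        stopwatch.Stop();

        return (matches, stopwatch.Elapsed);
    }

    // Builds a deterministic test string: roughly half ASCII digits, half lowercase letters.
    public static string BuildInput(int length)
    {
        var rng = new Random(42);
        var chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = rng.Next(2) == 0
                ? (char)('0' + rng.Next(10))
                : (char)('a' + rng.Next(26));
        }
        return new string(chars);
    }

    // Prints one line per pattern for a 100,000-character input.
    public static void Run()
    {
        string input = BuildInput(100_000);
        foreach (string pattern in new[] { @"\d", "[0-9]", "[0123456789]" })
        {
            var (matches, elapsed) = Measure(pattern, input, 100);
            Console.WriteLine($"{pattern,-14} {matches} matches, {elapsed.TotalMilliseconds:F1} ms");
        }
    }
}
```

Calling `DigitPatternBenchmark.Run()` from a console app prints one line per pattern. All three patterns report the same match count because the input contains only ASCII digits, so any timing gap reflects how each class is evaluated rather than what it matches.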

1. Unicode digit recognition:

`\d` matches every Unicode decimal digit, not just the 10 characters listed in `[0-9]`. Unicode contains digits from many other scripts, such as the Persian numerals (۱۲۳۴۵۶۷۸۹). Checking this wider set may introduce additional computational overhead, which reduces the efficiency of `\d`.

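To illustrate this, we can use the following code to list all Unicode decimal digits in the Basic Multilingual Plane (the characters for which `char.IsDigit` returns true, which is the same category `\d` matches in .NET):

```csharp
using System;
using System.Text;

// Collect every UTF-16 code unit that char.IsDigit treats as a decimal digit.
// A char is 16 bits, so the loop covers 0x0000-0xFFFF. An int counter is used
// because a UInt16 counter compared against 0x10FFFF would wrap and never stop.
StringBuilder sb = new StringBuilder();
for (int i = 0; i <= 0xFFFF; i++)
{
    if (char.IsDigit((char)i))
    {
        sb.Append((char)i);
    }
}
string allUnicodeDigits = sb.ToString();
Console.WriteLine(allUnicodeDigits);
```

This will generate a long string containing the following characters (among others):

٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯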
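
To make the behavioral difference concrete, here is a small sketch (the class name `UnicodeDigitDemo` is hypothetical) that checks a string of Persian digits against both patterns and counts how many BMP characters `\d` accepts. It relies only on `Regex.IsMatch`; the exact count depends on the Unicode version of the runtime, so it is deliberately not hard-coded.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: \d accepts any Unicode decimal digit (category Nd),
// while [0-9] accepts only the ten ASCII digits.
public static class UnicodeDigitDemo
{
    // True if the whole string consists of Unicode decimal digits.
    public static bool MatchesBackslashD(string s) => Regex.IsMatch(s, @"^\d+$");

    // True if the whole string consists of ASCII digits 0-9.
    public static bool MatchesAsciiRange(string s) => Regex.IsMatch(s, "^[0-9]+$");

    // Counts the UTF-16 code units in the Basic Multilingual Plane that \d accepts.
    public static int CountBmpDigits()
    {
        var digit = new Regex(@"\d");
        int count = 0;
        for (int i = 0; i <= 0xFFFF; i++)
        {
            if (digit.IsMatch(((char)i).ToString()))
            {
                count++;
            }
        }
        return count;
    }
}
```

For example, `UnicodeDigitDemo.MatchesBackslashD("۱۲۳")` returns `true`, whereas `UnicodeDigitDemo.MatchesAsciiRange("۱۲۳")` returns `false`.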
2. Implementation differences:

The regular expression engine may implement `\d` and `[0-9]` in different ways, which can lead to performance differences. The C# regular expression engine may apply specific optimizations to a small ASCII range such as `[0-9]`, and the wider range of `\d` may limit such optimizations.
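
As a conceptual model only (this is not the actual source of the .NET regex engine), the per-character test each pattern implies can be written out directly. The class name `DigitTests` is hypothetical. `[0-9]` reduces to a single range comparison, while `\d` needs a Unicode category lookup, which is one plausible reason it leaves less room for optimization.

```csharp
using System.Globalization;

// Conceptual model only, not the .NET regex engine's real implementation:
// the kind of per-character test each digit pattern implies.
public static class DigitTests
{
    // [0-9]: one range check. Subtracting '0' and comparing as unsigned
    // folds both bounds into a single comparison (chars below '0' wrap to large values).
    public static bool IsAsciiDigit(char c) => (uint)(c - '0') <= 9;

    // [0123456789]: membership in an explicit set of ten characters.
    public static bool IsInExplicitSet(char c) => "0123456789".IndexOf(c) >= 0;

    // \d: membership in the Unicode DecimalDigitNumber (Nd) category,
    // which requires a category lookup rather than a fixed numeric range.
    public static bool IsUnicodeDigit(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}
```

The first two functions accept exactly the same characters, which is consistent with `[0-9]` and `[0123456789]` performing similarly; only the third admits digits from other scripts.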

Conclusion:

Although it was surprising at first, the slower performance of `\d` can be attributed to its extended recognition of Unicode digits. When you only need a limited set of digits (such as 0-9), `[0-9]` or `[0123456789]` can provide better efficiency. However, if you need to match digits from different scripts, `\d` is still a powerful tool.
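
The trade-off in the conclusion can be sketched as a set of whole-string validators (the class `DigitValidation` is hypothetical). One documented alternative to rewriting patterns is `RegexOptions.ECMAScript`, under which .NET treats `\d` as `[0-9]`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validators illustrating when to pick each digit pattern.
public static class DigitValidation
{
    // ASCII-only: [0-9] accepts exactly the ten digits 0-9.
    private static readonly Regex AsciiDigits = new Regex("^[0-9]+$");

    // With RegexOptions.ECMAScript, .NET restricts \d to [0-9] as well.
    private static readonly Regex EcmaDigits = new Regex(@"^\d+$", RegexOptions.ECMAScript);

    // Any script: default \d accepts every Unicode decimal digit.
    private static readonly Regex AnyDigits = new Regex(@"^\d+$");

    public static bool IsAsciiNumber(string s) => AsciiDigits.IsMatch(s);

    public static bool IsAsciiNumberEcma(string s) => EcmaDigits.IsMatch(s);

    public static bool IsAnyScriptNumber(string s) => AnyDigits.IsMatch(s);
}
```

For instance, `IsAsciiNumber("2025")` is `true` while `IsAsciiNumber("۲۰۲۵")` is `false`; `IsAnyScriptNumber` accepts both.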
