Efficiently Removing Non-ASCII Characters from C# Strings
Data processing and validation frequently require removing non-ASCII characters from strings. This article demonstrates a concise C# method using regular expressions to accomplish this task.
Regular Expression Solution
The following code snippet uses a regular expression to remove all non-ASCII characters:
<code class="language-csharp">using System.Text.RegularExpressions;

string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty); // s is now "sme string"</code>
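If the same filter runs on many strings, one option is to build the Regex object once and reuse it instead of passing the pattern string on every call. The sketch below assumes a hypothetical AsciiFilter class with a StripNonAscii method (neither comes from the original article or from a library). RegexOptions.Compiled makes construction slower in exchange for faster repeated matching, so it mainly pays off for frequent use.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is illustrative, not a library API.
public static class AsciiFilter
{
    // Same pattern as the snippet above: one or more chars outside U+0000..U+007F.
    // The Regex is constructed once and shared by every call.
    private static readonly Regex NonAscii =
        new Regex(@"[^\u0000-\u007F]+", RegexOptions.Compiled);

    // Returns a copy of input with every non-ASCII character removed.
    // Like Regex.Replace, this throws ArgumentNullException if input is null.
    public static string StripNonAscii(string input) =>
        NonAscii.Replace(input, string.Empty);
}
```

For example, AsciiFilter.StripNonAscii("naïve café") returns "nave caf".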
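Regex.Replace substitutes every match of the pattern with an empty string, which deletes the non-ASCII characters. Let's analyze the pattern:
- ^: Placed first inside the square brackets, the caret negates the character class, so it matches any character that is not in the listed range.
- \uXXXX-\uXXXX: Specifies a range of Unicode code points using \u escapes. \u0000-\u007F defines the ASCII range (code points 0 through 127).
- +: Matches one or more occurrences of the preceding character class, so a run of adjacent non-ASCII characters is removed as a single match.
This pattern removes every character outside the ASCII range and leaves ASCII characters untouched.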
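The + changes how the matches are grouped, not what ends up removed. The sketch below compares the article's pattern with a variant that has no quantifier; the QuantifierDemo class and its member names are hypothetical, chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the pattern with and without the + quantifier.
public static class QuantifierDemo
{
    // The article's pattern: a run of one or more consecutive non-ASCII chars.
    public const string RunPattern = @"[^\u0000-\u007F]+";

    // Variant without +: exactly one non-ASCII char per match.
    public const string SinglePattern = @"[^\u0000-\u007F]";

    // Returns how many separate matches the pattern finds in the input.
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    // Removes every match of the pattern from the input.
    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, string.Empty);
}
```

For the input "日本語 text", CountMatches reports 1 match with RunPattern and 3 with SinglePattern, while Strip returns " text" for both. Fewer, longer matches typically mean less per-match overhead, which is a reasonable argument for keeping the +.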
Regex Explained
The regular expression can be further broken down:
- [\u0000-\u007F]: Matches a single ASCII character (U+0000 through U+007F). A character class on its own matches exactly one character; the trailing + is what extends the match to one or more.
- [^...]: Square brackets with a leading caret (^) create a negated character class, which selects characters outside the specified range.
- string.Empty: The replacement string. Because it is empty, every matched character is removed.
This regular expression filters a string in a single Regex.Replace call, leaving only ASCII characters.
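For comparison, the same result can be obtained without a regular expression by keeping each UTF-16 char whose value is at most 0x7F, which is exactly the set of characters that [^\u0000-\u007F]+ does not remove. This loop is a swapped-in technique rather than the article's method, and the AsciiLoopFilter class and KeepAscii method are hypothetical names. Because .NET strings are UTF-16, a character outside the Basic Multilingual Plane (such as an emoji) is stored as two surrogate chars in the range U+D800 through U+DFFF; both are non-ASCII, so both the regex and this loop drop the whole character without leaving a lone surrogate behind.

```csharp
using System.Text;

// Hypothetical regex-free equivalent of replacing [^\u0000-\u007F]+ with string.Empty.
public static class AsciiLoopFilter
{
    // Returns a copy of input containing only chars in U+0000..U+007F.
    public static string KeepAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // '\u007F' (DEL) is the last ASCII code point.
            if (c <= '\u007F')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

On .NET 6 and later, char.IsAscii(c) expresses the same test as c <= '\u007F'.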