How Can I Encode and Decode Unicode Characters in C# to Preserve Non-ASCII Characters?

Mary-Kate Olsen
2025-01-28 04:56:08

Handling Non-ASCII Characters in C# Strings

Exchanging data that contains non-ASCII characters, such as the Greek letter pi (π), requires care to avoid data loss or corruption. When a string is converted with an encoding that cannot represent a character (ASCII, for example), .NET substitutes a question mark for it. This article describes two custom methods, one that escapes non-ASCII characters and one that restores them, so that the original text is preserved.
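To make the problem concrete, here is a minimal sketch of one common way the question marks appear: passing text through `System.Text.Encoding.ASCII`, whose default fallback substitutes `?` for any character it cannot represent. The class name `AsciiLossDemo` and the helper `RoundTripThroughAscii` are illustrative names, not part of the original article.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Hypothetical helper: converts a string to ASCII bytes and back.
    // Characters outside the ASCII range cannot be represented,
    // so they come back as '?'.
    public static string RoundTripThroughAscii(string value)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value);
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripThroughAscii("Pi(π)")); // Pi(?)
    }
}
```

Once the character has been replaced, the original cannot be recovered from the bytes, which is why the escaping has to happen before the ASCII conversion.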

Encoding Non-ASCII Characters

The EncodeNonAsciiCharacters function processes each character in a string. Characters beyond the ASCII range (code above 127) are converted to four-digit hexadecimal Unicode escape sequences of the form "\uXXXX", where XXXX is the character's code in hexadecimal (π is U+03C0). ASCII characters remain unchanged.
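The article does not list the function body, so the following is only a sketch of how the described encoder could look. The function name comes from the article; the wrapper class `UnicodeEscapeEncoder` and the choice of lowercase hex digits (the "x4" format) are assumptions.

```csharp
using System;
using System.Text;

public static class UnicodeEscapeEncoder
{
    // Replaces every UTF-16 code unit above 127 with a \uXXXX escape;
    // ASCII characters are copied through unchanged.
    public static string EncodeNonAsciiCharacters(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c > 127)
            {
                // "x4" formats the code unit as four lowercase hex digits.
                sb.Append("\\u").Append(((int)c).ToString("x4"));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EncodeNonAsciiCharacters("Pi(π)")); // Pi(\u03c0)
    }
}
```

Because a C# `char` is a UTF-16 code unit, a character outside the Basic Multilingual Plane is stored as a surrogate pair, and this sketch escapes each half separately as two consecutive sequences.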

Decoding Escaped Unicode Characters

The DecodeEncodedNonAsciiCharacters function uses a regular expression to find each "\uXXXX" escape sequence and replace it with the corresponding character. For each match, it parses the four hexadecimal digits as an integer and casts the result to a char.
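As with the encoder, the article describes the decoder without listing it, so this is a sketch under assumptions. The function name is the article's; the wrapper class `UnicodeEscapeDecoder`, the field `EscapePattern`, and the exact pattern are illustrative choices.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapeDecoder
{
    // Matches a literal backslash, 'u', then exactly four hex digits.
    private static readonly Regex EscapePattern =
        new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})");

    public static string DecodeEncodedNonAsciiCharacters(string value)
    {
        return EscapePattern.Replace(value, m =>
        {
            // Parse the hex digits as an integer, then cast to a UTF-16 code unit.
            int codeUnit = int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber);
            return ((char)codeUnit).ToString();
        });
    }

    public static void Main()
    {
        string decoded = DecodeEncodedNonAsciiCharacters(@"Pi(\u03c0)");
        Console.WriteLine(decoded == "Pi(π)"); // True
    }
}
```

The pattern accepts both uppercase and lowercase hex digits, so it decodes escapes regardless of which case the encoder produced.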

Example and Results

Here's a C# code snippet illustrating the encoding and decoding process:

```csharp
string unicodeString = "This string contains the Unicode character Pi(π)";
Console.WriteLine(unicodeString); // Original string

string encodedString = EncodeNonAsciiCharacters(unicodeString);
Console.WriteLine(encodedString); // Encoded string with escape sequences

string decodedString = DecodeEncodedNonAsciiCharacters(encodedString);
Console.WriteLine(decodedString); // Decoded string, matching the original
```

The output shows the original string, then the string with π replaced by the escape sequence for U+03C0, and finally the decoded string, which matches the original. Escaping in this way lets non-ASCII characters survive a conversion that would otherwise replace them. One caveat: input that already contains a literal "\uXXXX" sequence is also decoded, so such strings do not round-trip unchanged.
