How to Encode and Decode Unicode Characters in Escaped ASCII?

Mary-Kate Olsen
2025-01-28 05:01:09

Unicode and Escaped ASCII: Encoding and Decoding

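Many programming tasks require handling Unicode characters. A common need is converting a Unicode string to an escaped ASCII form, which can be stored or transmitted through channels that only handle 7-bit ASCII. The conversion replaces each non-ASCII character with a Unicode escape sequence of the form "\uXXXX", where XXXX is the character's UTF-16 code unit in hexadecimal. For example, "é" becomes "\u00e9".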

Encoding Unicode to Escaped ASCII:

A single loop over the string's characters is enough:

<code class="language-csharp">// Requires: using System.Text;
static string EncodeUnicodeToAscii(string input)
{
    StringBuilder result = new StringBuilder();
    foreach (char c in input) // each char is one UTF-16 code unit
    {
        if (c > 127) // Check for non-ASCII characters
        {
            // "\\u" is an escaped backslash followed by 'u'; "x4" pads to four lowercase hex digits
            result.Append("\\u" + ((int)c).ToString("x4"));
        }
        else
        {
            result.Append(c); // Append ASCII characters directly
        }
    }
    return result.ToString();
}</code>

The function iterates through the Unicode string. Non-ASCII characters (those with values greater than 127) are converted to their hexadecimal escape sequences. ASCII characters remain unchanged.
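
As a quick illustration, the sketch below wraps the encoder in a small console program. The class name EncodeDemo and the sample strings are assumptions made for this example, not part of the original article. It also shows that, because a C# char is a UTF-16 code unit, a character outside the Basic Multilingual Plane such as U+1F600 comes out as two escape sequences, one per surrogate.

<code class="language-csharp">using System;
using System.Text;

public static class EncodeDemo // hypothetical wrapper class for this sketch
{
    public static string EncodeUnicodeToAscii(string input)
    {
        var result = new StringBuilder();
        foreach (char c in input) // iterates UTF-16 code units, not code points
        {
            if (c > 127)
                result.Append("\\u").Append(((int)c).ToString("x4"));
            else
                result.Append(c);
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EncodeUnicodeToAscii("Café"));       // Caf\u00e9
        Console.WriteLine(EncodeUnicodeToAscii("\U0001F600")); // \ud83d\ude00 (surrogate pair)
    }
}</code>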

Decoding Escaped ASCII to Unicode:

Decoding goes the other way, using a regular expression to find each escape sequence:

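<code class="language-csharp">// Requires: using System.Globalization; using System.Text.RegularExpressions;
static string DecodeAsciiToUnicode(string input)
{
    // In a verbatim string, "\\u" matches a literal backslash followed by 'u'
    return Regex.Replace(input, @"\\u(?<value>[a-fA-F0-9]{4})", match =>
    {
        return ((char)int.Parse(match.Groups["value"].Value, NumberStyles.HexNumber)).ToString();
    });
}</code>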

This function uses a regular expression to locate "\uXXXX" sequences. For each match it extracts the four hexadecimal digits, parses them as an integer, and casts the result to the corresponding char. Text outside the matches is left unchanged, so decoding the encoder's output returns the original Unicode string.
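
A round trip can be checked with a short program along these lines. This is a minimal sketch: the class name RoundTripDemo and the sample inputs are hypothetical, and the two methods are restated only so the example compiles on its own. The last call illustrates a caveat: the decoder cannot tell an escape produced by the encoder from a backslash-u sequence that was already present in the text.

<code class="language-csharp">using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class RoundTripDemo // hypothetical wrapper class for this sketch
{
    public static string EncodeUnicodeToAscii(string input)
    {
        var result = new StringBuilder();
        foreach (char c in input)
        {
            if (c > 127)
                result.Append("\\u").Append(((int)c).ToString("x4"));
            else
                result.Append(c);
        }
        return result.ToString();
    }

    public static string DecodeAsciiToUnicode(string input)
    {
        // Match a literal backslash, 'u', then exactly four hex digits
        return Regex.Replace(input, @"\\u(?<value>[a-fA-F0-9]{4})",
            m => ((char)int.Parse(m.Groups["value"].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        string original = "Grüße, 世界";
        string escaped = EncodeUnicodeToAscii(original);
        Console.WriteLine(escaped);                                    // Gr\u00fc\u00dfe, \u4e16\u754c
        Console.WriteLine(DecodeAsciiToUnicode(escaped) == original);  // True

        // Caveat: a pre-existing backslash-u sequence is decoded as well
        Console.WriteLine(DecodeAsciiToUnicode(@"C:\u0041pps"));       // C:Apps
    }
}</code>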

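Together, these two methods convert between Unicode strings and their escaped ASCII representation. Two caveats apply. Because a C# char is a UTF-16 code unit, a character outside the Basic Multilingual Plane is written as two escape sequences, one per surrogate. The decoder also converts any literal "\uXXXX" text that was already in the input, so the round trip is exact only when the original string contains no such sequences.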
