How Can I Programmatically Determine a Text File's Encoding with Precision?

Mary-Kate Olsen
2025-01-17 01:51:09

Accurately identify text file encoding

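Determining the encoding of a text file can be tricky, especially if you are unfamiliar with encoding concepts. This article introduces a method based on the byte order mark (BOM). For files that begin with a BOM, its accuracy is comparable to Notepad's; files without a BOM cannot be identified this way.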

The role of Byte Order Mark (BOM)

A byte order mark (BOM) is a sequence of bytes at the beginning of a text file that indicates how the file is encoded. The BOMs of the common Unicode encodings, in hexadecimal, are:

  • UTF-7: 2b 2f 76
  • UTF-8: ef bb bf
  • UTF-32 (LE): ff fe 00 00
  • UTF-16 (LE): ff fe
  • UTF-16 (BE): fe ff
  • UTF-32 (BE): 00 00 fe ff
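
Two entries in this list share a prefix: the UTF-32 (LE) mark starts with the same two bytes as the UTF-16 (LE) mark, so the longer one has to be tested first. The sketch below illustrates that ordering with a small lookup table over an in-memory byte array. The names `BomTable` and `Match`, and the `"no BOM"` return value, are hypothetical and chosen only for this illustration.

```csharp
public static class BomTable
{
    // (BOM bytes, label) pairs. The four-byte marks come first so that
    // UTF-32 LE (ff fe 00 00) is tested before UTF-16 LE (ff fe).
    private static readonly (byte[] Bom, string Name)[] Entries =
    {
        (new byte[] { 0xff, 0xfe, 0x00, 0x00 }, "UTF-32 LE"),
        (new byte[] { 0x00, 0x00, 0xfe, 0xff }, "UTF-32 BE"),
        (new byte[] { 0xef, 0xbb, 0xbf }, "UTF-8"),
        (new byte[] { 0x2b, 0x2f, 0x76 }, "UTF-7"),
        (new byte[] { 0xff, 0xfe }, "UTF-16 LE"),
        (new byte[] { 0xfe, 0xff }, "UTF-16 BE"),
    };

    // True when data begins with every byte of prefix, in order.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Returns the label of the first matching entry, or "no BOM".
    public static string Match(byte[] data)
    {
        foreach (var (bom, name) in Entries)
        {
            if (StartsWith(data, bom)) return name;
        }
        return "no BOM";
    }
}
```

With this ordering, `BomTable.Match(new byte[] { 0xff, 0xfe, 0x00, 0x00 })` yields the UTF-32 LE label, while a buffer holding only `ff fe` yields the UTF-16 LE label because the length check rejects the four-byte entries.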

Code Example

The following C# method translates these byte signatures into code:

<code class="language-csharp">using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when no BOM is detected.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read up to four bytes; files shorter than that return fewer.
    var bom = new byte[4];
    int read;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        read = file.Read(bom, 0, 4);
    }

    // Analyze the BOM, comparing only bytes that were actually read.
    // UTF-32 LE is tested before UTF-16 LE because both begin with ff fe.
    if (read >= 3 && bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7; // Encoding.UTF7 is marked obsolete in .NET 5 and later
    if (read >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32 LE
    if (read >= 2 && bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16 LE
    if (read >= 2 && bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16 BE
    if (read >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32 BE

    // No BOM was found: default to ASCII.
    return Encoding.ASCII;
}</code>
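
As a cross-check, .NET's `StreamReader` can perform BOM detection itself when constructed with `detectEncodingFromByteOrderMarks` set to `true`. The sketch below is one possible way to use it, not part of the method above; the names `ReaderDetection` and `Detect` and the caller-supplied `fallback` parameter are assumptions made for illustration.

```csharp
using System.IO;
using System.Text;

public static class ReaderDetection
{
    // Opens the file with BOM detection enabled and returns the encoding
    // the reader settled on. The fallback is returned when no BOM is found.
    public static Encoding Detect(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // Detection is deferred until the first read, so read one character.
            reader.Read();
            return reader.CurrentEncoding;
        }
    }
}
```

A call such as `ReaderDetection.Detect(path, Encoding.ASCII)` mirrors the ASCII default used above. Note that `StreamReader` recognizes the UTF-8, UTF-16, and UTF-32 marks but not the UTF-7 one.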

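With this method you can identify the encoding of any text file that begins with a BOM. Files without a BOM fall through to the ASCII default, so treat that result as a fallback and not as a positive detection.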
