How to Handle BOM Markers in UTF-8 File Reading?

Mary-Kate Olsen
2024-11-26 10:59:12

Handling BOM Markers in UTF-8 File Reading

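When dealing with UTF-8 encoded text files that may contain a Byte Order Mark (BOM), it's crucial to handle the BOM correctly to avoid unexpected output. The BOM is the Unicode character U+FEFF placed at the very start of a file. In UTF-16 and UTF-32 it indicates the byte order; UTF-8 has no byte-order ambiguity, so there it only serves as a signature marking the file as UTF-8. Encoded in UTF-8, the BOM is the three-byte sequence EF BB BF, which Java decodes to the single character U+FEFF.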
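To make the relationship between the three bytes and the decoded character concrete, here is a minimal sketch. The class name BomDecodeDemo and the helper decodeUtf8 are hypothetical names chosen for illustration; the only JDK API used is the String(byte[], Charset) constructor.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class BomDecodeDemo {

    // Decodes raw bytes as UTF-8 using the standard JDK decoder.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // EF BB BF followed by the ASCII bytes for "hi"
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String decoded = decodeUtf8(withBom);

        System.out.println(decoded.length());              // prints 3: BOM char + 'h' + 'i'
        System.out.println(decoded.charAt(0) == '\uFEFF'); // prints true
        System.out.println(decoded.substring(1));          // prints hi
    }
}
```

The decoder keeps the BOM as an ordinary (invisible) character, which is why it shows up at the start of the first line read from such a file.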

Consider the following code:

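// fr, br and content are assumed to be declared earlier.
// FileReader decodes with the platform default charset.
fr = new FileReader(file);
br = new BufferedReader(fr);
String tmp = null;
while ((tmp = br.readLine()) != null) {
    // Round trip through the default charset to re-decode the line as UTF-8
    String text = new String(tmp.getBytes(), "UTF-8");
    content += text + System.getProperty("line.separator");
}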

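In this code, the BOM ends up in the output. FileReader decodes the file with the platform default charset, and the getBytes()/new String(..., "UTF-8") round trip is a fragile attempt to re-decode each line as UTF-8. Neither step removes the BOM: Java's UTF-8 decoder keeps it, so the first line starts with the character U+FEFF. To handle the BOM correctly, you can use the following techniques: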

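Method 1: Specify the Character Set Explicitly

Java's UTF-8 charset does not discard the BOM when decoding, so naming the charset is not enough on its own. Open the file with an explicit UTF-8 charset instead of relying on the platform default, then drop a leading U+FEFF from the first line:

Charset charset = Charset.forName("UTF-8");
br = new BufferedReader(new InputStreamReader(new FileInputStream(file), charset));
String first = br.readLine();
if (first != null && first.startsWith("\uFEFF")) first = first.substring(1); // drop the BOM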
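Because the decoded BOM is just the character U+FEFF at the start of the text, an explicit charset is usually paired with a check for that character. The following is a minimal sketch under that assumption; Utf8FileText, stripBom and read are hypothetical names, and the file is read in one go with Files.readAllBytes, which suits small files only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; class and method names are illustrative only.
final class Utf8FileText {

    // Returns the text without a leading U+FEFF, or unchanged if there is none.
    static String stripBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    // Reads the whole file as UTF-8 and removes a leading BOM if present.
    static String read(Path path) throws IOException {
        String raw = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return stripBom(raw);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file that starts with EF BB BF, then read it back.
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'o', 'k'});
        System.out.println(read(tmp)); // prints ok
        Files.delete(tmp);
    }
}
```

Only a BOM at the very start is removed; a U+FEFF that appears later in the text is left untouched.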

Method 2: Read and Drop the BOM

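If the BOM is not required, you can read and drop it before processing the rest of the file. Once the file is decoded as UTF-8, the BOM is a single character (U+FEFF), not three, so peek at the first character and rewind if it is not a BOM:

br.mark(1);                // remember the start of the stream
int first = br.read();     // the decoded BOM is the single char U+FEFF
if (first != '\uFEFF') {
    br.reset();            // no BOM: rewind so the first character is not lost
}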
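As a fuller illustration of the read-and-drop approach, the sketch below wraps an InputStream in a UTF-8 reader and uses mark/reset so that nothing is lost when the input has no BOM. The names BomSkippingReader, open and readAll are hypothetical, and the in-memory input in main stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; class and method names are illustrative only.
final class BomSkippingReader {

    // Wraps the stream in a UTF-8 reader and consumes a leading U+FEFF if present.
    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // allow rewinding by one character
        if (reader.read() != '\uFEFF') {
            reader.reset();              // not a BOM (or empty input): rewind
        }
        return reader;
    }

    // Reads all lines, joining them with the platform line separator.
    static String readAll(InputStream in) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = open(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append(System.lineSeparator());
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a UTF-8 file that starts with a BOM.
        byte[] data = "\uFEFFfirst\nsecond\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(data)));
    }
}
```

Using a StringBuilder instead of repeated string concatenation avoids copying the accumulated content on every line.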

By implementing one of these techniques, you can ensure that the BOM is handled correctly and that the output string does not include the BOM marker.
