How to Avoid Outputting the BOM Marker When Reading a UTF-8 Encoded File?

Mary-Kate Olsen
2024-11-16 22:43:03

Unicode BOM and FileReader

When you read a UTF-8 encoded file that starts with a Byte Order Mark (BOM), the BOM can end up in the output along with the file content. Unicode defines the BOM (U+FEFF) to signal byte order. UTF-8 has no byte-order ambiguity, so there the BOM is only an optional signature, encoded as the bytes EF BB BF. Java's UTF-8 decoder does not strip it, so it is decoded as an ordinary character at the start of the text unless you remove it yourself.
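To make the behavior concrete, here is a minimal sketch that decodes a hand-built byte array beginning with the UTF-8 BOM. The class name BomDecodeDemo and the helper decodesWithLeadingBom are made up for illustration; only the standard String constructor and StandardCharsets are used.

```java
import java.nio.charset.StandardCharsets;

final class BomDecodeDemo {
    /** Returns true if decoding the bytes as UTF-8 yields a string that starts with U+FEFF. */
    static boolean decodesWithLeadingBom(byte[] utf8Bytes) {
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        return !decoded.isEmpty() && decoded.charAt(0) == '\uFEFF';
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of U+FEFF, followed by the text "hi".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h', (byte) 'i'};
        String decoded = new String(withBom, StandardCharsets.UTF_8);

        System.out.println(decoded.length());        // 3: U+FEFF, 'h', 'i'
        System.out.println((int) decoded.charAt(0)); // 65279, i.e. 0xFEFF
    }
}
```

The decoded string is one character longer than the visible text, which is why the BOM shows up when the content is printed or compared.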

In your code snippet:

  • fr and br are used to read the file as bytes and convert them into characters.
  • tmp reads each line of the file as a byte array.
  • text converts the byte array into a UTF-8 encoded string.
  • content concatenates the lines of the file. The BOM is part of the file's content, so it ends up in content as well.

To keep the BOM marker out of the output:

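  1. Read the whole file and decode it to a String in one step instead of line by line. Decoding alone does not drop the BOM: it still arrives as a leading U+FEFF character, so remove it once after decoding.
String content = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
if (content.startsWith("\uFEFF")) content = content.substring(1); // drop the decoded BOM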
  2. If you must work with the file as a byte array, remove the BOM manually before converting it to a string. In UTF-8 the BOM is the three-byte sequence EF BB BF:
if (tmp.length >= 3 &&
    tmp[0] == (byte) 0xEF &&
    tmp[1] == (byte) 0xBB &&
    tmp[2] == (byte) 0xBF) {

    // Remove the BOM marker
    tmp = Arrays.copyOfRange(tmp, 3, tmp.length);
}
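If you would rather keep a reader-based loop like the fr/br code, the same idea can be applied at the character level. The following is a sketch only, not the original author's code: BomSkippingReader, open, and readAll are hypothetical names, and it assumes the file is valid UTF-8. It peeks at the first character and rewinds if that character is not U+FEFF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class BomSkippingReader {
    /** Opens a UTF-8 reader positioned just after a leading BOM, if one is present. */
    static BufferedReader open(Path path) throws IOException {
        BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        reader.mark(1); // remember the start so the first char can be un-read
        if (reader.read() != '\uFEFF') {
            reader.reset(); // first char was real content (or end of file), so rewind
        }
        return reader;
    }

    /** Reads all lines, joined with the platform line separator, without the BOM. */
    static String readAll(Path path) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = open(path)) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append(System.lineSeparator());
            }
        }
        return content.toString();
    }
}
```

Specifying the charset explicitly also avoids depending on the platform default encoding, which is what a plain FileReader uses when no charset is given.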
