How to Handle Byte Order Marks (BOMs) When Reading CSV Files in Java?

Patricia Arquette

Dec 27, 2024, 09:57 AM

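Byte order marks cause problems when reading CSV files in Java

A byte order mark (BOM) appears at the start of some CSV files, but not all. When present, the BOM is read as part of the file's first line, so string comparisons against that line (for example, matching the first header name) fail unexpectedly.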
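To make the failure concrete, here is a minimal sketch that reproduces it in memory. The class name BomComparisonDemo, the helper firstLineWithBom, and the sample header "id,name" are all made up for illustration; the sketch assumes a UTF-8 file, where the BOM decodes to the single character U+FEFF.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BomComparisonDemo {

    // Hypothetical helper: reads the first line of an in-memory UTF-8 "file"
    // whose bytes start with the UTF-8 BOM (EF BB BF).
    static String firstLineWithBom() throws IOException {
        // Encoding U+FEFF as UTF-8 yields exactly the three BOM bytes.
        byte[] data = "\uFEFFid,name\n1,Alice\n".getBytes(StandardCharsets.UTF_8);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String header = firstLineWithBom();
        // The decoder keeps the BOM, so the line is one character longer than it looks.
        System.out.println(header.equals("id,name"));       // false
        System.out.println(header.length());                // 8
        System.out.println(header.charAt(0) == '\uFEFF');   // true
    }
}
```

The header looks correct when printed, which is why this kind of mismatch is easy to miss.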

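Here is how to solve the problem:

Solution:

Implement a wrapper class, UnicodeBOMInputStream, that detects whether a Unicode BOM is present at the start of an input stream. If a BOM is detected, it can be removed by calling the skipBOM() method.

The following is an example of the UnicodeBOMInputStream class: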

import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class UnicodeBOMInputStream extends InputStream {

    private PushbackInputStream in;
    private BOM bom;
    private boolean skipped = false;

    public UnicodeBOMInputStream(InputStream inputStream) throws IOException {
        if (inputStream == null)
            throw new NullPointerException("Invalid input stream: null is not allowed");

        in = new PushbackInputStream(inputStream, 4);

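        // Peek at up to four bytes; they are pushed back below so that
        // callers still see the BOM unless they call skipBOM().
        byte[] bom = new byte[4];
        int read = in.read(bom);

        // Check the longest signatures first. Each case deliberately falls
        // through to the shorter signatures when it finds no match.
        switch (read) {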
            case 4:
                if ((bom[0] == (byte) 0xFF) &&
                        (bom[1] == (byte) 0xFE) &&
                        (bom[2] == (byte) 0x00) &&
                        (bom[3] == (byte) 0x00)) {
                    this.bom = BOM.UTF_32_LE;
                    break;
                } else if ((bom[0] == (byte) 0x00) &&
                        (bom[1] == (byte) 0x00) &&
                        (bom[2] == (byte) 0xFE) &&
                        (bom[3] == (byte) 0xFF)) {
                    this.bom = BOM.UTF_32_BE;
                    break;
                }
            case 3:
                if ((bom[0] == (byte) 0xEF) &&
                        (bom[1] == (byte) 0xBB) &&
                        (bom[2] == (byte) 0xBF)) {
                    this.bom = BOM.UTF_8;
                    break;
                }
            case 2:
                if ((bom[0] == (byte) 0xFF) &&
                        (bom[1] == (byte) 0xFE)) {
                    this.bom = BOM.UTF_16_LE;
                    break;
                } else if ((bom[0] == (byte) 0xFE) &&
                        (bom[1] == (byte) 0xFF)) {
                    this.bom = BOM.UTF_16_BE;
                    break;
                }
            default:
                this.bom = BOM.NONE;
                break;
        }

        if (read > 0)
            in.unread(bom, 0, read);
    }

    public BOM getBOM() {
        return bom;
    }

    public UnicodeBOMInputStream skipBOM() throws IOException {
        if (!skipped) {
            in.skip(bom.bytes.length);
            skipped = true;
        }
        return this;
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }

    @Override
    public int read(byte[] b) throws IOException {
        return in.read(b, 0, b.length);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        return in.skip(n);
    }

    @Override
    public int available() throws IOException {
        return in.available();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    @Override
    public synchronized void mark(int readlimit) {
        in.mark(readlimit);
    }

    @Override
    public synchronized void reset() throws IOException {
        in.reset();
    }

    @Override
    public boolean markSupported() {
        return in.markSupported();
    }

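    /** Supported BOM types, each carrying its byte signature. */
    public enum BOM {
        NONE(new byte[] {}),
        UTF_8(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}),
        UTF_16_LE(new byte[] {(byte) 0xFF, (byte) 0xFE}),
        UTF_16_BE(new byte[] {(byte) 0xFE, (byte) 0xFF}),
        UTF_32_LE(new byte[] {(byte) 0xFF, (byte) 0xFE, (byte) 0x00, (byte) 0x00}),
        UTF_32_BE(new byte[] {(byte) 0x00, (byte) 0x00, (byte) 0xFE, (byte) 0xFF});

        // Signature bytes; skipBOM() uses the length to know how many bytes to drop.
        final byte[] bytes;

        BOM(byte[] bytes) {
            this.bytes = bytes;
        }
    }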
}

Usage:

Use the UnicodeBOMInputStream wrapper as follows:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class CSVReaderWithBOM {

    public static void main(String[] args) throws Exception {
        // First pass: leave the BOM in place so it shows up in the first line.
        FileInputStream fis = new FileInputStream("test.csv");
        UnicodeBOMInputStream ubis = new UnicodeBOMInputStream(fis);

        System.out.println("Detected BOM: " + ubis.getBOM());

        System.out.print("Reading the content of the file without skipping the BOM: ");
        // No charset is given, so the platform default charset decodes the bytes.
        InputStreamReader isr = new InputStreamReader(ubis);
        BufferedReader br = new BufferedReader(isr);

        System.out.println(br.readLine());

        // Closing the outermost reader also closes the streams it wraps.
        br.close();

        // Second pass: reopen the file and skip the BOM before anything is read.
        fis = new FileInputStream("test.csv");
        ubis = new UnicodeBOMInputStream(fis);
        isr = new InputStreamReader(ubis);
        br = new BufferedReader(isr);

        // The readers have not consumed any bytes yet, so skipping here is safe.
        ubis.skipBOM();

        System.out.print("Reading the content of the file after skipping the BOM: ");
        System.out.println(br.readLine());

        br.close();
    }
}

This approach lets you read CSV files with or without a BOM and avoids the string comparison problems caused by a BOM on the first line of the file.
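The peek-and-push-back idea behind the wrapper can also be checked in isolation. The following is a reduced sketch, not the class above: it handles only the UTF-8 BOM, and the names Utf8BomSkipSketch, skipUtf8Bom, and firstLine are hypothetical. It runs against in-memory data, so no test.csv is needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class Utf8BomSkipSketch {

    // Hypothetical helper: returns a stream positioned after a UTF-8 BOM if one
    // is present, or at the very first byte otherwise.
    static InputStream skipUtf8Bom(InputStream source) throws IOException {
        PushbackInputStream in = new PushbackInputStream(source, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // end of stream
            }
            read += n;
        }

        boolean hasBom = read == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;

        // Not a BOM: give the bytes back so the caller sees the full content.
        if (!hasBom && read > 0) {
            in.unread(head, 0, read);
        }
        return in;
    }

    // Decodes the data as UTF-8 and returns its first line (null if empty).
    static String firstLine(byte[] data) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "id,name\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = "\uFEFFid,name\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(firstLine(plain).equals("id,name"));   // true
        System.out.println(firstLine(withBom).equals("id,name")); // true
    }
}
```

Both inputs yield the same first line, which is the property the string comparison depends on. The full wrapper is still the better fit when UTF-16 or UTF-32 files may appear, since this sketch ignores those signatures.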
