
How Can I Efficiently Count Lines in Large Java Data Files?

Patricia Arquette
2024-12-09 09:18:07


Counting Lines in Large Data Files in Java

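Counting the lines in a very large data file can be surprisingly slow. The common approach of reading the file line by line (for example with BufferedReader or LineNumberReader) works, but it decodes every byte into characters and builds a String for each line, which is wasted work when all you need is the count.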
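For comparison, the line-by-line approach looks roughly like this. It is a minimal sketch that assumes UTF-8 input; the class and method names (LineByLineCounter, countLinesByReading) are hypothetical and only used for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineByLineCounter {

    // Baseline: decode the file as UTF-8 (an assumption) and count the lines
    // returned by readLine(). Every call decodes bytes into chars and
    // allocates a String, even though the text is thrown away immediately.
    // Malformed UTF-8 makes readLine() throw an IOException with this reader.
    public static long countLinesByReading(String filename) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }
}
```

readLine() recognizes '\n', '\r' and "\r\n" as terminators and also returns a final line that has no trailing terminator, so "a\nb\nc" counts as three lines.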

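A faster alternative is to scan the raw bytes and count newline characters directly. Two variants follow: countLines is the straightforward version, and countLinesNew restructures the loop so that its hot path always processes a full 1024-byte buffer.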

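```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LineCounter {

    // Counts '\n' bytes. A non-empty file with no '\n' at all is reported as
    // one line; otherwise a final line without a trailing '\n' is not counted.
    public static int countLines(String filename) throws IOException {
        // try-with-resources closes the stream even if read() throws
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] c = new byte[1024];
            int count = 0;
            int readChars;
            boolean empty = true;
            while ((readChars = is.read(c)) != -1) {
                empty = false;
                for (int i = 0; i < readChars; ++i) {
                    if (c[i] == '\n') {
                        ++count;
                    }
                }
            }
            return (count == 0 && !empty) ? 1 : count;
        }
    }

    // Same result as countLines, restructured so the main loop always scans a
    // full 1024-byte buffer; the constant bound is meant to make that loop
    // easier for the JIT compiler to optimize.
    public static int countLinesNew(String filename) throws IOException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] c = new byte[1024];

            int readChars = is.read(c);
            if (readChars == -1) {
                // bail out if nothing to read (empty file)
                return 0;
            }

            // make it easy for the optimizer to tune this loop
            int count = 0;
            while (readChars == 1024) {
                for (int i = 0; i < 1024;) {
                    if (c[i++] == '\n') {
                        ++count;
                    }
                }
                readChars = is.read(c);
            }

            // count remaining bytes (short reads, including the final chunk)
            while (readChars != -1) {
                for (int i = 0; i < readChars; ++i) {
                    if (c[i] == '\n') {
                        ++count;
                    }
                }
                readChars = is.read(c);
            }

            // at least one byte was read, so a file with no '\n' is one line
            return count == 0 ? 1 : count;
        }
    }
}
```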

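Both methods read the file in 1024-byte chunks, count the '\n' bytes in each chunk, and accumulate a running total. Line-by-line readers are buffered too, so the number of file-system reads is similar in both cases; the byte scan is faster because it skips character decoding and never allocates a String per line.

Counting bytes has a few consequences. The result is the number of '\n' characters, so a final line without a trailing newline is not counted (except that a non-empty file containing no newline at all is reported as one line). Windows "\r\n" endings are counted correctly, but lone '\r' endings are not recognized. Finally, the scan is only valid for encodings in which '\n' is the single byte 0x0A and never appears inside a multibyte sequence, such as ASCII, ISO-8859-1 and UTF-8; it does not work for UTF-16.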
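If the count should match BufferedReader.readLine(), which also counts a final line that has no trailing newline, the same byte scan can remember the last byte it saw. This is a hypothetical variant, not part of the original code; the class name TrailingLineCounter, the 8192-byte buffer and the long return type (to avoid int overflow on files with more than Integer.MAX_VALUE lines) are illustrative choices.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TrailingLineCounter {

    // Counts '\n' bytes and adds one more line if the file ends with a
    // non-empty segment after the last '\n'. Assumes an encoding where '\n'
    // is the single byte 0x0A (e.g. UTF-8); lone '\r' endings are ignored.
    public static long countLines(String filename) throws IOException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] buf = new byte[8192];
            long count = 0;
            byte last = '\n';          // an empty file behaves as if it ended on '\n'
            int n;
            while ((n = is.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
                if (n > 0) {
                    last = buf[n - 1]; // remember the most recent byte read
                }
            }
            // An unterminated final line still counts as a line.
            return last == '\n' ? count : count + 1;
        }
    }
}
```

With this variant "a\nb" and "a\nb\n" both count as two lines, "abc" counts as one, and an empty file counts as zero.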

In the benchmark reported alongside this code, counting the lines of a 1.3 GB text file took about 0.35 seconds with the byte-scanning approach and about 2.40 seconds with LineNumberReader, roughly a sevenfold speedup.
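To get comparable numbers on your own data, a rough timing harness along these lines can help. It is a sketch, not the benchmark quoted above: the class and method names are hypothetical, it times single runs with System.nanoTime, and the line-by-line side counts readLine() calls on a LineNumberReader using the platform default charset.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.LineNumberReader;

public class LineCountTiming {

    interface Counter { long count(String filename) throws IOException; }

    // Byte scan: counts '\n' bytes (no adjustment for an unterminated last line).
    static long countNewlineBytes(String filename) throws IOException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] c = new byte[1024];
            long count = 0;
            int n;
            while ((n = is.read(c)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (c[i] == '\n') count++;
                }
            }
            return count;
        }
    }

    // Line-by-line baseline: decodes the file and counts readLine() results.
    static long countWithLineNumberReader(String filename) throws IOException {
        try (LineNumberReader r = new LineNumberReader(new FileReader(filename))) {
            long count = 0;
            while (r.readLine() != null) count++;
            return count;
        }
    }

    // Runs one counter once, prints the count and elapsed time, returns the count.
    static long time(String label, Counter counter, String filename) throws IOException {
        long start = System.nanoTime();
        long lines = counter.count(filename);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + lines + " lines in " + ms + " ms");
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String file = args[0];
        // Repeat so later runs use JIT-compiled code; they will likely also
        // read the file from the OS page cache rather than from disk.
        for (int run = 0; run < 3; run++) {
            time("byte scan", LineCountTiming::countNewlineBytes, file);
            time("LineNumberReader", LineCountTiming::countWithLineNumberReader, file);
        }
    }
}
```

Run it as `java LineCountTiming big.txt`. The two counts can differ by one when the file does not end with a newline. For careful measurements, a dedicated harness such as JMH, which manages warm-up and dead-code elimination, is a better fit than hand-rolled timing.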

