
Reading Large Files Efficiently in Java

伊谢尔伦 · Original · 2016-11-26 11:00:49

 1. Overview

 This tutorial will demonstrate how to read large files efficiently with Java. This article is part of the "Java - Back to Basics" tutorial series on Baeldung (http://www.baeldung.com/).

 2. Read in memory

 The standard way to read the lines of a file is to read them into memory. Both Guava and Apache Commons IO provide a quick way to do just that:

Files.readLines(new File(path), Charsets.UTF_8);   // Guava

FileUtils.readLines(new File(path));                // Apache Commons IO

The problem with this approach is that all the lines of the file are kept in memory. As soon as the file is large enough, this will cause the program to throw an OutOfMemoryError.

For example, reading a file of about 1 GB:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

At the start, this approach consumes very little memory (about 0 MB):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, once the whole file has been read into memory, we can finally see the cost (approximately 2 GB of memory consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means the process consumed roughly 2.1 GB of memory, and the reason is simple: all the lines of the file are now stored in memory.

It should be obvious that keeping the entire contents of a file in memory will quickly exhaust the available memory, no matter how much memory is actually available.
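For reference, memory figures like the ones in the logs above can be obtained from the JVM itself. The exact logging code of CoreJavaIoUnitTest is not shown in this article, so the following is only a minimal sketch based on the standard Runtime API; the class and method names are just for illustration:

public class MemoryLogger {

    // Minimal sketch (illustration only, not the actual CoreJavaIoUnitTest code):
    // query the JVM for its current total and free heap memory, in megabytes.
    public static void logMemory() {
        Runtime runtime = Runtime.getRuntime();
        long totalMb = runtime.totalMemory() / (1024 * 1024);
        long freeMb = runtime.freeMemory() / (1024 * 1024);
        System.out.println("Total Memory: " + totalMb + " Mb");
        System.out.println("Free Memory: " + freeMb + " Mb");
    }
}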

 Furthermore, we usually don't need all the lines of the file in memory at once. Instead, we only need to iterate through them one by one, do some processing, and discard each line once it has been processed. So that is exactly what we are going to do: iterate through the lines instead of holding all of them in memory.

 3. File stream

Now let's look at a solution that does exactly that: we will use a java.util.Scanner to run through the contents of the file and read it line by line:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution iterates through all the lines of the file, allowing each line to be processed without keeping a reference to it. In short, the lines are not stored in memory (approximately 150 MB of memory is consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
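A related option, not covered by the snippets in this article, is the lazy line stream introduced in Java 8. The sketch below assumes Java 8 or newer; the class and method names are only for illustration, and note that this Files is java.nio.file.Files, not Guava's:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamLinesExample {

    // Sketch: lazily stream the lines of a file and process them one by one;
    // try-with-resources closes the stream and the underlying file handle.
    public static void process(String path) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
            lines.forEach(line -> {
                // do something with line
            });
        }
    }
}

Like the Scanner approach, this keeps only the current line in memory rather than the whole file.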

 4. Apache Commons IO stream

 The same can also be achieved with the Commons IO library, using the custom LineIterator it provides:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is not held in memory, this also results in very conservative memory consumption (approximately 150 MB of memory consumed):

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
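In newer versions of Commons IO, LineIterator implements Closeable, so the same loop can be written with try-with-resources instead of calling closeQuietly explicitly. This is only a sketch, assuming a recent version of the library:

// Sketch, assuming a Commons IO version where LineIterator implements Closeable;
// the surrounding method must handle or declare IOException.
try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
}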

