
java - How do you read a large file that is bigger than available memory?

Reference:
    Given a 1 GB file and a 10 MB memory limit, how do you return the 50 most frequent words in order?

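There are many solutions to this problem online. They all use a divide-and-conquer approach, and they all mention traversing the entire file.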

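So my questions are:
If I simply read the large file line by line, does that count as loading the whole 1 GB file into memory?
Put another way, how should a file larger than memory be read?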

PHP中文网 | 2742 days ago | 958

All replies (6)

  • 黄舟

    黄舟 2017-04-18 10:57:16

    Memory here works like a pipe. Reading line by line simply streams the 1 GB file through memory, and the 10 MB limit is the diameter of the pipe.
    So yes, with line-by-line reading every part of the 1 GB file is loaded into memory at some point, just never all at once.

  • 伊谢尔伦

    伊谢尔伦 2017-04-18 10:57:16

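    // Needs java.io.BufferedReader and java.io.FileReader; `file` is a java.io.File or a path String.
    try (BufferedReader in = new BufferedReader(new FileReader(file))) {
        String line;
        // readLine() returns null at end of file; only the current line is held in memory.
        while ((line = in.readLine()) != null) {
            // parse line
        }
    }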
    

    No matter how large the file is, as long as the length of each line is bounded, reading the whole file this way may take a long time but will not use much memory.
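To tie the loop above back to the original word-frequency question, here is a minimal sketch that counts words while streaming the file line by line. It assumes words are runs of letters and digits, that the text is UTF-8, and, importantly, that the set of distinct words fits in memory even though the file does not. The names `StreamingWordCount` and `topWords` are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StreamingWordCount {
    // Returns the k most frequent words, most frequent first.
    // Assumption: the number of DISTINCT words is small enough to fit in memory.
    static List<Map.Entry<String, Integer>> topWords(Path file, int k) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Only the current line is held here, never the whole file.
                for (String word : line.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count descending, then alphabetically so ties are deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}
```

Calling `StreamingWordCount.topWords(path, 50)` would give the 50 most frequent words. If the distinct words themselves do not fit in the memory budget, this sketch is not enough and the file has to be partitioned first.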

  • 伊谢尔伦

    伊谢尔伦 2017-04-18 10:57:16

    Read the file in chunks, compute one partial result set per chunk, and aggregate the result sets at the end.
    If you are processing text, it helps to know the number of lines in advance.
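One way to apply "chunk, then aggregate" to the word-frequency problem is to partition words by hash into bucket files, so that every occurrence of a word lands in the same bucket, then count one bucket at a time and keep only the best k entries overall. The sketch below is an illustration under assumptions, not the answer author's code: it assumes UTF-8 text, that the distinct words of any single bucket fit in memory, and that `buckets` is positive. The names `PartitionedTopK` and `topK` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class PartitionedTopK {
    static List<Map.Entry<String, Integer>> topK(Path input, int buckets, int k) throws IOException {
        Path dir = Files.createTempDirectory("buckets");
        // Phase 1: route each word to a bucket file chosen by its hash,
        // so all occurrences of a word end up in the same file.
        BufferedWriter[] out = new BufferedWriter[buckets];
        try {
            for (int i = 0; i < buckets; i++) {
                out[i] = Files.newBufferedWriter(dir.resolve("bucket" + i), StandardCharsets.UTF_8);
            }
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    for (String w : line.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                        if (w.isEmpty()) continue;
                        BufferedWriter bw = out[Math.floorMod(w.hashCode(), buckets)];
                        bw.write(w);
                        bw.newLine();
                    }
                }
            }
        } finally {
            for (BufferedWriter bw : out) {
                if (bw != null) bw.close();
            }
        }
        // Ascending by count; ties put the alphabetically later word first,
        // so the min-heap evicts it before an alphabetically earlier one.
        Comparator<Map.Entry<String, Integer>> weakestFirst =
                Map.Entry.<String, Integer>comparingByValue().thenComparing(
                        Map.Entry.<String, Integer>comparingByKey(Comparator.<String>reverseOrder()));
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        // Phase 2: count one bucket at a time; only that bucket's map is in memory.
        for (int i = 0; i < buckets; i++) {
            Path bucket = dir.resolve("bucket" + i);
            Map<String, Integer> counts = new HashMap<>();
            try (BufferedReader in = Files.newBufferedReader(bucket, StandardCharsets.UTF_8)) {
                String w;
                while ((w = in.readLine()) != null) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                heap.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
                if (heap.size() > k) heap.poll(); // drop the current weakest entry
            }
            Files.delete(bucket);
        }
        Files.delete(dir);
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(weakestFirst.reversed()); // most frequent first
        return result;
    }
}
```

Because a word never appears in two buckets, each bucket's counts are final, which is why a single global heap of size k is enough for the aggregation step.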

  • 高洛峰

    高洛峰 2017-04-18 10:57:16

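    Linux has a command called split that can quickly divide a large text file into smaller files, which can then be processed conveniently, even concurrently. Splitting the input into pieces that fit in memory and then combining the partial results is the idea behind external sorting.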
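For readers staying in Java, the following sketch does roughly what `split -l` does: it writes consecutive runs of lines to separate part files. It is an illustration only. The class name `LineSplitter`, the method `splitByLines`, and the `part0`, `part1`, ... file names are assumptions, and it expects `outDir` to exist and `linesPerFile` to be positive.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    // Writes consecutive runs of at most linesPerFile lines to part0, part1, ... in outDir.
    static List<Path> splitByLines(Path input, Path outDir, int linesPerFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter out = null;
            try {
                String line;
                int linesInPart = 0;
                while ((line = in.readLine()) != null) {
                    // Start a new part file when there is none yet or the current one is full.
                    if (out == null || linesInPart == linesPerFile) {
                        if (out != null) out.close();
                        Path part = outDir.resolve("part" + parts.size());
                        parts.add(part);
                        out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        linesInPart = 0;
                    }
                    out.write(line);
                    out.newLine();
                    linesInPart++;
                }
            } finally {
                if (out != null) out.close();
            }
        }
        return parts;
    }
}
```

Each returned part can then be processed on its own. Note that for word counting a word may occur in several parts, so the per-part counts still have to be merged afterwards.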

  • 怪我咯

    怪我咯 2017-04-18 10:57:16

    Memory is like scratch paper: once a page is full, you turn to a fresh one. Data you have finished with, or never needed, is simply thrown away.

    A simple example: create a buffer variable buff with a fixed size, open a file stream, and fill the buffer from it. Once the buffer is full, scan it for the content you are looking for, and record any matches in a separate counter variable. Then clear buff and continue reading from where the previous read stopped. Repeat until the whole file has been read, at which point the count is complete.
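The buffer loop described above can be sketched in Java as follows. To keep it short, the example counts occurrences of a single byte, which avoids the extra care needed when a multi-byte pattern straddles two buffer fills. `BufferedScan` and `countByte` are illustrative names, and `bufferSize` is assumed to be positive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedScan {
    // Counts how often `target` occurs in the file while holding
    // at most bufferSize bytes of it in memory at any time.
    static long countByte(Path file, byte target, int bufferSize) throws IOException {
        byte[] buff = new byte[bufferSize]; // reused for every read
        long count = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() stores up to bufferSize bytes in buff, continues from the
            // previous position, and returns -1 once the end of the file is reached.
            while ((n = in.read(buff)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buff[i] == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

For example, `BufferedScan.countByte(path, (byte) '\n', 8192)` counts newline bytes with an 8 KB buffer, regardless of how large the file is.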

  • 阿神

    阿神 2017-04-18 10:57:16

    Operating systems provide APIs for working with files larger than memory by treating the file as if it were memory:

    Memory mapping

    • mmap (POSIX)

    • CreateFileMapping (Windows)
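In Java, memory mapping is exposed through `FileChannel.map`. The sketch below maps the file one window at a time, since a single mapping is limited to `Integer.MAX_VALUE` bytes, and counts occurrences of a byte. The names `MappedScan` and `countByte` are hypothetical, and `windowSize` is assumed to be between 1 and `Integer.MAX_VALUE`.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedScan {
    // Counts how often `target` occurs, reading the file through mapped windows.
    static long countByte(Path file, byte target, long windowSize) throws IOException {
        long count = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                // The OS pages the mapped region in on demand; the bytes are
                // not copied into a Java array up front.
                MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (window.hasRemaining()) {
                    if (window.get() == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

Whether this beats a plain buffered read depends on the platform and access pattern. For a single sequential pass the two are often comparable, and mapping tends to pay off more for random access.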
