Home  >  Article  >  Backend Development  >  Interview question: Given 256M of memory, sort a 10G file (one number per line in the file)

Interview question: Given 256M of memory, sort a 10G file (one number per line in the file)

WBOY
WBOYOriginal
2016-08-10 09:07:185435browse

Given you 256M of memory, how to sort 10G files (1 number per line of file)? How to search a 10G file? How to count the number of occurrences of each keyword in a 10G file

Reply content:

Given you 256M of memory, how to sort 10G files (1 number per line of file)? How to search a 10G file? How to count the number of occurrences of each keyword in a 10G file

Exchange time for space
The specific implementation is to load files in batches and then calculate

java? Use nio and the idea of ​​using mapreduce

I don’t understand php, but this question seems familiar.
Let’s talk about your ideas.
1. Implementation of sorting
This is a typical problem of single-machine external sorting. The specific method is to first sort in chunks and then multi-channel merge into an output file.
2. Search
If the file cannot be processed, it can only be searched through traversal.
If the files can be processed, then the files have been sorted above and you can perform binary search.
3. Statistics
If the file cannot be processed, there is still no good way but to traverse it once.
If you have already captured the sequence, you can directly perform a binary search. Search both ends for the number of occurrences at the found position.

You can read the book "Programming Pearls", it seems to have this problem.

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn