It is a common interview question, and some major companies also like to use this question as an interview question.
Question: For example, there is a 1g file, which stores these disordered and non-unique numbers. How to use 100m to complete the overall sorting?
The implementation process is:
1. First read the large file line by line, with every 10,000 lines in a group, and then write the file after sorting, the file name is similar to t1.txt, t2.txt... until the entire file is read and split,
2. Then traverse all files, and each file first reads the first In one line, is put into the temporary sorting array $tmpNums**, and then the minimum value is taken, ** is put into the temporary storage array $nums, and the index value of the current position is recorded at the same time $idx
3. From Take the next number from the file corresponding to the index where the minimum value is located, put it into the current index of the temporary sorting array, and then continue to repeat the operation of step 2 until all the contents of all files are read, then the entire file is reordered.The following is a PHP multi-way merge sort demo code, which is just a simple skeleton structure. For example, min takes the minimum value part, which can be expanded into a fixed-length minimum heap implementation, or a priority queue, which can save the value and the file where it is located. .
PHP multi-channel merge demo code:
function multiWaySort() { // 读取所有的文件描述符 $fds = []; $dir = scandir('./runtime/txt/'); foreach ($dir as $file) { if ($file != '.' && $file != '..') { $fds[] = fopen('./runtime/txt/' . $file, 'rb'); } } // 读取每个文件的第一行内容,放入临时排序数组 $tmpNums = []; foreach ($fds as $fd) { $tmpNums[] = (int)fgets($fd); } $nums = []; $count = 0; while (true) { if (empty($tmpNums)) break; // 最小值放入临时存储数组 $value = min($tmpNums); $nums[] = $value; // 读取最小值所在索引,对应的文件下一行内容 $idx = array_search($value, $tmpNums); $next = fgets($fds[$idx]); if (!$next) { unset($tmpNums[$idx]); unset($fds[$idx]); } else { $tmpNums[$idx] = (int)$next; } // 临时存储数组到达一定数量追加写入文件一次 if (count($nums) == 20) { foreach ($nums as $value) { $f = fopen('./runtime/result.txt', 'ab+'); fwrite($f, $value . PHP_EOL); } $nums = []; } if ($count == 4999999) { continue; } $count++; } }
Recommended learning: "PHP Video Tutorial"
Reference :
《Detailed explanation of K-way merge sorting (actual combat)》
《One article to understand the problem of large file sorting/external memory sorting》