Sample code sharing for PHP to implement file content deduplication and sorting
This article deduplicates and sorts the contents of a file in two ways, first with PHP and then with the Linux sort command, and provides complete demonstration code for both.
Write 1,000,000 numbers to a file, one number per line
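```php
<?php
// Generate 1,000,000 random integers in the range 0-999999, one per line.
// Note: FILE_APPEND means re-running the script appends to an existing
// user_id.txt, so delete the file first if you run it again.
$file = 'user_id.txt';
$num = 1000000;
$tmp = '';

for ($i = 0; $i < $num; $i++) {
    $tmp .= mt_rand(0, 999999) . PHP_EOL;
    // Flush the buffer every 1000 numbers and after the last one,
    // so the whole data set never has to be held in memory at once.
    if (($i > 0 && $i % 1000 == 0) || $i == $num - 1) {
        file_put_contents($file, $tmp, FILE_APPEND);
        $tmp = '';
    }
}
?>
```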
Check the number of lines in the file
```
wc -l user_id.txt
1000000 user_id.txt
```
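If `wc` is not available (on Windows, for example), the count can be reproduced in PHP. The following is a minimal sketch, assuming that, like `wc -l`, it counts `\n` characters (so a final line without a trailing newline is not counted). `countLines()` is a hypothetical helper, not part of the original article.

```php
<?php
// Count lines the way `wc -l` does: by counting "\n" characters.
// Reads the file in fixed-size chunks so memory use stays flat
// regardless of file size. countLines() is a hypothetical helper.
function countLines(string $path, int $chunkSize = 65536): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $lines += substr_count($chunk, "\n");
    }
    fclose($handle);
    return $lines;
}
```

Because the generator above terminates every number with `PHP_EOL` (`\n` on Linux), `countLines('user_id.txt')` should agree with `wc -l` there. Reading in 64 KB chunks keeps memory use constant instead of loading the whole file.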
Because 1,000,000 lines of data must be processed, the PHP memory limit is raised to 256M to avoid running out of memory during execution.
```php
<?php
/**
 * Deduplicate and sort the contents of a file.
 * @param string $source    Source file
 * @param string $dest      Destination file
 * @param string $order     Sort order ('asc' or 'desc')
 * @param int    $sort_flag Sort type flag
 */
function fileUniSort($source, $dest, $order = 'asc', $sort_flag = SORT_NUMERIC)
{
    // Read the file contents
    $file_data = file_get_contents($source);

    // Split the contents into an array, one element per line
    $file_data_arr = explode(PHP_EOL, $file_data);

    // Remove empty lines
    $file_data_arr = array_filter($file_data_arr, 'filter');

    // Deduplicate: flipping turns values into keys (duplicates collapse),
    // flipping again turns the keys back into values
    $file_data_arr = array_flip($file_data_arr);
    $file_data_arr = array_flip($file_data_arr);

    // Sort
    if ($order == 'asc') {
        sort($file_data_arr, $sort_flag);
    } else {
        rsort($file_data_arr, $sort_flag);
    }

    // Join the array back into file contents
    $file_data = implode(PHP_EOL, $file_data_arr) . PHP_EOL;

    // Write the result to the destination file
    file_put_contents($dest, $file_data);
}

// Filter out empty lines, but keep the line "0" (which PHP treats as falsy)
function filter($data)
{
    if (!$data && $data !== '0') {
        return false;
    }
    return true;
}

// Raise the memory limit to 256M
ini_set('memory_limit', '256M');

$source = 'user_id.txt';
$dest = 'php_sort_user_id.txt';

fileUniSort($source, $dest);
?>
```
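The double `array_flip()` is the least obvious step in `fileUniSort()`. Here is a minimal sketch of the same filter, flip, flip sequence on a small in-memory string; `dedupeLines()` is a hypothetical name used only for illustration, and the closure plays the role of the `filter()` callback.

```php
<?php
// Demonstrates the filter + array_flip + array_flip steps of fileUniSort()
// on a tiny sample. dedupeLines() is a hypothetical helper for this demo.
function dedupeLines(string $text): array
{
    // Split on "\n" (PHP_EOL on Linux) and drop empty lines. For strings,
    // this keeps '0', exactly like the original filter() callback.
    $lines = array_filter(
        explode("\n", $text),
        function ($line) { return $line !== ''; }
    );
    // array_flip() turns values into keys, so duplicates collapse into one
    // key. Flipping back restores a list of values. Integer-like strings
    // such as '42' become integer keys, so they come back as ints.
    return array_flip(array_flip($lines));
}

$unique = dedupeLines("5\n0\n3\n5\n\n0\n3\n");
sort($unique, SORT_NUMERIC);
echo implode(',', $unique), PHP_EOL; // prints 0,3,5
```

`array_unique()` does the same job more readably; the flip trick only works because every value is a string or an integer, which holds for lines read from a text file.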
Check the deduplicated and sorted file
```
wc -l php_sort_user_id.txt
632042 php_sort_user_id.txt

head php_sort_user_id.txt
0
1
2
3
5
7
8
9
11
12
...
```
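`wc -l` and `head` only show the line count and the first ten lines. To check the whole file, a single streaming pass can confirm that every value is strictly greater than the one before it, which means the file is both sorted and free of duplicates. This is a sketch assuming every non-blank line is an integer, as in the generated data; `isStrictlyAscending()` is a hypothetical helper.

```php
<?php
// Returns true if the file holds strictly ascending integers, one per line,
// i.e. it is sorted in ascending order and contains no duplicates.
// isStrictlyAscending() is a hypothetical helper, not from the article.
function isStrictlyAscending(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $previous = null;
    $ok = true;
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // ignore blank lines
        }
        $value = (int) $line;
        // Equal values mean a duplicate; smaller values mean unsorted
        if ($previous !== null && $value <= $previous) {
            $ok = false;
            break;
        }
        $previous = $value;
    }
    fclose($handle);
    return $ok;
}
```

For output produced with `$order = 'desc'`, the comparison would need to be reversed.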
The Linux sort command sorts the lines of text files.
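Syntax:
sort [OPTION]... [FILE]...
Options relevant here:
-u Output only unique lines (deduplication)
-n Compare lines by numeric value
-r Reverse the order (descending)
-o Write the result to the given output file instead of standard output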
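To relate these options to the PHP version, here is a rough in-memory analogue, assuming the input is already an array of lines. `sortLines()` is a hypothetical helper; it does not reproduce locale-dependent collation or other details of GNU `sort`.

```php
<?php
// Rough PHP counterparts of the sort options (hypothetical helper):
//   -u -> drop duplicates (array_unique)
//   -n -> compare as numbers (SORT_NUMERIC) instead of as strings
//   -r -> reverse the order (rsort instead of sort)
function sortLines(array $lines, bool $unique, bool $numeric, bool $reverse): array
{
    $flag = $numeric ? SORT_NUMERIC : SORT_STRING;
    if ($unique) {
        // With SORT_NUMERIC, '10' and '010' would count as duplicates,
        // much like `sort -u -n`
        $lines = array_unique($lines, $flag);
    }
    if ($reverse) {
        rsort($lines, $flag);
    } else {
        sort($lines, $flag);
    }
    return $lines;
}

$lines = ['10', '9', '10', '100'];
echo implode(' ', sortLines($lines, false, false, false)), PHP_EOL; // 10 10 100 9
echo implode(' ', sortLines($lines, true, true, false)), PHP_EOL;   // 9 10 100
echo implode(' ', sortLines($lines, true, true, true)), PHP_EOL;    // 100 10 9
```

The first call shows why `-n` matters for this data: compared as strings, `'100'` sorts before `'9'`.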
Use sort to deduplicate and sort the file
```
sort -uno linux_sort_user_id.txt user_id.txt
```
Check the deduplicated and sorted file
```
wc -l linux_sort_user_id.txt
632042 linux_sort_user_id.txt

head linux_sort_user_id.txt
0
1
2
3
5
7
8
9
11
12
...
```
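Both output files report 632042 lines and begin with the same values, which suggests the two methods agree. A byte-for-byte comparison settles it; below is a minimal PHP sketch, where `filesIdentical()` is a hypothetical helper. On Linux, `cmp php_sort_user_id.txt linux_sort_user_id.txt` does the same job.

```php
<?php
// Compares two files byte for byte, reading both in chunks so large
// files do not need to fit in memory. filesIdentical() is hypothetical.
function filesIdentical(string $a, string $b, int $chunkSize = 65536): bool
{
    // Clear PHP's stat cache so filesize() sees the current sizes
    clearstatcache(true, $a);
    clearstatcache(true, $b);
    if (filesize($a) !== filesize($b)) {
        return false; // different sizes cannot be identical
    }
    $ha = fopen($a, 'rb');
    $hb = fopen($b, 'rb');
    if ($ha === false || $hb === false) {
        throw new RuntimeException('Cannot open input files');
    }
    $same = true;
    while (!feof($ha)) {
        if (fread($ha, $chunkSize) !== fread($hb, $chunkSize)) {
            $same = false;
            break;
        }
    }
    fclose($ha);
    fclose($hb);
    return $same;
}
```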
Summary: both PHP and the Linux sort command can deduplicate and sort a file, and in this test both produce 632042 lines. Their execution times differ, but not by much; still, for file operations like this it is simpler to call the system command directly.
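Following that recommendation, a PHP script can simply delegate the work to `sort`. A minimal sketch, assuming a `sort` binary on the `PATH` and that `exec()` is not disabled in `php.ini`; `sortWithSystemCommand()` is a hypothetical helper that also returns the elapsed time, so the two approaches can be timed on the same machine.

```php
<?php
// Runs `sort -u -n -o DEST SOURCE` from PHP and returns the elapsed
// wall-clock time in seconds. sortWithSystemCommand() is a hypothetical
// helper; it assumes a `sort` binary is available on the PATH.
function sortWithSystemCommand(string $source, string $dest): float
{
    // Quote both paths so spaces or shell metacharacters are safe
    $cmd = sprintf(
        'sort -u -n -o %s %s 2>&1',
        escapeshellarg($dest),
        escapeshellarg($source)
    );
    $output = [];
    $status = 0;
    $start = microtime(true);
    exec($cmd, $output, $status);
    $elapsed = microtime(true) - $start;
    if ($status !== 0) {
        throw new RuntimeException('sort failed: ' . implode("\n", $output));
    }
    return $elapsed;
}
```

Wrapping the `fileUniSort()` call in the same pair of `microtime(true)` calls gives a comparable figure for the pure PHP version.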