python - Feature extraction strategy for different folders

Question

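Problem description: There are many folders, and each folder contains many files. The goal is to extract a feature for each folder, where the feature is defined as one or a small number of files in that folder (name, content hash, and relative position). Is there a good algorithm or strategy for solving this? What I had in mind was a brute...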

高洛峰 · Answer

Randomly select a fixed number of files from the current folder, combine their file names, sizes, modification times, permissions, and so on into a single hash, and then check how often two folders end up with the same hash. The duplication rate will generally not be high, because even when the file names and sizes are the same, the modification times usually differ.
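A rough sketch of that idea in Python, in case it helps. The function names `folder_signature` and `duplication_rate`, the default sample size of 3, the fixed seed, and the choice of SHA-256 are all my own assumptions for illustration, not something the answer specifies. The seed is there only so that the "random" pick is repeatable for the same folder contents.

```python
import hashlib
import os
import random
from collections import Counter


def folder_signature(folder, sample_size=3, seed=0):
    """Hash the metadata of a fixed-size sample of files directly inside `folder`.

    Hypothetical helper: sample_size and seed are illustrative defaults.
    """
    # Sort the names so the same folder contents always give the same sample.
    names = sorted(entry.name for entry in os.scandir(folder) if entry.is_file())
    # A seeded generator keeps the random choice reproducible between runs.
    rng = random.Random(seed)
    chosen = sorted(rng.sample(names, min(sample_size, len(names))))

    digest = hashlib.sha256()
    for name in chosen:
        info = os.stat(os.path.join(folder, name))
        # Name, size, modification time and permission bits, as in the answer.
        record = f"{name}|{info.st_size}|{info.st_mtime_ns}|{info.st_mode}\n"
        digest.update(record.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()


def duplication_rate(folders, **kwargs):
    """Fraction of folders whose signature is shared with at least one other folder."""
    signatures = [folder_signature(folder, **kwargs) for folder in folders]
    if not signatures:
        return 0.0
    counts = Counter(signatures)
    colliding = sum(1 for sig in signatures if counts[sig] > 1)
    return colliding / len(signatures)
```

You would call `duplication_rate(list_of_folder_paths)` and raise `sample_size` if the rate turns out too high. Note that this only looks at metadata of files directly in each folder. If the signature has to survive copying (which changes modification times), hashing file contents instead of `st_mtime_ns` would probably be the safer choice, at the cost of reading the files.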
