Home  >  Q&A  >  body text

mysql - 如何在数据库里优化 汉明距离 查询?

项目里需要一个搜索相似图片的功能,百度了一些dhash的算法,生成了16个长度的hash值,在mysql里这样查询:

 SELECT pk, hash, 
 BIT_COUNT(   CONV(hash, 16, 10) ^ CONV('4c8e3366c275650f', 16, 10) ) as hamming_distance   
 FROM image_hashes   HAVING hamming_distance < 4   ORDER BY hamming_distance ASC; 

经测试,15W条数据,搜索需要很长时间。第二次速度会快些,有啥办法可以优化?

PHP中文网PHP中文网2741 days ago1762

reply all(2)I'll reply

  • 巴扎黑

    巴扎黑2017-04-17 14:56:25

    Thank you for the invitation.

    Sorry to say, I have never done this, but I have heard of Hamming distance before. 15k data, not a lot, but using the mysql function, there is no way to create an index. . .

    I probably searched and found the same problem.

    1. You can refer to mysql picture Hamming distance calculation, nearly 40w XOR calculation, how to break

    2. Hamming distance on binary strings in SQL

    3. You can try mysql’s memory table. Introduction to the features and use of MySQL memory table

    reply
    0
  • 大家讲道理

    大家讲道理2017-04-17 14:56:25

    Sorry, I have no experience with "Hamming distance".

    If there is only 15W data, you can load the data into memory and cache it, and then perform calculations in the program. After all, SQL is not good at doing this kind of thing.

    Based on your SQL, there really isn’t much room for optimization. The only things I can think of are the following:

      Use
    • in the

      hashchar column instead of varchar

    • Discard ORDER BY hamming_distance ASC and sort by program instead

    It is recommended to send the DDL statement and EXPLAIN results for analysis together, but the biggest performance checkpoint should be BIT_COUNT(CONV(hash, 16, 10) ^ CONV('4c8e3366c275650f', 16, 10)) as hamming_distance, which is actually equivalent to a full table scan.

    reply
    0
  • Cancelreply