Home  >  Q&A  >  body text

group-by - mysql group by 优化的问题

我有一张接近八千万条数据的彩虹表,然后里面有二十多万条的重复数据,我想使用GROUP BY 进行分组,但是使用了GROUP BY语句以后,mysql会生出一张零时表,而零时表,把我的硬盘全部沾满了,所以,求个办法,能帮去去处这重复的数据?

我的sql:

sql:

SELECT COUNT(id) FROM password WHERE length = 4 GROUP BY ciphertext

ringa_leeringa_lee2766 days ago587

reply all(1)I'll reply

  • 迷茫

    迷茫2017-04-17 16:33:14

    It is not recommended to use a SQL statement to solve this kind of problem, because the database cannot afford it. It can be implemented using a program (such as writing a Python script). The steps are as follows:

    1. Copy the table structure of table A (80 million rows) to generate an empty table B;

    2. Make a unique index on the ciphertext column of B;

    3. Traverse each row of table A and insert it into table B. You can use REPLACEINSERT INTO ... ON DUPLICATE KEY UPDATE ...

    4. Verify the data in table B. If the result is correct, delete table A and rename table B to table A.

    reply
    0
  • Cancelreply