
Deduplicating a large data import (36.6 GB) into MySQL

I have a 36.6 GB CSV file that needs to be deduplicated and imported into the database (order doesn't matter; the result just needs to be a table with no duplicate rows). How should I handle this?

ringa_lee · 2764 days ago

4 replies

  • PHPz

    2017-04-17 13:29:41

    If the Foo field must not contain duplicates, just declare it UNIQUE and duplicate rows will be rejected automatically:

    CREATE TABLE xxx (
       ...
       Foo VARCHAR(255) UNIQUE NOT NULL,  -- VARCHAR needs an explicit length in MySQL
       ...
    );
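
    With the UNIQUE constraint in place, the import itself can skip duplicate rows via the IGNORE keyword of LOAD DATA. A minimal sketch, reusing the table and column names above; the file path is a hypothetical placeholder:

    -- Bulk-load the CSV; IGNORE makes MySQL skip rows that would
    -- violate the UNIQUE constraint instead of aborting the import.
    LOAD DATA INFILE '/data/big.csv'   -- hypothetical path
    IGNORE INTO TABLE xxx
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (Foo);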
    

  • 大家讲道理

    2017-04-17 13:29:41

    You can import everything into the database first, then delete the duplicate rows with SQL.
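
    One way to do that deduplication, assuming the raw data was loaded into a staging table named raw (table names are hypothetical): copy only the distinct rows into a fresh table.

    -- Build an empty copy of the staging table, fill it with the
    -- distinct rows only, then drop the staging table.
    CREATE TABLE deduped LIKE raw;
    INSERT INTO deduped SELECT DISTINCT * FROM raw;
    DROP TABLE raw;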

  • 伊谢尔伦

    2017-04-17 13:29:41

    Create a unique index on the fields that may contain duplicates.

    When inserting, use INSERT IGNORE INTO ...
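
    A minimal sketch of this approach, with hypothetical table name t and column Foo:

    -- Add a unique index on the column that may contain duplicates.
    ALTER TABLE t ADD UNIQUE INDEX idx_foo (Foo);

    -- With IGNORE, rows whose Foo already exists are silently
    -- skipped instead of raising a duplicate-key error.
    INSERT IGNORE INTO t (Foo) VALUES ('some value');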

  • 怪我咯

    2017-04-17 13:29:41

    You can do it in bash: sort the file first, then use awk to check whether adjacent lines are the same, outputting a line to a new file only if it differs from the previous one (see the sketch below). This is actually not slow, but it may require a lot of disk space.
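
    A sketch of that pipeline, assuming the input file is big.csv (hypothetical name) and duplicates are byte-identical lines:

    # Sort so duplicate lines become adjacent, then print a line only
    # when it differs from the previous one.
    sort big.csv | awk '$0 != prev { print } { prev = $0 }' > deduped.csv
    # `sort -u big.csv > deduped.csv` does the same thing in one step.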

    A better approach is to let the database handle it during the import, e.g. by defining unique fields as mentioned above.
