I have a 36.6 GB CSV file that needs to be deduplicated and imported into a database (the order doesn't matter; the result just needs to be a table with no duplicate rows). How should I handle this?
PHPz 2017-04-17 13:29:41
If the Foo field is not allowed to repeat, just declare it UNIQUE and duplicate rows will be rejected automatically:
CREATE TABLE xxx (
    ...
    Foo VARCHAR(255) UNIQUE NOT NULL,
    ...
);
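For example, a minimal sketch assuming MySQL (the table name, columns, VARCHAR length, and CSV layout are placeholders): loading with the IGNORE keyword skips rows that would violate the unique constraint instead of aborting the whole import.

-- Hypothetical schema; adjust the columns to match the CSV.
CREATE TABLE dedup_target (
    Foo VARCHAR(255) NOT NULL,
    Bar VARCHAR(255),
    UNIQUE KEY uk_foo (Foo)
);

-- IGNORE skips rows whose Foo already exists instead of
-- failing on the first duplicate.
LOAD DATA LOCAL INFILE '/path/to/big.csv'
IGNORE INTO TABLE dedup_target
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(Foo, Bar);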
大家讲道理 2017-04-17 13:29:41
You can also import everything into the database first and then delete the duplicate rows with a SQL statement.
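A hedged sketch of that approach in MySQL (t_raw and t_dedup are hypothetical names): load the raw CSV into a staging table, copy the distinct rows into a fresh table, then swap the tables.

-- Assumes the raw CSV has already been loaded into t_raw.
CREATE TABLE t_dedup LIKE t_raw;
INSERT INTO t_dedup SELECT DISTINCT * FROM t_raw;  -- whole-row deduplication
DROP TABLE t_raw;
RENAME TABLE t_dedup TO t_raw;

Note that with a file this size the SELECT DISTINCT needs substantial temporary space, and you briefly hold two copies of the data.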
伊谢尔伦 2017-04-17 13:29:41
Create a unique index on the field(s) that may contain duplicates, then use INSERT IGNORE INTO ... when inserting, as in the sketch below.
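A small sketch assuming MySQL, a hypothetical target table t_clean, and Foo as the column that must stay unique:

-- Unique index on the column that must not repeat.
ALTER TABLE t_clean ADD UNIQUE INDEX uk_foo (Foo);

-- Rows whose Foo already exists are silently skipped
-- instead of raising a duplicate-key error.
INSERT IGNORE INTO t_clean (Foo, Bar) VALUES
('a', '1'),
('a', '2');   -- only the first ('a', '1') row is kept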
怪我咯 2017-04-17 13:29:41
You can do it in bash: sort the file first, then use awk to compare each line with the previous one and only write it to a new file if it differs. This is actually not slow, but it can require a lot of disk space for the temporary sort files; see the sketch below.
A better approach is to let the database handle it during the import itself, for example by defining unique fields as mentioned above.
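A rough sketch of the bash pipeline (the file names, memory size, and temp directory are assumptions, and the CSV is assumed to have no header line): sort the file, then keep only lines that differ from the previous one.

# Point -T at a disk with enough free space; sort's temp files
# can approach the size of the input.
sort -S 4G -T /data/tmp big.csv \
  | awk '$0 != prev { print; prev = $0 }' > big_dedup.csv

sort -u big.csv > big_dedup.csv would collapse this into one step; the explicit awk stage just mirrors the adjacent-line comparison described above.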