
How to check for duplicates before batch-inserting data into MySQL?

I am using MySQL for statistics. I query a large amount of data, compute and reassemble it in various ways, and write the results back to the database. Given the data volume and performance concerns, the inserts have to be batched, and duplicate rows may occur. How do I detect the duplicates?

If I check for duplicates before inserting, every row needs its own SELECT, with the non-duplicates collected into an array for one batch insert, but that is a huge number of queries. Alternatively I could create a composite unique constraint, but then a single conflicting row makes the whole batch insert fail.

My current approach is to skip the check, insert everything, and then find and delete the duplicates afterwards.

Is there a better way?

Update (2016-01-19): a row counts as a duplicate when the values of certain fields are all the same.
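Given that definition, one option is a composite unique key over exactly those fields plus a batch INSERT IGNORE, which skips conflicting rows instead of aborting the whole statement. A minimal sketch, assuming a hypothetical table stats where duplicates are defined by the columns uid and stat_date:

```sql
-- Hypothetical table and column names; adjust to your schema.
ALTER TABLE stats
  ADD UNIQUE KEY uk_stats_dedup (uid, stat_date);

-- IGNORE skips rows that violate the unique key instead of
-- failing the whole multi-row insert.
INSERT IGNORE INTO stats (uid, stat_date, total)
VALUES
  (1, '2016-01-18', 100),
  (2, '2016-01-18', 250),
  (1, '2016-01-18', 100);  -- duplicate of the first row, silently skipped
```

This directly addresses the concern above that one conflicting row makes the whole batch fail.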

Asked by 阿神, 2742 days ago

All replies (5)

  • 怪我咯 · 2017-04-17 13:43:14

    You can try REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.

    Reference:
    http://blog.csdn.net/mchdba/article/details/8647560
    http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
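    A minimal sketch of both statements, assuming a hypothetical stats table with a unique key on uid:

    ```sql
    -- REPLACE: on a unique-key conflict, deletes the old row,
    -- then inserts the new one.
    REPLACE INTO stats (uid, total) VALUES (1, 100), (2, 250);

    -- Upsert: inserts, or updates the existing row on conflict.
    INSERT INTO stats (uid, total)
    VALUES (1, 100), (2, 250)
    ON DUPLICATE KEY UPDATE total = VALUES(total);
    ```

    Both require a PRIMARY KEY or UNIQUE index covering the fields that define a duplicate.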

  • PHP中文网 · 2017-04-17 13:43:14

    For batch imports into the database, consider the MySQL import tool mysqlimport, which can be set to ignore duplicate rows.
    http://www.runoob.com/mysql/mysql-database-import.html
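    mysqlimport is a command-line wrapper around LOAD DATA INFILE; its --ignore option corresponds to the IGNORE keyword, which skips rows that collide with an existing unique key. A sketch assuming a hypothetical CSV file and stats table:

    ```sql
    -- Equivalent of `mysqlimport --ignore`: rows that duplicate an
    -- existing unique key are skipped rather than aborting the load.
    LOAD DATA LOCAL INFILE '/tmp/stats.csv'
    IGNORE INTO TABLE stats
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (uid, stat_date, total);
    ```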

  • 高洛峰 · 2017-04-17 13:43:14

    I think your method of inserting first and then deleting is fine.
    The duplicates you mean are rows with a duplicate primary key, right? In that case the rows you want to insert must be the latest data, so delete the old rows first: assuming the primary key is 'uid', start a transaction, run 'delete ... where uid in (...)', insert the new rows, and commit the transaction.
    If you want to optimize further, run one 'select ... where uid in (...)' to find all the rows that already exist, and then skip inserting those duplicates.
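    The delete-then-insert approach above might look like this, assuming the primary key is uid as in the answer (table and values are hypothetical):

    ```sql
    START TRANSACTION;

    -- Remove any old versions of the rows about to be inserted.
    DELETE FROM stats WHERE uid IN (1, 2, 3);

    -- Insert the fresh data; COMMIT makes both steps visible atomically.
    INSERT INTO stats (uid, total) VALUES (1, 100), (2, 250), (3, 80);

    COMMIT;
    ```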

  • 迷茫 · 2017-04-17 13:43:14

    Insert everything into a temporary table first, then insert from the temporary table into the target table, filtering out the duplicates as you go.
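    A sketch of this staging approach, assuming a hypothetical target table stats where duplicates are defined by uid:

    ```sql
    -- Stage the raw batch in a temporary table (no constraints needed).
    CREATE TEMPORARY TABLE stats_staging LIKE stats;
    INSERT INTO stats_staging (uid, total) VALUES (1, 100), (2, 250);

    -- Copy over only the rows that do not already exist in the target.
    INSERT INTO stats (uid, total)
    SELECT s.uid, s.total
    FROM stats_staging s
    WHERE NOT EXISTS (SELECT 1 FROM stats t WHERE t.uid = s.uid);

    DROP TEMPORARY TABLE stats_staging;
    ```

    This replaces the per-row SELECT check from the question with a single set-based query.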

  • 迷茫 · 2017-04-17 13:43:14

    @好雨云 What he said about REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE is indeed a solution.

    However, INSERT ... ON DUPLICATE KEY UPDATE is the recommended one.

    With a large amount of data it is more efficient than REPLACE, because REPLACE handles a conflict by deleting the old row and inserting the new one, which incurs extra primary-key index maintenance on every duplicate.
