
php - Importing 100,000 rows into a table that already holds 1 million rows: how to detect duplicates?

Specific requirements

  1. The database table already contains 1 million rows;

  2. 100,000 rows of data need to be imported;

  3. If any of the data to be imported already exists in the table, the import must be terminated.

Question

How can each row be checked quickly against the existing data during the import, and how can the data be rolled back if the import fails?

PHP中文网 · 2785 days ago

All replies (4)

  • phpcn_u1582 (2017-05-16 13:11:47)

    Assume the target table is called target (1 million rows) and the table to be imported is called source (100,000 rows).
    A large part of the time here goes into deciding whether records in the two tables are the same. I take it the author wants every field (except the id field) to match before two records count as the same record.

    Compute a hash-like value for each record and store it in a new column (hashValue) added to both target and source; that is 1.1 million hash computations in total.
    Then use the following statement to import the data:

    IF EXISTS(
        SELECT TOP 1 1
          FROM source a
          INNER JOIN target b
            ON a.hashValue = b.hashValue
    ) BEGIN

        -- Return the duplicate records
        SELECT a.*
          FROM source a
          INNER JOIN target b
            ON a.hashValue = b.hashValue

    END
    ELSE
    BEGIN
        -- No duplicates: copy everything from source into target
        INSERT INTO target(field1, field2, field3, field4 ... hashValue)
        SELECT
          t.field1,
          t.field2,
          t.field3,
          t.field4, ..., t.hashValue
        FROM source t
    END
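
    A minimal PHP sketch of the hashing step above, assuming MySQL, a PDO connection, and illustrative column names (field1..field4, hashValue) that must be adapted to the real schema:

    <?php
    // Connection details below are placeholders; hashValue is the new column
    // described above. Assuming MySQL, one UPDATE per table computes the
    // digests server-side instead of pulling 1.1 million rows through PHP.
    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

    // CONCAT_WS joins the non-id columns with a separator so that, e.g.,
    // ('ab', 'c') and ('a', 'bc') produce different digests.
    foreach (['source', 'target'] as $table) {
        $pdo->exec(
            "UPDATE $table
                SET hashValue = MD5(CONCAT_WS('|', field1, field2, field3, field4))"
        );
    }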

  • 我想大声告诉你 (2017-05-16 13:11:47)

    If you factor in transactions, efficiency definitely won't be high.

    My personal approach, for reference:

    1. First check whether the data to be inserted already exists:

    select id from table where id in ('k1','k2','k3');

    2. If it does not exist, write the SQL to the database in one batch (see the sketch below).

    100,000 rows sent as a single batched SQL statement insert quickly: done within 2 seconds.

    You can read an article I wrote earlier on optimizing a 200,000-row batch insert to complete in 3 seconds:
    https://my.oschina.net/famous...
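
    A rough PDO sketch of this two-step idea (table and column names are placeholders, not from the answer; chunking keeps each statement under MySQL's placeholder and packet limits). A single transaction around the batched inserts stays fast while still giving the rollback the question asks for:

    <?php
    // $rows holds the 100,000 rows to import, each shaped like
    // ['id' => 1, 'field1' => '...', 'field2' => '...'] (names are placeholders).
    function importRows(PDO $pdo, array $rows): void
    {
        // Step 1: look for ids that already exist, 1000 per IN() query,
        // so each statement stays within placeholder/packet limits.
        foreach (array_chunk(array_column($rows, 'id'), 1000) as $ids) {
            $in   = implode(',', array_fill(0, count($ids), '?'));
            $stmt = $pdo->prepare("SELECT 1 FROM target WHERE id IN ($in) LIMIT 1");
            $stmt->execute($ids);
            if ($stmt->fetchColumn() !== false) {
                throw new RuntimeException('Duplicate data found, import aborted');
            }
        }

        // Step 2: multi-row INSERTs inside a single transaction, so any
        // failure rolls the whole import back.
        $pdo->beginTransaction();
        try {
            foreach (array_chunk($rows, 1000) as $chunk) {
                $values = implode(',', array_fill(0, count($chunk), '(?, ?, ?)'));
                $params = [];
                foreach ($chunk as $r) {
                    array_push($params, $r['id'], $r['field1'], $r['field2']);
                }
                $pdo->prepare("INSERT INTO target (id, field1, field2) VALUES $values")
                    ->execute($params);
            }
            $pdo->commit();
        } catch (Throwable $e) {
            $pdo->rollBack();
            throw $e;
        }
    }

    Note that check-then-insert is not safe under concurrent writers; the unique-index approach in the next reply closes that gap.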

  • 滿天的星座 (2017-05-16 13:11:47)

    My personal take: the key point is duplicate detection, so create a unique index and let it guarantee uniqueness during the import (a brief sketch follows). For the batch import itself, you can refer to @石记's method.
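
    A hedged illustration of this idea (assuming MySQL and an illustrative business key on field1/field2): add the unique index once, and any duplicate row then raises an integrity error that aborts the surrounding transaction, matching requirement 3:

    <?php
    // Assumes $pdo is an existing PDO connection in ERRMODE_EXCEPTION mode;
    // the indexed columns are placeholders for the table's real business key.
    $pdo->exec('ALTER TABLE target ADD UNIQUE KEY uniq_row (field1, field2)'); // one-time DDL

    $pdo->beginTransaction();
    try {
        // Import as usual; inserting a duplicate now throws instead of landing silently.
        $pdo->prepare('INSERT INTO target (field1, field2) VALUES (?, ?)')
            ->execute(['a', 'b']);
        $pdo->commit();
    } catch (PDOException $e) {
        $pdo->rollBack(); // duplicate key -> terminate the import
        throw $e;
    }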

  • 怪我咯 (2017-05-16 13:11:47)

    Following up on this question: a big problem came up during the subsequent import (it was very slow), so I wrote an article about it:

    /a/11...

    I hope it can serve as a reference for anyone who sees this later.
