Specific needs
The database table already contains 1 million rows;
100,000 rows need to be imported;
If any row to be imported already exists in the table, the import should be terminated;
Questions
How can each row be compared and imported quickly, and how can the data be rolled back if the import fails?
phpcn_u15822017-05-16 13:11:47
Assume the target table is called target (1 million rows) and the table to be imported is called source (100,000 rows).
Most of the time in this scenario is spent determining whether records in the two tables are identical. I assume the author considers two records the same only when every field matches (except the id field).
Compute a hash-like value for each record and store it in a newly added column (hashValue) on both target and source. That is 1.1 million hash computations in total.
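A minimal sketch of adding and populating that column (assuming SQL Server; field1 through field4 stand in for the actual business columns):
ALTER TABLE target ADD hashValue VARBINARY(16);
ALTER TABLE source ADD hashValue VARBINARY(16);

-- MD5 over the concatenated business columns; '|' separates values to reduce collisions
UPDATE target
SET hashValue = HASHBYTES('MD5', CONCAT(field1, '|', field2, '|', field3, '|', field4));

UPDATE source
SET hashValue = HASHBYTES('MD5', CONCAT(field1, '|', field2, '|', field3, '|', field4));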
Then use the following statement to import the data:
IF EXISTS(
SELECT TOP 1 1
FROM source a
INNER JOIN target b
ON a.hashValue = b.hashValue
) BEGIN
-- return the duplicate records
SELECT a.*
FROM source a
INNER JOIN target b
ON a.hashValue = b.hashValue
END
ELSE
BEGIN
INSERT INTO target(field1, field2, field3, field4 ... hashValue)
SELECT
t.field1,
t.field2,
t.field3,
t.field4, ..., t.hashValue
FROM source t
END
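Since the original question also asks for a rollback when the import fails, the INSERT branch above could be wrapped in an explicit transaction (a minimal sketch, assuming SQL Server):
BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO target(field1, field2, field3, field4, hashValue)
    SELECT t.field1, t.field2, t.field3, t.field4, t.hashValue
    FROM source t;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- undo the partial import on any error
    ROLLBACK TRANSACTION;
END CATCH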
我想大声告诉你2017-05-16 13:11:47
If you use a transaction for this, the efficiency will definitely not be high.
My personal idea, for reference:
1. First check whether the rows to be inserted already exist:
select id from table where id in ('k1', 'k2', 'k3');
2. If they do not exist, write the data to the database with one SQL statement (see the sketch at the end of this answer).
100,000 rows, one SQL statement: the insert finishes within 2 seconds.
You can read an article I wrote earlier on optimizing a 200,000-row batch insert to complete in 3 seconds:
https://my.oschina.net/famous...
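A minimal sketch of the two steps above (table and column names are placeholders, and only three rows are shown for brevity):
-- Step 1: check which of the incoming ids already exist
SELECT id FROM target WHERE id IN ('k1', 'k2', 'k3');

-- Step 2: if nothing was returned, insert everything with a single multi-row statement
INSERT INTO target (id, field1, field2)
VALUES ('k1', 'a1', 'b1'),
       ('k2', 'a2', 'b2'),
       ('k3', 'a3', 'b3');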
滿天的星座2017-05-16 13:11:47
My personal take: the key point is deduplication, so create a unique index to guarantee uniqueness during the import. For the batch import itself, you can refer to @石记's method; see the sketch below.
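For example (a sketch; the columns that define "the same record" are placeholders), a unique index lets the database itself reject duplicate rows during the import:
CREATE UNIQUE INDEX ux_target_business_key ON target (field1, field2, field3);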
怪我咯2017-05-16 13:11:47
Following up on this question: a serious problem came up during the subsequent import (the data import was very slow), so I wrote an article about it:
/a/11...
I hope it can serve as a reference for anyone who runs into this later.