Home >Database >Mysql Tutorial >Database deduplication based on specified fields

Database deduplication based on specified fields

步履不停
步履不停Original
2019-06-19 13:14:597258browse

Database deduplication based on specified fields

Requirements: Remove duplicate data from a user table based on the name/email/card_num field;

Ideas: Use the group by method to query the 'duplicate' data, store the data in a temporary table, and then store the data in the temporary table into the specified table;

Misunderstandings and Solution: The group by method can only obtain some fields (specified fields without duplication), and cannot obtain complete data at one time. However, the id in the group by result set can be obtained through the max function, and then all the data can be queried based on the id set. Record.

Test ideas

  • Query the data after deduplication

SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num;

  • Get the id set from the deduplicated data

SELECT ID from (SELECT max(id ) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T;

  • Get the id set from the deduplicated data and obtain it from the source data Record list

SELECT * from users where id in (SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T);

Actual method

  • Acquire the id set from the deduplicated data, obtain the record list from the source data, and store these list data in a temporary In the table

create TEMP TABLE tmp_data as SELECT * from users where id in (SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name, email,card_num) as T);

  • Save the data in the temporary table into the specified data table, complete

insert into users_copy1 select * from tmp_data;

Detection

  • Is the detection result the same as the total number of data after deduplication in the first step of query

select count(*) from users_copy1;

Test results: 2300 pieces of data are duplicated among 1.4w pieces of data. The actual running result is 0.7s, which basically meets the current needs.

For more MySQL related technical articles, please visit the MySQL Tutorial column to learn!

The above is the detailed content of Database deduplication based on specified fields. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn