Home  >  Q&A  >  body text

MySQL deduplication, why can only a few hundred or dozens of duplicate data be removed each time, when there are obviously hundreds of thousands of duplicate data?

Requirement: Delete the items with the same primaryid and drugname in DRUG, and keep the smallest row of drug_seq

Question: Mysql uses these methods to remove duplicates, why can only a few hundred or dozens of items be removed each time? Duplicate data is just like squeezing out toothpaste. Every time you execute it, you can remove a little more duplication. There are obviously hundreds of thousands of duplicate data. I have tried several methods, but they are still incomplete.

#Method 1:

DELETE FROM `drug2022` WHERE drug_seq IN(

SELECT drug_seq FROM (

SELECT drug_seq FROM `drug2022` WHERE (primaryid,drugname) IN (SELECT primaryid,drugname FROM `drug2022` GROUP BY primaryid,drugname HAVING COUNT(*) > 1)

AND

drug_seq NOT IN (SELECT MIN(drug_seq) FROM `drug2022` GROUP BY primaryid,drugname HAVING COUNT (*) > 1))AS a1);

#Method 2:

DELETE

FROM `drug2022`

WHERE drug_seq NOT IN (

) (SELECT t1.min_drug_seq

) FROM (SELECT MIN(drug_seq) AS min_drug_seq FROM `drug2022` GROUP BY drugname, primaryid HAVING COUNT(1) > 1) t1))

AND (drugname, primaryid) IN

(SELECT t2.drugname, t2.primaryid

) FROM (SELECT drugname, primaryid FROM `drug2022` GROUP BY drugname , primaryid HAVING COUNT(1) > 1) t2);

#Method 3:

DELETE t1

FROM `drug2022` t1,

`drug2022` t2

WHERE t1.primaryid = t2.primaryid

AND t1.drugname = t2.drugname

AND t1.drug_seq < t2.drug_seq;

#Method 4

DELETE

FROM `drug2022`

WHERE drug_seq NOT IN (SELECT * FROM (SELECT MIN(drug_seq) FROM `drug2022` GROUP BY primaryid, drugname) t2);

P粉029305743P粉029305743690 days ago671

reply all(0)I'll reply

No reply
  • Cancelreply