How to Quickly Remove Duplicates from a Massive MySQL Database?

Barbara Streisand
2024-12-29 22:36:12

MySQL: Quickly Removing Duplicates from a Massive Database

Duplicate rows in a large MySQL table need to be cleaned up quickly when query execution time is the main constraint. The case considered here is a table of over a million rows in which duplicates may account for as much as half of the stored data.

The aim is to keep exactly one row for each unique combination of "text1" and "text2". Where at least one of the duplicates has a non-null "text3", the surviving row should carry a non-null "text3" value, so that an existing value is not discarded simply because another copy of the row had NULL there.

Conventional approaches based on DISTINCT or GROUP BY work well on smaller tables, but on this one the queries ran for more than 20 minutes, which was too slow. A more efficient solution is needed.
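For comparison, a GROUP BY version of the cleanup might look like the sketch below. It is only an illustration of the slow baseline, not the original query: the table name yourtable is a placeholder, tmp_grouped is a hypothetical name, and only the three text columns are carried over.

```sql
-- Hypothetical baseline: one row per (text1, text2) pair.
-- MAX() skips NULLs, so text3 is NULL only when every copy in the group had NULL.
CREATE TABLE tmp_grouped AS
SELECT text1, text2, MAX(text3) AS text3
FROM yourtable
GROUP BY text1, text2;
```

Without a suitable index on (text1, text2), MySQL typically has to build the groups in a temporary table, which is where the time goes on a table of this size.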

One such solution copies the rows into a new table that has a unique index on the two columns, and combines "ON DUPLICATE KEY UPDATE" with "IFNULL()" to merge the duplicates as they arrive:

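-- Create an empty copy of the table with the same columns and indexes.
CREATE TABLE tmp LIKE yourtable;

-- A unique index on (text1, text2) makes every duplicate pair collide on insert.
ALTER TABLE tmp ADD UNIQUE (text1, text2);

-- Copy all rows; on a collision, fill text3 only if the kept row's text3 is NULL.
-- tmp.text3 is qualified so it cannot be confused with the source column.
INSERT INTO tmp SELECT * FROM yourtable
ON DUPLICATE KEY UPDATE text3=IFNULL(tmp.text3, VALUES(text3));

-- Swap the deduplicated table into place, then discard the original.
RENAME TABLE yourtable TO deleteme, tmp TO yourtable;

DROP TABLE deleteme;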
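To see how the unique index and IFNULL() interact, the same statements can be tried on a small throwaway table. The sketch below is an illustration under assumptions: the names dedup_demo and dedup_demo_tmp, the id column, and the column types are all hypothetical, and the ORDER BY is there only to make the outcome of the demo predictable.

```sql
-- Hypothetical demo table; the schema is an assumption, not the real one.
CREATE TABLE dedup_demo (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  text1 VARCHAR(50) NOT NULL,
  text2 VARCHAR(50) NOT NULL,
  text3 VARCHAR(50) NULL
);

INSERT INTO dedup_demo (text1, text2, text3) VALUES
  ('a', 'x', NULL),
  ('a', 'x', 'first'),
  ('a', 'x', 'second'),
  ('b', 'y', NULL),
  ('b', 'y', NULL);

-- Same steps as above, applied to the demo table.
CREATE TABLE dedup_demo_tmp LIKE dedup_demo;
ALTER TABLE dedup_demo_tmp ADD UNIQUE (text1, text2);

INSERT INTO dedup_demo_tmp
SELECT * FROM dedup_demo ORDER BY id
ON DUPLICATE KEY UPDATE
  text3 = IFNULL(dedup_demo_tmp.text3, VALUES(text3));

SELECT text1, text2, text3 FROM dedup_demo_tmp ORDER BY text1, text2;
-- Result:
--   a | x | first   (NULL was filled by 'first'; 'second' was ignored)
--   b | y | NULL    (no copy had a value to keep)
```

The first row seen for each pair is the one that survives. A later duplicate changes it only when the kept text3 is still NULL, so if several copies have different non-null values, which one is kept depends on the order in which rows are read. On MySQL 8.0.20 and later, VALUES() in this position is deprecated and raises a warning, although the statement still runs.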

This approach avoids GROUP BY and DISTINCT, and with them the temporary table and file sort those operations can require, which become a major cost when the intermediate result is large. A full scan of the original table remains unavoidable, and each row is checked against the unique index as it is inserted, but on a table of this size the method can be considerably faster than grouping.
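One way to check whether grouping would hit these costs is to run EXPLAIN on a grouped query against the original table before the cleanup. This is a sketch: the query is a hypothetical stand-in for whatever GROUP BY statement was being used, with yourtable as the placeholder table name.

```sql
-- Inspect the plan for a hypothetical grouped query on the original table.
EXPLAIN
SELECT text1, text2, MAX(text3) AS text3
FROM yourtable
GROUP BY text1, text2;
```

If the Extra column of the output reports "Using temporary", and on some MySQL versions also "Using filesort", the grouped query is building and possibly sorting an intermediate table. The exact wording depends on the server version and on the indexes available.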
