Home >Database >Mysql Tutorial >How to Efficiently Delete Duplicate Rows from a Table in SQL?

How to Efficiently Delete Duplicate Rows from a Table in SQL?

Mary-Kate Olsen
Mary-Kate OlsenOriginal
2025-01-19 02:47:08163browse

How to Efficiently Delete Duplicate Rows from a Table in SQL?

SQL: Eliminating Duplicate Rows in Tables

Challenge:

You're faced with a table lacking primary keys or constraints, containing multiple rows with identical data. The goal is to remove duplicate rows, preserving only a single instance of each unique row. We'll use a column named "key" to identify duplicates, ensuring only one entry exists for each distinct "key" value.

Solution:

This process involves two key steps:

  1. Identifying the First Occurrence: Locate the initial instance of each duplicate row.
  2. Removing Subsequent Duplicates: Delete all rows that are not the first occurrence identified in step 1.

The following SQL command accomplishes this:

<code class="language-sql">DELETE FROM dups a USING (
    SELECT MIN(ctid) as ctid, key
    FROM dups 
    GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key 
AND a.ctid <> b.ctid;</code>

Important Note: This method doesn't offer control over which specific row is retained.

Illustrative Example:

Consider this sample table:

<code class="language-sql">CREATE TABLE people (
    name    varchar(50) NOT NULL,
    surname varchar(50) NOT NULL,
    age     integer NOT NULL
);

INSERT INTO people (name, surname, age) VALUES 
    ('A.', 'Tom', 30),
    ('A.', 'Tom', 10),
    ('B.', 'Tom', 20),
    ('B', 'Chris', 20);</code>

To find the first occurrence of duplicates:

<code class="language-sql">SELECT MIN(ctid) as ctid, name, surname
FROM people 
GROUP BY (name, surname) HAVING COUNT(*) > 1;</code>

Output:

ctid name surname
(0,1) A. Tom

Deleting the non-first duplicates:

<code class="language-sql">DELETE FROM people a USING (
    SELECT MIN(ctid) as ctid, name, surname
    FROM people 
    GROUP BY (name, surname) HAVING COUNT(*) > 1
) b
WHERE a.name = b.name
AND a.surname = b.surname
AND a.ctid <> b.ctid;</code>

The remaining rows:

<code class="language-sql">SELECT * FROM people;</code>

Output:

name surname age
A. Tom 30
B. Tom 20
B Chris 20

The above is the detailed content of How to Efficiently Delete Duplicate Rows from a Table in SQL?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn