Home >Database >Mysql Tutorial >How to Delete Duplicate Rows from a PostgreSQL Table While Preserving a Unique Column?

How to Delete Duplicate Rows from a PostgreSQL Table While Preserving a Unique Column?

Mary-Kate Olsen
Mary-Kate OlsenOriginal
2025-01-19 02:43:10160browse

How to Delete Duplicate Rows from a PostgreSQL Table While Preserving a Unique Column?

Remove duplicate rows from small table with unique columns

In PostgreSQL databases, eliminating duplicate rows can enhance data integrity and optimize performance. Let's say you have a table that contains unconstrained rows and duplicate data, specifically in a specific column called "key". The goal is to remove duplicates and keep a single instance of each unique "key" value.

Single SQL command solution

To do this with a single SQL command, you can use the following steps:

1. Identify the first repeated occurrence: First, we need to identify the first occurrence of each repeated row. This information is critical to retaining a single copy of the data.

<code class="language-sql">SELECT MIN(ctid) AS ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1;</code>

2. Delete non-first occurrence: Once the first occurrence is identified, we can remove all subsequent duplicates based on their "ctid" value. The "ctid" column represents the row's physical location in the table.

<code class="language-sql">DELETE FROM dups a USING (
    SELECT MIN(ctid) AS ctid, key
    FROM dups
    GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid;</code>

Consider line order

While this method effectively removes duplicates, it does not ensure which row is retained in the event of multiple occurrences. If there are specific criteria for selecting rows to keep, they should be incorporated into the query.

Example:

For better understanding, consider the following example:

<code class="language-sql">CREATE TABLE people (
    name    VARCHAR(50) NOT NULL,
    surname VARCHAR(50) NOT NULL,
    age     INTEGER NOT NULL
);

INSERT INTO people (name, surname, age) VALUES
    ('A.', 'Tom', 30),
    ('A.', 'Tom', 10),
    ('B.', 'Tom', 20),
    ('B', 'Chris', 20);

-- 显示重复项的第一次出现:
SELECT MIN(ctid) AS ctid, name, surname
FROM people
GROUP BY (name, surname) HAVING COUNT(*) > 1;

-- 删除重复项的非第一次出现:
DELETE FROM people a USING (
    SELECT MIN(ctid) AS ctid, name, surname
    FROM people
    GROUP BY (name, surname) HAVING COUNT(*) > 1
) b
WHERE a.name = b.name
AND a.surname = b.surname
AND a.ctid <> b.ctid;

SELECT * FROM people;</code>

This example table contains potentially duplicate personal data. After executing the second query, the duplicates are removed and only unique first and last names remain in the table.

The above is the detailed content of How to Delete Duplicate Rows from a PostgreSQL Table While Preserving a Unique Column?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn