search

Home  >  Q&A  >  body text

How to remove duplicates in SQL table based on multiple fields

I have a gaming table with the following description:

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| date          | date        | NO   |     | NULL    |                |
| time          | time        | NO   |     | NULL    |                |
| hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
| awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
| locationcity  | varchar(30) | NO   |     | NULL    |                |
| locationstate | varchar(20) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

But every game has a duplicate entry somewhere in the table because every game is on both teams' schedules. Is there a SQL statement I can use to see and remove all duplicates based on the same date, time, hometeam_id, awayteam_id, locationcity and locationstate fields?

P粉757556355P粉757556355410 days ago650

reply all(2)I'll reply

  • P粉781235689

    P粉7812356892023-10-21 10:50:13

    You can try a query like this:

    DELETE FROM table_name AS t1
    WHERE EXISTS (
     SELECT 1 FROM table_name AS t2 
     WHERE t2.date = t1.date 
     AND t2.time = t1.time 
     AND t2.hometeam_id = t1.hometeam_id 
     AND t2.awayteam_id = t1.awayteam_id 
     AND t2.locationcity = t1.locationcity 
     AND t2.id > t1.id )

    This will keep only one example of each game instance with the smallest ID in the database.

    reply
    0
  • P粉201448898

    P粉2014488982023-10-21 00:59:19

    You should be able to perform a correlated subquery to remove the data. Find all duplicate rows and delete all but the row with the smallest id. For MYSQL, you need to use inner joins (equivalent to the functionality of EXISTS), as follows:

    delete games from games inner join 
        (select  min(id) minid, date, time,
                 hometeam_id, awayteam_id, locationcity, locationstate
         from games 
         group by date, time, hometeam_id, 
                  awayteam_id, locationcity, locationstate
         having count(1) > 1) as duplicates
       on (duplicates.date = games.date
       and duplicates.time = games.time
       and duplicates.hometeam_id = games.hometeam_id
       and duplicates.awayteam_id = games.awayteam_id
       and duplicates.locationcity = games.locationcity
       and duplicates.locationstate = games.locationstate
       and duplicates.minid <> games.id)

    To test, replace remove game from game with select from game*. Don't just run delete on the database :-)

    reply
    0
  • Cancelreply