如何用SQL语句去掉重复记录-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

如何用SQL语句去掉重复记录

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:24 PM

sqlremovedataMassiveRecordstatementrepeat

海量数据（百万以上），其中有些全部字段都相同，有些部分字段相同，怎样高效去除重复？如果要删除手机(mobilePhone)，电话(officePhone)，邮件(email)同时都相同的数据，以前一直使用这条语句进行去重： delete from 表 where id not in (select max(id) f

海量数据（百万以上），其中有些全部字段都相同，有些部分字段相同，怎样高效去除重复？

如果要删除手机(mobilePhone)，电话(officePhone)，邮件(email)同时都相同的数据，以前一直使用这条语句进行去重：

delete from 表 where id not in 
(select max(id) from 表 group by mobilePhone,officePhone,email )
or
delete from 表 where id not in 
(select min(id) from 表 group by mobilePhone,officePhone,email )

其中下面这条会稍快些。上面这条数据对于100万以内的数据效率还可以，重复数1/5的情况下几分钟到几十分钟不等，但是如果数据量达到300万以上，效率骤降，如果重复数据再多点的话，常常会几十小时跑不完，有时候会锁表跑一夜都跑不完。无奈只得重新寻找新的可行方法，今天终于有所收获：

//查询出唯一数据的ID,并把他们导入临时表tmp中
select min(id) as mid into tmp from 表 group by mobilePhone,officePhone,email
//查询出去重后的数据并插入finally表中
insert into finally select (除ID以外的字段) from customers_1 where id in (select mid from tmp)

效率对比：用delete方法对500万数据去重（1/2重复）约4小时。4小时，很长的时间。

用临时表插入对500万数据去重（1/2重复）不到10分钟。

SQL语句去掉重复记录，获取重复记录

按照某几个字段名称查找表中存在这几个字段的重复数据并按照插入的时间先后进行删除，条件取决于order by 和row_num。

方法一按照多条件重复处理：

delete tmp from(
select row_num = row_number() over(partition by 字段，字段 order by 时间 desc)
    from 表 where 时间> getdate()-1
) tmp
where row_num > 1

方法二按照单一条件进行去重：

delete from 表 where 主键ID not in(
select max(主键ID) from 表 group by 需要去重的字段 having count(需要去重的字段)>=1
)

注意：为提高效率如上两个方法都可以使用临时表， not in 中的表可以先提取临时表#tmp，

然后采用not exists来执行，为避免数量过大，可批量用Top控制删除量

delete top(2) from 表 
        where  not exists (select 主键ID
  from #tmp where #tmp.主键ID=表.主键ID)

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

What Are the Limitations of Using Views in MySQL?May 14, 2025 am 12:10 AM

MySQLviewshavelimitations:1)Theydon'tsupportallSQLoperations,restrictingdatamanipulationthroughviewswithjoinsorsubqueries.2)Theycanimpactperformance,especiallywithcomplexqueriesorlargedatasets.3)Viewsdon'tstoredata,potentiallyleadingtooutdatedinforma

Securing Your MySQL Database: Adding Users and Granting PrivilegesMay 14, 2025 am 12:09 AM

ProperusermanagementinMySQLiscrucialforenhancingsecurityandensuringefficientdatabaseoperation.1)UseCREATEUSERtoaddusers,specifyingconnectionsourcewith@'localhost'or@'%'.2)GrantspecificprivilegeswithGRANT,usingleastprivilegeprincipletominimizerisks.3)

What Factors Influence the Number of Triggers I Can Use in MySQL?May 14, 2025 am 12:08 AM

MySQLdoesn'timposeahardlimitontriggers,butpracticalfactorsdeterminetheireffectiveuse:1)Serverconfigurationimpactstriggermanagement;2)Complextriggersincreasesystemload;3)Largertablesslowtriggerperformance;4)Highconcurrencycancausetriggercontention;5)M

MySQL: Is it safe to store BLOB?May 14, 2025 am 12:07 AM

Yes,it'ssafetostoreBLOBdatainMySQL,butconsiderthesefactors:1)StorageSpace:BLOBscanconsumesignificantspace,potentiallyincreasingcostsandslowingperformance.2)Performance:LargerrowsizesduetoBLOBsmayslowdownqueries.3)BackupandRecovery:Theseprocessescanbe

MySQL: Adding a user through a PHP web interfaceMay 14, 2025 am 12:04 AM

Adding MySQL users through the PHP web interface can use MySQLi extensions. The steps are as follows: 1. Connect to the MySQL database and use the MySQLi extension. 2. Create a user, use the CREATEUSER statement, and use the PASSWORD() function to encrypt the password. 3. Prevent SQL injection and use the mysqli_real_escape_string() function to process user input. 4. Assign permissions to new users and use the GRANT statement.

MySQL: BLOB and other no-sql storage, what are the differences?May 13, 2025 am 12:14 AM

MySQL'sBLOBissuitableforstoringbinarydatawithinarelationaldatabase,whileNoSQLoptionslikeMongoDB,Redis,andCassandraofferflexible,scalablesolutionsforunstructureddata.BLOBissimplerbutcanslowdownperformancewithlargedata;NoSQLprovidesbetterscalabilityand

MySQL Add User: Syntax, Options, and Security Best PracticesMay 13, 2025 am 12:12 AM

ToaddauserinMySQL,use:CREATEUSER'username'@'host'IDENTIFIEDBY'password';Here'showtodoitsecurely:1)Choosethehostcarefullytocontrolaccess.2)SetresourcelimitswithoptionslikeMAX_QUERIES_PER_HOUR.3)Usestrong,uniquepasswords.4)EnforceSSL/TLSconnectionswith

MySQL: How to avoid String Data Types common mistakes?May 13, 2025 am 12:09 AM

ToavoidcommonmistakeswithstringdatatypesinMySQL,understandstringtypenuances,choosetherighttype,andmanageencodingandcollationsettingseffectively.1)UseCHARforfixed-lengthstrings,VARCHARforvariable-length,andTEXT/BLOBforlargerdata.2)Setcorrectcharacters

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055612 fails to install in Windows 10?

4 weeks agoByDDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks agoByDDD

Nordhold: Fusion System, Explained

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Chinese version

Chinese version, very easy to use

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.