search
HomeDatabaseMysql Tutorialmysql中数据去重和优化_MySQL

bitsCN.com

mysql中数据去重和优化

 

更改表user_info的主键uid为自增的id后,忘了设置原来主键uid属性为unique,结果导致产生uid重复的记录。为此需要清理后来插入的重复记录。

 

基本方法可以参考后面的附上的资料,但是由于mysql不支持同时对一个表进行操作,即子查询和要进行的操作不能是同一个表,因此需要通过零时表中转一下。

 

写在前面:数据量大时,一定要多涉及的关键字段创建索引!!!否则很慢很慢很慢,慢到想死的心都有了

 

1 单字段重复

 

生成零时表,其中uid是需要去重的字段

 

create table tmpuid as (select uid from userinfo group by uid having count(uid))

 

create table tmpid as (select min(id) from userinfo group by uid having count(uid))

 

数据量大时一定要为uid创建索引

 

create index indexuid on tmpuid

 

create index indexid on tmpid

 

删除多余的重复记录,保留重复项中id最小的

 

delete from user_info where id not in (select id from tmp_id) and uid in (select uid from tmp_uid)

 

2.多字段重复

 

由uid的重复间接的导致了relationship中的记录重复,故继续去重。先介绍正常处理流程,在介绍本人根据自身数据特点实践的更加有效的方法!

 

2.1一般方法

 

基本的同上面:

 

生成零时表

 

create table tmp_relation as (select source,target from relationship group by source,target having count(*)>1)

 

create table tmprelationshipid as (select min(id) as id from relationship group by source,target having count(*)>1)

 

创建索引

 

create index indexid on tmprelationship_id

 

删除

 

delete from relationship where id not in (select id from tmprelationshipid) and (source,target) in (select source,target from relationship)

 

2.2 实践出真知

 

实践中发现上面的删除字段重复的方法,由于没有办法为多字段重建索引,导致数据量大时效率极低,低到无法忍受。最后,受不了等了半天没反应的状况,本人决定,另辟蹊径。

 

考虑到,估计同一记录的重复次数比较低。一般为2,或3,重复次数比较集中。所以可以尝试直接删除重复项中最大的,直到删除到不重复,这时其id自然也是当时重复的里边最小的。

 

大致流程如下:

 

1)选择每个重复项中id最大的一个记录

 

create table tmprelationid2 as (select max(id) from relationship group by source,target having count(*)>1)

 

2)创建索引(仅需在第一次时执行)

 

create index indexid on tmprelation_id2

 

3)删除 重复项中id最大的记录

 

delete from relationship where id in (select id from tmprelationid2)

 

4)删除临时表

 

drop table tmprelationid2

 

重复上述步骤1),2),3),4),直到创建的临时表中不存在记录就结束(对于重复次数的数据,比较高效)

 

查询及删除重复记录的方法

 

(一) 1、查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断 select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

 

2、删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录 delete from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1) and rowid not in (select min(rowid) from people group by peopleId having count(peopleId )>1)

 

3、查找表中多余的重复记录(多个字段) select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

 

4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count() > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count()>1)

 

bitsCN.com
Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Explain the InnoDB Buffer Pool and its importance for performance.Explain the InnoDB Buffer Pool and its importance for performance.Apr 19, 2025 am 12:24 AM

InnoDBBufferPool reduces disk I/O by caching data and indexing pages, improving database performance. Its working principle includes: 1. Data reading: Read data from BufferPool; 2. Data writing: After modifying the data, write to BufferPool and refresh it to disk regularly; 3. Cache management: Use the LRU algorithm to manage cache pages; 4. Reading mechanism: Load adjacent data pages in advance. By sizing the BufferPool and using multiple instances, database performance can be optimized.

MySQL vs. Other Programming Languages: A ComparisonMySQL vs. Other Programming Languages: A ComparisonApr 19, 2025 am 12:22 AM

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages ​​such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages ​​have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

Learning MySQL: A Step-by-Step Guide for New UsersLearning MySQL: A Step-by-Step Guide for New UsersApr 19, 2025 am 12:19 AM

MySQL is worth learning because it is a powerful open source database management system suitable for data storage, management and analysis. 1) MySQL is a relational database that uses SQL to operate data and is suitable for structured data management. 2) The SQL language is the key to interacting with MySQL and supports CRUD operations. 3) The working principle of MySQL includes client/server architecture, storage engine and query optimizer. 4) Basic usage includes creating databases and tables, and advanced usage involves joining tables using JOIN. 5) Common errors include syntax errors and permission issues, and debugging skills include checking syntax and using EXPLAIN commands. 6) Performance optimization involves the use of indexes, optimization of SQL statements and regular maintenance of databases.

MySQL: Essential Skills for Beginners to MasterMySQL: Essential Skills for Beginners to MasterApr 18, 2025 am 12:24 AM

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL: Structured Data and Relational DatabasesMySQL: Structured Data and Relational DatabasesApr 18, 2025 am 12:22 AM

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL: Key Features and Capabilities ExplainedMySQL: Key Features and Capabilities ExplainedApr 18, 2025 am 12:17 AM

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

The Purpose of SQL: Interacting with MySQL DatabasesThe Purpose of SQL: Interacting with MySQL DatabasesApr 18, 2025 am 12:12 AM

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

MySQL for Beginners: Getting Started with Database ManagementMySQL for Beginners: Getting Started with Database ManagementApr 18, 2025 am 12:10 AM

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools