In-depth understanding of optimizing SQL queries - specific analysis of how to write high-performance SQL statements-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

In-depth understanding of optimizing SQL queries - specific analysis of how to write high-performance SQL statements

黄舟

Mar 09, 2017 am 11:10 AM

In-depth understanding of optimizing SQL queries-specific analysis of how to write high-performance SQL statements:

1. First of all, we need to understand what is an execution plan?

The execution plan is a query plan made by the database based on the statistical information of the SQL statement and related tables. This plan is automatically analyzed and generated by the query optimizer. For example, if a SQL statement is used to query 1 record from a table with 100,000 records, records, the query optimizer will choose the "index search" method. If the table is archived and there are currently only 5,000 records left, the query optimizer will change the plan and use the "full table scan" method.

It can be seen that the execution plan is not fixed, it is "personalized". There are two important points in generating a correct "execution plan":

(1) Does the SQL statement clearly tell the query optimizer what it wants to do?

(2) Is the database statistical information obtained by the query optimizer up to date and correct?

2. Unify the writing method of SQL statements

For the following two SQL statements, programmers think they are the same, but the database query optimizer thinks they are different.

select*from dual 

select*From dual

In fact, if the case is different, the query analyzer considers it to be two different SQL statements and must be parsed twice. Generate 2 execution plans. Therefore, as a programmer, you should ensure that the same query statement is consistent everywhere, even one more space will not work!

3. Don’t write SQL statements too complex

I often see that a SQL statement captured from the database is as long as 2 A4 sheets of paper when printed. Generally speaking, there are usually problems with such complex statements. I took this 2-page long SQL statement to ask the original author, but he said it took too long and he couldn't understand it for a while. It is conceivable that even the original author may be confused by the SQL statement, and the database will be confused as well.

Generally, the results of a Select statement are used as a subset, and then the query is performed from the subset. This kind of one-level nested statement is relatively common, but according to experience, if there are more than three levels of nesting, the query optimizer will easily give Wrong execution plan. Because it was stunned. Things like artificial intelligence are ultimately inferior to human resolution. If people are dizzy, I can guarantee that the database will be dizzy as well.

In addition, the execution plan can be reused, and the simpler the SQL statement, the higher the possibility of being reused. As long as one character changes in a complex SQL statement, it must be re-parsed, and then a lot of garbage will be stuffed in the memory. It is conceivable how inefficient the database will be.

4. Use "temporary table" to temporarily store intermediate results

An important way to simplify SQL statements is to use temporary tables to temporarily store intermediate results. However, the benefits of temporary tables are far more than these. Temporary results are stored in temporary tables, and subsequent queries are in tempdb, which can avoid multiple times in the program. Scanning the main table also greatly reduces "shared lock" blocking and "update lock" during program execution, reducing blocking and improving concurrency performance.

5. OLTP system SQL statements must use bind variables

select*from orderheader where changetime >'2010-10-20 00:00:01' 
select*from orderheader where changetime >'2010-09-22 00:00:01'

The above two statements are considered by the query optimizer to be different SQL statements and need to be parsed twice. If using bind variables

select*from orderheader where changetime >@chgtime

The @chgtime variable can pass in any value, so that a large number of similar queries can reuse the execution plan, which can greatly reduce the burden of the database parsing SQL statements. Parse once and reuse multiple times is the principle to improve database efficiency.

6. Bind variable snooping

There are two sides to everything, and bind variables are applicable to most OLTP processes, but there are exceptions. For example, when the field in the where condition is a "skewed field".

"Tilted field" means that most of the values in the column are the same. For example, in a census table, in the "Ethnicity" column, more than 90% of the values are Han. So if a SQL statement wants to query the population of Han people who are 30 years old, then the "ethnic" column must be placed in the where condition. At this time, there will be a big problem if you use the bind variable @nation.

Just imagine if the first value passed in by @nation is "Han", then the entire execution plan will inevitably choose table scan. Then, the second value passed in is "Buyei". It stands to reason that the proportion of "Buyi" may be only one ten thousandth, so index search should be used. However, since the execution plan of "Han" parsed for the first time is reused, the table scan method will also be used for the second time. This problem is the famous "bind variable snooping". It is recommended not to use bind variables for "skewed fields".

7. Use begin tran

only when necessary. A SQL statement in SQL Server is a transaction by default, and it is committed by default after the statement is executed. In fact, this is a minimized form of begin tran, just like a begin tran is implied at the beginning of each statement, and a commit is implied at the end.

In some cases, we need to explicitly declare begin tran. For example, when performing "insertion, deletion, and modification" operations, we need to modify several tables at the same time. It is required that either all modifications of several tables are successful or none of them are successful. begin tran can play such a role. It can execute several SQL statements together and finally commit them together. The advantage is that data consistency is guaranteed, but nothing is perfect. The price paid by Begin tran is that before submission, all resources locked by SQL statements cannot be released until they are committed.

It can be seen that if Begin tran traps too many SQL statements, the performance of the database will be poor. Before the large transaction is committed, other statements will inevitably be blocked, resulting in a lot of blocks.

The principle of using Begin tran is that on the premise of ensuring data consistency, the fewer SQL statements trapped by begin tran, the better! In some cases, triggers can be used to synchronize data, and begin tran is not necessarily used.

8. Some SQL query statements should be added with nolock

Adding nolock to the SQL statement is an important means to improve the concurrency performance of SQL Server. This is not necessary in Oracle because the structure of Oracle is more reasonable and there is an undo table space to save the "data foreshadow". If the data is being modified It has not been committed yet, so what you read is the copy before it was modified, which is placed in the undo table space. In this way, Oracle's reading and writing can be independent of each other, which is why Oracle is widely praised. SQL Server's reading and writing will block each other. In order to improve concurrency performance, nolock can be added to some queries, so that writing can be allowed during reading. However, the disadvantage is that uncommitted dirty data may be read. There are three principles for using nolock.

(1) If the query results are used for "insertion, deletion, and modification", nolock cannot be added!

(2) The queried table is one where page splits occur frequently, so use nolock with caution!

(3) Using a temporary table can also save the "data foreshadow", which has a function similar to Oracle's undo table space,

If you can use temporary tables to improve concurrency performance, do not use nolock.

9. The clustered index is not built on the sequential fields of the table, and the table is prone to page splits

For example, in the order table, there is the order number orderid and the customer number contactid. So which field should the clustered index be added to? For this table, the order numbers are added sequentially. If a clustered index is added to the orderid, the new rows will be added at the end, so that page splits will not occur frequently. However, since most queries are based on customer numbers, it only makes sense to add a clustered index to contactid. For the order table, contactid is not a sequential field.

For example, the "contactid" of "Zhang San" is 001, then the order information of "Zhang San" must be placed on the first data page of this table. If "Zhang San" places a new order today, then the order The information cannot be placed on the last page of the table, but on the first page! What if the first page is full? Sorry, all the data in this table needs to be moved back to make room for this record.

SQL Server's indexes are different from Oracle's indexes. SQL Server's clustered index actually sorts the table in the order of the clustered index fields, which is equivalent to Oracle's index-organized table. The clustered index of SQL Server is an organizational form of the table itself, so its efficiency is very high. Because of this, when a record is inserted, its location is not placed randomly, but on the data page where it should be placed in order. If there is no space on that data page, it will cause page splits. So obviously, the clustered index is not built on the sequential fields of the table, and the table is prone to page splits.

I once encountered a situation where a friend's insertion efficiency dropped significantly after re-indexing a certain table. It is estimated that the situation is probably like this. The clustered index of the table may not be built on the sequential fields of the table. The table is often archived, so the data of the table exists in a sparse state. For example, Zhang San has placed 20 orders, but there are only 5 orders in the last three months. The archiving strategy is to retain 3 months of data. Then Zhang San’s past 15 orders have been archived, leaving 15 vacancies, which can be entered in insert Repurposed as they occur. In this case, since there are free spaces available, no page split will occur. However, the query performance will be relatively low, because the query must scan those empty positions without data.

The situation changed after rebuilding the clustered index, because rebuilding the clustered index means rearranging the data in the table. The original gaps are gone, and the page fill rate is very high. Page splits often occur when inserting data, so the performance drops significantly.

For tables whose clustered indexes are not built on sequential fields, should we give a lower page fill rate? Do you want to avoid rebuilding the clustered index? It’s a question worth considering!

10、加nolock后查询经常发生页分裂的表，容易产生跳读或重复读

加nolock后可以在“插、删、改”的同时进行查询，但是由于同时发生“插、删、改”，在某些情况下，一旦该数据页满了，那么页分裂不可避免，而此时nolock的查询正在发生，比如在第100页已经读过的记录，可能会因为页分裂而分到第101页，这有可能使得nolock查询在读101页时重复读到该条数据，产生“重复读”。同理，如果在100页上的数据还没被读到就分到99页去了，那nolock查询有可能会漏过该记录，产生“跳读”。

上面提到的哥们，在加了nolock后一些操作出现报错，估计有可能因为nolock查询产生了重复读，2条相同的记录去插入别的表，当然会发生主键冲突。

11、使用like进行模糊查询时应注意

有的时候会需要进行一些模糊查询比如

select*from contact where username like ‘%yue%’

关键词%yue%，由于yue前面用到了“%”，因此该查询必然走全表扫描，除非必要，否则不要在关键词前加%，

12、数据类型的隐式转换对查询效率的影响

sql server2000的数据库，我们的程序在提交sql语句的时候，没有使用强类型提交这个字段的值，由sql server 2000自动转换数据类型，会导致传入的参数与主键字段类型不一致，这个时候sql server 2000可能就会使用全表扫描。Sql2005上没有发现这种问题，但是还是应该注意一下。

13、SQL Server 表连接的三种方式

(1) Merge Join

(2) Nested Loop Join

(3) Hash Join

SQL Server 2000只有一种join方式——Nested Loop Join，如果A结果集较小，那就默认作为外表，A中每条记录都要去B中扫描一遍，实际扫过的行数相当于A结果集行数x B结果集行数。所以如果两个结果集都很大，那Join的结果很糟糕。

SQL Server 2005新增了Merge Join，如果A表和B表的连接字段正好是聚集索引所在字段，那么表的顺序已经排好，只要两边拼上去就行了，这种join的开销相当于A表的结果集行数加上B表的结果集行数，一个是加，一个是乘，可见merge join 的效果要比Nested Loop Join好多了。

如果连接的字段上没有索引，那SQL2000的效率是相当低的，而SQL2005提供了Hash join，相当于临时给A，B表的结果集加上索引，因此SQL2005的效率比SQL2000有很大提高，我认为，这是一个重要的原因。

总结一下，在表连接时要注意以下几点：

(1) 连接字段尽量选择聚集索引所在的字段

(2) 仔细考虑where条件，尽量减小A、B表的结果集

(3) 如果很多join的连接字段都缺少索引，而你还在用SQL Server 2000，赶紧升级吧。

The above is the detailed content of In-depth understanding of optimizing SQL queries - specific analysis of how to write high-performance SQL statements. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the InnoDB Buffer Pool and its importance for performance.Apr 19, 2025 am 12:24 AM

InnoDBBufferPool reduces disk I/O by caching data and indexing pages, improving database performance. Its working principle includes: 1. Data reading: Read data from BufferPool; 2. Data writing: After modifying the data, write to BufferPool and refresh it to disk regularly; 3. Cache management: Use the LRU algorithm to manage cache pages; 4. Reading mechanism: Load adjacent data pages in advance. By sizing the BufferPool and using multiple instances, database performance can be optimized.

MySQL vs. Other Programming Languages: A ComparisonApr 19, 2025 am 12:22 AM

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

Learning MySQL: A Step-by-Step Guide for New UsersApr 19, 2025 am 12:19 AM

MySQL is worth learning because it is a powerful open source database management system suitable for data storage, management and analysis. 1) MySQL is a relational database that uses SQL to operate data and is suitable for structured data management. 2) The SQL language is the key to interacting with MySQL and supports CRUD operations. 3) The working principle of MySQL includes client/server architecture, storage engine and query optimizer. 4) Basic usage includes creating databases and tables, and advanced usage involves joining tables using JOIN. 5) Common errors include syntax errors and permission issues, and debugging skills include checking syntax and using EXPLAIN commands. 6) Performance optimization involves the use of indexes, optimization of SQL statements and regular maintenance of databases.

MySQL: Essential Skills for Beginners to MasterApr 18, 2025 am 12:24 AM

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL: Structured Data and Relational DatabasesApr 18, 2025 am 12:22 AM

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL: Key Features and Capabilities ExplainedApr 18, 2025 am 12:17 AM

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

The Purpose of SQL: Interacting with MySQL DatabasesApr 18, 2025 am 12:12 AM

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

MySQL for Beginners: Getting Started with Database ManagementApr 18, 2025 am 12:10 AM

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

3 weeks agoByDDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks agoByDDD

Hot Tools

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

Hot Topics

Where is the login entrance for gmail email?

7605

CakePHP Tutorial

1387

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

132