
How to write high-performance sql

(*-*)浩
2019-05-09 09:34:42

First of all, we need to understand what an execution plan is.

An execution plan is the query strategy the database builds for a SQL statement based on statistics about the tables involved; it is produced automatically by the query optimizer. For example, to find 1 record in a table of 100,000 rows, the optimizer will choose an "index seek". If the table is later archived down to only 5,000 rows, the optimizer will change the plan and use a "full table scan" instead.

As you can see, the execution plan is not fixed; it is "personalized" to the data. Generating a correct execution plan depends on two important points:

Does the SQL statement clearly tell the query optimizer what it wants to do?

Are the database statistics available to the query optimizer up to date and correct?



Write SQL statements in a consistent way

Programmers consider the following two SQL statements to be the same; the database query optimizer considers them different.

select*from dual 
select*From dual

The case differs, so the query analyzer treats them as two different SQL statements that must be parsed separately, generating two execution plans. As a programmer, you should therefore make sure the same query statement is written identically everywhere; even one extra space matters!

Don't make SQL statements too complicated

I often see SQL statements captured from the database that print out across two A4 pages. Generally speaking, such complex statements are usually problematic. I took one of these two-page statements to its original author, and he said it had been written so long ago that he couldn't understand it right away. If even the original author can be confused by a SQL statement, the database will be confused as well.

It is common to use the result of a Select statement as a subset and then query from that subset. One level of nesting like this is fine, but in my experience, once a statement goes beyond three levels of nesting, the query optimizer easily produces a wrong execution plan. It gets dazed. This kind of artificial intelligence is, after all, inferior to human judgment; if a statement makes people dizzy, I can guarantee the database will be dizzy too.

In addition, execution plans can be reused, and the simpler the SQL statement, the more likely its plan is to be reused. If even one character of a complex SQL statement changes, it must be re-parsed, and the memory fills up with garbage plans. It is easy to imagine how inefficient the database becomes.
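As a sketch of the point above (the table and column names here are made up for illustration), a deeply nested query can often be flattened into a single level that the optimizer handles more reliably:

```sql
-- Hard for the optimizer: three levels of nesting.
select * from (
    select custid, total from (
        select custid, sum(amount) as total
        from orderheader
        group by custid
    ) b where total > 1000
) c;

-- Equivalent single-level statement.
select custid, sum(amount) as total
from orderheader
group by custid
having sum(amount) > 1000;
```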

Use "temporary tables" to stage intermediate results

An important way to simplify SQL statements is to use temporary tables to stage intermediate results. But the benefits of temporary tables go well beyond that: with intermediate results staged in a temporary table, subsequent queries run in tempdb. This avoids scanning the main table multiple times, and it also greatly reduces the cases where "shared locks" block "update locks" while the program runs, which lowers blocking and improves concurrency.
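A minimal sketch of the pattern, reusing the article's orderheader table (the temp table name and date filter are assumptions for illustration):

```sql
-- Stage one scan of the main table into tempdb...
select orderid, changetime
into #recent_orders              -- # prefix = local temp table in tempdb
from orderheader
where changetime > '2010-10-20 00:00:01';

-- ...then run follow-up queries against the temp table instead of
-- rescanning (and re-locking) the main table each time.
select count(*) from #recent_orders;
select max(changetime) from #recent_orders;

drop table #recent_orders;       -- clean up when done
```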

SQL statements in OLTP systems must use bind variables

select*from orderheader where changetime >'2010-10-20 00:00:01' 
select*from orderheader where changetime >'2010-09-22 00:00:01'

The query optimizer treats the two statements above as different SQL statements and parses each one separately. If you use a bind variable instead:

select*from orderheader where changetime >@chgtime

Any value can be passed in through the @chgtime variable, so a large number of similar queries can reuse the same execution plan, which greatly reduces the database's SQL-parsing burden. Parse once, reuse many times: that is the principle for improving database efficiency.
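In SQL Server, one standard way to run such a parameterized statement from ad hoc code is sp_executesql; this sketch uses the article's table and variable names:

```sql
-- The statement text is constant, so its plan is compiled once
-- and reused for every value passed in through @chgtime.
declare @sql nvarchar(200) =
    N'select * from orderheader where changetime > @chgtime';

exec sp_executesql @sql,
     N'@chgtime datetime',
     @chgtime = '2010-10-20 00:00:01';

exec sp_executesql @sql,
     N'@chgtime datetime',
     @chgtime = '2010-09-22 00:00:01';  -- reuses the cached plan
```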

Bind variable peeking

Everything has two sides. Bind variables suit most OLTP workloads, but there are exceptions, for example when a field in the where condition is a "skewed field".

A "skewed field" is a column in which most of the values are the same. For example, in a census table, more than 90% of the values in the "ethnicity" column are Han. So if a SQL statement queries the population of 30-year-old Han people, the "ethnicity" column must appear in the where condition, and using a bind variable @nation there causes a big problem.

Imagine that the first value passed in for @nation is "Han": the execution plan will inevitably choose a table scan. The second value passed in is "Buyei", which may account for only one in ten thousand rows, so an index seek should be used; but because the plan parsed for "Han" the first time is reused, the second query also does a table scan. This problem is the famous "bind variable peeking". The recommendation is not to use bind variables on "skewed fields".
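One common SQL Server workaround for peeking on a skewed column is to force a fresh plan per execution with OPTION (RECOMPILE); the census table and column names below are hypothetical, following the article's example:

```sql
-- Without the hint, whichever value arrives first
-- ("Han" -> table scan, "Buyei" -> index seek) fixes the plan
-- for all later values. RECOMPILE trades plan reuse for a plan
-- that matches each value's actual selectivity.
select count(*)
from census
where nation = @nation
  and age = 30
option (recompile);
```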

Use begin tran only when necessary

In SQL Server, a single SQL statement is a transaction by default and is committed by default when the statement finishes. This is really begin tran in its minimal form, as if every statement implied a begin tran at its start and a commit at its end.

In some cases, we need to declare begin tran explicitly. For example, an "insert/delete/update" operation may need to modify several tables at once, and either all of those modifications must succeed or none may. begin tran serves exactly this role: it lets several SQL statements execute together and commit together. The benefit is guaranteed data consistency, but nothing is perfect. The price of begin tran is that all resources locked by its SQL statements cannot be released until the commit.

Clearly, if a begin tran traps too many SQL statements, database performance suffers badly. Before such a large transaction commits, other statements are inevitably blocked, producing heavy blocking.

The principle for using begin tran is: while guaranteeing data consistency, the fewer SQL statements a begin tran traps, the better! In some cases triggers can be used to synchronize data, so begin tran is not always necessary.
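A minimal sketch of the pattern: only the statements that must succeed or fail together sit inside the transaction (the stock table and its columns are assumptions for illustration):

```sql
begin tran;

-- Both changes commit together or roll back together.
insert into orderheader (orderid, contactid, changetime)
values (1001, '001', getdate());

update stock
set qty = qty - 1
where productid = 42;

commit tran;  -- locks taken above are held only until this point
```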

Add nolock to some SQL query statements

Adding nolock to a SQL statement is an important means of improving SQL Server's concurrency. It is not needed in Oracle, whose architecture is more sophisticated: an undo tablespace keeps the "before image" of data, so if a modification has not yet been committed, a reader sees the pre-modification copy held in the undo tablespace. Reads and writes in Oracle thus do not interfere with each other, which is why Oracle is widely praised. In SQL Server, reads and writes block each other; to improve concurrency, nolock can be added to some queries so that writes can proceed during reads. The drawback is that uncommitted dirty data may be read. There are three principles for using nolock.

(1) If the query results will be used for "insert, delete, or update" operations, nolock must not be added!

(2) Use nolock with caution on tables where page splits occur frequently!

(3) Using a temporary table can likewise preserve the "before image" of the data, serving a function similar to Oracle's undo tablespace.

If you can use a temporary table to improve concurrency, do not use nolock.
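For reference, a read-only report query with the hint, using the article's orderheader table (the WITH (NOLOCK) table-hint syntax is standard T-SQL):

```sql
-- Allow concurrent writers while this report reads; the trade-off
-- is that uncommitted (dirty) rows may appear in the result.
select orderid, changetime
from orderheader with (nolock)
where changetime > '2010-10-20 00:00:01';
```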

If the clustered index is not built on a sequential field, the table is prone to page splits

For example, an order table has an order number orderid and a customer number contactid. On which field should the clustered index go? Order numbers are added sequentially, so if the clustered index is on orderid, new rows are appended at the end and page splits rarely occur. However, since most queries are by customer ID, it only makes sense to put the clustered index on contactid; and for the order table, contactid is not a sequential field.

For example, if "Zhang San" has contactid 001, then his order information must sit on the first data page of the table. If Zhang San places a new order today, that order cannot go on the last page of the table; it must go on the first page! And what if the first page is full? Sorry, all the data in the table has to shift back to make room for this record.

SQL Server's indexes differ from Oracle's. A SQL Server clustered index actually sorts the table in clustered-key order, which is equivalent to Oracle's index-organized table. The clustered index is an organizational form of the table itself, so it is very efficient. For the same reason, an inserted record is not placed arbitrarily but on the data page its key order dictates, and if that page has no free space, a page split occurs. So obviously: if the clustered index is not built on a sequential field, the table is prone to page splits.

I once encountered a case where a friend's insert performance dropped sharply after he rebuilt an index on a table. The situation was probably this: the table's clustered index was not on a sequential field, and the table was archived regularly, so its data sat in a sparse state. For example, Zhang San had placed 20 orders, only 5 of them in the last three months; the archiving policy keeps 3 months of data, so Zhang San's earlier 15 orders had been archived, leaving 15 empty slots that could be reused as inserts occurred. Because free slots were available, no page splits occurred; but query performance was relatively low, since queries had to scan the empty space holding no data.

The situation changed after the clustered index was rebuilt, because rebuilding it rearranges the table's data: the old empty slots were gone and the page fill rate was very high, so inserts constantly caused page splits and performance dropped sharply.

For tables whose clustered index is not on a sequential field, should we use a relatively low page fill rate? Should we avoid rebuilding the clustered index? These are questions worth considering!
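A low page fill rate can be sketched with the standard FILLFACTOR index option; the index name below is hypothetical, and 70 is just an illustrative value:

```sql
-- Leave ~30% free space on each leaf page, so out-of-order inserts
-- by contactid find room without splitting the page.
create clustered index ix_orderheader_contactid
on orderheader (contactid)
with (fillfactor = 70, drop_existing = on);
```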

After adding nolock, querying tables where page splits often occur may easily cause skipped or repeated reads

With nolock, queries can run while "inserts, deletes, and updates" are happening. But because those writes happen concurrently, in some cases a full data page will split while a nolock query is in flight. For example, a record already read on page 100 may be moved to page 101 by a page split; the nolock query may then read that same record again when it reaches page 101, producing a "repeated read". Likewise, if a record on page 100 is moved to page 99 before it has been read, the nolock query may miss it entirely, producing a "skipped read".

The friend mentioned above started seeing errors in some operations after adding nolock. Most likely, repeated reads under nolock produced two identical records, and inserting them into another table naturally caused a primary key violation.

Be careful when using like for fuzzy queries

Sometimes fuzzy queries are needed, such as:

select * from contact where username like '%yue%'

Because the keyword %yue% has a "%" in front of yue, this query is bound to do a full table scan. Unless absolutely necessary, do not put a % before the keyword.
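The contrast, assuming an index exists on username (an assumption for illustration):

```sql
-- Leading wildcard: every row must be examined (full table scan).
select * from contact where username like '%yue%';

-- Trailing wildcard only: an index on username can be range-seeked,
-- because all matches share the prefix 'yue'.
select * from contact where username like 'yue%';
```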

The impact of implicit data-type conversion on query efficiency

On a SQL Server 2000 database, our program did not submit this field's value as a strongly typed parameter, so SQL Server 2000 converted the data type automatically. The passed-in parameter's type did not match the primary key column's type, and SQL Server 2000 could then resort to a full table scan. We did not observe this problem on SQL Server 2005, but it is still worth watching for.
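A hypothetical sketch of the trap (here orderid is assumed to be a varchar primary key on the article's orderheader table):

```sql
-- Passing a number forces a type conversion against the varchar
-- column, which can defeat the index seek on SQL Server 2000:
select * from orderheader where orderid = 12345;

-- Passing the value with the matching type lets the primary key
-- index be used directly:
select * from orderheader where orderid = '12345';
```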

