
Tips for speeding up SQL queries to prevent database code from locking up the database



Because the database field is still relatively immature, SQL developers on every platform struggle and make the same mistakes over and over again. Of course, database vendors are making some progress and continue to grapple with larger issues.

Whether SQL developers write code on SQL Server, Oracle, DB2, Sybase, MySQL, or any other relational database platform, concurrency, resource management, space management, and runtime speed still plague them.

One aspect of the problem is that there is no magic bullet; I can point to at least one exception to almost every best practice.

We say that tuning a database is both an art and a science, which makes sense because there are few hard and fast rules that apply across the board. A problem you solve on one system is not a problem on another, and vice versa.

When it comes to tuning queries, there’s no right answer, but that doesn’t mean you should give up. You can follow some of the following principles and hopefully get great results.

Don’t use UPDATE instead of CASE

This problem is very common, but it is difficult to detect. Many developers often ignore this problem because using UPDATE is natural and seems logical.

Take this scenario as an example: you insert data into a temporary table and need a column to show one value if another value exists.

Perhaps you extract records from the Customer table and want to mark customers with orders over $100,000 as "Preferred."

So you insert data into the table, run an UPDATE statement, and set the CustomerRank column to "Preferred" for any customer whose order amount exceeds $100,000.

The problem is that the UPDATE statement is logged, which means it has to write twice for every single write to the table.

Solution: use an inline CASE statement in the SQL query. It checks the order-amount condition for each row and sets the "Preferred" label before the row is written to the table. The performance improvement can be astonishing.
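For illustration, here is a minimal sketch of the two approaches, assuming a hypothetical dbo.Customers source table and temp tables (the names are invented, not from the original):

    -- Slower pattern: insert first, then a logged UPDATE pass over the temp table
    SELECT CustomerID, Name, OrderTotal,
           CAST(NULL AS VARCHAR(20)) AS CustomerRank
    INTO #Top_Customers
    FROM dbo.Customers;

    UPDATE #Top_Customers
    SET CustomerRank = 'Preferred'
    WHERE OrderTotal > 100000;

    -- Faster pattern: decide the value inline with CASE while inserting
    SELECT CustomerID, Name, OrderTotal,
           CASE WHEN OrderTotal > 100000 THEN 'Preferred' ELSE 'Standard' END AS CustomerRank
    INTO #Top_Customers2
    FROM dbo.Customers;

With the second form, each row is written to the temp table only once.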

Don’t blindly reuse code

This problem is also very common. It is easy to copy code written by someone else because you know it retrieves the data you need.

The problem is that it often retrieves far more data than you need, and developers rarely trim it down, so you end up with a huge superset of data.

This usually manifests itself as an extra outer join or extra condition in the WHERE clause. If you streamline the reused code to your exact requirements, you can significantly improve performance.
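As a hedged illustration (all table and column names here are invented), the reused query often drags along a join the new report never uses:

    -- Reused query: the LEFT JOIN to Addresses is never referenced by this report
    SELECT c.CustomerID, c.Name, o.OrderTotal
    FROM dbo.Customers AS c
    JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID
    LEFT JOIN dbo.Addresses AS a ON a.CustomerID = c.CustomerID
    WHERE o.OrderDate >= '2018-01-01';

    -- Trimmed to the exact requirement: drop the unused join
    SELECT c.CustomerID, c.Name, o.OrderTotal
    FROM dbo.Customers AS c
    JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID
    WHERE o.OrderDate >= '2018-01-01';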

Extract only the columns you need

This problem is similar to the previous one, but it is specific to columns. It's all too easy to code every query with SELECT * instead of listing the columns individually.

The problem, again, is that it extracts more data than you need. I have seen this mistake countless times: a developer runs a SELECT * query against a 120-column table with millions of rows, but ends up using only three to five of those columns.

So you're dealing with a lot more data than you actually need, and it's a miracle that the query returns results. Not only are you processing too much data that you don't need, you're also taking away resources from other processes.
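A quick sketch of the difference, with hypothetical column names:

    -- Pulls back all 120 columns, even though only three are used
    SELECT * FROM dbo.Customers;

    -- Returns only what the process actually needs
    SELECT CustomerID, Name, RegionID FROM dbo.Customers;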

Don’t query twice (double-dip)

This is another mistake I see a lot of people making: writing a stored procedure to extract data from a table with hundreds of millions of rows.

The developer wanted to extract information about customers who live in California and earn more than $40,000 a year. So he queries the customers who live in California and puts the results into a temporary table.

Then he queries the customers whose annual income is higher than $40,000 and puts those results into another temporary table. Finally, he joins the two tables to get the final result.

Are you kidding me? This should be done with a single query, instead you query a very large table twice. Don't be stupid: try to query a large table only once, and you will find that the stored procedure executes much faster.
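A minimal sketch of the two approaches, assuming hypothetical State and AnnualIncome columns on a dbo.Customers table:

    -- Double-dip: two passes over a huge table, plus a join of the results
    SELECT CustomerID, Name INTO #CA FROM dbo.Customers WHERE State = 'CA';
    SELECT CustomerID, Name INTO #HighIncome FROM dbo.Customers WHERE AnnualIncome > 40000;
    SELECT c.CustomerID, c.Name
    FROM #CA AS c
    JOIN #HighIncome AS h ON h.CustomerID = c.CustomerID;

    -- Single pass does the same job
    SELECT CustomerID, Name
    FROM dbo.Customers
    WHERE State = 'CA' AND AnnualIncome > 40000;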

A slightly different scenario is when several steps of a process require a subset of a large table, resulting in the large table being queried each time.

To avoid this problem, just query this subset, persist it elsewhere, and then direct subsequent steps to this smaller data set.

Know when to use temporary tables

This problem is a little more troublesome to solve, but the effect is significant. Temporary tables can be used in many situations, such as to prevent a large table from being queried twice. Temporary tables can also be used to significantly reduce the processing power required to join large tables.

If you have to join one table to a large table, and there is a condition on the large table, extract just the required subset of the large table's data into a temporary table and then join with that temporary table instead; this can improve query performance.

This is also helpful if there are several queries in the stored procedure that need to perform similar joins to the same table.
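Here is a rough sketch of the idea, with made-up table names:

    -- Pull only the qualifying slice of the big table once...
    SELECT CustomerID, Name
    INTO #CACustomers
    FROM dbo.Customers
    WHERE State = 'CA';

    -- ...then join against the much smaller temp table as often as needed
    SELECT o.OrderID, o.OrderTotal, c.Name
    FROM dbo.Orders AS o
    JOIN #CACustomers AS c ON c.CustomerID = o.CustomerID;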

Pre-stage data

This is one of my favorite topics to talk about because it is an old method that is often overlooked.

If you have a report or a stored procedure (or a group of them) that performs similar join operations on large tables, pre-stage the data by joining the tables ahead of time and persisting the result into a table; it can be of great help to you.

Now, reports can be run against this pre-staging table, avoiding large joins. You won't always be able to use this method, but once you do, you'll find it's a great way to save server resources.

Please note: Many developers get around this join issue by focusing on the query itself, creating a read-only view based on the join, so that they don't have to type the join conditions again and again.

But the problem with this approach is that you still have to run the query for every report that needs it. If you pre-stage the data, you only have to run the join once (say, 10 minutes before the reports), and everyone else avoids the big join.

You don't know how much I like this trick. In most environments, some commonly used tables are joined all the time, so there is no reason why they can't be pre-staged.
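A minimal sketch of pre-staging, assuming hypothetical dbo.Customers and dbo.Orders tables and a dbo.CustomerOrderSummary staging table:

    -- Run the expensive join once, ahead of time (for example, from a scheduled job)
    IF OBJECT_ID('dbo.CustomerOrderSummary') IS NOT NULL
        DROP TABLE dbo.CustomerOrderSummary;

    SELECT c.CustomerID, c.Name, o.OrderID, o.OrderTotal
    INTO dbo.CustomerOrderSummary
    FROM dbo.Customers AS c
    JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;

    -- Reports then read the pre-staged table instead of repeating the big join
    SELECT CustomerID, SUM(OrderTotal) AS Total
    FROM dbo.CustomerOrderSummary
    GROUP BY CustomerID;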

Do deletes and updates in batches

This is another tip that is often overlooked. Deleting or updating large amounts of data from a huge table can be a nightmare if you don't do it right.

The problem is that both statements are run as a single transaction. If you need to terminate them, or the system encounters a problem while they are executing, the system must roll back the entire transaction, which takes a long time.

These operations also block other transactions for their entire duration, which effectively creates a bottleneck in the system. The solution is to delete or update in small batches.

This solves the problem in several ways:

No matter why a transaction is killed, only a small number of rows need to be rolled back, so the database comes back online much faster.

When small-batch transactions are committed to disk, other transactions can come in to handle some work, thus greatly improving concurrency.

Similarly, many developers have always stubbornly believed that these delete and update operations must be completed on the same day. This is not always the case, especially if you are archiving.

If the operation can be stretched out, stretch it out; small batches make that possible. Taking longer to perform these intensive operations is fine if it keeps the system from slowing down.
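Here is a minimal sketch of the batched approach described above, using a hypothetical dbo.OrderArchive table and an arbitrary batch size:

    -- Delete in small chunks so each transaction is short and rolls back quickly if needed
    DECLARE @rows INT = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (5000) FROM dbo.OrderArchive
        WHERE OrderDate < '2015-01-01';

        SET @rows = @@ROWCOUNT;   -- stop once nothing is left to delete
    END

Each batch commits on its own, so other transactions get a chance to run between batches.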

Use temporary tables to improve cursor performance

It is best to avoid cursors if possible. Cursors not only have speed issues, which themselves are a big problem for many operations, but they can also cause your operations to block other operations for a long time, which greatly reduces the concurrency of the system.

However, the use of cursors cannot always be avoided. When the use of cursors cannot be avoided, you can instead perform cursor operations on temporary tables to get rid of the performance problems caused by cursors.

As an example, consider a cursor that steps through a table and updates a couple of columns based on some comparison results. You might be able to put that data into a temporary table and run the comparison against the temporary table instead of the active table.

You can then run a single UPDATE statement against the active table, which is much faster and holds locks for only a short time.

Modifying data this way can greatly improve concurrency. I'll close by saying that you may never need to use cursors at all; there is almost always a set-based solution.
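As a hedged sketch (table and column names are invented), the cursor can often be replaced by staging the comparison in a temp table and issuing one set-based UPDATE:

    -- Stage the comparison results instead of cursoring over the live table
    SELECT CustomerID,
           CASE WHEN OrderTotal > 100000 THEN 'Preferred' ELSE 'Standard' END AS NewRank
    INTO #RankWork
    FROM dbo.CustomerAccounts;

    -- One set-based UPDATE against the active table, holding locks only briefly
    UPDATE c
    SET c.CustomerRank = w.NewRank
    FROM dbo.CustomerAccounts AS c
    JOIN #RankWork AS w ON w.CustomerID = c.CustomerID
    WHERE ISNULL(c.CustomerRank, '') <> w.NewRank;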

Using table-valued functions

This is one of my all-time favorite techniques because it's the kind of secret that only experts know.

When using a scalar function in the SELECT list of a query, the function is called for each row in the result set, which can significantly reduce the performance of large queries.

However, you can convert the scalar function into a table-valued function and then use CROSS APPLY in the query, which can greatly improve performance. This wonderful trick can significantly improve performance.
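A small sketch of the conversion, with hypothetical function and table names:

    -- Inline table-valued function replacing a per-row scalar function
    CREATE FUNCTION dbo.fn_OrderTotal (@CustomerID INT)
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT SUM(o.OrderTotal) AS OrderTotal
        FROM dbo.Orders AS o
        WHERE o.CustomerID = @CustomerID
    );
    GO

    -- CROSS APPLY lets the optimizer expand the logic into the query
    -- instead of calling a scalar function once per row
    SELECT c.CustomerID, c.Name, t.OrderTotal
    FROM dbo.Customers AS c
    CROSS APPLY dbo.fn_OrderTotal(c.CustomerID) AS t;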

Don’t perform large operations on many tables in the same batch

This may seem obvious, but it’s not. I'll use another vivid example because it illustrates the point better.

I was looking at a system with a lot of blocking in which many operations were at a standstill. It turned out that a delete routine, run several times a day, was deleting data from 14 tables inside an explicit transaction. Processing all 14 tables in one transaction meant the locks were held on every table until all of the deletes were complete.

The solution is to break down the deletion of each table into separate transactions so that each deletion transaction only locks one table.

This frees up other tables and relieves blocking, allowing other operations to continue running. You should always break large transactions like this into separate smaller transactions to prevent blocking.
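A rough sketch of the fix, with invented table names standing in for the 14 tables:

    -- Commit each table's delete on its own so locks are released quickly
    BEGIN TRAN;
        DELETE FROM dbo.OrderDetailsArchive WHERE OrderDate < '2015-01-01';
    COMMIT TRAN;

    BEGIN TRAN;
        DELETE FROM dbo.OrdersArchive WHERE OrderDate < '2015-01-01';
    COMMIT TRAN;

    -- ...and so on for the remaining tables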

Don’t use triggers

This one is roughly the same as the previous one, but it’s still worth mentioning. The problem with triggers: whatever you want the trigger to do will be done in the same transaction as the original operation.

If you write a trigger that inserts data into another table while updating a row in the Orders table, both tables will be locked until the trigger completes.

If you need to insert data into another table after updating, put the update and insert into a stored procedure and execute it in separate transactions.

If you need to roll back, it's easy to roll back without having to lock both tables at the same time. As always, keep transactions short and don't lock multiple resources at a time.
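A minimal sketch of the alternative, assuming hypothetical Orders and OrderAudit tables (this is an illustration, not the original author's code):

    CREATE PROCEDURE dbo.usp_UpdateOrder
        @OrderID INT,
        @Status  VARCHAR(20)
    AS
    BEGIN
        -- First transaction: the update, committed on its own
        BEGIN TRAN;
            UPDATE dbo.Orders SET Status = @Status WHERE OrderID = @OrderID;
        COMMIT TRAN;

        -- Second transaction: the follow-up insert, so Orders is no longer locked
        BEGIN TRAN;
            INSERT INTO dbo.OrderAudit (OrderID, Status, ChangedAt)
            VALUES (@OrderID, @Status, GETDATE());
        COMMIT TRAN;
    END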

Don’t cluster on GUIDs

After all these years, I can’t believe we’re still struggling with this problem. But I still encounter clustered GUIDs at least twice a year.

A GUID (Globally Unique Identifier) is a 16-byte randomly generated number. Sorting the data in your table by this column results in table fragmentation much faster than using a steadily increasing value such as DATE or IDENTITY.

A few years ago I did a benchmark where I inserted a bunch of data into a table with a clustered GUID and the same data into another table with an IDENTITY column.

The GUID table was extremely fragmented, and after just 15 minutes, performance dropped by thousands of percentage points.

After 5 hours, the performance of the IDENTITY table had dropped only a few percent. And this doesn't just apply to GUIDs; it applies to any volatile column.
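For comparison, a hedged sketch of the two table designs (names and columns are made up):

    -- Fragmentation-prone: clustered on a random GUID
    CREATE TABLE dbo.OrdersGuid
    (
        OrderID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
        OrderDate DATETIME NOT NULL,
        CONSTRAINT PK_OrdersGuid PRIMARY KEY CLUSTERED (OrderID)
    );

    -- Friendlier to the clustered index: a steadily increasing IDENTITY value
    CREATE TABLE dbo.OrdersIdentity
    (
        OrderID INT IDENTITY(1, 1) NOT NULL,
        OrderDate DATETIME NOT NULL,
        CONSTRAINT PK_OrdersIdentity PRIMARY KEY CLUSTERED (OrderID)
    );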

If you just want to check whether the data exists, don’t count the rows

This situation is very common. You need to check whether data exists in a table and, based on the result of that check, perform some operation.

I often see people executing SELECT COUNT(*) FROM dbo.T1 to check whether the data exists:

    SET @CT = (SELECT COUNT(*) FROM dbo.T1);
    IF @CT > 0
    BEGIN
        -- do something
    END

This is completely unnecessary. If you just want to check whether the data exists, do this instead:

    IF EXISTS (SELECT 1 FROM dbo.T1)
    BEGIN
        -- do something
    END

Don't count everything in the table; just get back the first row you find. SQL Server is smart enough to use EXISTS properly, and the second piece of code returns results super fast.

The larger the table, the bigger this difference becomes. Do the right thing before your data gets too big; it's never too early to tune your database.

Actually, I just ran this example on one of my production databases, against a table with 270 million rows.

The first query took 15 seconds and performed 456,197 logical reads. The second query returned the results in less than 1 second and performed only 5 logical reads.

However, if you really need to count the number of rows in the table, and the table is large, another method is to pull the count from the system tables.

SELECT rows FROM sysindexes will return the row counts for all of the indexes.

And because the clustered index represents the data itself, you can get the table's row count by simply adding WHERE indid = 1; then just include the table name.

So, the final query is:

    SELECT rows FROM sysindexes WHERE object_name(id) = 'T1' AND indid = 1

In my table with 270 million rows, the results came back in less than 1 second, with only 6 logical reads. Now that's a performance difference.

Do not perform reverse search

Take the simple query SELECT * FROM Customers WHERE RegionID <> 3 as an example. You can't use an index with this query because it is a reverse (not-equal) search that has to be compared row by row with the help of a table scan. If you need to perform such a task, you may find that performance is much better if you rewrite the query to use an index.

The query is easily rewritten, like this:

    SELECT * FROM Customers WHERE RegionID < 3
    UNION ALL
    SELECT * FROM Customers WHERE RegionID > 3


This query will use an index, so if your data set is large, its performance will be much better than the table scan version.

Of course, nothing is ever that easy, and the rewrite might even perform worse, so test it before you use it. There are too many factors involved to say it will work 100 percent of the time.

Finally, I realized that this query violated rule number 4: don't query twice, but it also goes to show that there are no hard and fast rules. Although we query twice here, we do this to avoid expensive table scans.

You won’t be able to use all of these tips all the time, but if you keep them in mind, you’ll one day use them to solve some big problems.

The most important thing to remember is: don't take what I say as dogma. Try it out in your actual environment. The same solutions won't work in every case, but they are what I use all the time when troubleshooting poor performance, and they serve me well.
