The Ultimate Guide - How to Write Better SQL Queries?

王林, 2024-01-12 12:15:04
Set-based versus procedural approaches to querying

Implicit in the idea of anti-patterns is the fact that there is a difference between set-based and procedural approaches to building queries.

  • The procedural approach to querying is very similar to programming: you tell the system what to do and how to do it. One example is querying the database by executing one function and then calling another, as in the examples in the previous article; another is using logic that involves loops, conditions, and user-defined functions (UDFs) to obtain the final result. With this approach you often end up requesting a subset of the data, then a subset of that result, and so on, layer by layer. It is also often referred to as step-by-step or row-by-row querying.
  • The other is the set-based approach, where you only specify what to do. Your role is to state the conditions and requirements for the result you want from the query. How the data is retrieved is left to the internal mechanisms that implement the query: the database engine determines the best algorithms and processing logic to execute it.

Since SQL is set-based, the set-based approach is usually more efficient than the procedural one, which also explains why, in some cases, SQL can work faster than application code.
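
To make the contrast concrete, here is a minimal sketch of both approaches. It is written in Python against an in-memory SQLite database only so that it runs without a server; the orders table, its columns, and the function names are hypothetical and do not come from this tutorial.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 10), ("bob", 25), ("ann", 5), ("cid", 40), ("bob", 15)],
)


def totals_procedural(conn):
    """Row by row: fetch every row and do the grouping in application code."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_set_based(conn):
    """Set-based: describe the result and let the engine decide how to build it."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)


print(totals_procedural(conn))
print(totals_set_based(conn))
```

Both functions return the same totals. The difference is where the work happens: the first one pulls every row into the application and loops over it, while the second one leaves the choice of algorithm to the database engine.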

The set-based approach is also a skill that the data mining and analytics industry expects you to master, because you need to be able to switch between the two approaches. If you find procedural logic in your queries, consider whether that part should be rewritten.

From query to execution plan

Anti-patterns are not static. As you grow as a SQL developer, avoiding query anti-patterns and rewriting queries can be a daunting task, so you will often need tools to optimize your queries in a more structured way.

Thinking about performance requires an approach that is not only more structured but also more in-depth.

This structured, in-depth approach is mostly based on the query plan. The query is first parsed into a "parse tree"; the query plan then defines exactly which algorithm is used for each operation and how the execution of the operations is coordinated.

Query Optimization

When optimizing a query, you will most likely need to manually inspect the plan generated by the optimizer. In that case, you analyze your query again by looking at its query plan.

To get hold of this query plan, you use the tools that your database management system provides. Tools you might have at your disposal include:

  • Some packages feature tools that generate a graphical representation of a query plan.
  • Other tools provide a textual description of the query plan.

Note that if you are using PostgreSQL, you can distinguish between two forms of EXPLAIN. Plain EXPLAIN gives you a description of how the planner intends to execute the query, without running it. EXPLAIN ANALYZE actually executes the query and returns a report that shows the estimated plan alongside what really happened. In general, an actual execution plan is one where the query is really run, whereas an estimated execution plan works out what would happen without executing the query. The actual execution plan is the more useful of the two, because it contains additional details and statistics about what actually happened when the query was executed.

Next you will learn more about EXPLAIN and ANALYZE, and how to use these two commands to better understand your query plans and query performance. The examples use two tables: one_million and half_million.

You can use EXPLAIN to retrieve the current planner information for the one_million table: put it directly in front of the query, and when you run the statement the query plan is returned:

EXPLAIN
SELECT *
FROM one_million;
QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18584.82 rows=1025082 width=36)
(1 row)

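In the example above, the estimated cost of the query is 0.00..18584.82 (the startup cost and the total cost, expressed in the planner's own cost units), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.

You can also use ANALYZE to update the table statistics that the planner relies on, and then run EXPLAIN again: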

ANALYZE one_million;
EXPLAIN
SELECT *
FROM one_million;
QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(1 row)
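
The two cost figures can be cross-checked by hand. PostgreSQL's documentation gives the estimated cost of a sequential scan as (disk pages read × seq_page_cost) + (rows scanned × cpu_tuple_cost), with default settings of 1.0 and 0.01. The sketch below assumes those defaults were in effect when the plans above were produced, which the article does not state; the page count it derives is therefore an inference, and the function names are hypothetical.

```python
# Assumed: PostgreSQL's default planner settings were in effect.
SEQ_PAGE_COST = 1.0    # cost of reading one page sequentially
CPU_TUPLE_COST = 0.01  # cost of processing one row


def seq_scan_cost(pages, rows):
    """Estimated total cost of a sequential scan under the assumed defaults."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def implied_pages(total_cost, rows):
    """Work backwards from a plan's total cost to the page count it implies."""
    return round((total_cost - rows * CPU_TUPLE_COST) / SEQ_PAGE_COST)


pages_before_analyze = implied_pages(18584.82, 1025082)
pages_after_analyze = implied_pages(18334.00, 1000000)

# Both work out to 8334 pages under the assumed defaults.
print(pages_before_analyze, pages_after_analyze)
```

Under these assumptions both plans imply the same table size on disk. Only the row estimate changed after ANALYZE, from 1025082 to 1000000, and that accounts for the whole difference in cost.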

In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:

EXPLAIN ANALYZE
SELECT *
FROM one_million;
QUERY PLAN
___________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)

The downside of EXPLAIN ANALYZE is that the query is actually executed, so use it with care: a slow query takes its full time to run, and a data-modifying statement really changes the data.
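
The difference between describing a plan and executing a statement can be reproduced without a PostgreSQL server. The sketch below is an analogy and does not reproduce PostgreSQL's output: it uses Python's sqlite3 module, where EXPLAIN QUERY PLAN describes a statement without running it, and it measures time by hand around a real execution. The table is a scaled-down, hypothetical copy of one_million. In PostgreSQL itself, the usual precaution for data-modifying statements is to wrap EXPLAIN ANALYZE in BEGIN and ROLLBACK, which the rollback at the end mirrors.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_million (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO one_million VALUES (?, ?)",
    ((i, "row %d" % i) for i in range(10000)),  # scaled down for the sketch
)
conn.commit()


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM one_million").fetchone()[0]


DELETE_SQL = "DELETE FROM one_million WHERE counter < 5000"

# Plan only: the statement is described, not executed.
plan = conn.execute("EXPLAIN QUERY PLAN " + DELETE_SQL).fetchall()
rows_after_explain = row_count(conn)

# Real timings require a real execution, which changes the data ...
start = time.perf_counter()
conn.execute(DELETE_SQL)
elapsed_ms = (time.perf_counter() - start) * 1000
rows_after_delete = row_count(conn)

# ... unless the surrounding transaction is rolled back afterwards.
conn.rollback()
rows_after_rollback = row_count(conn)

print(rows_after_explain, rows_after_delete, rows_after_rollback)
```

The row count is unchanged after the plan-only step, drops after the timed execution, and is restored by the rollback.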

All the plans we have seen so far use a sequential scan, also called a full table scan: the database reads every row of the table in sequential (serial) order and checks each row against the query's conditions. In terms of performance, a sequential scan is usually not the best execution plan, because the entire table has to be read. That said, sequential reads are fast even on slow disks.

Here are some examples of other algorithms:

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);
QUERY PLAN
_____________________________________________________________
Hash Join (cost=15417.00..68831.00 rows=500000 width=42)
(actual time=1241.471..5912.553 rows=500000 loops=1)
Hash Cond: (one_million.counter = half_million.counter)
    -> Seq Scan on one_million
    (cost=0.00..18334.00 rows=1000000 width=37)
    (actual time=0.007..1254.027 rows=1000000 loops=1)
    -> Hash (cost=7213.00..7213.00 rows=500000 width=5)
    (actual time=1241.251..1241.251 rows=500000 loops=1)
    Buckets: 4096 Batches: 16 Memory Usage: 770kB
        -> Seq Scan on half_million
        (cost=0.00..7213.00 rows=500000 width=5)
        (actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms
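
The plan shows a Hash node built on half_million and a sequential scan of one_million feeding the join. As a rough illustration of the idea, and not of PostgreSQL's actual implementation, a hash join can be sketched in two phases: build a hash table on the smaller input, then probe it once per row of the larger input. The function name and the sample rows below are hypothetical, and the sketch keeps everything in memory, whereas "Batches: 16" in the plan suggests the real hash table was processed in several batches.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Equi-join two lists of dict rows on `key` with an in-memory hash table."""
    # Build phase: hash the (ideally smaller) input on the join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[key]].append(row)

    # Probe phase: one hash lookup per row of the other input.
    joined = []
    for row in probe_rows:
        for match in buckets.get(row[key], ()):
            joined.append({**row, **match})
    return joined


# Tiny stand-ins for the two tables.
half_million = [{"counter": c} for c in (2, 4, 6)]
one_million = [{"counter": c, "label": "row %d" % c} for c in range(1, 7)]

result = hash_join(half_million, one_million, "counter")
print(result)
```

Each input is read once, so the work grows roughly with the sum of the two input sizes, not with their product.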

We can see that the query optimizer chose a Hash Join. Remember this operation, because you will need it later to estimate the time complexity of the query. Note that there is no index on half_million.counter in the example above; the following example adds one:

CREATE INDEX ON half_million(counter);
EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);
QUERY PLAN
______________________________________________________________
Merge Join (cost=4.12..37650.65 rows=500000 width=42)
(actual time=0.033..3272.940 rows=500000 loops=1)
Merge Cond: (one_million.counter = half_million.counter)
    -> Index Scan using one_million_counter_idx on one_million
    (cost=0.00..32129.34 rows=1000000 width=37)
    (actual time=0.011..694.466 rows=500001 loops=1)
    -> Index Scan using half_million_counter_idx on half_million
    (cost=0.00..14120.29 rows=500000 width=5)
    (actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)

With the index in place, the query optimizer now chooses a Merge Join that reads both tables through index scans.
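
A merge join relies on both inputs arriving sorted on the join key, which is what the two index scans provide. The following is a simplified sketch of the technique and not PostgreSQL's implementation; the function name and the sample data are hypothetical.

```python
def merge_join(left, right):
    """Join two lists of (key, value) tuples that are already sorted by key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1  # left side is behind: advance it
        elif left_key > right_key:
            j += 1  # right side is behind: advance it
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    joined.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return joined


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
```

The loop stops as soon as either input is exhausted. That behaviour would be consistent with the plan above, where the scan of one_million reports only 500001 actual rows, although the article itself does not comment on that number.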

Note the difference between an index scan and a sequential scan (also called a full table scan or "table scan"): a sequential scan reads every row of the table, whereas an index scan walks the index pages to locate the matching entries and then fetches the corresponding rows.
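
The switch from a full scan to an index lookup can be observed in a small runnable sketch. It uses Python's sqlite3 module as a stand-in, so the plan text differs from PostgreSQL's and varies between SQLite versions. The table is a scaled-down, hypothetical copy of half_million, and the index is named explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE half_million (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO half_million VALUES (?, ?)",
    ((i, "row %d" % i) for i in range(5000)),  # scaled down for the sketch
)

QUERY = "SELECT * FROM half_million WHERE counter = 42"


def plan_details(conn, sql):
    """Return the human-readable plan lines (the last column of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan_details(conn, QUERY)

conn.execute("CREATE INDEX half_million_counter_idx ON half_million (counter)")
plan_with_index = plan_details(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

Before the index exists, SQLite reports a SCAN of the table; afterwards it reports a SEARCH that uses half_million_counter_idx.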

This concludes the second part of the tutorial. The final article in the series "How to Write Better SQL Queries" will follow, so stay tuned.

When reprinting, please credit the source: Grape City Control

Statement:
This article is reproduced from linuxprobe.com.