The Ultimate Guide - How to Write Better SQL Queries?
Implicit in the anti-patterns discussed earlier is the difference between set-based and procedural approaches to building queries.
Since SQL is set-based, the set-based approach is more efficient than the procedural one, which explains why, in some cases, SQL can work faster than application code.
The set-based approach is also a skill that employers in the data analysis industry expect you to master, and you will often need to switch between the two approaches. If you find procedural logic in your queries, consider whether that part should be rewritten.
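As a rough illustration of the two approaches, the sketch below computes the same per-customer totals twice: once by looping over rows in application code (procedural) and once with a single GROUP BY query (set-based). It uses Python's built-in sqlite3 module as a stand-in for a database server; the orders table and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sample data, invented for illustration: (customer, amount).
ORDERS = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]


def make_db():
    """Create an in-memory database holding the sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
    return conn


def totals_procedural(conn):
    """Procedural: fetch every row and aggregate it in application code."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_set_based(conn):
    """Set-based: describe the result and let the database compute it."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)


conn = make_db()
print(totals_procedural(conn))
print(totals_set_based(conn))
```

Both functions return the same mapping; the set-based version simply hands the aggregation to the database instead of moving every row into the application first.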
From query to execution plan
Anti-patterns are not static: they evolve as you grow as a SQL developer, so avoiding them and rewriting queries can be a daunting task. That is why it often helps to use tools that let you optimize your queries in a more structured way.
Thinking about performance requires not only a more structured approach, but also a more in-depth one.
This structured, in-depth approach is based primarily on the query plan. The query is first parsed into a "parse tree"; the resulting plan defines exactly which algorithm is used for each operation and how the operations are coordinated.
Query Optimization
When optimizing a query, you will most likely need to manually inspect the plan generated by the optimizer. In that case, you analyze your query again by looking at its query plan.
To get hold of the query plan, you use the tools that your database management system provides; in PostgreSQL these are the EXPLAIN and EXPLAIN ANALYZE commands.
Note that PostgreSQL distinguishes between the two. EXPLAIN only describes how the planner intends to execute the query, without running it. EXPLAIN ANALYZE executes the query and returns a report that sets the estimated plan against what actually happened. Generally speaking, an actual execution plan comes from running the query, while an estimated execution plan is worked out without executing it. The actual execution plan is more useful because it contains additional details and statistics about what happened when the query was executed.
Next you will learn more about EXPLAIN and ANALYZE, and how to use these two commands to better understand your query plans and query performance. The examples that follow use two tables: one_million and half_million.
You can use EXPLAIN to retrieve the current plan information for the one_million table: put it at the very top of your query, and running it returns the query plan:
EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million  (cost=0.00..18584.82 rows=1025082 width=36)
(1 row)
In the above example, the estimated cost of the query runs from 0.00 (start-up cost) to 18584.82 (total cost), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.
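To make the fields of a plan line concrete, here is a small sketch that splits the line shown above into its parts. It is only a reading aid: the regular expression and the parse_estimate helper are assumptions made for this example and are not part of PostgreSQL or any client library.

```python
import re

# Plan line copied from the EXPLAIN output above.
PLAN_LINE = "Seq Scan on one_million (cost=0.00..18584.82 rows=1025082 width=36)"

# Matches "[-> ]<node> (cost=<startup>..<total> rows=<n> width=<bytes>)".
ESTIMATE_RE = re.compile(
    r"(?:->\s*)?(?P<node>.+?)\s*"
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
    r"\s+rows=(?P<rows>\d+)\s+width=(?P<width>\d+)\)"
)


def parse_estimate(line):
    """Return the planner's estimates from one plan node line (hypothetical helper)."""
    match = ESTIMATE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a plan node line: {line!r}")
    return {
        "node": match["node"],                        # operation and target table
        "startup_cost": float(match["startup"]),      # cost before the first row
        "total_cost": float(match["total"]),          # cost to return all rows
        "rows": int(match["rows"]),                   # estimated row count
        "width_bytes": int(match["width"]),           # estimated average row width
    }


estimate = parse_estimate(PLAN_LINE)
for field, value in estimate.items():
    print(f"{field}: {value}")
```

The costs are expressed in the planner's own arbitrary units, so they are useful for comparing plans with each other rather than for predicting run time in milliseconds.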
You can also run ANALYZE to update the table statistics that the planner relies on for its estimates.
ANALYZE one_million;

EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million  (cost=0.00..18334.00 rows=1000000 width=37)
(1 row)
In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:
EXPLAIN ANALYZE
SELECT *
FROM one_million;

QUERY PLAN
___________________________________________________
Seq Scan on one_million  (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)
The disadvantage of using EXPLAIN ANALYZE is that you need to actually execute the query, which is worth noting!
The only algorithm we have seen so far is the sequential scan, or full table scan: each row of the table is read in sequential (serial) order, and its columns are checked to see whether they meet the condition. In terms of performance, a sequential scan is not the best execution plan because the entire table has to be read. On the other hand, sequential reads are fairly fast even on slow disks.
Other algorithms exist as well. Consider, for example, this query plan for a join:
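The idea of a sequential scan can be sketched in a few lines. This is a simplified in-memory model, not how PostgreSQL reads pages from disk; the table contents and the seq_scan helper are made up for the example.

```python
def seq_scan(rows, predicate):
    """Read every row in storage order and keep those matching the predicate."""
    matches = []
    rows_read = 0
    for row in rows:
        rows_read += 1          # every row is visited, matching or not
        if predicate(row):
            matches.append(row)
    return matches, rows_read


# Hypothetical table of 1000 rows with a "counter" column.
table = [{"counter": i, "payload": f"row-{i}"} for i in range(1, 1001)]

matches, rows_read = seq_scan(table, lambda row: row["counter"] == 42)
print(matches, rows_read)
```

Even though only one row matches, all 1000 rows are read, which is why the cost of a sequential scan grows with the size of the table rather than with the size of the result.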
EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
_____________________________________________________________
Hash Join  (cost=15417.00..68831.00 rows=500000 width=42) (actual time=1241.471..5912.553 rows=500000 loops=1)
  Hash Cond: (one_million.counter = half_million.counter)
  ->  Seq Scan on one_million  (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.007..1254.027 rows=1000000 loops=1)
  ->  Hash  (cost=7213.00..7213.00 rows=500000 width=5) (actual time=1241.251..1241.251 rows=500000 loops=1)
        Buckets: 4096  Batches: 16  Memory Usage: 770kB
        ->  Seq Scan on half_million  (cost=0.00..7213.00 rows=500000 width=5) (actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms
We can see that the query optimizer selected a Hash Join. Remember this operation, because you will need it to estimate the time complexity of the query. Note also that there is no index on half_million.counter in the example above; the following example adds one:
CREATE INDEX ON half_million(counter);

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
______________________________________________________________
Merge Join  (cost=4.12..37650.65 rows=500000 width=42) (actual time=0.033..3272.940 rows=500000 loops=1)
  Merge Cond: (one_million.counter = half_million.counter)
  ->  Index Scan using one_million_counter_idx on one_million  (cost=0.00..32129.34 rows=1000000 width=37) (actual time=0.011..694.466 rows=500001 loops=1)
  ->  Index Scan using half_million_counter_idx on half_million  (cost=0.00..14120.29 rows=500000 width=5) (actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)
With the index in place, the query optimizer now chooses a Merge Join whose inputs are both read with Index Scans.
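The two join algorithms that appear in these plans can be sketched as follows. These are simplified in-memory versions written for illustration, not PostgreSQL's implementations: the hash join builds a lookup table on the smaller input and probes it with the larger one, while the merge join walks two inputs that are already sorted on the join key (which is what the index scans provide). The demo tables are tiny hypothetical stand-ins for one_million and half_million.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows, key):
    """Hash join: hash the (smaller) build input, then probe it row by row."""
    buckets = defaultdict(list)
    for row in build_rows:                      # build phase
        buckets[row[key]].append(row)
    joined = []
    for row in probe_rows:                      # probe phase
        for match in buckets.get(row[key], ()):
            joined.append((row, match))
    return joined


def merge_join(left, right, key):
    """Merge join: both inputs must already be sorted on the join key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # ...and pair it with every left row carrying the same key.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    joined.append((left[i], right_row))
                i += 1
            j = j_end
    return joined


# Hypothetical miniature tables, both sorted on "counter".
big_table = [{"counter": i} for i in range(1, 11)]
small_table = [{"counter": i} for i in range(1, 6)]

hash_result = hash_join(big_table, small_table, "counter")
merge_result = merge_join(big_table, small_table, "counter")
print(len(hash_result), len(merge_result))
```

Note that this merge join stops as soon as one input is exhausted. That may be why the plan above reports only 500001 actual rows read from one_million, although this reading of the plan is an inference rather than something the output states.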
Note the difference between an index scan and a full table scan (sequential scan): the latter, also called a "table scan", reads every row of the table, while the former uses the index to locate matching rows and so reads only the relevant index and data pages.
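The switch from a full scan to an index lookup can be observed without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, which is a different and much simpler facility than PostgreSQL's EXPLAIN, so the output format differs from the plans shown in this article. The demo_rows table, its index name, and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_rows (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO demo_rows VALUES (?, ?)",
    ((i, f"row-{i}") for i in range(1, 1001)),
)


def plan_details(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


QUERY = "SELECT * FROM demo_rows WHERE counter = 42"

# Without an index, SQLite has to scan the whole table.
before = plan_details(conn, QUERY)

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX demo_rows_counter_idx ON demo_rows(counter)")
after = plan_details(conn, QUERY)

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan is reported as a scan of the table and the second as a search that uses demo_rows_counter_idx.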
This concludes the second part of the tutorial. The final article in the series "How to Write Better SQL Queries" will follow, so stay tuned.
Please indicate the source of reprinting: Grape City Control