Why do code specifications require SQL statements not to have too many joins?

Java学习指南
2023-07-26 16:51:04

A giveaway question

Interviewer: Have you ever operated Linux?

Me: Yes

Interviewer: What command should I use to check the memory usage?

Me: free or top

Interviewer: Then tell me what information you can see using the free command

Me: You can see memory and cache usage, as shown below.

  • total: total memory

  • used: used memory

  • free: free memory

  • buff/cache: memory used by buffers and cache

  • available: available memory
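To make the fields concrete, here is a minimal sketch that derives a few of them from /proc/meminfo-style text. It assumes buff/cache is computed as Buffers + Cached + SReclaimable, which is how recent procps versions of free appear to do it; the sample numbers are invented for illustration, and the helper names `parse_meminfo` and `free_summary` are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def free_summary(meminfo):
    """Approximate some of free's columns (kB). The formula is an assumption."""
    buff_cache = (
        meminfo["Buffers"] + meminfo["Cached"] + meminfo.get("SReclaimable", 0)
    )
    return {
        "total": meminfo["MemTotal"],
        "free": meminfo["MemFree"],
        "buff/cache": buff_cache,
        "available": meminfo["MemAvailable"],
    }


# Invented sample values, not taken from a real machine.
SAMPLE = """\
MemTotal:        2000000 kB
MemFree:          300000 kB
MemAvailable:    1100000 kB
Buffers:          100000 kB
Cached:          1000000 kB
SReclaimable:     100000 kB
"""

summary = free_summary(parse_meminfo(SAMPLE))
print(summary)
```

On a Linux machine the same functions can be fed the contents of /proc/meminfo instead of SAMPLE.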

Interviewer: Then do you know how to clear the used cache (buff/cache)?

Me: Um... I don't know

Interviewer: Running sync; echo 3 > /proc/sys/vm/drop_caches clears buff/cache. Can you tell me whether I can run this command in production?

Me: (A giveaway question, I'm overjoyed) The benefits are huge. After clearing the cache we have more available memory, just like the little rocket in xx Guardian on the PC that frees a lot of memory with one click.

Interviewer: Um... go home and wait to hear from us.

Let’s talk about SQL Join

Interviewer: Let's change the topic and talk about your understanding of join

Me: Okay (if I get this one wrong too, it's over; seize the opportunity)

Review

A join in SQL combines the specified tables according to given conditions and returns the resulting data to the client

Join methods include:

  • inner join

  • left join

  • right join

  • full join

Picture source: https://www.cnblogs.com/reaptomorrow-flydream/p/8145610.html
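The four join methods can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the tables t_user and t_order are invented for illustration. Because older SQLite versions lack RIGHT JOIN and FULL JOIN, the sketch emulates them with a swapped LEFT JOIN and a UNION, which is an approximation rather than the native syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_user  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t_order (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO t_user  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO t_order VALUES (10, 1, 'book'), (11, 3, 'pen');
""")

# inner join: only rows that match on both sides
inner_rows = conn.execute("""
    SELECT u.name, o.item FROM t_user u
    INNER JOIN t_order o ON o.user_id = u.id
    ORDER BY u.id
""").fetchall()

# left join: every user, with NULL where no order matches
left_rows = conn.execute("""
    SELECT u.name, o.item FROM t_user u
    LEFT JOIN t_order o ON o.user_id = u.id
    ORDER BY u.id
""").fetchall()

# right join, emulated by swapping the tables of a left join
right_rows = conn.execute("""
    SELECT u.name, o.item FROM t_order o
    LEFT JOIN t_user u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()

# full join, emulated as the union of both directions
# (UNION removes duplicate rows, unlike a true FULL JOIN)
full_rows = conn.execute("""
    SELECT u.name, o.item FROM t_user u
    LEFT JOIN t_order o ON o.user_id = u.id
    UNION
    SELECT u.name, o.item FROM t_order o
    LEFT JOIN t_user u ON u.id = o.user_id
""").fetchall()

print(inner_rows)
print(left_rows)
print(right_rows)
print(sorted(full_rows, key=repr))
```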

Interviewer: If you need to use join statements during project development, how would you optimize them to improve performance?

Me: It depends on two cases: a small data volume and a large data volume.

Interviewer: Then?

Me: For

1. A small data volume: just load everything into memory and be done with it

2. A large data volume:

  • Add indexes to speed up the join statement

  • Use redundant fields to reduce the number of joins

  • Keep the number of table joins as low as possible; a single SQL statement should join no more than 5 times

Interviewer: So in summary, join statements are relatively expensive, right?

Me: Yes

Interviewer: Why?

Buffer

Me: There must be a comparison process when executing the join statement

Interviewer: Yes

Me: Comparing the rows of two tables one by one is slow, so we can read the data of both tables into a block of memory in turn. Taking MySQL's InnoDB engine as an example, the following statement finds the relevant memory area: show variables like '%buffer%'

The output includes join_buffer_size, and its size affects how our join statements perform

Interviewer: What else?

A major premise

Me: Any project eventually goes to production and inevitably generates data, and the data volume is unlikely to stay small

Interviewer: Yes, that's right

Me: Most of the data in a database is eventually saved to the hard disk and stored as files.

Take MySQL's InnoDB engine as an example

  • InnoDB uses the page as its basic I/O unit; each page is 16 KB by default

  • InnoDB creates an .ibd file for each table to store its data
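As a rough feel for what the 16 KB page size means for I/O, the sketch below estimates a lower bound on the number of pages needed to hold a table. It deliberately ignores page headers, fill factor, and index pages, so real tables need more; the function name `min_pages` and the row sizes are assumptions made for illustration.

```python
import math

PAGE_SIZE = 16 * 1024  # bytes, the page size stated above


def min_pages(row_count, avg_row_bytes, page_size=PAGE_SIZE):
    """Lower bound on pages needed, ignoring all per-page overhead."""
    if not 0 < avg_row_bytes <= page_size:
        raise ValueError("avg_row_bytes must be between 1 and page_size")
    rows_per_page = page_size // avg_row_bytes
    return math.ceil(row_count / rows_per_page)


# 100,000 rows of about 200 bytes each: 81 rows fit in one page
pages = min_pages(100_000, 200)
print(pages, "pages, about", pages * PAGE_SIZE // 1024, "KB to read")
```

Every table in a join contributes its own set of pages to read, which is one way to see why the number of joined tables matters.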

Verification

Me: This means we have to read as many files as there are tables being joined. Indexes help, but frequent movement of the hard disk head is still unavoidable

Interviewer: In other words, frequent head movement hurts performance, right?

Me: Yes. Don't today's open source frameworks, such as HBase and Kafka, like to say that they improve performance greatly through sequential reads and writes?

Interviewer: That's right. So do you think Linux has optimized for this? A hint: run the free command again and take a look

Me: Strange, why does the cache take up more than 1.2G?

Image source: https://www.linuxatemyram.com/

Interviewer: Have you ever thought about

  • What is stored in buff/cache?

  • Why does buff/cache take up so much memory while available still shows 1.1G?

  • Why can the memory occupied by buff/cache be cleared with two commands, while used memory can only be released by ending processes?

Think it over carefully

After thinking for a few minutes

Me: If the memory occupied by buff/cache can be released so casually, it must not be important, and clearing it will not affect the running system

Interviewer: Not entirely true

Me: Is that so? That reminds me of a sentence in CSAPP (Computer Systems: A Programmer's Perspective):

The essence of the memory hierarchy is that the storage device at each level is a cache for the device at the next lower level

In plain terms, Linux treats memory as a cache for the hard disk

Related information: http://tldp.org/LDP/sag/html/buffer-cache.html
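The idea of memory acting as a cache for the disk can be shown with a toy model. The class below is an invented illustration, not how the Linux page cache is implemented: it only counts how many reads have to go to the simulated disk, and its `drop_caches` method mimics the effect of the command from the giveaway question.

```python
class CachedDisk:
    """Toy model: an in-memory cache sitting in front of a slow disk."""

    def __init__(self, blocks):
        self._blocks = blocks  # simulated disk: block number -> bytes
        self._cache = {}       # simulated buff/cache
        self.disk_reads = 0    # number of reads that had to hit the disk

    def read(self, block_no):
        if block_no not in self._cache:
            self.disk_reads += 1  # cache miss: go to the slow device
            self._cache[block_no] = self._blocks[block_no]
        return self._cache[block_no]

    def drop_caches(self):
        """Throw the cache away, like echo 3 > /proc/sys/vm/drop_caches."""
        self._cache.clear()


disk = CachedDisk({0: b"page-0", 1: b"page-1"})
for block_no in (0, 1, 0, 1):
    disk.read(block_no)
reads_when_warm = disk.disk_reads  # only the first read of each block hit the disk

disk.drop_caches()
disk.read(0)
reads_after_drop = disk.disk_reads  # the same block has to be fetched again
print(reads_when_warm, reads_after_drop)
```

In this model, dropping the cache frees memory but pushes the next reads back to the disk, which is the cost the interviewer was hinting at for a busy production machine.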

Interviewer: Now do you know how to answer the giveaway question?

Me: I….

Join Algorithm

Interviewer: I'll give you one more chance. If you were asked to implement the join algorithm, how would you do it?

Me: Without an index, a nested loop gets it done. With an index, you can use the index to improve performance.

Interviewer: Back to join_buffer, what do you think is stored in join_buffer?

Me: During the scan, the database picks one table and puts the data that needs to be returned and compared against the other tables into join_buffer

Interviewer: How to deal with it when there is an index?

Me: That case is relatively simple: just read the index trees of the two tables and compare them directly. Let me describe how the case without an index is handled
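Before moving on to the case without an index, the indexed case just mentioned can be sketched roughly as follows. This is an assumption about the general technique (an index lookup per outer row), not MySQL's actual code: a sorted list searched with bisect stands in for an index tree, and the names `build_index` and `index_join` are hypothetical.

```python
from bisect import bisect_left


def build_index(rows, key):
    """Sorted (key value, row position) pairs; a stand-in for an index tree."""
    return sorted((row[key], pos) for pos, row in enumerate(rows))


def index_join(outer, inner, key):
    """For each outer row, look up matching inner rows through the index."""
    index = build_index(inner, key)
    keys = [k for k, _ in index]
    result = []
    for o in outer:
        i = bisect_left(keys, o[key])  # binary search instead of a full scan
        while i < len(keys) and keys[i] == o[key]:
            result.append((o, inner[index[i][1]]))
            i += 1
    return result


outer = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
inner = [{"id": 2, "b": "p"}, {"id": 2, "b": "q"}, {"id": 3, "b": "r"}]
pairs = index_join(outer, inner, "id")
print(pairs)
```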

Nested Loop Join

A nested loop reads only one row of a table at a time. That is, if outerTable has 100,000 rows and innerTable has 100 rows, 10,000,000 reads are needed (assuming the files of these two tables are not cached in memory by the operating system; we call them cold tables)

Of course, no database engine currently uses this algorithm (too slow)
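A minimal sketch of the nested loop described above, with a counter for row reads. The tables are plain Python lists standing in for cold tables, and the sizes are scaled down from the 100,000 by 100 example so the loop finishes quickly; the read count is still the product of the two row counts.

```python
def nested_loop_join(outer, inner, key):
    """Compare every outer row with every inner row, counting inner reads."""
    reads = 0
    result = []
    for o in outer:
        for i in inner:
            reads += 1  # each inner row is fetched again for every outer row
            if o[key] == i[key]:
                result.append((o, i))
    return result, reads


outer_table = [{"id": n} for n in range(1_000)]
inner_table = [{"id": n} for n in range(100)]
rows, reads = nested_loop_join(outer_table, inner_table, "id")
print(len(rows), "matches after", reads, "row reads")
```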

Block nested loop

Block means that a chunk of data is fetched into memory each time to reduce I/O overhead

MySQL InnoDB uses this algorithm when no index can be used (recent MySQL 8.0 releases use a hash join in this situation instead)
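A rough sketch of the block idea, under the assumption that rows from the outer table are collected into a buffer and the inner table is scanned once per full buffer. The parameter `block_rows` is a hypothetical stand-in for how many rows fit in join_buffer; MySQL sizes the buffer in bytes, not rows.

```python
def block_nested_loop_join(outer, inner, key, block_rows):
    """Join block by block, counting how many times inner is scanned."""
    inner_scans = 0
    result = []
    for start in range(0, len(outer), block_rows):
        block = outer[start:start + block_rows]  # rows held in the join buffer
        inner_scans += 1
        for i in inner:  # one pass over inner per block
            for o in block:
                if o[key] == i[key]:
                    result.append((o, i))
    return result, inner_scans


outer_table = [{"id": n} for n in range(1_000)]
inner_table = [{"id": n} for n in range(100)]

matches, scans = block_nested_loop_join(outer_table, inner_table, "id", 100)
print(len(matches), "matches with", scans, "scans of the inner table")
```

The number of comparisons is the same as in the plain nested loop; what shrinks is the number of passes over the inner table, from one per outer row to one per block. A larger buffer means fewer passes, which is why join_buffer_size matters.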

Consider the following two tables t_a and t_b

When an index cannot be used to perform the join, InnoDB automatically uses the Block nested loop algorithm

Summary

When I was in school, my database teacher loved to test us on database normal forms. It was only after I started working that I learned everything comes down to performance. Use redundancy where you can; where you cannot, use a join. If joins really hurt performance, try increasing join_buffer_size or switching to a solid state drive.

Reference materials

"Computer Systems: A Programmer's Perspective", Chapter 6: The Memory Hierarchy
"Experiments and fun with the Linux disk cache": the author uses several examples to show how the disk cache affects program performance
"Linux ate my ram": explains the fields in the output of free
"How to clear the buffer/pagecache (disk cache) under Linux": explains the command from the giveaway question at the beginning of the article
"How MySQL Runs: Understanding MySQL from the Root"
"Block nested loop": MariaDB's official documentation explaining the implementation of the Block-Nested-Loop algorithm
