Which is more efficient in MySQL: one multi-table join query, or several single-table queries?
When the data volume is small, using JOIN is fine. In practice, however, the association is usually done in the service layer, for the following reasons.
First: the computing resources of a standalone database are very expensive, and the database has to serve both writes and reads, both of which consume CPU. To increase database throughput, and because the business usually does not care about a latency difference of a few hundred microseconds to a few milliseconds, the business moves more of the computation into the service layer. After all, computing resources are easy to scale horizontally, while databases are not. So most businesses put pure computation in the service layer and use the database as a KV system with transaction capabilities. This is the architectural idea of a heavy business layer and a light DB.
Second: for historical reasons, many complex businesses do not use just one database, and a layer of middleware is usually added in front of the multiple databases. Joins cannot be performed across different databases, so the business naturally abstracts a service layer to reduce its coupling to the databases.
Third: in some large companies the data volume forces them to shard databases and tables. In a sharded application, JOIN is subject to many restrictions unless the business can use the sharding key to guarantee that the two tables being joined are in the same physical database. Middleware generally does not support cross-database joins well.
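The sharding-key condition can be made concrete with a small check. This is a minimal Python sketch that assumes a simple modulo router over four physical databases. The names NUM_SHARDS, shard_for and colocated_pairs, and the orders/order_items rows, are all hypothetical. Real sharding middleware has its own routing rules. The sketch checks whether the rows that a join on order_id would pair up are routed to the same shard.

```python
NUM_SHARDS = 4  # hypothetical number of physical databases


def shard_for(key: int) -> int:
    """Hypothetical modulo router: map a sharding key to a shard index."""
    return key % NUM_SHARDS


# Hypothetical rows; order_items also carries user_id so it can share the sharding key.
ORDERS = [{"order_id": 10, "user_id": 7}, {"order_id": 11, "user_id": 8}]
ITEMS = [{"item_id": 1, "order_id": 10, "user_id": 7},
         {"item_id": 2, "order_id": 11, "user_id": 8}]


def colocated_pairs(orders, items, order_key, item_key):
    """For each (order, item) pair joined on order_id, report whether both rows
    are routed to the same shard when orders is sharded by order_key and
    items is sharded by item_key."""
    return [shard_for(o[order_key]) == shard_for(i[item_key])
            for o in orders for i in items
            if o["order_id"] == i["order_id"]]


if __name__ == "__main__":
    # Both tables sharded by user_id: every joined pair is on one shard.
    print(colocated_pairs(ORDERS, ITEMS, "user_id", "user_id"))  # [True, True]
    # Items sharded by item_id instead: these pairs land on different shards.
    print(colocated_pairs(ORDERS, ITEMS, "user_id", "item_id"))  # [False, False]
```

The design choice this illustrates is copying user_id into order_items so that both tables share one sharding key. That is what keeps the join inside a single physical database.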
Here is a very common business example. In a sharded setup, two tables must be updated together, but they live in different physical databases. To keep the data consistent, one approach is to use distributed transaction middleware to put both updates into one transaction. Such operations generally require a global lock, however, and perform very poorly. Some businesses can tolerate short-term inconsistency. How do they handle it? They update the two tables separately. That leaves the risk that one of the writes fails, so a scheduled task scans table A for failed rows, checks whether the corresponding rows in table B were written successfully, and then reconciles the two associated records. A join cannot be used here. The data can only be pulled into the service layer and merged by the application itself.
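The scheduled-task approach can be sketched in Python with the standard sqlite3 module. This is a minimal sketch under stated assumptions: two in-memory SQLite databases stand in for the two physical MySQL databases. The table names, the `status` column used to flag failed writes, and the functions `make_dbs` and `reconcile` are all hypothetical.

```python
import sqlite3


def make_dbs():
    """Two in-memory SQLite databases stand in for two physical databases.
    The schema and the 'status' flag are hypothetical."""
    db_a = sqlite3.connect(":memory:")
    db_b = sqlite3.connect(":memory:")
    db_a.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, amount INTEGER, status TEXT)")
    db_b.execute("CREATE TABLE table_b (a_id INTEGER PRIMARY KEY, amount INTEGER)")
    db_a.executemany("INSERT INTO table_a VALUES (?, ?, ?)",
                     [(1, 100, "ok"), (2, 200, "failed"), (3, 300, "failed")])
    # Row 2 actually reached table_b; the write of row 3 was lost.
    db_b.executemany("INSERT INTO table_b VALUES (?, ?)", [(1, 100), (2, 200)])
    db_a.commit()
    db_b.commit()
    return db_a, db_b


def reconcile(db_a, db_b):
    """Scheduled-task sketch: repair rows that table_a flags as 'failed'.
    A cross-database JOIN is impossible, so both sides are read separately
    and merged in the application. Returns the ids rewritten to table_b."""
    failed = db_a.execute(
        "SELECT id, amount FROM table_a WHERE status = 'failed'").fetchall()
    if not failed:
        return []
    ids = [row_id for row_id, _ in failed]
    placeholders = ",".join("?" * len(ids))
    present = {a_id for (a_id,) in db_b.execute(
        f"SELECT a_id FROM table_b WHERE a_id IN ({placeholders})", ids)}
    repaired = []
    for row_id, amount in failed:
        if row_id not in present:  # the B-side write was lost: redo it
            db_b.execute("INSERT INTO table_b VALUES (?, ?)", (row_id, amount))
            repaired.append(row_id)
    db_b.commit()  # fix B first, so a crash before the next commit only causes a retry
    db_a.executemany("UPDATE table_a SET status = 'ok' WHERE id = ?",
                     [(row_id,) for row_id in ids])
    db_a.commit()
    return repaired


if __name__ == "__main__":
    db_a, db_b = make_dbs()
    print(reconcile(db_a, db_b))  # [3]
    print(reconcile(db_a, db_b))  # [] (nothing left to repair)
```

Because the job checks table_b before writing, running it again is harmless. That property matters for a task that runs on a schedule.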
In fact, rewriting a join query as several decomposed queries has the following advantages:
Caching becomes more efficient.
Many applications can easily cache the result objects of single-table queries. In addition, with MySQL's query cache, if any table involved in a join changes, the cached result can no longer be used. After the query is split, if one table rarely changes, queries against that table can keep reusing the query cache results. (The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.)
After the query is decomposed, each single query can reduce lock contention.
Doing the association in the application layer makes it easier to split the database, which helps achieve high performance and scalability.
The efficiency of the queries themselves may also improve.
Queries for redundant records are reduced: when the association is done in the application, each record only needs to be queried once, whereas a join query may access some of the same data repeatedly.
Furthermore, this is equivalent to implementing a hash join in the application instead of using MySQL's nested-loop join. In some scenarios a hash join is much more efficient. (MySQL 8.0.18 and later can also execute equi-joins as hash joins.)
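The decomposed query and the application-side hash join can be sketched in Python. In this minimal sketch, SQLite stands in for MySQL, and the users/orders schema and the function names are hypothetical. `join_in_sql` lets the database do the join. `join_in_app` runs two single-table queries and joins them with a dictionary, which acts as the hash table.

```python
import sqlite3


def make_db():
    """Hypothetical schema: each order references a user by user_id."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER);
        INSERT INTO users  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders VALUES (10, 1, 50), (11, 2, 70), (12, 1, 20);
    """)
    return db


def join_in_sql(db):
    """One multi-table query: the database performs the join."""
    return db.execute("""
        SELECT o.id, u.name, o.total
        FROM orders o JOIN users u ON u.id = o.user_id
        ORDER BY o.id""").fetchall()


def join_in_app(db):
    """Two single-table queries, joined in the application with a hash table."""
    orders = db.execute("SELECT id, user_id, total FROM orders ORDER BY id").fetchall()
    if not orders:
        return []
    # Each referenced user is fetched once, even if it has several orders.
    user_ids = sorted({user_id for _, user_id, _ in orders})
    placeholders = ",".join("?" * len(user_ids))
    users = db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids).fetchall()
    # Build phase: hash table keyed on the join column.
    name_by_id = dict(users)
    # Probe phase: one dictionary lookup per order row (inner-join semantics).
    return [(oid, name_by_id[uid], total)
            for oid, uid, total in orders if uid in name_by_id]


if __name__ == "__main__":
    db = make_db()
    print(join_in_app(db) == join_in_sql(db))  # True
```

Building the dictionary costs one pass over the fetched users. After that, each order is matched with an average constant-time lookup instead of a scan. For very large id sets, the IN list would need to be sent in batches.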
Logical execution order of a MySQL query
1) The FROM clause assembles data from the different data sources;
2) The ON clause filters the rows to be joined;
3) The WHERE clause filters rows based on the specified conditions;
4) The GROUP BY clause divides the data into groups;
5) CUBE / ROLLUP super-aggregates are computed (MySQL supports WITH ROLLUP);
6) Aggregate functions are calculated;
7) The HAVING clause filters the groups;
8) All expressions are calculated;
9) The SELECT fields are calculated;
10) DISTINCT removes duplicate rows;
11) ORDER BY sorts the result set;
12) The top N rows are selected (LIMIT in MySQL).
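The Python sketch below evaluates one hypothetical query step by step in this logical order, using lists of dicts as stand-in tables. EMP, DEPT and evaluate are hypothetical names. The sketch models only the logical semantics: MySQL's optimizer does not literally build the Cartesian product or run the steps in this physical order.

```python
from collections import defaultdict

# Hypothetical tables as lists of dicts.
EMP = [
    {"name": "ann", "dept_id": 1, "salary": 3000},
    {"name": "bo", "dept_id": 1, "salary": 2500},
    {"name": "cy", "dept_id": 2, "salary": 900},
    {"name": "di", "dept_id": 2, "salary": 4000},
    {"name": "ed", "dept_id": 3, "salary": 5000},
]
DEPT = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]


def evaluate(emp, dept, min_salary=1000, min_count=1, limit=2):
    """Evaluates, in logical order:
        SELECT DISTINCT d.dept, SUM(e.salary) AS total
        FROM emp e JOIN dept d ON e.dept_id = d.dept_id
        WHERE e.salary > :min_salary
        GROUP BY d.dept
        HAVING COUNT(*) >= :min_count
        ORDER BY total DESC
        LIMIT :limit
    Step 5 (CUBE/ROLLUP) is not used by this query."""
    # 1) FROM: combine the sources (every pairing of rows)
    rows = [(e, d) for e in emp for d in dept]
    # 2) ON: keep only the pairs that satisfy the join condition
    rows = [(e, d) for e, d in rows if e["dept_id"] == d["dept_id"]]
    # 3) WHERE: filter individual rows
    rows = [(e, d) for e, d in rows if e["salary"] > min_salary]
    # 4) GROUP BY d.dept
    groups = defaultdict(list)
    for e, d in rows:
        groups[d["dept"]].append(e)
    # 6) Aggregates: COUNT(*) and SUM(e.salary) per group
    aggregated = {k: (len(v), sum(e["salary"] for e in v)) for k, v in groups.items()}
    # 7) HAVING: filter whole groups using aggregate values
    aggregated = {k: v for k, v in aggregated.items() if v[0] >= min_count}
    # 8-9) Compute expressions and the SELECT list
    selected = [(k, total) for k, (_, total) in aggregated.items()]
    # 10) DISTINCT: drop duplicate output rows, keeping the first occurrence
    selected = list(dict.fromkeys(selected))
    # 11) ORDER BY total DESC
    selected.sort(key=lambda r: r[1], reverse=True)
    # 12) LIMIT: keep the top N rows
    return selected[:limit]


if __name__ == "__main__":
    print(evaluate(EMP, DEPT))               # [('eng', 5500), ('ops', 4000)]
    print(evaluate(EMP, DEPT, min_count=2))  # [('eng', 5500)]
```

The row for ed is lost at step 2 because it has no matching department. The row for cy is removed at step 3. The HAVING threshold acts on whole groups only after the aggregates exist.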
If the tables are listed as FROM tableA, tableB, the two tables are first combined into a Cartesian product, and then WHERE, GROUP BY and the remaining operations are applied.
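The comma-join behaviour can be checked with the sqlite3 module, with SQLite standing in for MySQL. In this minimal sketch, the columns and rows of tableA and tableB, and the helper row_count, are hypothetical. The sketch counts rows for the bare product, for the product filtered by WHERE, and for the equivalent explicit inner join.

```python
import sqlite3


def make_db():
    """Hypothetical tables: 3 rows in tableA, 2 rows in tableB."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tableA (id INTEGER, v TEXT);
        CREATE TABLE tableB (a_id INTEGER, w TEXT);
        INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
        INSERT INTO tableB VALUES (1, 'p'), (1, 'q');
    """)
    return db


def row_count(db, sql):
    """Count the rows a query returns."""
    return db.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


# Comma join with no condition: the full Cartesian product, 3 x 2 = 6 rows.
CROSS = "SELECT * FROM tableA, tableB"
# WHERE then filters that product down to the matching pairs.
FILTERED = "SELECT * FROM tableA, tableB WHERE tableA.id = tableB.a_id"
# The equivalent explicit inner join.
INNER = "SELECT * FROM tableA JOIN tableB ON tableA.id = tableB.a_id"

if __name__ == "__main__":
    db = make_db()
    print(row_count(db, CROSS), row_count(db, FILTERED), row_count(db, INNER))  # 6 2 2
```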
If you use LEFT JOIN, INNER JOIN or FULL OUTER JOIN, the ON condition is applied first and then the join is performed. (MySQL itself does not support FULL OUTER JOIN.)
Compare the following two SQL statements. The only difference is whether the b.date condition is placed in ON or in WHERE. The ON condition is applied first, then the join is performed, and then the WHERE condition is applied.
Suppose the join were performed first, producing a Cartesian product, and the ON condition were applied only afterwards. A LEFT JOIN would then be no different from a plain (inner) JOIN, because left-table rows without a match would be filtered out. So the ON condition must be applied first, and the join performed afterwards.
Conversely, if WHERE were applied before the join, the two SQL statements below would return the same result. In general they do not, which shows that WHERE filters the set produced by the join.
To summarize: ON filtering is applied first, then the join, and finally WHERE filtering.
Query 1 (b.date condition in ON):
SELECT DISTINCT a.domain, b.domain
FROM mal_nxdomains_raw a
LEFT JOIN mal_nxdomains_detail b
  ON a.domain = b.domain AND b.date = '20160403'
WHERE a.date = '20160403';

Query 2 (b.date condition moved to WHERE):
SELECT DISTINCT a.domain, b.domain
FROM mal_nxdomains_raw a
LEFT JOIN mal_nxdomains_detail b
  ON a.domain = b.domain
WHERE a.date = '20160403' AND b.date = '20160403';
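The difference between the two queries can be reproduced with the sqlite3 module, with SQLite standing in for MySQL. The table names match the article, but the two-column schema and the rows are hypothetical. In the data, b.com appears in the detail table only with a different date.

```python
import sqlite3

QUERY_ON = """
SELECT DISTINCT a.domain, b.domain
FROM mal_nxdomains_raw a
LEFT JOIN mal_nxdomains_detail b
  ON a.domain = b.domain AND b.date = '20160403'
WHERE a.date = '20160403'"""

QUERY_WHERE = """
SELECT DISTINCT a.domain, b.domain
FROM mal_nxdomains_raw a
LEFT JOIN mal_nxdomains_detail b
  ON a.domain = b.domain
WHERE a.date = '20160403' AND b.date = '20160403'"""


def make_db():
    """Same table names as the article; the columns and rows are hypothetical."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE mal_nxdomains_raw    (domain TEXT, date TEXT);
        CREATE TABLE mal_nxdomains_detail (domain TEXT, date TEXT);
        INSERT INTO mal_nxdomains_raw VALUES
            ('a.com', '20160403'), ('b.com', '20160403'),
            ('c.com', '20160403'), ('d.com', '20160402');
        INSERT INTO mal_nxdomains_detail VALUES
            ('a.com', '20160403'), ('b.com', '20160402');
    """)
    return db


def run(db, sql):
    # Sort by the first column so the output order is deterministic.
    return sorted(db.execute(sql).fetchall(), key=lambda row: row[0])


if __name__ == "__main__":
    db = make_db()
    print(run(db, QUERY_ON))     # [('a.com', 'a.com'), ('b.com', None), ('c.com', None)]
    print(run(db, QUERY_WHERE))  # [('a.com', 'a.com')]
```

With the condition in ON, every raw domain from that date is kept, and b.domain is NULL when no detail row from the same date exists. With the condition in WHERE, those NULL rows, and the b.com row joined to the wrong date, are filtered out after the join.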
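1. Position
The ON condition is written after JOIN.
The WHERE condition comes after the JOIN and ON have been completed.
2. What each applies to
ON applies to the joined table.
WHERE can apply to either the main table or the joined table.
3. Choosing between them
Conditions that filter the main table can only go in WHERE (in a LEFT JOIN, a condition on the main table placed in ON does not remove any main-table rows).
For the joined table: to narrow the scope of the join, put the condition in ON; to filter the rows after the join, put it in WHERE.
In a LEFT JOIN, suppose the WHERE clause contains a non-null condition on the joined table's join column. The result is then the same as an INNER JOIN followed by the WHERE filter, and the LEFT JOIN no longer has any effect.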
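These rules can be checked with the sqlite3 module, with SQLite standing in for MySQL. The users (main table) and orders (joined table) schema, the rows, and the helper `rows` are hypothetical. The sketch shows three things. A non-null test on the joined column in WHERE turns the LEFT JOIN into an inner join. A main-table condition placed in ON keeps every main-table row.

```python
import sqlite3


def make_db():
    """Hypothetical schema: users is the main table, orders the joined table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO users  VALUES (1), (2), (3);
        INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
    """)
    return db


def rows(db, sql):
    # Sort the rows; a NULL order id is treated as 0 so the rows stay comparable.
    return sorted(db.execute(sql).fetchall(), key=lambda r: (r[0], r[1] or 0))


# Plain LEFT JOIN: user 3 has no orders but is kept with NULL.
LEFT = """SELECT u.id, o.id FROM users u
          LEFT JOIN orders o ON o.user_id = u.id"""
# Non-null test on the joined table's join column in WHERE.
LEFT_NOT_NULL = LEFT + " WHERE o.user_id IS NOT NULL"
# The same result written as an INNER JOIN.
INNER = """SELECT u.id, o.id FROM users u
           JOIN orders o ON o.user_id = u.id"""
# A condition on the main table placed in ON does not remove main-table rows.
MAIN_IN_ON = """SELECT u.id, o.id FROM users u
                LEFT JOIN orders o ON o.user_id = u.id AND u.id = 1"""

if __name__ == "__main__":
    db = make_db()
    print(rows(db, LEFT))           # [(1, 10), (1, 12), (2, 11), (3, None)]
    print(rows(db, LEFT_NOT_NULL))  # [(1, 10), (1, 12), (2, 11)]
    print(rows(db, MAIN_IN_ON))     # [(1, 10), (1, 12), (2, None), (3, None)]
```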
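In a join of table A and table B, a row is taken from table A and passed to table B to be scanned and matched. So the number of rows in A determines how many lookups are made, and the number of rows in B determines the scan range. For example, if A has 100 rows and B has 200 rows, a row must be fetched from A 100 times, and each one is compared against the 200 rows of B, for 100 × 200 = 20,000 comparisons.
Relatively speaking, fetching data from table A consumes more resources. So where possible, choose the smaller table as tableA, and also narrow the range that has to be searched in table B.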
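The counting argument can be checked with a naive nested-loop join in Python. This is a minimal sketch under stated assumptions: there is no index on the inner table, and the function name `nested_loop_join` and the generated rows are hypothetical. The sketch counts fetches from the driving table and comparisons against the inner table.

```python
def nested_loop_join(outer, inner, key):
    """Naive nested-loop join with no index on the inner table.
    Returns the matched pairs plus the number of outer-row fetches and
    inner-row comparisons performed."""
    fetches = comparisons = 0
    matches = []
    for a in outer:          # one fetch per row of the driving table
        fetches += 1
        for b in inner:      # full scan of the inner table for each outer row
            comparisons += 1
            if a[key] == b[key]:
                matches.append((a, b))
    return matches, fetches, comparisons


if __name__ == "__main__":
    table_a = [{"k": i} for i in range(100)]        # 100 rows
    table_b = [{"k": i % 100} for i in range(200)]  # 200 rows
    _, fetches, comparisons = nested_loop_join(table_a, table_b, "k")
    print(fetches, comparisons)  # 100 20000
    # Narrowing B first (e.g. with a condition on B) shrinks the scan range.
    narrowed_b = [row for row in table_b if row["k"] < 25]  # 50 rows
    _, fetches, comparisons = nested_loop_join(table_a, narrowed_b, "k")
    print(fetches, comparisons)  # 100 5000
```

In this naive model, swapping A and B still gives 20,000 comparisons, but the number of outer fetches doubles to 200. When B's join column is indexed, MySQL replaces the inner full scan with an index lookup per outer row. The work then scales mainly with the number of rows in the driving table, which is why a smaller tableA helps.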
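In practice, however, the two placements return different results and may use different indexes. There is no general rule about whether a condition is more efficient in ON or in WHERE. Decide based on the result the query needs, and compare the plans with EXPLAIN when performance matters.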
The above is the detailed content of "What are MySQL's join query and multiple query methods?". For more information, please follow other related articles on the PHP Chinese website!