When the previous article talked about the basic architecture of Mysql, it started with "How are SQL query statements executed in the MySql architecture?" A comprehensive explanation was given. I know the specific execution process of sql query statements in the MySql architecture, but in order to write sql statements better and faster, I think it is very necessary to know the execution order of each clause in the sql statement. Friends who have read the previous article should know that the execution of each clause at the end of the SQL statement should be completed in the executor, and the storage engine provides a data reading and writing interface to the executor. Now let’s start our study of the complete execution sequence of each clause in the
from (Note: This also includes the sub-statements in from )
join
on
The group by clause combines the unique values in into a group to obtain virtual table T4. If group by is applied, all subsequent steps can only operate on T4 columns or perform 6. Aggregation functions (count, sum, avg, etc.). (Note: The reason is that the final result set after grouping only contains one row from each group. Remember, otherwise there will be many problems here, and the following code misunderstandings will be specifically mentioned.)
The aggregate function only performs some processing on the grouped results to obtain some desired aggregate values, such as sums, statistical quantities, etc., and does not generate virtual tables.
Apply the having filter to generate T5. The HAVING clause is mainly used in conjunction with the GROUP BY clause. The having filter is the first and only filter that is applied to grouped data.
Perform the select operation, select the specified column, and insert it into virtual table T6.
Deduplicate the records in T6. Remove the same rows to generate virtual table T7. (Note: In fact, if the group by clause is applied, distinct is redundant. The reason is also that when grouping, the unique values in the column are grouped into one group, and only for Each group returns one row of records, so all the records will be different.)
Apply the order by clause. Sort T7 according to order_by_condition. At this time, a cursor is returned instead of a virtual table. SQL is based on the theory of sets. A set does not pre-sort its rows. It is just a logical collection of members, and the order of the members is irrelevant. A query that sorts a table can return an object that contains a logical organization in a specific physical order. This object is called a cursor.
Some notes on oder by
Because the return value of order by is a cursor, the query using the order by clause cannot be applied to table expressions.
Order by sorting is very costly. Unless you must sort, it is best not to specify order by,
order by. Two parameters asc (ascending order) desc (descending order)
Retrieve the records of the specified row, generate virtual table T9, and return the result.
The parameter after limit can be a limit m or a limit m n, which means from the mth piece to the nth piece of data.
(Note: Many developers like to use this statement to solve paging problems. For small data, there is no problem using the LIMIT clause. When the amount of data is very large, using LIMIT n, m is very inefficient. . Because the LIMIT mechanism starts scanning from the beginning every time. If you need to read 3 pieces of data starting from the 600,000th row, you need to scan and locate the 600,000th row first, and then read it. The scanning process It is a very inefficient process. Therefore, when processing big data, it is very necessary to establish a certain caching mechanism at the application layer)
SELECT `userspk`.`avatar` AS `user_avatar`, `a`.`user_id`, `a`.`answer_record`, MAX(`score`) AS `score`FROM (select * from pkrecord order by score desc) as a INNER JOIN `userspk` AS `userspk` ON `a`.`user_id` = `userspk`.`user_id`WHERE `a`.`status` = 1 AND `a`.`user_id` != 'm_6da5d9e0-4629-11e9-b5f7-694ced396953' GROUP BY `user_id`ORDER BY `a`.`score` DESC LIMIT 9;
Result:
Let me briefly talk about what I want to query:
想要查询pk记录表中分数最高的9个用户记录和他们的头像。
通过这段sql实际想一遍sql各字句的执行顺序
pk记录表的数据结构设计,每个用户每天每个馆下可能会有多条记录,所以需要进行分组,并且查询结果只想拿到每个分组内最高的那条记录。
这段sql的一些说明:
可能有些同学会认为子查询没有必要 直接查询pk记录表就可以,但是并不能拿到预期的结果,因为分组后的每个组结果是不进行排序的,而且max拿到的最高分数肯定是对应的该分组下最高分数,但是其它记录可能就不是最高分数对应的那条记录。所以子查询非常有必要,它能够对原始的数据首先进行排序,分数最高的那条就是第一条对应的第一条记录。
看一下代码和执行结果与带有子查询的进行比较,就能理解我上面说的一段话:
//不使用子查询SELECT `userspk`.`avatar` AS `user_avatar`, `pkrecord`.`user_id`, `pkrecord`.`answer_record`, `pkrecord`.`id`, MAX(`score`) AS `score`FROM pkrecordINNER JOIN `userspk` AS `userspk` ON `pkrecord`.`user_id` = `userspk`.`user_id`WHERE `pkrecord`.`status` = 1 AND `pkrecord`.`user_id` != 'm_6da5d9e0-4629-11e9-b5f7-694ced396953' GROUP BY `user_id`ORDER BY `pkrecord`.`score` DESC LIMIT 9;
查询结果
2. 在子查询中对数据已经进行排序后,外层排序方式如果和子查询排序分数相同,都是分数倒序,外层的排序可以去掉,没有必要写两遍。
在 SQL 语句中,可以为表名称及字段(列)名称指定别名
表名称指定别名
同时查询两张表的数据的时候: 未设置别名前:
SELECT article.title,article.content,user.username FROM article, userWHERE article.aid=1 AND article.uid=user.uid
设置别名后:
SELECT a.title,a.content,u.username FROM article AS a, user AS u where a.aid=1 and a.uid=u.uid
好处:使用表别名查询,可以使 SQL 变得简洁而更易书写和阅读,尤其在 SQL 比较复杂的情况下
查询字段指定别名
查询一张表,直接对查询字段设置别名
SELECT username AS name,email FROM user
查询两张表
好处:字段别名一个明显的效果是可以自定义查询数据返回的字段名;当两张表有相同的字段需要都被查询出,使用别名可以完美的进行区分,避免冲突
SELECT a.title AS atitle,u.username,u.title AS utitle FROM article AS a, user AS u where a.uid=u.uid
关联查询时候,关联表自身的时候,一些分类表,必须使用别名。
别名也可以在group by与having的时候都可使用
别名可以在order by排序的时候被使用
查看上面一段sql
delete , update MySQL都可以使用别名,别名在多表(级联)删除尤为有用
delete t1,t2 from t_a t1 , t_b t2 where t1.id = t2.id
子查询结果需要使用别名
查看上面一段sql
虽然定义字段别名的 AS 关键字可以省略,但是在使用别名时候,建议不要省略 AS 关键字
字符串类型的要加单引号
select后面的每个字段要用逗号分隔,但是最后连着from的字段不要加逗号
使用子查询创建临时表的时候要使用别名,否则会报错。
不要使用“select * from ……”返回所有列,只检索需要的列,可避免后续因表结构变化导致的不必要的程序修改,还可降低额外消耗的资源
不要检索已知的列
select user_id,name from User where user_id = ‘10000050’
使用可参数化的搜索条件,如=, >, >=, , !=, !>, !
当需要验证是否有符合条件的记录时,使用exists,不要使用count(*),前者在第一个匹配记录处返回,后者需要遍历所有匹配记录
Where子句中列的顺序与需使用的索引顺序保持一致,不是所有数据库的优化器都能对此顺序进行优化,保持良好编程习惯(索引相关)
不要在where子句中对字段进行运算或函数(索引相关)
如where amount / 2 > 100,即使amount字段有索引,也无法使用,改成where amount > 100 * 2就可使用amount列上的索引
如where substring( Lastname, 1, 1) = ‘F’就无法使用Lastname列上的索引,而where Lastname like ‘F%’或者where Lastname >= ‘F’ and Lastname
在有min、max、distinct、order by、group by操作的列上建索引,避免额外的排序开销(索引相关)
小心使用or操作,and操作中任何一个子句可使用索引都会提高查询性能,但是or条件中任何一个不能使用索引,都将导致查询性能下降,如where member_no = 1 or provider_no = 1,在member_no或provider_no任何一个字段上没有索引,都将导致表扫描或聚簇索引扫描(索引相关)
Between一般比in/or高效得多,如果能在between和in/or条件中选择,那么始终选择between条件,并用>=和=和
Adjust the order of join operations to optimize performance. The join operation is top-down. Try to put the association of two tables with smaller result sets in front to improve performance. (join related) Note: I will take out two separate articles to explain indexing and association in detail. I will only briefly mention it in this note.
For more MySQL related technical articles, please visit the MySQL Tutorial column to learn!
The above is the detailed content of How to write elegant SQL native statements. For more information, please follow other related articles on the PHP Chinese website!