
Interviewer: Are you familiar with SQL optimization? I only know 20 kinds, but there are far more...

Java后端技术全栈 · 2023-08-17 16:36:22


During the interview, the interviewer likes to ask:

Are you familiar with SQL optimization?

Don't be afraid of this kind of question. Brother Tian has prepared the following 52 SQL performance optimization strategies for you. If you can't recall them all, memorize a few more items and you will be fine for the interview at hand.

「Optimization Strategy」

1. To optimize a query, avoid full table scans as far as possible: first consider creating indexes on the columns used in WHERE and ORDER BY.
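To make item 1 concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table `t`, its columns and the index name `idx_t_num` are hypothetical. It prints SQLite's `EXPLAIN QUERY PLAN` output for the same range query before and after indexing the filtered and sorted column. The exact plan wording varies between SQLite versions, and MySQL's `EXPLAIN` output looks different.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (num, name) VALUES (?, ?)",
    [(i % 100, f"name{i}") for i in range(1000)],
)


def plan(sql):
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT id FROM t WHERE num BETWEEN 10 AND 20 ORDER BY num"

before = plan(query)  # no index: full table scan plus a separate sort
conn.execute("CREATE INDEX idx_t_num ON t (num)")
after = plan(query)   # index range search; rows already come out sorted

print("before:", before)
print("after: ", after)
```

Before the index, the plan typically reports a full `SCAN` plus a temporary B-tree for the sort. Afterwards it reports a `SEARCH` on `idx_t_num`, and the sort step disappears because the index already returns rows in `num` order.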

2. Try to avoid testing fields for NULL in the WHERE clause. Columns are nullable by default when a table is created, but most of the time you should declare them NOT NULL, or use a special value such as 0 or -1 as the default.

3. Try to avoid using the != or <> operator in the WHERE clause. MySQL uses indexes only for the following operators: <, <=, =, >, >=, BETWEEN, IN, and sometimes LIKE.

4. Try to avoid using OR to connect conditions in the WHERE clause; otherwise the engine may give up using the index and perform a full table scan. You can merge the queries with UNION instead:

select id from t where num=10 union all select id from t where num=20
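The rewrite in item 4 can be checked with a small sketch, again using Python's `sqlite3` as a stand-in (table and index names are hypothetical). It confirms that the `OR` query and the `UNION ALL` query return the same rows and that each branch of the `UNION ALL` is answered from the index. `UNION ALL` is only a safe substitute when the branches cannot match the same row, as with `num=10` and `num=20`; otherwise use `UNION` to remove duplicates. Many optimizers, SQLite's included, can already use an index for a simple `OR` on a single column, so it is worth measuring before rewriting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_t_num ON t (num)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i % 50,) for i in range(500)])

or_sql = "SELECT id FROM t WHERE num = 10 OR num = 20"
union_sql = (
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL "
    "SELECT id FROM t WHERE num = 20"
)

# Same rows either way (sorted because neither query guarantees an order).
or_rows = sorted(row[0] for row in conn.execute(or_sql))
union_rows = sorted(row[0] for row in conn.execute(union_sql))

# Each UNION ALL branch should show up as its own index SEARCH line.
union_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + union_sql)]
print(union_plan)
```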

5. IN and NOT IN should also be used with caution, as they can lead to full table scans. For continuous values, use BETWEEN instead of IN where possible:

select id from t where num between 1 and 3

6. The following query will also cause a full table scan:

select id from t where name like '%abc%'

or

select id from t where name like '%abc'

To improve efficiency, consider full-text search instead. Only

select id from t where name like 'abc%'

uses the index.
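A minimal sketch of item 6 with Python's `sqlite3` (the table `t` and index `idx_t_name` are hypothetical). SQLite's `LIKE` is case-insensitive by default, so in this sketch the index is declared `COLLATE NOCASE`, which is one of the conditions SQLite's LIKE optimization requires; MySQL has its own rules. Only the pattern without a leading wildcard produces an index `SEARCH`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
# SQLite's LIKE is case-insensitive by default, so its prefix optimization
# needs an index that uses the NOCASE collation on a TEXT column.
conn.execute("CREATE INDEX idx_t_name ON t (name COLLATE NOCASE)")


def plan(sql):
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


contains_plan = plan("SELECT id FROM t WHERE name LIKE '%abc%'")  # leading wildcard
suffix_plan = plan("SELECT id FROM t WHERE name LIKE '%abc'")     # leading wildcard
prefix_plan = plan("SELECT id FROM t WHERE name LIKE 'abc%'")     # fixed prefix

for label, p in [("contains", contains_plan), ("suffix", suffix_plan), ("prefix", prefix_plan)]:
    print(f"{label:8} {p}")
```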

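7. Using parameters (local variables) in the WHERE clause can also cause a full table scan. SQL resolves local variables only at run time, but the optimizer must choose the access plan at compile time, when the variable's value is still unknown, so the value cannot be used to select an index.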

8. Avoid applying expressions or functions to fields in the WHERE clause.

9. In many cases, using EXISTS instead of IN is a good choice:

select num from a where num in(select num from b)

Replace it with the following statement:

select num from a where exists(select 1 from b where num=a.num)
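Item 9 can be sanity-checked with a sketch in Python's `sqlite3` (tables `a` and `b` and their contents are made up). The `IN` and `EXISTS` forms return the same rows here; whether `EXISTS` is actually faster depends on the engine and version, since modern optimizers often turn both into the same semi-join. The sketch qualifies `b.num` explicitly (an unqualified `num` in the subquery also resolves to `b.num`) and shows a point worth knowing for the negated forms: once the subquery can return NULL, `NOT IN` returns no rows at all, while `NOT EXISTS` still returns the non-matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(n,) for n in range(10)])
conn.executemany("INSERT INTO b VALUES (?)", [(n,) for n in (2, 4, 4, 8, 42)])

in_rows = conn.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b) ORDER BY num"
).fetchall()
exists_rows = conn.execute(
    "SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num) ORDER BY num"
).fetchall()

# The negated forms diverge as soon as the subquery can produce NULL.
conn.execute("INSERT INTO b VALUES (NULL)")
not_in_rows = conn.execute(
    "SELECT num FROM a WHERE num NOT IN (SELECT num FROM b) ORDER BY num"
).fetchall()
not_exists_rows = conn.execute(
    "SELECT num FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.num = a.num) ORDER BY num"
).fetchall()

print(in_rows, exists_rows)          # identical
print(not_in_rows, not_exists_rows)  # [] versus the non-matching rows
```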

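10. Indexes can improve the efficiency of the corresponding SELECT statements, but they also reduce the efficiency of INSERT and UPDATE, because every INSERT or UPDATE may also have to update the indexes. How to build indexes therefore needs careful, case-by-case consideration. A table should preferably have no more than 6 indexes; if there are more, consider whether the indexes on rarely used columns are really necessary.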

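11. Avoid updating clustered index columns whenever possible. The order of the clustered index columns is the physical storage order of the table's records, so once such a value changes, the records of the whole table have to be reordered, which costs considerable resources. If the application needs to update clustered index columns frequently, reconsider whether that index should be clustered at all.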

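12. Use numeric fields where possible. Do not design fields that contain only numeric information as character types; this lowers query and join performance and increases storage overhead.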

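13. Use varchar/nvarchar instead of char/nchar where possible. First, variable-length fields take less storage space; second, for queries, searching within a relatively small field is clearly more efficient.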

14. Do not return everything with select * from t; replace “*” with an explicit list of fields, and do not return any field you will not use.
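Besides transferring less data, an explicit column list can let the engine answer a query from an index alone (a covering index; MySQL's `EXPLAIN` reports this as `Using index` in the `Extra` column). A sketch with Python's `sqlite3` and a hypothetical table `t`: with an index on `num`, selecting only `id` and `num` is served by a covering index, while `SELECT *` must also visit the table rows to fetch `payload`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, payload TEXT)")
# In SQLite every index entry also stores the rowid, i.e. the id column here.
conn.execute("CREATE INDEX idx_t_num ON t (num)")


def plan(sql):
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


star_plan = plan("SELECT * FROM t WHERE num = 10")         # must read payload from the table
narrow_plan = plan("SELECT id, num FROM t WHERE num = 10")  # index alone is enough

print(star_plan)
print(narrow_plan)
```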

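15. Avoid returning large amounts of data to the client; if the volume is too large, consider whether the requirement itself is reasonable.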

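16. Use table aliases: when joining multiple tables in an SQL statement, use table aliases and prefix every column with its alias. This reduces parsing time and avoids syntax errors caused by ambiguous column names.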

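17. Use “temporary tables” to hold intermediate results:

An important way to simplify SQL statements is to store intermediate results in temporary tables, but the benefits go well beyond that. Once the intermediate results are in a temporary table, subsequent queries run against tempdb. This avoids scanning the main table multiple times in the program and greatly reduces “shared locks” blocking “update locks” during execution, which reduces blocking and improves concurrency.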
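A sketch of item 17 in Python's `sqlite3` (the `orders` table and its rows are hypothetical). SQLite keeps `TEMP` tables in a separate temporary database, loosely comparable to SQL Server's tempdb. The aggregation over the base table runs once, and the follow-up queries read the small intermediate result instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30), ("bob", 10), ("alice", 20), ("carol", 50), ("bob", 5)],
)

# Aggregate the base table once and keep the intermediate result in a TEMP
# table, which lives in SQLite's separate temporary database.
conn.execute(
    "CREATE TEMP TABLE customer_totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)

# Later steps read the small intermediate table instead of re-scanning orders.
big_spenders = conn.execute(
    "SELECT customer FROM customer_totals WHERE total >= 40 ORDER BY customer"
).fetchall()
grand_total = conn.execute("SELECT SUM(total) FROM customer_totals").fetchone()[0]
print(big_spenders, grand_total)
```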

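18. Some SQL queries should use nolock. Reads and writes block each other, so to improve concurrency you can add nolock to some queries; this allows writes while reading, but the drawback is that you may read uncommitted (dirty) data.

There are 3 rules for using nolock:

  • Do not use nolock when the query results are used for inserts, deletes, or updates;
  • Use nolock cautiously on tables that frequently undergo page splits;
  • A temporary table can also preserve a “before image” of the data, similar to Oracle's undo tablespace; where a temporary table can improve concurrency, do not use nolock.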

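19. Common simplification rules:

Do not join (JOIN) more than 5 tables; consider using temporary tables or table variables to hold intermediate results. Use subqueries sparingly, and do not nest views too deeply; generally no more than 2 levels of view nesting is advisable.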

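20. Precompute the results that need to be queried and store them in a table, then simply SELECT them at query time. Before SQL Server 7.0 this was the most important technique, for example for calculating hospital inpatient fees.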

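21. A clause containing OR can be decomposed into multiple queries joined with UNION. Their speed depends only on whether indexes are used; if the queries need a composite index, UNION ALL is more efficient. If multiple OR clauses do not use an index, rewrite them as UNION and try again to match an index. The key question is whether an index is used.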

22. In the value list after IN, put the most frequent values first and the least frequent last, to reduce the number of comparisons.

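23. Do as much data processing as possible on the server to reduce network overhead, for example by using stored procedures.

Stored procedures are SQL statements that are compiled, optimized, organized into an execution plan, and stored in the database; they are a collection of control-flow statements, so they are naturally fast. For dynamic SQL that is executed repeatedly, you can use a temporary stored procedure, which is placed in tempdb.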

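24. When the server has enough memory, set the number of configured threads = maximum number of connections + 5 for the best efficiency; otherwise, set the number of configured threads …

27. Use “>=” rather than “>”.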

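28. Index usage guidelines:

Create indexes with the application in mind; large OLTP tables should preferably have no more than 6 indexes. Use indexed fields as query conditions whenever possible, especially the clustered index, and if necessary force a specific index with index index_name. Avoid table scans when querying large tables, and consider creating new indexes when necessary. When an indexed field used as a condition belongs to a composite index, the first field of that index must be part of the condition for the system to use the index; otherwise the index will not be used. Pay attention to index maintenance: rebuild indexes and recompile stored procedures periodically.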
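The leftmost-column rule in item 28 can be observed with a sketch in Python's `sqlite3` (hypothetical table `t` with a composite index on `(a, b)`). Conditions on `a`, or on `a` and `b`, produce an index `SEARCH`; a condition on `b` alone does not. Newer optimizers can sometimes use a skip scan in that last case (SQLite when `ANALYZE` statistics are available, MySQL from 8.0.13), so treat the rule as the general case rather than an absolute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_t_a_b ON t (a, b)")  # composite index, a is leftmost


def plan(sql):
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


both_plan = plan("SELECT * FROM t WHERE a = 1 AND b = 2")  # uses both index columns
leading_plan = plan("SELECT * FROM t WHERE a = 1")         # leftmost column only
trailing_plan = plan("SELECT * FROM t WHERE b = 2")        # skips the leftmost column

print(both_plan, leading_plan, trailing_plan, sep="\n")
```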

29. The columns in the following SQL conditions all have appropriate indexes, yet the statements execute very slowly:

SELECT * FROM record WHERE substring(card_no, 1, 4) = '5378' -- 13 s
SELECT * FROM record WHERE amount/30 < 1000 -- 11 s
SELECT * FROM record WHERE convert(char(10), date, 112) = '19991201' -- 10 s

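Analysis

Any operation on a column in the WHERE clause is computed row by row at run time, so the database has to scan the table instead of using the index on that column.

If these results can be obtained when the query is compiled, the SQL optimizer can optimize them, use the index, and avoid the table scan. So rewrite the SQL as follows: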

SELECT * FROM record WHERE card_no like '5378%' -- < 1 s
SELECT * FROM record WHERE amount < 1000*30 -- < 1 s
SELECT * FROM record WHERE date = '1999/12/01' -- < 1 s
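A sketch reproducing the `amount` example from item 29 with Python's `sqlite3` (the table contents are hypothetical, and the timings above are not reproduced). Dividing the column hides it from the index, while moving the arithmetic to the constant side lets the planner `SEARCH` the index. For a non-negative integer `amount`, `amount/30 < 1000` and `amount < 30000` select exactly the same rows, which the sketch checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE INDEX idx_record_amount ON record (amount)")
conn.executemany("INSERT INTO record (amount) VALUES (?)", [(i * 7,) for i in range(5000)])


def plan(sql):
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


on_column = "SELECT id FROM record WHERE amount / 30 < 1000"    # arithmetic on the column
on_constant = "SELECT id FROM record WHERE amount < 1000 * 30"  # arithmetic on the constant

slow_plan, fast_plan = plan(on_column), plan(on_constant)
slow_ids = [row[0] for row in conn.execute(on_column + " ORDER BY id")]
fast_ids = [row[0] for row in conn.execute(on_constant + " ORDER BY id")]

print(slow_plan)
print(fast_plan)
```

The same reasoning applies to the date example: if `date` also stores a time of day, the range form `date >= '19991201' AND date < '19991202'` is the index-friendly equivalent of the `convert` comparison, whereas a plain equality only matches midnight values.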

30. When there is a batch of inserts or updates, use batch inserts or batch updates; never update records one at a time.
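A sketch of items 30 and 49 together, using Python's `sqlite3` (hypothetical table `t`): all rows go through one prepared statement inside a single transaction, and the update is one set-based statement rather than a per-row loop. In MySQL the same idea is usually expressed as a multi-row `INSERT ... VALUES (...), (...)` or by wrapping the statements in one explicit transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")

rows = [(n,) for n in range(10_000)]

# Batch insert: one prepared statement, one transaction, one commit,
# instead of 10,000 INSERTs that each commit on their own.
with conn:  # commits on success, rolls back if an exception escapes
    conn.executemany("INSERT INTO t (num) VALUES (?)", rows)

# Batch update: one set-based statement instead of a row-by-row loop.
with conn:
    conn.execute("UPDATE t SET num = num * 2 WHERE num % 2 = 0")

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
max_num = conn.execute("SELECT MAX(num) FROM t").fetchone()[0]
print(row_count, max_num)
```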

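31. In all stored procedures, wherever an SQL statement can do the job, I never implement it with a loop.

For example, to list every day of the previous month, I use connect by for a recursive query instead of looping from the first day of last month to the last day.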
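Item 31's example uses Oracle's `CONNECT BY`; SQLite, MySQL 8.0 and PostgreSQL express the same set-based idea with a recursive common table expression. A sketch with Python's `sqlite3`, where the reference date is passed in as a parameter (the parameter name `today` is hypothetical) so the result is reproducible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every day of the month before :today, produced by one recursive query
# instead of a procedural loop.
DAYS_OF_PREVIOUS_MONTH = """
WITH RECURSIVE days(d) AS (
    SELECT date(:today, 'start of month', '-1 month')   -- first day of last month
    UNION ALL
    SELECT date(d, '+1 day') FROM days
    WHERE d < date(:today, 'start of month', '-1 day')  -- stop at its last day
)
SELECT d FROM days
"""

days = [row[0] for row in conn.execute(DAYS_OF_PREVIOUS_MONTH, {"today": "2023-08-17"})]
print(days[0], days[-1], len(days))
```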

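32. Choose the most efficient order of table names (effective only with the rule-based optimizer):

Oracle's parser processes the table names in the FROM clause from right to left, so the table written last in the FROM clause (the base table, or driving table) is processed first. When the FROM clause contains multiple tables, you must choose the table with the fewest records as the base table.

If 3 or more tables are joined, choose the intersection table as the base table; the intersection table is the table that is referenced by the other tables.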

33. To improve the efficiency of a GROUP BY statement, filter out unneeded records before the GROUP BY. The following two queries return the same results, but the second one is noticeably faster.

Inefficient

SELECT JOB, AVG(SAL)
FROM EMP
GROUP BY JOB
HAVING JOB = 'PRESIDENT'
OR JOB = 'MANAGER'

Efficient

SELECT JOB, AVG(SAL)
FROM EMP
WHERE JOB = 'PRESIDENT'
OR JOB = 'MANAGER'
GROUP BY JOB
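Item 33 can be checked for equivalence with a sketch in Python's `sqlite3`, using a made-up `emp` table. Both forms return the same groups; the `WHERE` version discards other jobs before grouping, so less data is aggregated. Some optimizers may move such a `HAVING` condition into `WHERE` on their own, but writing it in `WHERE` does not rely on that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("KING", "PRESIDENT", 5000),
        ("JONES", "MANAGER", 2975),
        ("BLAKE", "MANAGER", 2850),
        ("SMITH", "CLERK", 800),
        ("ALLEN", "SALESMAN", 1600),
    ],
)

# Conceptually: group every job, then throw most of the groups away.
having_version = conn.execute(
    "SELECT job, AVG(sal) FROM emp GROUP BY job "
    "HAVING job = 'PRESIDENT' OR job = 'MANAGER' ORDER BY job"
).fetchall()

# Filter first, so only the relevant rows are grouped at all.
where_version = conn.execute(
    "SELECT job, AVG(sal) FROM emp WHERE job = 'PRESIDENT' OR job = 'MANAGER' "
    "GROUP BY job ORDER BY job"
).fetchall()

print(where_version)
```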

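34. Write SQL statements in uppercase, because Oracle always parses an SQL statement by first converting lowercase letters to uppercase and then executing it.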

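35. Use aliases. Aliases are an application technique for large databases: give table names and column names one-letter aliases in the query, and the query runs 1.5 times faster than when joining with full table names.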

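36. Avoid deadlocks: always access the same tables in the same order in your stored procedures and triggers; keep transactions as short as possible and minimize the amount of data involved in each transaction; never wait for user input inside a transaction.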

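37. Avoid temporary tables unless they are really needed; use table variables instead. Most of the time (99%), table variables reside in memory, so they are faster than temporary tables; temporary tables reside in the TempDb database, so operations on them require cross-database communication and are naturally slower.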

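38. Preferably do not use triggers:

Firing a trigger and executing the trigger event are resource-intensive in themselves. If something can be implemented with constraints, do not use a trigger; do not use the same trigger for different trigger events (Insert, Update, and Delete); do not use transactional code inside triggers.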

39. Index creation rules:

  • The primary key and foreign keys of a table must have indexes.
  • Tables with more than 300 rows should have indexes.
  • Tables that are often joined with other tables should have indexes on the join fields.
  • Fields that often appear in the WHERE clause, especially in large tables, should be indexed.
  • Indexes should be built on highly selective fields.
  • Indexes should be built on small fields; do not index large text fields or very long fields.
  • Composite indexes need careful analysis; consider replacing them with single-field indexes where possible.
  • Choose the leading column of a composite index correctly; it is usually the more selective field.
  • Do the fields of the composite index often appear together in the WHERE clause, joined by AND? Are there few or no single-field queries on them? If so, build a composite index; otherwise, consider single-field indexes.
  • If the fields in a composite index often appear alone in the WHERE clause, split it into multiple single-field indexes.
  • If a composite index contains more than 3 fields, consider carefully whether it is necessary and whether the number of fields can be reduced.
  • If the same fields have both single-field indexes and a composite index, the composite index can generally be dropped.
  • Do not create too many indexes on tables whose data is modified frequently.
  • Drop unused indexes to avoid a negative impact on execution plans.
  • Every index on a table adds storage overhead and also adds processing overhead to inserts, deletes, and updates. In addition, when single-field indexes exist, too many composite indexes are generally of no value; on the contrary, they slow down inserts and deletes, and the negative impact is even greater on frequently updated tables.
  • Try not to index fields that contain a large number of duplicate values.

40. MySQL query optimization summary:

Use the slow query log to find slow queries, use execution plans to check whether queries are running normally, and always test your queries to see whether they run optimally.

Performance always changes over time. Avoid count(*) on an entire table, as it may lock the whole table. Keep queries consistent so that subsequent similar queries can use the query cache. Use GROUP BY instead of DISTINCT where appropriate. Use indexed columns in the WHERE, GROUP BY, and ORDER BY clauses. Keep indexes simple, and do not include the same column in multiple indexes.

Sometimes MySQL uses the wrong index; in that case, use USE INDEX. Use SQL_MODE=STRICT to catch problems. For indexed fields with fewer than 5 records, use LIMIT in a UNION instead of OR.

To avoid a SELECT before an UPDATE, use INSERT ON DUPLICATE KEY or INSERT IGNORE rather than implementing it with UPDATE. Do not use MAX; use an indexed field with an ORDER BY clause instead. LIMIT M, N can actually slow queries down in some cases, so use it sparingly. Use UNION instead of subqueries in the WHERE clause. Before restarting MySQL, remember to warm up the database so that data is in memory and queries are fast. Consider persistent connections instead of multiple connections to reduce overhead.
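The `INSERT ON DUPLICATE KEY` advice replaces a read-then-write pair with one statement. In MySQL that is `INSERT ... ON DUPLICATE KEY UPDATE`; SQLite (3.24 and later) spells it `INSERT ... ON CONFLICT ... DO UPDATE`. A sketch with Python's `sqlite3` and a hypothetical `page_views` counter table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER NOT NULL)")

# One statement replaces "SELECT, then INSERT or UPDATE". In the DO UPDATE
# clause, a bare `views` refers to the row that already exists.
UPSERT = """
INSERT INTO page_views (page, views) VALUES (?, 1)
ON CONFLICT (page) DO UPDATE SET views = views + 1
"""

with conn:
    for page in ["/home", "/about", "/home", "/home"]:
        conn.execute(UPSERT, (page,))

counts = dict(conn.execute("SELECT page, views FROM page_views"))
print(counts)
```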

Benchmark queries, including under load on the server; sometimes a simple query can affect other queries. When the load on the server increases, use SHOW PROCESSLIST to find slow and problematic queries. Test all suspicious queries generated in the development environment against mirrored data.

41. MySQL backup procedure:

  • Back up from a secondary replication server.
  • Stop replication during the backup to avoid inconsistencies in data dependencies and foreign key constraints.
  • Stop MySQL completely and back up from the database files.
  • If you back up with mysqldump, also back up the binary log files to ensure replication is not interrupted.
  • Do not trust LVM snapshots; they are likely to produce data inconsistencies that will cause you trouble later.
  • To make single-table recovery easier, export data table by table, if the data is isolated from other tables.
  • Use --opt when using mysqldump.
  • Check and optimize tables before backing up.
  • For faster imports, temporarily disable foreign key constraints during the import.
  • For faster imports, temporarily disable uniqueness checks during the import.
  • After each backup, calculate the size of the database, tables, and indexes to better monitor data growth.
  • Monitor replication instance errors and delays with automatically scheduled scripts.
  • Perform backups regularly.

42. The query cache does not handle whitespace automatically. Therefore, when writing SQL statements, minimize the use of spaces, especially at the beginning and end of the statement (because the query cache does not automatically trim leading and trailing spaces).

43. Can the member table be split by mid for easier querying? In typical business requirements, username is the main query key, so normally the tables should be split by a hash of username modulo the number of tables.
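The hash-modulo split amounts to a routing function in the application. A minimal sketch in Python, assuming a hypothetical layout of 16 tables named `member_0` to `member_15`; CRC32 is used as the hash because it gives the same value in every process, which Python's built-in `hash()` does not guarantee for strings.

```python
import zlib

SHARD_COUNT = 16  # hypothetical number of member tables: member_0 .. member_15


def member_table_for(username: str, shard_count: int = SHARD_COUNT) -> str:
    """Return the name of the (hypothetical) member table holding this username."""
    # crc32 is deterministic across processes, unlike the built-in hash(),
    # which is randomized per process for strings.
    bucket = zlib.crc32(username.encode("utf-8")) % shard_count
    return f"member_{bucket}"


for name in ["alice", "bob", "张三"]:
    print(name, "->", member_table_for(name))
```

Changing `SHARD_COUNT` later moves most usernames to a different table, so the number of tables is usually fixed up front.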

When it comes to splitting tables, MySQL's partitioning feature does exactly this and is transparent to the code; implementing it at the code level seems unreasonable.

44. Give every table in the database an ID primary key, preferably of INT type (UNSIGNED is recommended), with the AUTO_INCREMENT flag set.

45. Put SET NOCOUNT ON at the beginning of all stored procedures and triggers and SET NOCOUNT OFF at the end. This avoids sending a DONE_IN_PROC message to the client after every statement in a stored procedure or trigger.

46. MySQL can enable the query cache, one of the effective MySQL optimizations for improving database performance. When the same query is executed multiple times, pulling the data from the cache is much faster than returning it from the database. (The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.)

47. Use EXPLAIN on SELECT queries to see how they are executed:

The EXPLAIN keyword shows how MySQL processes your SQL statement, which helps you analyze performance bottlenecks in your queries or table structures. EXPLAIN output also tells you how your indexes and primary keys are used and how your tables are searched and sorted.

48. Use LIMIT 1 when you only need one row:

Sometimes when you query a table, you already know there will be only one result, but you may need to fetch a cursor, or you may only check the number of returned records.

In this case, adding LIMIT 1 can improve performance: the MySQL engine stops searching after it finds one matching row instead of continuing to look for further matches.
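A sketch of item 48 using Python's `sqlite3` (hypothetical `users` table): an existence check that selects a constant with `LIMIT 1`, so the engine can stop at the first matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO users (country) VALUES (?)", [("CN",)] * 1000 + [("DE",)]
)


def has_users_in(country):
    """True if at least one user is in the given country."""
    # LIMIT 1 lets the engine stop at the first matching row instead of
    # counting or returning every match.
    row = conn.execute(
        "SELECT 1 FROM users WHERE country = ? LIMIT 1", (country,)
    ).fetchone()
    return row is not None


print(has_users_in("CN"), has_users_in("US"))
```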

49. Choose a suitable storage engine for each table:

MyISAM: suited to applications dominated by reads and inserts, with only a few updates and deletes, and without strict requirements for transaction integrity or concurrency.

InnoDB: suited to transaction processing and to data that must stay consistent under concurrency, with many updates and deletes in addition to inserts and queries (InnoDB effectively reduces the locking caused by deletes and updates). For InnoDB tables, which support transactions, the main factor affecting speed is that AUTOCOMMIT is on by default: if the program does not explicitly call BEGIN to start a transaction, every insert is committed automatically, which seriously slows things down. Calling BEGIN before executing the SQL so that multiple statements form one transaction (even with autocommit on) greatly improves performance.

50. Optimize table data types by choosing appropriate ones:

Principle: smaller is usually better, simple is good, all fields should have default values, and NULL should be avoided where possible.

For example, when designing database tables, use smaller integer types to take up less disk space (MEDIUMINT is more suitable than INT when its range is sufficient).

Another example is time fields: datetime and timestamp. datetime traditionally occupies 8 bytes (5 bytes since MySQL 5.6.4), while timestamp occupies 4 bytes. timestamp covers the range from 1970 to early 2038, which makes it suitable for update times.

MySQL supports access to large amounts of data well, but generally speaking, the smaller the tables in the database, the faster the queries executed on them.

Therefore, to get better performance when creating a table, set the width of its fields as small as possible.

For example, defining a postal code field as CHAR(255) obviously adds unnecessary space to the database. Even VARCHAR is redundant, since CHAR(6) does the job just fine.

Similarly, where possible, use MEDIUMINT instead of BIGINT for integer fields, and declare fields NOT NULL where possible, so that future queries do not need to compare NULL values.

For some text fields, such as "province" or "gender", we can use the ENUM type. In MySQL, ENUM values are handled as numeric data, and numeric data is processed much faster than text, which improves database performance.

51. String data types: char, varchar, and text. Choose among them according to their differences.

52. Any operation on a column causes a table scan, including database functions and computed expressions. When querying, move such operations to the right-hand side of the equals sign whenever possible.

「Summary」

This article has described 52 SQL optimization strategies in total. If you can name more than 10 of them, the interview is already going well; if you can name 20, the interviewer will basically not wait for you to continue, because you have already shown you are very good and will have made a strong impression.


Statement:
This article is reproduced from Java后端技术全栈. In case of infringement, please contact admin@php.cn for removal.