Home  >  Article  >  Database  >  My opinion on MySQL development specifications

My opinion on MySQL development specifications

黄舟
黄舟Original
2017-02-22 11:05:23974browse

Most MySQL specifications can also be shared online. What I want to share here are some things that Lao Ye personally thinks are more important, or are easily overlooked, and are easily confused.

 1. Use InnoDB engine by default

【Lao Ye’s point of view】I have called for it many times, and InnoDB is applicable For almost 99% of MySQL application scenarios, and the system tables in MySQL 5.7 have been changed to InnoDB, there is no reason to stick to MyISAM.

In addition, InnoDB tables that are frequently read and written must use integers with auto-increment/sequential characteristics as explicit primary keys.

[Reference]: [MySQL FAQ] series - Why is it recommended to use auto-increment columns as primary keys for InnoDB tables.

 2. Select utf-8 as the character set

【Lao Ye’s point of view】If you want to save disk space, it is recommended to choose latin1. It is recommended to choose UTF-8 usually for the so-called "universality", but in fact the UTF-8 data submitted by the user can also be stored in the latin1 character set.

The trouble you may encounter when using latin1 to store utf-8 data is that if there is a Chinese-based retrieval, it may not be 100% accurate (Lao Ye personally simply tested the regular Chinese complete retrieval and it was not a problem at all, that is, General Chinese comparison is no problem).

The method of using latin1 character set to store utf-8 data is: the character set on the web side (user side) is utf-8, and the back-end program also uses utf-8 for processing, but character_set_client, character_set_connection, character_set_results, character_set_database, and character_set_server are all latin1, and the character sets of data tables and fields are also latin1. Or the data table uses latin1, just execute SET NAMES LATIN1 after each connection.

[Reference]: A brief talk about the MySQL character set.

 3. The physical length of InnoDB table row records does not exceed 8KB

【Lao Ye’s point of view】The default data page of InnoDB is 16KB. Based on the characteristics of B+Tree, a At least 2 records need to be stored in the data page. Therefore, when the actual storage length exceeds 8KB (especially TEXT/BLOB columns) large columns (large columns) will cause "page-overflow storage", similar to "row migration" in ORACLE.

Therefore, if you must use large columns (especially TEXT/BLOB types) and read and write frequently, it is best to split these columns into subtables and not store them together with the main table. If it's not too frequent, consider keeping it in the main table.

Of course, if the innodb_page_size option is modified to 8KB, then the physical length of the row record is recommended not to exceed 4KB.

 [Reference]: [MySQL Optimization Case] ​​Series - Optimizing the storage efficiency of BLOB columns in InnoDB tables.

 4. Whether to use partition tables

【Lao Ye’s point of view】In some scenarios where the use of partition tables can obviously improve performance or operation and maintenance convenience, it is still recommended. Use partitioned tables.

For example, Laoye uses the TokuDB engine in zabbix's database and uses partition tables based on the time dimension. The advantage of this is to ensure that the daily application of Zabbix is ​​not affected, and it is convenient for administrators to routinely delete past data. They only need to delete the corresponding partition, and there is no need to execute a very slow DELETE that affects the overall performance.

[Reference]: Migrate Zabbix database to TokuDB.

 5. Whether to use stored procedures and triggers

【Lao Ye’s point of view】In some suitable scenarios, it is absolutely fine to use stored procedures and triggers.

We used to use storage to complete game business logic processing. Performance is not a problem, and once the requirements change, we only need to modify the stored procedure, and the change cost is very low. We also use triggers to maintain a frequently updated table. All changes to this table will synchronously update some fields to another table (similar to the disguised implementation of materialized views), and there are no performance issues.

Don’t regard MySQL’s stored procedures and triggers as scourges. If you use them well, there will be no problems. If you encounter problems, it’s not too late to optimize them. In addition, MySQL does not have materialized views, so use views as little as possible.

 6. Choose the right type

【Lao Ye’s point of view】In addition to the common suggestions, there are several other important points:

 6.1. Use INT UNSIGNED to store the IPV4 address, and use INET_ATON() and INET_NTOA() for conversion. There is basically no need to use CHAR(15) for storage.

6.2. Enumeration types can use ENUM. The internal storage mechanism of ENUM uses TINYINT or SMALLINT (not CHAR/VARCHAR). The performance is not bad at all. Remember not to use CHAR/VARCHAR to store enumerations. Give data.

6.3. As a reference to the "common sense misinformation" that has been spreading earlier, it is recommended to use TIMESTAMP instead of DATETIME. In fact, starting from 5.6, it is recommended to give priority to DATETIME to store date and time, because its available range is larger than TIMESTAMP, and the physical storage is only 1 byte more than TIMESTAMP, so the overall performance loss is not large.

 6.4. In all field definitions, NOT NULL constraints are added by default, unless it must be NULL (but I can't think of any scenarios where NULL values ​​must be stored in the database, which can be represented by 0). When performing COUNT() statistics on this field, the statistical results will be more accurate (those with NULL values ​​will not be counted by COUNT), or when performing WHERE column IS NULL retrieval, the results can also be returned quickly.

6.5. Try not to directly SELECT * to read all fields, especially when there are large TEXT/BLOB columns in the table. There may not be a need to read these columns, but because I was lazy and wrote SELECT *, the memory buffer pool was washed out by these "junk" data, and the hot data that really needed to be buffered was washed out.

 8. About the index

 【Lao Ye’s point of view】In addition to common suggestions, there are several key points:

 8.1, more than 20 For length string columns, it is best to create a prefix index rather than a full column index (for example: ALTER TABLE t1 ADD INDEX(user(20))), which can effectively improve index utilization. However, its disadvantage is that it does not need to be used when sorting this column. to the prefix index. The length of the prefix index can be determined based on the statistics of the field, and is generally slightly larger than the average length.

 8.2. Regularly use the pt-duplicate-key-checker tool to check and delete duplicate indexes. For example, if the index idx1(a, b) already covers index idx2(a), the idx2 index can be deleted.

8.3. When there is a multi-field joint index, the field order of the filter conditions in WHERE does not need to be consistent with the index, but if there is sorting and grouping, it must be consistent.

For example, if there is a joint index idx1(a, b, c), then the following SQL can completely use the index:

SELECT ... WHERE b = ? AND c = ? AND a = ?;  --注意到,WHERE中字段顺序并没有和索引字段顺序一致
SELECT ... WHERE b = ? AND a = ? AND c = ?;
SELECT ... WHERE a = ? AND b IN (?, ?) AND c = ?;
SELECT ... WHERE a = ? AND b = ? ORDER BY c;
SELECT ... WHERE a = ? AND b IN (?, ?) ORDER BY c;
SELECT ... WHERE a = ? ORDER BY b, c;
SELECT ... ORDER BY a, b, c;  -- 可利用联合索引完成排序

And the following SQL areOnly part of the index can be used:

SELECT ... WHERE b = ? AND a = ?;   -- 只能用到 (a, b) 部分
SELECT ... WHERE a IN (?, ?) AND b = ?;   -- 只能用到 (a, b) 部分
SELECT ... WHERE a = ? AND c = ?;   -- 只能用到 (a) 部分
SELECT ... WHERE a = ? AND b IN (?, ?);    -- 只能用到 (a, b) 部分
SELECT ... WHERE (a BETWEEN ? AND ?) AND b = ?;   -- 只能用到 (a) 部分,注意BETWEEN和IN的区别
SELECT ... WHERE a = ? AND (b BETWEEN ? AND ?) AND c = ?;    -- 只能用到 (a, b) 部分

The following SQLdoes not use this index at all:

SELECT ... WHERE b = ?;
SELECT ... WHERE b = ? AND c = ?;
SELECT ... WHERE b = ? AND c = ?;
SELECT ... ORDER BY b;
SELECT ... ORDER BY b, a;

From the above examples It can be seen that the "common sense misleading" that was emphasized in the past that the order of the WHERE condition fields must be consistent with the order of the index before using the index does not need to be strictly followed.

In addition, sometimes the index or execution plan specified by the query optimizer may not be optimal. You can manually specify the optimal index, or modify the session-level optimizer_switch option to turn off some factors that will cause worse results. Features (for example, index merge is usually a good thing, but it has also been encountered that it is worse after using index merge. In this case, one of the indexes must be forced to be specified, or the index merge feature can be temporarily turned off).

 is 20%, MySQL is currently 30%, maybe it will be adjusted in the future), it will directly change the execution plan to a full table scan, and no longer use the index.  9.2. When joining multiple tables, the table with the greatest filterability (not necessarily the smallest amount of data, but the one with the greatest filterability after only adding the WHERE condition) should be selected as the driver table. In addition, if there is sorting after JOIN, the sorting field must belong to the driver table, so that the index on the driver table can be used to complete the sorting.

9.3. In most cases, the sorting index is usually higher, so if you see Using filesort in the execution plan, create a sorting index first.

 9.4. Use pt-query-digest to regularly analyze slow query log, and combine it with Box Anemometer to build a slow query log analysis and optimization system.

 [

Reference

]: [MySQL FAQ] series - What information in EXPLAIN results should attract attention.

Note: Unless otherwise specified, the above specifications are recommended for MySQL 5.6 and earlier versions. There may be some changes in versions 5.7 and later, and individual specification recommendations need to be adjusted accordingly. The above is my opinion on the MySQL development specifications. For more related content, please pay attention to the PHP Chinese website (www.php.cn)!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn