
Some Practices of Single-Machine MySQL Database Optimization

WBOY (Original) · 2016-10-09

There is a lot to say about database optimization. By the amount of data involved, it can be divided into two stages: single-machine optimization, and splitting into multiple databases and tables (sharding). A single machine can generally handle up to about 5 million rows or 10 GB of data; beyond that, sharding needs to be considered. Interviews at large companies also often start with single-machine questions and then work up, step by step, to sharding, with many optimization questions interspersed along the way. This article describes some practices for optimizing a single-machine database, using MySQL as the reference. If anything here is unreasonable, corrections are welcome.

1. Table structure optimization

When you start building an application, the design of the table structure often affects the application's performance later on, especially once the number of users grows. Table structure optimization is therefore a very important first step.

1.1 Character set

Generally speaking, choose UTF-8 (in MySQL, preferably the utf8mb4 character set, since MySQL's utf8 only covers characters up to 3 bytes). Although GBK takes less storage space than UTF-8, UTF-8 is compatible with all languages, and scalability should not be sacrificed for storage space. Converting from GBK to UTF-8 later is very expensive and requires a data migration, whereas storage space can be solved simply by buying bigger disks.

1.2 Primary key

With MySQL's InnoDB, the underlying storage model is a B+ tree that uses the primary key as the clustered index and stores the row data in the leaf nodes, so a record can be found quickly by following the primary key. Every table should therefore have a primary key, and it is best to make it auto-increment. An auto-increment primary key means new rows are appended to the B+ tree's leaf nodes in primary-key order; since the insertion order matches the key order, inserts almost never need to move existing data, so they are very efficient. If the primary key is not auto-increment, its value is roughly random each time, and a large amount of data may have to be moved to maintain the B+ tree's properties, adding unnecessary overhead.

1.3 Fields

1.3.1. Indexed fields should have a NOT NULL constraint and a default value.

1.3.2. Do not use FLOAT or DOUBLE to store decimals, to avoid loss of precision; use DECIMAL instead.

1.3.3. Do not use TEXT/BLOB to store large amounts of data: reading and writing large text causes heavy I/O and occupies MySQL's cache, and under high concurrency this greatly reduces database throughput. Store large text in a dedicated file store and keep only its access path in MySQL. For example, blog posts can be saved as files, with MySQL storing only the file's relative path.

1.3.4. Keep VARCHAR columns no longer than 8K.

1.3.5. Prefer DATETIME over TIMESTAMP for time columns. Although DATETIME occupies 8 bytes while TIMESTAMP occupies only 4, TIMESTAMP must be non-null and is sensitive to time zones.

1.3.6. Add two columns, gmt_create and gmt_modified, to every table to record when a row was created and last modified. These two fields make troubleshooting much easier.
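Taken together, the advice in 1.1–1.3 might look like the following table definition (the table and column names are illustrative, not from the original text):

```sql
-- Illustrative table applying the advice above: utf8 character set,
-- auto-increment primary key, NOT NULL indexed fields with defaults,
-- DECIMAL for money, DATETIME for times, and the two audit columns.
CREATE TABLE `order_info` (
    `id`           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `user_id`      BIGINT UNSIGNED NOT NULL DEFAULT 0,
    `status`       TINYINT         NOT NULL DEFAULT 0,
    `amount`       DECIMAL(12, 2)  NOT NULL DEFAULT 0.00,  -- not FLOAT/DOUBLE
    `remark`       VARCHAR(256)    NOT NULL DEFAULT '',
    `content_url`  VARCHAR(512)    NOT NULL DEFAULT '',    -- large text lives elsewhere
    `gmt_create`   DATETIME        NOT NULL,
    `gmt_modified` DATETIME        NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```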

1.4 Index creation

1.4.1. At this stage, since the business is not yet well understood, avoid adding indexes blindly; only add ordinary indexes on fields that will certainly be queried.

1.4.2. A single-column InnoDB index should not exceed 767 bytes; beyond that, only the first 255 bytes are used as a prefix index.

1.4.3. In an InnoDB composite index, each column should likewise not exceed 767 bytes, and all columns together should not exceed 3072 bytes.
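When a column is longer than these limits, a prefix index keeps the index within bounds at the cost of some selectivity. A sketch (table and column names assumed):

```sql
-- A VARCHAR(1000) utf8 column can need up to 3000 bytes, well past the
-- 767-byte single-column limit, so index only a prefix of it:
ALTER TABLE `article` ADD INDEX `idx_title` (`title`(255));
```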

2. SQL optimization

Generally speaking, SQL falls into just a few types: basic insert/delete/update/select, paged queries, range queries, fuzzy search (LIKE), and multi-table joins.

2.1 Basic queries

Queries should normally go through an index. If there is none, first try to rewrite the query to filter on an indexed field. If the business scenario makes that impossible, check how often the query runs: if the volume is large, say 100,000+ calls per day, a new index is needed; if it is small, say around 100 calls per day, it can be left as is. Also, use SELECT * as little as possible: select only the fields the SQL statement actually needs, since unnecessary fields waste I/O and memory.
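A quick way to check whether a query actually hits an index is EXPLAIN (the table and column names here are assumed for illustration):

```sql
-- In the EXPLAIN output, a type of "ref" means the index on name is
-- being used; "ALL" would mean a full table scan.
EXPLAIN SELECT id, name, age FROM user WHERE name = 'john';
```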

2.2 Efficient paging

LIMIT m, n essentially fetches m + n rows and then discards the first m, so as you page further back, m grows and performance drops. For example:

select * from A limit 100000,10

This statement performs very poorly. It is better rewritten as the following version (with an explicit ORDER BY so that "the 100001st id" is well defined):

select id,name,age from A where id >= (select id from A order by id limit 100000,1) order by id limit 10
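A further variant, not in the original text but common in practice, is keyset ("seek") pagination: instead of an offset, the client remembers the last id of the previous page and seeks past it, which is equally fast on every page:

```sql
-- Keyset pagination: the client passes the last id it saw (100010 here
-- is a made-up value); the primary key index finds the start row directly.
SELECT id, name, age FROM A
WHERE id > 100010
ORDER BY id
LIMIT 10;
```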

2.3 Range queries

Range queries include BETWEEN, greater-than, less-than, and IN. MySQL limits the number of values in an IN condition: with few values the index can be used, but with many values the query degenerates into a full table scan. As for BETWEEN, greater-than, and less-than: a range condition itself can use an index, but in a composite index no column after the range column can, so place range conditions after the equality conditions that use the index.
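The placement rule can be seen with a composite index (table and column names assumed): putting the equality column first lets the range column still use the index.

```sql
-- Equality column first, range column last.
ALTER TABLE `order_info` ADD INDEX `idx_status_created` (`status`, `gmt_create`);

-- Uses both index columns: equality on status, then a range on gmt_create.
SELECT id FROM order_info
WHERE status = 1 AND gmt_create >= '2016-01-01 00:00:00';
```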

2.4 Fuzzy queries (LIKE)

A condition such as LIKE '%name%' cannot use an index and amounts to a full table scan. With little data this is not much of a problem, but with a lot of data performance drops drastically. At large data volumes, use a search engine instead of this kind of fuzzy match. If that is not possible, add an indexed condition in front of the fuzzy one.
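For illustration (table and column names assumed):

```sql
-- Leading wildcard: cannot use an index on name, scans the whole table.
SELECT id FROM user WHERE name LIKE '%son%';

-- Prefix match: can use an index on name.
SELECT id FROM user WHERE name LIKE 'john%';

-- If the leading wildcard is unavoidable, narrow the scan with an
-- indexed condition placed in front of it:
SELECT id FROM user WHERE dept_id = 42 AND name LIKE '%son%';
```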

2.5 Multi-table joins

Both subqueries and joins can fetch data from multiple tables, but subqueries often perform poorly; rewrite them as joins. MySQL executes joins with the nested-loop join algorithm: it probes the next table using the result set of the previous one. For example, if the first table yields 100 rows and the second table has 100,000 rows, up to 100 × 100,000 combinations may need to be filtered to produce the final result. So always drive the join from the table with the smaller result set, and create an index on the join column. If an index cannot be built, set a sufficiently large join_buffer_size. If none of these techniques solves the performance problem, stop using the join altogether and split it into two simple queries. Also, do not join more than three tables; beyond three, performance is usually very poor, and the SQL should be split.
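A sketch of this advice, with assumed table names: the driving side is the small filtered set, and the join column of the big table is indexed so each nested-loop probe is an index lookup rather than a scan.

```sql
-- Assumed schema: user (small, filtered) joined to order_info (large),
-- with an index already on the join column order_info.user_id.
SELECT u.id, u.name, o.amount
FROM user u                           -- small driving result set
JOIN order_info o ON o.user_id = u.id
WHERE u.dept_id = 42;                 -- filters the driving set first

-- If the join still hurts, split it into two simple queries:
--   1) SELECT id, name FROM user WHERE dept_id = 42;
--   2) SELECT user_id, amount FROM order_info WHERE user_id IN (...);
-- and assemble the results in application code.
```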

3. Connection pool optimization

A database connection pool is essentially a cache, and a means of withstanding high concurrency. Optimizing it is mostly a matter of tuning parameters. Taking the commonly used DBCP pool as an example, its main parameters are as follows:

3.1 initialSize

The number of initial connections. "Initial" here means the first time getConnection is called, not when the application starts. A good initial value is the historical average concurrency.

3.2 minIdle

The minimum number of idle connections to retain. DBCP runs a background thread that evicts idle connections, keeping at least minIdle of them. Usually set to 5; with very low concurrency it can be set to 1.

3.3 maxIdle

The maximum number of idle connections to retain, set according to the business's peak concurrency. For example, if the peak is 20 concurrent requests, these connections are not evicted immediately once the peak passes; if another peak follows shortly after, the pool can reuse the idle connections instead of repeatedly creating and closing them.

3.4 maxActive

The maximum number of active connections, set to the highest concurrency the machine can accept. For example, if a single machine can tolerate at most 100 concurrent requests, setting maxActive to 100 means at most 100 requests are served at once, and excess requests are rejected after the maximum wait time. This value must be set: it defends against malicious concurrency spikes and protects the database.

3.5 maxWait

The maximum time to wait for a connection. Set it short, e.g. 3 s, so requests fail fast: while a request waits for a connection, its thread cannot be released, and a single machine has only a limited number of threads. If this is set too long, such as the 60 s often recommended online, each waiting thread is tied up for the full 60 s; with enough such requests, the application runs out of available threads and the service becomes unavailable.

3.6 minEvictableIdleTimeMillis

How long a connection may stay idle before it is evicted; the default is 30 minutes.

3.7 validationQuery

The SQL statement used to check whether a connection is still valid, usually something trivial such as SELECT 1. It is recommended to set it.

3.8 testOnBorrow

Validates a connection every time one is borrowed. Not recommended, as it seriously hurts performance.

3.9 testOnReturn

Validates a connection every time one is returned. Not recommended, as it seriously hurts performance.

3.10 testWhileIdle

When enabled, the background eviction thread periodically runs the validation query against idle connections and discards any that fail. This does not affect request performance and is recommended.

3.11 numTestsPerEvictionRun

The number of connections checked on each eviction run. Set it as large as maxActive so that every connection is checked each time.

3.12 Warming up the pool

It is recommended to warm up the connection pool at application startup: run a few simple SQL queries before serving external traffic, so the pool is already filled with the necessary number of connections.

4. Index optimization

When the data volume grows to a certain level, SQL optimization alone can no longer improve performance, and it is time for the big gun: indexes. Indexing can be approached at three levels, and mastering these three is generally enough. In addition, the selectivity of the indexed fields must be considered.

4.1 First level: index the WHERE conditions

Create indexes on the columns in the WHERE clause: an ordinary index for a single column, a composite index for several. Composite indexes must respect the leftmost-prefix principle.
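The leftmost-prefix principle in a nutshell (table and column names are illustrative):

```sql
-- Composite index on (a, b, c):
ALTER TABLE `t` ADD INDEX `idx_a_b_c` (`a`, `b`, `c`);

-- Can use the index (leftmost prefix a, or a + b):
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;

-- Cannot use the index: the condition does not start with column a.
SELECT * FROM t WHERE b = 2 AND c = 3;
```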

4.2 Second level: index ORDER BY / GROUP BY columns

If a field is used in ORDER BY or GROUP BY, consider indexing it. Because an index is inherently ordered, this avoids the sort that ORDER BY or GROUP BY would otherwise perform, improving performance.
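For example (index and column names assumed), an index on the ORDER BY column lets MySQL read rows already in order instead of sorting them:

```sql
-- Without an index on gmt_create, EXPLAIN shows "Using filesort":
SELECT id FROM order_info ORDER BY gmt_create DESC LIMIT 20;

-- With this index, rows come back in index order and no sort is needed:
ALTER TABLE `order_info` ADD INDEX `idx_gmt_create` (`gmt_create`);
```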

4.3 Third level: covering indexes

If the two levels above are not enough, also add the selected fields themselves to the index, forming a so-called covering index. This saves one I/O operation: normally MySQL first searches the ordinary (secondary) index, finds the primary key value there, and then looks up the full record in the clustered primary key index; if the fields we need are already present in the secondary index, that final lookup is unnecessary. This way of building indexes is rather extreme, of course, and not suitable for ordinary scenarios.
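A minimal sketch (table and column names assumed):

```sql
-- The index covers both the filter column and the selected column, so
-- the query is answered from the index alone (EXPLAIN shows
-- "Using index"), skipping the lookup of the full row:
ALTER TABLE `user` ADD INDEX `idx_name_age` (`name`, `age`);
SELECT name, age FROM user WHERE name = 'john';
```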

4.4 Index selectivity

Build indexes on highly selective fields whenever possible. What does high selectivity mean? That the field narrows the data down to a small set. For example, looking up a person by name usually returns very few rows, while looking up by gender may return half the table. Name is therefore a highly selective field, and gender a poorly selective one.
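Selectivity can be measured directly (table and column names assumed): the ratio of distinct values to total rows, where closer to 1 is better.

```sql
SELECT COUNT(DISTINCT name)   / COUNT(*) AS name_selectivity,
       COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity
FROM user;
-- name_selectivity is typically close to 1, gender_selectivity close
-- to 0, so an index on gender alone is rarely worthwhile.
```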

5. Historical data archiving

When data grows by, say, 5 million rows a year, indexes eventually can't keep up, and the usual next step is sharding. But if the business is not growing explosively and the data only accumulates slowly, you can skip the complexity of sharding and archive historical data instead: data whose life cycle has ended, for example rows older than 6 months. A scheduled job (for example with Quartz) can select the 6-month-old rows in the early morning and move them to a remote HBase cluster. A query interface for the historical data should still be provided, in case it is ever needed.
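The text archives to HBase, but the same idea can be sketched in pure SQL against an archive table (table and column names assumed, and this is only an illustration of the nightly job, not the original setup):

```sql
-- Copy rows older than 6 months into the archive table...
INSERT INTO order_info_archive
SELECT * FROM order_info
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH);

-- ...then remove them from the hot table, in batches so each
-- transaction stays small:
DELETE FROM order_info
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH)
LIMIT 10000;
```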

The above are some practices for optimizing a single-machine MySQL database. More material will be added in the future; thank you for your support.
