Home >Backend Development >PHP Tutorial >MySQL optimization considerations
When operating the database, we need to optimize mysql. This article talks about optimization precautions.
The first point is that the hardware is too old
The hardware here is mainly discussed from the three aspects of CPU, memory, and disk. There are also some factors such as network cards, computer room networks, etc. Due to the length of the article, we will not introduce them one by one. Yes, there will be another opportunity to chat in the future.
First of all, let’s take a look at MySQL’s CPU utilization characteristics:
5.1 can utilize 4 cores, 5.5 can utilize 24 cores, and 5.6 can utilize 64 cores.
For example, MySQL5.6 can use If it has more than 48 CORE and runs well, 64 CORE can be used (between 48CORE-64CORE, the official announcement is 48 CORE, and in my actual test, it can reach 64 CORE).
MySQL 5.6 can use 48 cores
*Before MySQL 5.1, up to 4 cores can be used **
Now the general production environment servers are 32CORE or above.
So I recommend everyone here to use MySQL5.5 or MySQL5.6 as much as possible, unless your company's server has been using a very old server with only 4 cores or 1 core.
Because before 5.1 (the same as 5.0), it was hard-coded in the internal code and was based on the innobase storage engine. The database had poor hardware utilization. After it evolved into the InnoDB engine, it became much better.
Each connection is a thread (non-thread pool), and each query can only use one core.
In addition, each query in MySQL can only use one CPU.
Oracle uses parallel SQL and parallel query. This kind of function does not exist in MySQL.
No execution plan cache (no SQL execution plan precompilation)
Secondly, there is no SQL precompilation inside MySQL. Therefore, there is no structure like the library cache in Oracle's memory structure. Therefore, MySQL only has hard parsing, there is no soft parsing, let alone soft parsing.
MySQL will experience performance degradation as the number of connections increases
This is also a flaw of MySQL, but with the evolution of MySQL versions, many solutions have emerged.
For example: the officially launched thread pool, referred to as TP. It is to solve the problem of too high number of concurrent connections, but this is an additional component of MySQL, and the official TP requires additional money to purchase.
In addition, there is a person in China named Lou Fangxin who developed a OneSQL middleware to solve similar problems.
There is a Result cache, but it is useless.
MySQL also has a result cache similar to that in Oracle, called Query Cache, but it is a relatively useless function and is rarely used.
Because most actual production environments are OLTP systems with frequent update and modification operations, this Query Cache will seriously degrade the performance of MySQL when used in an environment where data is frequently updated and modified. Therefore, it is generally rarely used.
Nowadays, when using MySQL, the InnoDB storage engine is basically used. The previous MyISAM engines are rarely used. (What is a storage engine? If you don’t know this, you can gg)
There is no need to turn on the Query Cache for the InnoDB engine, because it is a transactional storage engine, and using InnoDB uses its transaction processing Ability, frequent data updates and modifications will definitely occur.
Let’s look at the memory utilization characteristics of MySQL again
The server with a 64-bit operating system can use memory ((2^64-1)/1024/1024/1024)G
In a high-speed concurrent environment, it basically relies on Memory caching to reduce the IO impact on the disk
Usually the memory is planned according to 15%-20% of the actual data. If the data is particularly hot, a larger proportion needs to be considered to cache the data
This 15%-20% Data is usually called hot data. (This is also a common experience value)
For example, if you estimate that the total data volume of your MySQL is about 500G, then the memory provided by MySQL may be 75G (5000.15), then you may need a 128G machine memory server.
In addition, some businesses will have particularly hot and large amounts of hot data (it is possible to greatly exceed the 15%-20% range), such as QQ Farm.
I believe everyone has played the food-stealing games before, such as QQ Farm, Happy Farm and the like. (There is also a 12306 website for booking tickets).
This type of business is of high concern in our industry. The characteristics of this type of business are that when the data is hot, it is basically 100% hot data. For example: when everyone plays QQ Farm, it comes up every day. For fun, they would come up and steal some vegetables every once in a while. Many people would steal a handful of vegetables when they got up in the middle of the night to go to the toilet.
So the memory configuration of the MySQL database for this type of business must be increased. 15-20% is not enough.
Summary: ****General business 15%-20% is used to plan hot data, such as: user center, orders and other common businesses. For some other special businesses, the specific situation must be analyzed in detail.
Guidance allocation can be made based on Query response time
When we are planning and designing this large-scale online architecture-large database,
The response time of SQL query is also a very important indicator.
In such a large system, it must carry millions or even tens of millions of users to conduct business online at the same time. The response time of SQL query (query) must be strictly controlled. The Query response time of your system must be Control within a certain amount of time.
For example, for our core library, I require Query’s response time (average response) to be below 30ms. If it exceeds 30ms, we think that the database may have reached the load limit and the database needs to be expanded.
In addition, long-term indicator monitoring of this Query response time is required.
This is the core library. If there are other less important auxiliary libraries, such as libraries that store logs, or some libraries whose performance requirements are not too high, we can relax the Query response time to within 1 second or 2 seconds. .
Determine the threshold of this Query response time according to the importance of the business.
This is a very important guiding principle. Plan your performance capacity based on Query response time.
There are two types of capacity: performance capacity and space capacity. The space capacity is very simple, that is, how much SIZE data is placed, and how many T.
Performance capacity is more important and determines whether it can handle your business pressure and load.
Everyone needs to remember: If the business you want to fight against has millions of active users, not hundreds of users, performance is king, and meeting business needs in terms of performance is the most important.
No matter how awesome your functions are, no matter how good your product is, if the performance is unmatched, everything else is nonsense. Hundreds of people may bring down your entire system and project in a few seconds, and then your company will be blinded. .
Users who have worked hard will also be lost in large numbers, and the losses will be heavy.
Performance is the foundation. The entire architecture only makes sense if the performance can withstand it. If the performance is unsatisfactory, it will be useless to consider high availability later.
MySQL’s utilization characteristics of disk
Binlog, redo log, undo log sequential IO
MySQL has various IO types.
Binlog, redolog, undolog, these are all sequential IO writes.
There is no need to put this kind of things on SSD. Sequential writing on traditional mechanical disks is also very fast. Putting it on SSD is a waste of money. Moreover, SSD has problems of write loss and write life, so there is no need to put it on SSD. on SSD. Putting it on a traditional SAS disk is enough. There is no need to put an SSD.
SSD is used to store datafile. Because most of the IO that occurs on the datafile is random IO, it is very advantageous for SSD to run random IO. SSD solid state disk and traditional disk SAS disk are mixed together for storage. In addition, do not use SSD for backup disks.
Datafile random IO and sequential IO combined
Sequential IO is always faster. In database design, what determines whether you are an awesome DBA or an awesome architect depends on whether you can design a business as sequential IO as much as possible while reducing random IO. For example: When designing a friend relationship business, I hope that a query can take out the friend relationship through sequential IO. So how to design it?
In MySQL's InnoDB, we can take advantage of a feature of InnoDB: clustered index tables. (Similar to Oracle's IOT).
Using this feature, the user's friend data can be gathered in one page or multiple adjacent pages as much as possible. When reading, a sequential read IO can be done, and the performance is greatly improved.
The structure of the friend relationship table is as follows (the premise table is the InnoDB engine):
owner_id friend_id (friend id)
The above two fields are used as a primary key. The primary key of InnoDB is the clustered index. Then read these two fields IO can handle the fields in a certain order.
In the past, any database design books always mentioned that each table must add a specification for an auto-incrementing primary key. In fact, the specification is dead and the response is living. The friend relationship I gave an example above is useless. Instead of adding an additional primary key, two business fields that have business attributes and are read frequently are used as primary keys, which results in better performance.
Therefore, when you study, don’t memorize the norms and regulations in these books. Instead, you should really understand the principles of something, such as learning the internal principles of InnoDB, and then in actual work, with the support of the principles, use The principle is to draw inferences from one instance to other cases.
The principles of InnoDB are a large piece of knowledge and require learning over time. You can pay more attention to my official account, and some articles about InnoDB will be released one after another.
OLTP business requires more random IO
You can use memory for caching, thereby reducing random IO
OLAP business requires more sequential IO
Memory caching is not very useful
Before MySQL5.6, it was Page modification is not supported, and the default is 16K.
It can be changed after MySQL5.6. This parameter is innodb_page_size, but MySQL5.6 can only be changed to 8K or 4K, and cannot be increased. It cannot be changed to 32K or 64K until MySQL5.7 or above.
For OLAP systems, larger pages will help improve performance, because OLAP systems have relatively large queries and scan a lot of data.
Second point: Poor database design
For example, a lot of database features are used, such as triggers, partitions, a lot of stored procedures, functions, etc.
We often say that small is beautiful, which means that simplicity is the best. If you use all the functions of the database, the performance of the database will naturally be slowed down, and the chances of possible bugs and underlying failures will increase.
So everyone must understand that a good database project design is small, beautiful, concise and simple. In addition, the database is only a part of the overall project. Things like triggers and stored procedures can definitely be implemented using application code in the overall project.
So, when we use MySQL, we just use its powerful features, such as tables, indexes, and transactions, rather than using all its functions.
Another point is that before MySQL 5.6, subqueries were not allowed in the main database of the production environment.
The performance of subqueries before MySQL5.6 was particularly poor. (Syntax is supported, but SQL performance is very poor).
For example, if you are using Oracle now and want to migrate Oracle to MySQL, it is recommended that you use MySQL 5.6 version. MySQL 5.6 has made great improvements in subquery support and performance.
The performance of MySQL5.6 running subqueries will be greatly improved.
The third point: the program is too poorly written
I guess students who have been DBAs should have experienced this. In small and medium-sized companies, the level of programmers varies.
Especially when I meet many programmers who have just entered the industry (fresh graduates), it is more likely that these programmers who have just entered the industry will also take on some very urgent needs. It is difficult to think of a program developed in such an environment.
Of course, it’s not our programmers’ fault, we can’t blame them.
The reason for my above phenomenon is mainly due to the domestic development environment. There is no way. The development needs are urgent (products are activated every day), and programmers are busy rushing to work (long-term overtime). They can only be busy implementing business programs, and there is no way at all. Time to optimize the program.
Of course, in this environment, it is an opportunity for us DBAs. Bad SQL and complex SQL written by programmers caused the system to be slow or even crash. Then our DBA stepped in to optimize and transform these bad SQL and slow SQL, and the system returned to normal and became increasingly stable. This is also something that is very fulfilling and will be respected by colleagues and leaders.
At the same time, DBAs can also strengthen training for programmers and enhance their ability to quickly write good SQL. Let them spend less time and write SQL statements with better performance and smoother performance. In this way, it can also reduce the burden on the DBA.
I personally prefer to talk to programmers about training. Firstly, everyone can gain something by exchanging technology. Secondly, it can build a good relationship and make it easier to talk about any matters at work that need to be negotiated in the future. This is better than treating them to a meal.
We mainly have the following solutions for poorly written programs:
To make applications use database connection pools, especially in large-scale high-concurrency applications developed based on JAVA, connection pools must be used.
The advantage of using the connection pool is that it can limit the number of connections in the application. In addition, there is no need to create each additional connection. The cost of creating a connection for MySQL is also large, because creating a new connection is equivalent to MySQL creating a new connection. thread.
I also mentioned just now that MySQL performance will decrease as the number of connections increases.
Those who have written program code should also know that on our ordinary PC notebook (usually 4CORE), you create 400 threads, and each thread does 1 1 1 1... simple work, and then sleeps , check whether your PC is stuck or not. You will find that the CPU of your PC is almost full. If you dare to create 600 threads, then your machine will be restarted soon. This is because the CPU is fully occupied due to thread overhead.
Complex SQL Statements
As I said just now, SQL written by programmers generally has many problems. After all, they are too busy and do not consider the performance and operation of this SQL. In some cases, the SQL spliced by the programmer can directly bring down the entire system.
Let me give you a simple example: one of our applications creates 10 connections to the database (maximum number of connections = 10). Each of these 10 connections runs the same complex SQL at the same time. It takes at least 10 seconds to execute this complex SQL. minutes, then these 10 connections can only execute this complex SQL within 10 minutes, and all other subsequent SQLs will be blocked.
As a result, most applications will be unavailable for 10 minutes, right? And it may cause an avalanche and cause the system to collapse.
The optimization of complex SQL is also a very important job for DBAs. It is necessary to find out these complex SQL, slow SQL, and bad SQL through monitoring methods, and then give optimization suggestions to programmers (DBA needs to conduct performance comparison tests). Only by allowing programmers to modify the code can the system truly run smoothly and in parallel, like a highway without traffic jams.
Then someone may ask, our company's programmers are just bad guys. They won't change the SQL code even if they die, they won't optimize it, and they can't communicate. So what should we do?
We still have a way. We can also build a dedicated slave library (Slave library) to handle it. You can change the library query and it will be fine.
Take our company as an example. Our backend system that generates reports is connected to the slave database for query, and does not connect to the main database.
Invalid logic
Full table scan
For example: update t set a = a 1; Forgot to add the where condition.
If you want your system to support millions of users online, you must add the SQL Review system (SQL Review) to eliminate SQL with invalid logic and SQL with full table scans.
SQL can only be released online after it has been approved by the DBA.
In addition, this kind of large update SQL should be updated in batches, and the large SQL task should be divided into small tasks to run. In MySQL, this requires special attention.
Why update in batches?
**Reason 1. **As mentioned above, a MySQL query can only use one CORE. SQL transactions are too large and complex and take a long time to run, which can easily cause congestion.
Reason 2. In the online environment, MySQL generally has a Master/Slave architecture. If a large update transaction of 1 million rows occurs in the Master, it is likely to cause the SLAVE to get stuck there, because the SLAVE is a single-threaded structure, causing synchronization delays.
MySQL writes SQL and creates small transaction SQL, which can be executed quickly and submitted quickly. Let each query complete faster and the connection be released faster.
This article explains the precautions for MySQL optimization. For more related knowledge, please pay attention to the php Chinese website.
Related recommendations:
Discuz!X/Database DB:: Function operation method
ThinkPHP framework String class detailed explanation
The above is the detailed content of MySQL optimization considerations. For more information, please follow other related articles on the PHP Chinese website!