Home  >  Article  >  Database  >  Highly liked sharing: MySQL optimization ideas that are in line with production

Highly liked sharing: MySQL optimization ideas that are in line with production

藏色散人
藏色散人forward
2021-11-05 14:16:082241browse

Preface

The starting point of writing this article is to record my accumulated experience in dealing with data problems at work. As I write and write, I find that every point will lead to Other background knowledge should be provided, such as when optimizing indexes, you need to have a certain understanding of slow query, Explain and other related functions. For example, introducing Elasticsearch requires solving data synchronization, learning Elasticsearch knowledge, etc. Due to the length of the article, it is impossible to cover every point. They are all detailed like video tutorials, and I can only summarize them based on my limited knowledge and some general points. Even so, the length of the article is already very long. If you are interested in a certain point, please go to Baidu/Google for in-depth knowledge of individual details.

The article is quite long, so if you are interested, you may want to read it through. I hope you haven’t wasted dozens of minutes. [Recommended learning: "mysql video tutorial"]

Thinking Perspective

Database technology has gone through the manual management stage and the file system stage so far. and database system stage.

In the early days when there was no software system, real-world operations of a certain business could also be realized through the manual management stage of manual accounting and verbal agreements. This form existed for a long time and was relatively inefficient. a plan. In the next stage, with the development of computer technology, there was a file system stage that replaced manual accounting with excel tables, which improved productivity to a certain extent. In the software system stage, which is a database system with simple operation and high efficiency, productivity has been improved again, specific problems in the real world are abstracted into data, and real-world business is represented through the flow and change of data. In software systems, data storage is generally composed of a relational database and multiple non-relational databases.

The database is strongly related to the system business. This requires the product manager to understand the data storage and query process when designing the business. At the beginning of the design, it is clear what impact the business changes will have on the database and whether A new technology stack needs to be referenced. For example, a business designed by the product manager is to conduct statistical analysis and summary of data on multiple MySQL tables with a single table volume of millions. If MySQL multi-table query is directly used, slow queries will definitely occur and cause the msyql service to go down. In this case, the solution is Either compromise on the product side or change the technology stack.

In the system architecture and database solution, we should choose the one that is more suitable for the company's team capabilities. In the early stage of the system, simple database optimization with banknote capabilities will be the most cost-effective solution, but when it comes to mysql database banknote capabilities, there is nothing we can do. , introducing software services that focus on key functions will become the most cost-effective solution. How to choose the appropriate solution when encountering problems is the time to reflect your value.

A poor boy falls in love with a rich girl. The short-term sweetness cannot match the real class inequality. The happy ending only exists in the fantasy of the poor boy and the TV series of Teacher Qiong Yao.

How to improve the performance of data storage at a limited cost is the central idea of ​​this article.

Background knowledge

I believe that everyone will often come into contact with the following content in their daily work. Let me briefly summarize it.

Relational database

Relational database is a data organization composed of two-dimensional tables and the relationships between them, providing transaction data consistency, Functions such as data persistence are the core storage services of software systems. They are the databases we most often come into contact with during development and interviews. For some small outsourcing projects, one MySQL is enough to meet all business needs. It is something that we often come into contact with, and it is actually full of tricks. We will discuss the tricks in detail in the following chapters.
Advantages:

  1. Transaction
  2. Persistence
  3. Relatively common SQL language

Problems

  1. The requirements for hard disk I/O are very high
  2. The aggregation query efficiency of large amounts of data is low
  3. Index misses
  4. The leftmost matching principle of the index leads to mismatches Suitable for full-text retrieval
  5. Improper use of transactions will cause lock congestion
  6. Various problems caused by horizontal expansion are difficult to deal with

Non-relational Type database - NoSql

MySQL database, as a relational data storage software, has advantages and obvious disadvantages. Therefore, when the data volume of the software system continues to expand and the business complexity continues to increase, We cannot expect to solve all problems by enhancing the capabilities of the MySQL database. Instead, we need to introduce other storage software and use various types of NoSQL to solve the problems of the software system's expanding data volume and increasing business complexity.
Relational database is an optimization of relational database in different scenarios. It does not mean that everything will be fine if you introduce some kind of NoSQL. It means that you should fully understand the types and application difficulties of NoSQL on the market and choose the appropriate storage in the appropriate scenario. Software is the way to go.

Key-Value type

In business, the contents of certain tables are often queried, but most of the query results remain unchanged, so Key-value storage software, mainly Memcached and Redis, has emerged and is widely used in cache modules in the system. Redis has more data structures and persistence than Memcached, making it the most widely used among KV-type NoSQL.

Search type

In the scenario of full-text search, query optimization of MySQLB tree index, like query cannot hit the index, and every like keyword query is one time Full table scan can be supported in tables with tens of thousands of data, but slow queries will occur when the data is at the end. If the business code is not well written and the Like query is called in the transaction, a read lock will occur. ElasticSearch, with inverted index as its core, can perfectly meet the scenario of full-text search. At the same time, ElasticSearch also supports massive data very well, and the documentation and ecology are also very good. ElasticSearch is a representative product of search type.

Document type

Document type NoSql refers to a type of NoSql that stores semi-structured data as documents. Document type NoSql usually stores data in JSON or XML format. , so document-type NoSql does not have Schema. Since there is no Schema feature, we can store and read data at will. Therefore, the emergence of document-type NoSql solves the problem of inconvenient expansion of relational database table structures. The author has never used

column formula

For enterprises of a certain size, the business often involves some real-time and flexible data summary, which is not suitable for this kind of business Use the solution of calculating in advance to solve the problem. Even if you can write the business using the solution of calculating and summarizing in advance, as the number of summarized data increases, the final step of accumulating the summarized data will gradually become very slow. Column-based NoSQL is the product of this scenario. It is one of the most representative technologies in the big data era. The most common one is HBase, but the application of HBase is very heavy and often requires a complete set of Hadoop ecosystem to run. The author's company Alibaba Cloud's AnalyticDB is used, a column storage software compatible with MySql query statements. The powerful query capabilities of summary column storage software are sufficient to support various real-time and flexible data summary services.

Case

Taking 2021 as the time node, most systems start with the following plan in the early stage. Next, I will use this case Make some adjustments slowly.

Highly liked sharing: MySQL optimization ideas that are in line with production

#The benefits brought by hardware upgrades are lower as time goes by. This is the fastest optimization solution when time and personnel are tight. The benefits brought by software optimization are higher in the future, but the level of technical personnel required is also higher in the future. When time and personnel permit, it is the most cost-effective optimization solution. Hardware and software optimization are not mutually exclusive. When needed, both can approach the upper limit of MYSQL performance at the same time.

Highly liked sharing: MySQL optimization ideas that are in line with production

Hard Optimization - Cash Ability
  • Phase One

    • Improvement For disk I/O, try to use SSD disks (quality improvement)
    • Increase memory and increase query cache space
    • Increase the number of CPU cores and increase execution threads
  • Phase 2

    • Replace the self-built mysql with the service provider mysql service
    • Enable the built-in read and write separation function
  • Phase 3

    • The service provider’s mysql service is replaced with a cloud-native distributed database
    • Enable the built-in read and write separation function
    • Enable the built-in sub-table function
Soft Optimization - Query - OLTP

OLTP is mainly used to record the occurrence of certain types of business events, such as user behavior. When the behavior occurs, the system will record When and where the user did something, such a row (or multiple rows) of data will be updated in the database in the form of additions, deletions, and modifications. This requires high real-time performance, strong stability, and ensuring that the data is updated successfully in a timely manner. Common business systems all belong to OLTP, and the databases used are transaction databases, such as MySlq, Oracle, etc. For OLTP, improving query speed and service stability are the core of optimization

Highly liked sharing: MySQL optimization ideas that are in line with production

  • Slow query
    • Discover the SQL with efficiency problems through the slow query log
  • Problem SQL troubleshooting direction
    • There is a problem with the index design
    • There is a problem with the SQL statement
    • Wrong index selection for the database
    • The single table is large
  • Explain specific analysis
    • View sql execution comparison rate
    • Check the index hit situation (key points)
  • mysql optimizer
    • When the optimizer selects an index, it will refer to the index Cardinality
    • The cardinality is automatically maintained and estimated by MySQL, and may not be accurate
    • If the index does not hit or the wrong index is used, there is a problem with the optimizer step
    • analyze can re-count index information and recalculate the base number
  • Force index
    • The force keyword can force the use of index and forcefully specify index on the business code
  • Covered index - the most ideal hit index
    • Covered index means that the same index (unique, ordinary, joint index, etc.) is used from the execution of the query statement to the returned result
    • Covering indexes can reduce back table queries
    • If the data query uses more than one index, it is not a covering index
    • You can optimize the SQL statement or optimize the joint index. Use covering index
  • count() function
    • count(non-indexed field) - covering index cannot be used, theoretically the slowest
    • count(index field) - can overwrite the index, but still needs to determine whether the field is null each time
    • count (primary key) - Same as above
    • count(1) - only scans the index tree, there is no process of parsing the data rows , theoretically faster, but it will still determine whether 1 is null
    • count(*) - MySQL specifically optimizes the count(*) function to directly return the number of data in the index tree, optimal
  • ORDER BY
    • Minimize additional sorting and specify where condition
    • The combination of where statement and ORDER BY statement satisfies the leftmost prefix
    • The most efficient - Index coverage (few scenarios, low chance of encounter)
      • Index coverage can skip generating the intermediate result set and directly output the query results
      • The ORDER field needs to be indexed and consistent with the WHERE conditions and output The contents are all in the same index
  • Paging query
    • First find a way to cover the index
    • Find out what you need first The id of the data, return to the table to get the final result set
  • Index pushdown
    • KEY store_id_guide_id (store_id,guide_id) USING BTREE
    • select * from table where store_id in (1,2) and guide_id = 3;
    • Before MySQL5.6, you need to use the index to query store_id in (1,2), then add all tables to verify film_id = 3
    • MySQL5.6, if it can be read in the index, directly use index filtering
  • Loose index scan
    • KEY store_id_guide_id (store_id,guide_id) USING BTREE
    • select film_id from table where guide_id = 3
    • New features of MySQL8.0
    • Loose index scan can break the "left-hand principle" and solve the problem of losing the leading brother
    • The efficiency is lower than that of the joint index
  • Function operation
    • If you perform function operation on the index field, the optimizer will give up the index
    • This situation may include: time function, converting string to number, character encoding conversion
    • Optimize the use of server-side logic to replace mysql functions
  • The single table size is too large
    • Upgrade mysql, the single table size that different mysql software can carry It is different. Based on my current experience, there is no problem in querying the hit index when the single table of Alibaba Cloud polardb cluster version is 200 million (high priority)
    • Data settlement - such as pipeline data can be The latest value is obtained through settlement at a certain point in time, and the settled flow is transferred to the backup table (medium priority)
    • Separation of hot and cold data - data that cannot be settled is distinguished according to the frequency of query, and the frequency is low Transfer the query to another table, and distinguish the entry point of the query (medium priority) in terms of business
    • Distributed database table splitting-enable the table splitting function of the distributed database with orders, and the distributed database component management is Insertion and query after table splitting (medium priority)
    • Code to implement table splitting - split a single table into multiple tables according to certain rules, after splitting in most framework ORMs of PHP and GO It is necessary to make certain modifications to the framework ORM. The ORM in JAVA has native support. It is recommended to consider it at the early stage of the project. The later it becomes more difficult (lower priority)
Soft Optimization - Write Update Delete

Highly liked sharing: MySQL optimization ideas that are in line with production

  • Lock

    • According to granularity MySQL locks can be divided into global locks, table-level locks, and row locks

    • Global lock

      • 自google/baidu
    • Table-level locks are divided into table locks (data locks) and metadata locks

      • Table locks
        • 自google/baidu
      • Metadata lock
        • 自google/baidu
    • ##Row lock will lock the data row, divided into Shared locks and exclusive locks

        Google/baidu
  • Solution to deadlock

    • Parameter configuration
      • Adjust the innodb_lock_wait_timeout parameter
        • The default is 50 seconds, that is, if the lock is not acquired after waiting for 50 seconds, the current statement will report an error
        • If the waiting time expires Long, you can shorten this parameter appropriately
      • Active deadlock detection: innodb_deadlock_detect
        • Roll back less costly transactions when a deadlock is found
        • Enabled by default
    • Do not open the transaction if it is not necessary
    • Try to place the query outside the transaction to reduce the number of locked rows
    • Avoid The transaction time is too long, do not trigger http requests in the transaction
    • Actively check the transaction status
      show  processlist;SELECT * FROM information_schema.INNODB_TRX; //长事务SELECT * FROM information_schema.INNODB_LOCKs; //查看锁SELECT * FROM information_schema.INNODB_LOCK_waits; //查看阻塞事务
Search business
  • The number of search rows is less than 100,000 - mysql is hard to carry
    • Improve the cpu, io and memory hardware of mysql
  • The number of search rows is more than 100,000 - introduction Elasticsearch

Highly liked sharing: MySQL optimization ideas that are in line with production

Elasticsearch’s inverted index is suitable for full-text search, but the data structure has poor flexibility.

  • Data synchronization
    • When the business code changes data, it is synchronized to Elasticsearch at the same time
    • Canel subscription mysql log triggers synchronization
  • Elasticsearch-index
    • is composed of a list of documents with the same fields - analogous to mysql's table
    • Once the field type is set, modification is prohibited and new fields are allowed
    • Specific The method is self-directed google/baidu
  • Elasticsearch-Document
    • The user’s data document stored in es - analogy to the row of mysql
    • is composed of metadata and Json Object composition
    • Metadata and Json Object details are provided by google/baidu
  • Elasticsearch-Word Segmenter
    • by Google/baidu
  • Elasticsearch-Inverted Index (Key Points)
    • 自google/baidu
  • ##Elasticsearch-Aggregation Analysis
    • 自google/ baidu
Statistical Business-OLAP
OLAP is used for decision-making analysis of data relative to OLTP transaction processing scenarios. It is an offline data warehouse idea used in big data analysis, not a specific technology stack. If your solution can embody the idea of ​​OLAP analysis and processing, then the solution is OLAP.

Early data warehouse construction mainly refers to modeling and summarizing enterprise business databases such as ERP, CRM, SCM and other data into the data warehouse engine according to the requirements of decision-making analysis. Its application is mainly for reporting purposes. It is to support the decision-making of management and business personnel (medium- and long-term strategic decisions). As IT technology moves toward the Internet and mobility, data sources become more and more abundant. Unstructured data appears on the basis of the original business database, such as website logs, IoT device data, APP buried data, etc. The amount of these data It is several orders of magnitude larger than previous structured data.

No matter how the business faced by OLAP changes, it is inseparable from the following steps: Determine the analysis field->Synchronize business data to the computing library->Data cleaning modeling->Synchronize to the data warehouse- >Exposed to the outside

The calculation source database is specially used for data cleaning, and the purpose is to avoid affecting the performance of the business database during data cleaning. By cleaning the data in the calculation source database according to business and dimensions, the usability and reusability of the data are increased, and the final real-time detailed data is obtained, which is transferred to the data warehouse, and the data warehouse provides the final decision analysis data.

DEMO plan

Highly liked sharing: MySQL optimization ideas that are in line with production

Production plan

Highly liked sharing: MySQL optimization ideas that are in line with production

The software in each link can use the same functions If the software is replaced by the team's most confident software implementation solution, then the solution is OLAP.

Summary

Optimization must be down-to-earth, with capability accumulation step by step, multiple rounds of iterations, and cannot be achieved overnight. Conduct multiple rounds of iterations based on your own foundation, business scenarios and future development expectations.

The principle of iteration is to first improve the efficiency of the software through soft optimization and hard optimization of a single software service. When the optimization cost is lower than the benefit, based on future development expectations, refer to mature solutions on the market and follow the solutions. Introduce new software as needed for combined innovation, and avoid blind copying. Only through organic integration can the effects of 1 1>2, 2 1>3 be achieved. When the referenced software encounters a bottleneck, repeat this process.

Thank you for reading this. The above is all the content of the article. The optimization points and solutions proposed in the content are not necessarily the optimal solutions. They are the best practices in personal work. If you have different opinions, you are welcome to discuss them. comminicate.

                                                                                                                       

The above is the detailed content of Highly liked sharing: MySQL optimization ideas that are in line with production. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:learnku.com. If there is any infringement, please contact admin@php.cn delete