Detailed explanation of how PHP optimizes MySQL tens of millions of tables-PHP Tutorial-php.cn

Home

Backend Development

PHP Tutorial

Detailed explanation of how PHP optimizes MySQL tens of millions of tables

藏色散人

Jan 25, 2020 pm 02:16 PM

mysqlphp

Detailed explanation of how PHP optimizes MySQL tens of millions of tables

First of all, using Mysql to store hundreds of billions of data is indeed a very big challenge. A single Mysql table can indeed store 1 billion levels of data, but the performance is very poor at this time. A large number of experiments in the project have proven that the capacity of a single Mysql table is around 5 million, and the performance is at its best.

Optimization of large tables is mainly solved through database sub-database and table sub-tables. There are currently three common solutions: Partition, Sub-database and sub-table, NoSql/NewSql. In actual projects, these three solutions are combined. At present, the core data of most systems are mainly RDBMS storage, supplemented by NoSql/NewSql storage.

Partition

First let’s understand the partition scheme.

Partitioned tables are implemented by multiple related underlying tables. These underlying tables are also represented by handle objects, so we can also directly access each partition. The storage engine manages the underlying tables of the partitions in the same way as it manages ordinary tables (all underlying tables must use the same storage engine). The index of the partition table is just Add an identical index to each underlying table. This solution shields users from the details of sharding. Even if the query conditions do not have a sharding column, it can still work normally (but the performance is average at this time).

However, its shortcomings are obvious: many resources are limited by a single machine, such as the number of connections, network throughput, etc. How to partition is one of the key elements in practical applications.

Let’s start with an example: Taking customer information as an example, the amount of customer data is 50 million plus. The project background requires saving the customer’s bank card binding relationship, the customer’s document binding relationship, and the customer’s binding business information.

Under this business background, how to design the database. During the first phase of the project, we established a customer business binding relationship table, which contains redundant business information bound to each customer.

The basic structure is roughly as follows:

Detailed explanation of how PHP optimizes MySQL tens of millions of tables

When querying, index the bank card and the business number Index, document number is used as index. As demand increases, the indexes of this table will reach more than 10. Moreover, when the customer cancels the contract and then signs the contract again, two pieces of data will be saved, but the binding status is different.

Assuming we have 50 million customers, 5 business types, and an average of 2 cards per customer, then the amount of data in this table will reach an astonishing 500 million. In fact, the number of users of our system is still It won't work until it reaches one million. Such a design is absolutely not possible. Whether it is inserting or querying, it will crash the system.

The data in the mysql database is stored on the disk in the form of files. By default, it is placed under /mysql/data (can be viewed through the datadir in my.cnf). One table mainly corresponds to three One file is frm to store the table structure, one is myd to store the table data, and the other is myi to store the table index. These three files are very large, especially the .myd file, which is almost 5G. The first partition optimization is carried out below. There are four partitioning methods supported by Mysql:

Detailed explanation of how PHP optimizes MySQL tens of millions of tables

In our project, range partitioning and list partitioning have no usage scenarios. If based on binding Customize the number to make a range or list partition. The binding number has no actual business meaning and cannot be queried through it. Therefore, we are left with HASH partition and KEY partition. HASH partition only supports partitions of int type columns, and is one of them. a row.

KEY partitioning can support multiple columns, but it also requires that one of the columns must be of type int. Looking at our library table structure, we find that none of the columns is of type int. How to partition? Add a column, the binding time column, set this column to the int type, and then partition it according to the binding time, and divide the users bound every day into the same area.

After this optimization, our insertion is much faster, but the query is still very slow. Why?

Because when making queries, we only query based on bank card or ID number, and not based on time. This is equivalent to every query, MySQL will query all partition tables.

Carry out the second solution optimization. Since HASH partitioning and KEY partitioning require that one of the columns must be of type int, is it possible to create a list of type int for partitioning?

Analysis found that there is a secret in the string of numbers on the bank card. Bank cards are generally a string of numbers ranging from 16 to 19 digits. Is it feasible for us to take one of the numbers and use it as a table partition? Through analysis, we found that in this string of numbers, one of them is indeed a random number from 0 to 9. Generated, we perform KEY partitioning based on the random digits of the bank card number. Every time we query, we intercept this random digit through calculation, add the card number, and perform a joint query to achieve the purpose of partition query. It should be noted that partitioning Finally, the index created must also be a partition column, otherwise Mysql will still query data in all partition tables.

The problem of querying the binding relationship through the bank card number has been solved. What about the certificate number? How to query the binding relationship through the certificate number.

As mentioned before, indexing must be done on the partition health, otherwise it will cause a full table scan. We created a new table to save the customer's ID number binding relationship. Each customer's ID number is unique. In the new ID number binding relationship table, the ID number is used as the primary key. So how to calculate the partition health? Well, the customer's document information is relatively complicated, including ID number, Hong Kong, Macao and Taiwan pass, motor vehicle driving license, etc. How to find the partition key in the disordered document number.

In order to solve this problem, we divide the document number binding relationship table into two. One of the tables is dedicated to saving ID number types, and the other table saves documents of other types. number, in the document binding relationship table of the ID card type, we split the month number in the ID number as the partition key, and save the ID number of the customer born in the same month in the same area, thus dividing it into 12 For each area, if the data volume of other certificate types does not exceed 100,000, there is no need to partition.

In this way, every time you query, first determine which table to query based on the certificate type, and then calculate the partition key for query. After the partition design, when saving 20 million user data, the data saving file of the bank card table was divided into 10 small files, and the data saving file of the certificate table was divided into 12 small files. This solved the two query problems and also One question remains: What to do with the business number?

A customer has multiple contracted services, how to save them? At this time, it is not appropriate to use a partitioning solution. It requires a table partitioning solution.

Sub-table

We mentioned earlier that for mysql, its data files are stored on the disk in the form of files. When a data file is too large, the operation of the large file by the operating system will be troublesome and time-consuming, and some operating systems do not support large files. At this time, the table must be divided.

In addition, the commonly used storage engine for mysql is Innodb, and its underlying data structure is a B-tree. When the data file is too large, querying a node may query many levels, which will inevitably lead to multiple IO operations to be loaded into the memory, which will definitely be time-consuming.

In addition, there is Innodb's locking mechanism for B-trees. By locking each node, when the table structure is changed, the tree will be locked. When the table file is large, this can be considered unachievable. So to sum up, we must perform the operations of sub-table and sub-database.

How to perform sub-database and sub-table, there are currently many versions on the Internet, some of the more well-known solutions: Alibaba's TDDL, DRDS and cobar, JD Finance's sharding-jdbc; private organizations' MyCAT; 360's Atlas ; Meituan’s Zebra; other companies such as NetEase, 58, JD.com and other companies have self-developed middleware.

So many sub-database and table middleware solutions can be summarized into two categories: client mode and proxy mode.

Detailed explanation of how PHP optimizes MySQL tens of millions of tables

client mode

Detailed explanation of how PHP optimizes MySQL tens of millions of tables

proxy mode

Whether it is client mode or proxy mode. Several core steps are the same: SQL parsing, rewriting, routing, execution, and result merging. Personally, I prefer to use the client mode. It has a simple architecture, relatively small performance loss, and low operation and maintenance costs.

How to divide the business types into databases and tables. The most important step in sharding databases and tables is the selection of sharding columns. The quality of sharding column selection will directly determine whether the entire database sharding and table sharding scheme is ultimately successful. The selection of sharding column is strongly related to the business.

In our project scenario, the best choice for sharding column is undoubtedly the business number. Through the business number, different binding contract services of the customer are saved in different tables, and are routed to the corresponding table for query according to the business number, so as to further optimize the SQL.

For more related php knowledge, please visit php tutorial!

The above is the detailed content of Detailed explanation of how PHP optimizes MySQL tens of millions of tables. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:cnblogs. If there is any infringement, please contact admin@php.cn delete

How does PHP type hinting work, including scalar types, return types, union types, and nullable types?Apr 17, 2025 am 12:25 AM

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values and handle functions that may return null values.

How does PHP handle object cloning (clone keyword) and the __clone magic method?Apr 17, 2025 am 12:24 AM

In PHP, use the clone keyword to create a copy of the object and customize the cloning behavior through the \_\_clone magic method. 1. Use the clone keyword to make a shallow copy, cloning the object's properties but not the object's properties. 2. The \_\_clone method can deeply copy nested objects to avoid shallow copying problems. 3. Pay attention to avoid circular references and performance problems in cloning, and optimize cloning operations to improve efficiency.

PHP vs. Python: Use Cases and ApplicationsApr 17, 2025 am 12:23 AM

PHP is suitable for web development and content management systems, and Python is suitable for data science, machine learning and automation scripts. 1.PHP performs well in building fast and scalable websites and applications and is commonly used in CMS such as WordPress. 2. Python has performed outstandingly in the fields of data science and machine learning, with rich libraries such as NumPy and TensorFlow.

Describe different HTTP caching headers (e.g., Cache-Control, ETag, Last-Modified).Apr 17, 2025 am 12:22 AM

Key players in HTTP cache headers include Cache-Control, ETag, and Last-Modified. 1.Cache-Control is used to control caching policies. Example: Cache-Control:max-age=3600,public. 2. ETag verifies resource changes through unique identifiers, example: ETag: "686897696a7c876b7e". 3.Last-Modified indicates the resource's last modification time, example: Last-Modified:Wed,21Oct201507:28:00GMT.

Explain secure password hashing in PHP (e.g., password_hash, password_verify). Why not use MD5 or SHA1?Apr 17, 2025 am 12:06 AM

In PHP, password_hash and password_verify functions should be used to implement secure password hashing, and MD5 or SHA1 should not be used. 1) password_hash generates a hash containing salt values to enhance security. 2) Password_verify verify password and ensure security by comparing hash values. 3) MD5 and SHA1 are vulnerable and lack salt values, and are not suitable for modern password security.

PHP: An Introduction to the Server-Side Scripting LanguageApr 16, 2025 am 12:18 AM

PHP is a server-side scripting language used for dynamic web development and server-side applications. 1.PHP is an interpreted language that does not require compilation and is suitable for rapid development. 2. PHP code is embedded in HTML, making it easy to develop web pages. 3. PHP processes server-side logic, generates HTML output, and supports user interaction and data processing. 4. PHP can interact with the database, process form submission, and execute server-side tasks.

PHP and the Web: Exploring its Long-Term ImpactApr 16, 2025 am 12:17 AM

PHP has shaped the network over the past few decades and will continue to play an important role in web development. 1) PHP originated in 1994 and has become the first choice for developers due to its ease of use and seamless integration with MySQL. 2) Its core functions include generating dynamic content and integrating with the database, allowing the website to be updated in real time and displayed in personalized manner. 3) The wide application and ecosystem of PHP have driven its long-term impact, but it also faces version updates and security challenges. 4) Performance improvements in recent years, such as the release of PHP7, enable it to compete with modern languages. 5) In the future, PHP needs to deal with new challenges such as containerization and microservices, but its flexibility and active community make it adaptable.

Why Use PHP? Advantages and Benefits ExplainedApr 16, 2025 am 12:16 AM

The core benefits of PHP include ease of learning, strong web development support, rich libraries and frameworks, high performance and scalability, cross-platform compatibility, and cost-effectiveness. 1) Easy to learn and use, suitable for beginners; 2) Good integration with web servers and supports multiple databases; 3) Have powerful frameworks such as Laravel; 4) High performance can be achieved through optimization; 5) Support multiple operating systems; 6) Open source to reduce development costs.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks agoByDDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Chat Commands and How to Use Them

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 English version

Recommended: Win version, supports code prompts!

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.