Home >Database >Mysql Tutorial >How to implement data segmentation in mysql
Mysql method to implement data segmentation: 1. Use vertical segmentation of data; 2. Use horizontal segmentation of data; 3. Use MySQL Proxy to achieve data segmentation and integration; 4. Use Amoeba to implement data Segmentation; 5. Use HiveDB to achieve data segmentation and integration.
##More related free learning recommendations: mysql tutorial(Video)
Mysql's method of implementing data segmentation:
What is data segmentation
Simply put, it means to disperse the data stored in the same database to multiple databases (hosts) through certain specific conditions to achieve the effect of dispersing the load of a single device. Data slicing can also improve the overall availability of the system, because after a single device crashes, only a certain part of the overall data is unavailable, not all the data. Data sharding (Sharding) can be divided into two sharding modes according to the type of its sharding rules. One is to split it into different databases (hosts) according to different tables (or Schemas). This split can be called vertical (vertical) split of data; the other is to split it according to the data in the table. According to the logical relationship, the data in the same table is split into multiple databases (hosts) according to certain conditions. This kind of segmentation is called horizontal (horizontal) segmentation of data. The biggest feature of vertical segmentation is that the rules are simple and the implementation is more convenient. It is especially suitable for systems where the coupling between various businesses is very low, the mutual influence is small, and the business logic is very clear. In this kind of system, it is easy to split the tables used by different business modules into different databases. Splitting according to different tables will have less impact on the application, and the splitting rules will be simpler and clearer. Horizontal segmentation is slightly more complicated than vertical segmentation. Because different data in the same table needs to be split into different databases, for the application, the splitting rules themselves are more complicated than splitting based on table names, and subsequent data maintenance will also be more complicated. When a certain (or some) table has a particularly large amount of data and access, and it still cannot meet the performance requirements after vertically slicing it on an independent device, it must be vertically sharded. Combined with horizontal segmentation, vertical segmentation first and then horizontal segmentation can solve the performance problem of this very large table. The following is a corresponding analysis of the architecture implementation of the three data segmentation methods of vertical, horizontal and combined segmentation and the integration of the segmented data.Vertical segmentation of data
Let’s first take a look at how the vertical segmentation of data is segmented. Vertical segmentation of data can also be called vertical segmentation. Think of the database as consisting of many "data blocks" (tables), one by one. Cut these "data blocks" vertically, and then spread them across multiple database hosts. Such a slicing method is vertical (longitudinal) data slicing. The overall function of an application system with a well-designed architecture must be composed of many functional modules, and the data required by each functional module corresponds to one or more tables in the database. In architectural design, the more unified and fewer the interaction points between each functional module, the lower the coupling of the system, and the better the maintainability and scalability of each module of the system. Such a system makes it easier to achieve vertical segmentation of data. The clearer the functional modules and the lower the degree of coupling, the easier it will be to define rules for vertical data segmentation. Data can be segmented based on functional modules. The data of different functional modules are stored in different database hosts. Cross-database joins can be easily avoided, and the system architecture is also very clear. Of course, it is difficult to have a system that can make the tables used by all functional modules completely independent, and there is no need to access each other's tables at all, or it is necessary to join the tables of the two modules. In this case, evaluation and trade-offs must be made based on actual application scenarios. Decide whether to accommodate the application and store related modules of tables that need to be joined in the same database, or to let the application do more things - obtain data from different databases entirely through the module interface, and then complete the Join operation in the program . Generally speaking, if it is a system with relatively light load and very frequent table associations, then the database may give in and merge several related modules together to reduce the application work. With more workload, it is a feasible solution. Of course, through the concession of the database, allowing multiple modules to centrally share data sources actually indirectly acquiesces to the development of increased coupling of each module architecture, which may worsen future architectures. Especially when it reaches a certain stage of development and it is discovered that the database cannot bear the pressure brought by these tables and has to face segmentation again, the cost of architectural transformation may be far greater than the initial architectural design using segmentation.So, when the database is vertically segmented, how to segment it and to what extent is a challenging problem. Only by balancing the costs and benefits of all aspects in actual application scenarios can we analyze a split plan that truly suits us.
For example, in the example database of the example system used in this article, we briefly analyze it, and then design a simple segmentation rule to perform a vertical split.
System functions can basically be divided into 4 functional modules: users, group messages, photo albums and events, which correspond to the following tables:
User module table: user,user_profile,user_group,user_photo_album
Group discussion table: groups,group_message,group_message_content,top_message
Album related table: photo,photo_album ,photo_album_relation,photo_comment
Event information table: event
At first glance, no module can exist independently from other modules. Modules and There is a relationship between modules. Is it impossible to separate them?
Of course not. After a little more in-depth analysis, we can find that although the tables used by each module are related to each other, the relationship is relatively clear and simple.
The group discussion module and the user module are mainly related through user or group relationships. Generally, the association is made through the user's id or nick_name and the group's id. Implementation through the interface between modules will not cause too much trouble.
The photo album module only has a user association with the user module. The association between these two modules is basically only the content associated with the user ID, which is simple and clear, and the interface is clear.
The event module may be related to each module, but they only focus on the ID information of the objects in each module, which is also easier to split.
So, the first step can be to vertically split the database according to the tables related to the functional modules. The tables involved in each module are divided into a separate database. The table association between modules is in the application The system side is handled through the interface. As shown in the schematic diagram of vertical data segmentation (Figure 1):
After such vertical segmentation, services that could only be provided through one database before were split into four databases to provide services. The service capacity has naturally increased several times.
Advantages of vertical splitting:
The splitting of the database is simple and clear, and the splitting rules are clear;
Application The modules are clear and easy to integrate;
data maintenance is convenient and easy to locate.
Disadvantages of vertical segmentation:
Some table associations cannot be completed at the database level and must be completed in the program;
For tables that are accessed extremely frequently and have large amounts of data, there are still performance bottlenecks, which may not necessarily meet the requirements;
Transaction processing is relatively complex;
After segmentation reaches a certain level, scalability will be limited;
Excessive segmentation may make the system too complex and difficult to maintain.
In view of the data segmentation and transaction problems that may be encountered in vertical segmentation, it is really difficult to find a better solution at the database level. In actual application cases, the vertical segmentation of the database mostly corresponds to the modules of the application system. The data sources of the same module are stored in the same database, which can solve the problem of data association within the module. Between modules, the required data is provided to each other through application programs in the form of service interfaces. Although this will indeed increase the overall number of operations on the database, it is beneficial in terms of the overall scalability of the system and the modularization of the architecture. The single response time of some operations may be slightly increased, but the overall performance of the system is likely to be improved to a certain extent. The expansion bottleneck problem can only be solved by relying on the data horizontal segmentation architecture to be introduced in the next section.
Horizontal segmentation of data
The above section analyzes and introduces the vertical segmentation of data. This section analyzes the horizontal segmentation of data. Vertical segmentation of data can basically be simply understood as splitting data according to tables or modules, while horizontal segmentation is different. Generally speaking, simple horizontal sharding is mainly to disperse a table with extremely trivial access into multiple tables according to certain rules of a certain field, and each table contains a part of the data.
To put it simply, the horizontal segmentation of data can be understood as segmentation according to data rows, that is, some rows in the table are segmented into a database, and some other rows are segmented into in other databases. Of course, in order to easily determine which database each row of data has been split into, the splitting always needs to be carried out according to certain rules: for example, taking a modulo based on a specific number based on a numeric type field, a certain time The range of type fields, or the hash value of a character type field. If most of the core tables in the entire system can be related through a certain field, then this field is naturally the best choice for horizontal partitioning, except of course for very special cases where it cannot be used.
Generally speaking, like the very popular Web 2.0 websites now, basically most of the data can be associated through member user information. Maybe many core tables are very suitable for horizontal segmentation of data through member IDs. For example, the forum community discussion system is easier to segment. It can be segmented horizontally according to the forum number. After splitting, there will basically be no interaction between libraries.
If all the data in the example system is associated with users, then horizontal splitting can be performed based on users, and the data of different users can be divided into different databases. Of course, the only difference is that the groups table in the user module is not directly related to users, so groups cannot be split horizontally based on users. For this special case, the table can be completely separated and placed in an independent database. In fact, this approach can be said to make use of the "vertical segmentation of data" method introduced in the previous section. This joint segmentation method that uses vertical segmentation and horizontal segmentation at the same time will be introduced in more detail in the next section.
So, for the sample database, most tables can be horizontally split based on user ID. Data related to different users is segmented and stored in different databases. For example, all user IDs are taken modulo 2 and then stored in two different databases. Each table associated with a user ID can be split like this. In this way, basically every user-related data is in the same database, and even if correlation is needed, it is very easy to implement.
You can display the relevant information of horizontal segmentation more intuitively through the horizontal segmentation diagram (Figure 2):
Advantages of horizontal segmentation:
Table association can basically be completed on the database side;
There will be no bottleneck problems for some tables with extremely large data volumes and high loads;
The overall architecture of the application has relatively few changes;
Transaction processing is relatively simple;
As long as the segmentation rules can be defined Well, basically it's harder to hit scalability limits.
Disadvantages of horizontal segmentation:
The segmentation rules are relatively complex, and it is difficult to abstract a segmentation rule that can satisfy the entire database;
The difficulty of maintaining data in the later period has increased, and it is more difficult to manually locate the data;
The coupling degree of each module of the application system is high, which may This will cause certain difficulties in subsequent data migration and splitting.
The use of vertical and horizontal combined segmentation
In the previous two sections, we learned about the two "vertical" and "horizontal" respectively. The implementation of each segmentation method and the architectural information after segmentation, as well as the respective advantages and disadvantages of the two architectures. However, in actual application scenarios, except for those systems where the load is not too large and the business logic is relatively simple, which can solve the scalability problem through one of the two segmentation methods above, I am afraid that most other systems with complex business logic and complex business logic can solve the scalability problem. Systems with heavy loads cannot achieve better scalability through any of the above data segmentation methods. This requires a combination of the above two segmentation methods, and different scenarios use different segmentation methods.
This section will combine the advantages and disadvantages of vertical slicing and horizontal slicing to further improve the overall architecture and improve the scalability of the system.
Generally speaking, it is difficult to associate all the tables in the database through one (or a few) fields, so only horizontal segmentation of data cannot solve all problems. Vertical sharding can only solve part of the problem. For systems with very high loads, even a single table cannot bear its load through a single database host. It is necessary to combine the two segmentation methods of "vertical" and "horizontal" to make full use of the advantages of both and avoid their disadvantages.
The load of each application system increases step by step. When they begin to encounter performance bottlenecks, most architects and DBAs will choose to vertically split the data first because this is the lowest cost. , which is most in line with the maximum input-output ratio pursued during this period. However, as the business continues to expand and the system load continues to grow, after the system is stable for a period of time, the database cluster that has been vertically split may be overwhelmed again and encounter a performance bottleneck.
How to make a decision at this time? Should we further subdivide the module again, or seek other solutions? If we continue to subdivide modules and perform vertical segmentation of data as we did at the beginning, we may encounter the same problems we are facing now in the near future. Moreover, as modules continue to be refined, the architecture of the application system will become more and more complex, and the entire system is likely to get out of control.
At this time, we must take advantage of horizontal data segmentation to solve the problems encountered. Moreover, there is no need to overthrow the previous results of vertical data segmentation when using horizontal data segmentation. Instead, we can use the advantages of horizontal segmentation to avoid the disadvantages of vertical segmentation and solve the problem of the ever-increasing complexity of the system. question. The disadvantages of horizontal splitting (the rules are difficult to unify) have also been solved by the previous vertical splitting, making horizontal splitting easy.
For the sample database, it is assumed that the data was vertically segmented at the beginning. However, as the business continued to grow, the database system encountered bottlenecks, and we chose to reconstruct the architecture of the database cluster. How to refactor? Considering that the vertical segmentation of data has been done before, and the module structure is clear and clear, and the momentum of business growth is getting stronger and stronger, even if the modules are split again now, it will not last long. Therefore, we chose to perform horizontal segmentation on the basis of vertical segmentation.
Each database in the database cluster that has undergone vertical segmentation has only one functional module, and all tables in each functional module are basically associated with a certain field. For example, all user modules can be segmented by user ID, group discussion modules can be segmented by group ID, and photo album modules can be segmented by album ID. The final event notification information table takes into account the time limit of the data. (Only access the information of a recent event segment), it is divided by time.
Combined segmentation shows the entire architecture after segmentation:
In fact, in many large application systems, vertical segmentation and horizontal segmentation are basically They coexist and are often alternately performed to increase the system's expansion capabilities. When we deal with different application scenarios, we also need to fully consider the limitations and advantages of these two segmentation methods, and use different methods at different periods (load pressure).
Advantages of joint segmentation:
You can make full use of the respective advantages of vertical segmentation and horizontal segmentation and avoid their respective defects;
Maximize system scalability.
Disadvantages of joint segmentation:
The database system architecture is more complex and more difficult to maintain;
The application architecture is also more complex.
Data Segmentation and Integration Solution
Through the previous chapters, it has been clear that data segmentation through the database can greatly improve the performance of the system. Scalability. However, after the data in the database is stored in different database hosts after vertical and/or horizontal segmentation, the biggest problem faced by the application system is how to better integrate these data sources. This may also be of great concern to many readers. one question. The main content of this section is to analyze various overall solutions that can help us achieve data segmentation and data integration.
Data integration is difficult to achieve by relying on the database itself. Although MySQL has a Federated storage engine that can solve some similar problems, it is difficult to use it well in actual application scenarios. So how to integrate these data sources scattered on various MySQL hosts?
In general, there are two solutions:
Configure and manage one (or more) data sources you need in each application module, Directly access each database and complete data integration within the module;
All data sources are managed uniformly through the intermediate proxy layer, and the back-end database cluster is transparent to the front-end application.
Perhaps more than 90% of people will tend to choose the second solution when faced with these two solutions, especially when the system continues to become larger and more complex. Indeed, this is a very correct choice. Although the cost to be paid in the short term may be relatively large, it is very helpful to the scalability of the entire system.
Therefore, I will not analyze too much about the first solution. Let’s focus on analyzing some solutions in the second idea.
Self-developed intermediate proxy layer
After deciding to choose the architectural direction of data source integration through the intermediate proxy layer of the database, many companies (or enterprises) We have developed our own proxy layer applications that fit our specific application scenarios.
Self-developed intermediate proxy layer can respond to the characteristics of its own application to the greatest extent, maximize customization of personalized needs, and can also respond flexibly when facing changes. This should be the biggest advantage of developing your own proxy layer.
Of course, while choosing to develop by yourself and enjoy the maximum fun of personalized customization, you will naturally need to invest more costs in early research and development and subsequent continuous upgrades and improvements, and your own technical threshold may be higher than Simple web applications are higher. Therefore, before deciding to develop on your own, you still need to conduct a more comprehensive evaluation.
Since self-development often considers how to better adapt to its own application system and cope with its own business scenarios, it is not easy to analyze too much here. The following will mainly analyze several currently popular data source integration solutions.
Use MySQL Proxy to achieve data segmentation and integration
MySQL Proxy is a database proxy layer product officially provided by MySQL. Like MySQL Server, it is also based on GPL Open source products under open source licenses. Can be used to monitor, analyze or transmit communication information between them. Its flexibility allows it to be used to the maximum extent, and its current functions mainly include connection routing, Query analysis, Query filtering and modification, load balancing, and basic HA mechanisms.
In fact, MySQL Proxy itself does not have all the above functions, but provides the basis for realizing the above functions. To realize these functions, we also need to write LUA scripts ourselves.
MySQL Proxy actually establishes a connection pool between the client request and MySQL Server. All client requests are sent to MySQL Proxy, and then MySQL Proxy performs corresponding analysis to determine whether they are read operations or write operations, and distribute them to the corresponding MySQL Server. For multi-node Slave clusters, it can also achieve load balancing. For example, the basic architecture diagram of MySQL Proxy (Figure 4):
Through the above architectural diagram, you can clearly see the position of MySQL Proxy in practical applications and the basic things it can do. The detailed implementation details of MySQL Proxy are introduced in great detail and examples in the official MySQL documentation. Interested readers can download it directly from the official MySQL website for free or read it online, so I will not go into details here.
Using Amoeba to achieve data segmentation
Amoeba is an open source framework developed based on Java and focused on solving distributed database data source integration Proxy programs. It is open source based on GPL3 protocol. At present, Amoeba already has Query routing, Query filtering, read-write separation, load balancing and HA mechanism and other related contents, as shown in Figure 5.
Amoeba mainly solves the following problems:
Integration of complex data sources after data segmentation;
Provides data segmentation Separate rules and reduce the impact of data segmentation rules on the database;
Reduce the number of connections between the database and the client;
Separate reading and writing routing.
It can be seen that what Amoeba does is exactly what is needed to improve the scalability of the database through data segmentation.
Amoeba is not a proxy layer Proxy program, but a framework for developing database proxy layer Proxy programs. Currently, there are two Proxy programs developed based on Amoeba: Amoeba For MySQL and Amoeba For Aladin.
Amoeba For MySQL is a solution specifically for MySQL database. The protocol requested by the front-end application and the data source database connected by the back-end must be MySQL. For any client application, there is no difference between Amoeba For MySQL and a MySQL database. Any client request using the MySQL protocol can be parsed by Amoeba For MySQL and processed accordingly. Amoeba For can tell us the architectural information of Amoeba For MySQL (from the Amoeba developer blog):
Amoeba For Aladin is a Proxy program that is more widely applicable and more powerful. It can connect to data sources in different databases at the same time to provide services for front-end applications, but only accepts client application requests that comply with the MySQL protocol. In other words, as long as the front-end application is connected through the MySQL protocol, Amoeba For Aladin will automatically analyze the Query statement and automatically identify which physical host of what type of database the Query data source is based on the data requested in the Query statement. superior. The Amoeba For Aladdin architecture diagram (Figure 6) shows the architectural details of Amoeba For Aladin (from the Amoeba Developer Blog).
At first glance, the two seem to be exactly the same. If you look closely, you will find that the main difference between the two is that after processing by MySQL Protocol Adapter, the data source database is determined based on the analysis results, and then a specific JDBC driver and corresponding protocol are selected to connect to the back-end database.
In fact, through the above two architecture diagrams, you may have discovered the characteristics of Amoeba. It is just a development framework. In addition to choosing the two products it has provided, For MySQL and For Aladin, we can also use it based on Conduct secondary development according to your own needs and obtain a Proxy program that is more suitable for your own application characteristics.
But for using MySQL database, both Amoeba For MySQL and Amoeba For Aladin can be used well. Of course, considering that the more complex any system is, its performance will definitely suffer a certain loss, and the maintenance cost will naturally be higher. Therefore, when you only need to use the MySQL database, it is recommended to use Amoeba For MySQL.
Amoeba For MySQL is very simple to use. All configuration files are standard XML files. There are 4 in total, as follows:
amoeba.xml—— Main configuration file, configures all data sources and Amoeba's own parameters;
rule.xml - configures the information of all Query routing rules;
functionMap.xml - configure the Java implementation class used to parse the functions in Query;
rullFunctionMap.xml - configure the implementation of specific functions that need to be used in routing rules kind.
If your rules are not too complex, basically just using the first two of the 4 configuration files above will do everything. Commonly used functions of Proxy programs, such as read-write separation, load balancing and other configurations, are all configured in amoeba.xml. In addition, Amoeba already supports automatic routing for vertical and horizontal segmentation of data. Routing rules can be set in rule.xml.
Use HiveDB to achieve data segmentation and integration
Like the previous MySQL Proxy and Amoeba, HiveDB is also a Java-based open source framework that provides data segmentation and integration for MySQL databases. However, the current HiveDB only supports horizontal segmentation of data. It mainly solves the problems of database scalability and high-performance data access under large data volumes, while supporting data redundancy and basic HA mechanism.
The implementation mechanism of HiveDB is somewhat different from MySQL Proxy and Amoeba. It does not use MySQL's Replication function to achieve data redundancy, but implements its own data redundancy mechanism. Its underlying layer is mainly Implement data segmentation work based on Hibernate Shards.
In HiveDB, data is dispersed to multiple MySQL Servers through various user-defined Partition keys (that is, formulating data segmentation rules). When you run a Query request during access, the filter conditions will be automatically analyzed, data will be read from multiple MySQL Servers in parallel, and the result set will be merged and returned to the client application.
Purely speaking from a functional perspective, HiveDB may not be as powerful as MySQL Proxy and Amoeba, but its data segmentation ideas are not essentially different from the previous two. In addition, HiveDB is not just content shared by open source enthusiasts, but an open source project supported by commercial companies.
The HiveDB architecture diagram (Figure 7) on the HiveDB official website describes the basic information of how HiveDB organizes data. Although it cannot show the architectural information in detail, it can basically show its role in data cutting. It has a unique aspect.
Other solutions for data segmentation and integration
In addition to the several overall solutions for data segmentation and integration introduced above, there are many other solutions Solutions, such as HSCALE which is further extended based on MySQL Proxy, Spock Proxy built through Rails, and Pyshards based on Pathon, etc.
No matter which solution you choose to use, there should basically be no change in the overall design idea, that is, through vertical and horizontal segmentation of data, the overall service capabilities of the database are enhanced, so that the overall application system The expansion capability should be improved as much as possible and the expansion method should be as convenient as possible.
As long as the problems of data segmentation and data source integration are well solved through the middle layer Proxy application, the linear expansion capability of the database will be as convenient as the application: just by adding a cheap PC Server server, that is It can linearly increase the overall service capability of the database cluster, so that the database no longer easily becomes the performance bottleneck of the application system.
Possible problems in data segmentation and integration
Here, everyone should have a certain understanding of the implementation of data segmentation and integration. Perhaps many readers Based on the pros and cons of various solutions, you have basically selected a solution that is suitable for your application scenario. The next work is mainly to prepare for implementation.
Before implementing the data segmentation plan, some possible problems still need to be analyzed. Generally speaking, the main problems you may encounter include the following:
The problem of introducing distributed transactions;
The problem of cross-node Join Problem;
Cross-node merge sort paging problem.
The problem of introducing distributed transactions
Once the data is split and stored in multiple MySQL Servers, regardless of the splitting rule design No matter how perfect it is (actually there is no perfect segmentation rule), it may cause that the data involved in some previous transactions is no longer in the same MySQL Server.
In such a scenario, if the application still follows the old solution, distributed transactions must be introduced to solve it. Among the various MySQL versions, only versions starting from MySQL 5.0 provide support for distributed transactions, and currently only Innodb provides distributed transaction support. However, even if we happen to use a version of MySQL that supports distributed transactions and also use the Innodb storage engine, the distributed transaction itself consumes a lot of system resources and the performance is not very high. Introducing distributed transactions in terms of exception handling It will bring about many problems that are difficult to control.
what to do? In fact, this problem can be solved through a workaround. The first thing to consider is: Is the database the only place that can solve transactions? In fact, this is not the case. It can be solved by combining the database and the application. Each database resolves its own transactions, and applications control transactions on multiple databases.
In other words, as long as we are willing, we can split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, and control each small transaction through the application. . Of course, doing so requires that the application must be sufficiently robust, and of course it will also bring some technical difficulties to the application.
Problems with cross-node Join
The above has introduced the problems that may introduce distributed transactions. Now let’s look at the problems that require cross-node Join. After the data is split, some old Join statements may no longer be used because the data source used by the Join may be split into multiple MySQL Servers.
what to do? From the perspective of the MySQL database, if this problem has to be solved directly on the database side, I am afraid it can only be handled through Federated, a special storage engine of MySQL. The Federated storage engine is MySQL's solution to problems similar to Oracle's DB Link. The main difference from Oracle DB Link is that Federated will save a copy of the definition information of the remote table structure locally. At first glance, Federated is indeed a very good solution to cross-node Join. But we should also be clear that if the remote table structure changes, the local table definition information will not change accordingly. If the local Federated table definition information is not updated when updating the remote table structure, Query may run incorrectly and cannot obtain correct results.
To deal with this kind of problem, it is recommended to deal with it through the application program. First, fetch the driving result set from the MySQL Server where the driving table is located, and then fetch the corresponding result set from the MySQL Server where the driven table is located based on the driving result set. The data. Many readers may think that doing so will have a certain impact on performance. Yes, it will indeed have a certain negative impact, but other than that, there are basically not many other better solutions. Moreover, since the database is better expanded, the load of each MySQL Server can be better controlled. For a single Query, its response time may be slightly higher than before it is not segmented, so the performance is brought The negative impact is not too great. What's more, the demand for cross-node joins like this is not too much, and it may only be a small part compared to the overall performance. Therefore, for the sake of overall performance, it is actually worth sacrificing a little bit occasionally. After all, system optimization itself is a process of many trade-offs and balances.
Cross-node merge sorting and paging issues
Once the data is horizontally split, it may not only be the cross-node Join that cannot run normally, but also some sorting and paging issues. The data source of the Query statement may also be split into multiple nodes. The direct consequence is that these sorting and paging Query cannot continue to run normally. In fact, this is the same as cross-node Join. The data source exists on multiple nodes and needs to be solved through a Query, which is a cross-node Join operation. Similarly, Federated can also partially solve it, but the risks are the same. But there is one difference: Join often has a driver-driven relationship, so the data reading between the multiple tables involved generally has a sequential relationship. But sorting and paging is different. The data source of sorting and paging can basically be said to be a table (or a result set), and there is no sequential relationship, so the process of fetching data from multiple data sources can be completely parallel. In this way, the sorting and paging data retrieval efficiency can be higher than that of cross-database Join, so the performance loss caused is relatively small. In some cases, it may be more efficient than in the original database without data segmentation. Of course, whether it is cross-node Join or cross-node sorting and paging, the application server will consume more resources, especially memory resources, because the process of reading, accessing and merging the result set requires more data than without processing the merge. .
The above is the detailed content of How to implement data segmentation in mysql. For more information, please follow other related articles on the PHP Chinese website!