Home >php教程 >php手册 >Build a billion-level Web system from a single machine to a distributed cluster

Build a billion-level Web system from a single machine to a distributed cluster

WBOY
WBOYOriginal
2016-07-21 14:52:541531browse

When a Web system gradually grows from 100,000 daily visits to 10 million, or even exceeds 100 million, the Web system will be under increasing pressure. In this process, we will encounter many problems. . In order to solve the problems caused by these performance pressures, we need to build multiple levels of caching mechanisms at the Web system architecture level. At different pressure stages, we will encounter different problems and solve them by building different services and architectures.

Web Load Balancing

Web load balancing (Load Balancing), simply put, is to allocate "work tasks" to our server cluster, and using appropriate allocation methods is very important for protecting the back-end web servers.

There are many load balancing strategies, let’s start with a simple one.

1. HTTP Redirect

When the user sends a request, the web server returns a new URL by modifying the Location tag in the HTTP response header, and then the browser continues to request the new URL, which is actually a page redirection. Through redirection, the goal of "load balancing" is achieved. For example, when we download the PHP source code package and click the download link, in order to solve the problem of download speed in different countries and regions, it will return a download address closest to us. The redirected HTTP return code is 302, as shown below:

If you use PHP code to implement this function, the method is as follows:

This redirect is very easy to implement and can be customized with various strategies. However, it performs poorly under large-scale traffic. Moreover, the user experience is not good. The actual request is redirected, which increases network delay.

2. Reverse proxy load balancing

The core work of the reverse proxy service is mainly to forward HTTP requests, playing the role of relay between the browser and the backend web server. Because it works at the HTTP layer (application layer), which is the seventh layer in the seven-layer network structure, it is also called "seven-layer load balancing". There are many software that can be used as reverse proxy, and one of the more common ones is Nginx.

Nginx is a very flexible reverse proxy software that can freely customize forwarding strategies, allocate the weight of server traffic, etc. In reverse proxy, a common problem is the session data stored by the web server, because the general load balancing strategy allocates requests randomly. There is no guarantee that requests from the same logged-in user will be allocated to the same web machine, which will lead to the problem that the session cannot be found.

There are two main solutions:

  1. Configure the forwarding rules of the reverse proxy so that requests from the same user must land on the same machine (by analyzing cookies). Complex forwarding rules will consume more CPU and increase the burden on the proxy server.
  2. It is recommended to use an independent service to store information such as session, such as redis/memchache.

The reverse proxy service can also enable caching. If enabled, it will increase the burden of the reverse proxy and should be used with caution. This load balancing strategy is very simple to implement and deploy, and its performance is relatively good. However, it has the problem of "single point of failure". If it hangs, it will cause a lot of trouble. Moreover, as the number of Web servers continues to increase in the later stages, it itself may become a bottleneck of the system.

3. IP load balancing

The IP load balancing service works at the network layer (modify IP) and transport layer (modify port, layer 4), and its performance is much higher than working at the application layer (layer 7). The principle is that it modifies the IP address and port information of the IP layer packets to achieve load balancing. This method is also called "four-layer load balancing". A common load balancing method is LVS (Linux Virtual Server, Linux Virtual Service), which is implemented through IPVS (IP Virtual Server, IP Virtual Service).

When the load balancing server receives the client's IP packet, it will modify the target IP address or port of the IP packet, and then deliver it to the internal network intact, and the data packet will flow into the actual Web server. After the actual server processing is completed, the data packet will be delivered back to the load balancing server, which will then modify the target IP address to the user IP address and finally return to the client.

The above method is called LVS-NAT. In addition, there are LVS-RD (direct routing) and LVS-TUN (IP tunnel). All three are LVS methods, but there are certain differences. Due to space issues, I won’t go into details.

The performance of IP load balancing is much higher than that of Nginx's reverse proxy. It only processes the data packets up to the transport layer, without further grouping, and then forwards them directly to the actual server. However, its configuration and construction are relatively complex.

4. DNS load balancing

DNS (Domain Name System) is responsible for domain name resolution services. The domain name URL is actually an alias of the server. The actual mapping is an IP address. The resolution process is that DNS completes the mapping from domain name to IP. A domain name can be configured to correspond to multiple IPs. Therefore, DNS can also be used as a load balancing service.

This load balancing strategy is simple to configure and has excellent performance. However, rules cannot be defined freely, and it is troublesome to change the mapped IP or when the machine fails, and there is also the problem of delay in DNS taking effect.

5. DNS/GSLB load balancing

Our commonly used CDN (Content Delivery Network, content distribution network) implementation method is actually to go one step further on the basis of mapping the same domain name to multiple IPs, and use GSLB (Global Server Load Balance, global load balancing) according to the specified rules Map the IP of the domain name. Under normal circumstances, the IP closest to the user is returned to the user according to the geographical location, reducing the cost of hopping between routing nodes in network transmission.

The actual process of "looking up" in the figure is that LDNS (Local DNS) first obtains the top-level root Name Server (such as .com) from the Root Name Server, and then obtains the authorized DNS of the specified domain name. , and then get the actual server IP.

In Web systems, CDN is generally used to solve the loading problem of large static resources (html/Js/Css/images, etc.), making these more dependent on content downloaded from the Internet and as far away from users as possible. Recently, improve user experience.

For example, I accessed an image on imgcache.gtimg.cn (Tencent’s self-built CDN, the reason why it does not use the qq.com domain name is to prevent extra cookie information from being included in http requests), and I The obtained IP is 183.60.217.90.

This method, like the previous DNS load balancing, not only has excellent performance, but also supports the configuration of multiple strategies. However, the setup and maintenance costs are very high. First-line Internet companies will build their own CDN services, while small and medium-sized companies generally use CDN provided by third parties.

Establishment and optimization of the caching mechanism of the Web system

We have just finished talking about the external network environment of the Web system, and now we start to pay attention to the performance issues of our Web system itself. As the number of visits to our website increases, we will encounter many challenges. Solving these problems is not just as simple as expanding the machine, but establishing and using an appropriate caching mechanism is fundamental.

In the beginning, our Web system architecture may be like this. Each link may have only one machine.

Let’s start with the most basic data storage.

1. Use of internal cache of MySQL database

MySQL’s caching mechanism starts from within MySQL. The following content will focus on the most common InnoDB storage engine.

1. Create appropriate indexes

The simplest is to create an index. When the table data is relatively large, the index plays a role in quickly retrieving data, but there is also a cost. First of all, it occupies a certain amount of disk space. Among them, the combined index is the most prominent. It needs to be used with caution. The index it generates may even be larger than the source data. Secondly, operations such as data insert/update/delete after index creation will take more time because the original index needs to be updated. Of course, in fact, our system as a whole is dominated by select query operations. Therefore, the use of indexes can still greatly improve system performance.

2. Database connection thread pool cache

If every database operation request needs to create and destroy a connection, it will undoubtedly be a huge overhead for the database. In order to reduce this type of overhead, thread_cache_size can be configured in MySQL to indicate how many threads are reserved for reuse. When there are not enough threads, they are created again, and when there are too many idle threads, they are destroyed.

In fact, there is a more radical approach, using pconnect (database long connection), once the thread is created, it will be maintained for a long time. However, when the amount of access is relatively large and there are many machines, this usage is likely to lead to "the number of database connections is exhausted", because the connections are established and not recycled, eventually reaching the max_connections (maximum number of connections) of the database. Therefore, the usage of long connections usually requires the implementation of a "connection pool" service between CGI and MySQL to control the number of connections created "blindly" by the CGI machine.

There are many implementation methods for establishing a database connection pool service. For PHP, I recommend using swoole (a network communication extension of PHP) to implement it.

3. Innodb cache settings (innodb_buffer_pool_size)

innodb_buffer_pool_size This is a memory cache area used to save indexes and data. If the machine is exclusive to MySQL, it is generally recommended to be 80% of the machine's physical memory. In the scenario of fetching table data, it can reduce disk IO. Generally speaking, the larger this value is set, the higher the cache hit rate will be.

4. Sub-library/table/partition.

MySQL database tables generally withstand data volume in the millions. If it increases further, the performance will drop significantly. Therefore, when we foresee that the data volume will exceed this level, it is recommended to carry out sub-database/ Table splitting/partitioning and other operations. The best approach is to design the service into a sub-database and sub-table storage model from the beginning, to fundamentally eliminate risks in the middle and later stages. However, some conveniences, such as list-based queries, will be sacrificed, and at the same time, maintenance complexity will be increased. However, when the data volume reaches tens of millions or more, we will find that they are all worth it.

2. Setting up multiple MySQL database services

A MySQL machine is actually a high-risk single point, because if it goes down, our web service will be unavailable. Moreover, as the number of visits to the Web system continued to increase, one day, we found that one MySQL server could not support it, and we began to need to use more MySQL machines. When multiple MySQL machines are introduced, many new problems will arise.

1. Establish MySQL master and slave, with the slave database as backup

This approach is purely to solve the problem of "single point of failure". When the main database fails, switch to the slave database. However, this approach is actually a bit of a waste of resources, because the slave library is actually idle.

2. MySQL separates reading and writing, writing to the main database and reading from the slave database.

The two databases separate reading and writing. The main database is responsible for writing operations, and the slave database is responsible for reading operations. Moreover, if the main database fails, the reading operation will not be affected. At the same time, all reading and writing can be temporarily switched to the slave database (you need to pay attention to the traffic, because the traffic may be too large and the slave database will be brought down).

3. The master and the master have mutual backup.

The two MySQL servers are each other’s slave database and the master database at the same time. This solution not only diverts traffic pressure, but also solves the problem of "single point of failure". If any unit fails, there is another set of services available.

However, this solution can only be used in a scenario with two machines. If the business is still expanding rapidly, you can choose to separate the business and establish multiple master-master and mutual-backup services.

3. Data synchronization between MySQL database machines

Every time we solve a problem, new problems will inevitably arise from the old solution. When we have multiple MySQL servers, during peak business periods, data between the two databases is likely to be delayed. Moreover, network and machine load, etc., will also affect the delay of data synchronization. We once encountered a special scenario where the number of daily visits was close to 100 million. It took many days for the data in the slave database to catch up with the data in the main database. In this scenario, the slave library basically loses its effectiveness.

So, solving the synchronization problem is what we need to focus on next.

1. MySQL comes with multi-thread synchronization

MySQL 5.6 begins to support data synchronization between the main database and the slave database, using multi-threading. However, the limitation is also relatively obvious, and it can only be based on the library. MySQL data synchronization is through the binlog log. The operations written by the main database to the binlog log are sequential. Especially when the SQL operation contains modifications to the table structure, it will have an impact on subsequent SQL statement operations. Therefore, synchronizing data from the database must be done in a single process.

2. Implement binlog parsing and multi-thread writing by yourself.

Using the database table as a unit, parse multiple binlog tables for data synchronization at the same time. Doing so can indeed speed up the efficiency of data synchronization. However, if there are structural relationships or data dependencies between tables, there will also be a problem of writing order. This method can be used for some relatively stable and relatively independent data tables.

Most domestic first-tier Internet companies use this method to speed up data synchronization efficiency. There is a more radical approach, which is to directly parse the binlog, ignore the table as a unit, and write directly. However, this approach is complex to implement and the scope of use is even more limited. It can only be used in some databases with special scenarios (no table structure changes, no data dependencies between tables, etc. special tables).

4. Establish a cache between the web server and the database

In fact, to solve the problem of large traffic volume, we cannot just focus on the database level. According to the "80/20 rule", 80% of requests only focus on 20% of hot data. Therefore, we should establish a caching mechanism between the web server and the database. This mechanism can use disk as cache or memory cache. Through them, most hot data queries are blocked in front of the database.

1. Static page

When a user visits a certain page on the website, most of the content on the page may not change for a long time. For example, a news report will almost never be modified once it is published. In this case, the static html page generated by CGI is cached locally on the disk of the web server. Except for the first time, which is obtained through dynamic CGI query database, thereafter the local disk file is directly returned to the user.

When the scale of the Web system was relatively small, this approach seemed perfect. However, once the scale of the Web system becomes larger, for example, when I have 100 Web servers. In this way, there will be 100 copies of these disk files, which is a waste of resources and difficult to maintain. At this time, some people may think that they can centralize a server to store it. Haha, why not take a look at the following caching method, which is how it does it.

2. Single memory cache

Through the example of page staticization, we can know that it is difficult to maintain the "cache" on the Web machine itself and will bring more problems (in fact, through PHP's apc expansion, it can be used through Key/ value operates on the web server's native memory). Therefore, the memory cache service we choose to build must also be an independent service.

The memory cache options mainly include redis/memcache. In terms of performance, there is not much difference between the two. In terms of feature richness, Redis is superior.

3. Memory cache cluster

When we build a single memory cache, we will face the problem of single point of failure, so we must turn it into a cluster. The simple way is to add a slave as a backup machine. However, what if there are really a lot of requests and we find that the cache hit rate is not high and more machine memory is needed? Therefore, we recommend configuring it as a cluster. For example, similar to redis cluster.

The Redis in the Redis cluster serve as multiple master-slave groups for each other. At the same time, each node can accept requests, which is more convenient when expanding the cluster. The client can send a request to any node, and if it is the content it is "responsible for", the content will be returned directly. Otherwise, find the actual responsible Redis node, then inform the client of the address, and the client requests again.

This is all transparent to clients using the caching service.

There are certain risks when switching the memory cache service. In the process of switching from cluster A to cluster B, it is necessary to ensure that cluster B is "warmed up" in advance (the hot data in the memory of cluster B should be the same as that of cluster A as much as possible, otherwise, a large number of content requests will be requested at the moment of switching. It cannot be found in the memory cache of cluster B. The traffic directly impacts the back-end database service, which is likely to cause database downtime).

4. Reduce database “writes”

The above mechanisms all reduce the "read" operation of the database, but the write operation is also a big pressure. Although the write operation cannot be reduced, it can reduce the pressure by merging requests. At this time, we need to establish a modification synchronization mechanism between the memory cache cluster and the database cluster.

First put the modification request into effect in the cache, so that external queries can display normally, and then put these SQL modifications into a queue and store them. When the queue is full or every once in a while, they are merged into one request and sent to the database to update the database.

In addition to improving write performance by changing the system architecture mentioned above, MySQL itself can also adjust the disk writing strategy by configuring the parameter innodb_flush_log_at_trx_commit. If the machine cost allows, to solve the problem from the hardware level, you can choose the older RAID (Redundant Arrays of independent Disks, disk array) or the newer SSD (Solid State Drives, solid state drives).

5. NoSQL storage

Regardless of whether the database is read or written, when the traffic increases further, the scenario of "when manpower is limited" will eventually be reached. The cost of adding more machines is relatively high and may not really solve the problem. At this time, you can consider using NoSQL database for some core data. Most NoSQL storage uses the key-value method. It is recommended to use Redis as introduced above. Redis itself is a memory cache and can also be used as a storage, allowing it to directly store data on the disk.

In this case, we will separate some frequently read and written data in the database and put it in our newly built Redis storage cluster, which will further reduce the pressure on the original MySQL database. At the same time, because Redis itself is a memory level Cache, the performance of reading and writing will be greatly improved.

Domestic first-tier Internet companies adopt many solutions similar to the above in terms of architecture. However, the cache service used is not necessarily Redis. They will have more abundant other choices, and even develop their own based on their own business characteristics. NoSQL services.

6. Empty node query problem

When we have built all the services mentioned above and think that the Web system is already very strong. We still say the same thing, new problems will still come. Empty node queries refer to data requests that do not exist in the database at all. For example, if I request to query a person's information that does not exist, the system will search it step by step from the cache at all levels, and finally find the database itself, and then draw the conclusion that it cannot be found, and return it to the front end. Because caches at all levels are invalid for it, this request consumes a lot of system resources, and if a large number of empty node queries are made, it can impact system services.

In my previous work experience, I suffered deeply from this. Therefore, in order to maintain the stability of the Web system, it is very necessary to design an appropriate empty node filtering mechanism.

The method we adopted at that time was to design a simple record mapping table. Store existing records and put them into a memory cache. In this case, if there are still empty node queries, they will be blocked at the cache level.

Off-site deployment (geographically distributed)

After completing the above architecture construction, is our system powerful enough? The answer is of course no, there are no limits to optimization. Although the Web system seems to be more powerful on the surface, the experience it provides users is not necessarily the best. Because a classmate from Northeast China visits a website service in Shenzhen, he still feels that the network distance is slow. At this time, we need to deploy off-site to make the Web system closer to users.

1. Core concentration and node dispersion

Students who have played large-scale online games will know that online games have many areas, which are usually divided by region, such as Guangdong area and Beijing area. If a player in Guangdong goes to the Beijing area to play, then he will feel obviously stuck than in the Guangdong area. In fact, the names of these regions already indicate where their servers are located. Therefore, when players in Guangdong connect to servers in Beijing, the network will of course be slower.

When a system and service are large enough, you must start to consider the issue of off-site deployment. Make your services as close to users as possible. We have mentioned before that static resources of the Web can be stored on CDN, and then distributed "all over the country" through DNS/GSLB. However, CDN only solves the problem of static resources and does not solve the problem of huge back-end system services that are only concentrated in a fixed city.

At this time, off-site deployment begins. Off-site deployment generally follows: the core is centralized and the nodes are dispersed.

  • Core concentration: In the actual deployment process, there are always some data and services that cannot be deployed in multiple sets, or the cost of deploying multiple sets is huge. For these services and data, a set of services and data will still be maintained, and the deployment location will be selected in a relatively central place in the region, and each node will be communicated through dedicated lines within the network.
  • Node dispersion: Deploy some services into multiple sets and distribute them in various city nodes, allowing users to request access to services from nodes as close as possible.

For example, we choose to deploy in Shanghai as the core node, and Beijing, Shenzhen, Wuhan, and Shanghai as distributed nodes (Shanghai itself is also a distributed node). Our service structure is as shown below:

It should be added that in the above figure, the Shanghai node and the core node are in the same computer room, and other distributed nodes have separate computer rooms.
There are many large-scale online games in China, which generally follow the above structure. They will place user core accounts with a small amount of data in core nodes, while most online game data, such as equipment, tasks and other data and services, will be placed in regional nodes. Of course, there is also a caching mechanism between the core node and the regional node.

2. Node disaster recovery and overload protection

Node disaster recovery means that if a node fails, we need to establish a mechanism to ensure that the service is still available. There is no doubt that the more common disaster recovery method here is to switch to a nearby city node. If the Tianjin node of the system fails, then we will switch network traffic to the nearby Beijing node. Considering load balancing, it may be necessary to switch traffic to several nearby geographical nodes at the same time. On the other hand, the core node itself also needs to do disaster recovery and backup. Once the core node fails, it will affect national services.

Overload protection means that a node has reached its maximum capacity and cannot continue to accept more requests. The system must have a protection mechanism. If a service is already fully loaded and continues to accept new requests, the result is likely to be downtime, affecting the service of the entire node. In order to ensure the normal use of at least most users, overload protection is necessary.

To solve overload protection, there are generally two directions:

  • Denial of service, after detecting full load, it will no longer accept new connection requests. For example, queuing in online game login.
  • Distributed to other nodes. In this case, the system implementation is more complicated and involves load balancing issues.

Summary

As the scale of access increases, the Web system will gradually grow from a single server that can meet the demand to a "behemoth" large cluster. The process of the Web system becoming larger is actually the process of solving problems. At different stages, different problems are solved, and new problems are born on top of old solutions.

There is no limit to system optimization. Software and system architecture have been developing rapidly. New solutions solve old problems and also bring new challenges.

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn