
The RPC framework in PHP implements a flow control system based on Redis

Author: 小云云 (original)
2018-03-15 14:03:36, 3170 views


We have partially migrated our project to microservices. Previously, all modules lived in a single project (one large folder) and were deployed online the same way, and the shortcomings of that approach are obvious. We later split the project into sub-modules by business function and access them through an RPC framework. Each sub-module has its own online machine cluster and its own MySQL, Redis, and other storage resources, so a problem in one sub-module does not affect the others, and the system as a whole is more maintainable and scalable.

In practice, however, the service capacity of each sub-module differs. Take the architecture after the split as an example: suppose module A receives 100 QPS and depends on module B, so A also sends 100 QPS to B, but B can serve at most 50 QPS. Without a traffic limit, requests pile up on the overloaded module B and the entire system becomes unavailable. Our dynamic flow control system finds the best service capacity of a sub-module; here that means limiting the traffic from module A to module B to 50 QPS. This ensures that at least part of the requests are processed normally, rather than the whole system being dragged down because one sub-service hangs.

Our RPC framework is implemented in PHP and mainly supports access over HTTP. For a front-end module A to call a back-end module B that it depends on, module B must first be configured as a service and is then referenced by its service name. A service configuration generally looks like this:

[MODULE-B]  ; service name
protocol = "http"  ; protocol
lb_alg = "random" ; load-balancing algorithm
conn_timeout_ms = 1000 ; connection timeout, used by all protocols, in ms
read_timeout_ms = 3000 ; read timeout
write_timeout_ms = 3000 ; write timeout
exe_timeout_ms = 3000 ; execution timeout
host.default[] = "127.0.0.1" ; IP or domain name
host.default[] = "127.0.0.2" ; IP or domain name
host.default[] = "127.0.0.3" ; IP or domain name
port = 80 ; port
domain = 'api.abc.com' ; domain name; not actually resolved, passed to the back end as the Host header

The service module being accessed is usually deployed as a cluster, so we need to configure the IPs of all machines in the cluster. If an internal DNS service is available, the cluster's domain name can be configured instead.
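As a rough illustration of how such a section could be read, the sketch below uses PHP's built-in parse_ini_string(). The article does not say how the framework actually loads its configuration, so the parsing approach and the way endpoints are assembled here are assumptions.

```php
<?php
// Hypothetical example: reading a [MODULE-B] section with PHP's INI parser.
$ini = <<<'INI'
[MODULE-B]
protocol = "http"
lb_alg = "random"
conn_timeout_ms = 1000
host.default[] = "127.0.0.1"
host.default[] = "127.0.0.2"
host.default[] = "127.0.0.3"
port = 80
INI;

// Second argument true keeps sections; INI_SCANNER_TYPED turns 80 into an int.
$config  = parse_ini_string($ini, true, INI_SCANNER_TYPED);
$service = $config['MODULE-B'];

// Build the candidate endpoints that a load balancer would choose from.
$endpoints = [];
foreach ($service['host.default'] as $host) {
    $endpoints[] = sprintf('%s://%s:%d', $service['protocol'], $host, $service['port']);
}
print_r($endpoints);
```

Repeated host.default[] lines come back as one PHP array, which is convenient for listing every machine in the cluster.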

The basic functions of an RPC framework include load balancing, health checks, and degradation and rate limiting. Our flow control belongs to the degradation and rate limiting function. Before describing it in detail, we first explain how load balancing and health checks are implemented, because flow control is built on them.

We have implemented random and round-robin algorithms for load balancing. The random algorithm simply picks one of the IPs at random and is easy to implement. Our round-robin is per machine: the index of the previously selected IP is stored in local memory using the APCu extension, so that the next index to use can be found quickly.
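The two algorithms can be sketched roughly as follows. The function names and the cache key are hypothetical, and the fallback to a per-process static variable when APCu is not enabled (for example on the command line) is an addition for illustration, not something the article describes. The fetch-then-store sequence is not atomic, which a sketch can tolerate.

```php
<?php
// Random selection: pick any one of the configured IPs.
function pickRandom(array $ips): string
{
    return $ips[array_rand($ips)];
}

// Per-machine round-robin: remember the index of the last selected IP.
function pickRoundRobin(array $ips, string $service): string
{
    static $local = [];                  // fallback store when APCu is unavailable
    $ips = array_values($ips);           // make sure indexes are 0..n-1
    $key = "rpc_rr_idx_{$service}";      // hypothetical cache key
    $useApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($useApcu) {
        $last  = apcu_fetch($key, $found);
    } else {
        $found = array_key_exists($key, $local);
        $last  = $found ? $local[$key] : null;
    }

    // Start at index 0, then advance by one and wrap around.
    $next = $found ? ($last + 1) % count($ips) : 0;

    if ($useApcu) {
        apcu_store($key, $next);
    } else {
        $local[$key] = $next;
    }
    return $ips[$next];
}
```

Calling pickRoundRobin() repeatedly with the same service name walks through the IP list in order and wraps back to the first IP.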

A machine being accessed may fail. We record the IPs of failed requests in Redis and analyze the recorded failure logs to decide whether a machine's IP should be removed, that is, whether the machine at that IP is considered dead and unable to serve normally. This is the job of the health check. The related service configuration items below show what it does in detail:

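ip_fail_sample_ratio = 1 ; sampling ratio

Sampling ratio for recording failed IPs. Failed requests are recorded in Redis; to avoid sending too many requests to Redis, a failure sampling ratio can be configured.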

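ip_fail_cnt_threshold  = 10;  number of failures for an IP
ip_fail_delay_time_s = 2 ;  time window
ip_fail_client_cnt = 3 ; number of failing clients

An IP is not removed from the healthy IP list after a single failure. It is considered unhealthy only if, within the ip_fail_delay_time_s time window, requests to it failed ip_fail_cnt_threshold times and the number of clients that saw failures reached ip_fail_client_cnt.

Why add the ip_fail_client_cnt setting? If only one machine fails to reach a given back-end service IP, the problem is not necessarily the service IP; it may be the calling client. Only when most clients have failure records do we attribute the problem to the back-end service IP.

We record the failure logs in a Redis list together with a timestamp, which makes it easy to count the failures within a time window.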
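The rule above can be sketched as a pure function over already-fetched log entries. The entry layout (ip, client, ts) is an assumption; the article only says that each entry carries a timestamp, and in practice the entries would be read from the Redis list first.

```php
<?php
// Decide whether $ip is unhealthy from a list of failure log entries.
// Each entry is assumed to look like:
//   ['ip' => '127.0.0.1', 'client' => 'web-01', 'ts' => 1521093816]
function isUnhealthy(array $logs, string $ip, int $now, array $cfg): bool
{
    $failures = 0;
    $clients  = [];
    foreach ($logs as $entry) {
        if ($entry['ip'] !== $ip) {
            continue; // failure of a different back-end IP
        }
        if ($now - $entry['ts'] > $cfg['ip_fail_delay_time_s']) {
            continue; // outside the time window
        }
        $failures++;
        $clients[$entry['client']] = true; // collect distinct clients
    }
    // Both conditions must hold: enough failures and enough distinct clients.
    return $failures >= $cfg['ip_fail_cnt_threshold']
        && count($clients) >= $cfg['ip_fail_client_cnt'];
}

$cfg = [
    'ip_fail_cnt_threshold' => 10,
    'ip_fail_delay_time_s'  => 2,
    'ip_fail_client_cnt'    => 3,
];
```

With these values, ten recent failures reported by a single client do not mark the IP as unhealthy, while ten recent failures spread over three clients do.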

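ip_retry_delay_time_s = 30 ; interval for checking whether a failed IP has recovered

A failed IP may recover after some time. We check it every ip_retry_delay_time_s seconds, and if a request succeeds, the IP is removed from the failed IP list.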

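ip_retry_fail_cnt = 10;  failure weight recorded when the recovery check of a failed IP fails

ip_log_ttl_s = 60000; log time-to-live

Generally only recent failure logs are meaningful, so older logs are deleted automatically.

ip_log_max_cnt = 10000; maximum number of log entries recorded

We record failure logs in Redis, whose capacity is limited, so we set a maximum number of log entries; entries beyond it are deleted automatically.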

In our implementation, besides the normal service IP configuration, we also maintain a list of failed IPs. When the algorithm selects an IP, the failed IPs are excluded first. The failed IP list is stored in a file and cached in memory with APCu to speed up access, so nearly all of our operations are memory accesses and there are no performance issues.

We write a log entry to Redis only when a request fails. So when are the failed IPs identified? Doing so means querying all the failure logs in the Redis list and counting the failures, which is a relatively expensive operation. In our implementation, multiple PHP processes compete for a lock; whichever process acquires it performs the analysis and writes the failed IPs to the file. Because only one process performs the analysis, normal requests are not affected. Moreover, processes compete for the lock only when a request fails, so under normal conditions there is essentially no interaction with Redis and no performance loss.
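The article does not say where the lock lives. One common way to let a single process win is a Redis SET with the NX and EX options, sketched below under that assumption. The key name and function names are hypothetical; $redis is expected to be a connected phpredis Redis instance, whose set() accepts an options array such as ['nx', 'ex' => 5].

```php
<?php
// Try to become the single analyzer process for a service.
function tryAcquireAnalysisLock($redis, string $service, int $ttlSeconds = 5): bool
{
    $key   = "rpc:health:lock:{$service}";     // hypothetical key name
    $owner = gethostname() . ':' . getmypid(); // identifies the lock holder
    // SET key value NX EX ttl succeeds only if the key does not exist yet.
    // The expiry releases the lock if the holder dies during the analysis.
    return $redis->set($key, $owner, ['nx', 'ex' => $ttlSeconds]) === true;
}

// Run the analysis only in the process that won the lock.
function analyzeIfLockWon($redis, string $service, callable $analyze): bool
{
    if (!tryAcquireAnalysisLock($redis, $service)) {
        return false; // another process is already analyzing
    }
    $analyze();       // e.g. count failures and write the failed IP file
    return true;
}
```

Processes that lose the race return immediately and continue serving requests, which matches the goal of keeping the analysis off the normal request path.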

Our health check depends on a centralized Redis service. What if it goes down? If the RPC framework determines that the Redis service itself is down, it automatically turns off the health check and stops interacting with Redis, so that at least the normal RPC function is not affected.

Flow control is built on the health check. When we find that most or all IPs have failed, we can infer that the back-end service cannot respond because of excessive traffic, and requests fail as a result. At that point we should limit the traffic with some strategy. A common implementation simply cuts off all traffic, which is rather crude. Ours reduces the traffic gradually until the proportion of failed IPs drops to a certain value, and then tries to increase it gradually. Decreasing and increasing may repeat in a cycle, which is what makes the flow control dynamic, and eventually an optimal traffic value is found. The related configuration items below show how flow control works:

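degrade_ip_fail_ratio = 1 ; proportion of failed IPs at which service degradation starts

That is, the proportion of failed IPs at which degradation, meaning traffic reduction, begins.

degrade_dec_step = 0.1 ; how much the limit is tightened at each step

That is, the proportion of traffic removed at each step.

degrade_stop_ip_ratio = 0.5;

The proportion of failed IPs at which traffic reduction stops and a traffic increase is attempted.

degrade_stop_ttl_s = 10;

How long to wait after stopping before trying to increase traffic.

degrade_step_ttl_s = 10

How long to wait between traffic increases or decreases.
After each increase or decrease, the next step is decided by the proportion of failed IPs at that time, and the current traffic value is held for a while rather than a decision being made immediately.

degrade_add_step = 0.1

The proportion by which traffic is increased at each step.

degrade_return = false ; value returned while degraded

While degraded, we no longer access the back-end service; a configured value is returned directly to the caller.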
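One possible reading of these settings is the small state machine sketched below. The phase names, the state layout, and the exact transition rules are assumptions made for illustration; the article describes the behavior only in prose.

```php
<?php
// One step of a simplified flow controller.
// $s: ['ratio' => allowed traffic 0..1, 'phase' => string, 'since' => unix time]
function flowStep(array $s, float $failRatio, int $now, array $c): array
{
    $dec = function (float $r) use ($c) { return max(0.0, round($r - $c['degrade_dec_step'], 4)); };
    $inc = function (float $r) use ($c) { return min(1.0, round($r + $c['degrade_add_step'], 4)); };
    $held = $now - $s['since']; // how long the current value has been held

    switch ($s['phase']) {
        case 'normal': // full traffic until enough IPs fail
            if ($failRatio >= $c['degrade_ip_fail_ratio']) {
                $s = ['ratio' => $dec($s['ratio']), 'phase' => 'decreasing', 'since' => $now];
            }
            break;
        case 'decreasing': // hold each value for degrade_step_ttl_s, then decide
            if ($held < $c['degrade_step_ttl_s']) {
                break;
            }
            if ($failRatio <= $c['degrade_stop_ip_ratio']) {
                $s = ['ratio' => $s['ratio'], 'phase' => 'stopped', 'since' => $now];
            } else {
                $s = ['ratio' => $dec($s['ratio']), 'phase' => 'decreasing', 'since' => $now];
            }
            break;
        case 'stopped': // wait degrade_stop_ttl_s before the first increase
            if ($held >= $c['degrade_stop_ttl_s']) {
                $s = ['ratio' => $inc($s['ratio']), 'phase' => 'increasing', 'since' => $now];
            }
            break;
        case 'increasing':
            if ($held < $c['degrade_step_ttl_s']) {
                break;
            }
            if ($failRatio > $c['degrade_stop_ip_ratio']) {
                $s = ['ratio' => $dec($s['ratio']), 'phase' => 'decreasing', 'since' => $now];
            } elseif ($s['ratio'] >= 1.0) {
                $s = ['ratio' => 1.0, 'phase' => 'normal', 'since' => $now];
            } else {
                $s = ['ratio' => $inc($s['ratio']), 'phase' => 'increasing', 'since' => $now];
            }
            break;
    }
    return $s;
}
```

Starting from ['ratio' => 1.0, 'phase' => 'normal', 'since' => 0] and calling flowStep() periodically with the current proportion of failed IPs moves the allowed ratio down in steps while the back end is failing and back up in steps once it recovers.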

The state diagram of flow control is as follows:
[Figure: flow control state diagram]

How is traffic held at a given proportion? By random selection: draw a random number and check whether it falls within a certain range. Limiting traffic to an optimal value lets most requests work normally with the least impact on users. Flow control also works together with monitoring and alerting: if the flow control ratio of a module is found to be below 1, that module is a bottleneck for the whole system, and the next step should be to add hardware resources or optimize the program's performance.
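The random selection described above might look like the following minimal sketch. The function names are hypothetical, and the resolution of 1/10000 is an arbitrary choice.

```php
<?php
// Admit a request with probability $ratio (1.0 = all traffic, 0.0 = none).
function admitRequest(float $ratio): bool
{
    // Draw an integer in 1..10000 and admit if it falls within the allowed range.
    return mt_rand(1, 10000) <= (int) round($ratio * 10000);
}

// Call the back end only when admitted; otherwise return the configured
// degrade_return value without touching the back end.
function callWithFlowControl(float $ratio, callable $backendCall, $degradeReturn = false)
{
    return admitRequest($ratio) ? $backendCall() : $degradeReturn;
}
```

With a ratio of 0.5, roughly half of the calls reach the back end and the rest immediately receive the configured fallback value.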
