Home  >  Article  >  Backend Development  >  How to implement concurrent requests in php (code)

How to implement concurrent requests in php (code)

不言
不言Original
2018-09-11 15:08:4011251browse

The content of this article is about how to implement concurrent requests (code) in PHP. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

In back-end service development, there are often requirements for concurrent requests. For example, you need to obtain the bandwidth data of 10 suppliers (each providing a different url), and then return an integration What will you do with the final data?

In PHP, the most intuitive way is to foreach traverse urls and save the result of each request, then if the supplier The provided interface takes an average of 5s, and your request for this interface takes up to 50s, which is unacceptable for websites that pursue speed and performance.

At this time you need concurrent requests.

PHPRequest

PHP is a single-process synchronization model, one request corresponds to one process, I/O is synchronization blocked. Through the expansion of services such as nginx/apache/php-fpm, PHP can provide high-concurrency services. The principle is to maintain a process pool. Each time a service is requested, a new process is started separately. Each process exist independently.

PHP does not support multi-threading mode and callback processing, so PHP internal scripts are synchronously blocking. If you initiate a 5s request, then the program will block I/O5s until the request returns a result, and then the code will continue to be executed. Therefore, it is very difficult to do high-concurrency request requirements such as crawlers.

So how to solve the problem of concurrent requests? In addition to the built-in file_get_contents and fsockopen request methods, PHP also supports the cURL extension to initiate requests, which supports regular single requests: PHP Detailed explanation of cURL requests, and also supports concurrent requests. The concurrency principle is cURL extension uses multi-threading to manage multiple requests.

PHPConcurrent requests

Let’s look at the code directlydemo:

// 简单demo,默认支持为GET请求
public function multiRequest($urls) {
    $mh = curl_multi_init();
    $urlHandlers = [];
    $urlData = [];
    // 初始化多个请求句柄为一个
    foreach($urls as $value) {
        $ch = curl_init();
        $url = $value['url'];
        $url .= strpos($url, '?') ? '&' : '?';
        $params = $value['params'];
        $url .= is_array($params) ? http_build_query($params) : $params;
        curl_setopt($ch, CURLOPT_URL, $url);
        // 设置数据通过字符串返回,而不是直接输出
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $urlHandlers[] = $ch;
        curl_multi_add_handle($mh, $ch);
    }
    $active = null;
    // 检测操作的初始状态是否OK,CURLM_CALL_MULTI_PERFORM为常量值-1
    do {
        // 返回的$active是活跃连接的数量,$mrc是返回值,正常为0,异常为-1
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    // 如果还有活动的请求,同时操作状态OK,CURLM_OK为常量值0
    while ($active && $mrc == CURLM_OK) {
        // 持续查询状态并不利于处理任务,每50ms检查一次,此时释放CPU,降低机器负载
        usleep(50000);
        // 如果批处理句柄OK,重复检查操作状态直至OK。select返回值异常时为-1,正常为1(因为只有1个批处理句柄)
        if (curl_multi_select($mh) != -1) {
            do {
                $mrc = curl_multi_exec($mh, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    // 获取返回结果
    foreach($urlHandlers as $index => $ch) {
        $urlData[$index] = curl_multi_getcontent($ch);
        // 移除单个curl句柄
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $urlData;
}

In this concurrent request, first create a Batch handle, then add the cURL handle of url to the batch handle, and continuously query the execution status of the batch handle. When the execution is completed, obtain the returned results.

curl_multi Related functions

/** 函数作用:返回一个新cURL批处理句柄
    @return resource 成功返回cURL批处理句柄,失败返回false
*/
resource curl_multi_init ( void )

/** 函数作用:向curl批处理会话中添加单独的curl句柄
    @param $mh 由curl_multi_init返回的批处理句柄
    @param $ch 由curl_init返回的cURL句柄
    @return resource 成功返回cURL批处理句柄,失败返回false
*/
int curl_multi_add_handle ( resource $mh , resource $ch )

/** 函数作用:运行当前 cURL 句柄的子连接
    @param $mh 由curl_multi_init返回的批处理句柄
    @param $still_running 一个用来判断操作是否仍在执行的标识的引用
    @return 一个定义于 cURL 预定义常量中的 cURL 代码
*/
int curl_multi_exec ( resource $mh , int &$still_running )

/** 函数作用:等待所有cURL批处理中的活动连接
    @param $mh 由curl_multi_init返回的批处理句柄
    @param $timeout 以秒为单位,等待响应的时间
    @return 成功时返回描述符集合中描述符的数量。失败时,select失败时返回-1,否则返回超时(从底层的select系统调用).
*/
int curl_multi_select ( resource $mh [, float $timeout = 1.0 ] )

/** 函数作用:移除cURL批处理句柄资源中的某个句柄资源
    说明:从给定的批处理句柄mh中移除ch句柄。当ch句柄被移除以后,仍然可以合法地用curl_exec()执行这个句柄。如果要移除的句柄正在被使用,则这个句柄涉及的所有传输任务会被中止。
    @param $mh 由curl_multi_init返回的批处理句柄
    @param $ch 由curl_init返回的cURL句柄
    @return 成功时返回0,失败时返回CURLM_XXX中的一个
*/
int curl_multi_remove_handle ( resource $mh , resource $ch )

/** 函数作用:关闭一组cURL句柄
    @param $mh 由curl_multi_init返回的批处理句柄
    @return void
*/
void curl_multi_close ( resource $mh )

/** 函数作用:如果设置了CURLOPT_RETURNTRANSFER,则返回获取的输出的文本流
    @param $ch 由curl_init返回的cURL句柄
    @return string 如果设置了CURLOPT_RETURNTRANSFER,则返回获取的输出的文本流。
*/
string curl_multi_getcontent ( resource $ch )
Predefined constants used in this example:
CURLM_CALL_MULTI_PERFORM: (int) -1
CURLM_OK: (int) 0

PHPConcurrent request time consumption comparison

  1. The first request uses the above curl_multi_initMethod, concurrent requests 105 times.

  2. The second request uses the traditional foreach method, traversing 105 times using the curl_init method request.

The actual request time-consuming result is:

How to implement concurrent requests in php (code)

excludedownload It takes about 765ms. The simple request time optimization has reached 39.83/1.58 and 25 times. If we continue to eliminate the time-consuming related to establishing a connection, it should will be higher. The time consumption:

  • Option 1: The slowest interface reached 1.58s

  • Option 2 : The average time consumption of 105 interfaces is 384ms

The request of this test is the internal interface of my environment, so the time consumption is very short. In fact, The crawler request environment optimization will be more obvious.

Note

Concurrency limit

curl_multi will consume a lot of system resources. There is a certain threshold for the number of concurrent requests, generally 512 , due to internal limitations of CURL, exceeding the maximum concurrency will cause failure.

I have not done specific test results. You can refer to other people's articles: How many concurrent requests are appropriate each time you use curl multi?

Timeout time

In order to prevent slow requests from affecting the entire service, you can set CURLOPT_TIMEOUT to control the timeout and prevent some suspended requests from blocking the process indefinitely and eventually killing the machine service.

CPUFull load

In the code example, if you continue to query the concurrent execution status, the load on cpu will be too high, so , you need to add the usleep(50000); statement in the code.
At the same time, curl_multi_select can also control the cpu occupancy. It will be in a waiting state until the data is responded to. It will be awakened and continue execution as soon as new data comes, which reduces CPU unnecessary consumption.

Related recommendations:

How to implement AJAX queue requests (with code)

PHP uses curl_multi to implement concurrent requests Method example php tips

The above is the detailed content of How to implement concurrent requests in php (code). For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn