Because we often need curl "multi-threading" (concurrent requests) at work, I had no choice but to study the curl_multi functions in depth. Let me walk you through examples and the principles behind them.
I believe many people have a headache with the curl_multi family of functions: the PHP manual documents them only sparsely, and the examples it gives are so simple that you cannot learn much from them. I also searched many web pages, but never found a complete application example.
curl_multi_add_handle
curl_multi_close
curl_multi_exec
curl_multi_getcontent
curl_multi_info_read
curl_multi_init
curl_multi_remove_handle
curl_multi_select
Generally speaking, when you reach for these functions, the purpose is obviously to request multiple URLs at the same time rather than one by one; otherwise you might as well just call curl_exec in a loop yourself.
The steps are summarized as follows:
Step 1: Call curl_multi_init
Step 2: Call curl_multi_add_handle in a loop
What needs to be noted in this step is that the second parameter of curl_multi_add_handle is a sub-handle returned by curl_init.
Step 3: Keep calling curl_multi_exec
Step 4: Call curl_multi_getcontent in a loop as needed to collect the results
Step 5: Call curl_multi_remove_handle for each sub-handle, then call curl_close on it
Step 6: Call curl_multi_close
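The six steps above can be sketched as one small helper. This is a minimal sketch, not the manual's code: the function name multi_get, the $urls array, and the CURLOPT_RETURNTRANSFER/CURLOPT_TIMEOUT options are my own additions (RETURNTRANSFER is needed so curl_multi_getcontent has a buffered body to return).

```php
<?php
// Minimal sketch of the six steps; multi_get() and $urls are illustrative names.
function multi_get(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();                         // Step 1
    $handles = [];
    foreach ($urls as $key => $url) {                // Step 2: one sub-handle per URL
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // buffer body for curl_multi_getcontent
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); // per-request timeout, set before adding
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $active = null;                                  // Step 3: drive all transfers
    do {
        $mrc = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);                  // sleep until there is socket activity
        }
    } while ($active && $mrc == CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // Step 4
        curl_multi_remove_handle($mh, $ch);          // Step 5
        curl_close($ch);
    }
    curl_multi_close($mh);                           // Step 6
    return $results;
}
?>
```

Usage would look like `$pages = multi_get(['lxr' => 'http://lxr.php.net/', 'php' => 'http://www.php.net/']);`, with the results keyed the same way as the input URLs.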
Here is an example from the PHP manual:
<?php
// Create a pair of cURL resources
$ch1 = curl_init();
$ch2 = curl_init();

// Set the URLs and corresponding options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);

// Create the multi cURL handle
$mh = curl_multi_init();

// Add the two handles
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);

$active = null;
// Execute the multi handle
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// Close all handles
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
?>
The overall usage flow is roughly as above. However, this simple code has a fatal weakness: during the entire URL request, the do loop spins endlessly, which easily drives CPU usage to 100%.
Now let's improve it. Here we need to use curl_multi_select, a function with almost no documentation. Although C's curl library does document select, the interface and usage in PHP are different from C's.
Change the do block above to the following:
The code is as follows
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active and $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
Because $active only becomes false after all URL data has been received, the return value of curl_multi_exec is used here to decide whether there is still data to process. While there is data, curl_multi_exec is called repeatedly; when there is temporarily none, the code enters the select stage and is woken up to continue as soon as new data arrives. The advantage is that no CPU is wasted on a busy loop.
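One caveat worth knowing: on some curl builds, curl_multi_select() returns -1 immediately instead of blocking, which turns the outer while into a busy loop again. A sketch of the same loop with a small usleep() back-off in that branch follows; the single handle and the unreachable 127.0.0.1:1 URL are placeholders chosen so the example fails fast without needing the network.

```php
<?php
// Placeholder transfer: nothing listens on 127.0.0.1:1, so it fails quickly.
$ch = curl_init('http://127.0.0.1:1/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$mh = curl_multi_init();
curl_multi_add_handle($mh, $ch);

$active = null;
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) == -1) {
        usleep(100000); // 100 ms back-off when select cannot actually wait
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}

curl_multi_remove_handle($mh, $ch);
curl_close($ch);
curl_multi_close($mh);
?>
```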
A few details you may run into:
To control the timeout of each request, call curl_setopt before curl_multi_add_handle:
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
To detect a timeout or other error, call curl_error($conn[$i]) before curl_multi_getcontent.
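A sketch of that error check in context; the $conn array name follows the text above, while the unreachable URL and the 5-second timeout are placeholders of my own. curl_error() on a sub-handle returns a non-empty message once the multi loop has finished with it, and should be consulted before trusting the body from curl_multi_getcontent:

```php
<?php
// Illustrative only: $conn and the unreachable URL are placeholders.
$conn = [];
$conn[0] = curl_init('http://127.0.0.1:1/');      // fails fast: nothing listens here
curl_setopt($conn[0], CURLOPT_RETURNTRANSFER, 1);
curl_setopt($conn[0], CURLOPT_TIMEOUT, 5);        // per-request timeout, set before adding

$mh = curl_multi_init();
curl_multi_add_handle($mh, $conn[0]);

$active = null;
do {
    $mrc = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);
    }
} while ($active && $mrc == CURLM_OK);

foreach ($conn as $i => $ch) {
    $err = curl_error($ch);                        // check before curl_multi_getcontent
    if ($err !== '') {
        echo "request $i failed: $err\n";          // e.g. a timeout or connection error
    } else {
        $body = curl_multi_getcontent($ch);
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>
```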
Features of this approach:
Runs very stably.
If you set a concurrency level, it always works at that concurrency; even adding tasks through the callback function does not affect it.
CPU usage is extremely low; most of the CPU time is spent in the user's callback function.
Memory utilization is high and it handles large task counts (150,000 tasks occupy a bit over 256 MB of memory); tasks can be added through the callback function, in whatever number you choose.
Can saturate the available bandwidth.
Chained tasks, such as a task that needs to collect data from several different addresses, can be completed in one pass through callbacks.
CURL errors can be retried a configurable number of times (CURL errors are likely at startup because of the high concurrency, and may also occur because of network conditions or the stability of the remote server).
The callback function is quite flexible and can run several kinds of tasks at the same time (for example, downloading files, crawling web pages, and checking for 404s can all happen in one PHP process).
It is easy to define custom task types, such as checking for 404s or getting the final URL after redirects.
Caching can be set up.
Shortcomings:
Cannot make full use of multi-core CPUs (this can be worked around by launching multiple processes, but you must handle task partitioning and similar logic yourself).
The maximum concurrency is 500 (or 512?); testing suggests it is an internal CURL limit, and exceeding it always causes failures.
There is currently no resume (partial download) support.
Tasks are currently atomic; it is not possible to split a large file into several parts and download them in separate threads.