


In real projects, or when writing small tools of your own (news aggregation, product price monitoring, price comparison), you often need to fetch data from third-party websites or APIs. When there is a queue of URLs to process, you can improve performance by using the curl_multi_* family of functions provided by cURL to achieve simple concurrency.
This article discusses two concrete implementations and gives a simple performance comparison between them.
1. The classic cURL concurrency mechanism and its problems
The classic cURL concurrency implementation is easy to find online. For example, the PHP online manual gives an implementation along the following lines:
    function classic_curl($urls, $delay)
    {
        $queue = curl_multi_init();
        $map = array();

        foreach ($urls as $url) {
            // create cURL resources
            $ch = curl_init();

            // set URL and other appropriate options
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_TIMEOUT, 1);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_NOSIGNAL, true);

            // add handle
            curl_multi_add_handle($queue, $ch);
            $map[$url] = $ch;
        }

        $active = null;

        // execute the handles
        do {
            $mrc = curl_multi_exec($queue, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        while ($active > 0 && $mrc == CURLM_OK) {
            if (curl_multi_select($queue, 0.5) != -1) {
                do {
                    $mrc = curl_multi_exec($queue, $active);
                } while ($mrc == CURLM_CALL_MULTI_PERFORM);
            }
        }

        // only now, after every transfer has finished, process the responses
        $responses = array();
        foreach ($map as $url => $ch) {
            $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
            curl_multi_remove_handle($queue, $ch);
            curl_close($ch);
        }

        curl_multi_close($queue);
        return $responses;
    }
All URLs are first pushed into the concurrent queue, the concurrent transfers are then run, and only after every request has completed does subsequent processing such as data parsing begin. In practice, because of network conditions, some URLs return earlier than others, yet the classic approach must wait for the slowest URL before any processing starts. Waiting means the CPU sits idle. If the URL queue is short, this idle time is acceptable, but with a long queue the waiting becomes unacceptable.
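The cost of this waiting can be made concrete with a small model. The sketch below is an illustration only, not a measurement: the function names, the latency values, and the fixed per-response processing time are all assumptions introduced here. It supposes that PHP processes responses one at a time, and compares "process after the slowest response" with "process each response once it has arrived".

```php
<?php
// Hypothetical timing model; all numbers below are made-up inputs.

/**
 * Classic: nothing is processed until the slowest response has arrived,
 * then every response is processed in turn.
 */
function classic_finish_time(array $latencies, float $processing): float
{
    if ($latencies === []) {
        return 0.0;
    }
    return max($latencies) + count($latencies) * $processing;
}

/**
 * Process-on-arrival: each response is handled as soon as it has arrived
 * and the single PHP thread is free.
 */
function rolling_finish_time(array $latencies, float $processing): float
{
    sort($latencies); // handle responses in order of arrival
    $clock = 0.0;
    foreach ($latencies as $arrival) {
        $clock = max($clock, $arrival) + $processing;
    }
    return $clock;
}

$latencies = [0.2, 0.4, 0.6, 1.0]; // seconds until each response arrives
$processing = 0.25;                // seconds of work per response

printf("classic: %.2f s\n", classic_finish_time($latencies, $processing)); // classic: 2.00 s
printf("rolling: %.2f s\n", rolling_finish_time($latencies, $processing)); // rolling: 1.25 s
```

With a processing time of 0 both functions return 1.00 for these inputs, so in this model the gap comes entirely from overlapping processing with waiting.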
2. The improved Rolling cURL concurrency method
A closer look shows that the classic approach leaves room for optimization: process each URL's response as soon as that request completes, and let the remaining requests continue while it is being processed, rather than starting any processing only after the slowest one has returned. This avoids leaving the CPU idle. The implementation is as follows:
    function rolling_curl($urls, $delay)
    {
        $queue = curl_multi_init();

        foreach ($urls as $url) {
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_TIMEOUT, 1);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_NOSIGNAL, true);
            // Tag the handle with its URL so it can be looked up on completion.
            // (Using "(string) $ch" as an array key only works before PHP 8,
            // where cURL handles became objects.)
            curl_setopt($ch, CURLOPT_PRIVATE, $url);
            curl_multi_add_handle($queue, $ch);
        }

        $responses = array();
        $active = 0;
        do {
            while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM) ;
            if ($code != CURLM_OK) {
                break;
            }

            // one or more requests may have just completed -- find out which
            while ($done = curl_multi_info_read($queue)) {
                $ch = $done['handle'];

                // get the info and content returned on the request
                $url = curl_getinfo($ch, CURLINFO_PRIVATE);
                $info = curl_getinfo($ch);
                $error = curl_error($ch);
                $results = callback(curl_multi_getcontent($ch), $delay);
                $responses[$url] = compact('info', 'error', 'results');

                // remove the curl handle that just completed
                curl_multi_remove_handle($queue, $ch);
                curl_close($ch);
            }

            // Block for data in / output; error handling is done by curl_multi_exec
            if ($active > 0) {
                curl_multi_select($queue, 0.5);
            }
        } while ($active);

        curl_multi_close($queue);
        return $responses;
    }
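Each entry returned by rolling_curl() carries the transfer info, the cURL error string, and the callback result, keyed by URL. With a one-second CURLOPT_TIMEOUT some entries may well be failures, so a caller will probably want to separate them. A minimal sketch, assuming that response shape; partition_responses() and the sample data (including the error text) are hypothetical and not part of the original article:

```php
<?php
/**
 * Split rolling_curl()-style responses into successes and failures.
 * Assumes each entry looks like ['info' => [...], 'error' => '...', 'results' => ...].
 * Returns [successes keyed by URL, failure reasons keyed by URL].
 */
function partition_responses(array $responses): array
{
    $ok = [];
    $failed = [];
    foreach ($responses as $url => $response) {
        $status = $response['info']['http_code'] ?? 0;
        if ($response['error'] !== '') {
            $failed[$url] = $response['error'];      // transport-level failure
        } elseif ($status < 200 || $status >= 300) {
            $failed[$url] = "HTTP $status";          // non-2xx status code
        } else {
            $ok[$url] = $response['results'];
        }
    }
    return [$ok, $failed];
}

// Made-up sample data in the shape described above.
$sample = [
    'http://a.com/item.htm?id=14392877692' => [
        'info' => ['http_code' => 200],
        'error' => '',
        'results' => ['data' => '...', 'matches' => []],
    ],
    'http://a.com/item.htm?id=5522416710' => [
        'info' => ['http_code' => 0],
        'error' => 'Operation timed out',
        'results' => ['data' => '', 'matches' => []],
    ],
];

list($ok, $failed) = partition_responses($sample);
printf("%d ok, %d failed\n", count($ok), count($failed)); // 1 ok, 1 failed
```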
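3. Performance comparison of the two concurrency implementations
The before-and-after performance comparison was run on a Linux host, using the following concurrent queue: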
http://a.com/item.htm?id=14392877692
http://a.com/item.htm?id=16231676302
http://a.com/item.htm?id=5522416710
http://a.com/item.htm?id=16551116403
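A brief note on the experimental design and the format of the results: to make the results reliable, each experiment was repeated 20 times. In each run, given the same set of URLs, the time taken (in seconds) by Classic (the classic concurrency mechanism) and by Rolling (the improved mechanism) was measured. The faster one is the Winner, and the time saved (Excellence, in seconds) and the percentage improvement (Excel. %) were computed. To stay close to real requests while keeping the experiment simple, the returned content was processed only with a simple regular expression match and nothing more complex. In addition, to determine how the result-processing callback affects the comparison, usleep can be used to simulate the more complex data processing found in practice (such as extraction, word segmentation, or writing to a file or database).
The callback function used in the performance tests is: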
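The measurement procedure described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the harness the author used: time_call(), summarize_round(), and compare_runs() are hypothetical names, and the article does not give the formula for Excel. %, so the sketch assumes it means the time saved relative to the slower of the two runs.

```php
<?php
/** Time one call of $fn and return the elapsed wall-clock seconds. */
function time_call(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

/** Build one result row from the two timings of a single round. */
function summarize_round(int $round, float $classic, float $rolling): array
{
    $slower = max($classic, $rolling);
    $saved = abs($classic - $rolling);
    if ($classic == $rolling) {
        $winner = 'Tie';
    } else {
        $winner = $rolling < $classic ? 'Rolling' : 'Classic';
    }
    return [
        'round' => $round,
        'classic' => $classic,
        'rolling' => $rolling,
        'winner' => $winner,
        'excellence' => $saved,
        // Assumed definition: saving as a share of the slower run.
        'excel_pct' => $slower > 0 ? 100 * $saved / $slower : 0.0,
    ];
}

/** Run $rounds paired trials of the two implementations. */
function compare_runs(callable $classic, callable $rolling, int $rounds = 20): array
{
    $rows = [];
    for ($i = 1; $i <= $rounds; $i++) {
        $rows[] = summarize_round($i, time_call($classic), time_call($rolling));
    }
    return $rows;
}

// Stand-ins so the sketch runs without network access; in the real test the
// closures would call classic_curl($urls, $delay) and rolling_curl($urls, $delay).
$rows = compare_runs(
    function () { usleep(2000); },
    function () { usleep(1000); },
    3
);
foreach ($rows as $row) {
    printf(
        "%2d  classic=%.4f  rolling=%.4f  winner=%s  saved=%.4f (%.1f%%)\n",
        $row['round'], $row['classic'], $row['rolling'],
        $row['winner'], $row['excellence'], $row['excel_pct']
    );
}
```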
    function callback($data, $delay)
    {
        preg_match_all('/(.+)/iU', $data, $matches);
        usleep($delay);
        return compact('data', 'matches');
    }
When the data-processing callback has no delay, Rolling cURL is slightly faster, but the performance improvement is not significant.

