Using get_htmls(), a single-page parallel collection function, for curl-based data collection
The first article used get_html() to implement simple data collection. Because the pages are fetched one at a time, the total transfer time is the sum of the download times of all pages: if one page takes 1 second, 10 pages take 10 seconds. Fortunately, curl also supports parallel requests.
Before writing a parallel collection function, you first need to know what kind of pages you want to collect and what kind of requests they require. Only then can you write a function that is reasonably general.
Functional requirements analysis:
What should it return?
The HTML of each page, collected into an array.
What parameters should be passed?
When writing get_html(), we learned that an options array can carry additional curl options. The multi-page collection function must keep this feature.
What type should the parameters be?
Whether you are requesting the HTML of a web page or calling a web API, GET and POST requests usually target the same page or interface and differ only in their parameters. That suggests the following parameter types:
get_htmls($url, $options);
$url is a string.
$options is a two-dimensional array: one array of parameters per page.
That seems to solve the problem. However, after searching the curl manual I could not find an option for passing GET parameters separately from the URL (they have to be part of the URL's query string). So the URLs have to be passed as an array, together with an extra $method parameter.
The function prototype is therefore settled as get_htmls($urls, $options = array(), $method = 'get'); the code is as follows:
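The listing itself does not appear in this copy of the article. As a stand-in, here is a minimal sketch of how a function with this prototype could be written using PHP's curl_multi_* functions. The 30-second timeout, the handling of the 'post' method, and the convention that $options holds one array of CURLOPT_* settings per request (keyed like $urls) are assumptions made for illustration, not the author's confirmed implementation.

```php
<?php
/**
 * Fetch several pages in parallel (sketch, not the original listing).
 *
 * @param array  $urls    URLs to request; keys are preserved in the result
 * @param array  $options assumed shape: one array of CURLOPT_* => value per
 *                        request, using the same keys as $urls
 * @param string $method  'get' or 'post'
 * @return array response bodies keyed like $urls
 */
function get_htmls($urls, $options = array(), $method = 'get')
{
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // Return the body instead of printing it; the timeout is an assumed default.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        if (strtolower($method) === 'post') {
            // POST fields are expected in $options[$key][CURLOPT_POSTFIELDS].
            curl_setopt($ch, CURLOPT_POST, true);
        }
        if (isset($options[$key]) && is_array($options[$key])) {
            curl_setopt_array($ch, $options[$key]);
        }
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            // select can fail on some systems; sleep briefly to avoid spinning.
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies; a failed transfer yields an empty value.
    $htmls = array();
    foreach ($handles as $key => $ch) {
        $htmls[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $htmls;
}
```

Under these assumptions, a GET call passes an array of complete URLs, while a POST call passes the same URL several times and puts each request's fields in the matching $options entry. Because all transfers run concurrently, the total time is roughly that of the slowest page rather than the sum of all pages.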
http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=50&ie=utf-8
The five URLs above follow a regular pattern: only the value of pn changes.
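Because only pn varies, the URL list can be generated instead of typed out. The following is a small sketch that assumes a hypothetical helper, build_page_urls(), which is not part of the original article; it relies on PHP's built-in http_build_query().

```php
<?php
// build_page_urls() is a hypothetical helper introduced for illustration.
// It returns one URL per page value, changing only the $pageKey parameter.
function build_page_urls($base, array $params, $pageKey, array $pageValues)
{
    $urls = array();
    foreach ($pageValues as $value) {
        // Overwrite the paging parameter in place so the key order is kept.
        $params[$pageKey] = $value;
        // Pass '&' explicitly so the result does not depend on php.ini settings.
        $urls[] = $base . '?' . http_build_query($params, '', '&');
    }
    return $urls;
}

$urls = build_page_urls(
    'http://www.baidu.com/s',
    array('wd' => 'shili', 'pn' => 0, 'ie' => 'utf-8'),
    'pn',
    array(0, 10, 20, 30, 50)
);
```

The resulting $urls array holds the same five addresses listed above, in the form that the $urls parameter of get_htmls() is meant to receive.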
Write a post.php file as follows:
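The contents of post.php are not shown in this copy of the article. A plausible stand-in for testing POST requests is a script that simply echoes back whatever fields it receives. This is a sketch only: the render_post() helper and the name=value output format are assumptions, not the author's file.

```php
<?php
// post.php (assumed contents): echo the received POST fields as name=value lines.
// render_post() is a hypothetical helper, kept separate so it is easy to check.
function render_post(array $post)
{
    $lines = array();
    foreach ($post as $name => $value) {
        // Nested fields such as a[]=1 arrive as arrays; show those as JSON.
        $lines[] = $name . '=' . (is_array($value) ? json_encode($value) : $value);
    }
    return implode("\n", $lines);
}

echo render_post($_POST);
```

Requesting this script several times with different POST fields makes it easy to see which response belongs to which request, since each response repeats the fields that were sent.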
That's all for today. If anything is poorly written or unclear, please let me know.