Because the PHP language itself does not support multi-threading, crawlers written in PHP tend to be inefficient. In such cases the Curl Multi Functions are often used, since they make it possible to fetch multiple URLs concurrently. Given how powerful the Curl Multi Functions are, can they also be used to write a concurrent, multi-threaded file downloader? Of course they can; my code is given below.
Code 1: Write the fetched content directly to a file
The code is as follows:
<?php
// URLs of the pages to fetch concurrently
$urls = array(
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/'
);
$save_to = '/test.txt';   // file the fetched pages are appended to
$st = fopen($save_to, 'a');
$mh = curl_multi_init();

// Initialization: create one easy handle per URL and add it to the multi handle
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_FILE, $st);   // write the response body straight into the file
    curl_multi_add_handle($mh, $conn[$i]);
}

// Execute: run the transfers until every handle has finished
do {
    curl_multi_exec($mh, $active);
} while ($active);

// End cleanup
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
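One caveat about the do/while loop above: it calls curl_multi_exec() in a tight loop, so the script burns CPU while the downloads are still in flight. A commonly used refinement, shown here only as a minimal sketch that assumes the same $mh multi handle as in Code 1, is to wait on curl_multi_select() between calls so the script sleeps until curl actually has data to process:

<?php
// Minimal sketch: a select-based run loop (assumes $mh is the multi handle
// from Code 1 with all easy handles already added).
$active = null;
do {
    $status = curl_multi_exec($mh, $active);
} while ($status === CURLM_CALL_MULTI_PERFORM);

while ($active && $status === CURLM_OK) {
    // Sleep until curl reports socket activity (up to the default 1s timeout)
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // select failed; back off briefly before retrying
    }
    do {
        $status = curl_multi_exec($mh, $active);
    } while ($status === CURLM_CALL_MULTI_PERFORM);
}
?>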
Code 2: Store the fetched content in variables first, then write it to a file
The code is as follows:
<?php
// URLs of the pages to fetch concurrently
$urls = array(
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/'
);
$save_to = '/test.txt';   // file the fetched pages are appended to
$st = fopen($save_to, 'a');
$mh = curl_multi_init();

foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of sending it to the browser
    curl_multi_add_handle($mh, $conn[$i]);
}

// Run the transfers until every handle has finished
do {
    curl_multi_exec($mh, $active);
} while ($active);

// Get the data into variables and write them to the file
foreach ($urls as $i => $url) {
    $data = curl_multi_getcontent($conn[$i]); // get the fetched page as a string
    fwrite($st, $data); // write the string to the file; it could just as well be stored in a database instead
}

// Clean up
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
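The practical difference between the two snippets is worth spelling out: Code 1 streams every response straight into the shared file handle as it arrives, so the pages can end up interleaved in test.txt, while Code 2 buffers each response in memory via CURLOPT_RETURNTRANSFER and only writes it out after all transfers have finished, which keeps each page intact and makes it easy to store the results somewhere other than a file. As a minimal sketch (assuming the $urls and $conn arrays from Code 2, with the array shape chosen only for illustration), the collection loop could just as well build an array keyed by URL for later insertion into a database:

<?php
// Sketch only: collect each page into an array keyed by its URL instead of
// writing it to a file (assumes $urls and $conn from Code 2, after the
// transfers have completed).
$pages = array();
foreach ($urls as $i => $url) {
    $pages[$url] = curl_multi_getcontent($conn[$i]);
}
// $pages can now be iterated over and inserted into a database, parsed, etc.
?>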