Because the PHP language itself does not support multi-threading, crawlers written in PHP tend to be inefficient. In such cases the Curl Multi Functions are often used, since they make it possible to fetch multiple URLs concurrently. Given how powerful the Curl Multi Functions are, can they also be used to write a concurrent, multi-threaded file downloader? Of course they can; my code is given below.
Code 1: Write the fetched content directly to a file
The code is as follows:
<?php
// URLs of the pages to fetch concurrently
$urls = array(
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/'
);
$save_to = '/test.txt';   // file the fetched pages are appended to
$st = fopen($save_to, 'a');
$mh = curl_multi_init();

// Initialization: create one easy handle per URL and add it to the multi handle
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_FILE, $st);   // write the response body straight into the file
    curl_multi_add_handle($mh, $conn[$i]);
}

// Execute: run the transfers until every handle has finished
do {
    curl_multi_exec($mh, $active);
} while ($active);

// End cleanup
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
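One caveat about the do/while loop above: it calls curl_multi_exec() in a tight loop, so the script burns CPU while the downloads are still in flight. A commonly used refinement, shown here only as a minimal sketch that assumes the same $mh multi handle as in Code 1, is to wait on curl_multi_select() between calls so the script sleeps until curl actually has data to process:

<?php
// Minimal sketch: a select-based run loop (assumes $mh is the multi handle
// from Code 1 with all easy handles already added).
$active = null;
do {
    $status = curl_multi_exec($mh, $active);
} while ($status === CURLM_CALL_MULTI_PERFORM);

while ($active && $status === CURLM_OK) {
    // Sleep until curl reports socket activity (up to the default 1s timeout)
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // select failed; back off briefly before retrying
    }
    do {
        $status = curl_multi_exec($mh, $active);
    } while ($status === CURLM_CALL_MULTI_PERFORM);
}
?>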
Code 2: Store the fetched content in variables first, then write it to a file
The code is as follows:
<?php
// URLs of the pages to fetch concurrently
$urls = array(
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/'
);
$save_to = '/test.txt';   // file the fetched pages are appended to
$st = fopen($save_to, 'a');
$mh = curl_multi_init();

foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of sending it to the browser
    curl_multi_add_handle($mh, $conn[$i]);
}

// Run the transfers until every handle has finished
do {
    curl_multi_exec($mh, $active);
} while ($active);

// Get the data into variables and write them to the file
foreach ($urls as $i => $url) {
    $data = curl_multi_getcontent($conn[$i]); // get the fetched page as a string
    fwrite($st, $data); // write the string to the file; it could just as well be stored in a database instead
}

// Clean up
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
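The practical difference between the two snippets is worth spelling out: Code 1 streams every response straight into the shared file handle as it arrives, so the pages can end up interleaved in test.txt, while Code 2 buffers each response in memory via CURLOPT_RETURNTRANSFER and only writes it out after all transfers have finished, which keeps each page intact and makes it easy to store the results somewhere other than a file. As a minimal sketch (assuming the $urls and $conn arrays from Code 2, with the array shape chosen only for illustration), the collection loop could just as well build an array keyed by URL for later insertion into a database:

<?php
// Sketch only: collect each page into an array keyed by its URL instead of
// writing it to a file (assumes $urls and $conn from Code 2, after the
// transfers have completed).
$pages = array();
foreach ($urls as $i => $url) {
    $pages[$url] = curl_multi_getcontent($conn[$i]);
}
// $pages can now be iterated over and inserted into a database, parsed, etc.
?>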