
Detailed explanation of several methods of crawling pages in PHP

WBOY
2016-07-21 15:06:26

Programs such as weather forecast widgets or RSS readers often need to fetch files that are not stored locally. Typically, PHP simulates a browser: it requests the URL over HTTP and receives HTML source code or XML data in response. This raw data is rarely suitable for direct output, so the relevant content usually has to be extracted and then reformatted for a friendlier display.
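As an illustration of the "extract, then format" step, here is a minimal sketch that pulls item titles and links out of an RSS document with SimpleXML. The feed content, the example.com links, and the function name extract_rss_items() are hypothetical; in practice the XML string would come from one of the fetching methods described in this article.

```php
<?php
// Hypothetical sample data standing in for a fetched RSS feed.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample weather feed</title>
    <item><title>Monday: sunny &amp; warm</title><link>http://example.com/mon</link></item>
    <item><title>Tuesday: rain</title><link>http://example.com/tue</link></item>
  </channel>
</rss>
XML;

/**
 * Extract item titles and links from an RSS 2.0 document.
 * Returns an empty array when the input is not well-formed XML.
 */
function extract_rss_items(string $xml): array
{
    // Collect libxml errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($feed === false) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = ['title' => (string) $item->title, 'link' => (string) $item->link];
    }
    return $items;
}

foreach (extract_rss_items($xml) as $item) {
    // Escape before output, since feed content is untrusted.
    echo '<a href="' . htmlspecialchars($item['link']) . '">'
        . htmlspecialchars($item['title']) . "</a><br />\n";
}
```

For HTML rather than XML, a similar approach with DOMDocument is usually more tolerant of malformed markup than SimpleXML.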
The following is a brief introduction to several methods of crawling pages in PHP and how they work:
1. The main methods of PHP crawling pages:
1. file() function
2. file_get_contents() function
3. fopen()->fread()->fclose() mode
4. curl mode
5. fsockopen() function socket mode
6. Use plug-ins (such as: http://sourceforge.net/projects/snoopy/)

2. How to use each method, with examples:
1. file() function

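The code is as follows:

$url = 'http://t.qq.com';
// file() returns the response as an array of lines, or false on failure
$lines_array = file($url);
if ($lines_array === false) {
    die("Could not read $url");
}
$lines_string = implode('', $lines_array);
echo htmlspecialchars($lines_string); // escape markup so it displays as text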
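Because file() returns one array element per line, it is convenient when the data is processed line by line. The sketch below shows the optional flags argument; it reads a temporary local file with hypothetical sample content so that it runs without network access, but the same call accepts a URL when allow_url_fopen is enabled.

```php
<?php
// Write hypothetical sample content to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "<html>\n\n<body>Hello</body>\n</html>\n");

// FILE_IGNORE_NEW_LINES strips the trailing newline from each element;
// FILE_SKIP_EMPTY_LINES then drops the lines that are empty.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

if ($lines === false) {
    exit("Could not read $path\n");
}
foreach ($lines as $number => $line) {
    echo ($number + 1) . ': ' . htmlspecialchars($line) . "\n";
}
```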

2. file_get_contents() function
Both file_get_contents() and fopen() require allow_url_fopen to be enabled in order to open remote files; the same applies to file(). To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen() nor file_get_contents() can open remote files.
The code is as follows:

$url = 'http://t.qq.com';
// Returns the whole response as one string, or false on failure
$lines_string = file_get_contents($url);
if ($lines_string !== false) {
    echo htmlspecialchars($lines_string);
}
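The allow_url_fopen setting can also be checked at runtime, and file_get_contents() accepts a stream context for setting a timeout and a browser-like User-Agent header. The following is a minimal sketch under those assumptions; the function names make_http_context() and fetch_page() and the User-Agent value are hypothetical.

```php
<?php
/**
 * Build a stream context that sends a browser-like User-Agent and
 * gives up after $timeout seconds. The User-Agent string is a
 * hypothetical example value.
 */
function make_http_context(float $timeout = 5.0)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeout,
            'header'  => "User-Agent: Mozilla/5.0 (compatible; ExampleFetcher/1.0)\r\n",
        ],
    ]);
}

/** Fetch $url as a string, or return null if it cannot be read. */
function fetch_page(string $url): ?string
{
    // Remote URLs only work when allow_url_fopen is enabled in php.ini.
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }
    // @ suppresses the warning; failure is reported through the return value.
    $body = @file_get_contents($url, false, make_http_context());
    return $body === false ? null : $body;
}
```

With these helpers, fetch_page('http://t.qq.com') would return the page source as a string, or null when the setting is off or the request fails, so the caller can decide how to report the error.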

3. fopen()->fread()->fclose() mode
The code is as follows:

$url = 'http://t.qq.com';
$handle = fopen($url, "rb");
if ($handle === false) {
    die("Could not open $url");
}
$lines_string = "";
// Read in chunks of up to 1024 bytes until no more data arrives
do {
    $data = fread($handle, 1024);
    if ($data === false || strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
} while (true);
fclose($handle);
echo htmlspecialchars($lines_string);
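The read loop can be wrapped in a small helper that works on any open stream. The sketch below is one possible way to write it; the function name read_all() is hypothetical, and the demonstration uses an in-memory stream with sample data so that it runs without network access.

```php
<?php
/**
 * Read everything from an open stream in chunks of $chunkSize bytes.
 * Accepts any handle returned by fopen(), including one for a remote URL.
 */
function read_all($handle, int $chunkSize = 1024): string
{
    $contents = '';
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false || $data === '') {
            break; // read error or no more data: return what we have so far
        }
        $contents .= $data;
    }
    return $contents;
}

// Demonstration on an in-memory stream holding 3000 bytes of sample data.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat('a', 3000));
rewind($handle);
$result = read_all($handle);
fclose($handle);

echo strlen($result), " bytes read\n";
```

PHP's built-in stream_get_contents() does the same job in a single call; the explicit loop is mainly useful when each chunk should be processed as it arrives.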

4. curl mode
Using curl requires that the curl extension be enabled in your hosting environment. On Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
The code is as follows:

$url = 'http://t.qq.com';
$ch = curl_init();
$timeout = 5; // seconds to wait while connecting
curl_setopt($ch, CURLOPT_URL, $url);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$lines_string = curl_exec($ch); // false on failure
curl_close($ch);
echo htmlspecialchars($lines_string);
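In practice it helps to check that the extension is loaded and to report curl errors rather than echoing a failed result. The following is a minimal sketch of that idea; the function name curl_fetch(), the returned array layout, and the User-Agent value are hypothetical. The demonstration call targets port 1 on localhost, which is assumed to be closed, to show the error path without needing network access.

```php
<?php
/**
 * Fetch $url with curl.
 * Returns ['ok' => bool, 'body' => string, 'error' => string].
 */
function curl_fetch(string $url, int $timeout = 5): array
{
    if (!extension_loaded('curl')) {
        return ['ok' => false, 'body' => '', 'error' => 'curl extension is not loaded'];
    }
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2, // seconds allowed for the whole transfer
        CURLOPT_FOLLOWLOCATION => true,         // follow HTTP redirects
        // Hypothetical browser-like User-Agent value.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);
    $body = curl_exec($ch);
    $error = $body === false ? curl_error($ch) : '';
    curl_close($ch);

    return [
        'ok'    => $body !== false,
        'body'  => $body === false ? '' : $body,
        'error' => $error,
    ];
}

// Port 1 on localhost is assumed to be closed, so this shows the error path.
$result = curl_fetch('http://127.0.0.1:1/', 2);
echo $result['ok']
    ? htmlspecialchars($result['body'])
    : "Fetch failed: {$result['error']}\n";
```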

5. fsockopen() function socket mode
Whether socket mode works correctly also depends on the server configuration. You can use phpinfo() to check which communication protocols the server has enabled. For example, my local PHP installation does not have http enabled for sockets, so I could only test with udp.
The code is as follows:

// The fsockopen() call was missing from the original listing; the UDP target
// below (daytime service, port 13 on localhost) is an assumption.
$fp = fsockopen("udp://127.0.0.1", 13, $errno, $errstr);
if (!$fp) {
    echo "ERROR: $errno - $errstr<br />\n";
} else {
    fwrite($fp, "\n");
    echo fread($fp, 26);
    fclose($fp);
}
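Where TCP sockets are available, the same function can fetch a page by writing an HTTP request by hand. The sketch below is one way this might look; the helper names build_get_request(), split_response(), and socket_get() are hypothetical. It uses HTTP/1.0 so that the server does not reply with chunked transfer encoding, which keeps the response easy to split. The final line lists the socket transports registered in the running PHP build, as a programmatic alternative to reading phpinfo().

```php
<?php
/** Build a minimal HTTP/1.0 GET request for $path on $host. */
function build_get_request(string $host, string $path = '/'): string
{
    return "GET $path HTTP/1.0\r\n"
        . "Host: $host\r\n"
        . "Connection: close\r\n\r\n";
}

/** Split a raw HTTP response into [headers, body] at the first blank line. */
function split_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

/** Fetch $path from $host over a TCP socket; returns the body, or null on failure. */
function socket_get(string $host, string $path = '/', int $port = 80, float $timeout = 5.0): ?string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return null;
    }
    fwrite($fp, build_get_request($host, $path));
    $raw = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 1024);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $raw .= $chunk;
    }
    fclose($fp);
    return split_response($raw)[1];
}

// List the socket transports this PHP build supports (for example tcp, udp).
print_r(stream_get_transports());
```

Calling socket_get('t.qq.com') would then return the response body as a string, which can be passed to htmlspecialchars() for display as in the earlier examples.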


6. Plug-ins
Many third-party libraries for this purpose are available online. Snoopy, linked above, is one that I found; if you are interested, it is worth exploring.
