Crawling web pages with PHP

WBOY
2016-07-13 10:28:20

Fetching the content of a web page with PHP is useful in real development. For example, it can serve as a simple content collector that extracts part of a page. Once the page has been fetched, regular expressions can filter out the content you want. The following are three commonly used ways to fetch a web page with PHP.
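As a minimal sketch of the regular-expression filtering mentioned above, the following helpers pull the title text and the link targets out of an HTML string. The function names extract_title and extract_links and the sample markup are hypothetical, chosen only for illustration. Regular expressions of this kind are adequate for simple pages but can misfire on irregular markup.

```php
<?php
// Hypothetical helpers: the names and the sample HTML are illustrative only.

// Return the text inside the first <title> element, or null if none is found.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }
    return null;
}

// Return the href value of every <a> tag that uses a quoted href attribute.
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $m);
    return $m[2];
}

$html = '<html><head><title> Demo page </title></head>'
      . '<body><a href="/a.html">A</a> <a class="x" href=\'b.html\'>B</a></body></html>';

echo extract_title($html), "\n";
print_r(extract_links($html));
```

In practice the $html string would be the $contents value returned by any of the methods below.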
1.file_get_contents
PHP code

<?php
$url = "http://www.phpzixue.cn";
$contents = file_get_contents($url);
// If Chinese characters come out garbled, convert the encoding with the line below
// $contents = iconv("gb2312", "utf-8", $contents);
echo $contents;
?>
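The commented-out iconv call handles pages served in GB2312. A hedged sketch of the same idea as a reusable function follows. The name to_utf8 and the default source charset are assumptions made for illustration, and the check simply skips the conversion when the input is already valid UTF-8.

```php
<?php
// Hypothetical helper: the name to_utf8 and the GB2312 default are assumptions.
function to_utf8(string $contents, string $from = 'GB2312'): string
{
    // preg_match with the /u modifier succeeds only on valid UTF-8 input.
    if (preg_match('//u', $contents) === 1) {
        return $contents;
    }
    $converted = iconv($from, 'UTF-8', $contents);
    // Fall back to the original bytes if the conversion fails.
    return $converted === false ? $contents : $converted;
}

// "\xD6\xD0\xCE\xC4" is the GB2312 byte sequence for the two characters 中文.
$gb = "\xD6\xD0\xCE\xC4";
echo to_utf8($gb), "\n";
```

With this in place, the fetched page could be passed through to_utf8($contents) before it is echoed or filtered.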
2.curl
PHP code
<?php
$url = "http://www.phpzixue.cn";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require user authentication, add the following two lines
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
// curl_setopt($ch, CURLOPT_USERPWD, US_NAME . ":" . US_PWD);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>
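The curl listing does not check whether the request succeeded. A minimal sketch of a wrapper that does is shown here. The name fetch_with_curl, the overall transfer timeout, and the redirect setting are assumptions added for illustration and are not part of the original listing.

```php
<?php
// Hypothetical wrapper: the name fetch_with_curl and its defaults are assumptions.
function fetch_with_curl(string $url, int $timeout = 5): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2, // assumed cap on the whole transfer
        CURLOPT_FOLLOWLOCATION => true,         // follow HTTP redirects
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes why the transfer failed.
        error_log('curl failed: ' . curl_error($ch));
        return null;
    }
    return $body;
}

// Command-line usage: php fetch.php http://www.phpzixue.cn
if (isset($argv[1])) {
    $contents = fetch_with_curl($argv[1]);
    echo $contents === null ? "fetch failed\n" : $contents;
}
```

Returning null on failure lets the caller tell an unreachable page apart from an empty one.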
3.fopen->fread->fclose
PHP code
<?php
$handle = fopen("http://www.phpzixue.cn", "rb");
$contents = "";
do {
    $data = fread($handle, 1024);
    if (strlen($data) == 0) {
        break;
    }
    $contents .= $data;
} while (true);
fclose($handle);
echo $contents;
?>
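The chunked read loop can also be written with feof and wrapped in a function. The sketch below uses the hypothetical name read_all and is demonstrated on an in-memory stream so that it runs without network access. For a remote page, the handle would come from fopen on the URL, which returns false on failure and should be checked before reading.

```php
<?php
// Hypothetical helper: read_all is an illustrative name, not part of PHP.
// Reads an already opened stream to the end in fixed-size chunks.
function read_all($handle, int $chunkSize = 1024): string
{
    $contents = "";
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false) {
            break; // read error: stop and return what was collected so far
        }
        $contents .= $data;
    }
    return $contents;
}

// Demonstrated on an in-memory stream so the sketch runs offline.
$handle = fopen("php://memory", "w+b");
fwrite($handle, str_repeat("x", 3000));
rewind($handle);
echo strlen(read_all($handle)), "\n"; // prints 3000
fclose($handle);
```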
Note:
1. file_get_contents and fopen require allow_url_fopen to be enabled. Method: edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
2. To use curl, the hosting environment must have the curl extension enabled. Method: on Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32; on Linux, install the curl extension.
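Both requirements can be checked at runtime before choosing a method. A minimal sketch, assuming the hypothetical function name available_fetch_methods:

```php
<?php
// Hypothetical helper: available_fetch_methods is an illustrative name.
// Reports which of the approaches above the current PHP configuration supports.
function available_fetch_methods(): array
{
    $methods = [];
    // filter_var normalises ini values such as "1", "On" or "" to a boolean.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $methods[] = 'file_get_contents';
        $methods[] = 'fopen';
    }
    if (extension_loaded('curl')) {
        $methods[] = 'curl';
    }
    return $methods;
}

print_r(available_fetch_methods());
```

An empty result means neither setting is in place and php.ini needs to be changed as described in the note.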