
Summary of methods for obtaining web page content in PHP

WBOY
2016-07-21 15:48:34

This article summarizes several commonly used PHP methods for fetching the content of a web page. The fetched content can then be filtered with regular expressions to extract the parts you want; how to write those regular expressions is not covered here.
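As a small illustration of the filtering step, the sketch below extracts the text of the first <title> tag with preg_match. The helper name extract_title and the sample HTML are assumptions made for this example and do not come from the original article. A regular expression is adequate for a simple pattern like this one; for complex pages an HTML parser is usually more robust.

```php
<?php
// Hypothetical helper: returns the text inside the first <title> tag, or null.
function extract_title(string $html): ?string
{
    // "i" = case-insensitive, "s" = let "." also match newlines inside the title.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Sample markup standing in for content returned by any of the fetch methods.
$html = "<html><head><title> Example Page </title></head><body>Hi</body></html>";
echo extract_title($html), "\n";
```

The same pattern applies to the $contents variable produced by each of the fetch methods in this article.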
1. file_get_contents

<?php
$url = "http://www.jb51.net";
$contents = file_get_contents($url);
if ($contents === false) {
    die("Failed to fetch " . $url);
}
// If Chinese characters come out garbled, convert the encoding first:
// $contents = iconv("gb2312", "utf-8", $contents);
echo $contents;
?>
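file_get_contents also accepts a stream context, which is one way to set a timeout so that a slow server does not stall the script. The following is a minimal sketch under that assumption; the helper name fetch_with_timeout and the user agent string are made up for illustration.

```php
<?php
// Hypothetical helper: fetch $url with a timeout; returns the body, or null on failure.
function fetch_with_timeout(string $url, int $timeoutSeconds = 5): ?string
{
    // The "http" options are used for http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,      // read timeout in seconds
            'user_agent' => 'ExampleFetcher/1.0', // assumed value for illustration
        ],
    ]);

    // "@" suppresses the warning so that failure is reported via the return value.
    $contents = @file_get_contents($url, false, $context);
    return $contents === false ? null : $contents;
}

// Usage (requires allow_url_fopen and network access):
// $page = fetch_with_timeout("http://www.jb51.net", 5);
```

Returning null rather than false keeps the failure case distinct from an empty but successful response.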

2. curl

<?php
$url = "http://www.jb51.net";
$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require HTTP authentication, add the following two lines
// (US_NAME and US_PWD are constants you define with your own credentials):
//curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
//curl_setopt($ch, CURLOPT_USERPWD, US_NAME.":".US_PWD);
$contents = curl_exec($ch);
if ($contents === false) {
    echo "curl error: " . curl_error($ch);
}
curl_close($ch);
echo $contents;
?>

3. fopen -> fread -> fclose

<?php
$handle = fopen("http://www.jb51.net", "rb");
if ($handle === false) {
    die("Failed to open URL");
}
$contents = "";
// Read in 1024-byte chunks until the stream is exhausted
while (!feof($handle)) {
    $data = fread($handle, 1024);
    if ($data === false) {
        break;
    }
    $contents .= $data;
}
fclose($handle);
echo $contents;
?>
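If the chunk-by-chunk loop is not needed, stream_get_contents can read the rest of an open stream in one call. This is an alternative to the fread loop, not the article's original method; the helper name read_all is an assumption for this sketch.

```php
<?php
// Hypothetical helper: open $source and return everything it yields, or null on failure.
function read_all(string $source): ?string
{
    // "@" suppresses the warning so that failure is reported via the return value.
    $handle = @fopen($source, 'rb');
    if ($handle === false) {
        return null;
    }
    // stream_get_contents() reads until end of stream, replacing the manual fread() loop.
    $contents = stream_get_contents($handle);
    fclose($handle);
    return $contents === false ? null : $contents;
}

// Usage (requires allow_url_fopen and network access):
// $page = read_all("http://www.jb51.net");
```

The explicit loop remains useful when each chunk should be processed as it arrives, for example when writing a large download to disk.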

Notes:
1. file_get_contents and fopen need allow_url_fopen enabled in order to read remote URLs. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
2. To use curl, your hosting environment must have the curl extension enabled. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll, then copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
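Both requirements in the notes can be checked at runtime before choosing a method. The sketch below is one possible way to do that; the helper name pick_fetch_method and its return values are assumptions for illustration.

```php
<?php
// Hypothetical helper: name the first fetch method this PHP installation supports.
function pick_fetch_method(): ?string
{
    // curl_init() only exists when the curl extension is loaded.
    if (function_exists('curl_init')) {
        return 'curl';
    }
    // ini_get() returns the setting as a string such as "1", "On", or "".
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'file_get_contents';
    }
    return null; // neither method can fetch remote URLs
}

echo pick_fetch_method() ?? 'none', "\n";
```

Preferring curl here is a design choice for this sketch, not a requirement; either order works when both methods are available.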
