What are the methods for crawling web pages with PHP?
PHP methods for crawling web pages include: 1. the file() function; 2. the file_get_contents() function; 3. the fopen()->fread()->fclose() mode; 4. the curl method; 5. the fsockopen() function.
The operating environment of this article: Windows 10, PHP 7.1, ThinkPad T480.
During development we often need to fetch web pages. Typically PHP is used to simulate a browser: it requests a URL over HTTP and receives the HTML source or XML data in response. The raw data is rarely suitable for direct output, so the relevant content usually has to be extracted and then formatted for a friendlier display.
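The extraction step mentioned above can be sketched with PHP's bundled DOM extension. This is only a minimal illustration, assuming the dom extension is available; the helper name extractTitleAndLinks() and the sample markup are hypothetical, and the article itself does not prescribe a particular parser.

```php
<?php
// Hypothetical helper: pull the page title and all link targets out of an HTML string.
function extractTitleAndLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titleNode = $xpath->query('//title')->item(0);
    $title = $titleNode !== null ? trim($titleNode->textContent) : '';

    // Collect the href attribute of every <a> element that has one.
    $links = [];
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }

    return ['title' => $title, 'links' => $links];
}

// Sample markup standing in for a fetched page (an assumption for illustration).
$html = '<html><head><title> Demo page </title></head>'
      . '<body><a href="/a">A</a><a href="/b">B</a><a>no href</a></body></html>';

$result = extractTitleAndLinks($html);
echo $result['title'], "\n";                // Demo page
echo implode(', ', $result['links']), "\n"; // /a, /b
```

In practice the $html string would come from one of the fetching methods described in this article.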
Let's briefly go over several methods for crawling pages with PHP and how they work:
1. The main methods for crawling pages with PHP:
1. file() function
2. file_get_contents() function
3. fopen()->fread()->fclose() mode
4. curl method
5. fsockopen() function socket mode
2. Examples of each method:
1. file() function
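<?php
// Define the URL
$url = 'http://t.qq.com';
// file() reads the remote content into an array, one element per line
$lines_array = file($url);
// Stop if the request failed (file() returns false on failure)
if ($lines_array === false) {
    exit("Failed to read $url");
}
// Join the array into a single string
$lines_string = implode('', $lines_array);
// Output the content (you could also save it on your own server)
echo $lines_string;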
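Because file() returns one array element per line, its optional flags are worth knowing when the lines are processed individually. The sketch below uses a temporary local file instead of a URL so that it runs without network access; the file contents are made up for illustration.

```php
<?php
// Write a small sample file so the example runs without network access.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "first line\n\nsecond line\n");

// Default behaviour: every element keeps its trailing newline, empty lines included.
$raw = file($path);

// With flags: newlines are stripped and empty lines are skipped.
$clean = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

unlink($path);

if ($raw === false || $clean === false) {
    exit("Could not read $path\n");
}

echo count($raw), "\n";   // 3
echo count($clean), "\n"; // 2
```

The same flags can be passed when the first argument is a URL.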
2. file_get_contents() function
To read URLs with file_get_contents() or fopen(), the hosting environment must have allow_url_fopen enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is turned off, neither fopen() nor file_get_contents() can open remote files.
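<?php
// Define the URL
$url = 'http://t.qq.com';
// file_get_contents() reads the remote data into a string
$lines_string = file_get_contents($url);
// Stop if the request failed (file_get_contents() returns false on failure)
if ($lines_string === false) {
    exit("Failed to read $url");
}
// Output the content, escaped so the HTML source is displayed rather than rendered
// (you could also save it on your own server)
echo htmlspecialchars($lines_string);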
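The allow_url_fopen requirement described in this section can be checked at runtime before a request is attempted. The following is a rough sketch rather than a definitive implementation: the helper names remoteFopenEnabled(), buildHttpContext() and fetchPage(), the 5-second timeout and the User-Agent string are all assumptions made for illustration.

```php
<?php
// Returns true when file(), file_get_contents() and fopen() may open URLs.
function remoteFopenEnabled(): bool
{
    // ini_get() returns a string such as "1", "On" or ""; normalise it to a boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: build a stream context with a timeout and a User-Agent.
function buildHttpContext(int $timeoutSeconds = 5, string $userAgent = 'ExampleFetcher/1.0')
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => $timeoutSeconds, // read timeout in seconds
            'user_agent' => $userAgent,      // sent as the User-Agent header
        ],
    ]);
}

// Hypothetical helper: returns the page body, or null when fetching is not possible.
function fetchPage(string $url): ?string
{
    if (!remoteFopenEnabled()) {
        return null; // remote wrappers are disabled in php.ini
    }
    // The @ suppresses the warning file_get_contents() raises on failure;
    // the false return value is checked instead.
    $body = @file_get_contents($url, false, buildHttpContext());
    return $body === false ? null : $body;
}
```

A call such as fetchPage('http://t.qq.com') would then return either the page source or null. Note that allow_url_fopen can only be changed in php.ini, not with ini_set() at runtime.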
3. fopen()->fread()->fclose() mode
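<?php
// Define the URL
$url = 'http://t.qq.com';
// Open the URL with fopen() in binary mode
$handle = fopen($url, "rb");
// Stop if the URL could not be opened (fopen() returns false on failure)
if ($handle === false) {
    exit("Failed to open $url");
}
// Initialize the variable
$lines_string = "";
// Read the data in a loop, 1024 bytes at a time
do {
    $data = fread($handle, 1024);
    if ($data === false || strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
} while (true);
// Close the fopen handle and release the resource
fclose($handle);
// Output the content (you could also save it on your own server)
echo $lines_string;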
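The read loop in this mode works on any stream handle, not only on URLs, so it can be factored into a small function. The sketch below is one possible arrangement; the name readAll() is hypothetical, and the demonstration uses an in-memory stream so that it runs without network access.

```php
<?php
// Read everything from an open stream in fixed-size chunks.
function readAll($handle, int $chunkSize = 1024): string
{
    $contents = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop instead of looping forever
        }
        $contents .= $chunk;
    }
    return $contents;
}

// Demonstration on an in-memory stream holding 2500 bytes.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat('x', 2500));
rewind($handle);

$data = readAll($handle);
fclose($handle);

echo strlen($data), "\n"; // 2500
```

With a remote page, the handle would instead come from fopen($url, "rb") as in the example above.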
4. curl method
Using curl requires that the hosting environment has the curl extension enabled. On Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
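<?php
// Create a new cURL resource
$url = 'http://t.qq.com';
$ch = curl_init();
$timeout = 5;
// Set the URL and the related options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// Fetch the URL
$lines_string = curl_exec($ch);
// Report the error if the request failed (curl_exec() returns false on failure)
if ($lines_string === false) {
    echo 'cURL error: ' . curl_error($ch);
}
// Close the cURL resource and free system resources
curl_close($ch);
// Output the content (you could also save it on your own server)
echo $lines_string;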
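Since this method depends on the curl extension being enabled, a wrapper can check for it first and report failures instead of echoing an empty result. This is a sketch under assumptions: the helper name fetchWithCurl(), the shape of the returned array and the overall timeout of twice the connect timeout are illustrative choices, not part of the original article.

```php
<?php
// Hypothetical helper wrapping the curl calls with basic error reporting.
function fetchWithCurl(string $url, int $timeout = 5): array
{
    if (!extension_loaded('curl')) {
        return ['ok' => false, 'status' => 0, 'body' => '', 'error' => 'curl extension not loaded'];
    }

    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2, // seconds for the whole transfer (assumed value)
        CURLOPT_FOLLOWLOCATION => true,         // follow HTTP redirects
    ]);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false) {
        return ['ok' => false, 'status' => $status, 'body' => '', 'error' => $error];
    }
    return ['ok' => true, 'status' => $status, 'body' => $body, 'error' => ''];
}
```

A caller could then write $result = fetchWithCurl('http://t.qq.com'); and inspect $result['ok'] and $result['status'] before using $result['body'].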
5. fsockopen() function socket mode
Whether the socket mode works correctly also depends on the server configuration. You can use phpinfo() to check which communication protocols (socket transports) the server has enabled.
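As an alternative to reading the phpinfo() output by eye, the registered socket transports can also be queried from code with stream_get_transports(). The snippet below is a small sketch; the variable names are illustrative.

```php
<?php
// List the socket transports this PHP build can open with fsockopen().
$transports = stream_get_transports();
echo implode(', ', $transports), "\n";

// Plain HTTP on port 80 needs "tcp"; HTTPS on port 443 needs "ssl" or "tls",
// which are only registered when the OpenSSL extension is loaded.
$canUseTcp = in_array('tcp', $transports, true);
$canUseTls = in_array('ssl', $transports, true) || in_array('tls', $transports, true);

echo 'tcp: ', $canUseTcp ? 'yes' : 'no', "\n";
echo 'tls: ', $canUseTls ? 'yes' : 'no', "\n";
```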
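<?php
// Open a TCP connection to port 80 with a 30-second timeout
$fp = fsockopen("t.qq.com", 80, $errno, $errstr, 30);
if (!$fp) {
    // Connection failed: print the error message and code
    echo "$errstr ($errno)<br />\n";
} else {
    // Build the HTTP request by hand
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: t.qq.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // Read and output the response (headers and body) until the server closes the connection
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}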
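Unlike the other methods, the socket example above outputs the raw HTTP response, with the status line and headers ahead of the page source. A rough way to separate them is sketched below; the helper name splitHttpResponse() and the sample response are hypothetical, and the sketch does not decode chunked transfer encoding.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line, headers and body.
function splitHttpResponse(string $raw): array
{
    // Headers end at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            // Header names are case-insensitive, so store them in lower case.
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return [
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => isset($parts[1]) ? $parts[1] : '',
    ];
}

// Sample response standing in for what fgets() would return (an assumption).
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n<html>hi</html>";
$response = splitHttpResponse($raw);

echo $response['status'], "\n";                  // HTTP/1.1 200 OK
echo $response['headers']['content-type'], "\n"; // text/html
echo $response['body'], "\n";                    // <html>hi</html>
```

To use it with the socket example, the lines read by fgets() would be appended to a string instead of echoed. Because an HTTP/1.1 server may send the body in chunks, requesting with HTTP/1.0 is one simple way to avoid chunked responses.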