PHP methods for crawling web pages include: 1. the file() function; 2. the file_get_contents() function; 3. the fopen()->fread()->fclose() sequence; 4. the curl extension; 5. the fsockopen() function.
The operating environment of this article: Windows 10, PHP 7.1, ThinkPad T480.
During development we often need to fetch web pages. The usual approach is to have PHP act like a browser: it requests the URL over HTTP and receives the HTML source or XML data in response. The raw response is rarely suitable for direct output, so we usually extract the parts we need and then format them for a friendlier display.
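As an illustration of the "extract, then format" step described above, here is a minimal sketch that pulls the links out of an HTML string using PHP's DOM extension. The helper name extract_links() is hypothetical and not part of the original article, and the sketch assumes the dom extension is available and that the HTML has already been fetched by one of the methods below.

```php
<?php
// Hypothetical helper (not from the original article): returns the href and
// visible text of every <a> element that has a non-empty href attribute.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely valid markup, so collect libxml warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = ['href' => $href, 'text' => trim($anchor->textContent)];
        }
    }

    return $links;
}
```

When printing the extracted values back into a page, passing them through htmlspecialchars() keeps any markup in the fetched content from being interpreted by the browser. Note that loadHTML() guesses the character encoding when the page does not declare one, so non-ASCII pages may need extra handling.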
Let’s briefly talk about several methods and principles of PHP crawling pages:
1. The main methods for crawling pages with PHP:
1. file() function
2. file_get_contents() function
3. fopen()->fread()->fclose() mode
4. curl method
5. fsockopen() function socket mode
2. Example code for each method:
1. file() function
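<?php
// Define the URL
$url = 'http://t.qq.com';

// file() reads the content into an array of lines (false on failure)
$lines_array = file($url);
if ($lines_array === false) {
    exit("Could not read $url\n");
}

// Join the array into a single string
$lines_string = implode('', $lines_array);

// Output the content; you could also save it on your own server
echo $lines_string;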
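Because file() returns one array element per line, it is convenient when the content is processed line by line rather than as a single string. The following sketch uses the function's standard flags to drop line endings and blank lines; the helper name read_nonempty_lines() is an assumption made for illustration.

```php
<?php
// Hypothetical helper (not from the original article): reads a URL or a
// local path and returns its non-empty lines without trailing newlines.
function read_nonempty_lines(string $source): array
{
    // FILE_SKIP_EMPTY_LINES only takes effect together with
    // FILE_IGNORE_NEW_LINES, so both flags are passed.
    $lines = file($source, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    // file() returns false on failure; normalise that to an empty array.
    return $lines === false ? [] : $lines;
}
```

The same call works for remote URLs as long as allow_url_fopen is enabled, since file() goes through the same stream wrappers as fopen().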
2. file_get_contents() function
To read remote URLs with file_get_contents() or fopen(), the hosting environment must have allow_url_fopen enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen() nor file_get_contents() can open remote files.
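<?php
// Define the URL
$url = 'http://t.qq.com';

// file_get_contents() reads the remote data into a string (false on failure)
$lines_string = file_get_contents($url);
if ($lines_string === false) {
    exit("Could not read $url\n");
}

// Output the content (escaped so the browser shows the HTML source);
// you could also save it on your own server
echo htmlspecialchars($lines_string);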
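The allow_url_fopen requirement can also be checked at runtime before attempting a request. The sketch below does that and passes a stream context so the request has a timeout and a browser-like User-Agent. The function names, the 5-second default, and the User-Agent string are assumptions chosen for illustration, not values from the original article.

```php
<?php
// Hypothetical helper: reports whether remote URL wrappers are enabled.
function remote_fopen_enabled(): bool
{
    // ini_get() returns a string such as "1", "0", or ""; normalise to bool.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: fetches a URL with a timeout and a custom User-Agent.
// Returns the body as a string, or false when the request cannot be made.
function fetch_page_with_context(string $url, int $timeoutSeconds = 5)
{
    if (!remote_fopen_enabled()) {
        return false;
    }

    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
        ],
    ]);

    return file_get_contents($url, false, $context);
}

// Usage (performs a network request):
// $html = fetch_page_with_context('http://t.qq.com');
```

Checking the setting first lets the caller report a clear configuration problem instead of a generic "failed to open stream" warning.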
3. fopen()->fread()->fclose() mode
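<?php
// Define the URL
$url = 'http://t.qq.com';

// Open the URL in binary mode
$handle = fopen($url, 'rb');
if ($handle === false) {
    exit("Could not open $url\n");
}

// Initialize the result
$lines_string = '';

// Read in 1024-byte chunks until no more data arrives
while (!feof($handle)) {
    $data = fread($handle, 1024);
    if ($data === false || strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
}

// Close the handle and release the resource
fclose($handle);

// Output the content; you could also save it on your own server
echo $lines_string;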
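The comments in these examples mention saving the fetched content on your own server. With the fopen()->fread()->fclose() approach this can be done chunk by chunk, so the whole page never has to sit in memory. The following is a minimal sketch; the helper name copy_stream_in_chunks() and the file name page.html in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: copies everything from one open stream to another in
// fixed-size chunks and returns the number of bytes written.
function copy_stream_in_chunks($source, $dest, int $chunkSize = 1024): int
{
    $total = 0;

    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or no more data
        }

        $written = fwrite($dest, $chunk);
        if ($written === false) {
            break; // write error
        }
        $total += $written;
    }

    return $total;
}

// Usage (performs a network request and writes a local file):
// $in  = fopen('http://t.qq.com', 'rb');
// $out = fopen('page.html', 'wb');
// if ($in !== false && $out !== false) {
//     copy_stream_in_chunks($in, $out);
//     fclose($in);
//     fclose($out);
// }
```

PHP's built-in stream_copy_to_stream() performs the same job in a single call; the explicit loop is shown here only because it mirrors the fread() pattern used above.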
4. curl method
Using curl requires the hosting environment to have the curl extension enabled. On Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension for PHP.
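<?php
$url = 'http://t.qq.com';
$timeout = 5;

// Create a new cURL handle
$ch = curl_init();

// Set the URL and related options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the response instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // connection timeout in seconds

// Fetch the URL (curl_exec() returns false on failure)
$lines_string = curl_exec($ch);
if ($lines_string === false) {
    $lines_string = 'cURL error: ' . curl_error($ch);
}

// Close the cURL handle and free system resources
curl_close($ch);

// Output the content; you could also save it on your own server
echo $lines_string;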
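In practice it is often useful to confirm that the curl extension is loaded and to return the HTTP status and any error message alongside the body. The sketch below wraps the calls shown above in a hypothetical function, fetch_with_curl(); the result array layout, the User-Agent string, and the timeout values are assumptions for illustration only.

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns an array with the
// keys 'body' (string or null), 'status' (int), and 'error' (string or null).
function fetch_with_curl(string $url, int $timeoutSeconds = 5): array
{
    if (!extension_loaded('curl')) {
        return ['body' => null, 'status' => 0, 'error' => 'curl extension is not loaded'];
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return ['body' => null, 'status' => 0, 'error' => 'could not initialise cURL'];
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,                // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,     // limit on connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds * 2, // limit on the whole request
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);

    $body   = curl_exec($ch);
    $error  = $body === false ? curl_error($ch) : null;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'body'   => $body === false ? null : $body,
        'status' => $status,
        'error'  => $error,
    ];
}

// Usage (performs a network request):
// $result = fetch_with_curl('http://t.qq.com');
// if ($result['error'] === null && $result['status'] === 200) {
//     echo htmlspecialchars($result['body']);
// }
```

Returning the status code separately lets the caller distinguish a transport failure (for example a DNS error) from a page that was delivered with an error status such as 404.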
5. fsockopen() function socket mode
Whether socket mode works also depends on the server configuration. You can check which socket transports (for example tcp, udp, or ssl) the PHP installation supports under "Registered Stream Socket Transports" in the phpinfo() output, or by calling stream_get_transports().
<?php
$fp = fsockopen("t.qq.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: t.qq.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
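Unlike the other methods, the socket approach returns the raw HTTP response, so the status line and headers arrive together with the page body. The following sketch separates them. The helper name split_http_response() is hypothetical, and it assumes the whole response has first been collected into one string rather than echoed line by line.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into its status code,
// headers (keyed by lower-cased name), and body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first blank line.
    $parts       = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $statusLine  = array_shift($headerLines);

    $status = 0;
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $statusLine, $matches)) {
        $status = (int) $matches[1];
    }

    $headers = [];
    foreach ($headerLines as $line) {
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return [
        'status'  => $status,
        'headers' => $headers,
        'body'    => $parts[1] ?? '',
    ];
}
```

This sketch does not decode chunked transfer encoding, which a server may use when replying to an HTTP/1.1 request. If the transfer-encoding header is "chunked", the body still contains the chunk size markers; sending the request as HTTP/1.0 is one way to avoid that.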