Several ways to crawl pages with PHP_PHP tutorial
When we develop network programs, we often need to grab non-local files. Generally, we use PHP to simulate browser access, access the URL address through HTTP requests, and then get the HTML source code or XML data. We cannot output the data directly. , it is often necessary to extract the content and then format it to display it in a more friendly way.
Let’s briefly talk about several methods and principles of php crawling pages:
1. The main method of crawling pages with PHP:
1. file() function
2. file_get_contents() function
3. fopen()->fread()->fclose() mode
4.curl method
5. fsockopen() function socket mode
6. Use plug-ins (such as: http://sourceforge.net/projects/snoopy/)
2. The main ways for PHP to parse html or xml code:
1. file() function
? 1 2 3 4 5 6 7 8 9<?php
//定义url
$url
=
'http://t.qq.com'
;
//fiel函数读取内容数组
$lines_array
=file(
$url
);
//拆分数组为字符串
$lines_string
=implode(
''
,
$lines_array
);
//输出内容,嘿嘿,大家也可以保存在自己的服务器上
echo
$lines_string
;
2. file_get_contents() function
Using file_get_contents and fopen must enable allow_url_fopen. Method: Edit php.ini and set allow_url_fopen = On. When allow_url_fopen is turned off, neither fopen nor file_get_contents can open remote files.
<?php
//定义url
$url
=
'http://t.qq.com'
;
//file_get_contents函数远程读取数据
$lines_string
=
file_get_contents
(
$url
);
//输出内容,嘿嘿,大家也可以保存在自己的服务器上
echo
htmlspecialchars(
$lines_string
);
3. fopen()->fread()->fclose() mode
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19<?php
//定义url
$url
=
'http://t.qq.com'
;
//fopen以二进制方式打开
$handle
=
fopen
(
$url
,
"rb"
);
//变量初始化
$lines_string
=
""
;
//循环读取数据
do
{
$data
=
fread
(
$handle
,1024);
if
(
strlen
(
$data
)==0) {
break
;
}
$lines_string
.=
$data
;
}
while
(true);
//关闭fopen句柄,释放资源
fclose(
$handle
);
//输出内容,嘿嘿,大家也可以保存在自己的服务器上
echo
$lines_string
;
4. curl方式
使用curl必须空间开启curl。方法:windows下修改php.ini,将extension=php_curl.dll前面的分号去掉,而且需 要拷贝ssleay32.dll和libeay32.dll到C:WINDOWSsystem32下;Linux下要安装curl扩展。
<?php
// 创建一个新cURL资源
$url
=
'http://t.qq.com'
;
$ch
=curl_init();
$timeout
=5;
// 设置URL和相应的选项
curl_setopt(
$ch
, CURLOPT_URL,
$url
);
curl_setopt(
$ch
, CURLOPT_RETURNTRANSFER, 1);
curl_setopt(
$ch
, CURLOPT_CONNECTTIMEOUT,
$timeout
);
// 抓取URL
$lines_string
=curl_exec(
$ch
);
// 关闭cURL资源,并且释放系统资源
curl_close(
$ch
);
//输出内容,嘿嘿,大家也可以保存在自己的服务器上
echo
$lines_string
;
5. fsockopen() function socket mode
Whether the socket mode can be executed correctly is also related to the server settings. Specifically, you can use phpinfo to check which communication protocols are enabled on the server.
<?php
$fp
=
fsockopen
(
"t.qq.com"
, 80,
$errno
,
$errstr
, 30);
if
(!
$fp
) {
echo
"$errstr ($errno)<br>n"
;
}
else
{
$out
=
"GET / HTTP/1.1rn"
;
$out
.=
"Host: t.qq.comrn"
;
$out
.=
"Connection: Closernrn"
;
fwrite(
$fp
,
$out
);
while
(!
feof
(
$fp
)) {
echo
fgets
(
$fp
, 128);
}
fclose(
$fp
);
}
6. snoopy插件,最新版本是Snoopy-1.2.4.zip Last Update: 2013-05-30,推荐大家使用
使用网上非常流行的snoopy来进行采集,这是一个非常强大的采集插件,并且它的使用非常方便,你也可以在里面设置agent来模拟浏览器信息。
? 1 2 3 4 5 6 7 8 9 10 11 12<?php
//引入snoopy的类文件
require
(
'Snoopy.class.php'
);
//初始化snoopy类
$snoopy
=
new
Snoopy;
$url
=
"http://t.qq.com"
;
//开始采集内容
$snoopy
->fetch(
$url
);
//保存采集内容到$lines_string
$lines_string
=
$snoopy
->results;
//输出内容,嘿嘿,大家也可以保存在自己的服务器上
echo
$lines_string
;
Note: The agent is set in line 45 of the Snoopy.class.php file. Please search for "var $agent" (the content in quotation marks) in the file. Browser content you can use PHP to get,
Use echo $_SERVER['HTTP_USER_AGENT']; to get browser information, and just copy the echo content into the agent.

PHPsessionstrackuserdataacrossmultiplepagerequestsusingauniqueIDstoredinacookie.Here'showtomanagethemeffectively:1)Startasessionwithsession_start()andstoredatain$_SESSION.2)RegeneratethesessionIDafterloginwithsession_regenerate_id(true)topreventsessi

In PHP, iterating through session data can be achieved through the following steps: 1. Start the session using session_start(). 2. Iterate through foreach loop through all key-value pairs in the $_SESSION array. 3. When processing complex data structures, use is_array() or is_object() functions and use print_r() to output detailed information. 4. When optimizing traversal, paging can be used to avoid processing large amounts of data at one time. This will help you manage and use PHP session data more efficiently in your actual project.

The session realizes user authentication through the server-side state management mechanism. 1) Session creation and generation of unique IDs, 2) IDs are passed through cookies, 3) Server stores and accesses session data through IDs, 4) User authentication and status management are realized, improving application security and user experience.

Tostoreauser'snameinaPHPsession,startthesessionwithsession_start(),thenassignthenameto$_SESSION['username'].1)Usesession_start()toinitializethesession.2)Assigntheuser'snameto$_SESSION['username'].Thisallowsyoutoaccessthenameacrossmultiplepages,enhanc

Reasons for PHPSession failure include configuration errors, cookie issues, and session expiration. 1. Configuration error: Check and set the correct session.save_path. 2.Cookie problem: Make sure the cookie is set correctly. 3.Session expires: Adjust session.gc_maxlifetime value to extend session time.

Methods to debug session problems in PHP include: 1. Check whether the session is started correctly; 2. Verify the delivery of the session ID; 3. Check the storage and reading of session data; 4. Check the server configuration. By outputting session ID and data, viewing session file content, etc., you can effectively diagnose and solve session-related problems.

Multiple calls to session_start() will result in warning messages and possible data overwrites. 1) PHP will issue a warning, prompting that the session has been started. 2) It may cause unexpected overwriting of session data. 3) Use session_status() to check the session status to avoid repeated calls.

Configuring the session lifecycle in PHP can be achieved by setting session.gc_maxlifetime and session.cookie_lifetime. 1) session.gc_maxlifetime controls the survival time of server-side session data, 2) session.cookie_lifetime controls the life cycle of client cookies. When set to 0, the cookie expires when the browser is closed.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

Atom editor mac version download
The most popular open source editor

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

SublimeText3 English version
Recommended: Win version, supports code prompts!

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.
