Introduction to several methods of crawling pages with PHP curl
curl is mainly used to fetch data. Other methods, such as fsockopen or file_get_contents, can also fetch pages, but they are only convenient for pages that can be accessed directly. Fetching pages behind access control, or pages that are only available after logging in, is more difficult with them.
1. Use PHP's curl module to retrieve a page
The example below retrieves a page and stores the response in $result, from where it can be written to a file.
The code is as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // If this line is commented out, the response is printed directly
$result = curl_exec($ch);
curl_close($ch);
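The request above does not check whether the transfer succeeded. A minimal sketch of the same request with error handling follows; the helper name fetch_page() and the 10-second timeout are assumptions made for illustration, not part of the original example.

```php
<?php
/**
 * Fetch a URL and return the response body as a string.
 * fetch_page() is a hypothetical helper name, not part of the curl extension.
 */
function fetch_page(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);         // body only, no response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes the transport failure (DNS, timeout, bad scheme, ...).
        throw new RuntimeException('curl failed: ' . curl_error($ch), curl_errno($ch));
    }
    // The handle is released when $ch goes out of scope; on PHP versions
    // before 8.0, curl_close($ch) can be called here to free it explicitly.
    return $body;
}
```

To put the page into a file, the result can be passed on, for example file_put_contents('phpinfo.html', fetch_page('http://localhost/mytest/phpinfo.php')); the output file name here is only a placeholder.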
2. Use a proxy to crawl
Why use a proxy to crawl? Take Google as an example. If you request Google's pages very frequently within a short period, Google will restrict your IP address and further requests will fail. When that happens, you can switch to a different proxy and crawl again.
The code is as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.hzhuti.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080'); // The proxy address must be a quoted string
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // Add this line if the proxy requires a password
$result = curl_exec($ch);
curl_close($ch);
?>
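Changing the proxy once an address is restricted can be automated by trying a list of proxies in turn. The following is a rough sketch under stated assumptions: proxy_options() and fetch_via_proxies() are hypothetical helper names, the 10-second timeout is arbitrary, and CURLOPT_HTTPPROXYTUNNEL is left out for brevity.

```php
<?php
/** Build the curl options for one request through one proxy (hypothetical helper). */
function proxy_options(string $url, string $proxy, ?string $userPwd = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy,  // "host:port"
        CURLOPT_TIMEOUT        => 10,      // assumed value; tune for your proxies
    ];
    if ($userPwd !== null) {
        $options[CURLOPT_PROXYUSERPWD] = $userPwd; // "user:password"
    }
    return $options;
}

/** Try each proxy in order; return the first successful body, or null if all fail. */
function fetch_via_proxies(string $url, array $proxies): ?string
{
    foreach ($proxies as $proxy) {
        $ch = curl_init();
        curl_setopt_array($ch, proxy_options($url, $proxy));
        $body = curl_exec($ch);
        if ($body !== false) {
            return $body;
        }
        // This proxy failed (refused, timed out, blocked): try the next one.
    }
    return null;
}
```

A call such as fetch_via_proxies('http://www.hzhuti.com', ['125.21.23.6:8080']) returns the page body, or null when no proxy in the list works.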
3. POST data, then fetch the response
The code is as follows:
<?php
$ch = curl_init();
/*
 * Note that the data to be submitted cannot be an array with two or more dimensions.
 * This works:           array('name' => serialize(array('tank', 'zhang')), 'sex' => 1, 'birth' => '20101010')
 * This reports an error: array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010')
 */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
?>
In upload.php, call print_r($_POST); curl then fetches the content that upload.php outputs:
Array
(
    [name] => test
    [sex] => 1
    [birth] => 20101010
)
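Besides serialize(), one way around the nested-array restriction is to encode the data yourself with http_build_query() and pass the resulting string to CURLOPT_POSTFIELDS. This is a sketch rather than the original author's method; encode_form() and post_form() are hypothetical helper names.

```php
<?php
/** Flatten a (possibly nested) array into a URL-encoded form body. */
function encode_form(array $data): string
{
    // Nested arrays become name[0]=...&name[1]=... pairs, with the brackets percent-encoded.
    return http_build_query($data, '', '&');
}

/** POST the data as application/x-www-form-urlencoded and return the response body. */
function post_form(string $url, array $data)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    // A string body is sent URL-encoded; passing an array would be sent as multipart/form-data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, encode_form($data));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($ch); // string on success, false on failure
}
```

With this encoding, a script such as upload.php receives $_POST['name'] as an array again, so array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010') can be submitted without serializing it first.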
4. Grab some pages with page access control
3 methods of page access control (Zhang, published on 2010-10-12)
Category: apache/nginx
We often see the behavior shown in the figure below.
[Figure: Apache page access control]
Why apply this kind of control? It lets different people see different content and protects information. Although this protection is fairly basic, it is still useful.
1. Use the htpasswd command to generate a permission control file
The code is as follows:
[zhangy@BlackGhost test]$ htpasswd -c ./access tank   // Generate a password file; -c creates a new file (see htpasswd -h)
New password:                                          // Prompt for the password
Adding password for user tank
[zhangy@BlackGhost test]$ cat access                   // View the password file
tank:Uj5B3qIF/BNdI                                     // The username is in clear text; the password is encrypted
At this point the password file has been generated.
The code is as follows:
listen 10004
NameVirtualHost *:10004
The code is as follows:
[zhangy@BlackGhost test]$ vi .htaccess    // Open the file and add the access rules
[zhangy@BlackGhost test]$ cat .htaccess   // The contents of .htaccess follow
AuthType Basic
AuthName "access test"
AuthUserFile /home/zhangy/www/test/access
Require valid-user
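A directory protected with AuthType Basic, as configured above, can be fetched with curl by sending the username and password through CURLOPT_USERPWD. The following is a minimal sketch; basic_auth_options() and fetch_protected() are hypothetical helper names, and the URL and password in the usage note are placeholders, since the source does not give them.

```php
<?php
/** Build the curl options for a request that uses HTTP Basic authentication. */
function basic_auth_options(string $url, string $user, string $password): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,                      // assumed value
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,          // matches "AuthType Basic"
        CURLOPT_USERPWD        => $user . ':' . $password, // "user:password"
    ];
}

/** Return the page body, or null on a transport error or a non-200 status. */
function fetch_protected(string $url, string $user, string $password): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, basic_auth_options($url, $user, $password));
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status !== 200) {
        return null; // for example 401 when the credentials are rejected
    }
    return $body;
}
```

For example, fetch_protected('http://localhost:10004/', 'tank', 'your-password') would request the protected page as the user tank created with htpasswd earlier.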
3. Access control without a password file
The code is as follows:
define('ADMIN_USERNAME', 'tank'); // Admin username
// log check
echo <<<EOB
List of curl-related functions:
curl_init: Initialize a cURL session
curl_setopt: Set an option for a cURL transfer
curl_exec: Execute a cURL session
curl_close: Close a cURL session
curl_version: Return the current cURL version
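curl_init: Initialize a cURL session
Description
curl_init ([string url])
curl_init() initializes a new session and returns a cURL handle for use with curl_setopt(), curl_exec(), and curl_close(). The handle is a resource before PHP 8.0 and a CurlHandle object since; false is returned on error. If the optional url parameter is provided, the CURLOPT_URL option is set to its value. You can also set it manually with curl_setopt().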
Example 1. Initialize a new cURL session and fetch a web page
The code is as follows:
$ch = curl_init();