PHP cURL Example: Simulating a Forum Login and Collecting Data

WBOY (original)
2016-05-25 16:44:34

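To simulate a browser visiting a website, you first need to learn to observe how the browser sends its HTTP messages and what the web server returns to the browser. I recommend installing HttpWatch; note that the unlicensed version lacks some features. Once installed, it is embedded in IE. Start Record, type a URL into the address bar and press Enter, and it captures all the traffic between the browser and the server so that you can see everything at a glance. How to use the tool itself is not covered in this article.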
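The response half of that traffic can also be inspected from PHP. When `CURLOPT_HEADER` is set to 1, `curl_exec()` returns the response headers in front of the body. The sketch below splits such a raw response into status line, headers and body. The function name `split_http_response` and the sample response are illustrative assumptions, and the sketch assumes a single response with no redirects followed.

```php
<?php
/**
 * Split a raw HTTP response (status line, headers, blank line, body)
 * into its parts. Assumes one response only, i.e. no redirect chain.
 */
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first empty line.
    $parts  = explode("\r\n\r\n", $raw, 2);
    $lines  = explode("\r\n", $parts[0]);
    $status = array_shift($lines);

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed lines
        }
        [$name, $value] = explode(':', $line, 2);
        // A header such as Set-Cookie may occur several times, so keep a list.
        $headers[strtolower(trim($name))][] = trim($value);
    }

    return [
        'status'  => $status,
        'headers' => $headers,
        'body'    => $parts[1] ?? '',
    ];
}

// Hypothetical response, shaped like what a login page might return.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/html\r\n"
     . "Set-Cookie: sid=abc123; path=/\r\n"
     . "Set-Cookie: auth=xyz; path=/\r\n"
     . "\r\n"
     . "<html>...</html>";

$response = split_http_response($raw);
echo $response['status'], "\n";
print_r($response['headers']['set-cookie']);
```

The `Set-Cookie` lines are the part that matters for a simulated login, since they carry the session the server expects on later requests.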

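The most critical part of building a simulated browser login is getting past the login verification. cURL supports HTTPS as well as HTTP; the only difference is the extra layer of SSL-encrypted transport. To log in to an HTTPS site, make sure your PHP build has OpenSSL support. Let's start by analysing an example; the code is as follows: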

<?php
$discuz_url = 'http://127.0.0.1/discuz/'; // forum base URL
$login_url = $discuz_url . 'logging.php?action=login'; // login page URL
$post_fields = array();
// The next two fields do not need to be changed
$post_fields['loginfield'] = 'username';
$post_fields['loginsubmit'] = 'true';
// Username and password (required)
$post_fields['username'] = 'tianxin';
$post_fields['password'] = '111111';
// Security question
$post_fields['questionid'] = 0;
$post_fields['answer'] = '';
// @todo CAPTCHA
$post_fields['seccodeverify'] = '';
// Fetch the login form's formhash
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
preg_match('/<input\s*type="hidden"\s*name="formhash"\s*value="(.*?)"\s*\/>/i', $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
} else {
    die('formhash not found.');
}
// Submit the formhash together with the other login fields
$post_fields['formhash'] = $formhash;
// POST the login data and save the cookies; the cookie file is stored in the site's temp directory
$cookie_file = tempnam('./temp', 'cookie');
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // cookies are written here on curl_close()
curl_exec($ch);
curl_close($ch);
// With the cookie file in hand, we can simulate posting a thread; fid is the ID of the forum board
$send_url = $discuz_url . "post.php?action=newthread&fid=2";
$ch = curl_init($send_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
curl_close($ch);
// The regex for this formhash differs slightly from the login page: the hidden input has an extra id attribute
preg_match('/<input\s*type="hidden"\s*name="formhash"\s*id="formhash"\s*value="(.*?)"\s*\/>/i', $contents, $matches);
if (!empty($matches)) {
    $formhash = $matches[1];
} else {
    die('formhash not found.');
}
$post_data = array();
// Thread title
$post_data['subject'] = 'test2';
// Thread content
$post_data['message'] = 'test2';
$post_data['topicsubmit'] = "yes";
$post_data['extra'] = '';
// Thread tags
$post_data['tags'] = 'test';
// The formhash is critical: without it, Discuz warns that the request came from an incorrect source page
$post_data['formhash'] = $formhash;
$ch = curl_init($send_url);
curl_setopt($ch, CURLOPT_REFERER, $send_url); // spoof the Referer header
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0); // 0 prints the response directly instead of returning it
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$contents = curl_exec($ch);
curl_close($ch);
// Clean up the cookie file
unlink($cookie_file);
?>
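The script above needs two regular expressions for the formhash only because the input on the posting page carries an extra `id` attribute. A minimal sketch of a more tolerant extraction, one that does not depend on attribute order, might look like this. The helper name `extract_formhash` and the sample markup are assumptions made for illustration; they are not part of Discuz or of the original script.

```php
<?php
/**
 * Return the value of the hidden "formhash" input, or null if it is absent.
 * Attribute order and extra attributes (such as id) do not matter.
 */
function extract_formhash(string $html): ?string
{
    // Collect every <input ...> tag first.
    if (!preg_match_all('/<input\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        // Only the tag whose name attribute is "formhash" is of interest.
        if (!preg_match('/\bname\s*=\s*["\']formhash["\']/i', $tag)) {
            continue;
        }
        // Pull the value attribute out of that tag.
        if (preg_match('/\bvalue\s*=\s*["\']([^"\']*)["\']/i', $tag, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Hypothetical markup in the two shapes described in the script's comments.
$login_html = '<input type="hidden" name="formhash" value="24eca8af" />';
$post_html  = '<input type="hidden" name="formhash" id="formhash" value="9f3c1d2e" />';

var_dump(extract_formhash($login_html));
var_dump(extract_formhash($post_html));
```

With a helper like this, both `preg_match` calls could be replaced by the same function call, and a `null` result would take the place of the `empty($matches)` check.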

A simulated website login implemented with cURL looks like this:

<?php
$cookie_file = tempnam('./temp', 'cookie');
// cURL needs an absolute URL: prepend your forum's scheme and host to these paths
$login_url = '/bbs/logging.php?action=login&loginsubmit=yes';
// USERNAME and PASSWORD are placeholders; the formhash shown is a sample value
$post_fields = 'username=USERNAME&password=PASSWORD&referer=index.php&formhash=24eca8af&loginfield=username&questionid=0&loginsubmit=登录';
// Log in and save the cookies
$ch = curl_init($login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_exec($ch);
curl_close($ch);
// Fetch a page as the logged-in user by sending the saved cookies
$url = '/bbs';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
echo $contents;
curl_close($ch);
?>
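The login string in this example is written by hand, so any value containing `&`, `=` or non-ASCII characters would have to be URL-encoded manually. A sketch of building an equivalent body with `http_build_query()`, which encodes each value, is given here. When `CURLOPT_POSTFIELDS` receives a string, cURL sends it as `application/x-www-form-urlencoded`; when it receives an array, it sends `multipart/form-data`. The field names follow the example above, while the credentials and the formhash are placeholder values.

```php
<?php
// Field names follow the example above; all values are placeholders.
$fields = [
    'username'    => 'tianxin',
    'password'    => 'p@ss&word=1', // hypothetical password containing reserved characters
    'referer'     => 'index.php',
    'formhash'    => '24eca8af',    // sample value; read the real one from the login page
    'loginfield'  => 'username',
    'questionid'  => 0,
    'loginsubmit' => 'true',
];

// Pass '&' explicitly so the result does not depend on the arg_separator.output setting.
$post_body = http_build_query($fields, '', '&');

echo $post_body, "\n";

// The string can then be handed to cURL in place of the hand-written one:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $post_body);
```

Keeping the fields in an array until the last moment also makes it easy to insert a freshly scraped formhash before the body is encoded.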

