Scraping asynchronously loaded (AJAX) content with PHP curl
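Scraping a page whose content is loaded via AJAX is not much different from scraping an ordinary page. AJAX only makes an extra asynchronous HTTP request. Use a tool such as Firebug to find the backend service URL that the page calls and the parameters it sends. Then send a request with those parameters to that URL yourself and scrape the response.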
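The idea above can be wrapped in a reusable helper. This is a minimal sketch that assumes the endpoint accepts an ordinary form-encoded POST (`application/x-www-form-urlencoded`). The names `encode_ajax_params` and `fetch_ajax` are hypothetical. The `$params` array stands for whatever you copy out of Firebug's request view.

```php
<?php
// Hypothetical helper: turn the parameters seen in Firebug into a
// form-encoded POST body (spaces become '+', unsafe characters become %XX).
function encode_ajax_params(array $params): string
{
    return http_build_query($params, '', '&');
}

// Hypothetical helper: POST the parameters straight to the AJAX endpoint
// and return the raw response body. If $cookieFile is given, cookies are
// read from it, and updated cookies are written back when the handle is freed.
function fetch_ajax(string $url, array $params, ?string $cookieFile = null): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => encode_ajax_params($params),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_ENCODING       => '',   // accept every encoding curl supports and decode it
    ]);
    if ($cookieFile !== null) {
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    }
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("AJAX request to $url failed: $error");
    }
    curl_close($ch);
    return $body;
}
```

`http_build_query` percent-encodes characters such as `:` and `/`. The DWR call in the example that follows posts its captured string unencoded, so for that endpoint, sending the string verbatim is the safer choice.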
```php
<?php
// Step 1: request the normal page first so the server issues its session cookie.
// Temporary file that will hold the cookies.
$cookie_file = tempnam(sys_get_temp_dir(), 'cookie');

$ch = curl_init();
$url1 = "http://www.cdut.edu.cn/default.html";
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');        // accept and decode gzip-compressed responses
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // file the cookies are saved to when the handle is closed
$content = curl_exec($ch);
curl_close($ch);
// Since PHP 8.0 curl_close() has no effect; freeing the handle is what
// makes libcurl write the cookie jar to disk.
unset($ch);

// Step 2: call the AJAX (DWR) endpoint the page uses, sending the saved cookies.
$ch3 = curl_init();
$url3 = "http://www.cdut.edu.cn/xww/dwr/call/plaincall/portalAjax.getNewsXml.dwr";
// Parameters copied from the request captured in Firebug. The session IDs belong
// to that browser session and will likely need to be replaced once it expires.
$curlPost = "callCount=1&page=/xww/type/1000020118.html&httpSessionId=12A9B726E6A2D4D3B09DE7952B2F282C&scriptSessionId=295315B4B4141B09DA888D3A3ADB8FAA658&c0-scriptName=portalAjax&c0-methodName=getNewsXml&c0-id=0&c0-param0=string:10000201&c0-param1=string:1000020118&c0-param2=string:news_&c0-param3=number:5969&c0-param4=number:1&c0-param5=null:null&c0-param6=null:null&batchId=0";
curl_setopt($ch3, CURLOPT_URL, $url3);
curl_setopt($ch3, CURLOPT_POST, 1);
curl_setopt($ch3, CURLOPT_POSTFIELDS, $curlPost);
curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1);       // return the response as a string instead of printing it
curl_setopt($ch3, CURLOPT_COOKIEFILE, $cookie_file); // send the cookies saved by step 1
$content1 = curl_exec($ch3);
curl_close($ch3);
```
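The `httpSessionId` in the POST body was copied from one browser session, so it stops working once that session expires. You could instead read the current value from the cookie jar that the first request writes. This sketch assumes the site runs on a Java servlet container whose session cookie is named `JSESSIONID`. It also assumes DWR fills `httpSessionId` from that cookie, which is what its browser-side script normally does. `read_cookie` is a hypothetical helper. It parses the Netscape cookie-file format that libcurl writes: tab-separated domain, subdomain flag, path, secure flag, expiry, name, value.

```php
<?php
// Hypothetical helper: return the value of cookie $name from a cookie jar
// written by libcurl (Netscape format), or null if it is not present.
function read_cookie(string $jarFile, string $name): ?string
{
    $lines = file($jarFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return null;
    }
    foreach ($lines as $line) {
        // HttpOnly cookies are written with a "#HttpOnly_" prefix on the domain;
        // every other line starting with '#' is a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue;
        }
        // Fields: domain, include-subdomains, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) >= 7 && $fields[5] === $name) {
            return $fields[6];
        }
    }
    return null;
}
```

Once the first handle has been freed and the jar is on disk, the hard-coded value in `$curlPost` could be replaced with `read_cookie($cookie_file, 'JSESSIONID')`. The `scriptSessionId` is generated by DWR's script rather than stored in a cookie, so this helper does not cover it.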
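If the endpoint rejects or ignores the request, try forging request headers such as Host, Referer and User-Agent so that the request looks like it came from a browser.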
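Forging those headers can be sketched as follows, assuming you copy the values from the request the browser actually sent (Firebug shows them). `browser_like_headers` is a hypothetical name, and the User-Agent string in the test is only a placeholder.

```php
<?php
// Hypothetical helper: build header lines for CURLOPT_HTTPHEADER so the
// request looks like the one the browser sends from the page.
function browser_like_headers(string $host, string $referer, string $userAgent): array
{
    return [
        'Host: ' . $host,          // curl derives Host from the URL; override it mainly when requesting by IP
        'Referer: ' . $referer,    // the page that normally issues the AJAX call
        'User-Agent: ' . $userAgent,
    ];
}
```

For example, `curl_setopt($ch3, CURLOPT_HTTPHEADER, browser_like_headers('www.cdut.edu.cn', 'http://www.cdut.edu.cn/xww/type/1000020118.html', $ua));` would attach them to the second request. curl also has dedicated `CURLOPT_REFERER` and `CURLOPT_USERAGENT` options for the last two headers.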
I just wrote this; I hope it helps.