
PHP fsockopen/curl: how to get the page source after the target redirects

WBOY (Original)
2016-06-13 13:23:26 · 725 views


PHP code
<?php
// Target URL: taken from ?id=... when present, otherwise the default site.
$ghurl = isset($_GET['id']) ? $_GET['id'] : 'http://3gabc.com/';

// Fetch a page with cURL.
function getContents($url) {
    $header = array("Referer: http://3gabc.com/");
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // whether to fetch the page after an HTTP redirect
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it (replaces output buffering)
    $contents = curl_exec($ch);
    curl_close($ch);

    return $contents;
}

$contents = getContents($ghurl);
echo $contents;
?>



This fails...

PHP code
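<?php
// Fetch a page over a raw socket and follow HTTP 301/302 redirects.
function get_page_content($url, $maxRedirects = 5) {
    // eregi_replace() was removed in PHP 7; preg_replace() with the i flag does the same job here.
    $url  = preg_replace('#^http://#i', '', $url);
    $temp = explode('/', $url);
    $host = array_shift($temp);
    $path = '/' . implode('/', $temp);
    $temp = explode(':', $host);
    $host = $temp[0];
    $port = isset($temp[1]) ? (int) $temp[1] : 80;

    // No "&" before $errno/$errstr: call-time pass-by-reference is a fatal error since PHP 5.4.
    $fp = @fsockopen($host, $port, $errno, $errstr, 30);
    if (!$fp) {
        return false;
    }
    // HTTP/1.0 avoids a chunked response, which this function does not decode.
    fputs($fp, "GET " . $path . " HTTP/1.0\r\nHost: " . $host . "\r\nAccept: */*\r\nReferer: http://" . $url . "\r\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)\r\nConnection: Close\r\n\r\n");

    $Content = '';
    while (($str = fread($fp, 4096)) !== false && $str !== '') {
        $Content .= $str;
    }
    fclose($fp);

    // Split the raw response into headers and body.
    $parts   = explode("\r\n\r\n", $Content, 2);
    $headers = $parts[0];
    $body    = isset($parts[1]) ? $parts[1] : '';

    // Redirect: follow the Location header (an absolute http:// URL or a path on the same host).
    if ($maxRedirects > 0
        && preg_match('/^HTTP\/\d\.\d 30[12]\b/i', $headers)
        && preg_match('/^Location:\s*(\S+)/im', $headers, $murl)) {
        $location = $murl[1];
        if (preg_match('#^https://#i', $location)) {
            return false; // this plain-socket code does not handle TLS
        }
        if (!preg_match('#^http://#i', $location)) {
            $location = $host . ':' . $port . '/' . ltrim($location, '/');
        }
        return get_page_content($location, $maxRedirects - 1);
    }

    // Read the content: on 200 return only the body, otherwise the raw response.
    if (preg_match('/^HTTP\/\d\.\d 200\b/i', $headers)) {
        return $body;
    }
    return $Content;
}

echo get_page_content('3gabc.com');

?>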



This fails too...

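I tried fsockopen, curl, and other approaches in turn, reading the response headers and acting on them, but every attempt failed. Any advice would be appreciated.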

------Solution--------------------


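The page at 3gabc.com consists of just one line, a meta refresh tag.
That one line is all you can fetch: a meta refresh is executed by the browser, not by fsockopen or curl!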
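
The distinction between the two kinds of redirect can be made concrete. The following is only a minimal sketch, assuming the response has already been split into a status code, a header block, and a body. `classify_redirect` is a hypothetical helper name, and the sample inputs are illustrative, not the actual 3gabc.com response.

```php
<?php
// Hypothetical helper: report which kind of redirect a response carries.
// Returns 'http' for a 3xx status with a Location header (the kind that
// CURLOPT_FOLLOWLOCATION follows), 'meta' for an HTML meta refresh
// (only a browser acts on it), or 'none'.
function classify_redirect($statusCode, $headers, $body) {
    if ($statusCode >= 300 && $statusCode < 400
        && preg_match('/^Location:/im', $headers)) {
        return 'http';
    }
    // Look for <meta http-equiv="refresh" ...> anywhere in the markup.
    if (preg_match('/<meta[^>]+http-equiv\s*=\s*["\']?refresh/i', $body)) {
        return 'meta';
    }
    return 'none';
}

// Illustrative inputs only.
echo classify_redirect(301, "Location: http://example.com/", ''), "\n"; // prints "http"
echo classify_redirect(200, "Content-Type: text/html",
    '<meta http-equiv="refresh" content="0;url=http://example.com/">'), "\n"; // prints "meta"
```

A response classified as 'meta' arrives with status 200, which is why checking the headers alone never reveals the redirect.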

------Solution--------------------
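Because the page at 3gabc.com consists of just that one line.

Once you fetch the page and run "echo $contents;", the browser receives that line and is naturally redirected to http://www.3Gabc.com.
So don't echo $contents; instead, use a regex, "preg_match("//is",$content, $matches)",
to pull out the redirect address, then curl that address, and you will get the content you want.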
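
The approach described above could look roughly like the sketch below. It is not the answerer's exact code: the function names `extract_meta_refresh_url` and `fetch_following_meta_refresh` are hypothetical, and so is the regex. The pattern assumes the `http-equiv` attribute comes before `content` and that the refresh target is an absolute URL.

```php
<?php
// Hypothetical helper: pull the target URL out of a meta refresh tag.
// Returns null when the HTML contains no such tag.
function extract_meta_refresh_url($html) {
    $pattern = '/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']?\s*\d+\s*;\s*url\s*=\s*[\'"]?([^\'"\s>]+)/i';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Fetch $url with cURL, then keep following meta refresh tags by hand.
// $maxHops guards against pages that refresh to each other forever.
function fetch_following_meta_refresh($url, $maxHops = 5) {
    $html = false;
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // still follow HTTP 3xx redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        $html = curl_exec($ch);
        if ($html === false) {
            return false; // network or protocol error
        }
        $next = extract_meta_refresh_url($html);
        if ($next === null) {
            return $html; // no meta refresh left: this is the final page
        }
        $url = $next; // assumes the refresh target is an absolute URL
    }
    return $html;
}
```

Calling `fetch_following_meta_refresh('http://3gabc.com/')` would then return the markup of the page the meta refresh points to, provided the target is reachable. For markup that varies more than this regex allows, parsing with DOMDocument would be a sturdier choice.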