
Fetching Images from Hotlink-Protected Web Photo Albums in PHP

Author: 墨辰丷 (original post)
Published: 2018-06-09 11:02:30

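This article shows how to use PHP to fetch images from web photo albums that have hotlink protection enabled. It may be a useful reference if you are working on something similar.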

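The example is a PHP script that works around the Referer-based hotlink protection used by web photo albums. The details are as follows: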

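Hotlink-protection bypass script for web photo albums (PHP version). According to its author, the script handles the albums that were popular when it was written, including Baidu Album, NetEase Album, and 360 Woxihuan, and its referer whitelist can also serve as simple hotlink protection for your own site. Because the class first downloads the remote image and then sends it on to the client, every image is transferred twice, which uses extra bandwidth on your hosting account. A caching feature is planned to reduce that traffic.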
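The caching feature mentioned above is not part of the original script. Purely as an illustration, a file-based cache could look roughly like the sketch below. The function name `cachedFetch`, the cache directory, and the one-day lifetime are all assumptions made for this example, and `$fetcher` stands for whatever code performs the remote request.

```php
<?php
/**
 * Return the image bytes for $url, looking in a file cache first.
 *
 * Hypothetical helper: $fetcher is any callable that takes a URL and
 * returns the raw bytes, or false on failure.
 */
function cachedFetch(string $url, callable $fetcher, string $cacheDir, int $ttl = 86400)
{
    // Hash the URL so that it can be used safely as a file name.
    $file = rtrim($cacheDir, '/\\') . DIRECTORY_SEPARATOR . sha1($url) . '.cache';

    // Serve from the cache while the entry is younger than $ttl seconds.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }

    $bytes = $fetcher($url);
    if ($bytes === false || $bytes === '') {
        return false; // failures are not cached
    }

    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0775, true);
    }
    // LOCK_EX avoids interleaved writes from concurrent requests.
    file_put_contents($file, $bytes, LOCK_EX);

    return $bytes;
}
```

With a helper of this kind, only the first request for a given image reaches the remote album; later requests within the lifetime are answered from disk, so the image is transferred once per visitor instead of twice.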

<?php  
/**   
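 * Hotlink-protection bypass script for web photo albums - PHP version
 *
 * Usage:
 *
 *   http://yourdomain/url.php?url=http://hiphotos.baidu.com/verdana/pic/item/baidupicture.jpg&referer=
 *   "url" is the URL of the image to fetch. "referer" exists for albums that only serve
 *   images when no Referer is sent (for example 360 Woxihuan): pass referer=1 in that case
 *   to send an empty Referer. If it is left empty, the image's own scheme and host are used.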
 *   
 * @author 雪狐博客   
 * @version 1.0   
 * @since  July 16, 2012  
 * @URL http://www.xuehuwang.com   
 */
class Frivoller   
{   
  /**
   * HTTP version (1.0 or 1.1); Baidu uses version 1.1
   *
   * @var string
   */
  protected $version;
  /**
   * Data returned by the HTTP request
   *
   * @var string
   */
  protected $body;
  /**
   * Remote URL to fetch
   *
   * @var string
   */
  protected $link;
  /**   
   * An array containing the various components of the URL.
   *   
   * @var array   
   */
  protected $components;   
  /**   
   * Value of the Host header sent with the HTTP request
   *
   * @var string
   */
  protected $host;   
  /**   
   * The path of required file.   
   * (e.g. '/verdana/abpic/item/mygirl.png')
   *   
   * @var string   
   */
  protected $path;   
  /**   
   * The HTTP referer; derived from the original URL unless one is supplied
   *   
   * @var string   
   */
  protected $referer;   
  /**   
   * The HTTP method, 'GET' by default
   *
   * @var string
   */
  protected $method  = 'GET';
  /**   
   * The HTTP port, 80 for default   
   *   
   * @var int   
   */
  protected $port   = 80;   
  /**   
   * Timeout period on a stream   
   *   
   * @var int   
   */
  protected $timeout = 100;   
  /**   
   * The filename of image   
   *   
   * @var string   
   */
  protected $filename;   
  /**   
   * The ContentType of image file.   
   * image/jpeg, image/gif, image/png, image/bmp
   *   
   * @var string   
   */
  protected $contentType;   
  /**   
   * Frivoller constructor   
   *   
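   * @param string $link    URL of the remote image
   * @param string $referer Referer to send; '' derives it from $link, '1' sends an empty one
   */
  public function __construct($link,$referer='')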
  {   
    $this->referer = $referer;  
    // parse the http link   
    $this->parseLink($link);   
    // begin to fetch the image   
    $stream = pfsockopen($this->host, $this->port, $errno, $errstr, $this->timeout);
    if (!$stream){
      // the socket could not be opened: fall back to cURL
      header("Content-Type: $this->contentType");
      echo $this->CurlGet($link);
    }else{
      fwrite($stream, $this->buildHeaders());
      $this->body = "";
      // read the raw response in fixed-size chunks until the server closes the connection
      // (the original sized each read from a second get_headers() request, which is
      // sent without the Referer and does not reliably return a Content-Length)
      while (!feof($stream)) {
        $chunk = fread($stream, 8192);
        if ($chunk === false) {
          break;
        }
        $this->body .= $chunk;
      }
      fclose($stream);
      // split the response headers from the image data
      $content = explode("\r\n\r\n", $this->body, 2);
      $this->body = isset($content[1]) ? $content[1] : '';
      // send the 'Content-Type' header so that the file can be saved correctly
      // without it, IE7 fails with error 800700de when trying to save the image
      // Flock and Firefox do not have this problem; Opera was not tested
      header("Content-Type: $this->contentType");
      header("Cache-Control: max-age=315360000");
      echo $this->body;
      // save the image
      //file_put_contents('hello.jpg', $this->body);
    }
  }   
  /**   
   * Compose HTTP request header   
   *   
   * @return string   
   */
  private function buildHeaders()   
  {   
    $request = "$this->method $this->path HTTP/1.1\r\n";   
    $request .= "Host: $this->host\r\n";   
    // no Accept-Encoding header is sent: the body is echoed as received,
    // so a gzip-compressed reply would reach the browser undecoded
    $request .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; zh-CN; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1\r\n";
    $request .= "Content-Type: image/jpeg\r\n";   
    $request .= "Accept: */*\r\n";   
    $request .= "Keep-Alive: 300\r\n";   
    $request .= "Referer: $this->referer\r\n";   
    $request .= "Cache-Control: max-age=315360000\r\n";   
    $request .= "Connection: close\r\n\r\n";   
    return $request;   
  }   
  /**   
   * Strip initial header and filesize info   
   */   
  private function extractBody(&$body)   
  {     
    // The status of link   
    if(strpos($body, '200 OK') > 0) {
      // strip header   
      $endpos = strpos($body, "\r\n\r\n");   
      $body = substr($body, $endpos + 4);   
      // strip filesize at nextline   
      $body = substr($body, strpos($body, "\r\n") + 2);   
    }       
  }   
  /**   
   * Parse the HTTP URL into host, path, referer, filename and content type
   *   
   * @param $link   
   */
  private function parseLink($link)   
  {   
    $this->link     = $link;   
    $this->components  = parse_url($this->link);   
    $this->host     = $this->components['host'];
    $this->path     = $this->components['path'];
    if(empty($this->referer)){
      // no referer given: use the image's own site as the referer
      $this->referer   = $this->components['scheme'] . '://' . $this->components['host'];
    }elseif($this->referer == '1'){
      // '1' means: send an empty Referer
      $this->referer   = '';
    }
    $this->filename   = basename($this->path);
    // extract the content type (the extension is compared case-insensitively)
    $ext = strtolower(substr(strrchr($this->path, '.'), 1));
    if ($ext == 'jpg' or $ext == 'jpeg') {
      $this->contentType = 'image/jpeg';
    }
    elseif ($ext == 'gif') {
      $this->contentType = 'image/gif';
    }
    elseif ($ext == 'png') {
      $this->contentType = 'image/png';
    }
    elseif ($ext == 'bmp') {
      $this->contentType = 'image/bmp';
    }
    else {
      $this->contentType = 'application/octet-stream';
    }
  }   
  // fetch the content of a URL with cURL
  function CurlGet($url){
    // undo HTML escaping of ampersands in the query string
    $url = str_replace('&amp;','&',$url);
    $curl = curl_init();   
    curl_setopt($curl, CURLOPT_URL, $url);   
    curl_setopt($curl, CURLOPT_HEADER, false);   
    curl_setopt($curl, CURLOPT_REFERER,$url);   
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; SeaPort/1.2; Windows NT 5.1; SV1; InfoPath.2)");   
    curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie.txt');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);   
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 0);   
    $values = curl_exec($curl);   
    curl_close($curl);   
    return $values;   
  }   
}   
/**  
 * Get the root domain
 *  
 * @author   lonely  
 * @create    2011-3-11  
 * @version  0.11  
 * @lastupdate lonely  
 * @package Sl  
*/
class RootDomain{  
  private static $self;
  private $url=null;
  private $domain=null;
  private $host=null;
  private $state_domain;
  private $top_domain;
  /**  
   * Get the shared domain-parser instance
   */
  public static function instace(){  
    if(!self::$self)  
      self::$self=new self();  
    return self::$self;  
  }  
  public function __construct(){  
    $this->state_domain=array(
      'al','dz','af','ar','ae','aw','om','az','eg','et','ie','ee','ad','ao','ai','ag','at','au','mo','bb',
      'pg','bs','pk','py','ps','bh','pa','br','by','bm','bg','mp','bj','be','is','pr','ba','pl','bo','bz',
      'bw','bt','bf','bi','bv','kp','gq','dk','de','tl','tp','tg','dm','do','ru','ec','er','fr','fo','pf',
      'gf','tf','va','ph','fj','fi','cv','fk','gm','cg','cd','co','cr','gg','gd','gl','ge','cu','gp','gu',
      'gy','kz','ht','kr','nl','an','hm','hn','ki','dj','kg','gn','gw','ca','gh','ga','kh','cz','zw','cm',
      'qa','ky','km','ci','kw','cc','hr','ke','ck','lv','ls','la','lb','lt','lr','ly','li','re','lu','rw',
      'ro','mg','im','mv','mt','mw','my','ml','mk','mh','mq','yt','mu','mr','us','um','as','vi','mn','ms',
      'bd','pe','fm','mm','md','ma','mc','mz','mx','nr','np','ni','ne','ng','nu','no','nf','na','za','aq',
      'gs','eu','pw','pn','pt','jp','se','ch','sv','ws','yu','sl','sn','cy','sc','sa','cx','st','sh','kn',
      'lc','sm','pm','vc','lk','sk','si','sj','sz','sd','sr','sb','so','tj','tw','th','tz','to','tc','tt',
      'tn','tv','tr','tm','tk','wf','vu','gt','ve','bn','ug','ua','uy','uz','es','eh','gr','hk','sg','nc',
      'nz','hu','sy','jm','am','ac','ye','iq','ir','il','it','in','id','uk','vg','io','jo','vn','zm','je',
      'td','gi','cl','cf','cn','yr'
    );
    $this->top_domain=array('com','arpa','edu','gov','int','mil','net','org','biz','info','pro','name','museum','coop','aero','xxx','idv','me','mobi');
    $this->url=isset($_SERVER['HTTP_HOST'])?$_SERVER['HTTP_HOST']:null;
  }  
  /**  
   * Set the URL to analyse (defaults to the current HTTP host)
   * @param string $url  
   */
  public function setUrl($url=null){  
    $url=$url?$url:$this->url;  
    if(empty($url))return $this;  
    // accept both http:// and https:// URLs; anything else is treated as a bare host
    if(!preg_match("/^https?:/i", $url))
      $url="http://".$url;
    $url=parse_url(strtolower($url));
    $urlarr=explode(".", $url['host']);
    $count=count($urlarr);
    if ($count<=2){
      $this->domain=$url['host'];
    }else if ($count>2){
      $last=array_pop($urlarr);
      $last_1=array_pop($urlarr);
      if(in_array($last, $this->top_domain)){
        // e.g. www.example.com gives example.com
        $this->domain=$last_1.'.'.$last;
        $this->host=implode('.', $urlarr);
      }else if (in_array($last, $this->state_domain)){
        $last_2=array_pop($urlarr);
        if(in_array($last_1, $this->top_domain)){
          // e.g. www.example.com.cn gives example.com.cn
          $this->domain=$last_2.'.'.$last_1.'.'.$last;
          $this->host=implode('.', $urlarr);
        }else{
          // e.g. www.example.cn gives example.cn; $last_2 belongs to the host part
          $urlarr[]=$last_2;
          $this->host=implode('.', $urlarr);
          $this->domain=$last_1.'.'.$last;
        }
      }
    }
    return $this;  
  }  
  /**  
   * Get the root domain
   */
  public function getDomain(){  
    return $this->domain;  
  }  
  /**  
   * Get the host (subdomain) part
   */
  public function getHost(){  
    return $this->host;  
  }  
}  
// root domains of the sites that are allowed to load images through this script
$referer = array('xuehuwang.com','zangbala.cn','qianzhebaikou.net','sinaapp.com','163.com','sina.com.cn','weibo.com','abc.com');
// Get the url, maybe you should check the given url
if (isset($_GET['url']) and $_GET['url'] != '') {
  // get the referring page, if any
  $site = (isset($_SERVER['HTTP_REFERER']) && !empty($_SERVER['HTTP_REFERER'])) ? $_SERVER['HTTP_REFERER'] : '';
  // check that the URL looks like an image link
  if(preg_match('/(http|https|ftp|rtsp|mms):(\/\/|\\\\){1}((\w)+[.]){1,}([a-zA-Z]|[0-9]{1,3})(\S*\/)((\S)+[.]{1}(gif|jpg|png|bmp))/i',$_GET['url'])){
    if(!empty($site)){
      $tempu = parse_url($site);
      $host = $tempu['host'];
      $root = new RootDomain();
      $root->setUrl($site);
      // only serve requests that come from a whitelisted root domain
      if(in_array($root->getDomain(),$referer)){
        $img_referer = (isset($_GET['referer']) && !empty($_GET['referer']))? trim($_GET['referer']) : '';
        new Frivoller($_GET['url'],$img_referer);
      }
    }else{
      // no Referer was sent (for example, the image was opened directly)
      $img_referer = (isset($_GET['referer']) && !empty($_GET['referer']))? trim($_GET['referer']) : '';
      new Frivoller($_GET['url'],$img_referer);
    }
  }
}
?>
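
The class above builds its HTTP request by hand over a socket. The same idea (choose a Referer, fetch the image, label it with a content type) can be sketched more compactly with cURL. This is not the original author's code: `pickReferer`, `sniffImageType` and `fetchImage` are hypothetical helper names, the 30-second timeout is an arbitrary choice, and detecting the type from the first bytes rather than from the file extension is a substitution made for this example.

```php
<?php
/**
 * Choose the Referer following the rules of the script's "referer" parameter:
 * ''  gives scheme://host of the image, '1' gives an empty Referer,
 * and any other value is used as given.
 */
function pickReferer(string $imageUrl, string $refererParam = ''): string
{
    if ($refererParam === '1') {
        return '';
    }
    if ($refererParam !== '') {
        return $refererParam;
    }
    $parts = parse_url($imageUrl);
    if (!is_array($parts) || !isset($parts['scheme'], $parts['host'])) {
        return '';
    }
    return $parts['scheme'] . '://' . $parts['host'];
}

/** Guess the MIME type from the leading "magic" bytes of the data. */
function sniffImageType(string $bytes): string
{
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'image/jpeg';
    }
    if (strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'image/png';
    }
    if (strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0) {
        return 'image/gif';
    }
    if (strncmp($bytes, 'BM', 2) === 0) {
        return 'image/bmp';
    }
    return 'application/octet-stream';
}

/** Fetch the image with cURL; returns the raw bytes, or false on failure. */
function fetchImage(string $imageUrl, string $refererParam = '')
{
    $curl = curl_init($imageUrl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_REFERER, pickReferer($imageUrl, $refererParam));
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);
    return curl_exec($curl);
}
```

In a script like the one above, the bytes returned by `fetchImage()` would be echoed after sending `header('Content-Type: ' . sniffImageType($bytes))`. Sniffing the content has the advantage that a mislabeled file extension, or an error page returned in place of the image, does not end up with an image content type.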

Summary: that is everything for this article. I hope it helps with your studies.
