
PHP code for logging search-engine spider visits to static pages

WBOY, 2016-07-25 09:02:54
bot.php is the target of the rewrite rules below. It records each visit by a known search-engine spider in bot.txt, then serves the requested static .html file, or returns a 404 if the file does not exist.

```php
<?php
// bot.php: log visits from known search-engine spiders, then serve the requested .html page.

// ISAPI_Rewrite passes the original (pre-rewrite) URL in HTTP_X_REWRITE_URL;
// fall back to REQUEST_URI on servers that do not set it.
$requestUri = $_SERVER['HTTP_X_REWRITE_URL'] ?? ($_SERVER['REQUEST_URI'] ?? '/');
$useragent  = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');

// Token => label. The first match wins, so specific tokens must come before
// generic ones ('msnbot' before 'msn'; the catch-all 'bot' last).
// Tokens are lowercase because the user agent is lowercased above.
$bots = [
    'googlebot'            => 'Google',
    'mediapartners-google' => 'Google Adsense',
    'baiduspider'          => 'Baidu',
    'sogou spider'         => 'Sogou',
    'sogou web'            => 'Sogou web',
    'sosospider'           => 'SOSO',
    'yahoo'                => 'Yahoo',
    'msnbot'               => 'msnbot',
    'msn'                  => 'MSN',
    'sohu'                 => 'Sohu',
    'yodaobot'             => 'Yodao',
    'twiceler'             => 'Twiceler',
    'ia_archiver'          => 'Alexa_',
    'iaarchiver'           => 'Alexa',
    'slurp'                => 'Yahoo',
    'bot'                  => 'Other spider',
];

$bot = null;
foreach ($bots as $token => $name) {
    if (strpos($useragent, $token) !== false) {
        $bot = $name;
        break;
    }
}

if ($bot !== null) {
    // One tab-separated record per visit: time, client IP, spider, full URL.
    $record = date('Y-m-d H:i:s') . "\t" . ($_SERVER['REMOTE_ADDR'] ?? '') . "\t" . $bot . "\t"
            . 'http://' . ($_SERVER['SERVER_NAME'] ?? '') . $requestUri . "\r\n";
    // '@' keeps a logging failure from breaking the page itself.
    @file_put_contents(__DIR__ . '/bot.txt', $record, FILE_APPEND | LOCK_EX);
}

// Serve only existing .html files inside this directory (blocks "../" traversal).
$root = realpath(__DIR__);
$path = (string) parse_url($requestUri, PHP_URL_PATH);
$file = realpath(__DIR__ . $path);

if (substr($path, -5) === '.html' && $file !== false
    && strpos($file, $root . DIRECTORY_SEPARATOR) === 0) {
    readfile($file);
} else {
    http_response_code(404);
    echo 'Page not found';
}
```
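To check the matching order in isolation, the spider lookup can be pulled into a small function. This is a sketch: `detectSpider` is a hypothetical name, and the token list is only a subset of the user-agent tokens the script above checks. Because the first matching token wins, a specific token such as `msnbot` must be tested before a shorter token it contains (`msn`); otherwise its label can never be returned.

```php
<?php
// Hypothetical helper: returns the label of the first token found in the
// (lowercased) user agent, or null if none matches. Tokens are tested in
// array order, so specific tokens must precede generic ones.
function detectSpider(string $userAgent, array $bots): ?string
{
    $ua = strtolower($userAgent);
    foreach ($bots as $token => $name) {
        if (strpos($ua, $token) !== false) {
            return $name;
        }
    }
    return null;
}

// A subset of the tokens checked by the script above.
$bots = [
    'googlebot'   => 'Google',
    'baiduspider' => 'Baidu',
    'msnbot'      => 'msnbot',
    'msn'         => 'MSN',
    'bot'         => 'Other spider',
];

echo detectSpider('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $bots), "\n"; // Google
echo detectSpider('msnbot/2.0b (+http://search.msn.com/msnbot.htm)', $bots), "\n";                        // msnbot
var_dump(detectSpider('Mozilla/5.0 (Windows NT 10.0; Win64; x64)', $bots));                             // NULL
```

Swapping the `msnbot` and `msn` entries would make the second call return `MSN`, which is exactly how a more generic token shadows a specific one.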


ISAPI_Rewrite rules (httpd.ini) that map the pseudo-static URLs to the scripts:

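```ini
[ISAPI_Rewrite]

# 3600 = 1 hour
CacheClockRate 3600
RepeatLimit 32

# Protect httpd.ini and httpd.parse.errors files
# from accessing through HTTP
RewriteRule ^/httpd(?:\.ini|\.parse\.errors).* [F,I,O]

# The static home page is served by index.php
RewriteRule /index.html /index.php

# Article and list pages go through bot.php, which logs spiders and then serves the .html file
RewriteRule ^/article/(.*) /bot.php [L]
RewriteRule ^/list/(.*) /bot.php [L]
```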
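For reasoning about which request paths reach bot.php, the rules above can be roughly emulated with `preg_match`. This is a sketch under stated assumptions, not ISAPI_Rewrite's actual engine: `rewriteTarget` is a hypothetical helper, each pattern is assumed to have to match the whole URL (emulated with `\z`), the first matching rule is assumed to win, and flags, query strings and case-insensitivity are not modelled.

```php
<?php
// Hypothetical, simplified emulation of the routing rules above.
// Assumptions: whole-URL matching (anchored with ^ and \z) and first match wins;
// rule flags such as [L] and query-string handling are not modelled.
function rewriteTarget(string $url): string
{
    $rules = [
        '#^/index\.html\z#'  => '/index.php',
        '#^/article/(.*)\z#' => '/bot.php',
        '#^/list/(.*)\z#'    => '/bot.php',
    ];
    foreach ($rules as $pattern => $target) {
        if (preg_match($pattern, $url) === 1) {
            return $target;
        }
    }
    return $url; // no rule matched: the URL is served as-is
}

echo rewriteTarget('/article/2016/1.html'), "\n"; // /bot.php
echo rewriteTarget('/about.html'), "\n";          // /about.html
```

Under these assumptions, only URLs under `/article/` and `/list/` pass through bot.php, so spider visits to other pages are not logged.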


