
Using PHP to find 404 links crawled by search engines in the nginx access log

WBOY (original) · 2016-07-13 10:26:16 · 1058 views

I rotate the nginx logs on my server daily, so the 404 hits from the major search engine crawlers get recorded day by day. I used to analyze the logs only occasionally, but for anyone with a large volume of log data, filtering them by hand is no small task. After a bit of tinkering, I wrote a script that generates a txt file of the 404 requests made by spiders such as Google, Baidu, Soso, 360 Search, Easou, Sogou, and Bing. The script, test.php, is below.

The code is as follows:

<?php
// Visit as test.php?s=google
$domain  = 'http://www.jb51.net';
$spiders = array(
    'baidu'  => 'Baiduspider',
    '360'    => '360Spider',
    'google' => 'Googlebot',
    'soso'   => 'Sosospider',
    'sogou'  => 'Sogou web spider',
    'easou'  => 'EasouSpider',
    'bing'   => 'bingbot'
);

// Yesterday's rotated log, e.g. /home/nginx/logs/2016/07/12/access_www.txt
// (strtotime('-1 day') avoids the month-boundary bug of date('d')-1;
//  'j' is the unpadded day of month -- use 'd' if your directories are zero-padded)
$yesterday = strtotime('-1 day');
$path = '/home/nginx/logs/'.date('Y/m/', $yesterday).date('j', $yesterday).'/access_www.txt';

$s = isset($_GET['s']) ? $_GET['s'] : '';

if (!array_key_exists($s, $spiders)) die();
$spider = $spiders[$s];

$file = $s.'_'.date('ym', $yesterday).date('j', $yesterday).'.txt';
if (!file_exists($file)) {
    $in = file_get_contents($path);
    // Use '#' as the delimiter so the slashes in "HTTP/1.1" need no escaping
    $pattern = '#GET (.*) HTTP/1\.1" 404.*'.$spider.'#';
    preg_match_all($pattern, $in, $matches);
    $out = '';
    foreach ($matches[1] as $v) {
        $out .= $domain.$v."\r\n";
    }
    file_put_contents($file, $out);
}

$url = $domain.'/silian/'.$file;
echo $url;
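To make the regex concrete, here is a minimal sketch of what it is expected to match against a single line of a default "combined"-format nginx access log. The log line itself is made up for illustration; only the `GET ... HTTP/1.1" 404` status and the spider's user-agent string matter to the pattern:

```php
<?php
// Hypothetical nginx access-log line (combined format) for a 404 hit by Baiduspider
$line = '1.2.3.4 - - [12/Jul/2016:10:26:16 +0800] "GET /old-page.html HTTP/1.1" 404 162 "-" '
      . '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"';

$spider  = 'Baiduspider';
$pattern = '#GET (.*) HTTP/1\.1" 404.*'.$spider.'#';

preg_match_all($pattern, $line, $matches);
// $matches[1] now holds the captured request path(s)
print_r($matches[1]);
```

Here `$matches[1][0]` comes out as `/old-page.html`, which the script then prefixes with `$domain` to build the full dead-link URL.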

That's it. Nothing advanced here, just a quick hand-written script.
