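Professional analytics services such as Baidu Tongji (Baidu Analytics), Google Analytics and CNZZ give webmasters the metrics they usually care about: UV, PV, time on site, IP count and so on. Because of network issues, I noticed that Google Analytics reports a few hundred more IPs than Baidu Tongji, so I wanted to write my own script to find out how much traffic the site really gets. Counting from the nginx access log, however, produces considerably higher numbers than the analytics dashboards, because many spider (crawler) visits get counted, and so do requests for static files. With a better algorithm, that noise could be filtered out entirely. Today I'll share the most basic statistic with everyone, partly also as a way to learn and review Python.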
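As a rough illustration of the filtering idea above, here is a minimal sketch that drops crawler and static-file requests before counting PV and distinct IPs. It assumes each request has already been reduced to an (ip, path, user_agent) tuple; the keyword and extension lists are hypothetical examples, not an exhaustive or authoritative set.

```python
from typing import Iterable, Tuple

# Hypothetical, non-exhaustive lists; tune them for your own site.
SPIDER_KEYWORDS = ("spider", "bot", "crawler", "slurp")
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".woff")


def is_spider(user_agent: str) -> bool:
    """Treat a request as a crawler if its User-Agent contains a known keyword."""
    ua = user_agent.lower()
    return any(keyword in ua for keyword in SPIDER_KEYWORDS)


def is_static(path: str) -> bool:
    """Treat a request as a static file if the path (minus query string) has a known extension."""
    return path.split("?", 1)[0].lower().endswith(STATIC_EXTENSIONS)


def count_pv_and_ips(requests: Iterable[Tuple[str, str, str]]) -> Tuple[int, int]:
    """Return (page views, distinct IPs) after dropping spider and static-file requests."""
    pv = 0
    ips = set()
    for ip, path, user_agent in requests:
        if is_spider(user_agent) or is_static(path):
            continue
        pv += 1
        ips.add(ip)
    return pv, len(ips)


if __name__ == "__main__":
    sample = [
        ("221.221.155.54", "/", "Mozilla/5.0 (Windows NT 6.1; WOW64) Chrome/31.0"),
        ("221.221.155.54", "/static/site.css", "Mozilla/5.0 (Windows NT 6.1; WOW64) Chrome/31.0"),
        ("123.125.71.1", "/", "Mozilla/5.0 (compatible; Baiduspider/2.0)"),
        ("10.0.0.2", "/about", "Mozilla/5.0 (Macintosh) Safari/537.36"),
    ]
    print(count_pv_and_ips(sample))  # (2, 2)
```

Keyword matching on the User-Agent is crude (a crawler can fake any UA, and a short keyword like "bot" can cause false positives), but it removes most well-behaved spiders such as Baiduspider and Googlebot.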
For example, the server's nginx access log contains entries like the following:
221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 200 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" "0.020"
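To get at individual fields (client IP, path, User-Agent) from a line like the one above, a regular expression with named groups works well. This is a sketch that assumes the line follows nginx's "combined" format plus two extra quoted fields; the names `forwarded_for` and `request_time` for those last two fields are guesses, so check the site's `log_format` directive before relying on them.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed layout: nginx "combined" format followed by two extra quoted fields.
# The last two group names are guesses (often $http_x_forwarded_for and $request_time).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<forwarded_for>[^"]*)" "(?P<request_time>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_unique_ips(lines: Iterable[str]) -> int:
    """Count distinct client IPs among the lines that parse successfully."""
    return len({fields["ip"] for fields in map(parse_line, lines) if fields})


if __name__ == "__main__":
    line = ('221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 200 8482 '
            '"http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
            '(KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" "0.020"')
    fields = parse_line(line)
    print(fields["ip"], fields["status"], fields["request_time"])  # 221.221.155.54 200 0.020
    print(count_unique_ips([line, line]))  # 1
```

Lines that don't match (for example malformed requests logged as "-") are simply skipped by `count_unique_ips`; for a real log file you would pass an open file object instead of a list.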
Script
stat_ip.py
#encoding=utf8
import re
# nginx access log to analyze
zuidaima_nginx_log_path="/usr/local/nginx/logs/www.zuidaima.com.access.log"