
Parsing Apache log files with PHP regular expressions

巴扎黑 (Original)
2016-11-09 13:28:51, 1914 views

You can rotate the Apache log hourly and parse each line with a PHP regular expression:

$logLine = '127.0.0.1 - - [22/May/2015:17:09:13 +0800] "GET /sale/images/y-select.png HTTP/1.1" 200 1095';
// Named groups capture each field; the size is "-" when no body was sent.
$pattern = '/^(?P<ip>[0-9.]+) - - \[(?P<time>[^\]]+)\] "GET (?P<url>[^ ]+) HTTP\/1\.[01]" (?P<status>[0-9]{3}) (?P<size>[0-9]+|-)/';
if (preg_match($pattern, $logLine, $match)) {
    //var_dump($match);
    $ip     = $match['ip'];
    $time   = strtotime($match['time']); // Unix timestamp
    $url    = $match['url'];
    $status = $match['status'];
    $size   = $match['size'];
    printf("IP:%s Time:%s URL:%s Status:%s Size:%s", $ip, $time, $url, $status, $size);
}
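
With hourly rotation, each log file stays small enough to scan line by line. The following is a minimal sketch of that idea, not a definitive implementation: it tallies responses per status code. The helper name countStatuses and the sample lines are hypothetical, and the pattern only recognizes GET requests with no identity or user ID.

```php
<?php
// Hypothetical helper: count responses per status code for a set of log lines.
function countStatuses(iterable $lines): array
{
    $pattern = '/^(?P<ip>[0-9.]+) - - \[(?P<time>[^\]]+)\] "GET (?P<url>[^ ]+) HTTP\/1\.[01]" (?P<status>[0-9]{3}) (?P<size>[0-9]+|-)/';
    $counts = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines the pattern does not cover (e.g. POST requests)
        }
        $counts[$m['status']] = ($counts[$m['status']] ?? 0) + 1;
    }
    return $counts;
}

// Made-up sample lines for illustration only.
$sample = [
    '127.0.0.1 - - [22/May/2015:17:09:13 +0800] "GET /sale/images/y-select.png HTTP/1.1" 200 1095',
    '127.0.0.1 - - [22/May/2015:17:09:14 +0800] "GET /missing.png HTTP/1.1" 404 -',
    '127.0.0.1 - - [22/May/2015:17:09:15 +0800] "GET /index.php HTTP/1.0" 200 512',
    'not a log line',
];
print_r(countStatuses($sample));
```

For a real file, the lines could come from something like file($path, FILE_IGNORE_NEW_LINES), where $path is the rotated log for the hour of interest.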

Another approach, which also handles the combined log format, is described below.

Use regular expressions to separate Apache log files

Source: www.MyException.Cn, shared on 2015-08-26

An example Apache log entry in common log format:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

An example Apache log entry in combined log format:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"



The fields of an entry, in order, are listed below. The first seven are shared by both formats; the last two appear only in the combined format.

The IP address of the client.
The RFC 1413 identity of the client as determined by identd. A "-" in the output means the information is not available.
The user ID (userid) determined by HTTP authentication for the request. If the page is not password protected, this field is "-".
The time of the request, in the form [day/month/year:hour:minute:second zone].
The request line sent by the client, in double quotes: the method, the requested resource, and the protocol.
The status code the server returned to the client.
The number of bytes returned to the client, excluding the response headers. If no content was returned, this field is "-".
The "Referer" request header.
The "User-Agent" request header.
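
Two of these fields usually need conversion before they are useful: the time and the byte count. The following is a rough sketch, assuming the time field follows the layout shown in the examples above. The helper name convertFields is hypothetical, and treating "-" as 0 bytes is an assumption of this sketch rather than anything Apache prescribes.

```php
<?php
// Hypothetical helper: turn the raw time and size fields into typed values.
function convertFields(string $rawTime, string $rawSize): array
{
    // 'd/M/Y:H:i:s O' matches e.g. "10/Oct/2000:13:55:36 -0700";
    // surrounding square brackets are stripped first if present.
    $time = DateTime::createFromFormat('d/M/Y:H:i:s O', trim($rawTime, '[]'));
    if ($time === false) {
        throw new InvalidArgumentException("Unrecognized time field: $rawTime");
    }
    // "-" means no body was sent; mapping it to 0 is an assumption of this sketch.
    $size = $rawSize === '-' ? 0 : (int) $rawSize;
    return ['time' => $time, 'size' => $size];
}

$fields = convertFields('[10/Oct/2000:13:55:36 -0700]', '-');
echo $fields['time']->format(DATE_ATOM), ' ', $fields['size'], "\n";
```

Unlike strtotime, DateTime::createFromFormat returns false on input that does not fit the stated format, so malformed entries can be rejected explicitly.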
The regular expression used to extract information consists of:

^: Matches the start of the line.
([0-9.]+)\s: Matches the IP address.
([\w.-]+)\s: Matches the identity, made up of letters, digits, underscores, dots or hyphens.
([\w.-]+)\s: Matches the userid, made up of letters, digits, underscores, dots or hyphens.
(\[[^\[\]]+\])\s: Matches the time, including the surrounding square brackets.
"((?:[^"\\]|\\.)+)"\s: Matches the request line. Backslash-escaped double quotes may appear inside the double quotes.
(\d{3})\s: Matches the status code.
(\d+|-)\s: Matches the number of response bytes, or "-".
"((?:[^"\\]|\\.)+)"\s: Matches the "Referer" request header. Backslash-escaped double quotes may appear inside the double quotes.
"((?:[^"\\]|\\.)+)": Matches the "User-Agent" request header. Backslash-escaped double quotes may appear inside the double quotes.
$: Matches the end of the line.
The final expression is as follows:

^([0-9.]+)\s([\w.-]+)\s([\w.-]+)\s(\[[^\[\]]+\])\s"((?:[^"\\]|\\.)+)"\s(\d{3})\s(\d+|-)\s"((?:[^"\\]|\\.)+)"\s"((?:[^"\\]|\\.)+)"$
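
As a hedged illustration, an expression of this shape could be applied in PHP as sketched below. The function name parseCombinedLine and the array keys are hypothetical. The pattern is written in a nowdoc so that its backslashes need no extra escaping, and the quoted fields use (?:[^"\\]|\\.)+ to accept backslash-escaped quotes.

```php
<?php
// Hypothetical helper: split one combined-format log line into named fields.
// Returns null when the line does not match.
function parseCombinedLine(string $line): ?array
{
    // Nowdoc: the text between the markers is taken literally, backslashes included.
    $pattern = <<<'REGEX'
/^([0-9.]+)\s([\w.-]+)\s([\w.-]+)\s(\[[^\[\]]+\])\s"((?:[^"\\]|\\.)+)"\s(\d{3})\s(\d+|-)\s"((?:[^"\\]|\\.)+)"\s"((?:[^"\\]|\\.)+)"$/
REGEX;
    if (!preg_match($pattern, rtrim($line, "\r\n"), $m)) {
        return null;
    }
    return [
        'ip'       => $m[1],
        'identity' => $m[2],
        'user'     => $m[3],
        'time'     => trim($m[4], '[]'), // drop the surrounding brackets
        'request'  => $m[5],
        'status'   => $m[6],
        'size'     => $m[7],
        'referer'  => $m[8],
        'agent'    => $m[9],
    ];
}

$line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
var_dump(parseCombinedLine($line));
```

Common-format lines lack the last two fields, so this function returns null for them. Handling both formats would need the final two quoted groups to be made optional.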

