Performance analysis of the strtotime function in PHP
Recently I have been working on a statistics backend for game data. Its most basic function is to display user data by analyzing registration and login logs. During internal testing at the company there were very few users, so no performance issues showed up. Over the past two days, however, the game was put into a real test environment and users poured in. From the afternoon on, the online-user statistics started to lag and took several seconds to return data, while the registration query was still acceptable. By the evening the online-user statistics basically timed out and the page could not be opened. I don't know what bugs there are on the game side, but players often had trouble logging in, so there were not actually that many online or registered users. With this little data my queries were still this slow, which is quite embarrassing.
Now they are checking the game for bugs, and I am looking into the performance of the statistics code. To be clear, the statistics read from the slave database while the game uses the master database, and only a few administrators use the backend, so it cannot affect the performance of the game server.
Today the project team leader imported all the databases onto the company's server, and I copied them to my local machine to find out where the performance problem in the statistics platform lies. It turned out that even the registration statistics were slow: about two seconds on the server and more than 20 seconds locally, often timing out (PHP's default configuration has a 30-second timeout). Needless to say, the online statistics could not be opened at all. Looking at the database, there were only about 3,500 registration records for that day (including fake data). They are counted in five-minute intervals, which makes 288 intervals per day. Of course this does not mean querying the database 288 times in a loop; I would be scolded to death for that.
The logic for counting registrations per time period is simple: for each period, traverse the data once, compare the times, and add 1 for every match. But why does such simple logic, with only about one million iterations (288 × 3,500), take as long as half a minute?
The key issue is the time comparison. Timestamps are the more rigorous way to compare times, but the database usually stores time in the form YYYY-mm-dd HH:ii:ss, and PHP provides the strtotime function to convert that into a timestamp. However, once it is called inside 288 for iterations × 3,500 foreach iterations, the execution time reaches half a minute.
    $nowDayDT = strtotime( date('Y-m-d') );
    $__startT = microtime(TRUE);
    for($i=0; $i<$allTime; $i += $gapTime){
        $count = 0;
        // used for data comparison
        $startDT = $nowDayDT+$i;
        $endDT = $nowDayDT+$i+$gapTime;
        // used for display
        $xAxis1 = date('H:i', $nowDayDT+$i);
        $xAxis2 = date('H:i', $nowDayDT+$i+$gapTime);
        foreach($rawData as $line){
            // strtotime runs once per row for every interval
            $time = strtotime($line['log_dt']);
            if( $startDT<=$time && $time<$endDT ){
                $count ++;
            }
        }
        $resArr[] = [
            'date'=>$xAxis1.'~'.$xAxis2,
            'number'=>$count
        ];
    }
    echo microtime(TRUE)-$__startT;
So strtotime is basically unusable here. What other way is there to compare times? The answer is simple and crude: in PHP you can compare two date-time strings directly! Because every value uses the same zero-padded format, string order matches chronological order. The modified code is as follows, and the running time drops to about 0.3 seconds:
    $__startT = microtime(TRUE);
    for($i=0; $i<$allTime; $i += $gapTime){
        $count = 0;
        // used for data comparison
        $startDT = date('Y-m-d H:i:s', $nowDayDT+$i);
        $endDT = date('Y-m-d H:i:s', $nowDayDT+$i+$gapTime);
        // used for display
        $xAxis1 = date('H:i', $nowDayDT+$i);
        $xAxis2 = date('H:i', $nowDayDT+$i+$gapTime);
        foreach($rawData as $line){
            // compare the raw date-time string, no conversion
            $time = $line['log_dt'];
            if( $startDT<=$time && $time<$endDT ){
                $count ++;
            }
        }
        $resArr[] = [
            'date'=>$xAxis1.'~'.$xAxis2,
            'number'=>$count
        ];
    }
    echo microtime(TRUE)-$__startT;
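The string comparison only works because every value shares the same fixed-width, zero-padded layout. The following is a minimal sketch to check that claim, assuming sample values in 'Y-m-d H:i:s' format; the `$samples` array is made up for illustration and is not from the real log table.

```php
<?php
// Assumption: every value uses the same zero-padded 'Y-m-d H:i:s' layout.
date_default_timezone_set('UTC');

// Hypothetical sample values, deliberately out of order.
$samples = [
    '2024-03-01 09:05:00',
    '2024-03-01 10:00:00',
    '2024-03-01 09:59:59',
    '2024-02-29 23:59:59',
];

// Sort once as plain strings, once by converted timestamp.
$byString = $samples;
sort($byString, SORT_STRING);

$byTimestamp = $samples;
usort($byTimestamp, fn($a, $b) => strtotime($a) <=> strtotime($b));

// Both orders are identical for this format.
var_dump($byString === $byTimestamp);

// Counter-example: without zero padding, an earlier time sorts later as a string.
$unpaddedSortsLater = strcmp('2024-3-1 9:05:00', '2024-03-01 10:00:00') > 0;
var_dump($unpaddedSortsLater);
```

If the column could ever hold values in a different layout (no zero padding, another separator, a time zone suffix), the shortcut no longer applies and the values would have to be converted first.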
Traversal and re-optimization
You may have noticed a problem: a foreach is nested inside a for, which is worrying for performance. Is it necessary to traverse the inner foreach completely? In fact it is not, as long as the SQL query returns the data sorted by time. The optimized comparison is as follows:
    for{
        ...
        foreach($rawData as $line){
            $time = $line['log_dt']; // strtotime($line['log_dt']);
            // optimized comparison
            if($time<$startDT) continue; // earlier than the start time: skip this row
            if($time>=$endDT) break;     // at or after the end time: stop the loop
            $count ++;                   // otherwise the row matches
            // original algorithm
            // if( $startDT<=$time && $time<$endDT ){
            //     $count ++;
            // }
        }
        ...
    }
The continue and break keywords are used here to skip a single iteration and to end the whole loop, respectively. As a result, when counting the first period of the day, most of the later data is skipped immediately. In the end the total traversal time dropped to about 0.12 seconds.
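The continue/break version still rescans, for every period, the rows that come before it. Since both the periods and the rows are in ascending order, one possible next step is to keep a cursor across periods so each row is visited only once. This is a sketch rather than the code used on the statistics platform: the function name `countPerBucket`, its parameters, and the sample rows are assumptions, and it expects `$rows` to be a zero-indexed list sorted by `log_dt`.

```php
<?php
date_default_timezone_set('UTC');

/**
 * Count rows per time period in a single pass.
 * Assumes $rows is a zero-indexed list sorted ascending by 'log_dt',
 * with 'log_dt' in zero-padded 'Y-m-d H:i:s' format.
 */
function countPerBucket(array $rows, int $dayStart, int $allTime, int $gapTime): array
{
    $result = [];
    $cursor = 0;              // index of the next row not yet counted
    $total  = count($rows);

    for ($i = 0; $i < $allTime; $i += $gapTime) {
        $startDT = date('Y-m-d H:i:s', $dayStart + $i);
        $endDT   = date('Y-m-d H:i:s', $dayStart + $i + $gapTime);

        // Skip rows earlier than this period (rows before the day start).
        while ($cursor < $total && $rows[$cursor]['log_dt'] < $startDT) {
            $cursor++;
        }
        // Count rows inside this period and move the cursor past them.
        $count = 0;
        while ($cursor < $total && $rows[$cursor]['log_dt'] < $endDT) {
            $count++;
            $cursor++;
        }
        $result[] = [
            'date'   => date('H:i', $dayStart + $i) . '~' . date('H:i', $dayStart + $i + $gapTime),
            'number' => $count,
        ];
    }
    return $result;
}

// Hypothetical usage with four sample rows and five-minute periods.
$dayStart = strtotime('2024-03-01 00:00:00');
$rows = [
    ['log_dt' => '2024-03-01 00:01:10'],
    ['log_dt' => '2024-03-01 00:04:59'],
    ['log_dt' => '2024-03-01 00:05:00'],
    ['log_dt' => '2024-03-01 23:59:59'],
];
$stats = countPerBucket($rows, $dayStart, 86400, 300);
print_r($stats[0]); // first period: '00:00~00:05' with number 2
```

With this shape the work grows with the number of rows plus the number of periods, instead of their product.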
In summary, when processing large amounts of data, try to avoid converting data inside the traversal and avoid functions with complex internals, such as strtotime.
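When a conversion cannot be avoided, it can at least be moved out of the nested loop. The sketch below is a different technique from the string comparison used above: it calls strtotime once per row and derives the period index arithmetically. The function name `countByBucketIndex` and the sample rows are assumptions for illustration only.

```php
<?php
date_default_timezone_set('UTC');

/**
 * Count rows per period with one strtotime() call per row.
 * The period index is computed by integer division, so no nested loop is needed
 * and the rows do not have to be sorted.
 */
function countByBucketIndex(array $rows, int $dayStart, int $allTime, int $gapTime): array
{
    $buckets = (int) ceil($allTime / $gapTime);
    $counts  = array_fill(0, $buckets, 0);

    foreach ($rows as $line) {
        $offset = strtotime($line['log_dt']) - $dayStart;
        if ($offset < 0 || $offset >= $allTime) {
            continue; // outside the day being reported
        }
        $counts[intdiv($offset, $gapTime)]++;
    }
    return $counts;
}

// Hypothetical usage with four sample rows and five-minute periods.
$dayStart = strtotime('2024-03-01 00:00:00');
$rows = [
    ['log_dt' => '2024-03-01 00:01:10'],
    ['log_dt' => '2024-03-01 00:04:59'],
    ['log_dt' => '2024-03-01 00:05:00'],
    ['log_dt' => '2024-03-01 23:59:59'],
];
$counts = countByBucketIndex($rows, $dayStart, 86400, 300);
echo $counts[0], ' ', $counts[1], ' ', $counts[287], "\n";
```

For about 3,500 rows this means about 3,500 conversions instead of roughly one million.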