
Comparative analysis of the efficiency of reading large files using the file function and fseek function in PHP

高洛峰
2016-12-26 14:12:54

PHP can read large files with either the file function or the fseek function, but the two differ considerably in efficiency. This article compares the two approaches for reading the end of a large file, as a reference for anyone who needs it.

1. Using the file function directly

The file function reads the entire contents of a file into memory at once. To keep poorly written programs from using so much memory that the system runs short and the server crashes, PHP limits how much memory a script may use. In the environment described here the default limit is 16M (newer PHP releases default to 128M). The limit is set through memory_limit = 16M in php.ini. If the value is set to -1, memory usage is not limited.
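To check which limit is actually in effect before loading a file, a sketch like the following may help. It uses only the built-in ini_get and memory_get_peak_usage; the helper name limit_to_bytes is made up for this illustration and is not part of the original article.

```php
<?php
// Hypothetical helper (an assumption for this example): convert a
// php.ini size such as "16M" or "1G" into bytes; "-1" means unlimited.
function limit_to_bytes($value)
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    // The cast keeps the leading digits, e.g. (int) "16M" gives 16.
    $number = (int) $value;
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

$limit = ini_get('memory_limit');
echo "memory_limit: ", $limit, " (", limit_to_bytes($limit), " bytes)\n";
echo "peak usage so far: ", memory_get_peak_usage(), " bytes\n";
```

Comparing the size of the file with the value returned for the limit gives a rough idea of whether file() has any chance of succeeding.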

The following code uses file to extract the last line of a log file:

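<?php
  // Remove the memory limit so that file() can load the whole log
  ini_set('memory_limit', '-1');
  $file = 'access.log';
  // file() returns every line of the file as one array element
  $data = file($file);
  $line = $data[count($data) - 1];
  echo $line;
?>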

The entire code execution took 116.9613 (s).
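The article does not show how the running times were obtained. One common way, given here only as a sketch, is to call microtime(true) before and after the code under test. The helper name time_call and the small temporary file are assumptions made for this example.

```php
<?php
// Hypothetical helper: run $callback once and return [result, seconds].
function time_call(callable $callback)
{
    $start = microtime(true);
    $result = $callback();
    $elapsed = microtime(true) - $start;
    return array($result, $elapsed);
}

// Time reading the last line of a small temporary file with file().
$tmp = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($tmp, "first\nsecond\nthird\n");

list($last, $seconds) = time_call(function () use ($tmp) {
    $data = file($tmp);
    return $data[count($data) - 1];
});
unlink($tmp);

printf("last line: %s", $last);
printf("elapsed: %.4f (s)\n", $seconds);
```

A single run like this is noisy, so figures measured this way are best read as orders of magnitude rather than exact values.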

My machine has 2G of memory. When I pressed F5 to run the script, the system stopped responding and only recovered after almost 20 minutes. Reading such a large file straight into memory clearly has serious consequences, so memory_limit should not be raised too far unless there is no other option. Otherwise the only choice left is to call the data center and ask for the machine to be reset.

2. Using PHP's fseek for file operations

This is the most common approach. It does not read the whole file but works directly through the file pointer, so it is quite efficient. There are several ways to use fseek on a file, and their efficiency differs slightly. The following are commonly used methods:

Method 1

First use fseek to move to the end of the file (EOF), then search backwards for the start of the last line and read that line. Then find the start of the line before it and read that one, and so on, until the requested number of lines has been read.

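The implementation code is as follows:

<?php
  $file = 'access.log';
  $fp = fopen($file, "r");
  $line = 10;   // number of lines to read from the end
  $pos = -2;    // start just before the file's trailing newline
  $t = " ";
  $data = "";
  while ($line > 0)
  {
    // Step backwards one byte at a time until the previous newline
    while ($t != "\n")
    {
      if (fseek($fp, $pos, SEEK_END) == -1)
      {
        // Reached the start of the file: read the first line and stop
        rewind($fp);
        $line = 1;
        break;
      }
      $t = fgetc($fp);
      $pos--;
    }
    $t = " ";
    // The pointer is now at the start of a line; prepend it to keep file order
    $data = fgets($fp) . $data;
    $line--;
  }
  fclose($fp);
  echo $data;
?>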

The entire code execution takes 0.0095 (s).

Method 2

This method still uses fseek to read from the end of the file, but instead of reading one byte at a time it reads one block at a time. Each block that is read is prepended to a buffer, and the number of newline characters (\n) in the buffer is used to decide whether the last $num lines have been read.

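The implementation code is as follows:

<?php
  $file = 'access.log';
  $fp = fopen($file, "r");
  $num = 10;        // number of lines to read from the end
  $chunk = 4096;    // bytes read per step
  $readData = "";
  $data = "";
  // filesize() can overflow for very large files on 32-bit builds;
  // cap $max at PHP_INT_MAX in that case
  $fs = sprintf("%u", filesize($file));
  $max = (intval($fs) == PHP_INT_MAX) ? PHP_INT_MAX : filesize($file);
  for ($len = 0; $len < $max; $len += $chunk)
  {
    // The last step may be shorter than a full chunk
    $seekSize = ($max - $len > $chunk) ? $chunk : $max - $len;
    fseek($fp, ($len + $seekSize) * -1, SEEK_END);
    // Prepend the new chunk so the buffer stays in file order
    $readData = fread($fp, $seekSize) . $readData;
    // $num + 1 newlines guarantee that the last $num lines are complete
    if (substr_count($readData, "\n") >= $num + 1)
    {
      // The pattern assumes that the file ends with a newline
      preg_match("!(.*?\n){" . ($num) . "}$!", $readData, $match);
      $data = $match[0];
      break;
    }
  }
  // Fewer than $num + 1 lines in the file: output everything that was read
  if ($data === "")
  {
    $data = $readData;
  }
  fclose($fp);
  echo $data;
?>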

The entire code execution takes 0.0009 (s).

Method 3

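<?php
  // Return the last $n lines of the open file $fp, in file order.
  // The window read from the end grows by a factor of $base on each pass.
  function tail($fp, $n, $base = 5)
  {
    assert($n > 0);
    $pos = $n + 1;
    $lines = array();
    while (count($lines) <= $n)
    {
      // fseek returns -1 (it does not throw) when the window exceeds the file
      $whole = (fseek($fp, -$pos, SEEK_END) == -1);
      if ($whole)
      {
        rewind($fp);
      }
      $pos *= $base;
      // Re-read the window from scratch so that no line is counted twice
      $lines = array();
      while (($row = fgets($fp)) !== false)
      {
        $lines[] = $row;
      }
      if ($whole)
      {
        break;
      }
    }

    // Keep the last $n lines; a partial first line of the window is dropped
    return array_slice($lines, -$n);
  }

  var_dump(tail(fopen("access.log", "r"), 10));
?>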

The entire code execution takes 0.0003 (s).
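Before relying on any fseek variant for a real log, it may be worth checking it against file() on a small generated file, where loading everything into memory is harmless. The sketch below does this for a chunked reader in the spirit of Method 2. The function last_lines, its parameters and the temporary file are assumptions made for this example, not code from the original article.

```php
<?php
// Hypothetical helper: return the last $num lines of $path by reading
// fixed-size chunks backwards from the end of the file.
function last_lines($path, $num, $chunk = 4096)
{
    $fp = fopen($path, 'r');
    $size = filesize($path);
    $buffer = '';
    for ($len = 0; $len < $size; $len += $chunk) {
        $step = min($chunk, $size - $len);
        fseek($fp, -($len + $step), SEEK_END);
        $buffer = fread($fp, $step) . $buffer;
        // One extra newline guarantees the oldest wanted line is complete.
        if (substr_count($buffer, "\n") >= $num + 1) {
            break;
        }
    }
    fclose($fp);
    // Drop a single trailing newline so explode() yields no empty last item.
    if (substr($buffer, -1) === "\n") {
        $buffer = substr($buffer, 0, -1);
    }
    return array_slice(explode("\n", $buffer), -$num);
}

// Build a 1000-line sample file in the system temp directory.
$tmp = tempnam(sys_get_temp_dir(), 'log');
$rows = array();
for ($i = 1; $i <= 1000; $i++) {
    $rows[] = "line $i";
}
file_put_contents($tmp, implode("\n", $rows) . "\n");

// Reference result from file(), then the chunked result with a tiny chunk.
$expected = array_slice(file($tmp, FILE_IGNORE_NEW_LINES), -10);
$actual = last_lines($tmp, 10, 64);
unlink($tmp);

var_dump($expected === $actual);
```

A deliberately small chunk size (64 bytes here) forces several backward steps, which exercises the chunk boundaries that a 4096-byte chunk would skip on such a short file.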

That is the entire content of this article. I hope it is helpful to everyone's learning.
