Home  >  Article  >  Backend Development  >  Solution to Concurrent Reading and Writing File Conflicts in PHP_PHP Tutorial

Solution to Concurrent Reading and Writing File Conflicts in PHP_PHP Tutorial

WBOY
WBOYOriginal
2016-07-13 10:25:58970browse

For applications where the daily IP is not high or the number of concurrency is not very large, generally there is no need to consider these! There is no problem at all with normal file manipulation methods. But if the concurrency is high, when we read and write files, it is very likely that multiple processes will operate on the next file. If the access to the file is not exclusive at this time, it will easily cause data loss.
For example: In an online chat room (it is assumed that the chat content is written to a file), at the same time, both user A and user B need to operate the data save file. First, A opens the file, and then updates the data in it, but Here B also happens to have the same file open and is also preparing to update the data inside. When A saves the written file, B has actually opened the file. But when B saves the file back, data loss has already been caused, because user B here has no idea that the file he opened when he changed it, user A also changed the file, so in the end user B When saving changes, User A's updates will be lost.
For such a problem, the general solution is that when one process operates on the file, it first locks the other process, which means that only this process has the right to read the file. If other processes read it now, it is There is no problem at all, but if a process tries to update it at this time, the operation will be rejected. If the process that previously locked the file completes the update operation of the file, it will release the exclusive identifier. The file is restored to a changeable state. Next, in the same way, if the file is not locked when the process is operating the file, then it can safely lock the file and enjoy it alone.
The general solution is:

Copy the code The code is as follows:

$fp=fopen('/tmp/ lock.txt','w+');
if (flock($fp,LOCK_EX)){
fwrite($fp,"Write something heren");
flock($fp,LOCK_UN);
}else{
echo 'Couldn't lock the file !';
}
fclose($fp);

But in PHP flock seems to work It's not that good! In the case of multiple concurrency, it seems that resources are often monopolized and not released immediately, or not released at all, causing deadlock, causing the server's CPU usage to be very high, and sometimes even causing the server to die completely. It seems that this happens in many linux/unix systems. Therefore, you must consider carefully before using flock.
So there is no solution? In fact, this is not the case. If flock() is used properly, it is entirely possible to solve the deadlock problem. Of course, if you do not consider using the flock() function, there will also be a good solution to our problem. After my personal collection and summary, the solutions are roughly summarized as follows.
Option 1: Set a timeout when locking the file. The approximate implementation is as follows:
Copy code The code is as follows:

if($fp=fopen($fileName,' a')){
$startTime=microtime();
do{
$canWrite=flock($fp,LOCK_EX);
if(!$canWrite){
usleep(round (rand(0,100)*1000));
}
}while((!$canWrite)&&((microtime()-$startTime)<1000));
if($canWrite){
fwrite($fp,$dataToSave);
}
fclose($fp);
}

The timeout is set to 1ms, if the lock is not obtained within this time , it is obtained repeatedly, until the right to operate the file is obtained directly, of course. If the timeout limit has been reached, you must exit immediately and give up the lock to allow other processes to operate.

Option 2: Do not use the flock function and use temporary files to solve the problem of read and write conflicts. The general principle is as follows:
(1) Put a copy of the file that needs to be updated into our temporary file directory, save the last modification time of the file to a variable, and pick a random one for this temporary file, which is not easy Duplicate filename.
(2) After updating this temporary file, check whether the last update time of the original file is consistent with the previously saved time.
(3) If the last modification time is the same, rename the modified temporary file to the original file. In order to ensure that the file status is updated synchronously, the file status needs to be cleared.
(4) However, if the last modification time is consistent with the previously saved one, it means that the original file has been modified during this period. At this time, the temporary file needs to be deleted and then return false, indicating that the file has been modified at this time. There are other processes in progress.
The implementation code is as follows:

Copy the code The code is as follows:

$dir_fileopen='tmp';
function randomid(){
    return time().substr(md5(microtime()),0,rand(5,12));
}
function cfopen($filename,$mode){
    global $dir_fileopen;
    clearstatcache();
    do{
  $id=md5(randomid(rand(),TRUE));
        $tempfilename=$dir_fileopen.'/'.$id.md5($filename);
    } while(file_exists($tempfilename));
    if(file_exists($filename)){
        $newfile=false;
        copy($filename,$tempfilename);
    }else{
        $newfile=true;
    }
    $fp=fopen($tempfilename,$mode);
    return $fp?array($fp,$filename,$id,@filemtime($filename)):false;
}
function cfwrite($fp,$string){
 return fwrite($fp[0],$string);
}
function cfclose($fp,$debug='off'){
    global $dir_fileopen;
    $success=fclose($fp[0]);
    clearstatcache();
    $tempfilename=$dir_fileopen.'/'.$fp[2].md5($fp[1]);
    if((@filemtime($fp[1])==$fp[3])||($fp[4]==true&&!file_exists($fp[1]))||$fp[5]==true){
        rename($tempfilename,$fp[1]);
    }else{
        unlink($tempfilename);
  //说明有其它进程 在操作目标文件,当前进程被拒绝
        $success=false;
    }
    return $success;
}
$fp=cfopen('lock.txt','a+');
cfwrite($fp,"welcome to beijing.n");
fclose($fp,'on');

The functions used in the above code need to be explained:
(1) rename(); to rename a file or a directory, this function is actually more like mv in Linux. It is convenient to update the path or name of a file or directory. But when I test the above code in window, if the new file name already exists, a notice will be given saying that the current file already exists. But it works fine under linux.
(2) clearstatcache(); Clear the status of the file. PHP will cache all file attribute information to provide higher performance, but sometimes, when multiple processes delete or update files, PHP does not have time to update the cache. The file attributes in it can easily lead to the fact that the last update time accessed is not the real data. So here you need to use this function to clear the saved cache.

Option 3: Randomly read and write the operated files to reduce the possibility of concurrency.
This solution seems to be used more often when recording user access logs. Previously, we needed to define a random space. The larger the space, the smaller the possibility of concurrency. Assuming that the random read and write space is [1-500], then the distribution of our log files ranges from log1 to log500. Every time a user accesses, data is randomly written to any file between log1~log500. At the same time, there are two processes recording logs. Process A may be the updated log32 file, but what about process B? Then the update at this time may be log399. You must know that if you want process B to also operate log32, the probability is basically 1/500, which is almost equal to zero. When we need to analyze access logs, we only need to merge these logs first and then analyze them. One benefit of using this solution to record logs is that the possibility of queuing process operations is relatively small, allowing the process to complete each operation very quickly.

Option 4: Put all processes to be operated into a queue. Then put a dedicated service to complete the file operation. Each excluded process in the queue is equivalent to the first specific operation, so for the first time our service only needs to obtain the specific operation items from the queue. If there are a large number of file operation processes here, it doesn't matter. , just queue to the back of our queue. As long as you are willing to queue, it doesn’t matter how long the queue is.

For the previous options, each has its own benefits! It can be roughly divided into two categories:
(1) queuing is required (slow impact), such as options 1, 2, and 4
(2) queuing is not required. (Fast impact) Option 3
When designing a cache system, generally we will not use Option 3. Because the analysis program and the writing program of Plan 3 are not synchronized, when writing, the difficulty of analysis is not considered at all, as long as the writing is good. Just imagine, if we also use random file reading and writing when updating a cache, it seems that a lot of processes will be added when reading the cache. But options one and two are completely different. Although the writing time needs to wait (when acquiring the lock is unsuccessful, it will be acquired repeatedly), but reading the file is very convenient. The purpose of adding cache is to reduce data reading bottlenecks and thereby improve system performance.
The above is a summary of personal experience and some information. If there is anything wrong or something that has not been mentioned, colleagues are welcome to correct me.

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/824873.htmlTechArticleFor applications where the daily IP is not high or the number of concurrency is not very large, generally there is no need to consider these! There is no problem at all with normal file manipulation methods. But if the concurrency is high, after we process the file...
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn