How to solve concurrent file read/write conflicts in PHP

WBOY
2016-07-13 10:27:50

For applications with low daily traffic or little concurrency, none of this usually matters: ordinary file operations work fine. Under high concurrency, however, several processes are likely to read and write the same file at once. If access to the file is not exclusive at that moment, data can easily be lost.

For example, take an online chat room (assuming the chat content is written to a file) in which users A and B both need to update the data file at the same time. A opens the file and updates its data, but B happens to have the same file open and is also preparing an update. By the time A saves, B has already read the old contents. When B then saves, B's copy overwrites the file without A's changes, because B has no way of knowing that A modified the file in the meantime. User A's update is lost.
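
The lost update can be reproduced without any real concurrency by interleaving the two read-modify-write sequences by hand. This is only a sketch: the temporary file and the chat lines are made up for illustration.

```php
<?php
// Simulate users A and B sequentially; no real concurrency is involved.
$file = tempnam(sys_get_temp_dir(), 'chat');
file_put_contents($file, "hello\n");

$copyA = file_get_contents($file);              // A reads the file
$copyB = file_get_contents($file);              // B reads the same old state

file_put_contents($file, $copyA . "A: hi\n");   // A saves its update
file_put_contents($file, $copyB . "B: hey\n");  // B saves and overwrites A's update

$final = file_get_contents($file);              // A's line is no longer in the file
unlink($file);
```

B's write is based on a stale copy, which is exactly what exclusive access is meant to prevent.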

The usual solution is for a process to lock the file before operating on it, so that only that process has the right to modify it. Processes that merely read without taking a lock are unaffected, but any process that tries to acquire the lock in order to update the file must wait or is refused. Once the process holding the lock has finished its update, it releases the exclusive lock and the file becomes writable again. Likewise, if the file is not locked when a process wants to operate on it, the process can safely lock it and have it to itself.
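
As a minimal sketch of this idea, readers can take a shared lock (LOCK_SH) while writers take an exclusive one (LOCK_EX). The helper names readShared() and appendExclusive() are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helpers: readShared() and appendExclusive() are illustrative names.
function readShared($path)
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false;
    }
    $data = false;
    if (flock($fp, LOCK_SH)) {              // several readers may hold LOCK_SH at once
        $data = stream_get_contents($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $data;
}

function appendExclusive($path, $line)
{
    $fp = fopen($path, 'a');
    if ($fp === false) {
        return false;
    }
    $ok = false;
    if (flock($fp, LOCK_EX)) {              // only one writer may hold LOCK_EX
        $ok = fwrite($fp, $line) !== false;
        fflush($fp);                        // flush before the lock is released
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $ok;
}

$path = tempnam(sys_get_temp_dir(), 'lock');
appendExclusive($path, "first\n");
appendExclusive($path, "second\n");
$contents = readShared($path);
unlink($path);
```

The protection only holds if every process that touches the file goes through helpers like these; a process that skips flock() is not stopped on systems where the lock is advisory.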

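So the general approach would be:

// 'c+' opens for reading and writing without truncating;
// 'w+' would empty the file before the lock is held
$fp = fopen('/tmp/lock.txt', 'c+');

if (flock($fp, LOCK_EX)) {      // block until the exclusive lock is acquired
    ftruncate($fp, 0);          // now it is safe to discard the old content
    fwrite($fp, "Write something here");
    fflush($fp);                // flush buffered output before releasing the lock
    flock($fp, LOCK_UN);        // release the lock
} else {
    echo "Couldn't lock the file!";
}

fclose($fp);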

In PHP, however, flock() does not always seem to behave well. Under heavy concurrency a lock is often held for a long time and not released promptly, or is never released at all, so other processes deadlock waiting for it, CPU usage on the server climbs, and sometimes the server stops responding altogether. This seems to happen on many Linux/Unix systems, so think carefully before relying on flock().

Does that mean there is no solution? Not at all. Used properly, flock() can avoid the deadlock problem entirely, and there are also good solutions that do not use flock() at all. From what I have collected and summarized, the options are roughly as follows.

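Solution 1: Set a timeout when locking the file. A rough implementation:

if ($fp = fopen($fileName, 'a')) {
    $startTime = microtime(true);   // float seconds; plain microtime() returns a string
    do {
        // LOCK_NB makes flock() return immediately instead of blocking,
        // so the loop can actually retry and time out
        $canWrite = flock($fp, LOCK_EX | LOCK_NB);
        if (!$canWrite) {
            usleep(rand(0, 100) * 1000);    // wait a random 0-100 ms before retrying
        }
    } while (!$canWrite && (microtime(true) - $startTime) * 1000 < 1000);   // give up after 1000 ms

    if ($canWrite) {
        fwrite($fp, $dataToSave);
        fflush($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}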

The timeout here is 1000 ms (1 second). While the lock cannot be obtained, the process sleeps for a random interval and tries again until it gains the right to operate on the file. Once the timeout is reached, it must stop trying immediately and give way to other processes.
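
The same retry-with-timeout idea can be packaged as a reusable helper that also guarantees the lock is released when the work throws an exception. This is a sketch, not a definitive implementation: withFileLock(), its parameters, and the counter example are hypothetical names introduced here.

```php
<?php
// Hypothetical helper: withFileLock() and its parameters are illustrative names.
function withFileLock($path, $timeoutMs, callable $work)
{
    $fp = fopen($path, 'c+');                 // create if missing, never truncate
    if ($fp === false) {
        return false;
    }
    $deadline = microtime(true) + $timeoutMs / 1000;
    $locked = false;
    try {
        while (!($locked = flock($fp, LOCK_EX | LOCK_NB))) {
            if (microtime(true) >= $deadline) {
                return false;                 // timed out; the finally block still runs
            }
            usleep(mt_rand(1, 20) * 1000);    // back off 1-20 ms before retrying
        }
        return $work($fp);
    } finally {
        if ($locked) {
            flock($fp, LOCK_UN);
        }
        fclose($fp);                          // runs even if $work throws
    }
}

// Usage: a counter file that is read, incremented and rewritten under the lock.
$increment = function ($fp) {
    $n = (int) stream_get_contents($fp);
    rewind($fp);
    ftruncate($fp, 0);
    fwrite($fp, (string) ($n + 1));
    fflush($fp);
    return $n + 1;
};

$counterFile = tempnam(sys_get_temp_dir(), 'cnt');
$first = withFileLock($counterFile, 1000, $increment);
$second = withFileLock($counterFile, 1000, $increment);
unlink($counterFile);
```

Returning false on timeout keeps the sketch short, but it cannot be told apart from a callback that itself returns false; a real helper might throw instead.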

Solution 2: Do not use flock() at all; use a temporary file to avoid read/write conflicts. The general principle is as follows:

(1) Copy the file to be updated into a temporary directory under a random file name that is unlikely to collide, and save the original file's last modification time in a variable.

(2) After updating the temporary file, check whether the original file's last modification time still matches the saved value.

(3) If the modification time is unchanged, rename the temporary file over the original file. The cached file status must be cleared so that the check sees up-to-date file information.

(4) If the modification time differs from the saved value, the original file was modified in the meantime. Delete the temporary file and return false to indicate that another process has been operating on the file.

The approximate implementation code is as follows:

$dir_fileopen = 'tmp';   // directory for temporary copies; must exist and be writable

function randomid() {
    return time() . substr(md5(microtime()), 0, rand(5, 12));
}

function cfopen($filename, $mode) {
    global $dir_fileopen;
    clearstatcache();

    // pick a temporary file name that is not already in use
    do {
        $id = md5(randomid());
        $tempfilename = $dir_fileopen . '/' . $id . md5($filename);
    } while (file_exists($tempfilename));

    if (file_exists($filename)) {
        $newfile = false;
        copy($filename, $tempfilename);   // work on a private copy
    } else {
        $newfile = true;
    }

    $fp = fopen($tempfilename, $mode);
    // [0] handle, [1] target name, [2] temp id, [3] target mtime at open, [4] target did not exist
    return $fp ? array($fp, $filename, $id, @filemtime($filename), $newfile) : false;
}

function cfwrite($fp, $string) {
    return fwrite($fp[0], $string);
}

function cfclose($fp, $debug = 'off') {
    global $dir_fileopen;
    $success = fclose($fp[0]);
    clearstatcache();
    $tempfilename = $dir_fileopen . '/' . $fp[2] . md5($fp[1]);

    // commit only if the target is unchanged since cfopen(), or is new and still absent
    if ((@filemtime($fp[1]) == $fp[3]) || ($fp[4] == true && !file_exists($fp[1]))) {
        rename($tempfilename, $fp[1]);
    } else {
        unlink($tempfilename);
        // another process has modified the target file, so the current process is rejected
        $success = false;
    }
    return $success;
}

$fp = cfopen('lock.txt', 'a+');
cfwrite($fp, "welcome to beijing.\n");
cfclose($fp, 'on');

Some of the functions used in the code above need explanation:

(1) rename(): renames a file or directory. It behaves much like mv on Linux and is a convenient way to change the path or name of a file or directory. When I tested the code above on Windows, however, a notice was raised saying the file already exists whenever the new file name was already taken. It works fine on Linux.

(2) clearstatcache(): clears the cached file status. PHP caches file attribute information for performance, but when several processes delete or update files, the cache can lag behind the real file attributes, so the last modification time PHP returns may not be the real value. This function is used here to clear the cache.
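
A small sketch of where clearstatcache() fits: call it before re-reading a file's attributes after the file may have changed. The file name and contents below are made up for illustration.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'stat');
file_put_contents($path, 'abc');
$before = filesize($path);          // the stat result may now be cached by PHP

$fp = fopen($path, 'a');
fwrite($fp, 'defgh');               // the file grows by five bytes
fclose($fp);

clearstatcache(true, $path);        // drop cached stat information for this path
$after = filesize($path);           // re-read from the file system
unlink($path);
```

Without the clearstatcache() call, the second filesize() or filemtime() may return the earlier cached value, which would defeat the modification-time check in Solution 2.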

Solution 3: Spread reads and writes across randomly chosen files to reduce the chance of concurrent access to any one file.

This approach seems to be used most often for recording user access logs. First define a random range: the larger the range, the smaller the chance of a collision. Suppose the range is [1-500]; the log files are then log1 through log500, and each visit is written to a randomly chosen file among them. If two processes are logging at the same time, process A might update log32 while process B updates log399. The probability that B picks the same file as A is only 1/500, which is close to zero. To analyze the access logs, merge the files first and then analyze the result. One benefit of this approach is that processes rarely have to queue, so each write completes very quickly.
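
A minimal sketch of this scheme, assuming the log1..log500 naming from the text. The function names writeShardedLog() and mergeShardedLogs(), the temporary directory, and the sample log lines are hypothetical.

```php
<?php
// Hypothetical helpers: writeShardedLog() and mergeShardedLogs() are illustrative names.
function writeShardedLog($dir, $line, $shards = 500)
{
    $n = mt_rand(1, $shards);       // pick one of log1..log{$shards} at random
    // LOCK_EX still guards the rare case where two writers pick the same file
    return file_put_contents("$dir/log$n", $line . "\n", FILE_APPEND | LOCK_EX) !== false;
}

function mergeShardedLogs($dir)
{
    $lines = array();
    foreach (glob("$dir/log*") as $file) {
        $lines = array_merge($lines, file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));
    }
    return $lines;                  // order across files is not preserved
}

$dir = sys_get_temp_dir() . '/shardlog_' . uniqid();
mkdir($dir);
for ($i = 1; $i <= 20; $i++) {
    writeShardedLog($dir, "visit $i");
}
$merged = mergeShardedLogs($dir);

array_map('unlink', glob("$dir/log*"));   // clean up the demo files
rmdir($dir);
```

Because the merge loses the original write order, each line would normally carry its own timestamp so the merged log can be sorted before analysis.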

Solution 4: Put all pending operations into a queue and have one dedicated service perform the file operations. Each entry in the queue corresponds to one specific operation, so the service only has to take the entries from the queue one at a time and carry them out. A large number of file operations is not a problem: they simply join the back of the queue. As long as callers are willing to wait, the queue can be as long as it needs to be.
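
The single-writer idea can be sketched with an in-memory SplQueue. This is only a stand-in: a real deployment would need a queue that is shared between processes, which this example does not provide. The processQueue() function and the job layout are hypothetical.

```php
<?php
// In-memory stand-in for a real queue; processQueue() and the job layout are hypothetical.
function processQueue(SplQueue $queue)
{
    $done = 0;
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue();             // oldest job first (FIFO)
        file_put_contents($job['file'], $job['data'], FILE_APPEND);
        $done++;
    }
    return $done;
}

$target = tempnam(sys_get_temp_dir(), 'chat');
$queue = new SplQueue();
$queue->enqueue(array('file' => $target, 'data' => "A: hello\n"));
$queue->enqueue(array('file' => $target, 'data' => "B: hi\n"));
$queue->enqueue(array('file' => $target, 'data' => "A: bye\n"));

$processed = processQueue($queue);            // the only code that touches the file
$result = file_get_contents($target);
unlink($target);
```

Since only the consumer writes to the file, the writes cannot conflict and are applied in the order they were queued.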

Each of the solutions above has its own advantages. They fall roughly into two categories:

(1) Queuing required (slower): Solutions 1, 2 and 4

(2) No queuing required (faster): Solution 3

When designing a caching system, Solution 3 is generally not used. Its writing side and its analysis side are not synchronized: writes take no account of how hard the data will be to read back, as long as the write itself succeeds. Imagine updating a cache through randomly chosen files: reading the cache would then involve a lot of extra work. Solutions 1 and 2 are quite different. Writes may have to wait (when the lock cannot be acquired, the attempt is repeated), but reading the file is straightforward. The purpose of a cache is to relieve the data-reading bottleneck and thereby improve system performance.

The above is a summary of personal experience and collected material. If anything is wrong or missing, corrections are welcome.
