Solution to Concurrent Reading and Writing File Conflicts in PHP
For applications with low daily traffic or little concurrency, none of this is usually a concern: ordinary file operations work fine. But under high concurrency, multiple processes are very likely to operate on the same file at the same time, and if access to the file is not exclusive, data can easily be lost.
For example, in an online chat room (assuming the chat content is written to a file), users A and B may need to update the data file at the same moment. A opens the file first and begins updating it, but B happens to open the same file and also prepares to update it. When A saves the file, B already has the old version open; when B then saves his version back, A's changes are lost, because B had no idea that A had modified the file while he was editing it.
The general solution to this problem is for a process to lock the file before operating on it, meaning that only that process has the right to change the file. Other processes can still read it without any problem, but if another process tries to update it, the operation is rejected. When the locking process finishes its update, it releases the exclusive lock and the file becomes writable again. By the same rule, if a process finds the file unlocked when it wants to operate on it, it can safely lock the file and have it to itself.
The common solutions are as follows. Option 1: Lock the file with PHP's flock() function, taking an exclusive lock before writing and releasing it when the write completes.
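A minimal sketch of this flock()-based approach (the function name and file layout here are illustrative, not from the original article):

```php
<?php
// Option 1 sketch: serialize writers with an exclusive flock() lock.
// Other processes calling this function block at flock() until the
// current writer releases the lock.
function append_line(string $file, string $line): bool
{
    $fp = fopen($file, 'a');
    if ($fp === false) {
        return false;
    }
    if (flock($fp, LOCK_EX)) {        // wait for the exclusive lock
        fwrite($fp, $line . PHP_EOL);
        fflush($fp);                  // flush output before unlocking
        flock($fp, LOCK_UN);          // let the next writer proceed
        fclose($fp);
        return true;
    }
    fclose($fp);
    return false;
}
```

Because LOCK_EX blocks by default, heavily contended writers queue up here; a non-blocking variant (`LOCK_EX | LOCK_NB`) can be used to retry instead of waiting.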
Option 2: Avoid the flock function and use a temporary file to resolve the read/write conflict. The general idea is:
(1) Copy the file to be updated into a temporary directory, giving the copy a random name that is unlikely to collide, and save the original file's last-modification time in a variable.
(2) After updating the temporary file, check whether the original file's last-modification time still matches the saved value.
(3) If it matches, rename the modified temporary file over the original. To keep the file status in sync, clear the file status cache afterwards.
(4) If it does not match, the original file was modified by another process in the meantime. Delete the temporary file and return false to signal that another process was working on the file.
The implementation code is as follows:
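The article's original listing was not preserved; the following is a hedged reconstruction of the four steps above (the function name `safe_update` is illustrative):

```php
<?php
// Option 2 sketch: update via a temporary copy, then promote it with
// rename() only if the original file was not modified in the meantime.
function safe_update(string $file, callable $modify): bool
{
    $mtime = filemtime($file);                    // step 1: remember mtime
    $tmp   = tempnam(sys_get_temp_dir(), 'upd');  // random, collision-resistant name
    if ($tmp === false || !copy($file, $tmp)) {
        return false;
    }

    // Apply the caller's changes to the temporary copy.
    $data = file_get_contents($tmp);
    file_put_contents($tmp, $modify($data));

    clearstatcache();                             // refresh cached file status
    if (filemtime($file) === $mtime) {
        // Step 3: original untouched, promote the temp file.
        return rename($tmp, $file);
    }
    // Step 4: another process modified the original; back off.
    unlink($tmp);
    return false;
}
```

Note that a small race window remains between the final mtime check and the rename(); this scheme reduces, but does not fully eliminate, the conflict.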
Option 3: Spread reads and writes randomly across multiple files to reduce the chance of concurrent access to any one file.
This approach is most often used for recording user access logs. First define a random space; the larger the space, the smaller the chance of a collision. Suppose the space is [1, 500]: the log files then range from log1 to log500, and each user visit writes its data to one of them at random. If two processes are logging at the same time, process A might update log32 while process B updates log399; the probability that B also hits log32 is only 1/500, practically zero. When the logs need to be analyzed, simply merge them first and then analyze. A benefit of this scheme is that processes rarely have to queue for a file, so each operation completes very quickly.
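A minimal sketch of this random-distribution scheme, using the [1, 500] space from the text (directory layout and function names are illustrative):

```php
<?php
// Option 3 sketch: scatter log writes across N files so two concurrent
// writers almost never touch the same file.
function write_log(string $dir, string $entry, int $n = 500): void
{
    $i = mt_rand(1, $n);              // pick one of log1 .. logN at random
    // flock via LOCK_EX stays as a cheap safety net for the rare collision.
    file_put_contents($dir . '/log' . $i, $entry . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// For analysis, merge log1 .. logN into a single stream first.
function merge_logs(string $dir, int $n = 500): string
{
    $all = '';
    for ($i = 1; $i <= $n; $i++) {
        $file = $dir . '/log' . $i;
        if (is_file($file)) {
            $all .= file_get_contents($file);
        }
    }
    return $all;
}
```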
Option 4: Put all pending operations into a queue, and let a dedicated service perform the file operations. The process at the head of the queue is the next to be served, so the service only ever takes the front item from the queue. If there are many processes wanting to operate on the file, that is fine: they simply join the back of the queue, and as long as they are willing to wait, it does not matter how long the queue grows.
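The idea can be sketched with an in-memory queue; in practice the queue would live in something external (e.g. Redis or a message broker), which this SplQueue merely stands in for:

```php
<?php
// Option 4 sketch: a single writer service drains the queue, so only one
// process ever touches the file and no locking is needed at write time.
function drain_queue(SplQueue $queue, string $file): int
{
    $written = 0;
    while (!$queue->isEmpty()) {
        $item = $queue->dequeue();    // first come, first served
        file_put_contents($file, $item . PHP_EOL, FILE_APPEND);
        $written++;
    }
    return $written;
}
```

Usage: producer processes only enqueue; the dedicated service loops over `drain_queue()`, so file access is serialized by design rather than by locks.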
Each of the options above has its own benefits. They fall roughly into two categories:
(1) those that require queuing (slower writes): options 1, 2, and 4;
(2) those that do not require queuing (faster writes): option 3.
When designing a cache system, option 3 is generally not used, because its writer and its analyzer are decoupled: the writer only cares about writing fast and gives no thought to how hard the data will be to read back. If a cache were updated with random file writes, reading the cache would require touching many files, adding considerable work for readers. Options 1 and 2 are quite different: although a write may have to wait (retrying when the lock cannot be acquired), reads stay simple and fast. Since the whole point of a cache is to remove data-read bottlenecks and improve system performance, that trade-off is the right one.
The above is a summary of personal experience and collected material. If anything is wrong or has been left out, corrections from readers are welcome.