
Detailed explanation of key points for improving efficiency when storing large numbers of small files

高洛峰
Original · 2016-10-18 14:37:03

In web development we often need to write files to disk, and the most common case is saving image files. If the number of files is small, there is no need to worry about efficiency. But once you have a large number of users and a large number of images, the way the image files are laid out on disk directly affects the efficiency of the whole image storage system.

A common claim is that once a directory holds 10,000 files, reading any one of them becomes significantly slower. Is this claim correct? Let's take a look:

Question: Why do too many files in a single directory affect performance? For example, if a directory contains 10,000 files, does reading one of them become significantly slower? Is this related to the file index? How are the entries organized in that index?

Answer: Yes, it is related to the index. 10,000 is not that many; the slowdown only becomes obvious at millions of files. Even so, it is recommended not to exceed 10,000 per directory.

Question: Slowness at millions of files sounds like a matter for the file system as a whole, so what does it have to do with the current directory? A file system that supports millions of files should have no problem with that.

Answer: I mean a single directory that is not split into subdirectories and directly contains hundreds of thousands or millions of files. In that case, searching the directory index is very resource intensive.

The number of entries a directory can hold is limited because the size of the directory object itself is limited. A directory is a container that stores file names together with the inode number of each file. If its size is limited, the number of entries it can hold is limited too.

The speed of reading the file's contents is not affected; what becomes slow is finding the file. The directory indexing of some file systems is weak, and some have no lookup optimization at all, so every search takes longer as the directory grows.
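The name-to-inode mapping described above can be inspected from Python. The following is a minimal sketch using the standard library's `os.scandir`; the function name `list_directory_entries` and the sample file names are made up for illustration.

```python
import os
import tempfile


def list_directory_entries(path):
    """Return sorted (name, inode number) pairs for the entries directly in path."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)


if __name__ == "__main__":
    # Build a throwaway directory with a few files and print its entries.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.jpg", "b.jpg", "c.jpg"):
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(b"x")
        for name, inode in list_directory_entries(tmp):
            print(name, inode)
```

Each printed line is one directory entry: a file name and the inode number it points to. The actual inode numbers depend on the file system, so none are shown here.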


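From the question and answer above, we can see that the claim "if a directory contains 10,000 files, reading any one of them becomes significantly slower" is essentially correct, although the slowdown comes from locating the file rather than from reading its contents. So how should the directories be divided?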


In fact, it is relatively simple. You can divide the files by month, by hash, or by a combination of time and hash. Which method to use depends on the needs of your project.
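The three layouts mentioned above could be sketched as follows. This is only an illustration under assumptions: the function names, the choice of SHA-256 as the hash, and the use of two hex digits per directory level are not from the original article, and any stable hash would serve the same purpose.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def _hex_digest(filename):
    # Assumption: SHA-256 of the file name; any stable hash would work here.
    return hashlib.sha256(filename.encode("utf-8")).hexdigest()


def hashed_path(filename, levels=2, width=2):
    """Hash layout: nested directories named after leading hex digits of the hash."""
    digest = _hex_digest(filename)
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, filename)


def dated_path(filename, day):
    """Month layout: <year>/<month>/<filename>."""
    return PurePosixPath(f"{day:%Y}", f"{day:%m}", filename)


def dated_hashed_path(filename, day, width=2):
    """Time plus hash: <year>/<month>/<hash prefix>/<filename>."""
    prefix = _hex_digest(filename)[:width]
    return PurePosixPath(f"{day:%Y}", f"{day:%m}", prefix, filename)


if __name__ == "__main__":
    upload_day = date(2016, 10, 18)
    print(hashed_path("avatar_42.jpg"))
    print(dated_path("avatar_42.jpg", upload_day))
    print(dated_hashed_path("avatar_42.jpg", upload_day))
```

With two levels of two hex digits each, the hash layout yields 256 × 256 = 65,536 leaf directories, so staying under 10,000 files per directory still leaves room for roughly 655 million files. The month layout is easy to browse and archive but can overflow a directory in a busy month, which is the case the combined layout is meant to handle.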

