Want to figure out how the Linux file system works?
The file system is a core component of Linux, and programmers need to understand how it works. The disk provides the system with basic persistent storage, and the file system builds on the disk to manage all files. In Linux, everything is a file: not only ordinary files and directories but also block devices, sockets, pipes, and so on are managed through a unified file system. This article looks at how disks and file systems work.
Index nodes and directory entries
In the Linux file system, a file is described by three structures: a directory entry, an index node, and data blocks. The first two hold the file's metadata; the data blocks hold its contents.
- Directory entry: abbreviated dentry. It records the file name, a pointer to the index node, and the relationships to other directory entries. Linked directory entries make up the directory structure of the file system. A directory entry is an in-memory data structure maintained by the kernel, which is why it is often called the directory entry cache (dentry cache).
- Index node: abbreviated inode. It records the file's metadata, including the inode number, file size, access permissions, modification time, data location, and link count. Inodes are persisted to disk, so they take up disk space.
- Data block: abbreviated block. This is where file data is stored. The smallest storage unit of a disk is the sector, traditionally 512 bytes (0.5 KB). Reading the disk one sector at a time would be too inefficient, so the operating system reads several consecutive sectors at once, that is, one "block" at a time. A block made up of multiple sectors is the smallest unit of file access. The most common block size is 4 KB (eight sectors).
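The relationship between directory entries and inodes can be observed from user space. The following is a minimal sketch, assuming a Unix-like system and a temporary directory that supports hard links; the helper names describe and hard_link_demo are illustrative only. It creates one file with two names and shows that both names resolve to the same inode.

```python
import os
import tempfile


def describe(path):
    """Return some of the inode fields that stat() exposes for a path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                 # inode number
        "size": st.st_size,                 # file size in bytes
        "links": st.st_nlink,               # number of directory entries (hard links)
        "mode": oct(st.st_mode & 0o777),    # access permissions
        "blocks_512": st.st_blocks,         # allocated space, in 512-byte units
    }


def hard_link_demo():
    """Create one file with two names and return the description of each name."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "a.txt")
        second = os.path.join(d, "b.txt")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)  # a second directory entry for the same inode
        return describe(first), describe(second)
```

Both descriptions report the same inode number and a link count of 2, because the file name lives in the directory entry while everything else lives in the inode.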
To speed up file access, inodes are usually cached in memory. When a disk is formatted, it is divided into three storage areas: the superblock, the inode area, and the data block area.
- Superblock: stores overall information about the file system, such as the number of blocks, the block size, and the free blocks.
- Inode area: stores the inodes.
- Data block area: stores file and directory data.
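Some of the superblock-level figures mentioned above are available to programs through the statvfs() call. The sketch below assumes a Unix-like system; block_summary is an illustrative name, and the values are the summary the kernel reports for a mounted file system, not a raw read of the on-disk superblock.

```python
import os


def block_summary(path="/"):
    """Summarize block counts for the file system that contains `path`."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,                    # size of one block in bytes
        "total_blocks": st.f_blocks,                  # blocks in the file system
        "free_blocks": st.f_bfree,                    # blocks not yet allocated
        "total_bytes": st.f_frsize * st.f_blocks,     # capacity in bytes
    }
```

Calling block_summary("/") returns the figures for the root file system; passing another mount point reports on that file system instead.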
Virtual File System
The virtual file system (VFS, Virtual File System) in Linux is a key abstraction layer. It gives users and applications a consistent file system interface, so they can access different types of file systems in a unified manner without worrying about how the underlying file system is implemented.
User programs and the glibc library both run in user space. File operations are carried out by calling functions in the system call interface (SCI) layer, which is the interface the Linux kernel provides for user programs to request services. For example, the cat command calls open() to open a file, read() to read its contents, and write() to output them to the console. Common file system types fall into a few broad categories.
- Local disk file systems: EXT3, EXT4, XFS, OverlayFS, etc. Data is stored directly on disks attached to the local machine, so performance is good and there is no network I/O overhead.
- Network file systems: NFS, CIFS/SMB, CephFS, GlusterFS, etc. These let users access and manage files over the network. Their main advantages are distribution, cross-platform access, flexibility, and scalability.
- Memory-based file systems: tmpfs, ramfs, /proc, etc. These are usually used for specific purposes such as temporary file storage, caching, and fast data access. They offer high-performance reads and writes in memory, but memory limits and data volatility have to be kept in mind.
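To see which of these file system types are in use on a given machine, the mount table can be inspected. The sketch below assumes the Linux /proc/mounts format, in which each line starts with the device, the mount point, and the file system type; parse_mounts and fs_types are illustrative helper names.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def fs_types(text):
    """Return the distinct file system types found in the mount table, sorted."""
    return sorted({fs_type for _, _, fs_type in parse_mounts(text)})
```

On a Linux machine, passing the contents of /proc/mounts to fs_types lists every type that is currently mounted, which typically mixes disk-based, memory-based, and sometimes network file systems.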
File I/O
We partition and format a disk in order to create a file system on it. The file system must then be mounted on a directory in the Linux VFS directory tree before the system can use it. File reads and writes can use several types of I/O, and an application chooses the one that fits its needs.
Buffered vs. Unbuffered I/O
- Unbuffered I/O does not mean that the kernel provides no buffering. It means the process uses the system calls directly rather than library functions. The kernel keeps a block buffer for disk reads and writes. A write() call copies the data into that buffer and queues it, and the kernel writes the data to disk once enough has accumulated. So unbuffered I/O means that the process itself provides no buffering: every read() or write() is a system call, and the only buffering is the kernel's.
- Buffered I/O means that the process wraps input and output in streams that have their own stream buffer, as the standard I/O library does. Data written through a library function such as fwrite() first goes into the stream buffer. When a condition is met, for example the stream buffer fills up, the data is passed to the kernel's block buffer in one go and written to disk from there (double buffering).
- As a result, buffered I/O usually needs fewer system calls than unbuffered I/O to write the same amount of data to disk.
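The difference in call counts can be made visible with a small experiment. In this sketch, CountingRaw is a hypothetical stand-in for the kernel side: it only counts how many writes reach it, so each call models one system call. Python's io.BufferedWriter plays the role of the stream buffer.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for the kernel side: counts every write that reaches it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b      # copy the bytes, as the kernel would
        return len(b)


def write_unbuffered(raw, chunks):
    """Each chunk goes straight to the raw layer: one call per chunk."""
    for chunk in chunks:
        raw.write(chunk)


def write_buffered(raw, chunks, buffer_size=4096):
    """Chunks collect in a stream buffer and reach the raw layer in batches."""
    stream = io.BufferedWriter(raw, buffer_size=buffer_size)
    for chunk in chunks:
        stream.write(chunk)
    stream.flush()          # push whatever is still in the stream buffer
    stream.detach()         # release the raw object without closing it
```

Writing a thousand 10-byte chunks through write_unbuffered produces a thousand calls on the raw object, while write_buffered delivers the same bytes in far fewer calls.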
Direct I/O and indirect I/O
- Direct I/O: the application reads and writes disk data without going through the kernel's page cache. The aim is to avoid copying data between the kernel buffer and the application's own cache.
- Indirect I/O: reads and writes go through the system's page cache first. The data is written to disk later, either by the kernel or by an explicit system call.
- With direct I/O, data that is not in the application's own cache has to be loaded from disk on every access, which is slower than a page cache hit. Applications such as database management systems still tend to prefer their own caching mechanism, because they know the data they store better than the operating system does, so direct I/O suits them better.
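On Linux, direct I/O is requested with the O_DIRECT open flag, and the buffer generally has to be aligned to the device's block size. The following is a rough sketch under those assumptions: the 4096-byte BLOCK value is a guess that fits many devices but not all, read_first_block_direct is an illustrative name, and the function returns None wherever the platform or file system rejects O_DIRECT.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; real devices may require a different value


def read_first_block_direct(path):
    """Read up to BLOCK bytes bypassing the page cache; None if unsupported."""
    flag = getattr(os, "O_DIRECT", None)
    if flag is None:
        return None                      # platform has no O_DIRECT
    try:
        fd = os.open(path, os.O_RDONLY | flag)
    except OSError:
        return None                      # file system rejects O_DIRECT
    try:
        buf = mmap.mmap(-1, BLOCK)       # anonymous mappings are page-aligned
        try:
            n = os.readv(fd, [buf])      # read straight into the aligned buffer
            return bytes(buf[:n])
        except OSError:
            return None                  # alignment or support problem at read time
        finally:
            buf.close()
    finally:
        os.close(fd)
```

An application that works this way has to manage alignment and caching itself, which is the trade-off a database accepts in exchange for control over its own cache.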
Blocking I/O and non-blocking I/O
- Blocking I/O: the application process blocks when it calls an I/O operation and returns only once the data is ready and has been copied into the process's buffer. It is simple to implement and easy to develop with, and suits network applications with low concurrency.
- Non-blocking I/O: the I/O call does not block the current thread, which can carry on with other tasks and obtain the result later through polling or event notification. It is more complex to program. Polling on its own wastes CPU time, so by itself it suits network applications with low concurrency that do not need an immediate response; combined with event notification it can serve many connections from one thread.
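The behaviour of a non-blocking read can be shown with a pipe. This is a minimal sketch assuming a Unix-like system; try_read and demo are illustrative names. With the read end switched to non-blocking mode, a read on an empty pipe returns control immediately instead of waiting for data.

```python
import os


def try_read(fd, size=1024):
    """Return the available bytes, or None if the read would have blocked."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


def demo():
    """Read from an empty pipe, then again after data has been written."""
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)   # switch the read end to non-blocking mode
        before = try_read(r)        # pipe is empty: returns at once
        os.write(w, b"ping")
        after = try_read(r)         # data is ready now
        return before, after
    finally:
        os.close(r)
        os.close(w)
```

In blocking mode the first read would have suspended the process until another writer supplied data; here the caller gets None and is free to do other work before trying again.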
Synchronous and asynchronous I/O
- Synchronous I/O: after starting an I/O operation, the application must wait until the whole operation has completed before it gets the response.
- Asynchronous I/O: after starting an I/O operation, the application does not wait for it to complete and carries on executing. When the operation finishes, the application is told the result through an event notification.
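The asynchronous pattern, submit first and collect the result later, can be sketched with asyncio. Note the assumption: this example emulates asynchronous file I/O by running a blocking read in a thread pool, which is not the same mechanism as kernel-level asynchronous I/O. The names blocking_read and read_while_working are illustrative.

```python
import asyncio


def blocking_read(path):
    """An ordinary synchronous read: returns only when all data is in hand."""
    with open(path, "rb") as f:
        return f.read()


async def read_while_working(path):
    """Submit the read, do unrelated work, then collect the result."""
    loop = asyncio.get_running_loop()
    pending = loop.run_in_executor(None, blocking_read, path)  # submit, do not wait
    other_work = sum(range(1000))   # the caller keeps running in the meantime
    data = await pending            # completion is delivered via the event loop
    return data, other_work
```

Running asyncio.run(read_while_working(path)) returns both the file contents and the result of the unrelated work, showing that the caller was not held up while the read was in flight.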
Some common knowledge about files
The disk still has plenty of free space, yet new files and directories cannot be created for lack of space.
- Troubleshooting: most likely there are too many small files and the inodes have run out. Check with df -i.
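The same check can be made from a program. The sketch below assumes a Unix-like system and computes roughly the figure that df -i reports as inode usage; inode_usage and inode_usage_for are illustrative names. Some file systems do not report a fixed inode count, in which case the function returns None.

```python
import os


def inode_usage(total, free):
    """Percentage of inodes in use, or None when no inode limit is reported."""
    if total == 0:
        return None
    return 100.0 * (total - free) / total


def inode_usage_for(path="/"):
    """Inode usage for the file system that contains `path`."""
    st = os.statvfs(path)
    return inode_usage(st.f_files, st.f_ffree)   # total inodes, free inodes
```

A value close to 100 on a disk that still shows free blocks points to inode exhaustion rather than a shortage of data blocks.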
du and df report different disk usage.
- du adds up the size of each file as recorded by the file system, so its total comes from walking the files. df reads usage information from the superblock, so it reports the usage of disk blocks. The mismatch is most likely caused by a file that has been deleted while a process still holds it open. Such files can be found with lsof | grep deleted. The space is released once the process stops or is killed.
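The deleted-but-still-open situation is easy to reproduce. This is a minimal sketch assuming a local Unix-like file system; deleted_but_open is an illustrative name. After the only name of the file is removed, du can no longer find it, but the open descriptor keeps the inode and its blocks alive.

```python
import os
import tempfile


def deleted_but_open():
    """Delete a file while holding it open and show its data is still there."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                   # remove the only directory entry
        links = os.fstat(fd).st_nlink     # no names refer to the inode any more
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)           # the blocks are still allocated
        return os.path.exists(path), links, data
    finally:
        os.close(fd)                      # only now can the space be freed
```

The function reports that the path no longer exists and the link count is zero, yet the data can still be read, which is why df keeps counting the blocks until the descriptor is closed.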
When we query disk capacity, why is Used plus Avail always smaller than the total capacity (Size)?
- To guard against emergencies, Linux ext file systems reserve part of the disk space. The reserved amount can be viewed with tune2fs -l [dev_name] | grep "Reserved block count", where [dev_name] is the device name. df leaves the reserved blocks out of Avail, so Used plus Avail comes to less than Size. To change the reservation, use tune2fs -m [percentage] [dev_name], where the value is the percentage of blocks to reserve.
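The arithmetic behind those columns can be sketched from the fields that statvfs() returns. The function name df_columns is illustrative, and the formulas are an approximation of how df derives its figures: Used is everything that is not free, Avail counts only the blocks that unprivileged users may allocate, and the gap between the two free counts is the reservation.

```python
def df_columns(blocks, bfree, bavail, frsize):
    """Derive df-style byte counts from statvfs-style block counts."""
    return {
        "size": blocks * frsize,                  # total capacity
        "used": (blocks - bfree) * frsize,        # blocks already allocated
        "avail": bavail * frsize,                 # free blocks usable by ordinary users
        "reserved": (bfree - bavail) * frsize,    # free blocks held back for root
    }
```

To apply it to a real file system, call os.statvfs(path) and pass its f_blocks, f_bfree, f_bavail, and f_frsize fields. In the result, used plus avail plus reserved adds up to size.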