24 hours to learn about the Linux kernel and issues related to Linux file system implementation

WBOY, reposted 2024-02-05 16:00:03, 1002 views

Using Linux is closely tied to user-space programming and to the file system. You are probably already familiar with the concept of a file system, so I won't explain it at length; anyone who wants more detail can find it through a search engine such as Baidu. Here I focus on Linux's virtual file system.

The virtual file system (VFS) is one of the important features of Linux: it lets the kernel support many different file systems through one common interface. The structure of the file system is shown in the figure below: [See original text for picture]

The VFS (Virtual File System) in the figure above relies on a set of data structures to hold its generic representation of a file system. The data structures are:

Superblock structure: stores information about a mounted file system;
Inode (index node) structure: stores information about a file;
File structure: stores information about a file opened by a process;
Directory entry (dentry) structure: stores information about a path name and the file that the path name points to.

The Linux kernel uses global variables to hold pointers to the structures mentioned above. All of these structures are kept in doubly linked lists. The kernel keeps a pointer to the head of each list and uses it as the access point to the list. Each structure contains a field of type list_head, which it uses to point to the previous and next elements in the list. The following table lists the VFS-related global variables kept by the kernel and the type of list each one points to:

Global variable	Structure type
super_blocks	super_block
file_systems	file_system_type
dentry_unused	dentry
vfsmntlist	vfsmount
inode_in_use	inode
inode_unused	inode

The super_block, file_system_type, dentry, and vfsmount structures are each kept in their own linked list. An inode can be found on the global inode_in_use or inode_unused list, or on a list local to its superblock.

Besides the main VFS structures, several other structures interact with the VFS: fs_struct, files_struct, namespace, and fd_set. The following figure shows how a process descriptor is associated with the file-related structures.

[See original text for picture]

Let's start with the fs_struct structure. An fs_struct can be referenced by multiple process descriptors. The following code can be found in include/linux/fs_struct.h. I don't understand this code very well myself, so advice is welcome.

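struct fs_struct {
    atomic_t count;  // number of process descriptors that reference this fs_struct
    rwlock_t lock;
    int umask;  // mask representing the permissions to be set on files the process opens
    struct dentry *root, *pwd, *altroot;  // dentries of the root directory, current working directory, and alternate root
    struct vfsmount *rootmnt, *pwdmnt, *altrootmnt;  // mounts corresponding to the three dentries above
};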

files_struct contains information about open files and their descriptors, and it uses fd_set collections to group those descriptors. The following code can be viewed in include/linux/file.h:

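struct files_struct {
    atomic_t count;  // reference count, as in fs_struct
    spinlock_t file_lock;
    int max_fds;  // maximum number of files the process can have open
    int max_fdset;  // maximum number of descriptors
    int next_fd;  // value of the next file descriptor to be allocated
    struct file **fd;  // points to the array of open file objects
    fd_set *close_on_exec;  // pointer to the set of descriptors marked to be closed on exec(); if the number of descriptors marked open at exec() time exceeds the size of close_on_exec_init, this pointer is changed
    fd_set *open_fds;  // pointer to the set of descriptors marked as open
    fd_set close_on_exec_init;  // initial bit field backing close_on_exec
    fd_set open_fds_init;  // initial bit field backing open_fds (I don't fully understand these fd_set fields)
    struct file *fd_array[NR_OPEN_DEFAULT];  // pointers to the first NR_OPEN_DEFAULT (32 on a 32-bit system) open file objects
};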

The files_struct structure is initialized through the INIT_FILES macro:

#define INIT_FILES \
{ \
    .count = ATOMIC_INIT(1), \
    .file_lock = SPIN_LOCK_UNLOCKED, \
    .max_fds = NR_OPEN_DEFAULT, \
    .max_fdset = __FD_SETSIZE, \
    .next_fd = 0, \
    .fd = &init_files.fd_array[0], \
    .close_on_exec = &init_files.close_on_exec_init, \
    .open_fds = &init_files.open_fds_init, \
    .close_on_exec_init = {{0, }}, \
    .open_fds_init = {{0, }}, \
    .fd_array = {NULL, } \
}

NR_OPEN_DEFAULT is defined globally as BITS_PER_LONG, which is 32 on 32-bit systems and 64 on 64-bit systems.

Next comes page buffering (the page cache): how it works and how it is implemented. In Linux, memory is divided into zones. Each zone has a linked list of active pages and a linked list of inactive pages. Inactive pages are the candidates for reclaim, and a modified page is written back to disk before its memory is reused. The following figure illustrates this relationship:

[See original text for picture]

The core of the page cache is the address_space object. Its code can be viewed in include/linux/fs.h (I don't understand this code very well, so advice is welcome):

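struct address_space {
    struct inode *host;                 /* owner: inode or block device */
    struct radix_tree_root page_tree;   /* radix tree of all cached pages */
    spinlock_t tree_lock;               /* protects page_tree */
    unsigned long nrpages;              /* total number of pages */
    pgoff_t writeback_index;            /* writeback starts here */
    struct address_space_operations *a_ops;  /* methods */
    struct prio_tree_root i_mmap;       /* tree of private and shared mappings */
    unsigned int i_mmap_writable;       /* count of VM_SHARED mappings */
    struct list_head i_mmap_nonlinear;  /* list of VM_NONLINEAR mappings */
    spinlock_t i_mmap_lock;             /* protects tree, count, and list */
    atomic_t truncate_count;            /* covers a race with truncate */
    unsigned long flags;                /* error bits and gfp mask */
    struct backing_dev_info *backing_dev_info;  /* device readahead, etc. */
    spinlock_t private_lock;            /* for use by the address_space */
    struct list_head private_list;      /* likewise */
    struct address_space *assoc_mapping;  /* likewise */
};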

The Linux kernel also represents each block on a block device with a buffer_head structure. The physical area that a buffer_head describes is logical block b_blocknr of device b_dev. The memory it refers to is a block of b_size bytes starting at b_data, and this block of memory lies within the physical page b_page.

Finally, let's look at the VFS system calls and the file system layer, and trace their execution down to the kernel level. We first need to understand four functions: open(), close(), read(), and write().

open() function:

The open function is used to open and create files. Here is a brief description:

#include <fcntl.h>
int open(const char *pathname, int oflag, ... );

Return value: the file descriptor on success, -1 on error

The third parameter (...) is used only when a new file is created; it specifies the access permission bits of the file. pathname is the path name of the file to open or create (for example /home/user/a.c); oflag specifies how the file is opened or created. It is formed by combining the following constants (defined in fcntl.h) with bitwise OR.

O_RDONLY    read-only mode
O_WRONLY    write-only mode
O_RDWR      read-write mode
When opening or creating a file, exactly one of the three constants above must be specified. The following constants are optional:
O_APPEND    every write appends to the end of the file
O_CREAT     create the file if it does not exist
O_EXCL      used together with O_CREAT: if the file already exists, open fails, returning -1 and setting errno to EEXIST
O_TRUNC     if the file exists and is opened write-only or read-write, truncate its length to 0
O_NOCTTY    if pathname refers to a terminal device, do not make it the controlling terminal of the process
O_NONBLOCK  if pathname refers to a FIFO, a block special file, or a character special file, open it and perform subsequent I/O in nonblocking mode
The following three constants are also optional; they control synchronized input and output:
O_DSYNC     each write waits for physical I/O to complete, but does not wait for file attribute updates unless they affect reading the newly written data
O_RSYNC     each read waits until pending writes to the same part of the file have completed
O_SYNC      each write waits for physical I/O to complete, including the I/O needed to update file attributes

The file descriptor returned by open is guaranteed to be the lowest-numbered descriptor that is not currently in use.

Suppose NAME_MAX (the maximum length of a file name, not counting '\0') is 14 and we try to create a file in the current directory whose name is longer than 14 bytes. Early System V systems (such as SVR2) silently truncate the name and keep only the first 14 bytes; BSD-derived systems return an error and set errno to ENAMETOOLONG.

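POSIX.1 introduced the constant _POSIX_NO_TRUNC to determine whether long file names and long path names are truncated. If _POSIX_NO_TRUNC is in effect (truncation is not allowed) and the length of the path name exceeds PATH_MAX (including '\0'), or any file name component of the path is longer than NAME_MAX, an error is returned and errno is set to ENAMETOOLONG.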

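close() function

When a process has finished with a file, it issues the close() system call:

Synopsis

#include <unistd.h>
int close(int fd);

Parameter: fd, a file descriptor

Return value: 0 on success, -1 on error

The parameter fd is the file descriptor to close. Note that when a process terminates, the kernel calls close on every descriptor the process still has open, so all of its files are closed automatically even if the program never calls close. A long-running program (a network server, for example) must nevertheless remember to close the descriptors it opens; otherwise, as more and more files are opened, it ties up a large number of file descriptors and system resources.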

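read() function

When a user-level program calls read(), Linux translates it into the system call sys_read():

Description: reads data from a file.
Required header: #include <unistd.h>

Prototype: ssize_t read(int fd, void *buf, size_t count);

Parameters:

fd: the file descriptor to read from.
buf: the buffer into which the data that is read will be placed.
count: the number of bytes that one call to read should read.
Return value: the number of bytes read; 0 on reaching EOF; -1 on error.
The number of bytes read can be less than count in the following cases:
When reading a regular file, the end of the file is reached before count bytes have been read. For example, if the file holds only 30 bytes and we ask for 100, only 30 bytes are read and the function returns 30. The next read on this file returns 0.
When reading from a terminal device, normally only one line is read at a time.
When reading from a network, buffering within the network may cause fewer than count bytes to be returned.
When reading from a pipe or FIFO, it may contain fewer than count bytes.
When reading from a record-oriented device, some such devices (magnetic tape, for example) return at most one record per read.
When the call is interrupted by a signal after part of the data has been read.
A read starts at the current file offset (cfo). Before a successful return, the cfo is incremented by the number of bytes actually read.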

An example follows (the program is one I found online, pasted here to help with understanding):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    int handle;
    ssize_t bytes;

    buf = malloc(10);
    if (buf == NULL)
    {
        printf("Out of memory\n");
        exit(1);
    }
    /*
     * Looks for a file in the current directory named TEST.$$$ and attempts
     * to read 10 bytes from it. To use this example you should create the
     * file TEST.$$$ first.
     */
    /* O_BINARY and a mode argument are not needed when opening for reading on Linux */
    handle = open("TEST.$$$", O_RDONLY);
    if (handle == -1)
    {
        printf("Error opening file\n");
        exit(1);
    }
    bytes = read(handle, buf, 10);
    if (bytes == -1)
    {
        printf("Read failed.\n");
        exit(1);
    }
    else
    {
        printf("Read: %zd bytes read.\n", bytes);
    }
    close(handle);
    free(buf);
    return 0;
}

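write() function

Description: writes data to a file.
Required header: #include <unistd.h>

Prototype: ssize_t write(int fd, const void *buf, size_t count);

Return value: the number of bytes written on success; -1 on error

Behavior: write writes count bytes, taken from buf, to the file referred to by fd. The return value is usually equal to count; if it is not, something went wrong. Common causes are a full disk or exceeding the file size limit. For a regular file, the write starts at the current file offset (cfo). If the file was opened with O_APPEND, every write places its data at the end of the file. After a successful write, the cfo is incremented by the number of bytes actually written.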

An example follows (the program is one I found online, pasted here to help with understanding):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int handle;
    char string[40];
    int length, res;

    /* Create a file named "TEST.$$$" in the current directory and write a
       string to it. If "TEST.$$$" already exists, it will be overwritten. */
    if ((handle = open("TEST.$$$", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR)) == -1)
    {
        printf("Error opening file.\n");
        exit(1);
    }
    strcpy(string, "Hello, world!\n");
    length = (int)strlen(string);
    if ((res = (int)write(handle, string, length)) != length)
    {
        printf("Error writing to the file.\n");
        exit(1);
    }
    printf("Wrote %d bytes to the file.\n", res);
    close(handle);
    return 0;
}

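Summary

I did not read much code today. Most of it was found online, and some of the explanations were written after consulting references. There are still parts I do not understand, and I hope experts will point me in the right direction. Here I have summarized issues related to the Linux file system implementation without going into the specific details, so after reading this you should have only a rough understanding of the Linux file system. Some readers asked which books I have been reading: Linux Kernel Programming and Understanding the Linux Kernel, along with various online materials and good blog posts by other experts. I have summarized them here and pointed out what I do not understand and what I think is important. Suggestions are welcome, thanks~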

Statement：

This article is reproduced from lxlinux.net. If there is any infringement, please contact admin@php.cn to have it removed.
