
Linux file I/O: principles and methods

PHPz
2024-02-09 18:27:27

Files are the most basic and commonly used data storage method in Linux systems. They can be text files, binary files, device files, directory files, etc. Reading and writing files is one of the most important operations in Linux programming. It involves concepts such as file descriptors, buffers, system calls, and library functions. In this article, we will introduce the basic principles and methods of Linux file I/O, including opening, closing, reading, writing, positioning, truncation, synchronization and other operations, and give examples to illustrate their usage and precautions.


File Descriptor

A file descriptor is a small, nonnegative integer for use in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.); see man 2 open. When a program starts running, three file descriptors are normally already open:

  • 0: STDIN_FILENO, standard input (stdin)
  • 1: STDOUT_FILENO, standard output (stdout)
  • 2: STDERR_FILENO, standard error (stderr)

fd principle

  • Descriptor numbers start at 0. Each allocation takes the smallest unused number and binds it, through a pointer, to a file table entry. (Compare with PIDs, which keep increasing and wrap around only when the range is exhausted.)
  • A file descriptor is an int that stands for an open file, but the file's management information is not stored in the descriptor itself. When open() opens a file, the OS loads the file's information into a data structure in memory, the file table. For reasons such as security and efficiency, data structures such as the file table are not suitable for direct manipulation. Instead, the structure is assigned a number and all operations go through that number. This number is the file descriptor.
  • The OS maintains a file descriptor table for each process. When a new descriptor is needed, it searches that table for the smallest unused number and returns it. Although the type is int, a valid descriptor is always non-negative, in the range 0 to OPEN_MAX - 1 (OPEN_MAX is 1024 on the system used here). 0, 1, and 2 are normally already taken by stdin, stdout, and stderr.
  • close() removes the association between the fd and the file table entry from the descriptor table. The file table entry itself is freed only when no other fd refers to it (one entry can be shared by several fds). close() does not change the integer value stored in your variable; it only means that the number no longer refers to a file.
  • Duplicating an fd vs. copying an fd: dup() makes new_fd point to the same file table entry as old_fd. This is different from the assignment int new_fd = old_fd, which only copies the number.
  • UNIX describes open files with three data structures: the per-process file descriptor table, which lists the files opened by that process; the file table, which holds the current status of each open file; and the v-node table, which is used to locate the file's i-node (index node). Linux does not use a v-node structure. It uses a generic inode structure instead, but there is no essential difference. The inode is loaded from disk through the file system when the file is accessed.
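The two rules above (the lowest unused number is always chosen, and dup() shares one file table entry) can be observed directly. The following is a minimal sketch, assuming a single-threaded Linux process; the helper names lowest_fd_is_reused and offset_seen_through_dup are hypothetical and exist only for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a freed descriptor number is handed out
 * again by the next open(), 0 if not, -1 on error. */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);   /* takes the next free slot after a */
    if (a == -1 || b == -1) {
        if (a != -1) close(a);
        if (b != -1) close(b);
        return -1;
    }
    close(a);                              /* a's slot is now the lowest free one */
    int c = open("/dev/null", O_RDONLY);
    int reused = (c == a);
    if (c != -1) close(c);
    close(b);
    return reused;
}

/* Hypothetical helper: writes 4 bytes through fd and returns the offset
 * observed through a dup()ed copy, or -1 on error. */
off_t offset_seen_through_dup(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    unlink(path);                          /* scratch file: remove its name right away */
    off_t pos = -1;
    int copy = dup(fd);                    /* refers to the same file table entry as fd */
    if (copy != -1) {
        if (write(fd, "abcd", 4) == 4)
            pos = lseek(copy, 0, SEEK_CUR); /* offset is shared, so it has moved too */
        close(copy);
    }
    close(fd);
    return pos;
}
```

Because the offset lives in the shared file table entry, the copy sees the position advanced by the write made through the original descriptor.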

File Descriptor Flag

The current system has only one file descriptor flag, close-on-exec. It is a single flag that takes effect when a process calls an exec function, typically in a child process created by fork. It determines whether the file descriptor is closed when exec runs.

  • Typically we fork and then call exec in the child to run another program. exec replaces the child's text, data, heap, and stack with the new program, so the variables that held the file descriptors no longer exist and the new program cannot close the descriptors it does not need. The usual approach is therefore to close the unneeded descriptors in the child after fork and before exec. In a complex system, however, we may no longer know how many descriptors (including sockets) are open at the time of the fork, and cleaning them up one by one is difficult. What we want is to declare, when a descriptor is opened before the fork, that it should be closed when the child calls exec. That is the purpose of close-on-exec.
  • Every file descriptor has a close-on-exec flag. By default the flag is 0 (off), so the descriptor stays open across exec. After a fork, parent and child then share the file: they use the same file table entry and the same file offset.
  • FD_CLOEXEC with fcntl() and O_CLOEXEC with open() set the close-on-exec flag. When the flag is 1 (on), the kernel closes the descriptor automatically when the process successfully calls exec.

Note: Although newer systems support setting close-on-exec at open time, with some toolchains compilation still fails with error: ‘O_CLOEXEC’ undeclared (first use in this function). In that case, enable it by defining the feature test macro _GNU_SOURCE in one of the following two ways.

#define _GNU_SOURCE   // in the source file, before any #include

-D_GNU_SOURCE         // or in the compiler flags
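For a descriptor that is already open, the flag can be set afterwards with fcntl(). The sketch below shows the usual read-modify-write pattern with F_GETFD and F_SETFD; the helper names set_cloexec and has_cloexec are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on close-on-exec for an open descriptor.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);            /* read the descriptor flags */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if close-on-exec is set, 0 if it is
 * clear, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

In a multi-threaded program there is a window between open() and set_cloexec() in which another thread could fork and exec, which is one reason to prefer O_CLOEXEC at open time when it is available.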

File Status Flag

File status flags describe the attributes of an open file. They belong to the file table entry, so descriptors created by duplicating a file descriptor share the same file status flags. File descriptor flags belong to the individual descriptor and are not shared.

  • Access Modes: specify how the file is accessed: read-only, write-only, or read-write. Set by open() and returned by fcntl(), but cannot be changed afterwards.
  • Open-time Flags: control what open() does when it runs. They are not saved after open() returns.
  • Operating Modes: affect how read and write operations behave. Set by open(), and can be read or changed later with fcntl().
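The difference between the two kinds of flags can be checked with a duplicated descriptor. This is a small sketch, assuming Linux and that /dev/null is available; flags_shared_by_dup is a hypothetical helper that sets one status flag (O_APPEND) and one descriptor flag (FD_CLOEXEC) on the original and reports what the copy sees.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper. On success returns 0 and stores 1 or 0 in each
 * output: whether the flag set on the original is visible on the copy. */
int flags_shared_by_dup(int *status_shared, int *fdflag_shared)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd == -1)
        return -1;
    int copy = dup(fd);
    if (copy == -1) {
        close(fd);
        return -1;
    }
    int rc = -1;
    int fl = fcntl(fd, F_GETFL);
    if (fl != -1 &&
        fcntl(fd, F_SETFL, fl | O_APPEND) != -1 &&  /* file status flag */
        fcntl(fd, F_SETFD, FD_CLOEXEC) != -1) {     /* file descriptor flag */
        int copy_fl = fcntl(copy, F_GETFL);
        int copy_fd = fcntl(copy, F_GETFD);
        if (copy_fl != -1 && copy_fd != -1) {
            *status_shared = (copy_fl & O_APPEND) != 0;
            *fdflag_shared = (copy_fd & FD_CLOEXEC) != 0;
            rc = 0;
        }
    }
    close(copy);
    close(fd);
    return rc;
}
```

The status flag shows up on the copy because both descriptors refer to one file table entry, while close-on-exec stays private to the descriptor it was set on.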

open()

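// Open the file at the given pathname with the given options, i.e. connect an fd to the file.
// Returns the file descriptor on success; returns -1 and sets errno on failure.
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode); // not function overloading (C has none); open() takes a variable argument list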
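// pathname: path of the file or device
// flags: file status flags = Access Mode + Open-time Flags + Operating Modes
/* Access Mode (exactly one is required):
O_RDONLY: 0
O_WRONLY: 1
O_RDWR:   2
*/
/* Open-time Flags (combine with bitwise OR):
O_CLOEXEC   : enable close-on-exec on the new descriptor, so the program does not need a separate fcntl() F_SETFD call to set FD_CLOEXEC
O_CREAT     : create the file if it does not exist and return its descriptor; ignored if the file already exists. The mode argument (permission bits, e.g. 0664) must be supplied
O_DIRECTORY : fail if pathname is not a directory. This avoids denial-of-service problems if opendir() is called on a FIFO or tape device
O_EXCL      : ensure that open() creates the file: if the file already exists, open() fails. Always used together with O_CREAT
O_NOCTTY    : if pathname refers to a terminal device, it does not become the controlling terminal of the process, even if the process has none
O_NOFOLLOW  : if pathname is a symbolic link, then the open fails
O_TMPFILE   : create an unnamed temporary file. The file system creates an unnamed inode; when the last descriptor is closed, everything written to the file is lost unless the file was given a name first
O_TRUNC     : truncate the file to length 0
O_TTY_INIT
*/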
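/* Operating Modes (combine with bitwise OR):
O_APPEND    : open in append mode; every write goes to the end of the file. On current Unix/Linux systems the seek-and-write is defined as one atomic operation
O_ASYNC     : enable signal-driven I/O
O_DIRECT    : try to minimize cache effects of the I/O to and from this file
O_DSYNC     : each write waits for the I/O to complete, but does not wait for file attribute updates that are not needed to read back the data just written
O_LARGEFILE : allow opening files whose size cannot be represented in off_t (but can be represented in off64_t)
O_NOATIME   : do not update the file's st_atime (last access time)
O_NONBLOCK / O_NDELAY : open the file in non-blocking mode when possible
O_SYNC      : each write waits for the I/O to complete, including the file attribute updates caused by write()
O_PATH      : obtain a descriptor that indicates a location in the file system
*/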
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
int fd = open("b.txt", O_RDWR | O_CREAT | O_EXCL, 0664); // fails if b.txt already exists
if (fd == -1) {
    perror("open");
    exit(EXIT_FAILURE);
}

Q: Why are the flags combined with bitwise OR?
A: A likely model: each option is a bit string in which one bit is 1 and all the others are 0. Combining the options with bitwise OR yields a single 0/1 string that records the state of all the flags, and each option can later be tested with bitwise AND. Note: the access mode is the exception. It is a small value stored in the lowest two bits (masked with O_ACCMODE), not a set of independent bits.
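The bit model described above can be sketched in a few lines. The helpers is_writable and has_option are hypothetical names used only for illustration; O_ACCMODE is the standard mask for the access mode.

```c
#include <fcntl.h>

/* Hypothetical helper: the access mode is a small value, not a single bit,
 * so it is extracted with the O_ACCMODE mask and then compared. */
int is_writable(int flags)
{
    int mode = flags & O_ACCMODE;
    return mode == O_WRONLY || mode == O_RDWR;
}

/* Hypothetical helper: every other option occupies its own bit(s), so a
 * bitwise AND tells whether it was OR-ed into flags. This does not work for
 * O_RDONLY, whose value is 0. */
int has_option(int flags, int option)
{
    return (flags & option) == option;
}
```

For example, has_option(O_WRONLY | O_CREAT | O_TRUNC, O_TRUNC) is true, while testing flags & O_RDONLY is always 0 and therefore says nothing about the access mode.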

creat()

Equivalent to calling open() with the flags O_WRONLY | O_CREAT | O_TRUNC.

#include <fcntl.h>
int creat(const char *pathname, mode_t mode);

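dup(), dup2(), dup3()

// Duplicate a file descriptor: the new descriptor refers to the same open file and has the same file status flags as the old one.
// Returns the new file descriptor on success; returns -1 and sets errno on failure.
#include <unistd.h>
int dup(int oldfd);             // uses the lowest-numbered unused descriptor as the new descriptor

int dup2(int oldfd, int newfd); // uses newfd as the new descriptor, closing it first if it is open
#define _GNU_SOURCE             // dup3() is Linux-specific; define this before any #include
#include <fcntl.h>
#include <unistd.h>
int dup3(int oldfd, int newfd, int flags); // like dup2(), but flags may contain O_CLOEXEC
#include <stdio.h>
#include <stdlib.h>
int res = dup2(fd, fd2);        // fd2 now refers to the same open file as fd
if (res == -1) {
    perror("dup2");
    exit(EXIT_FAILURE);
}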
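A common use of dup() and dup2() together is redirecting standard output to a file and restoring it afterwards. The following is a minimal sketch of that pattern; write_redirected is a hypothetical helper, and the path is whatever scratch file the caller chooses.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of msg to path by temporarily
 * pointing fd 1 at the file. Returns 0 on success, -1 on error. */
int write_redirected(const char *path, const char *msg, size_t len)
{
    int file = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (file == -1)
        return -1;
    int saved = dup(STDOUT_FILENO);            /* remember the real stdout */
    if (saved == -1) {
        close(file);
        return -1;
    }
    int rc = -1;
    if (dup2(file, STDOUT_FILENO) != -1) {     /* fd 1 now refers to the file */
        if (write(STDOUT_FILENO, msg, len) == (ssize_t)len)
            rc = 0;
        if (dup2(saved, STDOUT_FILENO) == -1)  /* put the original stdout back */
            rc = -1;
    }
    close(saved);
    close(file);
    return rc;
}
```

Code that also uses stdio (printf and friends) would need to fflush(stdout) before each dup2() call, since buffered data is otherwise written to whichever file fd 1 refers to at flush time.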

read()

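// Read up to count bytes from the file behind fd into the buffer starting at buf.
// Returns the number of bytes actually read (0 at end of file); returns -1 and sets errno on failure.
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
#include <stdio.h>
#include <stdlib.h>
char buf[6];
ssize_t res = read(fd, buf, sizeof buf); // may return fewer than 6 bytes
if (res == -1) {                         // check the result of read(), not fd
    perror("read");
    exit(EXIT_FAILURE);
}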
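Since read() may return fewer bytes than requested (for example on pipes and terminals), callers that need a fixed amount usually loop. The sketch below shows one common way to do this; read_full is a hypothetical helper, not a library function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until count bytes have arrived
 * or end of file is reached. Returns the number of bytes read (less than
 * count only at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n == 0)                 /* end of file */
            break;
        if (n == -1) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```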

write()

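// Write up to count bytes from the buffer at buf to the file behind fd.
// Returns the number of bytes written, and the file offset advances by that amount; returns -1 and sets errno on failure.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count); // buf is only read, hence const (read() has no const)
#include <stdio.h>
#include <stdlib.h>
ssize_t res = write(fd, "hello", sizeof("hello")); // sizeof counts the trailing '\0', so 6 bytes are written
if (res == -1) {
    perror("write");
    exit(EXIT_FAILURE);
}

Note: in the example above, even to write the single character 'A' you must pass "A". "A" is an array that decays to an address, while 'A' is just an int.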
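write() can also transfer fewer bytes than requested, so code that must write a whole buffer typically loops in the same way. This is a minimal sketch; write_all is a hypothetical helper name.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes are
 * written. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;                     /* skip the part already written */
        count -= (size_t)n;
    }
    return 0;
}
```

To write a string without its terminating '\0', pass strlen(s) rather than sizeof, for example write_all(fd, "hello", 5).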

lseek()

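The l stands for long int, for historical reasons.

// Reposition the file offset according to the base whence and the distance offset.
// Returns the new offset measured from the start of the file; returns -1 and sets errno on failure.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
/* whence:
SEEK_SET (0): offset is relative to the start of the file; a negative offset is normally not allowed
SEEK_CUR (1): offset is relative to the current position; it may be positive or negative
SEEK_END (2): offset is relative to the end of the file; it may be positive or negative, and writing after seeking past the end creates a "file hole"
*/
#include <stdio.h>
#include <stdlib.h>
off_t len = lseek(fd, -3, SEEK_END); // 3 bytes before the end; a negative offset with SEEK_SET would fail with EINVAL
if (len == -1) {
    perror("lseek");
    exit(EXIT_FAILURE);
}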
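The "file hole" behavior of SEEK_END, and the common trick of reading the file size with lseek(fd, 0, SEEK_END), can be sketched as follows. make_hole is a hypothetical helper; the path is a scratch file chosen by the caller.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write one byte, skip gap bytes past the end, write
 * another byte, and return the resulting file size (-1 on error). */
off_t make_hole(const char *path, off_t gap)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    off_t size = -1;
    if (write(fd, "A", 1) == 1 &&
        lseek(fd, gap, SEEK_END) != (off_t)-1 &&  /* move past the end */
        write(fd, "B", 1) == 1)
        size = lseek(fd, 0, SEEK_END);            /* offset of EOF == file size */
    close(fd);
    return size;
}
```

With a gap of 100 the file ends up 102 bytes long, and the skipped range reads back as zero bytes.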

fcntl()

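// Perform various operations on fd. On success the return value depends on cmd (0 for most commands); returns -1 and sets errno on failure.
#include <unistd.h>
#include <fcntl.h>
int fcntl(int fd, int cmd, ... );       // ... is a variable argument list
/* cmd:
Advisory record locking:
F_SETLK(struct flock*)  // set or release an advisory lock; fails immediately if a conflicting lock is held
F_SETLKW(struct flock*) // like F_SETLK, but waits if a conflicting lock is held. If a signal is caught while waiting, the call is interrupted and returns an error once the handler returns; otherwise it keeps waiting
F_GETLK(struct flock*)  // test whether the lock could be placed. If it could, no lock is placed; l_type is set to F_UNLCK and the other fields are left unchanged. If it could not, the struct is filled with the details of one conflicting lock, including the PID of its holder
Duplicating a file descriptor:
F_DUPFD (int)       // find the lowest available descriptor >= arg and make it a copy of fd
F_DUPFD_CLOEXEC(int)// like F_DUPFD, but also sets close-on-exec on the new descriptor
File descriptor flags:
F_GETFD (void)      // return the file descriptor flags of fd; arg is ignored
F_SETFD (int)       // set the file descriptor flags of fd to arg
File status flags:
F_GETFL (void)      // return the access mode and the other file status flags of fd; arg is ignored
F_SETFL (int)       // set the file status flags to arg
Managing signals:
F_GETOWN(void)      // return the PID or process group ID that receives SIGIO and SIGURG for fd
F_SETOWN(int)       // set the PID or process group ID that receives SIGIO and SIGURG for fd to arg
F_GETOWN_EX(struct f_owner_ex*) // return the current owner settings of fd as defined by a previous F_SETOWN_EX
F_SETOWN_EX(struct f_owner_ex*) // like F_SETOWN, but lets the caller direct the I/O signals of fd to a specific thread, process, or process group
F_GETSIG(void)      // return the signal sent when input or output becomes possible
F_SETSIG(int)       // set the signal sent when input or output becomes possible to arg
*/

/* ...:
Optional argument; whether it is needed depends on cmd. For locking, it must be a struct flock*
struct flock {
    short l_type;   // %d  Type of lock: F_RDLCK (read lock), F_WRLCK (write lock), F_UNLCK (unlock)
    short l_whence; // %d  How to interpret l_start: SEEK_SET, SEEK_CUR, SEEK_END
    off_t l_start;  // %ld Starting offset for lock
    off_t l_len;    // %ld Number of bytes to lock
    pid_t l_pid;    // PID of process blocking our lock (F_GETLK only); initialized to -1 by convention
};
*/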

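Advisory Lock

An advisory lock restricts locking, not reading or writing, so it only works among programs that read or write only after acquiring the lock. It is used to resolve conflicts when different processes write to the same position of the same file at the same time.
A read lock is a shared lock (S lock): any number of read locks can coexist on the same region.
A write lock is an exclusive lock (X lock): it cannot coexist with any other lock on the same region.

Ways to release a lock (in order of increasing scope):

  • Set the lock type to F_UNLCK and call fcntl() again
  • When close() closes an fd, all locks that the calling process holds on that file are released automatically
  • When a process exits, all file locks it holds are released automatically

Q: Why can gedit or vim still write to the file after a write lock has been placed?

A: They can write. A lock only controls whether another lock can be acquired, not whether the file can be read or written, which is why it is called an "advisory" lock. Placing a lock signals that others should not write, but nothing stops a program that writes anyway. vim and gedit do not check whether a lock can be acquired before they read or write, so they go straight ahead.

Q: So how can file locks be used to control reads and writes?

A: Try to acquire a read lock before reading and a write lock before writing, and proceed with the operation only if the lock is acquired.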

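int fd = open("./a.txt", O_RDWR);                   // get an fd
if (fd == -1) {
    perror("open");
    exit(EXIT_FAILURE);
}
// describe the lock: 5 bytes starting at the third byte (offset 2), inclusive
struct flock lock = {.l_type = F_RDLCK, .l_whence = SEEK_SET, .l_start = 2, .l_len = 5, .l_pid = -1};
int res = fcntl(fd, F_SETLK, &lock);                // place the lock on fd
if (res == -1) {
    perror("fcntl");
    exit(EXIT_FAILURE);
}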
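The "try the lock first, then read or write" discipline can be wrapped in two small helpers. This is a sketch under the assumption of Linux fcntl() record locks; try_lock and lock_holder are hypothetical names.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: place (F_RDLCK, F_WRLCK) or remove (F_UNLCK) an
 * advisory lock on len bytes starting at start, without blocking.
 * Returns 0 on success, -1 if the lock is unavailable or on error. */
int try_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;                 /* 0 would mean "to the end of the file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: returns the PID of a process holding a conflicting
 * lock, 0 if the lock could be placed, -1 on error. Locks held by the
 * calling process itself never count as conflicts. */
pid_t lock_holder(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

A cooperating writer would call try_lock(fd, F_WRLCK, 2, 5), write only if it returns 0, and then release the region with try_lock(fd, F_UNLCK, 2, 5).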

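ioctl()

This function provides operations that the other file functions do not cover. It is mostly used with device drivers: each driver can define its own set of ioctl commands, and the system provides generic ioctl commands for different classes of devices.

// Manipulate the device parameters of special files. Usually returns 0 on success; returns -1 and sets errno on failure.
#include <sys/ioctl.h>
int ioctl(int d, int request, ...);
// d: an open file descriptor
// request: a device-dependent request code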
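As one concrete request code, Linux provides FIONREAD, which reports how many bytes are waiting to be read on descriptors such as pipes, sockets, and terminals. The sketch below assumes Linux; bytes_available is a hypothetical helper name.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes that can be read from fd right now,
 * or -1 if the request fails (for example, if fd does not support it). */
int bytes_available(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) == -1)  /* third argument: pointer to an int result */
        return -1;
    return n;
}
```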

close()

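// Close fd so that the number can be reused for another file. Returns 0 on success; returns -1 and sets errno on failure.
#include <unistd.h>
int close(int fd);
#include <stdio.h>
#include <stdlib.h>
int res = close(fd);
if (res == -1) {
    perror("close");
    exit(EXIT_FAILURE);
}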

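In this article we covered the basic principles and methods of Linux file I/O, which together cover the common operations on files. Choose the method that fits the actual need and follow a few basic rules: close file descriptors that are no longer used, check return values for errors, and use a suitable buffer size. File I/O is an indispensable part of Linux programming. It makes data persistence and data exchange possible, and used well it can improve both the functionality and the performance of a program.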


Statement:
This article is reproduced from lxlinux.net. In case of any infringement, please contact admin@php.cn to have it removed.