System calls in Linux are the only legal entry into the kernel
In Linux, system calls are the only means by which user space can access the kernel; they are the only legal entry point into the kernel. Other mechanisms, such as device files and /proc, are ultimately accessed through system calls as well.
Normally, applications are written against application programming interfaces (APIs) rather than directly against system calls, and these interfaces do not need to correspond one-to-one with the system calls provided by the kernel. An API defines a set of programming interfaces used by applications. An interface can be implemented with a single system call, with several system calls, or with no system calls at all. The same API can exist on a variety of operating systems and present exactly the same interface to applications, even though its implementation on each system may be very different.
In the Unix world, the most popular application programming interfaces are based on the POSIX standard, and Linux is POSIX compatible.
Programmers only need to deal with the API, while the kernel only deals with system calls; how library functions and applications use those system calls is not the kernel's concern.
System calls (often called syscalls in Linux) are generally invoked through function calls. They usually take one or more arguments (inputs) and may produce side effects. A system call returns a value of type long that indicates success (usually 0) or failure (a negative value). When a system call fails, the C library writes the error code to the global variable errno. The perror() function translates this variable into an error string that the user can understand.
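The errno and perror() behaviour described above can be seen from user space. The following is a minimal sketch, assuming a Linux system with a standard C library; the function name open_missing_file() and the file path are made up for illustration and are assumed not to exist on the machine.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Tries to open a path that is assumed not to exist and returns the
 * errno value left by the failed call (0 if the call unexpectedly
 * succeeds). The path is a hypothetical placeholder. */
int open_missing_file(void)
{
    int fd = open("/nonexistent-dir-for-demo/missing.txt", O_RDONLY);

    if (fd == -1) {
        int saved = errno;   /* save it before another call can overwrite it */
        perror("open");      /* prints the error string for errno to stderr */
        return saved;
    }
    close(fd);
    return 0;
}
```

Calling open_missing_file() from a main() of your own should return ENOENT when the path really is missing.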
Two details of how system calls are implemented are worth noting: 1) The function declaration carries the asmlinkage qualifier, which tells the compiler to look for the function's arguments only on the stack. 2) The system call getXXX() is defined in the kernel as sys_getXXX(). All system calls in Linux follow this naming convention.
System call number: In Linux, each system call is assigned a unique system call number, which is used to refer to that system call. When a user-space process executes a system call, it identifies the call by number, not by name. Once assigned, a system call number cannot change (otherwise compiled applications would break). If a system call is removed, its number must not be reused. Linux provides a "not implemented" system call, sys_ni_syscall(), which does nothing except return -ENOSYS, the error code reserved for invalid system calls. This is rarely needed, but if a system call is removed, this function fills the gap.
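To make the role of the number concrete, the sketch below invokes getpid by its system call number instead of by name. It assumes a Linux system with glibc, whose syscall() function and SYS_getpid constant (from <sys/syscall.h>) are used here; the helper name getpid_by_number() is made up for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Asks the kernel for the process ID by passing the system call
 * number directly, bypassing the getpid() wrapper. */
long getpid_by_number(void)
{
    return syscall(SYS_getpid);
}
```

The value returned should match what the ordinary getpid() wrapper reports for the same process.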
The kernel keeps a list of all registered system calls in the system call table, stored in sys_call_table. The table is architecture dependent and is usually defined in entry.S. It assigns a unique system call number to each valid system call.
User-space programs cannot execute kernel code directly. They cannot call functions in kernel space, because the kernel resides in a protected address space. Instead, the application must signal the kernel that it wants to execute a system call, so that the system switches to kernel mode and the kernel can execute the system call on behalf of the application. This signalling mechanism is a software interrupt. On x86, the software interrupt is raised by the int $0x80 instruction. The instruction triggers an exception that switches the system to kernel mode and runs exception handler number 128, which is the system call handler, system_call(). The handler is closely tied to the hardware architecture and is generally written in assembly language in entry.S.
All system calls trap into the kernel in the same way, so trapping into kernel space alone is not enough: the system call number must also be passed to the kernel. On x86, this is done by placing the number in the eax register before raising the software interrupt. Once the system call handler runs, it reads the number from eax. The system_call() function mentioned above checks the validity of the number by comparing it with NR_syscalls. If the number is greater than or equal to NR_syscalls, the function returns -ENOSYS. Otherwise, the corresponding system call is executed: call *sys_call_table(,%eax,4)
Because each entry in the system call table is 32 bits (4 bytes) wide, the kernel multiplies the given system call number by 4 to find the entry's location in the table, as shown in Figure 1.
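The bounds check and table lookup can be modelled in ordinary user-space C. The sketch below is only an illustration of the idea, not kernel code: every name starting with model_ or MODEL_ is hypothetical, and the three table entries are invented for the example.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

/* Placeholder for removed calls, in the spirit of sys_ni_syscall(). */
static long model_ni_syscall(void) { return -ENOSYS; }
static long model_getzero(void)    { return 0; }
static long model_getanswer(void)  { return 42; }

/* The position of an entry in the table is its system call number. */
static const syscall_fn model_call_table[] = {
    model_getzero,     /* number 0 */
    model_ni_syscall,  /* number 1: a removed call, gap filled */
    model_getanswer,   /* number 2 */
};

#define MODEL_NR_SYSCALLS \
    (sizeof(model_call_table) / sizeof(model_call_table[0]))

/* Rejects numbers >= MODEL_NR_SYSCALLS, otherwise calls the entry. */
long model_dispatch(unsigned long nr)
{
    if (nr >= MODEL_NR_SYSCALLS)
        return -ENOSYS;
    return model_call_table[nr]();
}

/* Byte offset of an entry when each entry is 4 bytes wide. */
unsigned long model_entry_offset(unsigned long nr)
{
    return nr * 4;
}
```

In this model, number 1 and any number past the end of the table both yield -ENOSYS, and entry 5 would sit 20 bytes from the start of a table of 4-byte entries.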
Besides the system call number, most system calls also need some arguments to be passed in. The simplest way is to store the arguments in registers, just as the system call number is. On x86, the registers ebx, ecx, edx, esi and edi hold the first five arguments in order. In the rare case that six or more arguments are required, a single register holds a pointer to the user-space address where all the arguments are stored. The return value is also passed back to user space in a register. On x86, it is stored in eax.
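From C, the register loading is normally hidden behind a library function. As a rough sketch, assuming Linux with glibc, the syscall() function takes the system call number followed by the arguments and places them where the architecture expects; raw_write() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Invokes the write system call by number with its three arguments.
 * glibc's syscall() returns the kernel's result on success; on failure
 * it returns -1 and stores the error code in errno. */
long raw_write(int fd, const void *buf, unsigned long count)
{
    return syscall(SYS_write, fd, buf, count);
}
```

For example, raw_write(1, "hello\n", 6) would be expected to return 6 when standard output is writable.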
System calls must carefully check that all of their arguments are legal and valid. System calls run in kernel space, so if users could pass invalid input to the kernel unchecked, the security and stability of the system would be at risk. The most important check is whether a pointer supplied by the user is valid. Before the kernel follows a pointer into user space, it must ensure that:
1) The memory region the pointer refers to belongs to user space
2) The memory region the pointer refers to is in the address space of the process
3) If reading, the memory is marked readable. If writing, the memory is marked writable.
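The effect of these checks is visible from user space: a system call handed a pointer that does not refer to valid memory fails with EFAULT instead of crashing the kernel. The sketch below assumes Linux with glibc; write_errno() is a hypothetical helper, and the bad address used in the usage note is an arbitrary value assumed to be unmapped.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Writes count bytes from buf into a fresh pipe and returns the
 * resulting errno value, or 0 if the write succeeded. */
int write_errno(const void *buf, unsigned long count)
{
    int fds[2];
    int result = 0;

    if (pipe(fds) != 0)
        return errno;
    if (syscall(SYS_write, fds[1], buf, count) == -1)
        result = errno;   /* saved before close() can change errno */
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With a valid buffer, such as write_errno("ok", 2), the result should be 0; with an invalid pointer, such as write_errno((const void *)1, 16), the kernel is expected to reject the call with EFAULT.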
The kernel provides two functions that perform the necessary checks and copy data between kernel space and user space. One of these two functions must always be used.
copy_to_user(): Writes data to user space and takes 3 parameters. The first is the destination memory address in the process's address space. The second is the source address in kernel space. The third is the size of the data to copy, in bytes.
copy_from_user(): Reads data from user space and takes 3 parameters. The first is the destination address in kernel space. The second is the source address in the process's address space. The third is the size of the data to copy, in bytes.
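Kernel code cannot be run as an ordinary program, so the following is only a user-space model of the calling convention, not the kernel implementation. It mirrors the argument order of copy_from_user() (destination, source, size) and its return convention, where the result is the number of bytes that could not be copied and 0 means success. The names model_copy_from_user() and struct model_user_region are hypothetical, the "user region" is just an ordinary buffer standing in for the process's address space, and unlike the real function the model copies either everything or nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the part of memory the process is allowed to use. */
struct model_user_region {
    const unsigned char *start;
    size_t len;
};

/* Copies n bytes from 'from' to 'to' only if the whole source range
 * lies inside the region. Returns the number of bytes NOT copied:
 * 0 on success, n if the range check fails. */
size_t model_copy_from_user(void *to, const void *from, size_t n,
                            const struct model_user_region *region)
{
    uintptr_t src = (uintptr_t)from;
    uintptr_t base = (uintptr_t)region->start;

    if (src < base || n > region->len || src - base > region->len - n)
        return n;   /* range is not entirely inside the region */
    memcpy(to, from, n);
    return 0;
}
```

A caller in this model treats any nonzero result as a failure, which is how kernel code typically decides to return -EFAULT.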
Note: Both functions may block. This happens when the page containing the user data has been swapped out to disk and is not in physical memory. In that case, the process sleeps until the page fault handler brings the page back from disk into physical memory.
The kernel is in process context while executing a system call, and the current pointer points to the current task, which is the process that issued the system call. In process context, the kernel can sleep (for example, when a system call blocks or explicitly calls schedule()) and can be preempted. When the system call returns, control passes back to system_call(), which is ultimately responsible for switching to user space and letting the user process continue execution.
Adding a system call to Linux is very simple. Designing and implementing it well is the hard part. The first step in implementing a system call is to decide its purpose. The purpose should be clear and singular: do not try to write a multi-purpose system call. ioctl() is an example of what not to do. The arguments, return value and error codes of the new system call all need careful thought. Once the system call has been written, registering it as an official system call is routine and usually follows these steps:
1) Add an entry at the end of the system call table (usually located in entry.S). Counting from 0, the position of an entry in the table is its system call number. For example, the tenth system call in the table is assigned system call number 9.
2) For each supported architecture, the system call number must be defined in include/asm/unistd.h
3) The system call must be compiled into the kernel image (it cannot be built as a module). It only needs to be placed in a relevant file under kernel/.
Generally, system calls are supported by the C library. User programs can invoke system calls (or library functions, which in turn invoke the system calls) by including the standard header files and linking against the C library. If a system call has no library support, Linux itself provides a set of macros for accessing system calls directly. A macro sets up the registers and issues the int $0x80 instruction. The macros are named _syscalln(), where n ranges from 0 to 6 and is the number of arguments passed to the system call. The number is needed because the macro must know exactly how many arguments to load into registers and in which order. Take the open system call as an example:
The open() system call is defined as follows:
long open(const char *filename, int flags, int mode)
Without library support, the macro can be used to invoke this system call directly:
#define __NR_open 5
_syscall3(long, open, const char *, filename, int, flags, int, mode)
In this way, the application can use open() directly: placing the macro in the application is all that is needed to call the open() system call. Each macro takes 2 + 2*n parameters. The first is the return type of the system call, the second is its name, and the rest are the type and name of each argument, in order.
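On systems whose headers do not offer the _syscalln() macros, a comparable direct call can be sketched with glibc's syscall() function. This is an assumption-laden illustration rather than a replacement for the macro: raw_open() is a made-up name, and the sketch uses the openat system call with AT_FDCWD, because not every architecture defines a separate open system call number.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Opens a file by invoking the openat system call by number.
 * AT_FDCWD makes relative paths resolve against the current
 * directory, which matches the behaviour of open(). Returns a file
 * descriptor, or -1 with errno set on failure. */
long raw_open(const char *filename, int flags, int mode)
{
    return syscall(SYS_openat, AT_FDCWD, filename, flags, mode);
}
```

For instance, raw_open("/dev/null", O_RDONLY, 0) should return a non-negative file descriptor that can then be passed to close().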