Home >System Tutorial >LINUX >Syscall system call Linux kernel tracing

Syscall system call Linux kernel tracing

WBOY
WBOYforward
2024-02-12 21:21:14476browse

In Linux user space, we often need to call system calls. Let's take Linux version 2.6.37 as an example to track the implementation of the read system call. System call implementations may vary between versions of Linux.

Syscall system call Linux kernel tracing

In some applications, we can see the following definition:

scssCopy code
#define real_read(fd, buf, count ) (syscall(SYS_read, (fd), (buf), (count)))

Actually, what is actually called is the system function syscall(SYS_read), that is, the sys_read() function. In the Linux version 2.6.37, this function is implemented through several macro definitions.

Linux system call (SCI, system call interface) is actually a process of multi-channel aggregation and decomposition. The aggregation point is the 0x80 interrupt entry point (X86 system structure). That is to say, all system calls are aggregated from user space to the 0x80 interrupt point, and the specific system call number is saved at the same time. When the 0x80 interrupt handler is running, different system calls will be processed separately according to the system call number, that is, different kernel functions will be called for processing.

There are two ways to cause system calls:

(1) int $0×80, this is the only way to cause a system call in old Linux kernel versions.

(2) sysenter assembly instructions

In the Linux kernel, we can use the following macro definitions to make system calls.

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
    struct file *file;
    ssize_t ret = -EBADF;
    int fput_needed;

    file = fget_light(fd, &fput_needed);
    if (file) {
        loff_t pos = file_pos_read(file);
        ret = vfs_read(file, buf, count, &pos);
        file_pos_write(file, pos);
        fput_light(file, fput_needed);
    }

    return ret;
}

The macro definition of SYSCALL_DEFINE3 is as follows:

#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)

## means that the characters in the macro are directly replaced,
If name = read, then __NR_##name is replaced with __NR_read in the macro. NR##name is the system call number, ## refers to two macro expansions. That is, replace "name" with the actual system call name, and then expand __NR.... If name == ioctl, it is __NR_ioctl.

#ifdef CONFIG_FTRACE_SYSCALLS
#define SYSCALL_DEFINEx(x, sname, ...)                \
    static const char *types_##sname[] = {            \
        __SC_STR_TDECL##x(__VA_ARGS__)            \
    };                            \
    static const char *args_##sname[] = {            \
        __SC_STR_ADECL##x(__VA_ARGS__)            \
    };                            \
    SYSCALL_METADATA(sname, x);                \
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#else
#define SYSCALL_DEFINEx(x, sname, ...)                \
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#endif

Regardless of whether the CONFIG_FTRACE_SYSCALLS macro is defined or not, the following macro definition will eventually be executed:

__SYSCALL_DEFINEx(x, sname, VA_ARGS)

#ifdef CONFIG_HAVE_SYSCALL_WRAPPERS

#define SYSCALL_DEFINE(name) static inline 
long SYSC_##name

#define __SYSCALL_DEFINEx(x, name, ...)                    \
    asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__));        \
    static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__));    \
    asmlinkage long SyS##name(__SC_LONG##x(__VA_ARGS__))        \
    {                                \
        __SC_TEST##x(__VA_ARGS__);                \
        return (long) SYSC##name(__SC_CAST##x(__VA_ARGS__));    \
    }                                \
    SYSCALL_ALIAS(sys##name, SyS##name);                \
    static inline long SYSC##name(__SC_DECL##x(__VA_ARGS__))

#else /*
 CONFIG_HAVE_SYSCALL_WRAPPERS */

#define SYSCALL_DEFINE(name) asmlinkage 
long sys_##name
#define __SYSCALL_DEFINEx(x, name, ...)                    \
    asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__))

#endif /*
 CONFIG_HAVE_SYSCALL_WRAPPERS */

The following types of macro definitions will eventually be called:

asmlinkage long sys##name(__SC_DECL##x(VA_ARGS))
That is the sys_read() system function we mentioned earlier.
asmlinkage tells the compiler to extract only the function's arguments from the stack. All system calls require this qualifier! This is similar to the macro definition mentioned in our previous article quagga.

That is, the following code in the macro definition:

struct file *file;
    ssize_t ret = -EBADF;
    int fput_needed;

    file = fget_light(fd, &fput_needed);
    if (file) {
        loff_t pos = file_pos_read(file);
        ret = vfs_read(file, buf, count, &pos);
        file_pos_write(file, pos);
        fput_light(file, fput_needed);
    }

    return ret;

Code analysis:

  • fget_light(): According to the index specified by fd, retrieve the corresponding file object from the current process descriptor (see Figure 3).
  • If the specified file object is not found, an error
  • is returned.
  • If the specified file object is found:
  • Call the file_pos_read() function to get the current position of the file read and written this time.
  • Call vfs_read() to perform a file reading operation, and this function ultimately calls the function pointed to by file->f_op.read(). The code is as follows:

if (file->f_op->read)
ret = file->f_op->read(file, buf, count, pos);

  • Call file_pos_write() to update the current read and write position of the file.
  • Call fput_light() to update the file's reference count.
  • Finally, the number of bytes of read data is returned.

At this point, the processing of the virtual file system layer is completed, and control is handed over to the ext2 file system layer.

The above is the detailed content of Syscall system call Linux kernel tracing. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:lxlinux.net. If there is any infringement, please contact admin@php.cn delete