Artifact in Linux: Principles and Applications of eventfd

王林
2024-02-13 20:30:16

Linux is a powerful operating system that provides many inter-process communication mechanisms, such as pipes, signals, message queues, and shared memory. Is there a simpler, more flexible, and more efficient way to deliver a notification? Yes: eventfd. eventfd is a system call introduced in the Linux 2.6 series (2.6.22) that delivers event notifications through a file descriptor. Each eventfd object holds a 64-bit unsigned integer counter maintained by the kernel. A process reads or changes the counter by reading from or writing to the file descriptor, and that is how communication takes place. eventfd has the following characteristics:

  • eventfd needs no additional files or shared memory regions; a single file descriptor is enough;
  • eventfd can be used together with multiplexing mechanisms such as select, poll, and epoll for efficient event-driven programming;
  • eventfd can be created in non-blocking mode (EFD_NONBLOCK) or semaphore mode (EFD_SEMAPHORE), which give different read and write semantics;
  • eventfd can cross process or thread boundaries to achieve different levels of communication.
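
To make the counter semantics in the list above concrete, here is a minimal user-space sketch. The helper names (efd_post, efd_take, demo_counter_mode, demo_semaphore_mode) are made up for this illustration and are not part of any API; only eventfd(2), read(2), write(2), and the EFD_* flags are real.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: add n to the counter. Returns 0 on success, -1 on error. */
int efd_post(int efd, uint64_t n)
{
    return write(efd, &n, sizeof n) == (ssize_t)sizeof n ? 0 : -1;
}

/* Hypothetical helper: read the counter. Returns the value read, or 0 if the
 * read failed (with EFD_NONBLOCK this includes the "counter is zero" case). */
uint64_t efd_take(int efd)
{
    uint64_t v = 0;
    if (read(efd, &v, sizeof v) != (ssize_t)sizeof v)
        return 0;
    return v;
}

/* Default mode: writes accumulate, one read returns the sum and resets it. */
uint64_t demo_counter_mode(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return 0;
    efd_post(efd, 2);
    efd_post(efd, 3);
    uint64_t got = efd_take(efd);   /* returns 2 + 3; counter is now 0 */
    close(efd);
    return got;
}

/* Semaphore mode: each read returns 1 and decrements the counter by 1. */
uint64_t demo_semaphore_mode(void)
{
    int efd = eventfd(0, EFD_NONBLOCK | EFD_SEMAPHORE);
    if (efd < 0)
        return 0;
    efd_post(efd, 3);
    uint64_t reads = 0;
    while (efd_take(efd) == 1)      /* stops when the counter reaches 0 */
        reads++;
    close(efd);
    return reads;
}
```

Calling demo_counter_mode() from main shows a single read collecting both writes, while demo_semaphore_mode() needs one read per unit posted. EFD_NONBLOCK is used here only so that a read on an empty counter fails instead of blocking the demo.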

So, how does eventfd work? What application scenarios does it have? This article will introduce the artifact eventfd from two aspects: principle and application.

Generally speaking, Linux offers five major inter-process communication mechanisms: pipes, message queues, semaphores, shared memory, and sockets.
I am not very familiar with pipes. I only know that ordinary pipes are limited to related processes such as parent and child, so I ruled them out first, because I want communication between independent processes. Named pipes do not seem to have that restriction, but I am not sure how to use them from kernel mode.
I am not familiar with message queues at all.
The core of a semaphore is an atomic operation on a kernel variable, but its interface is exposed only to user mode, and the P/V operations are meant for mutual exclusion rather than the notify-and-wake-up mechanism I want.
Shared memory is even more troublesome. Its interface also exists only in user mode. To share memory between kernel mode and user mode, you have to implement a file yourself and provide an mmap interface for it.
As for sockets, I have only used AF_INET TCP/UDP and AF_UNIX datagram sockets before. The problem is the same: the kernel does not provide a clear in-kernel interface. You can call functions such as sock->ops->recvmsg yourself, but you have to construct the input parameters by hand, which feels unsafe.

The only thing left seems to be netlink. This socket type explicitly supports sending packets from the kernel: netlink_kernel_create is exported, so kernel code can use such a socket to send packets. However, user mode needs to register a receive function, and kernel mode still has to assemble an skb to send a packet. That is too complicated for me, since all I want is a notification that wakes something up.

So I searched again and found the artifact eventfd. The communication between KVM and QEMU makes masterful use of it. After carefully analyzing the source code, I found that it is exactly what its name says: it exists purely for notification.
As a file (is there anything in Linux that is not a file?), its private_data structure, eventfd_ctx, has only four modest members.

struct eventfd_ctx {
  struct kref kref;  /* reference count, used by get/put */
  wait_queue_head_t wqh; /* holds the wait entries of user-mode processes; this is what makes notification possible */
/*
 * Every time that a write(2) is performed on an eventfd, the
 * value of the __u64 being written is added to "count" and a
 * wakeup is performed on "wqh". A read(2) will return the "count"
 * value to userspace, and will reset "count" to zero. The kernel
 * side eventfd_signal() also, adds to the "count" counter and
 * issue a wakeup.
 */
  __u64 count;  /* a counter whose meaning is up to the application: read fetches the value and clears it, write adds the written value */
  unsigned int flags;  /* as with any file, I suppose: stores the blocking/non-blocking flag, O_CLOEXEC, and the like */
};
I chose it because it has eventfd_signal, an interface provided specifically for kernel mode. Its comment reads:

/*
 * This function is supposed to be called by the kernel in paths that do not
 * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
 * value, and we signal this as overflow condition by returning a POLLERR to poll(2).
 */

In fact, the code itself makes this clearer:

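int eventfd_signal(struct eventfd_ctx *ctx, int n)
{
  unsigned long flags;

  if (n < 0)
    return -EINVAL;
  spin_lock_irqsave(&ctx->wqh.lock, flags);
  /* Clamp n so that the counter saturates at ULLONG_MAX instead of wrapping. */
  if (ULLONG_MAX - ctx->count < n)
    n = (int) (ULLONG_MAX - ctx->count);
  ctx->count += n;
  /* Wake any task waiting on the eventfd (blocked in read or poll). */
  if (waitqueue_active(&ctx->wqh))
    wake_up_locked_poll(&ctx->wqh, POLLIN);
  spin_unlock_irqrestore(&ctx->wqh.lock, flags);

  return n;
}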

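In essence, it adds to the counter and performs one wakeup, without going through the file's read or write path. The difference from eventfd_write is that it never blocks: if the counter would overflow, the value is clamped instead of making the caller wait.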
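
The contrast between the two paths can be observed from user space. The sketch below is only an illustration under stated assumptions: signal_clamped is a hypothetical user-space model of the clamp in eventfd_signal (the kernel version takes an int and runs under a spinlock), and overflow_write_errno is a made-up helper that fills the counter to the largest value write(2) accepts (0xfffffffffffffffe, per the eventfd(2) man page) and then tries to add one more.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical model of the kernel-side clamp: the counter saturates at
 * UINT64_MAX rather than wrapping around, and the caller never waits. */
uint64_t signal_clamped(uint64_t count, uint64_t n)
{
    if (UINT64_MAX - count < n)
        n = UINT64_MAX - count;
    return count + n;
}

/* Hypothetical helper: returns the errno of a write that would overflow the
 * counter, 0 if that write unexpectedly succeeded, or -1 on setup failure. */
int overflow_write_errno(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    uint64_t max = UINT64_MAX - 1;  /* largest value the counter may hold via write(2) */
    uint64_t one = 1;
    int result = -1;

    if (write(efd, &max, sizeof max) == (ssize_t)sizeof max) {
        /* The counter is full. A blocking fd would sleep here until a read
         * drains it; with EFD_NONBLOCK the write fails instead. */
        if (write(efd, &one, sizeof one) < 0)
            result = errno;
        else
            result = 0;
    }
    close(efd);
    return result;
}
```

On a non-blocking descriptor the second write is expected to fail with EAGAIN, which is the user-space counterpart of the blocking that eventfd_signal avoids by clamping.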

Here is how I use it:
The kernel side is a module that registers a misc device and creates a kernel thread to do the work (the thread's argument is the module's file->private_data). The module provides an ioctl interface through which the user-mode process passes in the fd it created with eventfd; the module saves it in file->private_data, where the kernel thread can reach it.
When kernel mode wants to notify user mode, it calls eventfd_signal directly. Before that, the user-mode thread must have placed itself on eventfd_ctx->wqh. There are two ways to do that: call read, or call poll. With read, eventfd_ctx->count is reset to zero when the call returns, so the next read blocks again. With poll, the count is not cleared, so the next poll returns immediately even if the kernel has not called eventfd_signal again; a read is still needed after poll to reset the counter.
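
The poll behaviour described above can be sketched in user space as follows. The function names efd_ready and efd_wait_and_drain are invented for this example; in the setup described here the notification would come from eventfd_signal in the kernel module, whereas a plain write(2) can stand in for it when trying the code out.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if efd is readable right now, 0 if not,
 * -1 on error. It does not consume the event. */
int efd_ready(int efd)
{
    struct pollfd p = { .fd = efd, .events = POLLIN };
    int rc = poll(&p, 1, 0);        /* timeout 0: only check, do not wait */
    if (rc < 0)
        return -1;
    return (rc == 1 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for a notification, then read
 * the counter so that the next poll blocks again. Returns the number of
 * accumulated events, or 0 on timeout or error. */
uint64_t efd_wait_and_drain(int efd, int timeout_ms)
{
    struct pollfd p = { .fd = efd, .events = POLLIN };
    uint64_t n = 0;

    if (poll(&p, 1, timeout_ms) != 1 || !(p.revents & POLLIN))
        return 0;
    if (read(efd, &n, sizeof n) != (ssize_t)sizeof n)
        return 0;
    return n;
}
```

After a single notification, efd_ready keeps reporting the descriptor as readable on every call until efd_wait_and_drain (or any read) resets the counter. For that reason a poll-based waiter usually pairs each successful poll with a read.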
Notifying kernel mode from user mode is a little more troublesome. First, create another eventfd and pass it into file->private_data (the same way as above). In addition, the module needs another ioctl that user mode calls to notify kernel mode; eventfd_signal is called inside that handler. The kernel thread must first be placed on eventfd_ctx->wqh. You can use vfs_read, or do a poll in kernel mode yourself (which again seems troublesome).

This article introduces eventfd, an artifact in Linux. It is a simple, flexible and efficient inter-process communication mechanism. We analyzed the creation, reading and writing, and flag bits of eventfd from the principle aspect, and gave corresponding code examples. We also introduced the use of eventfd in scenarios such as user mode and kernel mode communication, timers and event triggers from the application perspective, and gave corresponding code examples. Through the study of this article, we can master the basic usage of eventfd, and can flexibly use eventfd in actual development to achieve different communication needs. Hope this article is helpful to you!

Statement:
This article is reproduced from lxlinux.net. If there is any infringement, please contact admin@php.cn to have it removed.