Process scheduling is one of the core functions of an operating system: it determines which processes get CPU execution time and how much of it they get. Linux has supported several process scheduling algorithms; the most important and most widely used is the Completely Fair Scheduler (CFS), introduced in Linux 2.6.23. The goal of CFS is to give each process a reasonable and fair allocation of CPU time according to its weight and needs, improving overall system throughput and responsiveness. This article introduces the basic principles, data structures, implementation details, and trade-offs of CFS, to help you deeply understand the complete fairness of Linux process scheduling.
A brief history of the Linux scheduler
Early Linux schedulers used a minimalist design that clearly wasn't focused on large architectures with many processors, let alone hyper-threading. The Linux 1.2 scheduler managed runnable tasks with a circular queue and used a round-robin scheduling policy. This scheduler added and removed processes efficiently (with a lock protecting the structure). In short, the scheduler wasn't complex, but it was simple and fast.
Linux 2.2 introduced the concept of scheduling classes, permitting scheduling policies for real-time tasks, non-preemptible tasks, and non-real-time tasks. The 2.2 scheduler also included support for symmetric multiprocessing (SMP).
The 2.4 kernel included a relatively simple scheduler that ran in O(N) time (it iterated over every task during a scheduling event). The 2.4 scheduler divided time into epochs; within each epoch, every task was allowed to execute up to its time slice. If a task did not use its entire time slice, half of the remaining slice was added to its new time slice, letting it execute longer in the next epoch. The scheduler simply iterated over the tasks, applying a goodness function (metric) to decide which task to execute next. Although this approach was relatively simple, it was inefficient, lacked scalability, and was a poor fit for real-time systems. It also lacked the means to exploit new hardware architectures such as multi-core processors.
The early 2.6 scheduler, called the O(1) scheduler, was designed to remedy a shortcoming of the 2.4 scheduler: it did not need to iterate over the entire task list to identify the next task to schedule (hence the name O(1), meaning it was more efficient and scalable). The O(1) scheduler kept track of runnable tasks in a run queue (actually, two run queues per priority level, one for active tasks and one for expired tasks), so to identify the next task to execute, the scheduler simply dequeued the next task off the active run queue for the appropriate priority. The O(1) scheduler scaled much better and incorporated interactivity, providing a wealth of heuristics for determining whether a task was I/O-bound or processor-bound. But the O(1) scheduler became unwieldy in the kernel: the large mass of code needed to compute the heuristics was difficult to manage and, for purists, failed to capture the essence of the algorithm.
To address the problems faced by the O(1) scheduler, and to deal with other external pressures, something needed to change. That change came from Con Kolivas's kernel patches, including his Rotating Staircase Deadline Scheduler (RSDL), which incorporated his earlier work on the staircase scheduler. The result of this work was a simply designed scheduler that incorporated fairness with bounded latency. Kolivas's scheduler attracted wide attention (with many calling for its inclusion in the then-current 2.6.21 mainline kernel), making it clear that scheduler change was on the way. Ingo Molnar, the creator of the O(1) scheduler, then developed CFS around some of Kolivas's ideas. Let's dissect CFS to see how it operates at a high level.
CFS Overview
The main idea behind CFS is to maintain balance (fairness) in providing processor time to tasks. This means each process should receive a comparable share of the processor. When the time given to tasks is out of balance (meaning that one or more tasks have not been given a fair amount of time relative to the others), those out-of-balance tasks should be given time to execute.
To achieve this balance, CFS maintains the amount of time provided to a given task in what is called the virtual runtime. The smaller a task's virtual runtime (meaning the less time it has been permitted access to the processor), the higher its need for the processor. CFS also includes the concept of sleeper fairness to ensure that tasks that are not currently runnable (for example, waiting for I/O) receive a comparable share of the processor when they eventually need it.
But rather than maintaining tasks in a run queue, as prior Linux schedulers did, CFS maintains a time-ordered red-black tree (see Figure 1). A red-black tree has several interesting and useful properties. First, it is self-balancing, meaning that no path in the tree will ever be more than twice as long as any other. Second, operations on the tree occur in O(log n) time (where n is the number of nodes in the tree), so you can insert or delete tasks quickly and efficiently.
Figure 1. Red-black tree example

Tasks are stored in the time-ordered red-black tree (represented by sched_entity objects): the most processor-needy tasks (those with the lowest virtual runtime) are stored toward the left side of the tree, and the least processor-needy tasks (those with the highest virtual runtime) are stored toward the right side. For fairness, the scheduler then picks the leftmost node of the red-black tree to schedule next. A task accounts for its time with the CPU by adding its execution time to its virtual runtime and, if it is still runnable, is then inserted back into the tree. In this way, tasks on the left side of the tree are given time to execute, and the contents of the tree migrate from the right to the left to maintain fairness. Each runnable task therefore chases the others to maintain a balance of execution across the entire set of runnable tasks.
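To make this mechanism concrete, here is a minimal user-space sketch of the vruntime accounting loop. It is not kernel code: the task names, the weight values, and the fixed 1 ms slice are illustrative assumptions, and a linear scan stands in for the kernel's red-black tree.

#include <stdio.h>
#include <stdint.h>

struct toy_task {
    const char *name;
    uint64_t vruntime;      /* weighted runtime accumulated so far */
    unsigned int weight;    /* derived from the nice value in the kernel */
};

/* Pick the runnable task with the smallest vruntime: the leftmost
 * node of the rb-tree in the real scheduler, a linear scan here. */
static struct toy_task *pick_next(struct toy_task *t, int n)
{
    struct toy_task *best = &t[0];
    for (int i = 1; i < n; i++)
        if (t[i].vruntime < best->vruntime)
            best = &t[i];
    return best;
}

int main(void)
{
    struct toy_task tasks[] = {
        { "editor", 0, 2048 },  /* hypothetical high-weight task */
        { "build",  0, 1024 },  /* 1024 is the weight of nice 0 */
        { "daemon", 0,  512 },  /* hypothetical low-weight task */
    };
    const uint64_t slice = 1000000;  /* pretend each run lasts 1 ms */

    for (int tick = 0; tick < 9; tick++) {
        struct toy_task *cur = pick_next(tasks, 3);
        /* Charge runtime scaled inversely by weight, as CFS does:
         * heavier tasks accumulate vruntime more slowly and so stay
         * on the left of the tree (and get picked) more often. */
        cur->vruntime += slice * 1024 / cur->weight;
        printf("tick %d: ran %-6s vruntime=%llu\n",
               tick, cur->name, (unsigned long long)cur->vruntime);
    }
    return 0;
}

Running the loop shows the high-weight task being picked most often while every task's vruntime converges toward the same value, which is exactly the balance CFS maintains.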
CFS Internal Principles
All tasks within Linux are represented by a task structure called task_struct. This structure (along with others associated with it) fully describes the task, including the task's current state, its stack, the process identity, its priority (static and dynamic), and more. You can find this and the related structures in ./linux/include/linux/sched.h. But because not all tasks are runnable, you won't find any CFS-related fields in task_struct. Instead, a new structure called sched_entity was created to track scheduling information (see Figure 2).
Figure 2. Structural hierarchy of tasks and red-black trees

The relationships of the various structures are shown in Figure 2. The root of the tree is referenced via the rb_root element of the cfs_rq structure (in ./kernel/sched.c). Leaves in a red-black tree contain no information, but internal nodes represent one or more runnable tasks. Each node of the red-black tree is represented by an rb_node, which contains nothing more than the child references and the color of the parent. The rb_node is contained within the sched_entity structure, which holds the rb_node reference, the load weight, and a variety of statistics data. Most importantly, the sched_entity contains the vruntime (a 64-bit field), which indicates the amount of time the task has run and serves as the index into the red-black tree. Finally, the task_struct sits at the top, fully describing the task and containing the sched_entity structure.
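An abridged view of this nesting is shown below (a field subset only; the full definitions live in ./linux/include/linux/sched.h and vary by kernel version, and the u64 typedef is added here so the sketch stands alone):

typedef unsigned long long u64;

struct load_weight {
    unsigned long weight, inv_weight;   /* task's load and its inverse */
};

struct rb_node {
    unsigned long rb_parent_color;      /* parent pointer and color, packed */
    struct rb_node *rb_right;
    struct rb_node *rb_left;
};

struct sched_entity {
    struct load_weight load;    /* weight derived from the nice value */
    struct rb_node run_node;    /* links this entity into the cfs_rq tree */
    u64 vruntime;               /* tree index: weighted time run so far */
    /* ... execution statistics omitted ... */
};

struct task_struct {
    volatile long state;        /* current state of the task */
    struct sched_entity se;     /* CFS bookkeeping embedded in the task */
    /* ... stack, identity, priorities, and so on ... */
};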
As far as CFS is concerned, the scheduling function is quite simple. In ./kernel/sched.c, you'll find the generic schedule() function, which preempts the currently running task (unless it preempts itself with yield()). Note that CFS has no real notion of time slices for preemption, because the preemption time is variable. The currently running task (now preempted) is returned to the red-black tree through a call to put_prev_task (via the scheduling class). When the schedule function comes to identifying the next task to schedule, it calls the pick_next_task function. This function is also generic (within ./kernel/sched.c), but it calls into the CFS scheduler through the scheduler class. The pick_next_task function in CFS can be found in ./kernel/sched_fair.c (where it is called pick_next_task_fair()). This function simply picks the leftmost task from the red-black tree and returns the associated sched_entity. From this reference, a simple call to task_of() identifies the task_struct reference to return. The generic scheduler finally provides the processor to this task.
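A condensed sketch of that path, adapted from the shape of the code in ./kernel/sched_fair.c (the group-scheduling walk and other version-specific details are omitted):

static struct task_struct *pick_next_task_fair(struct rq *rq)
{
    struct cfs_rq *cfs_rq = &rq->cfs;
    struct sched_entity *se;

    if (!cfs_rq->nr_running)
        return NULL;     /* no runnable CFS tasks on this queue */

    /* The kernel caches the leftmost node, so this lookup is O(1). */
    se = rb_entry(cfs_rq->rb_leftmost, struct sched_entity, run_node);

    return task_of(se);  /* container_of back to the enclosing task_struct */
}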
Priority and CFS
CFS does not use priority directly but instead uses it as a decay factor for the time a task is permitted to execute. Lower-priority tasks have higher factors of decay, whereas higher-priority tasks have lower factors of decay. This means that the time a task is permitted to execute dissipates more quickly for a lower-priority task than for a higher-priority one. This is an elegant solution that avoids maintaining a run queue per priority.
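The decay shows up in how actual runtime is converted into vruntime. Here is a sketch of the idea (mirroring the spirit of the kernel's calc_delta_fair() rather than its exact code, with 1024 standing in for NICE_0_LOAD, the weight of a nice-0 task):

#include <stdint.h>

/* Runtime is scaled by the task's weight before being added to its
 * vruntime: a heavier (higher-priority) task's virtual clock advances
 * more slowly, so it holds the leftmost tree position for longer. */
static inline uint64_t charge_vruntime(uint64_t delta_exec, unsigned long weight)
{
    return delta_exec * 1024 / weight;
}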
CFS Group Scheduling
Another interesting aspect of CFS is the concept of group scheduling (introduced with the 2.6.24 kernel), sketched after this paragraph. Group scheduling is another way to bring fairness to scheduling, particularly in the face of tasks that spawn many other tasks. Consider a server that spawns many tasks to parallelize incoming connections (a typical architecture for HTTP servers). Instead of all tasks being treated uniformly and fairly, CFS introduces groups to account for this behavior. The server process that spawns tasks shares its virtual runtime across the group (in a hierarchy), while each individual task maintains its own independent virtual runtime. In this way, a single task receives roughly the same scheduling time as the group. You'll find a /proc interface for managing the process hierarchies, giving you full control over how groups are formed. Using this configuration, you can assign fairness across users, across processes, or a variation of each.
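Conceptually, a group is itself represented by a sched_entity that is enqueued in its parent's tree, with its own run queue underneath. A rough sketch of the hierarchy (the kernel's task_group keeps per-CPU arrays of these; the fields shown are a subset and may differ across versions):

struct task_group {
    struct sched_entity **se;   /* per-CPU group entity, enqueued in the parent cfs_rq */
    struct cfs_rq **cfs_rq;     /* per-CPU run queue holding the group's own tasks */
    unsigned long shares;       /* the group's weight, divided among its members */
};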
Scheduling classes and domains
Introduced in conjunction with CFS is the idea of scheduling classes (recall Figure 2). Each task belongs to a scheduling class, which determines how the task will be scheduled. A scheduling class defines a common set of functions (via sched_class) that define the behavior of the scheduler. For example, each scheduler provides a way to add a task to be scheduled, pull the next task to run, yield to the scheduler, and so on. Each scheduler class is linked with the others in a singly linked list, allowing the classes to be iterated over (for example, to enable or disable them on a given processor). The general structure is shown in Figure 3. Note here that the enqueue and dequeue task functions simply add or remove a task from the particular scheduling structure, and the pick_next_task function selects the next task to execute (depending upon the particular policy of the scheduling class).
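A simplified excerpt of that function table (a field subset only; see ./linux/include/linux/sched.h, and note that the exact signatures differ across kernel versions):

struct sched_class {
    const struct sched_class *next;   /* singly linked list of classes */

    void (*enqueue_task)(struct rq *rq, struct task_struct *p, int wakeup);
    void (*dequeue_task)(struct rq *rq, struct task_struct *p, int sleep);
    void (*yield_task)(struct rq *rq);

    void (*check_preempt_curr)(struct rq *rq, struct task_struct *p);

    struct task_struct *(*pick_next_task)(struct rq *rq);
    void (*put_prev_task)(struct rq *rq, struct task_struct *p);
    /* ... load balancing and housekeeping hooks omitted ... */
};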
Figure 3. Scheduling graphic view

But don't forget that the scheduling class is part of the task structure itself (see Figure 2). This simplifies operations on tasks regardless of their scheduling class. For example, the following function from ./kernel/sched.c preempts the currently running task with a new task (where curr defines the currently running task, rq represents the red-black tree for CFS, and p is the next task to schedule):

static inline void check_preempt( struct rq *rq, struct task_struct *p )
{
    rq->curr->sched_class->check_preempt_curr( rq, p );
}
If this task is using the fair scheduling class, check_preempt_curr() resolves to check_preempt_wakeup(). You can view these relationships in ./kernel/sched_rt.c, ./kernel/sched_fair.c, and ./kernel/sched_idle.c.
Scheduling classes are yet another interesting aspect of the scheduling changes, but the functionality grows further with the addition of scheduling domains. These domains allow you to group one or more processors hierarchically for purposes of load balancing and isolation. One or more processors can share scheduling policies (and keep load balanced between them) or implement independent scheduling policies to intentionally isolate tasks.
Other schedulers
Continue studying scheduling and you'll find schedulers in development that push the boundaries of performance and scalability. Undaunted by his earlier Linux experience, Con Kolivas developed another Linux scheduler, BFS (the Brain Fuck Scheduler). The scheduler is reported to perform better on desktop systems and mobile devices with fewer processors, and it was introduced in a derivative of the Android operating system.
Through this article, you should now have a basic understanding of CFS, one of the most advanced and efficient process scheduling algorithms in Linux. It uses a red-black tree to store the runnable processes, computes a virtual runtime to measure each process's fair share of the processor, and improves the responsiveness of interactive processes through wakeup preemption, thereby achieving complete fairness in process scheduling. Of course, CFS is not perfect. It has potential problems and limitations, such as over-compensating processes that sleep frequently, limited support for real-time processes, and imperfect utilization of multi-core processors, all of which call for improvement and optimization in future versions. In short, CFS is an indispensable component of the Linux system and is well worth studying and mastering in depth.