Linux process group scheduling mechanism: how to group and schedule processes

王林

Feb 11, 2024 pm 08:30 PM


A process group is a way to classify and manage processes in Linux: processes that share characteristics or are related are placed together as one logical unit, which makes them easier to control and to allocate resources to. Process group scheduling is the kernel mechanism that schedules such groups as a whole, dividing CPU time among them according to their configured attributes. This article walks through how the mechanism works inside the kernel: the data structures involved, how the scheduler picks what to run, and how the CFS and RT group scheduling policies behave.


We ran into another puzzling scheduling problem. During a system reboot, the system hung and was reset 30 seconds later. The reset was actually triggered by the hardware watchdog, not by the normal reboot path. Counting back 30 seconds (the watchdog timeout when it is not fed) from the reset time recorded by the watchdog, the serial console log shows this line at that moment: "sched: RT throttling activated".
In the linux-3.0.101-0.7.17 kernel source, this message is printed by sched_rt_runtime_exceeded. In group scheduling, real-time scheduling is restricted through rt_rq->rt_throttled. The rest of this article explains the Linux process group scheduling mechanism in detail.

Process group scheduling mechanism

Group scheduling is a cgroup concept: N processes are treated as a single entity that takes part in scheduling. For example, suppose task A has 8 processes or threads, task B has 2, and other processes or threads are also running. If A and B should each get no more than 40% of the CPU while the remaining tasks get no less than 20%, set the cgroup thresholds accordingly: 200 for cgroup A, 200 for cgroup B, and 100 for the other tasks. When all groups are busy, CPU time is then divided in the ratio 200:200:100, which gives the desired control.
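The arithmetic behind the 40% / 40% / 20% split can be checked with a few lines of C. This is only a sketch of the proportional-weight idea, assuming every group is busy at the same time; the function name share_percent and the integer rounding are illustrative and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: expected CPU percentage of one group when every
 * group is busy, computed as its weight over the sum of all weights.
 * The function name and integer-percent rounding are assumptions.
 */
unsigned int share_percent(const unsigned long *shares, size_t n, size_t idx)
{
    unsigned long total = 0;

    assert(idx < n);
    for (size_t i = 0; i < n; i++)
        total += shares[i];
    assert(total > 0);

    return (unsigned int)(shares[idx] * 100 / total);
}
```

Called with the weights {200, 200, 100}, the function gives 40 for cgroup A, 40 for cgroup B and 20 for the remaining tasks. Note that the number of processes inside each group (8 versus 2) does not enter the calculation at all.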
In the kernel, process groups are managed by struct task_group. Much of its content belongs to the cgroup control mechanism and is not covered here; we focus on the parts related to group scheduling. See the comments in the listing below.

struct task_group {
 struct cgroup_subsys_state css;

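// The following fields are used for scheduling ordinary (CFS) processes
#ifdef CONFIG_FAIR_GROUP_SCHED
 /* schedulable entities of this group on each cpu */
// Scheduling unit for ordinary processes. It is called a unit because what gets scheduled may be a single process or a group of processes
 struct sched_entity **se;
 /* runqueue "owned" by this group on each cpu */
// Fair scheduling run queue
 struct cfs_rq **cfs_rq;
// The control threshold (weight) from the example above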
 unsigned long shares;
 atomic_t load_weight;
#endif

#ifdef CONFIG_RT_GROUP_SCHED
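// Scheduling unit for real-time processes
 struct sched_rt_entity **rt_se;
// Run queue for real-time processes
 struct rt_rq **rt_rq;
// Bandwidth (that is, proportion) of CPU time that real-time processes may use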
 struct rt_bandwidth rt_bandwidth;
#endif

 struct rcu_head rcu;
 struct list_head list;
// task_groups are organized as a tree with a parent, a sibling list and a child list; the root node in the kernel is root_task_group
 struct task_group *parent;
 struct list_head siblings;
 struct list_head children;

#ifdef CONFIG_SCHED_AUTOGROUP
 struct autogroup *autogroup;
#endif

 struct cfs_bandwidth cfs_bandwidth;
};

There are two kinds of scheduling units (scheduling entities in kernel terms): sched_entity for ordinary processes and sched_rt_entity for real-time processes.

struct sched_entity {
 struct load_weight load;  /* for load-balancing */
 struct rb_node  run_node;
 struct list_head group_node;
 unsigned int  on_rq;

 u64   exec_start;
 u64   sum_exec_runtime;
 u64   vruntime;
 u64   prev_sum_exec_runtime;

 u64   nr_migrations;

#ifdef CONFIG_SCHEDSTATS
 struct sched_statistics statistics;
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
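// Parent scheduling unit that this unit belongs to
 struct sched_entity *parent;
 /* rq on which this entity is (to be) queued: */
// Run queue of the parent unit, that is, the queue into which this unit is inserted
 struct cfs_rq  *cfs_rq;
 /* rq "owned" by this entity/group: */
// Run queue owned by this unit, used to manage its child units; my_q is set only when the unit represents a task_group
// If the unit is a task, my_q is NULL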
 struct cfs_rq  *my_q;
#endif
 void *suse_kabi_padding;
};

struct sched_rt_entity {
 struct list_head run_list;
 unsigned long timeout;
 unsigned int time_slice;
 int nr_cpus_allowed;

 struct sched_rt_entity *back;
#ifdef CONFIG_RT_GROUP_SCHED
// Real-time units are managed like ordinary ones; see sched_entity for the meaning of the next three fields
 struct sched_rt_entity *parent;
 /* rq on which this entity is (to be) queued: */
 struct rt_rq  *rt_rq;
 /* rq "owned" by this entity/group: */
 struct rt_rq  *my_q;
#endif
};

Next, the run queues. The fields worth explaining are similar for the real-time and ordinary run queues, so we take the real-time queue as the example:

struct rt_rq {
 struct rt_prio_array active;
 unsigned long rt_nr_running;
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
 struct {
  int curr; /* highest queued rt task prio */
#ifdef CONFIG_SMP
  int next; /* next highest */
#endif
 } highest_prio;
#endif
#ifdef CONFIG_SMP
 unsigned long rt_nr_migratory;
 unsigned long rt_nr_total;
 int overloaded;
 struct plist_head pushable_tasks;
#endif
// Whether real-time scheduling on this queue is currently throttled
 int rt_throttled;
// Accumulated run time of this queue
 u64 rt_time;
// Maximum run time of this queue
 u64 rt_runtime;
 /* Nests inside the rq lock: */
 raw_spinlock_t rt_runtime_lock;

#ifdef CONFIG_RT_GROUP_SCHED
 unsigned long rt_nr_boosted;
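// The per-CPU run queue (rq) that this real-time queue belongs to
 struct rq *rq;
 struct list_head leaf_rt_rq_list;
// The scheduling group (task_group) that this real-time queue belongs to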
 struct task_group *tg;
#endif
};

From these three structures we arrive at the following picture:

[Figure: task_group]

As the figure shows, scheduling units and run queues combine into tree nodes, forming a tree separate from the task_group tree. Note, however, that a scheduling unit is placed on a run queue only when it contains a process in the TASK_RUNNING state.
Also, before group scheduling existed, each CPU had a single run queue, which can be thought of as all processes belonging to one scheduling group. Now every scheduling group has its own run queue on each CPU. The scheduler used to pick a process to run; now it picks a scheduling unit. When scheduling happens, schedule starts at root_task_group and looks for the scheduling unit chosen by the scheduling policy. If that unit is a task_group, the scheduler enters the task_group's run queue and picks a suitable unit there, repeating until it reaches a task scheduling unit. The whole process is a tree traversal: task_groups that contain TASK_RUNNING processes are the interior nodes, and task scheduling units are the leaves.
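The traversal described above can be sketched in a few lines of C. This is a toy model, not the kernel's implementation: the toy_ types, the flat array standing in for the CFS red-black tree, and the "smallest vruntime wins" rule are all simplifying assumptions. Only the my_q idea (NULL for a task, a child queue for a group) is taken from the structures shown earlier.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq;

/* A toy scheduling unit: either a task (my_q == NULL) or a group. */
struct toy_entity {
    const char *name;
    unsigned long long vruntime;   /* smaller runs first in this toy model */
    struct toy_rq *my_q;           /* queue owned by a group unit, else NULL */
};

/* A toy run queue: a flat array stands in for the kernel's rbtree. */
struct toy_rq {
    struct toy_entity **entities;
    size_t nr_running;
};

/* Pick the unit with the smallest vruntime on one queue. */
static struct toy_entity *pick_on_queue(const struct toy_rq *rq)
{
    struct toy_entity *best = NULL;

    for (size_t i = 0; i < rq->nr_running; i++) {
        struct toy_entity *se = rq->entities[i];
        if (!best || se->vruntime < best->vruntime)
            best = se;
    }
    return best;
}

/*
 * Walk down from the root queue until a task unit is reached.
 * Assumes, as the text states, that a group is only queued while it
 * has runnable members, so a queued group never owns an empty queue.
 */
struct toy_entity *pick_next_task_toy(const struct toy_rq *root)
{
    const struct toy_rq *rq = root;
    struct toy_entity *se = NULL;

    while (rq && rq->nr_running > 0) {
        se = pick_on_queue(rq);
        assert(se != NULL);
        rq = se->my_q;     /* NULL for a task, which ends the walk */
    }
    return se;
}
```

With a root queue holding a group (vruntime 20) and a task (vruntime 30), the walk first selects the group, then descends into the group's own queue and returns whichever task has the smallest vruntime there. An empty root queue yields NULL.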

Group process scheduling policy

The goal of group scheduling is the same as before: to schedule real-time processes and ordinary processes, that is, RT and CFS scheduling.

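CFS group scheduling policy:

The CPU allocation in the example at the start of this article is CFS scheduling. For CFS, a scheduling unit is not much different from an ordinary scheduled process: it has its own scheduling priority, which is not affected by the processes inside it. Every task_group has a shares value. shares is not a process priority in the usual sense but a scheduling weight, a CFS concept that ultimately shows up in how units are ordered for scheduling. shares defaults to the same value for every group, so when no weights are set, CPU time is distributed exactly as under plain CFS. In short, the CFS policy itself is unchanged by group scheduling. The details belong to a discussion of the cgroup CPU control mechanism.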

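RT group scheduling policy:

A real-time process has a fixed priority, and the scheduler always runs the highest-priority process. In group scheduling, the priority of a scheduling unit is that of the highest-priority unit inside the group, so a unit's priority depends on its children. When a process is added to a scheduling unit, the run queues of all its ancestor units must be reordered. The visible result is still that the scheduler always picks the highest-priority real-time process. So how does group scheduling control real-time processes?
The rt_rq structure shown earlier contains rt_time and rt_runtime: the accumulated run time and the maximum run time. When the accumulated run time exceeds the maximum, rt_throttled is set to 1; see the function sched_rt_runtime_exceeded.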

if (rt_rq->rt_time > runtime) {
 rt_rq->rt_throttled = 1;
 if (rt_rq_throttled(rt_rq)) {
  sched_rt_rq_dequeue(rt_rq);
  return 1;
 }
}

A value of 1 means the real-time queue is throttled and cannot be enqueued, as in the function __enqueue_rt_entity.

static inline int rt_rq_throttled(struct rt_rq *rt_rq)
{
 return rt_rq->rt_throttled && !rt_rq->rt_nr_boosted;
}

static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
{
 /* Queue owned by rt_se: NULL when rt_se is a task rather than a group */
 struct rt_rq *group_rq = group_rt_rq(rt_se);

 /*
  * Don't enqueue the group if its throttled, or when empty.
  * The latter is a consequence of the former when a child group
  * get throttled and the current group doesn't have any other
  * active members.
  */
 if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running))
  return;
.....
}

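There is one more, less visible, time parameter: sched_rt_period_us. Within each sched_rt_period_us window, real-time processes may use up to rt_runtime of CPU time. Each time the period timer fires, the function do_sched_rt_period_timer subtracts one period's worth of runtime from rt_time, compares the result with rt_runtime, and clears rt_throttled if the queue is back under budget.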

// overrun corrects for error in the period timer (number of periods elapsed)
rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
 rt_rq->rt_throttled = 0;
 enqueue = 1;
}
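To make the interplay between rt_time, rt_runtime and the period timer concrete, here is a small user-space sketch. It is a simplified model under stated assumptions: the toy_ names are invented for illustration, there is no locking, and the rt_nr_boosted check is ignored. Only the field names and the two comparisons are taken from the kernel excerpts above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one rt_rq; field names mirror the kernel, logic is simplified. */
struct toy_rt_rq {
    uint64_t rt_time;      /* accumulated RT run time in this window (ns) */
    uint64_t rt_runtime;   /* budget per period (ns) */
    int rt_throttled;
};

/* Account delta_ns of RT execution; returns 1 if the queue is now throttled. */
int toy_account_rt(struct toy_rt_rq *rq, uint64_t delta_ns)
{
    rq->rt_time += delta_ns;
    if (rq->rt_time > rq->rt_runtime)
        rq->rt_throttled = 1;
    return rq->rt_throttled;
}

/* Period timer: forgive `overrun` periods of budget and maybe unthrottle. */
int toy_period_tick(struct toy_rt_rq *rq, uint64_t overrun)
{
    uint64_t refund = overrun * rq->rt_runtime;

    assert(overrun > 0);
    rq->rt_time -= (rq->rt_time < refund) ? rq->rt_time : refund;
    if (rq->rt_throttled && rq->rt_time < rq->rt_runtime)
        rq->rt_throttled = 0;
    return rq->rt_throttled;
}
```

With a budget of 0.95 s, accounting 0.9 s and then another 0.1 s of real-time execution pushes rt_time to 1.0 s and throttles the queue. One timer tick then refunds 0.95 s, leaving rt_time at 0.05 s, which is under budget, so the queue is unthrottled again.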

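cgroup control over the CPU share of real-time processes is therefore implemented through rt_runtime. For root_task_group, that is, when all processes are in one cgroup, it is set through /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us, whose defaults are 1 s and 0.95 s. This suggests that real-time processes can use at most 95% of a CPU, so how can a real-time process take 100% of a CPU and cause other processes to hang?
The answer is that when the CPU running the real-time process exhausts its budget, its rt_rq can borrow rt_runtime from other CPUs, taking part of their unused rt_runtime - rt_time. Its rt_runtime can thus grow up to the full period, which in practice lets a single core reach 100% real-time usage. The purpose is to avoid the cost of migrating a real-time process to another core when it runs short of CPU time. Ordinary processes that are not pinned can migrate to other CPUs and still get scheduled, but processes pinned to that core are out of luck.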
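As a side note on the 95% figure, the ratio can be computed directly from the two /proc entries. The sketch below is a minimal user-space helper, not part of the original analysis; the function names read_long and rt_bandwidth_percent are made up for illustration. It treats a runtime of -1 as "no limit", which is how the kernel documents that special value.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a file such as a /proc/sys entry. Returns 0 on success. */
int read_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Percentage of each period that RT tasks may use.
 * A runtime of -1 means "no limit" and is reported as 100.
 */
double rt_bandwidth_percent(long period_us, long runtime_us)
{
    assert(period_us > 0);
    if (runtime_us < 0)
        return 100.0;
    return 100.0 * (double)runtime_us / (double)period_us;
}
```

A caller would read /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us with read_long and pass both values to rt_bandwidth_percent. With the defaults quoted above (1000000 and 950000 microseconds) the result is 95. This is the system-wide budget per CPU before any borrowing between CPUs takes place.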

static int do_balance_runtime(struct rt_rq *rt_rq)
{
 struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
 struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
 int i, weight, more = 0;
 u64 rt_period;

 weight = cpumask_weight(rd->span);

 raw_spin_lock(&rt_b->rt_runtime_lock);
 rt_period = ktime_to_ns(rt_b->rt_period);
 for_each_cpu(i, rd->span) {
  struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
  s64 diff;

  if (iter == rt_rq)
   continue;

  raw_spin_lock(&iter->rt_runtime_lock);
  /*
   * Either all rqs have inf runtime and there's nothing to steal
   * or __disable_runtime() below sets a specific rq to inf to
   * indicate its been disabled and disalow stealing.
   */
  if (iter->rt_runtime == RUNTIME_INF)
   goto next;

  /*
   * From runqueues with spare time, take 1/n part of their
   * spare time, but no more than our period.
   */
  diff = iter->rt_runtime - iter->rt_time;
  if (diff > 0) {
   diff = div_u64((u64)diff, weight);
   if (rt_rq->rt_runtime + diff > rt_period)
    diff = rt_period - rt_rq->rt_runtime;
   iter->rt_runtime -= diff;
   rt_rq->rt_runtime += diff;
   more = 1;
   if (rt_rq->rt_runtime == rt_period) {
    raw_spin_unlock(&iter->rt_runtime_lock);
    break;
   }
  }
next:
  raw_spin_unlock(&iter->rt_runtime_lock);
 }
 raw_spin_unlock(&rt_b->rt_runtime_lock);

 return more;
}
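The borrowing arithmetic in do_balance_runtime is easier to follow with concrete numbers. The following user-space sketch mirrors only the arithmetic of the loop above; the toy_ names are hypothetical, and locking, the root domain span and the RUNTIME_INF check are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

struct toy_cpu_rt {
    int64_t rt_runtime;   /* current budget (ns) */
    int64_t rt_time;      /* budget already consumed (ns) */
};

/*
 * Let CPU `self` borrow spare budget from the other CPUs, mirroring the
 * arithmetic of do_balance_runtime() without locking or RUNTIME_INF handling.
 * Returns 1 if anything was borrowed.
 */
int toy_balance_runtime(struct toy_cpu_rt *cpus, size_t ncpus, size_t self,
                        int64_t rt_period)
{
    int more = 0;

    assert(self < ncpus);
    for (size_t i = 0; i < ncpus; i++) {
        int64_t diff;

        if (i == self)
            continue;
        diff = cpus[i].rt_runtime - cpus[i].rt_time;
        if (diff <= 0)
            continue;
        diff /= (int64_t)ncpus;                 /* take 1/n of the spare time */
        if (cpus[self].rt_runtime + diff > rt_period)
            diff = rt_period - cpus[self].rt_runtime;
        cpus[i].rt_runtime -= diff;
        cpus[self].rt_runtime += diff;
        more = 1;
        if (cpus[self].rt_runtime == rt_period)
            break;                              /* full period reached */
    }
    return more;
}
```

Take four CPUs, each with a budget of 0.95 s in a 1 s period, where CPU 0 has used its whole budget and the others are idle. CPU 1 has 0.95 s spare, a quarter of which is 0.2375 s, but CPU 0 only needs 0.05 s to reach the full period. After the call CPU 0 has a budget of 1.0 s, CPU 1 has 0.9 s, and the loop stops before touching CPUs 2 and 3. From that point the real-time process on CPU 0 is never throttled.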

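This article has covered the Linux process group scheduling mechanism: what it is, the kernel data structures behind it, how scheduling proceeds through the group tree, and how the CFS and RT group policies behave. When using group scheduling, keep its potential pitfalls in mind, in particular the type of group, priorities, and limits such as RT throttling and runtime borrowing between CPUs.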


Statement

This article is reproduced from 良许Linux教程网. In case of infringement, please contact admin@php.cn for removal.
