Concurrency control technology in Linux drivers: principles and practice
If you are an embedded Linux developer, you have probably faced these questions: how do you safely share device resources among multiple tasks or threads? How do you avoid data races and inconsistency? How do you improve system performance and reliability? All of these come down to concurrency control, that is, coordinating the access of multiple execution contexts to shared resources. In this article we introduce the concurrency control techniques commonly used in Linux drivers, including atomic operations, spin locks, semaphores, mutexes, read-write locks, sequence locks, and RCU, and give usage examples and caveats for each.
At the application layer, critical resources are managed with atomic variables, condition variables, and semaphores. The same problem exists in driver development: a driver may be called by several user-space programs at once, in which case a global variable in the driver is simultaneously visible in the process context of all those callers, and some technique must likewise be used to control concurrency. This article discusses the characteristics and application scenarios of the following kernel concurrency control techniques.
Interrupt masking, as the name suggests, means blocking interrupts. In an embedded system, masking can happen at three levels: 1. at the peripheral's hardware interface, 2. at the hardware GIC, 3. at the CPU (kernel). If an interrupt is masked at the interface, it is simply lost and can never be recovered. If it is masked at the GIC, then when irq_1, irq_2, and irq_3 arrive during the masked window, there is only a single pending flag, so only the last arrival (irq_3) leaves the flag set; once unmasked, the CPU sees the pending bit and still services it, but irq_1 and irq_2 are definitely lost. Masking at the ARM core, that is, in the kernel, depends on how it is done: with local_irq_disable() the interrupts are lost, just like masking at the interface; with local_irq_save() the behavior matches the GIC case, catching only the last interrupt. The kernel also has mechanisms to count interrupts, so it knows how many arrived during the window; in practice, though, we usually do not chase missed interrupts unless the interrupt is very important.
What we discuss here is interrupt masking in the kernel. Since many important kernel operations rely on interrupts, masking all interrupts is dangerous, and the code executed inside the masked window must be as fast as possible. Moreover, because kernel process scheduling is also interrupt-driven, the masked code must not contain anything that might sleep, otherwise it can never be woken up. Note that interrupt masking only masks interrupts on the local CPU, so it does not solve the race problems introduced by SMP. In practice, interrupt masking is usually combined with a spin lock, to prevent the critical section protected by the spin lock from being preempted by an interrupt.
```c
local_irq_disable();      /* mask interrupts on this CPU */
/* or */
local_irq_save(flags);    /* mask interrupts and save the current IRQ state */

/* critical section */

local_irq_enable();       /* unmask interrupts */
/* or */
local_irq_restore(flags); /* unmask and restore the saved IRQ state */
```
```c
local_bh_disable();       /* mask bottom halves (softirqs) on this CPU */
/* critical section */
local_bh_enable();
```
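As a minimal sketch (the function and variable names are hypothetical), this is how local_irq_save() typically wraps a short critical section shared with an interrupt handler on the same CPU; the save/restore pair is preferred when the caller may itself already be running with interrupts disabled:

```c
#include <linux/irqflags.h>

static int event_count;            /* shared with this CPU's interrupt handler */

static void consume_events(void)
{
    unsigned long flags;

    local_irq_save(flags);         /* mask IRQs on this CPU, remembering the old state */
    event_count = 0;               /* critical section: keep it short */
    local_irq_restore(flags);      /* put the IRQ state back as it was */
}
```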
Atomic operations are operations that cannot be interrupted partway through; the concept is the same as at the application layer. The kernel's atomic operation templates are as follows:
```c
/* asm/atomic.h */
/* define and initialize an atomic variable */
atomic_t tv = ATOMIC_INIT(initial_value);

/* read an atomic variable */
int atomic_read(atomic_t *v);
/* write an atomic variable */
void atomic_set(atomic_t *v, int i);

/*
 * atomic_dec_and_test - decrement the atomic variable by 1
 * v: returns non-zero if the variable becomes 0 after decrementing,
 *    otherwise returns 0
 */
int atomic_dec_and_test(volatile atomic_t *v);
int atomic_inc_and_test(volatile atomic_t *v);
int atomic_sub_and_test(int i, volatile atomic_t *v);

/* operate and return the new value */
int atomic_add_return(int i, atomic_t *v);
int atomic_sub_return(int i, atomic_t *v);
int atomic_inc_return(atomic_t *v);
int atomic_dec_return(atomic_t *v);
```
```c
static atomic_t tv;

static int demo_open(struct inode *inode, struct file *filp)
{
    if (!atomic_dec_and_test(&tv)) {
        atomic_inc(&tv);    /* undo the decrement and bail out */
        return -EBUSY;
    }
    /* device operations */
    return 0;
}

static int demo_release(struct inode *inode, struct file *filp)
{
    atomic_inc(&tv);        /* release the device */
    return 0;
}

static int __init demo_init(void)
{
    atomic_set(&tv, 1);     /* initialize the atomic variable */
    return 0;
}
```
Bit atomic operations are atomic operations on individual bits. The kernel uses a great many bits to record information, for example in bitmaps, and these operations must be atomic. The kernel API is as follows:
```c
/* set a bit */
void set_bit(int nr, void *addr);
/* clear a bit */
void clear_bit(int nr, void *addr);
/* toggle a bit */
void change_bit(int nr, void *addr);
/* test a bit */
int test_bit(int nr, void *addr);

/* test and operate on a bit */
int test_and_set_bit(int nr, void *addr);
int test_and_clear_bit(int nr, void *addr);
int test_and_change_bit(int nr, void *addr);
```
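As a usage sketch (the device names are hypothetical), test_and_set_bit() can implement the same single-open policy as the atomic-variable example above, in one atomic step:

```c
#include <linux/bitops.h>
#include <linux/fs.h>

static unsigned long open_flag;    /* bit 0 set == device already open */

static int demo_open(struct inode *inode, struct file *filp)
{
    if (test_and_set_bit(0, &open_flag))
        return -EBUSY;             /* bit was already set: device is busy */
    return 0;
}

static int demo_release(struct inode *inode, struct file *filp)
{
    clear_bit(0, &open_flag);      /* hand the device back */
    return 0;
}
```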
means "spinning in place". When the lock is unsuccessful, spin, and the spin lock will continue to occupy the CPU for variable testing. Since it is an atomic operation, the CPU usage will rise to 100%. Therefore, when using spin locks, the code in the critical section needs to be very short, otherwise it will affect system performance. In addition, as a kind of lock mechanism, when using spin locks, you also need to pay attention to the occurrence of deadlocks. Spin locks cannot be called during locking. Functions that may cause process scheduling, if the process blocks after obtaining the spin lock, eg, copy_from_user(), copy_to_user(), kmalloc(), msleep(), etc., once blocking occurs, it may cause the kernel to crash.
Spin locks can solve SMP race problems. Different flavors of spin lock have their own handling mechanisms for different situations, including the traditional spin lock, the read-write spin lock, the RCU mechanism, and the sequence lock; spin locks are also the underlying building block of semaphores and mutexes.
Comparison \ Type | Traditional spin lock | Read-write spin lock | Sequence lock | RCU mechanism |
---|---|---|---|---|
Typical use | Resources that must be exclusive to whoever holds the lock | Resources that need to be exclusive only to the writer | Resources rarely read and written at the same time | Resources read often and written rarely |
Read-read concurrency | × | √ | √ | √ |
Read-write concurrency | × | × | √ | √ |
Write-write concurrency | × | × | × | √ |
As with other locking mechanisms, using a spin lock to protect data follows the acquire-operate-release pattern. The following is a typical usage flow: a spin lock ensures a file can be opened by only one process at a time.
```c
static int cnt;                   /* 0: free, non-zero: already opened */
static DEFINE_SPINLOCK(lock);

static int my_open(struct inode *inode, struct file *filp)
{
    spin_lock(&lock);
    if (cnt) {
        spin_unlock(&lock);
        return -EBUSY;            /* someone already holds the device */
    }
    cnt++;
    spin_unlock(&lock);
    return 0;
}

static int my_release(struct inode *inode, struct file *filp)
{
    spin_lock(&lock);
    cnt--;
    spin_unlock(&lock);
    return 0;
}
```
The traditional spin lock is the bluntest kind: while it is held, the locked critical section cannot be entered by other CPUs. Note that although the critical section, once the lock is taken, cannot be disturbed by other CPUs or by preempting processes on the local CPU, it can still be affected by interrupts and bottom halves. So we usually use the derived versions in the API below, such as the spin lock + interrupt masking combination mentioned above, to keep the spin-lock-protected critical section from being interrupted; the corresponding macros are spin_lock_irq() and spin_lock_irqsave().
```c
/* define and initialize a spin lock */
spinlock_t spinlock;
void spin_lock_init(spinlock_t *lock);

/* acquire */
/* spin_lock - locking function (busy-waits) */
void spin_lock(spinlock_t *lock);
int spin_trylock(spinlock_t *lock);
spin_lock_irq();           /* = spin_lock() + local_irq_disable() */
spin_lock_irqsave();       /* = spin_lock() + local_irq_save() */
spin_lock_bh();            /* = spin_lock() + local_bh_disable() */

/* release */
void spin_unlock(spinlock_t *lock);
spin_unlock_irq();         /* = spin_unlock() + local_irq_enable() */
spin_unlock_irqrestore();  /* = spin_unlock() + local_irq_restore() */
spin_unlock_bh();          /* = spin_unlock() + local_bh_enable() */
```
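A minimal sketch of the irqsave variant in action (hypothetical driver code): a datum shared between process context and an interrupt handler. The process side must disable interrupts while holding the lock, otherwise the ISR could fire on the same CPU and deadlock trying to take the lock that CPU already holds:

```c
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(buf_lock);
static int buf_data;

static irqreturn_t demo_isr(int irq, void *dev_id)
{
    spin_lock(&buf_lock);          /* IRQs are already off inside the handler */
    buf_data++;
    spin_unlock(&buf_lock);
    return IRQ_HANDLED;
}

static int read_data(void)
{
    unsigned long flags;
    int val;

    spin_lock_irqsave(&buf_lock, flags);
    val = buf_data;
    spin_unlock_irqrestore(&buf_lock, flags);
    return val;
}
```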
The traditional spin lock crudely hands the critical resource to a single CPU, but many resources are not corrupted by being read, so we can allow multiple CPUs to read a critical resource concurrently while still forbidding concurrent writes. Similar to application-layer file locks, the kernel's read-write lock enforces the mutual-exclusion rules shown in the comparison table above: concurrent readers are allowed, but a writer excludes both readers and other writers. The API is as follows:
```c
/* include/linux/rwlock.h */
/* define and initialize a read-write lock */
rwlock_t rwlock;
void rwlock_init(rwlock_t *lock);

/* acquire the read lock */
void read_lock(rwlock_t *lock);
int read_trylock(rwlock_t *lock);
read_lock_irqsave(lock, flags);
void read_lock_irq(rwlock_t *lock);
void read_lock_bh(rwlock_t *lock);

/* release the read lock */
void read_unlock(rwlock_t *lock);
read_unlock_irqrestore(lock, flags);
void read_unlock_irq(rwlock_t *lock);
void read_unlock_bh(rwlock_t *lock);

/* acquire the write lock */
void write_lock(rwlock_t *lock);
int write_trylock(rwlock_t *lock);
write_lock_irqsave(lock, flags);
void write_lock_irq(rwlock_t *lock);
void write_lock_bh(rwlock_t *lock);

/* release the write lock */
void write_unlock(rwlock_t *lock);
write_unlock_irqrestore(lock, flags);
void write_unlock_irq(rwlock_t *lock);
void write_unlock_bh(rwlock_t *lock);
```
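A usage sketch under hypothetical names: readers may enter read_lock() concurrently, while write_lock() excludes everyone:

```c
#include <linux/spinlock.h>

static DEFINE_RWLOCK(cfg_lock);
static int cfg_value;

static int read_cfg(void)
{
    int val;

    read_lock(&cfg_lock);          /* shared: other readers may enter too */
    val = cfg_value;
    read_unlock(&cfg_lock);
    return val;
}

static void write_cfg(int val)
{
    write_lock(&cfg_lock);         /* exclusive: blocks readers and writers */
    cfg_value = val;
    write_unlock(&cfg_lock);
}
```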
A sequence lock (seqlock) can be seen as an upgraded read-write lock. A read-write lock does not allow readers and a writer to coexist; the sequence lock improves on this by letting a writer and readers enter the critical section at the same time: readers no longer have to wait for the writer to finish, and the writer does not have to wait for readers. However, a seqlock-protected critical section must not contain pointers that readers dereference, because the writer may change where a pointer points, and a reader following it concurrently would oops. Furthermore, if data a reader has read is changed by a writer, the reader must read it again to stay current. Despite these two constraints, the sequence lock offers more flexibility than the read-write lock. For writer-versus-writer access, the seqlock behaves exactly like a read-write lock: one must wait!
```c
/* include/linux/seqlock.h */
/* define a sequence lock */
seqlock_t sl;

/* acquire the write lock */
void write_seqlock(seqlock_t *sl);
int write_tryseqlock(seqlock_t *sl);
write_seqlock_irqsave(lock, flags);      /* = local_irq_save() + write_seqlock() */
void write_seqlock_irq(seqlock_t *sl);   /* = local_irq_disable() + write_seqlock() */
void write_seqlock_bh(seqlock_t *sl);    /* = local_bh_disable() + write_seqlock() */

/* release the write lock */
void write_sequnlock(seqlock_t *sl);
write_sequnlock_irqrestore(lock, flags); /* = write_sequnlock() + local_irq_restore() */
void write_sequnlock_irq(seqlock_t *sl); /* = write_sequnlock() + local_irq_enable() */
void write_sequnlock_bh(seqlock_t *sl);  /* = write_sequnlock() + local_bh_enable() */

/* start a read */
unsigned read_seqbegin(const seqlock_t *sl);
read_seqbegin_irqsave(lock, flags);      /* = local_irq_save() + read_seqbegin() */

/* check whether a reread is needed */
int read_seqretry(const seqlock_t *sl, unsigned iv);
read_seqretry_irqrestore(lock, iv, flags); /* = read_seqretry() + local_irq_restore() */
```
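A minimal sketch of the classic seqlock read loop (hypothetical names): the reader snapshots the sequence counter with read_seqbegin() and retries via read_seqretry() whenever a write slipped in during the read:

```c
#include <linux/seqlock.h>
#include <linux/types.h>

static DEFINE_SEQLOCK(ts_lock);
static u64 last_ts;

static void update_ts(u64 now)
{
    write_seqlock(&ts_lock);        /* writers still exclude each other */
    last_ts = now;
    write_sequnlock(&ts_lock);
}

static u64 get_ts(void)
{
    unsigned seq;
    u64 ts;

    do {
        seq = read_seqbegin(&ts_lock);      /* snapshot the sequence counter */
        ts = last_ts;
    } while (read_seqretry(&ts_lock, seq)); /* retry if a writer intervened */
    return ts;
}
```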
RCU stands for Read-Copy-Update: readers read directly, while a writer first makes a copy and updates it at an appropriate time. It is yet another upgraded read-write lock, and the mechanism is used heavily in the VFS layer. As the name implies, readers access the critical resource without taking a lock, as the definition of rcu_read_lock() below shows. Before writing, a writer backs up the critical resource and modifies the copy; once all CPUs have dropped their references to the old data, a callback mechanism repoints the original pointer at the modified copy. Under RCU, then, reader overhead drops dramatically and there is no seqlock-style pointer problem, but writer overhead is large, so RCU suits critical resources that are read often and written rarely. With many writes, the performance saved on reads may be cancelled out, and RCU is no longer worth it.
The kernel maintains two data structures for each CPU, rcu_data and rcu_bh_data, which hold callback functions: call_rcu() registers a callback into rcu_data, while call_rcu_bh() registers into rcu_bh_data. On each structure, the callbacks form a queue.
When using RCU, the read-side units must give the write side a signal so that the writer can determine when the data can safely be freed or modified. The kernel has a dedicated garbage collector that watches for these signals: once every reader has signalled the collector that it is no longer using the RCU-protected data structure, the collector invokes the callbacks to complete the final free or modify operation.
Reading is the R in RCU. As the macro definitions below show, the read-side "lock" merely disables kernel preemption; no lock object is involved.
```c
/* include/linux/rcupdate.h */
/* read-side lock */
rcu_read_lock();        /* preempt_disable() */
rcu_read_lock_bh();     /* local_bh_disable() */

/* read-side unlock */
rcu_read_unlock();      /* preempt_enable() */
rcu_read_unlock_bh();   /* local_bh_enable() */
```
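A minimal reader sketch (struct cfg and global_cfg are hypothetical): the read side costs only a preemption toggle plus rcu_dereference() to fetch the protected pointer, which pairs with the writer's rcu_assign_pointer():

```c
#include <linux/rcupdate.h>

struct cfg {
    int value;
};

static struct cfg __rcu *global_cfg;

static int read_cfg_value(void)
{
    struct cfg *p;
    int val = -1;

    rcu_read_lock();                     /* mark the read-side critical section */
    p = rcu_dereference(global_cfg);     /* fetch the protected pointer */
    if (p)
        val = p->value;
    rcu_read_unlock();
    return val;
}
```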
Synchronization is the last step of an RCU write, the Update. The interface below blocks the writer until all readers have completed their read-side critical sections; only then may the writer proceed. If multiple RCU writers call this function, they will all be woken up after one grace period (i.e., once all readers have finished accessing the critical section).
```c
synchronize_rcu();
```
The next interface is also called by an RCU writer, but it does not block the writer, so it can be used in interrupt context or softirq context. It attaches func to the RCU callback chain and returns immediately. synchronize_rcu() is itself built on call_rcu().
```c
void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
```
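A minimal writer sketch, continuing the hypothetical struct cfg and global_cfg from the reader sketch above: copy, modify, publish with rcu_assign_pointer(), wait out one grace period with synchronize_rcu(), then free the old copy. In real code, concurrent writers would additionally need to be serialized by a lock:

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct cfg {
    int value;
};

static struct cfg __rcu *global_cfg;    /* as in the reader sketch above */

static void update_cfg_value(int val)
{
    struct cfg *old, *new;

    new = kmalloc(sizeof(*new), GFP_KERNEL);
    if (!new)
        return;
    new->value = val;

    old = rcu_dereference_protected(global_cfg, 1); /* writer-side access */
    rcu_assign_pointer(global_cfg, new);            /* publish the new copy */
    synchronize_rcu();                              /* wait out one grace period */
    kfree(old);                                     /* no reader can still see it */
}
```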
The interface below also treats the completion of a softirq as passing through a quiescent state, so if a writer calls this function, read-side units running in process context must use rcu_read_lock_bh().
```c
void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
```
RCU is used extensively for reading and writing kernel linked lists. The functions below are the RCU-protected versions of the kernel list operations; each behaves like its non-RCU counterpart (see the kernel file **"include/linux/list.h"**), except that the operation is protected by RCU. A usage sketch follows the list.
```c
void list_add_rcu(struct list_head *new, struct list_head *head);
void list_add_tail_rcu(struct list_head *new, struct list_head *head);
void list_del_rcu(struct list_head *entry);
void list_replace_rcu(struct list_head *old, struct list_head *new);
list_for_each_rcu(pos, head);
list_for_each_safe_rcu(pos, n, head);
list_for_each_entry_rcu(pos, head, member);
void hlist_del_rcu(struct hlist_node *n);
void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h);
hlist_for_each_entry_rcu(tpos, pos, head, member);
```
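A usage sketch (hypothetical list and item names): lock-free traversal with list_for_each_entry_rcu() on the read side, and list_del_rcu() plus synchronize_rcu() before freeing on the write side, with a spin lock serializing only the writers:

```c
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct item {
    int key;
    struct list_head node;
};

static LIST_HEAD(items);
static DEFINE_SPINLOCK(items_lock);      /* serializes writers only */

static bool find_item(int key)
{
    struct item *it;
    bool found = false;

    rcu_read_lock();                     /* readers take no lock */
    list_for_each_entry_rcu(it, &items, node) {
        if (it->key == key) {
            found = true;
            break;
        }
    }
    rcu_read_unlock();
    return found;
}

static void del_item(struct item *it)
{
    spin_lock(&items_lock);
    list_del_rcu(&it->node);             /* unlink; readers may still see it */
    spin_unlock(&items_lock);
    synchronize_rcu();                   /* wait for readers to drain */
    kfree(it);
}
```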
As noted in the spin lock section, a CPU that cannot acquire the critical resource "spins in place", so a spin-lock-protected critical section must not run for long. But what if we really need to protect a critical section that takes a long time to execute? The answer is the semaphore. A semaphore relies on a spin lock underneath for its atomicity and raises the abstraction to the "process" level, making it a "lock" that can operate in process context. It is exactly this ability to run in process context that gives the semaphore its many differences from the spin lock.
With a semaphore, if a process fails to acquire it, the kernel puts the process to sleep and schedules another one, avoiding busy-waiting on the CPU. However, process context switches have their own cost, so in the kernel semaphores are normally used only to protect large critical sections.
Moreover, since a process that cannot obtain the semaphore it wants simply goes to sleep, a semaphore-protected critical section may contain sleeping code. By contrast, a spin-lock-protected region must neither sleep nor call schedule(): if the CPU holding the lock switched context or slept, every other CPU waiting for that spin lock would busy-wait forever and never obtain it, forming a deadlock, unless some other process woke the holder up (which usually never happens).
Also, precisely because semaphore operations can block, semaphores cannot be used in interrupt context. To summarize the discussion above:
Comparison | Semaphore | Spin lock |
---|---|---|
Critical section time | Suitable when the section runs longer than a process switch costs | Section execution time must be short |
Process context | Critical section may sleep or schedule() | Critical section must not sleep or schedule() |
Interrupt context | Only down_trylock() may be used | May be used |
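A minimal sketch (hypothetical device code), using the portable sema_init() initializer: the caller may sleep while waiting for the semaphore, and the critical section itself may contain sleeping calls:

```c
#include <linux/semaphore.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static struct semaphore demo_sem;

static int __init demo_sem_init(void)
{
    sema_init(&demo_sem, 1);           /* binary semaphore: one holder at a time */
    return 0;
}

static ssize_t demo_write(struct file *filp, const char __user *buf,
                          size_t len, loff_t *ppos)
{
    if (down_interruptible(&demo_sem)) /* may sleep; fails if a signal arrives */
        return -ERESTARTSYS;

    /* long critical section: sleeping calls such as copy_from_user()
     * are safe here, unlike under a spin lock */

    up(&demo_sem);
    return len;
}
```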