Detailed introduction of MySQL lock mechanism
Jul 17, 2017

1. Isolation level

(1) Read Uncommitted (RU)

Under this isolation level, transactions are not isolated at all: dirty reads occur, and uncommitted records can be read. It is not used in practice.

(2) Read committed (RC)

Only committed records can be read. Under this isolation level, non-repeatable reads occur: the same query, executed multiple times within one transaction, may return different records. The root cause is that under RC each statement reads the updates of transactions committed so far, so if another transaction commits between two queries, their results will differ. Nonetheless, the read-committed isolation level is widely used in production environments.

(3) Repeatable Read (RR)

The repeatable-read isolation level solves the non-repeatable-read problem, but the phantom-read problem remains. So what is the difference between the two? Non-repeatable reads concern modification: the same row reads back different values in two reads. Phantom reads concern changes in the set of records (insertions and deletions). Textbooks generally say that phantom reads are only eliminated at the serializable isolation level, but MySQL's InnoDB is special: its RR level does eliminate phantom reads, mainly through gap locks. Note that not all databases implement this isolation level. Later we will briefly introduce how MySQL implements repeatable read.

(4) Serializable

Under the serializable isolation level, dirty reads and phantom reads are eliminated, but transaction concurrency drops sharply. A transaction's isolation level is inversely related to its concurrency: the higher the isolation level, the lower the concurrency. In a real production environment, the DBA trades off concurrency against business requirements and chooses an appropriate isolation level.

2. Concurrent scheduling method

Closely related to the isolation level is concurrent scheduling, through which the isolation level is implemented. Different database vendors implement concurrent scheduling differently, but the basic principle is the same: use locks to protect data objects from being modified by multiple transactions at once. Compared with traditional lock-based concurrency control, the main feature of multi-version concurrency control (MVCC) is that reads take no locks. For read-heavy, write-light workloads this greatly improves concurrency, which is why most relational databases implement MVCC.

3. Two-phase lock protocol

The two-phase locking protocol means a transaction is divided into two phases: in the first phase it acquires locks, and in the second phase it releases them. Two-phase locking guarantees the correctness of concurrent schedules. Compared with one-phase locking (acquiring all the locks a transaction needs at once), two-phase locking improves concurrency, but it also introduces the possibility of deadlock.

4. Deadlock

A deadlock occurs when two or more transactions each hold resources that the other expects to acquire, forming a wait cycle in which none of them can continue to execute.

5. Lock type

By type, locks can be divided into shared locks, exclusive locks, intention shared locks, and intention exclusive locks. By granularity, they can be divided into row locks and table locks. In MySQL, the transaction mechanism is largely implemented by the underlying storage engine, so the server layer only has table locks, while the transactional InnoDB storage engine implements row locks (record locks), gap locks, and next-key locks. MySQL's record lock is essentially a lock on an index record, because InnoDB tables are index-organized; a gap lock locks the gap between index records and is only effective under the RR isolation level; a next-key lock is the combination of a record lock and the gap lock on the gap before the record. MySQL implements the RR isolation level through gap locks and next-key locks.

Note:

For update operations (reads take no locks), row-level locking is only possible when the update goes through an index; otherwise, every row of the clustered index is write-locked, which is effectively a write lock on the whole table.

If multiple physical records correspond to the same index entry, lock conflicts will also occur when they are accessed at the same time.

When a table has multiple indexes, different transactions can lock different rows through different indexes. In addition, InnoDB also places row locks on the corresponding data records (clustered index) at the same time.

Under the MVCC concurrency-control mechanism, no operation blocks reads and reads block no operation, precisely because reads take no locks.


As an open-source storage engine, RocksDB supports the ACID properties of transactions. To support the I (isolation) in ACID, concurrency control is indispensable. This article discusses the implementation of RocksDB's lock mechanism, down to the source-code level; I hope it gives readers a deeper understanding of RocksDB's concurrency-control principles. The article covers four aspects: first the basic structure of RocksDB locks, then the space overhead of the row-lock data-structure design, then the locking process in several typical scenarios, and finally the deadlock detection mechanism that is essential to any lock mechanism.

1. Row lock data structure
RocksDB's minimum lock granularity is the row. For a KV store the lock object is the key, and each key corresponds to a LockInfo structure. All keys are managed in a hash table; to look up a lock, the key is located directly through the hash table to determine whether it is already locked. If there were only one global hash table, however, access to it would cause many conflicts and hurt concurrency. RocksDB therefore splits locks first by column family: the locks in each column family are managed by a LockMap, and each LockMap is further split into several shards, each managed by a LockMapStripe. The hash table (std::unordered_map) lives inside the Stripe structure, which also contains a mutex and a condition_variable. Their main function is to provide mutually exclusive access to the hash table and, on lock conflict, to suspend the thread and wake suspended threads after unlock. This design is simple, but it has an obvious problem: multiple unrelated locks share one condition_variable, so when a lock is released a batch of threads may be woken unnecessarily, only to retry and wait again, causing pointless context switches.

Compare the InnoDB lock mechanism we discussed before: InnoDB reuses one lock for records within a page, and the reuse is conditional (the same transaction must lock several records of one page before reuse); its lock-wait queue is precise, down to the record level, and causes no spurious wakeups. Although RocksDB's lock design is relatively coarse, it does make certain optimizations. For example, each thread caches a local copy of the LockMaps in lock_maps_cache_, and the per-thread caches are linked through a global list; when the LockMaps change (a column family is deleted), every thread's copy is cleared globally.
Since column-family changes are rare, most operations accessing the LockMaps need no locking, which improves concurrency. The relevant data structures are as follows:

struct LockInfo {
bool exclusive; // exclusive or shared lock
autovector<TransactionID> txn_ids; // transaction list; for a shared lock, one key can map to multiple transactions

// Transaction locks are not valid after this time in us
uint64_t expiration_time;
}

struct LockMapStripe { 
// Mutex must be held before modifying keys map
std::shared_ptr<TransactionDBMutex> stripe_mutex;

// Condition Variable per stripe for waiting on a lock
std::shared_ptr<TransactionDBCondVar> stripe_cv;

// Locked keys mapped to the info about the transactions that locked them.
std::unordered_map<std::string, LockInfo> keys;
}

struct LockMap {
const size_t num_stripes_; // number of stripes
std::atomic<int64_t> lock_cnt{0}; // number of locks
std::vector<LockMapStripe*> lock_map_stripes_; // lock stripes
}

class TransactionLockMgr {
using LockMaps = std::unordered_map<uint32_t, std::shared_ptr<LockMap>>;
LockMaps lock_maps_;

// Thread-local cache of entries in lock_maps_. This is an optimization
// to avoid acquiring a mutex in order to look up a LockMap
std::unique_ptr<ThreadLocalPtr> lock_maps_cache_;
}

2. Row-lock space overhead
Since lock information resides in memory, let us briefly analyze the memory consumed by RocksDB locks. Each lock is an element of the unordered_map, so one lock takes key_length + 8 + 8 + 1 bytes. Assuming the key is a bigint (8 bytes), locking 1,000,000 rows consumes roughly 22 MB. But because consumption is proportional to key_length, RocksDB's memory usage is not bounded. We can estimate the range of key_length when RocksDB serves as a MySQL storage engine: for a single-column index the maximum is 2048 bytes (see the max_supported_key_part_length implementation); for a composite index the maximum index length is 3072 bytes (see max_supported_key_length). In the worst case, key_length = 3072, locking 1,000,000 rows consumes about 3 GB, and locking 100 million rows about 300 GB, at which point memory is at risk of being exhausted. RocksDB therefore provides the max_row_locks parameter to keep memory under control; RDB_MAX_ROW_LOCKS defaults to 1G locks, so even in the common bigint-key case the extreme consumption is still about 22 GB. InnoDB is friendlier here: its lock hash table is keyed by (space_id, page_no), so however large the record key is, the key part of the memory cost is constant. As mentioned earlier, InnoDB also optimizes the case where one transaction locks many records: multiple records can share one lock, which indirectly saves memory.

3. Locking process analysis
The previous sections covered the design of RocksDB's lock data structures and their memory cost. This section describes how RocksDB takes locks in several typical scenarios. Like InnoDB, RocksDB supports MVCC, and reads take no locks. For convenience, the discussion below treats RocksDB as a MySQL storage engine and covers three cases: updates by primary key, updates via a secondary index, and range updates by primary key. One point to note before we start: unlike InnoDB, RocksDB's updates are also snapshot-based, whereas InnoDB's updates use current reads; this difference means the two behave differently in practice even at the same isolation level. For RocksDB, under the RC isolation level a fresh snapshot is taken at the start of every statement; under RR a snapshot is taken only at the start of the first statement and shared by all statements until the transaction ends.

3.1 Update by primary key
The main interface here is TransactionBaseImpl::GetForUpdate:
1) Try to lock the key; if the lock is held by another transaction, wait.
2) Create a snapshot.
3) Call ValidateSnapshot: Get the key and judge, by comparing sequence numbers, whether the key has been updated.
4) Because the snapshot is taken after locking, this check is guaranteed to succeed.
5) Perform the update.
There is a delayed-snapshot mechanism here. Normally acquire_snapshot would be called at the start of the statement to take the snapshot, but to avoid retries caused by conflicts, the snapshot is taken only after the key is locked. This guarantees that, in the update-by-primary-key scenario, ValidateSnapshot can never fail.

The call stack is as follows:

1-myrocks::ha_rocksdb::get_row_by_rowid
2-myrocks::ha_rocksdb::get_for_update
3-myrocks::Rdb_transaction_impl::get_for_update
4-rocksdb::TransactionBaseImpl::GetForUpdate
{
// acquire the lock
5-rocksdb::TransactionImpl::TryLock
  6-rocksdb::TransactionDBImpl::TryLock
    7-rocksdb::TransactionLockMgr::TryLock 

 // delayed snapshot acquisition, used together with acquire_snapshot
 6-SetSnapshotIfNeeded()

 // check whether the key's snapshot has expired
 6-ValidateSnapshot
  7-rocksdb::TransactionUtil::CheckKeyForConflict
    8-rocksdb::TransactionUtil::CheckKey
      9-rocksdb::DBImpl::GetLatestSequenceForKey // first read

// read the key
5-rocksdb::TransactionBaseImpl::Get
  6-rocksdb::WriteBatchWithIndex::GetFromBatchAndDB
    7-rocksdb::DB::Get
      8-rocksdb::DBImpl::Get
        9-rocksdb::DBImpl::GetImpl // second read
}

3.2 Range update by primary key
1) Create a snapshot and scan the primary key with an iterator.
2) Try to lock each key via get_row_by_rowid.
3) Call ValidateSnapshot: Get the key and judge, by comparing sequence numbers, whether the key has been updated.
4) If the key has been updated by another transaction (its sequence number is newer than the snapshot), trigger a retry.
5) On retry, release the old snapshot and, via tx->acquire_snapshot(false), defer taking a new snapshot until after the key is locked.
6) Call get_for_update again; since the key is now locked, the retry is guaranteed to succeed.
7) Perform the update.
8) Jump back to step 1 and continue until a primary key no longer satisfies the condition.

3.3 Update via a secondary index
This scenario is similar to 3.2, except for the extra step of locating the primary key through the secondary index.
1) Create a snapshot and scan the secondary index with an iterator.
2) Look up the primary key from the secondary-index entry; this again calls get_row_by_rowid, which tries to lock the key.
3) Continue iterating the secondary index to the next primary key and try to lock it.
4) Stop when the returned secondary-index entry no longer satisfies the condition.

4. Deadlock detection algorithm
Deadlock detection uses DFS (depth-first search). The basic idea: when a new wait relationship is added, keep following the waitee's own wait relationships; if a cycle is found, a deadlock has occurred. Of course, in a highly concurrent system the lock-wait relationships are very complex. To keep the resource cost of deadlock detection within bounds, deadlock_detect_depth can be set to limit the search depth; alternatively, in business scenarios where deadlocks are known not to occur, detection can be turned off entirely, which helps system concurrency to some extent. Note that if deadlock detection is turned off, it is best to also set a small lock-wait timeout, so that if a deadlock really does occur, transactions do not hang for a long time. The basic detection flow is:
1. Locate the specific stripe and acquire its mutex.
2. Call AcquireLocked to try to take the lock.
3. If locking fails, trigger deadlock detection.
4. Call IncrementWaiters to add a waiter.
5. If the waitee is not in the wait map, there can be no deadlock; return.
6. For the waitee, follow wait_txn_map_ downward to check whether the wait relationships form a cycle.
7. If a cycle is found, call DecrementWaitersImpl to remove the newly added wait relationship and report a deadlock error.

The relevant data structures:

class TransactionLockMgr {
// Must be held when modifying wait_txn_map_ and rev_wait_txn_map_.
std::mutex wait_txn_map_mutex_;

// Maps from waitee -> number of waiters.
HashMap<TransactionID, int> rev_wait_txn_map_;

// Maps from waiter -> waitee.
HashMap<TransactionID, autovector<TransactionID>> wait_txn_map_;

DecrementWaiters //
IncrementWaiters //
}
struct TransactionOptions {
bool deadlock_detect = false; // whether to perform deadlock detection
int64_t deadlock_detect_depth = 50; // search depth for deadlock detection
int64_t lock_timeout = -1; // lock-wait timeout; typically set to 5s in production
int64_t expiration = -1; // lock hold (expiration) time
}

