Home >Backend Development >Golang >golang map implementation principle
In the process of learning Golang, map is a data structure we often use, which can be used to store key-value pairs. But, have you ever thought about how map is implemented? In this article, we will explore how map is implemented in Golang.
Introduction to the implementation principle of map
map is actually a hash table, and its implementation principle is the hash algorithm. A hash table is a data structure that directly accesses memory storage locations based on keys, that is, it provides a way to map keys to memory addresses. In a hash table, keywords are called "keys" and the memory addresses calculated by the hash function are called "buckets". The buckets store key-value pairs.
In Golang, map is implemented by storing key-value pairs in the bucket memory area. Each bucket has a header structure used to describe the status of the bucket. This header structure is called " bucket".
bucket data structure
Let’s first look at the bucket data structure:
type bmap struct { tophash [bucketCnt]uint8 // data 的类型在 map 被分配空间的时候由 map 的 key 和 value 类型决定 // 如果 key 类型和 value 类型都是未知类型,那么 data 的类型是 byte // 因为 byte 可以用于存储任何类型,也就是说 data 可以存储任何类型的 key 和 value // 在存储 key 和 value 的时候,需要将 key 和 value 转换成 uintptr,然后再转换成 byte 存储到 data 中 // 读取 key 和 value 的时候反向操作即可 // 因此,map 中的 key 和 value 一般都是需要转换成 uintptr 类型的 // 这里我们可以简单地理解为 data 存储了键值对的二进制数据 // 第一个 bucket 中如果存不下某个 key-value,还需要在第二个 bucket 中存储 // 所以,data 的长度可以为 1 或 2 // 当 data 的长度为 1 或 2 时,存储的键值对的个数分别为 1 或 2 // 因此,bucket 中最多只能存储 8 个键值对 // 否则,就需要将键值对存储在 overflow 中 // overflow 是一个指向单链表头部的指针数组 // 如果某个桶存储的键值对数量超过了 8,那么就将多余的键值对存储到 overflow 中 // overflow 中的第一个 bucket 中也是能够存储 8 个键值对的,如果第一个 bucket 中存不下的话 // 那么还需要在第二个 bucket 中继续存储 // 所以,overflow 数组的长度也可以为 1 或 2 // 当 overflow 数组长度为 1 或 2 时,最多存储 8 个键值对的 overflow // 当 overflow 数组长度为 3 时,最多存储 24 个键值对的 overflow // 也就是说,如果 map 中存储的键值对数量很多的话,那么可能会有很多个 overflow 桶 // 而 overflow 数组是动态分配的,因此它的长度是不定的,只有在运行时才能确定 data unsafe.Pointer // overflow 是一个指针数组,用于存储除当前桶以外的其它桶 // 如果某个桶的键值对数量超过了 8,那么它会将多余的键值对存储到 overflow 中 // overflow 数组中的每个元素都是一个 bucket 指针 // 这些 bucket 指针构成了一个链表,也就是所谓的 overflow 链表 overflow *[]*bmap // 这里存储的是这个 bucket 当前存储的键值对的个数 count int16 // flags 存储了 bucket 的标志 // 一个 bucket 可能会有如下三种状态: // 1. empty,表示这个 bucket 没有存储任何键值对 // 2. iterator,表示这个 bucket 正在被访问,不能再被其它操作使用 // 3. deleted,表示这个 bucket 存储的键值对已经被删除了 // 注意,该标志位只有在 GC 达到一定阈值后才会被置为 true,而且并不是立即删除键值对,而是在下一次 GC 的时候才会内存回收 flags uint8 // Bmap 的类型标记 // Bmap 用来标识存储于上文中 data 指针中的数据类型 // 如果 data 中存储的是 int,那么 Bmap 的值为 hmap.BmapInt // 如果 data 中存储的是 string,那么 Bmap 的值为 hmap.BmapString // 如果 data 中存储的是 interface{},那么 Bmap 的值为 hmap.BmapInterface // 如果 data 中存储其他类型,那么 Bmap 的值为 hmap.BmapOther BmapType uint16 // 与该 bucket 相关联的 key 的类型 tophash0 uint8 }
From the above definition, we can see that the bucket stores the following information:
Implementation Principle
Since the data stored in map is a key-value pair, and the key value of the key-value pair is non-repeatable and can be compared, we can use hashing The function converts the key value into a unique hash value, then uses the hash value as the index corresponding to the key value, and stores the value value at the index.
In Golang, each map has a hash table. This hash table is actually a continuous storage space that can store several buckets. Each bucket is a bucket type. structure.
So how is the size of the hash table determined? It is dynamically allocated memory when the map is created, and the default size used when space is first allocated is 8 buckets. If the number of buckets is not enough when adding elements for the first time, the size of the hash table will be expanded and the index position of each element in the hash table will be recalculated.
In Golang, the main function of the hash function is to hash the key value, making it more convenient to locate a bucket in the hash table, thereby speeding up the search for the value value corresponding to the key value. In Golang, the hash function is implemented using the MurmurHash3 algorithm.
Since the hash function maps key to an integer, different key values may be mapped to the same index. This situation is called a hash collision. When a hash conflict occurs, Golang uses the linked list method to resolve it, storing the key-value pairs on the same index in the linked list of the index. These key-value pairs are called overflow.
Summary
Golang’s map implementation principle mainly relies on hash tables and hash functions. A hash function is used to hash the key value, mapping it to a bucket in a hash table, thereby speeding up the operation of looking up the value. The hash table consists of several buckets, each bucket is represented by a bucket type structure, which stores key-value pair information. When there are multiple key-value pairs at the same index position in the hash table, the linked list method is used to store overflow. The map in Golang is actually a dynamic hash table, which can dynamically adjust the size of the hash table. When using maps, care needs to be taken to avoid hash collisions and including types with incomparable sizes as keys.
The above is the detailed content of golang map implementation principle. For more information, please follow other related articles on the PHP Chinese website!