首页 >后端开发 >Golang >Go Maps 解释:键值对实际上是如何存储的

Go Maps 解释:键值对实际上是如何存储的

WBOY
WBOY原创
2024-09-10 11:01:501008浏览

这是帖子的摘录;完整的帖子可以在这里找到:https://victoriametrics.com/blog/go-map/。

如果您是 Go 新手,那么弄清楚如何在 Go 中使用地图可能会有点混乱。即使您更有经验,理解地图的真正工作原理也可能相当困难。

举个例子:您是否曾经为地图设置过“提示”,并想知道为什么它被称为“提示”而不是简单的长度(就像我们对切片所做的那样)?

// hint = 10
m := make(map[string]int, 10)

或者您可能已经注意到,当您在地图上使用 for-range 循环时,顺序与插入顺序不匹配,如果您在不同时间循环同一个地图,它甚至会发生变化。但奇怪的是,如果你在同一时间循环它,顺序通常保持不变。

这是一个很长的故事,所以系好安全带并开始吧。

在我们继续之前,请注意——这里的信息基于 Go 1.23。如果事情发生了变化并且这不是最新的,请随时通过 X(@func25) 联系我。

Go 中的地图:快速入门

那么,我们来谈谈 Go 中的映射,它是一种内置类型,充当键值存储。与数组不同,在数组中,您必须使用键作为递增索引(如 0、1、2 等),而使用映射时,键可以是任何可比较的类型。

这给了你更多的灵活性。

m := make(map[string]int)
m["a"] = 1
m["b"] = 2

m // map[a:1 b:2]

在该示例中,我们使用 make() 创建了一个空映射,其中键是字符串,值是整数。

Go Maps Explained: How Key-Value Pairs Are Actually Stored

地图["a": 1, "b": 2]

现在,您可以使用地图文字来节省一些时间,而不是手动分配每个键。这使您可以在创建地图时立即设置所有键值对:

m := map[string]int{
    "a": 1,
    "b": 2,
}

您所做的就是在第一次创建地图时列出大括号内的键及其值,就这么简单。

如果您后来意识到不再需要某个键值对,Go 就能满足您的需求。有一个方便的删除功能,可以删除您不需要的键:delete(m, "a").

映射的零值是 nil,nil 映射在某些方面有点像空映射。你可以尝试在其中查找键,Go 不会吓坏你的程序并崩溃。

如果您搜索不存在的键,Go 会悄悄地为您提供该映射值类型的“零值”:

var m map[string]int

println(m["a"]) // 0
m["a"] = 1      // panic: assignment to entry in nil map

但事情是这样的:你不能向 nil 映射添加新的键值对。

事实上,Go 处理映射的方式与处理切片的方式非常相似。映射和切片都是从零开始的,如果你在它们为零时对它们做了一些“无害”的事情,Go 不会惊慌。例如,您可以循环遍历一个 nil 切片,而无需任何“戏剧”。

那么,如果你尝试循环一个 nil 映射会发生什么?

var m map[string]int

for k, v := range m {
    println(k, v)
} 

什么也没发生,没有错误,没有意外。它只是静静地什么都不做。

Go 的方法是将任何类型的默认值视为有用的东西,而不是导致程序崩溃的东西。 Go 唯一会出错的时候是当你做了一些真正非法的事情时,比如尝试向 nil 映射添加新的键值对或访问切片中的越界索引。

关于 Go 中的地图,您还应该了解以下几件事:

  • 地图上的 for-range 循环不会以任何特定顺序返回键。
  • 映射不是线程安全的,如果您尝试同时读取(或使用 for-range 迭代)并写入同一个映射,Go 运行时将导致致命错误。
  • 您可以通过执行简单的 ok 检查来检查某个键是否在映射中:_, ok := m[key]。
  • 映射的键类型必须是可比较的。

让我们深入探讨关于地图键的最后一点。我之前提到过“密钥可以是任何类似的类型”,但它的含义远不止于此。

“那么,到底什么是可比较类型?什么不是?”

非常简单:如果您可以使用 == 来比较相同类型的两个值,那么该类型就被认为是可比较的。

func main() {
    var s map[int]string

    if s == s {
        println("comparable")
    }
}

// compile error: invalid operation: s == s (map can only be compared to nil)

但是正如你所看到的,上面的代码甚至无法编译。编译器抱怨:“无效操作:s == s(map 只能与 nil 进行比较)。”

同样的规则适用于其他不可比较的类型,例如切片、函数或包含切片或映射的结构等。因此,如果您尝试使用任何这些类型作为映射中的键,那么您运气不好。

func main() {
  var s map[[]int]string
}

// compile error: invalid map key type []intcompilerIncomparableMapKey

但是这里有一个小秘密,接口可以是可比较的,也可以是不可比较的。

这是什么意思?您绝对可以定义一个以空接口为键的映射,而不会出现任何编译错误。但请注意,您很有可能会遇到运行时错误。

func main() {
    m := map[interface{}]int{
        1: 1,
        "a": 2,
    }

    m[[]int{1, 2, 3}] = 3
    m[func() {}] = 4
}

// panic: runtime error: hash of unhashable type []int
// panic: runtime error: hash of unhashable type func()

Everything looks fine until you try to assign an uncomparable type as a map key.

That's when you'll hit a runtime error, which is trickier to deal with than a compile-time error. Because of this, it's usually a good idea to avoid using interface{} as a map key unless you have a solid reason and constraints that prevent misuse.

But that error message: "hash of unhashable type []int" might seem a bit cryptic. What's this about a hash? Well, that's our cue to dig into how Go handles things under the hood.

Map Anatomy

When explaining the internals of something like a map, it's easy to get bogged down in the nitty-gritty details of the Go source code. But we're going to keep it light and simple so even those new to Go can follow along.

What you see as a single map in your Go code is actually an abstraction that hides the complex details of how the data is organized. In reality, a Go map is composed of many smaller units called "buckets."

type hmap struct {
  ...
  buckets unsafe.Pointer
  ...
}

Look at Go source code above, a map contains a pointer that points to the bucket array.

This is why when you assign a map to a variable or pass it to a function, both the variable and the function's argument are sharing the same map pointer.

func changeMap(m2 map[string]int) {
  m2["hello"] = 2
}

func main() {
  m1 := map[string]int{"hello": 1}
  changeMap(m1)
  println(m1["hello"]) // 2
}

But don't get it twisted, maps are pointers to the hmap under the hood, but they aren't reference types, nor are they passed by reference like a ref argument in C#, if you change the whole map m2, it won't reflect on the original map m1 in the caller.

func changeMap(m2 map[string]int) {
  m2 = map[string]int{"hello": 2}
}

func main() {
  m1 := map[string]int{"hello": 1}
  changeMap(m1)
  println(m1["hello"]) // 1
}

In Go, everything is passed by value. What's really happening is a bit different: when you pass the map m1 to the changeMap function, Go makes a copy of the *hmap structure. So, m1 in the main() and m2 in the changeMap() function are technically different pointers point to the same hmap.

Go Maps Explained: How Key-Value Pairs Are Actually Stored

Map is passed by value

For more on this topic, there's a great post by Dave Cheney titled There is no pass-by-reference in Go.

Each of these buckets can only hold up to 8 key-value pairs, as you can see in the image below.

Go Maps Explained: How Key-Value Pairs Are Actually Stored

Buckets of a map

The map above has 2 buckets, and len(map) is 6.

So, when you add a key-value pair to a map, Go doesn't just drop it in there randomly or sequentially. Instead, it places the pair into one of these buckets based on the key's hash value, which is determined by hash(key, seed).

Let's see the simplest assignment scenario in the image below, when we have an empty map, and assign a key-value pair "hello": 1 to it.

Go Maps Explained: How Key-Value Pairs Are Actually Stored

Assign a key-value pair to an empty map

It starts by hashing "hello" to a number, then it takes that number and mods it by the number of buckets.

Since we only have one bucket here, any number mod 1 is 0, so it's going straight into bucket 0 and the same process happens when you add another key-value pair. It'll try to place it in bucket 0, and if the first slot's taken or has a different key, it'll move to the next slot in that bucket.

Take a look at the hash(key, seed), when you use a for-range loop over two maps with the same keys, you might notice that the keys come out in a different order:

func main() {
    a := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
    b := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}

    for i := range a {
        print(i, " ")
    }
    println()

    for i := range b {
        print(i, " ")
    }
}

// Output:
// a b c d e f 
// c d e f a b     

How's that possible? Isn't the key "a" in map a and the key "a" in map b hashed the same way?

But here's the deal, while the hash function used for maps in Go is consistent across all maps with the same key type, the seed used by that hash function is different for each map instance. So, when you create a new map, Go generates a random seed just for that map.

In the example above, both a and b use the same hash function because their keys are string types, but each map has its own unique seed.

"Wait, a bucket has only 8 slots? What happens if the bucket gets full? Does it grow like a slice?"

Well, sort of. When the buckets start getting full, or even almost full, depending on the algorithm's definition of "full", the map will trigger a growth, which might double the number of main buckets.

But here's where it gets a bit more interesting.

Wenn ich „Haupt-Buckets“ sage, beziehe ich mich auf ein anderes Konzept: „Überlauf-Buckets“. Diese kommen ins Spiel, wenn Sie eine Situation mit hohen Kollisionsraten haben. Stellen Sie sich vor, Sie haben 4 Buckets, aber einer davon ist aufgrund hoher Kollisionen vollständig mit 8 Schlüssel-Wert-Paaren gefüllt, während die anderen 3 Buckets leer bleiben.

Der vollständige Beitrag ist hier verfügbar: https://victoriametrics.com/blog/go-map/.

以上是Go Maps 解释:键值对实际上是如何存储的的详细内容。更多信息请关注PHP中文网其他相关文章!

声明:
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn