首頁 >後端開發 >Golang >Go Maps 解釋:鍵值對實際上是如何儲存的

Go Maps 解釋:鍵值對實際上是如何儲存的

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB
WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB原創
2024-09-10 11:01:501060瀏覽

這是帖子的摘錄;完整的帖子可以在這裡找到:https://victoriametrics.com/blog/go-map/。

如果您是 Go 新手,那麼弄清楚如何在 Go 中使用地圖可能會有點混亂。即使您更有經驗,要理解地圖的真正工作原理也可能相當困難。

舉個例子:您是否曾經為地圖設定過“提示”,並想知道為什麼它被稱為“提示”而不是簡單的長度(就像我們對切片所做的那樣)?

// hint = 10
m := make(map[string]int, 10)

或者您可能已經注意到,當您在地圖上使用 for-range 循環時,順序與插入順序不匹配,如果您在不同時間循環同一個地圖,它甚至會發生變化。但奇怪的是,如果你在同一時間循環它,順序通常保持不變。

這是一個很長的故事,所以請繫好安全帶並開始吧。

在我們繼續之前,請注意——這裡的資訊是基於 Go 1.23。如果事情發生了變化並且這不是最新的,請隨時透過 X(@func25) 聯繫我。

Go 中的地圖:快速入門

那麼,讓我們來談談 Go 中的映射,它是一種內建類型,充當鍵值儲存。與陣列不同,在陣列中,您必須使用鍵作為遞增索引(如 0、1、2 等),而使用對應時,鍵可以是任何可比較的類型。

這給了你更多的彈性。

m := make(map[string]int)
m["a"] = 1
m["b"] = 2

m // map[a:1 b:2]

在該範例中,我們使用 make() 建立了一個空映射,其中鍵是字串,值是整數。

Go Maps Explained: How Key-Value Pairs Are Actually Stored

地圖["a": 1, "b": 2]

現在,您可以使用地圖文字來節省一些時間,而不是手動分配每個鍵。這可讓您在建立地圖時立即設定所有鍵值對:

m := map[string]int{
    "a": 1,
    "b": 2,
}

您所做的就是在第一次創建地圖時列出大括號內的鍵及其值,就這麼簡單。

如果您後來意識到不再需要某個鍵值對,Go 就能滿足您的需求。有一個方便的刪除功能,可以刪除您不需要的按鍵:delete(m, "a").

映射的零值是 nil,nil 映射在某些方面有點像空映射。你可以嘗試在其中尋找鍵,Go 不會嚇壞你的程式並使你的程式崩潰。

如果您搜尋不存在的按鍵,Go 會悄悄地為您提供該映射值類型的「零值」:

var m map[string]int

println(m["a"]) // 0
m["a"] = 1      // panic: assignment to entry in nil map

但事情是這樣的:你不能為 nil 映射新增新的鍵值對。

事實上,Go 處理映射的方式與處理切片的方式非常相似。映射和切片都是從零開始的,如果你在它們為零時對它們做了一些「無害」的事情,Go 不會驚慌。例如,您可以循環遍歷一個 nil 切片,而無需任何「戲劇」。

那麼,如果你嘗試循環一個 nil 映射會發生什麼?

var m map[string]int

for k, v := range m {
    println(k, v)
} 

什麼事也沒發生,沒有錯誤,沒有意外。它只是靜靜地什麼都不做。

Go 的方法是將任何類型的預設值視為有用的東西,而不是導致程式崩潰的東​​西。 Go 唯一會出錯的時候是當你做了一些真正非法的事情時,例如嘗試在 nil 映射中添加新的鍵值對或訪問切片中的越界索引。

關於 Go 中的地圖,您還應該了解以下幾件事:

  • 地圖上的 for-range 迴圈不會以任何特定順序傳回鍵。
  • 映射不是線程安全的,如果您嘗試同時讀取(或使用 for-range 迭代)並寫入同一個映射,Go 運行時將導致致命錯誤。
  • 您可以透過執行簡單的 ok 檢查來檢查某個鍵是否在映射中:_, ok := m[key]。
  • 映射的鍵類型必須是可比較的。

讓我們深入探討關於地圖鍵的最後一點。我之前提到過“密鑰可以是任何類似的類型”,但它的含義遠不止於此。

「那麼,到底什麼是可比較型別?什麼不是?」

非常簡單:如果您可以使用 == 來比較相同類型的兩個值,那麼該類型就被認為是可比較的。

func main() {
    var s map[int]string

    if s == s {
        println("comparable")
    }
}

// compile error: invalid operation: s == s (map can only be compared to nil)

但是正如你所看到的,上面的程式碼甚至無法編譯。編譯器抱怨:「無效運算:s == s(map 只能與 nil 比較)。」

同樣的規則適用於其他不可比較的類型,例如切片、函數或包含切片或映射的結構等。因此,如果您嘗試使用任何這些類型作為映射中的鍵,那麼您運氣不好。

func main() {
  var s map[[]int]string
}

// compile error: invalid map key type []intcompilerIncomparableMapKey

但是這裡有個小秘密,介面可以是可比較的,也可以是不可比較的。

這是什麼意思?您絕對可以定義一個以空介面為鍵的映射,而不會出現任何編譯錯誤。但請注意,您很有可能會遇到運行時錯誤。

func main() {
    m := map[interface{}]int{
        1: 1,
        "a": 2,
    }

    m[[]int{1, 2, 3}] = 3
    m[func() {}] = 4
}

// panic: runtime error: hash of unhashable type []int
// panic: runtime error: hash of unhashable type func()

Everything looks fine until you try to assign an uncomparable type as a map key.

That's when you'll hit a runtime error, which is trickier to deal with than a compile-time error. Because of this, it's usually a good idea to avoid using interface{} as a map key unless you have a solid reason and constraints that prevent misuse.

But that error message: "hash of unhashable type []int" might seem a bit cryptic. What's this about a hash? Well, that's our cue to dig into how Go handles things under the hood.

Map Anatomy

When explaining the internals of something like a map, it's easy to get bogged down in the nitty-gritty details of the Go source code. But we're going to keep it light and simple so even those new to Go can follow along.

What you see as a single map in your Go code is actually an abstraction that hides the complex details of how the data is organized. In reality, a Go map is composed of many smaller units called "buckets."

type hmap struct {
  ...
  buckets unsafe.Pointer
  ...
}

Look at Go source code above, a map contains a pointer that points to the bucket array.

This is why when you assign a map to a variable or pass it to a function, both the variable and the function's argument are sharing the same map pointer.

func changeMap(m2 map[string]int) {
  m2["hello"] = 2
}

func main() {
  m1 := map[string]int{"hello": 1}
  changeMap(m1)
  println(m1["hello"]) // 2
}

But don't get it twisted, maps are pointers to the hmap under the hood, but they aren't reference types, nor are they passed by reference like a ref argument in C#, if you change the whole map m2, it won't reflect on the original map m1 in the caller.

func changeMap(m2 map[string]int) {
  m2 = map[string]int{"hello": 2}
}

func main() {
  m1 := map[string]int{"hello": 1}
  changeMap(m1)
  println(m1["hello"]) // 1
}

In Go, everything is passed by value. What's really happening is a bit different: when you pass the map m1 to the changeMap function, Go makes a copy of the *hmap structure. So, m1 in the main() and m2 in the changeMap() function are technically different pointers point to the same hmap.

Go Maps Explained: How Key-Value Pairs Are Actually Stored

Map is passed by value

For more on this topic, there's a great post by Dave Cheney titled There is no pass-by-reference in Go.

Each of these buckets can only hold up to 8 key-value pairs, as you can see in the image below.

Go Maps Explained: How Key-Value Pairs Are Actually Stored

Buckets of a map

The map above has 2 buckets, and len(map) is 6.

So, when you add a key-value pair to a map, Go doesn't just drop it in there randomly or sequentially. Instead, it places the pair into one of these buckets based on the key's hash value, which is determined by hash(key, seed).

Let's see the simplest assignment scenario in the image below, when we have an empty map, and assign a key-value pair "hello": 1 to it.

Go Maps Explained: How Key-Value Pairs Are Actually Stored

Assign a key-value pair to an empty map

It starts by hashing "hello" to a number, then it takes that number and mods it by the number of buckets.

Since we only have one bucket here, any number mod 1 is 0, so it's going straight into bucket 0 and the same process happens when you add another key-value pair. It'll try to place it in bucket 0, and if the first slot's taken or has a different key, it'll move to the next slot in that bucket.

Take a look at the hash(key, seed), when you use a for-range loop over two maps with the same keys, you might notice that the keys come out in a different order:

func main() {
    a := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
    b := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}

    for i := range a {
        print(i, " ")
    }
    println()

    for i := range b {
        print(i, " ")
    }
}

// Output:
// a b c d e f 
// c d e f a b     

How's that possible? Isn't the key "a" in map a and the key "a" in map b hashed the same way?

But here's the deal, while the hash function used for maps in Go is consistent across all maps with the same key type, the seed used by that hash function is different for each map instance. So, when you create a new map, Go generates a random seed just for that map.

In the example above, both a and b use the same hash function because their keys are string types, but each map has its own unique seed.

"Wait, a bucket has only 8 slots? What happens if the bucket gets full? Does it grow like a slice?"

Well, sort of. When the buckets start getting full, or even almost full, depending on the algorithm's definition of "full", the map will trigger a growth, which might double the number of main buckets.

But here's where it gets a bit more interesting.

When I say "main buckets," I'm setting up for another concept: "overflow buckets." These come into play when you've got a situation with high collisions. Imagine you have 4 buckets, but one of them is completely filled with 8 key-value pairs due to high collisions, while the other 3 buckets are sitting empty.

The full post is available here: https://victoriametrics.com/blog/go-map/.

以上是Go Maps 解釋:鍵值對實際上是如何儲存的的詳細內容。更多資訊請關注PHP中文網其他相關文章!

陳述:
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn