Go String parsing

Guanhui
2020-06-12 18:21:23

What is a string?

In Go, a string is a (possibly empty) immutable sequence of bytes. For us, the key word here is immutable. Because byte slices are mutable, converting between string and []byte usually requires allocation and copying, which is expensive.
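The copy made by a conversion can be observed directly. The following is a minimal sketch; the helper name convertAndMutate is made up for illustration and is not part of any library.

```go
package main

import "fmt"

// convertAndMutate (hypothetical helper) converts b to a string, then
// overwrites the first byte of b and returns the string. Because the
// conversion copied the bytes, the string keeps the original contents.
func convertAndMutate(b []byte) string {
	s := string(b) // copies the bytes of b into the string's own storage
	if len(b) > 0 {
		b[0] = 'j' // changes the slice only, not the string
	}
	return s
}

func main() {
	b := []byte("hello")
	s := convertAndMutate(b)
	fmt.Println(s, string(b))
}
```

The string and the slice diverge after the write, which is only possible because they no longer share storage.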

Under the hood, Go's strings are (currently) represented as a length and a pointer to the string data.
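One way to see this layout, as a rough sketch that assumes the standard gc toolchain, is to measure the size of a string value. The helper headerWords is a name invented for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (hypothetical helper) reports how many machine words a
// string value occupies, independent of the string's contents.
func headerWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	short, long := "hi", "a considerably longer string"
	// The value itself has a fixed size; only the pointed-to data differs.
	fmt.Println(unsafe.Sizeof(short) == unsafe.Sizeof(long))
	fmt.Println(headerWords())
}
```

With the representation described above, the value should occupy two words: one for the data pointer and one for the length.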

What is string interning?

Consider this code:

b := []byte("hello")
s := string(b)
t := string(b)

s and t are strings, so each has a length and a data pointer. Their lengths are obviously the same. What about their data pointers?

Go gives us no direct way to find out. But we can use unsafe to probe:

func pointer(s string) uintptr {
    p := unsafe.Pointer(&s)
    h := *(*reflect.StringHeader)(p)
    return h.Data
}

(This function should return unsafe.Pointer. See Go issue 19367 for details.)

If we fmt.Println(pointer(s), pointer(t)), we get something like 4302664 4302632. The pointers are different; there are two separate copies of the data hello.
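The same probe can be written as a complete program. This sketch assumes Go 1.20 or later, where unsafe.StringData is available, and returns unsafe.Pointer rather than uintptr; the names dataPointer and twoConversions are made up for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// dataPointer returns the address of the first byte of s's data.
// Assumption: Go 1.20+, which provides unsafe.StringData.
func dataPointer(s string) unsafe.Pointer {
	return unsafe.Pointer(unsafe.StringData(s))
}

// twoConversions converts the same byte slice twice and reports whether
// the results are equal and whether they share their data.
func twoConversions(b []byte) (equal, shared bool) {
	s := string(b)
	t := string(b)
	return s == t, dataPointer(s) == dataPointer(t)
}

func main() {
	b := []byte("hello")
	s := string(b)
	t := string(b)
	fmt.Println(dataPointer(s), dataPointer(t)) // addresses vary from run to run
	fmt.Println(twoConversions(b))
}
```

The printed addresses will differ between runs and machines; what matters is that the two strings compare equal while pointing at different data.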

(As an exercise, try changing "hello" to "h" and compare the pointers again.)

What if you wanted to reuse a single copy of the data hello? That is string interning. String interning has two advantages. The obvious one is that you don't need to allocate and copy the data. The other is that it speeds up string equality checks: if two strings have the same length and the same data pointer, they are equal, and there is no need to compare the bytes.

As of Go 1.14, Go does not intern most strings. Like other forms of caching, interning has costs: synchronization for concurrency safety, garbage collector complexity, and extra code to execute every time a string is created. And, like caching, there are cases where it does more harm than good. If you are processing the words of a dictionary, no word appears twice, and string interning is a waste of time and memory.

Manual string interning

You can intern strings manually in Go. What we need is a way to find an existing string to reuse, given a byte slice, perhaps using something like map[[]byte]string. If the lookup succeeds, we use the existing string; if it fails, we convert and store the string for future use.

There's just one problem here: you can't use []byte as a key for a map.

Thanks to a long-standing compiler optimization, we can use map[string]string instead. The optimization is that a map lookup whose key is a converted byte slice does not actually allocate a new string for use during the lookup.

m := make(map[string]string)
b := []byte("hello")
s := string(b) // allocates
_ = m[string(b)] // does not allocate!

(Similar optimizations apply in other cases where the compiler can prove that the converted byte slice is not modified while in use, such as switch string(b) when none of the switch cases have side effects.)
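The difference can be measured rather than taken on trust. The sketch below uses testing.AllocsPerRun from the standard library and assumes the standard gc compiler; measure, sink and found are names introduced only for this example.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep the compiler from discarding the results.
var (
	sink  string
	found bool
)

// measure (hypothetical helper) returns the average number of allocations
// per call for a plain conversion and for a map lookup keyed by a
// converted byte slice.
func measure() (convert, lookup float64) {
	m := map[string]string{"hello": "hello"}
	b := []byte("hello")
	convert = testing.AllocsPerRun(1000, func() {
		sink = string(b) // the string escapes, so its data is heap-allocated
	})
	lookup = testing.AllocsPerRun(1000, func() {
		_, found = m[string(b)] // conversion used only as a lookup key
	})
	return convert, lookup
}

func main() {
	c, l := measure()
	fmt.Printf("conversion: %v allocs, lookup: %v allocs\n", c, l)
}
```

If the optimization applies, the lookup should report zero allocations while the plain conversion reports at least one.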

This is all the code needed to intern a string:

func intern(m map[string]string, b []byte) string {
    // look for an existing string to reuse
    c, ok := m[string(b)]
    if ok {
        // found an existing string
        return c
    }
    // didn't find one, so make one and store it
    s := string(b)
    m[s] = s
    return s
}

It's that simple.

New complications

Note that this manual interning routine pushes the problems of interning onto the calling code. You need to manage concurrent access to the map; you need to decide the lifetime of the map (and everything in it); and you pay the extra cost of a map lookup every time you need a string.
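One way a caller might handle the concurrency part is to guard the map with a mutex. The sketch below is only one possible design; Interner, NewInterner, Len and hammer are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Interner (hypothetical type) guards an interning map with a mutex so
// that it can be shared between goroutines.
type Interner struct {
	mu sync.Mutex
	m  map[string]string
}

func NewInterner() *Interner {
	return &Interner{m: make(map[string]string)}
}

// Intern returns the stored string equal to b, storing a new one if needed.
func (in *Interner) Intern(b []byte) string {
	in.mu.Lock()
	defer in.mu.Unlock()
	if c, ok := in.m[string(b)]; ok {
		return c
	}
	s := string(b)
	in.m[s] = s
	return s
}

// Len reports how many distinct strings are stored.
func (in *Interner) Len() int {
	in.mu.Lock()
	defer in.mu.Unlock()
	return len(in.m)
}

// hammer interns the same two words from several goroutines at once.
func hammer(in *Interner, goroutines, rounds int) {
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < rounds; j++ {
				in.Intern([]byte("hello"))
				in.Intern([]byte("world"))
			}
		}()
	}
	wg.Wait()
}

func main() {
	in := NewInterner()
	hammer(in, 8, 1000)
	fmt.Println(in.Len())
}
```

The lock makes every call pay for synchronization, and the map still grows without bound, so the lifetime question remains the caller's to answer.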

Pushing these decisions onto the calling code can yield better performance. For example, suppose you are decoding JSON into a map[string]interface{}. The JSON decoder is probably not used concurrently, the map's lifetime can be tied to the decoder, and the keys are likely to repeat often, which is the best case for string interning; it makes the extra map lookup worth its cost.
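The same idea can be shown on a smaller scale than a JSON decoder. In this sketch a word scanner stands in for the decoder: it is single-goroutine, its interning map lives only for one call, and its input repeats words. The function internWords is invented for this example, and scanner errors are ignored for brevity.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// internWords (hypothetical helper) splits input into whitespace-separated
// words and returns them as strings, reusing one string per distinct word.
// The interning map is local, so it is released when the call returns.
func internWords(input string) (words []string, distinct int) {
	m := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(bufio.ScanWords)
	for sc.Scan() {
		b := sc.Bytes() // only valid until the next call to Scan
		w, ok := m[string(b)]
		if !ok {
			w = string(b) // first occurrence: convert and remember it
			m[w] = w
		}
		words = append(words, w)
	}
	return words, len(m)
}

func main() {
	words, distinct := internWords("to be or not to be")
	fmt.Println(len(words), distinct)
}
```

Only the first occurrence of each word is converted; later occurrences reuse the stored string.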

A helper package

If you don't want to think about any of these complications, are willing to accept a slight performance hit, and have code where string interning might help, there is a package for that: github.com/josharian/intern.

It works by horribly abusing sync.Pool. It stores interning maps in a sync.Pool and retrieves them as needed. This neatly solves the concurrent access problem, because access to a sync.Pool is concurrency-safe. It mostly solves the lifetime problem, since the contents of a sync.Pool are usually garbage collected eventually. (For related reading on managing lifetimes, see Go issue 29696.)

Statement:
This article is reproduced from learnku.com.