As Go has grown in popularity in recent years, more and more developers have chosen it for their projects. Along the way, some puzzling problems come up, and one of the more common ones is "garbled" output when working with strings and byte slices. This article explains where the problem comes from and how to deal with it.
First of all, we need to know what a byte is. In Go, byte is an alias for uint8, an unsigned 8-bit integer; a single byte is enough to represent an ASCII character. rune is an alias for int32 and represents one Unicode code point, so it can hold ASCII characters as well as characters that need several bytes when encoded. We usually store text in a string, and a byte slice ([]byte) holds the string's UTF-8 byte sequence. One character corresponds to one byte only for ASCII text; other characters take two to four bytes.
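The difference between bytes and runes is easy to check with a few lines of code. The following is a minimal sketch; byteAndRuneCounts is a hypothetical helper name introduced only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts returns the number of bytes and the number of
// Unicode code points (runes) in s. Hypothetical helper for illustration.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Go", "你好"} {
		nb, nr := byteAndRuneCounts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, nb, nr)
	}
}
```

For "Go" both counts are 2, while "你好" has 6 bytes but only 2 runes, because len counts bytes rather than characters.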
When we convert a string into a byte slice ([]byte) and print it, the result can look garbled. For example, take the following code:
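    package main

    import "fmt"

    func main() {
        str := "你好,世界!"
        b := []byte(str) // copy the string's UTF-8 bytes into a slice
        fmt.Println(b)   // prints each byte as a decimal number
    }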
The output is:
[228 189 160 229 165 189 44 228 184 150 231 149 140 33]
As you can see, the program does not print the "你好,世界!" we might have expected, but a list of numbers. Nothing has actually been corrupted, though. So why does the output look like this?
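The numbers are the decimal values of the string's bytes, and the text can be recovered from them at any time. A minimal sketch of the round trip, where roundTrip is a hypothetical helper used only for this illustration:

```go
package main

import "fmt"

// roundTrip converts s to a byte slice and back to a string.
// Hypothetical helper for illustration.
func roundTrip(s string) string {
	b := []byte(s)
	return string(b)
}

func main() {
	str := "你好,世界!"
	b := []byte(str)

	fmt.Println(b)         // decimal value of each byte
	fmt.Println(string(b)) // the original text again
	fmt.Printf("%s\n", b)  // same text, using the %s verb on the slice
	fmt.Printf("% x\n", b) // the bytes in hexadecimal, space separated

	fmt.Println(roundTrip(str) == str) // the conversion loses nothing
}
```

Printing with string(b) or the %s verb shows the text, while Println on the slice itself shows the raw byte values.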
The reason is that a Go string is, under the hood, a read-only sequence of bytes, and string literals in Go source code are UTF-8 encoded. Converting a string to a byte slice copies those bytes unchanged, and fmt.Println prints a []byte as the decimal value of each byte. In UTF-8, a common Chinese character takes 3 bytes, while an English letter takes only 1, so "你好" alone already occupies 6 bytes. The text only becomes truly garbled when those bytes are misread: for example, when each byte is treated as a character of its own, when a slice is cut in the middle of a 3-byte sequence, or when the bytes are decoded with a different encoding from the one that produced them.
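A quick way to see how a three-byte character turns into garbled output is to cut a string in the middle of one. The following is a minimal sketch; cutIsValid is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cutIsValid reports whether the first n bytes of s are valid UTF-8.
// Hypothetical helper for illustration; n must not exceed len(s).
func cutIsValid(s string, n int) bool {
	return utf8.ValidString(s[:n])
}

func main() {
	s := "你好" // 6 bytes, 3 per character

	fmt.Println(cutIsValid(s, 3)) // true: the cut falls between the two characters
	fmt.Println(cutIsValid(s, 4)) // false: the cut falls inside the second character

	// %q escapes the stray byte so it is visible instead of garbled.
	fmt.Printf("%q\n", s[:4])

	// Slicing by rune instead of by byte keeps characters whole.
	fmt.Println(string([]rune(s)[:1]))
}
```

Indexing and slicing a string work on bytes, so any offset computed in characters should be applied to a []rune (or found with the unicode/utf8 functions) rather than to the string directly.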
For example, the two characters "你好" are encoded in UTF-8 as the hexadecimal sequences E4BDA0 and E5A5BD. Converted to a byte slice, they become:
[]byte{0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD}
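Joining the two three-byte sequences end to end gives exactly the same slice:
[]byte{0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD}
Nothing is lost in the conversion, and string(b) turns these six bytes back into "你好". Garbled characters appear only when the sequence is broken up or misread, for example when it is sliced in the middle of a character (b[:4] ends with a lone 0xE5) or when each byte is converted to a character on its own. Therefore, when we need to work with characters rather than raw bytes, we should convert the string to []rune or use the functions in the unicode/utf8 package, such as utf8.Valid(), utf8.RuneCount(), utf8.DecodeRune() and utf8.EncodeRune(). strconv.Quote() is also handy for inspecting a string that may contain invalid bytes, because it prints them as escape sequences.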
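The six bytes above can be decoded back into characters one rune at a time with utf8.DecodeRune from the standard unicode/utf8 package. This is a minimal sketch, and decodeRunes is a hypothetical helper name used only here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes walks b and returns the runes it contains. An invalid byte
// is reported as utf8.RuneError (U+FFFD) and consumes a single byte.
// Hypothetical helper for illustration.
func decodeRunes(b []byte) []rune {
	var out []rune
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		out = append(out, r)
		b = b[size:]
	}
	return out
}

func main() {
	b := []byte{0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD}
	for _, r := range decodeRunes(b) {
		fmt.Printf("%c U+%04X\n", r, r)
	}
}
```

Each call consumes one whole character (3 bytes here), which is what keeps multi-byte sequences from being split.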
Of course, in some special cases we may want to build the byte slice ourselves instead of using the built-in []byte(str) conversion, for example to process the string one rune at a time. This can be done with utf8.EncodeRune from the unicode/utf8 package:
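    package main

    import (
        "fmt"
        "unicode/utf8"
    )

    func main() {
        str := "你好,世界!"
        // Worst case: an invalid input byte decodes to U+FFFD, which needs 3 bytes.
        b := make([]byte, len(str)*3)
        blen := 0
        for _, runeValue := range str {
            // EncodeRune writes the rune's UTF-8 bytes and returns how many it wrote.
            blen += utf8.EncodeRune(b[blen:], runeValue)
        }
        fmt.Println(b[:blen])
    }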
The output is:
[228 189 160 229 165 189 44 228 184 150 231 149 140 33]
As you can see, the manual encoding produces exactly the same bytes as the built-in []byte(str) conversion.
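That equivalence can also be checked in code rather than by comparing the printed numbers by eye. A minimal sketch, assuming the hypothetical helper names encodeManually and sameAsConversion:

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// encodeManually re-encodes s rune by rune with utf8.EncodeRune.
// Hypothetical helper for illustration.
func encodeManually(s string) []byte {
	out := make([]byte, 0, len(s))
	var buf [utf8.UTFMax]byte // large enough for any single rune
	for _, r := range s {
		n := utf8.EncodeRune(buf[:], r)
		out = append(out, buf[:n]...)
	}
	return out
}

// sameAsConversion reports whether the manual encoding of s matches
// the built-in []byte(s) conversion.
func sameAsConversion(s string) bool {
	return bytes.Equal(encodeManually(s), []byte(s))
}

func main() {
	fmt.Println(sameAsConversion("你好,世界!")) // valid UTF-8: identical bytes
	fmt.Println(sameAsConversion("\xff"))    // invalid UTF-8: the results differ
}
```

The two approaches agree only for valid UTF-8 input. Ranging over a string replaces each invalid byte with U+FFFD, so the manual version rewrites bad input while []byte(s) copies it as-is.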
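In addition to the above methods, a library outside the standard library can help when the garbled characters come from data that is not UTF-8 in the first place. For text in encodings such as GBK or GB18030, the golang.org/x/text/encoding packages (for example simplifiedchinese) can transcode the bytes to UTF-8 before they are used as a Go string.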
In short, when developing in Go we need to keep in mind that a string is a sequence of UTF-8 bytes. Use []byte(str) and string(b) to convert between the two, use []rune or the unicode/utf8 package when working with individual characters, avoid cutting a byte slice in the middle of a character, and transcode data that arrives in another encoding. With these points in place, the byte garbled problem no longer gets in the way of development work.