How to Best Handle Raw Unicode in HTTP Response Bodies?

Mary-Kate Olsen
2024-12-03 14:18:15

How to Handle Raw Unicode Content in Response Bodies

When you fetch data from a web API with the net/http package, the response body may contain Unicode escape sequences such as \u5408 instead of the characters themselves. The body then consists entirely of ASCII characters, and the escape sequences have to be decoded to recover the actual text.

One approach that comes to mind is bufio.ScanRunes, which iterates over the individual Unicode code points of the input. It falls short here, though: each escape is stored as plain ASCII characters (a backslash, a u, and four hex digits), so the scanner returns those characters unchanged. A more reliable method is to unmarshal the response body with the encoding/json package. The JSON decoder converts \uXXXX escapes as part of parsing, leaving you with ordinary UTF-8 strings.
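To see why bufio.ScanRunes does not help with escaped text, here is a minimal sketch. The helper name scanRunes is made up for this example; it simply collects every token the scanner produces.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanRunes (hypothetical helper) splits s with bufio.ScanRunes
// and returns each rune as its own string.
func scanRunes(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(bufio.ScanRunes)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	// The raw string holds six ASCII characters, not the single rune 全.
	fmt.Println(scanRunes(`\u5168`)) // [\ u 5 1 6 8]
}
```

The scanner works on the bytes it is given, so it reports the backslash, the u, and the four hex digits as six separate runes. Something still has to interpret the escape.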

For instance, you can use the following snippet to unmarshal JSON data containing escaped Unicode characters:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    var i interface{}
    err := json.Unmarshal([]byte(src), &i) // \uXXXX escapes are decoded here
    fmt.Println(err, i)
}

const src = `{"forum":{"id":"3251718","name":"\u5408\u80a5\u5de5\u4e1a\u5927\u5b66\u5ba3\u57ce\u6821\u533a","first_class":"\u9ad8\u7b49\u9662\u6821","second_class":"\u5b89\u5fbd\u9662\u6821","is_like":"0","user_level":"1","level_id":"1","level_name":"\u7d20\u672a\u8c0b\u9762","cur_score":"0","levelup_score":"5","member_num":"80329","is_exists":"1","thread_num":"108762","post_num":"3445881","good_classify":[{"class_id":"0","class_name":"\u5168\u90e8"},{"class_id":"1","class_name":"\u516c\u544a\u7c7b"},{"class_id":"2","class_name":"\u5427\u53cb\u4e13\u533a"},{"class_id":"4","class_name":"\u6d3b\u52a8\u4e13\u533a"},{"class_id":"6","class_name":"\u793e\u56e2\u73ed\u7ea7"},{"class_id":"5","class_name":"\u8d44\u6e90\u5171\u4eab"},{"class_id":"8","class_name":"\u6e29\u99a8\u751f\u6d3b\u7c7b"},{"class_id":"7","class_name":"\u54a8\u8be2\u65b0\u95fb\u7c7b"},{"class_id":"3","class_name":"\u98ce\u91c7\u5c55\u793a\u533a"}]}}`

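Output (truncated here; Go 1.12 and later print map keys in sorted order, so the ordering you see may differ):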

<nil> map[forum:map[levelup_score:5 is_exists:1 post_num:3445881 good_classify:[map[class_id:0 class_name:全部] map[class_id:1 class_name:公告类] map[class_id:2 class_name:吧友专区] map[class_id:4 class_name:活动专区] map[class_id:6 class_name:社团班级] map[class_id:5 class_name:资源共享] map[class_id:8 class_name:温馨生活类] map[class_name:咨询新闻类 class_id:7] map[class_id:3 class_name:风采展示区]] id:3251718 is_like:0 cur_score:0
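Decoding into interface{} yields nested maps, which are awkward to work with. When you know which fields you need, a typed struct is usually more convenient. The sketch below is one way to do that; the type forumResponse and the helpers decodeForum and parseForum are illustrative names, and only two fields of the payload are mapped.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// forumResponse mirrors only two fields of the payload (illustrative type).
type forumResponse struct {
	Forum struct {
		Name      string `json:"name"`
		LevelName string `json:"level_name"`
	} `json:"forum"`
}

// decodeForum reads JSON from r (for example an http.Response Body)
// and decodes it; the decoder converts \uXXXX escapes to UTF-8.
func decodeForum(r io.Reader) (forumResponse, error) {
	var out forumResponse
	err := json.NewDecoder(r).Decode(&out)
	return out, err
}

// parseForum is a convenience wrapper for a body already held in memory.
func parseForum(body []byte) (forumResponse, error) {
	return decodeForum(bytes.NewReader(body))
}

func main() {
	body := []byte(`{"forum":{"name":"\u5408\u80a5","level_name":"\u7d20\u672a\u8c0b\u9762"}}`)
	f, err := parseForum(body)
	fmt.Println(f.Forum.Name, f.Forum.LevelName, err) // 合肥 素未谋面 <nil>
}
```

With a real request you would pass resp.Body to decodeForum and close the body afterwards. Fields that are not listed in the struct are ignored by the decoder.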

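Alternatively, to decode a single escaped string without going through the json package, you can use the standard library's strconv.Unquote function. The input must be a quoted Go string literal, so the surrounding double quotes are required: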

fmt.Println(strconv.Unquote(`"\u7d20\u672a\u8c0b\u9762"`))

Output:

素未谋面 <nil>
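If the escaped text arrives without surrounding quotes, one option is to add them before calling strconv.Unquote. The helper below, unescape, is a hypothetical name for this sketch, and it assumes the input contains no unescaped double quotes or newlines.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescape (hypothetical helper) wraps s in double quotes so that
// strconv.Unquote can interpret it as a Go string literal.
// Assumption: s has no unescaped double quotes or raw newlines.
func unescape(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	fmt.Println(unescape(`\u516c\u544a\u7c7b`)) // 公告类 <nil>

	// An unescaped quote inside the input makes the literal invalid.
	_, err := unescape(`say "hi"`)
	fmt.Println(err != nil) // true
}
```

Note that strconv.Unquote follows Go string-literal rules, not JSON rules. For example, it rejects UTF-16 surrogate pairs such as \ud83d\ude00, which JSON uses for characters outside the Basic Multilingual Plane. For JSON payloads, json.Unmarshal remains the safer choice.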
