Problem:
Parsing the object returned by urllib2.urlopen with Python's built-in json module fails.
Code:
import json
import urllib2  # Python 2 module
url = "https://www.baidu.com"
data = urllib2.urlopen(url)
json.load(data)
Error:
No JSON object could be decoded
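This came up while setting up a local environment for an existing project I took over.
According to the article below, the cause may be that on Windows, UTF-8 encoded files include a BOM by default,
and Python's json library does not support UTF-8 with a BOM.
Reference article
I have a few questions:
1. Can the object returned by urllib2.urlopen really be parsed with json.load?
2. Can urllib2.urlopen be told directly to use UTF-8 and strip the BOM when fetching the object?
3. Is there a way to make UTF-8 encoding on Windows omit the BOM by default?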
高洛峰2017-04-17 18:00:10
Whether the object returned by urllib2.urlopen can be parsed with json.load depends on whether the server actually returns valid JSON. For example, you can check whether the Content-Type header of the response is application/json.
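As a rough sketch of that check, the helper below inspects a Content-Type header value before any parsing is attempted. The name looks_like_json is hypothetical and is not part of urllib2 or json; accepting "+json" suffixes is also an assumption about what you want to treat as JSON.

```python
def looks_like_json(content_type):
    """Return True if a Content-Type header value names a JSON media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    # Accept application/json and suffixed types such as application/problem+json.
    return media_type == "application/json" or media_type.endswith("+json")


print(looks_like_json("application/json; charset=utf-8"))
print(looks_like_json("text/html"))
```

With urllib2 the header value would typically come from data.info().getheader("Content-Type"). A page served as text/html is not JSON, so json.load would be expected to fail on it whether or not a BOM is present.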
A BOM identifies the encoding of a stored file. It matters mainly for UTF-16, where it indicates whether the byte order is big-endian or little-endian. UTF-8 itself does not require a BOM. The encoding of a response is specified by the charset parameter in Content-Type, such as Content-Type: application/json; charset=utf-8
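If a server does send a UTF-8 BOM ahead of the JSON text, one possible workaround is to strip it from the raw bytes before decoding. This is a minimal sketch, not something urllib2 does for you: parse_json_bytes is a hypothetical helper, and the default charset of utf-8 is an assumption that should be replaced by whatever the Content-Type header reports.

```python
import codecs
import json


def parse_json_bytes(raw, charset="utf-8"):
    """Decode a response body and parse it as JSON, ignoring a leading UTF-8 BOM."""
    # Strip the three BOM bytes (EF BB BF) if the server sent them.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return json.loads(raw.decode(charset))


# Simulated response body with a BOM in front of the JSON text.
body = codecs.BOM_UTF8 + b'{"ok": true}'
print(parse_json_bytes(body))
```

With urllib2 the raw bytes would come from data.read(). For UTF-8 specifically, decoding with the "utf-8-sig" codec removes a leading BOM in a single step.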
Whether a BOM is added to UTF-8 encoded files depends on the editor you use. Each editor has its own setting for this.
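For files that have already been saved with a BOM, an alternative to changing editor settings is to remove the BOM afterwards. The sketch below assumes the file is small enough to read into memory; strip_utf8_bom is a hypothetical helper, and the demonstration runs on a temporary file rather than on any real project file.

```python
import codecs
import io
import os
import tempfile


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM; return True if one was removed."""
    with io.open(path, "rb") as f:
        content = f.read()
    if not content.startswith(codecs.BOM_UTF8):
        return False
    with io.open(path, "wb") as f:
        f.write(content[len(codecs.BOM_UTF8):])
    return True


# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with io.open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b'{"a": 1}')
removed = strip_utf8_bom(demo_path)
with io.open(demo_path, "rb") as f:
    cleaned = f.read()
os.remove(demo_path)
print(removed, cleaned)
```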