Garbled text in a Python crawler?

Question

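When I use Python to simulate a login and scrape the returned page, the output comes back garbled. The target page is encoded in UTF-8. Relevant code: {code...} Console output: This is the first time I've run into this kind of garbled output, and I'm stumped.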

PHPz · Answer

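urllib2 does not decompress responses for you. If the server sends the body gzip-compressed (Content-Encoding: gzip), the raw bytes look like garbled text until you decompress them yourself with gzip, like this: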

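from StringIO import StringIO
import gzip

# response is the object returned by urllib2.urlopen()
text = response.read()  # raw body bytes, possibly gzip-compressed
data = text
if response.info().get('Content-Encoding') == 'gzip':
    # Wrap the bytes in a file-like object so GzipFile can read them
    buf = StringIO(text)
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
# data now holds the page's UTF-8 bytes; decode them to get text
html = data.decode('utf-8')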

In summary, urllib2 is fairly low-level. I recommend using requests instead, which decompresses gzip responses automatically.
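If you are on Python 3, where urllib2 and the StringIO module no longer exist, the same idea might look roughly like the sketch below. The helper name decode_body and the sample page are made up for illustration, and the sketch assumes the page really is UTF-8, as the question states.

```python
import gzip


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the body as text, gunzipping it first when the header says so.

    decode_body is a hypothetical helper, not part of any library.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Simulate a body from a server that honors "Accept-Encoding: gzip".
page = "<title>乱码测试</title>"
compressed = gzip.compress(page.encode("utf-8"))

# Prints True when the round trip restores the original text.
print(decode_body(compressed, "gzip") == page)
```

With urllib.request you would pass response.read() as raw and response.headers.get('Content-Encoding') as content_encoding.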
