Encoded HTML:
import requests

def getHtml(self, url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
        "Connection": "keep-alive",
    }
    r = requests.get(url, headers=headers)
    # Re-encode the decoded text using the encoding that requests detected
    html = r.text.encode(r.encoding)
    return html
Running
    bs = BeautifulSoup(html)
raises the following error:
encoding error : input conversion failed due to input error, bytes 0xAC 0xE5 0x8F 0xB8
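
One way to read this message is to look at the reported bytes themselves. The sketch below (Python 3, standard library only) assumes the page body is UTF-8, which the error message itself does not confirm. Under that assumption, 0xAC is a continuation byte left over from the preceding character, and the remaining three bytes form one complete character. The variable names are illustrative.

```python
# Bytes copied from the error message above.
reported = bytes([0xAC, 0xE5, 0x8F, 0xB8])

# In UTF-8, continuation bytes have the bit pattern 10xxxxxx.
first_is_continuation = (reported[0] & 0xC0) == 0x80

# Assumption: the data is UTF-8, so skip the stray continuation byte
# and decode the three bytes that follow it.
tail = reported[1:].decode("utf-8")

print(first_is_continuation, tail)
```

If the bytes decode cleanly this way, the data handed to the parser is probably valid UTF-8 and the parser was told (or guessed) a different encoding.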
Unicode HTML:
def getHtml(self, url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
        "Connection": "keep-alive",
    }
    r = requests.get(url, headers=headers)
    # Return the text as decoded by requests (a Unicode string)
    html = r.text
    return html
Running
    bs = BeautifulSoup(html)
raises the following error:
encoding error : input conversion failed due to input error, bytes 0xA1 0x6C 0x09 0x67
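
The bytes in this message do not look like UTF-8. A possible reading, which is an assumption and not something the error message states, is that the parser received the Unicode string in a 16-bit little-endian form. The sketch below (Python 3, standard library only) checks that guess by decoding the four reported bytes as UTF-16-LE. The variable names are illustrative.

```python
# Bytes copied from the error message above.
reported = bytes([0xA1, 0x6C, 0x09, 0x67])

# Assumption: the bytes are two UTF-16-LE code units (0x6CA1 and 0x6709).
text = reported.decode("utf-16-le")
code_points = [hex(ord(ch)) for ch in text]

print(text, code_points)
```

If that reading holds, the Unicode string is being reinterpreted under some other encoding inside the parser. One thing to try would be passing the raw bytes (`r.content`) to BeautifulSoup instead of `r.text`, so that the decoding happens in one place only.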