
How to set the page encoding with Python BeautifulSoup

WBOY, original post: 2016-06-10 15:16:24

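When you scrape pages with BeautifulSoup, you can run into all sorts of encoding errors.
You can fix them by telling BeautifulSoup which character encoding the page uses.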
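To see why guessing the wrong encoding causes trouble, here is a minimal standard-library sketch with no BeautifulSoup involved. It assumes a page encoded as GB2312 and uses the sample string 网易首页 only for illustration. The same bytes turn into replacement characters or mojibake under the wrong codec, and decode to the correct text under the right one.

```python
# Chinese text encoded as GB2312, the way a GB2312 page would send it over the wire.
raw = "网易首页".encode("gb2312")

# Decoding with the wrong codec either fails or silently produces garbled text.
as_utf8 = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
as_latin1 = raw.decode("latin-1")                # never fails, but yields mojibake

# Decoding with the page's real encoding recovers the original text.
as_gb2312 = raw.decode("gb2312")

print("utf-8:  ", as_utf8)
print("latin-1:", as_latin1)
print("gb2312: ", as_gb2312)
```

BeautifulSoup faces the same choice when it receives raw bytes. Specifying the encoding removes the guesswork.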

The code is as follows:

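# Python 3 with BeautifulSoup 4 (pip install beautifulsoup4)
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Fetch the page; the response is file-like and its body is raw bytes.
page = urlopen('http://www.163.com')

# Tell BeautifulSoup which encoding the bytes use instead of letting it guess.
soup = BeautifulSoup(page, 'html.parser', from_encoding='gb2312')

# The encoding BeautifulSoup actually used to decode the document.
print(soup.original_encoding)

print(soup.prettify())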

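The key step is passing the from_encoding argument to the BeautifulSoup constructor. This is what fixes the garbled text. (BeautifulSoup 3 spelled this argument fromEncoding and the attribute originalEncoding.) Which value to pass depends on the encoding of the page you fetch.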
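One way to find out which value to pass is to read the charset parameter of the server's Content-Type response header. Below is a minimal sketch under that assumption. The helper name charset_from_content_type is hypothetical and not part of BeautifulSoup or the standard library. It relies only on the standard library's email.message.Message.get_content_charset, and the utf-8 fallback is an assumed default, not something the server guarantees.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Hypothetical helper: return the lowercased charset of a Content-Type value.

    Falls back to `default` (an assumption) when the header has no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset lowercases the value and unquotes it if needed.
    return msg.get_content_charset(default)


print(charset_from_content_type("text/html; charset=GB2312"))  # gb2312
print(charset_from_content_type("text/html"))                  # utf-8
```

With a real response from urlopen, page.headers is an http.client.HTTPMessage, which is a subclass of email.message.Message. So page.headers.get_content_charset() gives the same result without the helper. Also keep in mind that GB2312 is a subset of GBK and GB18030. If a page declared as gb2312 still shows stray replacement characters, passing 'gbk' or 'gb18030' as from_encoding is often the fix.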

