
How to detect character encoding in Python

高洛峰
Original, 2017-03-01 13:21:16

This article shows two ways to determine the character encoding of a string in Python. The details are as follows:

Method 1:

In Python 2, isinstance(s, str) checks whether s is an ordinary (byte) string.
isinstance(s, unicode) checks whether s is a unicode string.

Alternatively, convert the value to unicode if it is not unicode already:


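# Python 2: decode a byte string to unicode, assuming it holds UTF-8 data.
# The variable is named s rather than str to avoid shadowing the built-in type.
if not isinstance(s, unicode):
    s = unicode(s, "utf-8")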
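
In Python 3 the unicode type no longer exists: str is always Unicode text and raw data is bytes. A minimal sketch of the same check there, assuming the input is UTF-8 unless another encoding is given (the helper name to_text is hypothetical and does not come from the original article):

```python
def to_text(s, encoding="utf-8"):
    """Return s as text (str), decoding it first if it is bytes."""
    if isinstance(s, bytes):
        # Raises UnicodeDecodeError if the data is not valid in `encoding`.
        return s.decode(encoding)
    return s


print(to_text(b"caf\xc3\xa9"))  # bytes are decoded to str
print(to_text("already text"))  # str is returned unchanged
```

Note that the type check only tells you whether decoding is needed. It says nothing about which encoding the bytes actually use, which is the problem Method 2 addresses.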


Method 2:

Character encoding detection with chardet

chardet makes it easy to detect the encoding of a string or a file. This matters especially for Chinese web pages, some of which use GBK/GB2312 while others use UTF-8. If you need to crawl such pages, you have to know their encoding. HTML pages can declare a charset, but the declaration is sometimes wrong, and in those cases chardet helps a lot.
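
To illustrate why a declared charset cannot be trusted blindly, here is a rough sketch (Python 3, standard library only) that tries the charset declared in the page first and then a list of fallback encodings. The function name decode_html, the regular expression, and the fallback list are all assumptions made for this illustration, not part of chardet or of the original article:

```python
import re

# Rough heuristic: find charset=... near the start of an HTML document.
_CHARSET_RE = re.compile(rb"charset=[\"']?([\w-]+)", re.IGNORECASE)


def decode_html(raw, fallbacks=("utf-8", "gbk")):
    """Decode HTML bytes, trying the declared charset before the fallbacks.

    Returns (text, encoding_used). Raises ValueError if nothing works.
    """
    candidates = []
    match = _CHARSET_RE.search(raw[:1024])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)

    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding name: try the next candidate.
            continue
    raise ValueError("could not decode with any candidate encoding")


# A page that claims UTF-8 but is actually GBK-encoded.
page = '<meta charset="utf-8"><p>中文</p>'.encode("gbk")
text, used = decode_html(page)
print(used)
```

A decode that succeeds is not proof that the encoding is right, since some encodings accept almost any byte sequence. That limitation is why a statistical detector such as chardet is useful.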

chardet example


>>> import urllib
>>> rawdata = urllib.urlopen('http://www.google.cn/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'confidence': 0.98999999999999999, 'encoding': 'GB2312'}
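
chardet's detect function can be used directly to detect the encoding of the given data. It returns a dictionary with two entries: the confidence of the detection and the detected encoding.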
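
The returned dictionary can then drive the actual decoding. The following is a minimal sketch (Python 3), not an official chardet recipe: the helper name decode_detected, the 0.5 confidence threshold, and the UTF-8 fallback are assumptions chosen for illustration. The detector is passed in as a parameter, so any callable that returns a dictionary with 'encoding' and 'confidence' keys will work:

```python
def decode_detected(raw, detect, min_confidence=0.5, fallback="utf-8"):
    """Decode raw bytes using the result of a chardet-style detect function.

    `detect` is any callable returning a dict with 'encoding' and
    'confidence' keys, such as chardet.detect.
    """
    result = detect(raw)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if not encoding or confidence < min_confidence:
        # Detection failed or is too uncertain: use the assumed fallback.
        encoding = fallback
    # errors="replace" avoids an exception if the guess is still wrong.
    return raw.decode(encoding, errors="replace")
```

With chardet installed, a call would look like decode_detected(rawdata, chardet.detect). Choosing the threshold is a trade-off: a low value trusts weak guesses, while a high value falls back to the default more often.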


chardet installation

After downloading chardet, unpack the archive and place the chardet folder directly in your application directory. You can then use import chardet right away.

Alternatively, run the setup.py installation script to install chardet into the Python system directory, so that every Python program on the machine only needs to import chardet:

python setup.py install

Reference

chardet official website: http://chardet.feedparser.org/
chardet download page: http://chardet.feedparser.org/download/

