
Python encoding type conversion

高洛峰
2017-03-01 13:32:00

This article uses examples to show how Python converts text between encodings. The examples use Python 2 syntax. The details are as follows:

1: Python and unicode

To handle multilingual text correctly, Python introduced the Unicode string type starting with version 2.0.

2: print in Python

Although Python converts text to Unicode internally for processing, terminal output is still done with traditional Python byte strings. A Unicode string has to be encoded into bytes before it can be displayed (print cannot send raw Unicode code points to the terminal).

When print writes a Unicode string to the console, it automatically encodes it to the console's encoding; byte strings in any other encoding are passed through unchanged. The write method of a file object does no such conversion. As a result, a string that prints correctly may not come out the same way when it is written to a file.
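Since a file object's write method does not pick an encoding for you, one common way to avoid surprises is to choose the codec explicitly when opening the file. The following is a minimal sketch, assuming UTF-8 is the desired file encoding; the scratch file name demo.txt is hypothetical and used only for illustration. It is written to run under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u'\u5b66\u4e60python'  # the Unicode string 学习python

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# io.open encodes on write, so the codec is stated explicitly instead of guessed.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading the file in binary mode shows the bytes that were actually stored.
with io.open(path, 'rb') as f:
    raw = f.read()

print(repr(raw))
```

The bytes on disk are the UTF-8 encoding of the text, regardless of what encoding the console happens to use.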

Under Linux, the target encoding is taken from the locale environment variables, which you can inspect with the locale command. The print statement hands the content to the operating system as a byte stream, and the terminal interprets those bytes according to the system's encoding.
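The encodings involved can also be inspected from inside Python. This is a small sketch using the standard sys and locale modules; the variable names are illustrative, and the values reported depend entirely on the machine's environment.

```python
import locale
import sys

# Encoding Python associates with standard output; may be None when
# output is redirected (Python 2) or when stdout has been replaced.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

# Encoding derived from the locale settings (LANG, LC_ALL, and so on).
preferred = locale.getpreferredencoding()

print('stdout encoding: %s' % stdout_encoding)
print('locale preferred encoding: %s' % preferred)
```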

>>> s = '学习python'
>>> s
'\xe5\xad\xa6\xe4\xb9\xa0python'  # byte string: UTF-8 bytes on a UTF-8 terminal, not ASCII
>>> print s
学习python
>>> s = u'学习python'
>>> s  # Unicode string
u'\u5b66\u4e60python'

3: decode

In Python, decode converts a byte string in another character set into Unicode. In practice, only non-ASCII text such as Chinese characters needs explicit conversion.

>>> s = '学习'  # byte string, UTF-8 on a UTF-8 terminal
>>> ustr = s.decode('utf-8')
>>> ustr
u'\u5b66\u4e60'

Once the Chinese characters have been decoded this way, Python can process the text further. Without the explicit conversion, Python falls back on a default encoding (ASCII in Python 2) or on whatever the machine's environment implies, so errors or garbled characters may appear.
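What goes wrong when the codec does not match the data can be shown directly. The sketch below assumes the input bytes are UTF-8 and uses illustrative variable names; it runs under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
raw = b'\xe5\xad\xa6\xe4\xb9\xa0'  # UTF-8 bytes for 学习

# Correct codec: six bytes become two Unicode characters.
text = raw.decode('utf-8')

# ASCII cannot represent these bytes, so decoding fails outright.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Latin-1 accepts any byte, so the wrong codec silently yields garbled text.
garbled = raw.decode('latin-1')

print(repr(text), ascii_failed, len(garbled))
```

A mismatched codec therefore either raises an error or produces the wrong characters without complaint, which is why naming the codec explicitly is the safer habit.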

4: encode

In Python, encode converts Unicode into another character set.

>>> s = '学习'  # byte string, UTF-8 on a UTF-8 terminal
>>> ustr = s.decode('utf-8')
>>> ustr
u'\u5b66\u4e60'
>>> ustr.encode('utf-8')  # back to UTF-8 bytes
'\xe5\xad\xa6\xe4\xb9\xa0'
>>> print ustr.encode('utf-8')
学习
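Combining decode and encode gives a way to convert bytes from one character set to another, with Unicode as the intermediate form. The following is a minimal sketch; the helper name transcode is hypothetical, and GBK is chosen only as an example target codec. It runs under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
def transcode(data, source, target):
    """Hypothetical helper: re-encode bytes from one codec to another via Unicode."""
    return data.decode(source).encode(target)


utf8_bytes = u'\u5b66\u4e60'.encode('utf-8')  # 学习 as UTF-8: 3 bytes per character
gbk_bytes = transcode(utf8_bytes, 'utf-8', 'gbk')  # same text as GBK: 2 bytes per character

print(len(utf8_bytes), len(gbk_bytes))
```

Converting back with transcode(gbk_bytes, 'gbk', 'utf-8') recovers the original UTF-8 bytes, because both codecs can represent these characters.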
