Detailed explanation of Python character encoding conversion methods
Python 2 has two string types, str and unicode. A str object holds a sequence of bytes in some encoding, while a unicode object holds the characters themselves; they are different types. This distinction is very important, and it is why encode and decode exist.
The meaning of encode and decode in Python can be expressed as:
             encode
unicode ------------------> str
unicode <------------------ str
             decode
Several common methods:
str_string.decode('codec') converts str_string to unicode_string; codec is the encoding of the source str_string.
unicode_string.encode('codec') converts unicode_string to str_string; codec is the encoding of the target str_string.
str_string.decode('from_codec').encode('to_codec') converts between str_strings of different encodings.
For example:
>>> t = '长城'  # typed in a GB2312-encoded terminal
>>> t
'\xb3\xa4\xb3\xc7'
>>> t.decode('gb2312').encode('utf-8')
'\xe9\x95\xbf\xe5\x9f\x8e'
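The conversions above can also be written as a small script. This is a minimal sketch rather than the article's own code: the helper name transcode is hypothetical, and explicit b'...' and u'...' literals are used so that the byte string and the Unicode string are easy to tell apart (the same lines also run on Python 3, where these types are called bytes and str).

```python
# -*- coding: utf-8 -*-

def transcode(data, from_codec, to_codec):
    """Re-encode a byte string (hypothetical helper, for illustration only)."""
    # decode: bytes in from_codec -> Unicode string
    text = data.decode(from_codec)
    # encode: Unicode string -> bytes in to_codec
    return text.encode(to_codec)

gb_bytes = b'\xb3\xa4\xb3\xc7'      # the GB2312 bytes from the session above
text = gb_bytes.decode('gb2312')    # byte string -> Unicode string
utf8_bytes = text.encode('utf-8')   # Unicode string -> byte string

print(repr(utf8_bytes))
print(transcode(gb_bytes, 'gb2312', 'utf-8') == utf8_bytes)  # True
```

Going through the Unicode string in the middle is what makes the conversion work: the two byte strings have nothing in common except the characters they represent.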
str_string.encode('codec') first uses the system's default codec to convert str_string to a unicode_string, and then encodes that with the codec argument to produce the final str_string. It is equivalent to str_string.decode('sys_codec').encode('codec').
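To make the implicit step visible, the sketch below spells out the equivalent decode/encode chain by hand. It assumes the default codec is 'ascii', which is what an unmodified Python 2 uses; the function name implicit_encode and its sys_codec parameter are hypothetical.

```python
def implicit_encode(data, codec, sys_codec='ascii'):
    """Mimic what Python 2 does for str_string.encode(codec) (illustrative only)."""
    # Step 1: the implicit decode with the default codec.
    text = data.decode(sys_codec)
    # Step 2: the encode that was actually requested.
    return text.encode(codec)

# Pure ASCII bytes survive the implicit decode.
ascii_result = implicit_encode(b'abc', 'utf-8')

# Non-ASCII bytes fail in step 1, before the requested encode ever runs.
try:
    implicit_encode(b'\xb3\xa4\xb3\xc7', 'utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

print(ascii_result == b'abc', failed)
```

This is why calling encode on a non-ASCII str in Python 2 can raise a UnicodeDecodeError even though no decode appears in the code.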
unicode_string.decode('codec') is basically meaningless. Python uses only one internal representation for unicode objects, UTF-16 or UTF-32 (determined when Python is compiled), so no encoding conversion is needed.
Note: The default codec can be set in the sitecustomize.py file under site-packages, for example:
import sys
sys.setdefaultencoding('utf-8')
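To check which default codec is actually in effect, sys.getdefaultencoding() can be used. This is a small sketch: an unmodified Python 2 reports 'ascii', while Python 3 reports 'utf-8' and no longer provides sys.setdefaultencoding at all.

```python
import sys

# The codec used for implicit str <-> unicode conversions.
default_codec = sys.getdefaultencoding()
print(default_codec)

# In Python 2, site.py removes sys.setdefaultencoding after startup, which is
# why the call belongs in sitecustomize.py. This reports whether the function
# is still reachable from ordinary code.
print(hasattr(sys, 'setdefaultencoding'))
```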