Python character encoding conversion secrets
Python 2 has two string types, the str object and the unicode object. Both can hold character data, but they are different types: a str is a sequence of bytes in some encoding, while a unicode object is a sequence of Unicode characters. This distinction is very important, and it is why encode and decode exist. The meaning of encode and decode in Python can be expressed as:
            decode
str -------------------------> unicode
            encode
unicode -------------------------> str
Several common usages:
str_string.decode('codec') converts str_string to a unicode_string; codec is the encoding of the source str_string.
unicode_string.encode('codec') converts unicode_string to a str_string; codec is the encoding of the target str_string.
str_string.decode('from_codec').encode('to_codec') converts between str_strings of different encodings.
For example:
>>> t = '长城'   # "Great Wall", entered in a GB2312-encoded terminal
>>> t
'\xb3\xa4\xb3\xc7'
>>> t.decode('gb2312').encode('utf-8')
'\xe9\x95\xbf\xe5\x9f\x8e'
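The same conversion can be wrapped in a small helper. The following is a minimal sketch, not part of the original article: the function name transcode is hypothetical, and the byte literals use the b'' prefix so that the snippet runs on Python 2.6+ (where b'' is a plain str) as well as on Python 3 (where the byte type is called bytes).

```python
def transcode(byte_string, from_codec, to_codec):
    """Re-encode a byte string from one codec to another (hypothetical helper)."""
    # Step 1: bytes -> unicode text, interpreting the bytes as from_codec.
    text = byte_string.decode(from_codec)
    # Step 2: unicode text -> bytes in the target codec.
    return text.encode(to_codec)


gb_bytes = b'\xb3\xa4\xb3\xc7'  # GB2312 bytes from the example above
utf8_bytes = transcode(gb_bytes, 'gb2312', 'utf-8')
print(repr(utf8_bytes))
```

Going through unicode as the intermediate form means the helper needs only one decode and one encode for any pair of codecs, rather than a dedicated converter per pair.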
str_string.encode('codec') first uses the system's default codec to decode str_string into a unicode_string, and then encodes that with the given codec to produce the final str_string. It is equivalent to str_string.decode('sys_codec').encode('codec'), so it raises a UnicodeDecodeError if str_string contains bytes that the default codec cannot decode.
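The implicit step described above can be spelled out explicitly. This is a sketch under one assumption: that the system default codec is ASCII, which is what an unmodified Python 2 installation reports from sys.getdefaultencoding(). The name implicit_encode and its sys_codec parameter are hypothetical, introduced only for illustration.

```python
def implicit_encode(byte_string, codec, sys_codec='ascii'):
    """Mimic str_string.encode(codec): decode with the default codec, then encode."""
    # sys_codec='ascii' is an assumption standing in for the system default codec.
    return byte_string.decode(sys_codec).encode(codec)


# Pure-ASCII bytes pass through the hidden decode step without trouble.
ascii_result = implicit_encode(b'hello', 'utf-8')

# Non-ASCII bytes (here GB2312) fail in the hidden decode step,
# even though the caller only asked for an encode.
try:
    implicit_encode(b'\xb3\xa4\xb3\xc7', 'utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

print(failed)
```

This is why calling encode on a str_string that holds non-ASCII bytes reports a decode error: the failure happens in the hidden first half of the conversion.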
unicode_string.decode('codec') is basically meaningless. A unicode object in Python has only one internal representation, UTF-16 or UTF-32 (fixed when Python is compiled), so there is nothing to convert.
Note: The default codec is set in the sitecustomize.py file under site-packages, for example:
import sys
sys.setdefaultencoding('utf-8')
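To see which default codec an interpreter is actually using, sys.getdefaultencoding() can be queried at runtime. The sketch below is an illustration rather than part of the original note. It also checks whether sys.setdefaultencoding is still reachable, on the assumption of a normal startup in which Python 2's site module removes that function once sitecustomize.py has run (Python 3 does not provide it at all).

```python
import sys

# The codec used for implicit str <-> unicode conversions.
default_codec = sys.getdefaultencoding()

# Assumed to be False in an ordinary session: the setter is only
# available while sitecustomize.py is being executed at startup.
can_set_default = hasattr(sys, 'setdefaultencoding')

print(default_codec, can_set_default)
```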