Python character encoding conversion secrets

高洛峰
2016-10-19 11:41:36

Python 2 has two string types, the str object and the unicode object. Both hold character data, but they are different types: a str holds encoded bytes, while a unicode object holds the characters themselves. This distinction is important, and it is the reason encode and decode exist.

The meaning of encode and decode in Python can be expressed as:

             encode
unicode -------------------------> str

unicode <------------------------- str
             decode

Several common usages:

str_string.decode('codec') converts str_string to a unicode_string; codec is the encoding of the source str_string.

unicode_string.encode('codec') converts unicode_string to a str_string; codec is the encoding of the target str_string.

str_string.decode('from_codec').encode('to_codec') converts between str_strings of different encodings.

For example, in a terminal whose encoding is GB2312:

>>> t = '长城'
>>> t
'\xb3\xa4\xb3\xc7'
>>> t.decode('gb2312').encode('utf-8')
'\xe9\x95\xbf\xe5\x9f\x8e'
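The decode-then-encode chain above can be wrapped in a small helper. The following is a minimal sketch rather than part of the original article: the function name transcode is hypothetical, and the byte string is written with an explicit b'' prefix so that the snippet should behave the same on Python 2.6+ and Python 3.3+.

```python
def transcode(data, from_codec, to_codec):
    """Re-encode a byte string from one codec to another (hypothetical helper)."""
    # Step 1: bytes -> unicode, interpreting the bytes as from_codec.
    text = data.decode(from_codec)
    # Step 2: unicode -> bytes, serialised as to_codec.
    return text.encode(to_codec)


# The two characters from the example, as GB2312 bytes.
gb_bytes = b'\xb3\xa4\xb3\xc7'
utf8_bytes = transcode(gb_bytes, 'gb2312', 'utf-8')

# Decoding each byte string with its own codec yields the same unicode text.
same_text = gb_bytes.decode('gb2312') == utf8_bytes.decode('utf-8')
```

Passing the codecs as arguments keeps the source and target encodings explicit at the call site, which is where most conversion mistakes tend to happen.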

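str_string.encode('codec') first decodes str_string to a unicode_string using the system's default codec, then encodes the result with the given codec to produce the final str_string. It is equivalent to str_string.decode('sys_codec').encode('codec'). Unless the default has been changed, sys_codec is 'ascii', so the call raises UnicodeDecodeError when str_string contains non-ASCII bytes.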
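To make the hidden first step visible, the two conversions can be written out by hand. This is an illustrative sketch only: encode_via_default is a hypothetical name, and its default_codec parameter stands in for the system default codec, assumed here to be 'ascii'.

```python
def encode_via_default(data, codec, default_codec='ascii'):
    """Spell out the two steps behind str_string.encode(codec).

    Hypothetical helper; default_codec stands in for the system default.
    """
    text = data.decode(default_codec)   # the implicit decoding step
    return text.encode(codec)           # the encoding step the caller asked for


# Pure ASCII bytes pass through the implicit decoding step unchanged.
ascii_result = encode_via_default(b'abc', 'utf-8')

try:
    encode_via_default(b'\xb3\xa4\xb3\xc7', 'utf-8')
    failed = False
except UnicodeDecodeError:
    # GB2312 bytes are not valid ASCII, so the implicit decoding step fails.
    failed = True
```

Written this way, it is clear that the error comes from the decoding step, not from the encode call the caller actually typed.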

unicode_string.decode('codec') is basically meaningless. Python stores unicode objects in a single internal representation, either UTF-16 or UTF-32 (fixed when Python is compiled), so no encoding conversion is needed.

Note: The default codec can be set in the sitecustomize.py file under site-packages, for example:

import sys
sys.setdefaultencoding('utf-8')
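To check which default codec is actually in effect, the interpreter can be queried at runtime. The sketch below is an illustration rather than part of the original article; the variable names are arbitrary. On Python 2, site.py removes sys.setdefaultencoding during startup, which is why the call has to live in sitecustomize.py.

```python
import sys

# Codec used for implicit str <-> unicode conversions in this interpreter.
default_codec = sys.getdefaultencoding()

# False in a normally started interpreter: Python 2's site.py deletes the
# function after sitecustomize.py has run, and Python 3 does not provide it.
can_change_default = hasattr(sys, 'setdefaultencoding')
```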

