1. Version comparison
Python versions currently fall into two main lines:
Python 2.x, referred to as Python 2: currently the most widely used line, for example Python 2.7.3.
Python 3.x, referred to as Python 3: the newer line, for example Python 3.1. In the long run, it can be regarded as the future direction of the language.
[Differences between Python 2 and Python 3]
1. From Python 2 to Python 3, many basic function interfaces changed, and some libraries and functions were removed or renamed
Many of the most basic and frequently used functions have a different interface in Python 3. The most typical example is print, which is a statement in Python 2 and an ordinary function in Python 3.
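As a small illustration of the print change, the sketch below runs under Python 3. The variable names (`buffer`, `printed`, `py2_form_compiles`) are made up for this example; it only shows that print now accepts keyword arguments and that the Python 2 statement form is rejected by the Python 3 compiler.

```python
import io

# Python 3: print is an ordinary function, so it accepts keyword arguments
# such as sep, end and file.
buffer = io.StringIO()
print("Python", 3, sep="-", end="!\n", file=buffer)
printed = buffer.getvalue()

# The Python 2 statement form (no parentheses) is a syntax error in Python 3.
try:
    compile('print "hello"', "<py2-style>", "exec")
    py2_form_compiles = True
except SyntaxError:
    py2_form_compiles = False

print(printed, end="")
print("Python 2 form compiles:", py2_form_compiles)
```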
2. In terms of third-party library support, Python 2 is currently the best supported, while Python 3 support is still incomplete
One reason Python is so powerful is its large number of capable third-party libraries.
Currently, many third-party libraries are available only for Python 2.
Even where a Python 3 version is provided, it may not be very mature.
2. Encoding comparison
In both Python 2 and Python 3, character data generally falls into two categories:
Universal Unicode characters;
Characters encoded from Unicode into a specific encoding, such as UTF-8 or GBK.
String types in Python 2:
str: an encoded byte sequence
unicode: Unicode text, before encoding
String types in Python 3:
str: Unicode text, before encoding
bytes: an encoded byte sequence
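The two Python 3 types can be inspected directly. This is a minimal sketch for Python 3; the names `text` and `data` are illustrative only.

```python
# str holds Unicode text; its length counts characters.
text = "中文"

# bytes holds an encoded byte sequence; its length counts bytes.
data = text.encode("utf-8")

print(type(text).__name__, len(text))   # str, counted in characters
print(type(data).__name__, len(data))   # bytes, counted in bytes

# Indexing differs too: str yields a one-character str, bytes yields an int.
first_char = "abc"[0]
first_byte = b"abc"[0]
```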
We can think of a string as having two states: a text state and a byte (binary) state. The two string types in Python 2 and in Python 3 correspond to these two states, and encoding and decoding convert between them. Encoding converts text into a byte sequence, which determines how the characters are represented as bytes; decoding converts a byte sequence back into text, interpreting the bytes as characters.
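The round trip between the two states can be sketched as follows in Python 3. The sample character and variable names are chosen for illustration; the point is that the same text yields different bytes under UTF-8 and GBK, and decoding with the matching codec recovers the text.

```python
text = "中"

# Encoding: text state -> byte state. The result depends on the codec.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# Decoding: byte state -> text state, using the codec that produced the bytes.
from_utf8 = utf8_bytes.decode("utf-8")
from_gbk = gbk_bytes.decode("gbk")

print(utf8_bytes, gbk_bytes)
print(from_utf8 == from_gbk == text)
```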
In Python 2, both str and unicode have encode and decode methods. However, calling encode on a str or decode on a unicode object is not recommended; offering both methods on both types is a design flaw in Python 2. Python 3 cleans this up: str has only encode, which converts text into bytes, and bytes has only decode, which converts bytes back into text.
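The Python 3 side of this can be checked with a few attribute lookups. This is a small sketch for Python 3 only; the tuple names are illustrative.

```python
# (has encode, has decode) for each type in Python 3.
str_methods = (hasattr(str, "encode"), hasattr(str, "decode"))
bytes_methods = (hasattr(bytes, "encode"), hasattr(bytes, "decode"))

print("str:  ", str_methods)
print("bytes:", bytes_methods)

# Calling the missing method fails immediately instead of guessing a codec.
try:
    "abc".decode("utf-8")
    decode_on_str_works = True
except AttributeError:
    decode_on_str_works = False
```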
Python 2's str and unicode are both subclasses of basestring, so the two can be concatenated directly (Python 2 implicitly decodes the str as ASCII, which fails on non-ASCII bytes). In Python 3, bytes and str are independent types and cannot be concatenated.
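In Python 3 the mismatch surfaces as a TypeError, and the fix is to convert one side explicitly. A minimal sketch, with illustrative variable names:

```python
text = "abc"
data = b"def"

# Mixing str and bytes is rejected rather than silently converted.
try:
    text + data
    mixed_concat_raises = False
except TypeError:
    mixed_concat_raises = True

# Convert explicitly, in whichever direction the program needs.
joined_text = text + data.decode("ascii")
joined_bytes = text.encode("ascii") + data

print(mixed_concat_raises, joined_text, joined_bytes)
```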
In Python 2, an ordinary quoted literal is a str, and its bytes follow the encoding in which the Python source file itself is saved. On Windows with a Chinese locale, that default is typically GBK. In Python 3, a literal enclosed in single or double quotes is already a Unicode str.
For a str literal to end up in the intended encoding, a few prerequisites apply:
The encoding is declared at the beginning of the Python file
The Python file is actually saved in that encoding
The two must match (for example, both UTF-8 or both GBK)
Only then can the Python parser correctly read the text into a str in that encoding.
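The prerequisites above can be sketched with the standard library's tokenize.detect_encoding, which reads the declaration at the top of a source file. The two-line "file" below is hypothetical and held in memory rather than on disk; it is a sketch of the declared-versus-saved check, not of how the interpreter is implemented.

```python
import io
import tokenize

# Hypothetical source file: declares GBK and is "saved" as GBK.
source_text = "# -*- coding: gbk -*-\ns = '中文'\n"
source_bytes = source_text.encode("gbk")

# Read the declared encoding from the first lines of the file.
declared, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# Declaration and saved encoding agree, so decoding recovers the text.
decoded = source_bytes.decode(declared)

# If the same bytes were treated as UTF-8, decoding would fail.
try:
    source_bytes.decode("utf-8")
    mismatch_fails = False
except UnicodeDecodeError:
    mismatch_fails = True

print(declared, decoded == source_text, mismatch_fails)
```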
Overall, Python 3 greatly simplifies character encoding, and it is no longer as troublesome as in Python 2. In Python 3, text is always Unicode, represented by the str type, and binary data is represented by bytes. The two are never implicitly mixed, which makes the distinction between them much clearer.
Summary
That is all for this article. I hope it is of some help to everyone learning or using Python. If you have any questions, feel free to leave a message.