
Bytes: A Must-Learn Topic in Python

高洛峰 (Original)
2017-03-13 18:04:36

Bytes are a must-learn topic in Python. This article explains how bytes work in Python; interested readers may find it a useful reference.

In Python, a bytes literal is written in the form b'xxx'. Each x can be a printable ASCII character or a hexadecimal escape \xnn, where nn ranges from 00 to ff, giving 256 possible byte values in total.
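
The literal syntax described above can be checked with a minimal sketch. The variable names are chosen for this illustration only and do not come from the article:

```python
# A byte can be written as a printable ASCII character or as a \xnn hex escape.
same = b"\x61\x62\x63" == b"abc"   # 0x61, 0x62, 0x63 are 'a', 'b', 'c'

# Every value from 0x00 to 0xff is a valid byte: 256 possibilities in total.
all_bytes = bytes(range(256))

print(same)              # True
print(len(all_bytes))    # 256
print(all_bytes[:4])     # b'\x00\x01\x02\x03'
```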

1. Basic operations

The basic operations on bytes are shown below. They closely resemble the operations on strings:


In[40]: b = b"abcd\x64"
In[41]: b
Out[41]: b'abcdd'
In[42]: type(b)
Out[42]: bytes
In[43]: len(b)
Out[43]: 5
In[44]: b[4]
Out[44]: 100 # 100 is \x64 in hexadecimal
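
One difference from strings is worth a short sketch: indexing a bytes object returns an integer, while slicing returns another bytes object. The names `first`, `head`, and `as_hex` are illustrative only:

```python
b = b"abcd\x64"

first = b[0]        # indexing gives an int (the byte value)
head = b[0:1]       # slicing gives a bytes object of length 1
as_hex = b.hex()    # hexadecimal view of every byte

print(first)    # 97
print(head)     # b'a'
print(as_hex)   # 6162636464
```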

A bytes object is immutable, so a byte inside it cannot be modified in place. To change a byte, convert the object to a bytearray and modify that instead:


In[46]: barr = bytearray(b)
In[47]: type(barr)
Out[47]: bytearray
In[48]: barr[0] = 110
In[49]: barr
Out[49]: bytearray(b'nbcdd')
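
As a small sketch of why the conversion is needed, the following shows that assigning to an element of a bytes object raises TypeError, and that a modified bytearray can be turned back into bytes. The names `error` and `result` are hypothetical:

```python
b = b"abcdd"

error = None
try:
    b[0] = 110          # bytes is immutable, so this raises TypeError
except TypeError as exc:
    error = exc

barr = bytearray(b)     # mutable copy of the same bytes
barr[0] = 110           # 110 is the byte value of 'n'
result = bytes(barr)    # back to an immutable bytes object

print(type(error).__name__)   # TypeError
print(result)                 # b'nbcdd'
print(b)                      # b'abcdd' (the original is unchanged)
```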

2. The relationship between bytes and characters

As mentioned above, bytes and strings are very similar, and each can be converted into the other by means of an encoding. A string is converted to bytes with its encode() method, and bytes are converted back to a string with the decode() method; both take the name of the encoding as an argument:


In[50]: s = "人生苦短,我用Python"
In[51]: b = s.encode('utf-8')
In[52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In[53]: c = s.encode('gb18030')
In[54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In[55]: b.decode('utf-8')
Out[55]: '人生苦短,我用Python'
In[56]: c.decode('gb18030')
Out[56]: '人生苦短,我用Python'
In[57]: c.decode('utf-8')
Traceback (most recent call last):
 exec(code_obj, self.user_global_ns, self.user_ns)
 File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
 c.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In[58]: b.decode('gb18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'

As we can see, the same characters produce completely different bytes under different encodings. If the encoding used to decode differs from the one used to encode, the result is garbled text, or the conversion fails outright. Each encoding has its own rules for which byte sequences are valid: in the example above, UTF-8 treats \xc8 as the first byte of a two-byte sequence, but the byte that follows, \xcb, is not a valid continuation byte, so decoding fails.
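
When a decoding failure should not abort the program, decode() accepts an errors argument. The sketch below is an illustration rather than part of the original session; it contrasts the default strict behavior with errors="replace", which substitutes U+FFFD for undecodable bytes. The names `gb`, `failed`, and `lossy` are hypothetical:

```python
s = "人生苦短"
gb = s.encode("gb18030")

# Strict decoding (the default) raises on bytes that are not valid UTF-8.
try:
    gb.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# errors="replace" inserts U+FFFD for undecodable bytes instead of raising.
lossy = gb.decode("utf-8", errors="replace")

print(failed)                       # True
print("\ufffd" in lossy)            # True
print(gb.decode("gb18030") == s)    # True
```

The replaced text is lossy, so this is suited to logging or display, not to recovering the original string.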

3. Application

As a simple example, suppose we want to fetch the content of a web page, here the page Baidu returns for a search on Python. Baidu uses UTF-8 encoding. Without decoding, the response is one very long byte string; once it is decoded correctly, it reads as a normal HTML page.


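import urllib.request

url = "http://www.baidu.com/s?ie=utf-8&wd=python"
encoding = "utf-8"

# The with block closes the connection even if read() raises
with urllib.request.urlopen(url) as page:
    mybytes = page.read()  # raw response body, as bytes

print(mybytes.decode(encoding))  # decode the bytes into a str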
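
Hard-coding the encoding works when it is known in advance. As a hedged alternative, the charset can be read from the Content-Type header, which the standard library exposes through get_content_charset() on the response's headers object. The helper `decode_body` and its `fallback` parameter are hypothetical names for this sketch, and a hand-built Message stands in for the real headers so that it runs without network access:

```python
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in headers, else fallback."""
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")


# Stand-in for page.headers, so the sketch runs offline.
headers = Message()
headers["Content-Type"] = "text/html; charset=gb18030"

raw = "人生苦短".encode("gb18030")
text = decode_body(raw, headers)
print(text)
```

With a real response, the call would look like decode_body(page.read(), page.headers). A server may declare no charset, or a wrong one, which is why the sketch falls back to a default and decodes with errors="replace".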

That is all for this article. I hope it is helpful to everyone learning Python programming.
