Bytes that you must learn in Python
Learn a little about bytes every day. This article explains how to understand and use bytes in Python; interested readers can use it as a reference.
In Python, a bytes object is written as a literal of the form b'xxx'. Each x can be a printable ASCII character or a hexadecimal escape \xnn, where nn ranges from 00 to ff, giving 256 possible byte values in total.
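As a small illustration of the literal forms described above (the variable names are only for this sketch), a printable character and its \xnn escape denote the same byte, and values outside 0 to 255 are rejected:

```python
# A printable ASCII character and its hex escape denote the same byte.
letter = b"a"
escaped = b"\x61"
same = letter == escaped

# Every integer from 0 to 255 is a valid byte value.
all_values = bytes(range(256))

# 256 does not fit in one byte, so bytes() raises ValueError.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(same, len(all_values), out_of_range_ok)
```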
1. Basic operations
The basic operations of bytes are listed below. They are very similar to those of a string, except that indexing returns an integer:
In[40]: b = b"abcd\x64"
In[41]: b
Out[41]: b'abcdd'
In[42]: type(b)
Out[42]: bytes
In[43]: len(b)
Out[43]: 5
In[44]: b[4]
Out[44]: 100  # 100 is \x64 in hexadecimal
A bytes object is immutable, so a byte in it cannot be modified directly. You need to convert it into a bytearray and then modify that:
In[46]: barr = bytearray(b)
In[47]: type(barr)
Out[47]: bytearray
In[48]: barr[0] = 110
In[49]: barr
Out[49]: bytearray(b'nbcdd')
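The following sketch shows what "cannot modify it directly" looks like in practice, and how to get back to bytes after editing. The names data, buf and result are only illustrative:

```python
data = b"abcdd"

# bytes is immutable: item assignment raises TypeError.
try:
    data[0] = 110
    assigned = True
except TypeError:
    assigned = False

# bytearray is the mutable counterpart: copy, edit, then convert back.
buf = bytearray(data)
buf[0] = ord("n")   # ord("n") is 110, the same value used in the session above
buf.extend(b"!")    # a bytearray can also grow in place
result = bytes(buf)

print(assigned, data, result)
```

The original bytes object is left unchanged, because bytearray(data) makes a copy.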
2. The relationship between bytes and characters
As mentioned above, bytes and strings are very similar, and in fact each can be converted into the other through an encoding. A string is converted into bytes by passing the encoding name to its encode() method, and bytes are converted back into a string with the decode() method:
In[50]: s = "人生苦短,我用Python"
In[51]: b = s.encode('utf-8')
In[52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In[53]: c = s.encode('gb18030')
In[54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In[55]: b.decode('utf-8')
Out[55]: '人生苦短,我用Python'
In[56]: c.decode('gb18030')
Out[56]: '人生苦短,我用Python'
In[57]: c.decode('utf-8')
Traceback (most recent call last):
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
    c.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In[58]: b.decode('gb18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'
We can see that the same text produces completely different bytes under different encodings, and the same bytes decode to different characters. If encoding and decoding use different encodings, the result is garbled text (as in Out[58]) or the conversion fails outright (as in In[57]). The failure occurs because each encoding has its own rules for which byte sequences are valid: in UTF-8, \xc8 starts a two-byte sequence and must be followed by a continuation byte in the range \x80 to \xbf, but the next byte in the example is \xcb, so the decoder rejects it.
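When the encoding of some bytes is uncertain, one option is to catch the error or to pass the errors argument of decode(). The sketch below is only an illustration of that idea, reusing the example string; the names gb_bytes, lossy and round_trip are not from the original session:

```python
text = "人生苦短,我用Python"
gb_bytes = text.encode("gb18030")

# Strict decoding (the default) raises on the first invalid sequence.
try:
    gb_bytes.decode("utf-8")
    failed = False
    bad_position = None
except UnicodeDecodeError as exc:
    failed = True
    bad_position = exc.start  # index of the first undecodable byte

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lossy = gb_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding that produced the bytes recovers the text.
round_trip = gb_bytes.decode("gb18030")

print(failed, bad_position, lossy.endswith("Python"), round_trip == text)
```

Note that errors="replace" hides the problem rather than fixing it: the Chinese characters are lost, and only the ASCII part survives because both encodings represent ASCII characters with the same bytes.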
3. Application
As a simple example, suppose I want to crawl the content of a web page, here the page returned when searching for Python on Baidu. Baidu uses UTF-8 encoding. If the returned result is not decoded, it is a very long byte string; after correct decoding, a normal HTML page can be displayed.
import urllib.request

url = "http://www.baidu.com/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)
mybytes = page.read()
encoding = "utf-8"
print(mybytes.decode(encoding))
page.close()
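The example above hard-codes utf-8. As a hedged sketch of one alternative, the charset can be read from a Content-Type header value when the server supplies one. The helpers charset_from_content_type and decode_body are hypothetical names introduced here, and the sample header strings are made up for illustration; no network access is needed to run it:

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode response bytes using the charset declared in the header."""
    encoding = charset_from_content_type(content_type)
    # errors="replace" keeps a mislabelled page from raising; this is a
    # design choice for the sketch, not a requirement.
    return body.decode(encoding, errors="replace")


# Sample (made-up) header values and a body encoded to match.
sample_body = "人生苦短".encode("gb18030")
print(charset_from_content_type("text/html; charset=GBK"))
print(charset_from_content_type("text/html"))
print(decode_body(sample_body, "text/html; charset=gb18030"))
```

With a live response, the object returned by urllib.request.urlopen exposes the same lookup through page.headers.get_content_charset(), so the header value does not have to be parsed by hand.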
That is all for this article. I hope it will be helpful to everyone learning Python programming.