The same code fails when run from the Notepad environment (tested with Notepad), but runs correctly in PyCharm (Python 3.5).
Code:
import urllib
import urllib.request
url = "http://www.baidu.com"
data = urllib.request.urlopen(url).read()
data = data.decode('UTF-8')
This statement runs without error in both environments
data.decode('gbk', 'ignore').encode('UTF-8')
print(data)
PyCharm prints the crawled web page, but the cmd window raises:
UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 26830: illegal multibyte sequence
So the invalid characters have to be removed:
import urllib
import urllib.request
url = "http://www.baidu.com"
data = urllib.request.urlopen(url).read()
data.decode('gbk', 'ignore').encode('UTF-8')
print(data)
This version works. Could someone please explain why?
淡淡烟草味 2017-05-18 10:52:11
You may have run into the same Python encoding problem I did, or the terminal you are using may not support the encoding. Take a look at the question below.
[Python encoding problem?] Shared from @SegmentFault, link: /q/10...
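To make the terminal-encoding explanation concrete, here is a minimal sketch. It assumes the cmd window uses GBK as its output encoding, which is what the traceback suggests. The helper names can_encode and printable_for are hypothetical, made up for this illustration. The sketch shows that '\xbb' cannot be encoded as GBK, that printing a bytes object only ever emits ASCII (which is probably why the second version does not fail: the result of decode(...).encode(...) is discarded and data is still bytes), and one possible way to print the decoded text without the error.

```python
import sys

# '\xbb' is the character named in the traceback.
text = "a \xbb b"


def can_encode(s, encoding):
    """Return True if every character of s can be encoded with `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def printable_for(s, encoding):
    """Replace characters that `encoding` cannot represent with '?'."""
    return s.encode(encoding, errors="replace").decode(encoding)


# print(str) must encode the text with the console's encoding.
# Assumption: on the asker's cmd window that encoding is GBK.
gbk_ok = can_encode(text, "gbk")      # False: GBK has no '\xbb'
utf8_ok = can_encode(text, "utf-8")   # True: UTF-8 covers all of Unicode

# print(bytes) writes the repr of the bytes object, which is pure ASCII,
# so it cannot trigger UnicodeEncodeError on any console.
bytes_repr = repr(text.encode("utf-8"))

# Possible workaround: drop or replace what the console cannot show.
console_encoding = sys.stdout.encoding or "utf-8"
print(printable_for(text, console_encoding))
```

If this matches your situation, the second script "works" only because it prints the raw bytes rather than the page text. Setting the PYTHONIOENCODING environment variable before starting Python is another option worth trying.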