I am crawling content from Baidu Encyclopedia with the following code:
response = urllib2.urlopen(url)
if response.getcode() != 200:
    return None
html = response.read()
return html.decode("UTF-8")
I then write the result to another file. While writing, this error appears: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 15.
I am writing with UTF-8 encoding, so what does this have to do with gbk?
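
One likely explanation, assuming this is Python 3 on Windows with a Chinese locale: when open() is called without an encoding argument, it uses the locale's preferred encoding (gbk here) rather than UTF-8, and gbk has no mapping for U+00A0 (the non-breaking space). The sketch below reproduces the encoding failure and then writes the same text with an explicit encoding. The sample string, the file name, and the helpers save_html and load_html are hypothetical and not part of the original code.

```python
import os
import tempfile

# Sample text containing U+00A0 (non-breaking space), common in scraped HTML.
html = "Baidu\xa0Baike"

# Reproduce the failure: gbk cannot represent U+00A0.
try:
    html.encode("gbk")
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True


def save_html(path, text):
    # Hypothetical helper: passing encoding explicitly avoids the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_html(path):
    # Hypothetical helper: read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Assumed output location, for illustration only.
out_path = os.path.join(tempfile.gettempdir(), "baike_output.html")
save_html(out_path, html)
```

If the file is opened elsewhere in the program without encoding="utf-8", that call is probably where the gbk error comes from, even though the downloaded bytes were decoded as UTF-8.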