I'm testing a very simple crawler that saves the text content of a very plain web page to my local machine. It ends with this error:
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-35-ead5570b2e15> in <module>()
      7     filename=str(i)+'.txt'
      8     with open(filename,'w')as f:
----> 9         f.write(content)
     10         print('Chapter {} of the novel has finished downloading'.format(i))
     11         f.close()

UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7: illegal multibyte sequence
The code is as follows:
In [1]: import requests
In [2]: from bs4 import BeautifulSoup
In [3]: resp=requests.get('http://www.qu.la/book/168/')
In [4]: html=resp.text
In [5]: soup=BeautifulSoup(html,'html.parser')
In [6]: chapter_list=soup.find(id="list")
In [9]: link_list=chapter_list.find_all('a')
In [14]: mylist=[]
    ...: for link in link_list:
    ...:     mylist.append('http://www.qu.la'+link.get('href'))
    ...:
# Iterate over every link and download its text content to a local text file
In [35]: i=0
    ...: for url in mylist:
    ...:     re1=requests.get(url)
    ...:     html2=re1.text
    ...:     soup=BeautifulSoup(html2,"html.parser")
    ...:     content=soup.find(id="content").text.replace('chaptererror();', '')
    ...:     filename=str(i)+'.txt'
    ...:     with open(filename,'w')as f:
    ...:         f.write(content)
    ...:         print('Chapter {} of the novel has finished downloading'.format(i))
    ...:         f.close()
    ...:     i=i+1
给我你的怀抱 2017-06-12 09:28:21
with open(filename, 'wb') as f:  # binary mode: in Python 3, encode() returns bytes
    f.write(content.encode('utf-8'))
or
import codecs
with codecs.open(filename, 'w', 'utf-8') as f:
    f.write(content)
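
For what it's worth, here is a minimal sketch of why the write fails and of a third option, assuming Python 3 (which the traceback suggests) and a system whose default file encoding is GBK. The sample string and the temp-file path are made up for illustration; only `'\xa0'` (a non-breaking space, typically from `&nbsp;` in the HTML) comes from the error message above.

```python
import os
import tempfile

# Hypothetical sample text; '\xa0' is the character named in the traceback.
content = 'Chapter\xa01: sample text'

# GBK has no mapping for '\xa0', so encoding with it fails the same way
# open(filename, 'w') does when the platform default encoding is GBK.
try:
    content.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# Option A: state the encoding explicitly when opening the file.
path = os.path.join(tempfile.mkdtemp(), '0.txt')  # illustrative path only
with open(path, 'w', encoding='utf-8') as f:
    f.write(content)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()

# Option B: if the file must stay GBK, swap the non-breaking space for a
# plain space before writing.
cleaned = content.replace('\xa0', ' ')
cleaned_gbk = cleaned.encode('gbk')
```

Passing `encoding='utf-8'` to the built-in `open` probably does the same job as `codecs.open` here without the extra import. Whichever way the file is written, it has to be read back with the same encoding.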