巴扎黑2017-04-18 09:36:28
First of all, there is absolutely no need to use the csv module for this requirement. csv separates columns with half-width commas by default, but if a single column's content itself contains a comma, Excel gets confused reading it back. I suggest using TAB as the separator (delimiter) and writing the file directly with with open(...) as fh. Also, you only need to call get_data once; there is no need to call it twice.
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
URL = 'http://finance.qq.com'

def get_data(url):
    response = requests.get(url, headers={'User-Agent': user_agent})
    soup = BeautifulSoup(response.text, 'lxml')
    return soup.find('p', {'id': 'listZone'}).findAll('a')

def main():
    with open("hello.tsv", "w") as fh:
        fh.write("url\ttitle\n")
        for item in get_data(URL + "/gdyw.htm"):
            fh.write("{}\t{}\n".format(URL + item.get("href"), item.get_text()))

if __name__ == "__main__":
    main()
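If you later need to parse a TAB-separated file like the one main() writes, the csv module still works by setting delimiter='\t'. A minimal, self-contained sketch with a hypothetical row (the file name and headline are made up for illustration):

```python
import csv

# Write a small TSV shaped like the one main() produces (hypothetical row)
with open('hello.tsv', 'w') as fh:
    fh.write('url\ttitle\n')
    fh.write('http://finance.qq.com/a.htm\tSome headline\n')

# TAB-separated files parse fine with csv by overriding the delimiter
with open('hello.tsv') as fh:
    rows = list(csv.reader(fh, delimiter='\t'))

print(rows[1])  # ['http://finance.qq.com/a.htm', 'Some headline']
```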
ringa_lee2017-04-18 09:36:28
You got this result because you wrote all of csvrow1 first and then all of csvrow2. You should traverse csvrow1 and csvrow2 at the same time, like this:

for i in zip(csvrow1, csvrow2):
    csvfile.write(i[0] + ',' + i[1] + '\n')
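A runnable sketch of that zip-based write, with hypothetical csvrow1/csvrow2 lists standing in for the scraped columns:

```python
# Hypothetical column lists standing in for csvrow1 / csvrow2
csvrow1 = ['http://example.com/a', 'http://example.com/b']
csvrow2 = ['Title A', 'Title B']

with open('paired.csv', 'w') as csvfile:
    # zip pairs the i-th url with the i-th title, producing one row per pair
    for i in zip(csvrow1, csvrow2):
        csvfile.write(i[0] + ',' + i[1] + '\n')
```

Note that this raw-write approach does no quoting, so it inherits the embedded-comma problem the first answer describes.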
伊谢尔伦2017-04-18 09:36:28
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
import csv

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'

def get_data(url):
    response = requests.get(url, headers={'User-Agent': user_agent})
    soup = BeautifulSoup(response.text, 'lxml')
    return soup.find('p', {'id': 'listZone'}).findAll('a')

urls = []
titles = []
# One fetch is enough: collect both columns from the same pass
for link in get_data('http://finance.qq.com/gdyw.htm'):
    urls.append('http://finance.qq.com/' + link.get('href'))
    titles.append(link.get_text())

data = []
for url, title in zip(urls, titles):
    row = {
        'url': url,
        'title': title
    }
    data.append(row)

with open('a.csv', 'w', newline='') as csvfile:
    fieldnames = ['url', 'title']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
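One note on this DictWriter pattern: in Python 3 the csv docs recommend opening the file with newline='' (otherwise blank rows can appear on Windows), and the writer quotes comma-containing fields automatically, which sidesteps the Excel problem from the first answer. A minimal sketch with hypothetical data:

```python
import csv

# Hypothetical rows mirroring the url/title structure above
data = [{'url': 'http://example.com/a', 'title': 'Hello, world'}]

with open('demo.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['url', 'title'])
    writer.writeheader()
    writer.writerows(data)
# The comma in "Hello, world" gets quoted, so Excel keeps it in one column
```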