After my crawler built the dict, I tried to write it into a CSV file, but an error occurred.
Environment: Jupyter Notebook on Windows.
The code is as follows:
import requests
from multiprocessing.dummy import Pool as ThreadPool
from lxml import etree
import sys
import time
import random
import csv
def spider(url):
    header = {
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }
    timeout = random.choice(range(31, 50))
    html = requests.get(url, header, timeout=timeout)
    time.sleep(random.choice(range(8, 16)))
    selector = etree.HTML(html.text)
    content_field = selector.xpath('//*[@class="inner"]/p[3]/p[2]/ul/li')
    item = {}
    for each in content_field:
        g = each.xpath('a/p[1]/p[1]/h3/span/text()')
        go = each.xpath('a/p[1]/p[2]/p/h3/text()')
        h = each.xpath('a/p[1]/p[2]/p/p/text()[1]')
        j = each.xpath('a/p[1]/p[1]/p/text()[2]')
        ge = each.xpath('a/p[1]/p[2]/p/p/text()[3]')
        x = each.xpath('a/p[1]/p[1]/p/text()[3]')
        city = each.xpath('a/p[1]/p[1]/p/text()[1]')
        gg = each.xpath('a/p[2]/span/text()')
        item['city'] = "".join(city)
        item['hangye'] = "".join(hangye)
        item['guimo'] = "".join(guimo)
        item['gongsi'] = "".join(gongsi)
        item['gongzi'] = "".join(gongzi)
        item['jingyan'] = "".join(jingyan)
        item['xueli'] = "".join(xueli)
        item['gongzuoneirong'] = "".join(gongzuoneirong)
        fieldnames = ['city', 'hangye', 'guimo', 'gongsi', 'gongzi', 'jingyan', 'xueli', 'gongzuoneirong']
        with open('bj.csv', 'a', newline='', errors='ignore') as f:
            f_csv = csv.DictWriter(f, fieldnames=fieldnames)
            f_csv.writeheader()
            f_csv.writerow(item)

if __name__ == '__main__':
    pool = ThreadPool(4)
    f = open('bj.csv', 'w')
    page = []
    for i in range(1, 100):
        newpage = 'https://www.zhipin.com/c101010100/h_101010100/?query=%E6%95%B0%E6%8D%AE%E8%BF%90%E8%90%A5&page=' + str(i) + '&ka=page-' + str(i)
        page.append(newpage)
    results = pool.map(spider, page)
    pool.close()
    pool.join()
    f.close()
Running the code above gives this error:
ValueError: too many values to unpack (expected 2)
From searching, the error seems related to iterating over a dict, which requires the dict.items() form, but I haven't worked out how to apply that to the code above. Please give me some advice.
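For context, this exact message appears whenever Python tries to unpack an iterable of more than two items into a (key, value) pair. A minimal sketch, independent of requests or csv, using a one-element set like the header above:

```python
# Unpacking each element of a set as a (key, value) pair fails when the
# element is a long string: every character counts as a separate value.
header = {'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}  # a set, not a dict

try:
    for key, value in header:
        pass
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 2)
```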
習慣沉默2017-05-18 10:51:20
Sorry, I only have time to answer your question now. I see you already changed the code according to my suggestion. I'll post the revised code below; I have run it and it works without problems.
import requests
from multiprocessing.dummy import Pool
from lxml import etree
import time
import random
import csv
def spider(url):
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }
    timeout = random.choice(range(31, 50))
    html = requests.get(url, headers=header, timeout=timeout)
    time.sleep(random.choice(range(8, 16)))
    selector = etree.HTML(html.text)
    content_field = selector.xpath('//*[@class="inner"]/p[3]/p[2]/ul/li')
    item = {}
    for each in content_field:
        g = each.xpath('a/p[1]/p[1]/h3/span/text()')
        go = each.xpath('a/p[1]/p[2]/p/h3/text()')
        h = each.xpath('a/p[1]/p[2]/p/p/text()[1]')
        j = each.xpath('a/p[1]/p[1]/p/text()[2]')
        ge = each.xpath('a/p[1]/p[2]/p/p/text()[3]')
        x = each.xpath('a/p[1]/p[1]/p/text()[3]')
        city = each.xpath('a/p[1]/p[1]/p/text()[1]')
        gg = each.xpath('a/p[2]/span/text()')
        item['city'] = "".join(city)
        item['hangye'] = "".join(g)
        item['guimo'] = "".join(go)
        item['gongsi'] = "".join(h)
        item['gongzi'] = "".join(j)
        item['jingyan'] = "".join(ge)
        item['xueli'] = "".join(x)
        item['gongzuoneirong'] = "".join(gg)
        fieldnames = ['city', 'hangye', 'guimo', 'gongsi', 'gongzi', 'jingyan', 'xueli', 'gongzuoneirong']
        with open('bj.csv', 'a', newline='', errors='ignore') as f:
            f_csv = csv.DictWriter(f, fieldnames=fieldnames)
            f_csv.writeheader()
            f_csv.writerow(item)

if __name__ == '__main__':
    f = open('bj.csv', 'w')
    page = []
    for i in range(1, 100):
        newpage = ('https://www.zhipin.com/c101010100/h_101010100/?query=%E6%95%B0%E6%8D%AE%E8%BF%90%E8%90%A5&page='
                   + str(i) + '&ka=page-' + str(i))
        page.append(newpage)
    print(page)
    pool = Pool(4)
    results = pool.map(spider, page)
    pool.close()
    pool.join()
    f.close()
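One remaining quirk in both versions: spider() calls writeheader() on every page, so the appended file ends up with a header line before every data row. A minimal sketch of one way to avoid that (the helper name append_row is mine, not from the thread): write the header only when the file does not yet exist or is still empty.

```python
import csv
import os

def append_row(path, item, fieldnames):
    # Write the header only when the file is new or empty, so repeated
    # appends from multiple spider() calls do not repeat the header line.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', errors='ignore') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(item)
```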
The main issue here is the header: you had defined it as a set (there was no 'User-Agent' key), while in my revised version it is a dict.
A couple more suggestions. Do you run your code in an IDE or a plain text editor? An IDE would flag some of these problems immediately. Also, I recommend that beginners follow the PEP 8 conventions from the very start of learning, so you don't develop bad habits; take a look at your variable naming.
过去多啦不再A梦2017-05-18 10:51:20
from csv import DictWriter

item = {'a': 1, 'b': 2}
fieldnames = ['a', 'b']
with open('test.csv', 'a', newline='') as f:
    f_csv = DictWriter(f, fieldnames=fieldnames)
    f_csv.writeheader()
    f_csv.writerow(item)
I didn't get an error when writing it like this.
writerow() accepts the dict directly; I think your problem is that the keys of item don't correspond to the header (fieldnames) of your table.
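To illustrate that last point: DictWriter.writerow() takes the dict as-is, and a key missing from fieldnames raises a ValueError. A minimal sketch using an in-memory buffer instead of a file:

```python
import csv
import io

item = {'a': 1, 'b': 2}
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['a', 'b'])
writer.writeheader()
writer.writerow(item)       # the dict is passed directly, no .items() needed
print(buf.getvalue())       # "a,b" header line, then "1,2"

# A key that is not listed in fieldnames triggers ValueError
strict = csv.DictWriter(io.StringIO(), fieldnames=['a'])
try:
    strict.writerow(item)   # 'b' is not a declared field
except ValueError as exc:
    print(exc)
```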