Sometimes I need to pass information straight to the next callback in pyspider without crawling a new URL.
For example, part of the information on a list page needs to reach the next function, but I don't want to pass it through the list URL.
I have thought about using the send_message method, or the save parameter of self.crawl, which the callback reads back as response.save (with a dummy URL that is easy to crawl).
But neither approach feels particularly good.
Is there a better way?
ringa_lee 2017-05-18 11:02:39
The next function has finished executing and it no longer exists. How do you pass information to something that does not exist?
曾经蜡笔没有小新 2017-05-18 11:02:39
It's just a matter of jumping to another callback while yielding the data you already have, but you should still test it.
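def detail(self, response):
    # '#fetch urls' is a placeholder selector; .items() yields PyQuery objects,
    # which is what makes .attr.href available
    next_urls = [i.attr.href for i in response.doc('#fetch urls').items()]
    for url in next_urls:
        # Jump to the list_page callback for each follow-up URL
        self.crawl(url, callback=self.list_page)
    items = [
        # some result
    ]
    # Yield the data that is already available on this page
    for i in items:
        yield i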
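
For comparison, here is a rough sketch of the save approach mentioned in the question, where the list-page data travels with the task instead of through the URL. To keep it runnable without pyspider installed, MiniHandler and FakeResponse are hypothetical stand-ins that only mimic the crawl(url, callback, save) call and the response.save attribute; the URLs and row data are made up. In a real handler, the save argument of self.crawl and response.save would play these roles.

```python
from collections import deque


class FakeResponse:
    """Hypothetical stand-in for a pyspider response (not the real class)."""

    def __init__(self, url, save=None):
        self.url = url
        self.save = save  # data attached by the callback that scheduled this task


class MiniHandler:
    """Hypothetical stand-in for a handler; only mimics crawl(url, callback, save)."""

    def __init__(self):
        self.queue = deque()  # pending (url, callback, save) tasks
        self.results = []     # every dict the callbacks return or yield

    def crawl(self, url, callback, save=None):
        self.queue.append((url, callback, save))

    def run(self):
        while self.queue:
            url, callback, save = self.queue.popleft()
            out = callback(FakeResponse(url, save))
            if out is None:
                continue
            # A callback may return one dict or yield several
            self.results.extend([out] if isinstance(out, dict) else out)

    def list_page(self, response):
        # Data found on the list page, hard-coded here for illustration
        rows = [
            {"title": "first", "href": "http://example.com/1"},
            {"title": "second", "href": "http://example.com/2"},
        ]
        for row in rows:
            # Attach the list-page data to the task rather than encoding it in the URL
            self.crawl(row["href"], callback=self.detail_page,
                       save={"title": row["title"]})

    def detail_page(self, response):
        # The data attached in list_page arrives on response.save
        yield {"url": response.url, "title": response.save["title"]}


handler = MiniHandler()
handler.crawl("http://example.com/list", callback=handler.list_page)
handler.run()
print(handler.results)
```

The point of the sketch is only that the data rides along with the scheduled task, so the detail callback does not need to re-derive it from the URL.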