
python - Question about Scrapy pipelines and items

Is it possible to do the following:

  1. data from aItem is processed by aPipeline

  2. data from bItem is processed by bPipeline

PHP中文网 · 2741 days ago

3 replies

  • 天蓬老师 · 2017-04-18 09:51:55

    Is this what you're after?
    For example, suppose your items.py defines several different items.

    Then, in the process_item function in pipelines.py, you can check which item class each item belongs to.

    This way, the different kinds of data can be processed separately.
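    The code snippets this answer originally showed appear to have been lost; below is a minimal runnable sketch of the idea, with the names aItem and bItem taken from the question and the fields and handling invented purely for illustration:

    ```python
    # Stand-ins for the items; in a real project these would be
    # scrapy.Item subclasses with scrapy.Field() attributes.
    class aItem(dict):
        pass

    class bItem(dict):
        pass

    class MyPipeline:
        """One pipeline that dispatches on the item's class."""
        def process_item(self, item, spider):
            if isinstance(item, aItem):
                # aItem-specific handling goes here
                item['handled_by'] = 'a-logic'
            elif isinstance(item, bItem):
                # bItem-specific handling goes here
                item['handled_by'] = 'b-logic'
            return item
    ```

    Scrapy calls process_item for every item a spider yields, so the isinstance check is what routes each item type to its own branch.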

  • 天蓬老师 · 2017-04-18 09:51:55

    You can check in the pipeline which crawler produced the item:

    def process_item(self, item, spider):
        if spider.name == 'news':
            # logic for saving into the News table goes here
            news = News()
            ...  # (some code omitted)
            self.session.add(news)
            self.session.commit()
        elif spider.name == 'bsnews':
            # logic for saving into the BsNews table goes here
            bsnews = BsNews()
            ...  # (some code omitted)
            self.session.add(bsnews)
            self.session.commit()
        return item
    

    For this kind of problem, where multiple crawlers live in one project and different crawlers need different logic in the pipeline, this is the approach the author of Scrapy explained.
    Go and have a look.

  • PHP中文网 · 2017-04-18 09:51:55

    Yes, process_item in a pipeline takes a spider parameter, so each pipeline can filter for the spider it is meant to handle.
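    A minimal sketch of that filter (the pipeline class and spider names here are illustrative assumptions, and the "storage" is a stand-in for real logic):

    ```python
    class NewsPipeline:
        def process_item(self, item, spider):
            # Only handle items produced by the 'news' spider;
            # items from any other spider pass through untouched.
            if spider.name != 'news':
                return item
            item['stored'] = True  # stand-in for real storage logic
            return item
    ```

    Because every enabled pipeline sees every item, this early-return pattern is how one pipeline opts out of items that belong to a different spider.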
