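Python crawler - I've recently been building a distributed crawler in Python with the scrapy framework in master-slave mode; how is the communication between nodes implemented?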

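The master and the slaves need to cooperate and communicate, and that communication is supposed to go through JSON-RPC. From what I have read online about this:
1. Install jsonrpc-scrapy
2. Import the corresponding packages in the program
3. The communication itself is done mainly over HTTP
Right now there is one master and multiple slaves. The master acts as the server, and each slave is a crawler node that carries out the actual crawling tasks.
A distributed implementation involves task scheduling and task assignment, but overall I'm still not clear on how the communication between master and slaves is actually implemented.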
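
To make the master/slave exchange described above concrete, here is a minimal sketch of JSON-RPC 2.0 over HTTP using only the Python standard library. It does not use scrapy or the jsonrpc-scrapy package mentioned in the question; the method names `get_task` and `report`, the seed URLs, and the helper names `call` and `start_master` are all assumptions made for illustration.

```python
import json
import threading
from collections import deque
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical seed URLs; the master owns the queue of pending work.
pending = deque(["http://example.com/page1", "http://example.com/page2"])
seen = set(pending)   # every URL ever queued, so nothing is queued twice
results = {}          # url -> links reported by the slave that crawled it
lock = threading.Lock()


def get_task():
    """Hand out the next URL, or None when the queue is empty."""
    with lock:
        return pending.popleft() if pending else None


def report(url, links):
    """Record what a slave found and queue links not seen before."""
    with lock:
        results[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return True


# Methods the master exposes over JSON-RPC (names are illustrative).
METHODS = {"get_task": get_task, "report": report}


class MasterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        reply = {"jsonrpc": "2.0", "id": request.get("id")}
        func = METHODS.get(request.get("method"))
        if func is None:
            reply["error"] = {"code": -32601, "message": "Method not found"}
        else:
            reply["result"] = func(*request.get("params", []))
        body = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in this example


def start_master(port=0):
    """Start the master in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MasterHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(master_url, method, *params):
    """Slave side: send one JSON-RPC request and return its result."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": list(params)}
    ).encode("utf-8")
    request = Request(
        master_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urlopen(request) as response:
        reply = json.loads(response.read())
    if "error" in reply:
        raise RuntimeError(reply["error"]["message"])
    return reply["result"]
```

In this sketch a slave would loop on `call(master_url, "get_task")`, crawl the returned URL (for example with a scrapy spider), and send the links it found back with `call(master_url, "report", url, links)` until `get_task` returns `None`. Scheduling and deduplication stay on the master, which is one way to read the "master as server, slaves as nodes" layout in the question.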

巴扎黑 · 2812 days ago · 840

All replies (1)

  • 黄舟

    2017-04-17 16:01:45

    I've been learning this recently too, but I haven't built a distributed crawler myself yet.

    I searched on Google, and the approaches I found happen to use redis as well. I was also asked this question in an interview once.

    Take a look at these two blog posts; I hope they help. I think the first one is the more useful reference.

    A distributed web crawler implemented using scrapy, redis, and mongodb

    How to get started with Python crawler?
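
As a rough illustration of the redis-based approach mentioned in this reply, the sketch below shows how crawler nodes could share one URL queue with deduplication. This is an assumption about the general pattern, not the code from either blog post. The key names and the `enqueue` / `next_url` helpers are hypothetical, and `InMemoryQueueClient` is a stand-in that mimics only the three redis-py method names used here (`sadd`, `lpush`, `rpop`) so the example runs without a redis server.

```python
class InMemoryQueueClient:
    """Stand-in for a redis client; implements only sadd, lpush and rpop."""

    def __init__(self):
        self.sets = {}
        self.lists = {}

    def sadd(self, name, *values):
        # Like redis SADD: returns how many values were newly added.
        members = self.sets.setdefault(name, set())
        added = 0
        for value in values:
            if value not in members:
                members.add(value)
                added += 1
        return added

    def lpush(self, name, *values):
        # Like redis LPUSH: pushes onto the head, returns the new length.
        items = self.lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def rpop(self, name):
        # Like redis RPOP: pops from the tail, None when the list is empty.
        items = self.lists.get(name)
        return items.pop() if items else None


# Hypothetical key names shared by every crawler node.
QUEUE_KEY = "crawler:queue"
SEEN_KEY = "crawler:seen"


def enqueue(client, url):
    """Queue url unless some node has already queued it before."""
    if client.sadd(SEEN_KEY, url):
        client.lpush(QUEUE_KEY, url)
        return True
    return False


def next_url(client):
    """Pop the oldest queued URL, or None when the queue is empty."""
    value = client.rpop(QUEUE_KEY)
    if isinstance(value, bytes):
        # redis-py returns bytes unless created with decode_responses=True.
        value = value.decode("utf-8")
    return value
```

With the redis package installed and a server running, a `redis.Redis(...)` client could be passed in place of the stand-in, since it provides methods with the same names. In that setup there is no dedicated master process: every node pushes newly found links with `enqueue` and pulls work with `next_url`, and redis itself acts as the shared scheduler.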
