python - Which language do companies usually use for web crawling and scraping?

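Which language do companies usually use for web crawling and data collection? When I search for books on JD.com, they are all about java.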

阿神, 2802 days ago, 1783

All replies (30)

  • 怪我咯 (2017-04-17 17:50:02)

    scrapy +1

    It is very convenient to use, has a lot of features, and its documentation is very clear:

    scrapy official website

  • 高洛峰 (2017-04-17 17:50:02)

    The asker already added the python tag to the question, so why still ask which language to use...

  • PHPz (2017-04-17 17:50:02)

    The company I work for uses Java.

  • 黄舟 (2017-04-17 17:50:02)

    Parsing a page with a browser, or in a browser-like way, is far slower than extracting data with regular expressions. If you want to use selectors, you first have to build a structure to select against, and that is not a cheap job.
    However, the biggest problem with regex parsing is that once the site changes its page layout, the patterns have to be reworked, and at that point a selector may turn out to be easier to change.
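
To make the trade-off above concrete, here is a minimal sketch using only the Python standard library. The HTML fragments, the `price` class name, and the helper names are all hypothetical, invented for illustration. The regex is tied to the exact markup, while the parser-based lookup only depends on the element carrying the `price` class.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment before and after a site redesign (assumed data).
OLD_HTML = '<div class="item"><span class="price">19.90</span></div>'
NEW_HTML = '<div class="item"><span data-v="2" class="price highlight">19.90</span></div>'

# Regex approach: no tree is built, but the pattern matches one exact markup shape.
PRICE_RE = re.compile(r'<span class="price">([\d.]+)</span>')


def price_by_regex(html):
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


class PriceParser(HTMLParser):
    """Collects the text of the first element whose class list contains 'price'."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless, hence the fallback.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes and self.price is None:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.price = data.strip()
            self._inside = False


def price_by_parser(html):
    parser = PriceParser()
    parser.feed(html)
    return parser.price
```

With these assumed fragments, both helpers find the price in `OLD_HTML`, but only `price_by_parser` still finds it in `NEW_HTML`; the regex returns `None` until the pattern is rewritten.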

  • 迷茫 (2017-04-17 17:50:02)

    nodejs +1

  • PHPz (2017-04-17 17:50:02)

    I know python well, but occasionally I use java.

  • PHP中文网 (2017-04-17 17:50:02)

    I have used nokogiri when writing ruby, but for high efficiency, python is more convenient

  • PHPz (2017-04-17 17:50:02)

    node +1

  • 大家讲道理 (2017-04-17 17:50:02)

    The language itself is not the problem. For a specific business, what matters is the libraries: there must be a good HTTP library, a good concurrency library, a good job scheduling library, and a good markup language parsing library. Once all of these are available, what remains is whether the language performs well, whether its syntax is pleasant, and whether most people in the company can accept it. Broadly speaking, python, java, ruby, nodejs, and c# all meet these conditions. As for how to choose among them, it comes down to those same conditions.
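
As a rough illustration of the four pieces listed above (HTTP, concurrency, job scheduling, markup parsing), here is a minimal sketch built only on the Python standard library. The function names `fetch`, `extract_links`, and `crawl`, and the `max_pages` and `workers` parameters, are assumptions made for this example and not part of any particular framework. A production crawler would also need error handling, rate limiting, and robots.txt support, which are left out here.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch(url, timeout=10):
    """HTTP layer: download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkParser(HTMLParser):
    """Markup parsing layer: collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(base_url, html):
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links against the page they were found on.
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch_page=fetch, max_pages=20, workers=4):
    """Scheduling layer: breadth-first frontier with de-duplication."""
    seen = {start_url}
    frontier = deque([start_url])
    pages = {}
    # Concurrency layer: each batch of URLs is fetched by a thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            size = min(len(frontier), max_pages - len(pages))
            batch = [frontier.popleft() for _ in range(size)]
            for url, html in zip(batch, pool.map(fetch_page, batch)):
                pages[url] = html
                for link in extract_links(url, html):
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return pages
```

Passing `fetch_page` as a parameter keeps the scheduling logic independent of the network, so the crawl loop can be exercised against an in-memory dictionary of pages before pointing it at a real site.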

  • PHP中文网 (2017-04-17 17:50:02)

    We wrote it in ruby
