
python3.x - Advice on learning Python web crawling: what does a beginner need to prepare?

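I've been doing PHP development until now and want to learn crawler development, but I'm lost about where to start. Please advise on a learning path; I'm the type who wants to study this in depth. Many of the examples I see online feel like plain scraping, and there is very little material on URL-expansion crawling (following links outward) and on the analysis side... Please recommend learning paths, materials, videos, and so on. Tips on pitfalls to avoid would be even better.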

PHPz  2741 days ago  865

All replies (3)

  • PHP中文网

    PHP中文网2017-04-18 10:33:48

    If you have done web development, writing a crawler is fairly simple. Keep in mind that everything runs over the HTTP protocol and you will be fine.

    Here are a few key points:

    • Crawl speed (trade-off between control and throughput)

      • Multi-threading

      • Multiple processes

        • Message Queue

    • Web page analysis

      • API endpoint discovery -> make good use of the Network tab in the browser developer tools (F12)

      • XPath, regular expressions (re), and other parsing libraries

      • Structured data

    • Persistence -> database connection pool -> keep the number of open database connections at a fixed limit

    • Anti-crawler

      • IP bans -> proxy pool -> how to use proxies more sensibly

      • CAPTCHAs -> OCR
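To tie the points above back to the URL-expansion crawling the question asks about, here is a minimal single-threaded sketch using only the standard library. It is an illustration under stated assumptions, not a production design: the names LinkParser, extract_links and crawl are made up for this example, fetch is any callable you supply that maps a URL to an HTML string, and the delay parameter is a crude stand-in for real rate control. Threads, processes, a message queue, or a proxy pool would be layered on top of the same frontier idea.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    result = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            result.append(absolute)
    return result


def crawl(seed, fetch, max_pages=50, delay=0.0):
    """Breadth-first crawl that stays on the seed's host.

    fetch is a hypothetical callable (url -> HTML string); plug in
    urllib, requests, or a stub for testing.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs already queued, to avoid loops
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue           # skip pages that fail to download
        pages[url] = html
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
        time.sleep(delay)      # crude politeness delay between requests
    return pages
```

The seen set is what keeps the crawl from looping between pages that link to each other, and swapping the deque for a shared queue is the usual first step toward a multi-worker crawler.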

  • 迷茫

    迷茫2017-04-18 10:33:48

    You could first implement a crawler in PHP to understand the principles. curl can do the job too; the language is just a tool.
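For comparison, the core of what curl does for a crawler (send one HTTP GET and read the body) looks roughly like this in Python's standard library. This is only a sketch: the function name fetch, the default timeout, and the User-Agent string are assumptions made for the example.

```python
from urllib.request import Request, urlopen


def fetch(url, timeout=10.0, user_agent="example-crawler/0.1"):
    """Send one HTTP GET and return (status code, decoded body)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.status, body
```

Whatever the language, the steps are the same: build a request, send it, check the status, decode the body, then hand the text to a parser.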

  • 天蓬老师

    天蓬老师2017-04-18 10:33:48

    Read a book called "Python Web Crawler".
