
Python newbie wants to build a simple crawler: looking for a tutorial

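I'm a Python newbie who wants to build a simple crawler and am looking for a tutorial. P.S. What language do companies usually use for crawling and data collection?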

PHP中文网 · 2742 days ago · 1280

All replies (21)

  • PHPz (2017-04-17 14:29:26)

    Scrapy is a good choice; it is relatively simple. Here is an introductory tutorial.

  • 天蓬老师 (2017-04-17 14:29:26)

    You can start by implementing your business logic on top of a crawler framework such as Scrapy, then gradually replace parts of the framework as your own needs dictate. In the end you will find that you have written a crawler framework of your own.
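
To make "replacing the framework piece by piece" concrete, here is a minimal sketch of the loop at the heart of most crawlers: a URL queue, a set of already-seen URLs, a fetcher, and a parser. It is not how Scrapy is implemented; the names `crawl`, `fetch`, `parse`, and `max_pages` are hypothetical and chosen only for illustration. Because `fetch` and `parse` are passed in as plain functions, each one can be swapped out independently.

```python
from collections import deque


def crawl(start_url, fetch, parse, max_pages=50):
    """Breadth-first crawl starting from start_url.

    fetch(url) returns the page text; parse(url, html) returns a tuple
    (item, next_urls). Both are supplied by the caller (hypothetical hooks).
    """
    queue = deque([start_url])   # URLs waiting to be visited
    seen = {start_url}           # URLs already queued, to avoid revisiting
    items = []                   # one extracted item per successfully fetched page
    while queue and len(items) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        item, next_urls = parse(url, html)
        items.append(item)
        for next_url in next_urls:
            if next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)
    return items
```

A framework adds scheduling, retries, throttling, and storage around this loop, which is why growing the pieces one at a time tends to end in a framework.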

  • 大家讲道理 (2017-04-17 14:29:26)

    Python's Scrapy is great for writing crawlers. Here is a very simple crawler I wrote for fun:

    https://github.com/ZhangBohan/fun_crawler

  • 高洛峰 (2017-04-17 14:29:26)

    You can use urllib, urllib2, or requests to fetch content; requests is recommended.
    To parse the content you can use BeautifulSoup, regular expressions, or brute-force string handling.
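
As a rough sketch of the fetch-then-parse split described above, the example below uses only the Python 3 standard library so it runs without installing anything: `urllib.request` stands in for requests, and `html.parser` stands in for BeautifulSoup. That substitution is an assumption made for portability, not what the reply recommends. The helper names (`LinkParser`, `fetch`, `extract_links`, `extract_title`), the User-Agent string, and the UTF-8 decoding are all illustrative choices.

```python
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def fetch(url, timeout=10):
    """Download a page and decode it as text (assumes UTF-8)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Parser approach: walk the tags instead of pattern-matching raw text."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def extract_title(html):
    """Regex approach: fine for one simple field, brittle on nested markup."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

Typical use would be `page = fetch(url)` followed by `extract_title(page)` and `extract_links(page, url)`. With requests installed, the body of `fetch` could instead be `requests.get(url, timeout=10).text`.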

  • ringa_lee (2017-04-17 14:29:26)

    http://cuiqingcai.com/1052.html

    I have been learning to write Python crawlers recently. I find it very interesting, and it really does make life easier. Along the way I wrote up some study notes and recorded a few small crawlers that I actually built, and I am sharing them here. I hope they help anyone interested in Python crawlers, and I look forward to exchanging ideas if the chance arises.

    Part 1: Getting Started with Python

    1. Getting Started with Python Crawlers 1: Overview

    2. Getting Started with Python Crawlers 2: Basic understanding of crawlers

    3. Getting Started with Python Crawlers 3: Basic use of the urllib library

    4. Getting Started with Python Crawlers 4: Advanced use of the urllib library

    5. Getting Started with Python Crawlers 5: URLError exception handling

    6. Getting Started with Python Crawlers 6: Using cookies

    7. Getting Started with Python Crawlers 7: Regular expressions

    Part 2: Python in Practice

    1. Python Crawlers in Practice 1: Crawling Qiushibaike jokes

    2. Python Crawlers in Practice 2: Crawling *

    3. Python Crawlers in Practice 3: Calculating this semester's university grade points

    4. Python Crawlers in Practice 4: Scraping Taobao MM photos

    5. Python Crawlers in Practice 5: Simulating a Taobao login and fetching all orders

    Part 3: Advanced Python

    1. Advanced Python Crawlers 1: Installing and configuring the Scrapy crawler framework

    These are the articles for now. They will be updated as the study progresses, so stay tuned~

    Hope it helps everyone, thank you!

    When reprinting, please credit the source: Jingmi » Python crawler learning tutorial series

  • 高洛峰 (2017-04-17 14:29:26)

    If you just want a spider that works:
    http://segmentfault.com/blog/eric/1190000002543828

  • 黄舟 (2017-04-17 14:29:26)

    https://github.com/binux/pyspider
    Powerful WebUI with script editor, task monitor, project manager and result viewer

  • 高洛峰 (2017-04-17 14:29:26)

    A crawler for anime pictures on Konachan. I wrote it when I was first learning to crawl; it is good enough to get started with.

  • 高洛峰 (2017-04-17 14:29:26)

    For simple needs, urllib2 to fetch pages plus BeautifulSoup or regular expressions to parse them is enough.
    For something more in-depth, look at open-source frameworks such as Python's Scrapy.
    You can also watch video tutorials, such as those from Geek Academy.
    In a word: practice more.

  • 天蓬老师 (2017-04-17 14:29:26)

    Here is an existing example you can refer to:
    How to crawl business information on Dianping.com (with an example and code attached)
