Home  >  Article  >  Backend Development  >  Python crawler to create Scrapy crawler framework in Anaconda environment

Python crawler to create Scrapy crawler framework in Anaconda environment

不言
不言Original
2018-09-07 15:38:426680browse

How to create Scrapy crawler framework in Anaconda environment? This article will introduce you to the steps to create a Scrapy crawler framework project in the Anaconda environment. It is worth reading.

Python crawler tutorial-31-Create Scrapy crawler framework project

First of all, this article is in the Anaconda environment, so if Anaconda is not installed, please go to the official website to download and install it first

Anaconda download address: https://www.anaconda.com/download/

Creation of Scrapy crawler framework project

0. Open [cmd]

1. Enter the Anaconda environment you want to use

Here we have created the project and analyze the role of the automatically generated files

1. The environment name can be found under [Settings] in [Pycharm] Find

Python crawler to create Scrapy crawler framework in Anaconda environment

under [Project:] 2. Use the command: activate environment name, for example:

activate learn

3. Enter the desired The directory where the scrapy project is to be stored [Note]

4. New project: scrapy startproject xxx project name, for example:

scrapy startproject new_project

5. Operation screenshot:

Python crawler to create Scrapy crawler framework in Anaconda environment

6. Open the directory in the file explorer and you will find that several files have been generated

Python crawler to create Scrapy crawler framework in Anaconda environment

7. Just use Pycharm to open the directory where the project is located

Development of Scrapy crawler framework project

0. Use Pycharm to open the project and take a screenshot:
Python crawler to create Scrapy crawler framework in Anaconda environment

The general process of project development:

The address spider/xxspider.py is responsible for decomposing and extracting the downloaded data

1. Clarify the target/product that needs to be crawled: write item. py

2. Download and create a python file in the spider directory to make a crawler:

3. Store content: pipelines.py

Pipeline.py file

Called when the spider object is closed

Called when the spider object is opened

Initializes some necessary parameters

The item extracted by spider is used as Parameters are passed in, and spider

This method must be implemented

Must return an Item object, and the discarded item will not be corresponding to the subsequent pipeline

pipelines file

After the crawler extracts the data and stores it in the item, the data saved in the item needs further processing, such as cleaning, deworming, storage, etc.

Pipeline needs to process the process_item function

process_item

_ init _: constructor

open_spider(spider):

close_spider(spider):

Spider directory

corresponds to the files under the folder spider

_ init _: initialize the crawler name, start _urls list

start_requests: generate Requests object exchange Download and return the response to Scrapy

parse: parse the corresponding item according to the returned response, and the item automatically enters the pipeline: if necessary, parse the url, and the url is automatically handed over to the requests module, and the cycle continues

start_requests: This method can be called once, read the start _urls content and start the loop process

name: Set the crawler name

start_urls: Set the url to start the first batch of crawling

allow_domains: list of domain names that spider is allowed to crawl

start_request(self): only called once

parse: detection encoding

log: log record

Related recommendations:

Detailed explanation of scrapy examples of python crawler framework

Scrapy crawler introductory tutorial four Spider (crawler)

A simple example of writing a web crawler using Python’s Scrapy framework

The above is the detailed content of Python crawler to create Scrapy crawler framework in Anaconda environment. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn