python - Using CrawlSpider in scrapy, urls cannot be matched

Question

My crawler code is as follows. The rules don't match any URLs, and I don't know what the problem is. {Code...} Output when running: {Code...}

世界只因有你 · Answer

The main problem is allowed_domains; your extraction rules are fine. If you write the code like this, the links are captured:

# encoding: utf-8
import time
from tutorial.items import CrawlerItem
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class MoyanSpider(CrawlSpider):
    name = 'maoyan'
    allowed_domains = ["maoyan.com"]
    start_urls = ['http://maoyan.com/films']

    rules = (
        Rule(LinkExtractor(allow=(r"films/\d+.*")), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.url)
        item = CrawlerItem()
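        try:
            # time.sleep() blocks Scrapy's (Twisted) event loop; the DOWNLOAD_DELAY setting is the usual way to slow down.
            time.sleep(2)
            # NOTE: the parsing below is left as in the question. response.text is a plain str, and
            # str.find() does not accept class_ or a string second argument, so these lines raise
            # TypeError (caught and printed by the except block). response.css()/response.xpath() are the Scrapy way.
            item['name'] = response.text.find("p", class_="movie-brief-container").find("h3", class_="name").get_text()
            item['score'] = response.text.find("p", class_="movie-index-content score normal-score").find(
                "span", class_="stonefont").get_text()
            url = "http://maoyan.com" + response.text.find("p", class_="channel-detail movie-item-title").find("a")["href"]
            item['id'] = response.url.split("/")[-1]
            temp = response.text.find("p", "movie-brief-container").find("ul").get_text()
            temp = temp.split('\n')
            item['tags'] = temp[1]
            item['countries'] = temp[3].strip()
            item['duration'] = temp[4].split('/')[-1]
            item['time'] = temp[6]
            return item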
            return item
        except Exception as e:
            print(e)

The key point: allowed_domains must not include the http:// prefix.
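To see why the prefix matters, here is a simplified, self-contained approximation of the two checks a link has to pass in a CrawlSpider: the LinkExtractor allow regex (searched against the absolute URL) and the offsite check that compares the URL's host name against allowed_domains. This is a sketch of the idea, not Scrapy's actual code; the helper names passes_allow and is_onsite and the sample URL are hypothetical.

```python
import re
from urllib.parse import urlparse


def passes_allow(url, patterns):
    """Rough stand-in for LinkExtractor(allow=...): keep the URL if any regex is found in it."""
    return any(re.search(p, url) for p in patterns)


def is_onsite(url, allowed_domains):
    """Rough stand-in for the offsite check: the URL's host must equal an allowed
    domain or be a subdomain of it. Entries are treated as bare host names."""
    host = urlparse(url).hostname or ""
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.match(pattern, host) is not None


url = "http://maoyan.com/films/1203"  # hypothetical film detail URL

print(passes_allow(url, [r"films/\d+.*"]))    # True: the rule itself is fine
print(is_onsite(url, ["maoyan.com"]))         # True: the bare domain matches the host
print(is_onsite(url, ["http://maoyan.com"]))  # False: a host name never contains "http://"
```

The host extracted from a URL never contains a scheme, so an entry like "http://maoyan.com" can never match and every followed link is dropped as offsite. The fix is simply the bare domain, as in allowed_domains = ["maoyan.com"] in the code above.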

In addition, your parsing code has problems of its own, which I have not fixed for you. Once you are actually receiving the pages, you should be able to fix it yourself.

Also, a complaint about the previous answerer: he clearly didn't debug the code at all before answering like this, which is misleading.

習慣沉默 · Answer

Some of the modules you are importing have been deprecated; the warning is just telling you to use the equivalent replacement modules instead.

阿神 · Answer

That's just a warning, not an error. Perhaps the site you are crawling uses anti-crawler measures that prevent you from fetching its pages normally.
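If the site is indeed throttling or blocking the crawler, a common first step is to slow the crawl down and send a browser-like User-Agent through the project's settings.py, rather than calling time.sleep() inside a callback, which blocks Scrapy's event loop. A minimal sketch, assuming a standard Scrapy project layout; the values are illustrative guesses, not tuned for maoyan.com, and the User-Agent string is a hypothetical placeholder.

```python
# settings.py (fragment): illustrative values only

# Seconds to wait between consecutive requests to the same site.
# By default Scrapy randomizes the actual wait to 0.5x-1.5x of this value.
DOWNLOAD_DELAY = 2

# Let Scrapy adjust the delay based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True

# Hypothetical browser-like User-Agent; substitute a real one for your environment.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0"
```

This only helps against simple rate-based blocking; it does nothing about other defenses a site may use.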
