
python - When using Scrapy's Request, how do I convert the fetched content's encoding to UTF-8?

With the third-party library requests, the conversion can be done like this:

import requests

html = requests.get('http://example.com')
html.encoding = 'utf-8'  # html.text will now decode html.content as UTF-8
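In requests, html.content holds the raw bytes and html.text is those bytes decoded with html.encoding, so the assignment above only changes which codec is used to build html.text; the downloaded bytes are untouched. A minimal stdlib-only sketch of the same idea (the helper name `decode_body` and the sample string are illustrative, not part of requests):

```python
def decode_body(body: bytes, encoding: str) -> str:
    """Decode raw response bytes with an explicitly chosen encoding.

    Roughly what html.text does once html.encoding is set: undecodable
    bytes are replaced with U+FFFD instead of raising an error.
    """
    return body.decode(encoding, errors="replace")


# Stand-in for html.content: some UTF-8 encoded text.
raw = "编码转换".encode("utf-8")

print(decode_body(raw, "utf-8"))    # right codec: readable text
print(decode_body(raw, "latin-1"))  # wrong codec: garbled text, but no exception
```

Choosing the wrong codec does not fail loudly here, which is why a page can "look" garbled without any error being raised.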

Question:
When using Scrapy's Request, how do I convert the fetched content's encoding to UTF-8?

demo:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract_first(),
            'votes': response.css('.question .vote-count-post::text').extract_first(),
            'body': response.css('.question .post-text').extract_first(),
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }
Asked by PHPz


  • 大家讲道理 (2017-04-18 09:08:14)

    To answer your question: I think your understanding of encodings in Python is a bit off.
    1. Both requests and Scrapy are just implementations of the HTTP protocol.
    The encoding of the response body is decided by the website you request, and the server normally declares it in the Content-Type header of the HTTP response.
    For example:
    import requests
    r = requests.get('http://www.baidu.com')
    print(r.headers['Content-Type'])
    Output:
    text/html;charset=UTF-8
    This shows that the response body is encoded in UTF-8.
    The same applies to responses fetched with scrapy.Request.
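    2. If the header says charset=gb2312 instead, you can decide, based on what your code needs, whether to transcode the content into the encoding you want:
    import requests
    r = requests.get('http://www.baidu.com')
    text = r.content.decode('utf-8')  # bytes -> str; decode the whole body so no multibyte character is cut
    print(text[:1000])
    print(text[:1000].encode('gbk', errors='replace'))  # str -> GBK bytes, only if you really need GBK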
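The two steps described in this answer (read the charset the server declares, then decode and, only if needed, re-encode) can be sketched with the standard library alone. The helper names below are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption; real pages may also declare their encoding in a <meta> tag, which this sketch ignores.

```python
def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type header value.

    Falls back to `default` (an assumption) when no charset is declared.
    """
    # Parameters follow the media type and are separated by ';'
    for param in content_type.split(";")[1:]:
        key, sep, value = param.strip().partition("=")
        if sep and key.strip().lower() == "charset":
            value = value.strip().strip('"').lower()
            return value or default
    return default


def transcode(body: bytes, src: str, dst: str = "utf-8") -> bytes:
    """Convert bytes from encoding `src` to encoding `dst` by way of a str."""
    text = body.decode(src)  # bytes -> str using the declared charset
    return text.encode(dst)  # str -> bytes in the target encoding


# Example: a page served as GB2312, converted to UTF-8 bytes.
header = "text/html; charset=GB2312"
gb_body = "百度一下".encode("gb2312")

charset = charset_from_content_type(header)
utf8_body = transcode(gb_body, charset)
print(charset)                    # gb2312
print(utf8_body.decode("utf-8"))  # 百度一下
```

Going through a str in the middle is the point: Python has no direct "bytes in GBK to bytes in UTF-8" step, so transcoding is always decode followed by encode.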

  • PHP中文网 (2017-04-18 09:08:14)

    Just use decode and encode; it works the same way whether or not you are using Scrapy.
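Applied to the Scrapy demo in the question, that advice might look like the sketch below. In Scrapy, response.body is the raw bytes and, for HTML/text responses, response.text is already a str decoded with response.encoding, so an explicit decode is only needed when you want to override the encoding Scrapy picked. The helper names `body_as_text` and `to_utf8_bytes` are hypothetical; to keep the sketch runnable without a network or a Scrapy install, it is exercised with a stand-in object that only has a `body` attribute.

```python
from types import SimpleNamespace


def body_as_text(response, encoding: str = "utf-8") -> str:
    """Decode response.body (raw bytes) with an explicitly chosen encoding.

    Hypothetical use inside the question's spider:
        def parse_question(self, response):
            html = body_as_text(response, encoding="gbk")
    """
    return response.body.decode(encoding, errors="replace")


def to_utf8_bytes(text: str) -> bytes:
    """Encode a str as UTF-8 bytes, e.g. before writing it to a binary file."""
    return text.encode("utf-8")


# Stand-in for a Scrapy response whose body is GBK-encoded.
fake_response = SimpleNamespace(body="标题".encode("gbk"))

title = body_as_text(fake_response, encoding="gbk")
print(title)  # 标题
print(to_utf8_bytes(title))
```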