
sublime-text - Python crawler encoding problem

PHP中文网  2811 days ago  440

All replies (8)

  • 巴扎黑

    巴扎黑 2017-04-18 10:12:29

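    Try printing without a tuple:

    print h2, a

    This is most likely a legacy encoding quirk of Python 2.

    When you print a tuple, the tuple's __str__() is what actually gets called, and it formats each element with repr(), so Unicode strings show up as escape sequences:

    >>> h = u'你好'
    >>> (h, 8).__str__()
    "(u'\\u4f60\\u597d', 8)"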

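    The behaviour described above can be reproduced without any Chinese text. The sketch below is written for Python 3 (an assumption; the thread itself targets Python 2), and the Headline class is a hypothetical name introduced only to make the difference between str() and repr() visible.

```python
class Headline:
    """Hypothetical class whose str() and repr() outputs differ."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Headline(%r)" % self.text


item = Headline("news")
direct = str(item)          # calls Headline.__str__
in_tuple = str((item, 8))   # tuple formatting calls repr() on each element

print(direct)
print(in_tuple)
```

    The same mechanism is why a Python 2 unicode string inside a tuple is shown in its escaped repr() form rather than as readable text.
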
  • 巴扎黑

    巴扎黑 2017-04-18 10:12:29

    This is caused by mismatched encodings. On Windows the platform encoding is generally gbk or isoxxx. Check which encoding the web page uses (you can see it in Chrome), then convert the text to the same encoding as your system and it will be fine.
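
    A minimal sketch of that conversion, assuming Python 3, a UTF-8 page, and a GBK system encoding. The transcode helper and the sample bytes are hypothetical and exist only for illustration; in practice you would substitute the encoding the page actually reports.

```python
def transcode(raw, page_encoding, system_encoding):
    """Decode bytes using the page's encoding, then re-encode for the system."""
    text = raw.decode(page_encoding)
    # errors="replace" substitutes "?" for characters the target cannot represent
    return text.encode(system_encoding, errors="replace")


# Assumed sample: bytes as they might arrive from a UTF-8 page.
page_bytes = "你好".encode("utf-8")
gbk_bytes = transcode(page_bytes, "utf-8", "gbk")

print(repr(gbk_bytes))
```

    Decoding to text first and encoding only at the output boundary keeps the two encodings from ever being mixed in the same string.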

  • 高洛峰

    高洛峰 2017-04-18 10:12:29

    In fact, printing h2 on its own already displays the Chinese text. If you really need to print a tuple the way you do, refer to the code below:

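    # -*- coding: utf-8 -*-
    # Python 2 only: str.decode() does not exist in Python 3.
    from __future__ import unicode_literals

    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://news.sina.com.cn/china/')
    res.encoding = 'utf-8'  # decode the response body as UTF-8
    soup = BeautifulSoup(res.text, 'html.parser')
    for news in soup.select('.news-item'):
        if len(news.select('h2')) > 0:
            h2 = news.select('h2')[0].text
            a = news.select('a')[0]['href']
            # str() of a tuple uses repr() on each element, giving \uXXXX escapes
            test = str((h2, a))
            # turn the escapes back into readable characters before printing
            print(test.decode("unicode-escape"))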
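
    To see what the final decode step does in isolation, here is a small sketch in Python 3 (an assumption; the code above targets Python 2). The escaped literal is a made-up stand-in for what str((h2, a)) returns in Python 2.

```python
# Stand-in for the Python 2 result of str() on a tuple: ASCII bytes with \uXXXX escapes.
escaped = b"(u'\\u4f60\\u597d', 8)"

# The unicode-escape codec turns each \uXXXX sequence back into its character.
readable = escaped.decode("unicode-escape")

print(len(escaped), len(readable))
```

    Note that this codec treats any byte that is not part of an escape as Latin-1, so the trick is only safe when the input is pure ASCII plus escapes, as it is here.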

  • 巴扎黑

    巴扎黑 2017-04-18 10:12:29

    If you run into encoding problems and want to understand the history behind character encodings, read this article: http://foofish.net/python-cha... After that you will know how to analyze encoding problems when you meet them in the future.

  • 大家讲道理

    大家讲道理 2017-04-18 10:12:29

    Use Python 3.
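
    A short sketch of why switching helps, assuming Python 3. The values of h2 and a are hypothetical placeholders, not data from the crawled page.

```python
h2 = "你好"                       # hypothetical headline text
a = "http://example.com/news"    # hypothetical link

# In Python 3, repr() of a str keeps printable non-ASCII characters as they are,
# so the tuple's string form is readable without any extra decoding.
rendered = str((h2, a))
```

    Here rendered holds exactly what print((h2, a)) would display, with the Chinese characters intact.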

  • PHPz

    PHPz 2017-04-18 10:12:29

    The leading u'' indicates that the value is already unicode. There is no problem with the encoding; the problem is with the way you print it. In 2.7, changing it to the following should work:

    print '%s,%s'%(h2, a)
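
    The reason this works is that %s formats each value with str() rather than repr(). A minimal sketch, assuming Python 3 syntax and hypothetical values for h2 and a:

```python
h2 = u"你好"                       # hypothetical headline text
a = u"http://example.com/news"    # hypothetical link

# %s applies str() to each element, so no \uXXXX escapes or quotes appear.
line = u"%s,%s" % (h2, a)

# For comparison, %r applies repr() and keeps the quotes around each value.
quoted = u"%r,%r" % (h2, a)
```

    The tuple on the right of % is only a container for the arguments here; it is never converted to a string as a whole.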

  • 高洛峰

    高洛峰 2017-04-18 10:12:29

    After reading the value, just convert it directly to a string.

  • PHP中文网

    PHP中文网 2017-04-18 10:12:29

    print(h2 + a)
