sublime-text - Python crawler encoding problem

Question

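I followed a tutorial and wrote a crawler, but all the Chinese text it scrapes comes out garbled. How do I fix this? Python code: {code...} Scraped result: (u'\u539f\u56fd\u52a1\u9662\u5b98\u5458\uff1a\u804c\u5de5\u65e9\u9000\u4f11\u53bb\u8df3\u5e7f\u573a\u821e\u662f\u6d6a\u8d39', ...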

巴扎黑 · Answer

Try printing the values without wrapping them in a tuple:

print h2, a

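This is a leftover quirk of how Python 2 displays strings, not a real encoding error.

When you print a tuple, what actually gets called is the tuple's __str__(), which formats each element with repr(), so unicode strings show up as \u escape sequences: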

>>> h = u'你好'
>>> (h, 8).__str__()
"(u'\\u4f60\\u597d', 8)"

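The same mechanism can be reproduced in Python 3, where the built-in ascii() behaves like repr() but escapes non-ASCII characters, which is what Python 2's repr() did for unicode strings. A minimal sketch; the sample values are made up for illustration:

```python
# Python 3 sketch: a tuple's str() is built from repr() of each element.
h = '你好'
pair = (h, 8)

# str() of the tuple quotes the text, because each item goes through repr().
as_str = str(pair)        # "('你好', 8)"

# ascii() is repr() with non-ASCII escaped, matching what Python 2 showed.
as_ascii = ascii(pair)    # "('\\u4f60\\u597d', 8)"

print(as_ascii)           # prints: ('\u4f60\u597d', 8)
```

Printing h on its own writes the characters themselves, with no quotes and no escapes.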
巴扎黑 · Answer

It is caused by mismatched encodings. On Windows the console encoding is usually GBK or some ISO encoding. Check which encoding the web page uses (you can see it in Chrome), then convert the text to the same encoding as the system and it should display correctly.
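Converting between encodings means decoding the page's bytes with the page's own charset and then encoding the resulting text with the target (console) charset. A minimal Python 3 sketch; the byte string stands in for a downloaded page (with requests, the raw body is in res.content), and the helper name `transcode` is hypothetical:

```python
def transcode(raw: bytes, page_encoding: str, target_encoding: str) -> bytes:
    """Decode raw page bytes with the page's charset, then re-encode them."""
    text = raw.decode(page_encoding)  # bytes -> str using the page's own charset
    # errors="replace" writes '?' for characters the target charset lacks
    return text.encode(target_encoding, errors="replace")


# Stand-in for a downloaded page body from a site declaring charset=utf-8.
page_bytes = "国务院".encode("utf-8")

# Re-encode for a GBK console, as on a Chinese-locale Windows system.
gbk_bytes = transcode(page_bytes, "utf-8", "gbk")

# Decoding with the wrong charset is what produces garbled text (mojibake).
garbled = page_bytes.decode("gbk", errors="replace")
```

The key design point is that text should be decoded exactly once, with the charset the page was actually sent in; every later conversion starts from that decoded str.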

高洛峰 · Answer

In fact, printing h2 on its own already shows the Chinese correctly. If you really want to print the tuple the way you do, refer to the code below:

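#-*-coding:utf-8-*-
# Python 2 code: str.decode() does not exist in Python 3
from __future__ import unicode_literals
import requests
from bs4 import BeautifulSoup

res = requests.get('http://news.sina.com.cn/china/')
res.encoding = 'utf-8'  # decode the response body as UTF-8
soup = BeautifulSoup(res.text, 'html.parser')
for news in soup.select('.news-item'):
    if len(news.select('h2')) > 0:
        h2 = news.select('h2')[0].text
        a = news.select('a')[0]['href']
        # str() on the tuple uses repr() per element, producing \uXXXX escapes
        test = str((h2, a))
        # the unicode-escape codec turns those escapes back into characters
        print(test.decode("unicode-escape"))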
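The decode("unicode-escape") step works because str() on the tuple produced literal \uXXXX sequences, and that codec turns them back into characters. The same round trip can be sketched in Python 3 with ascii(), which produces those escapes. This assumes the fields contain no quotes, backslashes or control characters, whose escapes the codec would also undo; the sample title is made up:

```python
# Python 3 sketch of the str(...).decode("unicode-escape") step above.
h2 = "原国务院官员"                      # sample title (made up for illustration)
a = "http://news.sina.com.cn/china/"    # sample link

# ascii() gives the Python 2 style text: non-ASCII becomes \uXXXX escapes.
escaped = ascii((h2, a))

# unicode-escape decodes bytes, so encode the pure-ASCII text first;
# the codec then turns each \uXXXX back into its character.
readable = escaped.encode("ascii").decode("unicode-escape")

print(escaped)  # ('\u539f\u56fd\u52a1\u9662\u5b98\u5458', 'http://news.sina.com.cn/china/')
```

In Python 3 this detour is unnecessary: str((h2, a)) already keeps the characters readable.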

巴扎黑 · Answer

If you run into encoding problems and want to understand the history behind character encodings, read this article: http://foofish.net/python-cha... After that you will know how to analyze encoding problems when you meet them in the future.

大家讲道理 · Answer

Use Python 3.
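In Python 3 every str is Unicode and repr() leaves printable non-ASCII characters unescaped, so even a printed tuple shows the Chinese text. A minimal sketch with a made-up sample title:

```python
# Python 3: every str is Unicode, and repr() keeps printable non-ASCII characters.
h2 = "广场舞"                            # sample title (made up)
a = "http://news.sina.com.cn/china/"    # sample link

tuple_text = str((h2, a))       # "('广场舞', 'http://news.sina.com.cn/china/')"
plain_text = " ".join((h2, a))  # the same text print(h2, a) writes (default sep=" ")
```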

PHPz · Answer

The u'' prefix at the beginning indicates the value is already unicode. There is no problem with the encoding; the problem is the way you print it. In Python 2.7, change it to this and it should be fine:

print '%s,%s'%(h2, a)
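This works because the %s conversion calls str() on each value, whereas printing a tuple uses repr() for each element. A Python 3 sketch contrasting the two with a made-up sample title (in Python 2.7 the %r result would show u'\u...' escapes instead):

```python
h2 = "跳广场舞"                           # sample title (made up)
a = "http://news.sina.com.cn/china/"    # sample link

# %s converts each value with str(): the bare text, no quotes.
with_s = "%s,%s" % (h2, a)   # '跳广场舞,http://news.sina.com.cn/china/'

# %r converts with repr(), which is what printing a tuple does per element.
with_r = "%r,%r" % (h2, a)   # "'跳广场舞','http://news.sina.com.cn/china/'"
```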

高洛峰 · Answer

Once you have read the values, convert them directly into a string instead of printing the tuple.

PHP中文网 · Answer

print(h2 + a)
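Concatenating with + only works when both values are strings, and it runs the title straight into the URL. A small Python 3 sketch of a separator-based alternative; `format_item` is a hypothetical helper name and the separator is an arbitrary choice:

```python
def format_item(title: str, link: str, sep: str = " | ") -> str:
    """Join a scraped title and link into one readable line."""
    # str() guards against non-string values such as None or numbers,
    # which would make title + link raise TypeError.
    return sep.join((str(title), str(link)))


line = format_item("国务院", "http://news.sina.com.cn/china/")
# line == '国务院 | http://news.sina.com.cn/china/'
```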
