巴扎黑2017-04-18 10:12:29
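Try printing without the tuple:
print h2, a
This still looks like a legacy encoding quirk of Python 2.
When you print a tuple, the tuple's __str__() is what actually gets called, and it formats each element with repr(), so a unicode string shows up as escapes: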
>>> h = u'你好'
>>> (h, 8).__str__()
"(u'\u4f60\u597d', 8)"
巴扎黑2017-04-18 10:12:29
This is caused by mismatched encodings. On the Windows platform the encoding is generally gbk or isoxxx. Check which encoding the web page uses (you can see it in Chrome), then convert the text to the same encoding as your system and it will be fine.
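A minimal sketch of the conversion described above. It assumes the page is served as UTF-8 and the console expects GBK; both encodings and the variable names are assumptions for illustration, not something taken from the question.

```python
# -*- coding: utf-8 -*-
# Assumption: the page sends UTF-8 bytes and the console expects GBK.
raw = u'\u4f60\u597d'.encode('utf-8')   # bytes as they would arrive from the page
text = raw.decode('utf-8')              # decode with the page's encoding
gbk_bytes = text.encode('gbk')          # re-encode to match the system's encoding

print(len(raw), len(gbk_bytes))
```

The same two characters take a different number of bytes in each encoding, which is why decoding with the wrong one produces garbage.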
高洛峰2017-04-18 10:12:29
In fact, printing h2 on its own already outputs the Chinese text. If you really have to print a tuple the way you do, refer to the code below:
#-*-coding:utf-8-*-
# Python 2.7
from __future__ import unicode_literals
import requests
from bs4 import BeautifulSoup

res = requests.get('http://news.sina.com.cn/china/')
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
for news in soup.select('.news-item'):
    if len(news.select('h2')) > 0:
        h2 = news.select('h2')[0].text
        a = news.select('a')[0]['href']
        # str() of a tuple uses repr() for each element, giving \uXXXX escapes
        test = str((h2, a))
        # decoding with unicode-escape turns the escapes back into characters
        print(test.decode("unicode-escape"))
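The unicode-escape step in the code above can be tried in isolation. This is only a sketch: the `escaped` string is a hand-written stand-in for what Python 2 produces for a tuple, and the extra `.encode('ascii')` is there so the same lines also run on Python 3, where `str` has no `decode` method.

```python
# Hypothetical sample: the text Python 2 shows for the tuple (u'你好', 8)
escaped = "(u'\\u4f60\\u597d', 8)"

# Turn the literal \uXXXX sequences back into real characters.
# Only suitable for ASCII-only text such as repr() output.
restored = escaped.encode('ascii').decode('unicode-escape')

print(len(escaped), len(restored))
```

Note that unicode-escape interprets every backslash sequence, so a value that legitimately contains backslashes would be altered too.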
巴扎黑2017-04-18 10:12:29
If you run into encoding problems and want to understand the history behind character encodings, you can read this article: http://foofish.net/python-cha... After that you will know how to analyze encoding problems when you meet them in the future.
PHPz2017-04-18 10:12:29
The u'' prefix indicates that the string is already unicode. There is no problem with the encoding; the problem is the way you print it. In 2.7, changing it to this should work:
print '%s,%s'%(h2, a)
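Why the formatted version behaves differently from printing the tuple can be seen with a small sketch. `Label` is a hypothetical class invented here so that the two conversions are easy to tell apart; it is not part of the original code.

```python
class Label(object):
    """Hypothetical class whose str() and repr() forms differ."""

    def __str__(self):
        return "str-form"

    def __repr__(self):
        return "repr-form"


item = Label()
via_tuple = str((item, 8))         # the tuple formats each element with repr()
via_format = '%s,%s' % (item, 8)   # %s formats each value with str()

print(via_tuple)    # (repr-form, 8)
print(via_format)   # str-form,8
```

In Python 2 the repr() of a unicode string is the escaped u'\u...' form, which is what appears when the tuple is printed, while %s inserts the string itself.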