
Strings - Python encoding problem?

I'm using the Python 3 requests library to fetch JSON data from an API and then trying to print it:


    res = requests.get("http://aaa.com/bbb.php")
    res.encoding='utf-8'
    name = res.json(encoding = "utf8")["name"]
    print(name)

I also tried the following:

    name.encode('utf8').decode("utf8")
    print(name)

The name string may contain Chinese, digits, English, or Arabic, or only one of these.
Printing sometimes succeeds, and sometimes fails with the following error:

  File "demo.py", line 53, in play_one
    print(json.loads(result_str)["name"])
UnicodeEncodeError: 'gbk' codec can't encode character '\u062f' in position 0: illegal multibyte sequence

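How should I handle this string? Can a single string mix several encodings, or is the string I receive encoded differently each time? How do I correctly output a string whose contents I can't predict?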

PHPz  2741 days ago  477

Replies (2)

  • 大家讲道理

    大家讲道理 2017-04-18 10:35:39

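    Standard JSON does not require you to specify an encoding. JSON exchanged between systems is UTF-8, and res.json() already returns ordinary Python str objects, so the encoding argument is unnecessary. On Python 3.9 and later, json.loads no longer accepts an encoding argument at all.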
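
To illustrate, here is a minimal sketch using only the standard library. The payload is a made-up stand-in for the API response body, not real data from the question. `json.loads` accepts UTF-8, UTF-16, or UTF-32 bytes directly (Python 3.6+) and always hands back `str`.

```python
import json

# Hypothetical payload standing in for the API response body (UTF-8 bytes).
raw = '{"name": "abc123 中文 \u062f"}'.encode("utf-8")

# json.loads works out the UTF-8/16/32 encoding of bytes input by itself,
# so no encoding argument is needed.
data = json.loads(raw)
name = data["name"]

print(type(name).__name__)  # str
print(ascii(name))          # escaped form, safe to print on any console
```

One string can hold Chinese, Latin, and Arabic characters at once because `str` stores Unicode code points; an encoding only comes into play when the text is turned into bytes.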

    You are using the Simplified Chinese version of Windows, whose console uses the GBK encoding. print has to encode the string to GBK before writing it, but the character U+062F (د, ARABIC LETTER DAL) has no equivalent in GBK, so it cannot be output. The string itself is fine; only the output step fails.
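
The failure can be reproduced on any platform without a GBK console. This sketch performs by hand the encode step that `print` does implicitly when the console encoding is GBK.

```python
# U+062F ARABIC LETTER DAL, the character from the traceback.
dal = "\u062f"

# Chinese text has a GBK representation, so this works.
chinese_bytes = "中文".encode("gbk")

# The Arabic letter has none, so the same call raises UnicodeEncodeError.
try:
    dal.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print("GBK cannot encode it:", exc.reason)

# UTF-8 can represent every Unicode character, including this one.
utf8_bytes = dal.encode("utf-8")
print(utf8_bytes)  # b'\xd8\xaf'
```

This is why printing sometimes succeeds: names made only of Chinese, digits, and English fit in GBK, while names containing Arabic do not.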

    You can write the output to a file instead, install the Arabic version of Windows, or use another operating system or terminal with better Unicode support.
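
A rough sketch of two of these workarounds follows. The helper name `console_safe`, the sample value, and the output path are illustrative assumptions, not part of the original question.

```python
import os
import sys
import tempfile

name = "abc123 中文 \u062f"  # sample value mixing several scripts


def console_safe(text, encoding):
    """Return text with characters the encoding lacks replaced by \\uXXXX escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Option 1: write to a file with an explicit UTF-8 encoding.
out_path = os.path.join(tempfile.gettempdir(), "name.txt")  # hypothetical path
with open(out_path, "w", encoding="utf-8") as fh:
    fh.write(name)

# Option 2: print a degraded but non-crashing version on the current console.
console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print(console_safe(name, console_encoding))
```

On Python 3.7+, calling `sys.stdout.reconfigure(errors="backslashreplace")` once at startup has a similar effect for every later `print` call.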

  • 高洛峰

    高洛峰 2017-04-18 10:35:39

    1. First, understand why requests has this problem

    requests takes the character set from the Content-Type response header returned by the server. If Content-Type contains a charset field, requests identifies the encoding correctly. Otherwise, for text content, it falls back to the default ISO-8859-1, which garbles Chinese. For details, see the blog post "Code analysis: Python requests library Chinese encoding issues".

    The article mentions several workarounds, but it seems that 3.x has fixed this problem.
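
The header lookup described above can be sketched with the standard library alone. This is not the code requests uses internally; the function names are hypothetical, and the ISO-8859-1 fallback simply mirrors the default this answer describes.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def pick_encoding(content_type, fallback="ISO-8859-1"):
    """Use the declared charset when there is one, otherwise the fallback."""
    return charset_from_content_type(content_type) or fallback


print(charset_from_content_type("application/json; charset=utf-8"))  # utf-8
print(charset_from_content_type("text/html"))                        # None
print(pick_encoding("text/html"))                                    # ISO-8859-1
```

With requests itself, the encoding it settled on can be inspected through `resp.encoding` after the call returns.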

    2. My suggestion
      First, open the page manually and check which encoding the charset in its header specifies. Assuming it is GBK:

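        import json
        import requests

        item_info_url = "http://aaa.com/bbb.php"  # placeholder URL from the question
        resp = requests.get(item_info_url)
        resp.encoding = 'GBK'  # assumed charset; use whatever the page actually declares
        html = resp.text
        name = json.loads(html)['name']

        # or
        # I don't use the res.json method much

        res = requests.get("http://aaa.com/bbb.php")
        res.encoding = 'GBK'
        name = res.json()["name"]
        print(name)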
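
The charset assigned to the response has to match what the server actually sends, which is why it should be checked first. The offline sketch below uses a made-up UTF-8 body to show what a wrong guess does: nothing raises, the text is just silently garbled.

```python
import json

# Made-up response body: UTF-8 encoded JSON.
body = '{"name": "中文"}'.encode("utf-8")

# Decoding with the wrong charset does not raise here; it yields mojibake.
wrong = json.loads(body.decode("iso-8859-1"))["name"]

# Decoding with the charset the server really used gives the original text.
right = json.loads(body.decode("utf-8"))["name"]

print(ascii(wrong))
print(ascii(right))

# The damage is reversible only if the wrong charset is known.
repaired = wrong.encode("iso-8859-1").decode("utf-8")
```

When the body is JSON, passing the raw bytes (`resp.content` in requests) to `json.loads` avoids the charset guess entirely, assuming the server sends UTF-8, UTF-16, or UTF-32.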
