python3.x - How to use maketrans in Python with UTF-8 files

Question

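I wrote a script that processes text by replacing every symbol in it with a space, using Python's maketrans and translate. It works on ASCII-encoded files, but on UTF-8 files it fails with an error saying that the maketrans arguments are not the same length...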

滿天的星座 · Answer

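First, the two strings are not the same length: the escape \" is a single character ("), and \\ is also a single character (\).
You can check this with len().
Also, for questions about strings, it is best to say which Python version you are using.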
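
The length check can be sketched in Python 3 as follows. This is a minimal illustration, not the asker's exact script: the helper name lengths_match and the 44-space replacement are made up for the example, and the search string is the one from the question with its final backslash escaped.

```python
# An escape sequence takes two characters to type but is stored as one.
print(len("\""))   # 1: an escaped double quote
print(len("\\"))   # 1: an escaped backslash

# The search string from the question: 26 letters, 10 digits, 32 symbols.
search = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\"
print(len(search))  # 68


def lengths_match(src, dst):
    """Hypothetical helper: True if str.maketrans accepts the pair."""
    try:
        str.maketrans(src, dst)
    except ValueError:
        # Python 3 raises ValueError when the two strings differ in length.
        return False
    return True


# An illustrative replacement that is two characters too long is rejected...
print(lengths_match(search, "abcdefghijklmnopqrstuvwxyz" + " " * 44))  # False
# ...while one sized from the search string itself is accepted.
print(lengths_match(search, "abcdefghijklmnopqrstuvwxyz" + " " * (len(search) - 26)))  # True
```

Sizing the run of spaces from len(search) avoids counting spaces by hand, which is where this kind of mismatch usually comes from.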

maketrans arguments with unequal lengths

 my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                            ")

Test code:

from string import maketrans  # Python 2 only; in Python 3 use str.maketrans

def text_to_words(the_text):
    """ 
        Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                          ")
    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~\'\\测试')

Output:

['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']

This is the result when run under Python 2.
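
Since the question is tagged python3.x, here is a minimal sketch of the same function for Python 3, where str.maketrans replaces string.maketrans and translate works on Unicode code points rather than bytes. Using string.digits and string.punctuation in place of the hand-typed list is an assumption made for this example (string.punctuation holds the same 32 ASCII symbols), and the sample sentence is invented.

```python
import string


def text_to_words(the_text):
    """Return lowercase words with ASCII digits and punctuation removed."""
    # Characters to blank out: ASCII digits and ASCII punctuation.
    to_blank = string.digits + string.punctuation
    # Size the replacement from the source so the lengths cannot diverge.
    table = str.maketrans(
        string.ascii_uppercase + to_blank,
        string.ascii_lowercase + " " * len(to_blank),
    )
    # Characters missing from the table, such as Chinese text, pass through.
    return the_text.translate(table).split()


print(text_to_words("Hello, World! 测试 123"))  # ['hello', 'world', '测试']
```

For a UTF-8 file, opening it with open(path, encoding="utf-8") yields str objects, so the table applies character by character. Note that full-width punctuation such as "，" is not in the table and would be left in place unless it is added to the search string.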
