python3.x - How to use maketrans in python in utf-8 files

Question

I wrote a script that replaces every symbol in a text with a space, using maketrans and translate in Python. It works when the input file is ASCII-encoded, but with a UTF-8 file it raises an error saying that the arguments to maketrans are not of equal length...

滿天的星座 · Answer

First of all, the two strings are not the same length: the escape sequence \" is a single character, and \\ is also a single character.
You can check this with len().
Also, for questions about strings, it is best to state which version of Python you are using.
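
To make the len() check concrete, here is a minimal sketch, assuming Python 3 (where maketrans is a static method of str). The names source and target are just illustrative. It shows that each escape sequence counts as one character, and that mismatched lengths are what trigger the error.

```python
import string

# Each escape sequence in a string literal stands for a single character.
assert len("\"") == 1  # escaped double quote
assert len("\\") == 1  # escaped backslash

source = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\"
# Derive the replacement from the source so the two lengths cannot drift apart.
target = string.ascii_lowercase + " " * (len(source) - len(string.ascii_lowercase))
print(len(source), len(target))  # 68 68

table = str.maketrans(source, target)  # equal lengths, so this succeeds

try:
    str.maketrans(source, target + "  ")  # two spaces too many
except ValueError as exc:
    print("maketrans rejected the arguments:", exc)
```

Computing the number of spaces instead of typing them by hand avoids the miscount that happens when \" and \\ are counted as two characters each.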

maketrans parameter lengths are not equal

 my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                            ")

Test code:

# -*- coding: utf-8 -*-
# Python 2 only: string.maketrans does not exist in Python 3.
from string import maketrans

def text_to_words(the_text):
    """ 
        Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                          ")
    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

print(text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~\'\\测试'))

Output:

['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']

This is the result under Python 2, where the text is a byte string, so the two Chinese characters show up as their UTF-8 bytes.
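
For Python 3, which the question title mentions, a rough equivalent could look like the sketch below. It is not the original poster's code. It assumes the text is already a str and uses the one-argument (mapping) form of str.maketrans, so there are no two strings whose lengths have to match.

```python
import string

def text_to_words(the_text):
    """Return the lowercase words in the_text, without digits or punctuation."""
    removable = string.digits + string.punctuation
    # A one-argument maketrans takes a mapping, so there are no lengths to match.
    table = str.maketrans(dict.fromkeys(removable, " "))
    # str.lower() also lowercases non-ASCII letters, unlike the A-Z mapping above.
    return the_text.translate(table).lower().split()

sample = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\测试"
words = text_to_words(sample)
print(words)  # ['abcdefghijklmnopqrstuvwxyz', '测试']
```

When the input comes from a UTF-8 file, opening it with open(path, encoding="utf-8") (path being whatever file you are processing) gives you str objects, so the same function applies unchanged.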
