Python - How do I write this regex?

import re

s = u'\ud83d\udc8b'
co = re.compile(u'\ud83d\udc8b')
co.sub(u'', s)
print(u'\ud83d ')

The output is as follows:
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed

s is probably a Weibo emoji. I spent the whole afternoon on it and still couldn't get it to display. I then tried to replace it instead, but the pattern doesn't seem to match. Why?

天蓬老师 2729 days ago 702

All replies (2)

  • 高洛峰

    高洛峰 2017-05-27 17:41:31

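    There are two questions here:
    1. Why can't it be displayed? 2. Why doesn't the replacement match?
    Answers:

    1. The terminal cannot encode this string for output. In Python 3, u'\ud83d\udc8b' is two separate surrogate code points rather than a single emoji character, and UTF-8 cannot encode surrogates, which is what the UnicodeEncodeError reports. If the text is displayed in a UI, the UI's encoding needs to be set accordingly.

    2. The pattern does match, but sub() returns a new string and leaves s unchanged, so the result has to be assigned to a variable. Try the following code: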

    import re
    s = u'hello \ud83d\udc8b world'
    co = re.compile( u'\ud83d\udc8b')
    ss = co.sub(u'',s)
    print(ss)

    Run result:

    hello  world
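
    If the goal is to work with the emoji itself rather than the two surrogate halves, one possible approach is to merge the pair into a single code point first. The following is only a sketch, assuming Python 3 and that every surrogate in the text is part of a valid high/low pair (an unpaired surrogate would make the decode step raise an error). The names join_surrogates, fixed, kiss and cleaned are made up for this example.

```python
import re

# Two surrogate code points, as in the question (Python 3 assumed).
s = 'hello \ud83d\udc8b world'


def join_surrogates(text):
    """Merge UTF-16 surrogate pairs into the characters they stand for."""
    # 'surrogatepass' lets the codec write surrogates as raw UTF-16 code
    # units; decoding then combines each high/low pair into one code point.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')


fixed = join_surrogates(s)

# The pair has collapsed into one character, so the string is one shorter.
print(len(s), len(fixed))

# Match the emoji by its real code point, U+1F48B, and remove it.
kiss = re.compile('\U0001f48b')
cleaned = kiss.sub('', fixed)
print(cleaned)
```

    After the merge, fixed can be encoded as UTF-8, so writing it to a file or a UTF-8 terminal should no longer raise the "surrogates not allowed" error.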

  • 黄舟

    黄舟 2017-05-27 17:41:31

    I copied them all
