s = u'\ud83d\udc8b'
co = re.compile( u'\ud83d\udc8b')
co.sub(u'',s)
print(u'ud83d ')
The output is as follows
UnicodeEncodeError: 'utf-8' codec can't encode character 'ud83d' in position 0: surrogates not allowed
s is probably a Weibo emoticon, but it couldn’t be displayed after working on it all afternoon. I thought about replacing it, but it couldn’t be matched. Why?
高洛峰2017-05-27 17:41:31
First of all, there are 2 questions
1. Why can’t it be displayed? 2. I want to replace it but why can’t it match?
Answer
2. Try the following code
import re
s = u'hello \ud83d\udc8b world'
co = re.compile( u'\ud83d\udc8b')
ss = co.sub(u'',s)
print(ss)