Home  >  Q&A  >  body text

Regular expression - How to match Chinese Pinyin using Python?

For example, use regular expressions to match the pinyin of shá.
ps: What I said before may not be clear. I used the word "for example", which means that there is pinyin in the text to be processed, but I don't know what the specific pinyin is. You need to find out these pinyin, and the text to be processed will be in Chinese. , pinyin, symbols (,.: and the like), so please do not answer questions such as re.search(u'shá',text) It needs to be regular, not a simple fixed string. . .

ringa_leeringa_lee2701 days ago1689

reply all(3)I'll reply

  • 巴扎黑

    巴扎黑2017-05-27 17:41:30

    import re
    regex = re.compile(r'\b[a-z]*[āáǎàōóǒòêēéěèīíǐìūúǔùǖǘǚǜüńňǹɑɡ]+[a-z]*\b')
    text = "Thǐs ís à pìnyin abóut shá"
    m = regex.findall(text)
    print(m)

    Matching results:
    ['ís', 'à', 'pìnyin', 'abóut', 'shá']
    The first Thǐs is not matched because the default pinyin is all lowercase, excluding uppercase.

    reply
    0
  • PHPz

    PHPz2017-05-27 17:41:30

    Do you want to match all legal pinyin?

    If so, you can find the pinyin index of a dictionary and put all the pinyin | together. It can only be like this, because Pinyin is not defined according to regular rules or some other mechanical rules. This is all you can do if you don’t miss anything and don’t have too many, and there aren’t many anyway.

    reply
    0
  • 伊谢尔伦

    伊谢尔伦2017-05-27 17:41:30

    >>> import re
    >>> d='shá'
    >>> data='This is a pinyin about shá'
    >>> re.search(d,data)
    <_sre.SRE_Match at 0x404e308>

    reply
    0
  • Cancelreply