
Python: matching the text of specific sections

a='''
[Scene: Central Perk, Chandler, Joey, Phoebe, and Monica are there.]
Monica: There's nothing to tell! He's just some guy I work with!
Joey: C'mon, you're going out with the guy! There's gotta be something wrong with him!
Chandler: All right Joey, be nice.? So does he have a hump? A hump and a hairpiece?
Phoebe: Wait, does he eat chalk?
[Scene: Chandler, Joey,abcsde.]
Phoebe: Just, 'cause, I don't want her to go through what I went through with Carl- oh!
Monica: Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex.
Chandler:## Sounds like a date to me.
[Scene: Joey.]
'''

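I have a block of text, a, shown above.
I want to extract the dialogue of each scene and store the results in a list. Scenes are separated by a header of the form [Scene: followed by an English sentence.], like the [Scene: ...] lines in the text above.
I tried to do this with a regular expression: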
paragraphs = re.findall('[Scene: w .](.*?)[Scene: w .]',a,re.S)

Nothing useful gets matched; paragraphs comes back empty.
What is the mistake, and how should I match the dialogue of each scene?
Thanks.

PHP中文网, 2687 days ago, 680

All replies (1)

  • 滿天的星座, 2017-05-18 10:59:26

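    There are two mistakes:
    The pattern is not written as a raw string.
    The [ is not escaped, so [Scene: w .] is read as a character class that matches one single character, not the literal scene header.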
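To make the second point concrete, here is a minimal sketch comparing the unescaped brackets with an escaped, raw-string version. The test strings and the variable names `unescaped` and `escaped` are made up for illustration and are not part of the original question.

```python
import re

# Unescaped brackets define a character class: each match is ONE character
# drawn from the set {S, c, e, n, ':', ' ', w, '.'}.
unescaped = re.findall("[Scene: w .]", "Scene")
print(unescaped)  # ['S', 'c', 'e', 'n', 'e']

# Escaped brackets in a raw string are literal, so the whole header matches.
escaped = re.findall(r"\[Scene: [\w\s,]+\.\]", "[Scene: Joey.]")
print(escaped)  # ['[Scene: Joey.]']
```

With the character class, the original pattern can only ever pair up individual characters, so it never captures the dialogue between two scene headers.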

    Here is my corrected code.

    paragraphs = re.findall(r"\[Scene: [\w\s,]+\.]\s([^[]+)\s(?=\[Scene: [\w\s,]+\.])", a, re.S)
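As a runnable check, the sketch below applies the pattern above, unchanged, to a shortened sample in the same shape as `a`. The `sample` text is a hypothetical abbreviation of the question's data. The `re.split` variant and its `header` pattern are an alternative I am assuming would suit the same task; they are not part of the original answer.

```python
import re

# Shortened, hypothetical sample shaped like the text `a` in the question.
sample = """
[Scene: Central Perk, Chandler, Joey.]
Monica: There's nothing to tell!
Joey: C'mon!
[Scene: Chandler, Joey,abcsde.]
Phoebe: Wait, does he eat chalk?
[Scene: Joey.]
"""

# Pattern from the answer: header, captured dialogue, then a lookahead for
# the next header so that header can start the following match.
pattern = r"\[Scene: [\w\s,]+\.]\s([^[]+)\s(?=\[Scene: [\w\s,]+\.])"
paragraphs = re.findall(pattern, sample, re.S)
print(paragraphs)

# Alternative sketch: split on any "[Scene: ...]" header and drop empty pieces.
header = r"\[Scene: [^\]]*\]"
parts = re.split(header, sample)
scenes = [p.strip() for p in parts[1:] if p.strip()]
print(scenes)
```

Both approaches return the same two dialogue blocks for this sample. The lookahead version needs a following header, so dialogue after the last header would be dropped, and it stops capturing at any `[` inside the dialogue. The split version would keep dialogue after the last header and accepts header punctuation outside `[\w\s,]`.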
    

    Python regular expression guide
    http://www.cnblogs.com/huxi/a...
