python - How to match the content I want when the pattern includes the <dl></dl> tag

1. When I include the <dl> tag in the pattern, the match comes back empty. How should I write the matching rule? Without the <dl> tag in the pattern, I get the content I want.
2. The code in question

pattern = re.compile(r'<dl>.*?<dd><a href="(.*?)">(.*?)</a></dd>.*?</dl>')

3. Without the <dl> tag in the pattern, I do get the content I want

4. The web page source is attached below

<dl>
                <dt>《明末工程师》正文</dt>
                <dd><a href="/book/1440/xx">第一章 穿越后的窘境</a></dd>
</dl>
Asked by ringa_lee, 2764 days ago

All replies (2)

  • 黄舟

    2017-05-18 10:51:18

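    # You may need to add a flag:
    # re.S (re.DOTALL) makes . match every character, including newlines
    import re

    pattern = re.compile(r'<dl>.*?<dd><a href="(.*?)">(.*?)</a></dd>.*?</dl>', re.S)
    # a is the page source as a string
    print(pattern.findall(a))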
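To make the effect of the flag concrete, here is a minimal sketch that runs the question's pattern against the posted page source, once without and once with re.S. The variable names (html, regex, without_flag, with_flag) are illustrative and not from the original post.

```python
import re

# Page source copied from the question, line breaks included.
html = """<dl>
                <dt>《明末工程师》正文</dt>
                <dd><a href="/book/1440/xx">第一章 穿越后的窘境</a></dd>
</dl>"""

regex = r'<dl>.*?<dd><a href="(.*?)">(.*?)</a></dd>.*?</dl>'

# Without re.S, "." stops at the newline right after <dl>, so nothing matches.
without_flag = re.findall(regex, html)

# With re.S, "." also matches newlines, so the pattern can span several lines.
with_flag = re.findall(regex, html, re.S)

print(without_flag)  # []
print(with_flag)     # [('/book/1440/xx', '第一章 穿越后的窘境')]
```

Note that this pattern yields at most one link per <dl> block, because each match consumes everything up to the closing </dl>. If a block holds several <dd> entries, it may be simpler to first capture the text between <dl> and </dl> and then search that text for the <dd> links.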

  • 迷茫

    2017-05-18 10:51:18

    // The / needs to be escaped
    <dl>.*?<dd><a href="(.*?)">(.*?)<\/a><\/dd>.*?<\/dl>
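As far as Python's re module goes, the forward slash is not a special character, so escaping it is optional. It matters in languages that delimit regex literals with slashes, such as JavaScript. The sketch below, using a made-up multi-line sample string rather than the real page, suggests the escaped pattern behaves the same as the unescaped one and still needs re.S to match across lines.

```python
import re

# Hypothetical sample with the same structure as the page source in the question.
html = '<dl>\n<dt>title</dt>\n<dd><a href="/book/1440/xx">Chapter 1</a></dd>\n</dl>'

plain = r'<dl>.*?<dd><a href="(.*?)">(.*?)</a></dd>.*?</dl>'
escaped = r'<dl>.*?<dd><a href="(.*?)">(.*?)<\/a><\/dd>.*?<\/dl>'

# Escaping alone does not help: "." still cannot cross the line breaks.
escaped_no_flag = re.findall(escaped, html)

# With re.S, both spellings give the same result.
escaped_dotall = re.findall(escaped, html, re.S)
plain_dotall = re.findall(plain, html, re.S)

print(escaped_no_flag)                 # []
print(escaped_dotall == plain_dotall)  # True
```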
