Keep specific HTML tags when splitting a string

Question

I need to split a string on a specific set of tags (...) and keep those tags in the result. I came up with a regex pattern that joins the tags with |.

P粉787806024 · Answer

To answer your specific questions:

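<(p|li|ul|ol|dl|h1|h2|h3|h4|h5|h6)>[^<]*</\1>

Then match instead of splitting.

\1 refers to whatever was captured in the opening tag, so the closing tag must have the same name.

Something like:

import re

for match in re.finditer(r"<(p|li|ul|ol|dl|h1|h2|h3|h4|h5|h6)>[^<]*</\1>", subject, re.DOTALL):
    print(match.group(0))  # the whole element, tags included; subject is your input string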
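As a self-contained sketch of the match-instead-of-split idea, the snippet below assumes the pattern ends with a closing tag written as </\1>, which is what the remark about \1 implies. The names TAG_PATTERN, extract_blocks and sample are made up for illustration and do not come from the question.

```python
import re

# \1 is a backreference: the closing tag must repeat the captured tag name.
TAG_PATTERN = re.compile(r"<(p|li|ul|ol|dl|h1|h2|h3|h4|h5|h6)>[^<]*</\1>")


def extract_blocks(subject):
    """Return every <tag>text</tag> block found, tags included."""
    return [m.group(0) for m in TAG_PATTERN.finditer(subject)]


# Hypothetical input; text outside the listed tags ("intro") is skipped.
sample = "<h1>Title</h1>intro<p>First</p><li>Item</li>"
print(extract_blocks(sample))
# ['<h1>Title</h1>', '<p>First</p>', '<li>Item</li>']
```

Because [^<]* stops at the first <, an element that contains other tags never matches as a whole: for "<ul><li>a</li></ul>" only "<li>a</li>" is returned.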

However, in most real cases this is not sufficient to handle HTML: the pattern only matches elements that have no attributes and no nested tags. You should consider a DOM parser instead.
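For comparison, here is a rough sketch of the parser route. It uses the standard library's html.parser.HTMLParser, which is an event-based parser rather than a full DOM, chosen here only so the example has no third-party dependencies. The names KEEP, VOID, BlockCollector and split_blocks are hypothetical, and the code assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

KEEP = {"p", "li", "ul", "ol", "dl", "h1", "h2", "h3", "h4", "h5", "h6"}
VOID = {"br", "hr", "img", "input", "meta", "link"}  # tags with no end tag


class BlockCollector(HTMLParser):
    """Collect the outermost KEEP elements, markup included."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished top-level blocks
        self._buf = []    # pieces of the block being built
        self._depth = 0   # nesting depth inside a kept element

    def handle_starttag(self, tag, attrs):
        if self._depth or tag in KEEP:
            self._buf.append(self.get_starttag_text())
            if tag not in VOID:
                self._depth += 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID:
            self._buf.append(f"</{tag}>")
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def split_blocks(html):
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks


print(split_blocks("<h1>Title</h1>intro<ul><li>a</li><li>b</li></ul>"))
# ['<h1>Title</h1>', '<ul><li>a</li><li>b</li></ul>']
```

Unlike the regex, this keeps a nested list together as one block and tolerates attributes on the tags. Note that with the parser's default settings, character references such as &amp; are decoded in the collected text.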