How to Extract Matches Between Two Strings in Logs with a Regex?

Mary-Kate Olsen
2024-10-23 22:17:02

Regex to Extract Matches Between Two Strings

Given a large log file containing multi-line strings enclosed by specific start and end markers, the goal is to extract and print only the shortest such strings. However, the start marker also appears elsewhere in the file, so a simple pattern that runs from the first "start" to the next "end" would capture too much.
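To make the problem concrete, here is a minimal sketch of the naive approach, assuming a plain lazy pattern start.*?end and a small made-up sample (the sample text and variable names are illustrative, not taken from a real log). The lazy quantifier stops at the first "end", but the match still begins at the first "start", so the stray markers are swallowed:

```python
import re

# Illustrative sample: two stray "start" markers precede the real block.
sample = "start spam\nstart rubbish\nstart keep me\nend\n"

# Naive lazy pattern: stops at the first "end" but begins at the first "start".
naive = re.findall(r"start.*?end", sample, re.S)
print(naive)  # a single match that begins at "start spam"
```

The single match covers all three "start" lines, which is exactly what we want to avoid.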

To address this, we can employ the following regular expression:

(start((?!start).)*?end)

This regex matches strings that:

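  • Begin with "start", followed by characters consumed one at a time, each only if the negative lookahead (?!start) confirms that another "start" does not begin at that position.
  • End at the first "end" that follows, because the quantifier *? is lazy.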
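The pieces of the pattern may be easier to follow when spelled out with re.VERBOSE. The sketch below is an annotated variant, not the article's exact expression: the inner group is written as non-capturing, which does not change what is matched, and the test string is made up for illustration.

```python
import re

# Annotated variant of the pattern; whitespace and comments inside the
# pattern are ignored because of re.VERBOSE.
pattern = re.compile(
    r"""
    start            # opening marker
    (?:              # repeat, one character at a time:
        (?!start)    #   refuse to step onto another opening marker
        .            #   otherwise consume any character (newlines too)
    )*?              # lazily, so the match stops at the first closing marker
    end              # closing marker
    """,
    re.VERBOSE | re.DOTALL,
)

m = pattern.search("start a start b end")
print(m.group(0))  # start b end
```

The first "start" is rejected because the lookahead fails before any "end" is reached, so the match begins at the second one.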

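Using Python's re.findall function with the re.S (re.DOTALL) flag, which lets . match newlines as well, we can retrieve all such strings. Here the input is an inline sample; for a real log, read the file's contents into text first: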

<code class="python">import re

text = """
start spam
start rubbish
start wait for it...
    profit!
here end
start garbage
start second match
win. end
"""

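# The inner group is non-capturing, (?:...), so that re.findall returns
# whole matches instead of (group 1, group 2) tuples.
matches = re.findall(r'start(?:(?!start).)*?end', text, re.S)
print(matches)</code>

This outputs the desired result:

['start wait for it...\n    profit!\nhere end', 'start second match\nwin. end']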
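For real logs the markers are rarely the literal words "start" and "end". The following is a rough sketch of how the same technique might be wrapped in a reusable helper; the function name extract_blocks, the file name app.log and the marker strings are all hypothetical. It uses re.escape so that markers containing regex metacharacters are treated literally, and it assumes the file fits in memory.

```python
import re

def extract_blocks(text, start_marker, end_marker):
    """Return the shortest start..end blocks that contain no further start marker."""
    s, e = re.escape(start_marker), re.escape(end_marker)
    # Same structure as the pattern above, built from the escaped markers.
    pattern = re.compile(f"{s}(?:(?!{s}).)*?{e}", re.DOTALL)
    return [m.group(0) for m in pattern.finditer(text)]

# Hypothetical usage: "app.log", "BEGIN TRACE" and "END TRACE" are placeholders.
# with open("app.log", encoding="utf-8") as fh:
#     for block in extract_blocks(fh.read(), "BEGIN TRACE", "END TRACE"):
#         print(block)
```

Using finditer with group(0) sidesteps the question of capture groups entirely, since it always yields the whole match.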
