
Regular expression with golang to match complete lines containing "error" or "warning" (case insensitive)

王林
2024-02-13 08:21:10


This article shows how to use regular expressions in Go to match complete lines containing "error" or "warning" (case-insensitive). Regular expressions are a text pattern-matching tool for finding content in a string that fits a specific pattern. In Go they are provided by the regexp package, and a pattern is compiled with the Compile (or MustCompile) function before use. The question and answer below walk through line matching in detail.

Question content

I want to print to the user every complete line of a log file that contains warn or error (case-insensitive).

So, given:

[01-17|18:53:38.179] info server/server.go:381 this would be skipped
[01-17|18:53:38.280] info server/server.go:620 this also
[01-17|18:53:41.180] warn server/server.go:388 something is warned, so show this
[01-17|18:53:41.394] warn server/server.go:188 something reported an ->error<-
[01-17|18:53:41.395] error server/server.go:191 blabla
[01-17|18:53:41.395] debug server/server.go:196 obviously skipped
[01-17|18:53:41.395] debug server/server.go:196 this debug contains an ->error<- so match this
[01-17|18:53:41.395] warn server/server.go:198 you get the idea

I want:

[01-17|18:53:41.180] warn server/server.go:388 something is warned, so show this
[01-17|18:53:41.394] warn server/server.go:188 something reported an ->error<-
[01-17|18:53:41.395] error server/server.go:191 blabla
[01-17|18:53:41.395] debug server/server.go:196 this debug contains an ->error<- so match this
[01-17|18:53:41.395] warn server/server.go:198 you get the idea

I started naively with:

errorRegEx := regexp.MustCompile(`(?is)error|warn`)

But this only prints (from a different run, so it may not match the example above exactly):

warn
error

Then I thought I should change it to match more:

errorRegEx := regexp.MustCompile(`(?is).*error.*|.*warn.*`)

But this doesn’t print anything at all

How do I get every complete line in which warn or error (case-insensitive) matches?

PS: This is different from the suggested duplicate about regex matching lines containing a string, because this question is specifically about Go, which doesn't seem to use exactly the same standard engine.

Solution

The question has been marked as a duplicate, and the OP commented as follows:

This question is marked as a duplicate, and the linked post has plenty of answers from which we could try to piece together an answer to the OP's question, but it would still be incomplete, because those answers seem to relate to PCRE, whereas Go uses RE2.

var logs = `
[01-17|18:53:38.179] info server/server.go:381 this would be skipped
[01-17|18:53:38.280] info server/server.go:620 this also
[01-17|18:53:41.180] warn server/server.go:388 something is warned, so show this
[01-17|18:53:41.394] warn server/server.go:188 something reported an ->error<-
[01-17|18:53:41.395] error server/server.go:191 blabla
[01-17|18:53:41.395] debug server/server.go:196 obviously skipped
[01-17|18:53:41.395] debug server/server.go:196 this debug contains an ->error<- so match this
[01-17|18:53:41.395] warn server/server.go:198 you get the idea
`

func init() {
    // Drop the leading and trailing newlines of the raw string literal.
    logs = strings.TrimSpace(logs)
}

First of all, I don't understand why this printed nothing for the OP:

Then I thought I should change it to match more:

errorRegEx := regexp.MustCompile(`(?is).*error.*|.*warn.*`)

But this doesn’t print anything at all

because it should print everything:

fmt.Println("original regexp:")
reOriginal := regexp.MustCompile(`(?is).*error.*|.*warn.*`)
// -1 asks for all matches, not just the first n.
lines := reOriginal.FindAllString(logs, -1)

fmt.Println("match\t\tentry")
fmt.Println("=====\t\t=====")
for i, line := range lines {
    fmt.Printf("%d\t\t%q\n", i+1, line)
}
original regexp:
match           entry
=====           =====
1               "[01-17|18:53:38.179] info server/server.go:381 this would be skipped\n[01-17|18:53:38.280] info server/server.go:620 this also\n[01-17|18:53:41.180] warn server/server.go:388 something is warned, so show this\n[01-17|18:53:41.394] warn server/server.go:188 something reported an ->error<-\n[01-17|18:53:41.395] error server/server.go:191 blabla\n[01-17|18:53:41.395] debug server/server.go:196 obviously skipped\n[01-17|18:53:41.395] debug server/server.go:196 this debug contains an ->error<- so match this\n[01-17|18:53:41.395] warn server/server.go:198 you get the idea"
The s flag in (?is)... lets the dot (.) match newlines^1, and because your stars (*) are greedy^2, they match the entire string as soon as "error" or "warn" is found anywhere in it.
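The effect of the s flag can be seen in isolation with a minimal sketch. The helper name findAll and the two-line input are made up for illustration and are not part of the original answer:

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll is a hypothetical helper: it compiles pattern and returns
// all non-overlapping matches found in text.
func findAll(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	// Illustrative input, not taken from the original logs.
	text := "info ok\nwarn disk full"

	// With s, the dot also matches "\n", so the greedy .* spans both lines.
	fmt.Printf("%q\n", findAll(`(?is).*warn.*`, text))
	// prints ["info ok\nwarn disk full"]

	// Without s, the dot stops at "\n", so the match stays on one line.
	fmt.Printf("%q\n", findAll(`(?i).*warn.*`, text))
	// prints ["warn disk full"]
}
```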

The real solution is to not match "\n" with dots - remove the s flag and you get what you want:

fmt.Println("whole text:")
// No s flag: the dot no longer matches "\n", so each match stays on one line.
reWholeText := regexp.MustCompile(`(?i).*error.*|.*warn.*`)
lines = reWholeText.FindAllString(logs, -1)

fmt.Println("match\t\tentry")
fmt.Println("=====\t\t=====")
for i, line := range lines {
    fmt.Printf("%d\t\t%q\n", i+1, line)
}
whole text:
match           entry
=====           =====
1               "[01-17|18:53:41.180] warn server/server.go:388 something is warned, so show this"
2               "[01-17|18:53:41.394] warn server/server.go:188 something reported an ->error<-"
3               "[01-17|18:53:41.395] error server/server.go:191 blabla"
4               "[01-17|18:53:41.395] debug server/server.go:196 this debug contains an ->error<- so match this"
5               "[01-17|18:53:41.395] warn server/server.go:198 you get the idea"

Now we match between instances of "\n" (effectively lines), and because we use the All form, which only finds non-overlapping matches:

If 'All' is present, the routine matches successive non-overlapping matches of the entire expression. ^3

we get complete and distinct lines.

You can tighten this regex slightly:

`(?i).*(?:error|warn).*` // anything before either "error" or "warn", and anything after (within a line)

(?:...) is a non-capturing group^1, used here because you don't seem to care about the individual "error" or "warn" instances inside each match.
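As a rough sketch of the tightened pattern in use, the block below wraps it in a hypothetical helper, matchingLines, and runs it on made-up input. It also compares NumSubexp for the non-capturing and capturing variants to show that (?:...) adds no submatch:

```go
package main

import (
	"fmt"
	"regexp"
)

// reTight is the tightened pattern from the answer.
var reTight = regexp.MustCompile(`(?i).*(?:error|warn).*`)

// matchingLines is a hypothetical helper: it returns every line of text
// that contains "error" or "warn", ignoring case.
func matchingLines(text string) []string {
	return reTight.FindAllString(text, -1)
}

func main() {
	// Illustrative input, not taken from the original logs.
	text := "INFO started\nWARN low memory\ndebug hit an Error path"
	for _, line := range matchingLines(text) {
		fmt.Printf("%q\n", line)
	}
	// prints "WARN low memory" and "debug hit an Error path"

	// A non-capturing group adds no parenthesized subexpression.
	fmt.Println(reTight.NumSubexp())                                    // 0
	fmt.Println(regexp.MustCompile(`(?i).*(error|warn).*`).NumSubexp()) // 1
}
```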

Also, I still want to show that splitting by line before trying to match gives you more control/precision and makes the regex very easy to reason about:

r := strings.NewReader(logs)
scanner := bufio.NewScanner(r)

fmt.Println("line-by-line:")
reLine := regexp.MustCompile(`(?i)error|warn`)

fmt.Println("match\tline\tentry")
fmt.Println("=====\t====\t=====")

var matchNo, lineNo, match = 1, 1, ""
// Scan splits the input on newlines, so the regex only ever sees one line.
for scanner.Scan() {
    line := scanner.Text()
    match = reLine.FindString(line)
    if match != "" {
        fmt.Printf("%d\t%d\t%q\n", matchNo, lineNo, line)
        matchNo++
    }
    lineNo++
}
line-by-line:
match   line    entry
=====   ====    =====
1       3       "[01-17|18:53:41.180] warn server/server.go:388 something is warned, so show this"
2       4       "[01-17|18:53:41.394] warn server/server.go:188 something reported an ->error<-"
3       5       "[01-17|18:53:41.395] error server/server.go:191 blabla"
4       7       "[01-17|18:53:41.395] debug server/server.go:196 this debug contains an ->error<- so match this"
5       8       "[01-17|18:53:41.395] warn server/server.go:198 you get the idea"

All three examples are located in this playground.


Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.