Find similar text using regular expressions

王林, reposted on 2024-02-14 19:03:08

PHP editor Youzi: Regular expressions are a powerful text-matching tool, and with fuzzy-matching extensions they can even find text that is only similar to a pattern. They are used throughout string processing, data extraction and input validation, and their flexibility makes complex text operations much easier to handle. Whether you are a beginner or an experienced developer, regular expressions are an essential skill. Let's take a closer look.

Question

I have a list of texts recognized from different PDF documents. Now I need to extract some values from each text using regular expressions. Some of my patterns look like this:

some text[ -]?(.+)[ ,-]+some other text

The problem is that some letters may be wrong after recognition ("0" instead of "o", "i" instead of "l", etc.). That's why my pattern doesn't match.
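A quick way to reproduce the failure with Python's standard re module. The question's pattern is kept exactly as written; the OCR string is the one used in the answer below, and the clean string is a hypothetical error-free version of it.

```python
import re

# The question's pattern, unchanged: "some text" and "some other text" are literal anchors.
strict = re.compile(r"some text[ -]?(.+)[ ,-]+some other text")

clean = "some text my_value, some other text"  # hypothetical error-free input
ocr = "s0me text my_value, some otner text"    # "0" for "o", "n" for "h"

m = strict.search(clean)
print(repr(m.group(1)))    # 'my_value,'  (the greedy (.+) keeps the comma)
print(strict.search(ocr))  # None: a single misread character breaks a literal anchor
```

Note that the greedy (.+) also swallows the comma, because [ ,-]+ only needs the single space before "some other text"; the answer below uses the lazy (.+?) instead.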

I would like something like a regular expression with Jaro-Winkler or Levenshtein similarity, so that I can extract my_value from "s0me text my_value, some other text" and similar strings.

I know this sounds like wishful thinking, but maybe there is a solution to this problem.

By the way, I'm using Java, but solutions in other languages are acceptable.
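The Levenshtein-based matching the question asks for can be sketched without regular expressions, using only Python's standard library (Python to match the answer below). This is a swapped-in technique, not something from the thread: Sellers' approximate substring search, a Levenshtein DP whose first row is all zeros so a match may start anywhere in the text. The names fuzzy_find and extract_between are hypothetical, the 2-error budget mirrors the answer's {e<=2}, and Jaro-Winkler is not covered.

```python
from typing import Optional, Tuple


def _rank(cell: Tuple[int, int]) -> Tuple[int, int]:
    # Lower edit cost first; among equal costs, prefer the later begin (tighter match).
    cost, begin = cell
    return cost, -begin


def fuzzy_find(pattern: str, text: str, max_errors: int,
               start: int = 0) -> Optional[Tuple[int, int, int]]:
    """Closest approximate occurrence of pattern in text[start:].

    Levenshtein DP in Sellers' form: row 0 is all zeros, so a match may begin
    at any text position for free. Each cell holds (edit cost, begin index).
    Returns (begin, end, cost), or None if the best cost exceeds max_errors.
    """
    n = len(text) - start
    prev = [(0, start + c) for c in range(n + 1)]
    for i in range(1, len(pattern) + 1):
        cur = [(i, start)]  # pattern[:i] against empty text: i deletions
        for c in range(1, n + 1):
            mismatch = int(pattern[i - 1] != text[start + c - 1])
            cur.append(min(
                (prev[c - 1][0] + mismatch, prev[c - 1][1]),  # match or substitution
                (prev[c][0] + 1, prev[c][1]),                 # pattern char missing from text
                (cur[c - 1][0] + 1, cur[c - 1][1]),           # extra char in text
                key=_rank,
            ))
        prev = cur
    # Best cost over all end positions; ties go to the earliest end.
    c_best = min(range(n + 1), key=lambda c: (prev[c][0], c))
    cost, begin = prev[c_best]
    if cost > max_errors:
        return None
    return begin, start + c_best, cost


def extract_between(text: str, left: str, right: str,
                    max_errors: int = 2) -> Optional[str]:
    """Text between fuzzy matches of left and right, trimmed of ' ,-'."""
    lhs = fuzzy_find(left, text, max_errors)
    if lhs is None:
        return None
    rhs = fuzzy_find(right, text, max_errors, start=lhs[1])
    if rhs is None:
        return None
    return text[lhs[1]:rhs[0]].strip(" ,-")


print(extract_between("s0me text my_value, some otner text",
                      "some text", "some other text"))  # my_value
```

Each anchor costs O(len(pattern) x len(text)), which is fine for short OCR lines. The search is case-sensitive; lowercasing both sides first is a simple extension, and the same DP ports directly to Java.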

Solution

If you use Python's regex module, you can use fuzzy matching. The following regular expression allows up to 2 errors per phrase. You can also set finer-grained error constraints (separately for insertions, substitutions and deletions); see the regex module documentation for details.

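import regex  # third-party module: pip install regex

txt = 's0me text my_value, some otner text'
# {e<=2}: each phrase may contain up to 2 errors (insertions, deletions or substitutions)
pattern = regex.compile(r'(?:some text){e<=2}[ -]?(.+?)[ ,-]+(?:some other text){e<=2}')

m = pattern.search(txt)
if m is not None:
    print(m.group(1))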

Output:

my_value
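The error budget can also be split by type. A minimal sketch, assuming the third-party regex package is installed: {s<=1} allows at most one substituted character per phrase and, because insertions and deletions are not listed, permits none of them, which suits OCR errors that swap one character for another. The match's fuzzy_counts attribute reports the total (substitutions, insertions, deletions).

```python
import regex  # third-party package: pip install regex

txt = "s0me text my_value, some otner text"

# Each phrase tolerates one substituted character; insertions and deletions
# are not allowed, so each matched phrase keeps its exact length.
pattern = regex.compile(
    r"(?:some text){s<=1}[ -]?(.+?)[ ,-]+(?:some other text){s<=1}"
)

m = pattern.search(txt)
if m is not None:
    print(m.group(1))
    print(m.fuzzy_counts)  # (substitutions, insertions, deletions) for the match
```

Which constraint works best depends on the recognizer's typical mistakes; the regex documentation also describes weighted error costs.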

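Alternatively, the regular expression pattern (?i)(some\s*\w*\s*text\s*)([^,]+) captures phrases similar to "some text" (case-insensitive, with optional extra whitespace or one extra word in between), followed by all characters before the comma. It does not, however, tolerate misrecognized letters such as the "0" in "s0me".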
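A small check of this pattern with Python's standard re module, assuming the capture group is ([^,]+). The sample strings are illustrative: the first two vary case, spacing and an extra word, and the third is the OCR string from the answer above.

```python
import re

# Case-insensitive; allows extra whitespace and one extra word between
# "some" and "text"; group 2 is everything up to the next comma.
pattern = re.compile(r"(?i)(some\s*\w*\s*text\s*)([^,]+)")

samples = [
    "Some  text my_value, some other text",
    "some other text my_value, etc",
    "s0me text my_value, some otner text",
]
for sample in samples:
    m = pattern.search(sample)
    print(repr(sample), "->", m.group(2) if m else None)
```

The third string yields None: "s0me" fails the literal "some", and the later "some otner text" has nothing after it to capture.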


Statement:
This article is reproduced from stackoverflow.com. In case of infringement, please contact admin@php.cn to have it deleted.