Regular expression to Glob and vice versa conversion

PHPz
2024-02-06 11:03:15

Question content

We have a requirement to convert regular expressions to the globs supported by CloudFront, and vice versa. Is this possible in the first place, and if so, how can we achieve it? I am asking specifically about regex to glob: as I understand it, regex is a superset of glob, so it might not be possible to convert every regex to a corresponding glob.


Correct answer


To convert from a glob, you need to write a parser that splits the pattern into an abstract syntax tree (AST). For example, the glob *-{[0-9],draft}.docx might parse to [anything(), "-", oneof([range("0", "9"), "draft"]), ".docx"].
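The parsing step could be sketched roughly as follows in Python. This is only a minimal illustration, not a complete glob grammar: the node classes (Anything, AnyChar, Literal, CharClass, OneOf) and the parse_glob function are hypothetical names invented for this example, a bracket expression is stored as its raw body rather than as a separate range node, and backslash escapes are not handled.

```python
from dataclasses import dataclass


@dataclass
class Anything:
    """The * wildcard."""


@dataclass
class AnyChar:
    """The ? wildcard."""


@dataclass
class Literal:
    text: str


@dataclass
class CharClass:
    body: str       # e.g. "0-9" for [0-9]
    negated: bool   # True for [!...]


@dataclass
class OneOf:
    options: list   # one node list per comma-separated alternative


def _parse_seq(p, i, in_braces):
    """Parse from index i to the end, or to ',' / '}' when inside braces."""
    nodes, literal = [], []

    def flush():
        # Turn any pending literal characters into a single Literal node.
        if literal:
            nodes.append(Literal("".join(literal)))
            literal.clear()

    while i < len(p):
        c = p[i]
        if in_braces and c in ",}":
            break
        if c == "*":
            flush(); nodes.append(Anything()); i += 1
        elif c == "?":
            flush(); nodes.append(AnyChar()); i += 1
        elif c == "[" and p.find("]", i + 1) != -1:
            flush()
            end = p.find("]", i + 1)
            body = p[i + 1:end]
            negated = body.startswith("!")
            nodes.append(CharClass(body[1:] if negated else body, negated))
            i = end + 1
        elif c == "{":
            flush()
            options, i = [], i + 1
            while True:
                option, i = _parse_seq(p, i, True)
                options.append(option)
                if i >= len(p):
                    raise ValueError("unterminated brace group")
                i += 1                    # step over ',' or '}'
                if p[i - 1] == "}":
                    break
            nodes.append(OneOf(options))
        else:
            literal.append(c); i += 1     # ordinary character
    flush()
    return nodes, i


def parse_glob(pattern):
    nodes, _ = _parse_seq(pattern, 0, False)
    return nodes


print(parse_glob("*-{[0-9],draft}.docx"))
```

Because the brace handler calls the sequence parser recursively, nested wildcards such as {*.tar,*.gz} come out as ordinary node lists inside the OneOf node.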

You would then iterate over the AST and output the equivalent regular expression for each node. For example, you might use rules such as:

anything()  -> .*
range(x, y) -> [x-y]
oneof(x, y) -> (x|y)

This generates the regular expression .*-([0-9]|draft).docx.

This isn't perfect, because you also have to remember to escape any special characters. For example, . is a special character in a regex, so it must be escaped, resulting in .*-([0-9]|draft)\.docx.
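The translation rules, including the escaping step, could look something like the Python sketch below. The tuple-based AST and the names node_to_regex and ast_to_regex are assumptions made for this example. It uses re.escape from the standard library for literal text, and a non-capturing group (?:...) instead of a plain (...) so that alternatives do not create capture groups.

```python
import re


def node_to_regex(node):
    """Translate one AST node (a tuple tagged by its first element)."""
    kind = node[0]
    if kind == "anything":            # glob *
        return ".*"
    if kind == "literal":             # plain text: escape regex metacharacters
        return re.escape(node[1])
    if kind == "range":               # glob [x-y]
        return "[{}-{}]".format(re.escape(node[1]), re.escape(node[2]))
    if kind == "oneof":               # glob {a,b}: each option is a node list
        alternatives = [
            "".join(node_to_regex(n) for n in option) for option in node[1]
        ]
        return "(?:" + "|".join(alternatives) + ")"
    raise ValueError(f"unknown node kind: {kind!r}")


def ast_to_regex(ast):
    return "".join(node_to_regex(n) for n in ast)


# Hand-built AST for the glob *-{[0-9],draft}.docx
ast = [
    ("anything",),
    ("literal", "-"),
    ("oneof", [[("range", "0", "9")], [("literal", "draft")]]),
    ("literal", ".docx"),
]

pattern = re.compile(ast_to_regex(ast))
print(bool(pattern.fullmatch("report-draft.docx")))   # True
print(bool(pattern.fullmatch("report-draftXdocx")))   # False: "." is escaped
```

A glob has to match the whole name, so the sketch uses fullmatch. With search or match, the generated expression would need explicit anchors.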

Strictly speaking, not every regular expression can be converted to a glob pattern. The Kleene star operation does not exist in standard globbing; for example, the simple regular expression a* (that is, any number of a characters) cannot be converted to a glob pattern.
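The difference is easy to see by comparing the two meanings of a* side by side. The short Python check below uses fnmatchcase from the standard library as a stand-in for a typical shell-style glob matcher. The sample strings are arbitrary.

```python
import re
from fnmatch import fnmatchcase

samples = ["", "a", "aaa", "ab"]

# Regex a*: zero or more "a" characters and nothing else.
regex_hits = [s for s in samples if re.fullmatch(r"a*", s)]

# Glob a*: a literal "a" followed by any characters.
glob_hits = [s for s in samples if fnmatchcase(s, "a*")]

print(regex_hits)  # ['', 'a', 'aaa']
print(glob_hits)   # ['a', 'aaa', 'ab']
```

In a glob, the * stands on its own for "any characters". It cannot be applied as a repetition operator to the preceding element, which is what the regex version relies on.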

I'm not sure what kinds of globs CloudFront supports (searching the documentation for the term "glob" returns no matches), but here is some documentation on commonly supported shell glob pattern wildcards.

The following is a summary of some equivalent sequences:

Glob wildcard    Regular expression    Meaning
?                .                     Any single character
*                .*                    Zero or more characters
[a-z]            [a-z]                 Any character in the range
[!a-m]           [^a-m]                Any character not in the range
[abc]            [abc]                 One of the given characters
{cat,dog,bat}    (cat|dog|bat)         One of the given options
{*.tar,*.gz}     (.*\.tar|.*\.gz)      One of the given options, with nested wildcards
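For the first few rows of this table, a hand-written converter can be cross-checked against Python's standard library, whose fnmatch.translate function turns a shell-style pattern into an anchored regular expression string. The sketch below is only a sanity check under that assumption. fnmatch does not implement brace alternatives, so the last two rows still need a custom translation.

```python
import fnmatch
import re

# translate() handles ?, *, [seq] and [!seq] and anchors the result.
regex = re.compile(fnmatch.translate("[!a-m]?*.tar"))
print(bool(regex.match("zq-backup.tar")))  # True
print(bool(regex.match("aq-backup.tar")))  # False: "a" is in the excluded range

# Braces are not part of fnmatch syntax, so they are matched literally.
brace = re.compile(fnmatch.translate("{cat,dog}"))
print(bool(brace.match("cat")))            # False
print(bool(brace.match("{cat,dog}")))      # True
```

The exact regex text returned by translate differs between Python versions, so it is safer to compare matching behaviour than to compare the generated strings.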

Statement:
This article is reproduced from stackoverflow.com. If there is any infringement, please contact admin@php.cn to have it removed.