Regular Expression Tutorial - Subexpression Usage Analysis
This article uses examples to explain how subexpressions are used in regular expressions. The details are as follows:
Note: In all examples, the text matched by the regular expression is enclosed between [ and ] in the source text. Some examples are implemented in Java; where an example relies on Java-specific regular expression usage, it is explained at that point. All Java examples were tested under JDK 1.6.0_13.
1. Introduction to the problem
Let's start with an example. Some phrases, such as Windows 2000, consist of several words but belong together as a unit. In an HTML page, a non-breaking space (that is, &nbsp;) can be used to keep such a phrase on one line in the browser. Suppose we now want to match several of these spaces in a row:
Text: Your operation system is Windows&nbsp;&nbsp;2000.
Regular expression: &nbsp;{2,}
Result: Your operation system is Windows&nbsp;&nbsp;2000.
Analysis: The pattern is meant to match two or more consecutive non-breaking spaces, but the result shows that nothing is matched. The {2,} applies only to the semicolon, so &nbsp;{2,} can only match text such as &nbsp;;;;;;, that is, &nbsp followed by two or more consecutive semicolons.
As described earlier, a repetition metacharacter repeats only the single character immediately before it. So what should we do when we want to match a whole string several times?
2. Subexpression
This leads us to the subexpression. A subexpression is a part of a larger expression. Dividing an expression into subexpressions lets each one be treated as a single, independent element. A subexpression must be enclosed in ( and ). The regular expression in the previous example should therefore be written as (&nbsp;){2,}.
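The difference between the two patterns can be checked with a short Java sketch. It assumes the sample text contains two literal &nbsp; entities between "Windows" and "2000"; the class name NbspGroupDemo and the helper findAll are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NbspGroupDemo {
    // Hypothetical helper: collects every substring of text matched by regex.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample text with two literal &nbsp; entities.
        String text = "Your operation system is Windows&nbsp;&nbsp;2000.";

        // {2,} applies only to ';', so at least two semicolons in a row are required.
        System.out.println(findAll("&nbsp;{2,}", text));   // prints []

        // The parentheses make {2,} repeat the whole entity.
        System.out.println(findAll("(&nbsp;){2,}", text)); // prints [&nbsp;&nbsp;]
    }
}
```

Without the parentheses the quantifier binds to the semicolon alone, which is why the first call finds nothing.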
Let’s look at a regular expression that matches a valid year:
Text: 1988-11-13
Regular expression: (19|20)\d{2}
Result: [1988]-11-13
Analysis: To exclude meaningless years, this example restricts the first two digits of the year to 19 or 20. The | character is the OR operator in regular expressions. Here 19|20 must be placed in a subexpression, that is, (19|20). Otherwise the pattern 19|20\d{2} means "19, or 20 followed by two digits", so it matches a complete four-digit year only when the year starts with 20.
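As a rough illustration of why the grouping matters, the following Java sketch runs the year pattern with and without parentheses. The class name YearAlternationDemo, the helper firstMatch, and the second sample date 2008-08-08 are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearAlternationDemo {
    // Hypothetical helper: returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Grouped: (19|20) is one element, followed by two digits.
        System.out.println(firstMatch("(19|20)\\d{2}", "1988-11-13")); // prints 1988

        // Ungrouped: | splits the whole pattern into "19" or "20\d{2}".
        System.out.println(firstMatch("19|20\\d{2}", "1988-11-13"));   // prints 19
        System.out.println(firstMatch("19|20\\d{2}", "2008-08-08"));   // prints 2008
    }
}
```

Note that the backslash in \d has to be doubled inside a Java string literal.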
3. Nesting of subexpressions
Subexpressions can be nested, to multiple levels. In theory there is no limit on the nesting depth.
In the expression ((A)(B(C))), there are the following sub-expressions:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
There are 4 in total, numbered by the position of their opening parenthesis from left to right. Number 0 always represents the entire expression. The later section on back references introduces \n (where n is the number of the subexpression) for referring to a subexpression.
For an example of nested subexpressions, see the regular expression for matching IPv4 addresses later in this tutorial.
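The numbering can be inspected in Java with Matcher.group(n). The sketch below matches ((A)(B(C))) against the assumed input "ABC"; the class name NestedGroupDemo and the helper groups are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupDemo {
    // Hypothetical helper: returns group 0..groupCount() for a full match,
    // or an empty array if text does not match regex as a whole.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
        // prints:
        // 0: ABC
        // 1: ABC
        // 2: A
        // 3: BC
        // 4: C
    }
}
```

Group 0 and group 1 are identical here only because the outermost parentheses enclose the whole pattern.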
I hope this article is helpful to everyone learning regular expressions.