Regular Expressions - Example


Simple expressions

The simplest form of a regular expression is a single ordinary character that matches itself in the search string. For example, a single-character pattern such as A will always match the letter A no matter where it appears in the search string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

Several single characters can be combined to form larger expressions. For example, the following regular expression combines the single-character expressions a, 7, and M:

/a7M/

Please note that there is no concatenation operator. Just type one character after another.
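As a minimal sketch of the point above (assuming a JavaScript engine, since the patterns here use the /pattern/ literal form), the following checks a single-character pattern and a concatenated one. The sample strings are made up for illustration.

```javascript
// Each ordinary character matches itself; writing characters side by side
// concatenates them into one longer pattern.
const single = /M/;
const combined = /a7M/;

console.log(single.test("Mary had a little lamb")); // true
console.log(combined.test("xxa7Mxx"));              // true: "a7M" appears in order
console.log(combined.test("a7 M"));                 // false: the space breaks the sequence
```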

Character match

The period (.) matches any single printing or non-printing character in the string, with one exception: the newline character (\n). The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:

/a.c/

To match a string that contains a file name, where a period (.) is a literal part of the input string, precede the period in the regular expression with a backslash (\) character. To illustrate, the following regular expression matches filename.ext:

/filename\.ext/
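The difference between the wildcard period and the escaped period can be sketched as follows, assuming JavaScript and its default behavior (no s flag). The variable names and test strings are illustrative only.

```javascript
const anyMiddle = /a.c/;               // . stands for any single character except newline
const literalDot = /filename\.ext/;    // \. matches only a real period
const unescapedDot = /filename.ext/;   // . here also accepts other characters

console.log(anyMiddle.test("a#c"));               // true
console.log(anyMiddle.test("a\nc"));              // false: . does not match a newline
console.log(literalDot.test("filename.ext"));     // true
console.log(literalDot.test("filenameXext"));     // false
console.log(unescapedDot.test("filenameXext"));   // true: the unescaped . matches X
```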

These expressions only let you match "any" single character. You may instead need to match one character from a specific list. For example, you might want to find numbered chapter headings (Chapter 1, Chapter 2, and so on).

Bracket expressions

To create a list of matching character groups, place one or more single characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a "bracket expression". An ordinary character represents itself within square brackets as in any other position, that is, it matches itself once in the input text. Most special characters lose their meaning when they appear inside a square bracket expression. However, there are some exceptions, such as:

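  • The ] character ends a list if it is not the first item. To match the ] character in a list, place it first, immediately after the opening [. Some engines, JavaScript among them, do not support this placement and require the escaped form \] instead.

  • The \ character continues to act as an escape character. To match the \ character, use \\.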
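A short sketch of these rules in JavaScript follows. Note the assumption: JavaScript treats [] as an empty list, so the "place ] first" form does not work there and the escaped form \] is used instead. Names and sample strings are illustrative.

```javascript
// Inside brackets, . and * lose their special meaning and match themselves.
const dotOrStar = /[.*]/;
console.log(dotOrStar.test("a*b")); // true
console.log(dotOrStar.test("abc")); // false

// ] and \ keep their special roles, so both are escaped here.
const closeOrBackslash = /[\]\\]/;
console.log(closeOrBackslash.test("a]b"));  // true
console.log(closeOrBackslash.test("a\\b")); // true: the string holds one backslash
console.log(closeOrBackslash.test("ab"));   // false

// In JavaScript, [] is an empty list that never matches, so placing ]
// first does not produce a list containing ].
const closeFirst = /[]a]/;
console.log(closeFirst.test("]")); // false
```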

Characters enclosed in bracket expressions match only a single character at that position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Note that the position of the word Chapter and the following space is fixed relative to the characters within the square brackets. The bracket expression specifies a character set that matches only the single character position immediately following the word Chapter and a space. This is the ninth character position.

To use a range instead of the characters themselves to represent a matching group of characters, use a hyphen (-) to separate the starting and ending characters in the range. The character value of a single character determines the relative order within the range. The following regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

When a range is specified in this way, both the start value and the end value are included in the range. Note also that the start value must come before the end value in Unicode sort order.
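The equivalence of the list and the range, and the ordering requirement, can be checked with a small sketch. This assumes JavaScript, where an out-of-order range is rejected with a SyntaxError when the pattern is compiled; the sample titles are made up.

```javascript
const list = /Chapter [12345]/;
const range = /Chapter [1-5]/;

// Both patterns accept and reject the same inputs.
for (const title of ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter  2"]) {
  console.log(title, list.test(title), range.test(title));
}

// A range whose start comes after its end cannot be compiled.
let reversedRangeError = null;
try {
  new RegExp("Chapter [5-1]");
} catch (err) {
  reversedRangeError = err;
}
console.log(reversedRangeError instanceof SyntaxError); // true
```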

To include a hyphen in a bracket expression, use one of the following methods:

  • Escape it with a backslash:

    [\-]
  • Put a hyphen at the beginning or end of the bracket list. The following expression matches all lowercase letters and hyphens:

    [-a-z]
    [a-z-]
  • Create a range in which the starting character value is less than the hyphen and the ending character value is equal to or greater than the hyphen. The following two regular expressions satisfy this requirement:

    [!--]
    [!-~]
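The three ways of including a hyphen can be compared in a brief sketch, assuming JavaScript. The ^, + and $ around the brackets are added here only so that the whole test string must consist of listed characters; they are not part of the technique itself.

```javascript
const escaped = /^[a-z\-]+$/;   // hyphen escaped with a backslash
const leading = /^[-a-z]+$/;    // hyphen first in the list
const trailing = /^[a-z-]+$/;   // hyphen last in the list

for (const re of [escaped, leading, trailing]) {
  console.log(re.test("well-known"), re.test("Well-known")); // true false
}

// Ranges that contain the hyphen (code 0x2D).
const upToHyphen = /^[!--]$/;   // from ! (0x21) through - (0x2D)
const printable = /^[!-~]$/;    // from ! (0x21) through ~ (0x7E)
console.log(upToHyphen.test("-")); // true
console.log(upToHyphen.test("+")); // true: + (0x2B) lies inside the range
console.log(upToHyphen.test("a")); // false
console.log(printable.test("-"));  // true
console.log(printable.test(" "));  // false: space (0x20) is below the range
```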

To find all characters that are not in a list or range, place the caret (^) at the beginning of the list. If the caret appears anywhere else in the list, it matches itself. The following regular expression matches Chapter followed by a space and any digit or other character except 1, 2, 3, 4, or 5:

/Chapter [^12345]/

In the above example, the expression matches any digit or other character except 1, 2, 3, 4, or 5 in the ninth position. So, for example, Chapter 7 is a match, and Chapter 9 is also a match.

The above expression can also be written as a range, using a hyphen (-):

/Chapter [^1-5]/

A typical use of bracket expressions is to match any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/
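The negated lists and the alphanumeric list described above can be exercised with a small sketch, assuming JavaScript. The inputs are illustrative.

```javascript
const notOneToFive = /Chapter [^1-5]/;
console.log(notOneToFive.test("Chapter 7")); // true
console.log(notOneToFive.test("Chapter 3")); // false: 3 is excluded
console.log(notOneToFive.test("Chapter X")); // true: any character outside 1-5 qualifies
console.log(notOneToFive.test("Chapter "));  // false: some character must follow the space

// A caret that is not first in the list matches itself.
const oneOrCaret = /[1^]/;
console.log(oneOrCaret.test("^")); // true
console.log(oneOrCaret.test("2")); // false

// Any uppercase letter, lowercase letter, or digit.
const alphanumeric = /[A-Za-z0-9]/;
console.log(alphanumeric.test("---q---")); // true
console.log(alphanumeric.test("-- ! --")); // false
```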

Alternation and Grouping

Alternation uses the | character to allow a choice between two or more alternatives. For example, you can extend the chapter heading regular expression to match more than just chapter headings. However, it's not as simple as you might think. Alternation matches the largest possible expression on either side of the | character.

You may think that the following expression matches either Chapter or Section at the beginning of a line, followed by one or two digits at the end of the line:

/^Chapter|Section [1-9][0-9]{0,1}$/

Unfortunately, the above regular expression either matches the word Chapter at the beginning of the line, or matches the word Section and the number that follows it at the end of the line. If the input string is Chapter 22, then the above expression only matches the word Chapter. If the input string is Section 22, then this expression matches Section 22.

To make the regular expression behave as intended, you can use parentheses to limit the scope of the alternation (the | operator), i.e., ensure that it only applies to the two words Chapter and Section. However, parentheses are also used to create subexpressions and possibly capture them for later use, as discussed in the section on backreferences. You can make the regular expression match Chapter 1 or Section 3 by adding parentheses at the appropriate places in the regular expression above.

The following regular expression uses parentheses to combine Chapter and Section so that the expression works correctly:

/^(Chapter|Section) [1-9][0-9]{0,1}$/
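The difference between the ungrouped and grouped forms can be seen by comparing what each one actually matches. This is a minimal sketch assuming JavaScript; the variable names are illustrative.

```javascript
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

// Without parentheses, | splits the whole pattern in two.
console.log("Chapter 22".match(ungrouped)[0]); // "Chapter"
console.log("Section 22".match(ungrouped)[0]); // "Section 22"

// With parentheses, | only chooses between the two words.
console.log("Chapter 22".match(grouped)[0]);   // "Chapter 22"
console.log("Section 22".match(grouped)[0]);   // "Section 22"

// The ungrouped form also accepts input the grouped form rejects.
console.log(ungrouped.test("Chapter 22 and more")); // true
console.log(grouped.test("Chapter 22 and more"));   // false
```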

Although this expression works correctly, the parentheses around Chapter|Section also capture whichever of the two words matched, for later use. Since there is only one set of parentheses in the above expression, only one "submatch" is captured.

In the above example, you only need the parentheses to group the words Chapter and Section. To prevent the match from being saved for later use, place ?: immediately after the opening parenthesis, before the pattern. The following modification provides the same behavior without saving the submatch:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
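The effect of ?: on what gets saved can be sketched by inspecting the result of String.prototype.match, assuming JavaScript. Without the g flag, the returned array holds the full match followed by one entry per capturing group.

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const withCapture = "Section 3".match(capturing);
const withoutCapture = "Section 3".match(nonCapturing);

console.log(withCapture[0], withCapture.length);       // "Section 3" 2
console.log(withCapture[1]);                           // "Section": the saved submatch
console.log(withoutCapture[0], withoutCapture.length); // "Section 3" 1: nothing saved
```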

In addition to the ?: metacharacter, two other non-capturing metacharacters create what are called "lookahead" matches. Positive lookahead is specified using ?= and matches the search string at a point where the parenthesized pattern matches the text that follows. Negative lookahead is specified using ?! and matches the search string at a point where the parenthesized pattern does not match the text that follows. In both cases the lookahead text itself is not part of the match.

For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document to change all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression (which is an example of positive lookahead) matches the word Windows and the space after it wherever they are followed by 95, 98, or NT and another space:

/Windows (?=95 |98 |NT )/

After a match is found, the search for the next match begins immediately after the matched text, not including the characters in the lookahead. For example, if the expression above matches in Windows 98, the search will continue after Windows and its trailing space instead of after 98.
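A sketch of lookahead in JavaScript follows. It assumes the version is separated from the word Windows by a space, so the patterns here include that space; the sample text and the names found and resumeAt are made up for illustration.

```javascript
const text = "Windows 3.1, Windows 95 and Windows NT are listed.";

// Positive lookahead: match "Windows " only when 95, 98, or NT and a space follow.
const positive = /Windows (?=95 |98 |NT )/g;
const found = [];
let m;
while ((m = positive.exec(text)) !== null) {
  // lastIndex shows where the next search resumes: right after the
  // matched text, before the version that the lookahead only inspected.
  found.push({ match: m[0], index: m.index, resumeAt: positive.lastIndex });
}
console.log(found.length);   // 2: the 3.1 reference is skipped
console.log(found[0].match); // "Windows " without the version
console.log(text.slice(found[0].resumeAt, found[0].resumeAt + 2)); // "95"

// Negative lookahead: match "Windows " only when none of those versions follow.
const negative = /Windows (?!95 |98 |NT )/g;
console.log(text.match(negative).length); // 1: only the 3.1 reference
```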

Other Examples

Some regular expression examples are listed below:

Regular Expression
    Description

/\b([a-z]+) \1\b/gi
    Locates a word that appears twice in a row.

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/
    Splits a URL into protocol, domain, port, and relative path.

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
    Locates chapter and section headings.

/[-a-z]/
    Matches any of the 26 letters a through z, or a hyphen.

/ter\b/
    Matches the ter in chapter, but not the ter in terminal.

/\Bapt/
    Matches the apt in chapter, but not the apt in aptitude.

/Windows (?=95 |98 |NT )/
    Matches Windows and the following space when they are followed by 95, 98, or NT and a space. After a match is found, the next search starts right after the matched text, before the version.

/^\s*$/
    Matches empty lines.

/\d{2}-\d{5}/
    Verifies an ID number consisting of two digits, a hyphen, and five digits.

/<\s*(\S+)(\s[^>]*)?>[\s\S]*<\s*\/\1\s*>/
    Matches an HTML element, from its opening tag through the closing tag with the same name.
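Two of the examples above, the repeated-word pattern and the URL pattern, can be tried out with a short sketch, assuming JavaScript. The sentence and the URL are made-up sample inputs.

```javascript
// Repeated words: \1 refers back to whatever the first group captured.
const doubledWord = /\b([a-z]+) \1\b/gi;
const sentence = "Is is the cost of of gasoline going up up?";
const doubles = sentence.match(doubledWord);
console.log(doubles); // ["Is is", "of of", "up up"]

// URL parts: each pair of parentheses captures one component.
const urlParts = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
const [, protocol, domain, port, path] =
  "http://www.example.com:80/docs/index.htm".match(urlParts);
console.log(protocol); // "http"
console.log(domain);   // "www.example.com"
console.log(port);     // ":80"
console.log(path);     // "/docs/index.htm"
```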