
How to write regex to match a group of characters

php中世界最好的语言
2018-03-30 09:51:07

This article shows how to write a regular expression that matches one character out of a group of characters, and what to watch out for when doing so. Let's work through some practical examples.

This article is part of a regular expression tutorial and describes how to match a group of characters. It is shared here for your reference:

Note: In all examples, the parts of the source text matched by the regular expression are enclosed in 【 and 】. Some examples are implemented in Java; where a usage is specific to Java's own regex support, this is explained at the relevant point. All Java examples were tested under JDK 1.6.0_13.

1. Match one of multiple characters

The previous article, "Regular Expression Tutorial: Detailed Explanation of Matching a Single Character", matched text files starting with na or sa using the regular expression .a.\.txt. If there is also a file named cal.txt, that pattern will match it too. What should we do if we only want to match files starting with na or sa?

Since we only want n or s in the first position, the metacharacter ., which matches any character, obviously won't do. In regular expressions, we can use the metacharacters [ and ] to define a character set. All characters between these two metacharacters are members of the set, and the character set matches any single character that is a member of the set. Let's look at an example similar to the previous one:

Text:

sales.txt

na1.txt

na2.txt

sa1.txt

sanatxt.txt

cal.txt

Regular expression:

[ns]a.\.txt

Result:

sales.txt

【na1.txt】

【na2.txt】

【sa1.txt】

sanatxt.txt

cal.txt

Analysis: The regular expression used here starts with [ns]. This set matches the character n or s and no other character. [ and ] themselves do not match any character; they only define the set. Next, a matches the character a, . matches any single character, \. matches a literal . character, and txt matches the literal text txt. The matching results are consistent with our expectations.

However, if there were a file named usa1.txt, it would also be matched, because the pattern finds sa1.txt inside it. This is a position-matching problem, which will be discussed later.
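The following minimal Java sketch is not from the original article. It applies [ns]a.\.txt to the sample file names plus usa1.txt and wraps each match in 【 and 】, the way the results above are shown. The class name CharacterSetMatch and the helper highlight are hypothetical names chosen for illustration. Only java.util.regex.Pattern and Matcher are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: run the character-set pattern over sample file names and mark each match
public class CharacterSetMatch {

    // Unicode escapes for the 【 and 】 brackets used in the article's results
    static final String OPEN = "\u3010";
    static final String CLOSE = "\u3011";

    // Hypothetical helper: returns text with every match of regex wrapped in OPEN/CLOSE
    static String highlight(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement stops $ or \ inside the match from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(OPEN + m.group() + CLOSE));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The regex \. must be written as "\\." inside a Java string literal
        String regex = "[ns]a.\\.txt";
        String[] files = {"sales.txt", "na1.txt", "na2.txt", "sa1.txt",
                          "sanatxt.txt", "cal.txt", "usa1.txt"};
        for (String f : files) {
            System.out.println(highlight(regex, f));
        }
    }
}
```

It should print the same results as above, plus u【sa1.txt】 for usa1.txt, which illustrates the positional problem just mentioned.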

2. Use character set ranges

In the example above, what if we only want to match files that start with na or sa followed by a digit? In the regular expression [ns]a.\.txt, . matches any character, not just digits. This problem can be solved with a character set:

Text:

sales.txt

na1.txt

na2.txt

sa1.txt

san.txt

sanatxt.txt

cal.txt

Regular expression: [ns]a[0123456789]\.txt

Result:

sales.txt

【na1.txt】

【na2.txt】

【sa1.txt】

san.txt

sanatxt.txt

cal.txt

Analysis: As the results show, only files whose names start with na or sa followed by a digit are matched. san.txt was not matched because the character set [0123456789] restricts the third character to a digit.

Some character ranges, such as 0-9 and a-z, are used frequently. To simplify defining them, regular expressions provide the special metacharacter - for specifying a character range. The example above can therefore be written as [ns]a[0-9]\.txt, and the result is exactly the same.
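To check that listing the digits and using the range 0-9 give the same result, here is a small Java sketch. The class RangeEquivalence and the helper findAll are hypothetical names. It collects every match of both patterns over the sample file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeEquivalence {

    // Hypothetical helper: collects every substring of the inputs that the regex finds
    static List<String> findAll(String regex, String... inputs) {
        Pattern p = Pattern.compile(regex);
        List<String> found = new ArrayList<String>();
        for (String s : inputs) {
            Matcher m = p.matcher(s);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[] files = {"sales.txt", "na1.txt", "na2.txt", "sa1.txt",
                          "san.txt", "sanatxt.txt", "cal.txt"};
        List<String> listed = findAll("[ns]a[0123456789]\\.txt", files); // digits listed one by one
        List<String> ranged = findAll("[ns]a[0-9]\\.txt", files);        // same set written as a range
        System.out.println(listed);                // [na1.txt, na2.txt, sa1.txt]
        System.out.println(listed.equals(ranged)); // true
    }
}
```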

The character range is not limited to numbers. The following are legal character ranges:

[A-F]: Matches all uppercase letters from A to F.

[A-Z]: Matches all uppercase letters from A to Z.

[A-z]: Matches all characters from ASCII A to ASCII z. This range is generally not used and is shown only as an example, because it also includes characters such as [ and ^, which lie between Z and a in the ASCII table.
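A quick way to see why [A-z] is a poor choice is to test the characters that sit between Z and a in ASCII. This Java sketch (hypothetical class AsciiRangeDemo) loops over those characters and keeps the ones [A-z] still accepts.

```java
import java.util.regex.Pattern;

public class AsciiRangeDemo {

    // Returns the characters strictly between 'Z' and 'a' that [A-z] still matches
    static String extrasInAtoZ() {
        Pattern az = Pattern.compile("[A-z]");
        StringBuilder sb = new StringBuilder();
        for (char c = 'Z' + 1; c < 'a'; c++) {
            if (az.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(extrasInAtoZ()); // [\]^_`
        // [A-Za-z] covers only the letters, so ^ is rejected
        System.out.println(Pattern.matches("[A-Za-z]", "^")); // false
    }
}
```

It prints the six punctuation characters between Z and a. This is why two separate ranges, [A-Za-z], are normally used to express "any ASCII letter".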

The first and last characters of the character interval can be any character in the ASCII character list. But in actual use, the most commonly used ranges are numbers and alphabetic characters.

Note: When defining a character range, the last character must not be smaller than the first; a range such as [9-0] is not allowed. The - character is a metacharacter only between [ and ]; anywhere outside [ and ] it is an ordinary character that matches only a literal -.
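In Java, the reversed-range rule is enforced when the pattern is compiled: Pattern.compile throws a PatternSyntaxException for [9-0]. The sketch below (hypothetical class HyphenDemo and helper compiles) checks this. It also shows - matching itself outside a character set. Other regex engines may report the error differently.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class HyphenDemo {

    // Hypothetical helper: true if Java accepts the regex, false if compilation fails
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[0-9]")); // true
        System.out.println(compiles("[9-0]")); // false: last character smaller than the first
        // Outside [ and ], - is an ordinary character that matches itself
        System.out.println(Pattern.matches("a-c", "a-c")); // true
        System.out.println(Pattern.matches("a-c", "b"));   // false
    }
}
```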

Multiple character ranges can be given in the same character set. For example, [0-9a-zA-Z] matches any digit or any uppercase or lowercase letter.

Let’s look at an example of matching colors in a web page:

Text:

<span style="background-color:#3636FF;height:30px; width:60px;">Test</span>

Regular expression: #[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]

Result: <span style="background-color:【#3636FF】;height:30px; width:60px;">Test</span>

Analysis: In web pages, a color is usually written as an RGB value prefixed with #, where R stands for red, G for green, and B for blue; any color can be produced by combining different RGB values. RGB values are written in hexadecimal: for example, #000000 is black, #FFFFFF is white, and #FF0000 is red. Therefore, the regular expression for matching colors in a web page starts with # followed by six repetitions of the character set [0-9A-Fa-f] (this can be abbreviated as #[0-9A-Fa-f]{6}, which will be discussed later under repetition matching).
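The color example can be reproduced with the Java sketch below (hypothetical class HexColorDemo and helper colors). It runs both the spelled-out pattern and the shorter {6} form mentioned above against the sample HTML. java.util.regex supports the {n} quantifier, which this tutorial covers later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexColorDemo {

    // Six hexadecimal character sets written out, as in the article
    static final String LONG_FORM =
        "#[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]";
    // The same pattern using the {6} repetition quantifier
    static final String SHORT_FORM = "#[0-9A-Fa-f]{6}";

    // Hypothetical helper: returns every color code the regex finds in html
    static List<String> colors(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        List<String> found = new ArrayList<String>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<span style=\"background-color:#3636FF;height:30px; width:60px;\">Test</span>";
        System.out.println(colors(LONG_FORM, html));  // [#3636FF]
        System.out.println(colors(SHORT_FORM, html)); // [#3636FF]
    }
}
```

The pattern does not check what follows the sixth digit. It would therefore also find #3636FF inside a longer run such as #3636FFAA. Position matching, discussed later, addresses that.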

3. Negating a character set

A character set usually specifies a group of characters, one of which must be matched. In some cases, however, we need the opposite: a group of characters that must not be matched. In other words, any character except those in the set can be matched.

For example, to match files whose names begin with na or sa followed by a character that is not a digit:

Text:

sales.txt

na1.txt

na2.txt

sa1.txt

sanatxt.txt

san.txt

Regular expression: [ns]a[^0-9]\.txt

Result:

sales.txt

na1.txt

na2.txt

sa1.txt

sanatxt.txt

【san.txt】

Analysis: The pattern in this example is the opposite of the previous one: [0-9] matched only digits, while [^0-9] matches any character that is not a digit.
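Here is the negated set applied in Java. This is a sketch with a hypothetical class NegatedSetDemo and a findAll helper. Running it over the sample names should find only san.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedSetDemo {

    // Hypothetical helper: collects every substring of the inputs that the regex finds
    static List<String> findAll(String regex, String... inputs) {
        Pattern p = Pattern.compile(regex);
        List<String> found = new ArrayList<String>();
        for (String s : inputs) {
            Matcher m = p.matcher(s);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[] files = {"sales.txt", "na1.txt", "na2.txt", "sa1.txt",
                          "sanatxt.txt", "san.txt"};
        // [^0-9] accepts any single character that is not a digit
        System.out.println(findAll("[ns]a[^0-9]\\.txt", files)); // [san.txt]
    }
}
```

Keep in mind that [^0-9] matches any character other than a digit, including punctuation. A name such as na..txt would therefore also be matched.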

Note: ^ placed immediately after [ means negation. If ^ appears at the beginning of a regular expression, outside a character set, it is used for position matching, which will be discussed later. Also, ^ applies to every character and character range in the set, not just the character or range immediately after it. For example, [^0-9a-z] matches any character that is neither a digit nor a lowercase letter.
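The sketch below (hypothetical class CaretDemo and helper keep) tests each character of a sample string against a single-character set. It shows that the leading ^ negates the whole set [^0-9a-z]. It also shows that, in Java's regex engine, a ^ that is not the first character inside [ and ] is treated as a literal character.

```java
import java.util.regex.Pattern;

public class CaretDemo {

    // Hypothetical helper: keeps only the characters of text matched by the one-character set
    static String keep(String charSet, String text) {
        Pattern p = Pattern.compile(charSet);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            String c = text.substring(i, i + 1);
            if (p.matcher(c).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "ab12CD-^";
        // Leading ^ negates both ranges: digits and lowercase letters are dropped
        System.out.println(keep("[^0-9a-z]", text)); // CD-^
        // ^ not in first position is an ordinary member of the set
        System.out.println(keep("[0-9^]", text));    // 12^
    }
}
```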

4. Summary

The metacharacters [ and ] define a character set, which matches any one of the characters in the set. A character set can be defined in two ways: by listing every character, or by using the metacharacter - to give a character range. A character set can be negated with the metacharacter ^, which excludes the listed characters from the match, so that any character except those in the set can be matched.

In the next article, we will discuss the use of some metacharacters in regular expressions.

