
Detailed explanation of the use of regular expression metacharacters

Mar 30, 2018 am 09:49 AM

This article explains how to use regular expression metacharacters and what to watch out for when using them, working through practical examples.

Note: In all examples, the text matched by the regular expression is enclosed between 【 and 】 in the source text. Some examples are implemented in Java; where a behavior is specific to Java's regular expressions, this is pointed out in the relevant place. All Java examples were tested under JDK 1.6.0_13.

1. Escape special characters

Metacharacters are characters that have a special meaning in regular expressions. Because of that special meaning, they cannot be used to represent themselves. To match a metacharacter literally, escape it by preceding it with a backslash; the resulting escape sequence matches the character itself rather than its special meaning. For example, to match [ and ], you must write them as \[ and \].

Escaping is done with the backslash character \, which means that \ is itself a metacharacter. To match a literal \, it must be escaped as \\, for example when matching a Windows file path.
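As a minimal sketch of the escaping rules above in Java: the class and method names here (EscapeDemo, maskBrackets, countBackslashes) are made up for illustration. Note that a Java string literal has its own escaping layer, so each regex backslash is written twice in source code.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Replaces every literal '[' or ']' with an underscore.
    // Regex: \[|\]  (each backslash is doubled inside a Java string literal)
    static String maskBrackets(String s) {
        return s.replaceAll("\\[|\\]", "_");
    }

    // Counts the backslashes in a string such as a Windows file path.
    // Regex: \\  matches one literal backslash; in Java source that is "\\\\".
    static int countBackslashes(String path) {
        return path.split("\\\\", -1).length - 1;
    }

    public static void main(String[] args) {
        System.out.println(maskBrackets("a[0]"));
        System.out.println(countBackslashes("C:\\Windows\\System32"));
        // Pattern.quote escapes a whole string so it is matched literally.
        System.out.println(Pattern.matches(Pattern.quote("[x]"), "[x]"));
    }
}
```

When the text to match comes from user input, Pattern.quote is usually a simpler choice than escaping each metacharacter by hand.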

2. Match white space characters

Metacharacters can be roughly divided into two types: those used to match text (such as .) and those required by regular expression syntax (such as [ and ]).

When searching with regular expressions, we often need to match non-printing whitespace characters in the source text. For example, we may need to find all tab characters or all newline characters. Such characters are hard to type directly into a regular expression, so we use the special metacharacters listed below to enter them:

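[\b] Backspace (go back and delete one character); written inside a character class because a bare \b is a word boundary in most regular expression engines
\f Form feed character
\n Line feed character
\r Carriage return character
\t Tab character (Tab key)
\v Vertical tab character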

Let’s look at an example to remove blank lines from the file:

Text:

8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6

6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8

3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3

Regular expression: \r\n\r\n

Analysis: \r\n matches a carriage return + line feed pair, which Windows uses to mark the end of a line of text. Searching for \r\n\r\n matches two consecutive end-of-line markers, which is exactly where a blank line sits between two lines of text.

Note: Unix and Linux use only a line feed to end a line of text. To match blank lines on those systems, use \n\n without the \r. A regular expression that works on both Windows and Unix/Linux should include an optional \r and a required \n, that is, \r?\n\r?\n; the ? quantifier is discussed in a later article.

The Java code is as follows:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public static void matchBlankLine() throws Exception {
  BufferedReader br = new BufferedReader(new FileReader(new File("E:/九宫格.txt")));
  StringBuilder sb = new StringBuilder();
  char[] cbuf = new char[1024];
  int len;
  try {
    // Read once per iteration and append only the characters actually read;
    // read() returns -1 at the end of the stream.
    while ((len = br.read(cbuf)) > 0) {
      sb.append(cbuf, 0, len);
    }
  } finally {
    br.close();
  }
  // Two consecutive Windows line endings: a line end followed by a blank line.
  String reg = "\r\n\r\n";
  System.out.println("Original content:\n" + sb.toString());
  System.out.println("After processing:-----------------------------");
  System.out.println(sb.toString().replaceAll(reg, "\r\n"));
}

The running result is as follows:

Original content:
8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6

6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8

3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3

After processing:-----------------------------
8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6
6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8
3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3
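
The cross-platform pattern \r?\n\r?\n mentioned in the note above can be sketched as follows. This is an illustrative variant, not the article's tested code: the class and method names are hypothetical, and a capture group is added so that the replacement keeps whichever line ending the text already uses.

```java
class BlankLineDemo {
    // Collapses a line ending followed by an empty line into a single line ending.
    // Regex: (\r?\n)\r?\n  (optional CR, required LF, twice; group 1 is the first ending)
    static String removeBlankLines(String text) {
        return text.replaceAll("(\\r?\\n)\\r?\\n", "$1");
    }

    public static void main(String[] args) {
        String windows = "8 5 4\r\n\r\n6 9 3\r\n";
        String unix = "8 5 4\n\n6 9 3\n";
        System.out.println(removeBlankLines(windows).equals("8 5 4\r\n6 9 3\r\n"));
        System.out.println(removeBlankLines(unix).equals("8 5 4\n6 9 3\n"));
    }
}
```

Like the original pattern, this removes one blank line per match; a run of several consecutive blank lines would need a quantifier on the second half of the pattern.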

3. Match specific character categories

Character sets (which match any one of several characters) are the most common form of matching, and some frequently used character sets can be replaced by special metacharacters. These metacharacters match a whole class of characters and are called class metacharacters. They are not strictly necessary, because you can always match a class of characters by listing its members one by one or by defining a character range, but regular expressions built with them are concise and easy to read, so they are widely used in practice.

1. Match digits and non-digits

\d Any digit, equivalent to [0-9] or [0123456789]
\D Any non-digit, equivalent to [^0-9] or [^0123456789]
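
A small sketch of \d and \D in Java, under the assumption that the input is plain ASCII text; the class and method names are hypothetical.

```java
class DigitDemo {
    // Keeps only the digits by deleting every non-digit (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Hides every digit (\d) behind a '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 010-1234"));
        System.out.println(maskDigits("a1b22"));
        // \d and [0-9] accept the same single ASCII digit.
        System.out.println("7".matches("\\d") && "7".matches("[0-9]"));
    }
}
```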

2. Match alphanumeric and non-alphanumeric characters

Letters (A-Z in either case), digits, and the underscore form another commonly used character set. The following metacharacters can be used:

\w Any letter (upper- or lowercase), digit, or underscore, equivalent to [0-9a-zA-Z_]
\W Any character that is not a letter, digit, or underscore, equivalent to [^0-9a-zA-Z_]
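
As an illustration of \w and \W, the sketch below checks whether a string consists only of word characters and replaces everything else with a space. The names are hypothetical, and the + quantifier (one or more) is used although it has not been introduced yet.

```java
class WordCharDemo {
    // True when the whole string is made of letters, digits, and underscores.
    static boolean isWordOnly(String s) {
        return s.matches("\\w+");
    }

    // Replaces every non-word character (\W) with a single space.
    static String blankOutNonWord(String s) {
        return s.replaceAll("\\W", " ");
    }

    public static void main(String[] args) {
        System.out.println(isWordOnly("user_01"));
        System.out.println(isWordOnly("user-01"));
        System.out.println(blankOutNonWord("a-b!c"));
    }
}
```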

3. Match whitespace and non-whitespace characters

\s Any whitespace character, equivalent to [\f\n\r\t\v] (most implementations, including Java, also include the space character)
\S Any non-whitespace character, equivalent to [^\f\n\r\t\v]

Note: The backspace character is not included in \s.
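
The following sketch shows \s in Java, where the class is documented as [ \t\n\x0B\f\r] and therefore includes the space character but not backspace. The class and method names are hypothetical.

```java
class WhitespaceDemo {
    // Collapses every run of whitespace (\s+) into one space.
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // True when the string is exactly one whitespace character.
    static boolean isWhitespace(String ch) {
        return ch.matches("\\s");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a \t\r\n b"));
        System.out.println(isWhitespace("\t"));
        System.out.println(isWhitespace(" "));
        // "\b" in a Java string literal is the backspace character (U+0008).
        System.out.println(isWhitespace("\b"));
    }
}
```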

4. Match hexadecimal or octal values

Hexadecimal: written with the prefix \x. For example, \x0A corresponds to ASCII character 10 (the line feed character) and has the same effect as \n.
Octal: written with the prefix \0, followed by a value of two or three digits. For example, \011 corresponds to ASCII character 9 (the tab character) and has the same effect as \t.
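
A quick way to check the two equivalences above in Java; the class and method names are hypothetical.

```java
class CodePointEscapeDemo {
    // Regex \x0A: the character with hexadecimal value 0A (line feed).
    static boolean isHexLineFeed(String s) {
        return s.matches("\\x0A");
    }

    // Regex \011: the character with octal value 11 (tab).
    static boolean isOctalTab(String s) {
        return s.matches("\\011");
    }

    public static void main(String[] args) {
        System.out.println(isHexLineFeed("\n"));
        System.out.println(isOctalTab("\t"));
        System.out.println(isOctalTab(" "));
    }
}
```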

4. Use POSIX character classes

POSIX character classes are a shorthand supported by many regular expression implementations. Java supports them too (with its own syntax, described below), but JavaScript does not. The POSIX character classes are as follows:

[:alnum:] Any letter or digit, equivalent to [a-zA-Z0-9]
[:alpha:] Any letter, equivalent to [a-zA-Z]
[:blank:] Space or tab character, equivalent to [ \t]
[:cntrl:] ASCII control characters (ASCII 0 to 31, plus ASCII 127)
[:digit:] Any digit, equivalent to [0-9]
[:graph:] Any printable character, excluding the space
[:lower:] Any lowercase letter, equivalent to [a-z]
[:print:] Any printable character
[:punct:] Any character that belongs to neither [:alnum:] nor [:cntrl:]
[:space:] Any whitespace character, including the space, equivalent to [ \f\n\r\t\v]
[:upper:] Any uppercase letter, equivalent to [A-Z]
[:xdigit:] Any hexadecimal digit, equivalent to [a-fA-F0-9]

POSIX character classes are written differently from the metacharacters we have seen so far. Let's look at an example that uses a regular expression to match a color value in a web page:

Text: background-color:#3636FF;height:30px;width:60px;">Test

Regular expression: #[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]]

Result: background-color:【#3636FF】;height:30px;width:60px;">Test

Note: The pattern used here begins with [[ and ends with ]], which is required when using POSIX character classes. A POSIX class name must be enclosed between [: and :]. The outer [ and ] define a character set, while the inner [ and ] are part of the POSIX character class itself.

Java writes POSIX character classes differently. Instead of being enclosed between [: and :], a class starts with \p and its name is enclosed between { and }, with different capitalization. Java also adds \p{ASCII}. The Java classes are as follows:

\p{Alnum} Alphanumeric characters: [\p{Alpha}\p{Digit}]
\p{Alpha} Alphabetic characters: [\p{Lower}\p{Upper}]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Blank} Space or tab character: [ \t]
\p{Cntrl} Control characters: [\x00-\x1F\x7F]
\p{Digit} Decimal digits: [0-9]
\p{Graph} Visible characters: [\p{Alnum}\p{Punct}]
\p{Lower} Lowercase alphabetic characters: [a-z]
\p{Print} Printable characters: [\p{Graph}\x20]
\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Space} Whitespace characters: [ \t\n\x0B\f\r]
\p{Upper} Uppercase alphabetic characters: [A-Z]
\p{XDigit} Hexadecimal digits: [0-9a-fA-F]
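
The color-matching example can be rewritten with Java's \p{XDigit} class roughly as follows. This is a sketch under assumptions: the class and method names are hypothetical, and the match is marked with plain [ and ] through the $0 replacement reference (the whole match).

```java
class PosixColorDemo {
    // '#' followed by six hexadecimal digits, written with the Java POSIX class.
    static final String COLOR =
        "#\\p{XDigit}\\p{XDigit}\\p{XDigit}\\p{XDigit}\\p{XDigit}\\p{XDigit}";

    // Wraps every color value found in the text in square brackets.
    static String markColors(String css) {
        return css.replaceAll(COLOR, "[$0]");
    }

    public static void main(String[] args) {
        String css = "background-color:#3636FF;height:30px;width:60px;";
        System.out.println(markColors(css));
    }
}
```

The six repeated classes mirror the pattern shown in the article; a repetition quantifier would express the same thing more compactly.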
