Detailed introduction to Java regular expressions

Expression and meaning:

1. Characters
x The character x. For example, a represents the character a.
\\ The backslash character. In Java source code, write it as "\\\\". (Note: the Java compiler first reduces the string literal "\\\\" to the two-character regular expression \\, and the regex engine then reads \\ as a single literal backslash. For the same reason, the backslash of every escape sequence listed in this article must be written twice inside a Java string literal.)
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
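The double-escaping rule is easiest to see in running code. The following is a minimal sketch; the class and method names (EscapeDemo, isSingleBackslash, isTab) are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True if the whole input is exactly one backslash character.
    static boolean isSingleBackslash(String input) {
        // Source literal "\\\\" is the two-character regex \\, i.e. one literal backslash.
        return Pattern.matches("\\\\", input);
    }

    // True if the whole input is exactly one tab character.
    static boolean isTab(String input) {
        // Source literal "\\t" is the regex \t, which the regex engine reads as a tab.
        return Pattern.matches("\\t", input);
    }

    public static void main(String[] args) {
        System.out.println(isSingleBackslash("\\"));        // true
        System.out.println(isTab("\t"));                    // true
        System.out.println(Pattern.matches("\\x41", "A"));  // true, hexadecimal 41 is 'A'
        System.out.println(Pattern.matches("\\0101", "A")); // true, octal 101 is 'A'
    }
}
```

Note that the input strings ("\\" and "\t") are ordinary Java literals, so they follow only the Java string rules, not the regex rules.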
2. Character classes
[abc] a, b, or c (simple class). For example, [egd] matches the character e, g, or d.
[^abc] Any character except a, b, or c (negation). For example, [^egd] matches any character other than e, g, or d.
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
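To check what a union, intersection, or subtraction actually accepts, one option is to filter the alphabet through the class. This is only a sketch; keepOnly is a hypothetical helper written for this example.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns the characters of input that are matched by the given character class.
    static String keepOnly(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder kept = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        System.out.println(keepOnly("[a-d[m-p]]", letters));    // abcdmnop
        System.out.println(keepOnly("[a-z&&[def]]", letters));  // def
        System.out.println(keepOnly("[a-z&&[^bc]]", letters));  // adefghijklmnopqrstuvwxyz
        System.out.println(keepOnly("[a-z&&[^m-p]]", letters)); // abcdefghijklqrstuvwxyz
    }
}
```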
3. Predefined character classes (note that the backslash must be written twice in a Java string; for example, \d is written as \\d)
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
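The predefined classes are most often used through the String convenience methods. A small sketch follows; the helper names digitsOnly, words, and isWord are invented for this example.

```java
class PredefinedDemo {
    // Removes every non-digit character (\D) from the input.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits the trimmed input on runs of whitespace (\s+).
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // True if the input consists only of word characters (\w), at least one.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));             // 5550199
        System.out.println(String.join("|", words(" a  b\tc ")));    // a|b|c
        System.out.println(isWord("user_01"));                       // true
        System.out.println(isWord("user-01"));                       // false
    }
}
```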
4. POSIX character classes (US-ASCII only) (note that the backslash must be written twice in a Java string; for example, \p{Lower} is written as \\p{Lower})
\p{Lower} A lowercase alphabetic character: [a-z]
\p{Upper} An uppercase alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
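As a quick illustration of the POSIX classes, the sketch below strips punctuation and validates hexadecimal and alphanumeric input. The method names are hypothetical and chosen only for this example.

```java
class PosixDemo {
    // Removes every ASCII punctuation character.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True if the input is a non-empty run of hexadecimal digits.
    static boolean isHex(String s) {
        return s.matches("\\p{XDigit}+");
    }

    // True if the input is a non-empty run of ASCII letters and digits.
    static boolean isAlnum(String s) {
        return s.matches("\\p{Alnum}+");
    }

    public static void main(String[] args) {
        System.out.println(stripPunct("Hello, world!")); // Hello world
        System.out.println(isHex("CAFE42"));             // true
        System.out.println(isHex("xyz"));                // false
        System.out.println(isAlnum("abc-123"));          // false
    }
}
```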
5. java.lang.Character classes (simple Java character types)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()
6. Classes for Unicode blocks and categories
\p{InGreek} A character in the Greek block (simple block)
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
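The Unicode and java.lang.Character classes from sections 5 and 6 can be tried out as follows. This is a sketch under the assumption that Greek letters are written as Unicode escapes to keep the source ASCII-only; the helper names are invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // True if every character of the input lies in the Greek Unicode block.
    static boolean isGreek(String s) {
        return s.matches("\\p{InGreek}+");
    }

    // Collects all letters that are not uppercase, using the subtraction class.
    static String nonUppercaseLetters(String s) {
        Matcher m = Pattern.compile("[\\p{L}&&[^\\p{Lu}]]").matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isGreek("\u03b1\u03b2\u03b3"));          // true (alpha, beta, gamma)
        System.out.println(isGreek("abc"));                         // false
        System.out.println(nonUppercaseLetters("Hello World 42"));  // elloorld
        System.out.println("$".matches("\\p{Sc}"));                 // true
        System.out.println("a".matches("\\p{javaLowerCase}"));      // true
    }
}
```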
7. Boundary matchers
^ The beginning of a line. Use ^ at the beginning of the regular expression; for example, ^(abc) matches a string starting with abc. Note that by default ^ matches only at the beginning of the entire input; to make it match at the beginning of every line, set the MULTILINE flag when compiling, such as Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
$ The end of a line. Use $ at the end of the regular expression; for example, (^bca).*(abc$) matches a line starting with bca and ending with abc.
\b A word boundary. For example, \b(abc) matches abc at the beginning of a word (it is found in abcjj but not in jjabc), and (abc)\b matches abc at the end of a word (it is found in jjabc but not in abcjj).
\B A non-word boundary. For example, \B(abc)\B matches abc only in the middle of a word (jjabcjj matches, but jjabc and abcjj do not).
\A The beginning of the input
\G The end of the previous match (I personally find this one rarely useful). For example, \\Gdog looks for dog exactly where the previous match ended; on the first attempt, that position is the beginning of the input. Note that if the text at that position is not dog, there is no match, even if dog appears later.
\Z The end of the input but for the final line terminator, if any
\z The end of the input
A line terminator is a sequence of one or two characters that marks the end of a line of the input character sequence.
The following are recognized as line terminators:
-a newline (line feed) character ('\n'),
-a carriage-return character followed immediately by a newline character ("\r\n"),
-a standalone carriage-return character ('\r'),
-a next-line character ('\u0085'),
-a line-separator character ('\u2028'), or
-a paragraph-separator character ('\u2029').
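The boundary matchers are easier to compare side by side. The sketch below uses two hypothetical helpers, findAll and found, and sample strings chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects every match of the regex (compiled with the given flags) in order.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ^ matches once by default, and once per line with MULTILINE.
        System.out.println(findAll("^\\w+", 0, "one\ntwo"));                 // [one]
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, "one\ntwo")); // [one, two]
        // \b requires a word boundary before abc, \B forbids one.
        System.out.println(found("\\babc", "abcjj"));  // true
        System.out.println(found("\\babc", "jjabc"));  // false
        System.out.println(found("\\Babc", "jjabc"));  // true
        // \G chains matches: the third dog is not adjacent to the second.
        System.out.println(findAll("\\Gdog", 0, "dogdog cat dog")); // [dog, dog]
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(found("abc\\Z", "abc\n")); // true
        System.out.println(found("abc\\z", "abc\n")); // false
    }
}
```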
When compiling a pattern, one or more flags can be set, for example
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
The following six flags are described here:
-CASE_INSENSITIVE: Characters are matched without regard to case. By default, this flag only considers US-ASCII characters.
-UNICODE_CASE: When combined with CASE_INSENSITIVE, case-insensitive matching also applies to Unicode letters.
-MULTILINE: ^ and $ match the beginning and end of each line, rather than only the beginning and end of the entire input.
-UNIX_LINES: Only '\n' is treated as a line terminator when matching ^ and $ in multiline mode (and when matching .).
-DOTALL: When this flag is used, the . symbol matches all characters, including line terminators.
-CANON_EQ: Takes the canonical equivalence of Unicode characters into account.
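A short sketch of how some of these flags change matching. The helper full is hypothetical; the non-ASCII letters are written as Unicode escapes (lowercase and uppercase e-acute) so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True if the whole input matches the regex compiled with the given flags.
    static boolean full(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        int ci = Pattern.CASE_INSENSITIVE;
        // Flags are bit masks and are combined with the | operator.
        int ciUnicode = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

        System.out.println(full("abc", 0, "ABC"));   // false
        System.out.println(full("abc", ci, "ABC"));  // true
        // e-acute is outside US-ASCII, so CASE_INSENSITIVE alone is not enough.
        System.out.println(full("\u00e9", ci, "\u00c9"));        // false
        System.out.println(full("\u00e9", ciUnicode, "\u00c9")); // true
        // By default . does not match a line terminator.
        System.out.println(full("a.b", 0, "a\nb"));                   // false
        System.out.println(full("a.b", Pattern.DOTALL, "a\nb"));      // true
        // With UNIX_LINES, a carriage return no longer counts as a line terminator.
        System.out.println(full("a.b", Pattern.UNIX_LINES, "a\rb"));  // true
    }
}
```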
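8. Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n times, but not more than m times
9. Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n times, but not more than m times
10. Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n times, but not more than m times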
The difference between greedy, reluctant, and possessive quantifiers is as follows. (Note that it only matters when the quantified part can match in more than one way, for example with .*)
A greedy quantifier is called "greedy" because on its first attempt it reads the entire input string. If that first attempt (the entire input string) fails, the matcher backs off by one character from the end and tries again, repeating this process until a match is found or there are no characters left to back off from. Depending on the quantifier used in the expression, the last thing it tries to match against is 1 or 0 characters.
Reluctant quantifiers take the opposite approach: they start at the beginning of the input string and then reluctantly read one character at a time, looking for a match. The last thing they try is the entire input string.
Finally, a possessive quantifier always reads the entire input string and attempts a match once (and only once). Unlike a greedy quantifier, a possessive quantifier never backs off.
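The three behaviours can be observed on one input. The sketch below assumes the sample string xfooxxxxxxfoo and a hypothetical findAll helper; only the quantifier on .* changes between the three calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until foo fits at the end.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? grows one character at a time, so the first foo ends the match.
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ takes everything and never backs off, so foo cannot match.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```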
11. Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group. For example, (abc) captures abc as a whole. Groups are numbered by counting their opening parentheses from left to right. For example, in the expression ((A)(B(C))), there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
12. Back references
\n Whatever the nth capturing group matched. A group can be referenced inside the expression through \n. For example, (ab)34(cd)\1\2 matches ab34cdabcd.
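Group numbering and back references can be checked with Matcher.group. In the sketch below, the groups helper is hypothetical, and the back-reference pattern (ab)34(cd)\1\2 is an assumed expression that is consistent with the ab34cdabcd result mentioned in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns groups 1..groupCount() of a full match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    // True if the whole input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right.
        System.out.println(String.join(", ", groups("((A)(B(C)))", "ABC"))); // ABC, A, BC, C
        // \1 repeats what group 1 matched (ab), \2 repeats group 2 (cd).
        System.out.println(full("(ab)34(cd)\\1\\2", "ab34cdabcd")); // true
        System.out.println(full("(ab)34(cd)\\1\\2", "ab34cdcdab")); // false
    }
}
```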
13. Quotation
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E. The text between \Q and \E is used literally (Java string-literal escapes, such as the doubled backslash, still apply). For example, the Java string "ab\\Q{|}\\\\E" matches the text ab{|}\\ (ending in two backslash characters).
\E Nothing, but ends the quoting started by \Q
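A minimal sketch of quoting follows. Besides \Q and \E, it uses Pattern.quote, a standard method that returns a literal pattern for a string; the helper names are invented for this example.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True if the whole input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the input is exactly the given text, with no character treated as a metacharacter.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Between \Q and \E the characters { | } lose their special meaning.
        System.out.println(full("ab\\Q{|}\\E", "ab{|}"));          // true
        // The quoted part here is {|} followed by two literal backslashes.
        System.out.println(full("ab\\Q{|}\\\\E", "ab{|}\\\\"));    // true
        // Unquoted, 1+ means "one or more 1", so the literal text does not match.
        System.out.println(full("1+1=2", "1+1=2"));                // false
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));    // true
    }
}
```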
14. Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns match flags on - off. For example, in the expression (?i)abc(?-i)def, (?i) turns on case-insensitive matching, so abc matches regardless of case; (?-i) then turns it off again, so def must match exactly.
The idmsux flags are as follows:
-i CASE_INSENSITIVE: case-insensitive matching for the US-ASCII character set. (?i)
-d UNIX_LINES: Unix lines mode; only \n is recognized as a line terminator. (A Unix line break is \n; a Windows line break is \r\n.) (?d)
-m MULTILINE: multiline mode. (?m)
-s DOTALL: . matches any character, including line terminators. (?s)
-u UNICODE_CASE: Unicode-aware case-insensitive matching, used together with CASE_INSENSITIVE. (?u)
-x COMMENTS: comments are permitted in the pattern; whitespace in the pattern is ignored, and everything from # to the end of the line is a comment. (?x) For example, (?x)abc#asfsdadsa matches the string abc.
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags on - off. Similar to the above; the expression above can be rewritten as (?i:abc)def or (?i)abc(?-i:def).
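The embedded flags behave like their Pattern constants but can be switched inside the expression. A hedged sketch with hypothetical helpers full and found:

```java
import java.util.regex.Pattern;

class InlineFlagDemo {
    // True if the whole input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?i) applies to abc only; (?-i) makes def case-sensitive again.
        System.out.println(full("(?i)abc(?-i)def", "ABCdef")); // true
        System.out.println(full("(?i)abc(?-i)def", "ABCDEF")); // false
        // The scoped form limits the flag to the group.
        System.out.println(full("(?i:abc)def", "aBcdef"));     // true
        // (?x): everything from # to the end of the line is a comment.
        System.out.println(full("(?x)abc#asfsdadsa", "abc"));  // true
        // (?s): . also matches a line terminator.
        System.out.println(full("(?s)a.b", "a\nb"));           // true
        // (?m): ^ and $ match at line boundaries.
        System.out.println(found("^two$", "one\ntwo\nthree"));     // false
        System.out.println(found("(?m)^two$", "one\ntwo\nthree")); // true
    }
}
```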
(?=X) X, via zero-width positive lookahead. Matching continues only if the subexpression X matches to the right of the current position. For example, \w+(?=\d) matches word characters that are followed by a digit; the digit itself is not consumed.
(?!X) X, via zero-width negative lookahead. Matching continues only if the subexpression X does not match to the right of the current position. For example, \w+(?!\d) matches word characters that are not followed by a digit; nothing after them is consumed.
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group (no backtracking)
The difference between (?:X) and (?>X) is that (?>X) does not backtrack. For example, when the string to be matched is abcm, the expression a(?:b|bc)m matches, because after b turns out not to be followed by m the engine backtracks and tries bc. The expression a(?>b|bc)m does not match: once the group has matched b, it is not re-entered to try bc.
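The lookaround constructs and the independent group can be compared in a short sketch. The sample strings and helper names are assumptions made for this example, and a trailing m is appended to the two group expressions so that the effect of backtracking becomes observable on abcm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Start index of the first match, or -1 if there is none.
    static int firstStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Lookahead: letters followed by a digit; the digit is not part of the match.
        System.out.println(findAll("[a-z]+(?=\\d)", "abc1 def"));          // [abc]
        // Negative lookahead: the first foo is followed by a digit and is skipped.
        System.out.println(firstStart("foo(?!\\d)", "foo1 foo"));          // 5
        // Lookbehind: digits preceded by a dollar sign.
        System.out.println(findAll("(?<=\\$)\\d+", "cost $42 or 17"));     // [42]
        // Negative lookbehind: whole numbers not preceded by a dollar sign.
        System.out.println(findAll("(?<!\\$)\\b\\d+", "cost $42 or 17"));  // [17]
        // Ordinary group backtracks from b to bc; the independent group does not.
        System.out.println(Pattern.matches("a(?:b|bc)m", "abcm")); // true
        System.out.println(Pattern.matches("a(?>b|bc)m", "abcm")); // false
    }
}
```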

