
Detailed explanation of Java regular expressions

Author: 尚 (reprinted), 2019-11-29 13:11:33


Meaning of each construct:

1. Characters

x The character x. For example, a represents the character a

\\ The backslash character. In Java source it is written as "\\\\": the Java compiler first turns the string literal "\\\\" into the regex \\, and the regex engine then turns \\ into a single literal backslash. For the same reason, every regex construct that starts with a backslash (such as \d) must have its backslash doubled in a Java string literal; only escapes that Java itself also understands, such as \t and \n, may be written either way.

\0n The character with octal value 0n (0 <= n <= 7)

\0nn The character with octal value 0nn (0 <= n <= 7)

\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)

\xhh The character with hexadecimal value 0xhh

\uhhhh The character with hexadecimal value 0xhhhh

\t Tab character ('\u0009')

\n New line (line feed) character ('\u000A')

\r Carriage return character ('\u000D')

\f Form feed character ('\u000C')

\a alarm (bell) character ('\u0007')

\e escape character ('\u001B')

\cx The control character corresponding to x
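To make the double-escaping rule concrete, here is a minimal sketch that checks several of the escapes above with Pattern.matches, which requires the whole input to match. The class EscapeDemo and its helper methods are hypothetical names chosen only for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Java literal "\\\\" becomes the regex \\ , which matches one literal backslash.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    // "\t" is turned into a real tab by the Java compiler, while "\\t" passes the
    // regex escape \t to the engine; both patterns accept a tab character.
    static boolean matchesTabBothWays(String s) {
        return Pattern.matches("\t", s) && Pattern.matches("\\t", s);
    }

    // Octal \0101, hex \x41 and the four-digit Unicode escape for 0041 all denote 'A'.
    static boolean isLetterA(String s) {
        return Pattern.matches("\\0101", s)
                && Pattern.matches("\\x41", s)
                && Pattern.matches("\\u0041", s);
    }

    // \cI is the control character for I, i.e. the tab character (0x09).
    static boolean isControlI(String s) {
        return Pattern.matches("\\cI", s);
    }

    public static void main(String[] args) {
        System.out.println(isSingleBackslash("\\"));   // true
        System.out.println(isSingleBackslash("\\\\")); // false: two backslashes
        System.out.println(matchesTabBothWays("\t"));  // true
        System.out.println(isLetterA("A"));            // true
        System.out.println(isControlI("\t"));          // true
    }
}
```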

2. Character classes

[abc] a, b, or c (simple class). For example, [egd] matches one character that is e, g, or d.

[^abc] Any character except a, b, or c (negation). For example, [^egd] matches one character that is not e, g, or d.

[a-zA-Z] a through z or A through Z, inclusive (range)

[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)

[a-z&&[def]] d, e or f (intersection)

[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)

[a-z&&[^m-p]] a to z, not m to p: [a-lq-z] (subtract)
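The class operators above (negation, union, intersection, subtraction) can be checked one character at a time. This is a small sketch; CharClassDemo and its in helper are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // A character class matches exactly one character, so with Pattern.matches
    // (whole-input match) it can only accept a one-character string.
    static boolean in(String charClass, String s) {
        return Pattern.matches(charClass, s);
    }

    public static void main(String[] args) {
        System.out.println(in("[egd]", "g"));         // true  (simple class)
        System.out.println(in("[egd]", "eg"));        // false (two characters)
        System.out.println(in("[^egd]", "g"));        // false (negation)
        System.out.println(in("[a-d[m-p]]", "n"));    // true  (union)
        System.out.println(in("[a-z&&[def]]", "a"));  // false (intersection)
        System.out.println(in("[a-z&&[^bc]]", "c"));  // false (subtraction)
        System.out.println(in("[a-z&&[^m-p]]", "q")); // true  (subtraction)
    }
}
```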

3. Predefined character classes (note that the backslash must be written twice in Java source; for example, \d is written as \\d)

. Any character (may or may not match line terminators, depending on the DOTALL flag)

\d Digits: [0-9]

\D Non-digits: [^0-9]

\s Whitespace characters: [ \t\n\x0B\f\r]

\S Non-whitespace characters: [^\s]

\w Word characters: [a-zA-Z_0-9]

\W Non-word characters: [^\w]
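A typical use of the predefined classes is pulling numbers out of text or splitting on whitespace. The sketch below assumes Java 8 or later; PredefinedDemo and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedDemo {
    // Collects every maximal run of digits; "\\d+" in source is the regex \d+.
    static List<String> numbers(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Splits on runs of whitespace (\s+) after trimming both ends.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // \w covers [a-zA-Z_0-9], so an identifier-like token matches \w+.
    static boolean isWordToken(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(numbers("room 101, floor 7"));           // [101, 7]
        System.out.println(String.join("|", words(" a \t b\nc "))); // a|b|c
        System.out.println(isWordToken("user_42"));                 // true
        System.out.println(isWordToken("user-42"));                 // false
    }
}
```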

4. POSIX character classes (US-ASCII only) (note that the backslash must be written twice, for example, \p{Lower} is written as \\p{Lower})

\p{Lower} Lowercase alphabetic characters: [a-z].

\p{Upper} Uppercase alphabetic characters: [A-Z]

\p{ASCII} All ASCII: [\x00-\x7F]

\p{Alpha} Alphabetic characters: [\p{Lower}\p{Upper}]

\p{Digit} Decimal digits: [0-9]

\p{Alnum} Alphanumeric characters: [\p{Alpha}\p{Digit}]

\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

\p{Graph} Visible characters: [\p{Alnum}\p{Punct}]

\p{Print} Printable Characters: [\p{Graph}\x20]

\p{Blank} Space or tab: [ \t]

\p{Cntrl} Control characters: [\x00-\x1F\x7F]

\p{XDigit} Hexadecimal digits: [0-9a-fA-F]

\p{Space} White space characters: [ \t\n\x0B\f\r]
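Because these POSIX classes are US-ASCII only by default, an accented capital is not accepted by \p{Upper}. A minimal sketch follows; PosixDemo and its methods are illustrative names only.

```java
import java.util.regex.Pattern;

public class PosixDemo {
    // \p{XDigit}+ : one or more hexadecimal digits.
    static boolean isHex(String s) {
        return Pattern.matches("\\p{XDigit}+", s);
    }

    // \p{Upper}\p{Lower}* : an ASCII capital followed by ASCII lowercase letters.
    static boolean isCapitalizedAsciiWord(String s) {
        return Pattern.matches("\\p{Upper}\\p{Lower}*", s);
    }

    // Deletes every US-ASCII punctuation character.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(isHex("1aF0"));                        // true
        System.out.println(isHex("1g"));                          // false
        System.out.println(isCapitalizedAsciiWord("Java"));       // true
        // E with acute accent is outside US-ASCII, so \p{Upper} rejects it.
        System.out.println(isCapitalizedAsciiWord("\u00C9mile")); // false
        System.out.println(stripPunct("a.b,c!"));                 // abc
    }
}
```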

5. java.lang.Character classes (simple Java character type)

\p{javaLowerCase} is equivalent to java.lang.Character.isLowerCase()

\p{javaUpperCase} is equivalent to java.lang.Character.isUpperCase()

\p{javaWhitespace} is equivalent to java.lang.Character.isWhitespace()

\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()
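Since these classes defer to the java.lang.Character methods, they are not restricted to ASCII the way \p{Lower} is. The sketch below contrasts the two on a word containing the sharp s; JavaCharClassDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

public class JavaCharClassDemo {
    // \p{javaLowerCase} defers to Character.isLowerCase, so it is not limited to ASCII.
    static boolean allJavaLower(String s) {
        return Pattern.matches("\\p{javaLowerCase}+", s);
    }

    // The POSIX class \p{Lower} is ASCII-only by default, for comparison.
    static boolean allAsciiLower(String s) {
        return Pattern.matches("\\p{Lower}+", s);
    }

    // \p{javaMirrored} defers to Character.isMirrored, e.g. true for '('.
    static boolean isMirrored(String s) {
        return Pattern.matches("\\p{javaMirrored}", s);
    }

    public static void main(String[] args) {
        String word = "stra\u00DFe"; // German word spelled with the sharp s
        System.out.println(allJavaLower(word));  // true
        System.out.println(allAsciiLower(word)); // false
        System.out.println(isMirrored("("));     // true
        System.out.println(isMirrored("a"));     // false
    }
}
```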

6. Classes for Unicode blocks and categories

\p{InGreek} A character in the Greek block (simple block)

\p{Lu} Uppercase letters (simple category)

\p{Sc} Currency symbols

\P{InGreek} Any character except one in the Greek block (negation)

[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
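The block and category classes can be tried on a couple of Greek letters. This sketch uses Unicode escapes for the letters so the source stays ASCII; UnicodeClassDemo and matches are illustrative names.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1"; // Greek small letter alpha
        String omega = "\u03A9"; // Greek capital letter omega
        System.out.println(matches("\\p{InGreek}", alpha));         // true
        System.out.println(matches("\\P{InGreek}", "a"));           // true
        System.out.println(matches("\\p{Lu}", omega));              // true
        System.out.println(matches("\\p{Sc}", "$"));                // true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", alpha)); // true
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", omega)); // false
    }
}
```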

7. Boundary matchers

^ The beginning of a line; place it at the start of the regular expression. For example, ^(abc) matches a string that starts with abc. By default ^ matches only at the beginning of the whole input; to make it match at the beginning of every line, set the MULTILINE flag when compiling, for example Pattern p = Pattern.compile(regex, Pattern.MULTILINE);

$ The end of a line; place it at the end of the regular expression. Like ^, it matches at the end of every line only in MULTILINE mode. For example, (^bca).*(abc$) matches a line that starts with bca and ends with abc.

\b A word boundary. For example, \babc finds abc only where it begins a word (it is found in abcjj but not in jjabc), while abc\b finds abc only where it ends a word (it is found in jjabc).

\B A non-word boundary. For example, \Babc finds abc only where it does not begin a word (it is found in jjabcjj and jjabc, but not in abcjj); \Babc\B requires abc to sit strictly inside a word, as in jjabcjj.

\A The beginning of the input

\G The end of the previous match (the author considers it rarely useful). For example, \Gdog (written "\\Gdog" in Java) finds dog only where the previous match ended. Before the first match, that position is the start of the input, so if the input does not begin with dog, nothing is found at all.

\Z The end of the input but for the final line terminator, if any

A line terminator is a one- or two-character sequence that marks the end of a line in the input character sequence.

The following are recognized as line terminators:

- A newline (line feed) character ('\n'),

- A carriage-return character followed immediately by a newline character ("\r\n"),

- A standalone carriage-return character ('\r'),

- A next-line character ('\u0085'),

- A line-separator character ('\u2028'), or

- A paragraph-separator character ('\u2029').

\z End of input
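The boundary matchers are easiest to see by collecting every match with Matcher.find(). In this sketch, BoundaryDemo and its findAll helper are hypothetical names; the inputs mirror the examples in the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Returns every substring of text matched by regex (compiled with the given flags).
    static List<String> findAll(String regex, int flags, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "abc one\nabc two";
        // Without MULTILINE, ^ anchors only at the start of the input.
        System.out.println(findAll("^abc", 0, text).size());                 // 1
        // With MULTILINE, ^ also anchors after each line terminator.
        System.out.println(findAll("^abc", Pattern.MULTILINE, text).size()); // 2

        System.out.println(findAll("\\babc", 0, "abcjj jjabc")); // [abc] (only in abcjj)
        System.out.println(findAll("\\Babc\\B", 0, "jjabcjj"));  // [abc]

        // \G chains matches: each "dog" must start where the previous one ended.
        System.out.println(findAll("\\Gdog", 0, "dogdog cat dog").size()); // 2
        System.out.println(findAll("\\Gdog", 0, "catdog").size());         // 0

        // \Z tolerates a final line terminator, \z does not.
        System.out.println(findAll("end\\Z", 0, "the end\n").size()); // 1
        System.out.println(findAll("end\\z", 0, "the end\n").size()); // 0
    }
}
```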

When compiling a pattern, one or more flags can be set (combined with the | operator), for example:

Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

Six commonly used flags are:

- CASE_INSENSITIVE: Characters are matched without regard to case. By default, only US-ASCII characters are matched case-insensitively.

- UNICODE_CASE: Combined with CASE_INSENSITIVE, case-insensitive matching follows the Unicode standard

- MULTILINE: ^ and $ match at the beginning and end of each line, rather than only at the beginning and end of the entire input

- UNIX_LINES: Only '\n' is recognized as a line terminator in the behavior of ., ^, and $

- DOTALL: The . symbol matches any character, including line terminators

- CANON_EQ: Canonical equivalence of Unicode characters is taken into account
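The effect of each flag shows up as a pair of matches, one without the flag and one with it. This is a sketch only; FlagsDemo and full are illustrative names, and the CANON_EQ case follows the example given in the Pattern Javadoc.

```java
import java.util.regex.Pattern;

public class FlagsDemo {
    // Does regex, compiled with the given flags, match the whole input?
    static boolean full(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lowerE = "\u00E9"; // e with acute accent
        String upperE = "\u00C9"; // E with acute accent
        System.out.println(full(lowerE, Pattern.CASE_INSENSITIVE, upperE)); // false: ASCII-only folding
        System.out.println(full(lowerE,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upperE));  // true

        System.out.println(full("a.b", 0, "a\nb"));              // false
        System.out.println(full("a.b", Pattern.DOTALL, "a\nb")); // true

        // Without UNIX_LINES, '\r' is a line terminator, so . does not match it.
        System.out.println(full("a.b", 0, "a\rb"));                  // false
        System.out.println(full("a.b", Pattern.UNIX_LINES, "a\rb")); // true

        // CANON_EQ: "a" plus combining ring above is canonically equal to a-ring.
        System.out.println(full("a\u030A", Pattern.CANON_EQ, "\u00E5")); // true
    }
}
```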

8. Greedy quantifiers

X? X, once or not at all

X* X, zero or more times

X+ X, one or more times

X{n} X, exactly n times

X{n,} X, at least n times

X{n,m} X, at least n times, but not more than m times
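Each repetition count above can be checked with a whole-string match. A minimal sketch; QuantifierDemo and full are hypothetical names.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    static boolean full(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(full("colou?r", "color"));  // true: u? allows zero u
        System.out.println(full("ab*", "a"));          // true: b* allows zero b
        System.out.println(full("ab+", "a"));          // false: b+ needs at least one b
        System.out.println(full("\\d{3}", "123"));     // true: exactly 3 digits
        System.out.println(full("\\d{2,}", "1"));      // false: at least 2 digits
        System.out.println(full("\\d{2,4}", "12345")); // false: at most 4 digits
    }
}
```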

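9. Reluctant quantifiers

X?? X, once or not at all

X*? X, zero or more times

X+? X, one or more times

X{n}? X, exactly n times

X{n,}? X, at least n times

X{n,m}? X, at least n times, but not more than m times

10. Possessive quantifiers

X?+ X, once or not at all

X*+ X, zero or more times

X++ X, one or more times

X{n}+ X, exactly n times

X{n,}+ X, at least n times

X{n,m}+ X, at least n times, but not more than m times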

The differences between greedy, reluctant, and possessive quantifiers (these matter only when the quantifier could match a variable amount of text):

Greedy quantifiers are called "greedy" because they make the matcher read the entire input string before the first match attempt. If that attempt (against the entire input string) fails, the matcher backs off by one character and tries again, repeating this until a match is found or there are no more characters to back off from. Depending on the quantifier used, the last thing it tries to match is 1 or 0 characters.

Reluctant quantifiers take the opposite approach: they start at the beginning of the input string and read one character at a time, looking for a match. The last thing they try to match is the entire input string.

Finally, possessive quantifiers always read the entire input string and try for a match once (and only once). Unlike greedy quantifiers, they never back off, even if backing off would let the overall match succeed.
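The three strategies give visibly different results on the same input. This sketch runs the greedy, reluctant, and possessive forms of .* against one string; QuantifierKindsDemo and findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierKindsDemo {
    // Collects every match found by repeated Matcher.find() calls.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: reads everything, then backs off until "foo" fits: one long match.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: grows one character at a time: two short matches.
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: reads everything and never backs off: no match at all.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```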

11. Logical operators

XY X followed by Y

X|Y X or Y

(X) X, as a capturing group. For example, (abc) captures abc as a whole
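Alternation is usually combined with a group, both to limit the scope of | and to capture which alternative matched. A small sketch; AlternationDemo, animal, and matchesUngrouped are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // (cat|dog)s? : X|Y inside a capturing group, followed by an optional s.
    // Returns what group 1 captured, or null if the whole input does not match.
    static String animal(String s) {
        Matcher m = Pattern.compile("(cat|dog)s?").matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    // Without parentheses, | has the lowest precedence: cat|dogs means "cat" or "dogs".
    static boolean matchesUngrouped(String s) {
        return s.matches("cat|dogs");
    }

    public static void main(String[] args) {
        System.out.println(animal("cats"));           // cat
        System.out.println(animal("dog"));            // dog
        System.out.println(animal("cow"));            // null
        System.out.println(matchesUngrouped("cats")); // false
    }
}
```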

12. Back references

\n Whatever the nth capturing group matched

Capturing groups are numbered by counting their opening parentheses from left to right. For example, in the expression ((A)(B(C))), there are four such groups:

1 ((A)(B(C)))

2 (A)

3 (B(C))

4 (C)

The corresponding group can be referenced by \n in the expression. For example, (ab)34\1 matches ab34ab, and (ab)34(cd)\1\2 matches ab34cdabcd.
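Both the numbering rule and the back-reference examples can be verified directly. In this sketch, GroupDemo and its groups helper are illustrative names; group 0 (the whole match) is skipped so the array lines up with the numbering list above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns groups 1..groupCount() of a whole-string match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Numbered by opening parenthesis: 1=((A)(B(C))), 2=(A), 3=(B(C)), 4=(C)
        System.out.println(String.join(", ", groups("((A)(B(C)))", "ABC"))); // ABC, A, BC, C
        // \1 and \2 repeat the text captured by groups 1 and 2.
        System.out.println("ab34ab".matches("(ab)34\\1"));           // true
        System.out.println("ab34cdabcd".matches("(ab)34(cd)\\1\\2")); // true
        System.out.println("ab34cd".matches("(ab)34\\1"));           // false
    }
}
```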

13. Quotation

\ Nothing, but quotes the following character

\Q Nothing, but quotes all characters until \E. The text between \Q and \E is used literally (the Java compiler still processes string-literal escapes such as \\ first). For example, the Java string "ab\\Q{|}\\\\E" is the regex ab\Q{|}\\E, which matches the text ab{|}\ (written "ab{|}\\" in Java source).

\E Nothing, but ends quotation started by \Q
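The sketch below checks the \Q...\E example and also uses Pattern.quote, a standard library method that wraps arbitrary text so it is matched literally. QuoteDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // The Java literal below is the regex ab\Q{|}\\E, so the text {|}\ is taken literally.
    static boolean qeExample(String input) {
        return Pattern.matches("ab\\Q{|}\\\\E", input);
    }

    // A single backslash quotes one metacharacter: \. is a literal dot.
    static boolean isADotB(String input) {
        return Pattern.matches("a\\.b", input);
    }

    // Pattern.quote wraps arbitrary text so that it is matched literally.
    static boolean equalsLiterally(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(qeExample("ab{|}\\"));                // true
        System.out.println(isADotB("a.b"));                      // true
        System.out.println(isADotB("axb"));                      // false
        System.out.println(equalsLiterally("1+1=2?", "1+1=2?")); // true
        System.out.println(equalsLiterally("1+1=2?", "11=2"));   // false
        // Unquoted, "1+1=2?" is a regex, and it does match "11=2".
        System.out.println(Pattern.matches("1+1=2?", "11=2"));   // true
    }
}
```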

14. Special constructs (non-capturing)

(?:X) X, as a non-capturing group

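(?idmsux-idmsux) Nothing, but turns the match flags i d m s u x on (flags before the minus sign) or off (flags after it). For example, in the expression (?i)abc(?-i)def, (?i) turns case-insensitive matching on, so abc is matched regardless of case, and (?-i) turns it off again, so def must match exactly.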

The idmsux flags are as follows:

- i CASE_INSENSITIVE: Case-insensitive matching of US-ASCII characters (?i)

- d UNIX_LINES: Only '\n' is treated as a line terminator (?d)

- m MULTILINE: Multiline mode (?m)

- s DOTALL: . also matches line terminators, such as the UNIX line break \n and the Windows line break \r\n (?s)

- u UNICODE_CASE: Unicode-aware case-insensitive matching (?u)

- x COMMENTS: Whitespace in the pattern is ignored, and # starts a comment that runs to the end of the line (?x). For example, (?x)abc#asfsdadsa matches the string abc

(?idmsux-idmsux:X) X, as a non-capturing group with the given flags turned on or off. For example, the expression above can be rewritten as (?i:abc)def, or as (?i)abc(?-i:def).
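Embedded flags can be tested without passing any flag to Pattern.compile. A minimal sketch; InlineFlagsDemo, full, and found are illustrative names.

```java
import java.util.regex.Pattern;

public class InlineFlagsDemo {
    // Whole-input match.
    static boolean full(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    // Match anywhere in the input.
    static boolean found(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        // (?i) turns case-insensitivity on, (?-i) turns it off again.
        System.out.println(full("(?i)abc(?-i)def", "ABCdef")); // true
        System.out.println(full("(?i)abc(?-i)def", "ABCDEF")); // false
        // The scoped form applies the flag only inside the group.
        System.out.println(full("(?i:abc)def", "AbCdef"));     // true
        // (?x): whitespace is ignored and # starts a comment.
        System.out.println(full("(?x) a b c  # letters", "abc")); // true
        // (?s): . also matches a line terminator.
        System.out.println(full("(?s)a.b", "a\nb"));           // true
        // (?m): ^ matches at the start of every line.
        System.out.println(found("(?m)^b", "a\nb"));           // true
    }
}
```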

(?=X) X, via zero-width positive lookahead. Matching continues only if the subexpression X matches immediately to the right of the current position; the text matched by X is not consumed. For example, \w(?=\d) matches a word character that is followed by a digit, without including the digit in the match.

(?!X) X, via zero-width negative lookahead. Matching continues only if the subexpression X does not match immediately to the right of the current position. For example, \w(?!\d) matches a word character that is not followed by a digit; nothing after it is consumed.
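Because lookahead consumes nothing, the following characters remain available for later matches. This sketch uses [a-z] rather than \w for the negative case, since digits are themselves word characters; LookaheadDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Collects every match found by repeated Matcher.find() calls.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \w(?=\d): a word character followed by a digit; the digit is not consumed.
        System.out.println(findAll("\\w(?=\\d)", "a1b2cc"));   // [a, b]
        // [a-z](?!\d): a letter NOT followed by a digit.
        System.out.println(findAll("[a-z](?!\\d)", "a1b2cc")); // [c, c]
        // Typical use: require a digit somewhere, then match 6 or more word chars.
        System.out.println("abc123".matches("(?=.*\\d)\\w{6,}")); // true
        System.out.println("abcdef".matches("(?=.*\\d)\\w{6,}")); // false
    }
}
```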

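(?<=X) X, via zero-width positive lookbehind

(?<!X) X, via zero-width negative lookbehind

(?>X) X, as an independent non-capturing group (no backtracking)

The difference between (?:X) and (?>X) is that an independent group never backtracks into itself. For example, a(?>b|bc)c cannot match the whole string abcc: once the group has matched b, it is exited and its other alternative bc is never tried again, so the match fails when an extra c is left over, whereas a(?:b|bc)c succeeds by backtracking into bc. Skipping such retries can also speed up matching.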
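The lookbehind forms and the independent group can be exercised as follows. This is a sketch under the assumption that the text above intends the abcc example; LookbehindAtomicDemo, findAll, and full are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindAtomicDemo {
    // Collects every match found by repeated Matcher.find() calls.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Whole-input match.
    static boolean full(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String text = "cost $30, qty 4";
        // (?<=\$)\d+ : digits preceded by '$'; the '$' is not part of the match.
        System.out.println(findAll("(?<=\\$)\\d+", text));    // [30]
        // (?<!\$)\b\d+ : digit runs not preceded by '$'. The \b keeps the
        // search from starting in the middle of "30" (at the 0).
        System.out.println(findAll("(?<!\\$)\\b\\d+", text)); // [4]

        // Ordinary group: when "abc" leaves a c over, it backtracks and tries bc.
        System.out.println(full("a(?:b|bc)c", "abcc")); // true
        // Independent group: once b has matched, bc is never tried.
        System.out.println(full("a(?>b|bc)c", "abcc")); // false
    }
}
```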


The above is the detailed content of Detailed explanation of java regular knowledge. For more information, please follow other related articles on the PHP Chinese website!

Statement: This article is reprinted from aizhan.