This article summarizes how to use regular expressions in Java programming. Regular expressions are a powerful string-processing tool, and Java supports them well. Let's start with some basic knowledge of regular expressions:
1. Regular expressions in strings
Regular expressions can be used to search, extract, split, and replace text in strings. The String class provides the following regex-aware methods:
boolean matches(String regex): Determines whether the entire string matches the specified regular expression.
String replaceAll(String regex, String replacement): Replace all substrings matching regex in the string with replacement.
String[] split(String regex): Use regex as the separator to split the string into multiple substrings.
All of these methods rely on the regular expression support provided by Java.
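The three String methods above can be sketched as follows. The class name, helper names, and sample inputs are made up for illustration; only matches, replaceAll, and split come from the text.

```java
import java.util.Arrays;

class StringRegexDemo {
    // True only if the whole string consists of one or more digits.
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // Replaces every run of digits with a single '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d+", "#");
    }

    // Splits on a comma surrounded by optional whitespace.
    static String[] splitCsv(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("12345"));                    // true
        System.out.println(isDigits("123a"));                     // false
        System.out.println(maskDigits("a1b22c333"));              // a#b#c#
        System.out.println(Arrays.toString(splitCsv("x, y ,z"))); // [x, y, z]
    }
}
```

Note that matches tests the whole string, not a substring, which is why "123a" is rejected.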
2. Create a regular expression
x: Character x (x can represent any legal character);
\0mnn: The character represented by the octal number 0mnn;
\xhh: The character represented by hexadecimal 0xhh;
\uhhhh: The UNICODE character represented by hexadecimal 0xhhhh;
\t: Tab character ('\u0009');
\n: New line (line feed) character ('\u000A');
\r: Carriage return character ('\u000D');
\f: Form feed character ('\u000C');
\a: Alarm (bell) character ('\u0007');
\e: Escape character ( '\u001B');
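\cx: The control character corresponding to x. For example, \cM matches Ctrl-M (carriage return). In Java, x should be an uppercase letter A~Z, because java.util.regex does not map lowercase letters to the same control characters;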
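A few of these character escapes can be checked with a short sketch. The class and helper names are hypothetical. Each regex backslash is doubled here because the pattern is written as a Java string literal.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Returns true if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\0101", "A")); // octal 101 is 'A'
        System.out.println(matches("\\x41", "A"));  // hexadecimal 41 is 'A'
        System.out.println(matches("\\t", "\t"));   // tab
        System.out.println(matches("\\n", "\n"));   // line feed
        System.out.println(matches("\\cM", "\r"));  // Ctrl-M is carriage return
    }
}
```

Every line prints true.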
3. Special characters in regular expressions
$: Matches the end of a line. To match the $ character itself, use \$;
^: Matches the beginning of a line. To match the ^ character itself, use \^;
(): Mark the beginning and end of a subexpression. To match these characters, use \( and \);
[]: Mark the beginning and end of a bracket expression. To match these characters, use \[ and \];
{}: Mark how many times the preceding subexpression may occur. To match these characters, use \{ and \};
*: Specifies that the preceding subexpression may appear zero or more times. To match the * character itself, use \*;
+: Specifies that the preceding subexpression may appear one or more times. To match the + character itself, use \+;
?: Specifies that the preceding subexpression may appear zero times or once. To match the ? character itself, use \?;
.: Matches any single character except line terminators such as \n (by default). To match the . character itself, use \.;
\: Escapes the next character, or introduces an octal or hexadecimal character. To match the \ character itself, use \\;
|: Selects one of two alternatives. To match the | character itself, use \|;
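A minimal sketch of why escaping matters, using made-up inputs and helper names. Pattern.quote is a standard java.util.regex method that makes a whole string literal, which can be simpler than escaping each character by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MetaCharDemo {
    // Splits on a literal dot; an unescaped "." would match every character.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // Searches for the literal text, treating all of its characters as ordinary.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDot("www.example.com"))); // [www, example, com]
        System.out.println("1+1=2".matches("1\\+1=2")); // true: + is escaped
        System.out.println("1+1=2".matches("1+1=2"));   // false: + is a quantifier
        System.out.println(containsLiteral("price: $5 (approx)", "$5 (")); // true
    }
}
```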
4. Predefined characters
.: Matches any character (by default, any character except line terminators);
\d: Matches any digit from 0 to 9;
\D: Matches any non-digit;
\s: Matches any whitespace character, including spaces, tabs, carriage returns, form feeds, line feeds, etc.;
\S: Matches any non-whitespace character;
\w: Matches any word character, that is, the digits 0 to 9, the 26 English letters in either case, and the underscore (_);
\W: Matches any non-word character;
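The predefined classes can be tried out with a small sketch. The helper names and sample strings are assumptions chosen for illustration.

```java
class PredefinedClassDemo {
    // Removes every non-digit character.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Trims the ends and collapses each run of whitespace into one space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // True if the string is made only of word characters.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("+1 (555) 010-9999"));     // 15550109999
        System.out.println(collapseWhitespace("  a \t b\n c ")); // a b c
        System.out.println(isWord("user_01"));                   // true
        System.out.println(isWord("user-01"));                   // false
    }
}
```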
5. Boundary matching characters
^: Beginning of line
$: End of line
\b: Word boundary
\B: Non-word boundary
\A: Beginning of input
\G: The end of the previous match
\Z: The end of the input, except for the final line terminator, if any
\z: The end of the input
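A short sketch of two of these boundaries: \b for whole-word matching, and the difference between \Z and \z when the input ends with a line terminator. The names and inputs are hypothetical.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Replaces "cat" only where it stands as a whole word.
    static String replaceWholeWordCat(String text) {
        return text.replaceAll("\\bcat\\b", "dog");
    }

    // True if the pattern is found somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWordCat("cat concat cat")); // dog concat dog
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(found("end\\Z", "the end\n")); // true
        System.out.println(found("end\\z", "the end\n")); // false
        System.out.println(found("end\\z", "the end"));   // true
    }
}
```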
6. Symbols indicating the number of matches (quantifiers)
The figure shows the symbols that represent the number of matches. Each one determines how many times the element immediately to its left may occur:
(1) Suppose we want to search a text file for US Social Security numbers. The format of this number is 999-99-9999. The regular expression used to match it is shown in Figure 1. Inside a bracket expression, the hyphen ("-") has a special meaning: it denotes a range, such as 0 to 9. The literal hyphens in the Social Security number are therefore preceded by the escape character "\". Outside brackets this escape is optional but harmless.
(2) Now suppose the hyphens are optional, so that 999-99-9999 and 999999999 are both valid formats. In that case, add the "?" quantifier after each hyphen, as shown in the figure:
(3) Here is another example. One format for U.S. car license plates is four digits followed by two letters. Its regular expression consists of the digit part "[0-9]{4}" followed by the letter part "[A-Z]{2}". The image below shows the complete regular expression.
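A minimal sketch of the three cases above. The exact patterns are assumptions reconstructed from the prose, and the class and constant names are hypothetical.

```java
class QuantifierDemo {
    // Three digits, hyphen, two digits, hyphen, four digits.
    static final String SSN = "[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}";

    // Same as SSN, but "?" makes each hyphen optional.
    static final String SSN_OPTIONAL_HYPHENS = "[0-9]{3}\\-?[0-9]{2}\\-?[0-9]{4}";

    // Four digits followed by two uppercase letters.
    static final String PLATE = "[0-9]{4}[A-Z]{2}";

    public static void main(String[] args) {
        System.out.println("123-45-6789".matches(SSN));                  // true
        System.out.println("123456789".matches(SSN));                    // false
        System.out.println("123456789".matches(SSN_OPTIONAL_HYPHENS));   // true
        System.out.println("123-45-6789".matches(SSN_OPTIONAL_HYPHENS)); // true
        System.out.println("1234AB".matches(PLATE));                     // true
        System.out.println("12AB34".matches(PLATE));                     // false
    }
}
```

Since the hyphens sit outside any bracket expression, the unescaped form "[0-9]{3}-[0-9]{2}-[0-9]{4}" should behave the same way.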
7. Some examples
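Example 1 (JavaScript)
// Remove every bracketed word such as [tag] from the content
function replace(content) {
  var reg = '\\[(\\w+)\\]',
      pattern = new RegExp(reg, 'g');
  return content.replace(pattern, '');
}
// or, with a regex literal
function replace(content) {
  return content.replace(/\[(\w+)\]/g, '');
}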
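For comparison, the same replacement might look like this in Java. This is a sketch with a hypothetical class and method name; the pattern is the one from the JavaScript snippet, with backslashes doubled for the Java string literal.

```java
class BracketStripDemo {
    // Removes every bracketed run of word characters, such as [tag].
    static String stripBracketedWords(String content) {
        return content.replaceAll("\\[(\\w+)\\]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripBracketedWords("a[b]c[de]f")); // acf
    }
}
```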
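Example 2 (JavaScript)
// Alternatives to zero-width lookbehind
// (?<=...) and (?<!...)
// Method 1: reverse the string, search with lookahead, then reverse the result back, for example:
String.prototype.reverse = function () {
  return this.split('').reverse().join('');
};
// Emulates 'foo.bar|baz'.replace(/(?<=\.)b/, 'c'), i.e. replaces a b preceded by '.' with c
'foo.bar|baz'.reverse().replace(/b(?=\.)/g, 'c').reverse(); // foo.car|baz
// Method 2: skip the zero-width assertion and check the preceding character yourself
// Emulates 'foo.bar|baz'.replace(/(?<=\.)b/, 'c'), i.e. replaces a b preceded by '.' with c
'foo.bar|baz'.replace(/(\.)?b/, function ($0, $1) {
  return $1 ? $1 + 'c' : $0;
}); // foo.car|baz
// Emulates 'foo.bar|baz'.replace(/(?<!\.)b/, 'c'), i.e. replaces a b not preceded by '.' with c
// (the g flag is needed so the search continues past the '.b' that is left unchanged)
'foo.bar|baz'.replace(/(\.)?b/g, function ($0, $1) {
  return $1 ? $0 : 'c';
}); // foo.bar|caz
// This method works in some simple scenarios and can be combined with lookahead,
// but it fails in many others, for example:
// 'tttt'.replace(/(?<=t)t/g, 'x') should give 'txxx'
'tttt'.replace(/(t)?t/g, function ($0, $1) {
  return $1 ? $1 + 'x' : $0;
}); // txtx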
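In Java these workarounds should not be needed, since java.util.regex supports lookbehind directly. The sketch below applies the lookbehind patterns from the comments above. The class and method names are hypothetical, and replaceAll is global, unlike the non-global JavaScript calls being emulated.

```java
class LookbehindDemo {
    // Replaces each b that is preceded by a dot.
    static String replaceBAfterDot(String s) {
        return s.replaceAll("(?<=\\.)b", "c");
    }

    // Replaces each b that is not preceded by a dot.
    static String replaceBNotAfterDot(String s) {
        return s.replaceAll("(?<!\\.)b", "c");
    }

    // Replaces each t that is preceded by another t.
    static String replaceTAfterT(String s) {
        return s.replaceAll("(?<=t)t", "x");
    }

    public static void main(String[] args) {
        System.out.println(replaceBAfterDot("foo.bar|baz"));    // foo.car|baz
        System.out.println(replaceBNotAfterDot("foo.bar|baz")); // foo.bar|caz
        System.out.println(replaceTAfterT("tttt"));             // txxx
    }
}
```

The lookbehind inspects the original input rather than the already replaced text, which is why "tttt" becomes "txxx" and not "txtx".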
Example 3 (JavaScript)
// Using the $& symbol, which inserts the matched substring into the replacement
function escapeRegExp(str) {
  return str.replace(/[abc]/g, "($&)");
}
var str = 'a12b34c';
console.log(escapeRegExp(str)); // (a)12(b)34(c)
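Java's replacement syntax has no $& token; the closest equivalent is $0, which refers to the whole match. A minimal sketch with a hypothetical class and method name:

```java
class WholeMatchDemo {
    // Wraps every a, b, or c in parentheses; $0 stands for the matched text.
    static String wrapAbc(String s) {
        return s.replaceAll("[abc]", "($0)");
    }

    public static void main(String[] args) {
        System.out.println(wrapAbc("a12b34c")); // (a)12(b)34(c)
    }
}
```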