Scala regular expressions
Scala supports regular expressions through the Regex class of the scala.util.matching package. The following example demonstrates the use of regular expressions to find the word Scala:
    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = "Scala".r // build a Regex from the string
        val str = "Scala is Scalable and cool"
        println(pattern.findFirstIn(str)) // first match, wrapped in an Option
      }
    }
Compiling and running the code above produces:
    $ scalac Test.scala
    $ scala Test
    Some(Scala)
The example calls the r method on a String to construct a Regex object.
Then use the findFirstIn method to find the first match.
If you need to view all matches, you can use the findAllIn method.
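As a minimal sketch of how these two methods behave, the snippet below handles the Option returned by findFirstIn and counts the matches produced by findAllIn. The object name FindDemo and the helpers describe and countMatches are hypothetical names chosen for illustration.

```scala
object FindDemo {
  private val pattern = "Scala".r

  // findFirstIn returns an Option[String]: Some(match) or None.
  def describe(text: String): String =
    pattern.findFirstIn(text) match {
      case Some(m) => s"found: $m"
      case None    => "no match"
    }

  // findAllIn returns an iterator over every match; here we only count them.
  def countMatches(text: String): Int = pattern.findAllIn(text).size

  def main(args: Array[String]): Unit = {
    println(describe("Scala is Scalable and cool"))     // found: Scala
    println(describe("Java is cool"))                   // no match
    println(countMatches("Scala is Scalable and cool")) // 2
  }
}
```

The second match comes from the "Scala" inside "Scalable", which is why the count is 2.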
You can use the mkString() method to join the matches into a single string, and you can use the pipe (|) to specify alternative patterns:
    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = new Regex("(S|s)cala") // the first letter may be uppercase S or lowercase s
        val str = "Scala is scalable and cool"
        println(pattern.findAllIn(str).mkString(",")) // join the matches with commas
      }
    }
Compiling and running the code above produces:
    $ scalac Test.scala
    $ scala Test
    Scala,scala
To replace matched text with a given string, use the replaceFirstIn() method to replace the first match, or the replaceAllIn() method to replace every match. For example:
    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = "(S|s)cala".r
        val str = "Scala is scalable and cool"
        println(pattern.replaceFirstIn(str, "Java")) // replace only the first match
      }
    }
Compiling and running the code above produces:
    $ scalac Test.scala
    $ scala Test
    Java is scalable and cool
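The example above only exercises replaceFirstIn. As a short sketch of the other method mentioned, the snippet below applies replaceAllIn to the same input; the object name ReplaceAllDemo is a hypothetical name used for illustration.

```scala
object ReplaceAllDemo {
  val pattern = "(S|s)cala".r
  val str = "Scala is scalable and cool"

  // replaceAllIn substitutes every match, not just the first one.
  val replaced: String = pattern.replaceAllIn(str, "Java")

  def main(args: Array[String]): Unit =
    println(replaced) // Java is Javable and cool
}
```

Because the pattern also matches the "scala" inside "scalable", that word becomes "Javable".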
Regular expression
Scala's regular expression syntax is inherited from Java, which in turn largely follows the rules of the Perl language.
The following table provides some commonly used regular expression rules:
Expression | Matching Rule |
---|---|
^ | Matches the beginning of the input string. |
$ | Matches the end of the input string. |
. | Matches any single character except a line terminator such as "\n" or "\r". |
[...] | Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". |
[^...] | Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain". |
\\A | Matches the beginning of the input string (not affected by the multi-line option). |
\\z | Matches the end of the input string (like $, but not affected by the multi-line option). |
\\Z | Matches the end of the input string, or the position just before a final line terminator (not affected by the multi-line option). |
re* | Repeats re zero or more times. |
re+ | Repeats re one or more times. |
re? | Repeats re zero times or once. |
re{n} | Repeats re exactly n times. |
re{n,} | Repeats re n or more times. |
re{n,m} | Repeats re n to m times. |
a|b | Matches either a or b. |
(re) | Matches re and captures the text into an automatically numbered group. |
(?:re) | Matches re without capturing the matched text or assigning a group number. |
(?>re) | Matches re as an independent (atomic) group: once it has matched, the engine does not backtrack into it. |
\\w | Matches a word character: a letter, digit, or underscore, like [A-Za-z0-9_]. |
\\W | Matches any character that is not a letter, digit, or underscore. |
\\s | Matches any whitespace character, equivalent to [ \t\n\r\f\x0B]. |
\\S | Matches any character that is not whitespace. |
\\d | Matches a digit, like [0-9]. |
\\D | Matches any non-digit character. |
\\G | Matches the position where the current search begins (the end of the previous match). |
\\n | Newline (line feed) character. |
\\b | Matches a word boundary. In Perl it denotes backspace inside a character class; Java's regex engine does not support that use. |
\\B | Matches a position that is not the beginning or end of a word. |
\\t | Tab character. |
\\Q | Starts a quoted literal: \\Q(a+b)*3\\E matches the text "(a+b)*3". |
\\E | Ends a quoted literal: \\Q(a+b)*3\\E matches the text "(a+b)*3". |
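A few of the rules in this table can be tried out directly. The following is a minimal sketch, not an exhaustive test of the syntax; the object name SyntaxDemo, its field names, and the sample inputs are assumptions made for illustration.

```scala
object SyntaxDemo {
  // ^ and $ anchor the pattern to the start and end of the input.
  val startsWithScala: Boolean = "^Scala".r.findFirstIn("Scala is cool").isDefined
  val endsWithScala: Boolean   = "Scala$".r.findFirstIn("Scala is cool").isDefined

  // [^...] matches any character that is not in the set.
  val notAbc: String = "[^abc]".r.findAllIn("plain").mkString

  // re{n,m}: runs of two or three digits.
  val digitRuns: List[String] = "\\d{2,3}".r.findAllIn("7 42 12345").toList

  // a|b inside a non-capturing group, followed by an optional "s".
  val pets: List[String] = "(?:cat|dog)s?".r.findAllIn("cats and dog").toList

  // \Q...\E quotes its contents, so ( + ) * lose their special meaning.
  val quoted: Option[String] = "\\Q(a+b)*3\\E".r.findFirstIn("x = (a+b)*3")

  def main(args: Array[String]): Unit = {
    println(startsWithScala) // true
    println(endsWithScala)   // false
    println(notAbc)          // plin
    println(digitRuns)       // List(42, 123, 45)
    println(pets)            // List(cats, dog)
    println(quoted)          // Some((a+b)*3)
  }
}
```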
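Example | Description |
---|---|
. | Matches any single character except a line terminator such as "\n" or "\r". |
[Rr]uby | Matches "Ruby" or "ruby". |
rub[ye] | Matches "ruby" or "rube". |
[aeiou] | Matches any one lowercase vowel: a, e, i, o, or u. |
[0-9] | Matches any digit, same as [0123456789]. |
[a-z] | Matches any lowercase ASCII letter. |
[A-Z] | Matches any uppercase ASCII letter. |
[a-zA-Z0-9] | Matches any ASCII letter or digit. |
[^aeiou] | Matches any character other than a lowercase vowel. |
[^0-9] | Matches any character other than a digit. |
\\d | Matches a digit, similar to [0-9]. |
\\D | Matches a non-digit, similar to [^0-9]. |
\\s | Matches a whitespace character, similar to [ \t\r\n\f]. |
\\S | Matches a non-whitespace character, similar to [^ \t\r\n\f]. |
\\w | Matches a letter, digit, or underscore, similar to [A-Za-z0-9_]. |
\\W | Matches anything other than a letter, digit, or underscore, similar to [^A-Za-z0-9_]. |
ruby? | Matches "rub" or "ruby": the y is optional. |
ruby* | Matches "rub" plus 0 or more y's. |
ruby+ | Matches "rub" plus 1 or more y's. |
\\d{3} | Matches exactly 3 digits. |
\\d{3,} | Matches 3 or more digits. |
\\d{3,5} | Matches 3, 4, or 5 digits. |
\\D\\d+ | No grouping: + repeats only \\d. |
(\\D\\d)+ | Grouping: + repeats the whole \\D\\d pair. |
([Rr]uby(, )?)+ | Matches "Ruby", "Ruby, ruby, ruby", etc. |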
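The difference between the grouped and ungrouped quantifiers in the last rows is easiest to see by running them. The sketch below does that; the object name QuantifierDemo and the sample strings are assumptions chosen for illustration.

```scala
object QuantifierDemo {
  val text = "a1b2c345"

  // No grouping: + applies only to \d, so each match is one non-digit plus its digits.
  val ungrouped: List[String] = "\\D\\d+".r.findAllIn(text).toList

  // Grouping: + repeats the whole \D\d pair, so consecutive pairs form one match.
  val grouped: List[String] = "(\\D\\d)+".r.findAllIn(text).toList

  // ruby? makes the trailing y optional.
  val optionalY: List[String] = "ruby?".r.findAllIn("rub ruby").toList

  // A repeated group with an optional ", " separator.
  val rubyList: Option[String] = "([Rr]uby(, )?)+".r.findFirstIn("Ruby, ruby, ruby!")

  def main(args: Array[String]): Unit = {
    println(ungrouped) // List(a1, b2, c345)
    println(grouped)   // List(a1b2c3)
    println(optionalY) // List(rub, ruby)
    println(rubyList)  // Some(Ruby, ruby, ruby)
  }
}
```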
Note that every escape sequence in the tables above is written with two backslashes. This is because the backslash is itself an escape character inside Java and Scala string literals, so to get a single backslash (\) in the pattern you need to write \\ in the string. See the following example:
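As a small sketch of this escaping rule, the snippet below builds the same pattern twice: once from an ordinary string literal with a doubled backslash, and once from a triple-quoted string literal, which Scala leaves unescaped. The object name EscapeDemo and the helper firstNumber are hypothetical, and the triple-quoted form is an addition not covered in the text above.

```scala
object EscapeDemo {
  // Ordinary string literal: the backslash must be doubled to reach the regex engine.
  val escaped = "\\d+".r

  // Triple-quoted string literal: backslashes are passed through unchanged.
  val raw = """\d+""".r

  // Both literals produce the same pattern text, \d+ .
  val samePattern: Boolean = escaped.regex == raw.regex

  def firstNumber(text: String): Option[String] = raw.findFirstIn(text)

  def main(args: Array[String]): Unit = {
    println(samePattern)                     // true
    println(firstNumber("order 66 shipped")) // Some(66)
  }
}
```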