Scala regular expressions


Scala supports regular expressions through the Regex class of the scala.util.matching package. The following example demonstrates the use of regular expressions to find the word Scala:

import scala.util.matching.Regex

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = "Scala".r
      val str = "Scala is Scalable and cool"
      
      println(pattern findFirstIn str)
   }
}

Running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
Some(Scala)

The example uses the r method of the String class to construct a Regex object.

Then use the findFirstIn method to find the first match.

If you need to view all matches, you can use the findAllIn method.
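Because findFirstIn returns an Option[String] and findAllIn returns an iterator over the matched substrings, both results can be consumed with ordinary Scala constructs. The following is a minimal sketch reusing the pattern and input string from the example above; the object name OptionDemo and the helper describe are illustrative names, not part of the library.

```scala
object OptionDemo {
  // Hypothetical helper: turn the Option returned by findFirstIn into a message.
  def describe(text: String): String = {
    val pattern = "Scala".r
    pattern.findFirstIn(text) match {
      case Some(m) => s"first match: $m"
      case None    => "no match"
    }
  }

  def main(args: Array[String]): Unit = {
    val str = "Scala is Scalable and cool"
    println(describe(str))             // first match: Scala
    println(describe("Java is cool"))  // no match

    // findAllIn yields every matched substring; toList collects them.
    val all = "Scala".r.findAllIn(str).toList
    println(all.size)                  // 2 ("Scala" and the start of "Scalable")
  }
}
```

Pattern matching on the Option avoids calling get on a None when the input contains no match.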

You can use the mkString() method to join the matched strings, and the pipe character (|) to specify alternative patterns:

import scala.util.matching.Regex

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = new Regex("(S|s)cala")  // the first letter may be uppercase S or lowercase s
      val str = "Scala is scalable and cool"
      
      println((pattern findAllIn str).mkString(","))   // join the results with a comma
   }
}

Running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
Scala,scala

To replace matched text with a given string, use the replaceFirstIn() method to replace the first match, or the replaceAllIn() method to replace every match. For example:

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = "(S|s)cala".r
      val str = "Scala is scalable and cool"
      
      println(pattern replaceFirstIn(str, "Java"))
   }
}

Running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
Java is scalable and cool
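The example above only exercises replaceFirstIn. As a companion sketch, the same pattern and input can be passed to replaceAllIn, which replaces every match; the object name ReplaceAllDemo and the helper replaceEverywhere are illustrative.

```scala
object ReplaceAllDemo {
  val pattern = "(S|s)cala".r

  // Replace every match of the pattern, not only the first one.
  def replaceEverywhere(text: String): String =
    pattern.replaceAllIn(text, "Java")

  def main(args: Array[String]): Unit = {
    // Both "Scala" and the "scala" inside "scalable" are replaced.
    println(replaceEverywhere("Scala is scalable and cool"))  // Java is Javable and cool
  }
}
```

Note that `$` and `\` have special meaning in the replacement string (they refer to captured groups and escapes); when the replacement is arbitrary text, Regex.quoteReplacement can be used to escape it.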

Regular expression

Scala's regular expressions inherit Java's syntax rules, and Java in turn largely follows the conventions of the Perl language.
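One way to see the relationship with Java is that a Scala Regex exposes the compiled java.util.regex.Pattern through its pattern member, so the Java API can be used on it directly. A small sketch, with the object name JavaInteropDemo chosen only for illustration:

```scala
import java.util.regex.Pattern

object JavaInteropDemo {
  def main(args: Array[String]): Unit = {
    val regex = "(S|s)cala".r

    // Regex.pattern is the underlying compiled java.util.regex.Pattern.
    val javaPattern: Pattern = regex.pattern
    val matcher = javaPattern.matcher("Scala is scalable and cool")

    println(matcher.find())   // true
    println(matcher.group())  // Scala
  }
}
```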

The following table provides some commonly used regular expression rules:

Expression        Matching Rule
^                 Matches the beginning of the input string.
$                 Matches the end of the input string.
.                 Matches any single character except a line terminator such as "\n" or "\r".
[...]             Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^...]            Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain".
\\A               Matches the beginning of the input string (not affected by multi-line mode).
\\z               Matches the end of the input string (similar to $, but not affected by multi-line mode).
\\Z               Matches the end of the input string, or just before a final line terminator (not affected by multi-line mode).
re*               Repeats zero or more times.
re+               Repeats one or more times.
re?               Repeats zero or one time.
re{n}             Repeats exactly n times.
re{n,}            Repeats n or more times.
re{n,m}           Repeats n to m times.
a|b               Matches a or b.
(re)              Matches re and captures the matched text into an automatically numbered group.
(?:re)            Matches re without capturing the matched text or assigning a group number.
(?>re)            Atomic (independent) group: matches re without backtracking into it.
\\w               Matches a letter, digit, or underscore.
\\W               Matches any character that is not a letter, digit, or underscore.
\\s               Matches any whitespace character; equivalent to [ \t\n\x0B\f\r].
\\S               Matches any character that is not whitespace.
\\d               Matches a digit, similar to [0-9].
\\D               Matches any non-digit character.
\\G               Matches at the position where the previous match ended (the start of the current search).
\\n               Newline character.
\\b               Matches a word boundary.
\\B               Matches a position that is not a word boundary.
\\t               Tab character.
\\Q               Starts a quoted (literal) section: \\Q(a+b)*3\\E matches the text "(a+b)*3".
\\E               Ends a quoted (literal) section: \\Q(a+b)*3\\E matches the text "(a+b)*3".

Regular Expression Examples

Example           Description
.                 Matches any single character except a line terminator such as "\n" or "\r".
[Rr]uby           Matches "Ruby" or "ruby".
rub[ye]           Matches "ruby" or "rube".
[aeiou]           Matches any one lowercase vowel: a, e, i, o, or u.
[0-9]             Matches any digit, same as [0123456789].
[a-z]             Matches any ASCII lowercase letter.
[A-Z]             Matches any ASCII uppercase letter.
[a-zA-Z0-9]       Matches any digit or any uppercase or lowercase ASCII letter.
[^aeiou]          Matches any character other than a, e, i, o, or u.
[^0-9]            Matches any character other than a digit.
\\d               Matches a digit, similar to [0-9].
\\D               Matches a non-digit, similar to [^0-9].
\\s               Matches whitespace, similar to [ \t\r\n\f].
\\S               Matches non-whitespace, similar to [^ \t\r\n\f].
\\w               Matches a letter, digit, or underscore, similar to [A-Za-z0-9_].
\\W               Matches anything but a letter, digit, or underscore, similar to [^A-Za-z0-9_].
ruby?             Matches "rub" or "ruby": the y is optional.
ruby*             Matches "rub" followed by zero or more y's.
ruby+             Matches "rub" followed by one or more y's.
\\d{3}            Matches exactly 3 digits.
\\d{3,}           Matches 3 or more digits.
\\d{3,5}          Matches 3, 4, or 5 digits.
\\D\\d+           No grouping: + repeats only \d.
(\\D\\d)+         Grouping: + repeats the \D\d pair.
([Rr]uby(, )?)+   Matches "Ruby", "Ruby, ruby, ruby", and so on.

Note that every backslash in the tables above is written twice. This is because the backslash is an escape character in Java and Scala string literals, so to get a single backslash \ in the pattern you need to write \\ in the string. Consider the following example:

import scala.util.matching.Regex

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = new Regex("abl[ae]\\d+")  // \\d in the string literal becomes \d in the pattern
      val str = "ablaw is able1 and cool"
      
      println((pattern findAllIn str).mkString(","))
   }
}

Running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
able1
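As a complement to the note on doubled backslashes, Scala's triple-quoted string literals do not process escape sequences, so a pattern can be written with single backslashes. The sketch below compares both spellings and also shows how capture groups, written as (re), can be bound by using a Regex as an extractor in a pattern match. The names RawStringDemo, datePattern, and year are illustrative assumptions, as is the date format.

```scala
object RawStringDemo {
  // Both literals denote the same regular expression: abl[ae]\d+
  val escaped      = "abl[ae]\\d+".r
  val tripleQuoted = """abl[ae]\d+""".r

  // Hypothetical pattern with three capture groups: year, month, day.
  val datePattern = """(\d{4})-(\d{2})-(\d{2})""".r

  // Used as an extractor, the Regex must match the whole input string.
  def year(text: String): Option[String] = text match {
    case datePattern(y, _, _) => Some(y)
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    val str = "ablaw is able1 and cool"
    println(escaped.findAllIn(str).mkString(","))       // able1
    println(tripleQuoted.findAllIn(str).mkString(","))  // able1
    println(year("2024-01-15"))                         // Some(2024)
    println(year("not a date"))                         // None
  }
}
```

Triple-quoted literals tend to be easier to read for patterns with many backslashes, since the text in the source file is the pattern itself.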