Ruby regular expressions
A regular expression is a special sequence of characters that matches or finds sets of strings using a pattern written in a specialized syntax.
Regular expressions combine predefined special characters into a "rule string" that expresses filtering logic for strings.
Syntax
A regular expression literal is a pattern placed between slashes, or between arbitrary delimiters following %r, as shown below:
    /pattern/
    /pattern/im    # options can be specified
    %r!/usr/local! # a regular expression using custom delimiters
Example
    #!/usr/bin/ruby

    line1 = "Cats are smarter than dogs"
    line2 = "Dogs also like meat"

    if line1 =~ /Cats(.*)/
      puts "Line1 contains Cats"
    end
    if line2 =~ /Cats(.*)/
      puts "Line2 contains Cats"
    end
The example above produces the following output:
Line1 contains Cats
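To make the behavior of the =~ operator more concrete, here is a minimal sketch. The variable names are only illustrative. It shows that =~ returns the index where the match starts (or nil when there is no match) and that the text captured by the first group is available in $1 right after a successful match.

```ruby
line = "Cats are smarter than dogs"

# =~ returns the index where the match starts, or nil when nothing matches.
start_index = (line =~ /Cats(.*)/)
captured    = $1                 # text captured by the first group of the last match
missing     = (line =~ /Birds/)

p start_index  # 0
p captured     # " are smarter than dogs"
p missing      # nil
```

Because nil is falsy and every integer (including 0) is truthy in Ruby, the result of =~ can be used directly as an if condition, as the example above does.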
Regular expression modifiers
A regular expression literal may include optional modifiers that control various aspects of matching. Modifiers are written after the second slash character, as shown in the syntax above. The following table lists the possible modifiers:
Modifier | Description |
---|---|
i | Ignore case when matching text. |
o | Perform #{} interpolation only once, the first time the regular expression literal is evaluated. |
x | Ignore whitespace in the pattern and allow comments inside the expression. |
m | Multiline mode: the dot (.) also matches newline characters. |
u,e,s,n | Interpret the regular expression as UTF-8, EUC-JP, Windows-31J (Shift_JIS), or ASCII-8BIT, respectively. If no such modifier is specified, the regular expression is assumed to use the source encoding. |
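The effect of the i, m, and x modifiers can be sketched as follows. The sample strings and variable names are made up for illustration.

```ruby
# i: ignore case.
case_insensitive = ("RUBY" =~ /ruby/i)

# m: let the dot match a newline as well.
dot_without_m = ("a\nb" =~ /a.b/)
dot_with_m    = ("a\nb" =~ /a.b/m)

# x: whitespace and comments inside the pattern are ignored.
iso_date = /
  (\d{4}) - # year
  (\d{2}) - # month
  (\d{2})   # day
/x
year = "2024-01-15"[iso_date, 1]

p case_insensitive  # 0
p dot_without_m     # nil
p dot_with_m        # 0
p year              # "2024"
```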
Just as string literals can be delimited with %Q, Ruby lets you start a regular expression with %r followed by a delimiter of your choice. This is useful when the pattern contains many slash characters that you would rather not escape.
    # Matches a single slash character, no escaping needed
    %r|/|
    # Modifier flags can be appended with the following syntax
    %r[</(.*)>]i
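As a small illustration of why %r is convenient, the sketch below writes the same path pattern twice, once with slashes and once with %r, and then uses a %r literal with a trailing modifier. The path and tag strings are hypothetical sample data.

```ruby
path = "/usr/local/bin"

with_slashes = /\/usr\/local/   # every slash must be escaped
with_percent = %r{/usr/local}   # no escaping needed

p path.match?(with_slashes)  # true
p path.match?(with_percent)  # true

# Modifiers follow the closing delimiter, just as they do with slashes.
tag_name = "</DIV>"[%r[</(div)>]i, 1]
p tag_name  # "DIV"
```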
Regular expression patterns
Except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \), every character matches itself. To match a metacharacter literally, escape it by placing a backslash in front of it.
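The following sketch shows the difference between an unescaped and an escaped metacharacter, and uses Regexp.escape to add the backslashes automatically. The price string is invented sample data.

```ruby
price = "Total: $5.00 (approx.)"

loose  = /5.00/      # the unescaped dot matches any character
strict = /\$5\.00/   # escaped "$" and "." match only those literal characters

# Regexp.escape adds the backslashes for an arbitrary string.
escaped     = Regexp.escape("$5.00")
from_string = Regexp.new(escaped)

p "5x00".match?(loose)       # true
p "$5x00".match?(strict)     # false
p(price =~ strict)           # 7
p escaped                    # "\\$5\\.00"
p price.match?(from_string)  # true
```

Regexp.escape is handy when the text to search for comes from user input and should never be interpreted as pattern syntax.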
The following table lists the regular expression syntax available in Ruby.
Pattern | Description |
---|---|
^ | Matches the beginning of a line. |
$ | Matches the end of a line. |
. | Matches any single character except newline characters. When using the m option, it can also match newlines. |
[...] | Matches any single character listed inside the square brackets. |
[^...] | Matches any single character not listed inside the square brackets. |
re* | Matches the preceding subexpression zero or more times. |
re+ | Matches the preceding subexpression one or more times. |
re? | Matches the preceding subexpression zero or one time. |
re{n} | Matches the preceding subexpression exactly n times. |
re{n,} | Matches the preceding subexpression n or more times. |
re{n,m} | Matches the preceding subexpression at least n and at most m times. |
a|b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?imx) | Temporarily turn on the i, m, or x option within a regular expression. If inside parentheses, only the part inside the parentheses is affected. |
(?-imx) | Temporarily turn off the i, m or x option within the regular expression. If inside parentheses, only the part inside the parentheses is affected. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?imx:re) | Temporarily turns on the i, m, or x option within the parentheses. |
(?-imx:re) | Temporarily turns off the i, m, or x option within the parentheses. |
(?#...) | Comment; the enclosed text is ignored. |
(?=re) | Positive lookahead: asserts that re matches at the current position without consuming any characters. |
(?!re) | Negative lookahead: asserts that re does not match at the current position without consuming any characters. |
(?>re) | Atomic group: matches re as an independent pattern, without backtracking into it. |
\w | Matches word characters. |
\W | Matches non-word characters. |
\s | Matches a whitespace character. Equivalent to [ \t\r\n\f\v]. |
\S | Matches a non-whitespace character. |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches non-digits. |
\A | Matches the beginning of the string. |
\Z | Matches the end of the string. If the string ends with a newline, matches just before that newline. |
\z | Matches the end of the string. |
\G | Matches the position where the previous match ended. |
\b | Matches word boundaries when outside brackets, and backspace (0x08) when inside brackets. |
\B | Matches non-word boundaries. |
\n, \t, etc. | Matches newlines, tabs, and other escaped control characters. |
\1...\9 | Matches the text captured by the nth group. |
\10 | Matches the text captured by the nth group if that group has already matched. Otherwise it is interpreted as the octal representation of a character code. |
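The line anchors (^ and $) and the string anchors (\A, \Z, \z) are easy to confuse, so here is a small sketch of how they differ on a two-line string. The sample text is made up.

```ruby
text = "first line\nsecond line\n"

caret_second = (text =~ /^second/)   # ^ also matches right after a newline
string_start = (text =~ /\Asecond/)  # \A matches only at the very start
dollar_line  = (text =~ /line$/)     # $ matches before any newline
big_z_line   = (text =~ /line\Z/)    # \Z tolerates one trailing newline
small_z_line = (text =~ /line\z/)    # \z is the absolute end of the string

p caret_second   # 11
p string_start   # nil
p dollar_line    # 6
p big_z_line     # 18
p small_z_line   # nil
```

When validating a whole string, \A and \z are usually the safer choice, because ^ and $ can match in the middle of multi-line input.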
Regular expression examples
Characters
Example | Description |
---|---|
/ruby/ | Matches "ruby" |
/¥/ | Matches the Yen sign. Multibyte characters are supported in Ruby 1.8 and Ruby 1.9. |
Character classes
Example | Description |
---|---|
/[Rr]uby/ | Matches "Ruby" or "ruby". |
/rub[ye]/ | Matches "ruby" or "rube". |
/[aeiou]/ | Matches any lowercase vowel. |
/[0-9]/ | Matches any digit; the same as /[0123456789]/. |
/[a-z]/ | Matches any lowercase ASCII letter. |
/[A-Z]/ | Matches any uppercase ASCII letter. |
/[a-zA-Z0-9]/ | Matches any of the above (an ASCII letter or a digit). |
/[^aeiou]/ | Matches any character that is not a lowercase vowel. |
/[^0-9]/ | Matches any character that is not a digit. |
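A quick way to try character classes is to combine them with Array#grep and String#scan. The word list and strings below are arbitrary sample data.

```ruby
words     = %w[ruby Ruby rube rubx]
ruby_like = words.grep(/\A[Rr]ub[ye]\z/)

vowel_count = "regular expression".scan(/[aeiou]/).length
digits      = "a1b22c".scan(/[0-9]/)
non_digits  = "a1b22c".scan(/[^0-9]/)

p ruby_like    # ["ruby", "Ruby", "rube"]
p vowel_count  # 7
p digits       # ["1", "2", "2"]
p non_digits   # ["a", "b", "c"]
```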
Special character classes
Example | Description |
---|---|
/./ | Matches any character except a newline. |
/./m | In multi-line mode, also matches the newline character. |
/\d/ | Matches a digit; equivalent to /[0-9]/. |
/\D/ | Matches a non-digit; equivalent to /[^0-9]/. |
/\s/ | Matches a whitespace character; equivalent to /[ \t\r\n\f\v]/. |
/\S/ | Matches a non-whitespace character; equivalent to /[^ \t\r\n\f\v]/. |
/\w/ | Matches a word character; equivalent to /[A-Za-z0-9_]/. |
/\W/ | Matches a non-word character; equivalent to /[^A-Za-z0-9_]/. |
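The shorthand classes \d, \w, \s, and \W can be explored with String#scan, as in this small sketch. The sample string is invented.

```ruby
sample = "id_42: ok\n"

digits     = sample.scan(/\d/)
words      = sample.scan(/\w+/)
whitespace = sample.scan(/\s/)
non_word   = sample.scan(/\W/)

p digits      # ["4", "2"]
p words       # ["id_42", "ok"]
p whitespace  # [" ", "\n"]
p non_word    # [":", " ", "\n"]
```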
Repetition
Example | Description |
---|---|
/ruby?/ | Matches "rub" or "ruby"; the y is optional. |
/ruby*/ | Matches "rub" followed by zero or more y's. |
/ruby+/ | Matches "rub" followed by one or more y's. |
/\d{3}/ | Matches exactly 3 digits. |
/\d{3,}/ | Matches 3 or more digits. |
/\d{3,5}/ | Matches 3, 4, or 5 digits. |
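The repetition operators can be compared side by side with a few sample strings. This is only a sketch; the inputs are made up, and the patterns are anchored with \A and \z so that the whole word has to match.

```ruby
candidates = %w[rub ruby rubyyy]

optional_y = candidates.grep(/\Aruby?\z/)  # zero or one "y"
any_y      = candidates.grep(/\Aruby*\z/)  # zero or more "y"
some_y     = candidates.grep(/\Aruby+\z/)  # one or more "y"

exactly_three = "1234567".scan(/\d{3}/)
three_or_more = "12 345 6789012".scan(/\d{3,}/)
three_to_five = "12 345 6789012".scan(/\b\d{3,5}\b/)

p optional_y     # ["rub", "ruby"]
p any_y          # ["rub", "ruby", "rubyyy"]
p some_y         # ["ruby", "rubyyy"]
p exactly_three  # ["123", "456"]
p three_or_more  # ["345", "6789012"]
p three_to_five  # ["345"]
```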
Non-greedy repetition
Example | Description |
---|---|
/<.*>/ | Greedy repetition: matches "<ruby>perl>". |
/<.*?>/ | Non-greedy repetition: matches "<ruby>" in "<ruby>perl>". |
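The difference between greedy and non-greedy repetition can be checked directly with String#[], which returns the matched text:

```ruby
html = "<ruby>perl>"

greedy = html[/<.*>/]    # .* grabs as much as possible
lazy   = html[/<.*?>/]   # .*? stops at the first ">"

p greedy  # "<ruby>perl>"
p lazy    # "<ruby>"
```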
Grouping with parentheses
Example | Description |
---|---|
/\D\d+/ | No grouping: + repeats \d only. |
/(\D\d)+/ | Grouping: + repeats the \D\d pair. |
/([Rr]uby(, )?)+/ | Matches "Ruby", "Ruby, ruby, ruby", etc. |
Back references
Example | Description |
---|---|
/([Rr])uby&\1ails/ | Matches ruby&rails or Ruby&Rails. |
/(['"])(?:(?!\1).)*\1/ | Matches a single-quoted or double-quoted string. \1 matches whatever the first group matched, \2 matches whatever the second group matched, and so on. |
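Grouping and back references can be tried out with a few short strings. The sketch below first contrasts an ungrouped and a grouped repetition, then uses \1 to require that the same character appears again. The inputs and variable names are illustrative.

```ruby
input = "a1b22"

ungrouped = input[/\D\d+/]    # + applies to \d only
grouped   = input[/(\D\d)+/]  # + applies to the whole \D\d pair

# \1 must repeat exactly what the first group captured.
same_case = /([Rr])uby&\1ails/
both_upper = "Ruby&Rails".match?(same_case)
both_lower = "ruby&rails".match?(same_case)
mixed      = "Ruby&rails".match?(same_case)

# A quoted string must be closed by the same kind of quote that opened it.
quoted    = /(['"])(?:(?!\1).)*\1/
quotation = %q{say "it's" now}[quoted]

p ungrouped   # "a1"
p grouped     # "a1b2"
p both_upper  # true
p both_lower  # true
p mixed       # false
p quotation   # "\"it's\""
```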
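Alternatives
Example | Description |
---|---|
/ruby|rube/ | Matches "ruby" or "rube". |
/rub(y|le)/ | Matches "ruby" or "ruble". |
/ruby(!+|\?)/ | Matches "ruby" followed by one or more "!" or by a single "?". |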
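Alternation with | can be sketched as follows. The word list and strings are sample data chosen for illustration.

```ruby
matches = %w[ruby rube ruble].grep(/\Arub(y|le)\z/)

punctuated = /ruby(!+|\?)/
excited    = "ruby!!!"[punctuated]
question   = "ruby?"[punctuated]
plain      = "ruby"[punctuated]
which_mark = "ruby?!"[punctuated, 1]  # text of the first group only

p matches     # ["ruby", "ruble"]
p excited     # "ruby!!!"
p question    # "ruby?"
p plain       # nil
p which_mark  # "?"
```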
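Anchors
Example | Description |
---|---|
/^Ruby/ | Matches "Ruby" at the start of a string or of an internal line. |
/Ruby$/ | Matches "Ruby" at the end of a string or of a line. |
/\ARuby/ | Matches "Ruby" at the start of a string. |
/Ruby\Z/ | Matches "Ruby" at the end of a string. |
/\bRuby\b/ | Matches "Ruby" at word boundaries. |
/\brub\B/ | \B is a non-word boundary: matches "rub" in "rube" and "ruby", but not "rub" on its own. |
/Ruby(?=!)/ | Matches "Ruby" only if it is followed by an exclamation mark. |
/Ruby(?!!)/ | Matches "Ruby" only if it is not followed by an exclamation mark. |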
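Word boundaries and lookahead assertions match positions rather than characters. The following sketch, using made-up strings, shows that the lookahead is not part of the matched text.

```ruby
whole_word  = "rubyist ruby".scan(/\bruby\b/)
inside_word = ("rub rube" =~ /\brub\B/)   # only the "rub" inside "rube" qualifies

followed       = "I love Ruby!"[/Ruby(?=!)/]  # "!" is required but not consumed
not_followed   = ("Ruby?" =~ /Ruby(?!!)/)
followed_by_it = ("Ruby!" =~ /Ruby(?!!)/)

p whole_word      # ["ruby"]
p inside_word     # 4
p followed        # "Ruby"
p not_followed    # 0
p followed_by_it  # nil
```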
Special syntax of parentheses
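The parenthesized extensions listed in the pattern table, namely (?#...), (?i), (?i:re), and (?:re), can be illustrated with a short sketch. The strings are sample data, and the comment text inside (?#...) is arbitrary.

```ruby
# (?#...) is an inline comment and matches nothing.
with_comment = ("Ruby" =~ /R(?#a comment, ignored by the engine)uby/)

# (?i) turns on case-insensitive matching for the rest of the pattern.
rest_insensitive = "RUBY".match?(/R(?i)uby/)
first_still_case = "rUBY".match?(/R(?i)uby/)

# (?i:re) limits the option to the group itself.
group_insensitive = "RUBY".match?(/R(?i:uby)/)

# (?:re) groups without capturing, while (re) captures.
non_capturing = "ruble".match(/rub(?:y|le)/).captures
capturing     = "ruble".match(/rub(y|le)/).captures

p with_comment       # 0
p rest_insensitive   # true
p first_still_case   # false
p group_insensitive  # true
p non_capturing      # []
p capturing          # ["le"]
```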
Search and replace
sub and gsub, together with their in-place variants sub! and gsub!, are important string methods when working with regular expressions. All of these methods use a regular expression pattern to perform a search-and-replace operation. sub and sub! replace the first occurrence of the pattern, while gsub and gsub! replace all occurrences. sub and gsub return a new string and leave the original unmodified, whereas sub! and gsub! modify the string they are called on.
The following is an example:
    #!/usr/bin/ruby
    # -*- coding: UTF-8 -*-

    phone = "138-3453-1111 #This is a phone number"

    # Delete the Ruby-style comment
    phone = phone.sub(/#.*$/, "")
    puts "Phone number : #{phone}"

    # Remove every character that is not a digit
    phone = phone.gsub(/\D/, "")
    puts "Phone number : #{phone}"
The example above produces the following output:
    Phone number : 138-3453-1111 
    Phone number : 13834531111
The following is another example:
    #!/usr/bin/ruby
    # -*- coding: UTF-8 -*-

    text = "rails is rails, Ruby on Rails is a very good Ruby framework"

    # Change every "rails" to "Rails"
    text.gsub!("rails", "Rails")

    # Capitalize the whole word "rails" wherever it appears
    text.gsub!(/\brails\b/, "Rails")

    puts text
The example above produces the following output:
    Rails is Rails, Ruby on Rails is a very good Ruby framework
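Beyond plain replacement strings, sub and gsub also accept back references in the replacement and a block that computes each replacement. The sketch below shows both, along with one detail worth knowing: sub! and gsub! return nil when nothing was replaced. The sample strings and variable names are illustrative.

```ruby
date = "2024-01-15"

# \1, \2, \3 in a single-quoted replacement refer to the captured groups.
us_style = date.sub(/(\d{4})-(\d{2})-(\d{2})/, '\2/\3/\1')

# With a block, each match is replaced by the block's return value.
title = "ruby on rails".gsub(/\b[a-z]/) { |letter| letter.upcase }

# The bang variants return nil if no substitution took place.
untouched = "no digits here".dup
result    = untouched.sub!(/\d/, "")

p us_style   # "01/15/2024"
p title      # "Ruby On Rails"
p result     # nil
p untouched  # "no digits here"
```

Because of the nil return value, assigning the result of sub! or gsub! back to a variable is risky; use sub or gsub when you need the resulting string.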