Ruby regular expressions

A regular expression is a special sequence of characters that matches or finds sets of strings using a pattern written in a specialized syntax.

Regular expressions combine predefined special characters into a "rule string". This rule string expresses the filtering logic to apply to strings.

Syntax

A regular expression literal is a pattern between slashes, or between arbitrary delimiters following %r, as shown below:

/pattern/
/pattern/im    # options can be specified
%r!/usr/local! # regular expression with custom delimiters

Example

#!/usr/bin/ruby

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Cats(.*)/
  puts "Line2 contains Cats"
end

The output result of the above example is:

Line1 contains Cats
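As a minimal sketch of what the =~ test in the example actually returns, the snippet below reuses the first sample string. The variable names are illustrative only and are not part of the original example.

```ruby
line = "Cats are smarter than dogs"

# =~ returns the index where the match starts, or nil when nothing matches.
index = line =~ /smarter/      # 9
miss  = line =~ /Birds/        # nil

# String#match returns a MatchData object (or nil); [1] is the first group.
m = line.match(/Cats(.*)/)
rest = m[1]                    # " are smarter than dogs"

puts index.inspect, miss.inspect, rest.inspect
```

Because nil is falsy and every integer (including 0) is truthy in Ruby, the result of =~ can be used directly as an if condition.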

Regular expression modifier

A regular expression literal may include an optional modifier that controls various aspects of matching. The modifier is written after the second slash character, as shown in the syntax above. The following table lists the possible modifiers:

Modifier	Description
i	Ignores case when matching text.
o	Performs #{} interpolation only once, the first time the regular expression literal is evaluated.
x	Extended mode: ignores whitespace in the pattern and allows comments throughout the expression.
m	Multiline mode: the dot (.) also matches newline characters.
u,e,s,n	Interprets the regular expression as Unicode (UTF-8), EUC, SJIS, or ASCII. If none of these modifiers is specified, the regular expression is assumed to use the source encoding.
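The effect of the i, m, and x modifiers can be sketched as follows. The sample strings and the date pattern are made up for illustration.

```ruby
# i: case-insensitive matching
ci = "RUBY" =~ /ruby/i                  # 0

# m: the dot also matches a newline
without_m = "a\nb" =~ /a.b/             # nil
with_m    = "a\nb" =~ /a.b/m            # 0

# x: whitespace and comments inside the pattern are ignored
date = /
  (\d{4})  # year
  -
  (\d{2})  # month
/x
year = "2024-05"[date, 1]               # "2024"

puts ci.inspect, without_m.inspect, with_m.inspect, year
```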

Just as string literals can be delimited with %Q, Ruby allows you to start a regular expression with %r, followed by a delimiter of your choice. This is useful when the pattern contains many slash characters that you don't want to escape.

# Matches a single slash character, no escape required
%r|/|

# Flag characters are allowed with this syntax, too
%r[</(.*)>]i
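A small sketch comparing the two literal forms on a path-like string. The path value and variable names are assumptions chosen for illustration.

```ruby
path = "/usr/local/bin"

with_slashes = /\/usr\/local/     # every slash must be escaped
with_percent = %r{/usr/local}     # no escaping needed

a = path =~ with_slashes          # 0
b = path =~ with_percent          # 0

puts a, b
```

Both forms build a Regexp object; %r only changes how the literal is written.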

Regular expression pattern

All characters match themselves except the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \). To match a metacharacter literally, escape it by placing a backslash in front of it.
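Escaping can be sketched like this. The price string is invented for the example; Regexp.escape is the standard helper that adds the backslashes for you.

```ruby
price = "Total: $5.00 (approx.)"

# Unescaped, "." matches any character and "$" is an anchor.
literal = /\$5\.00/
found = price[literal]                       # "$5.00"

# Regexp.escape adds the backslashes for arbitrary literal text.
escaped = Regexp.escape("(approx.)")         # "\\(approx\\.\\)"
found_paren = price[Regexp.new(escaped)]     # "(approx.)"

puts found, escaped, found_paren
```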

The following table lists the regular expression syntax available in Ruby.

Pattern	Description
^	Matches the beginning of a line.
$	Matches the end of a line.
.	Matches any single character except a newline. With the m option, it also matches newlines.
[...]	Matches any single character listed inside the brackets.
[^...]	Matches any single character not listed inside the brackets.
re*	Matches the preceding expression zero or more times.
re+	Matches the preceding expression one or more times.
re?	Matches the preceding expression zero or one time.
re{n}	Matches the preceding expression exactly n times.
re{n,}	Matches the preceding expression n or more times.
re{n,m}	Matches the preceding expression at least n times and at most m times.
a|b	Matches either a or b.
(re)	Groups regular expressions and remembers the matched text.
(?imx)	Temporarily turns on the i, m, or x option within a regular expression. If inside parentheses, only the part inside the parentheses is affected.
(?-imx)	Temporarily turns off the i, m, or x option within a regular expression. If inside parentheses, only the part inside the parentheses is affected.
(?:re)	Groups regular expressions without remembering the matched text.
(?imx:re)	Temporarily turns on the i, m, or x option within the parentheses.
(?-imx:re)	Temporarily turns off the i, m, or x option within the parentheses.
(?#...)	Comment.
(?=re)	Positive lookahead: asserts that re matches at this position without consuming any characters.
(?!re)	Negative lookahead: asserts that re does not match at this position without consuming any characters.
(?>re)	Atomic group: matches re independently, without backtracking into it.
\w	Matches a word character.
\W	Matches a non-word character.
\s	Matches a whitespace character. Equivalent to [ \t\r\n\f\v].
\S	Matches a non-whitespace character.
\d	Matches a digit. Equivalent to [0-9].
\D	Matches a non-digit.
\A	Matches the beginning of the string.
\Z	Matches the end of the string. If the string ends with a newline, matches just before that newline.
\z	Matches the end of the string.
\G	Matches at the point where the last match finished.
\b	Matches a word boundary when outside brackets, and a backspace (0x08) when inside brackets.
\B	Matches a non-word boundary.
\n, \t, etc.	Matches newlines, tabs, and other escaped characters.
\1...\9	Matches the text captured by the nth group.
\10	Matches the text captured by the nth group if that group has already matched. Otherwise refers to the octal representation of a character code.
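The difference between the line anchors (^ and $) and the string anchors (\A, \Z, \z) in the table is easy to miss, so here is a short sketch. The two-line sample text is an assumption made for illustration.

```ruby
text = "first\nsecond\n"

# ^ and $ anchor at every line, so both lines are found.
line_matches = text.scan(/^\w+$/)         # ["first", "second"]

# \A anchors at the very start of the string.
at_start = text[/\A\w+/]                  # "first"

# \Z tolerates one trailing newline; \z does not.
big_z   = text =~ /second\Z/              # 6
small_z = text =~ /second\z/              # nil

puts line_matches.inspect, at_start, big_z.inspect, small_z.inspect
```

When validating a whole string, \A and \z are usually the safer choice, since ^ and $ can match in the middle of multi-line input.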

Regular expression example

Characters

Example	Description
/ruby/	Matches "ruby"
/¥/	Matches the yen sign. Multibyte characters are supported in Ruby 1.8 and Ruby 1.9.

Character class

Example	Description
/[Rr]uby/	Matches "Ruby" or "ruby".
/rub[ye]/	Matches "ruby" or "rube".
/[aeiou]/	Matches any lowercase vowel.
/[0-9]/	Matches any digit; same as /[0123456789]/.
/[a-z]/	Matches any lowercase ASCII letter.
/[A-Z]/	Matches any uppercase ASCII letter.
/[a-zA-Z0-9]/	Matches any ASCII letter or digit.
/[^aeiou]/	Matches any character that is not a lowercase vowel.
/[^0-9]/	Matches any character that is not a digit.
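A few of the bracket expressions can be tried with String#scan, which returns every match as an array. The sample text is made up for this sketch.

```ruby
sample = "Ruby 1.9, rube, ruby"

variants = sample.scan(/[Rr]ub[ye]/)      # ["Ruby", "rube", "ruby"]
digits   = sample.scan(/[0-9]/)           # ["1", "9"]
vowels   = "ruby".scan(/[aeiou]/)         # ["u"]
no_vowel = "ruby".scan(/[^aeiou]/)        # ["r", "b", "y"]

puts variants.inspect, digits.inspect, vowels.inspect, no_vowel.inspect
```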

Special character class

Example	Description
/./	Matches any character except a newline.
/./m	In multiline mode, also matches the newline character.
/\d/	Matches a digit; equivalent to /[0-9]/.
/\D/	Matches a non-digit; equivalent to /[^0-9]/.
/\s/	Matches a whitespace character; equivalent to /[ \t\r\n\f\v]/.
/\S/	Matches a non-whitespace character; equivalent to /[^ \t\r\n\f\v]/.
/\w/	Matches a word character; equivalent to /[A-Za-z0-9_]/.
/\W/	Matches a non-word character; equivalent to /[^A-Za-z0-9_]/.
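The shorthand classes behave as follows on a small, invented sample string.

```ruby
entry = "id_42: ok\n"

digits   = entry.scan(/\d/)        # ["4", "2"]
words    = entry.scan(/\w+/)       # ["id_42", "ok"]
spaces   = entry.scan(/\s/)        # [" ", "\n"]
non_word = entry.scan(/\W/)        # [":", " ", "\n"]

# The dot skips newlines unless the m modifier is given.
dot_default   = "a\nb".scan(/./)   # ["a", "b"]
dot_multiline = "a\nb".scan(/./m)  # ["a", "\n", "b"]

puts digits.inspect, words.inspect, spaces.inspect, non_word.inspect
```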

Repetition

Example	Description
/ruby?/	Matches "rub" or "ruby"; the y is optional.
/ruby*/	Matches "rub" plus 0 or more y's.
/ruby+/	Matches "rub" plus 1 or more y's.
/\d{3}/	Matches exactly 3 digits.
/\d{3,}/	Matches 3 or more digits.
/\d{3,5}/	Matches 3, 4, or 5 digits.

Non-greedy repetition

This matches the smallest possible number of repetitions.

Example	Description
/<.*>/	Greedy repetition: matches "<ruby>perl>".
/<.*?>/	Non-greedy repetition: matches "<ruby>" in "<ruby>perl>".
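Quantifiers and the greedy versus non-greedy distinction can be checked with a short sketch. The input strings are assumptions chosen to show each case.

```ruby
optional = "rub ruby rubyy".scan(/ruby?/)    # ["rub", "ruby", "ruby"]
one_plus = "rub ruby rubyy".scan(/ruby+/)    # ["ruby", "rubyy"]
bounded  = "12 1234 123456".scan(/\d{3,5}/)  # ["1234", "12345"]

markup = "<ruby>perl>"
greedy = markup[/<.*>/]      # "<ruby>perl>"  (runs to the last ">")
lazy   = markup[/<.*?>/]     # "<ruby>"       (stops at the first ">")

puts optional.inspect, one_plus.inspect, bounded.inspect, greedy, lazy
```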

Group by parentheses

Example	Description
/\D\d+/	No grouping: + repeats \d.
/(\D\d)+/	Grouping: + repeats the \D\d pair.
/([Rr]uby(, )?)+/	Matches "Ruby", "Ruby, ruby, ruby", etc.
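The following sketch shows how parentheses change what a quantifier applies to. The short test strings are illustrative assumptions.

```ruby
# Without a group, + applies only to \d.
no_group = "a1b22"[/\D\d+/]            # "a1"

# With a group, + repeats the whole \D\d pair.
grouped = "a1b2c3"[/(\D\d)+/]          # "a1b2c3"

# Groups also capture text; a repeated group keeps its last repetition.
m = "a1b2c3".match(/(\D\d)+/)
last_pair = m[1]                       # "c3"

list = "Ruby, ruby, ruby"[/([Rr]uby(, )?)+/]   # "Ruby, ruby, ruby"

puts no_group, grouped, last_pair, list
```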

Backreference

This will match the previously matched group again.

Example	Description
/([Rr])uby&\1ails/	Matches "ruby&rails" or "Ruby&Rails".
/(['"])(?:(?!\1).)*\1/	Matches a single-quoted or double-quoted string. \1 matches whatever the first group matched, \2 matches whatever the second group matched, and so on.
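A backreference must repeat exactly the text the group captured, which the sketch below tries to illustrate. The input strings are invented for the example.

```ruby
pair = /([Rr])uby&\1ails/
same_case  = "Ruby&Rails" =~ pair      # 0
mixed_case = "Ruby&rails" =~ pair      # nil, because \1 must repeat "R"

# The closing quote has to be the same character as the opening one.
quoted = /(['"])(?:(?!\1).)*\1/
single = %q{say 'hi "there"' now}[quoted]

puts same_case.inspect, mixed_case.inspect, single
```

Here single holds the whole single-quoted span, including the double quotes nested inside it.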

Alternatives

Example	Description
/ruby|rube/	Matches "ruby" or "rube".
/rub(y|le)/	Matches "ruby" or "ruble".
/ruby(!+|\?)/	Matches "ruby" followed by one or more ! or by a single ?.
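Alternation can be sketched with String#scan. Note that this sketch swaps in non-capturing groups (?:...), because scan returns the captured groups rather than the whole match when the pattern contains capturing parentheses. The sample strings are assumptions.

```ruby
words = "ruby rube ruble"

either   = words.scan(/ruby|rube/)       # ["ruby", "rube"]
suffixed = words.scan(/rub(?:y|le)/)     # ["ruby", "ruble"]

excited = "ruby!!! ruby? ruby."
marks = excited.scan(/ruby(?:!+|\?)/)    # ["ruby!!!", "ruby?"]

puts either.inspect, suffixed.inspect, marks.inspect
```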

Anchors

Anchors specify the position at which a match must occur.

Example	Description
/^Ruby/	Matches "Ruby" at the start of a string or line.
/Ruby$/	Matches "Ruby" at the end of a string or line.
/\ARuby/	Matches "Ruby" at the start of a string.
/Ruby\Z/	Matches "Ruby" at the end of a string.
/\bRuby\b/	Matches "Ruby" at a word boundary.
/\brub\B/	\B is a non-word boundary: matches "rub" in "rube" and "ruby", but not "rub" alone.
/Ruby(?=!)/	Matches "Ruby" if it is followed by an exclamation point.
/Ruby(?!!)/	Matches "Ruby" if it is not followed by an exclamation point.
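Word boundaries and lookahead can be explored with the sketch below. The test strings are assumptions chosen so that each pattern has one matching and one non-matching case.

```ruby
whole_word = "Ruby Rubyist".scan(/\bRuby\b/)     # ["Ruby"]
inside     = "rub rube ruby".scan(/\brub\B/)     # ["rub", "rub"], from "rube" and "ruby"

# Lookahead checks what follows without making it part of the match.
followed     = "Ruby!"[/Ruby(?=!)/]     # "Ruby"
not_followed = "Ruby!"[/Ruby(?!!)/]     # nil
plain        = "Ruby?"[/Ruby(?!!)/]     # "Ruby"

puts whole_word.inspect, inside.inspect, followed.inspect, not_followed.inspect, plain.inspect
```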

Special syntax of parentheses

Example	Description
/R(?#comment)/	Matches "R". Everything inside (?#...) is a comment.
/R(?i)uby/	Case-insensitive while matching "uby".
/R(?i:uby)/	Same as above.
/rub(?:y|le)/	Grouping only, without creating a \1 backreference.
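A brief sketch of the special parenthesis forms, using the scoped (?i:...) variant. The inputs and the comment text inside (?#...) are made up for illustration.

```ruby
# (?#...) is ignored by the regexp engine.
commented = "R" =~ /R(?#just a note)/          # 0

# (?i:...) ignores case only inside the group, so the leading R stays case-sensitive.
partial_a = "RUBY" =~ /R(?i:uby)/              # 0
partial_b = "rUBY" =~ /R(?i:uby)/              # nil

# (?:...) groups without capturing, so group 1 is the later capture.
m = "ruble!".match(/rub(?:y|le)(!)/)
first_capture = m[1]                           # "!"

puts commented.inspect, partial_a.inspect, partial_b.inspect, first_capture
```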

Search and replace

sub and gsub, together with their in-place variants sub! and gsub!, are important String methods when working with regular expressions.

All these methods use regular expression patterns to perform search and replace operations. sub and sub! replace the first occurrence of the pattern, and gsub and gsub! replace all occurrences of the pattern.

sub and gsub return a new string, leaving the original string unmodified, while sub! and gsub! will modify the string they are called on.
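The differences between the four methods can be sketched as follows. The date string and variable names are assumptions for illustration; the nil return of the bang variants when nothing is replaced is standard String behavior and worth keeping in mind before assigning their result.

```ruby
original = "2024-05-17"

first_only = original.sub(/-/, "/")        # "2024/05-17"
all        = original.gsub(/-/, "/")       # "2024/05/17"

# \1, \2 ... in a single-quoted replacement refer to captured groups.
swapped = original.sub(/(\d+)-(\d+)-(\d+)/, '\3.\2.\1')   # "17.05.2024"

# The bang variants modify the receiver and return nil when nothing changed.
copy = original.dup
changed   = copy.gsub!(/-/, "")            # "20240517"
unchanged = copy.gsub!(/-/, "")            # nil

puts first_only, all, swapped, original, copy, unchanged.inspect
```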

The following is an example:

#!/usr/bin/ruby
# -*- coding: UTF-8 -*-

phone = "138-3453-1111 #This is a phone number"

# Delete the Ruby-style comment
phone = phone.sub(/#.*$/, "")
puts "Phone number : #{phone}"

# Remove every character that is not a digit
phone = phone.gsub(/\D/, "")
puts "Phone number : #{phone}"

The output result of the above example is:

Phone number : 138-3453-1111 
Phone number : 13834531111

The following is another example:

#!/usr/bin/ruby
# -*- coding: UTF-8 -*-

text = "rails is rails, Ruby on Rails is a very good Ruby framework"

# Change every "rails" to "Rails"
text.gsub!("rails", "Rails")

# Capitalize the whole word "rails" wherever it still appears
text.gsub!(/\brails\b/, "Rails")

puts text

The output result of the above example is:

Rails is Rails, Ruby on Rails is a very good Ruby framework
