Regular expression introductory tutorial

Aug 09, 2017, 01:44 PM

30-minute introductory tutorial on regular expressions

    1. The goal of this article
    2. How to use this tutorial
    3. What exactly are regular expressions?
    4. Getting started
    5. Testing regular expressions
    6. Metacharacters
    7. Character escape
    8. Repetition
    9. Character class
    10. Branch condition
    11. Negation
    12. Grouping
    13. Backreference
    14. Zero-width assertion
    15. Negative zero-width assertion
    16. Comments
    17. Greedy and lazy
    18. Processing options
    19. Balanced group/recursive matching
    20. Anything else not mentioned?
    21. Contact the author
    22. Finally, a little advertising...
    23. Online resources and references for this article
    24. Update record

    Version: v2.31 (2009-4-11). Author: deerchao. Please credit the source when reprinting.


    The goal of this article

    In 30 minutes, you will understand what regular expressions are and gain a basic grasp of them, enough to use them in your own programs or web pages.

    How to use this tutorial

    Most importantly, please give me 30 minutes. If you have no experience with regular expressions, please do not try to get started in 30 seconds, unless you are Superman :)

    Don't be intimidated by the complex expressions below. Just follow me step by step and you will find that regular expressions are not as difficult as you imagined. Of course, if after reading this tutorial you find that you understood a lot but remember almost none of it, that is normal. I believe that for someone who has never been exposed to regular expressions, the chance of remembering more than 80% of the syntax mentioned here after one reading is zero. This tutorial only aims to teach you the basic principles; you will need to practice and use regular expressions often to master them.

    Besides being an introductory tutorial, this article also tries to serve as a regular expression syntax reference for daily work. In the author's own experience, this goal has been met quite well: you see, even I haven't managed to memorize everything, have I?

    Text format conventions: technical term; metacharacter/syntax; regular expression; part of a regular expression (for analysis); source string to match against; explanation of a regular expression or part of one.

    Some side notes accompany this article. They mainly provide related information or explain basic concepts to readers without a programming background, and can usually be ignored.

    What exactly is a regular expression?

    A character is the most basic unit that computer software handles when processing text; it may be a letter, digit, punctuation mark, space, newline, Chinese character, and so on. A string is a sequence of 0 or more characters. Text simply means written words, that is, strings. Saying that a string matches a regular expression usually means that part of the string (or several parts) satisfies the conditions given by the expression.

    When writing programs or web pages that process strings, there is often a need to find strings that match certain complex rules.

    A regular expression is a tool for describing these rules. In other words, regular expressions are code that records text rules.

    You have very likely used the wildcards * and ? for file searches under Windows/DOS. To find all Word documents in a directory, you would search for *.doc, where * is interpreted as any string. Like wildcards, regular expressions are tools for text matching, but they can describe your needs more precisely, at the cost of being more complicated. For example, you can write a regular expression to find all strings that start with 0, followed by 2 or 3 digits, then a hyphen "-", and finally 7 or 8 digits (such as 010-12345678 or 0376-7654321).

    Getting Started

    The best way to learn regular expressions is to start with examples. After understanding the examples, you can modify and experiment with them yourself. A number of simple examples are given below, and they are explained in detail.

    Suppose you are searching for hi in an English novel. You can use the regular expression hi.

    This is almost the simplest regular expression. It matches exactly this string: two characters, the first h and the second i. Tools that handle regular expressions usually provide an option to ignore case; if it is selected, the expression matches any of hi, HI, Hi, and hI.

    Unfortunately, many words contain the two consecutive characters hi, such as him, history, and high. If you search with hi, the hi inside these words will also be found. To find exactly the word hi, we should use \bhi\b.

    \b is a special code defined by regular expressions (some people call it a metacharacter). It represents the beginning or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation, or newlines, \b does not match any of these separator characters; it matches only a position.

    More precisely, \b matches a position where the preceding and following characters are not both \w (one is and the other is not, or one does not exist).
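
The difference between hi and \bhi\b can be checked with a minimal sketch. Python's re module is assumed here purely for illustration (the tutorial itself describes the .NET engine), and the sample sentence is made up:

```python
import re

text = "hi, this is his history. Say hi to him."

# A bare 'hi' also matches inside longer words such as 'this' and 'history'.
plain_hits = re.findall(r"hi", text)

# \bhi\b matches 'hi' only where it stands as a whole word.
word_starts = [m.start() for m in re.finditer(r"\bhi\b", text)]

# The ignore-case option lets one pattern cover hi, HI, Hi and hI.
any_case = re.findall(r"\bhi\b", "hi HI Hi hI", re.IGNORECASE)

print(len(plain_hits), word_starts, any_case)
```

The bare pattern finds six occurrences, while the word-boundary version finds only the two standalone words.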

    If you are looking for hi followed not far behind by Lucy, you should use \bhi\b.*\bLucy\b.

    Here, . is another metacharacter; it matches any character except a newline. * is also a metacharacter, but it represents neither a character nor a position; it represents a quantity: it specifies that the content before * may be repeated any number of times in a row so that the whole expression can match. Therefore, .* together means any number of characters that do not include a newline. Now the meaning of \bhi\b.*\bLucy\b is clear: first the word hi, then any number of arbitrary characters (but not newlines), and finally the word Lucy.

    The newline character is '\n', the character whose ASCII code is 10 (hexadecimal 0x0A).
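
As a small sketch of how .* stops at a newline, again assuming Python's re module for illustration and using made-up sample strings:

```python
import re

pattern = re.compile(r"\bhi\b.*\bLucy\b")

same_line = "hi there, my name is Lucy"
across_lines = "hi there\nmy name is Lucy"

# On one line, .* can bridge the gap between the two words.
found_same_line = pattern.search(same_line) is not None

# By default . does not match '\n', so .* cannot cross the line break.
found_across_lines = pattern.search(across_lines) is not None

print(found_same_line, found_across_lines)
```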

    If other metacharacters are used at the same time, we can construct a more powerful regular expression. For example, the following example:

    0\d\d-\d\d\d\d\d\d\d\d matches strings like this: a 0, then two digits, then a hyphen "-", and finally 8 digits (that is, a Chinese phone number; of course, this example only matches numbers whose area code has 3 digits).

    Here \d is a new metacharacter that matches one digit (0, or 1, or 2, or ...). - is not a metacharacter and matches only itself: the hyphen (or minus sign, or dash, or whatever you want to call it).

    To avoid so much annoying repetition, we can also write this expression as 0\d{2}-\d{8}. Here {2} ({8}) after \d means that the preceding \d must match 2 times (8 times) in a row.
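
A quick way to convince yourself that the two spellings are equivalent is to run both against the same inputs. This is only a sketch in Python (an assumption; any engine would do), and the sample list is invented:

```python
import re

long_form = re.compile(r"0\d\d-\d\d\d\d\d\d\d\d")
short_form = re.compile(r"0\d{2}-\d{8}")

samples = ["010-12345678", "0376-7654321", "010-1234567", "110-12345678"]

# fullmatch requires the whole sample to fit the pattern.
long_results = {s: long_form.fullmatch(s) is not None for s in samples}
short_results = {s: short_form.fullmatch(s) is not None for s in samples}

print(short_results)
print(long_results == short_results)
```

Only 010-12345678 is accepted; the 4-digit area code, the 7-digit local number and the number not starting with 0 are all rejected by both forms.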

    Testing regular expressions

    Other available testing tools:

    • RegexBuddy

    • JavaScript regular expression online testing tool

    If you don’t find regular expressions difficult to read and write, either you are a genius, or you are not from Earth. The syntax of regular expressions can be confusing, even for people who use it regularly. Because it is difficult to read and write and prone to errors, it is necessary to find a tool to test regular expressions.

    Some details of regular expressions differ between environments. This tutorial describes the behavior of regular expressions under Microsoft .NET Framework 2.0, so I will introduce a .NET tool, Regex Tester. First make sure you have .NET Framework 2.0 installed, then download Regex Tester. It is portable software that needs no installation: after downloading, open the archive and run RegexTester.exe directly.

    The following is a screenshot of Regex Tester running:

    [Screenshot of Regex Tester running]

    Metacharacters

    Now you already know a few useful metacharacters, such as \b, ., *, and \d. Regular expressions have more metacharacters; for example, \s matches any whitespace character, including spaces, tabs, newlines, and Chinese full-width spaces, and \w matches letters, digits, underscores, or Chinese characters.

    The special handling of Chinese and Chinese characters is provided by the .NET regular expression engine. For other environments, check the relevant documentation.

    Here are some more examples:

    \ba\w*\b matches words that start with the letter a: first the beginning of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

    Now is a good time to say what a word means in regular expressions: one or more consecutive \w characters. Yes, this has little to do with the thousands of same-named things you have to memorize when learning English :)

    \d+ matches 1 or more consecutive digits. The + here is a metacharacter similar to *; the difference is that * matches any number of repetitions (possibly 0), while + matches 1 or more repetitions.

    \b\w{6}\b matches words of exactly 6 characters.
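
The three examples above can be tried on one sentence. A minimal sketch, assuming Python's re module and an invented sample text:

```python
import re

text = "an apple a day keeps 12 doctors away since 2009, banana"

# Words that begin with the letter a.
words_starting_with_a = re.findall(r"\ba\w*\b", text)

# Runs of one or more consecutive digits.
digit_runs = re.findall(r"\d+", text)

# Words made of exactly six \w characters.
six_char_words = re.findall(r"\b\w{6}\b", text)

print(words_starting_with_a)
print(digit_runs)
print(six_char_words)
```

Note that banana is not reported by the first pattern: its a characters are inside the word, where there is no word boundary in front of them.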

    Regular expression engines usually provide a method to test whether a given string matches a regular expression, such as RegExp.test() in JavaScript or Regex.IsMatch() in .NET. Matching here means that some part of the string conforms to the expression. Without ^ and $, for \d{5,12} such a method can only guarantee that the string contains 5 to 12 consecutive digits, not that the entire string is 5 to 12 digits.

    The metacharacters ^ (the symbol on the same key as the digit 6) and $ both match a position, somewhat like \b. ^ matches the beginning of the string being searched, and $ matches its end. These two codes are very useful for validating input. For example, if a website requires a QQ number to be 5 to 12 digits, you can use ^\d{5,12}$.

    The {5,12} here is similar to the {2} introduced earlier, except that {2} means exactly 2 repetitions, while {5,12} means at least 5 and at most 12 repetitions; otherwise there is no match.

    Because ^ and $ are used, the entire input string must match \d{5,12}; that is, the whole input must be 5 to 12 digits. So if the QQ number entered matches this regular expression, it meets the requirement.

    Like the ignore-case option, some regular expression tools also have an option for multiline processing. If it is selected, ^ and $ match the beginning and end of each line instead.
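
The effect of the anchors and of the multiline option can be sketched as follows. Python's re module is assumed for illustration; its re.MULTILINE flag plays the role of the multiline option described above, and the inputs are invented:

```python
import re

unanchored = re.compile(r"\d{5,12}")
anchored = re.compile(r"^\d{5,12}$")

# Without anchors, digits buried inside other text are enough for a match.
contains_digits = unanchored.search("abc1234567xyz") is not None

# With anchors, the whole input has to be 5 to 12 digits.
checks = {
    s: anchored.search(s) is not None
    for s in ["abc1234567xyz", "1234567", "1234", "1234567890123"]
}

# With the multiline option, ^ and $ apply to every line of the input.
lines = "12345\nabc\n678901"
single_line_hits = re.findall(r"^\d{5,12}$", lines)
multi_line_hits = re.findall(r"^\d{5,12}$", lines, re.MULTILINE)

print(contains_digits, checks, single_line_hits, multi_line_hits)
```

One Python-specific caveat: $ also matches just before a newline at the very end of the string, so re.fullmatch is the stricter choice when validating input there.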

    Character escape

    If you want to find a metacharacter itself, for example . or *, a problem arises: you cannot specify them directly, because they would be interpreted as something else. In that case you have to use \ to cancel their special meaning, so you should write \. and \*. Of course, to find \ itself, you have to write \\.

    For example: unibetter\.com matches unibetter.com, and C:\\Windows matches C:\Windows.
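
A short sketch of escaping in practice, assuming Python's re module. The helper re.escape is a Python convenience and is not part of the tutorial; the sample strings are invented:

```python
import re

# Unescaped, . is a metacharacter and matches any character.
loose = re.fullmatch(r"unibetter.com", "unibetterXcom") is not None

# Escaped, \. matches only a literal period.
exact = re.fullmatch(r"unibetter\.com", "unibetter.com") is not None
rejected = re.fullmatch(r"unibetter\.com", "unibetterXcom") is None

# \\ in the pattern matches one literal backslash in the text.
path_ok = re.fullmatch(r"C:\\Windows", r"C:\Windows") is not None

# re.escape adds the backslashes for you when the text to find is arbitrary.
literal = "1+1=2 (really?)"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None

print(loose, exact, rejected, path_ok, escaped_ok)
```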

    Repetition

    You have already seen *, +, {2}, and {5,12}, which are all ways of matching repetition. Table 2 lists all the quantifiers in regular expressions (codes that specify a count, such as * and {5,12}). Table 1 summarizes the metacharacters introduced so far:

    Table 1. Commonly used metacharacters
    Code    Description
    .       Matches any character except a newline
    \w      Matches a letter, digit, underscore, or Chinese character
    \s      Matches any whitespace character
    \d      Matches a digit
    \b      Matches the beginning or end of a word
    ^       Matches the beginning of the string
    $       Matches the end of the string

    Here are some examples of using repetition:

    Windows\d+ matches Windows followed by 1 or more digits

    ^\w+ matches the first word of a line (or the first word of the entire string; which one depends on the option settings)
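
These two repetition examples can be sketched like this, assuming Python's re module, where re.MULTILINE stands in for the option setting mentioned above; the sample texts are invented:

```python
import re

text = "Windows98 and Windows2000, but not Windows or WindowsXP"

# At least one digit must follow, so bare 'Windows' and 'WindowsXP' are skipped.
versions = re.findall(r"Windows\d+", text)

poem = "roses are red\nviolets are blue"

# Default: ^ is the start of the whole string.
first_word_of_string = re.findall(r"^\w+", poem)

# Multiline option: ^ is the start of every line.
first_word_of_each_line = re.findall(r"^\w+", poem, re.MULTILINE)

print(versions, first_word_of_string, first_word_of_each_line)
```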

    Character class

    Finding digits, letters or digits, or whitespace is easy, because there are already metacharacters for those character sets. But what if you want to match a set of characters that has no predefined metacharacter (such as the vowels a, e, i, o, u)?

    It is simple: just list them inside square brackets. For example, [aeiou] matches any English vowel, and [.?!] matches a punctuation mark (. or ? or !).

    We can also easily specify a character range: [0-9] means exactly the same as \d, a single digit; similarly, [a-z0-9A-Z_] is completely equivalent to \w (if only English is considered).
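
A minimal sketch of character classes, assuming Python 3's re module, where \w on text strings is Unicode-aware (similar in spirit to the .NET behaviour described earlier, though the details may differ). The sample strings are invented:

```python
import re

vowels = re.findall(r"[aeiou]", "regular expression")
marks = re.findall(r"[.?!]", "Really? Yes! Done.")

# On plain ASCII text, the explicit ranges and the shorthand agree.
digits_match = re.findall(r"[0-9]", "a1b22") == re.findall(r"\d", "a1b22")
ascii_class = re.findall(r"[a-z0-9A-Z_]+", "snake_case 42!")
ascii_shorthand = re.findall(r"\w+", "snake_case 42!")

# Beyond English they differ: \w also accepts Chinese characters here.
mixed_class = re.findall(r"[a-z0-9A-Z_]+", "中文abc")
mixed_shorthand = re.findall(r"\w+", "中文abc")

print(vowels, marks, digits_match)
print(ascii_class, ascii_shorthand, mixed_class, mixed_shorthand)
```

The last two results show why the equivalence with \w only holds "if only English is considered".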

    Here is a more complex expression: \(?0\d{2}[) -]?\d{8}.

    "(" and ")" are also metacharacters, which will be covered in the grouping section later, so they need to be escaped here.

    This expression matches phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678. Let's analyze it: first comes an escaped character \(, which may appear 0 or 1 times (?); then a 0, followed by 2 digits (\d{2}); then one of ), -, or a space, which appears once or not at all (?); and finally 8 digits (\d{8}).

    Branch condition

    Unfortunately, the expression above also matches "incorrect" formats such as 010)12345678 or (022-87654321. To solve this problem we need branch conditions. A branch condition in a regular expression means there are several rules, and satisfying any one of them counts as a match. The way to do this is to separate the rules with |. Don't understand? It doesn't matter, look at the examples:

    0\d{2}-\d{8}|0\d{3}-\d{7}

    This expression matches two kinds of hyphen-separated phone numbers: a 3-digit area code with an 8-digit local number (such as 010-12345678), or a 4-digit area code with a 7-digit local number (such as 0376-2233445).

    \(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}

    This expression matches phone numbers with a 3-digit area code, where the area code may or may not be enclosed in parentheses, and the area code and local number may be separated by a hyphen or a space, or not separated at all. You can try using branch conditions to extend this expression to also support 4-digit area codes.
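
The following sketch compares the earlier single-rule phone expression with the branch-condition versions. Python's re module is assumed for illustration, and the names loose, strict and hyphenated are hypothetical labels, not terms from the tutorial:

```python
import re

# The earlier expression: optional '(', optional separator, no pairing check.
loose = re.compile(r"\(?0\d{2}[) -]?\d{8}")

# Branch conditions: either a fully parenthesised area code, or none at all.
strict = re.compile(r"\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}")

# Two hyphen-separated layouts: 3+8 digits or 4+7 digits.
hyphenated = re.compile(r"0\d{2}-\d{8}|0\d{3}-\d{7}")

well_formed = ["(010)88886666", "022-22334455", "02912345678"]
malformed = ["010)12345678", "(022-87654321"]

loose_accepts = [s for s in well_formed + malformed if loose.fullmatch(s)]
strict_accepts = [s for s in well_formed + malformed if strict.fullmatch(s)]
hyphen_accepts = [
    s for s in ["010-12345678", "0376-2233445", "0376-12345678"]
    if hyphenated.fullmatch(s)
]

print(loose_accepts)
print(strict_accepts)
print(hyphen_accepts)
```

The loose pattern accepts all five inputs, while the branch version keeps only the three well-formed ones.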

    \d{5}-\d{4}|\d{5} is used to match U.S. zip codes. The rule for U.S. zip codes is 5 digits, or 9 digits with a hyphen after the fifth. This example is given because it illustrates a point: when using branch conditions, pay attention to the order of the conditions. If you change it to \d{5}|\d{5}-\d{4}, only 5-digit zip codes (and the first 5 digits of 9-digit zip codes) will be matched. The reason is that the conditions are tested from left to right, and once one of them is satisfied, the others are not considered.
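
The ordering effect is easy to observe when searching inside a longer text. A minimal sketch, assuming Python's re module (which also tries branches from left to right) and an invented sample string:

```python
import re

text = "ZIP: 12345-6789"

# Longer branch first: the 9-digit form is tried before the 5-digit one.
long_first = re.search(r"\d{5}-\d{4}|\d{5}", text).group()

# Shorter branch first: it succeeds immediately, so '-6789' is never tried.
short_first = re.search(r"\d{5}|\d{5}-\d{4}", text).group()

print(long_first)
print(short_first)
```

When the whole input must match (for example with anchors), the engine backtracks into the second branch, so the order matters mainly for searching within text.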

    Grouping

    We have already seen how to repeat a single character (just add the quantifier right after it), but what if you want to repeat several characters? You can use parentheses to specify a subexpression (also called a group); you can then specify the number of repetitions for the subexpression, and also perform other operations on it (introduced later).

    (\d{1,3}\.){3}\d{1,3} is a simple expression for matching IP addresses. To understand it, analyze it in this order: \d{1,3} matches a number of 1 to 3 digits; (\d{1,3}\.){3} matches a 1-to-3-digit number followed by a period (together they form the group), repeated 3 times; and finally comes one more number of 1 to 3 digits (\d{1,3}).

    No number in the IP address can be greater than 255. Don’t let the writers of the third season of "24" fool you...

    Unfortunately, it also matches impossible IP addresses such as 256.300.888.999. If arithmetic comparison were available, this problem could be solved easily, but regular expressions provide no mathematical functions, so a correct IP address can only be described with lengthy groups, alternatives, and character classes: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?).

    The key to understanding this expression is understanding 2[0-4]\d|25[0-5]|[01]?\d\d?. I won't go into detail here; you should be able to work out its meaning yourself.
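
Both IP expressions can be compared side by side. This is a sketch in Python (assumed for illustration); fullmatch is used so that the whole input has to be an address, and the sample addresses are invented apart from 256.300.888.999:

```python
import re

simple = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
strict = re.compile(
    r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)

samples = ["192.168.0.1", "255.255.255.255", "256.300.888.999", "1.2.3"]

# The simple form only counts digits; the strict form also bounds each number.
simple_ok = [s for s in samples if simple.fullmatch(s)]
strict_ok = [s for s in samples if strict.fullmatch(s)]

print(simple_ok)
print(strict_ok)
```

In the strict pattern, 2[0-4]\d covers 200 to 249, 25[0-5] covers 250 to 255, and [01]?\d\d? covers 0 to 199.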

    Table 2. Commonly used quantifiers
    Code/Syntax    Description
    *              Repeat zero or more times
    +              Repeat one or more times
    ?              Repeat zero or one time
    {n}            Repeat n times
    {n,}           Repeat n times or more
    {n,m}          Repeat n to m times
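
The table can be checked mechanically by applying each quantifier to the letter a and listing which repeat counts are accepted. A minimal sketch, assuming Python's re module, with n fixed at 2 and m at 3 purely as an example:

```python
import re

quantified = ["a*", "a+", "a?", "a{2}", "a{2,}", "a{2,3}"]

# For each pattern, list how many consecutive a's (0 to 4) it accepts in full.
accepted = {
    p: [n for n in range(5) if re.fullmatch(p, "a" * n)]
    for p in quantified
}

for p, counts in accepted.items():
    print(f"{p:7} {counts}")
```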
