PHP Regular Expressions: How to match all links in HTML

王林 (Original), 2023-06-22 13:15:07

In web development, we often need to work with the links in an HTML page. How can PHP regular expressions be used to match every link in a page? Let's find out.

Links in HTML pages are generally implemented with the <a> tag, so we can match links based on this tag. First, we need to get the source code of the HTML page with PHP's file_get_contents() function, for example:

$html = file_get_contents('http://www.example.com');
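file_get_contents() returns false when the page cannot be read, so it is worth checking the result before running any pattern against it. The following is a minimal sketch of that check; the helper name fetchHtml() and the non-existent path are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original article): wraps
// file_get_contents() and throws instead of silently returning false.
function fetchHtml(string $source): string
{
    // @ suppresses the PHP warning; the failure is reported by the exception below.
    $html = @file_get_contents($source);
    if ($html === false) {
        throw new RuntimeException("Could not read: $source");
    }
    return $html;
}

// Demonstrate the failure branch with a path that is assumed not to exist.
try {
    $html = fetchHtml('/no/such/file.html');
    echo strlen($html) . " bytes read\n";
} catch (RuntimeException $e) {
    echo $e->getMessage() . "\n";
}
```

Calling fetchHtml('http://www.example.com') goes through the same check for a URL, provided the allow_url_fopen setting is enabled in php.ini.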

Next, we can use regular expressions to match all links. The following is a simple regular expression that matches links:

$pattern = '/<a href="(.*?)">(.*?)<\/a>/';

In the regular expression, <a matches the start of a link tag, which must be followed directly by the href attribute. href="(.*?)" matches the link address. The parentheses indicate a capturing group, which means that we can use the $matches variable to access the matched result later. >(.*?)<\/a> matches the link text, which is also a capturing group. The quantifier .*? is lazy: it matches as few characters as possible, so each group stops at the first closing delimiter that follows it.
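To see what the two capturing groups hold, and why the quantifiers are lazy, the pattern can be tried on a single string. This is a small sketch that assumes the pattern is /<a href="(.*?)">(.*?)<\/a>/; the sample markup is made up for illustration.

```php
<?php
// Made-up sample input containing two links on one line.
$sample = '<a href="http://www.example.com">Example</a> <a href="http://www.example.org">Other</a>';

// Lazy quantifiers (.*?): each group stops at the first possible closing delimiter.
preg_match('/<a href="(.*?)">(.*?)<\/a>/', $sample, $lazy);
echo $lazy[1] . "\n"; // http://www.example.com
echo $lazy[2] . "\n"; // Example

// Greedy quantifiers (.*): group 1 runs on to the last "> in the string,
// swallowing the first link's text and the start of the second tag.
preg_match('/<a href="(.*)">(.*)<\/a>/', $sample, $greedy);
echo $greedy[1] . "\n";
// http://www.example.com">Example</a> <a href="http://www.example.org
```

With the greedy version, two links on the same line collapse into one match, which is why the lazy form is the safer default here.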

Next, we can use the preg_match_all() function to apply the regular expression to the HTML page source code to match all links:

preg_match_all($pattern, $html, $matches);

preg_match_all() returns the number of full matches found and fills the array $matches, which is passed by reference. $matches[0] contains the complete string of every matched link, $matches[1] corresponds to capture group 1, which is the link address, and $matches[2] corresponds to capture group 2, which is the link text.
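The layout of $matches can be inspected directly. The sketch below uses made-up sample markup and assumes the pattern /<a href="(.*?)">(.*?)<\/a>/. It also shows the PREG_SET_ORDER flag, which groups the results per link instead of per capture group.

```php
<?php
// Made-up sample input with two basic links.
$html = '<a href="http://www.example.com">Example</a><a href="http://www.example.org">Other</a>';
$pattern = '/<a href="(.*?)">(.*?)<\/a>/';

// Default order (PREG_PATTERN_ORDER): one sub-array per capture group.
$count = preg_match_all($pattern, $html, $matches);
echo $count . "\n";    // 2
print_r($matches[1]);  // all link addresses
print_r($matches[2]);  // all link texts

// PREG_SET_ORDER: one sub-array per matched link, [full match, address, text].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
foreach ($sets as [$full, $url, $text]) {
    echo "$text -> $url\n";
}
```

PREG_SET_ORDER tends to be more convenient when the address and the text of the same link are needed together.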

Finally, we can loop through the $matches[1] array, which is the link address array, to get the addresses of all links:

foreach ($matches[1] as $link) {
    echo $link . "\n";
}

The complete code is as follows:

// Fetch the page source.
$html = file_get_contents('http://www.example.com');
// Group 1 captures the link address, group 2 the link text.
$pattern = '/<a href="(.*?)">(.*?)<\/a>/';
preg_match_all($pattern, $html, $matches);

// Print each link address on its own line.
foreach ($matches[1] as $link) {
    echo $link . "\n";
}
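The same steps can be wrapped in a reusable function. This is only a sketch: the function name extractLinks() and the shape of its return value are illustrative choices, and the pattern is assumed to be /<a href="(.*?)">(.*?)<\/a>/. It also decodes HTML entities in the address and strips nested tags from the link text, two details that often matter in real pages.

```php
<?php
// Hypothetical helper: the name and return shape are illustrative choices.
// Returns a list of ['url' => ..., 'text' => ...] arrays, one per matched link.
function extractLinks(string $html): array
{
    $links = [];
    // PREG_SET_ORDER yields one [full match, address, text] array per link.
    if (preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [
                'url'  => html_entity_decode($m[1]), // "&amp;" in the markup becomes "&"
                'text' => trim(strip_tags($m[2])),   // drop nested tags such as <b>
            ];
        }
    }
    return $links;
}

// Made-up sample input.
$html = '<p><a href="http://www.example.com/?a=1&amp;b=2"><b>Example</b></a></p>';
foreach (extractLinks($html) as $link) {
    echo $link['text'] . ': ' . $link['url'] . "\n";
}
// Prints: Example: http://www.example.com/?a=1&b=2
```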

Note that this regular expression can only match the basic link format, for example:

<a href="http://www.example.com">Example</a>

If the <a> tag contains other attributes, uses single quotes or uppercase letters, or spans several lines, it will not be matched. In practical applications, the regular expression can be modified as needed to handle these link formats.
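As one possible adaptation, the pattern below tolerates extra attributes before and after href, single or double quotes, uppercase tag names, and tags that span several lines. It is a sketch under those assumptions rather than a complete solution, and the sample markup is made up for illustration.

```php
<?php
// Made-up sample input: extra attributes, single quotes, uppercase, line breaks.
$html = <<<'HTML'
<a class="nav" href='/about' title="About us">About</a>
<A HREF="http://www.example.com"
   target="_blank">Example
site</A>
HTML;

// Group 1: the opening quote character, reused as \1 to find the closing quote.
// Group 2: the link address. Group 3: the link text.
// i = case-insensitive, s = the dot also matches newlines.
$pattern = '/<a\s+(?:[^>]*?\s+)?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

preg_match_all($pattern, $html, $matches);

foreach ($matches[2] as $i => $url) {
    // Collapse line breaks and repeated spaces inside the link text.
    $text = trim(preg_replace('/\s+/', ' ', $matches[3][$i]));
    echo $url . ' => ' . $text . "\n";
}
// /about => About
// http://www.example.com => Example site
```

Note that the capture group numbers shift: the address is now in $matches[2] and the text in $matches[3]. Unquoted attribute values and commented-out markup are still not handled; for arbitrary real-world HTML, a parser such as PHP's DOMDocument is generally more robust than a regular expression.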

In summary, to match the links in an HTML page with PHP regular expressions, use the file_get_contents() function to obtain the page source, apply a suitable regular expression with the preg_match_all() function, and then loop over the matching results.
