PHP Regular Expression: How to match all meta tags in HTML

WBOY
2023-06-22 22:21:06

In web development, the meta tag is an important element. It provides additional information about a web page, such as its character encoding, description, and keywords. When processing HTML pages, you sometimes need to use regular expressions to match the meta tags in the front-end code. This article shows how to use PHP regular expressions to match all meta tags in an HTML page.

First, we need to understand how meta tags are conventionally written in an HTML page. A typical head section looks like this (note that the page title is set by the separate title element, which is not a meta tag):

<meta charset="UTF-8">
<meta name="description" content="这里是网页的描述">
<meta name="keywords" content="这里是网页的关键词">
<title>这里是网页标题</title>

According to this template, we can use regular expressions to match these meta tags. First, we need to get the source code of the HTML page, and then use PHP's preg_match_all() function to match the meta tags in it, as shown below:

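// Fetch the page source (file_get_contents() returns false on failure).
$html = file_get_contents("http://www.example.com");
// Collect every <meta ...> tag, ignoring case; full matches land in $matches[0].
preg_match_all('/<meta.*?>/i', $html, $matches);
print_r($matches);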

In the above code, the file_get_contents() function first fetches the source code of the HTML page. The preg_match_all() function then matches all meta tags in the source code and stores the matching results in the $matches variable. The regular expression used to match the meta tags is /<meta.*?>/i: <meta matches the start of the tag, .*? lazily matches any characters up to the first >, and > matches the end of the tag. The i modifier means that case is ignored when matching.

The execution result of the above code may be as follows:

Array
(
    [0] => Array
        (
            [0] => <meta charset="UTF-8">
            [1] => <meta name="description" content="这里是网页的描述">
            [2] => <meta name="keywords" content="这里是网页的关键词">
        )

)

We can see that preg_match_all() successfully matched all meta tags in the HTML page and saved the results in the $matches array. The title element is not included because it is not a meta tag.
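The pattern above is enough for the sample markup, but two details are worth knowing. By default `.` does not match a newline, so a meta tag whose attributes wrap onto a second line is skipped, and `<meta` without a boundary check would also match a longer tag name such as a hypothetical `<metadata>`. Below is a minimal sketch of a slightly stricter variant, run against an inline string so that it does not depend on a network request. The `$sampleHtml` content and the `findMetaTags()` helper name are illustrative assumptions, not part of the original example.

```php
<?php
// Hypothetical helper: returns every <meta ...> tag found in $html.
function findMetaTags(string $html): array
{
    // \b stops "<meta" from matching longer names such as <metadata>;
    // [^>]* matches everything up to the closing ">", including newlines.
    preg_match_all('/<meta\b[^>]*>/i', $html, $matches);
    return $matches[0];
}

// Illustrative input; the second tag deliberately spans two lines.
$sampleHtml = <<<HTML
<meta charset="UTF-8">
<META name="description"
      content="Page description">
<metadata>not a meta tag</metadata>
<title>Page title</title>
HTML;

$tags = findMetaTags($sampleHtml);
print_r($tags);
```

This variant still assumes that no attribute value contains a literal `>` character; markup that breaks this assumption needs a real HTML parser.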

If we need to match specific attribute values in the meta tag, such as charset, name, or content, we can extend the regular expression with the corresponding rules, as shown below:

$html = file_get_contents("http://www.example.com");
// \s+ matches the whitespace after the tag name; (\S+?) captures the charset value.
preg_match_all('/<meta\s+.*?charset="(\S+?)".*?>/i', $html, $matches);
print_r($matches);

In the above code, \s+ matches the whitespace that follows the tag name, and charset="(\S+?)" matches the charset attribute and captures its value. Here \S matches any character except whitespace, + means that such a character must appear at least once, and the trailing ? makes the quantifier lazy so that the capture stops at the closing quote instead of including it. After running the above code, the output may look like the following:

Array
(
    [0] => Array
        (
            [0] => <meta charset="UTF-8">
        )

    [1] => Array
        (
            [0] => UTF-8
        )

)

From the above matching results, we can see that the charset attribute and its attribute value in the page are successfully matched.
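The same approach extends to the name and content attributes mentioned earlier. The following is a minimal sketch, assuming double-quoted attribute values and that name appears before content inside the tag. The `$sampleHtml` string and the `metaNameContentPairs()` helper are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper: maps each meta name to its content value.
// Assumes double-quoted values and that name precedes content in the tag.
function metaNameContentPairs(string $html): array
{
    $pattern = '/<meta\s+[^>]*?name="([^"]*)"[^>]*?content="([^"]*)"[^>]*>/i';
    // PREG_SET_ORDER groups the captures per matched tag.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $match) {
        $pairs[$match[1]] = $match[2];
    }
    return $pairs;
}

// Illustrative input, not fetched from a real page.
$sampleHtml = '<meta charset="UTF-8">' . "\n"
    . '<meta name="description" content="Page description">' . "\n"
    . '<meta name="keywords" content="php, regex">';

$pairs = metaNameContentPairs($sampleHtml);
print_r($pairs);
```

The charset tag is skipped here because it has no name attribute. Tags that put content before name, or that use single quotes, are also missed under these assumptions.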

In short, PHP's regular expressions let us flexibly match various elements in HTML pages, including meta tags. Although regular expressions are convenient, they also have limitations: they cannot reliably handle complex nested tags, and a pattern written for one attribute order or quoting style may miss tags written differently. Use them with care on markup you do not control.
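For markup where these limitations matter, an HTML parser is one possible alternative to regular expressions. The sketch below swaps in PHP's DOMDocument class, which is not the technique this article covers, and assumes the DOM extension is available (it is enabled in most PHP builds). The `$sampleHtml` string is illustrative and deliberately uses single-quoted attributes, which the double-quote patterns above would not match.

```php
<?php
// Sketch only: requires PHP's DOM extension.
$sampleHtml = '<html><head>'
    . '<meta charset="UTF-8">'
    . "<meta name='description' content='Page description'>"
    . '<title>Page title</title>'
    . '</head><body></body></html>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

// Build one attribute map per <meta> element.
$attributes = [];
foreach ($doc->getElementsByTagName('meta') as $meta) {
    $row = [];
    foreach ($meta->attributes as $attr) {
        $row[$attr->name] = $attr->value;
    }
    $attributes[] = $row;
}
print_r($attributes);
```

The parser handles quoting style, attribute order, and line breaks, at the cost of loading the whole document into memory.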
