Sample code for parsing and processing HTML/XML using PHP regular expressions

WBOY
2023-09-09 09:55:45

Introduction:
Regular expressions are a powerful text pattern-matching tool that can provide convenient parsing and processing capabilities when working with structured data such as HTML and XML. This article introduces how to use PHP's regular expressions to parse and process HTML/XML and provides code examples.

1. Extraction of HTML tags
When processing HTML, it is often necessary to extract all HTML tags from the text. PHP's preg_match_all function can do this. The following is a sample:

<?php

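$html = "<div id='container'><h1>Title</h1><p>Content</p></div>";
// Match a "<", one or more characters that are not ">", then ">".
$pattern = '/<[^>]+>/';
preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full text of every match.
foreach ($matches[0] as $tag) {
    echo $tag . "\n";
}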

?>

In the code above, the regular expression /<[^>]+>/ matches a < followed by one or more characters that are not > and then a closing >, that is, an HTML tag. The preg_match_all function saves every match in the $matches variable, and the loop prints each entry of $matches[0], closing tags such as </h1> included.
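As a small extension of the example above, the pattern can capture just the tag name instead of the whole tag. The following is a minimal sketch, not part of the original article: the function name extract_tag_names is hypothetical, and it assumes simple markup in which no attribute value contains a > character.

```php
<?php

// Hypothetical helper (not from the original article): returns the names
// of all opening tags, in document order.
function extract_tag_names(string $html): array
{
    // "<", then a captured tag name, then everything up to the next ">".
    // Closing tags are skipped because "/" cannot start the name.
    preg_match_all('/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/', $html, $matches);

    // $matches[1] holds the text captured by the first group of each match.
    return $matches[1];
}

$html = "<div id='container'><h1>Title</h1><p>Content</p></div>";
echo implode(', ', extract_tag_names($html)) . "\n"; // prints: div, h1, p
```

Capturing only the name keeps the result easy to count or filter, for example with array_count_values.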

2. Attribute extraction of HTML tags
In addition to extracting HTML tags, it is sometimes necessary to extract the attributes inside a tag. PHP's preg_match function can do this. The following is a sample:

<?php

$html = "<a href='http://www.example.com' target='_blank'>Link</a>";
// Match the opening <a ...> tag; \s+ requires whitespace after the tag name.
$pattern = '/<a\s+.*?>/i';
preg_match($pattern, $html, $matches);

if (isset($matches[0])) {
    $tag = $matches[0];
    // Capture the href value between single or double quotes.
    $pattern = '/href=[\'"](.*?)[\'"]/i';
    preg_match($pattern, $tag, $hrefMatches);

    if (isset($hrefMatches[1])) {
        $href = $hrefMatches[1];
        echo "Link URL: " . $href . "\n";
    }
}

?>

In the code above, we first use the regular expression /<a\s+.*?>/i to match the opening a tag, and the preg_match function stores the matched tag in the $matches variable. Then we use the regular expression /href=['"](.*?)['"]/i to match the href attribute, and preg_match saves the captured attribute value in $hrefMatches[1]. Finally, we read that value and print it.
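The same idea can be generalized to collect every attribute of a tag rather than only href. The sketch below is an illustration under stated assumptions, not the article's method: the function name extract_attributes is hypothetical, and it only handles attribute values wrapped in single or double quotes (unquoted values and valueless attributes are ignored).

```php
<?php

// Hypothetical helper (not from the original article): collects every
// quoted attribute of a single tag into a name => value array.
function extract_attributes(string $tag): array
{
    // Group 1: attribute name. Group 2: the opening quote character.
    // Group 3: the value, up to the same quote character (backreference \2).
    $pattern = '/([a-zA-Z_:][-\w:.]*)\s*=\s*(["\'])(.*?)\2/s';
    preg_match_all($pattern, $tag, $matches, PREG_SET_ORDER);

    $attributes = [];
    foreach ($matches as $match) {
        $attributes[$match[1]] = $match[3];
    }
    return $attributes;
}

$tag = "<a href='http://www.example.com' target='_blank'>";
foreach (extract_attributes($tag) as $name => $value) {
    echo $name . " = " . $value . "\n";
}
```

The backreference \2 makes the closing quote match the opening one, so a value such as "it's" is not cut off at the apostrophe.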

3. Extraction of XML nodes
Similar to HTML, we can also use PHP regular expressions to extract nodes in XML. The following is a sample code:

<?php

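$xml = "<root><item id='1'>Content 1</item><item id='2'>Content 2</item></root>";
// Match every opening <item ...> tag.
$pattern = '/<item\s+.*?>/i';
// PREG_SET_ORDER groups the results per match, so $match[0] is the full tag.
preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    $tag = $match[0];
    // Capture the id value between single or double quotes.
    $pattern = '/id=[\'"](.*?)[\'"]/i';
    preg_match($pattern, $tag, $idMatches);

    if (isset($idMatches[1])) {
        $id = $idMatches[1];
        echo "ID: " . $id . "\n";
    }
}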

?>

In the code above, we first use the regular expression /<item\s+.*?>/i to match each opening item tag, and the preg_match_all function saves the matched nodes in the $matches variable. Because PREG_SET_ORDER is passed, each element of $matches is one complete match. Then we use the regular expression /id=['"](.*?)['"]/i to match the id attribute, and preg_match saves the captured attribute value in $idMatches[1]. Finally, we read that value and print it.
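A single pattern can also capture the id and the text content of each node in one pass. This is a minimal sketch under stated assumptions: the function name extract_items is hypothetical, item elements are assumed not to be nested, and id is assumed to be a quoted attribute of the opening tag.

```php
<?php

// Hypothetical helper (not from the original article): returns one
// ['id' => ..., 'text' => ...] entry per <item> element.
function extract_items(string $xml): array
{
    // Group 1: quote character around the id value. Group 2: the id value.
    // Group 3: the text between the opening and closing item tags.
    $pattern = '/<item\s+[^>]*\bid=([\'"])(.*?)\1[^>]*>(.*?)<\/item>/is';
    preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $match) {
        $items[] = ['id' => $match[2], 'text' => $match[3]];
    }
    return $items;
}

$xml = "<root><item id='1'>Content 1</item><item id='2'>Content 2</item></root>";
foreach (extract_items($xml) as $item) {
    echo "ID: " . $item['id'] . ", text: " . $item['text'] . "\n";
}
```

The result is returned as a list of small arrays rather than an id => text map because PHP converts numeric string keys such as '1' to integers, which would silently change the type of the ids.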

Conclusion:
The examples above show how PHP's regular expressions can be used to parse and process HTML/XML. With patterns like these, we can extract tags and attributes from simple, predictable markup and handle structured data flexibly. I hope this article helps you understand how regular expressions apply to HTML/XML processing.
