How can I efficiently extract specific text from HTML using PHP DOMDocument and DOMXpath?

Susan Sarandon | Original: 2024-10-31 01:18:29

Parsing HTML with PHP DOMDocument

PHP's DOMDocument class parses HTML into a document tree, which makes extraction more reliable than matching markup with regular expressions. To pull specific text out of that tree, pair it with the DOMXPath class, which evaluates XPath queries against the parsed document.

Example:

Consider the following HTML string:

<code class="html"><div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div></code>

Our goal is to retrieve the text "Capture this text 1" and "Capture this text 2."

XPath Query Approach:

DOMDocument::getElementsByTagName retrieves every element with a given tag name, regardless of where it sits in the document or which attributes it carries. XPath instead lets us target elements by both their position in the structure and their attributes.

<code class="php">$html = <<<HTML
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);</code>
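For contrast, here is a minimal sketch of what the tag-name approach returns for the same markup. The variable names `$allDivs` and `$texts` are illustrative only. Because the lookup matches on tag name alone, both the outer and inner divs come back, and the class check has to be written by hand.

```php
<?php
// Same sample markup as in the article.
$html = <<<HTML
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Matches on tag name only: two outer "main" divs plus two inner "text" divs.
$allDivs = $dom->getElementsByTagName('div');
echo $allDivs->length, "\n";

// Narrowing the result to class="text" is a manual filtering step.
$texts = [];
foreach ($allDivs as $div) {
    if ($div->getAttribute('class') === 'text') {
        $texts[] = trim($div->nodeValue);
    }
}
print_r($texts);
```

This filter also ignores the parent element entirely, so it would accept a "text" div that is not inside a "main" div. The XPath query expresses both conditions in a single expression.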

Using XPath, we can execute the following query:

<code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}</code>

This query selects every div whose class attribute is exactly "text" and that is a direct child of a div whose class attribute is exactly "main".
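Note that `@class="text"` compares against the whole attribute value, so an element with several classes would not match. The sketch below assumes hypothetical markup in which the inner div also carries a second class, `highlight`, and uses the common XPath 1.0 idiom of padding the normalized class list with spaces to match a single class token.

```php
<?php
// Hypothetical markup: the inner div has two classes instead of one.
$html = <<<HTML
<div class="main">
    <div class="text highlight">
        Capture this text 1
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: the attribute value is "text highlight", not "text".
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token match: surround the class list with spaces and look for " text ".
$token = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $exact->length, "\n";                      // no match for the exact form
echo $token->length, "\n";                      // one match for the token form
echo trim($token->item(0)->nodeValue), "\n";
```

A plain `contains(@class, "text")` would also match here, but it would accept unrelated classes such as "context" as well, which is why the space-padded form is usually preferred.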

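Output (shown in the format var_dump uses when Xdebug is loaded; stock PHP prints each value in the form string(19) "Capture this text 1"):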

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
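The steps above can be folded into one reusable function. This is a sketch under stated assumptions, not part of the original example: the name `extractTexts` is hypothetical, and the `libxml_use_internal_errors` calls are an addition for markup that is less tidy than the sample, where `loadHTML` would otherwise emit PHP warnings.

```php
<?php
/**
 * Hypothetical helper (not from the article): returns the trimmed text of
 * every node that $query matches in $html.
 */
function extractTexts(string $html, string $query): array
{
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);

    $texts = [];
    if ($nodes !== false) { // query() returns false for a malformed expression
        foreach ($nodes as $node) {
            $texts[] = trim($node->nodeValue);
        }
    }
    return $texts;
}

$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="text"> Capture this text 2 </div></div>';

$result = extractTexts($html, '//div[@class="main"]/div[@class="text"]');
print_r($result);
```

Returning a plain array keeps callers independent of DOMNodeList, at the cost of losing access to the matched nodes themselves.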

This shows how DOMDocument and DOMXPath together extract specific content from HTML by querying the document's structure rather than pattern-matching its text.
