How to Efficiently Extract Text from Specific HTML Elements Using PHP's DOMDocument and XPath?

Barbara Streisand
2024-11-02 08:48:29

Parsing HTML with PHP's DOMDocument

Question:

How can the text inside specific HTML elements be captured with the DOMDocument object? For example, how would you extract "Capture this text 1" and "Capture this text 2" from the following HTML?

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Answer:

DOMDocument::getElementsByTagName returns every element with a given tag name, so for this markup it would return all four div elements and leave you to filter them by class and parent yourself. An XPath query, run through the DOMXPath class, selects only the elements you want in a single expression.
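For contrast, here is a minimal sketch of what the tag-name approach might look like. It assumes a compact one-line version of the question's markup, and the `$texts` array is a name introduced only for this illustration.

```php
<?php
// Assumed sample markup mirroring the HTML from the question.
$html = '<div class="main"><div class="text">Capture this text 1</div></div>'
      . '<div class="main"><div class="text">Capture this text 2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];

// getElementsByTagName('div') yields the outer and the inner divs alike,
// so each element has to be checked by hand.
foreach ($dom->getElementsByTagName('div') as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}

print_r($texts);
```

The loop visits all four div elements to keep two of them, and the filtering logic grows with every extra condition. The XPath version expresses the same conditions declaratively.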

Implementation:

  1. Load HTML into a DOMDocument Object:

    <code class="php">$html = <<<HTML
    <div class="main">
     <div class="text">
     Capture this text 1
     </div>
    </div>
    
    <div class="main">
     <div class="text">
     Capture this text 2
     </div>
    </div>
    HTML;
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);</code>
  2. Instantiate DOMXPath Object:

    <code class="php">$xpath = new DOMXPath($dom);</code>
  3. Execute XPath Query:

    <code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>
  4. Retrieve Text Values:

    <code class="php">foreach ($tags as $tag) {
     var_dump(trim($tag->nodeValue));
    }</code>

Running this prints "Capture this text 1" and "Capture this text 2", the text of the two inner div elements. The trim() call strips the surrounding whitespace and line breaks that nodeValue includes.
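The four steps can be wrapped into one reusable function. The sketch below is not part of the original answer: `extractTexts` is a hypothetical helper name, the sample markup adds an assumed extra `highlight` class, and the query matches `text` as a single class token because an exact test such as `@class="text"` would skip an element whose class attribute is `text highlight`. It also assumes you want libxml parse warnings collected quietly, which is often useful for imperfect real-world HTML.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): returns the trimmed
 * text of every element in $html matched by the XPath expression $query.
 */
function extractTexts(string $html, string $query): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);

    $texts = [];
    if ($nodes !== false) { // query() returns false for an invalid expression
        foreach ($nodes as $node) {
            $texts[] = trim($node->textContent);
        }
    }

    return $texts;
}

// Assumed sample: the first inner div carries an extra class.
$html = <<<HTML
<div class="main">
    <div class="text highlight">
    Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

// Match "text" as one whitespace-separated token of the class attribute.
$query = '//div[@class="main"]'
       . '/div[contains(concat(" ", normalize-space(@class), " "), " text ")]';

print_r(extractTexts($html, $query));
```

With this sample, the token-based query selects both inner elements, while the exact-match query from step 3 would select only the second one.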
