How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

DDD, Original, 2024-11-15 17:18:03

Extracting Flat Text from Elements with a Designated Class Using PHP DOM

Extracting text from specific HTML elements is a common task in web development. PHP's DOM extension provides tools for parsing HTML and querying its contents. This article covers one specific requirement: extracting the text of elements with given classes into two flat arrays, one per class.

Problem

Given HTML in which text is spread across multiple p elements with alternating class names, the task is to save the text into two arrays: one for headings and one for content. For instance, given HTML that repeats the following pair of elements once per chapter:

<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>

We need to obtain the following output:

$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

Solution

To perform this extraction with PHP DOM, we use DOMDocument and DOMXPath. The solution involves the following steps:

  1. Load the HTML into a DOMDocument object:
$dom = new DOMDocument();
$dom->loadHTML($test);
  2. Create a DOMXPath object for running XPath queries against the document:
$xpath = new DOMXPath($dom);
  3. Call the parseToArray() helper function (defined below) once per class to extract the text:
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');
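
The first two steps, followed by a direct query, can be sketched in isolation as follows. This is a minimal sketch rather than the article's full solution: the sample markup in $test is reduced to a single heading/content pair, and the call to libxml_use_internal_errors() is an added assumption that keeps parser warnings out of the output when the input is not perfectly valid HTML.

```php
<?php
// Sample input (assumed for illustration): one heading/content pair.
$test = <<<HTML
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
HTML;

// Assumption: collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

// Step 1: parse the HTML string into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($test);
libxml_clear_errors();

// Step 2: wrap the document in a DOMXPath object.
$xpath = new DOMXPath($dom);

// A query returns a DOMNodeList holding every matching node.
$nodes = $xpath->query("//span[@class='Heading1-H']");

echo $nodes->length, "\n";               // number of matching <span> elements
echo $nodes->item(0)->textContent, "\n"; // text of the first match
```

The DOMNodeList can be iterated with foreach, which is what a helper such as parseToArray() relies on to build its result array.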

The parseToArray() function:

  • Runs an XPath query for elements with the given class.
  • Iterates over the matched nodes and extracts their text content.
  • Collects the extracted text in an array and returns it.

Here's the complete PHP code:

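<?php
// Return the text of every <span> whose class attribute equals $class.
function parseToArray(DOMXPath $xpath, string $class): array
{
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);

    $resultarray = [];
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }

    return $resultarray;
}

$test = <<<HTML
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "\n";
var_dump($content);
echo "\n";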
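
parseToArray() is described above as querying for the designated class. If that query compares the class attribute exactly (for example @class='Heading1-H'), it only matches elements whose class attribute holds that one name and nothing else. As a hedged variant, the sketch below matches the class as a whitespace-separated token and returns one trimmed string per element. The function name parseToArrayByToken() and the sample markup are hypothetical, introduced only for illustration; the class name is interpolated into the query unescaped, so it is assumed to be trusted input.

```php
<?php
/**
 * Hypothetical variant of parseToArray(): matches $class as one token of
 * the class attribute and returns the trimmed text of each matching element.
 */
function parseToArrayByToken(DOMXPath $xpath, string $class): array
{
    // Pad the normalized class list with spaces so only whole tokens match.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";

    $result = [];
    foreach ($xpath->query($query) as $element) {
        // textContent concatenates all descendant text of the element.
        $result[] = trim($element->textContent);
    }

    return $result;
}

// Sample markup (assumed): the second heading carries an extra class,
// and the third has a similar but different class name.
$html = <<<HTML
<p><span class="Heading1-H">Chapter 1</span></p>
<p><span class="Heading1-H highlight">Chapter <b>2</b></span></p>
<p><span class="Heading1-Hx">Not a heading</span></p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$heading = parseToArrayByToken($xpath, 'Heading1-H');
print_r($heading);
```

Reading textContent on the element itself yields one array entry per match, even when the element contains nested markup, whereas iterating over childNodes yields one entry per child node.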

This approach combines PHP DOM with XPath to extract text from HTML documents by class, and the same pattern extends to more complex and targeted queries.
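
As one small example of further processing, the two flat arrays can be paired so that each heading maps to its content. This is a sketch under the assumption that both arrays have the same length and that the headings are unique; the literal arrays below stand in for the values produced by parseToArray().

```php
<?php
// Stand-ins for the arrays returned by parseToArray() (values from the expected output above).
$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

// array_combine() requires arrays of equal length, so check first.
if (count($heading) !== count($content)) {
    throw new LengthException('Heading and content counts differ.');
}

// Use each heading as the key for its content.
$chapters = array_combine($heading, $content);

foreach ($chapters as $title => $text) {
    echo $title, ': ', $text, "\n";
}
```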
