
How do I extract specific text from HTML using PHP's DOMDocument and XPath?

DDD
2024-11-01 13:00:03


Parse HTML with PHP's DOMDocument

To extract specific text from HTML with PHP's DOMDocument, XPath queries are often more effective than relying solely on DOMDocument::getElementsByTagName, which selects by tag name only. XPath can select nodes precisely, based on their attributes and their position in the document structure.
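
To make the difference concrete, here is a minimal sketch comparing the two approaches. The markup in $html is a hypothetical sample assumed for illustration, not the markup of any particular page.

```php
<?php
// Hypothetical sample markup, assumed for illustration only.
$html = <<<'HTML'
<div class="main">
    <div class="text">Capture this text 1</div>
</div>
<div class="main">
    <div class="text">Capture this text 2</div>
</div>
<div class="sidebar">
    <div class="text">Unrelated text</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// getElementsByTagName() filters by tag name only, so it returns every <div>.
$allDivs = $dom->getElementsByTagName('div');

// XPath can also filter on attributes and on the parent/child relationship.
$targets = $xpath->query('//div[@class="main"]/div[@class="text"]');

echo $allDivs->length, " div elements in total\n";
echo $targets->length, " div elements matched by the XPath query\n";
```

With getElementsByTagName alone, the class and nesting checks would have to be written by hand inside a loop; the XPath expression states them in one place.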

Capturing Text from Nested DIVs

The example HTML contains nested <div> tags: the target text is located within <div> elements with class "text", which are in turn nested within <div> elements with class "main".

To capture the target text, an XPath query can be employed:

<code class="php">// $xpath is a DOMXPath instance created from the loaded DOMDocument
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>

This query selects all <div> elements whose class attribute is exactly "text" and that are direct children of <div> elements whose class attribute is exactly "main". The result is a DOMNodeList of the matching elements.
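
One caveat is worth illustrating: @class="text" compares the whole attribute string, so an element with class="text highlight" would not match. A common XPath 1.0 workaround pads the attribute with spaces and tests for the class name as a token. The sketch below assumes hypothetical markup and a hypothetical helper named classToken(); neither comes from the original example.

```php
<?php
// Hypothetical markup: the second inner <div> carries two classes.
$html = <<<'HTML'
<div class="main">
    <div class="text">Capture this text 1</div>
    <div class="text highlight">Capture this text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath predicate that matches one class token.
// It does no escaping, so it is only suitable for plain class names.
function classToken(string $name): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $name);
}

// Exact string comparison: only class="text" matches.
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token comparison: class="text highlight" matches as well.
$byToken = $xpath->query(
    '//div[' . classToken('main') . ']/div[' . classToken('text') . ']'
);

echo $exact->length, " exact match(es)\n";
echo $byToken->length, " token match(es)\n";
```

Whether the exact or the token form is appropriate depends on the markup being parsed; the exact form is enough when each element carries a single class.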

Iterating and Extracting Node Values

To access the actual text content, each matching element can be iterated over and its nodeValue property accessed:

<code class="php">foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}</code>

The trim() function is used to remove any leading or trailing whitespace from the extracted text.
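
Putting the steps together, a minimal end-to-end sketch might look like the following. The function name extractTexts() and the sample markup are assumptions made for illustration. libxml_use_internal_errors() is used here so that parser warnings from imperfect real-world HTML are collected internally rather than printed.

```php
<?php
// Hypothetical helper: returns the trimmed text of every node the query matches.
function extractTexts(string $html, string $query): array
{
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        // query() returns false when the XPath expression is invalid.
        return [];
    }

    $texts = [];
    foreach ($nodes as $node) {
        // nodeValue of an element is the concatenated text of its descendants.
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Hypothetical sample markup, assumed for illustration only.
$html = <<<'HTML'
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
HTML;

$texts = extractTexts($html, '//div[@class="main"]/div[@class="text"]');
print_r($texts);
```

Returning a plain array keeps the DOM objects inside the function, so callers only deal with strings.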

Execution Output

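Executing the code outputs the following. The format shown is the one produced by Xdebug's var_dump(); without Xdebug, PHP prints each value in the form string(19) "Capture this text 1".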

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
