How can DOMDocument and XPath be used to Target and Extract Specific Text Content from HTML?
DOMDocument Parsing for Targeting Specific Content
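"DOMDocument", a class from PHP's DOM extension, parses an HTML document into a tree of nodes. Its "getElementsByTagName" method returns every element with a given tag name, regardless of attributes or position in the tree. Pairing the document with a "DOMXPath" object instead lets an XPath query select only the elements that match a specific structure and set of attributes.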
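To make the contrast concrete, here is a minimal sketch. The sample markup and variable names are assumptions chosen for illustration, not taken from the original article. It counts what each approach returns for the same document.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div class="main"><div class="text">A</div></div>'
      . '<div class="sidebar"><div class="text">B</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() matches by tag name only: all four <div> elements.
$allDivs = $dom->getElementsByTagName('div');

// An XPath query can also constrain attributes and parent/child position.
$xpath = new DOMXPath($dom);
$targeted = $xpath->query('//div[@class="main"]/div[@class="text"]');

echo $allDivs->length, "\n";   // 4
echo $targeted->length, "\n";  // 1
```

Only the "text" div nested directly inside the "main" div is selected by the XPath query; the one inside "sidebar" is skipped.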
Capture Text Nodes within Specific Contexts
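To extract specific text content, load the HTML into a "DOMDocument", wrap that document in a "DOMXPath" object (named $xpath here), and run a query that describes the target elements: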
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
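This query retrieves all div elements whose class attribute is exactly "text" and that are direct children of a div whose class attribute is exactly "main".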
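The query presumes that a "DOMXPath" object already exists. A minimal sketch of that setup follows, assuming the markup is held in a string named $html (a name chosen here for illustration). The libxml error handling is optional and is included only because real-world HTML often triggers parser warnings.

```php
<?php
// Assumed input; in practice the string might come from file_get_contents().
$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard any collected warnings and restore the earlier setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// DOMXPath evaluates XPath expressions against the parsed document.
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

echo $tags->length, "\n"; // 1
```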
Iterating over the resulting "DOMNodeList" with a "foreach" loop gives access to each element's "nodeValue", which holds the text of the element and all of its descendants; "trim" strips the surrounding whitespace:
foreach ($tags as $tag) { var_dump(trim($tag->nodeValue)); }
Example Implementation
Consider the following HTML snippet:
<code class="html"><div class="main"> <div class="text"> Capture this text 1 </div> </div> <div class="main"> <div class="text"> Capture this text 2 </div> </div></code>
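Applied to this snippet, the loop prints the following (the format shown is Xdebug's "var_dump" output; stock PHP prints string(19) "Capture this text 1" instead):
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)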
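For reference, the pieces can be assembled into one self-contained script. This is a sketch built on the snippet from the article; the $html and $texts variable names are assumptions, and the results are collected into an array rather than dumped directly so they can be reused.

```php
<?php
$html = <<<HTML
<div class="main"> <div class="text"> Capture this text 1 </div> </div>
<div class="main"> <div class="text"> Capture this text 2 </div> </div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Gather the trimmed text of every matching element.
$texts = [];
foreach ($xpath->query('//div[@class="main"]/div[@class="text"]') as $tag) {
    $texts[] = trim($tag->nodeValue);
}

// Print one captured string per line.
foreach ($texts as $text) {
    echo $text, "\n";
}
```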
This demonstrates the ability to precisely extract specific text content within a hierarchical HTML structure using "DOMDocument" and XPath.
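One caveat about that precision: the predicate @class="text" compares the whole attribute string, so an element with class="text highlight" would not match. A common XPath 1.0 idiom, sketched here under assumed markup and not part of the original article, pads the normalized class list with spaces and tests for the class name as a separate word.

```php
<?php
// Hypothetical markup: the second inner div carries an extra class.
$html = '<div class="main"><div class="text">One</div>'
      . '<div class="text highlight">Two</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact string comparison: matches only class="text".
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token test: matches any class list that contains the word "text".
$token = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2
```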