How do I extract specific text from HTML using PHP's DOMDocument and XPath?
Parse HTML with PHP's DOMDocument
To extract specific text elements from HTML with PHP's DOMDocument, XPath queries are often more effective than relying solely on DOMDocument::getElementsByTagName. getElementsByTagName matches on tag name alone, whereas a single XPath expression can also filter on attribute values and on where a node sits in the document tree.
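To make the difference concrete, here is a minimal sketch that runs both approaches against the same document. The markup in `$html` is a hypothetical sample invented for this illustration, including the "sidebar" class; only the query expression is taken from the article.

```php
<?php
// Hypothetical sample markup, made up for this comparison.
$html = <<<'HTML'
<div class="main">
    <div class="text">Capture this text 1</div>
</div>
<div class="sidebar">
    <div class="text">Unrelated text</div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// getElementsByTagName matches on tag name only: every div in the document.
$allDivs = $doc->getElementsByTagName('div');
echo $allDivs->length, "\n"; // 4

// XPath can also filter on attributes and on the parent/child relationship.
$xpath = new DOMXPath($doc);
$matches = $xpath->query('//div[@class="main"]/div[@class="text"]');
echo $matches->length, "\n"; // 1
```

With getElementsByTagName alone, the class and parent checks would have to be written by hand inside a loop.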
Capturing Text from Nested DIVs
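The example HTML provided contains nested div elements: judging from the query used here, outer div elements with the class "main" each wrap a div with the class "text" that holds the target text.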
To capture the target text, run an XPath query through a DOMXPath object created from the loaded document, and keep the result in $tags:
<code class="php">$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>
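This query selects all div elements with the class "text" that are direct children of a div element with the class "main".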
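The query needs a loaded DOMDocument and a DOMXPath object to run against. The following is a minimal sketch of that setup, not the article's original code: the markup in `$html` is a hypothetical reconstruction, and `$loose` is an illustrative name for an optional, more tolerant variant of the query.

```php
<?php
// Hypothetical input, reconstructed for illustration only.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<body>
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
</body>
</html>
HTML;

$doc = new DOMDocument();

// Real-world HTML is often malformed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// @class="text" is an exact string comparison, so class="text intro" would NOT match.
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

// A common idiom for matching one class among several space-separated classes.
$loose = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $tags->length, "\n";  // 2
echo $loose->length, "\n"; // 2
```

For this sample both queries return the same two nodes. The second form only matters when the target elements may carry additional classes.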
Iterating and Extracting Node Values
query() returns a DOMNodeList. To get the actual text content, iterate over the list and read each node's nodeValue property:
<code class="php">foreach ($tags as $tag) { var_dump(trim($tag->nodeValue)); }</code>
trim() removes leading and trailing whitespace from the extracted text, such as the indentation and line breaks that surround it in the HTML source.
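When the extracted strings are needed as data rather than debug output, the same loop can be wrapped in a small function. This is only a sketch: `extractText` is a hypothetical helper, not part of the DOM extension, and the sample markup passed to it is invented for the example.

```php
<?php
/**
 * Hypothetical helper (not part of DOMDocument): returns the trimmed text
 * of every node matched by an XPath expression.
 *
 * @return string[]
 */
function extractText(string $html, string $expression): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $nodes = (new DOMXPath($doc))->query($expression);
    if ($nodes === false) {
        // query() returns false when the expression is invalid
        throw new InvalidArgumentException("Invalid XPath expression: $expression");
    }

    $texts = [];
    foreach ($nodes as $node) {
        // For an element, nodeValue is the concatenated text of all its descendants.
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Made-up sample markup with padding and a nested inline element.
$html = '<div class="main"><div class="text">  Capture <b>this</b> text 1 </div></div>';
$result = extractText($html, '//div[@class="main"]/div[@class="text"]');
print_r($result);
```

Because nodeValue includes the text of child elements, the nested b element above contributes its text to the result, and the single returned string is "Capture this text 1".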
Execution Output
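Executing the code will output the following. The format shown is the one produced by Xdebug's var_dump(); without Xdebug, PHP prints each value in the form string(19) "Capture this text 1" instead: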
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)