
How to Efficiently Extract Text from Specific HTML Elements Using PHP's DOMDocument and XPath?

Original post by Barbara Streisand: 2024-11-02 08:48:29

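Parsing HTML with PHP's DOMDocument

Question:

Use a DOMDocument object to capture the text inside specific HTML elements. For example, extract "Capture this text 1" and "Capture this text 2" from the following HTML: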

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

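Answer:

DOMDocument::getElementsByTagName() returns every element with a given tag name (here, every `<div>`), so you still have to check each element's class attribute and parent yourself, which can be inefficient for this task. A more direct option is the DOMXPath class, which lets a single XPath query express both the element names and their classes.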
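The answer does not show the getElementsByTagName() version it advises against. For comparison, here is a minimal sketch of that approach on the same markup (condensed to one line): every `<div>` comes back, so the class and parent checks have to be written by hand.

<code class="php"><?php
// The question's markup, condensed onto two string lines.
$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="text"> Capture this text 2 </div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    // getElementsByTagName() returns all four <div>s, so we must filter:
    // keep only div.text elements whose parent is a div.main.
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->textContent);
    }
}

print_r($texts);</code>

The XPath expression used below replaces the whole `if` condition with `//div[@class="main"]/div[@class="text"]`.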

Implementation:

Load the HTML into a DOMDocument object:

<code class="php"><?php
// Sample markup from the question.
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

// Parse the fragment; libxml wraps it in <html><body> automatically.
$dom = new DOMDocument();
$dom->loadHTML($html);</code>
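The sample above parses cleanly, but real pages often contain markup that libxml's HTML parser reports as errors, which PHP surfaces as warnings. A minimal sketch, assuming you want to collect those messages instead of printing them: the `<section>` markup is a hypothetical example, and whether (and how many) errors it triggers depends on the libxml2 version.

<code class="php"><?php
// Hypothetical markup wrapped in an HTML5 element (<section>) that older
// libxml2 HTML parsers do not recognise.
$html = '<section><div class="main"><div class="text">Capture this text 1</div></div></section>';

$dom = new DOMDocument();

// Collect libxml parse errors internally instead of emitting PHP warnings;
// remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);

// Fetch the collected errors (for logging, if wanted), clear the buffer,
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo count($errors), " libxml error(s) collected\n";
echo trim($dom->documentElement->textContent), "\n";</code>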

Instantiate a DOMXPath object:

<code class="php">$xpath = new DOMXPath($dom);</code>
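DOMXPath::query() also accepts a context node as its second argument, which makes a relative expression search only inside that node. A short sketch, assuming you want the texts grouped per `div.main` block rather than as one flat list:

<code class="php"><?php
$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="text"> Capture this text 2 </div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$grouped = [];
// Outer query: every <div class="main"> in the document.
foreach ($xpath->query('//div[@class="main"]') as $main) {
    $texts = [];
    // Inner query: no leading "//", and $main as the context node, so only
    // div.text children of this particular div.main are matched.
    foreach ($xpath->query('div[@class="text"]', $main) as $text) {
        $texts[] = trim($text->textContent);
    }
    $grouped[] = $texts;
}

print_r($grouped);</code>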

Run the XPath query:

<code class="php">// Select every div.text that is a direct child of a div.main.
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');</code>
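`@class="text"` compares the whole attribute string, so an element with `class="text highlight"` would not match. A common XPath 1.0 workaround is to match the class as a space-delimited token; the sketch below uses hypothetical markup where the second inner div carries two classes.

<code class="php"><?php
$html = '<div class="main"><div class="text">Capture this text 1</div></div>'
      . '<div class="main"><div class="text highlight">Capture this text 2</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: only matches class="text" verbatim.
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token match: pad the normalised class list with spaces and look for " text ",
// so "text highlight" matches while a class such as "textual" would not.
$token = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2</code>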

Retrieve the text values:

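<code class="php">foreach ($tags as $tag) {
    // nodeValue holds the element's text, including the surrounding
    // newlines and indentation; trim() strips them.
    var_dump(trim($tag->nodeValue));
}</code>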
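If you need the texts as data rather than `var_dump` output, the steps above can be wrapped in a small function. `extractTexts()` is a hypothetical helper name, not part of DOMDocument or the original answer; this sketch throws when DOMXPath::query() signals a malformed expression by returning false.

<code class="php"><?php
/**
 * Hypothetical helper: returns the trimmed text of every node matched
 * by $expression in $html.
 *
 * @return string[]
 */
function extractTexts(string $html, string $expression): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        // query() returns false (and emits a warning) for a malformed expression.
        throw new InvalidArgumentException("Invalid XPath expression: $expression");
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="text"> Capture this text 2 </div></div>';

print_r(extractTexts($html, '//div[@class="main"]/div[@class="text"]'));</code>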

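This extracts "Capture this text 1" and "Capture this text 2" from the HTML above; `var_dump` prints them as `string(19) "Capture this text 1"` and `string(19) "Capture this text 2"`.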
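When only a single value is needed, DOMXPath::evaluate() can return an XPath string or number directly instead of a node list. A short sketch on the same sample markup:

<code class="php"><?php
$html = '<div class="main"><div class="text"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="text"> Capture this text 2 </div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// normalize-space() converts the node-set to the string value of its first
// node in document order, then trims and collapses whitespace.
$first = $xpath->evaluate('normalize-space(//div[@class="main"]/div[@class="text"])');

// count() yields an XPath number, which PHP receives as a float.
$count = $xpath->evaluate('count(//div[@class="main"]/div[@class="text"])');

var_dump($first); // string(19) "Capture this text 1"
var_dump($count); // float(2)</code>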
