问题:
您拥有包含标题和元素的 HTML 内容常规文本。您的目标是将具有指定类的元素中的文本(标题为“Heading1-H”,文本为“Normal-H”)提取到两个单独的数组中:$heading 和 $content。
解决方案:
使用 PHP DOM 和XPath
PHP DOM(文档对象模型)和 XPath(XML 路径语言)为此任务提供了强大的解决方案。这是实现:
$test = <<<HTML <p class="Heading1-P"> <span class="Heading1-H">Chapter 1</span> </p> <p class="Normal-P"> <span class="Normal-H">This is chapter 1</span> </p> <p class="Heading1-P"> <span class="Heading1-H">Chapter 2</span> </p> <p class="Normal-P"> <span class="Normal-H">This is chapter 2</span> </p> <p class="Heading1-P"> <span class="Heading1-H">Chapter 3</span> </p> <p class="Normal-P"> <span class="Normal-H">This is chapter 3</span> </p> HTML; $dom = new DOMDocument(); $dom->loadHTML($test); $xpath = new DOMXPath($dom); $heading = parseToArray($xpath, 'Heading1-H'); $content = parseToArray($xpath, 'Normal-H'); var_dump($heading); echo "<br/>"; var_dump($content); echo "<br/>"; function parseToArray(DOMXPath $xpath, string $class): array { $xpathquery = "//*[@class='$class']"; $elements = $xpath->query($xpathquery); $resultarray = []; foreach ($elements as $element) { $nodes = $element->childNodes; foreach ($nodes as $node) { $resultarray[] = $node->nodeValue; } } return $resultarray; }
输出:
array(3) { [0] => string(8) "Chapter 1" [1] => string(8) "Chapter 2" [2] => string(8) "Chapter 3" } <br/> array(3) { [0] => string(15) "This is chapter 1" [1] => string(15) "This is chapter 2" [2] => string(15) "This is chapter 3" } <br/>
以上是如何使用 PHP 将具有不同类的特定 HTML 元素中的文本提取到单独的数组中?的详细内容。更多信息请关注PHP中文网其他相关文章!