How to keep the HTML tags in nodeValue when scraping with PHP XPath
The code is as follows:
<code>$html = <<<EOF
<meta charset="UTF-8">
<title>Test</title>
<div id="content">
<p>
<span>
abcdefghijklmn<br>opqrstuvwxyz
</span>
</p>
</div>
EOF;

// create document object model
$dom = new DOMDocument();
// load html into document object model (@ suppresses parser warnings)
@$dom->loadHTML($html);
// create domxpath instance
$xPath = new DOMXPath($dom);
// select the span inside the paragraph of the element with id "content"
$elements = $xPath->query('//*[@id="content"]/p/span');
// nodeValue returns only the text content, so the <br> is lost
$content = $elements->item(0)->nodeValue;
echo $content;</code>
The <br> in the content gets stripped. Is there an operation, something like $e->innerHtml, that preserves the HTML tags?
Update, 8.18:
<code>$html = <<<EOF
<meta charset="UTF-8">
<title>Test</title>
<div id="content">
<p>
<span class="aaa">
abcdefghijklmn<br><span>opq</span>rstuvwxyz
</span>
</p>
</div>
EOF;

// create document object model
$dom = new DOMDocument();
// load html into document object model (@ suppresses parser warnings)
@$dom->loadHTML($html);
// create domxpath instance
$xPath = new DOMXPath($dom);
// select the span inside the paragraph of the element with id "content"
$elements = $xPath->query('//*[@id="content"]/p/span');
$nodeName = $elements->item(0)->nodeName;
// $content = $elements->item(0)->nodeValue;
// saveXML() serializes the node as XML, so <br> becomes <br/>
$content = $dom->saveXML($elements->item(0));
// saveHTML() serializes it as HTML and keeps <br>; this overwrites the line above
$content = $dom->saveHTML($elements->item(0));
// strip the outer opening and closing tag, leaving only the inner markup
$content = preg_replace(
    array("#^<{$nodeName}[^>]*>#i", "#</{$nodeName}>\s*$#i"),
    array('', ''),
    $content
);
echo $content;</code>
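The regex that strips the outer tag can be avoided by serializing each child of the matched node instead. This is a minimal sketch, not a definitive implementation. The helper name `innerHtml` is hypothetical (PHP's classic DOM extension has no such method), and the markup is a complete-document version of the sample above.

```php
<?php
// Hypothetical helper (not part of PHP's DOM extension): returns the markup
// inside $node by serializing each child node and joining the results.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node (PHP >= 5.3.6).
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<!DOCTYPE html><html><body><div id="content"><p>'
    . '<span class="aaa"> abcdefghijklmn<br><span>opq</span>rstuvwxyz </span>'
    . '</p></div></body></html>'
);

$xPath = new DOMXPath($dom);
$span  = $xPath->query('//*[@id="content"]/p/span')->item(0);

// Inner markup of the outer span: the <br> and the nested <span> are kept,
// while the outer <span class="aaa"> wrapper is not included.
$inner = innerHtml($span);
echo $inner;
```

Because only the children are serialized, no pattern has to guess where the outer tag starts or ends, which also holds when the outer element carries attributes.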
I found a solution myself...
<code>$content = $elements->item(0)->nodeValue;
// >> change to >>
$content = $dom->saveXML($elements->item(0));</code>
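To make the difference concrete, the sketch below runs the three accessors side by side on one node. It assumes a shortened version of the sample markup. The variable names `$asText`, `$asXml` and `$asHtml` are illustrative only.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<!DOCTYPE html><html><body><div id="content">'
    . '<p><span>abc<br>xyz</span></p>'
    . '</div></body></html>'
);

$xPath = new DOMXPath($dom);
$node  = $xPath->query('//*[@id="content"]/p/span')->item(0);

// nodeValue concatenates the text of all descendants; tags are dropped.
$asText = $node->nodeValue;

// saveXML($node) serializes the node itself as XML, so the empty
// element is written in self-closing form (<br/>).
$asXml = $dom->saveXML($node);

// saveHTML($node) serializes the node as HTML and keeps <br> as written.
$asHtml = $dom->saveHTML($node);

echo $asText, "\n", $asXml, "\n", $asHtml, "\n";
```

Both serializers include the outer `<span>` itself. If the `<br/>` form is unwanted in the output, `saveHTML()` is the closer match to the original markup.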