How to Perform Search and Replace on HTML Content While Ignoring HTML Tags?
When using preg_replace to search and replace in strings that contain HTML, you usually want to leave the HTML tags alone and modify only the actual text content. This is hard to do with regular expressions alone, because they are not well suited to parsing HTML: a pattern cannot tell whether a match sits in a tag name, an attribute, or the text.
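To make the problem concrete, here is a minimal sketch of a naive replacement going wrong. The sample markup and the variable names are made up for illustration; they are not from the original question.

```php
<?php
// Hypothetical sample: "class" occurs both as an attribute name and in the text.
$html = '<p class="intro">A class of its own.</p>';

// Naive replacement: the pattern matches inside the tag as well as in the text.
$naive = preg_replace('/class/', '<b>class</b>', $html);

echo $naive, PHP_EOL;
// prints: <p <b>class</b>="intro">A <b>class</b> of its own.</p>
```

The first replacement lands inside the opening tag and breaks the markup, which is the interference the rest of this article tries to avoid.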
An alternative is to let DOMDocument and DOMXPath handle the HTML structure. An XPath query can locate the text nodes in the document that match the search criteria, and those nodes can then be wrapped in the desired HTML elements without affecting the rest of the markup.
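As a rough illustration of the lookup step, the sketch below lists the text nodes that contain a search term. The sample markup and variable names are assumptions for this example, and the query assumes the search term contains no double quotes.

```php
<?php
// Hypothetical sample: "needle" appears once in text and once in an attribute.
$html = '<div><p>Find the <em>needle</em> here.</p><p title="needle">Nothing.</p></div>';
$search = 'needle';

$doc = new DOMDocument();
$doc->loadHTML($html); // the parser adds the missing <html> and <body> wrappers
$xp = new DOMXPath($doc);

// text() selects only text nodes, so tag names and attribute values never match.
$matches = [];
foreach ($xp->query('//text()[contains(., "' . $search . '")]') as $textNode) {
    $matches[] = $textNode->parentNode->nodeName . ': ' . $textNode->nodeValue;
}

print_r($matches); // only the text inside <em> is reported, not the title attribute
```

Note that this finds only matches that lie entirely within one text node; a phrase split across several elements needs extra bookkeeping.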
The following snippet applies this approach, wrapping each match in a span without touching the surrounding tags:
$str = '...'; // HTML document
$search = 'text to highlight';

$doc = new DOMDocument;
$doc->loadHTML($str); // loadHTML tolerates real-world HTML; loadXML requires well-formed XML
$xp = new DOMXPath($doc);

$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor) {
    throw new Exception('Anchor element not found.');
}

// XPath query: elements that contain the search text and have at least one
// child element that does not contain it (assumes $search has no double quotes)
$r = $xp->query('//*[contains(., "'.$search.'")]/*[not(contains(., "'.$search.'"))]/..', $anchor);
if ($r === false) {
    throw new Exception('XPath failed.');
}

// Process search results.
// TextRange is a user-defined helper class, not part of PHP's DOM extension;
// its definition is not shown here.
foreach ($r as $node) {
    $textNodes = $xp->query('.//child::text()', $node);
    $range = new TextRange($textNodes);

    // Identify matching text node ranges
    $ranges = array();
    while (FALSE !== $start = $range->indexOf($search)) {
        $base = $range->split($start);
        $range = $base->split(strlen($search));
        $ranges[] = $base;
    }

    // Wrap matching text nodes with HTML elements
    foreach ($ranges as $match) {
        foreach ($match->getNodes() as $textNode) {
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'search_highlight');
            $textNode = $textNode->parentNode->replaceChild($span, $textNode);
            $span->appendChild($textNode);
        }
    }
}

echo $doc->saveHTML();
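This code uses an XPath query to select elements that contain the search term, then collects the text nodes beneath each one and hands them to TextRange. TextRange is a custom helper class, not a built-in PHP class, and its definition is not included here; it manages subranges across the text nodes so that a match can span more than one node. Each matching range is then wrapped in a span element with the search_highlight class, which can be used for highlighting or other purposes.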
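Because TextRange is not shown, the following is a simplified, self-contained sketch of the same wrapping idea rather than the snippet's actual helper. It handles only matches that fall inside a single text node, and the function name highlightInTextNodes is hypothetical. It splits each text node with preg_split, so the regular expression only ever sees text, never markup.

```php
<?php
// Hypothetical helper: wraps each occurrence of $search found within a single
// text node in <span class="search_highlight">. Matches that span several
// nodes (the case TextRange is meant for) are not handled.
function highlightInTextNodes(string $html, string $search): string
{
    if ($search === '') {
        return $html;
    }

    $doc = new DOMDocument();
    // The wrapper <div> gives a known root element. Without a charset hint
    // loadHTML assumes ISO-8859-1, so this sketch assumes ASCII input.
    $doc->loadHTML('<div>' . $html . '</div>');
    $xp = new DOMXPath($doc);
    $root = $doc->getElementsByTagName('div')->item(0);

    // Snapshot the text nodes first, because the loop below modifies the tree.
    $textNodes = iterator_to_array($xp->query('.//text()', $root));
    $pattern = '/(' . preg_quote($search, '/') . ')/u';

    foreach ($textNodes as $node) {
        // With DELIM_CAPTURE, odd indexes hold the matches, even ones the rest.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'search_highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$out = highlightInTextNodes('<p class="hit">A hit and a <em>hit</em>.</p>', 'hit');
echo $out, PHP_EOL;
```

In this usage example the class attribute of the paragraph contains the search term but is left untouched, while both occurrences in the text are wrapped.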
Because DOMDocument and DOMXPath parse the markup instead of pattern-matching against it, this approach is a more reliable way to ignore HTML tags during search and replace on HTML content than regular expressions alone.