How to Truncate Text with Embedded HTML Without Breaking Tags?

Linda Hamilton
2024-11-10 04:37:02

Truncating Text with Embedded HTML

When truncating text that contains HTML, the cut must not fall inside a tag or a character entity, and any tag left open at the cut point must be closed. Otherwise the page ends up with broken or invalid markup. The approaches below truncate text while keeping the HTML intact.
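To see the problem concretely, here is a small illustration of what plain string slicing does to markup. The input string is made up for this example.

```python
html = '<b>&lt;Hello&gt;</b> world!'

# Plain slicing counts markup as text and stops wherever the limit falls.
naive = html[:10]
print(naive)      # <b>&lt;Hel  (the <b> tag is never closed)

# A slightly different limit cuts through the middle of an entity.
print(html[:5])   # <b>&l
```

Both results would render incorrectly, which is why the implementations that follow track tags and entities instead of counting raw characters.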

PHP Implementation:

The following PHP function uses regular expressions to parse HTML and maintains a stack of open tags:

function printTruncated($maxLength, $html, $isUtf8 = true) { ... }

This function scans the HTML input for tags and character entities. Open tags are tracked on a stack so that any tag still open at the cut point can be closed, and each character entity is counted as a single character. As a result, the cut never lands inside a tag or an entity.

Example Usage:

printTruncated(10, '<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!'); // Outputs: <b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> w

Python Implementation:

HTML parsing libraries like BeautifulSoup can assist with this task in Python:

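from bs4 import BeautifulSoup, NavigableString

def truncate_html(text, max_length):
    # html.parser keeps the fragment as-is; lxml would wrap it in <html><body>.
    soup = BeautifulSoup(text, 'html.parser')
    remaining = max_length  # visible characters still allowed

    def walk(parent):
        nonlocal remaining
        # Iterate over a copy, because nodes are removed during the walk.
        for node in list(parent.contents):
            if remaining <= 0:
                # Limit reached: drop everything after the cut point.
                node.extract()
            elif isinstance(node, NavigableString):
                if len(node) > remaining:
                    # This text node crosses the limit, so shorten it.
                    node.replace_with(NavigableString(node[:remaining]))
                    remaining = 0
                else:
                    remaining -= len(node)
            else:
                # Tags cost nothing; only the text inside them is counted.
                walk(node)

    walk(soup)
    return str(soup)

Example Usage:

print(truncate_html('<b>&lt;Hello&gt;</b> <img src="world.png" alt="" /> world!', 10))
# Outputs: <b>&lt;Hello&gt;</b> <img src="world.png" alt=""/> w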
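For comparison, the regex-and-stack technique described for the PHP function can also be sketched in Python using only the standard library. This is a minimal illustration, not a port of the PHP code: the name `truncate_with_stack`, the `TOKEN` pattern, and the handling of self-closing tags are assumptions made for this example. It also assumes well-formed input and does not special-case void elements written without a trailing slash, such as `<br>`.

```python
import re

# Matches a tag (capturing its name) or a character entity.
TOKEN = re.compile(r'</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;')


def truncate_with_stack(html, max_length):
    """Cut html to max_length visible characters and close any open tags."""
    out = []         # pieces of the result
    open_tags = []   # stack of tag names still awaiting a closing tag
    printed = 0      # visible characters emitted so far
    pos = 0          # current offset into html

    while printed < max_length:
        match = TOKEN.search(html, pos)
        # Plain text runs up to the next token, or to the end of the input.
        text_end = match.start() if match else len(html)
        text = html[pos:text_end][:max_length - printed]
        out.append(text)
        printed += len(text)
        if match is None or printed >= max_length:
            break

        token = match.group(0)
        if token.startswith('&'):
            # An entity is copied whole and counts as one character.
            out.append(token)
            printed += 1
        elif token.startswith('</'):
            # Closing tag: pop the matching opener if it is on top.
            if open_tags and open_tags[-1] == match.group(1).lower():
                open_tags.pop()
            out.append(token)
        else:
            # Opening tag: remember it unless it is self-closing (<img ... />).
            out.append(token)
            if not token.endswith('/>'):
                open_tags.append(match.group(1).lower())
        pos = match.end()

    # Close whatever is still open, innermost first.
    out.extend(f'</{name}>' for name in reversed(open_tags))
    return ''.join(out)


print(truncate_with_stack('<p>Hello <i>world</i></p>', 7))
# <p>Hello <i>w</i></p>
```

Unlike the BeautifulSoup version, this sketch copies tags and entities through verbatim instead of re-serialising them, at the cost of trusting the input to be well nested.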

Conclusion:

Both approaches parse the markup instead of slicing the raw string, so the truncated result keeps its tags balanced and its character entities intact.
