Home  >  Article  >  Backend Development  >  How to Truncate HTML Text Without Breaking Tags?

How to Truncate HTML Text Without Breaking Tags?

Mary-Kate Olsen
Mary-Kate OlsenOriginal
2024-11-12 09:44:01859browse

How to Truncate HTML Text Without Breaking Tags?

Truncating HTML Text without Breaking Tags

When truncating text containing HTML, it is crucial to ensure that tags are properly handled to avoid breaking the layout and content flow.

The Problem:

In traditional methods, tags are included in the truncated text, resulting in incomplete or broken tags. This can disrupt formatting, create confusing content, and potentially trigger Tidy cleanup issues.

The Solution:

To address this problem, it is necessary to parse the HTML and keep track of open tags. By closing open tags before truncating the text, one can ensure tag integrity.

PHP Implementation:

The following PHP code demonstrates how to truncate HTML text while preserving tag structure:

function printTruncated($maxLength, $html, $isUtf8=true)
{
    // Initialization
    $printedLength = 0;
    $position = 0;
    $tags = array();

    // Regex pattern for matching HTML tags and entities
    $re = $isUtf8
        ? '{</?([a-z]+)[^>]*>|&amp;#?[a-zA-Z0-9]+;|[\x80-\xFF][\x80-\xBF]*}'
        : '{</?([a-z]+)[^>]*>|&amp;#?[a-zA-Z0-9]+;}';

    // Iterate through the HTML
    while ($printedLength < $maxLength &amp;&amp; preg_match($re, $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        // Extract tag and tag position
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag
        $str = substr($html, $position, $tagPosition - $position);
        $printedLength += strlen($str);

        // Handle the tag
        if ($tag[0] == '&amp;' || ord($tag) >= 0x80)
        {
            // Pass entity or UTF-8 sequence unchanged
            print($tag);
            $printedLength++;
        }
        else
        {
            if ($tag[1] == '/')
            {
                // Closing tag
                assert(array_pop($tags) == $match[1][0]); // Check for nested tags
                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/')
            {
                // Self-closing tag
                print($tag);
            }
            else
            {
                // Opening tag
                print($tag);
                $tags[] = $match[1][0];
            }
        }

        // Continue after the tag
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text
    if ($position < strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));

    // Close open tags
    while (!empty($tags))
        printf('</%s>', array_pop($tags));
}

Usage:

printTruncated(10, '<b>&amp;lt;Hello&amp;gt;</b> <img src="world.png" alt="" /> world!'); print("\n");
printTruncated(10, '<table><tr><td>Heck, </td><td>throw</td></tr><tr><td>in a</td><td>table</td></tr></table>'); print("\n");
printTruncated(10, "<em><b>Hello</b>&amp;#20;w\xC3\xB8rld!</em>"); print("\n");

The above is the detailed content of How to Truncate HTML Text Without Breaking Tags?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn