
PHP implementation code to get the text between tags

WBOY
2016-07-25 08:56:53
This article introduces several examples of using PHP to obtain text between tags. Friends in need can refer to it.

The following example provides a way to retrieve text between tags.

Note: Do not use regular expressions to parse HTML in general.

For simple cases, the preg_match() and preg_match_all() functions can still extract tag contents: preg_match() returns the first match, while preg_match_all() scans the whole string and collects every match, much as a loop would. For anything more complex, the DOM extension copes with malformed markup and returns clean results.
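As a minimal sketch of the preg_match_all() approach mentioned above, the function below collects the text of every matching tag pair rather than only the first. The name getAllTextBetweenTags() is hypothetical and not part of the original article, and the sketch assumes simple, non-nested tags without attributes.

```php
<?php

/**
 * Hypothetical helper: collect the text of every <$tagname>...</$tagname>
 * pair in $string. Suitable for simple, non-nested markup only.
 */
function getAllTextBetweenTags(string $string, string $tagname): array
{
    // preg_quote() escapes regex metacharacters in the tag name;
    // the "s" modifier lets "." match newlines, "i" ignores case.
    $tag = preg_quote($tagname, '/');
    $pattern = '/<' . $tag . '>(.*?)<\/' . $tag . '>/si';

    // preg_match_all() scans the whole string and fills $matches[1]
    // with the first capture group of every match.
    preg_match_all($pattern, $string, $matches);

    return $matches[1];
}

$items = getAllTextBetweenTags('<li>one</li><li>two</li>', 'li');
print_r($items);
```

When nothing matches, $matches[1] is an empty array, so the caller can iterate over the result without an extra check.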

The following example is implemented using the preg_match() function. Code:

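<?php

 /**
 * Get the text between the first pair of matching tags.
 *
 * @param string $string  The string with tags
 * @param string $tagname The name of the tag
 *
 * @return string|null Text between the tags, or null if there is no match
 */
 function getTextBetweenTags($string, $tagname)
 {
    // Escape the tag name so it is safe inside the pattern; the "s"
    // modifier lets "." match newlines inside the tag.
    $tag = preg_quote($tagname, '/');
    $pattern = "/<$tag>(.*?)<\/$tag>/s";

    // $matches[1] is only set when the pattern matched
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null;
 }
?>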

Note: The function above is a simple way to get the contents of a tag, but it cannot handle nested or incomplete tags. PHP's DOM extension handles both.
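To illustrate the nesting limitation, here is a small sketch with a made-up input string (the nested div markup is an assumption chosen for the demonstration). The lazy pattern stops at the first closing tag it finds, so the captured text is cut short.

```php
<?php

// Nested <div> elements: the inner closing tag ends the lazy match early.
$html = '<div>outer <div>inner</div> tail</div>';

preg_match('/<div>(.*?)<\/div>/s', $html, $matches);

// The capture stops at the first </div>, so the result is truncated
// and still contains the inner opening tag.
$captured = $matches[1];
echo $captured, PHP_EOL; // prints: outer <div>inner
```

A greedy pattern does not fix this either: it would run to the last closing tag in the string, which breaks as soon as two sibling elements appear.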

The function in the example below takes three parameters:

$tag: the name of the tag whose text is collected
$html: the HTML or XML to search
$strict: whether to load the input in XML mode (1) or HTML mode (0, the default)

The third parameter lets the function parse XML, including custom tags found in XHTML documents.

Code:

<?php

/**
 *
 * Get the text between tags
 *
 * @param string $tag The tag name
 *
 * @param string $html The HTML or XML string
 *
 * @param int $strict 1 to load the input as XML, 0 (default) to load it as HTML
 *
 * @return array The text content of every matching element
 *
 */
function getTextBetweenTags($tag, $html, $strict=0)
{
    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** discard white space (must be set before loading to take effect) ***/
    $dom->preserveWhiteSpace = false;

    /*** load the html or xml into the object ***/
    if($strict==1)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }

    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagName($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }
    /*** return the results ***/
    return $out;
}
?>

With ordinary HTML there is no need to pass the third parameter, and invalid or incomplete markup is still handled. In the example below, the closing </p> tag of the last paragraph is missing, but DOM and loadHTML() tolerate this: the HTML is still parsed and the text of every anchor tag is returned in an array.

Code:

<?php

$html = '<body>
<h1>Heading</h1>
<a href="#">jbxue.com</a>
<p>paragraph here</p>
<p>Paragraph with a <a href="#">LINK TO jbxue.com</a></p>
<p>This is a broken paragraph
</body>';

$content = getTextBetweenTags('a', $html);

foreach( $content as $item )
{
    echo $item.'<br />';
}
?>
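loadHTML() can emit PHP warnings when it meets markup it considers invalid. The sketch below is one possible way to keep those out of the output, using libxml_use_internal_errors() to collect them instead. The input string and variable names are made up for illustration, and whether any issues are actually recorded depends on the libxml version.

```php
<?php

// Hypothetical input: the last <p> is never closed.
$html = '<body><p>first</p><p>This is a broken paragraph</body>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Fetch any issues the parser recorded (the list may be empty),
// then clear the buffer and restore the previous setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser closes the unfinished paragraph on its own.
$texts = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $texts[] = trim($p->nodeValue);
}

print_r($texts);
echo count($errors), ' parse issue(s) recorded', PHP_EOL;
```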

This final example uses a custom tag, as found in XML or XHTML files. The third parameter is set to 1 so that the input is loaded in XML mode and the custom tags are parsed.

Code:

<?php

$xhtml = '<html>
<body>
<para>This is a paragraph</para>
<para>This is another paragraph</para>
</body>
</html>';

$content2 = getTextBetweenTags('para', $xhtml, 1);
foreach( $content2 as $item )
{
    echo $item.'<br />';
}
?>
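XML mode is stricter than HTML mode: the input has to be well formed. The following sketch, using a made-up document with an unclosed custom tag, shows loadXML() rejecting input that loadHTML() still accepts. The variable names are illustrative only.

```php
<?php

// Hypothetical input: the <para> element is never closed.
$broken = '<doc><para>unclosed</doc>';

// Keep parser complaints out of the output for this comparison.
$previous = libxml_use_internal_errors(true);

$xmlDom = new DOMDocument();
$xmlOk = $xmlDom->loadXML($broken);    // false: not well-formed XML

$htmlDom = new DOMDocument();
$htmlOk = $htmlDom->loadHTML($broken); // true: the HTML parser recovers

libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($xmlOk);
var_dump($htmlOk);
```

Checking the return value of loadXML() before reading from the document is therefore a sensible precaution when the third parameter is set to 1.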

