strip_tags extension functions for processing HTML (useful for web scraping)

PHP中文网 · Original · 2016-05-25 17:07:04 · 1522 views

/**
 * Converts HTML to plain text.
 */
function html2txt($document) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // Strip out JavaScript
        '@<style[^>]*?>.*?</style>@si',    // Strip style blocks (no U modifier: under U, ".*?" would turn greedy)
        '@<[\/\!]*?[^<>]*?>@si',           // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@',       // Strip remaining multi-line comments
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
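As a quick check of html2txt(), the sketch below runs it on a small fragment. The sample markup is made up for illustration, and the function is repeated only so the snippet runs standalone. Its style pattern is written without the U modifier, because under U a `.*?` quantifier becomes greedy and would swallow everything between the first <style> and the last </style>.

```php
<?php
// Standalone copy of html2txt() so this snippet can be run on its own.
function html2txt($document) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // script blocks, including their contents
        '@<style[^>]*?>.*?</style>@si',    // style blocks, including their contents
        '@<[\/\!]*?[^<>]*?>@si',           // any remaining tags (contents are kept)
        '@<![\s\S]*?--[ \t\n\r]*>@',       // leftover multi-line comments
    );
    return preg_replace($search, '', $document);
}

// Hypothetical scraped fragment, invented for this example.
$html = '<style>p{color:red}</style><p>Hello <b>world</b></p><script>alert(1);</script>';

echo html2txt($html), "\n"; // Hello world
```

Character entities such as &amp; are left untouched by this function; html_entity_decode() can be applied afterwards if plain characters are needed.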
/**
 * Removes HTML tags along with their contents.
 * Note: $tags lists the tags to keep. When $invert is true the result is
 * reversed: only the listed tags (and their contents) are removed.
 */
function strip_tags_content($text, $tags = '', $invert = false) {
    // Extract the bare tag names from a string such as '<b><a>'
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $matches);
    $tags = array_unique($matches[1]);

    if (count($tags) > 0) {
        if (!$invert) {
            // Remove every element except the listed tags
            return preg_replace('@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        // Remove only the listed tags
        return preg_replace('@<(' . implode('|', $tags) . ')\b.*?>.*?</\1>@si', '', $text);
    } elseif (!$invert) {
        // No tags given: remove every element together with its contents
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}
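The three modes of strip_tags_content() can be compared on one input. This is a minimal sketch: the sample string is invented for illustration, and the function is repeated so the snippet runs standalone.

```php
<?php
// Standalone copy of strip_tags_content() so this snippet can be run on its own.
function strip_tags_content($text, $tags = '', $invert = false) {
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $matches);
    $tags = array_unique($matches[1]);

    if (count($tags) > 0) {
        if (!$invert) {
            return preg_replace('@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        return preg_replace('@<(' . implode('|', $tags) . ')\b.*?>.*?</\1>@si', '', $text);
    } elseif (!$invert) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}

// Hypothetical input, invented for this example.
$text = '<b>sample</b> text with <div>tags</div>';

// No tag list: every element is removed together with its contents.
var_export(strip_tags_content($text));            // ' text with '
echo "\n";

// Keep <b>, remove everything else.
var_export(strip_tags_content($text, '<b>'));     // '<b>sample</b> text with '
echo "\n";

// Inverted: remove only <b>.
var_export(strip_tags_content($text, '<b>', true)); // ' text with <div>tags</div>'
echo "\n";
```

Because the patterns pair an opening tag with the first matching closing tag, nested elements of the same name (for example a <div> inside a <div>) are not handled reliably; a DOM parser is the safer choice for such input.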



