首页  >  文章  >  后端开发  >  求关于正则表达式PHP过滤编辑器的新闻内容

求关于正则表达式PHP过滤编辑器的新闻内容

WBOY
WBOY原创
2016-06-20 12:27:24956浏览

从网站A数据库中读取的新闻内容(HTML源码格式)写入网站B的新闻表中,格式不统一,而且有很多冗余代码,很多是从office复制过去的,需要过滤掉网站A新闻内容中冗余的HTML代码。新闻内容在php的$NEWS字段中,给这个字段用正则表达式处理一下。
比如

<font id=888 style="FONT-SIZE: 18px; FONT-FAMILY: FONT-SIZE: 18px"><P><STRONG><SPAN lang=EN-US style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">    一级</SPAN><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">标题<SPAN lang=EN-US>粗体1</SPAN></SPAN></STRONG></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">新闻内容</SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG>一级标题粗体2</STRONG></SPAN></P> <BR><BR><P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN> </P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2233.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN></P> <P align=center><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为居中</SPAN></P><BR><P align=right><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为右对齐</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字是斜体</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><A title="" href="http://www.baidu.com/">加一个链接</A></SPAN></P></FONT>

要处理成
<p><strong>一级标题粗体</strong></p><p>新闻内容</p><p><img  src="http://192.168.1.1/Webimage/2222.jpg" / alt="求关于正则表达式PHP过滤编辑器的新闻内容" ></p><p>第二段新闻正文</p><p><strong>一级标题粗体2</strong></p><p><img  src="http://192.168.1.1/Webimage/2233.jpg" / alt="求关于正则表达式PHP过滤编辑器的新闻内容" ></p><p   style="max-width:90%">这段文字为居中</p><p style="text-align: right;">这段文字为右对齐</p><p><em>这段文字为斜体</em></p><p><a href="http://www.baidu.com">加一个链接</a></p>


具体的代码说明写了个网页,方便大神看: http://www.sunmuu.com/help/editorHelp.html

后面是php连接查询的代码,方便测试,数据库mysql,表是editor,两个字段ID(INIT)和news(MEDIUMTEXT):
$mysql_db_hostname = "localhost";$mysql_db_user = "root";$mysql_db_password = "root";$mysql_db_database = "test";$con = mysqli_connect($mysql_db_hostname, $mysql_db_user, $mysql_db_password, $mysql_db_database);mysqli_query($con, "SET NAMES utf8");$sql="SELECT * FROM editor";$re=mysqli_query($con,$sql)or die("读取数据出错". mysqli_error());while($row=mysqli_fetch_array($re)){$str=$row["news"];echo $str;}


回复讨论(解决方案)

//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img  alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(p|strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);echo $s;

试试吧,其它不影响显示的你就自己去改吧

//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img  alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(p|strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);echo $s;

试试吧,其它不影响显示的你就自己去改吧


太感谢你了jam00!!
有一点,p标签里面的style align属性需要保留, style="max-width:90%"和style="text-align: left;",这个逻辑顺序是怎样的?可以给个提示吗?

//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img  alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);//单独处理 p$s = preg_replace('/<p\s*align="*?(right|left|center)"*?.*?>/i','<p   style="max-width:90%">',$s);echo $s;

//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img  alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);//单独处理 p$s = preg_replace('/<p\s*align="*?(right|left|center)"*?.*?>/i','<p   style="max-width:90%">',$s);echo $s;


很给力,谢谢!
声明:
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn