Looking for a PHP regular expression to filter news content produced by a rich-text editor

WBOY (original post)
2016-06-20 12:27:24, 995 views

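News content (stored as HTML source) is read from site A's database and written into site B's news table. The formatting is inconsistent and there is a lot of redundant markup, much of it pasted in from Office, so the redundant HTML in site A's news content needs to be filtered out. The news content is held in the PHP variable $NEWS, and I would like to process that variable with regular expressions.
For example: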

<font id=888 style="FONT-SIZE: 18px; FONT-FAMILY: FONT-SIZE: 18px"><P><STRONG><SPAN lang=EN-US style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">    一级</SPAN><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">标题<SPAN lang=EN-US>粗体1</SPAN></SPAN></STRONG></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">新闻内容</SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG>一级标题粗体2</STRONG></SPAN></P> <BR><BR><P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN> </P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; 
mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2233.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN></P> <P align=center><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为居中</SPAN></P><BR><P align=right><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为右对齐</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字是斜体</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><A title="" href="http://www.baidu.com/">加一个链接</A></SPAN></P></FONT>

It should be processed into:
<p><strong>一级标题粗体</strong></p><p>新闻内容</p><p><img src="http://192.168.1.1/Webimage/2222.jpg" /></p><p>第二段新闻正文</p><p><strong>一级标题粗体2</strong></p><p><img src="http://192.168.1.1/Webimage/2233.jpg" /></p><p style="text-align: center;">这段文字为居中</p><p style="text-align: right;">这段文字为右对齐</p><p><em>这段文字为斜体</em></p><p><a href="http://www.baidu.com">加一个链接</a></p>


I wrote up the detailed requirements on a web page so they are easier to read: http://www.sunmuu.com/help/editorHelp.html

Below is the PHP code that connects and runs the query, for easy testing. The database is MySQL, the table is editor, and it has two columns: ID (INT) and news (MEDIUMTEXT):

$mysql_db_hostname = "localhost";
$mysql_db_user = "root";
$mysql_db_password = "root";
$mysql_db_database = "test";

$con = mysqli_connect($mysql_db_hostname, $mysql_db_user, $mysql_db_password, $mysql_db_database);
mysqli_query($con, "SET NAMES utf8");

$sql = "SELECT * FROM editor";
// mysqli_error() needs the connection handle as its argument
$re = mysqli_query($con, $sql) or die("Error reading data: " . mysqli_error($con));

while ($row = mysqli_fetch_array($re)) {
    $str = $row["news"];
    echo $str;
}


Replies (solutions)

// Remove tags that are not allowed
$s = strip_tags($s, '<p><a><img><strong><em>');
// Remove the attributes inside the remaining tags
$s = preg_replace('/<(p|strong|em)[^>]+?>/i', '<$1>', $s);
// Handle img and a separately: keep only src / href
$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i', '<$1 $2="$3">', $s);
echo $s;

Give this a try. Anything else that does not affect how the page displays you can adjust yourself.
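
To make the three steps in this reply easier to follow, here is a minimal sketch that runs them on a shortened fragment. The fragment is an assumption modeled on the sample in the question, not the full data, and the patterns are the ones from the reply.

```php
<?php
// Shortened fragment modeled on the question's sample (an assumption, not the full data).
$s = '<P align=center><SPAN style="FONT-SIZE: 16pt">centered</SPAN></P>'
   . '<P><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></STRONG></P>';

// Step 1: only whitelisted tags survive; SPAN is dropped but its text is kept.
$s = strip_tags($s, '<p><a><img><strong><em>');

// Step 2: any p/strong/em opening tag that carries attributes becomes the bare tag.
$s = preg_replace('/<(p|strong|em)[^>]+?>/i', '<$1>', $s);

// Step 3: a and img keep only their href / src attribute.
$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i', '<$1 $2="$3">', $s);

echo $s, PHP_EOL;
```

On this fragment the result should be `<P>centered</P><P><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg"></STRONG></P>`. Two things are worth noticing: step 2 also discards the align attribute on p, and tag names keep whatever case they had in the source because `$1` copies the matched text.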


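Thank you so much, jam00!!
One thing: the alignment on p tags needs to be kept as a style attribute, for example style="text-align: center;" and style="text-align: left;". In what order should the logic run for that? Could you give me a hint?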

// Remove tags that are not allowed
$s = strip_tags($s, '<p><a><img><strong><em>');
// Remove the attributes inside strong and em (p is no longer included here)
$s = preg_replace('/<(strong|em)[^>]+?>/i', '<$1>', $s);
// Handle img and a separately: keep only src / href
$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i', '<$1 $2="$3">', $s);
// Handle p separately: turn the captured align value into a text-align style
$s = preg_replace('/<p\s*align="*?(right|left|center)"*?.*?>/i', '<p style="text-align: $1;">', $s);
echo $s;
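
The order asked about can also be sketched as a single function: strip disallowed tags first, then rebuild p from its align value, then clean the remaining tags. This is only a sketch under assumptions. The name `clean_news_html` is hypothetical, and lowercasing the tags and dropping empty paragraphs are inferred from the desired output in the question, not taken from the reply.

```php
<?php
// Hypothetical helper; the name and the exact whitelist are assumptions.
function clean_news_html(string $html): string
{
    // 1. Drop every tag outside the whitelist (span, font, br, ...); text is kept.
    $html = strip_tags($html, '<p><a><img><strong><em>');

    // 2. Rebuild each <p>: keep only a left/right/center alignment, as inline CSS.
    $html = preg_replace_callback('/<p\b([^>]*)>/i', function (array $m): string {
        if (preg_match('/\balign\s*=\s*["\']?(left|right|center)\b/i', $m[1], $a)) {
            return '<p style="text-align: ' . strtolower($a[1]) . ';">';
        }
        return '<p>';
    }, $html);

    // 3. Strip all attributes from <strong> and <em> and lowercase the tag name.
    $html = preg_replace_callback('/<(strong|em)\b[^>]*>/i', function (array $m): string {
        return '<' . strtolower($m[1]) . '>';
    }, $html);

    // 4. Lowercase the closing tags.
    $html = preg_replace_callback('/<\/(p|a|strong|em)\s*>/i', function (array $m): string {
        return '</' . strtolower($m[1]) . '>';
    }, $html);

    // 5. Keep only a double-quoted href on <a> and src on <img>.
    $html = preg_replace_callback(
        '/<(a|img)\b[^>]*?\b(href|src)\s*=\s*"([^"]*)"[^>]*>/i',
        function (array $m): string {
            return '<' . strtolower($m[1]) . ' ' . strtolower($m[2]) . '="' . $m[3] . '">';
        },
        $html
    );

    // 6. Remove p/strong/em elements left empty; repeat so nested empties go too.
    do {
        $html = preg_replace('/<(p|strong|em)\b[^>]*>(?:\s|&nbsp;)*<\/\1>/i', '', $html, -1, $count);
    } while ($count > 0);

    return trim($html);
}

// Usage on a short fragment modeled on the question's sample.
echo clean_news_html(
    '<P><SPAN style="FONT-SIZE: 16pt"><STRONG></STRONG></SPAN></P>'
    . '<P align=center><SPAN lang=EN-US>centered</SPAN></P><BR>'
    . '<P><A title="" href="http://www.baidu.com/">link</A></P>'
), PHP_EOL;
```

Handling p before the generic clean-up matters because any step that strips all attributes from p would discard align before it could be converted. Regular expressions remain fragile on arbitrary HTML (attribute values containing `>`, single-quoted URLs, and so on), so for less predictable input a DOM-based clean-up may be the safer route.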


That works great, thank you!