Home > Article > Backend Development > 求关于正则表达式PHP过滤编辑器的新闻内容
从网站A数据库中读取的新闻内容(HTML源码格式)写入网站B的新闻表中,格式不统一,而且有很多冗余代码,很多是从office复制过去的,需要过滤掉网站A新闻内容中冗余的HTML代码。新闻内容在php的$NEWS字段中,给这个字段用正则表达式处理一下。
比如
<font id=888 style="FONT-SIZE: 18px; FONT-FAMILY: FONT-SIZE: 18px"><P><STRONG><SPAN lang=EN-US style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"> 一级</SPAN><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">标题<SPAN lang=EN-US>粗体1</SPAN></SPAN></STRONG></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">新闻内容</SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2222.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG>一级标题粗体2</STRONG></SPAN></P> <BR><BR><P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN> </P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><STRONG><IMG src="http://192.168.1.1/Webimage/2233.jpg" width=800></STRONG></SPAN></P> <P><SPAN style="FONT-SIZE: 16pt; FONT-FAMILY: 仿宋_GB2312; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"></SPAN></P> <P align=center><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为居中</SPAN></P><BR><P align=right><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字为右对齐</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA">这段文字是斜体</SPAN></P><BR><P align=left><SPAN style="FONT-FAMILY: 仿宋_GB2312; FONT-SIZE: 16pt; mso-hansi-font-family: Calibri; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA"><A title="" href="http://www.baidu.com/">加一个链接</A></SPAN></P></FONT>
<p><strong>一级标题粗体</strong></p><p>新闻内容</p><p><img src="http://192.168.1.1/Webimage/2222.jpg" / alt="求关于正则表达式PHP过滤编辑器的新闻内容" ></p><p>第二段新闻正文</p><p><strong>一级标题粗体2</strong></p><p><img src="http://192.168.1.1/Webimage/2233.jpg" / alt="求关于正则表达式PHP过滤编辑器的新闻内容" ></p><p style="max-width:90%">这段文字为居中</p><p style="text-align: right;">这段文字为右对齐</p><p><em>这段文字为斜体</em></p><p><a href="http://www.baidu.com">加一个链接</a></p>
$mysql_db_hostname = "localhost";$mysql_db_user = "root";$mysql_db_password = "root";$mysql_db_database = "test";$con = mysqli_connect($mysql_db_hostname, $mysql_db_user, $mysql_db_password, $mysql_db_database);mysqli_query($con, "SET NAMES utf8");$sql="SELECT * FROM editor";$re=mysqli_query($con,$sql)or die("读取数据出错". mysqli_error());while($row=mysqli_fetch_array($re)){$str=$row["news"];echo $str;}
//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(p|strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);echo $s;
//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(p|strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);echo $s;
//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);//单独处理 p$s = preg_replace('/<p\s*align="*?(right|left|center)"*?.*?>/i','<p style="max-width:90%">',$s);echo $s;
//去掉不允许的 tag$s = strip_tags($s,'<p>,<a>,<img alt="求关于正则表达式PHP过滤编辑器的新闻内容" >,<strong>,<em>');//去掉tag 里面的属性$s = preg_replace('/<(strong|em)[^>]+?>/i','<$1>',$s);//img 和 a 单独处理$s = preg_replace('/<(a|img).+?(href|src)="([^"]+?)"[^>]*?>/i','<$1 $2="$3">',$s);//单独处理 p$s = preg_replace('/<p\s*align="*?(right|left|center)"*?.*?>/i','<p style="max-width:90%">',$s);echo $s;