PHP Chinese Text Replacement

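<?php

// Set the output encoding
header('Content-Type: text/html; charset=utf-8');

// Banned words and the text to check ("test whether I am a banned word")
$words = array('我', '你', '他');
$content = "测一测我是不是违禁词";

$banned = generateRegularExpression($words);

// Check for banned words
$res_banned = check_words($banned, $content);

write_html($content, $res_banned);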

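/**
 * @describe Build a regular expression from an array of words
 * @param array $words
 * @return string
 */
function generateRegularExpression($words)
{
    // Escape each word, including the "/" delimiter, then join them as alternatives
    $regular = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words));
    // i: case-insensitive; u: treat pattern and subject as UTF-8
    return "/$regular/iu";
}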

/**
 * @describe Escape a single string for use inside a regular expression
 * @param string $string
 * @return string
 */
function generateRegularExpressionString($string)
{
    return preg_quote($string, '/');
}

/**
 * Check for sensitive words
 * @param string $banned pattern built by generateRegularExpression()
 * @param string $string text to check
 * @return array distinct banned words found, in order of first appearance; empty if none
 */
function check_words($banned, $string)
{
    // Find every occurrence of every banned word in a single pass
    if ($banned === '' || !preg_match_all($banned, $string, $matches)) {
        // No banned words found
        return array();
    }
    // Drop empty matches (an empty word list yields an empty pattern),
    // keep each word once and re-index the result
    return array_values(array_unique(array_filter($matches[0], 'strlen')));
}

/**
 * Print the result to the page
 * @param string $content
 * @param array $res_banned
 */
function write_html($content, $res_banned)
{
    echo htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    if ($res_banned) {
        echo " <span style='color:red'>Banned words (" . count($res_banned) . "): </span>"
            . htmlspecialchars(implode('|', $res_banned), ENT_QUOTES, 'UTF-8');
    }
    echo "<br>";
}

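1. Matching Chinese characters

$str = "中文";
// $match[0] receives every run of characters in U+4E00 to U+9FA5 (common CJK ideographs)
preg_match_all("/[\x{4e00}-\x{9fa5}]+/u", $str, $match);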
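The sketch below applies the same pattern to mixed text. `extract_chinese` is a hypothetical helper name and is not part of PHP. The `u` modifier is required. Without it, PCRE works on bytes and rejects `\x{...}` code points above 0xFF when it compiles the pattern.

```php
<?php
// Hypothetical helper: return every run of Chinese characters in $text.
// \x{4e00}-\x{9fa5} is the range used above; "u" makes PCRE read the
// pattern and subject as UTF-8 code points instead of bytes.
function extract_chinese(string $text): array
{
    preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $text, $match);
    return $match[0]; // full-pattern matches, in order of appearance
}

print_r(extract_chinese('PHP 中文 replace 替换 test'));
```

Each match runs until a non-Chinese character (a space or Latin letter, for example) breaks it.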

2. Replacing Chinese text:

In the PHP file that does the replacement, add:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

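These calls make the mb_* functions, including the mb_ereg_* regex functions, treat strings as UTF-8, so that patterns match multibyte characters. Details: http://blog.chinaunix.net/uid-20279807-id-1711213.html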
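This sketch shows what the encoding settings change. It assumes the mbstring extension is loaded. `strlen()` and `substr()` count bytes, while the `mb_*` functions count characters in the internal encoding set above.

```php
<?php
// Assumes the mbstring extension is available.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$str = '中文替换';
var_dump(strlen($str));          // int(12): 4 characters x 3 bytes each in UTF-8
var_dump(mb_strlen($str));       // int(4)
var_dump(mb_substr($str, 0, 2)); // the first two characters, "中文"
// substr() counts bytes, so cutting at byte 2 splits the first character
var_dump(mb_check_encoding(substr($str, 0, 2), 'UTF-8')); // bool(false)
```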

3. PHP provides four replacement functions: str_replace, preg_replace, mb_ereg_replace and ereg_replace (deprecated in PHP 5.3 and removed in PHP 7.0).

When replacing Chinese text, I found preg_replace to be the most suitable.

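str_replace does not support regular expressions. It replaces every literal occurrence of the search string, so it cannot match whole terms and ends up replacing parts of longer terms. For example, take $str = "模块一模块一断电" ("module 1" followed by "module 1 power failure"). Calling $str = str_replace("模块一","module1",$str); also rewrites the "模块一" at the start of "模块一断电", which becomes "module1断电".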
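One way to get whole-term replacement with the preg_* functions is to build a single alternation that lists the longest terms first, then look up each match in a map. This is a sketch, not the author's code. `replace_terms` and the replacement values (`module1_poweroff` in particular) are hypothetical.

```php
<?php
// Hypothetical helper: replace whole terms from $map, longest term first.
function replace_terms(string $text, array $map): string
{
    if (!$map) {
        return $text; // an empty alternation would match the empty string
    }
    $terms = array_keys($map);
    // Longest first: PCRE takes the first alternative that matches, so
    // "模块一断电" must be tried before its prefix "模块一".
    usort($terms, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $quoted = array_map(function ($t) {
        return preg_quote($t, '/');
    }, $terms);
    $pattern = '/' . implode('|', $quoted) . '/u';
    // Look up the replacement for the exact term that matched
    return preg_replace_callback($pattern, function ($m) use ($map) {
        return $map[$m[0]];
    }, $text);
}

// Hypothetical term map
$map = array('模块一' => 'module1', '模块一断电' => 'module1_poweroff');

echo str_replace('模块一', 'module1', '模块一模块一断电'), "\n"; // module1module1断电
echo replace_terms('模块一模块一断电', $map), "\n";             // module1module1_poweroff
```

PHP's built-in `strtr()` with an array argument behaves the same way. It tries the longest keys first and never re-scans text it has already replaced. So `strtr('模块一模块一断电', $map)` returns the same string without a regex.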

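mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] ) accepts $pattern and $replacement as arrays for batch search-and-replace. However, each pattern in the array is applied to the whole subject in turn, so with a large array the matching becomes very CPU-intensive.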
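The sketch below uses the banned words from the first example. It shows the array form of `$pattern`/`$replacement` and the by-reference `$count`. The second call compiles the same words into one alternation, which is what `generateRegularExpression()` does above. We have not benchmarked whether that form is cheaper than a long pattern array; treat it as an assumption.

```php
<?php
$text = '你说他是我的朋友';

// Array form: each pattern is applied to the whole subject in turn;
// $count receives the total number of replacements across all patterns.
$masked = preg_replace(array('/我/u', '/你/u', '/他/u'), array('*', '*', '*'), $text, -1, $count);
echo $masked, ' (', $count, " replacements)\n"; // *说*是*的朋友 (3 replacements)

// Single-alternation form: one compiled pattern, one pass over the subject.
$masked2 = preg_replace('/我|你|他/u', '*', $text, -1, $count2);
echo $masked2, ' (', $count2, " replacements)\n"; // *说*是*的朋友 (3 replacements)
```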

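mb_ereg_replace also supports regular expressions, and its pattern is written without // delimiters. However, when I used mb_ereg_replace, some Chinese text could not be matched. The cause is still unclear.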
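For comparison, this sketch performs similar replacements with `mb_ereg_replace`. It assumes mbstring was built with its regex functions. The pattern is passed without delimiters. We have not reproduced the failures mentioned above; this only shows the call shape.

```php
<?php
// Assumes mbstring with regex support (the mb_ereg_* functions) is available.
mb_regex_encoding('UTF-8');

// No "/.../" delimiters: the pattern string is the regex itself;
// options, if any, go in an optional fourth argument.
$a = mb_ereg_replace('模块一', 'module1', '模块一模块一断电');
$b = mb_ereg_replace('[你我他]', '*', '你说他是我的朋友');

echo $a, "\n"; // module1module1断电
echo $b, "\n"; // *说*是*的朋友
```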
