The most common approach to word segmentation is the bigram (two-character) split:
$str = '这是我的网站www.7di.net!' ;
//$str = iconv('GB2312','UTF-8',$str);
$result = spStr( $str );
print_r( $result );
/**
 * UTF-8 version: Chinese bigram (two-character) word segmentation
 */
function spStr( $str )
{
$cstr = array ();
$search = array ( "," , "/" , "\\" , "." , ";" , ":" , "\"" , "!" , "~" , "`" , "^" , "(" , ")" , "?" , "-" , "\t" , "\n" , "'" ,
$str = str_replace ( $search ,
preg_match_all(
preg_match_all(
$str = preg_replace(
$str = preg_replace(
$str = explode (
foreach ( $str as $s ) {
$l = strlen ( $s );
$bf = null;
// Step through the string 3 bytes at a time (one UTF-8 encoded Chinese character)
for ( $i = 0; $i < $l ; $i = $i +3) {
$ns1 = substr ( $s , $i , 3);
if (isset( $s [ $i +3])) {
$ns2 = substr ( $s , $i +3, 3);
if (preg_match(
} else if ( $i == 0) {
$cstr [] = $ns1 ;
}
}
}
$estr = isset( $estr [0])? $estr [0]: array ();
$nstr = isset( $nstr [0])? $nstr [0]: array ();
return array_merge ( $nstr , $estr , $cstr );
}
The result of running it is:
Array ( [0] => 7 [1] => www [2] => di [3] => net [4] => 这是 [5] => 是我 [6] => 我的 [7] => 的网 [8] => 网站 )
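The regular expressions in the listing above did not survive, so here is a minimal, self-contained sketch of just the Chinese bigram step. It is not the original spStr(): the function name hanBigrams() is hypothetical, and it assumes UTF-8 input and a PCRE build with Unicode property support (`\p{Han}`), instead of stepping through the string 3 bytes at a time.

```php
<?php
// Hypothetical helper, not the article's spStr():
// returns the overlapping two-character pairs of every run of Han characters.
function hanBigrams(string $text): array
{
    $out = [];
    // With the /u flag, \p{Han}+ matches runs of CJK ideographs in a UTF-8 string.
    preg_match_all('/\p{Han}+/u', $text, $m);
    foreach ($m[0] as $run) {
        // Split the run into single characters (UTF-8 aware).
        $chars = preg_split('//u', $run, -1, PREG_SPLIT_NO_EMPTY);
        $n = count($chars);
        if ($n === 1) {
            // A lone character has no partner, so keep it as it is.
            $out[] = $chars[0];
            continue;
        }
        // Slide a window of two characters over the run.
        for ($i = 0; $i + 1 < $n; $i++) {
            $out[] = $chars[$i] . $chars[$i + 1];
        }
    }
    return $out;
}

print_r(hanBigrams('这是我的网站www.7di.net!'));
```

For the sample string this yields the same five pairs as entries [4] to [8] of the result above. The Latin and digit tokens are left out on purpose, since how the original extracted them is not recoverable from the listing.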
Next, convert the results above into GB2312 location codes (区位码). The PHP code is:
foreach ( $result as $s ) {
$s = iconv( 'UTF-8' , 'GB2312' , $s );
$code [] = gbCode( $s );
}
$code = implode(
echo $code ;
function gbCode( $str ) {
$return = null;
if (!preg_match(
$len = strlen ( $str );
for ( $i = 0; $i < $len ; $i = $i +2) {
$return .= sprintf(
}
return $return ;
}
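The sprintf() arguments in gbCode() are cut off above, so the following is only a sketch of how a location code is usually derived, not the original function. In GB2312 each Chinese character is two bytes, and the row (区) and cell (位) are each the byte value minus 0xA0. The name gbToQuwei() is hypothetical, and passing single-byte characters through unchanged is an assumption of this sketch.

```php
<?php
// Hypothetical helper, not the article's gbCode():
// converts a GB2312-encoded byte string into 区位码 digits.
function gbToQuwei(string $gb): string
{
    $out = '';
    $len = strlen($gb);
    for ($i = 0; $i < $len; ) {
        $hi = ord($gb[$i]);
        if ($hi >= 0xA1 && $i + 1 < $len) {
            $lo = ord($gb[$i + 1]);
            // Row and cell are each the byte value minus 0xA0, printed as two digits.
            $out .= sprintf('%02d%02d', $hi - 0xA0, $lo - 0xA0);
            $i += 2;
        } else {
            // Assumption: single-byte (ASCII) characters are copied through unchanged.
            $out .= $gb[$i];
            $i += 1;
        }
    }
    return $out;
}

// "啊" is the bytes 0xB0 0xA1 in GB2312, i.e. row 16, cell 01.
echo gbToQuwei("\xB0\xA1"), "\n"; // prints 1601
```

In practice the input would come from iconv('UTF-8', 'GB2312', $s) as in the loop above; the raw bytes are used here only so the example does not depend on the iconv extension.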