
Several ways to split Chinese and English strings in PHP

WBOY (original)
2016-07-25 08:58:06, 1957 views
This article presents three PHP functions for working with mixed Chinese and English strings: counting the length in characters, taking a given number of characters from the left, and splitting a text into an array of pieces of a given length. Readers who need this can use it as a reference.

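The goal is to split a piece of text by character count. The text may mix Chinese and English, and PHP's strlen function counts the bytes in a string rather than its characters, so I wrote several functions and am sharing them here. They assume a double-byte encoding such as GB2312, in which each Chinese character occupies two bytes and both bytes fall in the range 0xA1 to 0xFE. They do not give correct results on UTF-8 input, where a Chinese character usually takes three bytes.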
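To make the difference between bytes and characters concrete, here is a minimal sketch. It is not part of the original functions: it assumes the mbstring extension is loaded and that the input is UTF-8, and it writes the character 中 as explicit bytes so the result does not depend on how the file is saved.

```php
<?php
// Assumption: the mbstring extension is available and the data is UTF-8.
// "\xE4\xB8\xAD" is the character 中 encoded as UTF-8 (three bytes).
$mixed = "a\xE4\xB8\xADb";

$byteCount = strlen($mixed);             // counts bytes
$charCount = mb_strlen($mixed, 'UTF-8'); // counts characters

echo "bytes: $byteCount, characters: $charCount\n"; // bytes: 5, characters: 3
```

For UTF-8 data, mb_strlen is usually the simpler choice; the hand-written functions in this article target double-byte encodings.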

Example 1, count the total number of characters.

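<?php
// Count the characters in a mixed Chinese/English string.
// Assumes a double-byte encoding such as GB2312.
function ccStrLen($str)
{
    $ccLen = 0;
    $ascLen = strlen($str); // length in bytes
    // ereg() was removed in PHP 7; preg_match() is the replacement.
    $hasCC = preg_match('/[\xA1-\xFE]/', $str);  // any Chinese bytes?
    $hasAsc = preg_match('/[\x01-\xA0]/', $str); // any ASCII bytes?
    if ($hasCC && !$hasAsc) { // Chinese characters only
        return $ascLen / 2;
    }
    if (!$hasCC && $hasAsc) { // ASCII characters only
        return $ascLen;
    }
    for ($ind = 0; $ind < $ascLen; $ind++) {
        if (ord($str[$ind]) > 0xA0) {
            $ind++; // skip the second byte of a two-byte character
        }
        $ccLen++;
    }
    return $ccLen;
}
?>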

Example 2, take a given number of characters from the left side of the string.

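<?php
// Take the first $len characters of a mixed Chinese/English string.
// Assumes a double-byte encoding such as GB2312.
function ccStrLeft($str, $len)
{
    $ascLen = strlen($str); // length in bytes
    if ($ascLen <= $len) {
        return $str;
    }
    $hasCC = preg_match('/[\xA1-\xFE]/', $str);  // same checks as in ccStrLen
    $hasAsc = preg_match('/[\x01-\xA0]/', $str);
    if (!$hasCC) {
        return substr($str, 0, $len);
    }
    if (!$hasAsc) {
        // Chinese only: every character is two bytes, for odd and even $len alike.
        return substr($str, 0, $len * 2);
    }
    $cind = 0;    // byte offset into $str
    $reallen = 0; // number of characters taken so far
    while ($cind < $ascLen && $reallen < $len) {
        if (ord($str[$cind]) < 0xA1) {
            $cind++;    // ASCII byte: advance one byte
        } else {
            $cind += 2; // Chinese character: advance two bytes
        }
        $reallen++;
    }
    return substr($str, 0, $cind);
}
?>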
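For UTF-8 input, the same left-side cut can be sketched with mb_substr instead of manual byte counting. The helper name utf8Left is hypothetical and not from the original article, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper, not from the article: take the first $len characters
// of a UTF-8 string. Assumes the mbstring extension is loaded.
function utf8Left(string $str, int $len): string
{
    if ($len <= 0) {
        return '';
    }
    // mb_substr() counts in characters, not bytes, for the given encoding.
    return mb_substr($str, 0, $len, 'UTF-8');
}

// "\xE4\xB8\xAD\xE6\x96\x87" is 中文 encoded as UTF-8.
$sample = "a\xE4\xB8\xAD\xE6\x96\x87bc";
echo utf8Left($sample, 2), "\n"; // prints "a" followed by 中
```

Unlike ccStrLeft, this version never has to guess character boundaries from byte values, because mb_substr decodes the string according to the stated encoding.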

Example 3, split the given text into pieces of a given number of characters and store them in an array (suitable for short texts; long articles can be processed directly without splitting).

<?php
// Split $content into pieces of at most $smslen characters each.
// Relies on ccStrLeft() from Example 2.
function SplitContent($content, $smslen)
{
    $arr_cont = array();
    if ($smslen < 1) {
        return $arr_cont; // a non-positive length would never advance
    }
    $i = 0; // absolute byte offset of the next piece
    $total = strlen($content);
    while ($i < $total) {
        $piece = ccStrLeft(substr($content, $i), $smslen);
        $arr_cont[] = $piece;
        $i += strlen($piece);
    }
    return $arr_cont;
}
?>
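If the text is UTF-8 rather than GB2312, the splitting step could be sketched as follows. The helper name utf8Chunks is hypothetical and not part of the original article; the sketch relies only on preg_split with the u modifier and array_chunk.

```php
<?php
// Hypothetical helper, not from the article: split a UTF-8 string into
// pieces of at most $size characters each.
function utf8Chunks(string $content, int $size): array
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }
    // The "u" modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $content, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    // Regroup the characters and join each group back into a string.
    return array_map(
        static function (array $group): string {
            return implode('', $group);
        },
        array_chunk($chars, $size)
    );
}

// "\xE4\xB8\xAD\xE6\x96\x87" is 中文 encoded as UTF-8.
print_r(utf8Chunks("a\xE4\xB8\xAD\xE6\x96\x87bc", 2));
```

On PHP 7.4 and later, mb_str_split($content, $size, 'UTF-8') does the same job in a single call, provided the mbstring extension is available.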

Test:

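<?php
// Save this file as GB2312/GBK so each Chinese character in the literal is two bytes.
$str = 'a计算中英文混合1234字符串的长度abcd';
echo $str . ' has length: ' . ccStrLen($str);
echo '<br>';
$smslen = 3; // piece length in characters
print_r(SplitContent($str, $smslen));
?>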

Splitting result: Array ( [0] => a计算 [1] => 中英文 [2] => 混合1 [3] => 234 [4] => 字符串 [5] => 的长度 [6] => abc [7] => d )


