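If you need PHP to truncate a UTF-8 string by characters rather than bytes, without cutting a multibyte character in half, the function below can serve as a reference. It reads the lead byte of each character to work out how many bytes that character occupies.
The code is as follows: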
<?php
$str = '今天非常Happy,所有决定去KFC吃可乐鸡翅!!!';

/*
 * $str is the UTF-8 string to truncate.
 * $len is the maximum number of characters (not bytes) to keep.
 */
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }
    $total  = strlen($str); // length of the string in bytes
    $offset = 0;            // byte offset of the current character's lead byte
    $chars  = 0;            // number of characters taken so far
    $res    = '';           // the truncated result
    // Stop at the end of the string instead of testing the byte against null,
    // so a NUL byte inside the string does not end the loop early.
    while ($chars < $len && $offset < $total) {
        // Read the lead byte of the current character as an integer (0-255).
        $high = ord($str[$offset]);
        // The high bits of the lead byte give the length of the sequence.
        // UTF-8 as defined by RFC 3629 uses at most 4 bytes per character,
        // so the obsolete 5- and 6-byte forms are not handled.
        if (($high >> 7) === 0x0) {         // 0xxxxxxx: 1 byte (ASCII)
            $count = 1;
        } else if (($high >> 5) === 0x6) {  // 110xxxxx: 2 bytes
            $count = 2;
        } else if (($high >> 4) === 0xE) {  // 1110xxxx: 3 bytes
            $count = 3;
        } else if (($high >> 3) === 0x1E) { // 11110xxx: 4 bytes
            $count = 4;
        } else {
            // Stray continuation byte (10xxxxxx) or invalid lead byte:
            // copy it as a single byte so $count is always defined.
            $count = 1;
        }
        $res    .= substr($str, $offset, $count); // append one character to $res
        $chars  += 1;                             // one more character taken
        $offset += $count;                        // move to the next lead byte
    }
    return $res;
}

echo utf8sub($str, 100);
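
For comparison, the same truncation can usually be done with PHP's built-in functions instead of inspecting lead bytes by hand. The sketch below is not from the original article: `utf8sub_builtin` is a hypothetical name, and it assumes the input is valid UTF-8. It uses `mb_substr` when the mbstring extension is loaded and otherwise splits the string into characters with a PCRE pattern and the `u` modifier.

```php
<?php
// Hypothetical helper (not part of the original article): truncate a UTF-8
// string to at most $len characters using built-in functions.
function utf8sub_builtin(string $str, int $len): string
{
    if ($len <= 0) {
        return '';
    }
    // Preferred path: mbstring counts characters, not bytes.
    if (function_exists('mb_substr')) {
        return mb_substr($str, 0, $len, 'UTF-8');
    }
    // Fallback: with the "u" modifier an empty pattern splits the string
    // between UTF-8 characters; PREG_SPLIT_NO_EMPTY drops the empty edges.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split fails on invalid UTF-8; this sketch returns '' then.
        return '';
    }
    return implode('', array_slice($chars, 0, $len));
}

$sample = '今天非常Happy';
echo utf8sub_builtin($sample, 6), "\n"; // prints: 今天非常Ha
```

The built-in route is shorter, but it depends on the mbstring or PCRE extensions being available, whereas the byte-level loop needs only core string functions.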