Correctly parsing UTF-8 strings in PHP

WBOY
2016-05-16 09:00:25

"Learning PHP & MySQL - Character Encoding (Part 1)" introduced the conversion between Unicode and UTF-8 and summarized the UTF-8 encoding rule: the leading bits of the first byte of a character tell how many bytes the character occupies. A UTF-8 parsing program can be written directly from this rule. The PHP implementation is given here.
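As a minimal sketch of that rule, the helper below maps a lead byte to the length of the sequence it starts. The name `utf8_seq_len` is hypothetical and does not come from the article. It assumes the modern 4-byte limit of RFC 3629, whereas the article's own code also accepts the older 5- and 6-byte forms.

```php
<?php
// Hypothetical helper (not from the article): sequence length implied by a lead byte.
// Returns 0 for a continuation byte (10xxxxxx) or a byte that cannot start a sequence.
function utf8_seq_len(int $byte): int
{
    if (($byte & 0x80) === 0x00) { return 1; } // 0xxxxxxx: ASCII
    if (($byte & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($byte & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($byte & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0;
}

$sample = "A\xE4\xB8\xAD"; // "A" followed by the three bytes of U+4E2D
echo utf8_seq_len(ord($sample[0])), "\n"; // lead byte 0x41, prints 1
echo utf8_seq_len(ord($sample[1])), "\n"; // lead byte 0xE4, prints 3
echo utf8_seq_len(ord($sample[2])), "\n"; // continuation byte 0xB8, prints 0
```

Masking with `&` tests each prefix exactly, so the branches do not depend on the order in which they are checked.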

The full code is as follows:

/*
Purpose: $str is a UTF-8 encoded string that may mix Chinese and English.
The string is decoded character by character according to the UTF-8
encoding rules and then displayed.
*/

$str = 'Today is very happy, so we decided to go to KFC to eat Coke chicken wings!!!';

/*
$str is the string to take characters from
$len is the number of characters to take
*/
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }

    $offset = 0;            // Byte offset of the current character's lead byte
    $chars  = 0;            // Number of characters taken so far
    $res    = '';           // Result string
    $size   = strlen($str); // Total length of $str in bytes

    while ($chars < $len) {
        // Stop once every byte has been consumed.
        if ($offset >= $size) {
            break;
        }

        // Read the lead byte of the next character as an integer.
        $high = ord($str[$offset]);
        // echo '$high=' . $high . "\n";

        // The number of leading 1 bits in the lead byte gives the sequence length.
        // Each branch is reached only if the longer patterns above it did not match.
        // RFC 3629 limits UTF-8 to 4 bytes; the 5- and 6-byte forms are from the older scheme.
        if (($high >> 2) === 0x3F) {       // 111111xx: take 6 bytes
            $count = 6;
        } elseif (($high >> 3) === 0x1F) { // 11111xxx: take 5 bytes
            $count = 5;
        } elseif (($high >> 4) === 0xF) {  // 1111xxxx: take 4 bytes
            $count = 4;
        } elseif (($high >> 5) === 0x7) {  // 111xxxxx: take 3 bytes
            $count = 3;
        } elseif (($high >> 6) === 0x3) {  // 11xxxxxx: take 2 bytes
            $count = 2;
        } else {                           // 0xxxxxxx (ASCII) or a stray continuation byte:
            $count = 1;                    // take 1 byte so the loop always advances
        }
        // echo '$count=' . $count . "\n";

        $res    .= substr($str, $offset, $count); // Append one whole character to $res
        $chars  += 1;                             // One more character has been taken
        $offset += $count;                        // Move past the bytes just consumed
    }
    return $res;
}

echo utf8sub($str, 100);
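
One way to sanity-check a hand-written parser like this is to compare its output with a prefix produced by PCRE in UTF-8 mode. The sketch below is an assumption-laden illustration, not part of the original article: `utf8_prefix_pcre` is a hypothetical name, and it returns an empty string for input that PCRE rejects as invalid UTF-8.

```php
<?php
// Hypothetical helper (not from the article): take the first $len characters
// by letting PCRE split the string in UTF-8 mode (the "u" modifier).
function utf8_prefix_pcre(string $str, int $len): string
{
    if ($len <= 0) {
        return '';
    }
    // With the "u" modifier, an empty pattern splits between code points.
    // preg_split() returns false when $str is not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }
    return implode('', array_slice($chars, 0, $len));
}

$sample = "\xE4\xB8\xAD\xE6\x96\x87 KFC"; // U+4E2D, U+6587, a space, then ASCII
echo utf8_prefix_pcre($sample, 3), "\n";  // the two CJK characters and the space
```

For valid input, `utf8sub($sample, 3)` should return the same bytes. Where the mbstring extension is installed, `mb_substr($str, 0, $len, 'UTF-8')` is the more common way to get the same prefix.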