
Correctly parsing UTF-8 strings in PHP

WBOY (original)
2016-07-21 15:15:14, 904 views

"Learning PHP & MYSQL - Character Encoding (Part 1)" introduced the conversion between Unicode and UTF-8 and summarized the UTF-8 encoding rule. Based on that rule, we can write a routine that parses UTF-8 text one character at a time; this article gives an implementation in PHP.
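As a brief reminder of that rule: in UTF-8 the leading bits of a character's first byte tell how many bytes the character occupies (0 for one byte, 110 for two, 1110 for three, 11110 for four), and every following byte starts with 10. The sketch below only prints these bit patterns for three sample characters. The helper name utf8_bits is hypothetical and does not come from the original article, and the characters are written as byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper (not from the article): render every byte of $char
// as eight binary digits, separated by spaces.
function utf8_bits($char) {
    $bits = array();
    for ($i = 0; $i < strlen($char); $i++) {
        $bits[] = str_pad(decbin(ord($char[$i])), 8, '0', STR_PAD_LEFT);
    }
    return implode(' ', $bits);
}

echo utf8_bits('A'), "\n";             // 01000001 (1 byte, starts with 0)
echo utf8_bits("\xC3\xA9"), "\n";      // 11000011 10101001 (U+00E9, 2 bytes, starts with 110)
echo utf8_bits("\xE4\xB8\xAD"), "\n";  // 11100100 10111000 10101101 (U+4E2D, 3 bytes, starts with 1110)
```

The number of leading 1 bits in the first byte is what a parser has to inspect to know how many bytes to read for the current character.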

The code is as follows:

<?php
/*
 * Purpose: $str is a UTF-8 encoded string containing a mix of Chinese and English.
 * Decode it according to the UTF-8 encoding rules so that it displays correctly.
 */


$str = 'Today is very happy, so we decided to go to KFC to eat Coke chicken wings!!!';

/*
 * $str is the string to extract characters from.
 * $len is the number of characters to extract.
 */
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }

    $offset = 0;  // Byte offset of the next character's lead byte
    $chars = 0;   // Number of characters extracted so far
    $res = '';    // Result string

    while ($chars < $len) {
        // Stop once the whole string has been consumed. Testing the offset
        // (rather than comparing the byte with null) keeps a NUL byte in the
        // data from ending the loop early.
        if ($offset >= strlen($str)) {
            break;
        }

        // Read the lead byte of the next character as an integer (0-255)
        $high = ord($str[$offset]);

        // echo '$high=' . $high . "\n";

        // Inspect the leading 1 bits of the lead byte, longest pattern first,
        // so each branch only sees bytes the earlier branches rejected.
        // The 5- and 6-byte forms come from the original UTF-8 definition;
        // RFC 3629 limits UTF-8 to 4 bytes.
        if (($high >> 2) === 0x3F) {        // Shift right 6... top 6 bits are 111111: take 6 bytes
            $count = 6;
        } elseif (($high >> 3) === 0x1F) {  // Top 5 bits are 11111: take 5 bytes
            $count = 5;
        } elseif (($high >> 4) === 0xF) {   // Top 4 bits are 1111: take 4 bytes
            $count = 4;
        } elseif (($high >> 5) === 0x7) {   // Top 3 bits are 111: take 3 bytes
            $count = 3;
        } elseif (($high >> 6) === 0x3) {   // Top 2 bits are 11: take 2 bytes
            $count = 2;
        } else {
            // Top bit is 0 (single-byte ASCII), or the byte is a stray
            // continuation byte (10xxxxxx), which is not a valid lead byte.
            // Take 1 byte either way so that $count is always defined.
            $count = 1;
        }
        // echo '$count=' . $count . "\n";

        $res .= substr($str, $offset, $count);  // Append this character's bytes to $res
        $chars += 1;                            // One more character extracted
        $offset += $count;                      // Advance to the next character's lead byte
    }
    return $res;
}

echo utf8sub($str,100);
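
To illustrate why a character-aware routine like utf8sub is needed, the following sketch cuts a mixed string first with the byte-based substr() and then with a character-based helper. Note that the helper swaps in a different technique from the article's byte-walking loop: it relies on the /u modifier of PCRE. The names $sample and utf8sub_pcre are assumptions made for this illustration, and the sketch assumes PHP's PCRE extension was built with UTF-8 support.

```php
<?php
// Sample data (an assumption for this sketch): "PHP", two Chinese characters
// (U+4E2D and U+6587, 3 bytes each in UTF-8), then "!".
$sample = "PHP\xE4\xB8\xAD\xE6\x96\x87!";

// Byte-based cut: 4 bytes is "PHP" plus only the first byte of U+4E2D,
// which leaves an incomplete character at the end.
$broken = substr($sample, 0, 4);

// Hypothetical helper (not from the article): split the string into
// characters with the /u modifier, then keep the first $len of them.
function utf8sub_pcre($str, $len) {
    if ($len <= 0) {
        return '';
    }
    preg_match_all('/./us', $str, $matches);
    return implode('', array_slice($matches[0], 0, $len));
}

echo bin2hex($broken), "\n";                    // 504850e4
echo bin2hex(utf8sub_pcre($sample, 4)), "\n";   // 504850e4b8ad
```

For valid UTF-8 input, utf8sub called with the same arguments should return the same string as utf8sub_pcre, so the helper can serve as a cross-check on the hand-written parser.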
