Correctly parsing UTF-8 strings in PHP: techniques and application

The article "Learning PHP & MYSQL - Character Encoding (Part 1)" introduced the conversion relationship between Unicode and UTF-8 and summarized the UTF-8 encoding rules. Based on those rules, this article writes a PHP program that parses UTF-8 encoded strings.
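The encoding rule itself is summarized in that earlier article and is not repeated here. As a minimal sketch, assuming the standard UTF-8 bit layout (RFC 3629, which limits sequences to 4 bytes), the rule can be expressed in the encoding direction as follows. The function name `utf8_encode_codepoint` is hypothetical and not part of the original article.

```php
<?php
// Hypothetical helper (not from the original article): encode one Unicode
// code point as UTF-8 bytes, assuming the standard bit layout:
//   U+0000..U+007F     0xxxxxxx
//   U+0080..U+07FF     110xxxxx 10xxxxxx
//   U+0800..U+FFFF     1110xxxx 10xxxxxx 10xxxxxx
//   U+10000..U+10FFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
// Surrogate code points (U+D800..U+DFFF) are not rejected in this sketch.
function utf8_encode_codepoint(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException("Code point out of range: $cp");
    }
    if ($cp < 0x80) {
        return chr($cp);                          // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));          // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))            // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))    // 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));          // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))                // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))       // 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))        // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));              // 10xxxxxx
}

echo bin2hex(utf8_encode_codepoint(0x4E2D)), "\n"; // e4b8ad (U+4E2D, "中")
echo bin2hex(utf8_encode_codepoint(0x41)), "\n";   // 41 ("A")
```

The number of leading 1 bits in the first byte equals the total byte count, which is exactly what a parser has to read back when it splits a string into characters.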

The code is as follows:

<?php
/*
Purpose: $str is a UTF-8 encoded string mixing Chinese and English.
The program decodes the string correctly according to the UTF-8 encoding rules and displays it.
*/


$str = 'Today is very happy, so we decided to go to KFC to eat Coke chicken wings!!!';

/*
$str is the string to truncate
$len is the number of characters to keep
*/
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }

    $offset = 0;            // Byte offset of the current lead (high-order) byte
    $chars  = 0;            // Number of characters taken so far
    $res    = '';           // Stores the truncated result string
    $total  = strlen($str); // Length of the string in bytes

    while ($chars < $len) {
        // Stop once every byte has been consumed
        if ($offset >= $total) {
            break;
        }

        // Read the lead byte of the next character as an integer (0-255)
        $high = ord($str[$offset]);

        // echo '$high=' . $high . "\n"; // debug output

        // Classify the lead byte by shifting away its low bits and comparing
        // the remaining high bits; branches are checked longest-first.
        if (($high >> 2) === 0x3F) {        // 111111xx: take 6 bytes
            $count = 6;
        } else if (($high >> 3) === 0x1F) { // 111110xx: take 5 bytes
            $count = 5;
        } else if (($high >> 4) === 0xF) {  // 11110xxx: take 4 bytes
            $count = 4;
        } else if (($high >> 5) === 0x7) {  // 1110xxxx: take 3 bytes
            $count = 3;
        } else if (($high >> 6) === 0x3) {  // 110xxxxx: take 2 bytes
            $count = 2;
        } else {                            // 0xxxxxxx (ASCII) or a stray 10xxxxxx byte: take 1 byte
            $count = 1;
        }

        // echo '$count=' . $count . "\n"; // debug output

        $res .= substr($str, $offset, $count); // Append one whole character to $res
        $chars++;                              // One more character taken
        $offset += $count;                     // Move the offset past the bytes just taken
    }
    return $res;
}

echo utf8sub($str,100);
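`utf8sub()` copies whole byte sequences but never checks that the bytes after a lead byte really are continuation bytes (`10xxxxxx`), and it never reconstructs the Unicode code point. A minimal sketch of a stricter decoder, assuming only the 1-4 byte forms of RFC 3629 are valid, could look like this; the name `utf8_codepoints` is hypothetical, and overlong encodings are not rejected.

```php
<?php
// Hypothetical helper (not part of the original article): decode a UTF-8
// string into integer code points using the same lead-byte rule as
// utf8sub(), restricted to 1-4 byte sequences. Invalid lead bytes, bad
// continuation bytes and truncated sequences throw an exception.
// Overlong encodings are NOT rejected in this sketch.
function utf8_codepoints(string $str): array
{
    $result = [];
    $offset = 0;
    $total  = strlen($str);

    while ($offset < $total) {
        $high = ord($str[$offset]);

        if ($high < 0x80) {                 // 0xxxxxxx
            $count = 1;
            $cp = $high;
        } elseif (($high >> 5) === 0x6) {   // 110xxxxx
            $count = 2;
            $cp = $high & 0x1F;
        } elseif (($high >> 4) === 0xE) {   // 1110xxxx
            $count = 3;
            $cp = $high & 0x0F;
        } elseif (($high >> 3) === 0x1E) {  // 11110xxx
            $count = 4;
            $cp = $high & 0x07;
        } else {
            throw new UnexpectedValueException("Invalid lead byte at offset $offset");
        }

        if ($offset + $count > $total) {
            throw new UnexpectedValueException("Truncated sequence at offset $offset");
        }

        // Each continuation byte must match 10xxxxxx and contributes 6 bits
        for ($i = 1; $i < $count; $i++) {
            $byte = ord($str[$offset + $i]);
            if (($byte >> 6) !== 0x2) {
                throw new UnexpectedValueException('Bad continuation byte at offset ' . ($offset + $i));
            }
            $cp = ($cp << 6) | ($byte & 0x3F);
        }

        $result[] = $cp;
        $offset += $count;
    }
    return $result;
}

echo implode(' ', array_map('dechex', utf8_codepoints("A\xE4\xB8\xAD"))), "\n"; // 41 4e2d
```

When the mbstring extension is available, `mb_substr($str, 0, $len, 'UTF-8')` performs the same character-based truncation as `utf8sub()`; the hand-written versions are mainly useful for seeing how the byte rule works.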
