
Why does mb_strlen() return 3 with GBK encoding?

WBOY, 2016-08-22 11:45:37

php > $s="你好";
php > echo mb_strlen($s,"utf8");
2
UTF-8 returns 2 (the two characters 你好, "hello"). I understand that.
php > echo mb_strlen($s,"gb2312");
4
GB2312 returns 4. I understand that too.
php > echo mb_strlen($s,"gbk");
3
But why does GBK return 3?

Replies:


Because $s holds UTF-8 bytes, and you asked mb_strlen to count them as GBK without converting them to GBK first.

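The UTF-8 bytes of the string (你好 is E4 BD A0 E5 A5 BD) read as GBK split into three two-byte characters, E4BD, A0E5 and A5BD, which display as 浣犲ソ. That is why the length is 3.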
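A minimal sketch of that misreading, assuming $s holds the UTF-8 bytes of 你好 (which is what the garbled GBK output implies). The bytes are written as \x escapes so the result does not depend on how the PHP file itself is saved; bin2hex, mb_strlen and mb_convert_encoding are standard PHP/mbstring functions.

```php
<?php
// Assumption: $s is "你好" encoded as UTF-8, spelled out byte by byte.
$s = "\xE4\xBD\xA0\xE5\xA5\xBD";

echo bin2hex($s), "\n";            // e4bda0e5a5bd (6 bytes)
echo mb_strlen($s, 'UTF-8'), "\n"; // 2: e4bda0 | e5a5bd

// GBK reads the same bytes two at a time: e4bd | a0e5 | a5bd
echo mb_strlen($s, 'GBK'), "\n";   // 3

// Converting that GBK reading to UTF-8 shows the garbled text.
$garbled = mb_convert_encoding($s, 'UTF-8', 'GBK');
echo $garbled, "\n";               // 浣犲ソ
```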

Convert the string to the target encoding first, then count in that encoding:

<code>// Convert the UTF-8 string to the target encoding, then count characters in it
$a = mb_strlen(iconv('utf-8', 'gbk', $s), 'gbk');       // 2
$b = mb_strlen(iconv('utf-8', 'gb2312', $s), 'gb2312'); // 2
</code>
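A quick check of the convert-then-count approach, again assuming $s is 你好 stored as UTF-8. This sketch swaps iconv for mbstring's mb_convert_encoding only so the example needs a single extension; both functions exist in standard PHP builds. After conversion each character occupies two bytes (你 is C4E3, 好 is BAC3), and both encodings report 2.

```php
<?php
$s = "\xE4\xBD\xA0\xE5\xA5\xBD"; // assumption: "你好" as UTF-8 bytes

// Convert first, then count in the encoding the bytes are now in.
$gbk    = mb_convert_encoding($s, 'GBK', 'UTF-8');
$gb2312 = mb_convert_encoding($s, 'GB2312', 'UTF-8');

echo bin2hex($gbk), "\n";                // c4e3bac3: two 2-byte characters
echo mb_strlen($gbk, 'GBK'), "\n";       // 2
echo mb_strlen($gb2312, 'GB2312'), "\n"; // 2
```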

By the same logic, the GB2312 result of 4 is also wrong.

mb_strlen returns the number of characters, so only 2 is correct. I'm not sure how you are explaining the 4 and the 3.
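The difference between counting bytes and counting characters is easy to see side by side. A minimal sketch, assuming $s is 你好 saved as UTF-8: strlen reports bytes, mb_strlen reports characters in the encoding you name.

```php
<?php
$s = "\xE4\xBD\xA0\xE5\xA5\xBD"; // assumption: "你好" as UTF-8 bytes

echo strlen($s), "\n";             // 6: strlen counts bytes
echo mb_strlen($s, 'UTF-8'), "\n"; // 2: mb_strlen counts characters
```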

When you write $s = "你好", $s stores UTF-8 bytes (because your source file is saved as UTF-8). Decoding those bytes as GBK or GB2312 produces garbled text, so 3 and 4 are the lengths of that garbled text.
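Why GBK and GB2312 disagree (3 vs 4) can be illustrated with a simplified model: a decoder that looks only at each lead byte to decide how many bytes a character spans. countByLeadBytes and the width callbacks below are hypothetical helpers written for illustration, not mbstring's actual internals. They assume GBK treats 0x81 to 0xFE as two-byte lead bytes, while GB2312 (EUC-CN) only treats 0xA1 to 0xFE that way. Because 0xA0 is not a GB2312 lead byte, it is consumed on its own, every later pair shifts, and the count comes out as 4.

```php
<?php
// Hypothetical helper, not part of mbstring: counts characters by looking
// only at each lead byte to decide how many bytes the character spans.
function countByLeadBytes(string $bytes, callable $charWidth): int
{
    $count = 0;
    $i = 0;
    $n = strlen($bytes);
    while ($i < $n) {
        $i += $charWidth(ord($bytes[$i])); // jump over this character's bytes
        $count++;                          // a truncated last character still counts once
    }
    return $count;
}

// Simplified UTF-8 widths (no validation of continuation bytes).
$utf8Width = fn(int $b): int => $b >= 0xF0 ? 4 : ($b >= 0xE0 ? 3 : ($b >= 0xC0 ? 2 : 1));
// GBK: 0x81..0xFE start a two-byte character.
$gbkWidth = fn(int $b): int => ($b >= 0x81 && $b <= 0xFE) ? 2 : 1;
// GB2312 / EUC-CN: only 0xA1..0xFE start a two-byte character; 0xA0 does not.
$gb2312Width = fn(int $b): int => ($b >= 0xA1 && $b <= 0xFE) ? 2 : 1;

$s = "\xE4\xBD\xA0\xE5\xA5\xBD"; // assumption: "你好" as UTF-8 bytes

echo countByLeadBytes($s, $utf8Width), "\n";   // e4bda0 | e5a5bd         -> 2
echo countByLeadBytes($s, $gbkWidth), "\n";    // e4bd | a0e5 | a5bd      -> 3
echo countByLeadBytes($s, $gb2312Width), "\n"; // e4bd | a0 | e5a5 | bd   -> 4
```

The model reproduces the 2, 3 and 4 from the question, which suggests the different results come from where each encoding thinks a character starts, not from any property of the original text.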
