PHP character encoding conversion from gb2312 to utf8

WBOY

Jul 13, 2016 am 10:48 AM

For character encoding conversion in PHP, we generally use iconv or mb_convert_encoding. Note that mb_convert_encoding is considerably slower than iconv.

string iconv ( string in_charset, string out_charset, string str )
Note: besides naming the target encoding, the second parameter accepts two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT replaces a character that cannot be converted directly with one or more approximate characters; //IGNORE drops characters that cannot be converted. Without either suffix, conversion stops at the first illegal character: older PHP versions returned the string truncated at that point, while current PHP versions raise an E_NOTICE and return FALSE.
Returns the converted string, or FALSE on failure.
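As a rough illustration of the return values described above, the sketch below wraps iconv so that a failed conversion is reported as null rather than as FALSE or a partial string. The helper name gbk_to_utf8_or_null and the sample bytes are assumptions made for this example and do not come from the original article.

```php
<?php
// Hypothetical helper: convert GBK bytes to UTF-8, or return null on failure.
function gbk_to_utf8_or_null(string $bytes): ?string
{
    // The @ suppresses the E_NOTICE that iconv raises on illegal input;
    // the FALSE return value is checked explicitly instead.
    $converted = @iconv('GBK', 'UTF-8', $bytes);
    return $converted === false ? null : $converted;
}

// "\xB2\xE2\xCA\xD4" is the GBK/GB2312 byte sequence for 测试.
$gbk  = "\xB2\xE2\xCA\xD4ing";
$utf8 = gbk_to_utf8_or_null($gbk);

// Compare against the UTF-8 bytes of 测试 followed by "ing".
var_dump($utf8 === "\xE6\xB5\x8B\xE8\xAF\x95ing");
```

Writing the sample as byte escapes keeps the example independent of the encoding the script file itself is saved in.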

string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
The mbstring extension must be enabled first: in php.ini, remove the ; in front of extension=php_mbstring.dll.
mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv.

Use:

It was found that iconv fails when converting the character "—" to gb2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the "—" itself cannot be converted and is never output. mb_convert_encoding does not have this problem.

In general, use iconv. Use mb_convert_encoding only when you cannot determine the original encoding, or when the output of iconv does not display correctly.
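The advice above can be sketched as a small wrapper that tries iconv first and only then falls back to mb_convert_encoding. The function name convert_with_fallback is hypothetical, and the sketch assumes both extensions accept the same encoding name (true for common names such as GBK and UTF-8, but not guaranteed for every alias).

```php
<?php
// Hypothetical helper: prefer iconv, fall back to mbstring if iconv fails.
function convert_with_fallback(string $str, string $from, string $to = 'UTF-8')
{
    // Suppress the notice and inspect the return value instead.
    $out = @iconv($from, $to, $str);
    if ($out !== false) {
        return $out;
    }
    // Requires the mbstring extension to be enabled.
    return mb_convert_encoding($str, $to, $from);
}

// "\xB2\xE2\xCA\xD4" is 测试 in GBK; the result is the same text in UTF-8.
$result = convert_with_fallback("\xB2\xE2\xCA\xD4", 'GBK');
var_dump($result === "\xE6\xB5\x8B\xE8\xAF\x95");
```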

The code is as follows

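/**
 * Convert a GBK- or GB2312-encoded string to UTF-8, detecting the encoding automatically.
 * If the input is already UTF-8 it is returned unchanged; otherwise it is converted to UTF-8.
 * Supported encodings: utf-8, gbk, gb2312
 * @param string $str the input string
 */
function yang_gbk2utf8($str){
    $charset = mb_detect_encoding($str,array('UTF-8','GBK','GB2312'));
    if(false === $charset){
        // No candidate encoding matched; leave the input untouched
        return $str;
    }
    $charset = strtolower($charset);
    // mbstring reports GBK under the name CP936
    if('cp936' == $charset){
        $charset='GBK';
    }
    if("utf-8" != $charset){
        $str = iconv($charset,"UTF-8//IGNORE",$str);
    }
    return $str;
}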

Now let us look at some problems that come up when converting character encodings. To detect the encoding, use the mb_detect_encoding($str) function. This function requires the mbstring extension (extension=php_mbstring.dll in php.ini) to be enabled.

The code is as follows
<?php
$str="测试ing";
$cha=mb_detect_encoding($str);
echo $cha;
?>

I ran this on a gb2312 page, but the output was UTF-8, which is strange, and I have not found the reason yet. To convert everything uniformly to UTF-8, I tried the following:

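The code is as follows
<?php
$str="测试ing";
$cha=mb_detect_encoding($str);
$s = iconv($cha,"UTF-8",$str);
var_dump($s);
?>

Result returned:
string(0) ""
This looks strange at first, but since mb_detect_encoding reported UTF-8 above, the call is effectively iconv("UTF-8","UTF-8",$str) applied to GB2312 bytes, which are not valid UTF-8.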
Using the following instead, with the source encoding given explicitly:

<?php
$str="测试ing";
$cha=mb_detect_encoding($str);
$s = iconv("GB2312","UTF-8",$str);
var_dump($s);
?>

The returned result is correct. So the result of mb_detect_encoding($str) is still inaccurate, and I do not know why.
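One likely explanation, offered here as an assumption rather than something the original author confirmed: when mb_detect_encoding is called without a candidate list, it only considers the encodings in the current detect order (typically ASCII and UTF-8 by default), so it cannot report GB2312. The sketch below passes an explicit candidate list and enables strict mode. The sample bytes are an assumed GBK encoding of 测试ing.

```php
<?php
// Assumed sample: the GBK/GB2312 bytes for 测试 followed by ASCII "ing".
$str = "\xB2\xE2\xCA\xD4ing";

// Explicit candidates; the third argument turns on strict detection.
$cha = mb_detect_encoding($str, array('UTF-8', 'GBK'), true);
var_dump($cha); // a GBK-family name; mbstring usually reports GBK as "CP936"

$s = null;
if ($cha !== false) {
    // Feed the mbstring name back into mbstring's own converter,
    // which avoids name mismatches between mbstring and iconv.
    $s = mb_convert_encoding($str, 'UTF-8', $cha);
    var_dump($s === "\xE6\xB5\x8B\xE8\xAF\x95ing");
}
```

Detection remains a heuristic: a byte sequence that is valid in several candidate encodings can still be reported as the wrong one.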

The function string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] ) converts a string to the specified encoding. I wrote an example:

The code is as follows
<?php
$a="我很好";
echo mb_convert_encoding($a,'UTF-8');
?>

The result is: ??枞?枞?

The question now is: when converting strings in different encodings to UTF-8, I can use iconv if I know the source encoding in advance. But what should I do if I do not know the encoding?

Problem 3: an iconv problem. If the first byte of the string being converted is greater than a certain value, an empty result is returned. For example:

The code is as follows
<?php
$str=chr(254)."测试ing".chr(254);
$s = iconv("GB2312","UTF-8",$str);
var_dump($s);
?>

Returns:
string(0) ""
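For the earlier question of what to do when the source encoding is not known in advance, one possible approach is to validate instead of detect: check whether the bytes are already valid UTF-8 and, only if they are not, assume a single legacy encoding. This is a sketch under the assumption that the only non-UTF-8 input is GBK. The function name to_utf8_assuming_gbk is made up for this example.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise treat the input as GBK.
function to_utf8_assuming_gbk(string $str): string
{
    // mb_check_encoding validates the byte sequence against one encoding.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Assumption: anything that is not valid UTF-8 is GBK.
    return mb_convert_encoding($str, 'UTF-8', 'GBK');
}

$fromGbk  = to_utf8_assuming_gbk("\xB2\xE2\xCA\xD4ing");         // GBK bytes for 测试ing
$fromUtf8 = to_utf8_assuming_gbk("\xE6\xB5\x8B\xE8\xAF\x95ing"); // already UTF-8
var_dump($fromGbk === $fromUtf8);
```

The trade-off is that input in any third encoding would be converted incorrectly, so this only fits sites where the set of possible encodings is known.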

For the usage of mb_convert_encoding, please see the official website:

http://cn.php.net/manual/en/function.mb-convert-encoding.php

PHP's other function, iconv, also converts string encodings and works much like the function above.

There are some detailed examples below:
iconv — Convert string to requested character encoding
(PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding — Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)

Usage:
from_encoding names the character encoding before conversion. It can be an array or a comma-separated string listing several encodings. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");

Example:

The code is as follows
<?php
$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>

A helper that converts a string, or every element of a nested array, to the target encoding:

<?php
function phpcharset($data, $to) {
    if(is_array($data)) {
        // Recurse into arrays so nested values are converted too
        foreach($data as $key => $val) {
            $data[$key] = phpcharset($val, $to);
        }
    } else {
        $encode_array = array('ASCII', 'UTF-8', 'GBK', 'GB2312', 'BIG5');
        $encoded = mb_detect_encoding($data, $encode_array);
        $to = strtoupper($to);
        // Convert only when an encoding was detected and it differs from the target
        if($encoded !== false && $encoded != $to) {
            $data = mb_convert_encoding($data, $to, $encoded);
        }
    }
    return $data;
}
?>
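When the source encoding of the whole data set is known, the recursive conversion done by phpcharset can also be sketched with array_walk_recursive and no detection step. The function name convert_array_encoding and the sample row are assumptions for illustration only.

```php
<?php
// Hypothetical helper: convert every string value in a nested array.
// Array keys and non-string values are left unchanged.
function convert_array_encoding(array $data, string $to, string $from): array
{
    array_walk_recursive($data, function (&$value) use ($to, $from) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, $to, $from);
        }
    });
    return $data;
}

// Assumed sample row holding GBK bytes ("\xB2\xE2\xCA\xD4" is 测试, "\xB2\xE2" is 测).
$row = array(
    'title' => "\xB2\xE2\xCA\xD4",
    'tags'  => array("\xB2\xE2", 'ing'),
    'id'    => 7,
);
$utf8 = convert_array_encoding($row, 'UTF-8', 'GBK');
var_dump($utf8['title'] === "\xE6\xB5\x8B\xE8\xAF\x95");
```

Skipping detection avoids the misidentification problems described earlier, at the cost of requiring the caller to know the source encoding.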
