Automatically detecting character set encoding and transcoding in PHP

WBOY
2016-07-13 10:48:59

The principle is simple. In GB2312/GBK a Chinese character occupies two bytes, and each of those bytes falls within a known value range. In UTF-8 a Chinese character occupies three bytes, and each of those bytes also falls within a known value range. In either encoding, English (ASCII) characters have byte values below 128 and occupy a single byte (full-width forms excepted).
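To make those byte counts concrete, here is a small sketch that writes the character "中" (U+4E2D) as raw bytes in each encoding. The variable names are only illustrative, and escape sequences are used so the result does not depend on how the script file itself is saved.

```php
<?php
// The character "中" (U+4E2D) as raw bytes in each encoding.
$utf8   = "\xE4\xB8\xAD"; // UTF-8: three bytes
$gb2312 = "\xD6\xD0";     // GB2312/GBK: two bytes
$ascii  = "A";            // ASCII: one byte with a value below 128

echo strlen($utf8), "\n";   // 3
echo strlen($gb2312), "\n"; // 2
echo ord($ascii), "\n";     // 65

// Every byte of a multi-byte character has its high bit set (value >= 128).
foreach ([$utf8, $gb2312] as $bytes) {
    foreach (str_split($bytes) as $byte) {
        printf("%02X ", ord($byte));
    }
    echo "\n";
}
```

The two hex dumps show why a detector can only look at bytes of 128 and above: ASCII bytes are identical in both encodings and carry no information about which one is in use.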

When processing pages in PHP, we convert character sets with functions such as iconv or mb_convert_encoding. This has a precondition, however: we must know the input and output encodings in advance, otherwise the conversion cannot be correct.
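A minimal sketch of that precondition, assuming the iconv and mbstring extensions are both loaded. It converts the GB2312 bytes of "中" with the right source encoding, then with a deliberately wrong one (ISO-8859-1 is chosen here only for illustration).

```php
<?php
$gbBytes = "\xD6\xD0"; // "中" encoded as GB2312

// Correct: the declared source encoding matches the data.
$viaIconv = iconv('GB2312', 'UTF-8', $gbBytes);
// Note the different argument order: (string, to, from).
$viaMb = mb_convert_encoding($gbBytes, 'UTF-8', 'GB2312');

var_dump(bin2hex($viaIconv));   // string(6) "e4b8ad"
var_dump($viaIconv === $viaMb); // bool(true)

// Wrong guess: each byte is treated as a separate Latin-1 character,
// so the result is valid UTF-8 but no longer the original character.
$wrong = iconv('ISO-8859-1', 'UTF-8', $gbBytes);
var_dump(bin2hex($wrong));      // string(8) "c396c390"
```

The wrong conversion raises no error, which is why guessing the source encoding silently corrupts text.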
The following function determines the encoding of the source string by itself and converts it, without being told the encoding beforehand. It only distinguishes UTF-8 from GB2312, but that is enough for most websites in China.

The code is as follows:


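function safeEncoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++)
    {
        // ASCII bytes (below 128) are identical in both encodings, so skip them
        if (ord($string[$i]) < 128)
            continue;

        if ((ord($string[$i]) & 224) == 224)
        {
            // First byte check passed (top three bits set)
            if ($i + 2 < $length
                && (ord($string[$i + 1]) & 128) == 128
                && (ord($string[$i + 2]) & 128) == 128)
            {
                // Second and third byte checks passed
                $encoding = "UTF-8";
                break;
            }
        }
        if ((ord($string[$i]) & 192) == 192)
        {
            // First byte check passed (top two bits set)
            if ($i + 1 < $length && (ord($string[$i + 1]) & 128) == 128)
            {
                // Second byte check passed
                $encoding = "GB2312";
                break;
            }
        }
    }

    // Convert only when the detected encoding differs from the requested one
    if (strtoupper($encoding) == strtoupper($outEncoding))
        return $string;
    else
        return iconv($encoding, $outEncoding, $string);
}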
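The byte-mask test above is a heuristic and can misclassify some input, for example GB2312 text whose bytes happen to satisfy the UTF-8 masks. As an alternative sketch (not the article's method), the function below validates the whole string as UTF-8 with mb_check_encoding and falls back to GBK otherwise. The name toEncoding and the GBK fallback are assumptions made for illustration, and the mbstring and iconv extensions are assumed to be loaded.

```php
<?php
/**
 * Hypothetical helper, not part of the original article: validate the
 * input as UTF-8 first and assume GBK only when validation fails.
 */
function toEncoding(string $string, string $outEncoding = 'UTF-8'): string
{
    // mb_check_encoding() checks that the whole string is valid UTF-8.
    $source = mb_check_encoding($string, 'UTF-8') ? 'UTF-8' : 'GBK';

    if (strcasecmp($source, $outEncoding) === 0) {
        return $string;
    }

    // iconv() returns false on failure; keep the input unchanged in that case.
    $converted = iconv($source, $outEncoding, $string);
    return $converted === false ? $string : $converted;
}

var_dump(bin2hex(toEncoding("\xD6\xD0")));     // GB2312 bytes of "中" become "e4b8ad"
var_dump(bin2hex(toEncoding("\xE4\xB8\xAD"))); // already UTF-8, returned as "e4b8ad"
```

GBK is used as the fallback because it is a superset of GB2312, so GB2312 input converts the same way. Like the original function, this sketch still cannot tell GBK apart from other legacy encodings.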

Example 2

The code is as follows:


// Identify the encoding of Chinese text. YBlog uses UTF-8, so if a trackback
// notification arrives encoded as GB2312, it must be detected and converted.
function safeEncoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++)
    {
        // ASCII bytes (below 128) are identical in both encodings, so skip them
        if (ord($string[$i]) < 128)
            continue;

        if ((ord($string[$i]) & 224) == 224)
        {
            // First byte check passed
            if ($i + 2 < $length
                && (ord($string[$i + 1]) & 128) == 128
                && (ord($string[$i + 2]) & 128) == 128)
            {
                // Second and third byte checks passed
                $encoding = "UTF-8";
                break;
            }
        }
        if ((ord($string[$i]) & 192) == 192)
        {
            // First byte check passed
            if ($i + 1 < $length && (ord($string[$i + 1]) & 128) == 128)
            {
                // Second byte check passed
                $encoding = "GB2312";
                break;
            }
        }
    }

    if (strtoupper($encoding) == strtoupper($outEncoding))
        return $string;
    else
        return iconv($encoding, $outEncoding, $string);
}
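The masks in safeEncoding only test whether the high bits are set, which accepts more bytes than UTF-8 actually allows. As a hedged sketch, the helper below (isUtf8ThreeByteAt is a hypothetical name, not from the original) checks the exact bit pattern of a three-byte UTF-8 sequence: a lead byte of the form 1110xxxx followed by two continuation bytes of the form 10xxxxxx. It checks the bit pattern only and does not reject every invalid sequence.

```php
<?php
// Hypothetical helper: true if the three bytes starting at offset $i
// have the bit pattern of a three-byte UTF-8 sequence.
function isUtf8ThreeByteAt(string $s, int $i): bool
{
    if ($i < 0 || $i + 2 >= strlen($s)) {
        return false; // fewer than three bytes available
    }
    $b0 = ord($s[$i]);
    $b1 = ord($s[$i + 1]);
    $b2 = ord($s[$i + 2]);

    return ($b0 & 0xF0) === 0xE0  // lead byte: 1110xxxx
        && ($b1 & 0xC0) === 0x80  // continuation byte: 10xxxxxx
        && ($b2 & 0xC0) === 0x80; // continuation byte: 10xxxxxx
}

var_dump(isUtf8ThreeByteAt("\xE4\xB8\xAD", 0));     // bool(true): "中" in UTF-8
var_dump(isUtf8ThreeByteAt("\xE5\xB0\xE5\xB0", 0)); // bool(false): two double-byte pairs
```

The second input is the kind of case the looser masks get wrong: all of its bytes have the high bit set and the first has its top three bits set, so the `& 224` and `& 128` tests would report UTF-8, while the stricter continuation-byte mask rejects it.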
