How to truncate Chinese text strings in PHP without garbled characters

WBOY
Original
2016-07-13 10:48:54

I found a lot of material online about truncating Chinese strings in PHP. The most common advice is to use the mb_substr function. That function depends on the mbstring extension being enabled in php.ini, but I do not have permission to modify php.ini on my host, so I had to find another way.

Truncating with substr

The substr() function returns a part of a string.

The code is as follows:

<?php
$rest = substr("我是中国人", -1); // returns a single byte of "人", which displays as a garbled character
echo $rest . "\n";
$rest = substr("abcdef", -2);    // returns "ef"
echo $rest . "\n";
$rest = substr("abcdef", -3, 1); // returns "d"
echo $rest . "\n";
?>
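The garbled result comes from substr() counting bytes rather than characters. The sketch below makes that visible. It assumes the source file and the string are UTF-8, where each of these Chinese characters occupies three bytes; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8.
$text = "我是中国人";

// strlen() counts bytes: 5 characters x 3 bytes each.
$byteLength = strlen($text);

// The last byte alone is only the tail of "人", so it is not a valid character.
$lastByteHex = bin2hex(substr($text, -1));

// Taking the last 3 bytes happens to return the whole final character,
// but only because we know this character is exactly 3 bytes long.
$lastChar = substr($text, -3);

echo $byteLength, "\n";   // 15
echo $lastByteHex, "\n";  // ba
echo $lastChar, "\n";     // 人
```

Hard-coding a byte width like this breaks as soon as the string mixes ASCII and Chinese characters, which is why a character-aware function is needed.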

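Searching Baidu and Google turned up the advice to use mb_substr instead.

A friend later pointed out that many servers do not load php_mbstring.dll by default; it has to be enabled in php.ini first.
The code is as follows:

<?php
// Pass the encoding explicitly; otherwise mb_substr falls back to the configured internal encoding.
echo mb_substr('我们都是好孩子hehe', 0, 9, 'UTF-8'); // outputs "我们都是好孩子he"
?>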
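Whether mb_substr can be called at all depends on the server configuration, so it can help to check before relying on it. A minimal sketch, using only the standard function_exists() and extension_loaded() checks; the variable names are illustrative:

```php
<?php
// Check at runtime which multibyte-aware substring functions this server provides.
$hasMbSubstr    = function_exists('mb_substr');    // provided by the mbstring extension
$hasIconvSubstr = function_exists('iconv_substr'); // provided by the iconv extension

echo 'mb_substr available: ', $hasMbSubstr ? 'yes' : 'no', "\n";
echo 'iconv_substr available: ', $hasIconvSubstr ? 'yes' : 'no', "\n";
echo 'mbstring loaded: ', extension_loaded('mbstring') ? 'yes' : 'no', "\n";
```

If the first check reports "no", calling mb_substr directly ends in a fatal error, so the call needs a guard or a fallback.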


Running it on my host produced this error:

Fatal error: Call to undefined function mb_substr()...

My IDC (hosting provider) would not enable the extension, so I had to find another way.
The code is as follows:

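function msubstr($str, $start, $length, $charset = "utf-8", $suffix = true)
{
    // Prefer the multibyte-aware built-ins when they are available.
    // Note: these two branches return without appending the suffix.
    if (function_exists("mb_substr")) {
        return mb_substr($str, $start, $length, $charset);
    } elseif (function_exists("iconv_substr")) {
        return iconv_substr($str, $start, $length, $charset);
    }
    // Fallback: one pattern per encoding, each matching exactly one character.
    // Single quotes leave the \x escapes for the regex engine to interpret.
    $re['utf-8']  = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    $re['gb2312'] = '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/';
    $re['gbk']    = '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/';
    $re['big5']   = '/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/';
    // Split the string into characters, then slice by character position.
    preg_match_all($re[$charset], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
    if ($suffix) return $slice . "…";
    return $slice;
}

This solved the problem for me. The fallback appears to cut the string character by character according to its encoding, rather than byte by byte.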
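To watch the regex fallback work even on a server where mb_substr exists, the UTF-8 branch can be pulled out on its own. This is a minimal sketch, assuming UTF-8 input and a UTF-8 source file; utf8_slice is a hypothetical helper name, not part of the function above.

```php
<?php
// Hypothetical helper: the UTF-8 fallback branch in isolation.
function utf8_slice($str, $start, $length)
{
    // Each alternative matches one character: 1, 2, 3, or 4 bytes long.
    $pattern = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    // $match[0] becomes an array with one character per element.
    preg_match_all($pattern, $str, $match);
    // array_slice() then works on characters, not bytes.
    return implode('', array_slice($match[0], $start, $length));
}

echo utf8_slice('我们都是好孩子hehe', 0, 9), "\n"; // 我们都是好孩子he
echo utf8_slice('我是中国人', -1, 1), "\n";        // 人
```

Because array_slice() accepts negative offsets, the second call returns the last character intact, which is the case where plain substr() produced garbage.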