Home  >  Article  >  Backend Development  >  Summary of methods for intercepting Chinese characters and preventing garbled characters in PHP

Summary of methods for intercepting Chinese characters and preventing garbled characters in PHP

PHP中文网
PHP中文网Original
2016-07-13 09:55:17845browse

Directly using the PHP function substr to intercept Chinese characters may cause garbled characters. The main reason is that substr may forcibly "saw" a Chinese character in half. So let's see how to solve this problem.

I believe that everyone often uses interception of strings in their own programs, but often encounters the problem of garbled characters when intercepting Chinese strings. It is very troublesome. Next, we will introduce two methods to prevent garbled characters when intercepting Chinese strings.
First of all, a function written by yourself is convenient to use.
Use this function to intercept and there will be no garbled characters.

/** 
 * 支持中文字符串截取 
 */ 
function msubstr($str, $start=0, $length, $charset="utf-8", $suffix=true){ 
  switch($charset){ 
    case 'utf-8':$char_len=3;break; 
    case 'UTF8':$char_len=3;break; 
    default:$char_len=2; 
  } 
  //小于指定长度,直接返回 
  if(strlen($str)<=($length*$char_len)){   
    return $str; 
  } 
  if(function_exists("mb_substr")){  
    $slice= mb_substr($str, $start, $length, $charset); 
  }else if(function_exists(&#39;iconv_substr&#39;)){ 
    $slice=iconv_substr($str,$start,$length,$charset); 
  }else{ 
    $re[&#39;utf-8&#39;]  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/"; 
    $re[&#39;gb2312&#39;] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/"; 
    $re[&#39;gbk&#39;]  = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/"; 
    $re[&#39;big5&#39;]  = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|\xa1-\xfe])/"; 
    preg_match_all($re[$charset], $str, $match); 
    $slice = join("",array_slice($match[0], $start, $length)); 
  } 
  if($suffix) 
    return $slice; 
  return $slice; 
}

The second is a built-in function in PHP, the mb_substr function

Specifies the encoding format of the string to be intercepted , can effectively prevent garbled characters.

Description

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] ) 
<?php 
 function substr_unicode($str, $s, $l = null) { 
   return join("", array_slice( 
     preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l)); 
 } 
 
$str = "Büyük"; 
 $s = 0; // start from "0" (nth) char 
 $l = 3; // get "3" chars 
 echo substr($str, $s, $l) ."\n";  
 echo mb_substr($str, $s, $l) ."\n"; 
 echo substr_unicode($str, $s, $l); 
 ?>

The above is the entire content of this article, I hope you all like it.

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn