Truncating Chinese strings in PHP without garbled characters

墨辰丷
2018-06-11 11:55:56

Calling the PHP function substr directly on a Chinese string can produce garbled characters. The main reason is that substr counts bytes rather than characters, so it may "saw" a multibyte Chinese character in half. Let's see how to solve this problem.
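To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8, where each of these Chinese characters occupies 3 bytes; is_valid_utf8 is a hypothetical helper written only for this illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so each character below is 3 bytes.
$text = "中文字符";

// Hypothetical helper: preg_match with the /u modifier fails on malformed
// UTF-8, so it can double as a validity check.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

$broken = substr($text, 0, 4); // 4 bytes: "中" plus the first byte of "文"
$whole  = substr($text, 0, 6); // 6 bytes: exactly "中文"

var_dump(strlen($text));          // int(12): 12 bytes for 4 characters
var_dump(is_valid_utf8($broken)); // bool(false): a character was cut in half
var_dump(is_valid_utf8($whole));  // bool(true)
```

The cut only stays clean when the byte count happens to land on a character boundary, which is why a character-aware function is needed.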

Most programs need to truncate strings, and garbled output when truncating Chinese strings is a common and troublesome problem. Below are two methods that prevent it.
The first is a custom function that is convenient to reuse.
Truncating with this function does not produce garbled characters.

/**
 * Multibyte-safe substring with support for Chinese strings.
 * $start and $length are counted in characters, not bytes.
 */
function msubstr($str, $start, $length, $charset = "utf-8", $suffix = true) {
  // Normalize the charset name so that "UTF8" and "utf-8" are treated alike
  $charset = strtolower($charset);
  if ($charset === 'utf8') {
    $charset = 'utf-8';
  }
  if (function_exists("mb_substr")) {
    $slice = mb_substr($str, $start, $length, $charset);
    $total = mb_strlen($str, $charset);
  } else if (function_exists('iconv_substr')) {
    $slice = iconv_substr($str, $start, $length, $charset);
    $total = iconv_strlen($str, $charset);
  } else {
    // Fallback: split the string into characters with a per-charset regex
    $re['utf-8']  = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    $re['gb2312'] = '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/';
    $re['gbk']    = '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/';
    $re['big5']   = '/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/';
    preg_match_all($re[$charset], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
    $total = count($match[0]);
  }
  // Append "..." only when text remains after the slice
  // (checked for non-negative $start and $length only)
  if ($suffix && $start >= 0 && $length >= 0 && $start + $length < $total) {
    return $slice . "...";
  }
  return $slice;
}
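The last branch of the function above, the regex fallback, can be looked at in isolation. The following is a sketch for UTF-8 only, assuming the source file is saved as UTF-8; split_utf8_chars and slice_utf8 are hypothetical names that do not appear in the original function.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function split_utf8_chars($str) {
    // One alternative per UTF-8 sequence length: 1, 2, 3 and 4 bytes.
    $pattern = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/';
    preg_match_all($pattern, $str, $match);
    return $match[0];
}

// Hypothetical helper: slice by character position instead of byte position.
function slice_utf8($str, $start, $length) {
    return implode("", array_slice(split_utf8_chars($str), $start, $length));
}

$chars = split_utf8_chars("PHP截取中文");
echo count($chars), "\n";                  // 7 characters (13 bytes)
echo slice_utf8("PHP截取中文", 2, 3), "\n"; // P截取
```

Because the slicing happens on an array of whole characters, a cut can never land in the middle of a multibyte sequence.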

The second method is PHP's built-in mb_substr function.

Specify the encoding of the string to be truncated, and it effectively prevents garbled characters.

Description

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] ) 
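<?php
// Split the string into UTF-8 characters with the /u modifier, then slice the array
function substr_unicode($str, $s, $l = null) {
  return join("", array_slice(
    preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$str = "Büyük";
$s = 0; // start at character offset 0
$l = 3; // take 3 characters
echo substr($str, $s, $l) . "\n";             // byte-based: prints "Bü", because "ü" takes 2 bytes in UTF-8
echo mb_substr($str, $s, $l, "UTF-8") . "\n"; // character-based: prints "Büy"
echo substr_unicode($str, $s, $l) . "\n";     // character-based: prints "Büy"
?>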
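The same approach applies to Chinese text. As a minimal sketch, assuming the mbstring extension is loaded and the source file is saved as UTF-8, a small wrapper can truncate to a character count and mark the cut; truncate_chars is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: keep at most $max characters and append $marker
// only when something was actually cut off. Requires mbstring.
function truncate_chars($str, $max, $marker = "...", $encoding = "UTF-8") {
    if (mb_strlen($str, $encoding) <= $max) {
        return $str; // short enough, nothing to cut
    }
    return mb_substr($str, 0, $max, $encoding) . $marker;
}

echo truncate_chars("防止中文截取乱码", 4), "\n"; // 防止中文...
echo truncate_chars("你好", 4), "\n";             // 你好
```

Passing the encoding explicitly keeps the result independent of the server's default internal encoding setting.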

Summary: That is the entire content of this article. I hope it is helpful to everyone's learning.
