
What to do if PHP string lengths are inconsistent

Author: 藏色散人 (original article), published 2023-02-07 09:58:29

Solutions to inconsistent string lengths in PHP: 1. check the encoding of each string with mb_detect_encoding(); 2. compare the character count from mb_strlen() with the byte count from strlen() to find hidden characters; 3. strip the hidden characters by keeping only the Chinese characters, for example with "preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $str1, $matches);" followed by join('', $matches[0]).


Operating environment for this tutorial: Windows 10, PHP 8.1, DELL G3 computer.

The problem: identical-looking PHP strings with different lengths

Question:

[Screenshot: the two strings and their lengths]

As shown in the screenshot, the two strings look identical at first glance: both read "后勤保障部" ("Logistics Support Department"). Yet one is 21 bytes long and the other is 15 bytes long.

The first suspect is a difference in encoding. Use mb_detect_encoding() to check the encoding of each string:

<?php
header("Content-Type: text/html;charset=utf-8"); 

$data[0]=$str1="后勤保障部‍";
$data[1]=$str2="后勤保障部";
var_dump($data);

// Detect the encoding of each string
$encode1 = mb_detect_encoding($str1, array("ASCII","UTF-8","GB2312","GBK","BIG5"));
$encode2 = mb_detect_encoding($str2, array("ASCII","UTF-8","GB2312","GBK","BIG5"));
echo "str1='".$str1."'"."&emsp;encoding: ".$encode1."<br>";
echo "str2='".$str2."'"."&emsp;encoding: ".$encode2."<br>";
?>

However, both strings are detected as UTF-8.

[Screenshot: both strings reported as UTF-8]
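Encoding was a reasonable first suspect, because the same text has a different byte length in different encodings. As a rough sketch (the variable names are illustrative, and the mbstring extension is assumed to be enabled, as in the code above), the five characters below are re-encoded from UTF-8 to GBK with mb_convert_encoding():

```php
<?php
// The same five characters stored in two different encodings.
$utf8 = "后勤保障部";                                // this source file is saved as UTF-8
$gbk  = mb_convert_encoding($utf8, "GBK", "UTF-8"); // re-encoded as GBK

echo strlen($utf8), "\n"; // 15: 3 bytes per character in UTF-8
echo strlen($gbk), "\n";  // 10: 2 bytes per character in GBK

// Strict detection against ASCII/UTF-8 only: the UTF-8 copy passes, the GBK copy does not.
var_dump(mb_detect_encoding($utf8, ["ASCII", "UTF-8"], true)); // string(5) "UTF-8"
var_dump(mb_detect_encoding($gbk, ["ASCII", "UTF-8"], true));  // bool(false)
```

Since both of the article's strings pass as UTF-8, an encoding mismatch cannot explain the 21-byte versus 15-byte gap.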

So what is the cause? Next, print the character length and the byte length of each string:

<?php
header("Content-Type: text/html;charset=utf-8"); 

$data[0]=$str1="后勤保障部‍";
$data[1]=$str2="后勤保障部";
var_dump($data);

// Detect the encoding of each string
$encode1 = mb_detect_encoding($str1, array("ASCII","UTF-8","GB2312","GBK","BIG5"));
$encode2 = mb_detect_encoding($str2, array("ASCII","UTF-8","GB2312","GBK","BIG5"));

// When mb_strlen() uses UTF-8 as its internal encoding, each Chinese character counts as one character;
// strlen() returns the number of bytes the string occupies
echo "str1='".$str1."'".":&emsp;characters: ".mb_strlen($str1).":&emsp;bytes: ".strlen($str1)."&emsp;encoding: ".$encode1."<br>";
echo "str2='".$str2."'".":&emsp;characters: ".mb_strlen($str2).":&emsp;bytes: ".strlen($str2)."&emsp;encoding: ".$encode2."<br>";
?>

The output results are as follows:

[Screenshot: character length, byte length and encoding of str1 and str2]

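The output shows that str1 contains 7 characters in 21 bytes, but only 5 of them are displayed: "后勤保障部" ("Logistics Support Department"). Each of those Chinese characters takes 3 bytes in UTF-8, which accounts for 15 bytes; the remaining 6 bytes belong to 2 characters that are not displayed.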
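The arithmetic can be reproduced with a minimal sketch. The article does not name the two hidden characters, so the code assumes, for illustration only, that they are U+200D ZERO WIDTH JOINER. Any character in the U+0800 to U+FFFF range (such as U+200B, U+200D or U+FEFF) is likewise 3 bytes in UTF-8 and gives the same totals.

```php
<?php
// Assumption: two U+200D ZERO WIDTH JOINERs stand in for the hidden characters.
$str1 = "后勤保障部\u{200D}\u{200D}";
$str2 = "后勤保障部";

// mb_strlen() counts characters (code points); strlen() counts bytes.
foreach (["str1" => $str1, "str2" => $str2] as $name => $s) {
    printf("%s: %d characters, %d bytes\n", $name, mb_strlen($s, "UTF-8"), strlen($s));
}
// Prints:
// str1: 7 characters, 21 bytes
// str2: 5 characters, 15 bytes
```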

To inspect them, extract the last two characters of str1:

// Extract the two undisplayed characters at the end of str1
$res = mb_substr($str1, 5, 2);
echo "Last two characters: ".$res."<br>";
echo mb_strlen($res);

Echoing them shows nothing on the page, yet mb_strlen() confirms that they occupy two characters.
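Because echo prints nothing visible, listing the code points is a more reliable way to identify the hidden characters. The sketch below uses a hypothetical helper, codePoints(), built on mb_str_split() (PHP 7.4+) and mb_ord() (PHP 7.2+); as before, it assumes for illustration that both hidden characters are U+200D.

```php
<?php
// Hypothetical helper: list every code point of a UTF-8 string as "U+XXXX",
// so characters that echo cannot show become visible.
function codePoints(string $s): array
{
    $out = [];
    foreach (mb_str_split($s, 1, "UTF-8") as $ch) {
        $out[] = sprintf("U+%04X", mb_ord($ch, "UTF-8"));
    }
    return $out;
}

// Assumption: the two hidden characters are U+200D ZERO WIDTH JOINER.
$str1 = "后勤保障部\u{200D}\u{200D}";
$tail = mb_substr($str1, 5, 2, "UTF-8");

echo implode(" ", codePoints($tail)), "\n"; // U+200D U+200D
echo bin2hex($tail), "\n";                  // e2808de2808d (3 bytes each)
```

On the real data, the printed code points show exactly which characters are present, which helps decide what the cleanup step should remove.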

If strings that look the same must also compare as equal, they have to be cleaned first. Here the cleanup keeps only Chinese characters (the CJK Unified Ideographs block, U+4E00 to U+9FFF) and discards everything else:

// Remove the undisplayed (non-Chinese) characters from str1
preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $str1, $matches);
$str1 = join('', $matches[0]);
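This filter keeps only U+4E00 to U+9FFF, so it would also delete digits, Latin letters and Chinese punctuation if the string contained any. Where only invisible characters should go, one alternative (a different technique from the article's, shown as a sketch with a hypothetical stripInvisible() helper) is to remove Unicode format characters, general category Cf, which include U+200B ZERO WIDTH SPACE, U+200D ZERO WIDTH JOINER and U+FEFF:

```php
<?php
// Alternative sketch (not the article's method): strip only Unicode "format"
// characters (general category Cf) and keep everything else.
function stripInvisible(string $s): string
{
    return preg_replace('/\p{Cf}+/u', '', $s);
}

$str1  = "后勤保障部\u{200D}\u{200D}"; // assumption: U+200D as the hidden characters
$str2  = "后勤保障部";
$mixed = "第3号，后勤保障部\u{200B}";  // hypothetical input with a digit and a Chinese comma

var_dump(stripInvisible($str1) === $str2); // bool(true)

// The CJK-only filter also drops the digit and the Chinese comma.
preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $mixed, $m);
echo join('', $m[0]), "\n";        // 第号后勤保障部
echo stripInvisible($mixed), "\n"; // 第3号，后勤保障部
```

Which approach fits depends on the data: the article's filter is right when the field should contain nothing but Chinese characters.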

The final code is as follows

<?php
header("Content-Type: text/html;charset=utf-8"); 

$data[0]=$str1="后勤保障部‍";
$data[1]=$str2="后勤保障部";
var_dump($data);

// Detect the encoding of each string
$encode1 = mb_detect_encoding($str1, array("ASCII","UTF-8","GB2312","GBK","BIG5"));
$encode2 = mb_detect_encoding($str2, array("ASCII","UTF-8","GB2312","GBK","BIG5"));

// When mb_strlen() uses UTF-8 as its internal encoding, each Chinese character counts as one character;
// strlen() returns the number of bytes the string occupies
echo "str1='".$str1."'".":&emsp;characters: ".mb_strlen($str1).":&emsp;bytes: ".strlen($str1)."&emsp;encoding: ".$encode1."<br>";
echo "str2='".$str2."'".":&emsp;characters: ".mb_strlen($str2).":&emsp;bytes: ".strlen($str2)."&emsp;encoding: ".$encode2."<br>";

// Extract the two undisplayed characters at the end of str1
echo "<br>------------------ Last two (undisplayed) characters of str1 ------------------<br>";
$res = mb_substr($str1, 5, 2);
echo "Last two characters of str1:&emsp;".$res."<br>";
echo "Length of the extracted characters:&emsp;".mb_strlen($res)."<br>";

// Equality comparison
echo "<br>---------------------------- Equality comparison ----------------------------<br>";
echo "str1 vs str2:&emsp;";
echo strcomp($str1,$str2)."<br>";
echo "str2 vs str2:&emsp;";
echo strcomp($str2,$str2)."<br>";


// Remove non-Chinese characters from str1
preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $str1, $matches);
$str1 = join('', $matches[0]);

echo "<br>---------------- After removing non-Chinese characters from str1 ----------------<br>";
echo "str1='".$str1."'".":&emsp;characters: ".mb_strlen($str1).":&emsp;bytes: ".strlen($str1)."&emsp;encoding: ".$encode1."<br>";
echo "str1 vs str2:&emsp;";
echo strcomp($str1,$str2)."<br>";

// Returns "equal" or "not equal" (a custom helper, not PHP's built-in strcmp())
function strcomp($str1,$str2){
  if($str1 == $str2){
    return "equal";
  }else{
    return "not equal";
  }
}

?>

Running results:
[Screenshot: output of the complete script]
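The strcomp() helper in the final code compares with ==. For these strings that is fine, but for text that looks numeric PHP's loose comparison converts both sides to numbers, so === (or strcmp() returning 0) is the safer test of exact equality. A short illustration, again assuming U+200D for the hidden characters:

```php
<?php
// Two numeric-looking strings: == compares them as numbers, === compares the exact bytes.
var_dump("1e3" == "1000");             // bool(true)
var_dump("1e3" === "1000");            // bool(false)
var_dump(strcmp("1e3", "1000") === 0); // bool(false)

// The article's strings are not numeric, so == and === agree here.
$str1 = "后勤保障部\u{200D}\u{200D}"; // assumption: U+200D as the hidden characters
$str2 = "后勤保障部";
var_dump($str1 == $str2);  // bool(false)
var_dump($str1 === $str2); // bool(false)
```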


Note: when the 21-byte str1 is pasted into the SQL input box of phpMyAdmin, it is displayed as follows:

[Screenshot: str1 in the phpMyAdmin SQL input box, with the two extra characters visible]

There they are: the two extra characters.
