
Detailed explanation of PHP's Chinese conversion function

PHPz (original)
2023-04-21 09:12:58

As more websites and applications serve users in several languages, handling text in different encodings has become routine. Chinese text needs particular care, because each character occupies several bytes and the byte sequence differs between encodings such as UTF-8, GBK, and Big5. PHP provides a range of functions for encoding and converting Chinese text, and this article walks through them. The examples assume that the source file is saved as UTF-8.

1. Chinese encoding

  1. urlencode() function

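The urlencode() function percent-encodes a string: every byte that is not a letter, a digit, or one of the characters -_. is replaced by %XX, where XX is the hexadecimal value of that byte, and a space becomes a plus sign (+). A Chinese character is therefore encoded byte by byte, so the result depends on the encoding of the string. For example, the UTF-8 string "中文" becomes "%E4%B8%AD%E6%96%87".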

Example:

$str = "中文";
echo urlencode($str);  // Outputs %E4%B8%AD%E6%96%87

  2. rawurlencode() function

The rawurlencode() function works like urlencode(), with one difference in how spaces are handled: urlencode() turns a space into a plus sign (+), whereas rawurlencode() follows RFC 3986 and encodes it as %20.

Example:

$str = "中文 test";
echo rawurlencode($str);  // Outputs %E4%B8%AD%E6%96%87%20test

  3. urldecode() function

The urldecode() function decodes a string produced by urlencode(): each %XX sequence is converted back into the corresponding byte, and each plus sign (+) becomes a space.

Example:

$str = "%E4%B8%AD%E6%96%87";
echo urldecode($str);  // Outputs 中文

  4. rawurldecode() function

The rawurldecode() function decodes %XX sequences in the same way as urldecode(), but it does not convert plus signs (+) into spaces; a literal "+" is left as it is.

Example:

$str = "%E4%B8%AD%E6%96%87+test";
echo rawurldecode($str);  // Outputs 中文+test
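
The difference between the two encoder and decoder pairs matters most when a URL is assembled by hand. The following is a minimal sketch, not a definitive recipe: the /search/ path and the q parameter are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical search keyword containing Chinese characters and a space.
$keyword = "中文 test";

// Path segment: rawurlencode() follows RFC 3986, so the space becomes %20.
$path = "/search/" . rawurlencode($keyword);

// Query string: http_build_query() uses urlencode()-style output by default,
// so the space becomes "+".
$query = http_build_query(["q" => $keyword]);

echo $path . "?" . $query, PHP_EOL;

// Decoding: only urldecode() turns "+" back into a space.
$fromQuery = urldecode("%E4%B8%AD%E6%96%87+test");       // "中文 test"
$fromPath  = rawurldecode("%E4%B8%AD%E6%96%87%20test");  // "中文 test"
$plusKept  = rawurldecode("%E4%B8%AD%E6%96%87+test");    // "中文+test"
```

Decoding with the function that matches the encoder avoids stray plus signs in the result.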

2. Chinese conversion

  1. iconv() function

The iconv() function converts a string from one encoding to another and supports commonly used encodings such as UTF-8, GBK, and Big5. The syntax is:

iconv($in_charset, $out_charset, $string);

where $in_charset represents the encoding format of the input string, $out_charset represents the encoding format of the output string, and $string represents the string to be converted.

For example, convert a utf-8 encoded string into a gbk encoded string:

$str = "中文";
$str = iconv("utf-8", "gbk", $str);
echo $str;  // Looks garbled unless the output is viewed as GBK

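Note: iconv() can fail or produce garbled output. The usual cause is that a character in the source string has no counterpart in the target encoding, or that $in_charset does not match the actual encoding of the string. When a character cannot be converted, iconv() raises a notice and returns false. Appending //TRANSLIT or //IGNORE to $out_charset asks iconv() to approximate or skip such characters, but the behavior depends on the underlying iconv library, so the return value should still be checked.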
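
Because iconv() signals failure through its return value, it can help to wrap the call. The sketch below uses a hypothetical helper name, utf8ToGbk(), and assumes the iconv extension is available with GBK support and that the file is saved as UTF-8.

```php
<?php
/**
 * Hypothetical helper: convert UTF-8 text to GBK and return null
 * when iconv() reports that the conversion failed.
 */
function utf8ToGbk(string $text): ?string
{
    // @ silences the notice that iconv() raises on failure;
    // the false return value is what gets checked.
    $converted = @iconv("UTF-8", "GBK", $text);
    return $converted === false ? null : $converted;
}

$gbk = utf8ToGbk("中文");
if ($gbk === null) {
    echo "Conversion failed", PHP_EOL;
} else {
    echo bin2hex($gbk), PHP_EOL;                // the GBK bytes in hex
    echo iconv("GBK", "UTF-8", $gbk), PHP_EOL;  // converted back to UTF-8
}
```

Printing the bytes with bin2hex() makes the result readable on a UTF-8 terminal, where the raw GBK bytes would otherwise look garbled.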

  2. mb_convert_encoding() function

The mb_convert_encoding() function, part of the mbstring extension, also converts between encodings. It is more flexible than iconv(): for example, $from_encoding may list several candidate encodings instead of a single one. Note that the argument order differs from iconv(), with the target encoding before the source encoding. The syntax is:

mb_convert_encoding($string, $to_encoding, $from_encoding);

where $string represents the string to be converted, $to_encoding represents the converted encoding format, and $from_encoding represents the encoding format of the original string.

For example, convert a utf-8 encoded string to a gbk encoded string:

$str = "中文";
$str = mb_convert_encoding($str, "gbk", "utf-8");
echo $str;  // Looks garbled unless the output is viewed as GBK
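
A round trip is an easy way to confirm that a conversion lost nothing. The following sketch assumes the mbstring extension is loaded and the file is saved as UTF-8; validating the input first with mb_check_encoding() is a suggestion here, not something the function requires.

```php
<?php
$utf8 = "中文";

// Validate the input first: bytes that are not valid UTF-8 would
// otherwise be silently replaced by a substitute character.
if (!mb_check_encoding($utf8, "UTF-8")) {
    throw new InvalidArgumentException("Input is not valid UTF-8");
}

$gbk  = mb_convert_encoding($utf8, "GBK", "UTF-8");  // UTF-8 -> GBK
$back = mb_convert_encoding($gbk, "UTF-8", "GBK");   // GBK -> UTF-8

echo bin2hex($gbk), PHP_EOL;  // the GBK bytes in hex
echo $back, PHP_EOL;          // the original text again
```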

  3. utf8_encode() function and utf8_decode() function

The utf8_encode() function converts an ISO-8859-1 encoded string into a UTF-8 encoded string, and the utf8_decode() function converts a UTF-8 encoded string into an ISO-8859-1 encoded string. Both functions are deprecated as of PHP 8.2.

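For example, convert an ISO-8859-1 encoded string to a UTF-8 encoded string:

$str = "\xE9";  // The byte for "é" in ISO-8859-1
$str = utf8_encode($str);
echo $str;  // Outputs é

Note: ISO-8859-1 contains no Chinese characters, so these two functions cannot convert Chinese text. Calling utf8_encode() on a string that is already UTF-8, such as "中文", encodes it a second time and produces garbled output, and utf8_decode() replaces every character outside ISO-8859-1 with a question mark. Use them with caution.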
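
Since utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, the same conversions can be written with mb_convert_encoding(). A minimal sketch, assuming the mbstring extension is loaded:

```php
<?php
// "\xE9" is the single byte for "é" in ISO-8859-1.
$latin1 = "\xE9";

// Equivalent of utf8_encode(): ISO-8859-1 -> UTF-8.
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

// Equivalent of utf8_decode(): UTF-8 -> ISO-8859-1.
$roundTrip = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

echo bin2hex($utf8), PHP_EOL;       // c3a9
echo bin2hex($roundTrip), PHP_EOL;  // e9
```

Naming both encodings explicitly also makes it obvious that Chinese text is not involved in this conversion.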

  4. chr() function and ord() function

The chr() function converts a byte value (0 to 255) into the corresponding one-byte string, and the ord() function returns the value of the first byte of a string. Neither function works on a whole multi-byte character. In UTF-8 a character occupies 1 to 4 bytes, so ord() has to be applied to each byte in turn to read a character's encoding, and the character can be rebuilt by concatenating the chr() result for each byte.

For example, convert the character "中" to its UTF-8 encoding:

$ord1 = ord("中");  // First byte of the UTF-8 encoding of "中": 228 (0xE4)
$ord2 = ord(substr("中", 1));  // Second byte: 184 (0xB8)

$str = chr(0xe4) . chr(0xb8) . chr(0xad);  // Build the UTF-8 string from its three bytes
echo $str;  // Outputs 中

Note: chr() and ord() operate on single bytes, and the byte values for a given character differ from one character set to another, so always keep the encoding of the string in mind when using them.
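
To see every byte of a multi-byte character, ord() can be mapped over the string one byte at a time. The sketch below also shows mb_ord() and mb_chr(), which are available from PHP 7.2 with the mbstring extension and work on Unicode code points rather than bytes. It assumes the file is saved as UTF-8.

```php
<?php
$char = "中";

// ord() only sees one byte, so split the string into bytes first.
$bytes = array_map('ord', str_split($char));  // [228, 184, 173]

// Rebuild the character from its bytes with chr().
$rebuilt = implode('', array_map('chr', $bytes));

// mb_ord() and mb_chr() work on code points instead of bytes.
$codePoint = mb_ord($char, "UTF-8");   // 20013, that is U+4E2D
$fromCode  = mb_chr(0x4E2D, "UTF-8");  // "中"

printf(
    "%s -> bytes %s, code point U+%04X\n",
    $rebuilt,
    implode(' ', array_map('dechex', $bytes)),
    $codePoint
);
```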

3. Chinese length judgment

  1. strlen() function

The strlen() function returns the length of a string in bytes, not in characters. Because a Chinese character occupies a different number of bytes in different encodings (3 in UTF-8, 2 in GBK), strlen() cannot be used to count the Chinese characters in a string. For example, strlen() reports a length of 6 for the UTF-8 string "中文".

Example:

$str = "中文";
echo strlen($str);  // Outputs 6
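
  2. mb_strlen() function

The mb_strlen() function counts characters rather than bytes, so it returns the correct length for Chinese strings. It accepts the encoding of the string as an optional second argument and therefore works with strings in different encodings.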

Example:

$str = "中文";
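echo mb_strlen($str);  // Outputs 2

Note: mb_strlen() interprets the string using the encoding passed as its second argument or, if none is given, the internal encoding (UTF-8 by default in current PHP versions), so that encoding must match the actual encoding of the string. If the encoding is unknown, the mb_detect_encoding() function can be used to guess it, although its result is a heuristic rather than a guarantee.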
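
The byte count and the character count can be compared side by side. This is a minimal sketch that assumes the mbstring extension is loaded and the file is saved as UTF-8; the candidate list passed to mb_detect_encoding() is only an example.

```php
<?php
$str = "中文 test";

$byteLength = strlen($str);              // 3 + 3 + 1 + 4 = 11 bytes
$charLength = mb_strlen($str, "UTF-8");  // 2 + 1 + 4 = 7 characters

// Byte-based substr() can cut a character in half; mb_substr() cannot.
$firstTwo = mb_substr($str, 0, 2, "UTF-8");  // "中文"

// mb_detect_encoding() is a heuristic; strict mode and a short
// candidate list make the guess more dependable.
$guess = mb_detect_encoding($str, ["UTF-8", "GBK"], true);

echo "$byteLength bytes, $charLength characters", PHP_EOL;
echo "Detected encoding: ", var_export($guess, true), PHP_EOL;
```

Passing the encoding explicitly, as done here, keeps the result independent of the server's internal encoding setting.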

