php iconv(): Detected an illegal character in input string
At first, I used it like this:
$str = iconv('UTF-8', 'GB2312', unescape(isset($_GET['str'])? $_GET['str']:''));
After it went live, a flood of errors like this was reported: iconv(): Detected an illegal character in input string
Since the GB2312 character set is relatively small, I switched to a larger one, GBK:
$str = iconv('UTF-8', 'GBK', unescape(isset($_GET['str'])? $_GET['str']:''));
The same error was still reported after it went live!
Reading the manual carefully, I found this paragraph:
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
So I changed it to:
$str = iconv('UTF-8', 'GBK//IGNORE', unescape(isset($_GET['str'])? $_GET['str']:''));
In my local tests, //IGNORE skipped the characters it could not convert, carried on with the rest of the string, and reported no error, whereas //TRANSLIT cut off the first unconvertible character together with everything after it, and still reported an error. //IGNORE is what I need.
Now I will wait until it goes live to see the results (admittedly not a good approach; I should keep studying the manual and searching online), haha...
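Rather than waiting for production to surface the error, the return value of iconv() can be checked directly, since it returns false on failure. The following is a minimal sketch under that assumption; utf8_to_gbk() is a hypothetical helper name, not something from the original code, and how //IGNORE behaves can differ between iconv implementations.

```php
<?php
// Hypothetical helper: convert UTF-8 input to GBK and report failure as null
// instead of letting a false return value be used as if it were a string.
function utf8_to_gbk(string $input): ?string
{
    // @ suppresses the notice so the failure is handled through the return value.
    $converted = @iconv('UTF-8', 'GBK//IGNORE', $input);
    return $converted === false ? null : $converted;
}

// "中文" written as UTF-8 bytes, so the example does not depend on file encoding.
$result = utf8_to_gbk("\xE4\xB8\xAD\xE6\x96\x87");
echo $result === null ? "conversion failed\n" : bin2hex($result) . "\n";
```

Returning null makes the failure case explicit, so the caller can log the offending input or fall back to another converter instead of silently storing an empty string.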
I found the following article online, which shows that mb_convert_encoding can also be used, although it is less efficient than iconv.
The difference between iconv and mb_convert_encoding for converting string encodings
iconv: Convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: Convert character encoding (PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
The mbstring extension must be enabled first: in php.ini, remove the ; in front of extension=php_mbstring.dll.
string iconv (string in_charset, string out_charset, string str)
Note:
The second parameter, besides naming the target encoding, also accepts one of two suffixes, //TRANSLIT and //IGNORE:
//TRANSLIT automatically converts characters that cannot be converted directly into one or more approximate characters;
//IGNORE skips characters that cannot be converted. Without either suffix, the string is truncated at the first illegal character.
Returns the converted string or FALSE on failure.
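To see how the suffixes differ on a given system, the same input can be run through each variant. This is only a sketch: compare_suffixes() is a hypothetical name, and no expected output is shown because the result for unconvertible characters depends on the iconv implementation PHP was built against.

```php
<?php
// Hypothetical helper: run one conversion per suffix and collect the outcome.
// Each entry is the hex dump of the converted bytes, or null if iconv() returned false.
function compare_suffixes(string $utf8, string $target = 'GBK'): array
{
    $results = [];
    foreach (['', '//TRANSLIT', '//IGNORE'] as $suffix) {
        // @ hides the notice; a failed conversion shows up as false.
        $out = @iconv('UTF-8', $target . $suffix, $utf8);
        $label = $suffix === '' ? '(none)' : $suffix;
        $results[$label] = $out === false ? null : bin2hex($out);
    }
    return $results;
}

// "a", then U+1F600 (an emoji, which GBK cannot represent), then "b".
var_dump(compare_suffixes("a\u{1F600}b"));
```

Running this locally before deploying shows which suffix, if any, gives acceptable output for the kind of input the application receives.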
Use:
1. iconv fails when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the "-" itself cannot be converted or output. mb_convert_encoding does not have this problem.
2. mb_convert_encoding can take several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv. For example: $str = mb_convert_encoding($str, "euc-jp", "ASCII,JIS,EUC-JP,SJIS,UTF-8"); The result depends on the order of "ASCII,JIS,EUC-JP,SJIS,UTF-8".
3. In general, use iconv. Use mb_convert_encoding only when you cannot determine the original encoding, or when the output of iconv does not display correctly.
from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated string list. If it is not specified, the internal encoding is used.
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
$str = mb_convert_encoding($str, "EUC-JP", "auto");
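The candidate list can be illustrated with the encodings used earlier in this article. The sketch below assumes the mbstring extension is loaded and that its GBK encoding is available. The input bytes are chosen so that they are not valid UTF-8, which leaves GBK as the only matching candidate.

```php
<?php
// Bytes C4 E3 BA C3 are "你好" in GBK; they do not form a valid UTF-8 sequence.
$gbkBytes = "\xC4\xE3\xBA\xC3";

// from_encoding given as an array of candidates. When the input is valid in
// more than one candidate, the order of the list can change the result.
$utf8 = mb_convert_encoding($gbkBytes, 'UTF-8', ['UTF-8', 'GBK']);

echo bin2hex($utf8), "\n";
```

Automatic detection is only as reliable as the input is unambiguous, so the list is best kept short and limited to encodings the application actually expects.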
Example:
$content = iconv("GBK", "UTF-8", $content) ;
$content = mb_convert_encoding($content, "UTF-8", "GBK");
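Note that the two functions take their arguments in a different order: iconv takes (from, to, string), while mb_convert_encoding takes (string, to, from). A small self-contained check, assuming both extensions are available, is sketched below with a sample GBK byte string that is not from the original article.

```php
<?php
// "你好" encoded as GBK (bytes C4 E3 BA C3), used here as sample input.
$content = "\xC4\xE3\xBA\xC3";

// iconv: source encoding first, then target, then the string.
$viaIconv = iconv('GBK', 'UTF-8', $content);

// mb_convert_encoding: the string first, then target, then source.
$viaMb = mb_convert_encoding($content, 'UTF-8', 'GBK');

// For valid input the two results are expected to be identical.
var_dump($viaIconv === $viaMb);
```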