PHP iconv() encoding conversion error: Detected an illegal character
Function prototype: string iconv ( string $in_charset , string $out_charset , string $str )
Pay particular attention to the description of the second parameter:
The output charset.
When iconv() is asked to convert a character that the output encoding does not support, for example iconv('UTF-8', 'GB2312', $str) where $str contains such a character, you will see this message:
Notice: iconv() [function.iconv]: Detected an illegal character in input string ...
GB2312 covers Simplified Chinese only. It does not include the more complex Chinese characters or certain special characters, so the conversion fails. There are two solutions:
1. Widen the output encoding, for example iconv('UTF-8', 'GBK', $str). GBK supports a wider range of characters, so the string is output correctly.
2. Append "//IGNORE" to the output encoding, for example iconv('UTF-8', 'GB2312//IGNORE', $str). This skips the characters that cannot be converted. It avoids the error, but the output is incomplete: the skipped characters are simply missing.
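The first solution can be sketched roughly as follows. This is only an illustration, assuming a current PHP version in which iconv() returns false on failure and an iconv build that knows both GB2312 and GBK. The helper name utf8_to_gb() is made up for this example and does not come from the article.

```php
<?php
// Hypothetical helper (name invented for illustration):
// try GB2312 first, then widen the target to GBK if that fails.
function utf8_to_gb($str)
{
    // @ only silences the "illegal character" notice;
    // the failure itself is detected through the false return value.
    $out = @iconv('UTF-8', 'GB2312', $str);
    if ($out !== false) {
        return $out;
    }
    // GBK covers more characters than GB2312, so the retry may succeed.
    return @iconv('UTF-8', 'GBK', $str);
}

$converted = utf8_to_gb('中文');
var_dump($converted !== false);
```

If both attempts fail the helper still returns false, so the caller should check the result before using it.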
Here is one way to handle the iconv() "Detected an illegal character in input string" error (unescape() here is a user-defined helper, not a PHP built-in):
$str = iconv('UTF-8', 'GBK//IGNORE', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
In local tests, //IGNORE skipped the characters it could not convert and carried on with the rest of the string without reporting an error, whereas //TRANSLIT cut the string off at the unrecognized character, dropped everything after it, and reported an error. //IGNORE is what I need.
The next step is to deploy this and watch the results (not the best approach; I will keep studying the manual and searching online).
I then found the following article online. It shows that mb_convert_encoding can also be used, although it is less efficient than iconv.
The difference between iconv and mb_convert_encoding for converting string encodings
iconv: convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: convert character encoding (PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
The mbstring extension must be enabled first. On Windows, remove the ; in front of extension=php_mbstring.dll in php.ini.
string iconv (string in_charset, string out_charset, string str)
Note:
Besides naming the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT or //IGNORE:
//TRANSLIT converts a character that cannot be represented directly into one or more similar-looking characters;
//IGNORE skips characters that cannot be converted.
Without a suffix, the default behavior is to truncate the string at the first illegal character.
iconv returns the converted string, or false on failure.
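To see how the suffixes behave on a particular installation, the same conversion can be run in all three modes. The sketch below is only illustrative: compare_iconv_modes() is a made-up helper, and the results depend on the iconv implementation PHP was built against. Some builds have been reported to return false even with //IGNORE, so each result should be checked rather than assumed.

```php
<?php
// Hypothetical helper (name invented for illustration):
// run one conversion with no suffix, with //TRANSLIT and with //IGNORE.
function compare_iconv_modes($str, $to = 'GB2312', $from = 'UTF-8')
{
    $suffixes = array(
        'plain'    => '',
        'translit' => '//TRANSLIT',
        'ignore'   => '//IGNORE',
    );
    $results = array();
    foreach ($suffixes as $label => $suffix) {
        // Each entry is the converted string, or false if this build reports a failure.
        $results[$label] = @iconv($from, $to . $suffix, $str);
    }
    return $results;
}

foreach (compare_iconv_modes('abc') as $label => $out) {
    echo $label, ': ', var_export($out, true), "\n";
}
```

Passing a string that contains a character outside the target encoding shows how the three modes differ on the machine at hand.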
In practice:
1. iconv fails when converting the em dash character (U+2014) to GB2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the character itself cannot be converted and is not output. mb_convert_encoding does not have this problem.
2. mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv. For example: $str = mb_convert_encoding($str, "euc-jp", "ascii,jis,euc-jp,sjis,utf-8"); The order of the candidates in "ascii,jis,euc-jp,sjis,utf-8" affects the result.
3. Under normal circumstances, use iconv. Fall back to mb_convert_encoding only when the source encoding cannot be determined or when the result of iconv does not display correctly.
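The "iconv first, mb_convert_encoding as a fallback" advice could look roughly like this. It is a sketch under assumptions: convert_with_fallback() is a hypothetical name, and the encoding names passed in must be ones that both iconv and mbstring recognize on the installation in question.

```php
<?php
// Hypothetical helper (name invented for illustration):
// prefer iconv, and use mbstring only when iconv is missing or fails.
function convert_with_fallback($str, $to, $from)
{
    if (function_exists('iconv')) {
        // @ silences the notice; false signals that iconv gave up.
        $out = @iconv($from, $to, $str);
        if ($out !== false) {
            return $out;
        }
    }
    if (function_exists('mb_convert_encoding')) {
        // mbstring normally substitutes characters it cannot map instead of failing.
        return mb_convert_encoding($str, $to, $from);
    }
    // Neither extension could do the conversion.
    return false;
}

$latin1 = convert_with_fallback('é', 'ISO-8859-1', 'UTF-8');
var_dump(bin2hex((string) $latin1));
```

Because the fallback may substitute characters, its output is not guaranteed to round-trip back to the original string.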
from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated list in a string. If it is not specified, the internal encoding is used.
$str = mb_convert_encoding($str, "ucs-2le", "jis, eucjp-win, sjis-win");
$str = mb_convert_encoding($str, "euc-jp", "auto");
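The two forms of from_encoding can be compared with a short sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8. The sample text and variable names are chosen only for illustration.

```php
<?php
// Sample UTF-8 input (illustrative only).
$utf8 = '日本語';

// Candidate encodings given as a comma-separated string...
$eucFromString = mb_convert_encoding($utf8, 'EUC-JP', 'ASCII, UTF-8');
// ...and the same candidates given as an array.
$eucFromArray = mb_convert_encoding($utf8, 'EUC-JP', array('ASCII', 'UTF-8'));
var_dump($eucFromString === $eucFromArray);

// Converting back with an explicit source encoding avoids detection altogether.
$back = mb_convert_encoding($eucFromString, 'UTF-8', 'EUC-JP');
var_dump($back === $utf8);
```

When the source encoding is known, naming it explicitly is the safer choice, since detection from a candidate list depends on the order and on the content.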