PHP iconv(): Detected an illegal character in input string (PHP Tutorial)

WBOY
2016-07-21 15:33:24

At first the code looked like this (unescape() is not a PHP built-in; it is presumably a user-defined helper):
$str = iconv('UTF-8', 'GB2312', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
After deployment, a flood of errors like this was reported: iconv(): Detected an illegal character in input string

Since the GB2312 character set is relatively small, the next step was to switch to a larger one, GBK:
$str = iconv('UTF-8', 'GBK', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
The same error was still reported after deployment!

A careful read of the manual turns up this paragraph:
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
So the code was changed to:
$str = iconv('UTF-8', 'GBK//IGNORE', unescape(isset($_GET['str']) ? $_GET['str'] : ''));
In local tests, //IGNORE skips the characters it cannot convert and carries on without reporting an error, while //TRANSLIT cuts the string at the first character it cannot convert, drops everything after it, and reports an error. //IGNORE is what I need.
The next step is to deploy and watch the results (not a good way to verify a fix, so I kept studying the manual and searching online).
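
To compare the three modes locally rather than in production, a small sketch like the following may help. It assumes PHP 7 or later with the iconv extension enabled; try_iconv is a hypothetical helper name, not part of PHP. The result for each mode depends on the iconv implementation PHP was built against (glibc, libiconv and others differ), so no output is shown here.

```php
<?php
// Hypothetical helper: run iconv() and return its result together with any
// notice or warning text PHP raised, instead of letting it reach the error log.
function try_iconv(string $from, string $to, string $input): array
{
    $message = null;
    set_error_handler(function (int $errno, string $errstr) use (&$message): bool {
        $message = $errstr;
        return true; // suppress PHP's default error handling
    });
    try {
        $result = iconv($from, $to, $input);
    } finally {
        restore_error_handler();
    }
    return ['result' => $result, 'error' => $message];
}

// U+1F600 is an emoji, which GBK cannot represent.
$input = "abc\u{1F600}def";
foreach (['GBK', 'GBK//IGNORE', 'GBK//TRANSLIT'] as $target) {
    $r = try_iconv('UTF-8', $target, $input);
    printf(
        "%-14s result=%s error=%s\n",
        $target,
        $r['result'] === false ? 'false' : bin2hex($r['result']),
        $r['error'] ?? 'none'
    );
}
```

Running this on the same machine and PHP build as the production server shows which suffix, if any, actually avoids the error there.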

I found the following article online. It shows that mb_convert_encoding can also be used, although it is less efficient than iconv.


The difference between the string encoding conversion functions iconv and mb_convert_encoding

iconv — Convert string to requested character encoding(PHP 4 >= 4.0.5, PHP 5 )
mb_convert_encoding — Convert character encoding (PHP 4 >= 4.0.6, PHP 5)

Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
You need to enable the mbstring extension first. On Windows, remove the ; in front of extension=php_mbstring.dll in php.ini.
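
A quick way to confirm that the extension is actually active after editing php.ini is to ask PHP itself. This is only a minimal sketch; it uses the built-in extension_loaded() function and checks iconv as well, since both functions discussed here come from extensions.

```php
<?php
// Report which of the two conversion extensions this PHP build has loaded.
foreach (['mbstring', 'iconv'] as $ext) {
    printf("%s: %s\n", $ext, extension_loaded($ext) ? 'loaded' : 'missing');
}
```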

string iconv ( string in_charset, string out_charset, string str )
Note:
Besides naming the target encoding, the second parameter (out_charset) can carry one of two suffixes, //TRANSLIT or //IGNORE:
//TRANSLIT automatically replaces characters that cannot be converted directly with one or more approximate characters;
//IGNORE skips characters that cannot be converted. Without either suffix, the string is cut at the first illegal character.
Returns the converted string, or FALSE on failure.
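
Because the failure value is FALSE, the return value is best checked with a strict comparison. The sketch below assumes PHP 7 or later; utf8_to_gbk is a hypothetical wrapper name, and turning the failure into an exception is just one possible design choice.

```php
<?php
// Hypothetical wrapper: convert UTF-8 to GBK and throw if iconv() fails.
function utf8_to_gbk(string $input): string
{
    // @ silences the notice; the failure is reported through the exception instead.
    $converted = @iconv('UTF-8', 'GBK', $input);
    // Compare with === false: '' and '0' are valid results but are falsy.
    if ($converted === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }
    return $converted;
}

try {
    echo bin2hex(utf8_to_gbk("\u{4E2D}\u{6587}")), "\n";
} catch (RuntimeException $e) {
    echo 'Conversion failed: ', $e->getMessage(), "\n";
}
```

A loose check such as if (!$converted) would wrongly treat an empty string or the string '0' as a failed conversion.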

Notes on use:
1. iconv fails when converting the character "-" to gb2312. Without the //IGNORE suffix, everything after this character is lost. Either way, the "-" itself is never converted and never appears in the output. mb_convert_encoding does not have this problem.
2. mb_convert_encoding accepts several candidate input encodings and picks one automatically based on the content, but it runs much slower than iconv. For example: $str = mb_convert_encoding($str, "euc-jp", "ASCII,JIS,EUC-JP,SJIS,UTF-8"); The result depends on the order of the encodings in "ASCII,JIS,EUC-JP,SJIS,UTF-8".
3. In general, use iconv. Use mb_convert_encoding only when you cannot determine the original encoding, or when the output of iconv does not display correctly.
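
The rule of thumb in point 3 could be sketched roughly as follows. This assumes both the iconv and mbstring extensions are enabled; convert_with_fallback is a hypothetical helper, and its argument order simply follows mb_convert_encoding.

```php
<?php
// Hypothetical helper: try iconv() first and fall back to
// mb_convert_encoding() only if iconv() reports a failure.
function convert_with_fallback(string $input, string $to, string $from)
{
    $converted = @iconv($from, $to, $input);
    if ($converted !== false) {
        return $converted;
    }
    // mbstring replaces characters it cannot convert with a substitute
    // character (see mb_substitute_character()) instead of failing.
    return mb_convert_encoding($input, $to, $from);
}

// The GBK bytes of two Chinese characters, converted back to UTF-8.
echo convert_with_fallback("\xD6\xD0\xCE\xC4", 'UTF-8', 'GBK'), "\n";
```

Whether the fallback output is acceptable depends on the application, because substituted characters change the text silently.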

from_encoding names the character encoding of the string before conversion. It can be an array or a comma-separated string listing several candidate encodings. If it is omitted, the internal encoding is used.

$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
$str = mb_convert_encoding($str, "EUC-JP", "auto");
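
The two accepted forms of from_encoding can be compared side by side. This is a minimal sketch assuming the mbstring extension is enabled; the candidate list ASCII, UTF-8, GBK and the sample bytes are chosen here for illustration only. Which candidate mbstring picks can vary between PHP versions, so the sketch only checks that both forms behave the same.

```php
<?php
// Two Chinese characters encoded as GBK; mbstring picks the source encoding.
$gbk = "\xD6\xD0\xCE\xC4";

// Candidate encodings given as a comma-separated string...
$fromString = mb_convert_encoding($gbk, 'UTF-8', 'ASCII, UTF-8, GBK');
// ...and the same candidates, in the same order, given as an array.
$fromArray = mb_convert_encoding($gbk, 'UTF-8', ['ASCII', 'UTF-8', 'GBK']);

var_dump($fromString === $fromArray);
```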

Example:
$content = iconv("GBK", "UTF-8", $content) ;
$content = mb_convert_encoding($content, "UTF-8", "GBK");
