Summary of UTF-8 encoding problems encountered in web development
There are five main areas:
1. Converting HTML pages to UTF-8 encoding
2. Converting PHP pages to UTF-8 encoding
3. Using UTF-8 encoding for MYSQL databases
4. JS related UTF-8 encoding issues
5. FLASH related UTF-8 encoding issues
1. HTML page conversion to UTF-8 encoding issues
1. Add a line after <head> and before <title>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The order matters: the line must come before the <title> tag. Otherwise, if there are Chinese characters between <title> and </title>, the displayed title may be garbled.
2. HTML file encoding problem:
In the editor menu, click "File"->"Save As" to see the encoding of the current file. Make sure the file encoding is UTF-8; if it is ANSI, change it to UTF-8.
3. HTML file header BOM problem:
When a file is converted from another encoding to UTF-8, a BOM (byte order mark) is sometimes added at the beginning of the file. The BOM may cause the browser to display garbled characters where Chinese text should appear.
How to delete the BOM:
1. Open the file with Dreamweaver and save it again to remove the BOM.
2. Open the file with EditPlus, go to "Preferences"->"File"->"UTF-8 Identity" in the menu, set it to "Always delete signatures", then save the file to remove the BOM.
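If neither editor is at hand, the BOM can also be removed in code. The following is a minimal sketch, not the author's method: strip_utf8_bom is a hypothetical helper name, and it relies only on the fact that the UTF-8 BOM is the three bytes EF BB BF at the very start of the content.

```php
<?php
// Hypothetical helper (not from the article): removes a leading UTF-8 BOM.
function strip_utf8_bom(string $content): string
{
    // The UTF-8 BOM is exactly the bytes EF BB BF at offset 0.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        return substr($content, 3);
    }
    // No BOM present: return the content unchanged.
    return $content;
}
```

A file could then be cleaned by reading it with file_get_contents(), passing the result through this function, and writing it back with file_put_contents().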
4. WEB server UTF-8 encoding problem:
If you have followed the steps above and Chinese text is still garbled, check the encoding settings of the WEB server you are using.
If you are using Apache, set the charset in the configuration file to utf-8, for example with the AddDefaultCharset directive (only the method is listed here; refer to the Apache configuration documentation for the exact format).
If you are using Nginx, set charset in nginx.conf to utf-8: find "charset gb2312;" or a similar statement and change it to "charset utf-8;".
2. PHP page conversion to UTF-8 encoding problem
1. Add a line at the beginning of the code:
header("Content-Type: text/html;charset=utf-8");
2. PHP file encoding problem:
In the editor menu, click "File"->"Save As" to see the encoding of the current file. Make sure the file encoding is UTF-8; if it is ANSI, change it to UTF-8.
3. PHP file header BOM problem:
PHP files must not have a BOM. Otherwise sessions may stop working, with a warning like this:
Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent
This is because nothing may be sent to the browser before session_start() is called. PHP treats a BOM at the start of the file, or of any PHP file included before it, as output, so the headers have already been sent and the warning is raised.
So the BOM must be removed from PHP files.
How to delete the BOM:
1. Open the file with Dreamweaver and save it again to remove the BOM.
2. Open the file with EditPlus, go to "Preferences"->"File"->"UTF-8 Identity" in the menu, set it to "Always delete signatures", then save the file to remove the BOM.
4. UTF-8 encoding problem when PHP sends a file as an attachment:
When PHP sends a file as a download attachment, the file name must be GB2312 encoded. Otherwise, Chinese characters in the file name will be garbled.
If your PHP file itself is UTF-8 encoded, convert the file name variable from UTF-8 to GB2312:
$filename = iconv("UTF-8", "GB2312", $filename);
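As an alternative to converting the name to GB2312, the header can carry the UTF-8 name directly in the filename*= form defined by RFC 6266. The sketch below is an assumption about how that could look, not part of the original article; attachment_disposition is a hypothetical helper name, and how each browser treats the two parameters should be checked for the browsers you need to support.

```php
<?php
// Hypothetical helper (not from the article): builds a Content-Disposition
// value with an ASCII fallback name plus a UTF-8 name in filename*= form.
function attachment_disposition(string $filename): string
{
    // Fallback for clients that ignore filename*: replace every byte that is
    // not printable ASCII, plus quote and backslash, with an underscore.
    $fallback = preg_replace('/[^\x20-\x7e]|["\\\\]/', '_', $filename);

    // rawurlencode() percent-encodes the UTF-8 bytes of the name.
    return 'attachment; filename="' . $fallback . '"'
        . "; filename*=UTF-8''" . rawurlencode($filename);
}
```

It would be used as header('Content-Disposition: ' . attachment_disposition($filename)); with $filename still in UTF-8.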
5. Garbled characters or "?" question marks appear when an article title is truncated for display:
When an article title is very long, usually only part of it is displayed, so the title is truncated. A UTF-8 encoded Chinese character occupies 3 bytes, so truncating by byte count can keep only 1 or 2 bytes of a character. The incomplete character is then displayed as garbled text or a "?" question mark. Truncating the title with the following function avoids the problem:
function get_brief_str($str, $max_length) {
    if (strlen($str) > $max_length) {
        // Count the non-ASCII bytes within the first $max_length bytes.
        // Assumes every non-ASCII character is 3 bytes long, as Chinese characters are.
        $check_num = 0;
        for ($i = 0; $i < $max_length; $i++) {
            if (ord($str[$i]) > 127) $check_num++;
        }
        // Extend the cut so that the last character is kept whole.
        if ($check_num % 3 == 0)
            $str = substr($str, 0, $max_length)."...";
        else if ($check_num % 3 == 1)
            $str = substr($str, 0, $max_length + 2)."...";
        else if ($check_num % 3 == 2)
            $str = substr($str, 0, $max_length + 1)."...";
    }
    return $str;
}
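The function above assumes that every non-ASCII character is 3 bytes long. A variant that makes no such assumption can rely on the UTF-8 rule that continuation bytes always have the bit pattern 10xxxxxx. The sketch below is a hypothetical alternative (utf8_truncate_bytes is not from the article); unlike get_brief_str it shortens the cut instead of extending it, so the result never exceeds the byte limit before the suffix is added.

```php
<?php
// Hypothetical helper (not from the article): truncates a UTF-8 string to at
// most $max_bytes bytes without splitting a multi-byte character.
function utf8_truncate_bytes(string $str, int $max_bytes, string $suffix = '...'): string
{
    if (strlen($str) <= $max_bytes) {
        return $str; // short enough, nothing to cut
    }
    $cut = $max_bytes;
    // If the first dropped byte is a continuation byte (10xxxxxx), the cut
    // falls inside a character: step back to that character's lead byte.
    while ($cut > 0 && (ord($str[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($str, 0, $cut) . $suffix;
}
```

For a limit counted in characters rather than bytes, mb_substr() from the mbstring extension is the usual choice, assuming that extension is installed.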
3. Problems with using UTF-8 encoding for MYSQL database
1. Use phpmyadmin to create databases and data tables
When creating a database, set "Collation" to "utf8_general_ci", or execute the statement:
CREATE DATABASE `dbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
When creating a data table: if a field stores Chinese, set its "Collation" to "utf8_general_ci".
If a field stores only English or numbers, the default is fine.
The corresponding SQL statement, for example:
CREATE TABLE `test` (
  `id` INT NOT NULL,
  `name` VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE = MYISAM;
2. Use PHP to read and write the database
After connecting to the database:
$connection = mysql_connect($host_name, $host_user, $host_pass);
Add two lines:
mysql_query("set character set 'utf8'"); // for reading from the database
mysql_query("set names 'utf8'"); // for writing to the database
Then you can read and write the MYSQL database normally.
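The mysql_* functions used above were removed in PHP 7.0. On newer versions, one possible equivalent is to request the connection character set in a PDO connection string. The sketch below is an assumption rather than the author's method: mysql_utf8_dsn is a hypothetical helper, and utf8mb4 is chosen as the default because MYSQL's utf8 character set stores at most 3 bytes per character.

```php
<?php
// Hypothetical helper (not from the article): builds a PDO MySQL DSN that
// asks for a UTF-8 connection character set.
function mysql_utf8_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    // The charset= part plays the role of the "set names" query in the article.
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Assumed usage; the host, database name and credentials are placeholders:
// $pdo = new PDO(mysql_utf8_dsn('localhost', 'dbname'), $host_user, $host_pass);
```

Passing 'utf8' as the third argument keeps the connection consistent with tables created as shown earlier.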
4. UTF-8 encoding issues related to JS
1. Chinese garbled problem when JS reads cookies
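PHP needs to escape-encode Chinese characters when writing cookies; otherwise the Chinese characters that JS reads from the cookie will be garbled.
PHP itself has no escape function, so we write a new one:
function escape($str) {
    // Split into single non-ASCII characters and runs of ASCII characters.
    preg_match_all('/[^\x00-\x7f]|[\x01-\x7f]+/u', $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        if (ord($v[0]) < 128) {
            $ar[$k] = rawurlencode($v);
        } else {
            // UCS-2BE yields the big-endian code unit that %uXXXX expects.
            // Only characters in the Basic Multilingual Plane are supported.
            $ar[$k] = "%u".bin2hex(iconv("UTF-8", "UCS-2BE", $v));
        }
    }
    return join("", $ar);
}
When JS reads the cookie, decode it with unescape; this solves the problem of garbled Chinese in cookies.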
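2. UTF-8 encoding problem of external JS files
When an HTML page or PHP page includes an external JS file, and the HTML or PHP page is a UTF-8 encoded file, the external JS file must also be converted to UTF-8. Otherwise the include may fail and calls to its functions may do nothing.
In the editor menu, click "File"->"Save As" to see the encoding of the current file. Make sure the file encoding is UTF-8; if it is ANSI, change it to UTF-8.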
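5. UTF-8 encoding issues related to FLASH
FLASH handles all strings internally as UTF-8 by default.
1. FLASH reading plain text files (txt, html)
Save the text file in UTF-8 encoding.
In the editor menu, click "File"->"Save As" to see the encoding of the current file. Make sure the file encoding is UTF-8; if it is ANSI, change it to UTF-8.
2. FLASH reading XML files
Save the XML file in UTF-8 encoding.
In the editor menu, click "File"->"Save As" to see the encoding of the current file. Make sure the file encoding is UTF-8; if it is ANSI, change it to UTF-8.
Write the following on line 1 of the XML file: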
<?xml version="1.0" encoding="utf-8"?>
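3. FLASH reading data returned by PHP
If the PHP file itself is UTF-8 encoded, simply echo the data.
If the PHP file itself is GB2312 encoded, you can re-save it as a UTF-8 encoded file and then simply echo the data.
If the PHP file itself is GB2312 encoded and its encoding may not be changed, convert the string to UTF-8 with the following statement:
$new_str = iconv("GB2312", "UTF-8", $str);
Then echo it.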
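iconv() returns false when the input is not valid in the source encoding, and echoing that result sends an empty string to FLASH. A small wrapper can make such failures visible. This is a minimal sketch under that assumption; to_utf8 is a hypothetical helper name, not something the article defines.

```php
<?php
// Hypothetical helper (not from the article): converts a string to UTF-8
// and fails loudly instead of silently producing an empty result.
function to_utf8(string $str, string $from = 'GB2312'): string
{
    // @ suppresses the notice iconv() raises on invalid input;
    // the return value is checked instead.
    $converted = @iconv($from, 'UTF-8', $str);
    if ($converted === false) {
        throw new InvalidArgumentException("Input is not valid $from");
    }
    return $converted;
}
```

With this helper, echo to_utf8($str); takes the place of the iconv call followed by echo.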
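4. FLASH reading data from the database (MYSQL)
FLASH reads data from the database through PHP. The encoding of the PHP file itself does not matter; what matters is that if the database encoding is GB2312, the string must be converted to UTF-8 with the following statement:
$new_str = iconv("GB2312", "UTF-8", $str);
5. FLASH writing data through PHP
In short, strings passed from FLASH are UTF-8 encoded. Convert them to the target encoding before using them (writing files, writing to the database, direct display, and so on), again with the iconv function.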
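6. FLASH using the local encoding (not recommended in principle)
You may want FLASH to use the local encoding instead of UTF-8. For mainland China, the local encoding is GB2312 or GBK.
In the AS program, add the following code:
System.useCodepage = true;
All characters in FLASH then use GB2312 encoding, and all data imported into or exported from FLASH needs the corresponding encoding conversion.
Because using the local encoding produces garbled text for users in Traditional Chinese regions, it is not recommended.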