Web development often involves passing data along the chain of front-end web page, PHP, and MySQL. This usually causes no problems when the data is English only, but once Chinese is involved, inconsistent character encodings among the three (for example, the web page uses GBK while MySQL uses UTF-8) can lead to garbled characters.
(Note: For information on character encoding, please refer to Baidu Encyclopedia: http://baike.baidu.com/view/1204863.htm?fr=aladdin)
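As a small illustration of why mismatched encodings produce garbled text, the sketch below stores the same two Chinese characters as UTF-8 and as GBK and prints the raw bytes. It assumes PHP with the mbstring extension enabled; the variable names are chosen only for this example.

```php
<?php
// The word "中文" written as explicit UTF-8 bytes, so the result does not
// depend on the encoding this source file happens to be saved in.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

// Re-encode the same two characters as GBK (needs the mbstring extension).
$gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo bin2hex($utf8), "\n"; // e4b8ade69687 (three bytes per character)
echo bin2hex($gbk), "\n";  // d6d0cec4 (two bytes per character)

// A reader that expects UTF-8 cannot decode the GBK bytes.
var_dump(mb_check_encoding($gbk, 'UTF-8')); // bool(false)
```

The text is the same, but the bytes are not, so any component that reads the bytes with the wrong encoding in mind will display something else.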
Front-end web page encoding:
We usually assume that a <meta> item inside the <head> tag (such as <meta http-equiv="Content-Type" content="text/html; charset=xxx">) sets the character encoding of the entire page. For most pages this is enough to tell the browser which encoding to use when displaying the page, but sometimes the declaration has no effect: whatever xxx is, the browser always uses the same encoding.
The cause lies in the header part of the HTTP exchange. When a user browses a web page, the server sends not only the page itself (HTML/CSS/JS and so on) but also descriptive content called the header, which tells the client the type of the data it is about to receive (HTML, plain text, a multimedia file, etc.), its size, its source, and other information. To see this information, use the telnet tool instead of a browser and issue a GET or other request by hand according to the HTTP protocol.
Since the header is sent before the HTML, the <meta> tag, being part of the HTML, has lower priority than the header. If the header already declares the character encoding of the page, the browser parses the page according to the character set named in the header.
In PHP, you can use header("content-type:text/html; charset=xxx"); to send the header that declares the character set. It must be called before any page output is sent.
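A minimal sketch of sending the header and the matching <meta> declaration together might look like this. The helper name html_content_type is hypothetical, introduced only for this example, and the page content is a placeholder.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function html_content_type(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Send the header first, before any page output.
header(html_content_type('utf-8'));

// The <meta> tag repeats the same charset so that the header and the page agree.
echo '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Demo</title></head>';
echo "<body>\xE4\xB8\xAD\xE6\x96\x87</body></html>\n";
```

Keeping the header and the <meta> tag on the same value avoids the situation described above, where the tag appears to be ignored.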
The Apache server has an AddDefaultCharset directive, which adds the server's default character set to the header of each web page it sends.
Check /etc/apache2/httpd.conf (before 2.4) or /etc/apache2/conf-available/charset.conf (2.4 and later) for a line of the form AddDefaultCharset xxx. If the line is not commented out, the default character set header is added to every web page, and setting the character set in the <meta> tag alone will have no effect.
Note: The encoding declared in the HTML page should match the encoding in which the HTML file (which is really plain text) is actually saved.
Generally speaking, to support Chinese and other languages, UTF-8 is the most trouble-free choice, because it covers almost all languages in common use.
MySQL database encoding:
In a terminal, run mysql -uusername -ppassword to enter the mysql client, then type show variables like 'character%'; (note that the semicolon cannot be omitted when typing commands or SQL statements in the mysql terminal). You will see output similar to the following:
[Figure: output of show variables like 'character%']
The listing shows the character sets MySQL uses at various levels, where (*)
character_set_server (default-character-set in older versions): server character set, used by default.
character_set_database: database character set.
Table character set: set per table when the table is created (it is a table property, not a system variable).
The priority increases in that order. Therefore you generally only need to set character_set_server and leave the character set unspecified when creating databases and tables, so that the server character set is used throughout.
character_set_client: the client's character set. When a client sends a request to the server, the request is encoded in this character set.
character_set_results: result character set. When the server returns results or information to the client, they are encoded in this character set.
On the client side, if character_set_results is not defined, the character_set_client character set is used as the default, so it is enough to set character_set_client.
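The check described above can also be done from PHP. The following is a rough sketch, assuming the mysqli extension; both function names are hypothetical, and the sample values are made up for illustration and are not taken from a real server.

```php
<?php
// Hypothetical helper: given [variable name => value] pairs, return the
// character_set_* entries whose value differs from $expected.
function charset_mismatches(array $vars, string $expected): array
{
    // These two are not meant to match the data character set.
    $ignored = ['character_set_filesystem', 'character_set_system'];
    $mismatches = [];
    foreach ($vars as $name => $value) {
        if (strpos($name, 'character_set_') !== 0 || in_array($name, $ignored, true)) {
            continue;
        }
        if ($value !== $expected) {
            $mismatches[$name] = $value;
        }
    }
    return $mismatches;
}

// Hypothetical helper: reads the variables over an open mysqli connection.
function fetch_charset_variables(mysqli $db): array
{
    $vars = [];
    $result = $db->query("SHOW VARIABLES LIKE 'character%'");
    while ($row = $result->fetch_row()) {
        $vars[$row[0]] = $row[1];
    }
    return $vars;
}

// Made-up sample values, standing in for a server left on latin1.
$sample = [
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'utf8',
    'character_set_database'   => 'latin1',
    'character_set_results'    => 'utf8',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8',
    'character_sets_dir'       => '/usr/share/mysql/charsets/',
];
print_r(charset_mismatches($sample, 'utf8'));
```

With a real connection, charset_mismatches(fetch_charset_variables($db), 'utf8') would list the levels that still need attention.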
In the listing above, character_set_server is not utf8 (note: MySQL writes UTF-8 as utf8, without the "-"; MySQL's utf8 stores at most three bytes per character, and utf8mb4 is needed for four-byte characters). This is because, unless changed, MySQL's default storage character set is latin1 (this was the default before MySQL 8.0). In that case, when we create databases and tables from the mysql terminal without specifying a character set in the SQL statement, the data is stored as latin1. Chinese text stored in an encoding designed for Latin text will come out garbled when displayed.
So how to modify it? You can run set character_set_server = utf8; (databases and tables created without an explicit character set inherit it, so changing it alone changes the storage encoding). After that, a database created in the terminal without an explicit character set, and the tables created in it, are stored as utf8. Databases that already exist keep the character set they were created with.
However, this modification applies only to the current session. Use quit; to exit and re-enter the mysql terminal, and the character set is back to latin1. Making the change permanent does not require recompiling MySQL: add character-set-server = utf8 under the [mysqld] section of the MySQL configuration file (my.cnf on Linux, my.ini on Windows) and restart the server.
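Instead of relying on the server default, the character set can also be written into the SQL statement itself, as mentioned above. A minimal sketch, assuming the statement is built in PHP; the function name create_database_sql and the database name demo_db are hypothetical.

```php
<?php
// Hypothetical helper: builds a CREATE DATABASE statement with an explicit
// character set. Both values are validated because they are placed directly
// into the SQL text.
function create_database_sql(string $name, string $charset = 'utf8'): string
{
    $pattern = '/^[A-Za-z0-9_]+$/';
    if (!preg_match($pattern, $name) || !preg_match($pattern, $charset)) {
        throw new InvalidArgumentException('Unexpected characters in name or charset');
    }
    return "CREATE DATABASE `$name` DEFAULT CHARACTER SET $charset";
}

echo create_database_sql('demo_db'), "\n";
// CREATE DATABASE `demo_db` DEFAULT CHARACTER SET utf8
```

Tables created in such a database without their own character set inherit the database's character set.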
PHP encoding:
So how does PHP make sure that no garbled characters appear when it exchanges data with MySQL?
According to the description at (*), to prevent garbled characters when storing and retrieving data, we only need to set the following three system variables to the same character set as the server character set character_set_server. They are:
character_set_client: The client's character set.
character_set_results: Result character set.
character_set_connection: connection character set.
All three can be set at once by sending MySQL the statement set names xxx (where xxx can be utf8).
Therefore, when sending Chinese and other non-English characters from PHP to MySQL, run mysql_query("set names utf8"); right after the mysql_connect call (assuming the database is stored as utf8), and Chinese text can then be sent and retrieved safely. Note that the mysql_* functions were removed in PHP 7; with mysqli the equivalent call is mysqli_set_charset($link, "utf8");.
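For current PHP versions, the same step could be sketched with mysqli roughly as follows. The function name open_connection is hypothetical, and the host, credentials, and database name are placeholders supplied by the caller.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to $charset.
function open_connection(
    string $host,
    string $user,
    string $password,
    string $database,
    string $charset = 'utf8'
): mysqli {
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }
    // Like SET NAMES, this sets character_set_client, character_set_results
    // and character_set_connection for this connection; it also tells the
    // client library which encoding to assume when escaping strings.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
    return $db;
}
```

A call such as open_connection('localhost', 'username', 'password', 'test') would then return a connection on which Chinese text can be sent and read back, provided the tables themselves are stored as utf8.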
In addition, since HTML pages may be generated dynamically by PHP, how do we make sure that the encoding PHP uses for the generated page matches the one declared in the header or the <meta> tag?
Find the php.ini file in the PHP directory and set default_charset = "utf-8" so that PHP declares UTF-8 in the Content-Type header it sends by default. The text the script outputs must still actually be UTF-8.
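When php.ini cannot be edited, the same setting can be applied at runtime. A small sketch, assuming it runs before any page output; the sample text is only for illustration.

```php
<?php
// Runtime equivalent of the php.ini setting, valid for this request only.
ini_set('default_charset', 'UTF-8');

// Functions such as htmlspecialchars() fall back to default_charset when no
// encoding argument is passed.
$text = "\xE4\xB8\xAD\xE6\x96\x87 <b>"; // "中文 <b>" as UTF-8 bytes

echo ini_get('default_charset'), "\n"; // UTF-8
echo htmlspecialchars($text), "\n";    // 中文 &lt;b&gt;
```

An explicit header("content-type:text/html; charset=xxx"); call in the script still takes precedence over this default.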