Home  >  Article  >  Database  >  mysql乱码问题_MySQL

mysql乱码问题_MySQL

WBOY
WBOYOriginal
2016-06-01 13:51:43756browse

mysql字符编码是版本4.1引入的,支持多国语言,而且一些特性已经超过了其他的数据库系统。

可以在MySQL Command Line Client 下输入如下命令查看mysql的字符集

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
+----------+-----------------------------+---------------------+--------+
36 rows in set (0.02 sec)



MySQL 4.1的字符集支持(Character Set Support)有两个方面:字符集(Character set)和排序方式(Collation)。对于字符集的支持细化到四个层次: 服务器(server),数据库(database),数据表(table)和连接(connection)。
查看系统的字符集和排序方式的设定可以通过下面的两条命令:

mysql> SHOW VARIABLES LIKE 'character_set_%';
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | D:/MySQL/MySQL Server 5.0/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.06 sec)

mysql> SHOW VARIABLES LIKE 'collation_%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.02 sec)

上面列出的值就是系统的默认值。latin1默认校对规则是latin1_swedish_ci,默认是latin1的瑞典语排序方式. 为什么呢默认会是latin1_swedish_ci呢,追溯一下mysql历史很容易发现.

1979 年,一家瑞典公司Tcx欲开发一个快速的多线程、多用户数据库系统。Tcx 公司起初想利用mSQL和他们自己的快速低级例程 (Indexed Sequential Access Method,ISAM)去连接数据库表,然而,在一些测试以后得出结论:mSQL对其需求来说不够快速和灵活。这就产生了一个连接器数据库的新SQL接 口,它使用几乎和mSQL一样的API接口。这个API被设计成可以使那些由mSQL而写的第三方代码更容易地移植到MySQL。

当然也可以需要修改mysql的默认字符集
在mysql配置文档my.ini,找到如下两句:

[mysql]

default-character-set=latin1



# created and no character set is defined
default-character-set=latin1

修改后面的值就可以。

这里不建议改,仍保留默认值
也就是说启动 mysql时,如果没指定指定一个默认的的字符集,这个值继承自配置文件中的; 
此时 character_set_server 被设定为这个默认的字符集;当创建一个新的数据库时,
除非明确指定,这个数据库的字符集被缺省设定为 character_set_server; 当选定了一个数据库时,
character_set_database 被设定为这个数据库默认的字符集; 在这个数据库里创建一张表时,
表默认的字符集被设定为 character_set_database,也就是这个数据库默认的字符集; 
当在表内设置一栏时,除非明确指定,否则此栏缺省的字符集就是表默认的字符集。

这样问题就随之而来了,假如一数据库是gbk编码。如果访问数据库时没指定其的字符集是gbk。
那么这个值将继承系统的latin1,这样就做成mysql中文乱码。


乱码解决方法

要解决乱码问题,首先必须弄清楚数据库用什么编码。如果没有指明,将是默认的latin1。
用得最多的应该是这3种字符集 gb2312,gbk,utf8。

如何去指定数据库的字符集呢?下面也gbk为例

【在MySQL Command Line Client创建数据库 】

mysql> CREATE TABLE `mysqlcode` (
  -> `id` TINYINT( 255 ) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
  -> `content` VARCHAR( 255 ) NOT NULL
  -> ) TYPE = MYISAM CHARACTER SET gbk COLLATE gbk_chinese_ci;
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> desc mysqlcode;
+---------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-----------------------+------+-----+---------+----------------+
| id | tinyint(255) unsigned | NO | PRI | | auto_increment |
| content | varchar(255) | NO | | | |
+---------+-----------------------+------+-----+---------+----------------+
2 rows in set (0.02 sec)

其中后面的TYPE = MYISAM CHARACTER SET gbk COLLATE gbk_chinese_ci;
就是指定数据库的字符集,COLLATE (校勘),让mysql同时支持多种编码的数据库。

当然也可以通过如下指令修改数据库的字符集
alter database da_name default character set 'charset'.

客户端以 gbk格式发送 ,可以采用下述配置:

SET character_set_client='gbk'
SET character_set_connection='gbk'
SET character_set_results='gbk'

这个配置就等价于 SET NAMES 'gbk'。


现在对刚才创建的数据库操作

mysql> use test;
Database changed

mysql> insert into mysqlcode values(null,'php爱好者');
ERROR 1406 (22001): Data too long for column 'content' at row 1

没有指定字符集为gbk,插入时出错

mysql> set names 'gbk';
Query OK, 0 rows affected (0.02 sec)

指定字符集为 gbk

mysql> insert into mysqlcode values(null,'php爱好者');
Query OK, 1 row affected (0.00 sec)

插入成功

mysql> select * from mysqlcode;
+----+-----------+
| id | content |
+----+-----------+
| 1 | php爱好着 |
+----+-----------+
1 row in set (0.00 sec)

在没有指定字符集gbk时读取也会出现乱码,如下

mysql> select * from mysqlcode;
+----+---------+
| id | content |
+----+---------+
| 1 | php??? |
+----+---------+
1 row in set (0.00 sec)

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn