MySQL Character Sets and Collations (The Most Complete Guide)

Preface

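MySQL is a relational database management system developed by the Swedish company MySQL AB and now owned by Oracle. It is one of the most popular relational database management systems, and for web applications it is one of the best RDBMS (Relational Database Management System) products. This article is part of the column "MySQL Performance Optimization + Principles + Practice".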

At work, querying or exporting data sometimes produces garbled text. The cause is the character set configuration.

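Character sets are configured in three places: the server, the connection (client), and the operating system. If the three do not agree, text comes out garbled.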

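Character Sets and Collations

  • 1. MySQL character sets and collations
  • 2. Viewing character sets
    • 2.1 Listing character sets
    • 2.2 Viewing the collations of a character set
    • 2.3 Viewing the character sets currently in effect
    • 2.4 Viewing the collations currently in effect
  • 3. Setting character sets in MySQL
    • 3.1 The character set hierarchy
    • 3.2 Setting the server-level character set
      • 3.2.1 Permanent settings
      • 3.2.2 Temporary settings
    • 3.3 Setting object character sets
  • 4. Character set examples
    • 4.1 How many bytes a Chinese character takes in common character sets
    • 4.2 Size examples
  • 5. Fixing garbled Chinese text on insert
  • 6. Common database character sets and how to choose one
  • 7. Avoiding garbled text in production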

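1. MySQL character sets and collations

  • Character set: a set of symbols together with the rules for encoding them
  • Collation: the rules for comparing and sorting the symbols of a character set. Whether a comparison is case-sensitive depends on the collation.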

2. Viewing character sets

First, check that the MySQL service is running:

[root@mysql2 ~]# lsof -i:3306
COMMAND   PID  USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
mysqld  27473 mysql   32u  IPv6 1648089      0t0  TCP *:mysql (LISTEN)
[root@mysql2 ~]# netstat -lntup | grep 3306
tcp6       0      0 :::3306                 :::*                    LISTEN      27473/mysqld 

2.1 Listing character sets

show character set lists the character sets that MySQL supports.

mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset           | binary              |      1 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)

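2.2 Viewing the collations of a character set

show collation;

  • gbk_chinese_ci: case-insensitive (ci)
  • gbk_bin: case-sensitive, because it compares the binary code of each character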
mysql> show collation like 'gbk%';
+----------------+---------+----+---------+----------+---------+
| Collation      | Charset | Id | Default | Compiled | Sortlen |
+----------------+---------+----+---------+----------+---------+
| gbk_chinese_ci | gbk     | 28 | Yes     | Yes      |       1 |
| gbk_bin        | gbk     | 87 |         | Yes      |       1 |
+----------------+---------+----+---------+----------+---------+
2 rows in set (0.00 sec)
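
The difference between a _ci and a _bin collation can be illustrated outside MySQL. The following is only a rough analogy in Python, assuming that case folding stands in for gbk_chinese_ci and a byte-for-byte comparison stands in for gbk_bin; MySQL's real collations use their own weight tables, and the function names are hypothetical.

```python
def equal_ci(a: str, b: str) -> bool:
    """Compare the way a *_ci collation does: letter case is ignored."""
    return a.casefold() == b.casefold()


def equal_bin(a: str, b: str, encoding: str = "gbk") -> bool:
    """Compare the way a *_bin collation does: the encoded bytes must match."""
    return a.encode(encoding) == b.encode(encoding)


print(equal_ci("MySQL", "mysql"))   # True
print(equal_bin("MySQL", "mysql"))  # False
print(equal_bin("mysql", "mysql"))  # True
```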

2.3 Viewing the character sets currently in effect

\s

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		4
Current database:	
Current user:		multis@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
Uptime:			31 min 26 sec

Threads: 1  Questions: 15  Slow queries: 0  Opens: 109  Flush tables: 2  Open tables: 0  Queries per second avg: 0.007
--------------
mysql> show variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /opt/mysql/mysql-5.7.39-linux-glibc2.12-x86_64/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.01 sec)
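  • character_set_client: the character set of the statements the client sends
  • character_set_connection: the character set used on the connection between client and server
  • character_set_database: the default character set of the current database
  • character_set_filesystem: binary means file names are passed to the operating system without any conversion
  • character_set_results: the character set in which query results are returned
  • character_set_server: the character set of the MySQL server
  • character_set_system: the character set the server uses for identifiers (metadata)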

2.4 Viewing the collations currently in effect

show variables like 'collation%';

mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

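3. Setting character sets in MySQL

3.1 The character set hierarchy

Server side: character_set_server > database > table

In other words, if no character set is specified when a table is created, the table uses the character set of its database. If the database has none set either, the server's character set (character_set_server) is used.

Client side: character_set_connection > character_set_results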

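When the server returns the result of a SQL statement, which character set is the data sent in? If character_set_results is set, the result is converted to that character set. In practice it is set together with character_set_client and character_set_connection (for example by set names), so the three normally agree.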
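
The conversions along this path can be sketched in Python. This is a simplified model rather than MySQL's actual code: the function round_trip and its parameters are hypothetical, and Python's "utf-8" and "gbk" codecs are assumed to stand in for the corresponding MySQL character sets.

```python
def round_trip(text, client="utf-8", connection="utf-8", column="gbk", results="utf-8"):
    """Follow one string from the client into a column and back out again."""
    # 1. The client sends the statement encoded in character_set_client.
    sent = text.encode(client)
    # 2. The server converts it to character_set_connection.
    converted = sent.decode(client).encode(connection)
    # 3. The value is converted to the column's character set for storage.
    stored = converted.decode(connection).encode(column)
    # 4. On SELECT, the value is converted to character_set_results.
    returned = stored.decode(column).encode(results)
    return sent, stored, returned


sent, stored, returned = round_trip("一")
print(len(sent), len(stored), len(returned))  # 3 2 3
print(returned.decode("utf-8"))               # 一
```

As long as every step declares the character set the bytes are really in, the value survives the trip even though its byte length changes along the way.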

3.2 Setting the server-level character set

3.2.1 Permanent settings

Settings in the configuration file are permanent.

[client]
default_character_set=utf8 

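The [client] setting takes effect without restarting the server. It affects character_set_client, character_set_connection, and character_set_results. A client that does not read this configuration follows its own settings instead.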

[mysqld]
character_set_server=utf8 

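The server setting requires a restart to take effect. It affects character_set_server, character_set_database, and the default character set of tables created afterwards.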
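
Before restarting, it can be handy to confirm that the two sections agree. Below is a minimal sketch using Python's standard configparser, assuming a my.cnf reduced to the two settings above. The MY_CNF string is illustrative, and the parser does not replicate every my.cnf rule (for example, it does not treat dashes and underscores in option names as equivalent).

```python
import configparser

MY_CNF = """
[client]
default_character_set=utf8

[mysqld]
character_set_server=utf8
"""

# allow_no_value accepts bare flags such as skip-grant-tables;
# interpolation=None keeps any '%' characters in values literal.
parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
parser.read_string(MY_CNF)

client_charset = parser.get("client", "default_character_set")
server_charset = parser.get("mysqld", "character_set_server")

print(client_charset, server_charset)    # utf8 utf8
print(client_charset == server_charset)  # True
```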

[root@mysql2 ~]# cat /etc/my.cnf 
[client]
default_character_set=utf8
[mysqld]
character_set_server=utf8
user=mysql
basedir=/usr/local/mysql
datadir=/data/mysql/my3306/data
socket = /data/mysql/my3306/mysql.sock
server_id = 1
port = 3306
log_error=/data/mysql/my3306/logs/error.log
log_bin=/data/mysql/my3306/logs/mysql-bin
binlog_format=row
gtid_mode=on
enforce_gtid_consistency=true
log_slave_updates=1
max_connections=1024
wait_timeout=60
sort_buffer_size=2M
max_allowed_packet=32M
join_buffer_size=2M
innodb_buffer_pool_size=128M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=32M
innodb_log_file_size=128M
innodb_log_files_in_group=2
binlog_cache_size=2M
max_binlog_cache_size=8M
max_binlog_size=512M
expire_logs_days=7
slow_query_log=on
slow_query_log_file=/data/mysql/my3306/logs/slow.log
long_query_time=0.5
log_queries_not_using_indexes=1
##skip-grant-tables1

After restarting the database, check the character sets again:

mysql> show variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | utf8                                                           |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | utf8                                                           |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /opt/mysql/mysql-5.7.39-linux-glibc2.12-x86_64/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.01 sec)

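This shows that both the data we store and the results we get back now use utf8.

3.2.2 Temporary settings

Settings made with set are lost once the database restarts.

  • Effective only for the current session (window)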
    set [session] character_set_server=utf8;
mysql> set character_set_server=gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		7
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	gbk
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
Uptime:			4 days 22 hours 48 min 31 sec

Threads: 2  Questions: 29  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.000
--------------

Open a new window and check the character sets. The session-level change is not visible there:

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		8
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
Uptime:			4 days 22 hours 49 min 45 sec

Threads: 2  Questions: 34  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.000
--------------
  • Set globally: applies to all sessions, but only to sessions opened after the change
    set global character_set_server=utf8;
mysql> set global character_set_server=gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		8
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
Uptime:			4 days 22 hours 50 min 53 sec

Threads: 1  Questions: 39  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.000
--------------

Open a new window and check the character sets. The new session picks up the global value:

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		9
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	gbk
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
Uptime:			4 days 22 hours 51 min 58 sec

Threads: 2  Questions: 43  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.000
--------------
  • Set the client character set: affects only character_set_client, character_set_connection, and character_set_results
    set names gbk;
mysql> set names gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		9
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	gbk
Db     characterset:	utf8
Client characterset:	gbk
Conn.  characterset:	gbk
UNIX socket:		/tmp/mysql.sock
Uptime:			4 days 22 hours 53 min 34 sec

Threads: 2  Questions: 47  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.000
--------------

mysql> show variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | gbk                                                            |
| character_set_connection | gbk                                                            |
| character_set_database   | utf8                                                           |
| character_set_filesystem | binary                                                         |
| character_set_results    | gbk                                                            |
| character_set_server     | gbk                                                            |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /opt/mysql/mysql-5.7.39-linux-glibc2.12-x86_64/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.02 sec)

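3.3 Setting object character sets

The character set of an object (database or table) is related to the server character set: an object that does not specify its own inherits one from the level above.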

mysql> create database dbsql default character set gbk;
mysql> use dbsql;
mysql> create table t1(id int,name varchar(10)) default character set utf8mb4;
mysql> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                               |
+-------+----------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (`id` int(11) DEFAULT NULL,`name` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-------+----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> create table t2(id int,name varchar(10));
Query OK, 0 rows affected (0.02 sec)

mysql> show create table t2;
+-------+------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                           |
+-------+------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (`id` int(11) DEFAULT NULL,`name` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=gbk |
+-------+------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

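In other words, if no character set is specified when a table is created, the table uses the character set of its database; if the database has none set either, the server's character set is used, and so on up the hierarchy.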
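
The fallback rule can be written as a few lines of Python. This is an illustrative sketch only: effective_charset is a hypothetical helper, and None is assumed to mean "not specified at this level".

```python
def effective_charset(server, database=None, table=None):
    """Return the character set a newly created table ends up with."""
    # The most specific explicit setting wins: table, then database, then server.
    for charset in (table, database, server):
        if charset is not None:
            return charset


print(effective_charset("utf8", database="gbk", table="utf8mb4"))  # utf8mb4, like t1
print(effective_charset("utf8", database="gbk"))                   # gbk, like t2
print(effective_charset("utf8"))                                   # utf8
```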

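4. Character set examples

4.1 How many bytes a Chinese character takes in common character sets

  • utf8mb4: up to 4 bytes per character (common Chinese characters take 3; emoji and rare characters take 4)
  • utf8: 3 bytes
  • gbk: 2 bytes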
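
These sizes can be checked with Python's built-in codecs. The sketch below assumes that Python's "utf-8" codec behaves like MySQL's utf8mb4 (1 to 4 bytes per character) and that its "gbk" codec matches MySQL's gbk; byte_lengths is a hypothetical helper.

```python
def byte_lengths(ch):
    """Return (bytes in UTF-8, bytes in GBK); None if GBK cannot encode ch."""
    utf8_len = len(ch.encode("utf-8"))
    try:
        gbk_len = len(ch.encode("gbk"))
    except UnicodeEncodeError:
        gbk_len = None  # the character does not exist in GBK
    return utf8_len, gbk_len


print(byte_lengths("a"))   # (1, 1)
print(byte_lengths("汉"))  # (3, 2)
print(byte_lengths("😀"))  # (4, None)
```

The emoji is the case that needs utf8mb4: it takes 4 bytes, which MySQL's 3-byte utf8 cannot store.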

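4.2 Size examples

What does the 30 in varchar(30) mean?
It means 30 characters, not 30 bytes.

With the utf8 character set, how many Chinese characters can a varchar(30) column hold, and how many bytes do they take?
30 Chinese characters, taking 30*3+1 bytes. The +1 is the length prefix of a variable-length column: 1 byte when the column holds at most 255 bytes, 2 bytes when it can hold more.

With the utf8 character set, how many English letters can a varchar(30) column hold, and how many bytes do they take?
30 letters, taking 30*1+1 bytes.

With the gbk character set, how many Chinese characters can a varchar(30) column hold, and how many bytes do they take?
30 Chinese characters, taking 30*2+1 bytes.

With the gbk character set, how many English letters can a varchar(30) column hold, and how many bytes do they take?
30 letters, taking 30*1+1 bytes.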
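
The same arithmetic as a small Python sketch. The function varchar_bytes is hypothetical and models only the data bytes plus the length prefix described above, not any other per-row overhead; Python's "utf-8" and "gbk" codecs are assumed to match MySQL's encodings for these characters.

```python
def varchar_bytes(value, declared_chars, max_bytes_per_char, encoding):
    """Bytes needed to store value in a VARCHAR(declared_chars) column."""
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared character length")
    # Length prefix: 1 byte if the column can never exceed 255 bytes, else 2.
    prefix = 1 if declared_chars * max_bytes_per_char <= 255 else 2
    return len(value.encode(encoding)) + prefix


print(varchar_bytes("汉" * 30, 30, 3, "utf-8"))  # 91 = 30*3+1
print(varchar_bytes("a" * 30, 30, 3, "utf-8"))   # 31 = 30*1+1
print(varchar_bytes("汉" * 30, 30, 2, "gbk"))    # 61 = 30*2+1
print(varchar_bytes("a" * 30, 30, 2, "gbk"))     # 31 = 30*1+1
```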

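How to choose:

  • For internal use only, use gbk to save space and bandwidth
  • If data is exchanged with other systems, use utf8 (supported in MySQL 5.7)
  • If the extra space is acceptable, use utf8mb4 (the default in MySQL 8.0)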

5. Fixing garbled Chinese text on insert

mysql> create table t(id int default null,name varchar(20) default null);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t values(1,'一');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t values(1,'1');
Query OK, 1 row affected (0.00 sec)
mysql> select * from t;
+------+------+
| id   | name |
+------+------+
|    1 | zs   |
|    1 ||
+------+------+
2 rows in set (0.00 sec)

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'character%' ;
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | gbk                                                            |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | utf8                                                           |
| character_set_system     | utf8                                                           |
| character_sets_dir       | /opt/mysql/mysql-5.7.39-linux-glibc2.12-x86_64/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.01 sec)

mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.7.39, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		2
Current database:	dbsql
Current user:		multis@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.39-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	gbk
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/tmp/mysql.sock
Uptime:			30 min 19 sec

Threads: 1  Questions: 21  Slow queries: 0  Opens: 112  Flush tables: 1  Open tables: 105  Queries per second avg: 0.011
--------------
mysql> select * from t;
+------+------+
| id   | name |
+------+------+
|    1 | zs   |
|    1 | ?    |
+------+------+
2 rows in set (0.00 sec)

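The data was inserted with utf8 as the client character set but queried with latin1, so the Chinese value comes back garbled (as ?). After switching the character set back and querying again, everything is fine: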
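
Why the ASCII value survives while the Chinese one turns into ? can be reproduced in Python. This is a sketch under the assumption that Python's "replace" error handler mimics what the server does when a character has no equivalent in the result character set; as_latin1_result is a hypothetical name.

```python
def as_latin1_result(value):
    """Convert a value to latin1, replacing unrepresentable characters with '?'."""
    return value.encode("latin-1", errors="replace").decode("latin-1")


print(as_latin1_result("zs"))  # zs
print(as_latin1_result("一"))  # ?
```

The stored data is untouched; only the conversion for display loses information, which is why switching the client character set back restores the value.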

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t;
+------+------+
| id   | name |
+------+------+
|    1 | zs   |
|    1 ||
+------+------+
2 rows in set (0.00 sec)

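6. Common database character sets and how to choose one

Character set   Max bytes per character   Notes
latin1          1                         Default
gbk             2
utf8            3                         Most commonly used
utf8mb4         4                         Most commonly used; preferred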

7. Avoiding garbled text in production

Keep the operating system, the database, and the client on the same character set:

[root@mysql2 ~]# cat /etc/locale.conf 
LANG="zh_CN.UTF-8"
[root@mysql2 ~]# cat /etc/my.cnf 
[client]
default_character_set=utf8
[mysqld]
character_set_server=utf8
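
One way to check this consistency is to compare the character_set_* variables against the value you expect. The sketch below is illustrative: find_mismatches is a hypothetical helper, and the dictionary is typed in by hand from the show variables output in section 2.3 rather than fetched from a live server.

```python
def find_mismatches(variables, expected="utf8"):
    """Return the character_set_* variables whose value differs from expected."""
    # character_set_filesystem is normally 'binary', so it is left out.
    ignored = {"character_set_filesystem"}
    return {
        name: value
        for name, value in variables.items()
        if name.startswith("character_set_")
        and name not in ignored
        and value != expected
    }


# Values from section 2.3, before the configuration change.
before = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "latin1",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
}

print(find_mismatches(before))
# {'character_set_database': 'latin1', 'character_set_server': 'latin1'}
```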
