Home  >  Article  >  Database  >  [MySQL] MySQL data type optimization

[MySQL] MySQL data type optimization

黄舟
黄舟Original
2017-02-25 10:19:591086browse

Choose optimized data types

MySQL supports many data types, and choosing the correct data type is crucial to achieving high performance. No matter what type of data you store, here are a few principles to help make better choices.

Smaller is usually better

In general, you should try to use the smallest data type that can store data correctly (for example, you only need to store 0-200, tinyint unsigned is better). Smaller data types are generally faster because they occupy less disk, memory, and CPU cache, and require fewer CPU cycles to process.

Simple is good

Operations on simple data types generally require fewer CPU cycles. For example, integer operations are cheaper than character operations because the character set and collation rules make string comparisons more complex than integer comparisons. Here are two examples: one is that MySQL's built-in types (such as date, time, datetime) should be used instead of strings to store date and time, and the other is that integers should be used to store IP addresses.

Try to avoid using NULL

Many tables contain NULL-able columns, even if the application does not need to save NULL. This is because NULL is the default attribute of the column. It's usually best to specify columns as NOT NULL unless you really need to store NULL values.

If the query contains columns that can be NULL, it is more difficult for MySQL to optimize because NULL columns make indexes, index statistics, and value comparisons more complex. Columns that can be NULL use more storage space and require special handling in MySQL. When a NULLable column is indexed, each index record requires an extra byte, which in MyISAM can even cause a fixed-size index (such as an index with only one integer column) to become a variable-size index.

Usually the performance improvement brought by changing the NULL value to NOT NULL is relatively small, so (when tuning) there is no need to first find and modify this situation in the existing schema, unless it is determined This can cause problems. However, if you plan to create an index on a column, you should avoid designing the column to be NULLable.

Of course there are exceptions. For example, it is worth mentioning that InnoDB uses a separate bit to store NULL values, so it has good performance for sparse data (most values ​​​​are NULL, and only a few rows are non-NULL values). space efficiency. But this does not apply to MyISAM.


Integer type

If you store integers, you can use these integer types: TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT. Use 8, 16, 24, 32, and 64-bit storage space respectively. Their storage range is from -2 to the power of (N-1) to 2 to the power of (N-1) -1, where N is the number of bits in the storage space.

The integer type has the optional UNSIGNED attribute, which means that negative values ​​are not allowed. This can roughly double the upper limit of positive numbers. For example, the range that TINYINT UNSIGNED can store is 0-255, and the storage range of TINYINT It's -128~127.

MySQL can specify the width for integer types, such as INT(11). For most applications, this is meaningless: it does not limit the legal range of values. Knowledge stipulates some interactive tools of MySQL (such as MySQL commands line client) used to display the number of characters. For storage, INT(1) and INT(20) are the same.

Real number type

Real numbers are numbers with a decimal part. However, they are not just for storing decimal parts, DECIMAL can also be used to store integers larger than BIGINT. MySQL supports both precise and imprecise types.

The FLOAT and DOUBLE types support approximate calculations using standard floating point arithmetic. If you need to know how floating point operations are calculated, you need to study the specific implementation of floating point numbers on the platform you are using.

The DECIMAL type is used to store precise decimals. However, because the CPU does not support direct calculation of DECIMAL, in MySQL5.0 and later versions, the MySQL server itself implements high-precision calculation of DECIMAL. Relatively speaking, this is slower than the CPU directly supporting native floating point operations.

Both floating point and DECIMAL types can specify precision. For DECIMAL columns, you can specify the maximum number of digits allowed before and after the decimal point. This affects the column's space consumption.

Floating point types usually use less space than DECIMAL when storing the same range of values. FLOAT uses 4 bytes of storage. DOUBLE occupies 8 bytes and has higher precision and larger range than FLOAT.

Because of the additional space and computational overhead required, you should try to use DECIMAL only when performing precise calculations on decimals - such as storing financial data. However, when the amount of data is relatively large, you can consider using BIGINT instead of DECIMAL, and multiply the currency unit to be stored by the corresponding multiple according to the number of decimal points. Assuming that you want to store financial data accurate to one ten thousandth, you can multiply all amounts by 1 million and then store the results in BIGINT. This can avoid the problems of inaccurate floating point storage calculations and high cost of accurate DECIMAL calculations.

String type

The following description assumes that the storage engine used is InnoDB/or MyISAM. If it is not supported by these two storage engines, please refer to the documentation of the storage engine used.

VARCHAR and CHAR

VARCHAR: It is more space-saving than fixed-length types because it only uses the necessary space. VARCHAR saves space, so it also helps performance. However, since the row is variable in length, the row may become longer than the original during UPDATE, which requires additional work.

It is appropriate to use VARCHAR in the following situations: the maximum length of the string is much larger than the average length; the column is updated less, so fragmentation is not a problem; a complex character set like UTF-8 is used, and each character Use a different number of bytes.

In versions 5.0 or higher, MySQL preserves trailing spaces when storing and retrieving. InnoDB is more flexible. It can store long VARCHAR as BLOB

CHAR: fixed length. When storing CHAR values, MySQL will delete all trailing spaces. The fixed-length CHAR type is less prone to fragmentation, and for very short columns, CHAR is also more efficient in storage space than VARCHAR, which has one or two extra bytes of record length. CHAR is suitable for storing very short strings, or when all values ​​are close to the same length. For example: CHAR is very suitable for storing the MD5 value of a password because it is a fixed-length value. CHAR will be padded with spaces as needed to facilitate comparison.

Similar types to CHAR and VARCHAR are BINARY and VARBINARY, which store binary strings. Binary strings store bytecodes rather than characters.

The advantages of binary comparison are not only reflected in case sensitivity. MySQL compares BINARY strings one byte at a time and compares them based on the value of the byte. Therefore, binary is much simpler than characters, so it is faster.

BLOB and TEXT types

BLOB and TEXT types: BLOB and TEXT are both string data types designed to store large data, and are stored in binary and character modes respectively. When the BLOB and TEXT values ​​are too large, InnoDB will use a dedicated "external" storage area for storage. The original table field storage pointer points to the external storage area.

MySQL sorts BLOB and TEXT columns differently from other types: it only sorts the first max_sort_length bytes of the column instead of the entire string. If you only need to sort the first small number of characters, you can reduce the configuration of max_sort_length, or use ORDER BY SUSTRING(column, length).

MySQL cannot index the full length strings of BLOB and TEXT columns, nor can it use these indexes to eliminate sorting.

Use enumeration (ENUM) instead of string type

You can use enumeration (ENUM) instead of string type. Many times it is recommended to use enumeration columns instead of commonly used string types.

(1) The enumeration column can store some unique strings into a predefined collection.
(2) Mysql is very compact when storing enumerations and will be compressed into one or two bytes depending on the number of list values.
(3) Mysql will internally save the position of each value in the list as an integer, and save the "lookup table" of the "number-string" mapping relationship in the .frm file of the table.

Note: One surprising thing is that the enumeration fields are sorted by internally stored integers instead of defined strings.

Note: The worst thing about enumerations is that the string list is fixed. Adding or deleting strings must use ALTER TABLE. Therefore, for a series of strings that may change in the future, use enumerations and Not a good idea unless you accept that you can only add elements at the end of the list.

Note: There is some overhead for enumeration columns because MySQL saves each enumeration value as an integer and must do a lookup to convert it to a string.

Date and time type

For details on data types and usage, please see: http://www.php.cn/

Mysql has many types that can save date and time values, such as YEAR and DATE.

The minimum time granularity that Mysql can store is seconds (MariaDB supports microsecond-level event types). But MySQL can also perform ad hoc operations with microsecond level granularity.

Most of the time there are no alternatives to the type, so there is no question of what is the best choice.

The only question next is what needs to be done when saving the date and time.

DATETIME

(1) This type can save a wide range of values, from 1001 to 9999, with a precision of seconds. (2) DATETIME encapsulates time and date into integers in the format YYYYMMDDHHMMSS, regardless of time zone. (3) DATETIME uses 8 bytes of storage space.

TIMESTAMP

(1) The TIMESTAMP type stores the number of seconds since midnight on January 1, 1970, which is the same as the UNIX timestamp. (2) TIMESTAMP only uses 4 bytes of storage space, so its range is much smaller than DATETIME. (3) The value displayed by TIMESTAMP depends on the time zone.

Comparison between DATETIME and TIMESTAMP:

(1) By default, if the value of the first TIMESTAMP column is not specified when inserting, Mysql sets the value of this column to the current time. (This is a feature that DATETIME does not have) (2) When inserting a row of records, Mysql will also update the value of the first TIMESTAMP column by default. (3) The TIMESTAMP column defaults to NOT NULL, which is different from other data types.

Summarize

(1) In addition to special behavior, TIMESTAMP should generally be used whenever possible because it is more space efficient than DATETIME. (2) Generally speaking, it is not recommended to save UNIX timestamps as integer values. This will not bring any benefits. Saving timestamp formats with integers is usually inconvenient to process. (3) If you need to store date and time values ​​with a smaller granularity than seconds, you can use the BIGINT type to store microsecond-level timestamps, or use DOUBLE to store the decimal part after seconds. You can also use MariaDB instead of Mysql.

Bit Data Type

MySQL has a few storage types that use compact bits to store data. All of these bit types, regardless of the underlying storage format and processing, are technically string types.

BIT

You can use the BIT column to store one or more true/false values ​​in a column. BIT(1) defines a field containing a single bit, BIT(2) stores 2 bits, and so on. The maximum length of a BIT column is 64 bits.

If you want to store a true/false value in a bit storage space, another method is to create a CHAR(0) column that can be null. This column can hold a null value (NULL) or a zero-length string (empty string).

SET

If you need to save many true/false values, consider merging these columns into a SET data type, which is represented internally by MySQL as a set of packed bits. This effectively utilizes storage space, and MySQL has functions like FIND_IN_SET() and FIELD() for easy use in queries. Its main disadvantage is that changing column definitions is expensive: ALTER TABLE is required, which is a very expensive operation for large tables. Generally speaking, it is also not possible to search through indexes on SET columns.

An alternative to SET is to use an integer to wrap a series of bits. For example, 8 bits can be packed into a TINYINT and used in bitwise operations. You can simplify this by defining named constants for each bit in your application.

Compared with SET, the main advantage of this method is that you can change the "enumeration" value represented by the field without using ALTER TABLE. The disadvantage is that the query statement is more difficult to write and more difficult to understand (when the 5th bit What does it mean when a bit is set? ). Some people are very comfortable with this approach, and some are not, so whether or not to adopt this technique comes down to personal preference.

Selecting the identifier (identifier)

It is very important to choose the appropriate data type for the identifier (identity column).

Generally speaking, it is more likely to use the identity column to compare with other values, or to find other columns through the identity column.

When choosing the type of identity column, you not only need to consider the storage type, but also how Mysql performs calculations and comparisons on this type.

Once you select a type, make sure to use the same type in all related tables.

On the premise that the range requirements of the value can be met and room for future growth is reserved, the smallest data type should be selected.

  • Integers are usually the best choice for identity columns because they are fast and can be used with AUTO_INCREMENT.

  • ENUM and SET are the worst choices;

  • If possible, avoid using strings as identity columns because they are very Consumes space and is generally slower than numeric classes.

Special type data

Some types of data are not directly consistent with built-in types. Timestamps with sub-second precision are one example.

Another example is that people usually use VARCHAR(15) to store IP addresses. However, they are actually 32-bit unsigned integers, not strings. The decimal point is used to divide the field into four segments for ease of reading. So IP addresses should be stored as unsigned integers. MySQL provides INET_ATON() and INET_NTOA() functions to convert between these two representation methods.

Select optimized data types

MySQL supports many data types, and choosing the correct data type is crucial to obtaining high performance. No matter what type of data you store, here are a few principles to help make better choices.

Smaller is usually better

In general, you should try to use the smallest data type that can store data correctly (for example, you only need to store 0-200, tinyint unsigned is better). Smaller data types are generally faster because they occupy less disk, memory, and CPU cache, and require fewer CPU cycles to process.

Simple is good

Operations on simple data types generally require fewer CPU cycles. For example, integer operations are cheaper than character operations because the character set and collation rules make string comparisons more complex than integer comparisons. Here are two examples: one is that MySQL's built-in types (such as date, time, datetime) should be used instead of strings to store date and time, and the other is that integers should be used to store IP addresses.

Try to avoid using NULL

Many tables contain NULL-able columns, even if the application does not need to save NULL. This is because NULL is the default attribute of the column. It's usually best to specify columns as NOT NULL unless you really need to store NULL values.

如果查询中包含可为NULL的列,对MySQL来说更难优化,因为可为NULL的列使得索引、索引统计和值比较都更复杂。可为NULL的列会使用更多的存储空间,在MySQL里也需要特殊处理。当可为NULL的列被索引时,每个索引记录需要一个额外的字节,在MyISAM中甚至还可能导致固定大小的索引(例如只有一个整数列的索引)变成可变大小的索引。

通常把可为NULL的值改为NOT NULL带来的性能提升比较小,所以(调优时)没有必要首先在现有的schema中查找并修改掉这种情况,除非确定这会导致问题。但是,如果计划在列上建立索引,就应该避免设计成可为NULL的列。

当然也有例外,例如值得一提的是,InnoDB使用单独的位(bit)存储NULL值,所以对于稀疏数据(大部分值为NULL,只有少数行为非NULL的值)有良好的空间效率。但这一点不适用于MyISAM。


整数类型

如果存储整数,可以使用这几种整数类型:TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT。分别使用8,16,24,32,64位存储空间。它们的存储范围从 -2的(N-1)次方 到 2的(N-1)次方-1,其中N为存储空间的位数。

整数类型有可选的UNSIGNED属性,表示不允许负值,这大致可以使正数的上限提高一倍,例如TINYINT UNSIGNED可以存储的范围是0-255,而TINYINT 的存储范围是-128~127。

MySQL可以为整数类型指定宽度,例如INT(11),对大多数应用这是没有意义的:他不会限制值得合法范围,知识规定了MySQL的一些交互工具(例如MySQL命令行客户端)用来显示字符的个数。对于存储来说,INT(1)和INT(20)是相同的。

实数类型

实数是带有小数部分的数字。然而,它们不只是为了存储小数部分,也可以使用DECIMAL存储比BIGINT还大的整数。MySQL既支持精确类型,也支持不精确类型。

FLOAT 和 DOUBLE 类型支持使用标准的浮点运算进行近似计算。如果需要知道浮点运算时怎么计算的,则需要研究所使用的平台的浮点数的具体实现。

DECIMAL 类型用于存储精确的小数。但因为CPU不支持对DECIMAL的直接计算,所以在MySQL5.0及更高版本中,MySQL服务器自身实现了DECIMAL的高精度计算。相对而言,这比CPU直接支持原生浮点数运算要慢。

浮点和DECIMAL类型都可以指定精度。对于DECIMAL列,可以指定小数点前后所允许的最大位数。这会影响列的空间消耗。

浮点类型在存储同样范围的值时,通常比DECIMAL使用更少的空间。FLOAT使用4个字节存储。DOUBLE占用8个字节,相比FLOAT有更高的精度和更大的范围。

因为需要额外的空间和计算开销,所以应该尽量只在对小数进行精确计算时才使用DECIMAL——例如存储财务数据。但数据量比较大的时候,可以考虑使用BIGINT代替DECIMAL,将需要存储的货币单位根据小数点的位数乘以相应的倍数即可。假设要存储财务数据精确到万分之一分,则可以把所有金额乘以100W,然后将结果存储在BIGINT里,这样可以同时避免浮点存储计算不精确和DECIMAL精确计算代价高的问题。

字符串类型

下面的描述假设使用的存储引擎是InnoDB/或者MyISAM。如果不是这两种存储引擎的,请参考所使用的存储引擎的文档。

VARCHAR and CHAR

VARCHAR: It is more space-saving than fixed-length types because it only uses the necessary space. VARCHAR saves space, so it also helps performance. However, since the row is variable in length, the row may become longer than the original during UPDATE, which requires additional work.

It is appropriate to use VARCHAR in the following situations: the maximum length of the string is much larger than the average length; the column is updated less, so fragmentation is not a problem; a complex character set like UTF-8 is used, and each character Use a different number of bytes.

In versions 5.0 or higher, MySQL preserves trailing spaces when storing and retrieving. InnoDB is more flexible. It can store long VARCHAR as BLOB

CHAR: fixed length. When storing CHAR values, MySQL will delete all trailing spaces. The fixed-length CHAR type is less prone to fragmentation, and for very short columns, CHAR is also more efficient in storage space than VARCHAR, which has one or two extra bytes of record length. CHAR is suitable for storing very short strings, or when all values ​​are close to the same length. For example: CHAR is very suitable for storing the MD5 value of a password because it is a fixed-length value. CHAR will be padded with spaces as needed to facilitate comparison.

Similar types to CHAR and VARCHAR are BINARY and VARBINARY, which store binary strings. Binary strings store bytecodes rather than characters.

The advantages of binary comparison are not only reflected in case sensitivity. MySQL compares BINARY strings one byte at a time and compares them based on the value of the byte. Therefore, binary is much simpler than characters, so it is faster.

BLOB and TEXT types

BLOB and TEXT types: BLOB and TEXT are both string data types designed to store large data, and are stored in binary and character modes respectively. When the BLOB and TEXT values ​​are too large, InnoDB will use a dedicated "external" storage area for storage. The original table field storage pointer points to the external storage area.

MySQL sorts BLOB and TEXT columns differently from other types: it only sorts the first max_sort_length bytes of the column instead of the entire string. If you only need to sort the first small number of characters, you can reduce the configuration of max_sort_length, or use ORDER BY SUSTRING(column, length).

MySQL cannot index the full length strings of BLOB and TEXT columns, nor can it use these indexes to eliminate sorting.

Use enumeration (ENUM) instead of string type

You can use enumeration (ENUM) instead of string type. Many times it is recommended to use enumeration columns instead of commonly used string types.

(1) The enumeration column can store some unique strings into a predefined collection.
(2) Mysql is very compact when storing enumerations and will be compressed into one or two bytes depending on the number of list values.
(3) Mysql will internally save the position of each value in the list as an integer, and save the "lookup table" of the "number-string" mapping relationship in the .frm file of the table.

Note: One surprising thing is that the enumeration fields are sorted by internally stored integers instead of defined strings.

Note: The worst thing about enumerations is that the string list is fixed. Adding or deleting strings must use ALTER TABLE. Therefore, for a series of strings that may change in the future, use enumerations and Not a good idea unless you accept that you can only add elements at the end of the list.

Note: There is some overhead for enumeration columns because MySQL saves each enumeration value as an integer and must do a lookup to convert it to a string.

Date and time type

For details on data types and usage, please see: http://www.php.cn/

Mysql has many types that can save date and time values, such as YEAR and DATE.

The minimum time granularity that Mysql can store is seconds (MariaDB supports microsecond-level event types). But MySQL can also perform ad hoc operations with microsecond level granularity.

Most of the time there are no alternatives to the type, so there is no question of what is the best choice.

The only question next is what needs to be done when saving the date and time.

DATETIME

(1) This type can save a wide range of values, from 1001 to 9999, with a precision of seconds. (2) DATETIME encapsulates time and date into integers in the format YYYYMMDDHHMMSS, regardless of time zone. (3) DATETIME uses 8 bytes of storage space.

TIMESTAMP

(1) The TIMESTAMP type stores the number of seconds since midnight on January 1, 1970, which is the same as the UNIX timestamp. (2) TIMESTAMP only uses 4 bytes of storage space, so its range is much smaller than DATETIME. (3) The value displayed by TIMESTAMP depends on the time zone.

Comparison between DATETIME and TIMESTAMP:

(1) By default, if the value of the first TIMESTAMP column is not specified when inserting, Mysql sets the value of this column to the current time. (This is a feature that DATETIME does not have) (2) When inserting a row of records, Mysql will also update the value of the first TIMESTAMP column by default. (3) The TIMESTAMP column defaults to NOT NULL, which is different from other data types.

Summarize

(1)除了特殊行为之外,通常也应该尽可能使用TIMESTAMP,因为它比DATETIME空间效率更高。 (2)一般来讲不建议把UNIX时间戳保存为整数值,这不会带来任何收益,用整数保存时间戳格式通常不方便处理。 (3)如果需呀存储比秒更小粒度的日期和时间值,可以使用BIGINT类型存储微秒级别的时间戳,或者使用DOUBLE存储秒之后的小数部分,也可以用MariaDB替代Mysql。

位数据类型

MySQL有少数几种存储类型使用紧凑的位存储数据。所有这些位类型,不管底层存储格式和处理方式如何,从技术上来说都是字符串类型的。

BIT

可以使用BIT列在一列中存储一个或多个true/false值。BIT(1)定义了一个包含单个位的字段,BIT(2)存储2个位,依次类推。BIT列的最大长度是64位。

如果想在一个bit的存储空间中存储一个true/false值,另一个方法是创建一个可以为空的CHAR(0)列。该列可以保存空值(NULL)或者长度为零的字符串(空字符串)。

SET

If you need to save many true/false values, consider merging these columns into a SET data type, which is represented internally by MySQL as a set of packed bits. This effectively utilizes storage space, and MySQL has functions like FIND_IN_SET() and FIELD() for easy use in queries. Its main disadvantage is that changing column definitions is expensive: ALTER TABLE is required, which is a very expensive operation for large tables. Generally speaking, it is also not possible to search through indexes on SET columns.

An alternative to SET is to use an integer to wrap a series of bits. For example, 8 bits can be packed into a TINYINT and used by bitwise operations. You can simplify this by defining named constants for each bit in your application.

Compared with SET, the main advantage of this method is that you can change the "enumeration" value represented by the field without using ALTER TABLE. The disadvantage is that the query statement is more difficult to write and more difficult to understand (when the 5th bit What does it mean when a bit is set? ). Some people are very comfortable with this approach, and some are not, so whether or not to adopt this technique comes down to personal preference.

Selecting the identifier (identifier)

It is very important to choose the appropriate data type for the identifier (identity column).

Generally speaking, it is more likely to use the identity column to compare with other values, or to find other columns through the identity column.

When choosing the type of identity column, you not only need to consider the storage type, but also how Mysql performs calculations and comparisons on this type.

Once you select a type, make sure to use the same type in all related tables.

On the premise that the range requirements of the value can be met and room for future growth is reserved, the smallest data type should be selected.

  • Integers are usually the best choice for identity columns because they are fast and can be used with AUTO_INCREMENT.

  • ENUM and SET are the worst choices;

  • If possible, avoid using strings as identity columns because they are very Consumes space and is generally slower than numeric classes.

Special type data

Some types of data are not directly consistent with built-in types. Timestamps with sub-second precision are one example.

Another example is that people usually use VARCHAR(15) to store IP addresses. However, they are actually 32-bit unsigned integers, not strings. Dividing the field into four segments using decimal points is for ease of reading. So IP addresses should be stored as unsigned integers. MySQL provides INET_ATON() and INET_NTOA() functions to convert between these two representation methods.

The above is the content of [MySQL] MySQL data type optimization. For more related content, please pay attention to the PHP Chinese website (www.php.cn)!


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn