MySQL VARCHAR Lengths and UTF-8: Bytes versus Characters
When creating a VARCHAR field in a MySQL table, it's crucial to understand how the specified length is interpreted. In MySQL versions prior to 4.1, VARCHAR lengths were defined in bytes. However, from MySQL 4.1 onwards, lengths are counted in characters.
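A VARCHAR(32) column in a UTF-8 table therefore holds up to 32 characters, not 32 bytes. UTF-8 is a variable-length encoding, so the storage needed for those 32 characters depends on their content: each character occupies between 1 and 4 bytes. Note that MySQL's utf8 character set (an alias for utf8mb3) stores at most 3 bytes per character and cannot hold characters outside the Basic Multilingual Plane, while utf8mb4 covers the full 4-byte range.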
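The difference between the two counts is easy to see outside the database. The following Python sketch is only an illustration, not part of MySQL; the helper name char_and_byte_lengths and the sample strings are made up for this example. It compares the number of characters in a string with the number of bytes in its UTF-8 encoding, which is the same distinction MySQL draws between CHAR_LENGTH() and LENGTH().

```python
def char_and_byte_lengths(text):
    """Return (character count, UTF-8 byte count) for a string."""
    # len() on a Python str counts code points; encoding to UTF-8
    # gives the bytes that would actually be stored.
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample values: ASCII, accented Latin, CJK, and an emoji.
samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\U0001F600"]

for sample in samples:
    chars, size = char_and_byte_lengths(sample)
    print(f"{sample!r}: {chars} characters, {size} bytes")
```

All four samples fit in a VARCHAR(5) column by character count, even though their byte sizes differ. The last sample is a 4-byte character, so it would need utf8mb4 rather than utf8.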
The official MySQL documentation for version 5 states:
"MySQL interprets length specifications in character column definitions in character units. This applies to CHAR, VARCHAR, and the TEXT types."
The maximum length you can declare, however, does depend on the character set. In MySQL 5.0.3 and later, the effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, shared among all columns in the table) and to the character set used.
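For example, MySQL's utf8 character set (utf8mb3) can require up to 3 bytes per character, so a VARCHAR column using it can be declared with a maximum of 21,844 characters when it is the only column in the table. 21,844 multiplied by 3 is 65,532 bytes. The remaining 3 bytes of the 65,535-byte row limit are not enough for another 3-byte character once the column's overhead is counted: 2 bytes for the length prefix and, if the column is nullable, 1 byte for the NULL flag. With utf8mb4 (up to 4 bytes per character), the same calculation gives a maximum of 16,383 characters.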
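The arithmetic can be written out as a short Python sketch. It assumes a table whose only column is the VARCHAR in question, a 2-byte length prefix, and 1 extra byte when the column is nullable; the function and constant names are hypothetical and chosen for this illustration.

```python
MAX_ROW_BYTES = 65535        # MySQL row size limit, shared by all columns
LENGTH_PREFIX_BYTES = 2      # length prefix for a VARCHAR longer than 255 bytes


def max_varchar_chars(max_bytes_per_char, nullable=False):
    """Largest declarable VARCHAR length for a single-column table.

    Assumes the column is the only one in the row, so the whole
    row budget is available to it.
    """
    overhead = LENGTH_PREFIX_BYTES + (1 if nullable else 0)
    return (MAX_ROW_BYTES - overhead) // max_bytes_per_char


print(max_varchar_chars(3))  # utf8 (utf8mb3)
print(max_varchar_chars(4))  # utf8mb4
print(max_varchar_chars(1))  # single-byte set such as latin1
```

Any other columns in the table reduce the available bytes, so the practical limit in a real schema is usually lower than these figures.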