In-depth understanding of MySQL data row overflow
This article takes an in-depth look at MySQL data row overflow. It should be a useful reference for anyone who runs into this class of problem.
1. Let’s start with common errors
Let’s start with a common SQL error message:
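The exact wording varies with the MySQL version, but the class of error discussed here typically reads like this:
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs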
I am sure you have run into this type of error many times, especially in business groups like OMG whose core work is content production: for content storage, the data is inevitably "big", and that is an unavoidable topic. The "bigness" here goes far beyond a large total storage footprint. It also covers large individual (table) fields, long data retention, heavy data redundancy, bloat caused by mixing hot and cold data, access peaks that swing sharply with hot spots, and complex logic that amplifies the storage pressure. Back to the error at hand, let's first look at the structure of the table involved:
Seeing this, different readers will no doubt have different opinions. We will not weigh the pros and cons of every possible approach here, but only describe the two that are used most often:
Follow the error's hint and change the two large varchar(22288) columns into text or blob.
Based on the business characteristics, reduce the varchar storage length, or split the column into several smaller varchar and char columns according to business rules.
These two approaches each have their own pros and cons. Changing the field to text or blob not only increases the storage capacity of the data, it also means that any index on the field can only be a prefix or full-text index. If the business stores its data in JSON format, the JSON data type supported since MySQL 5.7 is a good choice, because individual keys can be queried and returned. Shrinking or splitting the column, in turn, depends much more on the business scenario and logic: the business logic that uses the field has to be modified, and the engineering effort has to be evaluated.
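As a rough sketch of the two options (the table and column names here are placeholders, not the actual business schema):
-- Option 1: follow the error's hint and switch the large column to a large-object type
ALTER TABLE t_content MODIFY COLUMN big_text_col TEXT;
-- Option 2: shrink the column, or split it into smaller pieces according to business rules
ALTER TABLE t_content MODIFY COLUMN big_text_col VARCHAR(2000), ADD COLUMN big_text_col_ext VARCHAR(2000);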
Next, let's dig into some easily confused notions around the limit value "65535".
1. "65535" is not the maximum limit of N in a single varchar(N), but the total bytes of non-large field type fields in the entire table.
Every table (regardless of storage engine) has a maximum row size of 65,535 bytes. Storage engines may place additional constraints on this limit, reducing the effective maximum row size.
2. The character set affects the maximum storage a column can occupy. For example, UTF8 characters need up to 3 bytes each, so a VARCHAR(255) CHARACTER SET UTF8 column can occupy 255 × 3 = 765 bytes, which means the table cannot contain more than 65,535 / 765 = 85 such columns. GBK, as a double-byte character set, follows the same logic.
3. For variable-length columns, the bytes that record the actual length of the value must also be counted. For example, a VARCHAR(255) CHARACTER SET UTF8 column needs two extra bytes to store the value's length, so the column can require up to 767 bytes. In fact a single such column can hold at most 65,533 bytes of data, with the remaining two bytes storing the length information.
4. BLOB, TEXT, and JSON columns are handled differently from varchar and char fields: their length information is stored separately from the row, so they can genuinely hold up to 65,535 bytes of data each.
5. Allowing columns to be NULL lowers the maximum number of columns permitted:
For InnoDB tables, NULL and NOT NULL columns use the same amount of storage.
For MyISAM tables, NULL columns need extra space to record whether the value is NULL; each NULL column costs one extra bit, rounded up to the nearest byte. The maximum row length is calculated as follows:
row length = 1 + (sum of column lengths) + (number of NULL columns + delete_flag + 7) / 8 + (number of variable-length columns)
For static tables, delete_flag = 1: the static row format reserves a bit in each row to mark whether the row has been deleted.
For dynamic tables, delete_flag = 0, because the flag is stored in the dynamic row header instead.
6. For InnoDB tables, NULL and NOT NULL columns use the same amount of storage.
7. InnoDB allows a single table to have up to 1,000 columns
8. A varchar primary key supports no more than 767 bytes, i.e. 767 / 2 ≈ 383 double-byte characters or 767 / 3 = 255 three-byte characters; GBK is a double-byte character set and UTF8 a three-byte one.
9. Different engines have different index restrictions:
In InnoDB, each index column can be at most 767 bytes long, and the combined length of all columns in one index cannot exceed 3072 bytes.
In MyISAM, each index column can be at most 1000 bytes long, and the combined length of all columns in one index cannot exceed 1000 bytes.
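A quick way to see point 1 in action, with a hypothetical table using utf8 as in point 2:
CREATE TABLE t_row_limit (
    c1 VARCHAR(12000) NOT NULL,  -- 12000 x 3 = 36000 bytes under utf8
    c2 VARCHAR(12000) NOT NULL   -- another 36000 bytes, more than 65535 in total
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- This fails with ERROR 1118: the combined byte length of the non-large-object columns
-- exceeds 65535, even though each VARCHAR(N) is legal on its own.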
Now let's look at the business fault encountered today. A large number of the following errors appeared in an online service, leaving the application unable to write data:
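Judging from the analysis that follows, the error is the InnoDB-level variant of the row-size error; depending on the version it reads roughly as follows:
ERROR 1118 (42000): Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.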
Going by the error message and ordinary reasoning, our first reaction was that the business had one of the following problems:
The columns defined in the table structure exceed the limit.
The data inserted into some column exceeds the maximum length defined for that column.
Next, we checked the business's table structure, which looks like this:
The first cause was quickly ruled out, because the business saw no error when the table was created: if the non-large-object columns of a table added up to more than 65,535 bytes, the error would already appear at CREATE TABLE time, whereas here the error only shows up on writes. Moreover, the table structure contains a large number of mediumblob columns, and the non-large-object columns add up to far less than 65,535 bytes.
Then, going by the specific SQL the business provided, the non-large columns appversion, datadata, elt_stamp and id were not over their limits either. A mediumblob column can store up to 16 MB, and the business data was nowhere near that size. Following the error message's suggestion, I changed non-large columns such as appversion, datadata, elt_stamp and id to blob, but that did not solve the problem (and, per the analysis above, it could not have been the source of the problem).
After calming down, I noticed a detail that had been overlooked: the business failure rate was not 100%, meaning some requests still succeeded. Comparing the successful and failed SQL showed that the amount of data, specifically in the mediumblob columns, did indeed differ. So the first suspect was max_allowed_packet: had it been tuned down so that a single oversized request was rejected? I checked the configured value (shown below); it is set to 1G, and the SQL statements were nowhere near that long, so this cause was ruled out as well.
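For reference, the configured value can be checked with a statement along these lines:
SHOW VARIABLES LIKE 'max_allowed_packet';
-- or equivalently: SELECT @@max_allowed_packet;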
Having eliminated these common causes, we turned to the limit imposed by another parameter: innodb_page_size. Its default value is 16K, and InnoDB requires each page to hold at least two rows of data, so a single row can carry at most about 8K of data.
Checking the table showed that its Row_format is Compact, so we can infer the cause of the problem: under the default Compact (Antelope) storage format, InnoDB stores the first 768 bytes of each blob column in the data page itself, so once enough blob columns are populated, a single row exceeds 8K and the write fails. For example, if 11 mediumblob columns are filled in one row (an illustrative count, not the actual business table), the in-page portion is already 11 × 768 = 8448 bytes, above the roughly 8K (8126-byte) per-row limit of a 16K page, while a request that fills only a few of them stays under the limit and succeeds. Comparing the successful and failed business SQL confirmed this inference. So how do we solve the problem?
Split the table on the business side and store the large columns in a separate table.
Change the Row_format storage format to solve the problem.
Because the table does not hold that many rows and the business logic is not well suited to splitting, we decided to solve the problem through Row_format.
The Barracuda file format introduces two new row record formats, Compressed and Dynamic. Both use a complete row-overflow approach for storing BLOB data: only a 20-byte pointer is kept in the data page, while the actual data is stored in BLOB pages. A further feature of the Compressed row format is that the stored data is compressed with the zlib algorithm.
The related change operations are relatively simple:
1. Modify the MySQL global variable:
SET GLOBAL innodb_file_format='Barracuda';
2. Then change the existing table's attributes in place:
ALTER TABLE tablename ROW_FORMAT=COMPRESSED;
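A quick sanity check after the change (the table name is a placeholder):
SHOW VARIABLES LIKE 'innodb_file_format';  -- should now report Barracuda
SHOW TABLE STATUS LIKE 'tablename';        -- the Row_format column should show Compressed
Note that on MySQL 5.6 and earlier, innodb_file_per_table also needs to be ON for the compressed format to actually take effect.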
Through this case, we can extract two points worthy of in-depth study:
1. About innodb_page_size
Starting with MySQL 5.6, innodb_page_size can set the InnoDB data page size to 8K or 4K; the default is 16K. This parameter must be set in my.cnf when the instance is initialized. If the tables have already been created and the value is changed afterwards, MySQL will report an error on startup.
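The page size currently in effect can be confirmed with:
SHOW VARIABLES LIKE 'innodb_page_size';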
So what if you need a different page size before version 5.6? The only option is to patch the source code and rebuild MySQL.
UNIV_PAGE_SIZE is the data page size, 16K by default, and it must be a power of two: 4K, 8K, 16K, 32K or 64K. When changing UNIV_PAGE_SIZE you also have to change UNIV_PAGE_SIZE_SHIFT, since UNIV_PAGE_SIZE = 2^UNIV_PAGE_SIZE_SHIFT (a shift of 14, for instance, gives the default 16K page).
Now for the impact of different innodb_page_size values on MySQL performance. The test table contains 100 million records and the file size is 30G.
①Read and write scenario (50% read and 50% write)
16K: CPU pressure is low, about 20% on average.
8K: CPU pressure is 30%~40%, but select throughput is higher than with 16K.
②Read scenario (100% read)
The difference between 16K and 8K is not obvious
Managing pages in the InnoDB buffer pool also has a cost: for a buffer pool of a given size, the more pages there are, the longer the management lists become. So when rows are relatively long (bulk inserts of large values), a larger page is better for speed: more rows fit in one page, each IO writes more data, and fewer IOPS are needed to write the same volume. When the row length exceeds 8K on a 16K page, some string columns are forcibly converted to TEXT and the string body is moved to an extension page, which costs an extra IO to read the column; a larger page supports longer rows, and a 64K page can hold rows of roughly 32K without resorting to extension pages. For random reads and writes of short rows, however, such a large page is a poor fit: IO efficiency drops, because each large IO reads only a small amount of useful data.
2. About Row_format
The InnoDB storage engine stores records in rows. Before InnoDB 1.0.x, it offered two row formats for storing row data: Compact and Redundant. The innodb_plugin in MySQL 5.1 introduced a new file format, Barracuda, which brings two new row formats, Compressed and Dynamic; Compact and Redundant are collectively called Antelope. You can run SHOW TABLE STATUS LIKE 'table_name'; to see which row format a table uses: the Row_format column shows the row record structure currently in use.
In MySQL 5.6 the default is Compact; from MySQL 5.7.9 onward the default row format is determined by the innodb_default_row_format variable, whose default value is DYNAMIC. You can also specify ROW_FORMAT=DYNAMIC when creating a table, which lets you choose the storage format per table. To change an existing table's row format to Compressed or Dynamic, you must first set the file format to Barracuda (SET GLOBAL innodb_file_format=Barracuda;) and then run ALTER TABLE tablename ROW_FORMAT=COMPRESSED; for the change to take effect; otherwise the modification silently does nothing.
If a blob column value is longer than 768 bytes, the first 768 bytes stay in the data page and the remainder is placed on an overflow page (off-page), as shown below:
The blob-like, variable-length column types referred to here include blob, text and varchar. A varchar column value also spills to overflow pages once its length exceeds a certain threshold N. Under the latin1 character set, N can be estimated as follows: the default InnoDB block size is 16KB, and because an InnoDB table is an index-organized table whose leaf nodes form a doubly linked list, each page must hold at least two rows of records; this is what limits a single row to roughly 8K. Subtracting the bytes occupied by the other column values gives approximately N.
Dynamic and Compressed use complete row overflow for blobs: the clustered index record (data page) keeps only a 20-byte pointer, pointing to the overflow segment where the value is actually stored:
With the dynamic row format, whether a column is stored off-page depends mainly on the row size: the longest columns are moved off-page until the data page can hold at least two rows. TEXT/BLOB columns are kept entirely in the data page only when they are small (40 bytes or less).
Compressed is physically similar to Dynamic, but the table's data rows are compressed with the zlib algorithm before being stored. When there are many long blob-type columns it can reduce off-page usage and shrink storage (by roughly 50%; see the test results in the earlier report "[Database Evaluation Report] Issue 3: InnoDB, TokuDB Compression Performance"), at the cost of more CPU. The buffer pool may also hold both the compressed and uncompressed versions of the data, so it uses more memory as well.
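A minimal sketch of declaring such a table (the names and the KEY_BLOCK_SIZE value are illustrative):
CREATE TABLE t_compressed (
    id BIGINT NOT NULL PRIMARY KEY,
    payload MEDIUMBLOB
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
-- On MySQL 5.6 and earlier this additionally requires innodb_file_per_table=ON
-- and innodb_file_format=Barracuda, as described above.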
Finally, drawing on "High Performance MySQL", here are some suggestions for using variable-length field types such as BLOB:
① Large fields can waste a lot of space in InnoDB. For example, if the stored value is just one byte longer than what the row can hold, a whole page is used to store the remaining bytes, wasting most of that page. Likewise, a value that only slightly exceeds the size of 32 pages may actually occupy 96 pages.
② An overly long value can prevent an index from being used for the WHERE condition of a query, making execution very slow. MySQL has to read the whole column before applying the WHERE condition, so it may ask InnoDB to read a large amount of extended (off-page) storage, check the WHERE condition, and then discard all the data it did not need.
③ If a table has many large fields, it is best to combine them and store them in a single column: it is better for all the large fields to share one piece of extended storage than for each field to have its own page.
④ Use COMPRESS() to compress large values before saving them as BLOBs, or compress them in the application before sending them to MySQL; this can bring significant space and performance gains (see the sketch after this list).
⑤ Extended (off-page) storage disables the adaptive hash index, because the entire length of the column has to be compared to confirm whether the data matches.
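A minimal sketch of suggestion ④ (the table and column names are placeholders):
-- on write: compress inside MySQL before storing the value into a BLOB column
INSERT INTO t_doc (id, body_blob) VALUES (1, COMPRESS(@long_text));
-- on read: decompress on the way out
SELECT id, UNCOMPRESS(body_blob) AS body FROM t_doc WHERE id = 1;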
That concludes this in-depth look at MySQL data row overflow.