
How to compress large text storage in MySQL

青灯夜游 (reposted), 2023-02-02 20:23:26


As mentioned in an earlier article, we have a cloud document project whose document snapshots are stored directly in the database as large text. Most snapshot content fields are at the KB level, and some even reach the MB level. Reads have already been optimized with CDN caching (see "Static resource caching tool - CDN"), but writes and storage still need work. Compressing the large text with a suitable algorithm before storing it could save a great deal of database storage space and relieve database I/O pressure.

Analyzing the existing data

select
  table_name as 'Table',
  table_rows as 'Rows',  -- an estimate for InnoDB tables
  truncate(data_length/1024/1024, 2) as 'Data size (MB)',
  truncate(index_length/1024/1024, 2) as 'Index size (MB)',
  truncate(data_free/1024/1024, 2) as 'Fragmentation (MB)'
from
  information_schema.tables
where
  table_schema = 'your_database'  -- placeholder: replace with your schema name
order by
  data_length desc, index_length desc;


Background

How does InnoDB handle rows that do not fit in a 16 KB page?

We all know that InnoDB's default page size is 16 KB. If a row is too long to fit in a page, row overflow occurs and the overflowing part is stored elsewhere, in uncompressed BLOB pages. Because InnoDB stores data in a clustered index (a B+ tree), each page must hold at least two rows, otherwise the B+ tree loses its purpose; this limits the maximum length of a row kept in the page to about 8 KB, slightly less than half a page. (In the COMPACT and REDUNDANT row formats, a large field keeps its first 768 bytes in the data page and the rest overflows to other pages; the data page also stores a 20-byte pointer to the overflow page.)

  • For the DYNAMIC row format: a large-object (TEXT/BLOB) value of 40 bytes or less is always stored in full on the data page. A longer value that InnoDB moves off-page is stored entirely in overflow pages, and the data page keeps only a 20-byte pointer to it. So if every value stored in a large-object column is at most 40 bytes, the column behaves much like varchar(40).
  • innodb-row-format-dynamic:dev.mysql.com/doc/refman/…
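To make the COMPACT versus DYNAMIC difference concrete, here is a minimal Python sketch that estimates how many bytes of one large column value stay on the clustered-index page. It is a simplification under stated assumptions: the `off_page` flag stands in for InnoDB's own decision (which depends on the total row size), per-row overhead such as record headers and length bytes is ignored, and the function name `in_page_bytes` is hypothetical.

```python
PREFIX_BYTES = 768   # COMPACT/REDUNDANT keep this prefix of an overflowed value in-page
POINTER_BYTES = 20   # pointer to the first overflow page
INLINE_LIMIT = 40    # DYNAMIC: TEXT/BLOB values this short always stay in-line


def in_page_bytes(value_len: int, row_format: str, off_page: bool) -> int:
    """Approximate in-page bytes for one variable-length column value.

    `off_page` is an assumption supplied by the caller; in InnoDB the engine
    chooses which columns to move off-page based on the total row size.
    Record headers and length bytes are deliberately ignored.
    """
    fmt = row_format.upper()
    if fmt in ("COMPACT", "REDUNDANT"):
        if not off_page or value_len <= PREFIX_BYTES:
            return value_len  # the whole value fits in the page
        return PREFIX_BYTES + POINTER_BYTES  # 768-byte prefix + 20-byte pointer
    if fmt == "DYNAMIC":
        if not off_page or value_len <= INLINE_LIMIT:
            return value_len  # values of 40 bytes or less are never moved
        return POINTER_BYTES  # the entire value lives in overflow pages
    raise ValueError(f"unsupported row format: {row_format}")


for fmt in ("COMPACT", "DYNAMIC"):
    print(fmt, in_page_bytes(100_000, fmt, off_page=True))
```

For a 100,000-byte value moved off-page, this prints 788 for COMPACT and 20 for DYNAMIC, which is why DYNAMIC leaves more room on each page for other rows.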

Linux Sparse Files & Holes

  • Sparse file: a sparse file is essentially an ordinary file, except that some ranges of it contain only zeros, and those ranges do not occupy disk space
  • File holes: the file offset can be moved beyond the current length of the file; bytes in between that were never written read back as 0. Whether a hole occupies disk space is determined by the operating system

Holes do not occupy disk space, yet the file still presents one contiguous range of offsets to applications.
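Hole behavior is easy to observe directly. The following Python sketch assumes a POSIX system whose file system supports sparse files (for example ext4 or XFS; `st_blocks` is not available on Windows). It seeks past the end of an empty file, writes one byte, and compares the apparent size with the allocated size. The helper names are illustrative.

```python
import os
import tempfile


def make_sparse_file(path: str, hole_size: int) -> None:
    """Create a file whose first `hole_size` bytes are a hole, then 1 byte of data."""
    with open(path, "wb") as f:
        f.seek(hole_size)  # move the offset past the end of the empty file
        f.write(b"x")      # bytes [0, hole_size) are never written: a hole


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) in bytes."""
    st = os.stat(path)
    # st_blocks counts 512-byte units (POSIX only)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")
    make_sparse_file(path, 16 * 1024 * 1024)
    apparent, allocated = apparent_and_allocated(path)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
    with open(path, "rb") as f:
        assert f.read(4096) == bytes(4096)  # the hole reads back as zeros
```

On a file system with sparse-file support, the allocated size should be far smaller than the apparent size of 16 MiB plus one byte.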

Compression schemes provided by InnoDB

Page compression

Applicable scenarios: large data volumes and insufficient disk space, where the load is mainly I/O and the server has plenty of spare CPU capacity.

1) COMPRESSED page compression (ROW_FORMAT=COMPRESSED)

Related documents: dev.mysql.com/doc/refman/…

  • The page compression feature that predates MySQL 5.7 (and is still supported): specify ROW_FORMAT=COMPRESSED when creating the table, and set the compressed page size with KEY_BLOCK_SIZE
  • It has design flaws that can cause noticeable performance degradation. It was originally designed to improve performance and introduced the idea that "the log is the data":
    • When data on a compressed page is modified, the page itself is not rewritten directly; instead, the change is recorded in a modification log stored in the page. This is indeed friendly to data changes, since not every modification requires a compress/decompress cycle.
    • For reads, compressed data cannot be read directly, so a decompressed 16K copy of the page is also kept in memory to serve reads
    • As a result, a page may exist in two versions in the buffer pool (compressed and uncompressed). This causes a serious problem: the number of pages the buffer pool can cache drops sharply, which can significantly degrade database performance
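A rough capacity estimate shows why caching both versions hurts. The sketch below is illustrative only: it assumes a hypothetical 1 GiB buffer pool and KEY_BLOCK_SIZE=8, and it ignores control structures as well as InnoDB's ability to evict just the uncompressed copy of a page under memory pressure while keeping the compressed one. The function name `pages_cached` is hypothetical.

```python
PAGE_SIZE = 16 * 1024  # uncompressed InnoDB page


def pages_cached(pool_bytes: int, key_block_size: int, both_copies: float) -> int:
    """Distinct pages a buffer pool can hold for a ROW_FORMAT=COMPRESSED table.

    Every cached page keeps its compressed copy (key_block_size bytes); a
    fraction `both_copies` of them also keeps a 16 KB uncompressed copy.
    Control structures and adaptive eviction are ignored.
    """
    if not 0.0 <= both_copies <= 1.0:
        raise ValueError("both_copies must be between 0 and 1")
    per_page = key_block_size + both_copies * PAGE_SIZE
    return int(pool_bytes // per_page)


pool = 1 << 30  # hypothetical 1 GiB buffer pool
print("uncompressed table:", pool // PAGE_SIZE)
print("compressed, no 16K copies:", pages_cached(pool, 8 * 1024, 0.0))
print("compressed, every page doubled:", pages_cached(pool, 8 * 1024, 1.0))
```

With every page held in both forms, the pool caches 43690 distinct pages, fewer than the 65536 it would hold for an uncompressed table; compression only lets more pages fit when few uncompressed copies are kept.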

2) TPC (Transparent Page Compression)

Related documents: dev.mysql.com/doc/refman/…

  • Working principle: when a page is written, it is compressed with the specified algorithm and then written to disk; the hole punching mechanism then releases the empty space at the end of the page (this requires the operating system to support holes)
  • ALTER TABLE xxx COMPRESSION='zlib' enables TPC page compression, but it only applies to data written afterwards. To compress the existing data in the whole table, you also need to run OPTIMIZE TABLE xxx
  • Implementation: in the buffer pool, a compressed page is just an ordinary uncompressed 16K page; it is compressed only when it is flushed to disk. The space left after compression is filled with 0x00, and the file system's hole punch trims the file to release the sparse space occupied by those zeros
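The flush path described above can be approximated with Python's standard `zlib` module. This is only a sketch under stated assumptions: `tpc_page_image` is a hypothetical name, zlib at level 6 stands in for whatever algorithm the table is configured with, and real InnoDB pages carry headers and checksums that are omitted here. It shows that the on-disk image keeps its full 16K apparent size, while everything after the compressed bytes is zero filler for the hole punch to release.

```python
import zlib

PAGE_SIZE = 16 * 1024  # InnoDB default page size


def tpc_page_image(page: bytes, level: int = 6) -> tuple[bytes, int]:
    """Sketch of what a TPC flush writes for one page.

    Returns (image, used): `image` is always PAGE_SIZE bytes long, the first
    `used` bytes hold data, and the rest is 0x00 filler that the file system
    can release with a hole punch.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one 16 KB page")
    body = zlib.compress(page, level)
    if len(body) >= PAGE_SIZE:
        # Compression did not help; this sketch simply keeps the page as-is.
        return page, PAGE_SIZE
    image = body + b"\x00" * (PAGE_SIZE - len(body))  # zero-fill the tail
    return image, len(body)


# Demo: a highly repetitive page, similar to large text content.
page = (b"snapshot text " * 2000)[:PAGE_SIZE]
image, used = tpc_page_image(page)
print(f"data bytes: {used}, zero-filled tail: {PAGE_SIZE - used}")
```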


  • TPC works well, but it relies on the operating system's hole punch feature, and the trimmed size must be aligned to the file system block size (typically 4K). For example, if a page compresses to 9K, it actually occupies 12K on disk
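The alignment rule amounts to a ceiling division. A minimal sketch, assuming a 4K file system block and the default 16K page (the FS_BLOCK_SIZE column of INNODB_TABLESPACES reports the real block size); `allocated_after_punch` is a hypothetical helper name:

```python
FS_BLOCK_SIZE = 4 * 1024  # assumed file system block size
PAGE_SIZE = 16 * 1024     # InnoDB default page size


def allocated_after_punch(compressed_len: int,
                          fs_block: int = FS_BLOCK_SIZE,
                          page_size: int = PAGE_SIZE) -> int:
    """Disk space one page still occupies after the hole punch."""
    if not 0 < compressed_len <= page_size:
        raise ValueError("compressed length must be in (0, page_size]")
    blocks = -(-compressed_len // fs_block)  # ceiling division
    return blocks * fs_block                 # only whole blocks can be released


for kb in (3, 9, 12, 13):
    used = allocated_after_punch(kb * 1024)
    print(f"{kb}K compressed -> {used // 1024}K on disk, "
          f"{(PAGE_SIZE - used) // 1024}K released")
```

Under these assumptions, a page that compresses to 9K still occupies 12K, and any page that compresses to more than 12K releases nothing at all.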

Column Compression

MySQL currently has no direct solution for column compression, but there is an indirect workaround: compress and decompress the column at the business layer using the compression functions MySQL provides. That is, to compress a column, call COMPRESS on its content when writing, and call UNCOMPRESS to decompress the stored data when reading.

  • Usage scenario: tables in which some columns hold relatively long values, typically VARCHAR, TEXT, BLOB, JSON, and similar types
  • Related functions:
    • Compression function: COMPRESS()
    • Decompression function: UNCOMPRESS()
    • String length function: LENGTH()
    • Uncompressed string length function: UNCOMPRESSED_LENGTH()
  • Test:
    • Insert data: insert into xxx (c_content) values (compress('xxx....'))
    • Compare compressed and uncompressed lengths: select c_id, uncompressed_length(c_content) uncompress_len, length(c_content) compress_len from xxx
    • Read and decompress: select c_id, uncompress(c_content) from xxx
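To see what COMPRESS() actually stores, here is a Python sketch of its documented result layout: a nonempty string becomes a 4-byte uncompressed length (low byte first) followed by the zlib-compressed data, a "." is appended if the compressed data ends with a space (so it survives trailing-space trimming in CHAR/VARCHAR columns), and an empty string stays empty. The MySQL manual also recommends storing the result in a binary column such as VARBINARY or BLOB, to avoid character set conversion. The function names are hypothetical, and because the sketch assumes zlib's default compression level, its bytes may not match a given server's output exactly.

```python
import struct
import zlib


def mysql_compress(data: bytes) -> bytes:
    """Mimic the documented layout of MySQL's COMPRESS() result."""
    if not data:
        return b""  # empty strings stay empty
    body = zlib.compress(data)
    if body.endswith(b" "):
        body += b"."  # guard against trailing-space trimming
    return struct.pack("<I", len(data)) + body  # 4-byte length, low byte first


def mysql_uncompressed_length(blob: bytes) -> int:
    """Like UNCOMPRESSED_LENGTH(): read the 4-byte length header."""
    if not blob:
        return 0
    return struct.unpack("<I", blob[:4])[0]


def mysql_uncompress(blob: bytes) -> bytes:
    """Like UNCOMPRESS(): inflate the zlib stream after the header."""
    if not blob:
        return b""
    d = zlib.decompressobj()
    out = d.decompress(blob[4:]) + d.flush()  # a trailing "." lands in d.unused_data
    if not d.eof or len(out) != mysql_uncompressed_length(blob):
        raise ValueError("corrupt or truncated COMPRESS() value")
    return out


text = ("cloud document snapshot " * 200).encode()
blob = mysql_compress(text)
print(len(text), len(blob), mysql_uncompressed_length(blob))
```

The 4-byte header is what lets UNCOMPRESSED_LENGTH() answer without decompressing anything.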


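Why are InnoDB's compression features all page-based?

  • Record compression: every read or write of a record requires compression or decompression, which leans heavily on the CPU and gives relatively poor performance
  • Tablespace compression: compresses efficiently, but requires the tablespace file to be static and never grow, which rules it out for most of our scenarios
  • Page compression: improves efficiency while striking a reasonable balance with performance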

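Summary

  • For performance-insensitive business tables such as log, monitoring, and alert tables, where the goal is only to save storage space and the performance impact matters little, COMPRESSED page compression can be used
  • For more critical core tables, TPC compression is the better recommendation
  • Column compression relies heavily on the CPU, performs somewhat worse, requires some changes to application code, and is less flexible; you need to assess the scope of impact and plan the cutover carefully. Its advantage is that the application decides which data gets compressed and controls decompression
  • Page compression needs essentially no application changes, is completely transparent to production traffic, and is a very mature technique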

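Why compress data?

  • Processor and cache speeds have improved faster than disk storage devices, so workloads are often bound by disk I/O. Compression makes data take up less space, which saves disk I/O and reduces network I/O, improving throughput at the cost of some CPU
  • OLTP systems frequently run UPDATE, DELETE, and INSERT operations; compressed tables can reduce their storage footprint and I/O
  • Compression is really a trade-off and does not necessarily improve database performance: the outcome depends on weighing the benefits of compression against the cost of compressing and decompressing. For storage space, however, the benefit is clearly large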

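Simple test

InnoDB transparent page compression (TPC)

Reference: dev.mysql.com/doc/refman/…

Test data

1) Create the tables

  • create table table_origin ( ...... ) comment 'test original table';
  • create table table_compression_zlib ( ...... ) comment 'test compressed table (zlib)' compression = 'zlib';
  • create table table_compression_lz4 ( ...... ) comment 'test compressed table (lz4)' compression = 'lz4';

2) Write 100,000 rows of test data into each table

Compression ratio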

SELECT NAME, FS_BLOCK_SIZE, FILE_SIZE, ALLOCATED_SIZE
FROM information_schema.INNODB_TABLESPACES WHERE NAME like 'test_compress%';


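  • FS_BLOCK_SIZE: the file system block size, i.e. the unit used for hole punching
  • FILE_SIZE: the apparent size of the file, i.e. its maximum (uncompressed) size
  • ALLOCATED_SIZE: the actual size of the file, i.e. the amount of space allocated on disk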

Compression ratio (ALLOCATED_SIZE / FILE_SIZE, lower is better):

  • zlib: 1320636416 / 3489660928 ≈ 37.8%
  • lz4: 1566949376 / 3489660928 ≈ 44.9%
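Each ratio is a single division; the sketch below reproduces them from the reported ALLOCATED_SIZE values and the FILE_SIZE denominator used above, and also expresses them as space saved. The dictionary layout is just an illustrative choice.

```python
FILE_SIZE = 3_489_660_928  # apparent size used as the denominator above (bytes)
ALLOCATED = {              # ALLOCATED_SIZE values from the test above (bytes)
    "zlib": 1_320_636_416,
    "lz4": 1_566_949_376,
}


def ratio(allocated: int, file_size: int) -> float:
    """Allocated / apparent size: the fraction of disk space still used."""
    return allocated / file_size


for algo, allocated in ALLOCATED.items():
    r = ratio(allocated, FILE_SIZE)
    print(f"{algo}: ratio {r:.1%}, saved {1 - r:.1%}, "
          f"{(FILE_SIZE - allocated) / 2**30:.2f} GiB freed")
```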

Time taken

  • Inserting 100,000 records in a loop
    • Original table: 918275 ms
    • zlib: 878540 ms
    • lz4: 875259 ms
  • Querying 100,000 records in a loop
    • Original table: 332519 ms
    • zlib: 373387 ms
    • lz4: 343501 ms
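To compare the timings, it helps to express each one relative to the original table. The sketch below only does arithmetic on the figures reported above; the per-record figure simply divides each total by 100,000.

```python
TIMINGS_MS = {  # loops of 100,000 operations, figures from the test above
    "insert": {"origin": 918_275, "zlib": 878_540, "lz4": 875_259},
    "query": {"origin": 332_519, "zlib": 373_387, "lz4": 343_501},
}


def relative_change(value: float, baseline: float) -> float:
    """Percentage change versus the uncompressed baseline (negative = faster)."""
    return (value - baseline) / baseline * 100


for op, runs in TIMINGS_MS.items():
    base = runs["origin"]
    for algo in ("zlib", "lz4"):
        print(f"{op:6} {algo:4} {relative_change(runs[algo], base):+.1f}%  "
              f"({runs[algo] / 100_000:.2f} ms per record)")
```

On these numbers, inserts were about 4.3% (zlib) and 4.7% (lz4) faster than on the original table, while queries were about 12.3% (zlib) and 3.3% (lz4) slower.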


Statement:
This article is reproduced from juejin.cn.