An introduction to how to optimize batch insertion of data in MYSQL
1. What problems did we encounter
In standard SQL, we usually write the following SQL insert statement.
INSERT INTO TBL_TEST (id) VALUES(1);
This statement also works in MYSQL. But when we need to insert data in batches, such statements cause performance problems. For example, inserting 100,000 rows requires 100,000 INSERT statements, and each statement must be parsed and optimized by the MYSQL server layer before it reaches the storage engine, which does the actual insertion work.
Because of this performance bottleneck, the official MYSQL documentation also recommends batch insertion, that is, inserting multiple rows of values in a single INSERT statement:
INSERT INTO TBL_TEST (id) VALUES (1), (2), (3)
This approach does speed up batch insertion, and the reason is not hard to understand. Fewer INSERT statements are submitted to the server, so the network load is lower. More importantly, although parsing and optimizing a single statement takes slightly longer, each statement now affects many more rows, so the cost per row drops and overall performance improves. According to some reports on the Internet, this method can be dozens of times faster.
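To make the multi-value form concrete, here is a minimal sketch of how such a statement could be assembled on the client side. The helper name `build_multi_insert` is hypothetical and not part of the MYSQL C API; it only builds the SQL text, and it assumes integer values so that no escaping is needed (string values would have to be escaped first, for example with `mysql_real_escape_string`).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds
//   INSERT INTO <table> (<column>) VALUES (v1), (v2), ...;
// for integer values only. Returns an empty string when there are no
// values, because an INSERT with an empty VALUES list is not valid SQL.
std::string build_multi_insert(const std::string& table,
                               const std::string& column,
                               const std::vector<long long>& ids) {
    if (ids.empty()) return "";
    std::string sql = "INSERT INTO " + table + " (" + column + ") VALUES ";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) sql += ", ";              // separator between value lists
        sql += "(" + std::to_string(ids[i]) + ")";
    }
    sql += ";";
    return sql;
}
```

The resulting string would then be sent to the server in one round trip, for example with `mysql_real_query`.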
However, I have also seen several other methods on the Internet, such as prepared (precompiled) SQL statements and batch commits. So how do these methods perform? This article compares them.
2. Test environment and methods
My environment is rather poor: basically an outdated virtual machine with only 2 cores and 6 GB of memory. The operating system is SUSE Linux and the MYSQL version is 5.6.15.
As you can imagine, this machine gives a very low TPS, so the absolute numbers below mean little. The trends, however, are still valid and show how insertion performance changes across methods.
Because of our business characteristics, the table we use is very wide, with 195 fields in total. When a record is full (every field filled in, including the varchar fields), its size is slightly less than 4KB. A typical record is about 3KB.
Based on our practical experience, we are certain that performance improves greatly when a large number of INSERT statements are committed in one transaction. Therefore, all the tests below commit once every 5,000 inserted records.
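The "commit every 5,000 records" rule can be sketched as a small counter that triggers a commit callback. This is an illustration only, not the test program used here: the class name `CommitBatcher` is hypothetical, and the callback is supplied by the caller so that the sketch has no database dependency. With the MYSQL C API, the callback would typically call `mysql_commit`, and autocommit would have to be switched off first with `mysql_autocommit`.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper: counts inserted rows and invokes the commit
// callback once every `batch_size` rows.
class CommitBatcher {
public:
    CommitBatcher(std::size_t batch_size, std::function<void()> commit)
        : batch_size_(batch_size), commit_(std::move(commit)) {}

    // Call once after each successfully inserted row.
    void row_inserted() {
        if (++pending_ >= batch_size_) flush();
    }

    // Commits rows that are still pending; call once after the last row.
    // Does nothing when no rows are pending.
    void flush() {
        if (pending_ == 0) return;
        commit_();
        pending_ = 0;
    }

private:
    std::size_t batch_size_;
    std::function<void()> commit_;
    std::size_t pending_ = 0;
};
```

A caller would construct it as `CommitBatcher batcher(5000, commit_fn)`, call `row_inserted()` after every insert, and call `flush()` at the end so that a final partial batch is not lost.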
Finally, it should be noted that all the following tests are conducted using the MYSQL C API and use the INNODB storage engine.
3. Comparison methods
Ideal test (1) - Method comparison
Purpose: to find the most suitable insertion mechanism under ideal circumstances
Key methods:
1. Each process/thread inserts in primary key order
2. Compare different insertion methods
3. Compare the impact of different numbers of processes/threads on insertion
* "Ordinary method" refers to the case where each INSERT statement inserts only one row of values.
* "Prepared SQL" refers to the use of the prepared statement interface of the MYSQL C API.
* "Multi-value SQL (10 records)" refers to the case where one INSERT statement inserts 10 records. Why 10? Later verification shows that this gives the highest performance.
Conclusion: judging from the trends of the three methods, the multi-value SQL (10 records) method is clearly the most efficient.
Ideal test (2) - Comparison of the number of records per multi-value SQL statement
As the amount of data increases, inserting 10 records per INSERT statement is clearly the most efficient choice in this environment.
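To try different numbers of records per statement, the rows first have to be split into groups of that size. A minimal sketch follows, assuming the rows are represented only by integer keys. The function name `split_into_batches` is hypothetical, and the value 10 comes from the measurement above, so it should be re-measured on other hardware and table layouts.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits `ids` into consecutive groups of at most
// `rows_per_statement` elements, preserving the original order. Each
// group corresponds to one multi-value INSERT statement.
std::vector<std::vector<long long>> split_into_batches(
        const std::vector<long long>& ids, std::size_t rows_per_statement) {
    std::vector<std::vector<long long>> batches;
    if (rows_per_statement == 0) return batches;  // no sensible grouping
    for (std::size_t start = 0; start < ids.size(); start += rows_per_statement) {
        const std::size_t end = std::min(start + rows_per_statement, ids.size());
        batches.emplace_back(ids.begin() + static_cast<std::ptrdiff_t>(start),
                             ids.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

With 25 rows and 10 records per statement this yields three groups of 10, 10 and 5 rows, so the last statement is simply shorter than the others.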
Ideal test (3) - Comparison of number of connections
Conclusion: performance is highest when the number of connections doing the inserts is 2 times the number of CPU cores.
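As a starting point, the number of worker connections can be derived from the core count at run time. The sketch below assumes the "2 times the number of cores" result carries over to the target machine, which should be verified by measurement because it was obtained on a 2-core virtual machine. Both function names are hypothetical.

```cpp
#include <thread>

// Hypothetical helper: applies the "2 x CPU cores" rule from the test above.
// The core count is a parameter so that the rule can be checked directly.
unsigned connections_for_cores(unsigned cores) {
    // std::thread::hardware_concurrency() may return 0 when the core count
    // is unknown; treat that as a single core.
    if (cores == 0) cores = 1;
    return 2 * cores;
}

// Suggested number of insert connections on the current machine.
unsigned suggested_connections() {
    return connections_for_cores(std::thread::hardware_concurrency());
}
```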
General test - Test based on our business volume
Purpose: to check whether the best insertion mechanism is also suitable for ordinary transaction conditions
Key methods:
1. Simulate production data (each record is about 3KB)
2. Each thread inserts with out-of-order primary keys
When primary keys are inserted out of order, performance clearly plummets. This is consistent with what the internal implementation of INNODB would lead us to expect. It is still certain, however, that multi-value SQL (10 records) is the best of the methods.
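One way to keep every thread inserting in primary key order is to give each thread its own contiguous key range. The sketch below is an assumption about how this could be arranged, not a description of the test program used here. The function name `key_range_for_thread` is hypothetical, and the keys are assumed to be plain integers starting at 0.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns the half-open key range [first, last) that
// thread `thread_index` (0-based) should insert, in ascending order, when
// `total_rows` keys are divided among `thread_count` threads.
// Returns {0, 0} for invalid arguments.
std::pair<std::uint64_t, std::uint64_t> key_range_for_thread(
        std::uint64_t total_rows, unsigned thread_count, unsigned thread_index) {
    if (thread_count == 0 || thread_index >= thread_count) return {0, 0};
    const std::uint64_t base = total_rows / thread_count;
    // The first `extra` threads each take one additional row.
    const std::uint64_t extra = total_rows % thread_count;
    const std::uint64_t first =
        thread_index * base + std::min<std::uint64_t>(thread_index, extra);
    const std::uint64_t size = base + (thread_index < extra ? 1 : 0);
    return {first, first + size};
}
```

Each thread then walks its own range from `first` to `last - 1`, so its inserts are always ascending even though several threads write at the same time.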
Stress Test
Purpose: to check whether the best insertion mechanism is also suitable for extreme transaction conditions
Key methods:
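1. Fill every field of each data row (each record is about 4KB)
2. Each thread inserts with out-of-order primary keys
The results follow the same pattern as before: performance drops drastically. This test also confirms that as records get larger (a record may already exceed the size of one page, since the slot and page header information also take up space), page splits and similar effects occur and performance drops.
4. Conclusion
Based on the tests above and our understanding of INNODB, we can draw the following conclusions.
•Use a sequential primary key strategy (for example, an auto-increment primary key, or change the business logic so that records are inserted in primary key order as far as possible)
•Inserting with multi-value SQL (10 records per statement) is the most suitable method
•Keeping the number of processes/threads at about 2 times the number of CPUs is relatively appropriate
5. Appendix
I have found very few complete examples of MYSQL prepared SQL statements on the Internet, so here is a simple one.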
-- Table creation statement
CREATE TABLE tbl_test (
    pri_key    varchar(30),
    nor_char   char(30),
    max_num    DECIMAL(8,0),
    long_num   DECIMAL(12, 0),
    rec_upd_ts TIMESTAMP
);
C++ code (using the MYSQL C API)
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <mysql.h>
#include <sys/time.h>
using namespace std;

#define STRING_LEN 30

char pri_key[STRING_LEN] = "123456";
char nor_char[STRING_LEN] = "abcabc";

// Stores end - begin in result; returns false if end is earlier than begin.
bool SubTimeval(timeval &result, timeval &begin, timeval &end)
{
    if (begin.tv_sec > end.tv_sec) return false;
    if ((begin.tv_sec == end.tv_sec) && (begin.tv_usec > end.tv_usec)) return false;
    result.tv_sec = end.tv_sec - begin.tv_sec;
    result.tv_usec = end.tv_usec - begin.tv_usec;
    if (result.tv_usec < 0) { result.tv_sec--; result.tv_usec += 1000000; }
    return true;
}

int main()
{
    timeval tBegin, tEnd, tDiff;
    const char *precompile_statment2 =
        "INSERT INTO `tbl_test`( pri_key, nor_char, max_num, long_num, rec_upd_ts) "
        "VALUES(?, ?, ?, ?, ?)";

    MYSQL conn;
    mysql_init(&conn);
    if (mysql_real_connect(&conn, "127.0.0.1", "dba", "abcdefg", "TESTDB", 3306, NULL, 0) == NULL)
    {
        fprintf(stderr, " mysql_real_connect, 2 failed\n");
        fprintf(stderr, " %s\n", mysql_error(&conn));
        exit(1);
    }
    // Turn autocommit off so that all rows are committed together by mysql_commit() below.
    mysql_autocommit(&conn, 0);

    MYSQL_STMT *stmt = mysql_stmt_init(&conn);
    if (!stmt)
    {
        fprintf(stderr, " mysql_stmt_init, 2 failed\n");
        // stmt is NULL here, so the error has to be read from the connection.
        fprintf(stderr, " %s\n", mysql_error(&conn));
        exit(1);
    }
    if (mysql_stmt_prepare(stmt, precompile_statment2, strlen(precompile_statment2)))
    {
        fprintf(stderr, " mysql_stmt_prepare, 2 failed\n");
        fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
        exit(1);
    }

    const int FIELD_NUM = 5;
    const int row_count = 3;   // number of rows to insert
    int max_num = 3;           // value for the max_num column
    long long long_num = 3;    // value for the long_num column; DECIMAL(12, 0) can exceed 32 bits

    MYSQL_BIND bind[FIELD_NUM];
    memset(bind, 0, sizeof(bind));

    unsigned long str_length = strlen(pri_key);
    bind[0].buffer_type = MYSQL_TYPE_STRING;
    bind[0].buffer = (char *)pri_key;
    bind[0].buffer_length = STRING_LEN;
    bind[0].length = &str_length;

    unsigned long str_length_nor = strlen(nor_char);
    bind[1].buffer_type = MYSQL_TYPE_STRING;
    bind[1].buffer = (char *)nor_char;
    bind[1].buffer_length = STRING_LEN;
    bind[1].length = &str_length_nor;

    bind[2].buffer_type = MYSQL_TYPE_LONG;
    bind[2].buffer = (char *)&max_num;

    bind[3].buffer_type = MYSQL_TYPE_LONGLONG;
    bind[3].buffer = (char *)&long_num;

    MYSQL_TIME ts;
    memset(&ts, 0, sizeof(ts));   // clear the fields that are not set below
    ts.year = 2002; ts.month = 2; ts.day = 3;
    ts.hour = 10; ts.minute = 45; ts.second = 20;
    bind[4].buffer_type = MYSQL_TYPE_TIMESTAMP;
    bind[4].buffer = (char *)&ts;

    // The buffers stay at the same addresses, so the parameters are bound only once.
    if (mysql_stmt_bind_param(stmt, bind))
    {
        fprintf(stderr, " mysql_stmt_bind_param, 2 failed\n");
        fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
        exit(1);
    }

    gettimeofday(&tBegin, NULL);
    for (int i = 0; i < row_count; i++)
    {
        cout << "before execute\n";
        if (mysql_stmt_execute(stmt))
        {
            fprintf(stderr, " mysql_stmt_execute, 2 failed\n");
            fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
            exit(1);
        }
        cout << "after execute\n";
    }
    mysql_commit(&conn);
    gettimeofday(&tEnd, NULL);
    if (SubTimeval(tDiff, tBegin, tEnd))
        printf("elapsed: %ld.%06ld s\n", (long)tDiff.tv_sec, (long)tDiff.tv_usec);

    mysql_stmt_close(stmt);
    mysql_close(&conn);
    return 0;
}