search
HomeDatabaseMysql TutorialAn introduction to how to optimize batch insertion of data in MYSQL

I have also seen several other methods on the Internet, such as preprocessing SQL and batch submission. So how do these methods perform? This article will make a comparison of these methods

1. What problems did we encounter

In standard SQL, we usually write the following SQL insert statement.


INSERT INTO TBL_TEST (id) VALUES(1);

Obviously, this method is also feasible in MYSQL. But when we need to insert data in batches, such statements will cause performance problems. For example, if you need to insert 100,000 pieces of data, you will need 100,000 insert statements. Each statement needs to be submitted to the relational engine for parsing and optimization before it reaches the storage engine to do the actual insertion work.

It is precisely because of the performance bottleneck problem that the official MYSQL documentation also mentions the use of batch insertion, that is, inserting multiple values ​​​​in an INSERT statement. That is,


INSERT INTO TBL_TEST (id) VALUES (1), (2), (3)

This approach can indeed speed up batch insertion. The reason is not difficult to understand. Since there are fewer INSERT statements submitted to the server, the network load Less, the most important thing is that the time for parsing and optimization seems to increase, but in fact the number of data rows affected is actually much more. So the overall performance is improved. According to some opinions on the Internet, this method can be improved dozens of times.

However, I have also seen several other methods on the Internet, such as preprocessing SQL and batch submission. So how do these methods perform? This article will make a comparison of these methods.

2. Comparing environments and methods
My environment is relatively difficult, basically a backward virtual machine. There are only 2 cores and 6G of memory. The operating system is SUSI Linux and the MYSQL version is 5.6.15.

It is conceivable that the performance of this machine has caused my TPS to be very low, so all the data below are meaningless, but the trend is different, which can show the performance trend of the entire insertion.

Due to business characteristics, the table we use is very large, with a total of 195 fields, and it is full (each field is filled in, including varchar), and the size is slightly less than 4KB. Generally speaking, , the size of a record is also 3KB.

Because based on our actual experience, we are very sure that performance can be greatly improved by submitting a large number of INSERT statements in one transaction. Therefore, all the tests below are based on the practice of submitting every 5,000 records inserted.

Finally, it should be noted that all the following tests are conducted using the MYSQL C API and use the INNODB storage engine.

3. Comparison methods

Ideal type test (1) - method comparison

Purpose: to find out the most suitable under ideal circumstances Insertion mechanism

Key methods:

1. Each incoming/thread inserts in primary key order

2. Compare different insertion methods

3. Compare the impact of different numbers of inputs/threads on insertion

An introduction to how to optimize batch insertion of data in MYSQL

* "Ordinary method" refers to the situation where only one VALUE is inserted in an INSERT sentence.

* "Preprocessed SQL" refers to the use of preprocessed MYSQL C API.

* "Multiple table value SQL (10 records)" is a situation where 10 records are inserted using an INSERT statement. Why 10? Later verification tells us that this has the highest performance.

Conclusion, obviously, from the trend of the three methods, the multi-table value SQL (10) method is the most efficient.

Ideal test (2) - Comparison of the number of SQL entries with multiple table values

An introduction to how to optimize batch insertion of data in MYSQL

Obviously, as the amount of data increases, In this case, it is most efficient to insert 10 records for each INSERT statement.

Ideal test (3) - Comparison of number of connections

An introduction to how to optimize batch insertion of data in MYSQL

An introduction to how to optimize batch insertion of data in MYSQL

Conclusion: When connecting and operating with 2 times the number of CPU cores, the performance is the highest

General test - test based on our business volume

Purpose: Is the best insertion mechanism suitable for ordinary transaction situations?

Key methods:

1. Simulate production data (each record is about 3KB)

2. Insert the primary key of each thread out of order

An introduction to how to optimize batch insertion of data in MYSQL

Obviously, if the primary key is inserted out of order, the performance will plummet. This is actually consistent with the phenomenon shown in the internal implementation principle of INNODB. But it is still certain that the case of multi-table value SQL (10 entries) is optimal.

Stress Test

Purpose: Is the best insertion mechanism suitable for extreme trading situations?

Key methods:

1. 将数据行的每一个字段填满(每条记录约为4KB)

2. 每个线程主键乱序插入

An introduction to how to optimize batch insertion of data in MYSQL

结果和我们之前的规律类似,性能出现了极端下降。并且这里验证了随着记录的增大(可能已经超过了一个page的大小,毕竟还有slot和page head信息占据空间),会有page split等现象,性能会下降。

四、结论

根据上面的测试,以及我们对INNODB的了解,我们可以得到如下的结论。

•采用顺序主键策略(例如自增主键,或者修改业务逻辑,让插入的记录尽可能顺序主键)

•采用多值表(10条)插入方式最为合适

•将进程/线程数控制在2倍CPU数目相对合适

五、附录

我发现网上很少有完整的针对MYSQL 预处理SQL语句的例子。这里给出一个简单的例子。


--建表语句
CREATE TABLE tbl_test 
(
  pri_key varchar(30), 
  nor_char char(30), 
  max_num DECIMAL(8,0), 
  long_num DECIMAL(12, 0), 
  rec_upd_ts TIMESTAMP
);

c代码


#include <string.h>
#include <iostream>
#include <mysql.h>
#include <sys/time.h>
#include <sstream>
#include <vector>
 
using namespace std;
 
#define STRING_LEN 30
  
char    pri_key            [STRING_LEN]= "123456"; 
char    nor_char           [STRING_LEN]= "abcabc"; 
char    rec_upd_ts          [STRING_LEN]= "NOW()"; 
 
bool SubTimeval(timeval &result, timeval &begin, timeval &end)
{
  if ( begin.tv_sec>end.tv_sec ) return false;
 
  if ( (begin.tv_sec == end.tv_sec) && (begin.tv_usec > end.tv_usec) )  
    return  false;
 
  result.tv_sec = ( end.tv_sec - begin.tv_sec );  
  result.tv_usec = ( end.tv_usec - begin.tv_usec );  
 
  if (result.tv_usec<0) {
    result.tv_sec--;
    result.tv_usec+=1000000;} 
  return true;
}
 
int main(int argc, char ** argv)
{
  INT32 ret = 0;
  char errmsg[200] = {0};
  int sqlCode = 0;
 
  timeval tBegin, tEnd, tDiff;
   
  const char* precompile_statment2 = "INSERT INTO `tbl_test`( pri_key, nor_char, max_num, long_num, rec_upd_ts) VALUES(?, ?, ?, ?, ?)";
   
  MYSQL conn;
  mysql_init(&conn);
   
  if (mysql_real_connect(&conn, "127.0.0.1", "dba", "abcdefg", "TESTDB", 3306, NULL, 0) == NULL)
  {
    fprintf(stderr, " mysql_real_connect, 2 failed\n");
    exit(0);
  }
   
  MYSQL_STMT  *stmt = mysql_stmt_init(&conn);
  if (!stmt)
  {
   fprintf(stderr, " mysql_stmt_init, 2 failed\n");
   fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
   exit(0);
  }
   
  if (mysql_stmt_prepare(stmt, precompile_statment2, strlen(precompile_statment2)))
  {
   fprintf(stderr, " mysql_stmt_prepare, 2 failed\n");
   fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
   exit(0);
  }
   
  int i = 0; 
  int max_num = 3;
  const int FIELD_NUM = 5;
  while (i < max_num)
  {
    //MYSQL_BIND  bind[196] = {0};
    MYSQL_BIND  bind[FIELD_NUM];
    memset(bind, 0, FIELD_NUM * sizeof(MYSQL_BIND));
   
    unsigned long str_length = strlen(pri_key);
    bind[0].buffer_type  = MYSQL_TYPE_STRING;
    bind[0].buffer    = (char *)pri_key;
    bind[0].buffer_length = STRING_LEN;
    bind[0].is_null    = 0;
    bind[0].length    = &str_length;
     
    unsigned long str_length_nor = strlen(nor_char);
    bind[1].buffer_type  = MYSQL_TYPE_STRING;
    bind[1].buffer    = (char *)nor_char;
    bind[1].buffer_length = STRING_LEN;
    bind[1].is_null    = 0;
    bind[1].length    = &str_length_nor;
     
    bind[2].buffer_type  = MYSQL_TYPE_LONG;
    bind[2].buffer    = (char*)&max_num;
    bind[2].is_null    = 0;
    bind[2].length    = 0;
     
    bind[3].buffer_type  = MYSQL_TYPE_LONG;
    bind[3].buffer    = (char*)&max_num;
    bind[3].is_null    = 0;
    bind[3].length    = 0;
     
    MYSQL_TIME ts;
    ts.year= 2002;
    ts.month= 02;
    ts.day= 03;
    ts.hour= 10;
    ts.minute= 45;
    ts.second= 20;
     
    unsigned long str_length_time = strlen(rec_upd_ts);
    bind[4].buffer_type  = MYSQL_TYPE_TIMESTAMP;
    bind[4].buffer    = (char *)&ts;
    bind[4].is_null    = 0;
    bind[4].length    = 0;
     
    if (mysql_stmt_bind_param(stmt, bind))
    {
      fprintf(stderr, " mysql_stmt_bind_param, 2 failed\n");
      fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
      exit(0);
    }
     
    cout << "before execute\n";
    if (mysql_stmt_execute(stmt))
    {
     fprintf(stderr, " mysql_stmt_execute, 2 failed\n");
     fprintf(stderr, " %s\n", mysql_stmt_error(stmt));
     exit(0);
    }
    cout << "after execute\n";
     
    i++;
  }
   
  mysql_commit(&conn);
   
  mysql_stmt_close(stmt);
 
  return 0;  
}

The above is the detailed content of An introduction to how to optimize batch insertion of data in MYSQL. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Explain the role of InnoDB redo logs and undo logs.Explain the role of InnoDB redo logs and undo logs.Apr 15, 2025 am 12:16 AM

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?Apr 15, 2025 am 12:15 AM

Key metrics for EXPLAIN commands include type, key, rows, and Extra. 1) The type reflects the access type of the query. The higher the value, the higher the efficiency, such as const is better than ALL. 2) The key displays the index used, and NULL indicates no index. 3) rows estimates the number of scanned rows, affecting query performance. 4) Extra provides additional information, such as Usingfilesort prompts that it needs to be optimized.

What is the Using temporary status in EXPLAIN and how to avoid it?What is the Using temporary status in EXPLAIN and how to avoid it?Apr 15, 2025 am 12:14 AM

Usingtemporary indicates that the need to create temporary tables in MySQL queries, which are commonly found in ORDERBY using DISTINCT, GROUPBY, or non-indexed columns. You can avoid the occurrence of indexes and rewrite queries and improve query performance. Specifically, when Usingtemporary appears in EXPLAIN output, it means that MySQL needs to create temporary tables to handle queries. This usually occurs when: 1) deduplication or grouping when using DISTINCT or GROUPBY; 2) sort when ORDERBY contains non-index columns; 3) use complex subquery or join operations. Optimization methods include: 1) ORDERBY and GROUPB

Describe the different SQL transaction isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) and their implications in MySQL/InnoDB.Describe the different SQL transaction isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) and their implications in MySQL/InnoDB.Apr 15, 2025 am 12:11 AM

MySQL/InnoDB supports four transaction isolation levels: ReadUncommitted, ReadCommitted, RepeatableRead and Serializable. 1.ReadUncommitted allows reading of uncommitted data, which may cause dirty reading. 2. ReadCommitted avoids dirty reading, but non-repeatable reading may occur. 3.RepeatableRead is the default level, avoiding dirty reading and non-repeatable reading, but phantom reading may occur. 4. Serializable avoids all concurrency problems but reduces concurrency. Choosing the appropriate isolation level requires balancing data consistency and performance requirements.

MySQL vs. Other Databases: Comparing the OptionsMySQL vs. Other Databases: Comparing the OptionsApr 15, 2025 am 12:08 AM

MySQL is suitable for web applications and content management systems and is popular for its open source, high performance and ease of use. 1) Compared with PostgreSQL, MySQL performs better in simple queries and high concurrent read operations. 2) Compared with Oracle, MySQL is more popular among small and medium-sized enterprises because of its open source and low cost. 3) Compared with Microsoft SQL Server, MySQL is more suitable for cross-platform applications. 4) Unlike MongoDB, MySQL is more suitable for structured data and transaction processing.

How does MySQL index cardinality affect query performance?How does MySQL index cardinality affect query performance?Apr 14, 2025 am 12:18 AM

MySQL index cardinality has a significant impact on query performance: 1. High cardinality index can more effectively narrow the data range and improve query efficiency; 2. Low cardinality index may lead to full table scanning and reduce query performance; 3. In joint index, high cardinality sequences should be placed in front to optimize query.

MySQL: Resources and Tutorials for New UsersMySQL: Resources and Tutorials for New UsersApr 14, 2025 am 12:16 AM

The MySQL learning path includes basic knowledge, core concepts, usage examples, and optimization techniques. 1) Understand basic concepts such as tables, rows, columns, and SQL queries. 2) Learn the definition, working principles and advantages of MySQL. 3) Master basic CRUD operations and advanced usage, such as indexes and stored procedures. 4) Familiar with common error debugging and performance optimization suggestions, such as rational use of indexes and optimization queries. Through these steps, you will have a full grasp of the use and optimization of MySQL.

Real-World MySQL: Examples and Use CasesReal-World MySQL: Examples and Use CasesApr 14, 2025 am 12:15 AM

MySQL's real-world applications include basic database design and complex query optimization. 1) Basic usage: used to store and manage user data, such as inserting, querying, updating and deleting user information. 2) Advanced usage: Handle complex business logic, such as order and inventory management of e-commerce platforms. 3) Performance optimization: Improve performance by rationally using indexes, partition tables and query caches.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
WWE 2K25: How To Unlock Everything In MyRise
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.