The reasons why MySQL inserts data slowly: 1. Reduced insertion efficiency caused by the primary key, foreign keys, and indexes; 2. Using a for loop to execute single-row inserts one at a time; 3. Failing to release query results in time.
A recent project required importing a large amount of data, and the process had to query and insert at the same time. The data set was about 1 million rows. At first, 1 million rows didn't seem like much, so I kicked off the import and went to have a meal. When I came back, I found that after more than 500,000 rows had gone in, only about 10 rows per second were being inserted. That struck me as very strange: why did it get slower and slower the more I inserted? So I started analyzing where the insertion time was going and arrived at the following findings. (MySQL using the InnoDB engine.)
1. Check whether the primary key, foreign keys, or indexes reduce insertion efficiency
Primary key: every table must have a primary key, so it cannot be removed, and MySQL automatically creates an index on it (a B-tree index by default). Every insert therefore also performs an extra B-tree insertion, which costs roughly O(log n); that index cannot be dropped, so this part cannot be optimized away. On top of that, the primary-key constraint means every insert must check whether the key already exists, which costs another O(log n). Can that overhead be reduced? Yes: declare the primary key as an auto-increment id (AUTO_INCREMENT). The database then tracks the current auto-increment value itself and guarantees that no duplicate primary key is ever inserted, so the duplicate check is avoided.
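As a minimal sketch of such a table (the exact schema of tb_big_data is my assumption, inferred from the INSERT statements later in this article; the connection settings are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE tb_big_data ("
                    + " id BIGINT NOT NULL AUTO_INCREMENT," // surrogate key: no duplicate check on insert
                    + " count BIGINT,"
                    + " create_time DATETIME,"
                    + " random DOUBLE,"
                    + " PRIMARY KEY (id)"
                    + ") ENGINE=InnoDB");
        }
    }
}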
Foreign keys: the table I insert into has foreign keys, so every insert must verify that the referenced row exists in another table. This constraint is tied to business logic and cannot simply be dropped. But that lookup cost depends only on the size of the other table, which is fixed; it should not grow as more rows are inserted. So foreign keys were ruled out.
Indexes: to cut the time lost to B-tree insertions, we can create the table without secondary indexes, insert all the data first, and only then add the indexes. This does reduce the time overhead, as in the sketch below.
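A sketch of that drop-then-rebuild pattern, assuming a hypothetical secondary index named idx_random on the random column (conn is the open java.sql.Connection used throughout this article):

// Drop the secondary index, bulk-load, then rebuild the index once at the end.
try (Statement st = conn.createStatement()) {
    // remove the secondary index before the bulk load
    st.execute("ALTER TABLE tb_big_data DROP INDEX idx_random");
    // ... run the bulk insert here ...
    // rebuild the index once, after all rows are in
    st.execute("ALTER TABLE tb_big_data ADD INDEX idx_random (random)");
}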
After the changes above, I tested again: the speed was a little faster, but it started slowing down again after about 500,000 rows. The crux of the problem was clearly elsewhere. So I kept digging and found a key issue:
2. Change single inserts to batch inserts
Java's executeUpdate(sql) performs one SQL operation per call, and each call has to go through a full round trip and statement setup. Using a for loop to call this method once per row is therefore very expensive. JDBC provides a solution: batch inserts. Instead of submitting each statement directly, statements are first accumulated in a batch; once the batch reaches a chosen threshold, all of them are sent to the MySQL server together. For my 1 million rows I set the threshold to 10,000, i.e. 10,000 statements submitted at a time. The result was quite good: insertion became about 20 times faster than before. The batch insert code is as follows:
// conn is assumed to be an already-open java.sql.Connection (a field of the class)
public static void insertRelease() {
    long begin = System.currentTimeMillis();
    String sql = "INSERT INTO tb_big_data (count, create_time, random) VALUES (?, SYSDATE(), ?)";
    try {
        // commit manually, once per 10,000-row batch
        conn.setAutoCommit(false);
        PreparedStatement pst = conn.prepareStatement(sql);
        for (int i = 1; i <= 100; i++) {       // 100 batches in total
            for (int k = 1; k <= 10000; k++) { // 10,000 rows per batch
                pst.setLong(1, k * i);
                pst.setLong(2, k * i);
                pst.addBatch();                // queue the row instead of sending it
            }
            pst.executeBatch();                // send the whole batch at once
            conn.commit();
        }
        pst.close();
        conn.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    long end = System.currentTimeMillis();
    System.out.println("cost : " + (end - begin) + " ms");
}
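A caveat worth noting (standard MySQL Connector/J behavior, not from the original article): the driver still sends batched statements one by one unless rewriteBatchedStatements=true is set on the connection URL, which lets it coalesce the batch into multi-row INSERTs much like method 3 below. A hypothetical URL:

// Hypothetical connection settings; rewriteBatchedStatements=true makes
// Connector/J coalesce addBatch()-ed inserts into multi-row INSERT statements.
Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true",
        "user", "password");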
3. Follow the VALUES of one INSERT statement with multiple (?,?,?,?) row tuples
At first I thought this method was much the same as the one above, but after reading experiments done by others I found that building on the batch insert above, this method is about 5 times faster still. Later I noticed that the INSERT statements in SQL files exported by MySQL are written this way too, i.e. INSERT INTO table_name (a1, a2) VALUES (xx, xx), (xx, xx), (xx, xx)... . In other words, we splice the string together ourselves in the application. Note that since we only ever append to the end of the string, using a StringBuffer makes the splicing fast. Here is the code:
// conn is assumed to be an already-open java.sql.Connection (a field of the class)
public static void insert() {
    // start time
    long begin = System.currentTimeMillis();
    // SQL prefix
    String prefix = "INSERT INTO tb_big_data (count, create_time, random) VALUES ";
    try {
        // holds the spliced SQL suffix
        StringBuffer suffix = new StringBuffer();
        // commit manually, once per statement
        conn.setAutoCommit(false);
        // a plain Statement is used because the full SQL string is built by hand;
        // addBatch(String) is a Statement method
        Statement st = conn.createStatement();
        // outer loop: number of commits
        for (int i = 1; i <= 100; i++) {
            // rows spliced into each statement
            for (int j = 1; j <= 10000; j++) {
                // build the SQL suffix
                suffix.append("(" + j * i + ", SYSDATE(), " + i * j * Math.random() + "),");
            }
            // build the complete SQL, dropping the trailing comma
            String sql = prefix + suffix.substring(0, suffix.length() - 1);
            // add the statement to the batch
            st.addBatch(sql);
            // execute it
            st.executeBatch();
            // commit the transaction
            conn.commit();
            // clear the suffix for the next round
            suffix = new StringBuffer();
        }
        // close the statement and connection
        st.close();
        conn.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    // end time
    long end = System.currentTimeMillis();
    // elapsed time
    System.out.println("cost : " + (end - begin) + " ms");
}
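One operational caveat (my addition, not from the original article): when 10,000 rows are spliced into a single INSERT, the statement can exceed the server's max_allowed_packet limit and be rejected, so that limit may need raising. A sketch, with an arbitrary 64 MB value:

// Run once with a privileged account; 67108864 bytes = 64 MB is an example value,
// and existing sessions must reconnect before the new limit applies to them.
try (Statement st = conn.createStatement()) {
    st.execute("SET GLOBAL max_allowed_packet = 67108864");
}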
After the optimizations above, I ran into a really vexing problem. Although the insertion speed was indeed dozens of times faster at first, once about 500,000 rows had been inserted, the speed would suddenly become extremely slow, an abrupt, cliff-like drop. While puzzling over it, I happened to open the system's resource monitor and saw that the memory used by the Java process was climbing steadily. It suddenly occurred to me: could memory be running out?
4. Release query results promptly
In my query code, I used pres = con.prepareStatement(sql) to hold a prepared statement and resultSet = pres.executeQuery() to hold the query results. While querying and inserting at the same time, my code never released those query results, so they kept occupying more and more memory. By the time roughly 500,000 rows had been inserted, memory was full, the inserts could no longer be served from memory and started hitting disk, and the insertion speed collapsed. So after each use of pres and resultSet, I added statements to release them: resultSet.close(); pres.close();. On retesting, memory no longer climbed, and the insertion speed no longer dropped after 500,000 rows. That was the real root of the problem!
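In current Java, the same fix falls out naturally from try-with-resources, which closes the result set and statement even when an exception is thrown. A minimal sketch (the query and lookup value are placeholders):

// try-with-resources guarantees resultSet and pres are closed after each lookup,
// so query results never pile up in memory.
String query = "SELECT id FROM tb_big_data WHERE count = ?";
try (PreparedStatement pres = conn.prepareStatement(query)) {
    pres.setLong(1, 42L); // example lookup value
    try (ResultSet resultSet = pres.executeQuery()) {
        while (resultSet.next()) {
            // ... use the row ...
        }
    }
}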
The above is the detailed content of the reasons why MySQL inserts data slowly.