
How to optimize and migrate 20 million data in MySQL

怪我咯 (Original) · 2017-04-05 13:20:32

Recently I had to optimize and migrate a data table with 20 million records. 20 million rows is an awkward size for MySQL: with reasonably built indexes queries are still fairly fast, and no amount of further tuning improves the speed much. However, these records contain many redundant fields and erroneous values, which makes statistics and analysis very inconvenient, so I needed to create a new table, take the data out of the old table row by row, clean and optimize it, and put it into the new table.

1. Clear redundant data and optimize the field structure

In the 20 million rows we already know which fields can be used as query conditions, so we create dedicated new fields for that part of the data and choose sensible field types for the regular data, for example storing ID card numbers as varchar(18). The unimportant data is merged into a single field of type text.

We also need to compute some derived data: for example, from the ID card number we can obtain accurate gender, birthplace, birthday, and age.
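
For illustration, here is a minimal sketch of deriving such fields from an 18-digit mainland-China ID card number; parseIdentity() is a hypothetical helper, not code from the original migration script:

    <?php
    // Hypothetical helper: derive birthday, gender and region code from an 18-digit ID card number.
    function parseIdentity(string $identity): ?array
    {
        // 18 characters: 6-digit region code, 8-digit birth date, 3-digit sequence, 1 check character
        if (!preg_match('/^\d{17}[\dXx]$/', $identity)) {
            return null; // flag malformed values instead of importing them blindly
        }
        $year  = substr($identity, 6, 4);
        $month = substr($identity, 10, 2);
        $day   = substr($identity, 12, 2);
        // The 17th digit is odd for male, even for female
        $gender = ((int) $identity[16] % 2 === 1) ? 'male' : 'female';
        return [
            'region_code' => substr($identity, 0, 6),        // birthplace via a region lookup table
            'birthday'    => "$year-$month-$day",
            'gender'      => $gender,
            'age'         => (int) date('Y') - (int) $year,  // approximate, ignores month/day
        ];
    }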

2. Data migration

We take one row at a time out of the old table, compute and process it into the new data we want, and insert the new data into the new table. However, we ran into the following problems while producing the new data.

  1. The amount of data is too large to fetch at once (throwing 20 million rows into memory would be terrible);
    We can fetch it in batches using MySQL's LIMIT syntax. For example, to fetch 50,000 rows at a time, the SQL statement is as follows:

    select * from table_name limit 15000000,50000;

    This method solves the problem of the sheer data volume, but as the first parameter of LIMIT grows, the query becomes horribly slow (the SQL above takes about 35 seconds to execute). So we started optimizing the statement, and after optimization it became:

    select * from table_name order by id desc limit 5000000,50000;

    The 20 million rows can be split down the middle: once we pass the 10 millionth row, we reverse the sort order. After this optimization the SQL executes noticeably faster, from 35 seconds down to 9 seconds;

    But it is still very slow. Time is life... Fortunately we have an auto-increment ID (the first law of table design: always have an auto-increment field). The optimized SQL is as follows:

    1. select * from table_name where id>15000000 and id<15050000;
    2. select * from table_name where id>15000000 limit 50000;

    For a visual comparison I wrote two statements that do the same thing. Compared with the first, the LIMIT in the second statement hurts index usage and lowers efficiency: the first runs in about 2 milliseconds and the second in about 5 milliseconds (averaged over several runs). The query time for each batch thus dropped straight from 35 seconds to 2 milliseconds...
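
    For illustration, here is a minimal sketch of the id-range batch loop described above (the connection details and column list follow this article's other code; the upper bound and the per-row processing are placeholders):

    <?php
    // Sketch of reading the old table in fixed id ranges of 50,000.
    $mysqli = new mysqli("192.168.1.106", "username", "password", "test");
    $step  = 50000;
    $maxId = 20000000;  // placeholder; e.g. obtained once via SELECT MAX(id) FROM table_name
    $low   = 0;
    $high  = 0;

    $stmt = $mysqli->prepare("select id, name, identity from table_name where id > ? and id <= ?");
    $stmt->bind_param("ii", $low, $high);  // bound by reference, like the insert example later on

    while ($low < $maxId) {
        $high = $low + $step;
        $stmt->execute();
        $result = $stmt->get_result();     // requires the mysqlnd driver

        while ($row = $result->fetch_assoc()) {
            // ... clean the row, compute the new fields, and queue it for the new table ...
        }
        // Gaps in the id sequence simply make a range return fewer than 50,000 rows.
        $low += $step;
    }
    $stmt->close();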

  2. The amount of data is too large and cannot be estimated in advance, and some problematic records cause the import to fail;
  3. We had three options for storing the new data into the new table, as follows:

    1. Insert data one by one
      At first I was sure this option would not work, because every insert means a database IO operation. However, it has one advantage: it catches problematic data immediately, and execution can continue after the data is fixed. In Oracle, "bind variables" can improve performance, and MySQL provides bind variables too, so without changing the logic I tried to speed up the inserts this way. The code is as follows:

      public function actionTest(array $data)
      {
          $mysqli = new mysqli("192.168.1.106", "username", "password", "test");
          $sql = "insert into table_name(name,identity) values (?,?)";
      
          $stmt = $mysqli->prepare($sql);
          $name = "";
          $identity = "";
          // bind variables ("ss": both columns are strings, identity is varchar(18))
          $stmt->bind_param("ss", $name, $identity);
          foreach ($data as $val)
          {
              // the bound variables are references, so updating them is enough
              $name = $val['name'];
              $identity = $val['card_id'];
              // execute the prepared insert for this row
              $stmt->execute();
          }
          $stmt->close();
      }
      In the end the effect was not great: MySQL's bind variables brought no obvious speed improvement, although they do effectively prevent SQL injection;

    2. Insert 50,000 rows at a time
      This is the option I finally chose. First, it catches problematic data promptly; second, the imported data is very stable. It works like a resumable transfer: you can see the result of every step, and while the script runs you can already start writing the analysis logic. (A sketch of this batched insert follows after the list of options.)

    3. Assemble everything into one SQL file and import it at the end
      Assembling one large SQL file and importing it with MySQL's own tools also works well, but if a single statement in it has a problem you may have to rerun the whole script, because fixing even one character in a 9 GB text file is very painful...
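
For illustration, here is a minimal sketch of the batched multi-row INSERT described in option 2 (connection handling and column names follow the earlier code; the chunking helper and the escaping are assumptions about the implementation, not the author's original script):

    <?php
    // Hypothetical helper: insert cleaned rows in chunks using multi-row INSERT statements.
    // $rows is assumed to look like [['name' => ..., 'card_id' => ...], ...]
    function insertInBatches(mysqli $mysqli, array $rows, int $batchSize = 50000): void
    {
        foreach (array_chunk($rows, $batchSize) as $batch) {
            $values = [];
            foreach ($batch as $row) {
                // escape values because they are concatenated into the statement text
                $name     = $mysqli->real_escape_string($row['name']);
                $identity = $mysqli->real_escape_string($row['card_id']);
                $values[] = "('$name','$identity')";
            }
            $sql = "insert into table_name(name,identity) values " . implode(',', $values);
            // Very large batches can exceed max_allowed_packet; lower the batch size if needed.
            if (!$mysqli->query($sql)) {
                // A failed batch points straight at the problematic data, as described above.
                throw new RuntimeException("Batch insert failed: " . $mysqli->error);
            }
        }
    }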

3. Summary

Through these various optimizations we finally brought the script's execution time down to under 20 minutes. After the optimization the quality of the data is also well assured. Next time we will try optimizing and migrating 200 million rows...


