Heim  >  Artikel  >  php教程  >  MySQL大批量数据插入,PHP之for不断插入时出现缓慢的解决方案及

MySQL大批量数据插入,PHP之for不断插入时出现缓慢的解决方案及

WBOY
WBOYOriginal
2016-06-06 19:53:591719Durchsuche

很多时候为了测试数据库设计是否恰当,优化SQL语句,需要在表中插入大量的数据,怎么插入大量的数据就是个问题了。 最开始想到的办法就是写一个程序通过一个很大的循环来不停的插入,比如这样: 1 2 3 4 int i = LOOP_COUNT ; while ( i --= 0 ){ //insert d

很多时候为了测试数据库设计是否恰当,优化SQL语句,需要在表中插入大量的数据,怎么插入大量的数据就是个问题了。

最开始想到的办法就是写一个程序通过一个很大的循环来不停的插入,比如这样:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
<code><span><span>int</span> <span>i</span> <span>=</span> <span>LOOP_COUNT</span><span>;</span>
</span><span><span>while</span><span>(</span><span>i</span><span>-->=</span><span>0</span><span>){</span>
</span><span>    <span>//insert data here.</span>
</span><span><span>}</span>
</span></code>

不过我在这么做的时候发现这样插入数据非常的慢,一秒钟插入的数据量还不到100条,于是想到不要一条一条的插入,而是通过

<span>1</span>
<code><span>INSERT</span> <span>INTO</span> <span>TABLE</span> <span>VALUES</span> <span>(),(),(),()...</span>
</code>

这样的方式来插入。于是修改程序为:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
5
6
7
8
9
<code><span><span>int</span> <span>i</span> <span>=</span> <span>LOOP_COUNT</span><span>;</span>
</span><span><span>StringBuilder</span> <span>stringBuilder</span><span>;</span>
</span><span><span>while</span><span>(</span><span>i</span><span>-->=</span><span>0</span><span>){</span>
</span><span>    <span>if</span><span>(</span><span>LOOP_COUNT</span><span>!=</span><span>i</span> <span>&&</span> <span>i</span><span>%</span><span>5000</span><span>==</span><span>0</span><span>){</span>
</span><span>     <span>//通过insert values的方式插入这5000条数据并清空stringBuilder</span>
</span><span>    <span>}</span>
</span><span>    <span>stringBuilder</span><span>.</span><span>append</span><span>(</span><span>"(数据)"</span><span>);</span>
</span><span><span>}</span>
</span><span><span>//插入剩余的数据</span>
</span></code>

这样做的插入速度是上升了很多,不过如果想要插入大量的输入,比如上亿条,那么花费的时间还是非常长的。

查询MySQL的文档,发现了一个页面:LOAD DATA INFILE 光看这个名字,觉得有戏,于是仔细看了下。

官方对于这个命令的描述是:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
5
6
7
8
9
10
11
12
13
14
15
16
<code><span><span>LOAD</span> <span>DATA</span> <span>[</span><span>LOW_PRIORITY</span> <span>|</span> <span>CONCURRENT</span><span>]</span> <span>[</span><span>LOCAL</span><span>]</span> <span>INFILE</span> <span>'</span><span>file_name</span><span>'</span>
</span><span>    <span>[</span><span>REPLACE</span> <span>|</span> <span>IGNORE</span><span>]</span>
</span><span>    <span>INTO</span> <span>TABLE</span> <span>tbl_name</span>
</span><span>    <span>[</span><span>CHARACTER</span> <span>SET</span> <span>charset_name</span><span>]</span>
</span><span>    <span>[{</span><span>FIELDS</span> <span>|</span> <span>COLUMNS</span><span>}</span>
</span><span>        <span>[</span><span>TERMINATED</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span>
</span><span>        <span>[[</span><span>OPTIONALLY</span><span>]</span> <span>ENCLOSED</span> <span>BY</span> <span>'</span><span>char</span><span>'</span><span>]</span>
</span><span>        <span>[</span><span>ESCAPED</span> <span>BY</span> <span>'</span><span>char</span><span>'</span><span>]</span>
</span><span>    <span>]</span>
</span><span>    <span>[</span><span>LINES</span>
</span><span>        <span>[</span><span>STARTING</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span>
</span><span>        <span>[</span><span>TERMINATED</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span>
</span><span>    <span>]</span>
</span><span>    <span>[</span><span>IGNORE</span> <span>number</span> <span>LINES</span><span>]</span>
</span><span>    <span>[(</span><span>col_name_or_user_var</span><span>,...)]</span>
</span><span>    <span>[</span><span>SET</span> <span>col_name</span> <span>=</span> <span>expr</span><span>,...]</span>
</span></code>

命令不复杂,具体的每个参数的意义和用法请看官方的解释 http://dev.mysql.com/doc/refman/5.5/en/load-data.html

那么现在做的就是生成数据了,我习惯用\t作为数据的分隔符、用\n作为一行的分隔符,所以生成数据的代码如下:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
<code><span><span>long</span> <span>start</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span>
</span><span><span>try</span> <span>{</span>
</span><span>    <span>File</span> <span>file</span> <span>=</span> <span>new</span> <span>File</span><span>(</span><span>FILE</span><span>);</span>
</span><span>    <span>if</span> <span>(</span><span>file</span><span>.</span><span>exists</span><span>())</span> <span>{</span>
</span><span>        <span>file</span><span>.</span><span>delete</span><span>();</span>
</span><span>    <span>}</span>
</span><span>    <span>file</span><span>.</span><span>createNewFile</span><span>();</span>
</span><span>    <span>FileOutputStream</span> <span>outStream</span> <span>=</span> <span>new</span> <span>FileOutputStream</span><span>(</span><span>file</span><span>,</span> <span>true</span><span>);</span>
</span><span>    <span>StringBuilder</span> <span>builder</span> <span>=</span> <span>new</span> <span>StringBuilder</span><span>(</span><span>10240</span><span>);</span>
</span><span>    <span>DateFormat</span> <span>dateFormat</span> <span>=</span> <span>new</span> <span>SimpleDateFormat</span><span>(</span><span>DATE_FORMAT</span><span>);</span>
</span><span>    <span>Random</span> <span>rand</span> <span>=</span> <span>new</span> <span>Random</span><span>();</span>
</span><span>    <span>String</span> <span>tmpDate</span> <span>=</span> <span>dateFormat</span><span>.</span><span>format</span><span>(</span><span>new</span> <span>Date</span><span>());</span>
</span><span>    <span>Long</span> <span>tmpTimestamp</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span>
</span><span>    <span>int</span> <span>i</span> <span>=</span> <span>0</span><span>;</span>
</span><span>    <span>while</span> <span>(</span><span>i</span><span>++</span> <span> <span>LOOP</span><span>)</span> <span>{</span>
</span><span>        <span>if</span> <span>(</span><span>i</span> <span>></span> <span>0</span> <span>&&</span> <span>i</span> <span>%</span> <span>30000</span> <span>==</span> <span>0</span><span>)</span> <span>{</span>
</span><span>            <span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>"write offset:"</span> <span>+</span> <span>i</span><span>);</span>
</span><span>            <span>outStream</span><span>.</span><span>write</span><span>(</span><span>builder</span><span>.</span><span>toString</span><span>().</span><span>getBytes</span><span>(</span><span>CHARCODE</span><span>));</span>
</span><span>            <span>builder</span> <span>=</span> <span>new</span> <span>StringBuilder</span><span>(</span><span>10240</span><span>);</span>
</span><span>        <span>}</span>
</span><span>        <span>if</span> <span>(</span><span>tmpTimestamp</span><span>.</span><span>compareTo</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>)</span> <span>!=</span> <span>0</span><span>)</span> <span>{</span>
</span><span>            <span>tmpDate</span> <span>=</span> <span>dateFormat</span><span>.</span><span>format</span><span>(</span><span>new</span> <span>Date</span><span>());</span>
</span><span>            <span>tmpTimestamp</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span>
</span><span>        <span>}</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>tmpDate</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>999</span><span>));</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>Encrypt</span><span>.</span><span>md5</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>+</span> <span>""</span> <span>+</span> <span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>99999999</span><span>)));</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>999</span><span>)</span> <span>%</span> <span>2</span> <span>==</span> <span>0</span> <span>?</span> <span>"AA."</span> <span>:</span> <span>"BB"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextFloat</span><span>()</span> <span>*</span> <span>2000</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>9</span><span>));</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\n"</span><span>);</span>
</span><span>    <span>}</span>
</span><span>    <span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>"write data:"</span> <span>+</span> <span>i</span><span>);</span>
</span><span>    <span>outStream</span><span>.</span><span>write</span><span>(</span><span>builder</span><span>.</span><span>toString</span><span>().</span><span>getBytes</span><span>(</span><span>CHARCODE</span><span>));</span>
</span><span>    <span>outStream</span><span>.</span><span>close</span><span>();</span>
</span><span><span>}</span> <span>catch</span> <span>(</span><span>Exception</span> <span>e</span><span>)</span> <span>{</span>
</span><span>    <span>e</span><span>.</span><span>printStackTrace</span><span>();</span>
</span><span><span>}</span>
</span><span><span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span> <span>-</span> <span>start</span><span>);</span>
</span></span></code>

这段代码会生成一个数据文件,每一行为一条记录,然后再使用上面提到的 LOAD DATA 来导入数据就可以了,我在公司的电脑下(2G内存+垃圾双核CPU,MySQL直接装在windows下,没任何优化,developer模式)每秒能达到近万条的插入速度,比其他方式都快很多。

另外如果想直接用GUI工具操作也可以,比如SQLYog中,右键要导入的表,选择Import – Import CSV Data Using Load Local.. 然后设置好编码、分隔符后就可以直接导入了。


Stellungnahme:
Der Inhalt dieses Artikels wird freiwillig von Internetnutzern beigesteuert und das Urheberrecht liegt beim ursprünglichen Autor. Diese Website übernimmt keine entsprechende rechtliche Verantwortung. Wenn Sie Inhalte finden, bei denen der Verdacht eines Plagiats oder einer Rechtsverletzung besteht, wenden Sie sich bitte an admin@php.cn