>php教程 >php手册 >MySQL大批量数据插入,PHP之for不断插入时出现缓慢的解决方案及

MySQL大批量数据插入,PHP之for不断插入时出现缓慢的解决方案及

WBOY
WBOY원래의
2016-06-06 19:53:591792검색

很多时候为了测试数据库设计是否恰当,优化SQL语句,需要在表中插入大量的数据,怎么插入大量的数据就是个问题了。 最开始想到的办法就是写一个程序通过一个很大的循环来不停的插入,比如这样: 1 2 3 4 int i = LOOP_COUNT ; while ( i --= 0 ){ //insert d

很多时候为了测试数据库设计是否恰当,优化SQL语句,需要在表中插入大量的数据,怎么插入大量的数据就是个问题了。

最开始想到的办法就是写一个程序通过一个很大的循环来不停的插入,比如这样:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
<code><span><span>int</span> <span>i</span> <span>=</span> <span>LOOP_COUNT</span><span>;</span>
</span><span><span>while</span><span>(</span><span>i</span><span>-->=</span><span>0</span><span>){</span>
</span><span>    <span>//insert data here.</span>
</span><span><span>}</span>
</span></code>

不过我在这么做的时候发现这样插入数据非常的慢,一秒钟插入的数据量还不到100条,于是想到不要一条一条的插入,而是通过

<span>1</span>
<code><span>INSERT</span> <span>INTO</span> <span>TABLE</span> <span>VALUES</span> <span>(),(),(),()...</span>
</code>

这样的方式来插入。于是修改程序为:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
5
6
7
8
9
<code><span><span>int</span> <span>i</span> <span>=</span> <span>LOOP_COUNT</span><span>;</span>
</span><span><span>StringBuilder</span> <span>stringBuilder</span><span>;</span>
</span><span><span>while</span><span>(</span><span>i</span><span>-->=</span><span>0</span><span>){</span>
</span><span>    <span>if</span><span>(</span><span>LOOP_COUNT</span><span>!=</span><span>i</span> <span>&&</span> <span>i</span><span>%</span><span>5000</span><span>==</span><span>0</span><span>){</span>
</span><span>     <span>//通过insert values的方式插入这5000条数据并清空stringBuilder</span>
</span><span>    <span>}</span>
</span><span>    <span>stringBuilder</span><span>.</span><span>append</span><span>(</span><span>"(数据)"</span><span>);</span>
</span><span><span>}</span>
</span><span><span>//插入剩余的数据</span>
</span></code>

这样做的插入速度是上升了很多,不过如果想要插入大量的输入,比如上亿条,那么花费的时间还是非常长的。

查询MySQL的文档,发现了一个页面:LOAD DATA INFILE 光看这个名字,觉得有戏,于是仔细看了下。

官方对于这个命令的描述是:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
5
6
7
8
9
10
11
12
13
14
15
16
<code><span><span>LOAD</span> <span>DATA</span> <span>[</span><span>LOW_PRIORITY</span> <span>|</span> <span>CONCURRENT</span><span>]</span> <span>[</span><span>LOCAL</span><span>]</span> <span>INFILE</span> <span>'</span><span>file_name</span><span>'</span>
</span><span>    <span>[</span><span>REPLACE</span> <span>|</span> <span>IGNORE</span><span>]</span>
</span><span>    <span>INTO</span> <span>TABLE</span> <span>tbl_name</span>
</span><span>    <span>[</span><span>CHARACTER</span> <span>SET</span> <span>charset_name</span><span>]</span>
</span><span>    <span>[{</span><span>FIELDS</span> <span>|</span> <span>COLUMNS</span><span>}</span>
</span><span>        <span>[</span><span>TERMINATED</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span>
</span><span>        <span>[[</span><span>OPTIONALLY</span><span>]</span> <span>ENCLOSED</span> <span>BY</span> <span>'</span><span>char</span><span>'</span><span>]</span>
</span><span>        <span>[</span><span>ESCAPED</span> <span>BY</span> <span>'</span><span>char</span><span>'</span><span>]</span>
</span><span>    <span>]</span>
</span><span>    <span>[</span><span>LINES</span>
</span><span>        <span>[</span><span>STARTING</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span>
</span><span>        <span>[</span><span>TERMINATED</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span>
</span><span>    <span>]</span>
</span><span>    <span>[</span><span>IGNORE</span> <span>number</span> <span>LINES</span><span>]</span>
</span><span>    <span>[(</span><span>col_name_or_user_var</span><span>,...)]</span>
</span><span>    <span>[</span><span>SET</span> <span>col_name</span> <span>=</span> <span>expr</span><span>,...]</span>
</span></code>

命令不复杂,具体的每个参数的意义和用法请看官方的解释 http://dev.mysql.com/doc/refman/5.5/en/load-data.html

那么现在做的就是生成数据了,我习惯用\t作为数据的分隔符、用\n作为一行的分隔符,所以生成数据的代码如下:

<span>1</span>
<span>2</span>
<span>3</span>
<span>4</span>
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
<code><span><span>long</span> <span>start</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span>
</span><span><span>try</span> <span>{</span>
</span><span>    <span>File</span> <span>file</span> <span>=</span> <span>new</span> <span>File</span><span>(</span><span>FILE</span><span>);</span>
</span><span>    <span>if</span> <span>(</span><span>file</span><span>.</span><span>exists</span><span>())</span> <span>{</span>
</span><span>        <span>file</span><span>.</span><span>delete</span><span>();</span>
</span><span>    <span>}</span>
</span><span>    <span>file</span><span>.</span><span>createNewFile</span><span>();</span>
</span><span>    <span>FileOutputStream</span> <span>outStream</span> <span>=</span> <span>new</span> <span>FileOutputStream</span><span>(</span><span>file</span><span>,</span> <span>true</span><span>);</span>
</span><span>    <span>StringBuilder</span> <span>builder</span> <span>=</span> <span>new</span> <span>StringBuilder</span><span>(</span><span>10240</span><span>);</span>
</span><span>    <span>DateFormat</span> <span>dateFormat</span> <span>=</span> <span>new</span> <span>SimpleDateFormat</span><span>(</span><span>DATE_FORMAT</span><span>);</span>
</span><span>    <span>Random</span> <span>rand</span> <span>=</span> <span>new</span> <span>Random</span><span>();</span>
</span><span>    <span>String</span> <span>tmpDate</span> <span>=</span> <span>dateFormat</span><span>.</span><span>format</span><span>(</span><span>new</span> <span>Date</span><span>());</span>
</span><span>    <span>Long</span> <span>tmpTimestamp</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span>
</span><span>    <span>int</span> <span>i</span> <span>=</span> <span>0</span><span>;</span>
</span><span>    <span>while</span> <span>(</span><span>i</span><span>++</span> <span> <span>LOOP</span><span>)</span> <span>{</span>
</span><span>        <span>if</span> <span>(</span><span>i</span> <span>></span> <span>0</span> <span>&&</span> <span>i</span> <span>%</span> <span>30000</span> <span>==</span> <span>0</span><span>)</span> <span>{</span>
</span><span>            <span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>"write offset:"</span> <span>+</span> <span>i</span><span>);</span>
</span><span>            <span>outStream</span><span>.</span><span>write</span><span>(</span><span>builder</span><span>.</span><span>toString</span><span>().</span><span>getBytes</span><span>(</span><span>CHARCODE</span><span>));</span>
</span><span>            <span>builder</span> <span>=</span> <span>new</span> <span>StringBuilder</span><span>(</span><span>10240</span><span>);</span>
</span><span>        <span>}</span>
</span><span>        <span>if</span> <span>(</span><span>tmpTimestamp</span><span>.</span><span>compareTo</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>)</span> <span>!=</span> <span>0</span><span>)</span> <span>{</span>
</span><span>            <span>tmpDate</span> <span>=</span> <span>dateFormat</span><span>.</span><span>format</span><span>(</span><span>new</span> <span>Date</span><span>());</span>
</span><span>            <span>tmpTimestamp</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span>
</span><span>        <span>}</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>tmpDate</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>999</span><span>));</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>Encrypt</span><span>.</span><span>md5</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>+</span> <span>""</span> <span>+</span> <span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>99999999</span><span>)));</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>999</span><span>)</span> <span>%</span> <span>2</span> <span>==</span> <span>0</span> <span>?</span> <span>"AA."</span> <span>:</span> <span>"BB"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextFloat</span><span>()</span> <span>*</span> <span>2000</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>9</span><span>));</span>
</span><span>        <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\n"</span><span>);</span>
</span><span>    <span>}</span>
</span><span>    <span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>"write data:"</span> <span>+</span> <span>i</span><span>);</span>
</span><span>    <span>outStream</span><span>.</span><span>write</span><span>(</span><span>builder</span><span>.</span><span>toString</span><span>().</span><span>getBytes</span><span>(</span><span>CHARCODE</span><span>));</span>
</span><span>    <span>outStream</span><span>.</span><span>close</span><span>();</span>
</span><span><span>}</span> <span>catch</span> <span>(</span><span>Exception</span> <span>e</span><span>)</span> <span>{</span>
</span><span>    <span>e</span><span>.</span><span>printStackTrace</span><span>();</span>
</span><span><span>}</span>
</span><span><span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span> <span>-</span> <span>start</span><span>);</span>
</span></span></code>

这段代码会生成一个数据文件,每一行为一条记录,然后再使用上面提到的 LOAD DATA 来导入数据就可以了,我在公司的电脑下(2G内存+垃圾双核CPU,MySQL直接装在windows下,没任何优化,developer模式)每秒能达到近万条的插入速度,比其他方式都快很多。

另外如果想直接用GUI工具操作也可以,比如SQLYog中,右键要导入的表,选择Import – Import CSV Data Using Load Local.. 然后设置好编码、分隔符后就可以直接导入了。


성명:
본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.