很多时候为了测试数据库设计是否恰当,优化SQL语句,需要在表中插入大量的数据,怎么插入大量的数据就是个问题了。 最开始想到的办法就是写一个程序通过一个很大的循环来不停的插入,比如这样: 1 2 3 4 int i = LOOP_COUNT ; while ( i --= 0 ){ //insert d
很多时候为了测试数据库设计是否恰当,优化SQL语句,需要在表中插入大量的数据,怎么插入大量的数据就是个问题了。
最开始想到的办法就是写一个程序通过一个很大的循环来不停的插入,比如这样:
<span>1</span> <span>2</span> <span>3</span> <span>4</span> |
<code><span><span>int</span> <span>i</span> <span>=</span> <span>LOOP_COUNT</span><span>;</span> </span><span><span>while</span><span>(</span><span>i</span><span>-->=</span><span>0</span><span>){</span> </span><span> <span>//insert data here.</span> </span><span><span>}</span> </span></code> |
不过我在这么做的时候发现这样插入数据非常的慢,一秒钟插入的数据量还不到100条,于是想到不要一条一条的插入,而是通过
<span>1</span> |
<code><span>INSERT</span> <span>INTO</span> <span>TABLE</span> <span>VALUES</span> <span>(),(),(),()...</span> </code> |
这样的方式来插入。于是修改程序为:
<span>1</span> <span>2</span> <span>3</span> <span>4</span> 5 6 7 8 9 |
<code><span><span>int</span> <span>i</span> <span>=</span> <span>LOOP_COUNT</span><span>;</span> </span><span><span>StringBuilder</span> <span>stringBuilder</span><span>;</span> </span><span><span>while</span><span>(</span><span>i</span><span>-->=</span><span>0</span><span>){</span> </span><span> <span>if</span><span>(</span><span>LOOP_COUNT</span><span>!=</span><span>i</span> <span>&&</span> <span>i</span><span>%</span><span>5000</span><span>==</span><span>0</span><span>){</span> </span><span> <span>//通过insert values的方式插入这5000条数据并清空stringBuilder</span> </span><span> <span>}</span> </span><span> <span>stringBuilder</span><span>.</span><span>append</span><span>(</span><span>"(数据)"</span><span>);</span> </span><span><span>}</span> </span><span><span>//插入剩余的数据</span> </span></code> |
这样做的插入速度是上升了很多,不过如果想要插入大量的输入,比如上亿条,那么花费的时间还是非常长的。
查询MySQL的文档,发现了一个页面:LOAD DATA INFILE 光看这个名字,觉得有戏,于是仔细看了下。
官方对于这个命令的描述是:
<span>1</span> <span>2</span> <span>3</span> <span>4</span> 5 6 7 8 9 10 11 12 13 14 15 16 |
<code><span><span>LOAD</span> <span>DATA</span> <span>[</span><span>LOW_PRIORITY</span> <span>|</span> <span>CONCURRENT</span><span>]</span> <span>[</span><span>LOCAL</span><span>]</span> <span>INFILE</span> <span>'</span><span>file_name</span><span>'</span> </span><span> <span>[</span><span>REPLACE</span> <span>|</span> <span>IGNORE</span><span>]</span> </span><span> <span>INTO</span> <span>TABLE</span> <span>tbl_name</span> </span><span> <span>[</span><span>CHARACTER</span> <span>SET</span> <span>charset_name</span><span>]</span> </span><span> <span>[{</span><span>FIELDS</span> <span>|</span> <span>COLUMNS</span><span>}</span> </span><span> <span>[</span><span>TERMINATED</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span> </span><span> <span>[[</span><span>OPTIONALLY</span><span>]</span> <span>ENCLOSED</span> <span>BY</span> <span>'</span><span>char</span><span>'</span><span>]</span> </span><span> <span>[</span><span>ESCAPED</span> <span>BY</span> <span>'</span><span>char</span><span>'</span><span>]</span> </span><span> <span>]</span> </span><span> <span>[</span><span>LINES</span> </span><span> <span>[</span><span>STARTING</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span> </span><span> <span>[</span><span>TERMINATED</span> <span>BY</span> <span>'</span><span>string</span><span>'</span><span>]</span> </span><span> <span>]</span> </span><span> <span>[</span><span>IGNORE</span> <span>number</span> <span>LINES</span><span>]</span> </span><span> <span>[(</span><span>col_name_or_user_var</span><span>,...)]</span> </span><span> <span>[</span><span>SET</span> <span>col_name</span> <span>=</span> <span>expr</span><span>,...]</span> </span></code> |
命令不复杂,具体的每个参数的意义和用法请看官方的解释 http://dev.mysql.com/doc/refman/5.5/en/load-data.html
那么现在做的就是生成数据了,我习惯用t作为数据的分隔符、用n作为一行的分隔符,所以生成数据的代码如下:
<span>1</span> <span>2</span> <span>3</span> <span>4</span> 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
<code><span><span>long</span> <span>start</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span> </span><span><span>try</span> <span>{</span> </span><span> <span>File</span> <span>file</span> <span>=</span> <span>new</span> <span>File</span><span>(</span><span>FILE</span><span>);</span> </span><span> <span>if</span> <span>(</span><span>file</span><span>.</span><span>exists</span><span>())</span> <span>{</span> </span><span> <span>file</span><span>.</span><span>delete</span><span>();</span> </span><span> <span>}</span> </span><span> <span>file</span><span>.</span><span>createNewFile</span><span>();</span> </span><span> <span>FileOutputStream</span> <span>outStream</span> <span>=</span> <span>new</span> <span>FileOutputStream</span><span>(</span><span>file</span><span>,</span> <span>true</span><span>);</span> </span><span> <span>StringBuilder</span> <span>builder</span> <span>=</span> <span>new</span> <span>StringBuilder</span><span>(</span><span>10240</span><span>);</span> </span><span> <span>DateFormat</span> <span>dateFormat</span> <span>=</span> <span>new</span> <span>SimpleDateFormat</span><span>(</span><span>DATE_FORMAT</span><span>);</span> </span><span> <span>Random</span> <span>rand</span> <span>=</span> <span>new</span> <span>Random</span><span>();</span> </span><span> <span>String</span> <span>tmpDate</span> <span>=</span> <span>dateFormat</span><span>.</span><span>format</span><span>(</span><span>new</span> <span>Date</span><span>());</span> </span><span> <span>Long</span> <span>tmpTimestamp</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span> </span><span> <span>int</span> <span>i</span> <span>=</span> <span>0</span><span>;</span> </span><span> <span>while</span> <span>(</span><span>i</span><span>++</span> <span><</span> <span>LOOP</span><span>)</span> <span>{</span> </span><span> <span>if</span> <span>(</span><span>i</span> <span>></span> <span>0</span> <span>&&</span> <span>i</span> <span>%</span> <span>30000</span> <span>==</span> <span>0</span><span>)</span> <span>{</span> </span><span> <span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>"write offset:"</span> <span>+</span> <span>i</span><span>);</span> </span><span> <span>outStream</span><span>.</span><span>write</span><span>(</span><span>builder</span><span>.</span><span>toString</span><span>().</span><span>getBytes</span><span>(</span><span>CHARCODE</span><span>));</span> </span><span> <span>builder</span> <span>=</span> <span>new</span> <span>StringBuilder</span><span>(</span><span>10240</span><span>);</span> </span><span> <span>}</span> </span><span> <span>if</span> <span>(</span><span>tmpTimestamp</span><span>.</span><span>compareTo</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>)</span> <span>!=</span> <span>0</span><span>)</span> <span>{</span> </span><span> <span>tmpDate</span> <span>=</span> <span>dateFormat</span><span>.</span><span>format</span><span>(</span><span>new</span> <span>Date</span><span>());</span> </span><span> <span>tmpTimestamp</span> <span>=</span> <span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span><span>;</span> </span><span> <span>}</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>tmpDate</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>999</span><span>));</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>Encrypt</span><span>.</span><span>md5</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>+</span> <span>""</span> <span>+</span> <span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>99999999</span><span>)));</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>999</span><span>)</span> <span>%</span> <span>2</span> <span>==</span> <span>0</span> <span>?</span> <span>"AA."</span> <span>:</span> <span>"BB"</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextFloat</span><span>()</span> <span>*</span> <span>2000</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\t"</span><span>);</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>rand</span><span>.</span><span>nextInt</span><span>(</span><span>9</span><span>));</span> </span><span> <span>builder</span><span>.</span><span>append</span><span>(</span><span>"\n"</span><span>);</span> </span><span> <span>}</span> </span><span> <span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>"write data:"</span> <span>+</span> <span>i</span><span>);</span> </span><span> <span>outStream</span><span>.</span><span>write</span><span>(</span><span>builder</span><span>.</span><span>toString</span><span>().</span><span>getBytes</span><span>(</span><span>CHARCODE</span><span>));</span> </span><span> <span>outStream</span><span>.</span><span>close</span><span>();</span> </span><span><span>}</span> <span>catch</span> <span>(</span><span>Exception</span> <span>e</span><span>)</span> <span>{</span> </span><span> <span>e</span><span>.</span><span>printStackTrace</span><span>();</span> </span><span><span>}</span> </span><span><span>System</span><span>.</span><span>out</span><span>.</span><span>println</span><span>(</span><span>System</span><span>.</span><span>currentTimeMillis</span><span>()</span> <span>/</span> <span>1000</span> <span>-</span> <span>start</span><span>);</span> </span></code> |
这段代码会生成一个数据文件,每一行为一条记录,然后再使用上面提到的 LOAD DATA 来导入数据就可以了,我在公司的电脑下(2G内存+垃圾双核CPU,MySQL直接装在windows下,没任何优化,developer模式)每秒能达到近万条的插入速度,比其他方式都快很多。
另外如果想直接用GUI工具操作也可以,比如SQLYog中,右键要导入的表,选择Import – Import CSV Data Using Load Local.. 然后设置好编码、分隔符后就可以直接导入了。