
When building a webpage that imports CSV data, what approach should be used and what issues should be considered when there are hundreds of thousands to millions of rows?

WBOY (original) · 2016-12-05 13:44:20 · 1441 views

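I want to build the page described in the title with PHP, but I'm a beginner and don't know where to start. I previously built a simple upload, but that data had only a few thousand rows. Now I need to import over a million rows. How should I approach this? I know Python, but Python is too much hassle, so I plan to use PHP. Any advice is appreciated!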

Replies:


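As for efficiency: concatenating the rows into a single statement like insert into ... values (...), (...) is faster than using a transaction, which in turn is far faster than inserting row by row in a loop. One thing to note: inserting that much data at once requires raising the MySQL (or other database) setting that caps how much data one statement may carry (in MySQL, max_allowed_packet). I just tried inserting 100,000 rows at once. With the first method it took about 4 seconds (on a table with about 5 columns).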
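A minimal sketch of the first approach, assuming PDO-style `?` placeholders. `buildMultiInsert` and `flattenRows` are hypothetical helper names. The sketch builds one statement per batch instead of one per row, so the server parses the SQL and makes a round trip once per batch. Keep each batch small enough for the packet limit mentioned above. Prepared statements also cap the number of placeholders per statement (65,535 in MySQL).

```php
<?php
// Build one multi-row INSERT with placeholders: INSERT ... VALUES (?, ?), (?, ?), ...
// Table and column names are assumed to be trusted identifiers, not user input.
function buildMultiInsert(string $table, array $columns, int $rowCount): string
{
    if ($rowCount < 1) {
        throw new InvalidArgumentException('rowCount must be at least 1');
    }
    $cols = implode(', ', $columns);
    // One "(?, ?, ...)" group per row, with one "?" per column
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $groups = implode(', ', array_fill(0, $rowCount, $group));
    return "INSERT INTO $table ($cols) VALUES $groups";
}

// Flatten a batch of rows into one parameter list, in placeholder order
function flattenRows(array $rows): array
{
    return array_merge(...array_map('array_values', $rows));
}
```

With a PDO connection (hypothetical `$pdo`), one batch would then be sent as `$pdo->prepare(buildMultiInsert('posts', ['id', 'post_title', 'post_content'], count($batch)))->execute(flattenRows($batch));`.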

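For a large volume of data, you can import it in batches: have Ajax call the upload endpoint repeatedly, passing different parameters on each request.
For example: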

www.xxx.com/upload.php?offset=0&length=1000

www.xxx.com/upload.php?offset=1000&length=1000

That way the PHP script will not exceed its execution time limit.
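A sketch of what the server side of `upload.php` might look like. The answer passes a row offset. This sketch swaps in a byte position returned by the previous request, so each request can `fseek()` straight to its chunk instead of re-parsing every earlier row. The helper name `readCsvChunk`, the file path, the 5000-row cap, and the JSON response shape are all assumptions.

```php
<?php
// Hypothetical helper: read up to $length CSV records starting at byte
// position $offset, and report where the next request should resume.
function readCsvChunk(string $path, int $offset, int $length): array
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fp, $offset);
    $rows = [];
    while (count($rows) < $length && ($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = $row;
    }
    $next = ftell($fp);  // byte position just after the last record read
    $done = feof($fp);
    fclose($fp);
    return ['rows' => $rows, 'next' => $next, 'done' => $done];
}

// Request handling: ?offset=0&length=1000, then ?offset=<next>&length=1000, ...
if (PHP_SAPI !== 'cli') {
    $offset = max(0, (int)($_GET['offset'] ?? 0));
    $length = min(5000, max(1, (int)($_GET['length'] ?? 1000)));
    $chunk = readCsvChunk('/path/to/uploaded.csv', $offset, $length);
    // ... insert $chunk['rows'] here, e.g. as one multi-row INSERT ...
    header('Content-Type: application/json');
    echo json_encode(['next' => $chunk['next'], 'done' => $chunk['done']]);
}
```

The Ajax loop keeps requesting with the returned `next` until `done` is true. Each request then does a bounded amount of work regardless of how far into the file it is.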

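Both the insert statements and the table structure need careful optimization, otherwise it will be frighteningly slow. Merging rows into multi-row inserts, wrapping them in a transaction, and inserting the data in sorted order is fairly efficient.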
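The "sorted order" part can be sketched as follows, assuming the primary key is a numeric value in column 0. With InnoDB the table is clustered by primary key, so inserting in key order typically appends to the index rather than splitting pages. `sortedBatches` is a hypothetical name, and the transaction handling is shown only as comments.

```php
<?php
// Sort rows by primary key (column 0, assumed numeric) and split them into
// fixed-size batches; each batch is meant to become one multi-row INSERT.
function sortedBatches(array $rows, int $batchSize): array
{
    usort($rows, fn(array $a, array $b): int => (int)$a[0] <=> (int)$b[0]);
    return array_chunk($rows, $batchSize);
}

// Usage sketch with a hypothetical PDO connection $pdo:
// $pdo->beginTransaction();
// foreach (sortedBatches($rows, 1000) as $batch) {
//     // one multi-row INSERT per batch
// }
// $pdo->commit();
```

This holds all rows in memory. For a file of millions of rows, apply it per chunk or sort the file before uploading.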

First upload the CSV file to the server.
Then have PHP import the CSV file into the MySQL database:

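<code><?php
// Throw mysqli_sql_exception on errors (the default since PHP 8.1)
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

// Open one transaction for the whole batch insert
$db = new mysqli('127.0.0.1', 'user', 'pass', 'dbname', 3306);
$db->set_charset('utf8mb4');
$db->begin_transaction();

// Prepare the statement once, outside the loop. ON DUPLICATE KEY UPDATE
// turns an insert that hits an existing id into an update in one statement.
$stmt = $db->prepare(
    'INSERT INTO posts (id, post_title, post_content) VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE post_title = VALUES(post_title),
                             post_content = VALUES(post_content)'
);
$id = 0;
$title = '';
$content = '';
$stmt->bind_param('iss', $id, $title, $content); // bound by reference

// Import the CSV: read and parse one line at a time from the file pointer
$fp = fopen('file.csv', 'r');
if ($fp === false) {
    $db->rollback();
    exit("Cannot open file.csv\n");
}
while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
    if (count($row) < 3) {
        continue; // skip blank or short lines
    }
    // Assumes each row is: ID, title, content
    $id = (int)$row[0];
    $title = $row[1];
    $content = $row[2];
    try {
        $stmt->execute();
    } catch (mysqli_sql_exception $e) {
        // Report the failed row and keep going; the other rows are still committed
        echo 'Import ' . $row[0] . ' failed: ' . $e->getMessage() . "\n";
    }
}
fclose($fp);

// Commit the transaction
$db->commit();</code>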

Make it asynchronous. A synchronous import of several million rows won't hold up.
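Here is one way to apply "make it asynchronous" in PHP, sketched under assumptions. The upload request only saves the file, enqueues a job, and returns at once. A separate CLI worker, started by cron or a process supervisor (not shown), claims jobs and runs the long import with no web timeout. The directory-based queue and the names `enqueueImport` and `claimNextJob` are hypothetical. A real deployment might instead use a database table or a message queue.

```php
<?php
// Hypothetical directory-based job queue. The web request calls
// enqueueImport() and returns immediately; a CLI worker calls claimNextJob().
function enqueueImport(string $queueDir, string $csvPath): string
{
    $jobId = bin2hex(random_bytes(8));
    $job = json_encode(['id' => $jobId, 'csv' => $csvPath]);
    // Write under a temp name, then rename, so a worker never reads a half-written job
    $tmp = "$queueDir/$jobId.tmp";
    file_put_contents($tmp, $job);
    rename($tmp, "$queueDir/$jobId.job");
    return $jobId;
}

function claimNextJob(string $queueDir): ?array
{
    foreach (glob("$queueDir/*.job") ?: [] as $file) {
        $claimed = substr($file, 0, -strlen('.job')) . '.running';
        // On a POSIX filesystem rename() is atomic: if two workers race,
        // only one rename succeeds and the other moves on
        if (@rename($file, $claimed)) {
            return json_decode(file_get_contents($claimed), true);
        }
    }
    return null; // nothing queued
}

// Worker loop sketch (run as: php worker.php):
// while (true) {
//     $job = claimNextJob('/var/spool/csv-import');
//     if ($job === null) { sleep(5); continue; }
//     // ... import $job['csv'] in batches, then remove the .running file ...
// }
```

The browser can then poll for progress by job id instead of holding one request open for the whole import.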
