
PHP generates and downloads EXCEL files with extremely large amounts of data in real time

藏色散人 (reposted)
2019-08-26 14:20:44

I recently received a request to export user access logs to Excel for a selected time period. Because there are so many users, an export often contains more than 500,000 rows.

The commonly used PHPExcel package has to load all the data before it can generate the Excel file, which obviously exhausts memory when the data set is large. Instead, consider having PHP write to the output stream while it reads the data, so that the browser receives the response as a download.

We write to the PHP output stream in the following way:

$fp = fopen('php://output', 'a');
fputs($fp, 'strings');
// ... more writes ...
fclose($fp);

php://output is a write-only output stream that lets the program write output as if it were writing to a file. PHP sends the content written to this stream to the web server, which returns it to the browser that made the request.

In addition, because the Excel data is read from the database in batches and then written to the output stream, PHP's execution time limit (30 seconds by default) needs to be raised. Calling set_time_limit(0) removes the limit on PHP execution time.
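
Because php://output is used like any other file handle, the writing logic can be tried out against a different stream first. The following is a minimal sketch, not the article's code: writeCsvRows is a hypothetical helper name, and php://memory stands in for php://output so the result can be read back and inspected.

```php
<?php
declare(strict_types=1);

/**
 * Write each row to an already-open stream as one CSV line.
 * Returns the number of rows written.
 * (writeCsvRows is a hypothetical helper, not part of PHP.)
 */
function writeCsvRows($handle, iterable $rows): int
{
    $written = 0;
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape character are spelled out explicitly.
        fputcsv($handle, $row, ',', '"', '\\');
        $written++;
    }
    return $written;
}

$rows = [['id', 'title'], [1, 'Hello']];

// php://memory lets the result be read back; in a real request the
// handle would come from fopen('php://output', 'w') instead.
$handle = fopen('php://memory', 'w+');
$written = writeCsvRows($handle, $rows);
rewind($handle);
$csv = stream_get_contents($handle);
fclose($handle);
```

Passing the handle in, rather than opening it inside the function, is what makes the same code usable for both the download and a quick local check.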

Note:

The following code only illustrates the idea and the steps for generating an Excel file from a large amount of data. The project's business code has been removed, so the program contains syntax errors and cannot be run directly. Please fill in the corresponding business code according to your own needs!

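    /**
     * Article access log.
     * The downloaded log file is usually very large, so set the CSV-related headers first,
     * then open the PHP output stream and write data to it progressively. After each batch,
     * flush the buffers into the response to avoid overflowing the buffer.
     */
    public function articleAccessLog($timeStart, $timeEnd)
    {
        set_time_limit(0);
        // Column headers: "Article ID", "Article Title", ...
        $columns = [
            '文章ID', '文章标题', ......
        ];
        // "用户日志" means "user log". The body is CSV, so the file name ends in .csv
        $fileName = '用户日志' . $timeStart . '_' . $timeEnd . '.csv';
        // Set the headers that tell the browser to download the file for Excel
        header('Content-Description: File Transfer');
        header('Content-Type: application/vnd.ms-excel');
        header('Content-Disposition: attachment; filename="' . $fileName . '"');
        header('Expires: 0');
        header('Cache-Control: must-revalidate');
        header('Pragma: public');
        $fp = fopen('php://output', 'a'); // open the output stream
        mb_convert_variables('GBK', 'UTF-8', $columns);
        fputcsv($fp, $columns); // format the data as CSV and write it to the output stream
        $accessNum = 1000000; // total row count from the database; assume one million
        $perSize = 1000; // number of rows per query
        $pages   = ceil($accessNum / $perSize);
        $lastId  = 0;
        for ($i = 1; $i <= $pages; $i++) {
            // $logService stands for the project's own log service (business code omitted)
            $accessLog = $logService->getArticleAccessLog($timeStart, $timeEnd, $lastId, $perSize);
            foreach ($accessLog as $access) {
                $rowData = [
                    ......// the data for each row
                ];
                mb_convert_variables('GBK', 'UTF-8', $rowData);
                fputcsv($fp, $rowData);
                $lastId = $access->id;
            }
            unset($accessLog); // free the memory held by this batch
            // Flush the output buffers to the browser
            if (ob_get_level() > 0) {
                ob_flush(); // only valid while an output buffer is active
            }
            flush(); // ob_flush() and flush() must be used together to flush the output buffers
        }
        fclose($fp);
        exit();
    }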
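
The loop above sizes itself from a total row count. As a hedged variation, the same LastId-driven batching can be written as a generator that stops when a page comes back short, so no total is needed. In this sketch rowsByLastId and $fetchPage are hypothetical names, an in-memory array stands in for the database, and php://memory stands in for php://output.

```php
<?php
declare(strict_types=1);

/**
 * Yield rows batch by batch, using the last seen id as the cursor.
 * $fetchPage is a hypothetical callable: fn(int $lastId, int $limit): array,
 * returning rows (arrays with an 'id' key) whose id is greater than $lastId.
 */
function rowsByLastId(callable $fetchPage, int $perSize): Generator
{
    $lastId = 0;
    do {
        $page = $fetchPage($lastId, $perSize);
        foreach ($page as $row) {
            $lastId = $row['id'];
            yield $row;
        }
        // A short (or empty) page means there is nothing left to fetch.
    } while (count($page) === $perSize);
}

// Demo: an in-memory array stands in for the log table.
$table = [];
for ($id = 1; $id <= 7; $id++) {
    $table[] = ['id' => $id, 'title' => "post-$id"];
}
$fetchPage = function (int $lastId, int $limit) use ($table): array {
    $matches = array_filter($table, fn(array $r): bool => $r['id'] > $lastId);
    return array_slice(array_values($matches), 0, $limit);
};

$out = fopen('php://memory', 'w+');
$count = 0;
foreach (rowsByLastId($fetchPage, 3) as $row) {
    fputcsv($out, [$row['id'], $row['title']], ',', '"', '\\');
    $count++;
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
```

Only one batch is held in memory at a time. When the row count is an exact multiple of the batch size, the generator issues one extra query that returns an empty page before stopping.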

That's it, and it is actually very simple: write to the output stream batch by batch and send each batch to the browser, so the browser downloads the file piece by piece. Because the file is written progressively, its total size is not known in advance, so there is no way to tell the browser how big the file is before the download by setting header("Content-Length: $size");. This does not affect the overall result, though. The core problem solved here is generating and downloading a large file in real time.

Update: A word on how I query the database. The data written to the Excel file comes from paged MySQL queries. Everyone knows the syntax LIMIT offset, num, but the larger the offset, the more rows MySQL has to skip on each paged query, which seriously hurts query efficiency (the same applies to NoSQL stores such as MongoDB, where skipping many rows to reach a result set is not recommended). So I use LastId for paging instead.

The query looks similar to the following, paging upward from LastId = 0:

SELECT columns FROM `table_name`
WHERE `created_at` >= 'time range start'
AND `created_at` <= 'time range end'
AND `id` > LastId
ORDER BY `id` ASC
LIMIT num
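
As a rough sketch of how such a LastId query might be issued from PHP, the example below uses PDO prepared statements. Everything specific in it is an assumption for illustration: the fetchLogPage function, the access_log table and its columns are hypothetical, the query pages upward from LastId = 0, and an in-memory SQLite database (which requires the pdo_sqlite extension) stands in for MySQL.

```php
<?php
declare(strict_types=1);

/**
 * Fetch one page of log rows whose id is greater than $lastId.
 * Table and column names (access_log, id, title, created_at) are hypothetical.
 */
function fetchLogPage(PDO $pdo, string $start, string $end, int $lastId, int $limit): array
{
    $sql = 'SELECT id, title, created_at FROM access_log
            WHERE created_at >= :start AND created_at <= :end AND id > :lastId
            ORDER BY id ASC
            LIMIT :limit';
    $stmt = $pdo->prepare($sql);
    $stmt->bindValue(':start', $start);
    $stmt->bindValue(':end', $end);
    // Bind integers explicitly so LIMIT receives a number, not a quoted string.
    $stmt->bindValue(':lastId', $lastId, PDO::PARAM_INT);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Demo data in an in-memory SQLite database (a stand-in for MySQL).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE access_log (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT)');
$insert = $pdo->prepare('INSERT INTO access_log (id, title, created_at) VALUES (?, ?, ?)');
for ($id = 1; $id <= 5; $id++) {
    $insert->execute([$id, "post-$id", "2019-08-0$id 00:00:00"]);
}

$from = '2019-08-01 00:00:00';
$to   = '2019-08-31 23:59:59';

// First page starts from id 0; the second page continues after the last id seen.
$first   = fetchLogPage($pdo, $from, $to, 0, 2);
$lastId  = (int) end($first)['id'];
$second  = fetchLogPage($pdo, $from, $to, $lastId, 2);
$pageIds = array_map(fn(array $r): int => (int) $r['id'], $second);
```

Each page is located through the primary key rather than by skipping rows, so the cost of a page does not grow with how far the export has progressed.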


Statement:
This article is reproduced from segmentfault.com. In case of infringement, please contact admin@php.cn to have it removed.