
How to Efficiently Process Large CSV Files with 30 Million Characters?

DDD (Original), 2024-11-10 20:35:03


Manipulating Large CSV Files Efficiently: Handling Strings of 30 Million Characters

You encounter an 'out of memory' error when manipulating a large CSV file downloaded via cURL. The file contains approximately 30.5 million characters, and splitting it into an array of lines on "\r" and "\n" fails: the resulting array, held in memory alongside the original string, needs considerably more memory than the raw text and pushes the script past its memory limit. To avoid these allocation errors, process the data incrementally instead of as one string:

Streaming Data Without Writing a File:

Set the CURLOPT_FILE option to a stream opened through a custom stream wrapper instead of a file on disk. cURL then writes the response to that stream as it arrives, and the wrapper class's stream_write() method receives it one chunk at a time, so each chunk can be processed without ever holding the whole response in memory.
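The key detail is that a chunk can end in the middle of a line, so the incomplete tail has to be carried over and prepended to the next chunk. A minimal sketch of that bookkeeping on its own, independent of cURL and assuming PHP 7.4 or later; LineBuffer, push() and flush() are hypothetical names, not a PHP API:

```php
<?php
// Hypothetical helper (not part of PHP or cURL): turns arbitrary chunks into complete lines.
final class LineBuffer
{
    // Partial line left over from the previous chunk.
    private string $pending = '';

    /** Appends a chunk and returns every line it completes. */
    public function push(string $chunk): array
    {
        $lines = explode("\n", $this->pending . $chunk);
        // The last element is '' (chunk ended on "\n") or an unfinished line; keep it.
        $this->pending = array_pop($lines);
        // Strip the "\r" left by Windows-style "\r\n" line endings.
        return array_map(fn (string $line): string => rtrim($line, "\r"), $lines);
    }

    /** Returns the final line if the data did not end with "\n". */
    public function flush(): array
    {
        $rest = rtrim($this->pending, "\r");
        $this->pending = '';
        return $rest === '' ? [] : [$rest];
    }
}

// Usage: the chunk boundaries deliberately fall inside lines.
$buffer = new LineBuffer();
$seen = [];
foreach (["id,name\n1,Al", "ice\r\n2,Bob\n3,Car", "ol"] as $chunk) {
    foreach ($buffer->push($chunk) as $line) {
        $seen[] = $line;
    }
}
foreach ($buffer->flush() as $line) {
    $seen[] = $line;
}
print_r($seen);
```

Keeping the carry-over in its own object makes it easy to test without a network transfer; a stream wrapper's stream_write() would call push() for each chunk and flush() when the stream is closed.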

Example Stream Wrapper Class:

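class MyStream {
    public $context;         // Set by PHP; declared to avoid dynamic-property deprecations (PHP 8.2+)
    protected $buffer = '';  // Incomplete last line carried over from the previous chunk

    public function stream_open($path, $mode, $options, &$opened_path) {
        return true;
    }

    public function stream_write($data) {
        // Prepend the leftover from the previous chunk, then split into lines
        $lines = explode("\n", $this->buffer . $data);
        // The last element may be an incomplete line; keep it for the next chunk
        $this->buffer = array_pop($lines);

        // Perform operations on the complete lines
        var_dump($lines);
        echo '<hr />';

        // Report every byte as consumed so curl does not abort the transfer
        return strlen($data);
    }

    public function stream_close() {
        // Process the final line if the data did not end with "\n"
        if ($this->buffer !== '') {
            var_dump([$this->buffer]);
        }
    }
}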

Register the stream wrapper:

stream_wrapper_register("test", "MyStream") or die("Failed to register protocol");
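Before wiring in cURL, the chunk handling of a registered wrapper can be checked by writing to its pseudo-file with fwrite(). A minimal sketch, assuming a variant class (LineCollectorStream) and scheme ("collect"), both hypothetical, that collects complete lines in a static array instead of printing them:

```php
<?php
// Hypothetical wrapper used only to exercise the chunk-to-line logic without cURL.
final class LineCollectorStream
{
    /** @var resource|null Set by PHP; declared to avoid dynamic-property deprecations. */
    public $context;

    /** @var string[] Complete lines received so far. */
    public static array $lines = [];

    private string $buffer = '';

    public function stream_open($path, $mode, $options, &$opened_path): bool
    {
        return true;
    }

    public function stream_write($data): int
    {
        $parts = explode("\n", $this->buffer . $data);
        $this->buffer = array_pop($parts); // keep the unfinished line for the next call
        foreach ($parts as $line) {
            self::$lines[] = $line;
        }
        return strlen($data); // report every byte as consumed
    }

    public function stream_flush(): bool
    {
        return true; // nothing is buffered at the PHP level
    }

    public function stream_close(): void
    {
        if ($this->buffer !== '') {
            self::$lines[] = $this->buffer; // last line had no trailing "\n"
            $this->buffer = '';
        }
    }
}

stream_wrapper_register('collect', LineCollectorStream::class)
    or die('Failed to register protocol');

$fp = fopen('collect://anything', 'w');
foreach (["a,1\nb,", "2\nc,3"] as $chunk) {
    fwrite($fp, $chunk); // the written data reaches stream_write()
}
fclose($fp); // calls stream_close(), which emits the trailing "c,3"

print_r(LineCollectorStream::$lines);
```

The second chunk completes the line "b,2" that the first chunk started, which is exactly the case a naive explode() per chunk would get wrong.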

Configure cURL to use the stream wrapper:

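$ch = curl_init($url); // $url: address of the CSV file (placeholder)
$fp = fopen("test://MyTestVariableInMemory", "r+"); // Pseudo-file written to by curl
curl_setopt($ch, CURLOPT_FILE, $fp); // Directs output to the stream
curl_exec($ch); // Received data is passed to MyStream::stream_write()
curl_close($ch);
fclose($fp); // Closes the pseudo-file once the transfer has finished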

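This approach lets you work on the data one chunk at a time, so memory use stays roughly proportional to the size of a single chunk (plus any partial line carried over) rather than to the full 30.5-million-character response, which makes it feasible to process very large files.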

Other Considerations:

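  • Test the implementation thoroughly with edge cases such as a line split across two chunks, a final line without a trailing newline, and quoted CSV fields that contain line breaks (splitting on "\n" alone breaks such records).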
  • Additional code may be required to perform database insertions.
  • This solution serves as a starting point; customization and optimization may be necessary.
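Splitting on "\n" is only safe while no quoted field contains a line break. A minimal sketch of one way to handle that case, assuming RFC 4180-style CSV (comma delimiter, fields enclosed in double quotes, embedded quotes written as ""): a line with an odd number of " characters leaves a quoted field open, so it is joined with the following line(s) before being parsed with PHP's built-in str_getcsv(). CsvRecordAssembler is a hypothetical name; it would be fed the complete lines produced by stream_write():

```php
<?php
// Hypothetical helper: reassembles CSV records whose quoted fields span several lines.
final class CsvRecordAssembler
{
    // Lines of a record whose quoted field is still open (always contains a '"').
    private string $partial = '';

    /** Returns the parsed fields once a record is complete, or null while a quoted field is open. */
    public function addLine(string $line): ?array
    {
        $record = $this->partial === '' ? $line : $this->partial . "\n" . $line;
        if (substr_count($record, '"') % 2 === 1) {
            $this->partial = $record; // the quoted field continues on the next line
            return null;
        }
        $this->partial = '';
        // Empty escape character (PHP 7.4+): only doubled quotes are special,
        // which matches the quote-parity test above.
        return str_getcsv($record, ',', '"', '');
    }
}

$assembler = new CsvRecordAssembler();
$rows = [];
foreach (['id,comment', '1,"multi', 'line ""quoted"" text"', '2,plain'] as $line) {
    $fields = $assembler->addLine($line);
    if ($fields !== null) {
        $rows[] = $fields; // complete rows could be collected here for batched database inserts
    }
}
print_r($rows);
```

Counting quote characters works because, under this quoting convention, every literal quote inside a field is doubled, so only an unterminated enclosure leaves the count odd.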
