
Example of how PHP uses curl to download a file of a specified size

黄舟
2017-10-16 09:46:17

With PHP's curl functions, which are built on libcurl, you can send an HTTP request to a target URL and read the response it returns. A typical request looks like the following code:


// Method of a request helper class (the enclosing class is omitted here).
public function callFunction($url, $postData, $method, $header = array())
{
    $maxRetryTimes = 3;
    $method = strtoupper($method);
    $curl = curl_init();

    /****** Initialize request parameters: start ******/
    if ($method !== 'GET' && $postData) {
        curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($postData));
    } elseif ($method === 'GET' && $postData) {
        $url .= '?' . http_build_query($postData);
    }
    /****** Initialize request parameters: end ******/

    curl_setopt_array($curl, array(
        CURLOPT_URL => $url,
        CURLOPT_TIMEOUT => 10,
        CURLOPT_NOBODY => 0,
        CURLOPT_RETURNTRANSFER => 1
    ));
    if ($method === 'POST') {
        curl_setopt($curl, CURLOPT_POST, true);
    }
    if (!empty($header)) {
        curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    }

    // Retry while curl_exec() reports failure (false), at most $maxRetryTimes attempts.
    $response = false;
    while ($response === false && $maxRetryTimes-- > 0) {
        $response = curl_exec($curl);
    }
    curl_close($curl);

    // Only trim a real body; trimming false would hide the failure.
    return ($response === false) ? false : trim($response);
}

The $response in the code above is the content that curl obtained from $url. Unless $header specifies how much data to download with a Range header, the complete content of the URI is requested and returned, no matter how large the resource is. curl is usually used only to call interfaces or to invoke a remote function for data, so the CURLOPT_TIMEOUT parameter is very important in that scenario.
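As a rough illustration of the Range approach mentioned above, the sketch below builds the header line for the first N bytes of a resource. The helper name buildRangeHeader is hypothetical and not part of the original code. A server that does not support range requests may ignore the header and return the full body with status 200, so this alone does not guarantee a size limit.

```php
<?php
// Hypothetical helper: build a Range header asking for $length bytes
// starting at offset $firstByte (byte positions in a range are inclusive).
function buildRangeHeader(int $firstByte, int $length): string
{
    if ($firstByte < 0 || $length < 1) {
        throw new InvalidArgumentException('firstByte must be >= 0 and length >= 1');
    }
    $lastByte = $firstByte + $length - 1;
    return "Range: bytes={$firstByte}-{$lastByte}";
}

// Headers in the array form that CURLOPT_HTTPHEADER expects.
$header = array(buildRangeHeader(0, 1024));
```

The resulting $header array could be passed as the fourth argument of callFunction.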

curl is used not only to call data interfaces but also to check whether an arbitrary URL serves HTTP correctly. When the URL a user fills in points to a resource file such as a PDF or PPT, requesting a large resource over a poor network will inevitably time out or consume extra network resources. The previous strategy was to download the resource completely (curl downloads it and holds it in memory), check the content size after the request finished, and suspend the monitoring task when the size exceeded the target value. Restricting after the fact treated the symptom rather than the root cause. Eventually the customer raised a new requirement: the task must not be stopped. Instead, only a specified amount of the file should be downloaded, and its md5 value returned so that the customer can verify correctness.

After some attempts, the problem was solved. The process is recorded below.

1. Try to use CURLOPT_MAXFILESIZE.

This option has requirements on the PHP and libcurl versions, and the check is done entirely before the transfer. When the target is found to be larger than the configured limit, an error saying the size limit was exceeded is returned immediately and nothing is downloaded, which does not meet the requirement.
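A minimal sketch of this first attempt, assuming a hypothetical helper named fetchWithMaxFileSize; it is not the code from the original project. libcurl can typically enforce CURLOPT_MAXFILESIZE only when it learns the size before the transfer (for example from Content-Length), and it then fails with error 63 (CURLE_FILESIZE_EXCEEDED) instead of delivering a partial body.

```php
<?php
// Hypothetical helper: fetch $url, letting libcurl refuse the transfer
// when the size it learns up front exceeds $maxBytes.
function fetchWithMaxFileSize(string $url, int $maxBytes): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_MAXFILESIZE    => $maxBytes,
    ));
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);

    return array(
        'body'     => ($body === false) ? null : $body,
        'errno'    => $errno,
        // 63 is libcurl's CURLE_FILESIZE_EXCEEDED.
        'tooLarge' => ($errno === 63),
    );
}
```

With this option a too-large target yields no body at all, which is why it cannot return the md5 of the first part of a file.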

2. Use the callback function of the curl download process.

Following http://php.net/manual/en/function.curl-setopt-array.php, I finally used the CURLOPT_WRITEFUNCTION option to register on_curl_write as the callback. curl calls the function each time it has received data to hand over; in my case that was about once per second.

$ch = curl_init();
$options = array(
    CURLOPT_URL            => 'http://www.php.net/',
    CURLOPT_HEADER         => false,
    CURLOPT_HEADERFUNCTION => 'on_curl_header',
    CURLOPT_WRITEFUNCTION  => 'on_curl_write'
);

The final fragment of my implementation:


function on_curl_write($ch, $data)
{
    $pid = getmypid();
    $downloadSizeRecorder = DownloadSizeRecorder::getInstance($pid);
    $bytes = strlen($data);
    $downloadSizeRecorder->downloadData .= $data;
    $downloadSizeRecorder->downloadedFileSize += $bytes;
    // error_log(' on_curl_write '.$downloadSizeRecorder->downloadedFileSize." > {$downloadSizeRecorder->maxSize} \n", 3, '/tmp/hyb.log');

    // Make sure the downloaded content is slightly larger than the maximum limit.
    if (($downloadSizeRecorder->downloadedFileSize - $bytes) > $downloadSizeRecorder->maxSize) {
        // Deliberately "incorrect" return value: curl reports an error and aborts the download,
        // e.g. "errno":23,"errmsg":"Failed writing body (0 != 16384)"
        return false;
    }
    return $bytes;
}
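The stopping rule of the callback can be tried without any network access by calling a callback directly with fake chunks. The sketch below is an illustration only: it swaps the singleton recorder for a closure, and the helper name makeLimitedWriter is hypothetical.

```php
<?php
// Hypothetical helper: returns a write callback that stops the transfer
// once the bytes received before the current chunk exceed $maxBytes.
function makeLimitedWriter(int $maxBytes, string &$buffer): callable
{
    $received = 0;
    return function ($ch, string $data) use ($maxBytes, &$buffer, &$received) {
        $bytes = strlen($data);
        $buffer .= $data;
        $received += $bytes;
        if (($received - $bytes) > $maxBytes) {
            return 0; // differs from $bytes, so curl would abort with errno 23
        }
        return $bytes; // tells curl the whole chunk was consumed
    };
}

// Simulate three 4-byte chunks against a 5-byte limit.
$buffer = '';
$writer = makeLimitedWriter(5, $buffer);
$results = array();
foreach (array('aaaa', 'bbbb', 'cccc') as $chunk) {
    $results[] = $writer(null, $chunk);
}
```

The first two chunks are accepted and the third triggers the abort, so the buffer ends up slightly larger than the limit, matching the intent of the comment in on_curl_write.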

DownloadSizeRecorder is a singleton-style class (one instance per process ID) that records the size while curl downloads. It also saves the downloaded content, returns its md5, and so on.


class DownloadSizeRecorder
{
    const ERROR_FAILED_WRITING = 23; // Failed writing body

    public $downloadedFileSize;
    public $maxSize;
    public $pid;
    public $hasOverMaxSize;
    public $fileFullName;
    public $downloadData;

    private static $selfInstanceList = array();

    public static function getInstance($pid)
    {
        if (!isset(self::$selfInstanceList[$pid])) {
            self::$selfInstanceList[$pid] = new self($pid);
        }
        return self::$selfInstanceList[$pid];
    }

    private function __construct($pid)
    {
        $this->pid = $pid;
        $this->downloadedFileSize = 0;
        $this->maxSize = 0;
        $this->fileFullName = '';
        $this->hasOverMaxSize = false;
        $this->downloadData = '';
    }

    /**
     * Save the downloaded data, truncated to maxSize, to a temporary file.
     */
    public function saveMaxSizeData2File()
    {
        $resp_data = $this->downloadData;
        $fileFullName = '/tmp/http_' . $this->pid . '_' . time() . "_{$this->maxSize}.download";
        if (is_string($resp_data) && strlen($resp_data) > 0) {
            $bodyOnly = $resp_data;
            // If the buffer begins with response headers (e.g. CURLOPT_HEADER was on), strip them.
            if (strncmp($resp_data, 'HTTP/', 5) === 0 && strpos($resp_data, "\r\n\r\n") !== false) {
                list($headerOnly, $bodyOnly) = explode("\r\n\r\n", $resp_data, 2);
            }
            $saveDataLength = ($this->downloadedFileSize < $this->maxSize) ? $this->downloadedFileSize : $this->maxSize;
            $needSaveData = substr($bodyOnly, 0, $saveDataLength);
            if ($needSaveData === false || $needSaveData === '') {
                return;
            }
            file_put_contents($fileFullName, $needSaveData);
            if (file_exists($fileFullName)) {
                $this->fileFullName = $fileFullName;
            }
        }
    }

    /**
     * Return the md5 of the saved file.
     * @return string
     */
    public function returnFileMd5()
    {
        $md5 = '';
        if (file_exists($this->fileFullName)) {
            $md5 = md5_file($this->fileFullName);
        }
        return $md5;
    }

    /**
     * Return the downloaded size, capped at maxSize.
     * @return int
     */
    public function returnSize()
    {
        return ($this->downloadedFileSize < $this->maxSize) ? $this->downloadedFileSize : $this->maxSize;
    }

    /**
     * Delete the downloaded file.
     */
    public function deleteFile()
    {
        if (file_exists($this->fileFullName)) {
            unlink($this->fileFullName);
        }
    }
}
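The class writes a temporary file so that md5_file can be used. As a possible alternative (not the author's method), the same digest can be computed in memory from the truncated data. The helper name md5OfPrefix below is hypothetical.

```php
<?php
// Hypothetical helper: md5 of at most $maxSize leading bytes of $data,
// computed in memory instead of via a temporary file.
function md5OfPrefix(string $data, int $maxSize): string
{
    return md5(substr($data, 0, $maxSize));
}

// Only the first 5 bytes ("hello") contribute to the digest.
$digest = md5OfPrefix('hello world', 5);
```

This avoids the file write and the later cleanup. The buffer is already held in memory by the write callback, so no extra download data is kept around.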

In the code that issues the curl request, the download size is limited as follows:

……
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'on_curl_write'); // set the callback function
……
$pid = getmypid();
$downloadSizeRecorder = DownloadSizeRecorder::getInstance($pid);
$downloadSizeRecorder->maxSize = $size_limit;
……
// Issue the curl request
$response = curl_exec($ch);
……
// Save the file and return the md5
$downloadSizeRecorder->saveMaxSizeData2File();  // save
$downloadFileMd5 = $downloadSizeRecorder->returnFileMd5();
$downloadedfile_size = $downloadSizeRecorder->returnSize();
$downloadSizeRecorder->deleteFile();

Here I ran into a pitfall. Once on_curl_write is registered, curl_exec() no longer returns the body: $response comes back as true, which caused an exception later when the returned content was read. Fortunately, the download size is already limited in real time and downloadData records the downloaded content, so it can be used directly.

if($response === true){
    $response = $downloadSizeRecorder->downloadData;
}
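When the callback cuts a transfer short, curl_exec() is expected to return false with errno 23 rather than true, so that case may need handling as well. The sketch below is one possible way to cover both outcomes. The helper name resolveResponseBody is hypothetical, and treating errno 23 with a non-empty buffer as an intentional cut-off is an assumption, not something the original code does.

```php
<?php
// Hypothetical helper: decide which string to treat as the response body
// once a write callback has been installed.
// $execResult is what curl_exec() returned, $errno what curl_errno() returned,
// and $buffer is the data collected by the callback.
function resolveResponseBody($execResult, int $errno, string $buffer)
{
    $failedWriting = 23; // same value as DownloadSizeRecorder::ERROR_FAILED_WRITING
    if ($execResult === true) {
        return $buffer; // transfer finished without hitting the limit
    }
    if ($execResult === false && $errno === $failedWriting && $buffer !== '') {
        return $buffer; // assumed to be the intentional cut-off by the callback
    }
    return false; // any other failure, e.g. a timeout (errno 28)
}

$complete = resolveResponseBody(true, 0, 'small body');
$cutOff   = resolveResponseBody(false, 23, 'first part of a big file');
$timedOut = resolveResponseBody(false, 28, '');
```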
