Home  >  Article  >  Backend Development  >  ## How to Efficiently Fetch Images from URLs with Dimension Thresholds?

## How to Efficiently Fetch Images from URLs with Dimension Thresholds?

Linda Hamilton
Linda HamiltonOriginal
2024-10-27 05:40:29296browse

## How to Efficiently Fetch Images from URLs with Dimension Thresholds?

Fetching Images from URLs with Width and Height Thresholds Efficiently

Problem:

Retrieving images from a given URL that meet specific width and height requirements, such as images with both dimensions greater than or equal to 200 pixels, is a common task in web development. However, this process can be time-consuming using traditional methods.

Current Approach:

The provided code iterates through all the elements on the given URL, retrieving the image dimensions using getimagesize(). While this approach works, its execution time can be noticeable.

Proposed Solution:

To speed up the process, consider these optimizations:

  • Parallel Image Download: Utilize the curl_multi_init function to download images concurrently. This technique reduces overall fetch time by initiating multiple requests simultaneously.
  • Local File Storage: Instead of running getimagesize() directly on the images from HTTP URLs, download them locally and store them in a temporary folder. This avoids the overhead of HTTP requests for dimension retrieval.

Code Implementation:

The following code presents a more efficient implementation that incorporates these optimizations:

<code class="php">require 'simple_html_dom.php';
$url = 'http://www.huffingtonpost.com';
$html = file_get_html($url);
$nodes = array();
$start = microtime();
$res = array();

if ($html->find('img')) {
    foreach ($html->find('img') as $element) {
        if (startsWith($element->src, "/")) {
            $element->src = $url . $element->src;
        }
        if (!startsWith($element->src, "http")) {
            $element->src = $url . "/" . $element->src;
        }
        $nodes[] = $element->src;
    }
}

echo "<pre class="brush:php;toolbar:false">";
print_r(imageDownload($nodes, 200, 200));
echo "<h1>", microtime() - $start, "</h1>";

function imageDownload($nodes, $maxHeight = 0, $maxWidth = 0) {
    $mh = curl_multi_init();
    $curl_array = array();
    foreach ($nodes as $i => $url) {
        $curl_array[$i] = curl_init($url);
        curl_setopt($curl_array[$i], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl_array[$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)');
        curl_setopt($curl_array[$i], CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($curl_array[$i], CURLOPT_TIMEOUT, 15);
        curl_multi_add_handle($mh, $curl_array[$i]);
    }
    $running = NULL;
    do {
        usleep(10000);
        curl_multi_exec($mh, $running);
    } while ($running > 0);

    $res = array();
    foreach ($nodes as $i => $url) {
        $curlErrorCode = curl_errno($curl_array[$i]);

        if ($curlErrorCode === 0) {
            $info = curl_getinfo($curl_array[$i]);
            $ext = getExtention($info['content_type']);
            if ($info['content_type'] !== null) {
                $temp = "temp/img" . md5(mt_rand()) . $ext;
                touch($temp);
                $imageContent = curl_multi_getcontent($curl_array[$i]);
                file_put_contents($temp, $imageContent);
                if ($maxHeight == 0 || $maxWidth == 0) {
                    $res[] = $temp;
                } else {
                    $size = getimagesize($temp);
                    if ($size[1] >= $maxHeight && $size[0] >= $maxWidth) {
                        $res[] = $temp;
                    } else {
                        unlink($temp);
                    }
                }
            }
        }
        curl_multi_remove_handle($mh, $curl_array[$i]);
        curl_close($curl_array[$i]);

    }

    curl_multi_close($mh);
    return $res;
}

function getExtention($type) {
    $type = strtolower($type);
    switch ($type) {
        case "image/gif":
            return ".gif";
            break;
        case "image/png":
            return ".png";
            break;

        case "image/jpeg":
            return ".jpg";
            break;

        default:
            return ".img";
            break;
    }
}

function startsWith($str, $prefix) {
    $temp = substr($str, 0, strlen($prefix));
    $temp = strtolower($temp);
    $prefix = strtolower($prefix);
    return ($temp == $prefix);
}</code>

Benefits:

  • The optimized code reduces fetch time significantly compared to the original approach.
  • It utilizes parallel downloading using curl_multi_init, improving efficiency.
  • The local file storage strategy saves time by avoiding repeated HTTP requests for dimension retrieval.

The above is the detailed content of ## How to Efficiently Fetch Images from URLs with Dimension Thresholds?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn