Home  >  Article  >  Backend Development  >  How to Efficiently Retrieve Images from a URL with Dimensions Greater Than or Equal to 200px?

How to Efficiently Retrieve Images from a URL with Dimensions Greater Than or Equal to 200px?

Barbara Streisand
Barbara StreisandOriginal
2024-10-28 02:28:02995browse

How to Efficiently Retrieve Images from a URL with Dimensions Greater Than or Equal to 200px?

How to Obtain Images from a URL with Dimensions Greater Than or Equal to 200px Efficiently

In the pursuit of creating a feature similar to Pinterest's "Add a Pin," the question arises of how to quickly retrieve all images from a specified URL that have a width and height greater than or equal to 200 pixels. While the provided code takes 48.64 seconds to complete the process, it can be significantly accelerated with the following optimizations:

  1. Parallel Processing: Use curl_multi_init to execute multiple curl requests simultaneously, reducing bandwidth limitations and speeding up image retrieval.
  2. Local Image Storage: Avoid calling getimagesize() over HTTP by downloading the images directly to a local temporary directory. This approach dramatically improves performance compared to remote verification.
<code class="php">// PHP script using curl_multi_init for parallel image retrieval
require 'simple_html_dom.php';
$url = 'http://www.huffingtonpost.com/';
$html = file_get_html ( $url );
$nodes = array ();
$start = microtime ();
$res = array ();

if ($html->find ( 'img' )) {
    foreach ( $html->find ( 'img' ) as $element ) {
        if (startsWith ( $element->src, "/" )) {
            $element->src = $url . $element->src;
        }
        if (! startsWith ( $element->src, "http" )) {
            $element->src = $url . "/" . $element->src;
        }
        $nodes [] = $element->src;
    }
}

echo "<pre class="brush:php;toolbar:false">";
print_r ( imageDownload ( $nodes, 200, 200 ) );
echo "<h1>", microtime () - $start, "</h1>";

function imageDownload($nodes, $maxHeight = 0, $maxWidth = 0) {

    $mh = curl_multi_init ();
    $curl_array = array ();
    foreach ( $nodes as $i => $url ) {
        $curl_array [$i] = curl_init ( $url );
        curl_setopt ( $curl_array [$i], CURLOPT_RETURNTRANSFER, true );
        curl_setopt ( $curl_array [$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)' );
        curl_setopt ( $curl_array [$i], CURLOPT_CONNECTTIMEOUT, 5 );
        curl_setopt ( $curl_array [$i], CURLOPT_TIMEOUT, 15 );
        curl_multi_add_handle ( $mh, $curl_array [$i] );
    }
    $running = NULL;
    do {
        usleep ( 10000 );
        curl_multi_exec ( $mh, $running );
    } while ( $running > 0 );

    $res = array ();
    foreach ( $nodes as $i => $url ) {
        $curlErrorCode = curl_errno ( $curl_array [$i] );

        if ($curlErrorCode === 0) {
            $info = curl_getinfo ( $curl_array [$i] );
            $ext = getExtention ( $info ['content_type'] );
            if ($info ['content_type'] !== null) {
                $temp = "temp/img" . md5 ( mt_rand () ) . $ext;
                touch ( $temp );
                $imageContent = curl_multi_getcontent ( $curl_array [$i] );
                file_put_contents ( $temp, $imageContent );
                if ($maxHeight == 0 || $maxWidth == 0) {
                    $res [] = $temp;
                } else {
                    $size = getimagesize ( $temp );
                    if ($size [1] >= $maxHeight &amp;&amp; $size [0] >= $maxWidth) {
                        $res [] = $temp;
                    } else {
                        unlink ( $temp );
                    }
                }
            }
        }
        curl_multi_remove_handle ( $mh, $curl_array [$i] );
        curl_close ( $curl_array [$i] );

    }

    curl_multi_close ( $mh );
    return $res;
}

function getExtention($type) {
    $type = strtolower ( $type );
    switch ($type) {
        case "image/gif" :
            return ".gif";
            break;
        case "image/png" :
            return ".png";
            break;

        case "image/jpeg" :
            return ".jpg";
            break;

        default :
            return ".img";
            break;
    }
}

function startsWith($str, $prefix) {
    $temp = substr ( $str, 0, strlen ( $prefix ) );
    $temp = strtolower ( $temp );
    $prefix = strtolower ( $prefix );
    return ($temp == $prefix);
}</code>

The above is the detailed content of How to Efficiently Retrieve Images from a URL with Dimensions Greater Than or Equal to 200px?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn