
How can I check whether a URL exists using PHP?

How can I check whether a URL exists (that is, does not return a 404) in PHP?

P粉288069045 · 445 days ago · 446

All replies (2)

  • P粉333186285

    P粉333186285 2023-08-24 16:27:55

    When determining whether a URL exists from PHP, there are a few things to pay attention to:

    • Is the URL itself valid (a string, not empty, good syntax)? This is quick to check on the server side.
    • Waiting for a response may take time and block code execution.
    • Not all headers returned by get_headers() are well formed.
    • Use curl (if you can).
    • Avoid fetching the entire body/content; request only the headers.
    • Consider redirecting URLs:
        • Do you want the first code returned?
        • Or follow all redirects and return the last code?
        • You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is tough.
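
    The first point in the list (a quick syntax check before any network request) could be sketched as follows. The function name looksLikeHttpUrl is hypothetical and not part of the original answer; it relies only on PHP's built-in filter_var() and parse_url(), and it says nothing about whether the URL actually responds.

```php
<?php
// Hypothetical helper (not from the original answer): cheap, offline syntax check.
function looksLikeHttpUrl($url): bool
{
    // Must be a non-empty string.
    if (!is_string($url) || $url === '') {
        return false;
    }
    // filter_var() validates the general URL syntax without any network access.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Only accept http and https; FILTER_VALIDATE_URL also allows other schemes.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}
```

    Running this check first means obviously malformed input is rejected immediately, and only plausible URLs pay the cost of a real request.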

    Keep in mind that whatever method you use, waiting for a response takes time.
    All code may (and probably will) halt until you either know the result or the request has timed out.

    For example: if a URL is invalid or unreachable, the code below may take a long time to display the page:

    <?php
    $urls = getUrls(); // some function getting say 10 or more external links
    
    foreach($urls as $k=>$url){
      // this could potentially take 0-30 seconds each
      // (more or less depending on connection, target site, timeout settings...)
      if( ! isValidUrl($url) ){
        unset($urls[$k]);
      }
    }
    
    echo "yay all done! now show my site";
    foreach($urls as $url){
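      $safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8'); // escape before printing into HTML
      echo "<a href=\"{$safeUrl}\">{$safeUrl}</a><br/>";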
    }
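
    One way to reduce the total wait in a loop like the one above is to run the checks concurrently. The following is a minimal sketch using PHP's curl_multi functions; the function name getHttpResponseCodesParallel and the 5-second default timeout are assumptions chosen for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): check several URLs
// concurrently so the total wait is roughly the slowest request, not the sum.
// Assumes every element of $urls is a string.
function getHttpResponseCodesParallel(array $urls, int $timeoutSeconds = 5): array
{
    if ($urls === []) {
        return [];
    }
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        if ($ch === false) {
            continue;
        }
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // never print output
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(10000); // select failed; avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Default every URL to false, then fill in the codes that arrived.
    $codes = array_fill_keys(array_keys($urls), false);
    foreach ($handles as $key => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 when no response arrived
        $codes[$key] = $code > 0 ? $code : false;
        curl_multi_remove_handle($multi, $ch);
    }
    return $codes;
}
```

    The result keeps the keys of the input array, so entries whose value is not 200 can be removed from $urls in the same way as in the loop above.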

    The functions below may help; you will probably want to adapt them to your needs:

    function isValidUrl($url){
        // first do some quick sanity checks:
        if(!$url || !is_string($url)){
            return false;
        }
        // quick check that the url is roughly a valid http request: ( http://blah/... )
        if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
            return false;
        }
        // the next bit could be slow:
        if(getHttpResponseCode_using_curl($url) != 200){
        // if(getHttpResponseCode_using_getheaders($url) != 200){  // use this one if you can't use curl
            return false;
        }
        // all good!
        return true;
    }
        
    function getHttpResponseCode_using_curl($url, $followredirects = true){
        // returns int responsecode, or false (if url does not exist or connection timeout occurs)
        // NOTE: could potentially take 0-30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $ch = @curl_init($url);
        if($ch === false){
            return false;
        }
        @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
        @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // don't need the body
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
        if($followredirects){
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
            @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
        }else{
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
        }
        // @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
        // @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
        // @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
        @curl_exec($ch);
        if(@curl_errno($ch)){   // should be 0
            @curl_close($ch);
            return false;
        }
        $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
        @curl_close($ch);
        return $code;
    }
        
    function getHttpResponseCode_using_getheaders($url, $followredirects = true){
        // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
        // NOTE: could potentially take 0-30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $headers = @get_headers($url);
        if($headers && is_array($headers)){
            if($followredirects){
                // we want the last status code, so reverse the array and start at the end:
                $headers = array_reverse($headers);
            }
            foreach($headers as $hline){
                // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
                // note that the exact syntax/version/output differs, so there is some string magic involved here
                if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
                    $code = $matches[1];
                    return $code;
                }
            }
            // no HTTP/xxx found in headers:
            return false;
        }
        // no headers:
        return false;
    }
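
    If curl is unavailable, the get_headers() approach can also be limited to a HEAD request with a timeout by passing a stream context. The sketch below is an illustration under that assumption: the names extractStatusCodes and getStatusCodesWithTimeout and the 5-second default are hypothetical. It returns every status code in the order received, so the first element is the initial response and the last element is the response after any redirects.

```php
<?php
// Hypothetical helper: collect every HTTP status code found in a header list,
// in the order the responses were received.
function extractStatusCodes(array $headers): array
{
    $codes = [];
    foreach ($headers as $line) {
        // Matches status lines such as "HTTP/1.1 200 OK" or "HTTP/1.0 404 Not Found".
        if (is_string($line) && preg_match('/^HTTP\/\S+\s+([1-9][0-9]{2})\b/', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Hypothetical wrapper: HEAD request with a timeout via a stream context.
// Returns an empty array when no headers could be fetched.
function getStatusCodesWithTimeout(string $url, float $timeoutSeconds = 5.0): array
{
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => $timeoutSeconds],
    ]);
    $headers = @get_headers($url, false, $context);
    return is_array($headers) ? extractStatusCodes($headers) : [];
}
```

    Keeping the parsing separate from the network call makes it possible to exercise the parser with a hand-written header array, without contacting any server.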

  • P粉465287592

    P粉465287592 2023-08-24 00:30:23

    Here:

    $file = 'http://www.example.com/somefile.jpg';
    $file_headers = @get_headers($file);
    // match the status code only, so "HTTP/1.0 404 Not Found" etc. are also caught
    if(!$file_headers || preg_match('/^HTTP\/\S+\s+404\b/', $file_headers[0])) {
        $exists = false;
    }
    else {
        $exists = true;
    }

    From the same source, right below the post above, there is a curl solution:

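    function url_exists($url) {
        // NOTE: curl_init() only creates a handle and sends no request, so this
        // does not actually verify that the URL responds.
        return curl_init($url) !== false;
    }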
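
    Because curl_init() alone never contacts the server, a check that really sends a request might look like the sketch below. The name urlRespondsWithSuccess, the 5-second timeout, and the choice to treat any 2xx status after redirects as "exists" are all assumptions for illustration; adjust them to your own definition of an existing URL.

```php
<?php
// Hypothetical sketch (not from the original post): send a HEAD request
// and report whether the final response has a 2xx status code.
function urlRespondsWithSuccess($url, int $timeoutSeconds = 5): bool
{
    if (!is_string($url) || $url === '') {
        return false;
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // never print output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    curl_exec($ch);
    if (curl_errno($ch) !== 0) {
        return false; // DNS failure, timeout, refused connection, ...
    }
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return $code >= 200 && $code < 300;
}
```

    Note that some servers reject HEAD requests even for pages that exist, so a false result here is a hint rather than proof that the URL is missing.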
