How to Retrieve Page Content Using cURL Despite "Page Moved" Errors?

Patricia Arquette
2024-10-22 20:52:03

Retrieving Page Content Using cURL

Suppose you are trying to scrape a Google search results page with cURL. You have set a user agent and several other options, yet each request comes back with a redirect or a "page moved" response instead of the page content.

One likely cause is special characters in the query string that have not been URL-encoded. Fixing this takes two changes to your PHP code: encode the query string before building the URL, and configure cURL to follow redirects and keep cookies between requests. The function below handles the second part; it does not encode the URL for you.
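As a minimal sketch of the encoding step, the query string can be built with PHP's `http_build_query()` instead of by hand. The helper name `build_search_url`, the `q` parameter, and the base URL are assumptions made for illustration; they do not come from the original code.

```php
<?php
// Hypothetical helper: build a search URL with a safely encoded query string.
// The base URL and the "q" parameter name are illustrative assumptions.
function build_search_url(string $query, string $base = 'https://www.google.com/search'): string
{
    // PHP_QUERY_RFC3986 percent-encodes spaces as %20 and escapes
    // reserved characters such as &, +, and double quotes.
    return $base . '?' . http_build_query(['q' => $query], '', '&', PHP_QUERY_RFC3986);
}

// Usage: the resulting string can be passed to a cURL wrapper as-is.
echo build_search_url('c++ & "php" tips'), PHP_EOL;
```

Encoding the whole parameter list in one call avoids double-encoding and forgotten characters, which are easy mistakes when concatenating `urlencode()` results manually.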

Here's the approach:

<code class="php">function get_web_page($url)
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

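    $options = array(
        CURLOPT_HTTPGET        => true,          // force a plain GET request
        CURLOPT_USERAGENT      => $user_agent,   // identify as a regular browser
        CURLOPT_COOKIEFILE     => "cookie.txt",  // read cookies from this file
        CURLOPT_COOKIEJAR      => "cookie.txt",  // write cookies back on close
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_HEADER         => false,         // do not include headers in the body
        CURLOPT_FOLLOWLOCATION => true,          // follow "page moved" redirects
        CURLOPT_ENCODING       => "",            // accept all supported encodings
        CURLOPT_AUTOREFERER    => true,          // set Referer when following redirects
        CURLOPT_CONNECTTIMEOUT => 120,           // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 120,           // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10             // stop after 10 redirects
    );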

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    $err = curl_errno($ch);
    $errmsg = curl_error($ch);
    $header = curl_getinfo($ch);
    curl_close($ch);

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}</code>

Usage:

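<code class="php">$result = get_web_page($url); // $url must already be URL-encoded

if ($result['errno'] != 0) {
    // Handle transport errors: bad URL, timeout, redirect loop
    error_log('cURL error ' . $result['errno'] . ': ' . $result['errmsg']);
} elseif ($result['http_code'] != 200) {
    // Handle HTTP errors: no page, no permissions, no service
    error_log('Unexpected HTTP status ' . $result['http_code']);
} else {
    $page = $result['content'];
}</code>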
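When a request still fails, the array returned by `curl_getinfo()` shows where the redirects led. The sketch below is one possible way to summarize it; the helper name `summarize_fetch` is hypothetical, and it assumes the array shape returned by `get_web_page()` above (the `curl_getinfo()` keys `url`, `http_code`, and `redirect_count`, plus `errno` and `errmsg`).

```php
<?php
// Hypothetical helper: turn the result array into a one-line diagnostic.
function summarize_fetch(array $result): string
{
    // A non-zero errno means cURL itself failed (timeout, DNS, too many redirects).
    if (($result['errno'] ?? 0) !== 0) {
        return 'transport error ' . $result['errno'] . ': ' . ($result['errmsg'] ?? '');
    }

    $code  = (int) ($result['http_code'] ?? 0);       // status of the last response
    $hops  = (int) ($result['redirect_count'] ?? 0);  // redirects that were followed
    $final = $result['url'] ?? '';                    // last effective URL

    // A 3xx status here means the last response was itself a redirect.
    if ($code >= 300 && $code < 400) {
        return "still redirected (HTTP $code) after $hops hop(s); last URL: $final";
    }
    if ($code !== 200) {
        return "HTTP $code from $final";
    }
    return "OK after $hops redirect(s); final URL: $final";
}
```

Calling `error_log(summarize_fetch($result));` after each request makes it easier to tell a transport failure from a redirect that cURL did not follow.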

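With this code, cURL follows redirects, keeps cookies between requests, and returns the body of the final response in $result['content']. This is the HTML the server sends for the final URL, which can differ from what a browser displays when the page relies on JavaScript. The function does not encode the URL itself, so any special characters in the query string must be URL-encoded before the URL is passed in.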
