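How to Retrieve Page Content with cURL
When extracting content from Google search results with cURL, you may run into redirects and "page moved" errors. These problems are usually caused by an encoded query string.
To retrieve the content you want reliably, consider the following PHP implementation: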
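Because the trouble usually starts with how the query string is encoded, here is a minimal sketch of one way to build the request URL so that each parameter is percent-encoded exactly once. The helper name `build_search_url` and the sample query are assumptions made for illustration; they do not come from the original code.

```php
<?php
// Hypothetical helper (not part of the original article): builds a URL
// whose query string is encoded exactly once.
function build_search_url(string $base, array $params): string
{
    // http_build_query() percent-encodes every key and value. The "&"
    // separator is passed explicitly so the arg_separator.output ini
    // setting cannot turn it into "&amp;".
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = build_search_url(
    'https://www.google.com/search',
    array('q' => 'php curl redirect', 'hl' => 'en')
);
echo $url, "\n";
```

Passing cURL a URL that has already been HTML-escaped (containing `&amp;`) or encoded twice may be one way to end up with the redirect or "page moved" response described above, so it is worth checking the final URL string before sending the request.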
<code class="php">/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page( $url )
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}</code>
Example:
<code class="php">// Read a web page and check for errors:
$result = get_web_page( $url );

if ( $result['errno'] != 0 ) {
    // cURL-level error: bad url, timeout, redirect loop
    exit( 'cURL error: ' . $result['errmsg'] );
}

if ( $result['http_code'] != 200 ) {
    // HTTP-level error: no page, no permissions, no service
    exit( 'HTTP error: ' . $result['http_code'] );
}

$page = $result['content'];</code>
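The array returned by `get_web_page()` also carries the other fields reported by `curl_getinfo()`, including `url` (the last effective URL) and `redirect_count`. As a rough sketch, a helper such as the hypothetical `describe_result()` below could summarize what happened to a request. The function name, the message wording, and the sample data are assumptions for illustration only.

```php
<?php
// Hypothetical helper: turns the array returned by get_web_page() into a
// one-line status message. It only reads array keys, so it can be tried
// with a hand-built array and no network access.
function describe_result(array $result): string
{
    // A non-zero errno means cURL itself failed (bad URL, timeout, ...).
    if (($result['errno'] ?? 0) !== 0) {
        return 'cURL error ' . $result['errno'] . ': ' . ($result['errmsg'] ?? '');
    }

    // The transfer completed, but the server did not answer with 200.
    $code = $result['http_code'] ?? 0;
    if ($code !== 200) {
        return 'HTTP error ' . $code;
    }

    // Success: report how many redirects were followed and where we landed.
    $hops = $result['redirect_count'] ?? 0;
    return 'OK after ' . $hops . ' redirect(s), final URL: ' . ($result['url'] ?? '');
}

// Sample data shaped like a successful response that was redirected once.
$sample = array(
    'errno'          => 0,
    'errmsg'         => '',
    'http_code'      => 200,
    'redirect_count' => 1,
    'url'            => 'https://example.com/new',
    'content'        => '<html></html>',
);
echo describe_result($sample), "\n";
```

Comparing the final `url` with the one originally requested is a simple way to see whether a "page moved" response was followed to its destination.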