In PHP, both file_get_contents() and the curl functions can fetch data from another website and save it on the local server. In general, file_get_contents() is somewhat less efficient and fails more often, while curl is more efficient and supports concurrent requests (often described as multi-threading). The catch is that curl only works after the curl extension has been enabled, whereas file_get_contents() is built into PHP by default.
The following are the steps to enable the curl extension on Windows:
1. Copy the three files php_curl.dll, libeay32.dll, and ssleay32.dll from the PHP folder to system32;
2. In php.ini (in the C:\WINDOWS directory), remove the semicolon in front of extension=php_curl.dll;
3. Restart Apache or IIS.
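After restarting, it is worth confirming that PHP actually loaded the extension. A minimal sketch using the built-in extension_loaded() and function_exists() checks; the helper name curl_is_available is hypothetical:

```php
<?php
// Hypothetical helper: reports whether the curl extension is usable.
function curl_is_available(): bool
{
    // extension_loaded() checks the list of loaded modules; function_exists()
    // additionally returns false if curl_init() was disabled via disable_functions.
    return extension_loaded('curl') && function_exists('curl_init');
}

echo curl_is_available() ? "curl is enabled\n" : "curl is NOT enabled\n";
```

From the command line, `php -m` also lists the loaded modules, but note that the CLI may read a different php.ini than Apache or IIS.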
Let's first look at a simple example of each approach.
curl functions
<?php
$ch = curl_init("http://www.bKjia.c0m/"); // curl_init() initializes a curl session for the URL to be collected
curl_exec($ch);                           // execute the session; the response is printed directly
curl_close($ch);                          // close the session
?>
file_get_contents function
Example
<?php
echo file_get_contents("http://www.hzhuti.com");
?>
Output:
This is a test file with test text.
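file_get_contents() returns false (and emits a warning) when it cannot open the stream, so the plain echo above gives no useful signal on failure. A minimal sketch that passes a timeout through a stream context and checks the return value; the helper name fetch_with_file_get_contents is hypothetical:

```php
<?php
/**
 * Hypothetical helper: fetch a URL (or local path) with file_get_contents(),
 * applying an HTTP read timeout via a stream context. Returns null on failure.
 */
function fetch_with_file_get_contents(string $url, float $timeout = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeout, // read timeout in seconds for http:// and https:// streams
        ],
    ]);

    // Suppress the warning and inspect the return value instead;
    // error_get_last() still holds the warning text if you need to log it.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

Usage: `echo fetch_with_file_get_contents("http://www.hzhuti.com") ?? "request failed\n";`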
Summary
fopen() and file_get_contents() repeat the DNS lookup for every request and do not cache DNS information.
curl, by contrast, caches DNS information automatically, so requests for pages or images on the same domain need only one DNS query (libcurl keeps this cache with the handle, so the saving applies when a handle is reused for several requests). This greatly reduces the number of DNS queries.
So curl generally performs much better than fopen()/file_get_contents().
file_get_contents and curl efficiency and stability issues
A timeout can be passed to file_get_contents() through a stream context:
$config['context'] = stream_context_create(array('http' => array('method' => "GET", 'timeout' => 5)));
In practice this 'timeout' => 5 setting is unreliable and often does not take effect. If you then look at the server's connection pool, you will find a pile of errors like the following, which is a real headache:
file_get_contents(http://***): failed to open stream…
As a last resort, I installed the curl library and wrote a replacement function:
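<?php
// _USERAGENT_ and _REFERER_ are expected to be defined elsewhere in the project;
// the fallback values below are placeholders only, so the snippet runs on its own.
defined('_USERAGENT_') || define('_USERAGENT_', 'Mozilla/5.0 (compatible; example-fetcher)');
defined('_REFERER_') || define('_REFERER_', '');

function curl_get_contents($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);              // set the URL to request
    //curl_setopt($ch, CURLOPT_HEADER, 1);            // whether to include response headers
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);             // set the timeout (seconds)
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_); // User-Agent to send
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);     // set the Referer header
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);      // follow 301/302 redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);      // return the result instead of printing it
    $r = curl_exec($ch);
    curl_close($ch);
    return $r;
}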
With this function in place, the only remaining failures were genuine network problems.
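When a genuine network error does occur, curl_get_contents() simply returns false, which hides the cause. A minimal sketch of a variant that reports the reason through curl_errno() and curl_error(); the function name curl_get_contents_or_throw and its exception-based design are assumptions, not part of the original article:

```php
<?php
/**
 * Hypothetical variant of curl_get_contents(): same basic options,
 * but a failed transfer throws an exception that says why it failed.
 */
function curl_get_contents_or_throw(string $url, int $timeout = 5): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_errno()/curl_error() describe the failure (DNS, timeout, refused connection, ...).
        $message = sprintf('curl error %d: %s', curl_errno($ch), curl_error($ch));
        curl_close($ch);
        throw new RuntimeException($message);
    }
    curl_close($ch);
    return $body;
}
```

HTTP error statuses such as 404 still count as a successful transfer here; if that matters, check curl_getinfo($ch, CURLINFO_HTTP_CODE) before closing the handle.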
Here is a test of curl and file_get_contents done by others:
Seconds taken to crawl google.com, five runs each:

Run   file_get_contents   curl
1     2.31319094          0.68719101
2     2.30374217          0.64675593
3     2.21512604          0.64326
4     3.30553889          0.81983113
5     2.30124092          0.63956594
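To repeat a measurement like this on your own server, a small timing harness is enough. A minimal sketch, assuming PHP 7.3+ for hrtime() and 7.4+ for the arrow functions in the usage comment; the helper name time_runs is hypothetical:

```php
<?php
/**
 * Hypothetical helper: call $fetch $runs times and return each duration in seconds.
 */
function time_runs(callable $fetch, int $runs = 5): array
{
    $durations = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                        // monotonic clock, nanoseconds
        $fetch();
        $durations[] = (hrtime(true) - $start) / 1e9; // convert to seconds
    }
    return $durations;
}

// Usage (makes real network requests; curl_get_contents() is the function defined earlier):
// $url = 'http://www.google.com/';
// print_r(time_runs(fn() => file_get_contents($url)));
// print_r(time_runs(fn() => curl_get_contents($url)));
```

Results depend heavily on the network, on DNS caching, and on which function runs first, so it is worth running both under the same conditions several times before drawing conclusions.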
So how do you choose between file_get_contents and curl depending on what the server supports? PHP's function_exists() tells you whether a given function is available, so we can easily write the following function:
<?php
function vita_get_url_content($url) {
    // file_get_contents() is built into PHP, so function_exists('file_get_contents')
    // is practically always true and the curl branch would never run.
    // Test for curl instead, and fall back to file_get_contents() without it.
    if (function_exists('curl_init')) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $file_contents = curl_exec($ch);
        curl_close($ch);
    } else {
        $file_contents = file_get_contents($url);
    }
    return $file_contents;
}
?>
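The introduction credits curl with multi-threading support. In PHP this is exposed as the curl_multi_* functions, which run several transfers concurrently from a single thread rather than through real threads; easy handles added to the same multi handle also share libcurl's DNS cache. A minimal sketch, assuming the curl extension is enabled; the function name multi_get_contents is hypothetical:

```php
<?php
/**
 * Hypothetical helper: fetch several URLs concurrently with curl_multi_*.
 * Returns an array with the same keys as $urls; failed transfers map to null.
 */
function multi_get_contents(array $urls, int $timeout = 5): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep each body for curl_multi_getcontent()
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    // Collect the per-transfer result codes (CURLE_OK means success).
    $codes = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        if ($key !== false) {
            $codes[$key] = $info['result'];
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $ok = ($codes[$key] ?? null) === CURLE_OK;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Usage: `$pages = multi_get_contents(['home' => 'http://www.hzhuti.com', 'google' => 'http://www.google.com/']);` returns both bodies (or null for a failed transfer) once the slowest request has finished.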
Source: http://www.bkjia.com/PHPjc/633083.html