
Commonly used remote collection functions in PHP

WBOY, 2016-07-13 17:07:43

The most common way to collect (scrape) remote data in PHP is with the cURL functions, because cURL is fast and can run several transfers concurrently through the curl_multi_* functions. Below I introduce a simple PHP collection program for reference.
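cURL's ability to handle several requests at once, mentioned above, is exposed in PHP through the curl_multi_* functions. The following is a minimal sketch, assuming PHP 8 and that returning false for a failed URL is acceptable; fetch_all() is a hypothetical helper name, and there is no retry logic or cap on parallelism.

```php
<?php
/**
 * Fetch several URLs concurrently with the curl_multi_* API.
 * Returns [url => body] with false for transfers that failed.
 * Hypothetical helper: duplicate URLs collapse into one array key.
 */
function fetch_all(array $urls, int $timeout = 5): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // keep the body in memory
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);                  // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    // Collect the handles whose transfer libcurl reported as failed.
    $failed = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = in_array($ch, $failed, true) ? false : curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

The results come back keyed by URL, so the caller can pair each page with its body regardless of the order in which the transfers finished.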

Function


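/**
 * Get the content of a remote URL.
 * @param string $url
 * @return string|false  the page body, or false on failure
 */
function get_url_content($url) {
  // Prefer cURL when the extension is loaded (the function name must be quoted).
  if (function_exists('curl_init')) {
    $ch = curl_init();
    $timeout = 5; // seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);         // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);  // max time to establish the connection
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);         // max time for the whole request

    $file_contents = curl_exec($ch);
    curl_close($ch);
  } else {
    // Fallback: only works for remote URLs when allow_url_fopen is enabled.
    $file_contents = file_get_contents($url);
  }

  return $file_contents;
}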
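The function above returns false on failure without saying why, and its file_get_contents() fallback has no timeout. A slightly stricter variant might look like this, assuming PHP 8; get_url_content_checked() and its $error out-parameter are hypothetical names, and CURLOPT_FAILONERROR and CURLOPT_FOLLOWLOCATION are additions not present in the original.

```php
<?php
/**
 * Hypothetical variant of get_url_content() that reports why a fetch failed.
 * Returns the body, or null on failure with a message left in $error.
 */
function get_url_content_checked(string $url, ?string &$error = null, int $timeout = 5): ?string
{
    $error = null;
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow 3xx redirects
        curl_setopt($ch, CURLOPT_FAILONERROR, true);      // treat HTTP status >= 400 as failure
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        $body = curl_exec($ch);
        if ($body === false) {
            $error = curl_error($ch);                     // human-readable cURL error message
        }
        curl_close($ch);
        return $body === false ? null : $body;
    }

    // Fallback: give file_get_contents() the same timeout through a stream context.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        $error = error_get_last()['message'] ?? 'file_get_contents() failed';
    }
    return $body === false ? null : $body;
}
```

Usage: `$html = get_url_content_checked($url, $err); if ($html === null) { echo $err; }`.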

Calling the function

$url = 'http://www.bkjia.com';
$a = get_url_content($url);
echo $a;
The above is just a simple example. For a real application, you can refer to the collection program I wrote myself. It works in three steps:

1. Get the target web page data;
2. Extract the relevant content;
3. Write it to a database or generate an HTML file.

Just follow the steps below and try it!

Getting the target page data

1. Determine the address and URL format of the pages to fetch. The URL we use here is:
http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=1&tr=59
The results on this page are paginated. Following the pattern, we found that we only need to change the intResultsPage parameter to turn pages. That is, our URL format is:
http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=NUMBER&tr=59
where NUMBER is the current page number. Just change that value!

2. Get the page content. Naturally, we use PHP functions for this, and several are available:

file_get_contents() reads an entire file into a string. It is like file(), except that file() returns the file as an array of lines while file_get_contents() returns a single string. file_get_contents() is the preferred way to read the contents of a file into a string, and it uses memory mapping to improve performance if the operating system supports it. Syntax: file_get_contents(filename, use_include_path, context, offset, length)

The cURL functions (curl_init(), curl_setopt(), curl_exec() and so on). For details, see the official documentation: http://cn.php.net/curl

fopen() opens a file or URL and returns FALSE if opening fails. Syntax: fopen(filename, mode, use_include_path, context)

Here we use the first one, file_get_contents(). They all work in a similar way, and interested readers can look into the others.
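The URL format from step 1 can be generated for a whole range of pages by varying only intResultsPage. A minimal sketch, assuming the query parameters stay exactly as in the article's example URL (the target site may well have changed since); build_page_urls() is a hypothetical helper name.

```php
<?php
/**
 * Build the paginated catalog URLs: every parameter is fixed except
 * intResultsPage, which runs from $firstPage to $lastPage inclusive.
 */
function build_page_urls(int $firstPage, int $lastPage): array
{
    $base = 'http://www.abcam.cn/index.html';
    $urls = [];
    for ($page = $firstPage; $page <= $lastPage; $page++) {
        $query = http_build_query([
            'pageconfig'       => 'catalog_byproducttype',
            'intProductTypeID' => 1,
            'strStartChar'     => 'A',
            'intResultsPage'   => $page,   // the only value that changes per page
            'tr'               => 59,
        ]);
        $urls[] = $base . '?' . $query;
    }
    return $urls;
}

// Print the URLs for pages 1 to 3.
foreach (build_page_urls(1, 3) as $url) {
    echo $url, PHP_EOL;
}
```

http_build_query() takes care of URL-encoding, which matters if a parameter such as strStartChar ever contains characters outside A to Z.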
<?php
$oldcontent = file_get_contents("http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59");
echo $oldcontent;
?>

Run the PHP program, and the code above displays the entire web page. Because the original page uses absolute paths for its links and resources, the result looks exactly the same as the original.
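The next step is to extract the content we need. There are many ways to do this; the one introduced today is relatively simple: use strpos() to find the positions of a start marker and an end marker, then cut out the text between them with substr().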

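<?php
$oldcontent = file_get_contents("http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59");
$pfirst = '';                 // start marker: the HTML that begins the block you want
$plast = 'Goat polyclonal';   // end marker: text that follows the block
$b = strpos($oldcontent, $pfirst);      // position where the block starts
$c = strpos($oldcontent, $plast);       // position where the block ends
echo substr($oldcontent, $b, $c - $b);  // third argument is a length: end minus start
?>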
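The strpos()/substr() approach above breaks silently when a marker is missing, because strpos() then returns false. Note that substr()'s third argument is a length, so the slice length is the end position minus the start position. A minimal sketch of the same technique as a reusable function, assuming PHP 8; extract_between() is a hypothetical name, and the markers in the usage line are illustrative only.

```php
<?php
/**
 * Return the text from the first occurrence of $start up to (not including)
 * the next occurrence of $end after it, or null if either marker is missing.
 * With $includeStart = false the start marker itself is dropped too.
 */
function extract_between(string $html, string $start, string $end, bool $includeStart = true): ?string
{
    $b = strpos($html, $start);
    if ($b === false) {
        return null;                            // start marker not found
    }
    // Look for the end marker only after the start marker.
    $c = strpos($html, $end, $b + strlen($start));
    if ($c === false) {
        return null;                            // end marker not found
    }
    $from = $includeStart ? $b : $b + strlen($start);
    return substr($html, $from, $c - $from);    // length = end position - start position
}
```

Usage: `$block = extract_between($oldcontent, '<table id="x">', 'Goat polyclonal');` where `<table id="x">` stands in for whatever HTML really opens the wanted block.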


This outputs exactly the content we need. Writing it to a database or to a file is relatively simple; here we write it to a file.
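The write step can also be wrapped in a small function that checks whether the write actually succeeded. A minimal sketch, assuming PHP 8; save_snapshot() and its $dir parameter are hypothetical additions, while the date('YmdHis') file name follows the article's own code.

```php
<?php
/**
 * Write extracted content to a timestamped .html file (e.g. 20160713170743.html)
 * inside $dir and return the path. Throws RuntimeException if the write fails.
 * Note: two calls within the same second reuse the same file name.
 */
function save_snapshot(string $content, string $dir = '.'): string
{
    $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . date('YmdHis') . '.html';
    // LOCK_EX holds an exclusive lock on the file for the duration of the write.
    $bytes = file_put_contents($file, $content, LOCK_EX);
    if ($bytes === false) {
        throw new RuntimeException("File $file can not be written");
    }
    return $file;
}
```

Checking the return value of file_put_contents() itself avoids the separate is_writable() test and the extra file handle.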

<?php
$oldcontent = file_get_contents("http://www.abcam.cn/index.html?pageconfig=catalog_byproducttype&intProductTypeID=1&strStartChar=A&intResultsPage=2&tr=59");
$pfirst = '';                // start marker
$plast = 'Goat polyclonal';  // end marker
$b = strpos($oldcontent, $pfirst);
$c = strpos($oldcontent, $plast);
$a = substr($oldcontent, $b, $c - $b);

$file = date('YmdHis') . ".html";  // file named after the current time
$fp = fopen($file, "w");
if ($fp === false) {
  die("File " . $file . " can not be written");
}
fwrite($fp, $a);
fclose($fp);
echo "success";
?>

OK, that's all for today's extraction. Next time I will talk about extracting content with regular expressions.
