
How to implement a class for recursively crawling web pages in PHP

墨辰丷
2018-06-11 16:54:42

This article shows how to crawl web pages recursively in PHP. Using an example, it walks through the techniques involved in recursion and in fetching and parsing web pages in PHP. Readers who need this can use it as a reference.
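The recursion technique the example relies on can be demonstrated offline. The minimal sketch below is an illustration, not the article's own code: a hypothetical in-memory array `$graph` (page => outgoing links) stands in for real HTTP requests, and `crawl_graph()` is a hypothetical function name. It passes the current depth down as a parameter and shares a map of visited pages across calls by reference. The map records the shallowest depth at which each page was reached, so a page first found through a long path is expanded again if a shorter path turns up later.

```php
<?php
// Minimal sketch: depth-limited recursive traversal with a shared visited map.
// $graph is a hypothetical in-memory "web" (page => outgoing links) so the
// example runs offline; a real crawler would fetch and parse each page instead.

/**
 * @param array<string, string[]> $graph    hypothetical page => links map
 * @param string                  $url      page to visit
 * @param int                     $maxDepth maximum number of link hops from the start page
 * @param int                     $depth    current hop count, passed down on each call
 * @param array<string, int>      $seen     page => shallowest depth reached (shared by reference)
 * @return string[] pages reached within $maxDepth hops, in discovery order
 */
function crawl_graph(array $graph, string $url, int $maxDepth, int $depth = 0, array &$seen = []): array
{
    // Stop beyond the depth limit, or if the page was already reached at the same or a shallower depth
    if ($depth > $maxDepth || (isset($seen[$url]) && $seen[$url] <= $depth)) {
        return array_keys($seen);
    }
    $seen[$url] = $depth; // remember the shallowest depth for this page
    foreach ($graph[$url] ?? [] as $link) {
        crawl_graph($graph, $link, $maxDepth, $depth + 1, $seen);
    }
    return array_keys($seen);
}

$graph = [
    'http://example.com/'    => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a'   => ['http://example.com/a/1', 'http://example.com/'],
    'http://example.com/b'   => [],
    'http://example.com/a/1' => ['http://example.com/a/1/x'],
];

// Lists the start page, /a and /b; /a/1 is two hops away and is cut off
print_r(crawl_graph($graph, 'http://example.com/', 1));
```

Passing `$depth + 1` into each call is what makes the depth limit effective, and `$seen` plays the role of a visited list so that cycles (such as /a linking back to the start page) do not cause endless recursion.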

The following example implements a PHP class that crawls web pages recursively. The details are as follows:

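<?php
// Recursively crawls pages, following links that contain the current URL,
// up to a maximum depth.
class crawler
{
    private $_depth = 5;      // maximum recursion depth
    private $_urls = array(); // links collected so far (also prevents revisits)

    public function extract_links($url, $curr_depth = 0)
    {
        // The depth is passed down as a parameter; a local variable would
        // be reset on every recursive call
        if ($curr_depth >= $this->_depth) {
            return $this->_urls;
        }

        $data = @file_get_contents($url); // false if the page cannot be fetched
        if ($data === false) {
            return $this->_urls;
        }

        // Match absolute http/https URLs; '#' is the delimiter, so '/' needs no escaping
        $pattern = '#https?://[a-zA-Z0-9._-]+(?:/[a-zA-Z0-9_/.\-?&:%,!;=]*)?#';
        if (preg_match_all($pattern, $data, $matches)) {
            foreach ($matches[0] as $v) {
                // Skip links outside the current URL and links already collected
                // (strict in_array; array_search() returns 0 for the first element)
                if (strpos($v, $url) === false || in_array($v, $this->_urls, true)) {
                    continue;
                }
                // get_headers() returns false on failure; element 0 is the status line
                $check = @get_headers($v, 1);
                if ($check !== false && strpos($check[0], ' 200') !== false) {
                    $this->_urls[] = $v;
                    $this->extract_links($v, $curr_depth + 1);
                }
            }
        }
        return $this->_urls;
    }
}
?>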
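The class above finds links with a regular expression that only matches absolute `http://` or `https://` URLs, so relative links such as `href="/about"` are never followed. One possible alternative, sketched here assuming PHP's `dom` extension is available, reads `href` attributes with `DOMDocument`. The functions `resolve_url()` and `extract_hrefs()` are hypothetical names, and `resolve_url()` is deliberately simplified: it handles absolute, protocol-relative, root-relative and plain relative links, but does not normalise `../` segments or resolve query-only links such as `?page=2`.

```php
<?php
// Minimal sketch: extract links with DOMDocument instead of a regex.
// resolve_url() is a simplified, hypothetical helper; it does not
// normalise "../" segments or handle query-only links.

function resolve_url(string $base, string $href): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                   // already absolute
    }
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = ($parts['host'] ?? '') . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;                   // protocol-relative
    }
    if (strpos($href, '/') === 0) {
        return $scheme . '://' . $host . $href;         // root-relative
    }
    // Plain relative link: resolve against the directory of the base path
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $host . $dir . $href;
}

/** @return string[] unique absolute links found in the <a href> attributes of $html */
function extract_hrefs(string $html, string $base): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them (real pages are often invalid HTML)
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        // Skip empty hrefs, in-page anchors and non-HTTP schemes (mailto:, javascript:, ftp:, ...)
        if ($href === '' || $href[0] === '#'
            || (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) && !preg_match('#^https?://#i', $href))) {
            continue;
        }
        $links[] = resolve_url($base, $href);
    }
    return array_values(array_unique($links));
}

$html = '<a href="/about">About</a> <a href="news.html">News</a>'
      . ' <a href="https://example.org/x">Ext</a> <a href="#top">Top</a>';
// Lists http://example.com/about, http://example.com/blog/news.html and https://example.org/x
print_r(extract_hrefs($html, 'http://example.com/blog/index.html'));
```

Inside a crawler, a call such as `extract_hrefs($data, $url)` could take the place of the `preg_match_all()` call, with the same-site and status checks still applied to each returned link.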

Summary: That is all for this article. I hope it helps with your studies.

