
Detailed introduction to how PHP+JavaScript crawls web content

零到壹度 (original article), 2018-04-11 17:27:38

This article walks through crawling web content with PHP and JavaScript. A PHP script fetches the target page on the server, and JavaScript in the browser parses the returned HTML and extracts the parts you need.

Crawling web page content with PHP and JS: first, let's look at the result.

[Screenshot of the result]

How is it done?

We have long assumed that only Python can crawl web content, because Python has many libraries that make crawling web pages very convenient. Combining PHP and JS is also convenient, though. It gets you the web content you want without much complexity.

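First, PHP requests the target page on the server and fetches the HTML of the whole page. The request goes through the server because the browser's same-origin policy normally prevents JavaScript on our page from reading another site's response directly, unless that site sends CORS headers.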

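<?php
  // Allow requests from any origin (CORS)
  header("Access-Control-Allow-Origin: *");
  // Read the optional query parameter "mod" (a path on the target site)
  $parm = isset($_GET['mod']) ? $_GET['mod'] : '';
  // Without "mod" this fetches the home page; otherwise the sub-page it names
  $url = 'http://m.80s.tw/' . $parm;
  $html = file_get_contents($url);
  // file_get_contents returns false if the request fails
  if ($html === false) {
    http_response_code(502);
    exit('Failed to fetch ' . $url);
  }
  // Regex-match the content of <body> (s: "." also matches newlines; i: case-insensitive)
  if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html, $match1)) {
    echo $match1[0]; // output the matched body, including its <body> tags
  }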
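Note: if file_get_contents fails (typically when the target URL uses HTTPS), open php.ini, find the line extension=php_openssl.dll and enable it by removing the leading semicolon, then restart the web server. Fetching URLs with file_get_contents also requires allow_url_fopen to be enabled in php.ini.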
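To see what the regular expression in the PHP script does, here is the same pattern as a small JavaScript function, matching the language of the front-end code. It is a minimal sketch; extractBody is a hypothetical helper written for illustration. Unlike the PHP code, which echoes the whole match including the <body> tags, it returns only the captured inner HTML.

```javascript
// Hypothetical helper mirroring the PHP preg_match call in JavaScript.
// Returns the inner HTML of <body>, or null when no <body> element is found.
function extractBody(html) {
  // [^>]* skips any attributes on the opening tag;
  // [\s\S]* matches across newlines (like PHP's "s" modifier) and is greedy,
  // so it runs to the LAST </body> in the document; "i" ignores case.
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

const page = '<html><head><title>t</title></head>\n<BODY class="home">\n<p>Hello</p>\n</BODY></html>';
console.log(JSON.stringify(extractBody(page))); // "\n<p>Hello</p>\n"
```

A regular expression is enough here because it only cuts out one well-known region of the page; selecting individual elements is left to the browser's HTML parser on the front end.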

Next, the front end requests this data and processes it.

First, write an asynchronous request (this example uses jQuery's $.ajax):

$.ajax({
    type: 'get',
    url: '../admin/test.php', // path to the PHP script above
    success: function (data) {
        console.log(data); // the fetched HTML string
    }
});
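For comparison, the same request can be written with the browser's built-in Fetch API, which removes the jQuery dependency. This is a minimal sketch, assuming the PHP script lives at the same relative path as in the example above; loadPage, ENDPOINT and the fetchImpl parameter are hypothetical names for illustration. It also shows how to pass the mod parameter that the PHP script reads, which the jQuery example leaves out.

```javascript
// Assumed path of the PHP script shown earlier.
const ENDPOINT = '../admin/test.php';

// mod is the optional sub-path that the PHP script reads from $_GET['mod'].
// fetchImpl defaults to the browser's fetch; it is a parameter only so the
// function can be exercised without a network (hypothetical, for testing).
async function loadPage(mod, fetchImpl = globalThis.fetch) {
  // encodeURIComponent escapes characters such as "&", "#" and spaces so they
  // stay part of the mod value; PHP decodes them again when filling $_GET.
  const url = mod ? ENDPOINT + '?mod=' + encodeURIComponent(mod) : ENDPOINT;
  const response = await fetchImpl(url);
  // fetch only rejects on network failures, not on HTTP error statuses
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.text(); // the HTML string echoed by the PHP script
}

// Usage in the browser:
// loadPage().then((data) => console.log(data));
```

Because fetch resolves even for 404 or 500 responses, the explicit response.ok check keeps an error page from being passed on to the parsing step as if it were real content.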

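Once we have the HTML string, we can process it however we like.

How do we use this HTML? That is not a problem. Inside the success callback, load the string into a detached element and query it with CSS selectors: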
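        // First, create a detached container element (a div rather than a p,
        // since HTML does not allow block elements inside a <p>)
        var container = document.createElement('div');
        // Put the whole HTML string into this node; the browser parses it into DOM nodes.
        // Caution: <script> tags inserted this way do not run, but inline event
        // handlers such as <img onerror> can, so only do this with pages you trust.
        container.innerHTML = data;
        // Now the container can be searched like any DOM tree,
        // e.g. get all <a> tags under elements with the class list_mov_title
        var list = container.querySelectorAll('.list_mov_title a');
        // Print them to check
        console.log(list);
        // Everything we want is there; now insert the parts we need into our own page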
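The NodeList returned by querySelectorAll is easier to work with as plain data before copying it into your own page. This is a minimal sketch, assuming the links sit under .list_mov_title as in the example above; extractMovieLinks and its baseUrl default are hypothetical names for illustration. It also handles one detail of the copy step: links on the crawled site are often relative, and once inserted into your own page they would resolve against your domain. The sketch therefore resolves them against the source site with the standard URL constructor.

```javascript
// Hypothetical helper: turn the matched <a> elements into plain data.
// root is anything with querySelectorAll (a container element, or a Document
// from DOMParser); baseUrl is the site the HTML was fetched from.
function extractMovieLinks(root, baseUrl = 'http://m.80s.tw/') {
  return Array.from(root.querySelectorAll('.list_mov_title a'), (a) => ({
    title: a.textContent.trim(),
    // Make relative hrefs such as "/movie/1" absolute; absolute ones are kept.
    // A missing href falls back to baseUrl itself.
    href: new URL(a.getAttribute('href') || '', baseUrl).href,
  }));
}

// Usage in the browser, inside the success callback:
// const movies = extractMovieLinks(new DOMParser().parseFromString(data, 'text/html'));
// movies.forEach((m) => {
//   const link = document.createElement('a');
//   link.href = m.href;
//   link.textContent = m.title; // textContent, so crawled text is never parsed as HTML
//   document.body.appendChild(link);
// });
```

Building the new links with createElement and textContent, rather than pasting crawled markup back in, means only the title text and URL from the other site end up on your page.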

That concludes this tutorial on crawling web content. If anything is unclear, please leave a message.


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn