Home  >  Article  >  Backend Development  >  Take you ten minutes to understand the process of implementing a crawler in PHP

Take you ten minutes to understand the process of implementing a crawler in PHP

烟雨青岚
烟雨青岚forward
2020-07-16 13:49:483705browse

Take you ten minutes to understand the process of implementing a crawler in PHP

Text information

We try to get the table information. Here, we use the class schedule of a certain school instead. :

Take you ten minutes to understand the process of implementing a crawler in PHP

## Next we will add the code:

a.php

 <?php  header( "Content-type:text/html;Charset=utf-8" ); 
$ch = curl_init();        $url ="表的链接";
        curl_setopt ( $ch , CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36" );
        curl_setopt($ch,CURLOPT_URL,$url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        $content=curl_exec($ch);
        preg_match_all("/<td rowspan=\"\d\">(.*?)<\/td>\n<td rowspan=\"\d\">(.*?)<\/td><td rowspan=\"\d\" align=\"\w+\">(.*?)<\/td><td rowspan=\"\d\" align=\"\w+\">(.*?)<\/td><td>(.*?)<\/td>\n<td>(.*?)<\/td><td>(.*?)<\/td>/",$content,$matchs,PREG_SET_ORDER);//匹配该表所用的正则
        var_dump($matchs);

Then let’s run it:

Take you ten minutes to understand the process of implementing a crawler in PHP

The class schedule was successfully obtained;

Picture acquisition

Absolute link

We take the homepage of Baidu Gallery as an example


Take you ten minutes to understand the process of implementing a crawler in PHPb.php

  <?php  header( "Content-type:text/html;Charset=utf-8" );  


    $ch = curl_init();    $url="http://image.baidu.com/";
    curl_setopt ( $ch , CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36" );
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    $content=curl_exec($ch);    $string=file_get_contents($url); 
    preg_match_all("/<img ([^ alt="Take you ten minutes to understand the process of implementing a crawler in PHP" >]*)\s*src=(&#39;|\")([^&#39;\"]+)(&#39;|\")/", 
                    $string,$matches);    $new_arr=array_unique($matches[3]);     foreach($new_arr as $key){ 
        echo "<img  src=$key alt="Take you ten minutes to understand the process of implementing a crawler in PHP" >";
     }

Then, we get the following page:
Take you ten minutes to understand the process of implementing a crawler in PHP

Relative links

Most of the links to pictures in Baidu Gallery are absolute links, so when we encounter web page pictures that are relative links time, how should we deal with it? In fact, it is very simple. We only need to change the loop part to


Take you ten minutes to understand the process of implementing a crawler in PHP

Then we can also output the image in the browser;

Thank you for reading, I hope Everyone benefits.

Recommended tutorial: "

php tutorial"

The above is the detailed content of Take you ten minutes to understand the process of implementing a crawler in PHP. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:csdn.net. If there is any infringement, please contact admin@php.cn delete