Home >Backend Development >PHP Tutorial >Take you ten minutes to understand the process of implementing a crawler in PHP

Take you ten minutes to understand the process of implementing a crawler in PHP

烟雨青岚forward: 2020-07-16 13:49:483837browse

Text information

We try to get the table information. Here, we use the class schedule of a certain school instead. ：

Take you ten minutes to understand the process of implementing a crawler in PHP

## Next we will add the code:

a.php

 <?php  header( "Content-type:text/html;Charset=utf-8" ); 
$ch = curl_init();        $url ="表的链接";
        curl_setopt ( $ch , CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36" );
        curl_setopt($ch,CURLOPT_URL,$url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        $content=curl_exec($ch);
        preg_match_all("/<td rowspan=\"\d\">(.*?)<\/td>\n<td rowspan=\"\d\">(.*?)<\/td><td rowspan=\"\d\" align=\"\w+\">(.*?)<\/td><td rowspan=\"\d\" align=\"\w+\">(.*?)<\/td><td>(.*?)<\/td>\n<td>(.*?)<\/td><td>(.*?)<\/td>/",$content,$matchs,PREG_SET_ORDER);//匹配该表所用的正则
        var_dump($matchs);

Then let’s run it:

Take you ten minutes to understand the process of implementing a crawler in PHP

The class schedule was successfully obtained;

Picture acquisition

Absolute link

We take the homepage of Baidu Gallery as an example

Take you ten minutes to understand the process of implementing a crawler in PHP b.php

  <?php  header( "Content-type:text/html;Charset=utf-8" );  


    $ch = curl_init();    $url="http://image.baidu.com/";
    curl_setopt ( $ch , CURLOPT_USERAGENT ,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36" );
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    $content=curl_exec($ch);    $string=file_get_contents($url); 
    preg_match_all("/<img  src="/static/imghwm/default1.png"  data-src="https://img.php.cn/upload/image/301/164/948/1594878376637006.png"  class="lazy" ([^ alt="Take you ten minutes to understand the process of implementing a crawler in PHP" >]*)\s*src=(&#39;|\")([^&#39;\"]+)(&#39;|\")/", 
                    $string,$matches);    $new_arr=array_unique($matches[3]);     foreach($new_arr as $key){ 
        echo "<img  src="/static/imghwm/default1.png"  data-src="https://img.php.cn/upload/image/301/164/948/1594878376637006.png"  class="lazy"  src=$key alt="Take you ten minutes to understand the process of implementing a crawler in PHP" >";
     }

Then, we get the following page:
Take you ten minutes to understand the process of implementing a crawler in PHP

Relative links

Most of the links to pictures in Baidu Gallery are absolute links, so when we encounter web page pictures that are relative links time, how should we deal with it? In fact, it is very simple. We only need to change the loop part to

Take you ten minutes to understand the process of implementing a crawler in PHP

Then we can also output the image in the browser;

Thank you for reading, I hope Everyone benefits.

Take you ten minutes to understand the process of implementing a crawler in PHP

Related articles