How to grab BT Paradise movie data
I had an evening off and wanted to watch a couple of good movies.
I searched for a long time but couldn't find anything I wanted to watch.
Then I remembered that someone had once crawled Zhihu's user data, and I had a sudden idea:
why not crawl the movie information from BT Paradise, so that next time I can just query the database directly?
All I can say is that I must be really bored, haha. At least I can still code ^_^
1. Grab the website's HTML source code
$url = "www.bttiantang.cc";
$html = shell_exec("curl $url");
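Shelling out to curl works, but PHP's own cURL extension avoids shell quoting problems and lets you set a timeout. Here is a minimal sketch, assuming the cURL extension is installed. `buildPageUrl` and `fetchHtml` are hypothetical helper names, and the `http://` scheme and the 30-second timeout are assumptions rather than anything the site requires.

```php
<?php
// Build the URL of the home page (no page number) or of one listing page.
// The PageNo parameter is the one the crawler uses; the scheme is assumed.
function buildPageUrl(string $host, ?int $pageNo = null): string
{
    $base = 'http://' . $host . '/';
    return $pageNo === null ? $base : $base . '?PageNo=' . $pageNo;
}

// Fetch a page with the cURL extension; returns null when the request fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,   // assumed timeout, in seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}
```

With these helpers, the first step would read something like `$html = fetchHtml(buildPageUrl('www.bttiantang.cc'));`.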
2. Get the total number of pages (regular expression matching)
preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d+/", $pageCount[0], $pageCount);
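The two regular expressions above can be wrapped in a small function that also copes with a missing pager. This is only a sketch: `parsePageInfo` is a hypothetical name, and the sample markup is made up for illustration, since the real pager text may look different.

```php
<?php
// Returns [pageCount, movieCount], or null when the pager cannot be read.
function parsePageInfo(string $html): ?array
{
    if (!preg_match('/<span class="pageinfo">(.*?)<\/span>/s', $html, $m)) {
        return null;
    }
    // The first two numbers in the pager are taken as page count and movie count.
    if (preg_match_all('/\d+/', $m[1], $numbers) < 2) {
        return null;
    }
    return [(int) $numbers[0][0], (int) $numbers[0][1]];
}

// Hypothetical pager markup, for illustration only.
$sample = '<span class="pageinfo">Total <strong>10</strong> pages, <strong>200</strong> movies</span>';
$info = parsePageInfo($sample);
if ($info !== null) {
    list($pageSize, $movieCount) = $info;
    echo "pages: $pageSize, movies: $movieCount\n";
}
```

Returning null instead of reading `$pageCount[0]` blindly means a changed page layout shows up as a clear failure rather than a PHP notice.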
3. Extract the movie information (regular expression matching)
preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
preg_match("/<font color=\"#FF6600\">(.*?)<i>/", $pageInfo[0][$i], $movieName);
preg_match("/<strong>(\d)<\/strong>/", $pageInfo[0][$i], $movieScore_int);
preg_match("/<em class=\"fm\">(\d)<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
preg_match("/<p class=\"des\">(.*?)<\/p>/", $pageInfo[0][$i], $actor);
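To see how these patterns fit together, here is a rough sketch that turns one item block into an array. `parseMovieItem` is a hypothetical name, and the sample HTML is invented to match the patterns; it is not copied from the site. Note that the score patterns only accept a single digit on each side of the decimal point, so a score such as 10.0 would fall back to the default.

```php
<?php
// Parse one listing block into an array, or return null for blocks
// without an update date (the crawler treats those as ads).
function parseMovieItem(string $item): ?array
{
    if (!preg_match('/\d{4}\/\d{2}\/\d{2}/', $item, $date)) {
        return null;
    }
    // Return the first capture group of $pattern, or $default when it does not match.
    $first = function (string $pattern, string $default = '') use ($item): string {
        return preg_match($pattern, $item, $m) ? $m[1] : $default;
    };
    // Integer and decimal digits are stored in separate tags and glued together.
    $score = $first('/<strong>(\d)<\/strong>/', '0') . '.' . $first('/<em class="fm">(\d)<\/em>/', '0');
    return [
        'name'      => $first('/<font color="#FF6600">(.*?)<i>/'),
        'actor'     => str_replace(['<em>', '</em>'], '', $first('/<p class="des">(.*?)<\/p>/')),
        'url'       => $first('/href="(.*?)"/'),
        'update_ts' => str_replace('/', '-', $date[0]),
        'score'     => (float) $score,
    ];
}

// Invented sample block, shaped to match the patterns above.
$sample = '<div class="item cl"><a href="/subject/12345.html">'
    . '<font color="#FF6600">Example Movie<i>/Alt Title</i></font></a>'
    . '<span>2016/01/02</span><strong>8</strong><em class="fm">5</em>'
    . '<p class="des">Cast: <em>Actor A</em> / Actor B</p></div>';
print_r(parseMovieItem($sample));
```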
Overall, crawling with PHP is quite fast: collecting more than 20,000 records took less than 4 minutes.
start:01:22:54
end:01:26:11
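The two timestamps work out to 197 seconds, or 3 minutes 17 seconds. A small sketch of that arithmetic follows; `elapsedSeconds` is a hypothetical helper, and the 20,000 figure is only the lower bound mentioned above, so the rate is a rough estimate.

```php
<?php
// Seconds between two H:i:s clock times on the same day.
function elapsedSeconds(string $start, string $end): int
{
    $utc = new DateTimeZone('UTC');
    // The leading "!" resets the date part, so only the time of day counts.
    $a = DateTimeImmutable::createFromFormat('!H:i:s', $start, $utc);
    $b = DateTimeImmutable::createFromFormat('!H:i:s', $end, $utc);
    return $b->getTimestamp() - $a->getTimestamp();
}

$seconds = elapsedSeconds('01:22:54', '01:26:11');
// Using 20,000 records as a lower bound for the rate.
printf("%d s, roughly %.0f records per second\n", $seconds, 20000 / $seconds);
```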
Attached database screenshot:
Attached source code:
<?php
$url = "www.bttiantang.cc";
$html = shell_exec("curl " . escapeshellarg($url));

// Read the pager: the first number is the page count, the second the movie count.
preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d+/", $pageCount[0], $pageCount);
$pageSize = intval($pageCount[0][0]);
$movieCount = $pageCount[0][1];

// Connect with mysqli and prepare the insert once, so quotes in a name cannot break the SQL.
$conn = new mysqli('***', '***', '', '***');
$conn->set_charset('utf8');
$stmt = $conn->prepare("insert into movie (name,actor,url,update_ts,score) values (?,?,?,?,?)");

for ($j = 1; $j <= $pageSize; $j++) {
    // Quote the URL so the shell does not treat "?" as a wildcard.
    $movieHtml = shell_exec("curl " . escapeshellarg("$url?PageNo=$j"));
    preg_match_all("/<div class=\"item cl\">.*?<\/div>/s", $movieHtml, $pageInfo);
    for ($i = 0; $i < count($pageInfo[0]); $i++) {
        preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
        /****** skip ads: they have no update date ******/
        if (empty($updateTime)) continue;
        $updateTime = str_replace('/', '-', $updateTime[0]);

        preg_match("/<font color=\"#FF6600\">(.*?)<i>/", $pageInfo[0][$i], $movieName);
        /****** fallbacks for items whose title is marked up differently ******/
        if (empty($movieName)) preg_match("/<b>(.*?)<i>/", $pageInfo[0][$i], $movieName);
        if (empty($movieName)) preg_match("/<b>(.*?)<\/b>/", $pageInfo[0][$i], $movieName);
        $movieName = isset($movieName[1]) ? $movieName[1] : '';

        // The score is split into an integer digit and a decimal digit.
        preg_match("/<strong>(\d)<\/strong>/", $pageInfo[0][$i], $movieScore_int);
        $movieScore_int = isset($movieScore_int[1]) ? $movieScore_int[1] : '0';
        preg_match("/<em class=\"fm\">(\d)<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
        $movieScore_decimal = isset($movieScore_decimal[1]) ? $movieScore_decimal[1] : '0';
        $movieScore = floatval($movieScore_int . '.' . $movieScore_decimal);

        preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
        $movieUrl = isset($movieUrl[1]) ? $movieUrl[1] : '';

        preg_match("/<p class=\"des\">(.*?)<\/p>/", $pageInfo[0][$i], $actor);
        $movieActor = isset($actor[1]) ? str_replace(array('<em>', '</em>'), '', $actor[1]) : '';

        $stmt->bind_param('ssssd', $movieName, $movieActor, $movieUrl, $updateTime, $movieScore);
        $stmt->execute();
    }
}
?>
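The step that splits a listing page into item blocks and skips the ads can be tried on its own. Below is a minimal sketch with hypothetical helper names (`extractItems`, `dropAds`) and an invented three-block page. One caveat worth knowing: the non-greedy pattern stops at the first `</div>`, so it only works as long as an item contains no nested `<div>`.

```php
<?php
// Split a listing page into its item blocks.
function extractItems(string $pageHtml): array
{
    preg_match_all('/<div class="item cl">.*?<\/div>/s', $pageHtml, $m);
    return $m[0];
}

// Keep only blocks that carry a yyyy/mm/dd date; the rest are treated as ads.
function dropAds(array $items): array
{
    return array_values(array_filter($items, function (string $item): bool {
        return preg_match('/\d{4}\/\d{2}\/\d{2}/', $item) === 1;
    }));
}

// Invented page with two movie blocks and one ad block.
$page = '<div class="item cl"><b>Movie A<i></i></b> 2016/01/02</div>'
      . '<div class="item cl">sponsored banner</div>'
      . '<div class="item cl"><b>Movie B</b> 2016/01/03</div>';
$items = dropAds(extractItems($page));
echo count($items), " movie blocks kept\n";
```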
The movie information here was grabbed from BT Paradise and does not involve any confidential information, so I do not accept any legal responsibility for it!
If any of this movie information involves your copyright, intellectual property rights, or other interests, please let me know and it will be deleted as soon as possible after confirmation.
Copyright Statement: This article is an original article by the blogger and may not be reproduced without the blogger's permission.