How to grab BT Paradise movie data

I had the evening free and wanted to watch a couple of good movies.

I searched for a long time but couldn't find anything I wanted to watch.

Then I remembered that someone had once crawled Zhihu's user data, and I had a sudden idea:

why not crawl the movie information from BT Paradise, so that next time I can just query the database directly?

All I can say is that I must be really bored, haha. At least I can still code ^_^


1. Grab the website's HTML source code

$url = "www.bttiantang.cc";
// -s silences curl's progress output; escapeshellarg() quotes the URL safely for the shell.
$html = shell_exec("curl -s " . escapeshellarg($url));
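Shelling out to curl works, but the same fetch can also be done in-process. Here is a minimal sketch, assuming the PHP cURL extension is installed; `normalizeUrl()` and `fetchHtml()` are hypothetical helper names that are not part of the original script.

```php
<?php
// Hypothetical helper: add a scheme when the URL has none,
// so that cURL always receives an absolute URL.
function normalizeUrl(string $url): string
{
    return preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
}

// Hypothetical helper: download a page with the cURL extension.
// Returns the body as a string, or null if the request failed.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init(normalizeUrl($url));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}
```

With this in place, `$html = fetchHtml("www.bttiantang.cc");` would stand in for the `shell_exec()` call, and a `null` result signals that the download failed.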

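2. Get the total number of pages and the total number of movies (regular-expression matching)

// The end of the first pattern was lost when this page was published; the closing <\/span> is an assumption.
preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
// Pull every number out of the matched block: the page count comes first, then the movie count.
preg_match_all("/\d+/", $pageCount[0], $pageCount);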
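To make the two-step match concrete, here is a small sketch that runs on a made-up snippet. The sample markup and its numbers are invented for illustration (the real `pageinfo` block may look different), and `parsePageInfo()` is a hypothetical helper name.

```php
<?php
// Made-up sample markup; the real page's pageinfo block may differ.
$html = '<div><span class="pageinfo">Total <strong>1034</strong> pages, '
      . '<strong>20668</strong> movies</span></div>';

// Hypothetical helper: find the pageinfo block, then read the first two
// numbers inside it as the page count and the movie count.
function parsePageInfo(string $html): ?array
{
    if (!preg_match('/<span class="pageinfo">.*?<\/span>/s', $html, $block)) {
        return null; // block not found
    }
    preg_match_all('/\d+/', $block[0], $numbers);
    if (count($numbers[0]) < 2) {
        return null; // fewer numbers than expected
    }

    return [
        'pageSize'   => (int) $numbers[0][0],
        'movieCount' => (int) $numbers[0][1],
    ];
}

$info = parsePageInfo($html);
```

Returning `null` when the block is missing keeps a layout change on the site from turning into a silent loop over zero pages.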

3. Extract the movie information (regular-expression matching)

// $pageInfo[0][$i] holds the HTML of one movie entry.
// Update time, in YYYY/MM/DD form
preg_match("/\d{4}\/\d{2}\/\d{2}/" , $pageInfo[0][$i], $updateTime);
// Movie name
preg_match("/<font color=\"#FF6600\">(.*?)<i>/" , $pageInfo[0][$i], $movieName);
// Integer part of the score
preg_match("/<strong>(\d{1})/" , $pageInfo[0][$i], $movieScore_int);
// Decimal part of the score
preg_match("/<em class=\"fm\">(\d{1})/" , $pageInfo[0][$i], $movieScore_decimal);
// Link to the movie page
preg_match("/href=\"(.*?)\"/" , $pageInfo[0][$i], $movieUrl);
// Cast; the end of this pattern was lost in publishing, and the closing <\/p> is an assumption.
preg_match("/<p class=\"des\">(.*?)<\/p>/" , $pageInfo[0][$i], $actor);
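The patterns above can be bundled into one function that turns a single entry into an array. This is only a sketch: the sample entry is made up, `firstGroup()` and `parseMovie()` are hypothetical names, and the closing `<\/p>` and `<\/b>` parts of two patterns are assumptions.

```php
<?php
// Hypothetical helper: first capture group of $pattern in $text, or null.
function firstGroup(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) ? $m[1] : null;
}

// Hypothetical helper: extract the fields of one movie entry.
function parseMovie(string $block): array
{
    // Try the name patterns in order until one matches.
    $name = firstGroup('/<font color="#FF6600">(.*?)<i>/', $block)
        ?? firstGroup('/<b>(.*?)<i>/', $block)
        ?? firstGroup('/<b>(.*?)<\/b>/', $block);

    $scoreInt = firstGroup('/<strong>(\d)/', $block);
    $scoreDec = firstGroup('/<em class="fm">(\d)/', $block);
    $actor    = firstGroup('/<p class="des">(.*?)<\/p>/s', $block);

    return [
        'name'      => $name,
        'actor'     => $actor === null ? null : str_replace(['<em>', '</em>'], '', $actor),
        'url'       => firstGroup('/href="(.*?)"/', $block),
        'update_ts' => firstGroup('/(\d{4}\/\d{2}\/\d{2})/', $block),
        'score'     => ($scoreInt === null || $scoreDec === null)
            ? null
            : (float) ($scoreInt . '.' . $scoreDec),
    ];
}

// Made-up entry for illustration; the real markup may differ.
$sample = '<div><span>2016/05/12</span>'
    . '<a href="/subject/12345.html"><b><font color="#FF6600">Example Movie'
    . '<i>/Alt Title</i></font></b></a>'
    . '<strong>7</strong><em class="fm">5</em>'
    . '<p class="des">Actor One<em>/</em>Actor Two</p></div>';

$movie = parseMovie($sample);
```

Fields that cannot be found come back as `null`, so a caller can decide whether to skip the entry or store it incomplete.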

4. Insert into the database and you’re done
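For the insert step, a prepared statement avoids trouble with quotes in movie names. The sketch below swaps in PDO, which is not what the original script uses. The table and column names come from the script's insert statement, while `insertMovie()`, the column types, the sample row, and the in-memory SQLite database (used only so the example runs without a server) are assumptions.

```php
<?php
// Hypothetical helper: insert one movie row with a prepared statement,
// so that quotes inside a value cannot break the query.
function insertMovie(PDO $pdo, array $movie): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO movie (name, actor, url, update_ts, score)
         VALUES (:name, :actor, :url, :update_ts, :score)'
    );
    $stmt->execute([
        ':name'      => $movie['name'],
        ':actor'     => $movie['actor'],
        ':url'       => $movie['url'],
        ':update_ts' => $movie['update_ts'],
        ':score'     => $movie['score'],
    ]);
}

// Demo on an in-memory SQLite database (assumption: pdo_sqlite is available).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Column types are assumed; the article does not show the table definition.
$pdo->exec('CREATE TABLE movie (name TEXT, actor TEXT, url TEXT, update_ts TEXT, score REAL)');

insertMovie($pdo, [
    'name'      => "It's an Example",   // made-up row; note the quote in the name
    'actor'     => 'Actor One/Actor Two',
    'url'       => '/subject/12345.html',
    'update_ts' => '2016/05/12',
    'score'     => 7.5,
]);

$rows = $pdo->query('SELECT name, score FROM movie')->fetchAll(PDO::FETCH_ASSOC);
```

Against MySQL, only the connection line would change, to a `mysql:host=...;dbname=...` DSN with your own credentials.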

Overall, crawling with PHP is quite fast. Collecting more than 20,000 records took less than 4 minutes (3 minutes 17 seconds, going by the timestamps below).

start:01:22:54

end:01:26:11
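As a quick check on those timestamps, the elapsed time and a rough throughput can be worked out like this. `clockToSeconds()` is a hypothetical helper, and 20,000 is used as a lower bound because the exact record count is not given.

```php
<?php
// Hypothetical helper: convert an HH:MM:SS clock time to seconds since midnight.
function clockToSeconds(string $clock): int
{
    [$h, $m, $s] = array_map('intval', explode(':', $clock));

    return $h * 3600 + $m * 60 + $s;
}

$elapsed = clockToSeconds('01:26:11') - clockToSeconds('01:22:54'); // seconds
$rate    = 20000 / $elapsed; // records per second, using 20,000 as a lower bound

printf("%d s elapsed, at least %.1f records/s\n", $elapsed, $rate);
```

That comes to 197 seconds, or a little over 100 records per second.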



Attached database screenshot:



Attached source code:

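<?php
// 1. Grab the home page HTML.
$url = "www.bttiantang.cc";
$html = shell_exec("curl -s " . escapeshellarg($url));

// 2. Read the total number of pages and movies.
// The end of this pattern was lost when the page was published; the closing <\/span> is an assumption.
preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d+/", $pageCount[0], $pageCount);

$pageSize = intval($pageCount[0][0]);
$movieCount = $pageCount[0][1];

// Note: the mysql_* functions were removed in PHP 7.0; on current PHP versions use mysqli or PDO instead.
$conn = mysql_connect('***','***','');
mysql_select_db('***',$conn);
mysql_query('set names utf8',$conn);

// Loop header reconstructed; everything after "$j" was lost in publishing.
for ($j = 1; $j <= $pageSize; $j++) {
    // [Lost in publishing: the code that downloads list page $j into $movieHtml,
    //  and the start of the pattern below, which splits the page into movie entries.]
    preg_match_all("/….*?/s", $movieHtml, $pageInfo);
    for ($i = 0; $i < count($pageInfo[0]); $i++) {
        // [Lost in publishing: several statements here; only the fragments
        //  "preg_match", "ad", "if" and "str_replace" survive.]
        // Update time, reconstructed from step 3.
        preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
        $updateTime = $updateTime[0];

        preg_match("/<font color=\"#FF6600\">(.*?)<i>/", $pageInfo[0][$i], $movieName);
        /***** fallbacks for entries whose name is marked up differently *****/
        if (empty($movieName))
            preg_match("/<b>(.*?)<i>/", $pageInfo[0][$i], $movieName);
        // The closing <\/b> below is an assumption; the end of this pattern was lost.
        if (empty($movieName))
            preg_match("/<b>(.*?)<\/b>/", $pageInfo[0][$i], $movieName);
        /************************/
        $movieName = $movieName[1];

        // The score is split across two tags: integer part and decimal part.
        preg_match("/<strong>(\d{1})/", $pageInfo[0][$i], $movieScore_int);
        $movieScore_int = $movieScore_int[1];
        preg_match("/<em class=\"fm\">(\d{1})/", $pageInfo[0][$i], $movieScore_decimal);
        $movieScore_decimal = $movieScore_decimal[1];
        $movieScore = floatval($movieScore_int.'.'.$movieScore_decimal);

        preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
        $movieUrl = $movieUrl[1];

        // The closing <\/p> is an assumption; the end of this pattern was lost.
        preg_match("/<p class=\"des\">(.*?)<\/p>/", $pageInfo[0][$i], $actor);
        $movieActor = str_replace("<em>",'',str_replace("</em>",'',$actor[1]));

        // Escape the text values so that quotes in a name cannot break the query.
        list($movieName, $movieActor, $movieUrl) = array_map('mysql_real_escape_string', array($movieName, $movieActor, $movieUrl));
        mysql_unbuffered_query("insert into movie (name,actor,url,update_ts,score) values ('$movieName','$movieActor','$movieUrl','$updateTime','$movieScore')");
    }

}

?>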

This movie information was scraped from BT Paradise and does not involve any confidential information, so I accept no legal responsibility for it!

If any relevant movie information involves your copyright or intellectual property rights or other interests, please inform us and it will be deleted as soon as possible after confirmation.

Copyright Statement: This article is an original article by the blogger and may not be reproduced without the blogger's permission.
