Many people like reading web novels and read a few chapters now and then. The sites you find through Baidu almost all carry very annoying ads: some wrap the whole page in a link, so an accidental tap jumps to another site or even into an endless redirect loop. Many mobile apps are also full of ads, so with some spare time I wrote a small program to get rid of them.
This article uses PHP cURL to fetch the pages and simple_html_dom to parse them, so the ads are genuinely removed.
Pick any book on any novel site. This site is particularly painful on mobile phones because of the problems above, so let's operate on this novel. (Disclaimer: this is definitely not a promotion; if it infringes, it will be deleted.)
1. Understanding cURL's GET method
cURL is a command-line tool that uploads or downloads data to or from a specified URL and displays it. The "c" in cURL stands for client, and "URL" is the URL.
In PHP, the cURL extension can send both GET and POST requests.
Simply fetching novel chapters only requires GET.
The sample code below fetches the HTML of the first chapter's page with a GET request; to fetch another page, just change the $url parameter.
The steps are: initialize, set options (including certificate verification), execute, and close.
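<?php
header("Content-Type:text/html;charset=utf-8");
$url = "https://www.7kzw.com/85/85445/27248636.html";
// 1. Initialize
$ch = curl_init($url);
// 2. Set options
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // return the response as a string instead of printing it (needed here)
curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // timeout in seconds (recommended)
curl_setopt($ch, CURLOPT_HEADER, 0);             // 1 = include the response headers in the output, 0 = don't
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // don't verify the certificate (convenient, but insecure)
// 3. Execute
$res = curl_exec($ch);
if ($res === false) {
    echo 'cURL error: ' . curl_error($ch);       // must be read before the handle is closed
}
// 4. Close
curl_close($ch);
print_r($res);
?>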
The comments walk through each step of sending a cURL GET request. A POST request additionally needs the POST option set and the parameters passed before executing; either way, the response is returned and then printed. Running the script prints the raw HTML of the page, without any CSS rendering.
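For the POST case mentioned above, here is a minimal sketch, assuming the target accepts form-encoded data. The function name post_options and its parameters are hypothetical; it only builds the option array, so it can be applied to any handle with PHP's curl_setopt_array().

```php
<?php
// Hypothetical helper: the cURL options for a form-encoded POST request.
// Compared with the GET example, it adds CURLOPT_POST and CURLOPT_POSTFIELDS.
function post_options(array $data, int $timeout = 10): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,                    // return the body as a string
        CURLOPT_TIMEOUT        => $timeout,                // give up after $timeout seconds
        CURLOPT_POST           => true,                    // send a POST instead of a GET
        CURLOPT_POSTFIELDS     => http_build_query($data), // e.g. ['a' => 1] becomes "a=1"
    ];
}
```

To use it, call curl_setopt_array($ch, post_options(['a' => 1])) after curl_init() and before curl_exec($ch).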
2. Parse the page
The output contains a lot of content we don't need. To extract what we want, such as the chapter title and text, we need to parse the page.
There are many ways to parse a page; here we use simple_html_dom. Download the simple_html_dom.php class, include it, create an instance, and call its methods. For details, see the official site or other documentation on the PHP Chinese website.
First, look at the source of this chapter page to find the elements that hold its title and text:
The title is the h1 inside the element with class bookname.
The text is inside the div with id content.
simple_html_dom's find method locates elements with jQuery-like CSS selectors. For example:
$html->find('.bookname h1'); // the h1 title element under class bookname
$html->find('#content');     // the chapter text in the element with id content
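If you would rather not depend on a third-party file, PHP's built-in DOM extension can do the same two lookups. This is a swapped-in technique, not what the article uses: DOMXPath queries stand in for the .bookname h1 and #content selectors, and find_chapter is a hypothetical name.

```php
<?php
// Sketch: locate the chapter title and text with PHP's built-in DOM extension.
// XPath equivalents of the CSS selectors used with simple_html_dom:
//   .bookname h1  ->  //*[contains(concat(' ', normalize-space(@class), ' '), ' bookname ')]//h1
//   #content      ->  //*[@id='content']
function find_chapter(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8"?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $h1 = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' bookname ')]//h1")->item(0);
    $content = $xpath->query("//*[@id='content']")->item(0);

    // Rebuild the inner HTML of the content element from its child nodes.
    $inner = '';
    if ($content !== null) {
        foreach ($content->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
    }
    return [
        'title'   => $h1 !== null ? trim($h1->textContent) : '',
        'content' => $inner,
    ];
}
```

Unlike the foreach loops in the article, this takes the first match of each query and returns empty strings when nothing matches.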
Adding these lookups to the fetch code from the first example gives:
include "simple_html_dom.php";
$html = new simple_html_dom();
@$html->load($res);                 // $res is the HTML fetched above; @ silences parse warnings
$artic = ['title' => '', 'content' => ''];
// Find the chapter title
$h1 = $html->find('.bookname h1');
foreach ($h1 as $k => $v) {
    $artic['title'] = $v->innertext;
}
// Find the chapter text
$content = '';
$divs = $html->find('#content');
foreach ($divs as $k => $v) {
    $content = $v->innertext;
}
// Strip the extra parts (ads) with a regular expression
$pattern = "/(<p>.*?<\/p>)|(<div .*?>.*?<\/div>)/";
$artic['content'] = preg_replace($pattern, '', $content);
echo $artic['title'] . '<br>';
echo $artic['content'];
Because find returns an array of matching elements, foreach is used to read them (if several elements match, the last one wins). A regular expression then strips the text ads embedded in the chapter, and the title and text are stored in the $artic array. That completes the simplest version.
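To see exactly what the pattern in the code above removes, here is a small self-contained sketch. strip_ads is a hypothetical wrapper around the same regular expression; the s modifier is an addition not in the original pattern, on the assumption that an ad block may span several lines (without s, the . does not match newlines).

```php
<?php
// Hypothetical wrapper around the article's ad-stripping regex.
//   (<p>.*?<\/p>)           removes whole <p>...</p> blocks (the injected ad text)
//   (<div .*?>.*?<\/div>)   removes <div ...>...</div> blocks that carry attributes
// The added "s" modifier lets "." also match newlines inside a block.
function strip_ads(string $content): string
{
    $pattern = '/(<p>.*?<\/p>)|(<div .*?>.*?<\/div>)/s';
    return preg_replace($pattern, '', $content);
}
```

For example, strip_ads('text<p>ad</p>more') returns 'textmore', while plain text and <br> line breaks are left untouched.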
Of course, this version is clumsy to read, so you can wrap it in your own functions and class. Below is my own version; it certainly has shortcomings, but it can serve as a starting point for extension.
<?php
include "simple_html_dom.php";
include "mySpClass.php";
header("Content-Type:text/html;charset=utf-8");

$get_html = get_html((int) ($_GET['n'] ?? 1)); // default to chapter 1 if n is missing
if ($get_html === false) {
    exit('Failed to fetch the chapter');
}
$artic = getContent($get_html);
echo $artic['title'] . '<br>';
echo $artic['content'];

/**
 * Fetch the HTML of one chapter page from www.7kzw.com
 * @param int $num chapter number, starting from 1
 * @return string|false the page HTML, or false on failure
 */
function get_html($num){
    $start = 27248636;                  // page ID of chapter 1
    $real_num = $num + $start - 1;
    $url = 'https://www.7kzw.com/85/85445/' . $real_num . '.html';
    $header = [
        'User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0'
    ];
    return mySpClass()->getCurl($url, $header);
}

/**
 * Extract the title and text of a chapter from www.7kzw.com
 * @param string $get_html the chapter page HTML
 * @return array ['title' => '', 'content' => '']
 */
function getContent($get_html){
    $artic = ['title' => '', 'content' => ''];
    $html = new simple_html_dom();
    @$html->load($get_html);
    $h1 = $html->find('.bookname h1');
    foreach ($h1 as $k => $v) {
        $artic['title'] = $v->innertext;
    }
    // Find the chapter text
    $content = '';
    $divs = $html->find('#content');
    foreach ($divs as $k => $v) {
        $content = $v->innertext;
    }
    // Strip the extra parts (ads) with a regular expression
    $pattern = "/(<p>.*?<\/p>)|(<div .*?>.*?<\/div>)/";
    $artic['content'] = preg_replace($pattern, '', $content);
    return $artic;
}
?>
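The key assumption in get_html() is that chapter page IDs on this site are consecutive, with chapter 1 at 27248636. Isolating that arithmetic makes it easy to check. chapter_url is a hypothetical helper, and the consecutive-ID assumption comes from the original code, not from anything the site guarantees.

```php
<?php
// Hypothetical helper isolating get_html()'s URL arithmetic.
// Assumes consecutive page IDs: chapter 1 -> 27248636, chapter 2 -> 27248637, ...
function chapter_url(int $num, int $start = 27248636): string
{
    if ($num < 1) {
        throw new InvalidArgumentException('Chapter numbers start at 1');
    }
    $realNum = $num + $start - 1;
    return 'https://www.7kzw.com/85/85445/' . $realNum . '.html';
}
```

If the site ever skips an ID, this mapping breaks; scraping the chapter list page would be the more robust alternative.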
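The helper class, mySpClass.php:
<?php
class mySpClass {
    // Singleton instance
    private static $ins = null;

    // Private constructor: obtain the object only through exec()
    private function __construct() {}

    /**
     * Return the singleton instance
     */
    public static function exec() {
        if (self::$ins) {
            return self::$ins;
        }
        return self::$ins = new self();
    }

    /**
     * Forbid cloning the object
     */
    public function __clone() {
        throw new Exception('Error: the object cannot be cloned');
    }

    // Send the simplest possible GET request to the server
    public static function getCurl($url, $header) {
        // 1. Initialize
        $ch = curl_init($url); // request URL
        // 2. Set options
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // return the response as a string instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // timeout in seconds
        curl_setopt($ch, CURLOPT_HEADER, 0);             // 1 = include the response headers in the output, 0 = don't
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // don't verify the certificate (insecure)
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);     // don't check the host name in the certificate (insecure)
        if (!empty($header)) {
            curl_setopt($ch, CURLOPT_HTTPHEADER, $header); // set the request headers
        }
        // 3. Execute
        $res = curl_exec($ch);
        // 4. Close
        curl_close($ch);
        return $res;
    }
}

// Define the mySpClass() shortcut function if it does not exist yet
if (!function_exists('mySpClass')) {
    function mySpClass() {
        return mySpClass::exec();
    }
}
?>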
Final result of the example above: the chapter number is passed as a query parameter and read through $_GET['n'], so appending ?n=1 to the script's URL shows chapter 1.
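Since $_GET['n'] comes straight from the URL, it is worth validating before it is used in arithmetic. Below is a minimal sketch using PHP's filter_var() with FILTER_VALIDATE_INT. chapter_from_query is a hypothetical name; it takes the query array as a parameter so it can be tested without a web server.

```php
<?php
// Hypothetical helper: read a positive chapter number from a query array
// (pass $_GET in production). Returns null when "n" is missing or invalid.
function chapter_from_query(array $query): ?int
{
    if (!isset($query['n'])) {
        return null;
    }
    $n = filter_var($query['n'], FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $n === false ? null : $n;
}
```

In the main script, this could replace the direct read: call chapter_from_query($_GET), and if it returns null, send http_response_code(400) and stop instead of fetching a nonexistent page.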
Summary:
Key points: cURL (tip: see also the related article on a PHP class that collects any web page with the cURL module), regular expressions, and the simple_html_dom parser.
Although the code is now somewhat better organized, it works best deployed on your own server. Otherwise you can only read on a computer, which is inconvenient, and you might prefer to just put up with the ads.
The above are the details of using php curl to collect pages and using simple_html_dom to parse them. For more information, please pay attention to other related articles on the php Chinese website!