Using simple_html_dom to crawl and display the entire novel in laravel

L先生 (original)
2020-05-07 14:14:38, 2680 views

As mentioned in the earlier article "Programmers also read novels with advertisements", most novel websites carry very annoying advertisements, or wrap the whole page in a link so that an accidental tap sends you to another site, sometimes in an endless loop. Many mobile apps are full of ads as well. This article applies the crawler from that article to the Laravel framework. It is best to read the previous article first and then deploy this one yourself.

1. Introduce a third-party class into Laravel

1. Create a new folder in the app directory under the project root and name it Lib (any name will do).

2. If you bring in many third-party libraries, you can create subdirectories under Lib to group them. Only one class is used here, so no subfolder is created; organise this according to how many classes you import.

Copy simple_html_dom.php into Lib.

3. Find the composer.json file in the project root and add the path of the third-party class to the classmap under autoload so that it is loaded automatically:

"autoload": {
    "classmap": [
        "database/seeds",
        "database/factories",
        "app/Lib/simple_html_dom.php"
    ]
},

4. Switch to the project root directory in the cmd console and execute the command:

composer dumpautoload

5. Use this class in the controller

use simple_html_dom;

$html = new simple_html_dom(); // use it

2. Create the route

Route::get('/novel_list','index\Spnovel@index');

3. Create controller Spnovel.php

<?php
namespace App\Http\Controllers\index;
use simple_html_dom;
use Illuminate\Http\Request;
use App\Http\Controllers\Controller;
class Spnovel extends Controller
{
	public function index(){
		$url = "https://www.7kzw.com/85/85445/";
		$list_html = mySpClass::getCurl($url);
		$data['List'] = self::getList($list_html);
		return view('index.spnovel.index',$data);
	}
	private static function getList($list_html){
		$html = new simple_html_dom();
		@$html->load($list_html);
		// every chapter link inside the #list block
		$list = $html->find('#list dd a');
		// initialise the result so it exists even when nothing is found
		$content = [];
		foreach ($list as $k=>$v) {
			$arr1=$arr2=[];
			$p1 = '/<a .*?>(.*?)<\/a>/i'; // captures the chapter name
			$p2 = '/<a .*? href="(.*?)">.*?<\/a>/i'; // captures the href
			preg_match($p1,$v->outertext,$arr1);
			preg_match($p2,$v->outertext,$arr2);
			$content[$k][0]=$arr1[1];
			$content[$k][1]=$arr2[1];
		}
		// drop the first 12 entries of the list
		array_splice($content,0,12);
		return $content;
	}
}
class mySpClass{
	// Send a minimal GET request to the server
	public static function getCurl($url,$header=null){
		// 1. Initialise
		$ch = curl_init($url);   // the address to request
		// 2. Set options
		curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);// return the response as a string instead of printing it (required)
		curl_setopt($ch,CURLOPT_TIMEOUT,10);// timeout in seconds (required)
		curl_setopt($ch, CURLOPT_HEADER,0);// when enabled, the response headers are included in the output
		// 1 includes the headers, 0 leaves them out
		curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false); // do not verify the certificate
		curl_setopt($ch,CURLOPT_SSL_VERIFYHOST,false); // do not verify the host name
		if(!empty($header)){
			curl_setopt($ch,CURLOPT_HTTPHEADER,$header);// set the request headers
		}else{
			$_head = [
			'User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0'
			];
			curl_setopt($ch,CURLOPT_HTTPHEADER,$_head);
		}
		// 3. Execute
		$res = curl_exec($ch);
		// 4. Close
		curl_close($ch);
		return $res;
	}
}

Explanation of the above code: it assumes you are already familiar with the Laravel framework and with PHP classes.

When the route above is requested, the index method of the Spnovel.php controller runs. $url is the address of a novel's chapter list. It is passed to the getCurl method of the custom class mySpClass, which returns the HTML of that page as a string. That string is then passed to getList, a private method of the same controller, which parses it with simple_html_dom and uses regular expressions to extract the URL and name of each chapter. The resulting array is returned, and return view('index.spnovel.index',$data); renders index/spnovel/index.blade.php, which is shown next.
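The extraction step in getList can be tried on its own, without Laravel or simple_html_dom. The sketch below applies the same two patterns to a hand-written anchor string. The sample markup and the helper name parseChapterLink are assumptions made for illustration; they are not taken from the target site.

```php
<?php
// Hypothetical helper: applies the two patterns from getList to one anchor's outer HTML.
function parseChapterLink(string $outerHtml): ?array
{
    $p1 = '/<a .*?>(.*?)<\/a>/i';            // captures the link text
    $p2 = '/<a .*? href="(.*?)">.*?<\/a>/i'; // captures the href value

    if (!preg_match($p1, $outerHtml, $name) || !preg_match($p2, $outerHtml, $href)) {
        return null; // the markup does not have the expected shape
    }
    return [$name[1], $href[1]];
}

// Sample markup (assumed): note the attribute in front of href.
$sample = '<a style="" href="/85/85445/27248645.html">Chapter 1</a>';
$link = parseChapterLink($sample);

// $p2 expects something between "<a " and " href",
// so a bare <a href="..."> is not matched and the helper returns null.
$bare = parseChapterLink('<a href="/85/85445/27248645.html">Chapter 1</a>');

var_dump($link, $bare);
```

If the attribute order of the page ever changes, simple_html_dom also exposes the same values directly on each element as $v->plaintext and $v->href, which would avoid the regular expressions altogether.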

4. Create the view index.blade.php

<!DOCTYPE html>
<html>
<head>
	<title>Crawled novel list</title>
	<style type="text/css">
	body{padding:0px;margin:0px;}
	#lists{width:100%;padding:30px 50px;box-sizing:border-box;}
	ul{margin:0;padding: 0;overflow:hidden;}
	ul li{list-style:none;display:inline-block;float:left;width:25%;color:#444;}
	ul li:hover{color:#777;cursor: pointer;}
	img {z-index:-1;width:100%;height:100%;position:fixed;}
	</style>
</head>
<body>
	<img src="/static/img/index/novelbg.jpg" alt="">
	<div id="lists">
		<ul>
			@foreach($List as $item)
			<li>
			<a href="/novel_con{{$item[1]}}">{{$item[0]}}</a>
			</li>
			@endforeach
		</ul>		
	</div>
</body>
</html>

Explanation of the above code: The CSS is kept simple, and the img serves as the background image. Inside the ul, each pass of the loop outputs one li: {{$item[1]}} is the chapter's address, appended to /novel_con to form the link, and {{$item[0]}} is the chapter name. Take a look at the array and the final result.

[Screenshot: the extracted array and the rendered chapter list]

5. Run

[Screenshot: the chapter list page running]

Next comes the page that shows the content of each chapter.

First, the route:

Route::get('/novel_con/{a}/{b}/{c}','index\Spnovel@get_nContent');

This matches the URL of each chapter. For example, the path of one chapter is novel_con/85/85445/27248645.html, so a is 85, b is 85445 and c is 27248645.html.
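As a rough illustration of that mapping, the sketch below splits a chapter path into the three segments that the route binds to {a}, {b} and {c}, then joins them back into the address on the source site. The function names are made up for this example; in the real application Laravel's router does the splitting.

```php
<?php
// Hypothetical stand-in for what the router does with /novel_con/{a}/{b}/{c}.
function splitChapterPath(string $path): ?array
{
    $parts = explode('/', trim($path, '/'));
    if (count($parts) !== 4 || $parts[0] !== 'novel_con') {
        return null; // not a chapter URL
    }
    return ['a' => $parts[1], 'b' => $parts[2], 'c' => $parts[3]];
}

// Rebuild the address of the same chapter on the source site.
function upstreamUrl(array $p): string
{
    return 'https://www.7kzw.com/' . $p['a'] . '/' . $p['b'] . '/' . $p['c'];
}

$params = splitChapterPath('/novel_con/85/85445/27248645.html');
$url = upstreamUrl($params);

var_dump($params, $url);
```

Because the links in the list view are built as /novel_con followed by the href taken from the source page, the local path and the upstream path differ only in that prefix.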

Write the get_nContent method:

public function get_nContent(Request $req){
		// rebuild the chapter path from the three route parameters
		$url1 = $req->a.'/'.$req->b.'/'.$req->c;
		$url = "https://www.7kzw.com/".$url1;
		$res = mySpClass::getCurl($url);// fetch the chapter page
		// start parsing
		$data['artic']= self::getContent($res);
		// (int) keeps the leading digits of a value such as "27248645.html"
		$next = (int)$req->c;
		$next = $next+1;
		$data['artic']['next']="/novel_con/".$req->a.'/'.$req->b.'/'.$next.'.html';
		return view('index.spnovel.ncontent',$data);
	}
private static function getContent($get_html){
		$html = new simple_html_dom();
		@$html->load($get_html);
		// defaults, in case an element is not found on the page
		$artic = ['title'=>'','content'=>''];
		$content = '';
		$h1 = $html->find('.bookname h1');
		foreach ($h1 as $k=>$v) {
			$artic['title'] = $v->innertext;
		}
		// find the body text of the chapter
		$divs = $html->find('#content');
		foreach ($divs as $k=>$v) {
			$content = $v->innertext;
		}
		// remove the unwanted parts with a regular-expression replace
		$pattern = "/(<p>.*?<\/p>)|(<div .*?>.*?<\/div>)/";
		$artic['content'] = preg_replace($pattern,'',$content);
		return $artic;
	}

Explanation: $req->a, $req->b and $req->c are the three route parameters. They are joined into a complete address for the requested chapter, and mySpClass::getCurl fetches that chapter's HTML string. getContent, in the same class, then parses the page: as in the previous article, it extracts the chapter title and content, writes them into an array, and strips the extra text-advertisement parts. $next holds the address of the next chapter and is used by the next-chapter link on the chapter page.
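Two pieces of that explanation can be checked in isolation: how the next-chapter address is derived, and what the clean-up pattern removes. The sketch below uses invented sample HTML and hypothetical function names. It also relies on the same assumption as the controller, namely that the next chapter's id is the current id plus one, which may not hold for every book.

```php
<?php
// Hypothetical helper mirroring the $next logic in get_nContent.
function nextChapterPath(string $a, string $b, string $c): string
{
    // (int) reads the leading digits and ignores the ".html" suffix
    $next = (int)$c + 1;
    return '/novel_con/' . $a . '/' . $b . '/' . $next . '.html';
}

// Hypothetical helper mirroring the preg_replace call in getContent.
function stripExtras(string $content): string
{
    $pattern = "/(<p>.*?<\/p>)|(<div .*?>.*?<\/div>)/";
    return preg_replace($pattern, '', $content);
}

$next = nextChapterPath('85', '85445', '27248645.html');

// Sample chapter body (assumed): text with an ad paragraph and an ad div mixed in.
$raw = 'First line.<br /><p>ad text</p>Second line.<div class="ad">more ads</div>';
$clean = stripExtras($raw);

var_dump($next, $clean);
```

Note that the dot in the pattern does not match a newline unless the s modifier is added, so a p or div block that spans several lines would be left in place by this pattern.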

The view ncontent.blade.php:

<!DOCTYPE html>
<html>
<head>
	<title>{{$artic['title']}}</title>
	<style type="text/css">
	h2{text-align:center;padding-top:30px;}
	div{margin:20px 50px;font-size:20px;}
	img {z-index:-1;width:100%;height:100%;position:fixed;}
	.next {position:fixed;right:10px;bottom:20px;background:coral;border-radius:3px;padding:4px;}
	.next:hover{color:#fff;}
	</style>
</head>
<body>
	<img src="/static/img/index/novelbg.jpg" alt="">
	<h2>{{$artic['title']}}</h2>
	<a href="{{$artic['next']}}" class="next">Next chapter</a>
	<div>
		{!!$artic['content']!!}
	</div>
</body>
</html>

Explanation: Because there is only the current article, no loop is needed. {{$artic['title']}} is the chapter title, and it is also written into the page's title element. {!!$artic['content']!!} outputs the chapter content without escaping; with escaping, the HTML inside the content would show up on the page as literal characters. The address of the next-chapter button is passed in directly. position:fixed pins the button in place, so you can go to the next chapter at any time.
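To see why the unescaped form is needed, the sketch below passes a short piece of chapter HTML through htmlspecialchars, which is roughly what Blade's {{ }} does before printing. The sample string is invented, and the exact flags Blade uses may differ between Laravel versions.

```php
<?php
// Invented sample: chapter text that contains markup.
$content = 'Line one.<br />Line two.';

// Roughly what {{ $content }} prints: the tag becomes visible text.
$escaped = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

// What {!! $content !!} prints: the string as-is, so the browser renders the tag.
$raw = $content;

echo $escaped, PHP_EOL; // Line one.&lt;br /&gt;Line two.
echo $raw, PHP_EOL;
```

Since the raw form prints whatever the source site sent, it is probably worth keeping the clean-up step in getContent strict, because any script or ad markup that survives it reaches the reader's browser unchanged.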

Run:

[Screenshot: a chapter page running]

Summary: The most important part of this article is how to bring a third-party class into Laravel so that it can be used throughout the application. The rest is Laravel basics; I am more used to working with just the controller and the view. If you use a model, please write and verify that part yourself.

This is enough for a single novel. It could of course be extended to list every novel on the site by continuing to pass the appropriate parameters.
