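PHP has been around long enough that many users know it well, so we can use PHP functions to build a collector program. A collector, often also called a "thief program" (小偷程序), is mainly used to grab content from other people's web pages. Building one is not hard: open the target page remotely, then use regular expressions to match the content you need. Anyone with a basic grasp of regular expressions can build their own collector.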
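The "open remotely, then match with a regular expression" idea can be sketched roughly as follows. This is a minimal illustration, not the author's code: the helper name `extract_links`, the sample HTML, and the example URL in the comment are all assumptions made for the sketch.

```php
<?php
// Hypothetical helper: return every href value found in an HTML string.
function extract_links(string $html): array
{
    // Capture the text between href=" and the next double quote.
    preg_match_all('/href="([^"]+)"/i', $html, $matches);
    return $matches[1];
}

// In a real run the HTML would come from the remote page, for example:
//   $html = file_get_contents('http://www.example.com/list.html');
// (this needs allow_url_fopen to be enabled). An inline, made-up sample is
// used here so the sketch runs without network access.
$html = '<a href="1.shtm">Chapter 1</a> <a href="2.shtm">Chapter 2</a>';

$links = extract_links($html);
print_r($links);
```

Keeping the matching logic in a function that takes a string makes it easy to try the pattern on saved pages before pointing it at a live site.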
That alone is not enough. We also need a PHP helper function that cuts a substring out of a larger string:
    // Return the part of $string that lies between $start and $end.
    function cut($string, $start, $end) {
        $message = explode($start, $string);
        if (!isset($message[1])) {
            return ''; // $start does not occur in $string
        }
        $message = explode($end, $message[1]);
        return $message[0];
    }

Here $string is the content to cut from, $start marks where the wanted part begins, and $end marks where it ends. To extract the category number:

    $start = "Html/Book/";
    $end = "List.shtm";
    $typeid = cut($typeid[0][0], $start, $end);
    $typeid = explode("/", $typeid);

$typeid[0] is now the category number we are after. The chapter paths and titles are then extracted as follows:

    $ustart = "\"";
    $uend = "\"";
    // "t" stands for title
    $tstart = ">";
    $tend = "<";
    // Match the paths, e.g. 123.shtm, 2342.shtm, 233.shtm
    preg_match_all("/\"[0-9]{1,}\.(shtm)\"/is", $chapterurl, $url);
    // Match the titles, e.g. "第一章 九世善人"
    preg_match_all("/<a href=\"[0-9]{1,}\.shtm\"(.*?)<\/a>/is", $file, $title);
    $count = count($url[0]);
    $array = array();
    for ($i = 0; $i < $count; $i++)
    {
        $u = cut($url[0][$i], $ustart, $uend);
        $t = cut($title[0][$i], $tstart, $tend);
        $array[$u] = $t;
    }
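To make the cutting and matching steps easier to try out, here is a small self-contained sketch that runs on a made-up chapter list. It swaps in a different technique from the article's: a `strpos`/`substr` based helper (named `cut_between` here, a hypothetical name) that returns null when a delimiter is missing, and a single pattern with two capture groups instead of two separate matches. The sample HTML and the sample path are assumptions.

```php
<?php
// Hypothetical variant of cut(): returns null when a delimiter is missing.
function cut_between(string $string, string $start, string $end): ?string
{
    $from = strpos($string, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($string, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($string, $from, $to - $from);
}

// Made-up sample of a chapter list page.
$file = '<a href="123.shtm">Chapter 1</a><a href="2342.shtm">Chapter 2</a>';

// One pattern with two capture groups: the path and the title.
preg_match_all('/<a href="([0-9]+\.shtm)">(.*?)<\/a>/is', $file, $m, PREG_SET_ORDER);

// Map each chapter path to its title.
$chapters = array();
foreach ($m as $match) {
    $chapters[$match[1]] = $match[2];
}
print_r($chapters);

// Cutting a category path out of a made-up link.
echo cut_between('Html/Book/12/3456/List.shtm', 'Html/Book/', 'List.shtm'), "\n";
```

Capturing path and title in the same match keeps each pair together, so the result does not depend on two separate match lists having the same length and order.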
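The $array array now holds every chapter address (as keys, with the chapter titles as values). At this point the collector is half done. What remains is to loop over the chapter addresses, open and read each one, and match out the content. That part is fairly simple, so I won't go into detail here. That's all for today. This is the first time I have written such a long article, so the wording is bound to be rough in places. Thanks for bearing with me!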
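For readers who want the remaining step spelled out, one possible shape for the "loop over each chapter, read it, match the content" part is sketched below. Everything specific in it is an assumption: the function name `collect_chapters`, the base URL, and especially the `<div id="content">` markup, which has to be adapted to the real page. The page fetcher is passed in as a callable so the sketch can run offline.

```php
<?php
// Hypothetical sketch: $fetch is any callable that returns the HTML for a URL
// (or false on failure).
function collect_chapters(array $chapters, string $baseUrl, callable $fetch): array
{
    $result = array();
    foreach ($chapters as $path => $title) {
        $html = $fetch($baseUrl . $path);
        if ($html === false) {
            continue; // skip pages that could not be read
        }
        // Assumed markup: the chapter text sits in <div id="content">...</div>.
        if (preg_match('/<div id="content">(.*?)<\/div>/is', $html, $m)) {
            $result[$title] = trim(strip_tags($m[1]));
        }
    }
    return $result;
}

// Stand-in fetcher so the sketch runs without network access. A real run
// could pass 'file_get_contents' instead.
$fakeFetch = function (string $url) {
    return '<html><div id="content"><p>Text of ' . basename($url) . '</p></div></html>';
};

$books = collect_chapters(
    array('123.shtm' => 'Chapter 1'),
    'http://www.example.com/Html/Book/',
    $fakeFetch
);
print_r($books);
```

Injecting the fetcher also leaves room to add delays, caching, or error logging later without touching the matching code.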
