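I recently needed to batch-download images from Baidu Tieba, that is, to download every image posted in a given tieba (forum).
I originally planned to write it in Python, but I'm not familiar with Python. I tried minidom, HTMLParser and the like and couldn't get anywhere with them, so I went back to PHP, which I know better.
Here is the source code: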
```php
<?php
// Run time limit in seconds; 0 = no limit, because a full crawl with
// 2-5 s pauses per image runs far longer than a minute
@set_time_limit(0);
// Tieba (forum) name, URL-encoded in GBK
$tbname = "%CD%BC%C6%AC";
// Crawl mode: 0 = by thread order, 1 = by picture order
$type = 0;
// List page URL
$listurltpl = "http://tieba.baidu.com/f?kw=%s" . ($type ? "&tp=1" : "&pn=");
// Gallery page URL
$galleryurltpl = "http://tieba.baidu.com/photo/bw/picture/guide?kw=%s&tid=%s&next=9999";
// Image URL
$imageurltpl = "http://imgsrc.baidu.com/forum/pic/item/%s.jpg";
// Local directory
$savepath = "h:/images/";
// Per-thread subfolder
$filedirtpl = $savepath . "%s/";
// Image file
$filenametpl = $savepath . "%s/%s.jpg";

// Crawl starting offset
$pn = 0;
while (true)
{
    // Build the page URL afresh each time so $pn is not appended repeatedly
    $listurl = sprintf($listurltpl, $tbname) . ($type ? "" : $pn);
    // Fetch the list page source
    $listhtml = file_get_contents($listurl);
    if ($listhtml === false)
        break;
    // Match the thread ids
    if ($type)
        preg_match_all('/<div class="aep_wrapper" id="pic_item_(\d+)" tid="\d+">/', $listhtml, $m1);
    else
        preg_match_all('/<ul class="threadlist_media j_threadlist_media" id="fm(\d+)"/', $listhtml, $m1);
    // Thread id list
    $tidlist = $m1[1];
    // No threads on this page: stop instead of looping forever
    if (empty($tidlist))
        break;
    echo "Fetching ... <br /> \r\n";
    foreach ($tidlist as $tid)
    {
        echo "--Gallery $tid <br /> \r\n";
        $galleryurl = sprintf($galleryurltpl, $tbname, $tid);
        // Fetch the thread's gallery source
        $galleryhtml = file_get_contents($galleryurl);
        if ($galleryhtml === false)
            continue;
        // Match the picture ids
        preg_match_all('/\{"original":\{"id":"(\w+)"/', $galleryhtml, $m2);
        // Picture id list
        $pidlist = $m2[1];
        foreach ($pidlist as $pid)
        {
            echo "----Picture {$tid}/{$pid}.jpg ";
            $filedir = sprintf($filedirtpl, $tid);
            $filename = sprintf($filenametpl, $tid, $pid);
            // Skip files that already exist
            if (!is_file($filename))
            {
                $imageurl = sprintf($imageurltpl, $pid);
                // Download the image
                $imagebin = file_get_contents($imageurl);
                if ($imagebin === false)
                {
                    echo "Failed! <br />\r\n";
                    continue;
                }
                // Create the directory (and any missing parents) if needed
                if (!is_dir($filedir))
                    mkdir($filedir, 0777, true);
                // Save the image
                file_put_contents($filename, $imagebin);
                $rnd = rand(2000, 5000);
                echo "Downloaded! ";
                // Pause 2-5 s; usleep() takes microseconds, sleep() only whole seconds
                usleep($rnd * 1000);
                echo "Sleep $rnd ms <br />\r\n";
            }
            else
                echo "Existed! <br />\r\n";
        }
    }
    // Picture-order mode only has the one list page
    if ($type)
        break;
    // Go to the next page
    $pn += 50;
}
```
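The parts of the crawler that don't touch the network (building the list-page URL for a given `pn`, and pulling thread ids and picture ids out of the fetched text) can be pulled into small functions and checked offline. The sketch below does that; `buildListUrl`, `extractThreadIds` and `extractPictureIds` are hypothetical names, and the sample inputs are made up to match the regular expressions, not captured from Baidu (whose markup may well have changed since this was written). Note that `$tbname` is the GBK URL encoding of 图片 ("pictures").

```php
<?php
// Hypothetical helpers isolating the network-free steps of the crawler.

// Build the list-page URL for one page; $pn is the thread offset,
// which the crawler advances by 50 per page in thread-order mode.
function buildListUrl(string $tbname, int $type, int $pn): string
{
    $base = sprintf("http://tieba.baidu.com/f?kw=%s", $tbname);
    return $type ? $base . "&tp=1" : $base . "&pn=" . $pn;
}

// Extract thread ids from a list page with the crawler's patterns.
function extractThreadIds(string $html, int $type): array
{
    $pattern = $type
        ? '/<div class="aep_wrapper" id="pic_item_(\d+)" tid="\d+">/'
        : '/<ul class="threadlist_media j_threadlist_media" id="fm(\d+)"/';
    preg_match_all($pattern, $html, $m);
    return $m[1];
}

// Extract picture ids from a gallery response.
function extractPictureIds(string $text): array
{
    preg_match_all('/\{"original":\{"id":"(\w+)"/', $text, $m);
    return $m[1];
}

// Made-up sample inputs shaped to match the patterns (not real Baidu output).
$listSample = '<ul class="threadlist_media j_threadlist_media" id="fm101"></ul>'
            . '<ul class="threadlist_media j_threadlist_media" id="fm202"></ul>';
$gallerySample = '[{"original":{"id":"abc123","width":800}},'
               . '{"original":{"id":"def456","width":640}}]';

echo buildListUrl("%CD%BC%C6%AC", 0, 50), "\n"; // http://tieba.baidu.com/f?kw=%CD%BC%C6%AC&pn=50
echo implode(",", extractThreadIds($listSample, 0)), "\n"; // 101,202
echo implode(",", extractPictureIds($gallerySample)), "\n"; // abc123,def456
```

Keeping these steps separate from `file_get_contents()` makes it easy to re-check the patterns against a saved copy of a page when the crawler suddenly starts finding nothing.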
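Test run:
The program basically does the job. However, after a long stretch of downloading, Baidu starts showing a CAPTCHA; at that point, redialing the modem to get a new IP address lets the crawl continue.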
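Redialing for a new IP is the workaround used here. A separate, commonly used technique (not part of the original program) is exponential backoff: when a request fails, wait progressively longer before retrying. The sketch below assumes hypothetical helpers `backoffDelayMs` and `fetchWithRetry`, with defaults of 2 s doubling up to a 60 s cap chosen only for illustration. It only retries when `$fetch` returns `false`; a CAPTCHA page normally comes back as ordinary HTML, so recognizing it would need a check on the response body that this sketch does not attempt.

```php
<?php
// Hypothetical throttling helpers using exponential backoff.

// Delay before retry number $attempt (0-based): baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(int $attempt, int $baseMs = 2000, int $capMs = 60000): int
{
    return min($capMs, $baseMs * (2 ** $attempt));
}

// Call $fetch until it returns something other than false or attempts run out.
// $sleepMs is injected so the waiting can be replaced (e.g. in tests).
function fetchWithRetry(callable $fetch, int $maxAttempts, callable $sleepMs)
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $result = $fetch();
        if ($result !== false) {
            return $result;
        }
        // No pause after the final failed attempt
        if ($attempt + 1 < $maxAttempts) {
            $sleepMs(backoffDelayMs($attempt));
        }
    }
    return false;
}

// Usage against the network, waiting with usleep():
// $html = fetchWithRetry(fn () => @file_get_contents($url), 4,
//                        fn (int $ms) => usleep($ms * 1000));
```

Passing the sleep function in as a parameter keeps the retry logic testable without actually waiting.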
(For learning and reference only; please do not use it for anything illegal.)