<?php/×只为了证明PHP是最好的语言。目前设计的该程序是顺序执行,生产和消费者没有分开,使用来一个死循环,不断从redis的list里取出最新的QQ号码,然后用该QQ号码拼接出需要网站的地址,一次访问并存入mongodb,这里只是整个实现的流程代码,没有写QQ空间API接口。由于我分析的接口返回的是json,所以我直接存入mongodb。回来分析也非常方便。为了加大速度,我目前开了10进程,虽然没有到腾讯封IP的极限,但是以及到来我的笔记本的极限了。10个进程24小时可以采集180W,如果电脑配置高点,一天200多万不是问题。其中,懒得自己写curl类,用之前大神爬取知乎的时候写的类https://github.com/hhqcontinue/zhihuSpider××/include "phpspider/config.php";include "phpspider/cache.php";include "phpspider/rolling_curl.php";$cookie = trim(file_get_contents("cookie.txt"));$curl = new rolling_curl();$curl->set_cookie($cookie);$curl->set_gzip(true);$curl->callback = function($response, $info, $request, $error) { preg_match("#xxxxx#", $request['url'], $out); $qq = $out[1]; if (empty($response)) { file_put_contents("./data/error_timeout.log", date("Y-m-d H:i:s") . ' ' . $username.' --- '.json_encode($error)."\n", FILE_APPEND); // 注意这里不要用 exit,否则整个程序就断开了 return; } // 如果是个人信息 if (strpos($request['url'], 'a') ==true ) { //将信息添加至mongodb,先判断是否是错误信息 } //如果是来访QQ if (strpos($request['url'], 'b') == true) { //将信息添加至mongodb,先判断是否是错误信息 //从中取出QQ号码存入redis cache->get_instance()->lpush("qq");//一定要从左端插入,否则是出现死循环,例如A访问来B空间,B同样也访问来A空间,如果在设置网址的时候取先存入的QQ,程序就会不断在A和B之间访问,如果设置网址的时候取最新的则会将他们存入栈底部 } //如果是说说 if (strpos($request['url'], 'c') == true) { ///将信息添加至mongodb,先判断是否是错误信息 } };while(true){ $qq=cache->get_instance()->lpop("qq");//用从左端删除 if ($qq=="")$qq="12345678";//这里是登录的QQ号码 $url = "http://a".$qq;//个人信息 $curl->get($url); $url = "http://b".$qq;//来访QQ $curl->get($url); $url = "http://c".$qq;//最近说说 $curl->get($url); $data = $curl->execute(); // 睡眠100毫秒,太快了会被认为是ddos usleep(100000);}
因为,生产者和消费者没有分开,所以其中有一个问题,就是生产者产生的QQ号码多,消费者使用速度低,所以建议将采集QQ来访者和采集数据分开,采集数据可以多开写进程
用到的类库地址https://github.com/hhqcontinue/zhihuSpider

Laravel simplifies handling temporary session data using its intuitive flash methods. This is perfect for displaying brief messages, alerts, or notifications within your application. Data persists only for the subsequent request by default: $request-

This is the second and final part of the series on building a React application with a Laravel back-end. In the first part of the series, we created a RESTful API using Laravel for a basic product-listing application. In this tutorial, we will be dev

The PHP Client URL (cURL) extension is a powerful tool for developers, enabling seamless interaction with remote servers and REST APIs. By leveraging libcurl, a well-respected multi-protocol file transfer library, PHP cURL facilitates efficient execution of various network protocols, including HTTP, HTTPS, and FTP. This extension offers granular control over HTTP requests, supports multiple concurrent operations, and provides built-in security features.

Laravel provides concise HTTP response simulation syntax, simplifying HTTP interaction testing. This approach significantly reduces code redundancy while making your test simulation more intuitive. The basic implementation provides a variety of response type shortcuts: use Illuminate\Support\Facades\Http; Http::fake([ 'google.com' => 'Hello World', 'github.com' => ['foo' => 'bar'], 'forge.laravel.com' =>

Do you want to provide real-time, instant solutions to your customers' most pressing problems? Live chat lets you have real-time conversations with customers and resolve their problems instantly. It allows you to provide faster service to your custom

In this article, we're going to explore the notification system in the Laravel web framework. The notification system in Laravel allows you to send notifications to users over different channels. Today, we'll discuss how you can send notifications ov

Article discusses late static binding (LSB) in PHP, introduced in PHP 5.3, allowing runtime resolution of static method calls for more flexible inheritance.Main issue: LSB vs. traditional polymorphism; LSB's practical applications and potential perfo

PHP logging is essential for monitoring and debugging web applications, as well as capturing critical events, errors, and runtime behavior. It provides valuable insights into system performance, helps identify issues, and supports faster troubleshoot


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Dreamweaver Mac version
Visual web development tools

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.
