Home >php教程 >php手册 >php多线程抓取信息测试例子

php多线程抓取信息测试例子

WBOY
WBOYOriginal
2016-05-25 16:40:221392browse

只在php5.3以后的版本才真正的可以使用多线程序了,以前都是假的curl实现的多线程工作,下面我来给各位介绍几个多线程抓取信息测试例子,希望对各位会有帮助.

PHP 5.3 以上版本,使用pthreads PHP扩展,可以使PHP真正地支持多线程。多线程在处理重复性的循环任务,能够大大缩短程序执行时间。

PHP扩展下载:https://github.com/krakjoe/pthreads

PHP手册文档:http://php.net/manual/zh/book.pthreads.php

1、扩展的编译安装Linux,编辑参数 --enable-maintainer-zts 是必选项:

cd /Data/tgz/php-5.3.8 
./configure --prefix=/Data/apps/php --with-config-file-path=/Data/apps/php/etc --with-mysql=/Data/apps/mysql --with-mysqli=/Data/apps/mysql/bin/mysql_config --with-iconv-dir --with-freetype-dir=/Data/apps/libs --with-jpeg-dir=/Data/apps/libs --with-png-dir=/Data/apps/libs --with-zlib --with-libxml-dir=/usr --enable-xml --disable-rpath --enable-bcmath --enable-shmop --enable-sysvsem --enable-inline-optimization --with-curl --enable-mbregex --enable-fpm --enable-mbstring --with-mcrypt=/Data/apps/libs --with-gd --enable-gd-native-ttf --with-openssl --with-mhash --enable-pcntl --enable-sockets --with-xmlrpc --enable-zip --enable-soap --enable-opcache --with-pdo-mysql --enable-maintainer-zts 
make clean 
make 
make install         
 
unzip pthreads-master.zip 
cd pthreads-master 
/Data/apps/php/bin/phpize 
./configure --with-php-config=/Data/apps/php/bin/php-config 
make 
make install

添加扩展:

vi /Data/apps/php/etc/php.ini

extension = "pthreads.so"

一段PHP多线程、与For循环,抓取百度搜索页面的PHP代码示例,代码如下:

<?php
class test_thread_run extends Thread {
    public $url;
    public $data;
    public function __construct($url) {
        $this->url = $url;
    }
    public function run() {
        if (($url = $this->url)) {
            $this->data = model_http_curl_get($url);
        }
    }
}
function model_thread_result_get($urls_array) {
    foreach ($urls_array as $key => $value) {
        $thread_array[$key] = new test_thread_run($value["url"]);
        $thread_array[$key]->start();
    }
    foreach ($thread_array as $thread_array_key => $thread_array_value) {
        while ($thread_array[$thread_array_key]->isRunning()) {
            usleep(10);
        }
        if ($thread_array[$thread_array_key]->join()) {
            $variable_data[$thread_array_key] = $thread_array[$thread_array_key]->data;
        }
    }
    return $variable_data;
}
function model_http_curl_get($url, $userAgent = "") {
    $userAgent = $userAgent ? $userAgent : &#39;Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)&#39;;
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 5);
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    $result = curl_exec($curl);
    curl_close($curl);
    return $result;
}
for ($i = 0; $i < 100; $i++) {
    $urls_array[] = array(
        "name" => "baidu",
        "url" => "http://www.baidu.com/s?wd=" . mt_rand(10000, 20000)
    );
}
$t = microtime(true);
$result = model_thread_result_get($urls_array);
$e = microtime(true);
echo "多线程:" . ($e - $t) . "n";
//开源代码phprm.com
$t = microtime(true);
foreach ($urls_array as $key => $value) {
    $result_new[$key] = model_http_curl_get($value["url"]);
}
$e = microtime(true);
echo "For循环:" . ($e - $t) . "n";
?>

例子,采集数据,代码如下:

<?php
$urls = array(
    &#39;http://www.111cn.net/&#39;,
    &#39;http://www.sohu.com/&#39;,
    &#39;http://www.163.com/&#39;
);
$save_to = &#39;/test.txt&#39;; // 把抓取的代码写入该文件
$st = fopen($save_to, "a");
$mh = curl_multi_init();
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true); // 设置不将爬取代码写到浏览器,而是转化为字符串
    curl_multi_add_handle($mh, $conn[$i]);
}
do {
    curl_multi_exec($mh, $active);
} while ($active);
foreach ($urls as $i => $url) {
    $data = curl_multi_getcontent($conn[$i]); // 获得爬取的代码字符串
    fwrite($st, $data); // 将字符串写入文件。当然,也可以不写入文件,比如存入数据库
    
} // 获得数据变量,并写入文件
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>


教程链接:

随意转载~但请保留教程地址★

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn