Home  >  Article  >  php教程  >  初级的用php写的采集程序

初级的用php写的采集程序

WBOY
WBOYOriginal
2016-06-13 12:32:281239browse

可以先用这个采集然后在用帝国处理
####################################################################################
#作者:9elong
#网站:个人小站不值一提
#时间:2007-01-01
#声明:仅用于学习php之用。
#功能:采集单页面图片。
#说明:3个示范表单已经写好正则用来示范。没有任何功能说明,一切都在源代码里。附加论坛图片采集正则示范
####################################################################################
//把图片从信息页抓取下来的函数
function  getimg($url,$n,$key,$imgqian,$imgbiao,$titlekey)
{
                //$key图片地址正则
                //$titlekey图片标题正则
                //$imgqian图片地址前缀
                //$imgbiao图片地址特殊标识
                global  $n;
                global  $msg;
                global  $result;
                global  $imgadd;
                global  $title;
                $msg=file_get_contents($url);
                $key=str_replace("{图片地址}","(.+)",$key);
                $key="|".$key."|isU";
                preg_match_all($key,$msg,$result);
                $c=count($result[0]);
                for($i=0;$i                {
                                $img=$result[0][$i]."
";
                                if(ereg("^.*".$imgbiao.".*$",$img))
                                {
                                                $img=str_replace($imgbiao,$imgqian.$imgbiao,$img);
                                                preg_match("|http://(.+)jpg|isU",$img,$img);
                                                $imgadd[$n]=$img[0];
                                                //echo  "初级的用php写的采集程序
";
                                                $n++;
                                }
                                elseif(ereg("^.*jpg.*$",$img))
                                {
                                                preg_match("|http://(.+)jpg|isU",$img,$img);
                                                $imgadd[$n]=$img[0];
                                                if($img[0]!="")
                                                $n++;
                                }
                                unset($img);
                }                                
                                $titlekey=str_replace("{图片标题}","(.+)",$titlekey);
                                $titlekey="|".$titlekey."|isU";
                                preg_match($titlekey,$msg,$title);
                                //echo  $title[0];
                                return  $title;
                return  $msg;
                return  $result;
                return  $n;
                return  $imgadd;
}
####################################################################################
#不支持file_get_contents()函数可以使用下面的替换
#$i=0;
#$handle=@fopen($url,"rb");
#while  (!@feof($handle))
#{
#                $buffer[$i]=  @fgets($handle,  4096);
#                $i++;
#}
#fclose($handle);
#$msg=join("",$buffer);
####################################################################################
if($_GET['act']=="getimgadd"&&$_POST['url']!="")
{
                $url=$_POST['url'];
        getimg($url,"0",$_POST['key'],$_POST['imgqian'],$_POST['imgbiao'],$_POST['titlekey']);
####################################################################################
        //获取分页
                if($_POST['getpage']=="是")
        {
                                $_POST['page']=str_replace("{分页地址}","(.+)",$_POST['page']);
                $page="|".$_POST['page']."|isU";
                //echo  $page;
        preg_match_all($page,$msg,$presult);
                if($_POST['pc']==""||$_POST['pc']=="全部")
                        $pc=count($presult[0]);
                else
                                $pc=$_POST['pc'];
                if($_POST['pc']>count($presult[0]))
                                $pc=count($presult[0]);
                for($i=1;$i                {
                                $pageurl=$presult[0][$i];
                                //echo  $pageurl."
";
                                if(ereg("^.*[1-9].*$",$pageurl))
                                {
                                                $pageurl=str_replace("                                                $pagekey=str_replace("{关键地址}","(.+)",$_POST['pagekey']);
                                                $pagekey="|".$pagekey."|isU";
                                                preg_match($pagekey,$pageurl,$N3[$i]);
                                                //echo  ($N3[$i][0])."
";
                                                getimg($N3[$i][0],$n,$_POST['key'],$_POST['imgqian'],$_POST['imgbiao'],$_POST['titlekey']);
                                }
                }
        }
####################################################################################
        echo  "图片集〖".$title[1]."〗".$n."张图片被抓取
返回首页
";

                while(list($num,$var)=each($imgadd))
                {
                                if($_POST['showtype']=="图片")
                                {
                                                echo  "初级的用php写的采集程序
";
                                }
                                else
                                echo  $var."
";
                }
####################################################################################
                //exit();
}
?>
实例1(信息页有分页,使用了简单的分页正则):


输入图片地址

图片地址正则'>

图片地址前缀

图片地址标识

图片标题正则{图片标题}'>

分页地址正则.[0-9]'>

分页地址模式










实例2(信息页没有分页,所以分页正则为空):

输入图片地址

图片地址正则

图片地址前缀

图片地址标识

图片标题正则{图片标题}'>

分页地址正则

分页地址模式










实例3(信息页没有分页,所以分页正则为空,图片为绝对地址,所以图片地址前缀为空):

输入图片地址

图片地址正则初级的用php写的采集程序'>

图片地址前缀

图片地址标识

图片标题正则{图片标题}'>

分页地址正则

分页地址模式












华声论坛图片为附件http://bbs.hnol.net/dispbbs2.asp?boardID=50&ID=336436

图片地址正则:upload=jpg{图片地址}upload

图片地址标识:bbs

图片标题正则:帖子主题:{图片标题}


华声论坛图片为外链http://bbs.hnol.net/dispbbs2.asp?boardID=50&ID=336253

图片地址正则:img]{图片地址}/img

图片地址标识:jpg

图片标题正则:帖子主题:{图片标题}
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn