
A news-scraping feature: advice welcome! (with suggested solutions)

WBOY
2016-06-13 13:30:15

When I scrape in batches, a small number of articles are sometimes not collected, and I don't know why. Could someone take a look?
I am scraping Tencent news.

PHP code
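$url = "http://news.qq.com/newsgn/zhxw/shizhengxinwen.htm";
$urlcontent = file_get_contents($url);

// Assumption: group 1 captures each link's href; the posted pattern is incomplete at this point.
preg_match_all("/<a.+href=\"(.+)\".+class=\"pub_time\">/isU", $urlcontent, $urlcontent);
// Everything works up to here: the links of the articles to scrape are extracted from the list page.
$urllength = count($urlcontent[1]);

// Assumption: the closing <\/h1> and <\/span> delimiters are restored so the lazy groups capture whole fields.
$conpattern = "/<div id=\"C-Main-Article-QQ\" class=\"mod-left\">.+<h1>(.+)<\/h1>.+<span class=\"pubTime\">(.+)<\/span>.+<div id=\"Cnt-Main-Article-QQ\" bosszone=\"content\">(.+)<div class=\"ft\">/isU";
// Assumption: only the loop header and the trailing "<br>" are known; the body is a reconstruction.
for ($i = 0; $i < $urllength; $i++) {
    $article = file_get_contents($urlcontent[1][$i]);
    if (preg_match($conpattern, $article, $matches)) {
        echo $matches[1] . "<br>";
    }
    
}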

------ Solution ------
You should at least check the return value of file_get_contents properly.
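One way to act on this advice is to record, for every article URL, whether the download failed or the page was fetched but the pattern did not match. The following is a minimal sketch, not the original poster's code: the helper name `collect_article` and the shape of its return value are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the original post): fetch one page and
// report why it could not be collected, if it could not.
function collect_article(string $url, string $pattern): array
{
    // file_get_contents() returns false on failure; @ silences the warning
    // so that the result can be checked explicitly.
    $html = @file_get_contents($url);
    if ($html === false) {
        return ['status' => 'fetch_failed', 'url' => $url];
    }

    // preg_match() returns 1 on a match, 0 on no match, false on a regex error.
    if (preg_match($pattern, $html, $m) !== 1) {
        return ['status' => 'no_match', 'url' => $url];
    }

    // Group 1 is assumed to hold the article title.
    return ['status' => 'ok', 'url' => $url, 'title' => trim($m[1])];
}
```

Calling this for each link and logging every result whose status is not `ok` should show whether the missing articles come from failed requests or from pages whose markup differs from what the pattern expects.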
------ Solution ------
Switch from file_get_contents to curl:

$url = "http://news.qq.com/newsgn/zhxw/shizhengxinwen.htm";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$urlcontent = curl_exec($ch);
curl_close($ch);
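The curl snippet in this answer still ignores failures, so an occasional timeout would lose an article just as silently as before. Below is a minimal sketch that adds timeouts, a status check, and a few retries. The function name `curl_fetch`, the retry count, and the timeout values are assumptions chosen for illustration, not settings known to suit this site.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with curl,
// check for errors and the HTTP status, and retry a limited number of times.
function curl_fetch(string $url, int $retries = 3): ?string
{
    for ($attempt = 1; $attempt <= $retries; $attempt++) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  // seconds, assumed value
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);        // seconds, assumed value

        $body   = curl_exec($ch);                     // false on failure
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $error  = curl_error($ch);
        curl_close($ch);

        if ($body !== false && $status === 200) {
            return $body;
        }

        // Record what went wrong so that missing articles can be traced later.
        error_log("attempt $attempt failed for $url (status $status): $error");
        usleep(200000); // short pause before the next attempt
    }
    return null; // every attempt failed
}
```

A caller would skip or re-queue any URL for which `curl_fetch` returns `null` instead of passing the result straight to `preg_match`.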