
Sina Tech Article Scraping Code

WBOY (original)
2016-06-10 15:12:00 · 1269 views

One-click scraping code for Sina Tech articles, written for ThinkPHP.
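The method below calls `CurlGetPage()`, which the post does not define. A minimal sketch of what such a helper might look like, assuming it simply returns the response body as a string; the signature, the timeout parameter, the user-agent string, and the choice to return an empty string on failure are all assumptions, not the author's code:

```php
<?php
/**
 * Hypothetical stand-in for the CurlGetPage() helper used in this post.
 * The original helper is not shown, so this is an assumed implementation.
 * Returns the response body, or an empty string if the request fails.
 */
function CurlGetPage($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,      // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout,  // seconds allowed for the whole request
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)', // assumed value
    ));
    $body = curl_exec($ch);

    // curl_exec() returns false on failure; normalise that to an empty string
    // so callers can always treat the result as text.
    return $body === false ? '' : $body;
}
```

Returning an empty string rather than `false` keeps the later `preg_match_all()` calls from receiving a non-string argument when a request fails.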
/* Sina Tech article scraping */
public function sina_tech() {
/* Number of list pages to crawl (defaults to 1) */
$page_num = isset($_POST['get_post_page_num']) ? intval($_POST['get_post_page_num']) : 1;
if ($page_num < 1) $page_num = 1;
/* Post count before crawling */
$post_count_a = M('post')->count();
/* Crawl each list page */
for ($page = 1; $page <= $page_num; $page++) {
$fullpage = CurlGetPage('http://roll.tech.sina.com.cn/s/channel.php?ch=05#col=30&spec=&type=&ch=05&k=&offset_page=0&offset_num=0&num=5&asc=&page='.$page);

preg_match_all('/

\s+(.*)\s+/Us', $fullpage, $match);
$fullpage = iconv("GB2312", "UTF-8", $match[1][0]); // convert the matched block from GB2312 to UTF-8

preg_match_all('/<li>(.*)<\/li>/isU', $fullpage, $in_li_tags);
    foreach (array_unique($in_li_tags[1]) as $row) {
    /* TITLE */
    preg_match_all('/(.*)/', $row, $title);
    $title = $title[1][0];
    /* LINK */
    preg_match_all('/href="([^"]*)"/', $row, $link);
    $link = $link[1][0];
    /* DATE */
    preg_match_all('/(.*)/i', $row, $date);
    $date = date("Y-", time()) . $date[1][0] . ':00';
    // echo $title.' '.$link.' '.$date."\n";

    /* Fetch the article page */
    $fullpage_post = CurlGetPage($link);
    /* FIX TAGS */
    $fullpage_post = preg_replace('/
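The per-row extraction in the loop above indexes `$title[1][0]`, `$link[1][0]` and `$date[1][0]` without checking that the pattern matched. A rough sketch of the same step with those checks added is shown here. The `href` pattern is the one from the post; the title and date patterns, the `c_tit`/`c_time` class names, the sample row, and the function name `parse_list_row` are assumptions for illustration, since the post's own patterns lost their HTML tags. The year is passed in so the result is reproducible, whereas the post uses `date("Y-", time())`.

```php
<?php
/**
 * Parse one list item into title, link and date.
 * Hypothetical helper: the markup it expects is assumed, not taken from the post.
 * Fields that cannot be found are returned as null.
 */
function parse_list_row($row, $year)
{
    $title = $link = $date = null;

    // Link: same pattern as in the post.
    if (preg_match('/href="([^"]*)"/', $row, $m)) {
        $link = $m[1];
    }

    // Title: text of the first anchor element (assumed markup).
    if (preg_match('/<a[^>]*>(.*)<\/a>/Us', $row, $m)) {
        $title = trim(strip_tags($m[1]));
    }

    // Date: "MM-DD HH:MM" inside an assumed <span class="c_time">;
    // prefix the year and append seconds, as the post does.
    if (preg_match('/<span class="c_time">(\d{2}-\d{2} \d{2}:\d{2})<\/span>/i', $row, $m)) {
        $date = $year . '-' . $m[1] . ':00';
    }

    return array('title' => $title, 'link' => $link, 'date' => $date);
}

// Sample row with made-up content.
$row = '<span class="c_tit"><a href="http://tech.sina.com.cn/example.shtml" target="_blank">Example headline</a></span>'
     . '<span class="c_time">06-10 15:12</span>';
$post = parse_list_row($row, '2016');
```

With a helper like this, the loop could skip any row where `link` is null instead of calling `CurlGetPage()` with an empty URL.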