
Detailed explanation of simple data collection and storage program examples based on PHP

PHP中文网 (original), 2017-04-18 17:40:59

A few days ago, a friend asked me to help write a program for collecting news items. I took some time to write a PHP version and recorded it here as notes.

Collection boils down to a short pipeline: fetch the remote page -> extract the required content -> classify and store it -> read it back -> display it.

It can also be seen as an enhanced version of a simple "thief program" (a page scraper).

The core code follows (don't use it to do bad things ^_^).

The content to be collected is the list of announcements on a game website.

You can first use file_get_contents and a simple regular expression to extract the basic page information.
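As a rough illustration of the extraction step, the sketch below runs the same kind of pattern against an inline HTML sample instead of a live page. The sample markup, the titles, the `news_1.html` style links, and the helper name `extractLinks` are all made up for the example; only the shape of the regular expression follows the article.

```php
<?php
// Sample markup shaped like the list items the pattern expects (invented for illustration).
$html = <<<HTML
<ul>
<li><a title="Server maintenance notice" target="_blank" href="news_1.html">Server maintenance notice</a></li>
<li><a title="New event announced" target="_blank" href="news_2.html">New event announced</a></li>
</ul>
HTML;

/**
 * Extract title/href pairs from list markup (hypothetical helper).
 */
function extractLinks(string $html): array
{
    // i = case-insensitive, U = lazy quantifiers, s = dot also matches newlines
    $pattern = '/<li><a title="(.*)" target="_blank" href="(.*)">/iUs';
    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the title attribute, $m[2] the relative link
            $links[] = ['title' => $m[1], 'href' => $m[2]];
        }
    }
    return $links;
}

$links = extractLinks($html);
foreach ($links as $link) {
    echo $link['title'], ' -> ', $link['href'], PHP_EOL;
}
```

Against a real page, `$html` would come from `file_get_contents()`, which returns `false` on failure, so that case is worth checking before matching.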

Then organize the basic information and store it in the database:

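<?php
  include_once("conn.php"); // database connection (legacy mysql_* extension)

  // Pages 1 to 8 of the list are collected; any other id ends the run.
  $id = isset($_GET['id']) ? (int)$_GET['id'] : 0;

  if ($id >= 1 && $id <= 8) {
    // Fetch the page content
    $html = file_get_contents("http://www.93moli.com/news_list_4_$id.html");

    // Regex: group 1 captures the title, group 2 the relative link
    $pattern = "/<li><a title=\"(.*)\" target=\"_blank\" href=\"(.*)\">/iUs";

    preg_match_all($pattern, $html, $arr); // store the matches in $arr

    //print_r($arr);die;

    // $arr[1] (titles) and $arr[2] (links) share the same keys, so reuse $key
    foreach ($arr[1] as $key => $value) {
      $url = "http://www.93moli.com/" . $arr[2][$key];
      // Escape both values so a quote in a title cannot break the statement
      $title = mysql_real_escape_string($value);
      $url   = mysql_real_escape_string($url);
      $sql = "insert into list(title,url) values ('$title', '$url')";
      mysql_query($sql);

      //echo "<a href='content.php?url=$url'>$value</a>"."<br/>";
    }
    $id++;
    echo "Collecting the URL list, moving on to page $id... please wait...";
    echo "<script>window.location='list.php?id=$id'</script>";

  } else {
    echo "Data collection finished.";
  }

?>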

conn.php is the database connection file

list.php is the script shown above.
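The storage step relies on the legacy mysql_* functions, which were removed in PHP 7. As a hedged alternative (not the author's method), the same insert can be sketched with PDO and a prepared statement. The in-memory SQLite database, the column types, the extra `id` column, the sample rows, and the helper name `storeLinks` are assumptions for illustration; only the table name `list` and the columns `title` and `url` come from the article.

```php
<?php
// In-memory SQLite stands in for the real database so the sketch runs without a server;
// a MySQL setup would pass a "mysql:host=...;dbname=..." DSN plus credentials instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Same table and column names as the article's insert; the types and id column are assumed.
$pdo->exec('CREATE TABLE list (id INTEGER PRIMARY KEY, title TEXT, url TEXT)');

/**
 * Store title/URL pairs and return the number of rows inserted (hypothetical helper).
 */
function storeLinks(PDO $pdo, array $links): int
{
    // Placeholders keep quotes in titles from altering the SQL statement
    $stmt = $pdo->prepare('INSERT INTO list (title, url) VALUES (:title, :url)');
    $count = 0;
    foreach ($links as $link) {
        $stmt->execute([':title' => $link['title'], ':url' => $link['url']]);
        $count++;
    }
    return $count;
}

// Invented sample rows; the first title contains a quote on purpose.
$inserted = storeLinks($pdo, [
    ['title' => "Player's guide update", 'url' => 'http://www.93moli.com/news_1.html'],
    ['title' => 'New event announced',   'url' => 'http://www.93moli.com/news_2.html'],
]);

$rows = $pdo->query('SELECT title, url FROM list ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
echo $inserted, ' rows stored', PHP_EOL;
```

Preparing the statement once and executing it per row keeps the loop short and leaves quoting to the driver.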

Since the data to be collected is spread across several pages whose addresses increase in a regular pattern, I use a JS redirect instead of a loop: each request handles one page, and the id value passed in the URL controls which page is collected. This also avoids running a for loop with too many iterations in a single request.
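The paging logic can be sketched on its own, roughly as follows. The helper names `listPageUrl` and `nextRequest` and the constant `LAST_PAGE` are hypothetical; the URL pattern, the `list.php?id=` redirect target, and the upper limit of 8 are taken from the script.

```php
<?php
const LAST_PAGE = 8; // the script stops once the id goes past page 8

// Build the list-page URL for a page number (URL pattern from the article).
function listPageUrl(int $page): string
{
    return "http://www.93moli.com/news_list_4_{$page}.html";
}

// After handling page $id, return the address to redirect to,
// or null when $id is out of range and collection is finished.
function nextRequest(int $id): ?string
{
    if ($id < 1 || $id > LAST_PAGE) {
        return null;
    }
    return 'list.php?id=' . ($id + 1);
}

foreach ([1, 8, 9] as $id) {
    $next = nextRequest($id);
    echo $id, ' => ', $next === null ? 'finished' : $next, PHP_EOL;
}
```

As in the script, the request for page 8 still redirects to `list.php?id=9`, and it is that final request that reports the end of collection.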

With that, the list data goes into the database with little effort. The next article will cover collecting the content behind each stored URL.
