
PHP-based data collection and storage program (2): collecting data into a database with PHP

WBOY
2016-07-13 10:22:13


The previous article, PHP-based data collection and storage program (1), covered collecting the list data from the news index pages. This article covers collecting the content of each news item.

This is a screenshot of the final data table from the previous post:

The next step is to read each URL to be collected from the database and fetch the corresponding page.

First, create a new table named content.

Note, however, that you can no longer walk through the URLs by simply incrementing the id, because the ids in the data table may have gaps, for example id=9 followed by id=11. When the script requests id=10, no row is found, the URL is empty, and empty fields may be collected.

The technique used here is a database query. After collecting one record, we check whether the database contains an id greater than the current one. If so, we read the next such row and repeat the steps above for it.
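The "next id" lookup can be tried on its own. The following is a minimal sketch, not the article's script: it assumes the pdo_sqlite extension is available and uses an in-memory SQLite table in place of the real list table. The helper name nextId and the sample rows are made up for illustration.

```php
<?php
// Sketch only: an in-memory SQLite table stands in for the real `list` table.
// Assumes the pdo_sqlite extension is installed.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Column names are assumptions; the article only shows `id` and `url`.
$pdo->exec('CREATE TABLE list (id INTEGER PRIMARY KEY, title TEXT, url TEXT)');
$insert = $pdo->prepare('INSERT INTO list (id, title, url) VALUES (?, ?, ?)');
foreach ([9, 11, 12] as $id) {   // id 10 is deliberately missing
    $insert->execute([$id, "News $id", "http://example.com/news/$id.html"]);
}

/**
 * Hypothetical helper: return the smallest id greater than $currentId,
 * or null when no such row exists.
 */
function nextId(PDO $pdo, int $currentId): ?int
{
    $stmt = $pdo->prepare('SELECT id FROM list WHERE id > ? ORDER BY id ASC LIMIT 1');
    $stmt->execute([$currentId]);
    $id = $stmt->fetchColumn();  // false when no row matches
    return $id === false ? null : (int)$id;
}

// Walk every row in id order; the gap at id 10 is skipped automatically.
$visited = [];
for ($id = nextId($pdo, 0); $id !== null; $id = nextId($pdo, $id)) {
    $visited[] = $id;
}
echo implode(',', $visited), "\n"; // prints 9,11,12
```

Because the query asks for the next existing id instead of computing id+1, a missing row never produces an empty URL.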

The specific code is as follows:

<?php
    // Note: the mysql_* functions used here were removed in PHP 7;
    // on current PHP versions use mysqli or PDO instead.
    include_once("conn.php");

    $id = (int)$_GET['id'];
    $sql = "select * from list where id=$id";
    $result = mysql_query($sql);
    $row = mysql_fetch_array($result); // fetch the row holding the URL to collect

    if ($row) {
        $content = file_get_contents($row['url']);
        $pattern = "/<dd class=\"dataWrap\">(.*)<\/dd>/iUs";
        // store the matched content in $info
        if ($content !== false && preg_match($pattern, $content, $info)) {
            $title = $row[1];
            $content = $info[0];
            echo $title . "<br/>";
            echo $content . "<hr/>";

            // insert into the database, escaping both values first
            $add = "insert into content(title,content) values('"
                . mysql_real_escape_string($title) . "','"
                . mysql_real_escape_string($content) . "')";
            mysql_query($add);
        }
    }

    // look for the next id greater than the current one
    $sql2 = "select * from list where id>$id order by id asc limit 1";
    $result2 = mysql_query($sql2);
    $row2 = mysql_fetch_array($result2);
    if ($row2) {
        // redirect so the next row is collected
        echo "<script>window.location='content.php?id=$row2[0]'</script>";
    }
?>

In this way, the news content we want has been collected and stored in the database. Next, we only need to tidy up how the data is displayed.
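The extraction step can be checked without a network request. The sketch below applies the same pattern to a hypothetical page fragment (the HTML is invented for illustration) and shows the difference between the whole match in $info[0], which the script stores, and the captured group in $info[1].

```php
<?php
// Hypothetical page fragment standing in for a fetched news page.
$html = <<<HTML
<dl>
  <dt>Headline</dt>
  <dd class="dataWrap">
    <p>First paragraph.</p>
  </dd>
  <dd class="other">Unrelated</dd>
</dl>
HTML;

// i = case-insensitive, U = ungreedy (stop at the first </dd>),
// s = let "." match newlines so multi-line content is captured.
$pattern = '/<dd class="dataWrap">(.*)<\/dd>/iUs';

if (preg_match($pattern, $html, $info)) {
    $withWrapper = $info[0];   // whole match, <dd> tags included
    $inner = trim($info[1]);   // captured group only, wrapper stripped
    echo $inner, "\n";         // prints <p>First paragraph.</p>
}
```

Storing $info[1] instead of $info[0] would keep the wrapper markup out of the content table. Which one is preferable depends on how the data is displayed later.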

How well do PHP programmers understand data collection?

Common technical essentials for PHP data collection:

1. Proficiency in regular-expression data extraction: the key step in extracting content
2. Proficiency in character encoding conversion and analysis: compatibility management and data validity control
3. Proficiency in data storage: storing and managing collected content, including databases, files and progress
4. Data mining and website crawling: analyzing website structure, simplifying crawling, and improving efficiency
5. Anti-anti-collection techniques: countermeasures designed for targets that use anti-collection measures
6. Multi-server concurrent collection management: working methods that improve efficiency
7. Data cleaning and analysis: checking for omissions and verifying the correctness and validity of the data
8. Identity protection: protecting your own information

PHP collection and storage question

PHP has the implode function for this: $nr = implode('#', $arr);
This produces "Content 1#Content 2", without a trailing #. If you need the trailing #, use
$nr = implode('#', $arr) . '#';

The clumsy way is a loop:
$nr = '';
foreach ($arr as $vl) {
    $nr .= $vl . "#";
}
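The two approaches can be compared side by side. This is a small sketch using a sample array chosen for illustration. It also shows one case where they differ: an empty array.

```php
<?php
$arr = ['Content 1', 'Content 2'];

$joined = implode('#', $arr);               // "Content 1#Content 2"
$withTrailing = implode('#', $arr) . '#';   // "Content 1#Content 2#"

// Loop version; $nr must be initialised before appending to it.
$nr = '';
foreach ($arr as $vl) {
    $nr .= $vl . '#';
}

// For an empty array the two versions disagree:
$emptyImplode = implode('#', []) . '#';     // "#"
$emptyLoop = '';
foreach ([] as $vl) {
    $emptyLoop .= $vl . '#';                // never runs, so the result stays ""
}

echo $joined, "\n", $withTrailing, "\n", $nr, "\n";
```

If an empty input is possible, the loop (or an explicit empty check before implode) avoids storing a lone "#".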