
Analysis of PHP Collection Program Principles

WBOY
2016-07-21 15:40:24

After thinking hard for a few days, I finally figured out how this works. I am writing it down here and would welcome corrections from experts.
The idea behind a collection program (a scraper) is very simple. It opens a page first, usually a list page, gets the addresses of all the links in it, and then opens those links one by one to look for the content we are interested in. Whatever it finds is put into a database or processed in some other way. Let's walk through it with a very simple example.

First decide on a page to collect from, usually a list page. The target here is: http://www.jb51.net/article/11/index.htm. This is a list page, and our goal is to collect every article linked from it.

Now that we have a list page, the first step is to open it and pull its content into our program. Generally one of two functions is used, fopen or file_get_contents; we use fopen as the example here. How do we open it? It's very simple: $source=fopen("http://www.jb51.net/article/11/index.htm",'r'); At this point the page has been opened by our program. Note that the $source obtained this way is a resource, not text that can be processed, so use the function fread to read the content into a variable. Only then do we have truly editable text. Example:
$content=fread($source,99999); The number is the maximum number of bytes to read, so just fill in a large one. If you use file_put_contents to write $content to a text file, you can see that the content is the source code of the web page. After getting the source code, we have to pick out the article link addresses in it. Regular expressions are used here. [Recommended regular expression tutorial (http://www.jb51.net/article/7/all/545.1. htm)]. Looking at the source code, we can see that the article links all look like this:

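(an anchor tag whose link text is an article title, such as "Encapsulate the database connection code in a function and call it when you need to read.."; the tag's markup is not preserved here)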
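The fetching step described above can be sketched roughly as follows. One caveat: on a network stream a single fread call may return fewer bytes than requested, so this sketch reads in a loop until end-of-file rather than relying on one large read. The helper names read_whole_stream and dump_source and the file name list_page.txt are hypothetical, and the commented usage assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: read an already opened stream to the end.
// A single fread() on a network stream may return less than requested,
// so keep reading until end-of-file (or until a read yields nothing).
function read_whole_stream($handle, int $chunkSize = 8192): string
{
    $content = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error, timeout, or nothing left
        }
        $content .= $chunk;
    }
    return $content;
}

// Hypothetical helper: write the page source to a text file for inspection.
// Returns the number of bytes written.
function dump_source(string $content, string $path): int
{
    $written = file_put_contents($path, $content);
    if ($written === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $written;
}

// Usage (needs network access and allow_url_fopen=On):
// $source = fopen('http://www.jb51.net/article/11/index.htm', 'r');
// $content = read_whole_stream($source);
// fclose($source);
// dump_source($content, 'list_page.txt');
```

If the resource handle is not needed for anything else, file_get_contents does the open, read, and close steps in one call.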
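From this we can write the regular expression: $count=preg_match_all("/…(.+?)/",$content,$art_list); (the HTML part of the pattern is not preserved here; it has two capture groups, one for the link address and one for the title).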
The array element $art_list[1][$s] holds the link address of an article, and $art_list[2][$s] holds the title of that same article. At this point, half the battle is won.
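As an illustration of the two capture groups, here is a minimal sketch. The sample markup and the pattern are assumptions, since the real list page's HTML is not shown here; only preg_match_all and the $art_list indexing come from the text.

```php
<?php
// Assumed sample markup; the real list page may differ.
$content = <<<HTML
<ul>
<li><a href="/article/1001.htm" target="_blank">First article</a></li>
<li><a href="/article/1002.htm" target="_blank">Second article</a></li>
</ul>
HTML;

// Group 1 captures the href value, group 2 the link text.
$pattern = '~<a\s+href="([^"]+)"[^>]*>(.+?)</a>~is';
$count = preg_match_all($pattern, $content, $art_list);

// With the default flag (PREG_PATTERN_ORDER), $art_list[1] holds all
// link addresses and $art_list[2] all titles, at matching indexes.
for ($s = 0; $s < $count; $s++) {
    echo $art_list[1][$s], ' => ', $art_list[2][$s], PHP_EOL;
}
```

Using the returned $count as the loop bound avoids hard-coding how many links the page contains.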
Then use a for loop to visit each link in turn and extract the content in the same way we extracted the titles. Everything up to here is similar to the tutorials I found online, but when it comes to this for loop, the online tutorials are terrible; I haven't found a single article that explains it clearly. At first I used js to help with the loop (or used …). Let me give you an example. At the beginning, I did this:
for($i=0;$i<20;$i++) {
    // The content collection part goes in the middle; I have omitted it.
    // Once one page has been collected, the next page has to be collected as well.
}
But this does not work when I use fopen to open each link inside the loop: the requests fail, and driving it from js does not work either. In the end I learned that the script needs to echo a piece of markup that calls itself again: echo ""; (the markup inside the quotes is not preserved here), where aa.php is the file name of our program and the number after id lets us implement the loop and collect multiple pages. This is the real key to getting the loop going.
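A minimal sketch of the one-page-per-request loop described above. The exact markup the author echoed is not preserved, so the meta refresh tag used here is an assumption (a script redirect would serve the same purpose). aa.php and the id parameter come from the text, while collect_step and the injected $fetch callable are hypothetical names.

```php
<?php
// Hypothetical sketch: process link number $id, then return markup that
// makes the browser request the same script again with id + 1.
function collect_step(array $links, int $id, callable $fetch, string $script = 'aa.php'): string
{
    if ($id < 0 || $id >= count($links)) {
        return 'Done.'; // nothing left to collect
    }

    $html = $fetch($links[$id]); // e.g. file_get_contents
    // ... extract the article from $html and store it here (omitted) ...

    $next = $id + 1;
    if ($next >= count($links)) {
        return 'Done.'; // the last link has just been processed
    }

    // Assumed redirect technique: a meta refresh back to this script.
    return '<meta http-equiv="refresh" content="0;url='
        . htmlspecialchars($script . '?id=' . $next) . '">';
}

// Usage inside aa.php (assumes $art_list was filled by preg_match_all):
// $id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
// echo collect_step($art_list[1], $id, 'file_get_contents');
```

Splitting the work this way keeps each request short, which may be why it succeeded where one long loop did not; the text does not state the actual cause of the failures. Note that in this sketch every request has to rebuild the link list before it can pick out link number id.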
My head is a bit fuzzy and the writing is a bit messy, so please make do with it. It may be nothing special in the eyes of experts, but it is really helpful for novices like me.
