Today I will share my collection (scraping) code with you.
Idea:
The idea behind a collection program is very simple and can be roughly divided into the following steps:
1. Fetch the remote file's source code (with file_get_contents or fopen).
2. Parse the source to extract the content you want (regular-expression matching is used here, usually to get the pagination links).
3. Download and store the content you extracted.
Step 2 may have to be repeated several times. For example, we first parse the pagination addresses, and then parse the inner pages to get the content we want.
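The two parsing passes described above can be sketched roughly as follows. This is only an illustration: the helper names extractPageUrls and extractTitles, the link pattern list_N.html and the h2 markup are all assumptions of mine, and inline sample strings stand in for what file_get_contents would return from a real site.

```php
<?php
// Hypothetical helper: pass 1, collect the pagination links from a listing page.
function extractPageUrls($html)
{
    // Assumed link format: href="list_2.html"; adjust the pattern to the target site.
    preg_match_all('/href="(list_\d+\.html)"/i', $html, $m);
    // Drop duplicate links and reindex the result.
    return array_values(array_unique($m[1]));
}

// Hypothetical helper: pass 2, pull the wanted field out of an inner page.
function extractTitles($html)
{
    // Assumed markup: each wanted item sits inside an <h2> element.
    preg_match_all('/<h2>(.*?)<\/h2>/is', $html, $m);
    return array_map('trim', $m[1]);
}

// Sample markup standing in for the fetched page source.
$listing = '<a href="list_1.html">1</a> <a href="list_2.html">2</a> <a href="list_1.html">first</a>';
$inner   = '<h2> Game A </h2><p>...</p><h2>Game B</h2>';

$pages  = extractPageUrls($listing); // pagination addresses to visit next
$titles = extractTitles($inner);     // content taken from one inner page
```

In a real run, each address in $pages would be fetched in turn and its source passed to the second helper.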
Code:
I remember posting some of this code before. Today I will simply post it here.
PHP code:
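// Fetch the remote content
$nl = @file_get_contents($rs['url']);
// Use a regular expression to extract the content we want
// (single-quoted pattern so the inner double quotes are legal; the slash and dot are escaped)
preg_match_all('/var url = "gameswf\/(.*?)\.swf";/is', $nl, $connect);
// Insert into the database (statement elided in the original).
// Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO on current versions.
mysql_query("insert ......");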
The code above is all the code a collection job needs. Of course, you can also use fopen to do it; I personally prefer file_get_contents.
Next I will share my method for downloading images and Flash files to the local disk. It is very simple: just two lines of code.
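For comparison, here is a minimal sketch of the fopen approach. The function name fetchWithFopen is my own, and the demonstration reads a local temporary file so that it runs offline; for a remote URL, both fopen and file_get_contents need allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: read a whole resource with fopen/fread instead of file_get_contents.
function fetchWithFopen($url)
{
    $fp = @fopen($url, 'rb');
    if ($fp === false) {
        return false; // could not open the source
    }
    $data = '';
    // Read in chunks until the end of the stream.
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $data .= $chunk;
    }
    fclose($fp);
    return $data;
}

// Demonstration on a temporary local file (an assumption, so the sketch needs no network).
$tmp = tempnam(sys_get_temp_dir(), 'col');
file_put_contents($tmp, 'var url = "gameswf/demo.swf";');
$source = fetchWithFopen($tmp);
unlink($tmp);
```

The chunked loop is more code than file_get_contents, which is probably why the one-call version is more pleasant for simple jobs.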
PHP code:
if(@copy($url,$newurl)){ echo 'ok'; }
I have also posted an image download function on the forum before. Here it is for everyone:
PHP code:
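/* Save a remote image to a local file */
function getimg($url, $filename) {
    /* Stop if the image URL is empty */
    if ($url == "") { return false; }
    /* Get the image extension and store it in $ext */
    $ext = strrchr($url, ".");
    /* Check that it is an allowed image type */
    if ($ext != ".gif" && $ext != ".jpg") { return false; }
    /* Read the image; stop if the download fails */
    $img = @file_get_contents($url);
    if ($img === false) { return false; }
    /* Open the target file; "wb" overwrites instead of appending to an existing file */
    $fp = @fopen($filename . $ext, "wb");
    if ($fp === false) { return false; }
    /* Write the image to the target file */
    fwrite($fp, $img);
    /* Close the file */
    fclose($fp);
    /* Return the new file name of the image */
    return $filename . $ext;
}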
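One weak spot of taking the extension with strrchr is a URL that carries a query string or an upper-case extension. A possible variation, assuming a hypothetical helper named imageExtension, is to look only at the path part of the URL:

```php
<?php
// Hypothetical helper: return the lower-case image extension of a URL,
// or null when the URL does not point at an allowed image type.
function imageExtension($url, array $allowed = ['gif', 'jpg'])
{
    // Look only at the path, so "?v=2" and similar query strings are ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true) ? $ext : null;
}

// The example.com addresses are placeholders, not real image locations.
$a = imageExtension('http://example.com/img/logo.JPG?v=2');
$b = imageExtension('http://example.com/page.php?file=x.jpg');
```

Unlike strrchr, this returns the extension without the leading dot, so the calling code would need to add the dot back when building the file name.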
Some personal collection experience:
1. Don't collect from sites that use hotlink protection. You can fake the Referer, but collecting from such sites costs too much effort.
2. For sites you want to collect as quickly as possible, it is best to run the collection locally.
3. When collecting, you can often store part of the data in the database first and leave further processing for a later step.
4. You must handle errors when collecting. I usually skip an item if collecting it fails three times. In the past, the whole run would often get stuck on a single piece of content that could not be fetched.
5. Validate carefully before writing to the database: check that the content is legitimate and filter out unnecessary strings.
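Tips 4 and 5 can be sketched together like this. It is only a rough outline under my own assumptions: fetchWithRetry and cleanField are hypothetical names, the 255-character limit is arbitrary, and a stub closure that fails twice stands in for the real fetch call.

```php
<?php
// Hypothetical helper: try the fetch up to $maxTries times, return null to signal "skip this item".
function fetchWithRetry(callable $fetch, $url, $maxTries = 3)
{
    for ($i = 1; $i <= $maxTries; $i++) {
        $body = $fetch($url);
        if ($body !== false && $body !== '') {
            return $body;
        }
    }
    return null; // give up after $maxTries failures so the run does not get stuck
}

// Hypothetical filter applied before the database insert:
// strips tags and surrounding whitespace, rejects empty or overlong values.
function cleanField($raw, $maxLen = 255)
{
    $text = trim(strip_tags((string) $raw));
    if ($text === '' || strlen($text) > $maxLen) {
        return null;
    }
    return $text;
}

// Stub fetcher that fails twice and then succeeds (stands in for file_get_contents).
$calls = 0;
$flaky = function ($url) use (&$calls) {
    $calls++;
    return $calls < 3 ? false : '<b> Game A </b>';
};

$body  = fetchWithRetry($flaky, 'http://example.com/item/1');
$title = $body === null ? null : cleanField($body);
```

Cleaning the text does not replace safe SQL handling; when the value is finally written, a prepared statement in PDO or mysqli is the safer route.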
That is the entire content of this article. I hope it helps with your studies.