PHP thief program example code
A "thief program" (a content scraper) uses specific PHP functions to fetch content from other people's websites, extracts the parts we want with regular expressions, and saves them to our own local database. This article introduces one way to implement such a program in PHP; refer to it if you need one.
The file_get_contents() function is the key to the data collection process below. Let's first look at its syntax:
string file_get_contents ( string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )
It works like file(), except that file_get_contents() returns the file as a single string rather than an array of lines. Reading starts at the position given by the offset parameter and continues for up to maxlen bytes. On failure, file_get_contents() returns FALSE.
The file_get_contents() function is the preferred way to read the contents of a file into a string. If the operating system supports it, memory mapping techniques are used to improve performance.
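As a minimal sketch of the behavior described above, the following reads a whole file, then a slice of it using offset and maxlen, and finally checks for the FALSE return value. The temporary file and its contents are assumptions made only so the example runs without network access; the same calls apply to a URL.

```php
<?php
// Hypothetical sample file, created here so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($path, 'Hello, file_get_contents!');

// Read the whole file into a string.
$all = file_get_contents($path);

// Read 4 bytes starting at byte offset 7 (the offset and maxlen parameters).
$part = file_get_contents($path, false, null, 7, 4);

// On failure the function returns FALSE. Compare with === because an empty
// file yields "", which is also falsy.
$missing = @file_get_contents($path . '.does-not-exist');
if ($missing === false) {
    echo "read failed\n";
}

echo $all, "\n";  // Hello, file_get_contents!
echo $part, "\n"; // file

unlink($path);
```

The @ operator only suppresses the warning that PHP emits for the missing file; the strict comparison is what detects the failure.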
Example
<?php
// Fetch the raw contents of a web page
function fetch_urlpage_contents($url)
{
    $c = file_get_contents($url);
    return $c;
}

// Get the content matched between $begin and $end
function fetch_match_contents($begin, $end, $c)
{
    $begin = change_match_string($begin);
    $end = change_match_string($end);
    // eregi() was removed in PHP 7; preg_match() with the "i" flag keeps the
    // match case-insensitive, and "s" lets "." match across line breaks
    $p = "#{$begin}(.*){$end}#is";
    if (preg_match($p, (string) $c, $rs)) {
        return $rs[1];
    } else {
        return "";
    }
}

// Escape a string for use inside a regular expression
function change_match_string($str)
{
    // The original only sketched a simple escape with $old/$new arrays that
    // were commented out; preg_quote() escapes every regex metacharacter
    // and the "#" delimiter used above
    return preg_quote($str, "#");
}

// Collect a web page: extract each section listed in $ft, then apply the
// replacements listed in $th
function pick($url, $ft, $th)
{
    $rs = array();
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.bkjia.com";      // The address to collect from
$ft["title"]["begin"] = "<title>";  // Start point of the extraction
$ft["title"]["end"] = "</title>";   // End point of the extraction
$th["title"]["中山"] = "广东";       // Replacement inside the extracted part (Zhongshan -> Guangdong)
// The body markers were stripped from the source listing; <body> and </body> are assumed here
$ft["body"]["begin"] = "<body>";    // Start point of the extraction
$ft["body"]["end"] = "</body>";     // End point of the extraction
$th["body"]["中山"] = "广东";        // Replacement inside the extracted part
$rs = pick($url, $ft, $th);         // Start collecting
echo $rs["title"];
echo $rs["body"];                   // Output
?>
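The begin/end extraction idea can also be tried without a regular expression and without a live URL. The sketch below is an alternative technique, not the article's method: it uses stripos() and substr() instead of a pattern match. The helper name extract_between() and the sample page are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper (not from the original code): return the text between
// the first $begin marker and the next $end marker, or "" if either is missing.
function extract_between($html, $begin, $end)
{
    // stripos() is case-insensitive for ASCII, so <TITLE> also matches
    $start = stripos($html, $begin);
    if ($start === false) {
        return "";
    }
    $start += strlen($begin);
    $stop = stripos($html, $end, $start);
    if ($stop === false) {
        return "";
    }
    return substr($html, $start, $stop - $start);
}

// Made-up sample page, used in place of a fetched URL.
$page = "<html><head><title>中山 news</title></head><body><p>Hello</p></body></html>";

$title = extract_between($page, "<title>", "</title>");
$title = str_replace("中山", "广东", $title); // same replacement step as in pick()
$body = extract_between($page, "<body>", "</body>");

echo $title, "\n"; // 广东 news
echo $body, "\n";  // <p>Hello</p>
```

Because this version stops at the first end marker after the start, it avoids the greedy matching of (.*), which can run to the last occurrence of the end marker on pages that contain it more than once.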
The following code is modified from the code above and is used specifically to extract all hyperlinks, email addresses, or other specific content from a web page.