
PHP thief program example code

WBOY (Original)
2016-07-20 11:11:44, 1112 views

A thief program uses PHP functions to fetch content from other people's websites, extracts the parts we want with regular expressions, and saves them to our own local database. This article walks through one way to implement a PHP thief program; readers who need one can use it as a reference.

The file_get_contents function is the key to the data collection code below, so let's look at its syntax first:

string file_get_contents ( string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen ]]]] )
Same as file(), except file_get_contents() reads the file into a string. Contents of length maxlen will be read starting at the position specified by the offset parameter. On failure, file_get_contents() will return FALSE.

The file_get_contents() function is the preferred method for reading the contents of a file into a string. If the operating system supports it, memory mapping technology will also be used to enhance performance.
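
To make the offset and maxlen parameters and the FALSE return value concrete, here is a minimal sketch. It assumes a temporary local file created only for the demonstration (the file name and its content are not from the original article), so that it runs without network access.

```php
<?php
// Hypothetical sample file, created only for this demonstration.
$path = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($path, 'Hello, thief program!');

// Read the whole file into a string.
$all = file_get_contents($path);

// Read 5 bytes starting at byte offset 7.
$part = file_get_contents($path, false, null, 7, 5);

// On failure the function returns false and raises a warning (silenced here with @).
$missing = @file_get_contents($path . '.does-not-exist');

unlink($path);

var_dump($all, $part, $missing);
```

Because a page can legitimately be an empty string, the result is best compared against false with === rather than tested with a loose if.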

Example


<?php
$homepage = file_get_contents('http://www.hzhuti.com/');
echo $homepage;
?>



In this way, $homepage holds the content of the page we collected. With that covered, let's write the thief program itself.

Example

<?php
// Fetch the raw content of a page
function fetch_urlpage_contents($url)
{
    $c = file_get_contents($url);
    return $c;
}

// Get the content between the begin and end markers
function fetch_match_contents($begin, $end, $c)
{
    $begin = change_match_string($begin);
    $end = change_match_string($end);
    // eregi() was removed in PHP 7; preg_match() with the i and s modifiers
    // keeps the case-insensitive match that also spans line breaks
    $p = "#{$begin}(.*){$end}#is";
    if (preg_match($p, $c, $rs)) {
        return $rs[1];
    } else {
        return "";
    }
}

// Escape a string for use inside the regular expression
function change_match_string($str)
{
    // preg_quote() escapes every regex metacharacter as well as the # delimiter
    return preg_quote($str, '#');
}

// Collect a web page
function pick($url, $ft, $th)
{
    $rs = array();
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.bkjia.com"; // The address to collect from
$ft["title"]["begin"] = "<title>"; // Start point of the extraction
$ft["title"]["end"] = "</title>"; // End point of the extraction
$th["title"]["中山"] = "广东"; // Replace 中山 (Zhongshan) with 广东 (Guangdong) in the extracted part

$ft["body"]["begin"] = "<body>"; // Start point (stripped in the source; <body> assumed)
$ft["body"]["end"] = "</body>"; // End point (stripped in the source; </body> assumed)
$th["body"]["中山"] = "广东"; // Replacement within the extracted part

$rs = pick($url, $ft, $th); // Start collecting

echo $rs["title"];
echo $rs["body"]; // Output
?>
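
One detail of the pattern used above is worth illustrating: (.*) is greedy, so when the end marker occurs more than once the match runs to the last occurrence. The sketch below contrasts greedy and lazy matching; the helper name between() and the sample markup are hypothetical and serve only as illustration.

```php
<?php
// Hypothetical helper: return the text between two literal markers.
function between($begin, $end, $html, $lazy)
{
    $quantifier = $lazy ? '.*?' : '.*';
    $pattern = '#' . preg_quote($begin, '#')
             . '(' . $quantifier . ')'
             . preg_quote($end, '#') . '#is';
    return preg_match($pattern, $html, $m) ? $m[1] : '';
}

$html = '<p>first</p><p>second</p>';

$greedy = between('<p>', '</p>', $html, false);
$lazy = between('<p>', '</p>', $html, true);

echo $greedy, "\n"; // first</p><p>second
echo $lazy, "\n";   // first
```

For markers that appear only once on a page, such as the title tags, both forms give the same result; for repeated markers the lazy form is usually the one wanted.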

The following code, adapted from the program above, extracts all hyperlinks, email addresses, or other specific content from a web page.


<?php
// Fetch the raw content of a page
function fetch_urlpage_contents($url)
{
    $c = file_get_contents($url);
    return $c;
}

// Get every piece of content between the begin and end markers
function fetch_match_contents($begin, $end, $c)
{
    $begin = change_match_string($begin);
    $end = change_match_string($end);
    $p = "#{$begin}(.*){$end}#iU"; // i ignores case, U disables greedy matching
    if (preg_match_all($p, $c, $rs)) {
        return $rs;
    } else {
        return "";
    }
}

// Escape a string for use inside the regular expression
function change_match_string($str)
{
    // preg_quote() escapes every regex metacharacter as well as the # delimiter
    return preg_quote($str, '#');
}

// Collect a web page
function pick($url, $ft, $th)
{
    $rs = array();
    $c = fetch_urlpage_contents($url);
    foreach ($ft as $key => $value) {
        $rs[$key] = fetch_match_contents($value["begin"], $value["end"], $c);
        if (isset($th[$key]) && is_array($th[$key])) {
            foreach ($th[$key] as $old => $new) {
                $rs[$key] = str_replace($old, $new, $rs[$key]);
            }
        }
    }
    return $rs;
}

$url = "http://www.bkjia.com"; // The address to collect from
$ft["a"]["begin"] = '<a'; // Start point (truncated in the source; '<a' assumed)
$ft["a"]["end"] = '>'; // End point of the extraction
$th = array(); // No replacements are needed in this example

$rs = pick($url, $ft, $th); // Start collecting

print_r($rs["a"]);

?>
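
The program above returns the raw attribute text of every a tag. As a rough sketch of how the link targets and email addresses themselves might be pulled out, the following example runs two patterns against a hard-coded string. The sample markup and both patterns are assumptions made for illustration, and the email pattern is deliberately simple rather than a full address validator.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<a href="http://example.com/a">A</a> '
      . '<A HREF=\'/b.html\'>B</A> '
      . 'Contact: admin@example.com or sales@example.org';

// Collect href values from <a> tags (single- or double-quoted).
preg_match_all('#<a\s[^>]*href\s*=\s*(["\'])(.*?)\1#is', $html, $m);
$links = $m[2];

// Collect strings that look like email addresses.
preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $html, $m);
$emails = array_values(array_unique($m[0]));

print_r($links);
print_r($emails);
```

Regular expressions are fragile on irregular markup; for anything beyond quick collection, an HTML parser such as PHP's DOMDocument is likely to be more reliable.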



Collection done with file_get_contents is easy for a site to block. We can use curl instead to imitate a real user's visit to the website, which is considerably more flexible than the approach above. file_get_contents() is also somewhat less efficient and fails more often, whereas curl is efficient and supports concurrent requests (through the curl_multi functions), but it requires the curl extension to be enabled. On Windows, the steps to enable the curl extension are:

1. Copy the three files php_curl.dll, libeay32.dll and ssleay32.dll from the PHP folder to system32;
2. Remove the semicolon in front of extension=php_curl.dll in php.ini (in the C:\WINDOWS directory);
3. Restart Apache or IIS.

Below is a simple page-grabbing function that can send a fake Referer and User-Agent.
<?php
function GetSources($Url, $User_Agent = '', $Referer_Url = '') // Fetch a specified page
{
    // $Url: the address of the page to fetch
    // $User_Agent: the User-Agent string to send, such as "baiduspider" or "googlebot"
    // $Referer_Url: the Referer to send
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_USERAGENT, $User_Agent);
    curl_setopt($ch, CURLOPT_REFERER, $Referer_Url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $MySources = curl_exec($ch); // false on failure
    curl_close($ch);
    return $MySources;
}

$Url = "http://www.bkjia.com"; // The page to fetch
$User_Agent = "baiduspider+(+http://www.baidu.com/search/spider.htm)";
$Referer_Url = 'http://www.jb51.net/';
echo GetSources($Url, $User_Agent, $Referer_Url);
?>
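
The concurrency that curl offers is usually reached in PHP through the curl_multi_* functions. The following is a minimal sketch, assuming the curl extension is enabled; the helper name fetch_many() and its $timeout parameter are hypothetical and not part of the original article.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently and return url => body.
function fetch_many(array $urls, $timeout = 10)
{
    if (!extension_loaded('curl')) {
        throw new RuntimeException('The curl extension is not enabled.');
    }
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $url => $ch) {
        // An empty result usually means the transfer failed.
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

A call such as fetch_many(array('http://www.bkjia.com', 'http://www.jb51.net/')) would then return both page bodies keyed by URL, with the two transfers running in parallel rather than one after the other.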
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn