
PHP code for fetching Baidu snapshot dates, indexed-page counts, and hot search words

WBOY
2016-07-13 16:55:57

Look closely at the programs below, which fetch Baidu's indexed-page count, snapshot date, and hot search words, and you will notice that they all rely on one function: file_get_contents(). It is the function most commonly used in PHP to download a web page.
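As a minimal sketch of how file_get_contents() can be called more defensively, the helper below adds a timeout and turns a failed download into null. The name fetchPage, the timeout default, and the user agent string are assumptions for illustration and do not come from the original programs.

```php
<?php
// Hypothetical helper (not part of the original article): download a page
// and return null on failure instead of relying on a PHP warning.
function fetchPage($url, $timeoutSeconds = 10)
{
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeoutSeconds,            // give up on slow responses
            'user_agent' => 'Mozilla/5.0 (compatible)', // assumed value
        ),
    ));
    // @ suppresses the warning; the false return value is checked instead
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

A caller would then write something like `$html = fetchPage('http://www.baidu.com/s?wd=site%3Awww.hzhuti.com');` and check for null before parsing.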



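<?php
/*
Fetch the number of pages Baidu has indexed for a site
*/
function baidu($s){
    $baidu = "http://www.baidu.com/s?wd=site%3A" . $s;
    $site = file_get_contents($baidu);
    //$site = iconv("gb2312", "UTF-8", $site);
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    // The Chinese text must stay as-is because it matches Baidu's result page.
    if ($site === false || !preg_match("/找到相关网页(.*?)篇/", $site, $count)) {
        return 0;
    }
    // Strip "约" (about) and the thousands separators from the captured number
    return str_replace(array("约", ","), "", $count[1]);
}

echo baidu("www.hzhuti.com"); // number of pages from hzhuti.com that Baidu has indexed

?>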
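The counting step can also be separated from the download, which makes it testable without contacting Baidu. The following is a sketch under that assumption; parseBaiduCount is a hypothetical name and the sample string is made up, modelled on the marker text used above.

```php
<?php
// Hypothetical helper: pull the indexed-page count out of HTML that has
// already been downloaded. The marker text is the one used in the article.
function parseBaiduCount($html)
{
    if (!preg_match('/找到相关网页(.*?)篇/', $html, $m)) {
        return 0; // marker not found
    }
    // Drop everything that is not a digit ("约", commas, spaces)
    return (int) preg_replace('/\D/', '', $m[1]);
}

$sample = '<p>找到相关网页约1,230篇,用时0.01秒</p>'; // made-up sample markup
echo parseBaiduCount($sample), "\n"; // prints 1230
```

Returning an integer rather than the raw captured text keeps later arithmetic and comparisons straightforward.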


Get Baidu's hot words

<?php
/**
 * @user 小杰
 * @return array Baidu's hot search words
 */
function getBaiduHotKeyWord()
{
    $templateRss = file_get_contents('http://top.baidu.com/rss_xml.php?p=top10');
    // The <table> tags in this pattern are assumed from the tbody/tr traversal below
    if (preg_match('/<table>(.*)<\/table>/is', $templateRss, $_description)) {
        $templateRss = $_description[0];
        // Escape ampersands so the fragment can be parsed as XML
        $templateRss = str_replace("&", "&amp;", $templateRss);
    }
    // A prefix (presumably an XML declaration) appears to be missing from this string
    $templateRss = "" . $templateRss;
    $xml = simplexml_load_string($templateRss);
    $keyArray = array();
    foreach ($xml->tbody->tr as $temp) {
        if (!empty($temp->td->a)) {
            $keyArray[] = trim($temp->td->a);
        }
    }
    return $keyArray;
}
print_r(getBaiduHotKeyWord());
?>
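To illustrate the parsing step of the hot-word function without a network request, here is a small sketch that works on a string. parseHotWords is a hypothetical name, the sample markup is invented, and XPath is swapped in for the property traversal so that a missing tbody simply yields an empty array.

```php
<?php
// Hypothetical helper: collect the text of the first link in the first cell
// of every row of the first <table> found in $html.
function parseHotWords($html)
{
    if (!preg_match('/<table>(.*)<\/table>/is', $html, $m)) {
        return array(); // no table in the input
    }
    // Escape ampersands so the fragment is well-formed XML
    $fragment = str_replace('&', '&amp;', $m[0]);
    $xml = @simplexml_load_string($fragment);
    if ($xml === false) {
        return array(); // fragment could not be parsed
    }
    $words = array();
    foreach ($xml->xpath('tbody/tr/td[1]/a[1]') as $link) {
        $words[] = trim((string) $link);
    }
    return $words;
}

// Made-up sample input
$html = '<table><tbody>'
      . '<tr><td><a href="/s?a=1&b=2"> alpha </a></td></tr>'
      . '<tr><td><a href="#">beta</a></td></tr>'
      . '</tbody></table>';
print_r(parseHotWords($html));
```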


Baidu indexed-page count and Baidu snapshot date

I found this code online and modified it slightly. Save the following code in a PHP file:

<?php
$domain = "http://www.hzhuti.com/nokia/5230/"; /* address to query */
$site_url = 'http://www.baidu.com/s?wd=site%3A';
$all = $site_url . $domain; /* all pages Baidu has indexed for the address */
$today = $all . '&lm=1'; /* pages Baidu indexed today */
$utf_pattern = "/找到相关结果数(.*)个/"; /* Baidu's "N results found" text, kept in Chinese to match the page */
$kz_pattern = "/(.*)/"; /* string used to locate the snapshot date */
$times = "/\d{4}-\d{1,2}-\d{1,2}/"; /* regular expression for the snapshot date, e.g. 2011-8-4 */
$s0 = @file_get_contents($all); /* load the site: result page into $s0 */
$s1 = @file_get_contents($today);
preg_match($utf_pattern, (string) $s0, $all_num); /* match "找到相关结果数*个" */
preg_match($utf_pattern, (string) $s1, $today_num);
preg_match($kz_pattern, (string) $s0, $temp);
preg_match($times, isset($temp[0]) ? $temp[0] : "", $screenshot);
/* empty() also covers the case where nothing matched */
if (empty($all_num[1]))
    $all_num[1] = 0;
if (empty($today_num[1]))
    $today_num[1] = 0;
if (empty($screenshot[0]))
    $screenshot[0] = "No snapshot yet";
?>

   
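The rest of the file is an HTML page titled "Test". It prints the results in a table with the columns Date, Baidu indexed, Baidu indexed today, and Baidu snapshot date. The two count cells are links to the corresponding Baidu result pages and open in a new window (target="_blank").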
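The output table could be produced along the following lines. This is only a sketch: the original markup is not preserved, so the function names firstSnapshotDate and renderReport, the parameter order, and the exact HTML are all assumptions. Values are passed through htmlspecialchars() before being written into the page.

```php
<?php
// Hypothetical helper: find the first date such as 2011-8-4 in a piece of text.
function firstSnapshotDate($text)
{
    return preg_match('/\d{4}-\d{1,2}-\d{1,2}/', $text, $m) ? $m[0] : null;
}

// Hypothetical helper: build the result table as an HTML string.
function renderReport($allUrl, $allCount, $todayUrl, $todayCount, $snapshot)
{
    // Escape every value before it is placed into the markup
    $e = function ($v) {
        return htmlspecialchars((string) $v, ENT_QUOTES, 'UTF-8');
    };
    return '<table>'
        . '<tr><th>Date</th><th>Baidu indexed</th>'
        . '<th>Baidu indexed today</th><th>Baidu snapshot date</th></tr>'
        . '<tr><td>' . $e(date('Y-m-d')) . '</td>'
        . '<td><a href="' . $e($allUrl) . '" target="_blank">' . $e($allCount) . '</a></td>'
        . '<td><a href="' . $e($todayUrl) . '" target="_blank">' . $e($todayCount) . '</a></td>'
        . '<td>' . $e($snapshot) . '</td></tr>'
        . '</table>';
}
```

With the variables from the script above, a call might look like `echo renderReport($all, $all_num[1], $today, $today_num[1], $screenshot[0]);`.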



The approach above is not robust. If the server does not allow file_get_contents() to open remote URLs, none of these programs will work. In that case cURL can be used instead; it is more flexible and can imitate a real browser.
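A minimal sketch of the cURL alternative follows. The function name fetchWithCurl, the timeouts, and the user agent string are assumptions chosen for illustration; only standard functions and options of PHP's cURL extension are used.

```php
<?php
// Hypothetical helper: download a page with cURL, returning null on failure.
function fetchWithCurl($url, $timeoutSeconds = 10)
{
    if (!function_exists('curl_init')) {
        return null; // cURL extension is not available
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,               // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole request
        // Assumed browser-like user agent, so the request resembles a normal visitor
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    ));
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

In the scripts above, each `file_get_contents($url)` call could then be replaced by `fetchWithCurl($url)`, with a null check before the regular expressions are applied.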
