PHP code for fetching Baidu indexed-page counts, snapshot dates, and hot words

WBOY, original: 2016-07-13 16:55:57, 1301 views

Look closely and you will notice one thing: every program below that fetches Baidu indexed-page counts, snapshot dates, or hot words relies on the function file_get_contents(), which is commonly used in PHP to download web pages.

The code is as follows

/*
 * Fetch the number of pages Baidu has indexed for a site.
 */
function baidu($s) {
    $baidu = "http://www.baidu.com/s?wd=site%3A" . urlencode($s);
    $site = @file_get_contents($baidu);
    if ($site === false) {
        return false; // the request failed
    }
    //$site = iconv("gb2312", "UTF-8", $site);
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    // The Chinese text is the phrase Baidu printed around the count
    // ("found N related pages"); it must stay in Chinese to match the page.
    if (!preg_match('/找到相关网页(.*?)篇，/', $site, $count)) {
        return false; // phrase not found on the page
    }
    // Strip "约" ("about") and thousands separators from the captured number.
    return str_replace(array("约", ","), "", $count[1]);
}

echo baidu("www.hzhuti.com"); // number of pages of hzhuti.com (好主题) indexed by Baidu
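Since the wording of Baidu's results page can change, it may help to keep the parsing step separate from the download so it can be checked against a saved page. The following is only a sketch of that idea: the function name parseBaiduCount and the sample string are assumptions made for illustration, and the Chinese phrase is the one the code above searches for.

```php
<?php
// Hypothetical helper: pull the indexed-page count out of an already
// downloaded results page. Returns null when the phrase is not found.
function parseBaiduCount(string $html): ?int
{
    // "找到相关网页" ... "篇" is the phrase the article's code looks for;
    // "约" ("about") may appear before the number.
    if (!preg_match('/找到相关网页约?([\d,]+)篇/u', $html, $m)) {
        return null;
    }
    // Drop thousands separators before converting to an integer.
    return (int) str_replace(',', '', $m[1]);
}

// Made-up text standing in for a saved results page.
$sample = '<p>找到相关网页约1,230篇，</p>';
var_dump(parseBaiduCount($sample));
var_dump(parseBaiduCount('no match here'));
```

Returning null instead of an empty string lets the caller tell "phrase not found" apart from a real count of zero.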

Get Baidu's hot words

The code is as follows

/**
 * @user 小杰
 * @return array Baidu's hot-word data, returned as an array
 */
function getBaiduHotKeyWord()
{
    $templateRss = file_get_contents('http://top.baidu.com/rss_xml.php?p=top10');
    // The tags in this pattern are missing from this copy of the article;
    // <table>...</table> is reconstructed from the $xml->tbody->tr loop below.
    if (preg_match('/<table>(.*)<\/table>/is', $templateRss, $_description)) {
        $templateRss = $_description[0];
        // Escape bare ampersands so the fragment parses as XML.
        $templateRss = str_replace("&", "&amp;", $templateRss);
    }
    // The string prepended here is empty in this copy of the article; it was
    // presumably an XML declaration naming the feed's encoding.
    $templateRss = "" . $templateRss;
    $xml = simplexml_load_string($templateRss);
    $keyArray = array();
    foreach ($xml->tbody->tr as $temp) {
        if (!empty($temp->td->a)) {
            $keyArray[] = trim((string) $temp->td->a);
        }
    }
    return $keyArray;
}
print_r(getBaiduHotKeyWord());
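The row-by-row extraction can be tried without contacting Baidu at all. The sketch below assumes the fragment is a well-formed table in which each hot word is the link text in the first cell of a row; the function name extractHotWords and the sample markup are invented for illustration and are not taken from Baidu's feed.

```php
<?php
// Hypothetical helper: collect the link text from each row of a
// <table><tbody><tr><td><a>...</a></td></tr></tbody></table> fragment.
function extractHotWords(string $tableXml): array
{
    $words = [];
    // Collect libxml errors internally so a malformed fragment
    // yields an empty result instead of PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($tableXml);
    libxml_use_internal_errors($previous);
    if ($xml === false || !isset($xml->tbody->tr)) {
        return $words;
    }
    foreach ($xml->tbody->tr as $row) {
        // Skip rows whose first cell contains no link.
        if (isset($row->td->a)) {
            $words[] = trim((string) $row->td->a);
        }
    }
    return $words;
}

// Made-up sample markup.
$sample = '<table><tbody>'
    . '<tr><td><a href="#">first keyword</a></td></tr>'
    . '<tr><td>no link in this row</td></tr>'
    . '<tr><td><a href="#">second keyword</a></td></tr>'
    . '</tbody></table>';
print_r(extractHotWords($sample));
```

Checking the return value of simplexml_load_string() matters here, because a feed that is not well-formed XML would otherwise cause errors in the loop.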

Baidu indexed-page count and Baidu snapshot date

I found this on the Internet and modified it slightly. Put the following code in a PHP file.

The code is as follows

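<?php
    $domain = "http://www.hzhuti.com/nokia/5230/"; /* domain to query */
    $site_url = 'http://www.baidu.com/s?wd=site%3A';
    $all = $site_url . $domain;   /* all pages Baidu has indexed for the domain */
    $today = $all . '&lm=1';      /* pages indexed today */
    $utf_pattern = "/找到相关结果数(.*)个/"; /* Baidu's "found N related results" text */
    $kz_pattern = "/(.*)/"; /* locates the text holding the snapshot date; the HTML tags this pattern contained are missing from this copy */
    $times = "/\d{4}-\d{1,2}-\d{1,2}/"; /* snapshot date, e.g. 2011-8-4 */
    $s0 = (string) @file_get_contents($all);    /* results page of the site: query */
    $s1 = (string) @file_get_contents($today);
    preg_match($utf_pattern, $s0, $all_num);    /* match "found N related results" */
    preg_match($utf_pattern, $s1, $today_num);
    preg_match($kz_pattern, $s0, $temp);
    preg_match($times, isset($temp[0]) ? $temp[0] : "", $screenshot);
    /* empty() also covers the case where nothing matched */
    if (empty($all_num[1]))
        $all_num[1] = 0;
    if (empty($today_num[1]))
        $today_num[1] = 0;
    if (empty($screenshot[0]))
        $screenshot[0] = "No snapshot yet";
?>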
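A date pattern of this kind can be checked on its own before it is pointed at a live page. The sketch below is illustrative only: the function name extractSnapshotDate and the sample strings are assumptions, and the pattern is the YYYY-M-D form described in the comment above (such as 2011-8-4).

```php
<?php
// Hypothetical helper: return the first YYYY-M-D date found in $text,
// or null when there is none.
function extractSnapshotDate(string $text): ?string
{
    // \d needs its backslash; without it the pattern would look for
    // literal "d" characters instead of digits.
    if (preg_match('/\d{4}-\d{1,2}-\d{1,2}/', $text, $m)) {
        return $m[0];
    }
    return null;
}

// Made-up sample strings.
var_dump(extractSnapshotDate('www.example.com/ 2011-8-4 - snapshot'));
var_dump(extractSnapshotDate('no date in this text'));
```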

After the closing ?>, the file continues with an HTML page titled "Test". It shows the results in a table with four columns: date, Baidu indexed pages, Baidu pages indexed today, and Baidu snapshot date. The two counts appear to be links that open in a new window (target="_blank").

The method above is not robust. If the server does not allow file_get_contents() to fetch remote pages, it will not work at all. In that case curl can be used instead; it is more flexible and can imitate a real user's browser.
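A curl-based download might look like the sketch below. It is a minimal example and not a drop-in replacement from the original author: the function name fetchPage, the timeout default, and the User-Agent string are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: download a page with the curl extension.
// Returns the response body as a string, or null on any failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    if ($url === '' || !function_exists('curl_init')) {
        return null; // nothing to fetch, or the curl extension is missing
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // Browser-like User-Agent; the exact value is an assumption.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);
    $body = curl_exec($ch);
    // curl_exec() returns false on failure when RETURNTRANSFER is set.
    return is_string($body) ? $body : null;
}
```

With a helper like this, a call such as $site = fetchPage($baidu); could take the place of file_get_contents($baidu) in the functions above, and a null result signals that the request failed.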
