Use PHP to grab Google keyword rankings

WBOY
Original | 2016-07-13 17:47:15 | 1327 views

 

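Here is the idea: use PHP's curl functions and store the cookies. Google's search page cannot be opened with file_get_contents; the request has to emulate a browser completely. Baidu is different: its pages can be fetched directly with file_get_contents and then processed with a regular expression, so Baidu is not covered here.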
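The "fetch the page, then process it with a regular expression" step can be tried out on a hard-coded HTML fragment before any request is made. The following is only a minimal sketch: the helper name `findRankInHtml`, the sample markup, and the assumption that each result title is an `<h3>` wrapping a link are all illustrative, not actual Google or Baidu output.

```php
<?php
// Hypothetical helper: returns the 1-based position of the first result
// whose link contains $domain, or null when no result matches.
function findRankInHtml(string $html, string $domain): ?int
{
    // Assumed markup: each result title is an <h3> wrapping an <a href="...">.
    preg_match_all('!<h3[^>]*>\s*<a[^>]+href="([^"]+)"!i', $html, $m);
    foreach ($m[1] as $i => $href) {
        if (stripos($href, $domain) !== false) {
            return $i + 1;
        }
    }
    return null;
}

// Made-up sample page, not real search engine output.
$sampleHtml = '<h3 class="r"><a href="http://a.example/">A</a></h3>'
    . '<h3 class="r"><a href="http://www.b.example/page">B</a></h3>';

var_dump(findRankInHtml($sampleHtml, 'b.example')); // int(2)
```

Matching on the captured `href` rather than on the whole tag keeps the comparison limited to the link target, which is a design choice of this sketch and not something the original script does.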

 

<?php
header("Content-Type: text/html;charset=utf-8");

// Search Google for $keyword and report where $url_s ranks, checking at most 10 result pages.
function ggsearch($url_s, $keyword, $page = 1) {
    $enKeyword = urlencode($keyword);
    $rsState = false;
    $page_num = ($page - 1) * 10; // value of the "start" parameter for this page
    if ($page <= 10) {
        $interface = "eth0:" . rand(1, 4); // rotate the outgoing interface so Google is less likely to block the IP
        $cookie_file = dirname(__FILE__) . "/temp/google.txt"; // file in which the cookies are stored
        $url = "http://www.google.com/search?q=$enKeyword&hl=en&prmd=imvns&ei=JPnJTvLFI8HlggeXwbRl&start=$page_num&sa=N";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        //curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // use the visitor's browser type
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5");
        curl_setopt($ch, CURLOPT_INTERFACE, $interface); // outgoing interface/IP address to use
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // send back cookies saved by earlier requests
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);  // save cookies when the handle is closed
        $contents = curl_exec($ch);
        curl_close($ch);

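        // The HTML tags inside the two patterns below were lost when this page was extracted.
        // They are reconstructed on the assumption that the results sit in <div id="search">
        // and that each title is an <h3 class="r"> link; adjust them to the markup Google actually returns.
        $match = '!<div id="search">(.*)</div>\s+!';
        preg_match_all($match, (string) $contents, $line);
        foreach ($line[0] as $v) { // each() was removed in PHP 8, so iterate with foreach
            preg_match_all('!<h3 class="r"><a[^>]+>(.*?)</a>!', $v, $title);
            $num = count($title[1]);
            for ($i = 0; $i < $num; $i++) {
                if (strstr($title[0][$i], $url_s)) {
                    $rsState = true;
                    $j = $i + 1;
                    $sum = $j + (($page) * 10 - 10);
                    //echo $contents;
                    echo "Keyword: " . $keyword . "<br>" . "Rank: " . $sum . " #### page " . $page . ", position " . $j . " " . $title[0][$i] . "<br>";
                    // The original link markup was lost; pointing it at the queried $url is an assumption.
                    echo '<a href="' . htmlspecialchars($url) . '" target="_blank">Open the search results</a><br>';
                    echo "<br><br>";
                    break;
                }
            }
        }
        unset($contents);
        if ($rsState === false) {
            ggsearch($url_s, $keyword, ++$page); // not found on this page, so continue with the next one
        }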

    } else {
        echo 'Keyword "' . $keyword . '": this website does not rank within the first 10 pages<br>';
        echo "<br><br>";
    }
}

if (!empty($_POST['submit'])) {
    $time = explode(' ', microtime());
    $start = $time[0] + $time[1];
    $more_key = trim($_POST['textarea']);
    $url_s = trim($_POST['url']);
    if (!empty($more_key) && !empty($url_s)) {
        /* Work out how the keywords are separated: by newlines, by "|", or a single keyword */
        if (strstr($more_key, "\n")) {
            $exkey = explode("\n", $more_key);
        }
        if (strstr($more_key, "|")) {
            $exkey = explode("|", $more_key);
        }
        if (!strstr($more_key, "\n") && !strstr($more_key, "|")) {
            $exkey = array($more_key);
        }
        /* Check whether the URL already carries www or http:// */
        if (count(explode('.', $url_s)) <= 2) {
            // ltrim() strips a set of characters, not a prefix, so the prefix is removed with a regex instead.
            // As in the original, $url is computed here but the search below still uses $url_s.
            $url = 'www.' . preg_replace('#^(https?://)?(www\.)?#i', '', $url_s);
        }
        foreach ($exkey as $keyword) {
            ggsearch($url_s, trim($keyword)); // trim() drops a trailing "\r" left by Windows line endings
        }
        $endtime = explode(' ', microtime());
        $end = $endtime[0] + $endtime[1];
        echo '<br>';
        echo 'Program running time: ';
        echo $end - $start;
        //die();
    }
}
?>
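The arithmetic that ggsearch() uses for the `start` parameter and for the overall rank can be isolated and checked on its own. This is a sketch under the same assumption the script makes, namely 10 results per page; the helper names `startOffset` and `overallRank` are hypothetical and do not appear in the original code.

```php
<?php
// Hypothetical helpers mirroring the arithmetic in ggsearch().

// Value of the "start" query parameter for a 1-based result page.
function startOffset(int $page, int $perPage = 10): int
{
    return ($page - 1) * $perPage;
}

// Overall rank of the result at 0-based index $index on a 1-based $page.
function overallRank(int $page, int $index, int $perPage = 10): int
{
    return startOffset($page, $perPage) + $index + 1;
}

echo startOffset(3), "\n";    // 20
echo overallRank(3, 4), "\n"; // 25
```

For example, the fifth result (index 4) on page 3 follows the 20 results of pages 1 and 2, which gives rank 25.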

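<title>Fetch rankings</title>
<form method="post" action="">
Keywords:
<textarea name="textarea"></textarea>
URL:
<input type="text" name="url">
<input type="submit" name="submit" value="Submit">
</form>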
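The keyword field accepts several keywords at once, separated either by line breaks or by "|". A slightly more forgiving variant of that splitting step might look like the sketch below; `splitKeywords` is a hypothetical helper, and unlike the script it handles both separators in the same input and drops empty entries.

```php
<?php
// Hypothetical helper: split the textarea value on newlines or "|",
// trimming each keyword and dropping empty entries.
function splitKeywords(string $raw): array
{
    $parts = preg_split('/[\r\n|]+/', $raw);
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, 'strlen'));
}

print_r(splitKeywords("php tutorial\r\ncurl cookie|google rank\n"));
```

With the sample input this yields three keywords: "php tutorial", "curl cookie" and "google rank".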

 

                       


Source: Shine的圣天堂-〃敏〃
