Querying Baidu SEO Information with Python

高洛峰

Oct 18, 2016, 10:30 AM

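A simple Python function for querying Baidu keyword rankings. Features:

1. A random User-Agent (UA) is chosen for each request.

2. Simple to use: just call getRank(keyword, domain).

3. Character encoding should not be a problem: keywords are decoded as UTF-8, with GB2312 as a fallback.

4. Rich results: besides the ranking, it also returns the title, URL, and snapshot (cache) date of the matching search result, which covers common SEO needs.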
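Feature 3 comes from trying UTF-8 first and falling back to GB2312 when the bytes are not valid UTF-8. A minimal Python 3 sketch of that idea, using only the standard library, is shown here; the name decode_keyword is hypothetical and is not part of the script.

```python
def decode_keyword(raw):
    """Decode keyword bytes as UTF-8, falling back to GB2312."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy GB2312 encoding.
        return raw.decode('gb2312')


# The same keyword stored in two different encodings.
utf8_bytes = '百度'.encode('utf-8')
gb_bytes = '百度'.encode('gb2312')

# Both byte strings decode to the same text.
same = decode_keyword(utf8_bytes) == decode_keyword(gb_bytes)
```

Input in any other encoding is not handled by this fallback: it would either raise an error or decode to the wrong characters.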

Drawbacks:

Single-threaded, so it is slow.
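One possible way to reduce this drawback is to run the lookups in a small thread pool, since most of the time is spent waiting on the network. The following is a Python 3 sketch under stated assumptions: rank_all is a hypothetical helper, and fake_rank is a hypothetical stand-in with the same (keyword, domain) signature as getRank so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_all(keywords, domain, rank_fn, max_workers=4):
    """Call rank_fn(keyword, domain) for each keyword concurrently.

    Results are returned in the same order as the input keywords.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda kw: rank_fn(kw, domain), keywords))


def fake_rank(keyword, domain):
    # Hypothetical stand-in for getRank; no request is sent.
    return '%s -> %s' % (keyword, domain)


if __name__ == '__main__':
    for line in rank_all(['python', 'seo'], 'www.douban.com', fake_rank):
        print(line)
```

Keeping max_workers small is a deliberate choice in this sketch, because many simultaneous requests may be throttled or blocked by the search engine.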

#coding=utf-8
# Python 2 script: relies on BeautifulSoup 3, unicode() and byte-string .decode()

import requests
import BeautifulSoup
import re
import random

def decodeAnyWord(w):   #decode a byte string as UTF-8, falling back to GB2312
    try:
        return w.decode('utf-8')
    except UnicodeDecodeError:
        return w.decode('gb2312')

def createURL(checkWord):   #create baidu URL with search words
    checkWord = checkWord.strip()
    checkWord = checkWord.replace(' ', '+').replace('\n', '')
    baiduURL = 'http://www.baidu.com/s?wd=%s&rn=100' % checkWord   #rn=100: ask for 100 results per page
    return baiduURL
  
def getContent(baiduURL):   #get the content of the serp
    #pool of User-Agent strings; one is picked at random for each request
    uaList = ['Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322;+TencentTraveler)',
              'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729)',
              'Mozilla/5.0+(Windows+NT+5.1)+AppleWebKit/537.1+(KHTML,+like+Gecko)+Chrome/21.0.1180.89+Safari/537.1',
              'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)',
              'Mozilla/5.0+(Windows+NT+6.1;+rv:11.0)+Gecko/20100101+Firefox/11.0',
              'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+SV1)',
              'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+GTB7.1;+.NET+CLR+2.0.50727)',
              'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+KB974489)']
    headers = {'User-Agent': random.choice(uaList)}
    #proxy addresses from the original article; replace them with proxies you can actually reach
    ipList = ['202.43.188.13:8080',
              '80.243.185.168:1177',
              '218.108.85.59:81']
    proxies = {'http': 'http://%s' % random.choice(ipList)}
    #timeout keeps a dead proxy from hanging the script forever
    r = requests.get(baiduURL, headers=headers, proxies=proxies, timeout=10)
    return r.content
  
def getLastURL(rawurl): #get the final URL after following any redirects
    r = requests.get(rawurl)
    return r.url

def getAtext(atext):    #get the text between <a> and </a>
    pat = re.compile(r'<a .*?>(.*?)</a>')
    match = pat.findall(atext)
    if not match:       #no link text found
        return ''
    pureText = match[0].replace('<em>', '').replace('</em>', '')
    return pureText

def getCacheDate(t):    #get the date of cache
    pat = re.compile(r'<span class="g">.*?(\d{4}-\d{1,2}-\d{1,2})  </span>')
    match = pat.findall(t)
    if not match:       #result has no snapshot date
        return ''
    cacheDate = match[0]
    return cacheDate
  
def getRank(checkWord, domain): #main line
    checkWord = checkWord.replace('\n', '')
    checkWord = decodeAnyWord(checkWord)
    baiduURL = createURL(checkWord)
    cont = getContent(baiduURL)
    soup = BeautifulSoup.BeautifulSoup(cont)
    results = soup.findAll('table', {'class': 'result'})    #find all results in this page
    #escape the domain so that '.' matches a literal dot; adjust this regex if needed
    domainPat = re.compile(r'^[^/]*%s.*?' % re.escape(domain))
    for result in results:
        checkData = unicode(result.find('span', {'class': 'g'}))
        if domainPat.match(checkData):
            nowRank = result['id']  #get the rank if match the domain info

            resLink = result.find('h3').a
            resURL = resLink['href']
            domainURL = getLastURL(resURL)  #get the target URL
            resTitle = getAtext(unicode(resLink))   #get the title of the target page

            rescache = result.find('span', {'class': 'g'})
            cacheDate = getCacheDate(unicode(rescache)) #get the cache date of the target page

            #output fields: keyword, rank ("第N名" means "rank N"), title, cache date, final URL
            res = u'%s, 第%s名, %s, %s, %s' % (checkWord, nowRank, resTitle, cacheDate, domainURL)
            return res.encode('gb2312')
    return '>100'   #domain not found among the first 100 results
  
domain = 'www.douban.com' #set the domain which you want to search.

with open('r.txt') as f:    #r.txt holds one keyword per line
    for w in f:
        print(getRank(w, domain))
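createURL places the keyword into the query string as-is, so a keyword containing characters such as & or # would change the meaning of the URL. One way to build the URL more defensively is sketched here in Python 3 with the standard library. The name build_baidu_url and its results_per_page parameter are hypothetical; the wd and rn query parameters are taken from the script above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_baidu_url(keyword, results_per_page=100):
    """Build a Baidu search URL with the keyword percent-encoded."""
    query = urlencode({'wd': keyword.strip(), 'rn': results_per_page})
    return 'http://www.baidu.com/s?' + query


url = build_baidu_url('python & seo\n')

# Round-trip check: the keyword survives intact, including the '&'.
recovered = parse_qs(urlsplit(url).query)['wd'][0]
```

Because urlencode also turns spaces into +, the manual replace(' ', '+') step is not needed in this variant.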
