
python - Scraping job listings from Dajie (大街網) with requests keeps failing. What is wrong with my code, and how should I change it?

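I have scraped data from a few static sites before without much trouble. This time I ran into AJAX. After reading a few docs it did not look hard, so I dived straight in, but I got stuck anyway...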

Goal:
Scrape job listings from Dajie (大街網).

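Steps:
1. Use the browser's Inspect Element tool to find the URL that the data is loaded from dynamically.

2. Configure the requests parameters based on the information shown there.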

import requests

data = {
    'keyword': 'python',
    'order': '0',
    'city': '',
    'recruitType': '',
    'salary': '',
    'experience': '',
    'page': '5',
    'positionFunction': '',
    '_CSRFToken': '',
    'ajax': '1'
}
headers = {
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'accept-language': 'zh-CN,zh;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch',
    'cookie': 'DJ_UVID=MTQ5MDMyMTExNTAzODM2MTc5; DJ_RF=empty; DJ_EU=http%3A%2F%2Fjob.dajie.com%2F; __login_tips=1; dj_cap=9c8c95bdef72e84a9bd7493a5ab91694; USER_ACTION="request^A-^A-^Ajobdetail:^A-"; SO_COOKIE_V2=0c7cGprjIH0q9RHc53CWLLXf151DQ5QvUP5ccPQj4g0B/izuXHm8sp41lJjJJh3nmjAkroj8JczFN/SCLPAUzbOHW7wYWmQ6Zu7s',
    'referer': 'https://so.dajie.com/job/search?keyword=%E9%A3%9E%E5%88%A9%E6%B5%A6&from=job&clicktype=blank',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'method':'get'
}

3. Pass the request headers to requests.get().

response = requests.get('https://so.dajie.com/job/ajax/search/filter', params=data, headers=headers)

4. Inspect the returned page.

print(response.url)
print('')
print(response.request.headers)
print('')
print(response.headers)
print('')
print(response.content[-1000:])
print('')
print(response)

5. Why is the result not the JSON data I expected...

response.url:
https://so.dajie.com/job/ajax/search/filter?salary=&city=&ajax=1&positionFunction=&_CSRFToken=&keyword=python&recruitType=&order=0&experience=&page=5

response.request.headers:
{'accept-language': 'zh-CN,zh;q=0.8', 'accept-encoding': 'gzip, deflate, sdch', 'X-Requested-With': 'XMLHttpRequest', 'accept': 'application/json, text/javascript, */*; q=0.01', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36', 'Connection': 'keep-alive', 'referer': 'https://so.dajie.com/job/search?keyword=%E9%A3%9E%E5%88%A9%E6%B5%A6&from=job&clicktype=blank', 'cookie': 'DJ_UVID=MTQ5MDMyMTExNTAzODM2MTc5; DJ_RF=empty; DJ_EU=http%3A%2F%2Fjob.dajie.com%2F; __login_tips=1; dj_cap=9c8c95bdef72e84a9bd7493a5ab91694; USER_ACTION="request^A-^A-^Ajobdetail:^A-"; SO_COOKIE_V2=0c7cGprjIH0q9RHc53CWLLXf151DQ5QvUP5ccPQj4g0B/izuXHm8sp41lJjJJh3nmjAkroj8JczFN/SCLPAUzbOHW7wYWmQ6Zu7s', 'method': 'get'}

response.headers:
{'Date': 'Wed, 19 Apr 2017 02:00:47 GMT', 'Content-Length': '5944', 'ETag': '"552f21de-1738"', 'Content-Type': 'text/html; charset=UTF-8', 'Connection': 'keep-alive'}

response.content[-1000:]:
,这个页面去火星了,试试搜索一下吧:</p>
    <form action="http://so.dajie.com/job/search" target="_top" class="search" method="get">
      <input type="text" placeholder="搜索感兴趣的职位" autocomplete="off" name="keyword"/><button type="submit">搜索</button>
      <input type="hidden" name="jobsearch" value="8"/>
    </form>
  </p>
   <p class="error-404">
     <p class="buttonwrap">
       <a  class="button guest" id="guest" title="" href="http://www.dajie.com/"><b>逛逛大街</b></a>
       <a  class="button report" id="report" title="" href="mailto:service@dajie.com"><b>报告管理员</b></a>
     </p>
   </p>
  </p>
  <script type="text/javascript">
    $(function(){
      $('input[placeholder]').each(function(){
        var $dom = $(this);
        var tip = $dom.attr('placeholder');
        $.placeholder($dom, {
          placeTextClass : 'placeholder',
          placeText : tip
        });
      });
    });
  </script>
 </body>
</html>

response:
<Response [299]>
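
One way to confirm what came back, instead of eyeballing the tail of the body, is to look at the status code and the Content-Type together. The sketch below is only an illustration: `classify_response` is a hypothetical helper, not part of requests, and the labels it returns are arbitrary.

```python
import json


def classify_response(status_code, content_type, body_text):
    """Return a short label describing what kind of response came back.

    Hypothetical helper for illustration; the labels are arbitrary.
    """
    # Drop parameters such as "; charset=UTF-8" from the header value.
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type == 'application/json' or media_type.endswith('+json'):
        return 'json'
    # Some servers send JSON with a text/* content type, so try parsing.
    try:
        json.loads(body_text)
        return 'json'
    except ValueError:
        pass
    if media_type == 'text/html':
        return 'html (status %d)' % status_code
    return 'unknown'
```

For the response above, `classify_response(response.status_code, response.headers.get('Content-Type', ''), response.text)` would report HTML rather than JSON, which matches the error page shown in the output.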

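Questions:
1. Why does 'https://so.dajie.com/job/ajax...' not open as a JSON data page? In the tutorial I followed, opening the given link showed the data directly, for example 'https://rate.tmall.com/list_d...'.
2. This is my first time requesting AJAX data with requests. Did I leave something out of the request?
3. So far I have only tried changing the request parameters, and I still cannot get JSON back. Am I approaching this the wrong way?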

Thanks.

大家讲道理, 2733 days ago

All replies (1)

  • 黄舟, 2017-05-18 11:03:06

    # coding: utf-8

    import requests

    # Search page, visited first so the session picks up any cookies it sets.
    url = 'https://so.dajie.com/job/search'
    # AJAX endpoint that the search page loads its results from.
    page_url = 'https://so.dajie.com/job/ajax/search/filter?keyword=python&order=0&city=&recruitType=&salary=&experience=&page=1&positionFunction=&_CSRFToken=&ajax=1'

    session = requests.Session()
    session.headers['referer'] = url
    session.headers['user-agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'

    session.get(url)           # first request: lets the session store cookies
    r = session.get(page_url)  # second request: stored cookies are sent along
    print(r.text)
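
If the session approach works, the same session could be reused for several result pages. This is a rough sketch written for Python 3 and is not part of the original answer: `build_page_url` and `fetch_pages` are hypothetical helpers, and the parameter list is simply copied from `page_url` above. Whether the site accepts other `page` values in the same way is an assumption.

```python
from urllib.parse import urlencode

FILTER_URL = 'https://so.dajie.com/job/ajax/search/filter'


def build_page_url(keyword, page):
    """Build the AJAX URL for one result page (hypothetical helper)."""
    # Same parameters, in the same order, as page_url in the answer.
    params = [
        ('keyword', keyword),
        ('order', '0'),
        ('city', ''),
        ('recruitType', ''),
        ('salary', ''),
        ('experience', ''),
        ('page', str(page)),
        ('positionFunction', ''),
        ('_CSRFToken', ''),
        ('ajax', '1'),
    ]
    return FILTER_URL + '?' + urlencode(params)


def fetch_pages(session, keyword, pages):
    """Fetch several pages with a session that already holds the cookies."""
    # `session` is expected to behave like requests.Session: get(url).text
    return [session.get(build_page_url(keyword, page)).text for page in pages]
```

With the `session` from the snippet above, after `session.get(url)` has run, a call such as `fetch_pages(session, 'python', range(1, 4))` would request pages 1 to 3.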
    

    Or pass the cookies in directly:

    # coding: utf-8

    import requests

    data = {
        'keyword': 'python',
        'order': '0',
        'city': '',
        'recruitType': '',
        'salary': '',
        'experience': '',
        'page': '5',
        'positionFunction': '',
        '_CSRFToken': '',
        'ajax': '1'
    }

    headers = {
        # Cookie string copied from a browser session that has visited the search page.
        'cookie': 'DJ_RF=empty; DJ_EU=http%3A%2F%2Fso.dajie.com%2Fjob%2Fsearch%3Fkeyword%3Dpython%26jobsearch%3D8; DJ_UVID=MTQ5MjU2OTgxOTU1ODg0Mzk1; __login_tips=1; dj_cap=1e41c3c0ca9602c45e6481cb53c19774; SO_COOKIE_V2=6a297gxq5vDDnl9D4q04fhTgrWB11xG9lMj7iLcnP1uM/Zuzzx1dkeHauV4blsO1KsRYQKEQDrDGdiAhRE9efdI8PnREZK1MhzR4',
        'referer': 'https://so.dajie.com/job/search',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    }
    # For a GET request the query string is passed with params=, not data=.
    r = requests.get('https://so.dajie.com/job/ajax/search/filter', params=data, headers=headers)
    print(r.text)
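
Instead of placing the whole cookie string in a header, requests also accepts a dict through its `cookies` argument. The following is a minimal sketch of turning a copied cookie string into such a dict. `parse_cookie_header` is a hypothetical helper, not a requests function, and it assumes the simple `name=value; name=value` layout seen in the strings above.

```python
def parse_cookie_header(cookie_header):
    """Split a 'name=value; name=value' string into a dict (hypothetical helper)."""
    cookies = {}
    for pair in cookie_header.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments, e.g. after a trailing ';'
        # Split on the first '=' only, since values may contain '=' themselves.
        name, _, value = pair.partition('=')
        cookies[name] = value
    return cookies
```

The result could then be passed as `requests.get(url, params=data, cookies=parse_cookie_header(cookie_string), headers=headers)`, with the `cookie` entry removed from `headers`. Cookies copied from a browser may expire, so the session-based version is likely to need less upkeep.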
