

python - Using requests to crawl [] job listings keeps failing. Can you help me find what is wrong with my code and how to change it?

I have scraped data from several static websites before, and everything went fairly smoothly. This time I ran into AJAX. After reading a few documents it didn't seem difficult, so I dove straight in, but I still got stuck...

Crawl job information from

1. Use the browser's Inspect Element (developer tools) feature to find the address from which the data is loaded dynamically.

2. Configure the request parameters for requests based on the information shown there.

import requests

# Query parameters copied from the AJAX request shown in the browser
data = {
    'keyword': 'python',
    'order': '0',
    'city': '',
    'recruitType': '',
    'salary': '',
    'experience': '',
    'page': '5',
    'positionFunction': '',
    '_CSRFToken': '',
    'ajax': '1'
}
# Request headers copied from the browser
headers = {
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'accept-language': 'zh-CN,zh;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch',
    'cookie': 'DJ_UVID=MTQ5MDMyMTExNTAzODM2MTc5; DJ_RF=empty;; __login_tips=1; dj_cap=9c8c95bdef72e84a9bd7493a5ab91694; USER_ACTION="request^A-^A-^Ajobdetail:^A-"; SO_COOKIE_V2=0c7cGprjIH0q9RHc53CWLLXf151DQ5QvUP5ccPQj4g0B/izuXHm8sp41lJjJJh3nmjAkroj8JczFN/SCLPAUzbOHW7wYWmQ6Zu7s',
    'referer': '',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

3. Pass the request headers to requests.get().

response = requests.get('', params=data, headers=headers)
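With `params=data`, requests URL-encodes the dictionary and appends it to the URL as a query string; keys with empty-string values such as `city` are still sent, as `city=`. A rough offline preview of that string can be built with the standard library's `urlencode` (Python 3's `urllib.parse`; on Python 2 it is `urllib.urlencode`). `BASE_URL` is a hypothetical placeholder, since the real endpoint is elided in the post:

```python
from urllib.parse import urlencode

# Same parameters as in the question
data = {
    'keyword': 'python',
    'order': '0',
    'city': '',
    'recruitType': '',
    'salary': '',
    'experience': '',
    'page': '5',
    'positionFunction': '',
    '_CSRFToken': '',
    'ajax': '1',
}

# Hypothetical placeholder; the real AJAX endpoint is elided in the post
BASE_URL = 'https://example.com/search/ajax'

query = urlencode(data)  # empty values are kept, e.g. 'city='
full_url = BASE_URL + '?' + query
print(full_url)
```

Comparing the printed URL with the Request URL shown in the browser's Network tab is a quick way to spot a missing or misspelled parameter. requests' own encoding may differ in small details, so `response.url` remains the authoritative value.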

4. Inspect the returned response.

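print(response.url)               # final URL, including the query string
print('')
print(response.request.headers)   # headers that were actually sent
print('')
print(response.headers)           # headers returned by the server
print('')
print(response.content[-1000:])   # last 1000 bytes of the body
print('')
print(response)                   # shows the status code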

5. Why is the returned result not the expected JSON data?


{'accept-language': 'zh-CN,zh;q=0.8', 'accept-encoding': 'gzip, deflate, sdch', 'X-Requested-With': 'XMLHttpRequest', 'accept': 'application/json, text/javascript, */*; q=0.01', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36', 'Connection': 'keep-alive', 'referer': '', 'cookie': 'DJ_UVID=MTQ5MDMyMTExNTAzODM2MTc5; DJ_RF=empty;; __login_tips=1; dj_cap=9c8c95bdef72e84a9bd7493a5ab91694; USER_ACTION="request^A-^A-^Ajobdetail:^A-"; SO_COOKIE_V2=0c7cGprjIH0q9RHc53CWLLXf151DQ5QvUP5ccPQj4g0B/izuXHm8sp41lJjJJh3nmjAkroj8JczFN/SCLPAUzbOHW7wYWmQ6Zu7s', 'method': 'get'}

{'Date': 'Wed, 19 Apr 2017 02:00:47 GMT', 'Content-Length': '5944', 'ETag': '"552f21de-1738"', 'Content-Type': 'text/html; charset=UTF-8', 'Connection': 'keep-alive'}

    <form action="" target="_top" class="search" method="get">
      <input type="text" placeholder="搜索感兴趣的职位" autocomplete="off" name="keyword"/><button type="submit">搜索</button>
      <input type="hidden" name="jobsearch" value="8"/>
   <p class="error-404">
     <p class="buttonwrap">
       <a  class="button guest" id="guest" title="" href=""><b>逛逛大街</b></a>
       <a  class="button report" id="report" title="" href=""><b>报告管理员</b></a>
  <script type="text/javascript">
        var $dom = $(this);
        var tip = $dom.attr('placeholder');
        $.placeholder($dom, {
          placeTextClass : 'placeholder',
          placeText : tip

<Response [299]>
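The printed output points at the cause: the server answered with status 299 and `Content-Type: text/html; charset=UTF-8`, and the tail of the body is an HTML page containing an `error-404` element, so there is no JSON to parse. A small guard like the one below (`parse_ajax_body` is a hypothetical helper, not part of requests) turns that situation into an explicit error instead of silently printing HTML. It is a sketch assuming a successful AJAX call returns status 200 with a JSON body:

```python
import json


def parse_ajax_body(status_code, content_type, body):
    """Decode an AJAX response body as JSON, or raise ValueError saying why not.

    Hypothetical helper; in practice pass response.status_code,
    response.headers.get('Content-Type', '') and response.text.
    """
    if status_code != 200:
        # e.g. the 299 + text/html page seen in the question
        raise ValueError('unexpected status %d (Content-Type %r)'
                         % (status_code, content_type))
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        raise ValueError('body is not JSON (Content-Type %r): %r'
                         % (content_type, body[:80]))


# Status and Content-Type taken from the question's output; body shortened
try:
    parse_ajax_body(299, 'text/html; charset=UTF-8', '<p class="error-404">')
except ValueError as exc:
    print('not JSON:', exc)

# A made-up JSON body, to show the success path
print(parse_ajax_body(200, 'application/json', '{"data": []}'))
```

Some servers label JSON as `text/html`, which is why this sketch tries to decode the body rather than rejecting on Content-Type alone.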

1. Why does opening ‘’ not show a JSON data page? In the tutorial I watched before, opening the given link showed the data directly, for example: ‘’.
2. This is my first time using requests to fetch AJAX data. Am I leaving anything out of the request?
3. I have tried modifying various request parameters, but I still can't get the JSON data. Am I approaching this in the wrong direction?


Asked by 大家讲道理, 2845 days ago

All replies (1)

  • 黄舟 (2017-05-18 11:03:06)

    # coding: utf-8
    import requests

    url = ''       # URL used as the Referer (elided in the post)
    page_url = ''  # URL being fetched (elided in the post)
    session = requests.Session()
    # Headers set on the session are sent with every request it makes
    session.headers['referer'] = url
    session.headers['user-agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    r = session.get(page_url)
    print(r.text)

    Alternatively, pass the cookies directly:

    # coding: utf-8
    import requests

    data = {
        'keyword': 'python',
        'order': '0',
        'city': '',
        'recruitType': '',
        'salary': '',
        'experience': '',
        'page': '5',
        'positionFunction': '',
        '_CSRFToken': '',
        'ajax': '1'
    }
    headers = {
        'cookie': 'DJ_RF=empty;; DJ_UVID=MTQ5MjU2OTgxOTU1ODg0Mzk1; __login_tips=1; dj_cap=1e41c3c0ca9602c45e6481cb53c19774; SO_COOKIE_V2=6a297gxq5vDDnl9D4q04fhTgrWB11xG9lMj7iLcnP1uM/Zuzzx1dkeHauV4blsO1KsRYQKEQDrDGdiAhRE9efdI8PnREZK1MhzR4',
        'referer': '',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    }
    # GET query parameters belong in params=; data= would put them in the request body
    r = requests.get('', params=data, headers=headers)
    print(r.text)
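requests can also take cookies as a dict through the `cookies=` argument of `requests.get()` instead of a raw `cookie` header. The sketch below splits a header string copied from the browser into such a dict; `cookie_header_to_dict` is a hypothetical helper, and the sample string is the first part of the cookie from the answer above. Cookies copied from a browser are tied to that browser session and may expire, so they may need to be copied again if requests start failing.

```python
def cookie_header_to_dict(header):
    """Split a raw Cookie header (as copied from the browser) into a dict.

    Hypothetical helper: segments without '=' (such as the empty one
    produced by ';;') are skipped; only the first '=' separates name and value.
    """
    cookies = {}
    for part in header.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue
        name, value = part.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


raw = ('DJ_RF=empty;; DJ_UVID=MTQ5MjU2OTgxOTU1ODg0Mzk1; __login_tips=1; '
       'dj_cap=1e41c3c0ca9602c45e6481cb53c19774')
cookies = cookie_header_to_dict(raw)
print(cookies['DJ_UVID'])  # MTQ5MjU2OTgxOTU1ODg0Mzk1
```

To use it, remove the `cookie` entry from `headers` and call `requests.get(url, params=data, headers=headers, cookies=cookie_header_to_dict(raw))`.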
