Home >Backend Development >Python Tutorial >How to crawl ajax in python

How to crawl ajax in python

(*-*)浩
(*-*)浩Original
2019-07-01 10:22:035185browse

Use python package: requests.

How to crawl ajax in python

Specific method: (Recommended learning: Python video tutorial)

First, define your own headers. Note that the User-Agent field in the headers can be used to design a list according to your own needs for random replacement.

Web page features of ajax data: There are some ajax requests in the XHR network flow in NetWork, among which their request_url must be an ajax request interface, and the referer in the headers is the url before the jump. When constructing itself The referer field needs to be set in the headers.

Take the search for "java" on Lagou.com's homepage as an example:

How to crawl ajax in python The ajax data crawler and the ordinary web crawler have one more url, one is the url of the referer. Placed in headers. The other one is ajax_url, which is also the main access url.

The most important point is that ajax generally returns json data, so the parsing form of the captured data is different. Simply convert the result set into a json result set, and the access method is ordinary dictionary or list access.

The other one is the access parameter. If the request contains param, it will be constructed. If not, ignore it. There are parameters in this example, pay attention to the parameter dictionary construction method

A simple complete code is shown below

from urllib.request import quote,unquote
import random
import requests
 
keyword = quote('java').strip()
print(keyword, type(keyword))
city = quote('郑州').strip()
print(unquote(city))
 
refer_url = 'https://www.lagou.com/jobs/list_%s?city=%s&cl=false&fromSearch=true&labelWords=&suginput=' % (keyword, city)
ajax_url = 'https://www.lagou.com/jobs/positionAjax.json?city=%s&needAddtionalResult=false' %city
user_agents=[
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299',
]
 
data ={
    'first': 'true',
    'pn': '1',
    'kd': keyword,
}
headers={
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Length': '46',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Host': 'www.lagou.com',
    'Origin': 'https://www.lagou.com',
    'Referer': refer_url,
    'User-Agent': user_agents[random.randrange(0,3)],
    'X-Anit-Forge-Code': '0',
    'X-Anit-Forge-Token': 'None',
    'X-Requested-With': 'XMLHttpRequest',
}
resp = requests.post(ajax_url,data=data, headers=headers)
 
result = resp.json()
print(result)
# print(result)
#result 就是最终获得的json格式数据
item = result['content']['positionResult']['result'][0]
print(item)
#item就是单个招聘条目信息
print("程序结束")

For more Python related technical articles, please visitPython tutorial Column for learning!

The above is the detailed content of How to crawl ajax in python. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn