I have scraped data from a few static websites before. This time I ran into ajax. After reading some documentation it did not seem difficult, so I started right away, but I am still stuck.
Goal:
Scrape job listings from Dajie.com.
Process:
1. Use the browser's Inspect Element tool to find the address of the dynamically loaded data.
2. Configure the request parameters according to the information shown there.
data = {
'keyword': 'python',
'order': '0',
'city': '',
'recruitType': '',
'salary': '',
'experience': '',
'page': '5',
'positionFunction': '',
'_CSRFToken': '',
'ajax': '1'
}
headers = {
'accept': 'application/json, text/javascript, */*; q=0.01',
'accept-language': 'zh-CN,zh;q=0.8',
'accept-encoding': 'gzip, deflate, sdch',
'cookie': 'DJ_UVID=MTQ5MDMyMTExNTAzODM2MTc5; DJ_RF=empty; DJ_EU=http%3A%2F%2Fjob.dajie.com%2F; __login_tips=1; dj_cap=9c8c95bdef72e84a9bd7493a5ab91694; USER_ACTION="request^A-^A-^Ajobdetail:^A-"; SO_COOKIE_V2=0c7cGprjIH0q9RHc53CWLLXf151DQ5QvUP5ccPQj4g0B/izuXHm8sp41lJjJJh3nmjAkroj8JczFN/SCLPAUzbOHW7wYWmQ6Zu7s',
'referer': 'https://so.dajie.com/job/search?keyword=%E9%A3%9E%E5%88%A9%E6%B5%A6&from=job&clicktype=blank',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
'method':'get'
}
3. Pass the parameters and request headers to requests.get().
response = requests.get('https://so.dajie.com/job/ajax/search/filter', params=data, headers=headers)
4. Inspect the returned page.
print(response.url)
print('')
print(response.request.headers)
print('')
print(response.headers)
print('')
print(response.content[-1000:])
print('')
print(response)
5. Why is the returned result not the expected JSON data?
response.url:
https://so.dajie.com/job/ajax/search/filter?salary=&city=&ajax=1&positionFunction=&_CSRFToken=&keyword=python&recruitType=&order=0&experience=&page=5
response.request.headers:
{'accept-language': 'zh-CN,zh;q=0.8', 'accept-encoding': 'gzip, deflate, sdch', 'X-Requested-With': 'XMLHttpRequest', 'accept': 'application/json, text/javascript, */*; q=0.01', 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36', 'Connection': 'keep-alive', 'referer': 'https://so.dajie.com/job/search?keyword=%E9%A3%9E%E5%88%A9%E6%B5%A6&from=job&clicktype=blank', 'cookie': 'DJ_UVID=MTQ5MDMyMTExNTAzODM2MTc5; DJ_RF=empty; DJ_EU=http%3A%2F%2Fjob.dajie.com%2F; __login_tips=1; dj_cap=9c8c95bdef72e84a9bd7493a5ab91694; USER_ACTION="request^A-^A-^Ajobdetail:^A-"; SO_COOKIE_V2=0c7cGprjIH0q9RHc53CWLLXf151DQ5QvUP5ccPQj4g0B/izuXHm8sp41lJjJJh3nmjAkroj8JczFN/SCLPAUzbOHW7wYWmQ6Zu7s', 'method': 'get'}
response.headers:
{'Date': 'Wed, 19 Apr 2017 02:00:47 GMT', 'Content-Length': '5944', 'ETag': '"552f21de-1738"', 'Content-Type': 'text/html; charset=UTF-8', 'Connection': 'keep-alive'}
response.content[-1000:]:
,这个页面去火星了,试试搜索一下吧:</p>
<form action="http://so.dajie.com/job/search" target="_top" class="search" method="get">
<input type="text" placeholder="搜索感兴趣的职位" autocomplete="off" name="keyword"/><button type="submit">搜索</button>
<input type="hidden" name="jobsearch" value="8"/>
</form>
</p>
<p class="error-404">
<p class="buttonwrap">
<a class="button guest" id="guest" title="" href="http://www.dajie.com/"><b>逛逛大街</b></a>
<a class="button report" id="report" title="" href="mailto:service@dajie.com"><b>报告管理员</b></a>
</p>
</p>
</p>
<script type="text/javascript">
$(function(){
$('input[placeholder]').each(function(){
var $dom = $(this);
var tip = $dom.attr('placeholder');
$.placeholder($dom, {
placeTextClass : 'placeholder',
placeText : tip
});
});
});
</script>
</body>
</html>
response:
<Response [299]>
Questions:
1. Why does 'https://so.dajie.com/job/ajax...' not open a page of JSON data? The link given in a tutorial I watched earlier opens straight to the data, for example 'https://rate.tmall.com/list_d...'.
2. This is my first time using requests to fetch ajax data.
3. I have tried changing various request parameters, but I still cannot get the JSON data. Is my approach heading in the wrong direction?
Thanks.
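The response printed in the question already hints at the cause: the status code is not 200 and the Content-Type is text/html, so the server sent its error page rather than JSON. The check below is a minimal sketch of that reasoning. The helper name `looks_like_json` is hypothetical, and it assumes a JSON endpoint labels its body with a media type ending in "json", which not every server does.

```python
def looks_like_json(status_code, content_type):
    """Return True when a response plausibly carries a JSON body."""
    # The media type is the part before parameters such as "; charset=UTF-8".
    media_type = content_type.split(';', 1)[0].strip().lower()
    return status_code == 200 and media_type.endswith('json')


# Values copied from the output shown in the question.
print(looks_like_json(299, 'text/html; charset=UTF-8'))  # False
```

With `requests`, the two arguments would come from `response.status_code` and `response.headers['Content-Type']`.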
黄舟 2017-05-18 11:03:06
# coding: utf-8
import requests
url = 'https://so.dajie.com/job/search'
page_url = 'https://so.dajie.com/job/ajax/search/filter?keyword=python&order=0&city=&recruitType=&salary=&experience=&page=1&positionFunction=&_CSRFToken=&ajax=1'
session = requests.Session()
session.headers['referer'] = url
session.headers['user-agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
session.get(url)
r = session.get(page_url)
print(r.text)
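The idea in the snippet above appears to be that the first `session.get(url)` lets the session pick up whatever cookies the search page sets, so the second request carries them automatically. The sketch below restates it for Python 3 with the query string built from a list of pairs. `build_filter_url` and `fetch_page` are hypothetical names, the parameter values are the ones from the question, and whether the site still responds this way has not been verified.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://so.dajie.com/job/search'
FILTER_URL = 'https://so.dajie.com/job/ajax/search/filter'


def build_filter_url(keyword, page):
    """Build the ajax URL with the same parameters as in the question."""
    params = [
        ('keyword', keyword), ('order', '0'), ('city', ''),
        ('recruitType', ''), ('salary', ''), ('experience', ''),
        ('page', str(page)), ('positionFunction', ''),
        ('_CSRFToken', ''), ('ajax', '1'),
    ]
    return FILTER_URL + '?' + urlencode(params)


def fetch_page(keyword, page, user_agent):
    """Visit the search page first, then request the ajax URL."""
    # Third-party library; imported here so the URL helper works without it.
    import requests

    session = requests.Session()
    session.headers['referer'] = SEARCH_URL
    session.headers['user-agent'] = user_agent
    session.get(SEARCH_URL)  # the session keeps any cookies set by this page
    return session.get(build_filter_url(keyword, page))
```

Calling `build_filter_url('python', 1)` reproduces the `page_url` string used above, and changing the `page` argument is one way to walk through further result pages.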
Or pass the cookie in directly:
# coding: utf-8
import requests
data = {
'keyword': 'python',
'order': '0',
'city': '',
'recruitType': '',
'salary': '',
'experience': '',
'page': '5',
'positionFunction': '',
'_CSRFToken': '',
'ajax': '1'
}
headers = {
'cookie': 'DJ_RF=empty; DJ_EU=http%3A%2F%2Fso.dajie.com%2Fjob%2Fsearch%3Fkeyword%3Dpython%26jobsearch%3D8; DJ_UVID=MTQ5MjU2OTgxOTU1ODg0Mzk1; __login_tips=1; dj_cap=1e41c3c0ca9602c45e6481cb53c19774; SO_COOKIE_V2=6a297gxq5vDDnl9D4q04fhTgrWB11xG9lMj7iLcnP1uM/Zuzzx1dkeHauV4blsO1KsRYQKEQDrDGdiAhRE9efdI8PnREZK1MhzR4',
'referer': 'https://so.dajie.com/job/search',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
}
# params= puts the values in the query string; data= would send them as a request body
r = requests.get('https://so.dajie.com/job/ajax/search/filter', params=data, headers=headers)
print(r.text)
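When the cookie is copied from the browser like this, `requests` also accepts it as a dict through its `cookies=` argument, which can be easier to edit than one long header string. A minimal sketch of the conversion follows. `parse_cookie_header` is a hypothetical helper, and the sample string is a shortened part of the header above.

```python
def parse_cookie_header(cookie_header):
    """Split a 'name=value; name2=value2' string into a dict."""
    cookies = {}
    for part in cookie_header.split(';'):
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = part.strip().partition('=')
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


cookies = parse_cookie_header('DJ_RF=empty; __login_tips=1')
print(cookies)
```

The resulting dict can be passed as the `cookies=` argument of `requests.get` in place of the `'cookie'` entry in `headers`.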