Problem with POST crawling page
A classmate reported that a spider had trouble crawling a certain site with the POST method: the server always answered with a 302 redirect back to the same URL. The details are as follows:
url: http://www.meituan.com/multiact/default/deal/25814805.html
post data: "yui_3_16_0_1_1423700000_000:{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}"
The page can be fetched normally with Python. The code is as follows:
import urllib
import urllib2

values = {
    'yui_3_16_0_1_1423700000_000': '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
}
header = {
    "X-Requested-With": "XMLHttpRequest",
}
url = "http://www.meituan.com/multiact/default/deal/25814805.html"
data = urllib.urlencode(values)
print data
req = urllib2.Request(url, data, header)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
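The code above targets Python 2 (urllib2). For readers on Python 3, a rough equivalent might look like the sketch below. The function names build_request and fetch are made up for illustration, and the site may no longer respond the way it did when this was written, so treat it as an outline rather than a verified crawler.

```python
import urllib.parse
import urllib.request

URL = "http://www.meituan.com/multiact/default/deal/25814805.html"


def build_request(url=URL):
    """Build (but do not send) the same POST request with Python 3's urllib."""
    values = {
        "yui_3_16_0_1_1423700000_000":
            '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
    }
    headers = {"X-Requested-With": "XMLHttpRequest"}
    # urlencode returns str; in Python 3 the request body must be bytes.
    data = urllib.parse.urlencode(values).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=data, headers=headers)


def fetch(url=URL):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch() performs the actual request. The URL-encoded body built here is 126 bytes long, which is the length a hand-built packet has to declare in Content-Length.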
However, the page cannot be fetched when the HTTP request packet is built by hand. The request packet is as follows:
POST /multiact/default/deal/25814805.html HTTP/1.1^M
Host: www.meituan.com^M
Content-Length: 126^M
Connection: close^M
Content-Type: application/x-www-form-urlencoded^M
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2^M
Accept-Encoding: gzip^M
Accept: */*^M
Compared with the Python request, one header is missing: X-Requested-With: XMLHttpRequest
Adding it fixes the problem. The corrected packet is as follows:
POST /multiact/default/deal/25814805.html HTTP/1.1^M
Host: www.meituan.com^M
Connection: close^M
Content-Type: application/x-www-form-urlencoded^M
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2^M
Accept-Encoding: gzip^M
Accept: */*^M
X-Requested-With: XMLHttpRequest^M
Content-Type: application/x-www-form-urlencoded^M
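A hand-built packet like the one above can also be assembled and sent from Python with a plain socket. The sketch below is only an illustration: build_raw_post and send_raw are hypothetical helper names, the User-Agent and Accept-Encoding lines are left out for brevity (skipping gzip also keeps the reply uncompressed), and Content-Length is computed from the body instead of being typed in. The ^M marks in the dumps are the carriage returns of the CRLF line endings, written here as \r\n.

```python
import socket
import urllib.parse


def build_raw_post(host, path, form, extra_headers=None):
    """Return the bytes of an HTTP/1.1 POST request with a form-encoded body."""
    body = urllib.parse.urlencode(form).encode("ascii")
    headers = [
        ("Host", host),
        ("Content-Length", str(len(body))),  # must match the body exactly
        ("Connection", "close"),
        ("Content-Type", "application/x-www-form-urlencoded"),
        ("Accept", "*/*"),
        ("X-Requested-With", "XMLHttpRequest"),
    ]
    headers.extend((extra_headers or {}).items())
    lines = ["POST %s HTTP/1.1" % path]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    # Header lines end with CRLF; an empty line separates headers from body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def send_raw(host, packet, port=80, timeout=10):
    """Send the packet and return the raw response bytes (needs network access)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(packet)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


FORM = {
    "yui_3_16_0_1_1423700000_000":
        '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
}
PACKET = build_raw_post(
    "www.meituan.com", "/multiact/default/deal/25814805.html", FORM
)
```

send_raw("www.meituan.com", PACKET) would transmit the request. Printing PACKET first makes it easy to compare the headers line by line against a request that is known to work.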