Web crawler - POSTing parameters with Python's requests library does not return the expected result


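import time

import requests
from bs4 import BeautifulSoup

url = ''  # URL elided in the original post
# The header name is 'User-Agent' (hyphen). 'User_Agent' would be sent as a separate
# header, and the default python-requests User-Agent would still be used.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}

def get_data(url, data=None):
    # data: the POSTed form parameters (hypothetical; the original payload was not shown)
    try:
        html = requests.post(url, data=data, headers=headers, timeout=10)
        if html.status_code == 200:
            soup = BeautifulSoup(html.text, 'html.parser')
            return soup
    # requests raises its own exceptions, not urllib.error.HTTPError
    except requests.exceptions.RequestException as e:
        print(url, e, str(time.time()))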



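What came back instead of the expected result is this page (Baidu's "page does not exist" error page):

<!DOCTYPE html>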
<!--STATUS OK--><html>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="always" name="referrer"/>
<body link="#0000cc">
<p class="wrapper_l" id="wrapper">
<p id="head">
<p class="head_wrapper">
<p class="s_form">
<p class="s_form_wrapper">
<a href="/" id="result_logo"><img alt="到百度首页" src="//" title="到百度首页"/></a>
<form action="/s" class="fm" id="form" name="f">
<input name="ie" type="hidden" value="utf-8"/>
<input name="f" type="hidden" value="8"/>
<input name="rsv_bp" type="hidden" value="1"/>
<input name="ch" type="hidden" value=""/>
<input name="tn" type="hidden" value="baiduerr"/>
<input name="bar" type="hidden" value=""/>
<span class="bg s_ipt_wr iptfocus">
<input autocomplete="off" autofocus="" class="s_ipt" id="kw" maxlength="255" name="wd" value=""/>
</span><span class="bg s_btn_wr">
<input class="bg s_btn" id="su" type="submit" value="百度一下"/>
<p class="s_tab" id="s_tab"><b>网页</b><a href=";rn=20&amp;tn=news&amp;word=" wdfield="word">新闻</a><a href=";fr=wwwt" wdfield="kw">贴吧</a><a href=";pn=0&amp;tn=ikaslist&amp;rn=10&amp;word=&amp;fr=wwwt" wdfield="word">知道</a><a href=";ie=utf-8&amp;key=" wdfield="key">音乐</a><a href=";ps=1&amp;ct=201326592&amp;lm=-1&amp;cl=2&amp;nc=1&amp;ie=utf-8&amp;word=" wdfield="word">图片</a><a href=";rn=20&amp;pn=0&amp;db=0&amp;s=25&amp;ie=utf-8&amp;word=" wdfield="word">视频</a><a href=";fr=ps01000" wdfield="word">地图</a><a href=";lm=0&amp;od=0&amp;ie=utf-8" wdfield="word">文库</a><a href="//">更多»</a></p>
<p id="wrapper_wrapper">
<p id="content_left">
<p class="nors">
<p class="norsSuggest">
<h3 class="norsTitle">很抱歉,您要访问的页面不存在!</h3>
<p class="norsTitle2">温馨提示:</p>
<li>如果您不能确认访问的网址,请浏览<a href="//">百度更多</a>页面查看更多网址。</li>
<li>如有任何意见或建议,请及时<a href="">反馈给我们</a>。</li>
<p id="foot">
<span id="help" style="float:left;padding-left:121px">
<a href="" target="_blank">帮助</a>
<a href="" target="_blank">举报</a>
<a href="" target="_blank">给百度提建议</a>
    var bds = {
        util: {}
    var c = document.getElementById('kw').parentNode;

    bds.util.getWinWidth = function(){
        return window.document.documentElement.clientWidth;

    bds.util.setFormWidth = function(){
        var width = bds.util.getWinWidth();
        if(width < 1217)    {bds.util.setClass(c, 'ip_short', 'add')}
        else                {bds.util.setClass(c, 'ip_short', 'remove')};

    bds.util.setClass = function(obj, class_name, set) {
        var ori_class = obj.className,
            has_class_p = -1,
            ori_class_arr = [],
            new_class = '';

        if(ori_class.length) ori_class_arr = ori_class.split(' ');

        for( i in ori_class_arr) {
            if(ori_class_arr[i] == class_name) has_class_p = i;

        if( set == 'remove' && has_class_p >= 0) {
            ori_class_arr.splice(has_class_p, 1);
            new_class = ori_class_arr.join(' ');
            obj.className = new_class;
        } else if( set == 'add' && has_class_p < 0) {
            new_class = ori_class_arr.join(' ');
            obj.className = new_class;

    if (typeof document.addEventListener != "undefined") {
        window.addEventListener('resize', bds.util.setFormWidth, false);
        document.getElementById('kw').addEventListener('focus', function(){bds.util.setClass(c,'iptfocus', 'add');}, false);
        document.getElementById('kw').addEventListener('blur', function(){bds.util.setClass(c,'iptfocus', 'remove');}, false);
    } else {
        window.attachEvent('onresize', bds.util.setFormWidth, false);
        document.getElementById('kw').attachEvent('onfocus', function(){bds.util.setClass(c,'iptfocus', 'add');}, false);
        document.getElementById('kw').attachEvent('onblur', function(){bds.util.setClass(c,'iptfocus', 'remove');}, false);
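
The form in the returned page (`<form action="/s" class="fm" id="form" name="f">`) has no `method` attribute, so a browser submits it with GET and the search box value `wd` ends up in the query string. A minimal sketch of the difference, assuming a placeholder host `https://www.example.com` (hypothetical; the real URL is elided above), prepares a GET and a POST with `requests` without sending either:

```python
import requests

# Hypothetical endpoint for illustration only; the real URL is elided in the post.
SEARCH_URL = "https://www.example.com/s"
params = {"wd": "python"}  # 'wd' is the name of the search box in the form above

# Prepare (but do not send) a GET and a POST carrying the same parameters.
get_req = requests.Request("GET", SEARCH_URL, params=params).prepare()
post_req = requests.Request("POST", SEARCH_URL, data=params).prepare()

print(get_req.url)    # https://www.example.com/s?wd=python
print(get_req.body)   # None: GET carries the parameters in the URL
print(post_req.url)   # https://www.example.com/s
print(post_req.body)  # wd=python: POST carries them in the request body
```

A server that only reads `wd` from the query string never sees it in the POST version. Passing the parameters with `params=` to `requests.get` produces the GET form of the request.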


Posted by 高洛峰, 2812 days ago

Replies (2)

  • 高洛峰, 2017-04-18 09:43:09

    POST won't work here; request this URL with GET instead.

    You can test it from bash like this:

    curl\?wd\=python > python.html

    Then open python.html and take a look.

  • 天蓬老师, 2017-04-18 09:43:09

    What you are describing is Baidu's search suggestion list, and it is fetched with GET requests, not POST:

    >>> r=requests.get('
    >>> r.text
    '{q:"python",p:false,s:["python基础教程","python set","python json","python mysql","python web开发","python requests","python for循环","python3","python环境变量设置","python 多线程"]});'

    The URL you can request directly is
    Just call requests.get on it.

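
The suggestion response in the second answer is a JavaScript object literal rather than strict JSON (the keys `q`, `p` and `s` are unquoted and the text ends with `);`), so calling `json.loads` on the whole text would fail. A minimal sketch, assuming the response keeps the shape shown there, pulls out the `s` array with a regular expression and decodes only that part; `parse_suggestions` is a hypothetical helper name:

```python
import json
import re


def parse_suggestions(text):
    """Return the suggestion list from a response shaped like {q:"...",p:false,s:[...]});

    Assumptions: the list sits under an unquoted `s` key, and no suggestion
    contains a ']' character (the non-greedy match stops at the first one).
    """
    match = re.search(r'[{,]s:(\[.*?\])', text)
    if match is None:
        return []
    # The array itself uses double-quoted strings, so this part is valid JSON.
    return json.loads(match.group(1))


sample = '{q:"python",p:false,s:["python基础教程","python set","python json"]});'
print(parse_suggestions(sample))  # ['python基础教程', 'python set', 'python json']
```

With a live response this would be called as `parse_suggestions(r.text)`.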