python - Using urllib to download the xls file behind a link on a web page, but the downloaded xls is an empty sheet containing only an error message. Please help.

I want to use urllib to download the xls file behind the download link for the Shanghai Stock Exchange stock list (the link marked with a small red box in the original screenshot).

However, the downloaded xls contains nothing but an error message.

How can I download the xls with its actual content?

The code is as follows:

# -*- coding:utf-8 -*-
from urllib import request
from datetime import datetime

# Download link for the stock list.
url = 'http://query.sse.com.cn/security/stock/downloadStockListFile.do?' \
      'csrcCode=&stockCode=&areaName=&stockType=1'

# The header name is 'User-Agent', with no spaces around the hyphen.
myheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13'
                            ' (KHTML, like Gecko) Version/3.1 Safari/525.13'),]

# Install an opener so that urlretrieve sends the headers above.
opener = request.build_opener()
opener.addheaders = myheaders
request.install_opener(opener)

# Save under today's date, e.g. 2017-05-18.xls (no space before the extension).
local = "/Users/Mty/Downloads/data/" + str(datetime.now().date()) + ".xls"

request.urlretrieve(url, local)
阿神 | 2787 days ago | 744

All replies (2)

  • 黄舟, 2017-05-18 10:48:56

    The URL marked with a red line returns the company information. All that remains is to make your request to that URL look like the browser's. The Referer request header must not be omitted, otherwise the server responds with 403.

    Remember to send the same Referer value that the browser sends.

    http://blog.csdn.net/ssshen14...
    This post contains a ready-made solution.
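
A minimal sketch of what this reply describes, using only urllib: send the request with both a User-Agent and a Referer header. The Referer value below is an assumption (the page the download link is believed to sit on), and the names `build_request` and `download` are hypothetical helpers, not part of the original post. Copy the real Referer from the request headers shown in your browser's developer tools.

```python
from urllib import request

# Download URL taken from the question.
URL = ('http://query.sse.com.cn/security/stock/downloadStockListFile.do?'
       'csrcCode=&stockCode=&areaName=&stockType=1')

# Assumption: the page that hosts the download link. Replace it with the
# Referer value your browser actually sends.
REFERER = 'http://www.sse.com.cn/assortment/stock/list/share/'

USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13'
              ' (KHTML, like Gecko) Version/3.1 Safari/525.13')


def build_request(url=URL, referer=REFERER):
    """Return a Request that carries User-Agent and Referer headers."""
    return request.Request(url, headers={
        'User-Agent': USER_AGENT,
        'Referer': referer,
    })


def download(dest, url=URL, referer=REFERER):
    """Fetch url with browser-like headers and write the body to dest."""
    with request.urlopen(build_request(url, referer)) as resp:
        data = resp.read()
    with open(dest, 'wb') as f:
        f.write(data)
    return len(data)  # number of bytes written
```

Calling `download('/Users/Mty/Downloads/data/stocks.xls')` would then save the response body. If the file still contains only an error message, compare the headers sent here with those in the browser's network panel.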

  • 曾经蜡笔没有小新, 2017-05-18 10:48:56

    Check the cookies and the Referer header that the browser sends with the request.
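
If the server also expects cookies, one possible approach is an opener that stores cookies between requests and sends a Referer on every call. This is a sketch under assumptions: `make_session_opener` is a hypothetical helper name, and whether this particular site needs cookies at all is not confirmed in the thread.

```python
from http.cookiejar import CookieJar
from urllib import request


def make_session_opener(referer, user_agent='Mozilla/5.0'):
    """Build an opener that keeps cookies and sends fixed headers.

    Returns the opener together with its cookie jar so the stored
    cookies can be inspected after a request.
    """
    jar = CookieJar()
    opener = request.build_opener(request.HTTPCookieProcessor(jar))
    # These headers are added to every request made through the opener.
    opener.addheaders = [('User-Agent', user_agent), ('Referer', referer)]
    return opener, jar
```

A typical use would be to call `opener.open(...)` on the listing page first, so the jar picks up any cookies the site sets, and then call `opener.open(...)` on the download URL and write the result of `.read()` to a file.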
