python - How do I scrape an AJAX website that has a date picker?

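I need to scrape the real-time water regime data for the Three Gorges Reservoir. The web page lets you pick a date and then displays the water data for that day, but selecting the dates one by one and copying the data is very time-consuming. I need to download the daily data for the Three Gorges Water Conservancy Hub, shown in the figure below, for 2014 through 2016.

The URL is: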
http://www.ctgpc.com.cn/sxjt/...

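Using the browser's built-in inspection tool (right-click, Inspect Element, then the Network tab), I looked up the address of the AJAX API being called. A first analysis shows that the page calls the following URL via AJAX and sends it a date by POST, for example today's date, 2017-02-15: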
http://www.ctgpc.com.cn/eport...

The headers are as follows:

The response is as follows:

I previously found a similar question: https://segmentfault.com/q/10...
but the answers there did not resolve my confusion, so I am asking for help here. Thank you in advance.

伊谢尔伦 2740 days ago 1208

All replies (4)

  • 伊谢尔伦

    伊谢尔伦 2017-04-18 10:21:32

    You can use the requests library to simulate the POST submission. The browser's inspection tool shows that the parameter passed is time, for example time:2017-02-07, so define data = {"time": "2017-02-07"}. Then write a loop that walks through the dates, adding one day on each iteration, and call r = requests.post(url, data=data, headers=headers) for each date. Extract the data from each response and save it to a database. If a sequential loop is too slow, you can add gevent, a coroutine library, to issue the requests concurrently. To capture two years of data, loop about 365*2 times; all of 2014 through 2016 comes to 1,096 days.
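
    As a rough sketch of the loop described above, the following walks every day in a date range and posts it as the time form field. API_URL is a placeholder because the real endpoint is truncated in the question. The helper names iter_days, build_payload, fetch_day, and crawl are hypothetical, and the User-Agent header is an assumption to be replaced with the headers copied from the browser's Network tab.

```python
from datetime import date, timedelta

import requests

# Placeholder: the real AJAX endpoint is truncated in the question.
API_URL = "http://example.invalid/ajax-endpoint"

# Assumption: copy the real request headers from the browser's Network tab.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def iter_days(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def build_payload(day):
    """Build the form data, e.g. {"time": "2017-02-07"}."""
    return {"time": day.isoformat()}


def fetch_day(session, day, url=API_URL):
    """POST one date and return the raw response body as text."""
    response = session.post(
        url, data=build_payload(day), headers=HEADERS, timeout=10
    )
    response.raise_for_status()
    return response.text


def crawl(start, end):
    """Fetch every day in the range and return {iso_date: response_text}."""
    results = {}
    with requests.Session() as session:
        for day in iter_days(start, end):
            results[day.isoformat()] = fetch_day(session, day)
    return results
```

    Calling crawl(date(2014, 1, 1), date(2016, 12, 31)) would issue 1,096 requests, one per day. Pausing briefly between requests is a reasonable courtesy to the server.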

  • 伊谢尔伦

    伊谢尔伦 2017-04-18 10:21:32

    You’ve seen that request with data, so what’s your question?

  • 迷茫

    迷茫 2017-04-18 10:21:32

    Capture the request traffic, then replay it as a POST or GET.
    See the material below:
    Python crawler for associated search terms (video and code)
    https://zhuanlan.zhihu.com/p/...

    Learn Python crawling with Huang Ge: scraping and validating proxy IPs
    https://zhuanlan.zhihu.com/p/...
    Learn Python crawling with Huang Ge: scraping proxy IPs
    https://zhuanlan.zhihu.com/p/...

  • PHP中文网

    PHP中文网 2017-04-18 10:21:32

    You already have the JSON string, so extracting the data from it is the easy part.
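
    Once the JSON string is in hand, one possible way to flatten it is sketched below: parse it with the standard json module and write the records to CSV. The response layout is not shown in the question, so the sample body and its station and level fields are invented for illustration, as are the helper names rows_from_response and write_csv.

```python
import csv
import io
import json


def rows_from_response(day, body):
    """Turn one JSON response body into flat dict rows tagged with the date."""
    payload = json.loads(body)
    # Assumption: the payload is either a list of records or a single record.
    records = payload if isinstance(payload, list) else [payload]
    return [{"date": day, **rec} for rec in records if isinstance(rec, dict)]


def write_csv(rows, out):
    """Write rows to a file-like object; the header is the union of all keys."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical response body, for illustration only.
sample = '[{"station": "A", "level": 1.5}]'
rows = rows_from_response("2017-02-15", sample)

buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

    To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="", encoding="utf-8") as the out argument.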
