
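Python crawler - how to scrape links from a web page that uses Ajax

I am using Python to scrape images from a web page and want to get the links to the .jpg files, but the HTML I fetch does not contain them. My code is below: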
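import requests
import bs4

# The part after "#" is a client-side fragment; it is not sent to the server.
url = "http://tw.ikanman.com/comic/8928/87948.html#p=8"

response = requests.get(url)
response.encoding = 'utf-8'
# Parse the HTML exactly as the server returned it (no JavaScript is executed).
soup = bs4.BeautifulSoup(response.text, 'lxml')
print(soup)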

大家讲道理 2899 days ago 621

Replies (2)

  • PHPz

    PHPz 2017-04-18 09:56:47

    This page does not use Ajax. The image URL is produced by JavaScript that the site has encrypted; it can be decrypted, but doing so is inconvenient. The easier route is to drive a real browser with Selenium and read the image URL from the rendered page. Tutorials are easy to find, including on this site.
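
The Selenium suggestion above could look roughly like the sketch below. It is not the answerer's code: the function names `pick_jpg_urls` and `rendered_image_srcs` are made up for illustration, and it assumes that Chrome with a matching driver is installed and that the comic image ends up as an ordinary `<img>` element in the rendered page, which the thread does not confirm.

```python
from urllib.parse import urlparse


def pick_jpg_urls(srcs):
    """Keep only the sources whose URL path ends in .jpg or .jpeg."""
    picked = []
    for src in srcs:
        if not src:
            continue  # <img> elements without a src attribute yield None
        path = urlparse(src).path.lower()
        if path.endswith((".jpg", ".jpeg")):
            picked.append(src)
    return picked


def rendered_image_srcs(page_url, timeout=10):
    """Load page_url in a real browser and return the src of every <img>."""
    # Imported here so pick_jpg_urls stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(page_url)
        # Assumption: the image is a plain <img>; wait until at least one exists.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "img"))
        )
        images = driver.find_elements(By.TAG_NAME, "img")
        return [img.get_attribute("src") for img in images]
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Calling `pick_jpg_urls(rendered_image_srcs("http://tw.ikanman.com/comic/8928/87948.html#p=8"))` would then return whatever .jpg sources the browser ended up with after the page's scripts ran. Filtering by file extension is a simplification; a selector specific to the page would be more precise if its markup is known.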

  • ringa_lee

    ringa_lee 2017-04-18 09:56:47

    Request headers:

    GET /ps4/g/%E5%8F%A4%E6%83%91%E4%BB%94[%E7%89%9B%E4%BD%AC]/Vol_002/iieye0013-16663.jpg HTTP/1.1
    Host: i.hamreus.com:8080
    Connection: keep-alive
    Pragma: no-cache
    Cache-Control: no-cache
    User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36
    Accept: image/webp,image/*,*/*;q=0.8
    Referer: http://tw.ikanman.com/comic/8928/87948.html
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: zh-CN,zh;q=0.8
    

    Response headers:

    HTTP/1.1 302 Moved Temporarily
    Server: nginx/1.10.0 (Ubuntu)
    Date: Fri, 11 Nov 2016 03:23:15 GMT
    Content-Type: text/html
    Content-Length: 170
    Connection: keep-alive
    Location: http://p.yogajx.com/ps4/g/%E5%8F%A4%E6%83%91%E4%BB%94[%E7%89%9B%E4%BD%AC]/Vol_002/iieye0013-16663.jpg
    

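    In other words, the image request is answered with a 302 redirect: the server at i.hamreus.com:8080 does not return the image itself but points the client to the real file through the Location header. Note that the browser's request also carries a Referer header naming the comic page. Look up HTTP 302 redirects for the details.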
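
To see the redirect from Python, one option is to send the image request with redirects disabled and read the Location header yourself. The sketch below is an illustration only: `image_headers` and `redirect_target` are hypothetical names, the User-Agent value is a placeholder, and whether the server actually requires the Referer header is an assumption that the thread does not confirm.

```python
def image_headers(page_url):
    """Headers for the image request; Referer mirrors what the browser sent."""
    return {
        "User-Agent": "Mozilla/5.0",  # placeholder value
        # Browsers leave the #fragment out of the Referer header.
        "Referer": page_url.split("#", 1)[0],
    }


def redirect_target(image_url, page_url, session=None):
    """Request image_url without following redirects; return Location or None."""
    if session is None:
        import requests  # imported lazily so a stub session can be passed in
        session = requests.Session()
    resp = session.get(
        image_url,
        headers=image_headers(page_url),
        allow_redirects=False,  # keep the 3xx response instead of following it
        timeout=10,
    )
    if resp.status_code in (301, 302, 303, 307, 308):
        return resp.headers.get("Location")
    return None  # not a redirect: the response body is the final content
```

By default `requests.get` follows redirects on its own, so when only the image bytes are needed it is enough to drop `allow_redirects=False` and read `resp.content`; disabling redirects is useful mainly for inspecting where the server sends the client.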
