
python - A small question about re (beginner here, please go easy)

I'm trying to scrape the image share URL from an Instagram page so that I can download the image.

# -*- coding: utf-8 -*-
import urllib2
import re

response = urllib2.urlopen('https://www.instagram.com/p/BG5SpsYuSr-/')
html = response.read()  
#print html

catch = re.compile(r'//*[display_src="(.+?\.jpg)"]')
urls = re.findall(catch,html)
for i, url in enumerate(urls):
    print url
    

Looking at the page source, I found that the image URL appears in these two places.

How can I extract the image's download URL?

天蓬老师 · 2711 days ago · 317

All replies (2)

  • PHPz

    PHPz 2017-04-17 17:58:47

    from pyquery import PyQuery as Q
    import urllib2
    
    response = urllib2.urlopen('https://www.instagram.com/p/BG5SpsYuSr-/')
    html = response.read()
    print Q(html).find('meta[property="og:image"]').attr('content')
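If you would rather stay with `re`, as in the question, the same `og:image` meta tag can probably be matched directly. This is only a minimal sketch: the sample HTML and its URL are made up for illustration, `find_og_image` is a hypothetical helper name, and the pattern assumes the `property` attribute comes before `content` inside the tag, which the real page may not guarantee.

```python
import re

# Hypothetical page fragment (not the real Instagram markup).
sample_html = '''
<head>
<meta property="og:title" content="Some post" />
<meta property="og:image" content="https://example.com/photos/abc123.jpg" />
</head>
'''

# Capture the content attribute of the og:image meta tag.
# Assumption: property="..." appears before content="..." in the tag.
OG_IMAGE_RE = re.compile(r'<meta\s+property="og:image"\s+content="([^"]+)"')


def find_og_image(html):
    """Return the og:image URL, or None if the tag is not found."""
    match = OG_IMAGE_RE.search(html)
    return match.group(1) if match else None


image_url = find_og_image(sample_html)
print(image_url)
```

For a real page, pass the `html` string read from the response instead of `sample_html`. An HTML parser such as pyquery is more tolerant of attribute order and spacing than a regex.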

  • 黄舟

    黄舟 2017-04-17 17:58:47

    As the second screenshot shows, the image URL sits inside a JS object. In my experience, that usually means the image is inserted by JavaScript. I can't access the target site, so I can't check what the page actually looks like.
    You can try the regular expression below to extract the JS object, parse it as JSON, and then read the data you want the same way you would from a dictionary.

    <script type="text/javascript">[\w ]+=([\s\S]+?);</script>
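A rough sketch of that extract, parse, and look up sequence, using the regular expression above unchanged. The sample HTML, the variable name `pageData`, the `media` key, and the helper name `extract_js_object` are all invented for illustration; only `display_src` comes from the question, and the real page's object will be nested differently.

```python
import json
import re

# Hypothetical page fragment: variable name and structure are made up.
sample_html = '''
<script type="text/javascript">var pageData = {"media": {"display_src": "https://example.com/photos/abc123.jpg", "id": "1"}};</script>
'''

# The pattern from the answer: capture everything between "name =" and ";</script>".
JS_OBJECT_RE = re.compile(
    r'<script type="text/javascript">[\w ]+=([\s\S]+?);</script>'
)


def extract_js_object(html):
    """Return the first assigned JS object parsed as a dict, or None."""
    match = JS_OBJECT_RE.search(html)
    if match is None:
        return None
    # json.loads only works if the object literal is also valid JSON.
    return json.loads(match.group(1))


data = extract_js_object(sample_html)
print(data["media"]["display_src"])
```

Two caveats. `[\w ]+` does not match a dot, so if the script assigns to a dotted name the character class would need a `.` added. And `json.loads` fails on JS object literals that are not strict JSON, for example ones with unquoted keys or single-quoted strings.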
