

python - How should images be handled when crawling?

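As the title says: when crawling news articles, for example, and an article contains images, how should the images be handled? What if there are several images?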

Something like:

     [text]
     [image]
     [text]

or

     [text]
     [image]
     [text]
     [image]
     [text]

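Should the images be downloaded locally, or should the website's own image sources be used directly? And if they are downloaded locally, how should the text content be handled?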


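Thanks for all the answers. What I actually meant to ask is how to keep the images in their original positions. For example, in Scrapy you can use

p.xpath('p/text()').extract()

to get the text content, and

p.xpath('p/img/@src').extract()

to locate the images. How, then, can I make sure the images end up in the same positions as in the original?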
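Two separate queries return the text and the image URLs as two independent lists, so the interleaving between them is lost. One way to keep it is to walk the article body once, in document order, and append text and images to a single list. The sketch below uses the standard library's `html.parser` so it runs without Scrapy; the set of tags treated as paragraph breaks, the `("text", ...)` / `("image", src)` tuple format, and the names `ArticleBlocks` and `extract_blocks` are assumptions for illustration. With Scrapy's XPath selectors the same idea can be expressed as one union query such as `p/text() | p/img/@src`, whose results should come back in document order.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumption; extend as needed).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleBlocks(HTMLParser):
    """Walk an article body once and record ("text", str) and ("image", src)
    items in document order, so each image stays between the paragraphs
    it originally sat between."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip = 0      # depth inside <script>/<style>, whose text is not content
        self._break = True  # True when the next text run should start a new block

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.blocks.append(("image", src))
        if tag in BLOCK_TAGS:
            self._break = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        if tag in BLOCK_TAGS:
            self._break = True

    def handle_data(self, data):
        if self._skip:
            return
        text = " ".join(data.split())  # collapse whitespace and newlines
        if not text:
            return
        if self.blocks and self.blocks[-1][0] == "text" and not self._break:
            # Inline markup such as <b> splits one paragraph into several
            # data events; join them back into a single text block.
            self.blocks[-1] = ("text", self.blocks[-1][1] + " " + text)
        else:
            self.blocks.append(("text", text))
        self._break = False


def extract_blocks(html: str) -> list:
    """Return the article as an ordered list of text and image items."""
    parser = ArticleBlocks()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Each item can be stored in order (for example as a list field on the scraped item) and rendered back later, with each image placed exactly where its tuple appears.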

Asked by PHPz, 2889 days ago

Replies (6)

  • ringa_lee, 2017-04-17 17:55:05

    If you have no need to archive the images (for example, because you worry that the site may shut down or the original images may disappear), you can use the site's image URLs directly. You avoid storage and management overhead, you are not re-hosting copyrighted images, and it is also the easiest approach to implement.

  • 黄舟, 2017-04-17 17:55:05

    If you can link to the images externally, do so, but watch out for anti-hotlinking protection. The safest option is to download them locally.

  • ringa_lee, 2017-04-17 17:55:05

    You can use BeautifulSoup (bs4) or XPath to select the corresponding nodes and extract whatever you need.

  • 迷茫, 2017-04-17 17:55:05

    Download the images locally, then replace each src in the page with the local relative path.

  • ringa_lee, 2017-04-17 17:55:05

    News sites? Portal sites almost all have anti-hotlinking protection.

    It is better to download the images locally first, sending a spoofed Referer header, and then replace the image addresses in the article text with the local addresses.

  • 巴扎黑, 2017-04-17 17:55:05

    Read http://blog.csdn.net/qq_34844199/article/details/51468841 and everything should become clear.

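The replies point to one pipeline: resolve each image's src against the article URL, download it with the article page as Referer (for sites with anti-hotlinking protection), save it under a stable local name, and rewrite the src in the saved HTML to the local relative path. The sketch below strings those steps together using only the standard library. The regex for `<img>` tags, the md5-based file names, the `images` folder, the `.jpg` fallback extension, and the User-Agent string are illustrative assumptions, and all function names are hypothetical. In Scrapy, the same Referer trick can be applied by passing `headers={'Referer': ...}` to `scrapy.Request`.

```python
import hashlib
import os
import re
import urllib.request
from urllib.parse import urljoin, urlsplit

# Browser-like User-Agent; the exact string is an arbitrary example.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Rough pattern for <img ... src="..."> with single or double quotes.
# A regex is fragile on messy HTML; an HTML parser is more robust.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def local_filename(img_url: str) -> str:
    """Stable file name: md5 of the absolute URL plus the original extension."""
    ext = os.path.splitext(urlsplit(img_url).path)[1] or ".jpg"  # assumed fallback
    return hashlib.md5(img_url.encode("utf-8")).hexdigest() + ext


def image_request(img_url: str, article_url: str) -> urllib.request.Request:
    """Build a request that names the article page as Referer, which
    many anti-hotlinking checks require before serving the image."""
    return urllib.request.Request(
        img_url, headers={"Referer": article_url, "User-Agent": USER_AGENT}
    )


def download(img_url: str, article_url: str, images_dir: str) -> str:
    """Save one image into images_dir and return its path relative to the
    directory containing images_dir (where the HTML file is assumed to live)."""
    os.makedirs(images_dir, exist_ok=True)
    name = local_filename(img_url)
    with urllib.request.urlopen(image_request(img_url, article_url), timeout=10) as resp:
        data = resp.read()
    with open(os.path.join(images_dir, name), "wb") as fh:
        fh.write(data)
    return f"{os.path.basename(os.path.normpath(images_dir))}/{name}"


def rewrite_srcs(html: str, mapping: dict) -> str:
    """Replace each <img> src that appears in mapping; leave the rest unchanged."""
    def repl(m):
        new_src = mapping.get(m.group(3), m.group(3))
        return f"{m.group(1)}{m.group(2)}{new_src}{m.group(2)}"
    return IMG_SRC.sub(repl, html)


def localize_images(html: str, article_url: str, images_dir: str = "images") -> str:
    """Download every image referenced in html and point its src at the local copy."""
    mapping = {}
    for m in IMG_SRC.finditer(html):
        src = m.group(3)
        if src not in mapping:
            # Relative srcs ("../img/a.jpg", "//cdn/...") must be resolved first.
            mapping[src] = download(urljoin(article_url, src), article_url, images_dir)
    return rewrite_srcs(html, mapping)
```

Because the src values are rewritten inside the original HTML, the text and images stay in their original order; only the image addresses change.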