search
HomeBackend DevelopmentPython TutorialDetailed example of Python implementation of simple web page image capture

This article mainly introduces the complete code example of simple web page image grabbing in Python. It has certain reference value and friends in need can refer to it.

The steps to use python to capture network images are:
1. Obtain the web page source code according to the given URL
2. Use regular expressions to filter out the image addresses in the source code
3. Download network images based on the filtered image address

The following is a relatively simple implementation of grabbing images from a certain Baidu Tieba webpage:


# -*- coding: utf-8 -*- 
# feimengjuan 
import re 
import urllib 
import urllib2 
#抓取网页图片  
#根据给定的网址来获取网页详细信息,得到的html就是网页的源代码 
def getHtml(url): 
  page = urllib.urlopen(url) 
  html = page.read() 
  return html 
 def getImg(html): 
  #利用正则表达式把源代码中的图片地址过滤出来 
  reg = r'src="(.+?\.jpg)" pic_ext' 
  imgre = re.compile(reg) 
  imglist = imgre.findall(html) #表示在整个网页中过滤出所有图片的地址,放在imglist中 
  x = 0 
  for imgurl in imglist: 
    urllib.urlretrieve(imgurl,'%s.jpg' %x) #打开imglist中保存的图片网址,并下载图片保存在本地 
    x = x + 1 
 html = getHtml("http://tieba.baidu.com/p/2460150866")#获取该网址网页详细信息,得到的html就是网页的源代码 
getImg(html)#从网页源代码中分析并下载保存图片

Further organized the code and created a "pictures" folder locally to save the pictures


# -*- coding: utf-8 -*- 
# feimengjuan 
import re 
import urllib 
import urllib2 
import os 
#抓取网页图片  
#根据给定的网址来获取网页详细信息,得到的html就是网页的源代码 
def getHtml(url): 
  page = urllib.urlopen(url) 
  html = page.read() 
  return html 
 
#创建保存图片的文件夹 
def mkdir(path): 
  path = path.strip() 
  # 判断路径是否存在 
  # 存在  True 
  # 不存在 Flase 
  isExists = os.path.exists(path) 
  if not isExists: 
    print u'新建了名字叫做',path,u'的文件夹' 
    # 创建目录操作函数 
    os.makedirs(path) 
    return True 
  else: 
    # 如果目录存在则不创建,并提示目录已经存在 
    print u'名为',path,u'的文件夹已经创建成功' 
    return False 
# 输入文件名,保存多张图片 
def saveImages(imglist,name): 
  number = 1 
  for imageURL in imglist: 
    splitPath = imageURL.split('.') 
    fTail = splitPath.pop() 
    if len(fTail) > 3: 
      fTail = 'jpg' 
    fileName = name + "/" + str(number) + "." + fTail 
    # 对于每张图片地址,进行保存 
    try: 
      u = urllib2.urlopen(imageURL) 
      data = u.read() 
      f = open(fileName,'wb+') 
      f.write(data) 
      print u'正在保存的一张图片为',fileName 
      f.close() 
    except urllib2.URLError as e: 
      print (e.reason) 
    number += 1  
#获取网页中所有图片的地址 
def getAllImg(html): 
  #利用正则表达式把源代码中的图片地址过滤出来 
  reg = r'src="(.+?\.jpg)" pic_ext' 
  imgre = re.compile(reg) 
  imglist = imgre.findall(html) #表示在整个网页中过滤出所有图片的地址,放在imglist中 
  return imglist   
#创建本地保存文件夹,并下载保存图片 
if __name__ == '__main__': 
  html = getHtml("http://tieba.baidu.com/p/2460150866")#获取该网址网页详细信息,得到的html就是网页的源代码 
  path = u'图片' 
  mkdir(path) #创建本地文件夹 
  imglist = getAllImg(html) #获取图片的地址列表 
  saveImages(imglist,path) # 保存图片

As a result, dozens of files were saved in the "pictures" folder Pictures, such as screenshots:

Related recommendations:

PHP simple web page image grabbing class

How to capture the images here to local

Example of PHP+Ajax remote image grabber download

The above is the detailed content of Detailed example of Python implementation of simple web page image capture. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
How do you append elements to a Python array?How do you append elements to a Python array?Apr 30, 2025 am 12:19 AM

InPython,youappendelementstoalistusingtheappend()method.1)Useappend()forsingleelements:my_list.append(4).2)Useextend()or =formultipleelements:my_list.extend(another_list)ormy_list =[4,5,6].3)Useinsert()forspecificpositions:my_list.insert(1,5).Beaware

How do you debug shebang-related issues?How do you debug shebang-related issues?Apr 30, 2025 am 12:17 AM

The methods to debug the shebang problem include: 1. Check the shebang line to make sure it is the first line of the script and there are no prefixed spaces; 2. Verify whether the interpreter path is correct; 3. Call the interpreter directly to run the script to isolate the shebang problem; 4. Use strace or trusts to track the system calls; 5. Check the impact of environment variables on shebang.

How do you remove elements from a Python array?How do you remove elements from a Python array?Apr 30, 2025 am 12:16 AM

Pythonlistscanbemanipulatedusingseveralmethodstoremoveelements:1)Theremove()methodremovesthefirstoccurrenceofaspecifiedvalue.2)Thepop()methodremovesandreturnsanelementatagivenindex.3)Thedelstatementcanremoveanitemorslicebyindex.4)Listcomprehensionscr

What data types can be stored in a Python list?What data types can be stored in a Python list?Apr 30, 2025 am 12:07 AM

Pythonlistscanstoreanydatatype,includingintegers,strings,floats,booleans,otherlists,anddictionaries.Thisversatilityallowsformixed-typelists,whichcanbemanagedeffectivelyusingtypechecks,typehints,andspecializedlibrarieslikenumpyforperformance.Documenti

What are some common operations that can be performed on Python lists?What are some common operations that can be performed on Python lists?Apr 30, 2025 am 12:01 AM

Pythonlistssupportnumerousoperations:1)Addingelementswithappend(),extend(),andinsert().2)Removingitemsusingremove(),pop(),andclear().3)Accessingandmodifyingwithindexingandslicing.4)Searchingandsortingwithindex(),sort(),andreverse().5)Advancedoperatio

How do you create multi-dimensional arrays using NumPy?How do you create multi-dimensional arrays using NumPy?Apr 29, 2025 am 12:27 AM

Create multi-dimensional arrays with NumPy can be achieved through the following steps: 1) Use the numpy.array() function to create an array, such as np.array([[1,2,3],[4,5,6]]) to create a 2D array; 2) Use np.zeros(), np.ones(), np.random.random() and other functions to create an array filled with specific values; 3) Understand the shape and size properties of the array to ensure that the length of the sub-array is consistent and avoid errors; 4) Use the np.reshape() function to change the shape of the array; 5) Pay attention to memory usage to ensure that the code is clear and efficient.

Explain the concept of 'broadcasting' in NumPy arrays.Explain the concept of 'broadcasting' in NumPy arrays.Apr 29, 2025 am 12:23 AM

BroadcastinginNumPyisamethodtoperformoperationsonarraysofdifferentshapesbyautomaticallyaligningthem.Itsimplifiescode,enhancesreadability,andboostsperformance.Here'showitworks:1)Smallerarraysarepaddedwithonestomatchdimensions.2)Compatibledimensionsare

Explain how to choose between lists, array.array, and NumPy arrays for data storage.Explain how to choose between lists, array.array, and NumPy arrays for data storage.Apr 29, 2025 am 12:20 AM

ForPythondatastorage,chooselistsforflexibilitywithmixeddatatypes,array.arrayformemory-efficienthomogeneousnumericaldata,andNumPyarraysforadvancednumericalcomputing.Listsareversatilebutlessefficientforlargenumericaldatasets;array.arrayoffersamiddlegro

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.