Detailed example of Python implementation of simple web page image capture-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

Detailed example of Python implementation of simple web page image capture

小云云

Dec 18, 2017 am 10:52 AM

pythonaccomplishSimple

This article mainly introduces the complete code example of simple web page image grabbing in Python. It has certain reference value and friends in need can refer to it.

The steps to use python to capture network images are:
1. Obtain the web page source code according to the given URL
2. Use regular expressions to filter out the image addresses in the source code
3. Download network images based on the filtered image address

The following is a relatively simple implementation of grabbing images from a certain Baidu Tieba webpage:

# -*- coding: utf-8 -*- 
# feimengjuan 
import re 
import urllib 
import urllib2 
#抓取网页图片  
#根据给定的网址来获取网页详细信息，得到的html就是网页的源代码 
def getHtml(url): 
  page = urllib.urlopen(url) 
  html = page.read() 
  return html 
 def getImg(html): 
  #利用正则表达式把源代码中的图片地址过滤出来 
  reg = r&#39;src="(.+?\.jpg)" pic_ext&#39; 
  imgre = re.compile(reg) 
  imglist = imgre.findall(html) #表示在整个网页中过滤出所有图片的地址，放在imglist中 
  x = 0 
  for imgurl in imglist: 
    urllib.urlretrieve(imgurl,&#39;%s.jpg&#39; %x) #打开imglist中保存的图片网址，并下载图片保存在本地 
    x = x + 1 
 html = getHtml("http://tieba.baidu.com/p/2460150866")#获取该网址网页详细信息，得到的html就是网页的源代码 
getImg(html)#从网页源代码中分析并下载保存图片

Further organized the code and created a "pictures" folder locally to save the pictures

# -*- coding: utf-8 -*- 
# feimengjuan 
import re 
import urllib 
import urllib2 
import os 
#抓取网页图片  
#根据给定的网址来获取网页详细信息，得到的html就是网页的源代码 
def getHtml(url): 
  page = urllib.urlopen(url) 
  html = page.read() 
  return html 
 
#创建保存图片的文件夹 
def mkdir(path): 
  path = path.strip() 
  # 判断路径是否存在 
  # 存在  True 
  # 不存在 Flase 
  isExists = os.path.exists(path) 
  if not isExists: 
    print u&#39;新建了名字叫做&#39;,path,u&#39;的文件夹&#39; 
    # 创建目录操作函数 
    os.makedirs(path) 
    return True 
  else: 
    # 如果目录存在则不创建，并提示目录已经存在 
    print u&#39;名为&#39;,path,u&#39;的文件夹已经创建成功&#39; 
    return False 
# 输入文件名，保存多张图片 
def saveImages(imglist,name): 
  number = 1 
  for imageURL in imglist: 
    splitPath = imageURL.split(&#39;.&#39;) 
    fTail = splitPath.pop() 
    if len(fTail) > 3: 
      fTail = &#39;jpg&#39; 
    fileName = name + "/" + str(number) + "." + fTail 
    # 对于每张图片地址，进行保存 
    try: 
      u = urllib2.urlopen(imageURL) 
      data = u.read() 
      f = open(fileName,&#39;wb+&#39;) 
      f.write(data) 
      print u&#39;正在保存的一张图片为&#39;,fileName 
      f.close() 
    except urllib2.URLError as e: 
      print (e.reason) 
    number += 1  
#获取网页中所有图片的地址 
def getAllImg(html): 
  #利用正则表达式把源代码中的图片地址过滤出来 
  reg = r&#39;src="(.+?\.jpg)" pic_ext&#39; 
  imgre = re.compile(reg) 
  imglist = imgre.findall(html) #表示在整个网页中过滤出所有图片的地址，放在imglist中 
  return imglist   
#创建本地保存文件夹，并下载保存图片 
if __name__ == &#39;__main__&#39;: 
  html = getHtml("http://tieba.baidu.com/p/2460150866")#获取该网址网页详细信息，得到的html就是网页的源代码 
  path = u&#39;图片&#39; 
  mkdir(path) #创建本地文件夹 
  imglist = getAllImg(html) #获取图片的地址列表 
  saveImages(imglist,path) # 保存图片

As a result, dozens of files were saved in the "pictures" folder Pictures, such as screenshots:

Related recommendations:

PHP simple web page image grabbing class

How to capture the images here to local

Example of PHP+Ajax remote image grabber download

The above is the detailed content of Detailed example of Python implementation of simple web page image capture. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

How do you append elements to a Python array?Apr 30, 2025 am 12:19 AM

InPython,youappendelementstoalistusingtheappend()method.1)Useappend()forsingleelements:my_list.append(4).2)Useextend()or =formultipleelements:my_list.extend(another_list)ormy_list =[4,5,6].3)Useinsert()forspecificpositions:my_list.insert(1,5).Beaware

How do you debug shebang-related issues?Apr 30, 2025 am 12:17 AM

The methods to debug the shebang problem include: 1. Check the shebang line to make sure it is the first line of the script and there are no prefixed spaces; 2. Verify whether the interpreter path is correct; 3. Call the interpreter directly to run the script to isolate the shebang problem; 4. Use strace or trusts to track the system calls; 5. Check the impact of environment variables on shebang.

How do you remove elements from a Python array?Apr 30, 2025 am 12:16 AM

Pythonlistscanbemanipulatedusingseveralmethodstoremoveelements:1)Theremove()methodremovesthefirstoccurrenceofaspecifiedvalue.2)Thepop()methodremovesandreturnsanelementatagivenindex.3)Thedelstatementcanremoveanitemorslicebyindex.4)Listcomprehensionscr

What data types can be stored in a Python list?Apr 30, 2025 am 12:07 AM

Pythonlistscanstoreanydatatype,includingintegers,strings,floats,booleans,otherlists,anddictionaries.Thisversatilityallowsformixed-typelists,whichcanbemanagedeffectivelyusingtypechecks,typehints,andspecializedlibrarieslikenumpyforperformance.Documenti

What are some common operations that can be performed on Python lists?Apr 30, 2025 am 12:01 AM

Pythonlistssupportnumerousoperations:1)Addingelementswithappend(),extend(),andinsert().2)Removingitemsusingremove(),pop(),andclear().3)Accessingandmodifyingwithindexingandslicing.4)Searchingandsortingwithindex(),sort(),andreverse().5)Advancedoperatio

How do you create multi-dimensional arrays using NumPy?Apr 29, 2025 am 12:27 AM

Create multi-dimensional arrays with NumPy can be achieved through the following steps: 1) Use the numpy.array() function to create an array, such as np.array([[1,2,3],[4,5,6]]) to create a 2D array; 2) Use np.zeros(), np.ones(), np.random.random() and other functions to create an array filled with specific values; 3) Understand the shape and size properties of the array to ensure that the length of the sub-array is consistent and avoid errors; 4) Use the np.reshape() function to change the shape of the array; 5) Pay attention to memory usage to ensure that the code is clear and efficient.

Explain the concept of 'broadcasting' in NumPy arrays.Apr 29, 2025 am 12:23 AM

BroadcastinginNumPyisamethodtoperformoperationsonarraysofdifferentshapesbyautomaticallyaligningthem.Itsimplifiescode,enhancesreadability,andboostsperformance.Here'showitworks:1)Smallerarraysarepaddedwithonestomatchdimensions.2)Compatibledimensionsare

Explain how to choose between lists, array.array, and NumPy arrays for data storage.Apr 29, 2025 am 12:20 AM

ForPythondatastorage,chooselistsforflexibilitywithmixeddatatypes,array.arrayformemory-efficienthomogeneousnumericaldata,andNumPyarraysforadvancednumericalcomputing.Listsareversatilebutlessefficientforlargenumericaldatasets;array.arrayoffersamiddlegro

See all articles