The "girl pictures" (ooxx) section on jandan.net features high-quality photos. This article shares a method for batch-downloading these pictures with Python.
Knowledge and tools you need:

1. Required: basic Python syntax. For this article you only need to know how to work with lists, for...in loops, and how to define functions; the functions for fetching pages, parsing, and saving files can be learned as you use them.

2. Required: the third-party library BeautifulSoup4. Installing it with pip is very convenient, and recent versions of Python ship with pip. On Windows, press the Windows+X shortcut, open Command Prompt (Admin), and enter

pip install beautifulsoup4

then press Enter. A message like "Successfully installed ..." means the installation is complete. (If you don't have pip, search for "how to install pip".)

No HTML knowledge is required, but you do need a browser that can view page source and inspect elements, such as Chrome or Firefox.

To download all the images on more than two thousand pages, you must first learn to download a single page. :)

1. Download the web page
The practice URL is jandan.net/ooxx/page-2397#comments. Open it in Chrome or Firefox, then right-click and choose "View page source". The pages we see are what the browser renders after parsing source written in HTML, JS, CSS, and so on. The image addresses live in that source, so the first step is to download the HTML code.
import urllib.request
url = 'http://jandan.net/ooxx/page-2397#comments'
res = urllib.request.urlopen(url)

What does urllib.request.urlopen() do? As its name suggests, it opens a URL. It accepts either a str (what we pass here) or a Request object. Its return value is always an object that can work as a context manager and has methods of its own such as geturl(), info(), and getcode(). In fact we don't need to worry about most of that; just remember that this function takes a URL and returns an object containing everything about that URL, and we operate on that object.
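To make the Request-object option just mentioned concrete, here is a minimal sketch. The URL is the one from this article; the User-Agent header is an illustrative addition (some sites reject requests that lack a browser-like one), not something the article requires.

```python
import urllib.request

# Build a Request object instead of passing a plain str to urlopen().
# The User-Agent header here is illustrative, not required by the article.
req = urllib.request.Request(
    'http://jandan.net/ooxx/page-2397#comments',
    headers={'User-Agent': 'Mozilla/5.0'},
)

print(req.full_url)                  # the URL this request targets
print(req.get_header('User-agent'))  # header names are stored with normalized case
```

urlopen(req) would then fetch the page exactly as urlopen(url) does.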
Now read the HTML code out of the res object and assign it to html, using the res.read() method:

html = res.read()

Try print(html):

# (output truncated)
At this point you will find the result differs from what "View page source" shows. That is because the return value of read() is of type bytes (the output starts with b'...'). We could parse that value directly and still get the image addresses, but if you want the same HTML code you see in the browser, change the previous line to
html = res.read().decode('utf-8')
Then print(html):

# (output truncated)
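The bytes-versus-str distinction can be illustrated without touching the network; the snippet below is a made-up fragment shaped like the page's markup, not actual site output.

```python
# read() returns bytes; decode('utf-8') turns them into a str.
raw = '<img src="//wx2.sinaimg.cn/mw600/example.jpg" />'.encode('utf-8')

text = raw.decode('utf-8')

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
print('src="//' in text)    # True: the decoded form searches like normal text
```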
OK, now it is the same. That is because decode('utf-8') decodes read()'s bytes return value as UTF-8 text. Still, we will keep html = res.read(), since the bytes already contain the information we need. So far, only four lines of Python have downloaded the HTML of http://jandan.net/ooxx/page-2397#comments into the variable html:

import urllib.request

#Download the web page
url = 'http://jandan.net/ooxx/page-2397#comments'
res = urllib.request.urlopen(url)
html = res.read()

2. Parse out the addresses

Next, use BeautifulSoup4 to parse the HTML. How do you find the HTML code that corresponds to a particular picture? Right-click the page and choose Inspect. The left half of the screen then shows the original page, and the right half shows the HTML code plus a set of tool buttons.
The src="//wx2.sinaimg.cn/mw600/66b3de17gy1fdrf0wcuscj20p60zktad.jpg" part is the address of the picture; src means source. The style attribute after src controls presentation; don't worry about it. You can try it now: add http: in front of the src value and visit http://wx2.sinaimg.cn/mw600/66b3de17gy1fdrf0wcuscj20p60zktad.jpg, and you should see the original picture.

Attributes such as src and max-width work like key-value pairs, which is relevant to the method used later to extract the image addresses.

Look at the code for the other pictures and you will see the format is the same: they are all contained in <img> tags.
soup = BeautifulSoup(html,'html.parser')

This line parses html into a soup object that is easy to operate on. For example, extract just the 'img' elements:

result = soup.find_all('img')

This uses the find_all() method. print(result) shows that the result is a list, and each element carries the src-address key-value pair along with other markup we don't need.
Use the get() method to extract the address inside the double quotes, and add http: at the beginning:

links=[]
for content in result:
    links.append('http:'+content.get('src'))

content.get('src') fetches the value for the key src in content, that is, the address inside the double quotes. links.append() is the usual way to add an element to a list.
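For comparison, the same src extraction can be done with the standard library's html.parser instead of BeautifulSoup. This is only a sketch on a made-up HTML fragment; ImgSrcCollector is a hypothetical helper name, not part of any library.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, matching the
        # key-value view of src described above.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.links.append('http:' + value)

# A made-up fragment shaped like the page's <img> tags.
snippet = ('<p><img src="//wx2.sinaimg.cn/mw600/a.jpg" />'
           '<img src="//wx2.sinaimg.cn/mw600/b.jpg" /></p>')
collector = ImgSrcCollector()
collector.feed(snippet)
print(collector.links)
# ['http://wx2.sinaimg.cn/mw600/a.jpg', 'http://wx2.sinaimg.cn/mw600/b.jpg']
```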
print(links) shows that every element of this list is an original image address, like this:

# (output truncated)

Open any of these addresses in a browser and you will see the corresponding picture. That means only the final step remains: download them!
The address-extraction part is complete, and the code is quite concise:
#Parse the web page
from bs4 import BeautifulSoup
soup = BeautifulSoup(html,'html.parser')
result = soup.find_all('img')
links=[]
for content in result:
    links.append('http:'+content.get('src'))
3. Download the pictures

The last step: visit the addresses in links one by one and download the pictures!
At the beginning, add

import os

First create a photo folder to hold the downloaded pictures. The following code creates it in the directory where the .py file lives.
if not os.path.exists('photo'):
os.makedirs('photo')
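As a side note, os.makedirs also accepts exist_ok=True (Python 3.2+), which makes the os.path.exists() check unnecessary. A self-contained sketch, run inside a temporary directory so nothing is left behind:

```python
import os
import tempfile

# Work in a temporary directory so the sketch has no side effects.
base = tempfile.mkdtemp()
photo_dir = os.path.join(base, 'photo')

# exist_ok=True: no error if the folder already exists, so the
# script can be re-run safely without the os.path.exists() guard.
os.makedirs(photo_dir, exist_ok=True)
os.makedirs(photo_dir, exist_ok=True)  # second call: still no error

print(os.path.isdir(photo_dir))  # True
```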
links is a list, so it is natural to use a loop to download, name, and save the pictures one by one.
i=0
for link in links:
    i+=1
    filename ='photo\\'+'photo'+str(i)+'.png'
    urllib.request.urlretrieve(link,filename)
i is the loop variable; i+=1 advances it on each pass.

filename names the picture. From its assignment you can see that 'photo\\' places the file in the photo folder (a Windows-style path), 'photo'+str(i) numbers the files in order (photo1, photo2, photo3, ...), and '.png' is the extension. Joining strings with + is common practice in Python.

urllib.request.urlretrieve(link,filename) visits link and saves a copy of the resource to the file filename; it creates and writes the file itself. You will sometimes see this call wrapped in with open(filename,'w') as file: where open() takes two arguments, a file name (path) and a mode, 'w' meaning open for writing. That wrapper is unnecessary with urlretrieve, and for binary data such as images the mode would have to be 'wb' rather than 'w' anyway.
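A small sketch of the naming logic, using os.path.join for a path that works on any OS and zero-padding so that alphabetical order matches numeric order; the three-file range is illustrative.

```python
import os

# Build names like photo/photo001.png; zfill(3) zero-pads the counter
# so photo010 sorts after photo009 instead of after photo001.
names = [os.path.join('photo', 'photo' + str(i).zfill(3) + '.png')
         for i in range(1, 4)]

print(names[0])    # photo/photo001.png (photo\photo001.png on Windows)
print(len(names))  # 3
```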
After writing part 3, click Run! In the path where the .py file lives you will find the photo folder, full of the pictures we downloaded~
The complete code is as follows:
import urllib.request
from bs4 import BeautifulSoup
import os
#Download webpage
url = 'http://jandan.net/ooxx/page-2397#comments'
res = urllib.request.urlopen(url)
html = res.read()
#Parsing web pages
soup = BeautifulSoup(html,'html.parser')
result = soup.find_all('img')
links=[]
for content in result:
links.append('http:'+content.get('src'))
#Download and store pictures
if not os.path.exists('photo'):
os.makedirs('photo')
i=0
for link in links:
    i+=1
    filename ='photo\\'+'photo'+str(i)+'.png'
    urllib.request.urlretrieve(link,filename)
This small program is written in a procedural, top-to-bottom style with no functions defined, which may be easier for beginners to understand.
Batch downloading

In http://jandan.net/ooxx/page-2397#comments, only the middle number changes, ranging from 1 to 2XXX. So with

url = 'http://jandan.net/ooxx/page-'+str(i)+'#comments'

you just vary i to download in batches. However, some commenters say that visiting this site too frequently may get your IP blocked. I haven't verified this, so please be careful when trying it yourself!
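The batch idea above can be sketched as building the page URLs in a loop. The small page range is illustrative, and the pause between requests is a hedge against the IP-blocking risk just mentioned, not something the site documents.

```python
def page_url(i):
    # URL pattern from the article; only the middle number changes.
    return 'http://jandan.net/ooxx/page-' + str(i) + '#comments'

urls = [page_url(i) for i in range(1, 4)]  # illustrative range; real pages run to 2XXX
print(urls[0])  # http://jandan.net/ooxx/page-1#comments

# A real run would fetch each page and pause between requests, e.g.
# (after importing urllib.request and time):
# for u in urls:
#     html = urllib.request.urlopen(u).read()
#     time.sleep(2)  # illustrative delay to avoid hammering the server
```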