How to use Python to collect image data?
First, we determine the URL. Using the browser's developer tools to locate the data we want, we find that the content is in the source code of the web page.
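import re
import requests

url = 'https://www.hexuexiao.cn/tj/WuJiayi/'
res = requests.get(url)
# print(res.text)
# Capture the numeric ID of every article linked from the listing page
html_url = re.findall(r'<a href="https://www.hexuexiao.cn/a/(\d+).html" rel="external nofollow" >', res.text, re.S)
# Remove duplicates while keeping the order of first appearance
urls = sorted(list(set(html_url)), key=html_url.index)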
Here, html_url holds the list of strings returned by re.findall(): the numeric article IDs captured by (\d+), one for each matching link on the page. The expression list(set(html_url)) converts the list to a set and back, which removes duplicates but loses the original order. html_url.index is the list method that returns the position at which a value first appears in html_url; passing it to sorted() as the key puts the deduplicated IDs back in the order in which they appear on the page.
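The deduplication step can be tried without touching the network. The following is a minimal sketch, assuming a made-up HTML fragment (the IDs 101 and 205 are hypothetical) in place of res.text. It also shows dict.fromkeys() as one alternative way to get the same result; that alternative is not part of the original code.

```python
import re

# Hypothetical HTML fragment standing in for res.text; the IDs are made up.
sample_html = '''
<a href="https://www.hexuexiao.cn/a/101.html" rel="external nofollow" >one</a>
<a href="https://www.hexuexiao.cn/a/205.html" rel="external nofollow" >two</a>
<a href="https://www.hexuexiao.cn/a/101.html" rel="external nofollow" >one again</a>
'''

pattern = r'<a href="https://www\.hexuexiao\.cn/a/(\d+)\.html" rel="external nofollow" >'

# findall() returns the captured group of every match, duplicates included.
html_url = re.findall(pattern, sample_html)

# The article's approach: deduplicate with set(), then sort each ID by the
# position of its first appearance in the original list.
urls = sorted(set(html_url), key=html_url.index)

# Alternative (an assumption, not from the article): dicts preserve insertion
# order in Python 3.7+, so this deduplicates and keeps order in one pass.
urls_alt = list(dict.fromkeys(html_url))

print(html_url)
print(urls)
print(urls_alt)
```

Both approaches give the same list here. The sorted() version calls html_url.index once per unique ID, which is fine for a single listing page but slower on very long lists.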
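for url1 in urls:
    for page in range(0, 10):
        url2 = f'https://www.hexuexiao.cn/a/{url1}-{page}.html'
        # print(url2)
        res1 = requests.get(url2)
        # print(res1.text)
        # [0] takes the first match; it raises IndexError if the page has no match
        url3 = re.findall('<img src=(.*?)/ alt="How to use Python to collect image data?" ></a>', res1.text, re.S)[0]
        print(url3)
        # Strip the quotes captured together with the src value
        # (the character to remove is empty in the source; '"' is assumed)
        url3 = re.sub('"', "", url3)
        print(url3)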
The urls list in our code holds the article IDs collected in the previous step. For each ID, the inner loop uses the range() function to iterate over page numbers 0 to 9 and builds the address of each page. Next, we use the requests.get() method to obtain the HTML code of each page and use a regular expression to match the image link, taking the first match with [0]. Finally, we call requests.get() again on the image link and write the content to a file.
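The page loop can be broken into two small helpers that are easy to check offline. This is only a sketch under stated assumptions: build_page_url and extract_image_url are hypothetical names, the simplified pattern (any img tag with a quoted src) is not the site's confirmed markup, and the sample HTML is invented. Unlike indexing with [0], the helper returns None when a page contains no image, so a missing page does not stop the whole run.

```python
import re

def build_page_url(article_id, page):
    """Build the address of one page of an article, as in the loop above."""
    return f'https://www.hexuexiao.cn/a/{article_id}-{page}.html'

# Simplified, assumed pattern: capture the quoted src value of the first <img>.
IMG_PATTERN = re.compile(r'<img\s+src="([^"]+)"')

def extract_image_url(html):
    """Return the first image URL found in html, or None if there is none."""
    match = IMG_PATTERN.search(html)
    return match.group(1) if match else None

# Invented HTML standing in for res1.text.
sample = '<a href="#"><img src="https://example.com/pic/1.jpg" alt="demo" ></a>'

print(build_page_url('101', 0))
print(extract_image_url(sample))
print(extract_image_url('<p>no image here</p>'))
```

Because the capture group sits inside the quotes, the extracted URL needs no further cleanup before it is passed to requests.get().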
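content = requests.get(url3).content
# num is the running image counter: set num = 0 before the loops.
# The 图片 ("pictures") folder must already exist.
with open('图片\\' + str(num) + '.jpg', mode='wb') as f:
    f.write(content)
num += 1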
The content variable in our code holds the raw bytes downloaded from the image link url3. Then, the with open() statement opens a file in binary write mode ('wb') and writes content to it. The num variable is the serial number of the current image and is used as the file name. Obtaining pictures works the same way as obtaining audio before: the data is saved as a binary file.
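The saving step can be wrapped in a small function and tested without downloading anything. A minimal sketch, assuming a hypothetical helper save_image and using a temporary folder named images with fake bytes in place of a real download; os.path.join builds the path, so no hand-written backslash is needed, and os.makedirs creates the folder if it is missing.

```python
import os
import tempfile

def save_image(content, folder, num):
    """Write raw image bytes to <folder>/<num>.jpg and return the file path."""
    os.makedirs(folder, exist_ok=True)          # create the folder if needed
    path = os.path.join(folder, f'{num}.jpg')   # portable path separator
    with open(path, mode='wb') as f:            # 'wb' = binary write mode
        f.write(content)
    return path

# Demo with fake bytes instead of requests.get(url3).content.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b'\xff\xd8fake-jpeg-bytes', os.path.join(tmp, 'images'), 1)
    saved_name = os.path.basename(saved)
    with open(saved, 'rb') as f:
        saved_bytes = f.read()

print(saved_name)
print(len(saved_bytes))
```

In the scraper itself, the call would look like save_image(requests.get(url3).content, '图片', num), followed by num += 1.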
In this way, our image data is saved. I won't show the result here, since the principle is the same: once we have found the address of an image, we can download it.