
How to use Python to collect image data?

PHPz
2023-05-09 09:34:16

Send request


First, determine the URL. Use the browser's developer tools to locate the data we want; here the content turns out to be in the HTML source of the page, so a plain request for the page is enough.

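import re
import requests

url = 'https://www.hexuexiao.cn/tj/WuJiayi/'

# Fetch the listing page.
res = requests.get(url)

# print(res.text)
# Capture the numeric ID from each detail-page link.
html_url = re.findall(r'<a href="https://www.hexuexiao.cn/a/(\d+).html" rel="external nofollow"  >', res.text, re.S)
# Remove duplicate IDs while keeping their original order.
urls = sorted(list(set(html_url)), key=html_url.index)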

Here, html_url is the list of strings returned by re.findall(): the numeric IDs captured from the links to the image pages. set(html_url) removes duplicate IDs, but a set does not preserve order, so the result is passed to sorted() with key=html_url.index. html_url.index is a list method that returns the position of an element's first occurrence in html_url, so sorting by it restores the order in which the IDs appeared on the page.
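
The order-preserving de-duplication can be tried on its own. The following is a minimal sketch: the sample list ids is made up for illustration, and the dict.fromkeys variant is an alternative that does not appear in the article's code.

```python
# Hypothetical sample: page IDs as re.findall might return them, with repeats.
ids = ['101', '205', '101', '309', '205']

# Approach used in the article: de-duplicate with set(), then sort each ID
# by the position of its first occurrence in the original list.
unique_sorted = sorted(set(ids), key=ids.index)

# Alternative (an assumption, not from the article): dict keys keep insertion
# order in Python 3.7+, so this removes duplicates in a single pass.
unique_dict = list(dict.fromkeys(ids))

print(unique_sorted)  # ['101', '205', '309']
print(unique_dict)    # ['101', '205', '309']
```

Both forms give the same result. list.index scans the list on every call, so the sorted() form gets slow on very long lists; for the handful of links on one page the difference does not matter.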

Save data

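num = 0  # running counter used to name the saved images
for url1 in urls:
    for page in range(0, 10):
        # Each ID has pages numbered 0 to 9.
        url2 = f'https://www.hexuexiao.cn/a/{url1}-{page}.html'
        # print(url2)
        res1 = requests.get(url2)
        # print(res1.text)
        # Capture the src value of the image tag; skip pages without a match.
        matches = re.findall('<img  src=(.*?)/ alt="How to use Python to collect image data?" ></a>', res1.text, re.S)
        if not matches:
            continue
        url3 = matches[0]
        print(url3)
        # The captured value still carries its surrounding quotes, so remove them.
        url3 = re.sub('"', "", url3)
        print(url3)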

In this code, urls is the list of page IDs collected in the previous step. For each ID, range(0, 10) iterates over the page numbers 0 to 9, and an f-string builds the address of each page. requests.get() then fetches the HTML of that page, and a regular expression extracts the image link; only the first match is kept. The remaining step is to download the image link and write the result to a file.
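
The URL building and link extraction can be sketched without touching the network. The helper names page_urls and first_image_src, the sample HTML, and the simplified pattern are all assumptions made for illustration; the real page markup may differ and the pattern would need adjusting to it.

```python
import re

def page_urls(article_id, pages=10):
    """Build the page URLs for one ID (pages 0 .. pages-1)."""
    return [f'https://www.hexuexiao.cn/a/{article_id}-{page}.html'
            for page in range(pages)]

# Assumed pattern: an <img> tag whose src attribute is double-quoted.
IMG_SRC = re.compile(r'<img\s[^>]*?src="([^"]+)"', re.S)

def first_image_src(html):
    """Return the first image address found in html, or None if there is none."""
    match = IMG_SRC.search(html)
    return match.group(1) if match else None

# Made-up HTML fragment standing in for res1.text.
sample = '<a href="#"><img class="pic" src="https://example.com/pic/1.jpg" alt="demo"></a>'

print(page_urls('12345', pages=2))
print(first_image_src(sample))
print(first_image_src('<p>no image here</p>'))
```

Returning None instead of indexing with [0] means a page without an image can simply be skipped rather than raising an IndexError.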

Save the image

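        # Download the image bytes and save them as 图片\<num>.jpg ("图片" means "pictures").
        # num must be set to 0 before the loops, and the 图片 folder must already exist.
        content = requests.get(url3).content
        with open('图片\\' + str(num) + '.jpg', mode='wb') as f:
            f.write(content)
        num += 1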

Here, content holds the raw bytes downloaded from the image link url3. The with open() statement opens a file in binary write mode ('wb') and writes content to it. The num variable is the serial number of the current image and is used as the file name. Downloading images works the same way as downloading audio: the response body is saved as a binary file.
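
The saving step can be tried in isolation as follows. This is only a sketch: save_image is a hypothetical helper, the folder name pictures is a placeholder, and the byte strings stand in for what requests.get(url3).content would return. It uses pathlib so that no path separator has to be written by hand and so that the folder is created when it is missing.

```python
import tempfile
from pathlib import Path

def save_image(content, folder, num):
    """Write raw image bytes to <folder>/<num>.jpg and return the path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = folder / f'{num}.jpg'
    with open(path, mode='wb') as f:  # 'wb' = write binary
        f.write(content)
    return path

# Dummy bytes standing in for requests.get(url3).content.
fake_images = [b'\xff\xd8first', b'\xff\xd8second']

with tempfile.TemporaryDirectory() as tmp:
    # enumerate() supplies the serial number that the article keeps in num.
    saved = [save_image(data, Path(tmp) / 'pictures', num)
             for num, data in enumerate(fake_images)]
    names = [p.name for p in saved]
    sizes = [p.stat().st_size for p in saved]

print(names)  # ['0.jpg', '1.jpg']
print(sizes)  # [7, 8]
```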

With that, the image data is saved. The result is not shown here, but the principle is always the same: once we have the address of an image, we can download it.


Statement:
This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it removed.