Basic crawler exercise: downloading Douban girl pictures with a Python crawler

高洛峰
2017-02-16 10:52:17

This script downloads the girl pictures from a designated website (www.dbmeinv.com). Here it only crawls up to page 100 of the listing; you can change the page limit to suit your needs.
The cat value selects the picture category. Change it to try the other categories yourself. If you have any questions, leave a comment and I will answer when I see it. The cat values are:
2 = Big breasted girl
3 = Beautiful legs
4 = Good looks
5 = Hodgepodge
6 = Small buttocks
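
The way the cat value and the page limit combine into a listing URL can be sketched as below. This is only an illustration: the helper name page_url and the KNOWN_CATS set are hypothetical, and treating pager_offset as a plain page number is an assumption based on the URL pattern used in the script.

```python
from urllib.parse import urlencode

# Listing endpoint taken from the script in this article
BASE = 'http://www.dbmeinv.com/dbgroup/show.htm'
# cat values listed above; whether the site accepts others is not known
KNOWN_CATS = {'2', '3', '4', '5', '6'}

def page_url(cat, page=1):
    """Build the listing URL for one category and page (hypothetical helper)."""
    if cat not in KNOWN_CATS:
        raise ValueError('unknown cat value: %r' % cat)
    params = {'cid': cat}
    if page > 1:
        # Assumption: pager_offset is the page number
        params['pager_offset'] = page
    return BASE + '?' + urlencode(params)

print(page_url('2'))
print(page_url('3', 100))
```

Keeping the URL construction in one place makes it easier to switch categories or change the last page without editing several strings by hand.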

import requests
import re
import time
from bs4 import BeautifulSoup

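cat = '2'  # picture category; see the list of cat values above
img = 'http://www.dbmeinv.com/dbgroup/show.htm?cid=' + cat
# href of the "next page" link at which crawling stops
end = '/dbgroup/show.htm?cid=' + cat + '&pager_offset=100'
urls = []

def getURLs(mainURL):
    """Collect .jpg URLs from mainURL and from the pages that follow it."""
    time.sleep(1)  # pause between requests so the site is not hammered
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36'}
    # Send a browser-like User-Agent and do not wait forever for a reply
    html = requests.get(mainURL, headers=headers, timeout=10).text
    soup = BeautifulSoup(html, 'html.parser')
    # Raw string so that \. is a literal dot in the pattern
    picURL = re.findall(r'<img class.*?src="(.+?\.jpg)"', html, re.S)
    for url in picURL:
        urls.append(url)
        print(url)
    next_links = soup.select('.next a')
    # Continue only while a next link exists and it is not the end page
    if next_links and next_links[0]['href'] != end:
        getURLs('http://www.dbmeinv.com' + next_links[0]['href'])
    else:
        print('All links have been processed!')
    return urls

url = getURLs(img)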

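import os

# open() fails if the target folder does not exist, so create it first
os.makedirs('pictures', exist_ok=True)

i = 0
for each in url:
    pic = requests.get(each, timeout=10)
    picName = 'pictures/' + str(i) + '.jpg'
    # The with block closes the file even if the write fails
    with open(picName, 'wb') as fp:
        fp.write(pic.content)
    i += 1

print('Picture download complete')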
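
The regular expression in getURLs can capture too much when an image that is not a .jpg comes before one that is, because the lazy group keeps scanning until it reaches the next .jpg". As a possible alternative, here is a minimal sketch that collects the same kind of URLs with the standard library's html.parser module. The class name JpgCollector, the function extract_jpg_urls and the sample HTML are hypothetical, and the real page markup may differ.

```python
from html.parser import HTMLParser

class JpgCollector(HTMLParser):
    """Collect the src of every <img> that has a class and ends in .jpg."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attr = dict(attrs)
        src = attr.get('src') or ''
        # Mirror the regex: only images with a class attribute and a .jpg src
        if 'class' in attr and src.endswith('.jpg'):
            self.urls.append(src)

def extract_jpg_urls(html):
    """Return the .jpg image URLs found in an HTML string."""
    parser = JpgCollector()
    parser.feed(html)
    return parser.urls

# Made-up sample: a .png image followed by a .jpg image
sample = ('<img class="a" src="http://example.com/1.png">'
          '<img class="b" src="http://example.com/2.jpg">')
print(extract_jpg_urls(sample))  # ['http://example.com/2.jpg']
```

Since the script already builds a BeautifulSoup object, the same filtering could also be done on soup; the standard library version is used here only to keep the sketch free of extra dependencies.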
