Web crawler - How to crawl the pictures in the Blog Park blog using python?

Question

I wrote a small piece of code to crawl the images in Blog Park blog posts. It works for some links, but for others it raises an error as soon as it starts crawling. What is the reason? {code...} As shown in the figure, the images are crawled correctly. If the URL is changed to {code...}, an error is reported immediately. Please help, thank you!

我想大声告诉你 · Answer

The error message already makes the cause clear. If you look at the page's source, the first image your pattern matches is a GIF, and its address is a relative path, so it cannot be downloaded and you get an IOError. Even if it did download, you save every file with a .jpg extension, so the GIF would not open as a JPG. All you need to do is check each URL and filter out the ones you can't handle:

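import urllib  # Python 2; in Python 3 this is urllib.request.urlretrieve

for imgurl in imglist:
    # Skip GIFs: on this page they use relative paths and can't be saved as .jpg
    if "gif" not in imgurl:
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1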

Note the check I added. It is only the simplest possible filter, but it keeps your second URL from raising an error, and it should give you an idea of how to handle other cases.
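If you would rather download those images than skip them, one option is to resolve relative paths against the page URL and keep each file's real extension instead of forcing .jpg. Below is a minimal Python 3 sketch under those assumptions. It uses the standard-library html.parser rather than the regex from the question (whose code isn't shown), the helper names plan_downloads and download_all are hypothetical, and it still skips GIFs by default to match the filter above.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def plan_downloads(page_url, html, skip_exts=(".gif",)):
    """Return (absolute_url, local_filename) pairs for the images on a page.

    Relative src values are resolved against page_url with urljoin, and each
    local file keeps the extension from the URL path (falling back to .jpg
    when there is none). Extensions listed in skip_exts are filtered out.
    """
    parser = ImgSrcParser()
    parser.feed(html)
    plan = []
    for src in parser.srcs:
        absolute = urljoin(page_url, src)
        ext = os.path.splitext(urlparse(absolute).path)[1].lower() or ".jpg"
        if ext in skip_exts:
            continue
        plan.append((absolute, "%d%s" % (len(plan), ext)))
    return plan


def download_all(page_url):
    """Hypothetical driver: fetch the page, then save each planned image."""
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    for url, filename in plan_downloads(page_url, html):
        urlretrieve(url, filename)
```

Separating the planning step from the network calls means you can check which URLs and filenames would be used before downloading anything; pass skip_exts=() if you want the GIFs as well, since they now get a correct absolute URL and a .gif filename.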

