What is a requests library crawler, and how do you use it? (Explanation with examples)

青灯夜游
2018-10-22 16:04:33

What is a crawler built on the requests library, and how do you use it? This article introduces the Python requests library through examples: fetching a single page, setting request headers, passing search parameters, and saving an image. It should be a useful reference for anyone who needs it.

Use requests.get() to obtain a Response object and crawl the information on a single JD page

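import requests

url = "https://item.jd.com/21508090549.html"
try:
    r = requests.get(url, timeout=10)
    r.raise_for_status()              # raise HTTPError if the status code is 4xx/5xx
    r.encoding = r.apparent_encoding  # use the encoding detected from the page content
    print(r.text[:1000])              # print only the first 1000 characters
except requests.RequestException:
    print("Crawl failed")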
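
The fetch pattern above can be wrapped in a small helper so that it can be reused for other pages. The following is a minimal sketch rather than the original author's code: the name `fetch_text` and the 10-second timeout are assumptions chosen for illustration. It also builds a `Response` object by hand to show what `raise_for_status()` does with an error status, so this part runs without network access.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the decoded page text, or None if the request fails.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # raises HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # encoding guessed from the body
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, malformed URLs and HTTP errors.
        return None


# raise_for_status() in isolation: a hand-built response with a 404 status.
fake = requests.Response()
fake.status_code = 404
try:
    fake.raise_for_status()
    raised = False
except requests.HTTPError:
    raised = True

# A URL without a scheme is rejected before any network traffic happens.
result = fetch_text("not-a-url")
```

Returning None instead of printing inside the helper leaves it to the caller to decide how a failed crawl should be reported.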

If you use the code above to access an Amazon page, you will get an error page back instead of the product page, because Amazon does not serve pages to clients that do not identify themselves as a mainstream browser. The 'user-agent' field in the request headers must therefore be set explicitly:

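import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # present the client as a Mozilla/5.0 browser
    r = requests.get(url, headers=kv, timeout=10)
    r.raise_for_status()                # raise HTTPError if the status code is 4xx/5xx
    r.encoding = r.apparent_encoding    # use the encoding detected from the page content
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")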
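
To see why the header matters, it helps to compare what requests sends by default with what it sends once 'user-agent' is overridden. The sketch below prepares a request without sending it, so it needs no network access. Using `Session.prepare_request` for inspection is an illustrative choice and does not appear in the original code.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

# The User-Agent string requests uses when none is supplied.
default_ua = requests.utils.default_user_agent()

session = requests.Session()

# Prepare (but do not send) a request with the default headers.
plain = session.prepare_request(requests.Request("GET", url))

# Prepare the same request with the header overridden, as in the article.
kv = {'user-agent': 'Mozilla/5.0'}
spoofed = session.prepare_request(requests.Request("GET", url, headers=kv))

print(plain.headers["User-Agent"])    # a python-requests/<version> string
print(spoofed.headers["User-Agent"])  # Mozilla/5.0
```

The header lookup is case-insensitive, which is why the lowercase 'user-agent' key replaces the default 'User-Agent' entry.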

Use code to imitate a Baidu/360 search

A search parameter needs to be added to the URL: Baidu uses 'wd=...', and 360 uses 'q=...'.

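import requests

url = "http://www.baidu.com/s"
keyword = "python"
try:
    kv = {'wd': keyword}                # Baidu's search parameter
    r = requests.get(url, params=kv, timeout=10)
    print(r.request.url)                # the full URL that was actually requested
    r.raise_for_status()                # raise HTTPError if the status code is 4xx/5xx
    r.encoding = r.apparent_encoding    # use the encoding detected from the page content
    print(len(r.text))                  # the page may be very large, so print only its length
except requests.RequestException:
    print("Crawl failed")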
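
The `params` dictionary is encoded into the query string automatically. The sketch below builds the final URL without sending anything, which makes the effect of 'wd' and 'q' visible. The helper name `build_search_url` is hypothetical, and the 360 endpoint `https://www.so.com/s` is an assumption, since the article only names the 'q' parameter.

```python
import requests


def build_search_url(base_url, param_name, keyword):
    """Return the URL requests would call for a search (hypothetical helper)."""
    req = requests.Request("GET", base_url, params={param_name: keyword})
    return req.prepare().url


baidu_url = build_search_url("http://www.baidu.com/s", "wd", "python")
# Assumed endpoint for 360 search; only the 'q' parameter comes from the article.
so_url = build_search_url("https://www.so.com/s", "q", "python")

print(baidu_url)
print(so_url)
```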

Crawl and save an image

import requests
import os
url = "https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1540201265460&di=64720dcd3bbc24b7d855454028173deb&imgtype=0&src=http%3A%2F%2Fpic34.photophoto.cn%2F20150105%2F0005018358919011_b.jpg"
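root = "D://pics//"
# Build the file path from the last two dot-separated pieces of the URL
path = root + url.split('.')[-2] + '.' + url.split('.')[-1]
if not os.path.exists(root):
    os.mkdir(root)                 # create the directory if it does not exist
if not os.path.exists(path):       # download and save only if the file does not exist yet
    r = requests.get(url, timeout=10)
    r.raise_for_status()           # do not save an error page as an image
    with open(path, 'wb') as f:    # the with block closes the file automatically
        f.write(r.content)         # write the raw image bytes
    print("File saved successfully")
else:
    print("File already exists")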
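
Splitting the URL on '.' works for this particular link, but the resulting name still carries URL-encoded path pieces. One alternative, sketched below with only the standard library, is to read the original image address from the `src` query parameter and take its last path segment. The helper name `filename_from_url` and the fallback behaviour are assumptions for illustration, not part of the original code.

```python
import posixpath
from urllib.parse import parse_qs, urlparse


def filename_from_url(url):
    """Guess a file name for an image URL (hypothetical helper).

    If the URL carries the real image address in a 'src' query parameter,
    use that address; otherwise fall back to the URL's own path.
    """
    parsed = urlparse(url)
    src_values = parse_qs(parsed.query).get("src")
    if src_values:
        parsed = urlparse(src_values[0])  # parse_qs has already URL-decoded it
    return posixpath.basename(parsed.path)


url = ("https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000"
       "&sec=1540201265460&di=64720dcd3bbc24b7d855454028173deb&imgtype=0"
       "&src=http%3A%2F%2Fpic34.photophoto.cn%2F20150105%2F0005018358919011_b.jpg")

name = filename_from_url(url)
print(name)  # 0005018358919011_b.jpg
```

The returned name can then be appended to `root` in place of the dot-splitting expression.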

Summary: That is the entire content of this article. I hope it is helpful for your study.

Statement:
This article is reproduced from csdn.net. If there is any infringement, please contact admin@php.cn to have it removed.