HTTP protocol:
HTTP (Hypertext Transfer Protocol): Hypertext Transfer Protocol. URL is the Internet path for accessing resources through HTTP protocol. One URL corresponds to one data resource.
HTTP protocol operation on resources:
The Requests library provides all basic request methods of HTTP . Official introduction:
The 6 main methods of the Requests library:
Exceptions in the Requests library:
There are two important objects in the Requests library: Request and Response. The Request object supports multiple request methods; the Response object contains all the information returned by the server, as well as the requested Request information.
Attributes of the Response object:
Among them, r.encoding refers to: if it does not exist in the header charset, the encoding is considered to be ISO-8859-1.
r.raise_for_status() can directly know whether r.status_code is equal to 200.
Comparison between HTTP protocol and Requests library:
Crawling web pages General code framework:
1 try:2 r = requests.get(url,timeout = 30)3 r.raise_for_status()4 # 如果状态不是200,引发HTTPError异常5 r.encoding = r.apparent_encoding6 return r.text7 except:8 return '产生异常'
For example, to obtain information on the PMCAFF homepage:
1 import requests 2 3 def getHtmlText(url): 4 try: 5 r = requests.get(url,timeout = 30) 6 r.raise_for_status() 7 r.encoding = r.apparent_encoding 8 return r.text 9 except:10 return '产生异常'11 12 if __name__ == '__main__':13 url = ''14 print(getHtmlText(url))
Crawl the web page General code framework: Operating environment: Mac, Python 3.6, PyCharm 2016.2
Reference: Chinese University MOOC course "Python Web Crawler and Information Extraction"
----- End -----
Author: Du Wangdan, WeChat public account: Du Wangdan, Internet product manager.
The above is the detailed content of Python crawler: HTTP protocol, Requests library. For more information, please follow other related articles on the PHP Chinese website!

curl和Pythonrequests都是发送HTTP请求的强大工具。虽然curl是一种命令行工具,可让您直接从终端发送请求,但Python的请求库提供了一种更具编程性的方式来从Python代码中发送请求。将curl转换为Pythonrequestscurl命令的基本语法如下所示:curl[OPTIONS]URL将curl命令转换为Python请求时,我们需要将选项和URL转换为Python代码。这是一个示例curlPOST命令:curl-XPOSThttps://example.com/api

1、安装requests库因为学习过程使用的是Python语言,需要提前安装Python,我安装的是Python3.8,可以通过命令python--version查看自己安装的Python版本,建议安装Python3.X以上的版本。安装好Python以后可以直接通过以下命令安装requests库。pipinstallrequestsPs:可以切换到国内的pip源,例如阿里、豆瓣,速度快为了演示功能,我这里使用nginx模拟了一个简单网站。下载好了以后,直接运行根目录下的nginx.exe程序就可

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于Seaborn的相关问题,包括了数据可视化处理的散点图、折线图、条形图等等内容,下面一起来看一下,希望对大家有帮助。

Requests继承了urllib2的所有特性。Requests支持HTTP连接保持和连接池,支持使用cookie保持会话,支持文件上传,支持自动确定响应内容的编码,支持国际化的URL和POST数据自动编码。安装方式利用pip安装$pipinstallrequestsGET请求基本GET请求(headers参数和parmas参数)1.最基本的GET请求可以直接用get方法'response=requests.get("http://www.baidu.com/"

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于进程池与进程锁的相关问题,包括进程池的创建模块,进程池函数等等内容,下面一起来看一下,希望对大家有帮助。

python模拟浏览器发送post请求importrequests格式request.postrequest.post(url,data,json,kwargs)#post请求格式request.get(url,params,kwargs)#对比get请求发送post请求传参分为表单(x-www-form-urlencoded)json(application/json)data参数支持字典格式和字符串格式,字典格式用json.dumps()方法把data转换为合法的json格式字符串次方法需要

VS Code的确是一款非常热门、有强大用户基础的一款开发工具。本文给大家介绍一下10款高效、好用的插件,能够让原本单薄的VS Code如虎添翼,开发效率顿时提升到一个新的阶段。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Dreamweaver Mac version
Visual web development tools

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.
