search
HomeBackend DevelopmentPython TutorialPython crawler: HTTP protocol, Requests library
Python crawler: HTTP protocol, Requests libraryJun 23, 2017 pm 04:25 PM
httppythonrequestsprotocolreptile

HTTP protocol:

HTTP (Hypertext Transfer Protocol): Hypertext Transfer Protocol. URL is the Internet path for accessing resources through HTTP protocol. One URL corresponds to one data resource.

HTTP protocol operation on resources:

The Requests library provides all basic request methods of HTTP . Official introduction:

The 6 main methods of the Requests library:

Exceptions in the Requests library:

There are two important objects in the Requests library: Request and Response. The Request object supports multiple request methods; the Response object contains all the information returned by the server, as well as the requested Request information.

Attributes of the Response object:

Among them, r.encoding refers to: if it does not exist in the header charset, the encoding is considered to be ISO-8859-1.

r.raise_for_status() can directly know whether r.status_code is equal to 200.

Comparison between HTTP protocol and Requests library:

Crawling web pages General code framework:

1 try:2     r = requests.get(url,timeout = 30)3     r.raise_for_status()4     # 如果状态不是200,引发HTTPError异常5     r.encoding = r.apparent_encoding6     return r.text7 except:8     return '产生异常'

For example, to obtain information on the PMCAFF homepage:

 1 import requests 2  3 def getHtmlText(url): 4     try: 5         r = requests.get(url,timeout = 30) 6         r.raise_for_status() 7         r.encoding = r.apparent_encoding 8         return r.text 9     except:10         return '产生异常'11 12 if __name__ == '__main__':13     url = ''14     print(getHtmlText(url))

Crawl the web page General code framework: Operating environment: Mac, Python 3.6, PyCharm 2016.2

Reference: Chinese University MOOC course "Python Web Crawler and Information Extraction"

----- End -----

Author: Du Wangdan, WeChat public account: Du Wangdan, Internet product manager.

The above is the detailed content of Python crawler: HTTP protocol, Requests library. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
python中CURL和python requests的相互转换如何实现python中CURL和python requests的相互转换如何实现May 03, 2023 pm 12:49 PM

curl和Pythonrequests都是发送HTTP请求的强大工具。虽然curl是一种命令行工具,可让您直接从终端发送请求,但Python的请求库提供了一种更具编程性的方式来从Python代码中发送请求。将curl转换为Pythonrequestscurl命令的基本语法如下所示:curl[OPTIONS]URL将curl命令转换为Python请求时,我们需要将选项和URL转换为Python代码。这是一个示例curlPOST命令:curl-XPOSThttps://example.com/api

Python爬虫Requests库怎么使用Python爬虫Requests库怎么使用May 16, 2023 am 11:46 AM

1、安装requests库因为学习过程使用的是Python语言,需要提前安装Python,我安装的是Python3.8,可以通过命令python--version查看自己安装的Python版本,建议安装Python3.X以上的版本。安装好Python以后可以直接通过以下命令安装requests库。pipinstallrequestsPs:可以切换到国内的pip源,例如阿里、豆瓣,速度快为了演示功能,我这里使用nginx模拟了一个简单网站。下载好了以后,直接运行根目录下的nginx.exe程序就可

详细讲解Python之Seaborn(数据可视化)详细讲解Python之Seaborn(数据可视化)Apr 21, 2022 pm 06:08 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于Seaborn的相关问题,包括了数据可视化处理的散点图、折线图、条形图等等内容,下面一起来看一下,希望对大家有帮助。

Python如何使用Requests请求网页Python如何使用Requests请求网页Apr 25, 2023 am 09:29 AM

Requests继承了urllib2的所有特性。Requests支持HTTP连接保持和连接池,支持使用cookie保持会话,支持文件上传,支持自动确定响应内容的编码,支持国际化的URL和POST数据自动编码。安装方式利用pip安装$pipinstallrequestsGET请求基本GET请求(headers参数和parmas参数)1.最基本的GET请求可以直接用get方法'response=requests.get("http://www.baidu.com/"

详细了解Python进程池与进程锁详细了解Python进程池与进程锁May 10, 2022 pm 06:11 PM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于进程池与进程锁的相关问题,包括进程池的创建模块,进程池函数等等内容,下面一起来看一下,希望对大家有帮助。

归纳总结Python标准库归纳总结Python标准库May 03, 2022 am 09:00 AM

本篇文章给大家带来了关于Python的相关知识,其中主要介绍了关于标准库总结的相关问题,下面一起来看一下,希望对大家有帮助。

python requests post如何使用python requests post如何使用Apr 29, 2023 pm 04:52 PM

python模拟浏览器发送post请求importrequests格式request.postrequest.post(url,data,json,kwargs)#post请求格式request.get(url,params,kwargs)#对比get请求发送post请求传参分为表单(x-www-form-urlencoded)json(application/json)data参数支持字典格式和字符串格式,字典格式用json.dumps()方法把data转换为合法的json格式字符串次方法需要

分享10款高效的VSCode插件,总有一款能够惊艳到你!!分享10款高效的VSCode插件,总有一款能够惊艳到你!!Mar 09, 2021 am 10:15 AM

VS Code的确是一款非常热门、有强大用户基础的一款开发工具。本文给大家介绍一下10款高效、好用的插件,能够让原本单薄的VS Code如虎添翼,开发效率顿时提升到一个新的阶段。

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Repo: How To Revive Teammates
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks agoBy尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.