
Python crawler: HTTP protocol, Requests library

巴扎黑
2017-06-23 16:25:04

HTTP protocol:

HTTP stands for Hypertext Transfer Protocol. A URL is the Internet path used to access a resource over HTTP, and each URL corresponds to one data resource.
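To make the mapping between a URL and a resource concrete, here is a minimal sketch that splits a URL into its parts with the standard library. The address used is a made-up placeholder, not one taken from this article.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.com:8080/docs/index.html?lang=en"

parts = urlsplit(url)
print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # host that serves the resource
print(parts.port)      # port on that host
print(parts.path)      # path of the resource on the host
print(parts.query)     # optional query string
```

The host and port identify the server, and the path identifies the resource on that server.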

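HTTP protocol operations on resources: GET (retrieve the resource at the URL), HEAD (retrieve only the response headers for the resource), POST (submit data to the resource), PUT (store a resource at the URL, replacing what is there), PATCH (partially update the resource), and DELETE (delete the resource).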

The Requests library provides all the basic HTTP request methods. Official introduction:

The 6 main methods of the Requests library:
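As a rough illustration, the sketch below lists six convenience functions of Requests, each named after the HTTP method it sends (all of them go through the more general requests.request()). The URL and the build() helper are hypothetical; build() only prepares a request so it can be inspected, and nothing is sent over the network.

```python
import requests

# Hypothetical URL; nothing is sent over the network in this sketch.
URL = "http://www.example.com/resource"

# Each convenience function corresponds to the HTTP method of the same name.
METHOD_FUNCTIONS = {
    "GET": requests.get,        # retrieve the resource
    "HEAD": requests.head,      # retrieve only the response headers
    "POST": requests.post,      # submit data to the resource
    "PUT": requests.put,        # store a resource at the URL
    "PATCH": requests.patch,    # partially update the resource
    "DELETE": requests.delete,  # delete the resource
}


def build(method, url, **kwargs):
    """Prepare (but do not send) a request so it can be inspected offline."""
    return requests.Request(method, url, **kwargs).prepare()


prepared = build("GET", URL, params={"page": 2})
print(prepared.method, prepared.url)
```

In real use, a call such as requests.get(URL, params={"page": 2}, timeout=30) would send the same request and return a Response object.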

Exceptions in the Requests library:
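The sketch below shows one way to tell the commonly encountered Requests exceptions apart. The describe_failure() helper and its labels are hypothetical; the exception classes themselves come from requests.exceptions, and all of them derive from RequestException.

```python
import requests


def describe_failure(exc):
    """Map a Requests exception to a short, human-readable label."""
    # Order matters: ConnectTimeout is both a ConnectionError and a Timeout,
    # so the more specific class is tested first.
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return "timed out while connecting"
    if isinstance(exc, requests.exceptions.Timeout):
        return "request timed out"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "network problem (DNS failure, refused connection, ...)"
    if isinstance(exc, requests.exceptions.HTTPError):
        return "HTTP error status"
    if isinstance(exc, requests.exceptions.TooManyRedirects):
        return "too many redirects"
    if isinstance(exc, requests.exceptions.URLRequired):
        return "a valid URL is required"
    return "other Requests error"


print(describe_failure(requests.exceptions.HTTPError("404 Client Error")))
```

Because every class above is a subclass of requests.exceptions.RequestException, a single except clause on that base class catches them all when the distinction does not matter.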

There are two important objects in the Requests library: Request and Response. The Request object supports multiple request methods; the Response object contains all the information returned by the server, as well as the Request that produced it.

Attributes of the Response object:
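The following sketch walks through the most frequently used Response attributes. To keep it runnable without a network connection, the Response is built by hand and its body is assigned through the private _content field; that shortcut is an assumption of this example, not how responses are normally obtained.

```python
import requests

# Build a Response by hand so the attributes can be inspected offline.
# Assigning to the private _content field is a testing shortcut; real
# responses are filled in by requests.get() and the other request functions.
r = requests.Response()
r.status_code = 200
r.headers["Content-Type"] = "text/html; charset=utf-8"
r.encoding = "utf-8"
r._content = "<html>你好</html>".encode("utf-8")

print(r.status_code)  # HTTP status code returned by the server
print(r.headers)      # response headers (case-insensitive mapping)
print(r.encoding)     # encoding used to decode r.text
print(r.content)      # body as raw bytes
print(r.text)         # body decoded to str using r.encoding
```

A real Response also exposes r.apparent_encoding, the encoding inferred from the body itself rather than from the headers.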

Among them, r.encoding is the encoding guessed from the charset field of the response headers; if the headers contain no charset, the encoding is assumed to be ISO-8859-1.
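The header-based guess can be tried offline with the get_encoding_from_headers() helper in requests.utils. The header values below are made up for illustration, and the last lines show why an ISO-8859-1 fallback garbles UTF-8 text.

```python
from requests.utils import get_encoding_from_headers

# Header values below are made up for illustration.
with_charset = {"content-type": "text/html; charset=utf-8"}
without_charset = {"content-type": "text/html"}

print(get_encoding_from_headers(with_charset))     # charset taken from the header
print(get_encoding_from_headers(without_charset))  # falls back for text content

# Why the fallback matters: UTF-8 bytes decoded as ISO-8859-1 become mojibake.
raw = "中文".encode("utf-8")
garbled = raw.decode("ISO-8859-1")
restored = raw.decode("utf-8")
print(garbled != restored)
```

This is the reason for assigning r.apparent_encoding, which is inferred from the content, to r.encoding before reading r.text.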

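r.raise_for_status() checks r.status_code directly: it raises an HTTPError when the status code indicates a client or server error (4xx or 5xx) and does nothing otherwise, so no explicit comparison with 200 is needed.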
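A small offline check of this behavior is sketched below. The status_raises() helper is hypothetical and builds an empty Response by hand instead of contacting a server. Strictly speaking, raise_for_status() raises only for 4xx and 5xx codes, so other non-200 codes such as 204 or 301 pass silently.

```python
import requests


def status_raises(code):
    """Return True if raise_for_status() raises HTTPError for this status code."""
    r = requests.Response()  # built by hand; no network access
    r.status_code = code
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError:
        return True
    return False


for code in (200, 204, 301, 404, 500):
    print(code, status_raises(code))
```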

Comparison between HTTP protocol and Requests library:

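General code framework for crawling web pages:

    import requests

    def getHtmlText(url):
        try:
            r = requests.get(url, timeout=30)
            # Raise HTTPError if the status code is 4xx or 5xx
            r.raise_for_status()
            # Decode the body with the encoding inferred from its content
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'An exception occurred'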

For example, to fetch the PMCAFF homepage:

    import requests

    def getHtmlText(url):
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'An exception occurred'

    if __name__ == '__main__':
        url = ''  # the URL is left empty in the source
        print(getHtmlText(url))

Operating environment: Mac, Python 3.6, PyCharm 2016.2

Reference: Chinese University MOOC course "Python Web Crawler and Information Extraction"

----- End -----

Author: Du Wangdan, WeChat public account: Du Wangdan, Internet product manager.
