How can Python get the status code of an HTTP request (200, 404, etc.) without downloading the entire page source, which would be a waste of resources?
Input: segmentfault.com Output: 200
Input: segmentfault.com/nonexistant Output: 404
ringa_lee 2017-06-28 09:27:31
Reference article: List of practical Python scripts
HTTP has not only the GET method (which requests the headers + body) but also the HEAD method, which requests only the headers.
import httplib  # Python 2 module (renamed http.client in Python 3)

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None

print get_status_code("segmentfault.com") # prints 200
print get_status_code("segmentfault.com", "/nonexistant") # prints 404
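The answer above targets Python 2. On Python 3 the `httplib` module is named `http.client` and `StandardError` no longer exists, so a rough equivalent might look like the sketch below. The `port` and `timeout` parameters and the choice of exceptions to catch are assumptions added for illustration, not part of the original answer.

```python
import http.client


def get_status_code(host, path="/", port=None, timeout=5.0):
    """Return the status code of a HEAD request, or None if it fails.

    `port` and `timeout` are illustrative additions to the original signature.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD asks the server for the headers only, so no body is downloaded.
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # OSError covers DNS failures, refused connections and timeouts;
        # HTTPException covers malformed replies.
        return None
    finally:
        conn.close()
```

It is called the same way as the original, for example `get_status_code("segmentfault.com", "/nonexistant")`. For an HTTPS-only site, `http.client.HTTPSConnection` takes the place of `HTTPConnection`.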
怪我咯 2017-06-28 09:27:31
You are using GET, which requests the entire header + body. Try the HEAD method to fetch only the headers!
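import requests

resp = requests.head('http://segmentfault.com')  # request only the resource headers with HEAD
print(resp.status_code)  # status code
resp = requests.head('http://segmentfault.com/nonexistant')  # requests needs a full URL, including the scheme
print(resp.status_code)  # status code
# Output:
# 200
# 404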
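If `requests` is not installed, the same HEAD request can be sketched with only the Python 3 standard library. The function name `head_status` and the `timeout` default are hypothetical choices for illustration. Note that `urllib` raises `HTTPError` for 4xx/5xx responses, so the sketch reads the code from the exception instead of the response.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5.0):
    """Send a HEAD request to `url` and return the status code, or None on failure."""
    try:
        # Building the Request inside the try block also catches URLs
        # that lack a scheme, such as '/nonexistant'.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses (404, 500, ...) arrive as exceptions carrying the code.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, timeout, or an invalid URL.
        return None
```

Unlike `requests.head`, which leaves redirects unfollowed by default, `urllib` follows them on its own, so the value returned here is the status of the final response.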