How to set request headers for python crawler

爱喝马黛茶的安东尼 (Original)
2019-06-20 14:30:38

When a crawler requests a web page, the response text sometimes contains a message such as "Sorry, Unable to Access". This means the site has blocked the crawler, and the block has to be worked around by dealing with the site's anti-crawling mechanism.

Setting request headers is one way to get past anti-crawling checks. The headers make the request look to the page's server as if it came from an ordinary browser visit rather than from a crawler collecting data.

For anti-crawler web pages, you can set some header information to simulate a browser accessing the website.
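As a minimal sketch of why this helps, assuming Python 3's standard `urllib.request` module: by default urllib identifies itself as `Python-urllib/<version>`, which a server can easily recognize as a script. The snippet below only builds the request objects and sends nothing over the network; the URL and User-Agent string are the ones used in the Settings example later in this article.

```python
import urllib.request

url = "https://ssl.gstatic.com/gb/images/v2_730ffe61.png"
user_agent = ("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
              "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
              "Mobile/15A372 Safari/604.1")

# What urllib sends when no User-Agent is set: "Python-urllib/<version>"
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]

# The same request, but carrying a browser-like User-Agent instead
request = urllib.request.Request(url, headers={"User-Agent": user_agent})

print(default_agent)
# urllib stores header names capitalized as "User-agent"
print(request.get_header("User-agent"))
```

Whether a browser-like User-Agent alone is enough depends on the site being crawled.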

headers

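In Chrome or Firefox, open the developer tools on the web page: right-click and choose Inspect, or open the browser menu and choose More Tools > Developer Tools, or simply press F12. Then press F5 (Fn+F5 on some keyboards) to refresh the page. The request headers sent by the browser are listed for each request under the Network tab.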

In some browsers the menu item is instead right click -> View Element; after opening it, refresh the page.
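Headers copied from the developer tools arrive as plain `Name: value` lines, while Python expects a dictionary. The helper below is a hypothetical convenience function (`parse_raw_headers` is not part of any library) that sketches one way to do the conversion, assuming one header per line. The sample text reuses the header values from this article.

```python
def parse_raw_headers(raw_text):
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and pseudo-headers such as ':authority',
        # which some developer tools list for HTTP/2 requests
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in the value stay intact
        name, separator, value = line.partition(":")
        if not separator:
            continue
        headers[name.strip()] = value.strip()
    return headers


copied = """
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
Referer: http://www.google.com/
"""

headers = parse_raw_headers(copied)
print(headers["Referer"])
```

The resulting dictionary can be passed as the `headers` argument when building a request.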

Note: The headers contain many fields; the most commonly used are User-Agent and Host. They are displayed as key-value pairs. If passing User-Agent alone, as a single dictionary key-value pair, is enough to get past the anti-crawling check, no other pairs are needed; otherwise, add more of the key-value pairs listed under headers.
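The "start with User-Agent, add more only if needed" approach in the note can be sketched as a loop over progressively fuller header dictionaries. This is an illustration under stated assumptions, not a definitive implementation: `fetch_with_fallback` and `BLOCK_MARKER` are hypothetical names, a block is assumed to show up either as an HTTP error or as the "Sorry, Unable to Access" text in the body, and the `opener` parameter exists only so the function can be tried without network access.

```python
import urllib.error
import urllib.request

# Assumption: the blocked page contains this text, as described in the article
BLOCK_MARKER = b"Sorry, Unable to Access"


def fetch_with_fallback(url, header_sets, opener=urllib.request.urlopen):
    """Return (body, headers_used) for the first header set that is not blocked."""
    for headers in header_sets:
        request = urllib.request.Request(url, headers=headers)
        try:
            with opener(request) as response:
                body = response.read()
        except urllib.error.HTTPError:
            # Treated as a block (assumption); try the next, fuller header set
            continue
        if BLOCK_MARKER not in body:
            return body, headers
    raise RuntimeError("every header set was blocked")


user_agent = ("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
              "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
              "Mobile/15A372 Safari/604.1")

# Ordered from minimal to fuller, following the note above
header_sets = [
    {"User-Agent": user_agent},
    {"User-Agent": user_agent, "Referer": "http://www.google.com/"},
]
```

Calling `fetch_with_fallback(url, header_sets)` with a real URL sends at most one request per header set and stops at the first one that gets through.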

Settings

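import urllib.parse
import urllib.request

# Form fields to send; passing data makes urllib issue a POST request
values = {"username": "xxxx", "password": "xxxxx"}
data = urllib.parse.urlencode(values).encode("utf-8")  # Python 3 requires bytes
url = "https://ssl.gstatic.com/gb/images/v2_730ffe61.png"
# Header values that make the request look like it comes from a mobile browser
user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1"
referer = "http://www.google.com/"
headers = {"User-Agent": user_agent, "Referer": referer}
# Build the request with the custom headers, send it, and print the raw body
request = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(request)
print(response.read())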
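One detail of the example above is easy to miss: passing form data to `Request` turns the request into a POST, while leaving it out produces a GET with the same headers. The sketch below, assuming Python 3's `urllib.request`, builds both variants and prints the method each would use. Nothing is sent over the network.

```python
import urllib.parse
import urllib.request

url = "https://ssl.gstatic.com/gb/images/v2_730ffe61.png"
headers = {
    "User-Agent": ("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
                   "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
                   "Mobile/15A372 Safari/604.1"),
    "Referer": "http://www.google.com/",
}

# Without data, urllib issues a GET request
get_request = urllib.request.Request(url, headers=headers)

# With data (which must be bytes in Python 3), the same call becomes a POST
form = urllib.parse.urlencode({"username": "xxxx", "password": "xxxxx"}).encode("utf-8")
post_request = urllib.request.Request(url, data=form, headers=headers)

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```

For pages that only need to be read, the GET form is usually the one a crawler wants.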
