How to set request headers for a Python crawler
When a crawler requests a web page, the returned text may contain a message such as "Sorry, Unable to Access". This means the site is refusing the crawler, and the request has to get past the site's anti-crawling mechanism.
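As a rough illustration of spotting such a refusal in code, the sketch below scans the response text for known phrases. Only "Sorry, Unable to Access" comes from this article; the other phrases and the helper name `looks_blocked` are assumptions made for the example.

```python
# Phrases that suggest the server refused the crawler. Only the first one
# comes from the article; the others are illustrative assumptions.
BLOCK_MARKERS = (
    "sorry, unable to access",
    "access denied",
    "forbidden",
)


def looks_blocked(body_text):
    """Return True if the response text contains a known refusal phrase."""
    lowered = body_text.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


print(looks_blocked("<html>Sorry, Unable to Access</html>"))
print(looks_blocked("<html><h1>Welcome</h1></html>"))
```

A plain substring check like this is deliberately simple. A real site may signal a block through the HTTP status code instead, so the phrase list would need adjusting per site.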
Setting request headers is one way to get past anti-crawling checks. The headers make the request look as though it came from an ordinary browser visiting the page rather than from a script collecting data.
For pages protected against crawlers, you can set some header information so that the request simulates a browser accessing the website.
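To see why a server can tell a script from a browser, the sketch below prints the User-Agent that Python 3's `urllib` sends by default and then builds a request carrying a browser-like value instead. It assumes Python 3; the URL `http://example.com/` is a placeholder, and nothing is sent over the network.

```python
import urllib.request

# The User-Agent that urllib attaches when none is set explicitly.
default_headers = dict(urllib.request.build_opener().addheaders)
default_ua = default_headers["User-agent"]
print(default_ua)  # starts with "Python-urllib/"

# A browser-like User-Agent (the same string used in this article's script).
BROWSER_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1"
)

# Build (but do not send) a request that carries the browser-like value.
# The URL is a placeholder for illustration.
request = urllib.request.Request(
    "http://example.com/", headers={"User-Agent": BROWSER_UA}
)

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```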
headers
In Chrome or Firefox, right-click the page and choose Inspect, or open More Tools > Developer Tools; you can also press F12 directly. Then press F5 (Fn+F5 on some keyboards) to refresh the page and display its elements. The request headers are listed under the Network tab.
In some browsers, right-click and choose View Element, then refresh.
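Headers copied out of the developer tools can be turned into a Python dictionary with a few lines of code. The sketch below assumes they were copied as plain `Name: value` lines; the helper name `parse_raw_headers` and the sample values are hypothetical.

```python
def parse_raw_headers(raw_text):
    """Convert 'Name: value' lines into a dict of request headers."""
    headers = {}
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        headers[name.strip()] = value.strip()
    return headers


# Sample text as it might be copied from the browser (illustrative values).
copied = """
Host: www.example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64)
Referer: http://www.google.com/
"""

parsed = parse_raw_headers(copied)
for name, value in parsed.items():
    print(name, "=>", value)
```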
Note: headers contain many fields; the ones most commonly used are User-Agent and Host. They are written as key-value pairs. If passing User-Agent alone, as a dictionary key-value pair in headers, is enough to get past the anti-crawling check, no other pairs are needed; otherwise, add more of the key-value pairs listed under headers.
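The "User-Agent first, more headers only if needed" approach in the note can be sketched as a loop over progressively fuller header sets. This assumes Python 3. The function names, the `Accept-Language` value, and the use of "Sorry, Unable to Access" as the block marker are choices made for the example, not a fixed recipe.

```python
import urllib.error
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1"
)

# Start with User-Agent only; the fuller set is an illustrative assumption.
MINIMAL_HEADERS = {"User-Agent": USER_AGENT}
FULLER_HEADERS = {
    "User-Agent": USER_AGENT,
    "Referer": "http://www.google.com/",
    "Accept-Language": "en-US,en;q=0.9",
}


def send_request(url, headers):
    """Send one GET request with the given headers and return the text."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_fallback(url, header_sets, send=send_request,
                        blocked_marker="Sorry, Unable to Access"):
    """Try each header set in order; return (body, headers) on success."""
    for headers in header_sets:
        try:
            body = send(url, headers)
        except urllib.error.URLError:
            # Covers HTTP errors too (HTTPError subclasses URLError).
            continue
        if blocked_marker not in body:
            return body, headers
    return None, None


# Usage (performs real network requests, so it is left commented out):
# body, used = fetch_with_fallback(
#     "http://example.com/", [MINIMAL_HEADERS, FULLER_HEADERS]
# )
```

`urllib` fills in the Host header automatically from the URL, so it normally does not have to be added by hand.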
Settings
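# Python 2 code: urllib2 and the print statement do not exist in Python 3.
import urllib
import urllib2

# Form fields to send; passing data makes this a POST request.
values = {"username": "xxxx", "password": "xxxxx"}
data = urllib.urlencode(values)
url = "https://ssl.gstatic.com/gb/images/v2_730ffe61.png"

# Header values that make the request look like it comes from a browser.
user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1"
referer = "http://www.google.com/"
headers = {"User-Agent": user_agent, "Referer": referer}

# Build the request with the custom headers, send it, and print the body.
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
print response.read()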
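The script above relies on `urllib2`, which exists only in Python 2. A rough Python 3 equivalent, using `urllib.request` and `urllib.parse` from the standard library, might look like the sketch below. The `fetch` helper and its timeout are additions for illustration, and the network call is left commented out so the block can be run without sending anything.

```python
import urllib.parse
import urllib.request

# Same form fields as the original script; in Python 3 the body must be bytes.
values = {"username": "xxxx", "password": "xxxxx"}
data = urllib.parse.urlencode(values).encode("utf-8")

url = "https://ssl.gstatic.com/gb/images/v2_730ffe61.png"
user_agent = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1"
)
referer = "http://www.google.com/"
headers = {"User-Agent": user_agent, "Referer": referer}

# Supplying data makes this a POST request, as in the original.
request = urllib.request.Request(url, data=data, headers=headers)


def fetch(req, timeout=10):
    """Send the prepared request and return the raw response bytes."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()


# To actually send the request:
# print(fetch(request))
```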