Implementation of Python crawler web page login
When you write a Python crawler, sooner or later you will run into sites that require a login. Typical obstacles are a verification code (captcha) that must be entered at login, or an image-dragging check. How can such problems be solved? Generally there are two options.
Use cookies to log in
The first option is to log in with cookies: log in once in a browser, copy the browser's cookies, and attach them to the requests you send with the requests library. The server then treats those requests as coming from a real logged-in user and returns the logged-in content. This method is very useful: most websites that require a verification code at login can be handled this way.
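Browser developer tools usually show the cookies of a request as a single `Cookie` header string of the form `name1=value1; name2=value2`, while requests expects a dict. The following is a minimal sketch of that conversion; `parse_cookie_header` is a hypothetical helper name, and the `theme=dark` cookie is only a placeholder, not something taken from a real site.

```python
def parse_cookie_header(raw):
    """Turn a 'name1=value1; name2=value2' string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        # Skip empty fragments and fragments without a name=value shape.
        if not pair or "=" not in pair:
            continue
        # Split on the first "=" only, because values may contain "=" themselves.
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder cookie string, as it might be copied from the browser.
cookie_dict = parse_cookie_header("JSESSION=123456789; theme=dark")
print(cookie_dict)  # {'JSESSION': '123456789', 'theme': 'dark'}
```

The resulting dict can be passed to `requests.utils.cookiejar_from_dict` and assigned to a session's `cookies` attribute.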
    # -*- coding: utf-8 -*-
    import requests

    # Target pages to visit
    targetUrlList = [
        "https://httpbin.org/ip",
        "https://httpbin.org/headers",
        "https://httpbin.org/user-agent",
    ]

    # Proxy server
    proxyHost = "t.16yun.cn"
    proxyPort = "31111"

    # Proxy tunnel credentials
    proxyUser = "username"
    proxyPass = "password"

    proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
        "host": proxyHost,
        "port": proxyPort,
        "user": proxyUser,
        "pass": proxyPass,
    }

    # Use the HTTP proxy for both http and https requests
    proxies = {
        "http": proxyMeta,
        "https": proxyMeta,
    }

    # Visit the sites three times with the same Session (keep-alive),
    # so that the same external IP is kept throughout
    s = requests.session()

    # Set the cookie
    cookie_dict = {"JSESSION": "123456789"}
    cookies = requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True)
    s.cookies = cookies

    for i in range(3):
        for url in targetUrlList:
            r = s.get(url, proxies=proxies)
            print(r.text)

If the site uses a verification code, sending response = requests_session.post(url=url_login, data=data) directly does not work. The approach should be as follows, where requests_session is a requests session object and url_login and url_results stand for the site's login and results pages:

    response_captcha = requests_session.get(url=url_login, cookies=cookies)
    response1 = requests.get(url_login)            # not logged in
    response2 = requests_session.get(url_login)    # logged in, because the session already received the response cookie
    response3 = requests_session.get(url_results)  # logged in, because the session already received the response cookie
Simulated login
As the old saying goes, one generation plants the trees and the next enjoys the shade. I once wanted to read Zhihu Yanxuan articles but got stuck at the login. After some searching I found a library for simulating login that works very well. However, on the principle that good things shared too widely tend to get blocked, I won't name it here.
The basic idea is to simulate the login with requests: the login request brings back a verification code, and you then submit that code to complete the login.
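This flow can be sketched roughly as follows. It is only an illustration under several assumptions, not the method of the unnamed library: `login_with_captcha` is a hypothetical function, the form field names `username`, `password` and `captcha` are placeholders that differ from site to site, and the captcha is assumed to be served from its own URL. The `session` argument can be any object with `get` and `post` methods, such as a `requests.Session()`.

```python
def login_with_captcha(session, login_url, captcha_url,
                       username, password, solve_captcha):
    """Log in through a captcha-protected form using one shared session.

    solve_captcha is a callable that receives the captcha image bytes
    and returns the text to submit (for example, typed in by a person).
    """
    # 1. Load the login page so the server can set its session cookie.
    session.get(login_url)

    # 2. Fetch the captcha image with the same session, so the server
    #    ties this captcha to our cookie.
    captcha_image = session.get(captcha_url).content

    # 3. Get the answer for the captcha.
    code = solve_captcha(captcha_image)

    # 4. Submit credentials and captcha together (field names are assumed).
    form = {"username": username, "password": password, "captcha": code}
    return session.post(login_url, data=form)
```

With requests, `session` would be a `requests.Session()`, and `solve_captcha` could save the image to disk and ask the user to type what it shows. Reusing one session for every step matters because the server ties the captcha to the session cookie it issued.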