Can crawler technology crawl HTTPS?
First, let's understand what HTTPS is
HTTPS is HTTP plus SSL/TLS. In short, data that plain HTTP would send as plaintext is encrypted before it is transmitted. The encryption method and the session keys are negotiated before any data is sent, so even if the traffic is captured in transit its contents are not leaked, and forged or altered data can be detected.
A crawler essentially impersonates a browser: it sends requests to the server and takes part in the whole exchange, including the SSL handshake. HTTPS links can therefore be crawled as well, provided the crawler's client side is set up with the correct SSL certificates, that is, it can verify the certificate the server presents.
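As a minimal sketch of that idea, the following uses only Python's standard library rather than the requests package used later in this article. The `USER_AGENT` string and the helper names `build_request`, `build_context`, and `fetch` are hypothetical, chosen only for illustration. It sends a browser-like request over HTTPS with the default TLS settings, which verify the server certificate and hostname.

```python
import ssl
import urllib.request

# Hypothetical browser-like User-Agent string, chosen only for illustration.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleCrawler/0.1"


def build_request(url):
    """Build a GET request that identifies itself like a browser."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def build_context():
    """Default TLS settings: verify the server certificate and hostname."""
    return ssl.create_default_context()


def fetch(url, timeout=10):
    """Fetch an https URL and return (status code, body bytes)."""
    request = build_request(url)
    context = build_context()
    with urllib.request.urlopen(request, timeout=timeout,
                                context=context) as resp:
        return resp.status, resp.read()
```

Calling `fetch("https://www.baidu.com/")` returns the status code and the response body. If the server certificate cannot be verified, `urlopen` typically raises `urllib.error.URLError` wrapping the underlying `ssl` error.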
Find the source of the error
When a crawler reports an SSL error at run time, the usual cause is either that the local certificates or the related SSL library are not installed correctly, or that the server uses a certificate from its own CA (for example a self-signed one) that is not issued by a recognized certificate authority.
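To tell these two causes apart, it can help to look at the local TLS setup and at the type of the exception. The sketch below uses only the standard `ssl` module; the function names `ssl_diagnostics` and `classify_error` are hypothetical, and the classification is a rough heuristic rather than a complete diagnosis. It assumes you already have the underlying `ssl` exception in hand (with `urllib`, for example, it is usually available as `URLError.reason`).

```python
import ssl


def ssl_diagnostics():
    """Collect local TLS details that commonly explain SSL errors."""
    paths = ssl.get_default_verify_paths()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,  # linked OpenSSL build
        "cafile": paths.cafile,  # default CA bundle file, or None
        "capath": paths.capath,  # default CA directory, or None
    }


def classify_error(exc):
    """Rough split: untrusted server certificate vs. other SSL problems."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        # Raised when the server certificate fails verification,
        # e.g. a self-signed certificate or an unknown CA.
        reason = getattr(exc, "verify_message", exc)
        return "certificate not trusted: " + str(reason)
    if isinstance(exc, ssl.SSLError):
        return "SSL library or protocol problem: " + str(exc)
    return "not an SSL error"
```

If both `cafile` and `capath` come back as `None`, the local certificate store is a likely suspect; a "certificate not trusted" result points instead at the server's certificate.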
Solving certificate exception issues
For CA certificate problems, the following solutions can be used:
1. Skip CA certificate verification and suppress the resulting security warning
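    # coding=utf-8
    import requests

    # Skipping CA verification triggers an InsecureRequestWarning,
    # so the warning has to be silenced. Use either option.

    # Option 1:
    import urllib3
    urllib3.disable_warnings()

    # Option 2:
    from requests.packages.urllib3.exceptions import InsecureRequestWarning
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

    # verify=False turns off certificate verification for this request
    r = requests.get(url="https://www.baidu.com/", verify=False)
    print(r.elapsed.total_seconds())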
2. Specify the certificate file, or a directory containing certificates (the directory must have been processed with OpenSSL's c_rehash utility)
    # coding=utf-8
    import requests

    # verify accepts the path to a CA bundle file or a certificate directory
    r = requests.get(url="https://www.baidu.com/", verify='/path/to/certfile')
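The same choice between a single file and a directory exists in the standard library's `ssl` module, where the two cases map to the `cafile` and `capath` arguments. The sketch below is an illustration with standard-library calls only, not part of requests; the helper names `verify_kwargs` and `build_context` are hypothetical. It picks the right argument for a given path and fails early if the path does not exist.

```python
import os
import ssl


def verify_kwargs(path):
    """Map a certificate path to the matching ssl argument name."""
    if os.path.isdir(path):
        # Directory of CA certificates (hashed with OpenSSL's c_rehash).
        return {"capath": path}
    if os.path.isfile(path):
        # Single PEM file holding one or more CA certificates.
        return {"cafile": path}
    raise FileNotFoundError("no certificate file or directory at " + path)


def build_context(path):
    """TLS context that trusts the CA certificates found at `path`."""
    return ssl.create_default_context(**verify_kwargs(path))
```

A context built this way can be passed to `urllib.request.urlopen` through its `context` argument. Because an explicit location is given, the system's default CA certificates are not loaded in addition.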