Preface
In recent years the Internet has changed dramatically. In particular, the HTTP protocol we have long been used to is gradually being replaced by HTTPS. Driven jointly by browsers, search engines, certificate authorities (CAs), and large Internet companies, the Internet has entered the "HTTPS encryption era", and HTTPS is expected to replace HTTP as the mainstream transport protocol within the next few years.
After reading this article, I hope you will understand:
What problems HTTP communication has
How HTTPS addresses those problems
How HTTPS works
1. What is HTTPS
HTTPS is HTTP with an SSL encryption layer added, so that the transmitted data is encrypted. It is the secure version of the HTTP protocol. It is now widely used for security-sensitive communication on the World Wide Web, such as payment transactions.
The main functions of HTTPS are:
(1) Encrypt data and establish an information security channel to ensure data security during transmission;
(2) Authenticate the real identity of the website's server.
We often see HTTPS used on login pages and shopping checkout pages. With HTTPS, the URL starts with https:// instead of http://. In addition, when the browser visits a website over a valid HTTPS connection, a padlock icon appears in the address bar. Exactly how HTTPS is indicated varies from browser to browser.
2. Why HTTPS is needed
The HTTP protocol is exposed to security problems such as information theft and identity spoofing, and the HTTPS communication mechanism can effectively prevent them. Let's first look at the problems with HTTP:
Communication uses plain text (not encrypted), and the content may be eavesdropped
Since HTTP itself does not have the encryption function, it cannot encrypt the entire communication (the content of requests and responses communicated using the HTTP protocol). That is, HTTP messages are sent in clear text (referring to unencrypted messages).
The flaws of the HTTP plaintext protocol are an important cause of security problems such as data leakage, data tampering, traffic hijacking, and phishing attacks. The HTTP protocol cannot encrypt data, and all communication data "runs naked" in the network in plain text. Through network sniffing equipment and some technical means, the content of HTTP messages can be restored.
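To make the eavesdropping risk concrete, here is a minimal sketch of what a login request looks like on the wire when it is sent over plain HTTP. The host name, path, and credentials are invented for illustration; the only point is that every byte an on-path observer captures is directly readable.

```python
# Build the raw bytes of a hypothetical HTTP login request.
# The host, path, and credentials are made up for illustration.
body = "username=alice&password=s3cret"
request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
).encode("ascii")

# An eavesdropper who captures these bytes needs no key to read them.
captured = request
leaked_password = captured.decode("ascii").split("password=")[1]
print(leaked_password)
```

Nothing in the request had to be broken or guessed: the password is recovered with a simple string split.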
The integrity of the message cannot be proven, so it may be tampered with.
Integrity here means the accuracy of the information. If integrity cannot be proven, there is usually no way to judge whether the information is accurate. Because HTTP cannot prove the integrity of its messages, there is no way to know whether the content of a request or response was tampered with between the time it was sent and the time the other party received it. In other words, there is no way to confirm that the request or response received is identical to the one that was sent.
Does not verify the identity of the communicating party, so it is possible to encounter masquerading
Requests and responses in the HTTP protocol do not confirm who the communicating party is. Because HTTP has no step for confirming the other party, anyone can initiate a request. In addition, as long as the server receives a request, it returns a response no matter who sent it (provided that the Web server does not restrict the sender's IP address or port number).
Because HTTP cannot verify the identity of the communicating party, anyone can set up a fake server to deceive users, carrying out "phishing fraud" that users cannot detect.
Looking back at the HTTPS protocol, it has the following advantages over the HTTP protocol (details will be introduced below):
Data privacy: the content is symmetrically encrypted, and each connection generates a unique encryption Key
Data Integrity: Content transmission undergoes integrity verification
Identity Authentication: A third party cannot forge the server (client) identity
3. How does HTTPS solve the above problems of HTTP?
HTTPS is not a new protocol at the application layer. Only the HTTP communication interface part is replaced by SSL and TLS protocols.
Usually, HTTP communicates directly with TCP. When using SSL, it evolves to communicate with SSL first, and then SSL communicates with TCP. In short, the so-called HTTPS is actually HTTP wrapped in the shell of the SSL protocol.
After adopting SSL, HTTP has the encryption, certificate and integrity protection functions of HTTPS. That is to say, HTTP plus encryption processing, authentication and integrity protection is HTTPS.
The main functions of HTTPS rely almost entirely on the TLS/SSL protocol, which in turn is built on three kinds of basic algorithms: hash functions, symmetric encryption, and asymmetric encryption. Asymmetric encryption is used for identity authentication and key negotiation, symmetric encryption uses the negotiated key to encrypt the data, and the hash function is used to verify the integrity of the information.
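The integrity check mentioned above can be illustrated with Python's standard hmac and hashlib modules. This is only a sketch: the key and messages are placeholders, and real TLS performs its integrity checks inside the protocol's record layer rather than in application code like this.

```python
import hashlib
import hmac
import secrets

# Stand-in for the key both sides agreed on during the handshake.
shared_key = secrets.token_bytes(32)

def tag(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_intact(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag(key, message), received_tag)

message = b"transfer 100 to account 42"
sent_tag = tag(shared_key, message)

print(is_intact(shared_key, message, sent_tag))                        # unmodified message
print(is_intact(shared_key, b"transfer 900 to account 66", sent_tag))  # altered message
```

Because the tag depends on both the key and the content, someone who alters the message in transit cannot produce a matching tag without knowing the shared key.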
1. Solve the problem that the content may be eavesdropped - encryption
Method 1. Symmetric encryption
This method uses the same key for both encryption and decryption. The ciphertext cannot be decrypted without the key; conversely, anyone who has the key can decrypt it.
When encrypting using symmetric encryption, the key must also be sent to the other party. But how can it be transferred safely? When keys are forwarded over the Internet, if the communication is eavesdropped then the keys may fall into the hands of an attacker and the purpose of encryption will be lost. You also have to find a way to keep the received key securely.
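The "same key on both sides" idea can be sketched with a toy cipher built from the standard library. Note the assumptions: this XOR keystream construction is invented for illustration and is not secure, and it is not what TLS uses (real connections rely on vetted ciphers such as AES or ChaCha20).

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = secrets.token_bytes(32)
ciphertext = xor_cipher(key, b"card number 1234")

# The holder of the same key recovers the plaintext.
print(xor_cipher(key, ciphertext))

# Any other key produces garbage instead of the plaintext.
wrong_key = secrets.token_bytes(32)
print(xor_cipher(wrong_key, ciphertext) == b"card number 1234")
```

The sketch also shows the problem raised above: both parties need `key`, and the code says nothing about how it reached the other side safely.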
Method 2. Asymmetric encryption
Public key encryption uses a pair of asymmetric keys. One is called the private key and the other is called the public key. As the name suggests, the private key cannot be known to anyone else, while the public key can be freely released and available to anyone.
Using public key encryption, the party sending the ciphertext uses the other party's public key for encryption. After the other party receives the encrypted information, it uses its own private key to decrypt it. In this way, there is no need to send the private key used for decryption, and there is no need to worry about the key being eavesdropped and stolen by an attacker.
The characteristic of asymmetric encryption is that information is transmitted one-to-many. The server only needs to maintain one private key to carry out encrypted communication with multiple clients.
This method has the following disadvantages:
The public key is public, so anything the server encrypts with its private key can be decrypted by any attacker who intercepts it and holds the public key; only the direction encrypted with the public key stays confidential;
The public key does not contain the server's information. The use of asymmetric encryption algorithms cannot ensure the legitimacy of the server's identity. There is a risk of man-in-the-middle attack. The public key sent by the server to the client may be intercepted and tampered with by the middleman during the transmission process;
Using asymmetric encryption requires a certain amount of time in the data encryption and decryption process, which reduces data transmission efficiency;
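The public key and private key round trip described in this method can be sketched with textbook RSA. The primes below are tiny and there is no padding, so this is for illustration only and must not be used as real cryptography; the variable names are my own.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # may be published freely
private_key = (d, n)         # must stay with the server

def apply_key(value: int, key: tuple) -> int:
    """RSA encryption and decryption are both modular exponentiation."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65                                     # must be smaller than n
ciphertext = apply_key(message, public_key)      # anyone can encrypt
recovered = apply_key(ciphertext, private_key)   # only the key owner can decrypt
print(ciphertext, recovered)
```

The modular exponentiation involved is much more expensive than symmetric encryption at realistic key sizes, which is the efficiency drawback noted in the list above.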
Method 3. Symmetric encryption and asymmetric encryption (HTTPS uses this method)
The advantage of symmetric encryption is speed: encryption and decryption are comparatively fast. The advantage of asymmetric encryption is that the transmitted content cannot be read by an interceptor, because without the corresponding private key the captured data cannot be decrypted. It is like stealing a safe without having the key to open it. HTTPS therefore combines the two and makes full use of their respective strengths: asymmetric encryption is used in the key exchange stage, and symmetric encryption is used in the subsequent communication and message exchange stages.
Specifically, the party sending the ciphertext encrypts the "symmetric key" with the other party's public key, and the other party decrypts it with its own private key to obtain the "symmetric key". Once the key has been exchanged securely in this way, communication continues using symmetric encryption. In other words, HTTPS uses a hybrid encryption mechanism that combines symmetric and asymmetric encryption.
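The hybrid scheme can be sketched end to end with toy components. Everything here is an assumption made for illustration: the RSA key is built from two small Mersenne primes without padding, and the symmetric cipher is a homemade XOR keystream, so none of it is safe for real use.

```python
import hashlib
import secrets

# --- Server: toy RSA key pair (two Mersenne primes; far too small for real use) ---
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent, Python 3.8+

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# --- Client: pick a symmetric key and wrap it with the server's public key ---
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)

# --- Server: unwrap the symmetric key with its private key ---
unwrapped_key = pow(wrapped_key, d, n).to_bytes(16, "big")

# --- Both sides: switch to fast symmetric encryption for the actual data ---
ciphertext = xor_cipher(session_key, b"GET /account HTTP/1.1")
print(xor_cipher(unwrapped_key, ciphertext))
```

The expensive asymmetric operation runs once per connection to move 16 bytes of key material, and all subsequent traffic uses the cheap symmetric cipher.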
2. Solve the problem of possible tampering of messages - digital signature
During network transmission, data passes through many intermediate nodes. Although these nodes cannot decrypt the data, they may still tamper with it. How can data integrity be verified? By verifying a digital signature.
Digital signatures have two functions:
It can confirm that the message is indeed signed and sent by the sender, because others cannot fake the sender's signature.
It can establish the integrity of the message, proving that the data has not been tampered with.
How to generate a digital signature:
Use a hash function to generate a message digest of the text, then encrypt the digest with the sender's private key to produce the digital signature, and send the signature to the recipient together with the original text. The recipient then verifies the digital signature.
How a digital signature is verified:
The receiver decrypts the encrypted digest with the sender's public key (only that key can decrypt it), then applies the same hash function to the received original text to produce a second digest and compares the two. If they match, the received information is complete and was not modified in transit; otherwise the information has been modified. This is how a digital signature verifies the integrity of the information.
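The sign and verify steps can be sketched as follows. As before, the RSA key is a toy (two small Mersenne primes, no padding) and the function names are my own; production signatures use padded schemes such as RSA-PSS or ECDSA through a vetted library.

```python
import hashlib

# Toy RSA key pair for the sender (insecure, illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent, Python 3.8+

def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Sender: transform the digest with the private exponent."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: undo the signature with the public exponent and compare digests."""
    return pow(signature, e, n) == digest_as_int(message)

original = b"pay James 10 dollars"
signature = sign(original)

print(verify(original, signature))                    # untouched message
print(verify(b"pay James 1000 dollars", signature))   # tampered message
```

Verification needs only the public values `e` and `n`, so anyone can check the signature, but producing one requires the private exponent `d`.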
Suppose the message is passed between Kobe and James. James sends the message to Kobe together with its digital signature. After receiving it, Kobe can confirm that the message really came from James by verifying the signature. This of course presupposes that Kobe knows James's public key. The crux of the problem is that, like the message itself, the public key cannot safely be sent to Kobe directly over an insecure network: how can Kobe prove that the public key he received really belongs to James?
This is where the Certificate Authority (CA) comes in. There are not many CAs, and Kobe's client has the certificates of all trusted CAs built in. The CA digitally signs James's public key (together with other information) and issues the result as a certificate.
3. Solve the problem that the identity of the communicating party may be disguised - digital certificate
The digital certificate certification authority is in the position of a third-party organization that is trustworthy for both the client and the server.
Let’s introduce the business process of the digital certificate certification authority:
The server operator submits the public key, organizational information, personal information (domain name), and other details to the third-party CA and applies for certification;
CA verifies the authenticity of the information provided by the applicant through various means such as online and offline, such as whether the organization exists, whether the enterprise is legal, whether it has ownership of the domain name, etc.;
If the information is approved, the CA issues a certification document, the certificate, to the applicant. The certificate contains the following information in plain text: the applicant's public key, the applicant's organizational and personal information, information about the issuing CA, the validity period, the certificate serial number, and so on. It also contains a signature, generated as follows: first a hash function is used to compute a digest of the public plaintext information, then the digest is encrypted with the CA's private key, and the resulting ciphertext is the signature;
When the client sends a request to the server, the server returns the certificate file;
The client reads the plaintext information in the certificate and computes its digest with the same hash function, then uses the corresponding CA's public key to decrypt the signature and compares the result with the digest it computed. If the two match, the certificate is legitimate, which means the server's public key can be trusted;
The client also verifies the domain name, validity period, and other information in the certificate. The client has the certificate information (including the public key) of trusted CAs built in; if the CA is not trusted, no matching CA certificate is found and the certificate is likewise judged invalid.
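In practice the client side of these checks is usually delegated to a TLS library. As a small sketch, Python's standard ssl module builds a client context that already enforces chain validation and domain name matching; where the trusted CA certificates come from depends on the operating system and the Python installation.

```python
import ssl

# Default client-side settings. The trusted CA set is loaded from the platform's
# certificate store (the exact source varies by operating system).
context = ssl.create_default_context()

# The certificate chain must validate against a trusted CA.
print(context.verify_mode == ssl.CERT_REQUIRED)

# The certificate must match the host name the client asked for.
print(context.check_hostname)

# CA certificates loaded so far; may be 0 where the store is read lazily.
print(len(context.get_ca_certs()))
```

Switching either of the first two settings off removes exactly the protection this section describes, so it is rarely a good idea outside of testing.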
4. HTTPS workflow
1. Client initiates an HTTPS request (such as https://juejin.im/user). According to RFC 2818, the Client knows that it needs to connect to port 443 (the default) on the Server.
2. Server returns the pre-configured public key certificate to the client.
3. Client verifies the public key certificate: for example, whether it is within its validity period, whether the certificate's purpose matches the site the Client requested, whether it appears in the certificate revocation list (CRL), and whether its upper-level certificate is valid. This is a recursive process that continues until the root certificate is verified (the root certificate built into the operating system or into the Client). If verification passes, the process continues; otherwise a warning is displayed.
4. Client uses a pseudo-random number generator to generate a symmetric key for encryption, then encrypts the symmetric key with the public key from the certificate and sends it to the Server.
5. Server uses its own private key to decrypt the message and obtain the symmetric key. At this point, both Client and Server hold the same symmetric key.
6. Server uses a symmetric key to encrypt "plain text content A" and sends it to the Client.
7. Client uses the symmetric key to decrypt the ciphertext of the response and obtains "plaintext content A".
8. Client initiates an HTTPS request again, using the symmetric key to encrypt the requested "plaintext content B"; the Server then uses the symmetric key to decrypt the ciphertext and obtains "plaintext content B".
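From application code, the whole workflow above is normally hidden behind a single library call. The sketch below uses Python's standard socket and ssl modules to open a connection and report what was negotiated; the function name and the returned fields are my own choices. Note also that the library decides how the session key is agreed, and current TLS versions generally use a Diffie-Hellman style exchange rather than encrypting the key with the certificate's public key as in step 4.

```python
import socket
import ssl

def describe_tls_connection(host: str, port: int = 443) -> dict:
    """Open a TLS connection and report what the handshake negotiated."""
    context = ssl.create_default_context()   # verifies certificate chain and host name
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # wrap_socket performs the handshake: certificate checks and key agreement.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            cert = tls_sock.getpeercert()
            return {
                "protocol": tls_sock.version(),    # negotiated protocol version
                "cipher": tls_sock.cipher()[0],    # negotiated cipher suite name
                "subject": cert.get("subject"),    # who the certificate was issued to
                "not_after": cert.get("notAfter"), # end of the validity period
            }

# Usage (needs network access; the host name is only an example):
# print(describe_tls_connection("example.com"))
```

If certificate verification fails, `wrap_socket` raises `ssl.SSLCertVerificationError` instead of returning, which corresponds to the warning shown in step 3.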
5. The difference between HTTP and HTTPS
HTTP is a plaintext transmission protocol, while HTTPS is a network protocol built from SSL plus HTTP that supports encrypted transmission and identity authentication, making it more secure than HTTP.
In terms of security, the simplest metaphor for the relationship between the two is a truck carrying goods: under HTTP the truck is open-topped and the goods are exposed, whereas HTTPS is a closed container truck, which is naturally much more secure.
HTTPS is more secure than HTTP, more friendly to search engines, and is conducive to SEO. Google and Baidu prefer to index HTTPS web pages;
HTTPS requires an SSL certificate, while HTTP does not;
HTTPS standard port 443, HTTP standard port 80;
HTTPS inserts an SSL/TLS layer between HTTP and TCP, while plain HTTP talks to TCP directly;
Browsers display a security lock icon in the address bar for HTTPS sites but not for HTTP sites.
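The scheme and port difference in the list above is easy to check from code. A small sketch using only the standard library follows; the helper name `effective_port` and the example URLs are my own.

```python
import http.client
from urllib.parse import urlsplit

# Standard ports as defined by Python's http.client module.
DEFAULT_PORTS = {"http": http.client.HTTP_PORT, "https": http.client.HTTPS_PORT}

def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS[parts.scheme]

print(effective_port("http://example.com/index.html"))    # 80
print(effective_port("https://example.com/index.html"))   # 443
print(effective_port("https://example.com:8443/admin"))   # 8443
```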
6. Why don’t all websites use HTTPS
Since HTTPS is so safe and reliable, why don’t all Web websites use HTTPS?
First of all, many people still think that there is a threshold for HTTPS implementation. This threshold lies in the need for an SSL certificate issued by an authoritative CA. From certificate selection, purchase to deployment, the traditional model is more time-consuming and labor-intensive.
Secondly, HTTPS is generally believed to cost more in performance than HTTP, because encrypted communication consumes more CPU and memory than plaintext communication. If every exchange is encrypted, a considerable amount of resources is consumed, and on any single machine the number of requests that can be handled inevitably drops. In practice, however, this need not be a problem: it can be addressed by performance optimization and by deploying the certificates on an SLB or CDN. As a practical example, during "Double Eleven", Taobao and Tmall, which use HTTPS across their entire sites, still delivered smooth access, browsing, and transactions on both the website and mobile clients. Testing showed that many optimized pages perform the same as under HTTP, or even slightly better, so HTTPS is not actually slow once optimized.
In addition, wanting to save the cost of a certificate is another reason. Certificates are essential for HTTPS communication, and the certificate must be issued by a certification authority (CA), which commercial CAs charge for.
The last factor is security awareness. Compared with China, security awareness and the application of security technology are more mature in the Internet industry abroad, where the move to HTTPS is being pushed jointly by society, enterprises, and governments.