A thorough understanding of cookies, sessions, and tokens in one article
Development History
1. A long time ago, the Web was basically just for browsing documents. Since it was only browsing, I, as a server, had no need to record who had viewed which documents over a given period. Each request was a fresh exchange under the HTTP protocol: one request plus one response. In particular, I did not need to remember who had just sent an HTTP request; every request was new to me. It was a happy time.
2. However, with the rise of interactive web applications such as online shopping sites and sites that require login, I immediately faced a problem: managing sessions. I had to remember who had logged in to the system and who had put items in their shopping cart; in other words, I had to tell each person apart. This was a big challenge because HTTP requests are stateless. The solution I came up with was to give everyone a session ID (session id), which, to put it bluntly, is a random string. Everyone receives a different one, and every time you send me an HTTP request you send this string along with it, so that I can tell who is who.
3. Everyone was happy, except the server. Each client only needs to keep its own session id, but the server has to keep everyone's! If many clients access the server, that means thousands or even hundreds of thousands of them.
This is a huge overhead for the server and severely limits its ability to scale out. For example, suppose I use two machines to form a cluster and Xiao F logs in through machine A, so the session id is saved on machine A. What if Xiao F's next request is forwarded to machine B? Machine B does not have Xiao F's session id.
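The two-machine problem can be sketched in a few lines of Python. This is only an illustration: the Machine class, its method names, and the user id "xiao_f" are made up here and do not come from any framework.

```python
import secrets


class Machine:
    """One web server that keeps its sessions in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> user id

    def login(self, user_id):
        # The session id is just an unguessable random string.
        session_id = secrets.token_urlsafe(32)
        self.sessions[session_id] = user_id
        return session_id

    def whoami(self, session_id):
        # Returns None when this machine has never seen the session id.
        return self.sessions.get(session_id)


machine_a = Machine("A")
machine_b = Machine("B")

sid = machine_a.login("xiao_f")
print(machine_a.whoami(sid))  # xiao_f
print(machine_b.whoami(sid))  # None: machine B has no record of this session
```

Every logged-in user adds one entry to the dictionary, and that entry exists only on the machine that handled the login.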
Sometimes a little trick is used: sticky sessions, which means that Xiao F's requests always stick to machine A. But this does not help either: if machine A goes down, the requests still have to be sent to machine B.
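One possible way to make requests "stick" is to derive the target machine from the session id. The sketch below assumes a hypothetical pick_machine helper and a hash-based rule; real load balancers may use other strategies, such as a routing cookie or the client IP.

```python
import hashlib


def pick_machine(session_id, machines):
    """Map a session id to one machine out of the currently healthy ones."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


healthy = ["A", "B"]
first = pick_machine("session-of-xiao-f", healthy)

# The same session id always lands on the same machine while the pool is unchanged.
assert pick_machine("session-of-xiao-f", healthy) == first

# If that machine goes down, the request has to go to the other one,
# which does not hold the session.
healthy.remove(first)
fallback = pick_machine("session-of-xiao-f", healthy)
print(first, "->", fallback)
```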
Then we have to replicate sessions. Copying session ids back and forth between the two machines is exhausting.
Later, a trick based on Memcached appeared: store the session ids centrally in one place, and have all machines read the data from there. This removes the need for replication, but it introduces a single point of failure. If the machine responsible for sessions goes down, everyone has to log in again, and I would probably be cursed to death.
I also tried turning this single machine into a cluster to increase reliability, but no matter what, these small sessions remained a heavy burden for me.
4. So some people kept wondering: why should I store these annoying sessions at all? Wouldn't it be nice to let each client keep its own?
But if I do not store these session ids, how can I verify that a session id sent by a client was really generated by me? Without verification I cannot know whether the sender is a legitimately logged-in user, and people with bad intentions could forge session ids and do whatever they want.
Ah, right: the key point is verification!
For example, once Xiao F has logged in to the system, I send him a token that contains his user id. The next time Xiao F sends me an HTTP request, he sends this token along with it, for instance in an HTTP header.
But so far this is essentially no different from a session id: anyone can forge it. So I have to find a way to prevent forgery.
The answer is to sign the data. For example, I use the HMAC-SHA256 algorithm with a key that only I know to compute a signature over the data, and I use the data together with this signature as the token. Since nobody else knows the key, nobody else can forge the token.
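A minimal sketch of such a signed token, using Python's standard hmac module. The token layout (payload and signature joined by a dot), the make_token name, and the key value are assumptions chosen for illustration, not a standard format.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key; in practice it would be loaded from secure configuration.
SECRET_KEY = b"a-key-only-the-server-knows"


def b64url(data: bytes) -> str:
    """URL-safe Base64 without padding, so the token is header-friendly."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(user_id: str) -> str:
    # The data part: here just the user id, serialized as JSON.
    payload = json.dumps({"user_id": user_id}, separators=(",", ":")).encode("utf-8")
    # The signature part: HMAC-SHA256 over the data with the server's key.
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return b64url(payload) + "." + b64url(signature)


token = make_token("xiao_f")
print(token)
```

Signing the same data with the same key always yields the same signature, which is what makes verification without storage possible.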
I do not store this token. When Xiao F sends it back to me, I use the same HMAC-SHA256 algorithm and the same key to compute the signature over the data again and compare it with the signature in the token. If they match, I know that Xiao F has logged in and I can read his user id directly from the token. If they do not match, the data must have been tampered with, and I tell the sender: sorry, not authenticated.
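The verification step could look roughly like this. The helper names, the key, and the "payload.signature" layout are assumptions for the sketch; hmac.compare_digest is used so that the comparison takes constant time.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"a-key-only-the-server-knows"  # hypothetical key


def _sign(payload: bytes) -> bytes:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def make_token(user_id):
    payload = json.dumps({"user_id": user_id}).encode("utf-8")
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(_sign(payload)).decode("ascii"))


def verify_token(token):
    """Return the user id if the signature matches, otherwise None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    # Recompute the signature over the received data and compare.
    if not hmac.compare_digest(signature, _sign(payload)):
        return None  # data or signature was tampered with
    return json.loads(payload)["user_id"]


token = make_token("xiao_f")
print(verify_token(token))  # xiao_f

# Swapping in a different payload while keeping the old signature fails.
forged_payload = base64.urlsafe_b64encode(
    json.dumps({"user_id": "admin"}).encode("utf-8")).decode("ascii")
forged = forged_payload + "." + token.split(".")[1]
print(verify_token(forged))  # None
```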
The data in the token is stored in clear text (I do encode it with Base64, but that is not encryption), so other people can still read it. That means I must not put sensitive information such as passwords in it.
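To see why Base64 offers no secrecy, here is a small sketch that reads a payload back without any key. The payload content is an example value built on the spot.

```python
import base64
import json

# The data half of a token, as any client or eavesdropper would see it.
payload_b64 = base64.urlsafe_b64encode(
    json.dumps({"user_id": "xiao_f"}).encode("utf-8")).decode("ascii")

# No key is needed to read it back: Base64 is an encoding, not encryption.
decoded = json.loads(base64.urlsafe_b64decode(payload_b64))
print(decoded)  # {'user_id': 'xiao_f'}
```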
Of course, if someone's token is stolen, there is nothing I can do about it: I will treat the thief as a legitimate user. This is really no different from someone's session id being stolen.
This way I no longer store session ids. I only generate tokens and verify them, trading CPU time for session storage space!
With the burden of session ids lifted, I feel light as a feather. My cluster can now easily scale horizontally: as the number of user visits grows, I simply add more machines. Being stateless feels great!
Cookie
A cookie is a very concrete thing: a piece of data that can be stored persistently in the browser. It is simply a data storage feature implemented by the browser.
A cookie is generated by the server and sent to the browser, which saves it as key-value pairs, for example in a text file in some directory. The next time the same website is requested, the browser sends the cookie back to the server. Because cookies are stored on the client, browsers add some restrictions so that cookies cannot be abused and do not take up too much disk space; for instance, the number of cookies per domain is limited.
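The round trip can be sketched with Python's standard http.cookies module. The cookie names and values (session_id, theme) are invented for the example; the first half plays the server building a Set-Cookie header, the second half parses the Cookie header a browser would send back.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"
cookie["session_id"]["path"] = "/"
cookie["session_id"]["max-age"] = 3600   # lifetime in seconds
cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
set_cookie_header = cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Next request: parse the Cookie header the browser sends back.
incoming = SimpleCookie()
incoming.load("session_id=abc123; theme=dark")
print(incoming["session_id"].value)  # abc123
```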
Session
Literally, a session is a conversation. It is like talking to someone: how do you know that the person you are talking to is Zhang San and not Li Si? The other party must have some characteristic (such as appearance) that shows he is Zhang San.
A session works similarly. The server needs to know who is currently sending it a request. To tell clients apart, the server assigns a different "identity identifier" to each client. From then on, the client includes this identifier in every request, and the server knows who the request comes from. There are many ways for the client to store this identifier; browser clients use cookies by default.
The server uses a session to temporarily keep the user's information on the server side; the session is destroyed after the user leaves the website or the session times out. Storing user information this way is safer than keeping it in cookies, but sessions have a flaw: if the web servers are load balanced, the session is lost when the next request goes to a different server.
Token
Token-based authentication can be seen everywhere on the Web. For most Internet companies that expose Web APIs, tokens are often the best way to handle authentication for multiple users.
The following features make token-based authentication attractive for your application:
(1) Stateless and scalable
(2) Support for mobile devices
(3) Cross-application calls
(4) Security
Most of the major APIs and web applications you have come across use tokens, for example Facebook, Twitter, Google, and GitHub.
The Origin of Token
Before introducing the principles and advantages of token-based authentication, it is worth looking at how authentication was done before.
Server-based verification
We all know that the HTTP protocol is stateless. This statelessness means that the program has to authenticate every request in order to identify the client.
Previously, the program identified requests using login information stored on the server, usually by storing a session.
With the rise of the Web, applications, and mobile devices, this approach has gradually shown its problems, especially in terms of scalability.
Some problems exposed by server-based authentication
(1) Sessions: each time a user is authenticated, the server has to create a record to store the information. As more and more users are authenticated, the memory overhead keeps growing.
(2) Scalability: keeping login information in sessions in the server's memory brings scalability problems with it.
(3) CORS (cross-origin resource sharing): when we want to use our data across multiple mobile devices, sharing resources across origins becomes a headache. When using Ajax to fetch resources from another domain, requests may be blocked.
(4) CSRF (cross-site request forgery): users are vulnerable to cross-site request forgery because, once they are authenticated with, say, a banking site, that authenticated state can be exploited while they visit other sites.
Among these problems, scalability is the most prominent. Therefore, we need a more effective approach.
Token-based authentication principle
Token-based authentication is stateless: we do not store user information on the server or in a session.
This concept alone solves many of the problems of storing information on the server side:
No session means that your program can add or remove machines as needed without worrying about where a user is logged in.
The process of token-based authentication is as follows:
(1) The user sends a request with a user name and password.
(2) The program verifies the credentials.
(3) The program returns a signed token to the client.
(4) The client stores the token and sends it with every request.
(5) The server verifies the token and returns data.
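The five steps above can be sketched end to end as follows. Everything here is a simplified assumption: the in-memory USERS table with a plain-text password, the "Bearer" scheme in the Authorization header, and the function names login and handle_request. A real system would store password hashes and usually rely on an established token library.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"a-key-only-the-server-knows"  # hypothetical key
USERS = {"xiao_f": "correct horse"}  # hypothetical user table (plain text only for brevity)


def sign(user):
    mac = hmac.new(SECRET_KEY, user.encode("utf-8"), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(user.encode("utf-8")).decode("ascii") + "." + mac


def login(username, password):
    """Steps 1-3: check the credentials and hand back a signed token."""
    expected = USERS.get(username)
    if expected is None or not hmac.compare_digest(
            expected.encode("utf-8"), password.encode("utf-8")):
        return None
    return sign(username)


def handle_request(headers):
    """Step 5: verify the token from the Authorization header and return data."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or "." not in token:
        return 401, "invalid token"
    user_b64 = token.partition(".")[0]
    try:
        user = base64.urlsafe_b64decode(user_b64).decode("utf-8")
    except ValueError:
        return 401, "invalid token"
    # Re-sign the claimed user and compare with what the client sent.
    if not hmac.compare_digest(sign(user).encode("utf-8"), token.encode("utf-8")):
        return 401, "invalid token"
    return 200, "hello " + user


# Step 4: the client keeps the token and attaches it to each request.
token = login("xiao_f", "correct horse")
status, body = handle_request({"Authorization": "Bearer " + token})
print(status, body)  # 200 hello xiao_f
```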
Every request must carry the token. The token should be sent in an HTTP header so that HTTP requests remain stateless. We also set the server's Access-Control-Allow-Origin: * header so that the server accepts requests from all domains.
Note that when the Access-Control-Allow-Origin (ACAO) header is set to *, requests must not carry credentials such as HTTP authentication, client-side SSL certificates, or cookies.
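As a rough illustration of that rule, the hypothetical helper below picks CORS response headers for a request. The function name, its parameters, and the policy itself are assumptions for this sketch; the underlying constraint is that a wildcard origin cannot be combined with credentialed requests, so a specific origin has to be named instead.

```python
def cors_headers(request_origin, allowed_origins, with_credentials):
    """Build CORS response headers for one request (a simplified policy)."""
    if not with_credentials:
        # No cookies or other browser-managed credentials are involved,
        # so the wildcard can be used.
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in allowed_origins:
        # With credentials the wildcard is not permitted; name the origin explicitly.
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Access-Control-Allow-Credentials": "true",
            "Vary": "Origin",
        }
    # Unknown origin asking for credentialed access: send no CORS headers.
    return {}


print(cors_headers("https://app.example.com", set(), with_credentials=False))
print(cors_headers("https://app.example.com", {"https://app.example.com"},
                   with_credentials=True))
```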
Implementation ideas:
(1) The user logs in; after successful verification, a token is returned to the client.
(2) The client stores the token locally after receiving it.
(3) The client sends the token to the server every time it calls the API.
(4) The server verifies the token in a filter. If verification succeeds, the requested data is returned; if it fails, an error code is returned.
Once we have authenticated in the program and obtained a token, we can do many things with it. We can even create a permission-based token and pass it to a third-party application, which can then access our data (only the data that the specific token allows).
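The server-side filter from step (4) could be sketched as a Python decorator that runs before each handler. The request is modeled as a plain dictionary, and token_required, get_profile, and the "user_id.signature" token layout are all hypothetical names chosen for this example rather than the API of any web framework.

```python
import functools
import hashlib
import hmac

SECRET_KEY = b"a-key-only-the-server-knows"  # hypothetical key


def sign(user_id):
    mac = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return user_id + "." + mac


def check(token):
    """Return the user id carried by a valid token, otherwise None."""
    user_id, sep, mac = token.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("ascii")):
        return user_id
    return None


def token_required(handler):
    """Filter: reject the request before the handler runs if the token is bad."""
    @functools.wraps(handler)
    def wrapper(request):
        user_id = check(request.get("token", ""))
        if user_id is None:
            return {"status": 401, "error": "invalid or missing token"}
        return handler(request, user_id)
    return wrapper


@token_required
def get_profile(request, user_id):
    return {"status": 200, "data": {"user_id": user_id}}


print(get_profile({"token": sign("xiao_f")}))
print(get_profile({"token": "admin.deadbeef"}))
```

Keeping the check in one wrapper means individual handlers never see unauthenticated requests.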
Advantages of Tokens
Stateless and scalable
Tokens stored on the client side are stateless and scalable. Because no session information is stored, the load balancer can send a user's requests to any server.
If we kept the authenticated user's information in a session, every request from that user would have to be sent to the same server the user logged in on (this is called session affinity). With a large number of users, this can cause congestion.
But don't worry: with tokens these problems are easily solved, because the token itself carries the user's authentication information.
Security
Sending a token with the request instead of a cookie helps prevent CSRF (cross-site request forgery). Even if a cookie is used to store the token on the client, the cookie is only a storage mechanism and is not used for authentication. Since no information is stored in a session, there is also no session state to manipulate.
Tokens are time-limited, so users have to authenticate again after a while. We also do not necessarily have to wait for a token to expire on its own: tokens can be revoked. Through token revocation, a specific token, or a group of tokens issued under the same authorization grant, can be invalidated.
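Expiry and revocation might be sketched like this. The claim names "exp" and "jti" are borrowed from JSON Web Tokens; the function names, the in-memory REVOKED_IDS set, and the token layout are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SECRET_KEY = b"a-key-only-the-server-knows"  # hypothetical key
REVOKED_IDS = set()  # ids of tokens withdrawn before their expiry


def issue(user_id, ttl_seconds, now=None):
    now = time.time() if now is None else now
    claims = {"user_id": user_id, "exp": now + ttl_seconds, "jti": secrets.token_hex(8)}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode("utf-8")).decode("ascii")
    mac = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).hexdigest()
    return body + "." + mac


def validate(token, now=None):
    """Return the claims of a valid, unexpired, unrevoked token, otherwise None."""
    now = time.time() if now is None else now
    body, _, mac = token.partition(".")
    expected = hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("ascii")):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] <= now or claims["jti"] in REVOKED_IDS:
        return None
    return claims


def revoke(claims):
    REVOKED_IDS.add(claims["jti"])


token = issue("xiao_f", ttl_seconds=60, now=1000)
print(validate(token, now=1010) is not None)  # True: still within its lifetime
print(validate(token, now=1061))              # None: expired
revoke(validate(token, now=1010))
print(validate(token, now=1010))              # None: revoked early
```

Note that a revocation list is state the server has to keep, so revocation trades away a little of the statelessness in exchange for control.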
Extensibility
Tokens make it possible to build applications that share permissions with other applications. For example, you can link a social account (Facebook or Twitter) with your own account. When we log in to Twitter through a service (say, Buffer), we are allowing Buffer to post to our Twitter stream.
When using tokens, selective permissions can be granted to third-party applications. When users want another application to access their data, we can build our own API and hand out special-permission tokens.
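A special-permission token can be sketched by putting a list of allowed actions into the signed data. The scope strings ("post:tweets", "read:messages"), the function names, and the token layout below are invented for illustration and are not the format used by Twitter, Buffer, or any other service.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"a-key-only-the-server-knows"  # hypothetical key


def issue(user_id, scopes):
    """Issue a token that is only good for the listed scopes."""
    claims = json.dumps({"user_id": user_id, "scopes": sorted(scopes)}).encode("utf-8")
    body = base64.urlsafe_b64encode(claims).decode("ascii")
    mac = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).hexdigest()
    return body + "." + mac


def allows(token, scope):
    """True only if the token is genuine and carries the requested scope."""
    body, _, mac = token.partition(".")
    expected = hmac.new(SECRET_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("ascii")):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return scope in claims["scopes"]


# A token handed to a third-party application that may only post.
third_party_token = issue("xiao_f", ["post:tweets"])
print(allows(third_party_token, "post:tweets"))    # True
print(allows(third_party_token, "read:messages"))  # False
```

Because the scopes are covered by the signature, the third party cannot widen its own permissions by editing the token.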
Multi-platform cross-domain
Let's talk about CORS (cross-origin resource sharing) first. As our applications and services expand, we need to provide access to all kinds of devices and applications.
By having our API serve only data, we can also make the design choice to serve assets from a CDN. This eliminates the issues that CORS brings up, once we set a quick header configuration for our application:
Access-Control-Allow-Origin: *
As long as the user has a valid token, data and resources can be requested from any domain.
When creating tokens, you can follow a standard and choose from a number of options. We will describe this in more detail in subsequent articles; the standard to use is JSON Web Tokens.
Up-to-date tools and documentation are available for JSON Web Tokens, with libraries in many languages. This means you can actually switch your authentication mechanism in the future.