Most development platforms are built around languages such as Java, C, and Python, and Chinese text appears almost everywhere during development. A computer cannot work with the characters you write directly: it encodes them into bytes that it can store, transmit, and process. If those bytes are not decoded with the same encoding that produced them, what we see is a string of unreadable garbled characters.
First of all, we need to understand why garbled characters appear.
How does the server encode data when it sends and receives it?
1. The server sends data
When sending data, the server picks the encoding in the following priority order: response.setCharacterEncoding, then contentType, then pageEncoding.
2. The server receives data
Data can arrive in three ways: submitted directly through a URL, or submitted by a form using the GET or the POST method.
Because web servers handle these three cases differently, we take Tomcat 5.0 as an example.
① Data submitted by a form using POST
As long as response.setCharacterEncoding, contentType, or pageEncoding is set to "utf-8" on the JSP page that contains the form, and the JSP/Servlet that receives the data decodes it with the same encoding through request.setCharacterEncoding, the received data will not contain garbled Chinese.
② Data submitted through a URL or by a form using GET
Setting request.setCharacterEncoding in the JSP/Servlet that receives the data is not enough here, because Tomcat 5.0 decodes data submitted through a URL or by a form's GET method as ISO-8859-1 by default. To solve this problem, set the useBodyEncodingForURI or URIEncoding attribute on the Connector element in Tomcat's configuration file (server.xml) so that this data is decoded consistently.
The useBodyEncodingForURI attribute controls whether the encoding given to request.setCharacterEncoding is also used to decode data submitted through a URL or by a form's GET method. It is false by default.
The difference between the two is that URIEncoding applies one encoding to the data of all GET requests, while useBodyEncodingForURI decodes the data according to the request.setCharacterEncoding value of the page that handles the request, so different pages can use different encodings. Therefore, for data submitted through a URL or by a form's GET method, either set URIEncoding to the encoding the browser uses, or set useBodyEncodingForURI to true and call request.setCharacterEncoding with the browser's encoding in the JSP page that reads the data.
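The decoding mismatch described above can be reproduced with the JDK alone. The sketch below is not Tomcat code: it uses URLDecoder as a stand-in for the container's decoding of URL and GET data, and the names UriDecodingDemo and decodeQueryValue are made up for illustration. It assumes Java 10 or later for the Charset overloads.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UriDecodingDemo {
    // Hypothetical stand-in for the container: percent-decodes a query value
    // using the given charset.
    static String decodeQueryValue(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) {
        // The two characters of the word "Chinese", written as escapes so the
        // example does not depend on the source file's own encoding.
        String original = "\u4e2d\u6587";

        // What a browser using UTF-8 puts into the URL.
        String onTheWire = URLEncoder.encode(original, StandardCharsets.UTF_8);
        System.out.println(onTheWire); // %E4%B8%AD%E6%96%87

        // Decoding with ISO-8859-1 turns each of the 6 bytes into its own character.
        String garbled = decodeQueryValue(onTheWire, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 6

        // Decoding with the encoding the browser used restores the text.
        String correct = decodeQueryValue(onTheWire, StandardCharsets.UTF_8);
        System.out.println(correct.equals(original)); // true
    }
}
```

The only difference between the garbled and the correct result is the charset passed to the decoder, which is the role URIEncoding and useBodyEncodingForURI play in the Connector configuration.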
For garbled characters when submitting data from a JSP, we again need to understand why they appear.
Look at the page directive at the top of the JSP: <%@ page contentType="text/html;charset=UTF-8" language="java"%>
This directive has one more encoding-related attribute: pageEncoding.
First, let's look at the roles of the several encodings in JSP/Servlet.
In JSP/Servlet, an encoding can be set in four main places: pageEncoding="UTF-8", contentType="text/html;charset=UTF-8", request.setCharacterEncoding("UTF-8"), and response.setCharacterEncoding("UTF-8"). The first two can only be used in a JSP; the latter two can be used in both JSPs and Servlets.
request.setCharacterEncoding("UTF-8") sets the encoding of the client's request, that is, the encoding the server uses to decode the data sent by the browser.
response.setCharacterEncoding("UTF-8") sets the encoding of the server's response, that is, the encoding the server uses to encode the data before sending it to the browser.
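To make the response side concrete, the following sketch models the two steps (the server encodes, the browser decodes) with plain JDK calls instead of the servlet API. The class and method names are hypothetical, and ISO-8859-1 is used as the example of an encoding that cannot represent Chinese.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseEncodingDemo {
    // Models the server step: characters become bytes in the response encoding.
    static byte[] encodeForResponse(String body, Charset responseEncoding) {
        return body.getBytes(responseEncoding);
    }

    // Models the browser step: bytes become characters in the charset the
    // browser was told to use.
    static String decodeInBrowser(byte[] wireBytes, Charset browserCharset) {
        return new String(wireBytes, browserCharset);
    }

    public static void main(String[] args) {
        String body = "\u4e2d\u6587"; // two Chinese characters

        // UTF-8 on both sides: three bytes per character, text survives.
        byte[] utf8Bytes = encodeForResponse(body, StandardCharsets.UTF_8);
        System.out.println(utf8Bytes.length); // 6
        System.out.println(decodeInBrowser(utf8Bytes, StandardCharsets.UTF_8).equals(body)); // true

        // ISO-8859-1 cannot represent Chinese, so each character is replaced
        // by '?' while encoding and the text is lost before it is even sent.
        byte[] latin1Bytes = encodeForResponse(body, StandardCharsets.ISO_8859_1);
        System.out.println(decodeInBrowser(latin1Bytes, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

In the second case no setting on the browser side can recover the text, which is why the response encoding has to be set on the server before any output is written.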
The following is a summary of how to prevent garbled Chinese characters when using Tomcat 7.0 as the web server.
1. Use one encoding throughout an application. UTF-8 is recommended, although GBK is also acceptable.
2. Set pageEncoding="UTF-8" correctly in every JSP.
3. Set contentType="text/html;charset=UTF-8" or response.setCharacterEncoding("UTF-8") in all JSPs/Servlets, which indirectly sets the encoding the browser uses.
4. For GET requests and URL requests that are not submitted by a form, you can change Tomcat's default configuration. Setting useBodyEncodingForURI to true is recommended; setting URIEncoding to UTF-8 also works, but it may affect other applications and is therefore not recommended. Alternatively, convert the value when receiving the data, as follows:
request.getParameter("userID") gets the value of userID.
request.getParameter("userID").trim() removes the whitespace on both sides of the value.
request.getParameter("userID").trim().getBytes("ISO-8859-1") encodes the string as ISO-8859-1, which recovers the original byte array.
new String(request.getParameter("userID").trim().getBytes("ISO-8859-1"), "utf-8") passes that byte array to the String constructor, which creates a new string by decoding it as "utf-8".
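The four steps above can be wrapped in a small helper. This is a minimal sketch that runs without a servlet container: the name recoverUtf8 is hypothetical, the garbled input is simulated in main, and a null check is added because request.getParameter returns null when the parameter is absent.

```java
import java.nio.charset.StandardCharsets;

class ParameterRecoveryDemo {
    // Hypothetical helper: undoes an ISO-8859-1 decode of bytes that were
    // really UTF-8. Returns null for a missing parameter.
    static String recoverUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // ISO-8859-1 maps every byte to exactly one character, so encoding
        // the garbled string with it gives back the original bytes.
        byte[] rawBytes = misdecoded.trim().getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters

        // Simulate what the receiving code sees when UTF-8 bytes were decoded
        // as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(recoverUtf8(misdecoded).equals(original)); // true
    }
}
```

The conversion is only valid when the value really was decoded as ISO-8859-1. If the container already decodes with UTF-8, for example because URIEncoding is set, applying the helper to a correct string replaces the Chinese characters with question marks.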
5. Use the URLEncoder method
Before passing the parameter, use:
// Use the specified encoding to convert the string into application/x-www-form-urlencoded format
String username_encoder = URLEncoder.encode(username, "UTF-8");
After receiving the parameter, use:
// Use the specified encoding to decode the application/x-www-form-urlencoded string
String username_decoder = URLDecoder.decode(request.getParameter("username"), "UTF-8");
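One detail is worth keeping in mind with this method: a servlet container already percent-decodes parameter values once before request.getParameter returns them, so the explicit URLDecoder.decode call only has something to decode if the sender encoded the value twice. The sketch below shows that double-encoding variant as an assumption, not as the article's exact method. It simulates the container's own decode with ISO-8859-1 (the Tomcat default mentioned earlier), all class and method names are hypothetical, and the Charset overloads require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {
    // Sender side: encode twice, so that after the container's single decode
    // the value is still a pure-ASCII percent-encoded string.
    static String encodeTwice(String value) {
        String once = URLEncoder.encode(value, StandardCharsets.UTF_8);
        return URLEncoder.encode(once, StandardCharsets.UTF_8);
    }

    // Stand-in for the container's automatic decode (assumed ISO-8859-1).
    static String containerDecode(String wireValue) {
        return URLDecoder.decode(wireValue, StandardCharsets.ISO_8859_1);
    }

    // Receiver side: the explicit decode from the article, applied to what
    // request.getParameter would return.
    static String decodeParameter(String parameterValue) {
        return URLDecoder.decode(parameterValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String username = "\u4e2d\u6587"; // two Chinese characters

        String wireValue = encodeTwice(username);
        System.out.println(wireValue); // %25E4%25B8%25AD%25E6%2596%2587

        // After one decode only ASCII remains, so the container's charset
        // cannot damage it.
        String seenByServlet = containerDecode(wireValue);
        System.out.println(seenByServlet); // %E4%B8%AD%E6%96%87

        System.out.println(decodeParameter(seenByServlet).equals(username)); // true
    }
}
```

Because the intermediate value contains only ASCII characters, this variant works regardless of how the container's URI decoding is configured.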