javaTutorial

JAVA WEB notes--Chinese garbled characters

巴扎黑

Jun 26, 2017 am 11:11 AM

webChineseGarbled charactersnotes

JAVA WEB Garbled Code Problem Analysis

Cause of Garbled Code

In the process of Java Web development, we often encounter the problem of garbled code. The reason for garbled code can be summarized as character encoding. Does not match the decoding method.

Since the reason for garbled characters is that the character encoding and decoding methods do not match, then why do we have to encode the characters? Is it okay not to encode them? This is because the basic unit of storing data in a computer is 1 byte, that is, 8 bits, so the maximum number of characters it can express is 2⁸=256, and in our real society There are far more characters (Chinese characters, English, other characters, etc.) than this number, so in order to solve the conflict between characters and bytes, characters must be encoded before they can be stored in the computer.

Encoding and decoding

　Common encoding methods in computers include ASCII, ISO-8859-1, GB2312, UTF-16, and UTF-8. .

ASCII code is represented by the lower 7 bits of a byte, so the maximum number of characters that can be expressed is 2⁷=128. ISO-8859-1 is an extension of the ISO organization based on ASCII code. It is compatible with ASCII code and covers most Western European characters. ISO8859-1 uses one byte to represent, so it can express up to 256 characters. GB2312 uses double-byte encoding. The encoding range is A1-F7, where A1-A9 is the symbol area and B0-F7 is the Chinese character area, containing 6763 Chinese characters. GBK is to expand the GB2312 encoding and add more Chinese characters. There are 21,003 Chinese characters that can be expressed. UTF-16 uses a fixed-length encoding method. No matter what character is represented, it is represented by 2 bytes. This is also the storage format of characters in JAVA memory. Contrary to UTF-16, UTF-8 uses a variable-length encoding method, and different types of characters can be composed of 1-6 bytes.

Let’s take a look at the encoding of different encoding methods in the computer using the string "Hyuuga Hinata", as shown below.

Garbled code analysis and solution

　For the garbled code problem in JAVA WEB, We divide the garbled characters caused by the request and the garbled characters caused by the response. For different garbled characters, we need to analyze the causes of the garbled characters, that is, what is the character encoding method and what is the decoding method.

For the garbled characters caused by the request, we need to analyze the Http request and check its encoding method. Since the HTTP request is divided into Get request and Post request, we will discuss them separately next.

For Get request, it is the default request method of the browser, and the submission method when the form is set to "Get" when submitting. We check the specific content through Firefox browser as follows:

The address bar is:

The request content is:

Through the above request, we can see that the query string in the GET request is stored in the request line and sent to the WEB server , through the "Hyuga Hinata" encoding we can see that the encoding method used by the browser for this string is "UTF-8".

Looking at the server code, we can see garbled characters (as shown below). This is because the server decodes the data after receiving the string encoding by default using ISO-8859-1. , so the encoding and decoding methods are not unified.

The solution is as follows:

First get the string user before decoding encoding, and then specify the encoding method of the string, as shown below:

## The solution diagram is as follows:

　In the process of Java web development, we pass parameters in hyperlinks and often encounter Chinese situations. In this case, we need to encode Chinese, we can set it to UTF-8, and the decoding scheme is the same as above.

<a href="${pageContext.request.contextPath}/Test?user=<%=URLEncoder.encode("日向雏田", "UTF-8")%>">点击</a>

　For Post request, it is the submission method when the form submission is set to "Post". We use the Firefox browser to view its specific content as follows:

The address bar and its page are:

The post request content is:

We can know from the above picture that in the post request , put the request content directly in the request body and send it to the web server, and the encoding method is "utf-8".

In this response Servlet, the doPost method body is as follows:

public void doPost(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		String user=request.getParameter("user");
		System.out.println(user);//输出为æ¥åéç°
	}

　The reason for the garbled characters here is still when the code getParameter ("user"), the web server uses the default decoding scheme "ISO-8859-1" for decoding, resulting in encoding and decoding schemes I disagree, the solution can be to use the get request garbled solution, but there is a simpler solution, directly specify the encoding/decoding scheme of the method body as "utf-8". The plan is as follows.

public void doPost(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		response.setCharacterEncoding("utf-8");　　//设置请求体的编码/解码方案为UTF-8 但是请求行的编码解码方案不会受影响
		String user=request.getParameter("user");
		System.out.println(user);　　　　　　　　　　//输出为日向雏田
	}

The above analysis of the garbled code caused by the request is completed.

In the garbled code caused by the impact, the web server will write the response content into the response body and return it to the client without involving the status line. For example, if "HelloWorld" is output to the browser, the response is as shown in the figure below.

We have to involve four methods for the garbled code caused by the response, as follows:

response.setHeader("Content-Type", "text/html;cahrset=utf-8");//设置发送到客户端的响应的内容类型和响应内容的编码类型（响应体的编码类型）
response.setCharacterEncoding("utf-8");//设置响应体的编码类型
response.getWriter();　　　　　　　　　　　//获取响应的输出字符流　
response.getOutputStream();　　　　　　　 //获取响应的输出字节流

　For setting the response body Encoding type, such as response.setHeader("Content-Type", "text/html;cahrset=utf-8"); and response.setCharacterEncoding("utf-8"); The encoding methods set by these two methods are equivalent. If the encoding method of the response body is not set, the default is ISO-8859-1, and the subsequent encoding method of the response body characters will iterate the previous encoding method. Both of these methods are valid before the getWriter method, and the method of setting encoding in the getWriter method will be invalid.

But these two methods are a little different, that is, the browser will automatically use the setHeader("Content-Type", "text/html;cahrset=utf-8") method. The encoding method of the response body is decoded, and the setCharacterEncoding() method is not used by all browsers to decode. The two methods are tested below, and the results are as follows:

	public void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		response.setHeader("Content-Type", "text/html;charset=utf-8");
		response.getWriter().write("日向雏田");
	}

public void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		response.setCharacterEncoding("utf-8");
		response.getWriter().write("日向雏田");
	}

　　从上面可以看到第一个方法对于浏览器来说，支持的较好，提倡采用第一种方法设置响应体的字符编码方式。

　　对于获取响应字符输出流的方法，如果在此之前没有设置响应体的编码方式，那么默认为null，即ISO-8859-1方式进行编码。而且后面设置的编码方式会覆盖前面设置的编码方式。在getWriter（）方法之后设置的编码无效。

　　对于获取响应输出字节流，我们在输出字符串时，我们需要设置字符串的编码方式如果没有那么默认ISO-8859-1。

　　对于前面2个输出流，由于只有一个输出缓存，所以这两个方法互斥。

　　以上，为了保证响应无乱码，需要保证字符编码和解码方法的统一，方案如下：

	public void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
//	方案1
//		response.setHeader("Content-Type", "text/html;charset=utf-8");
//		response.getWriter().write("日向雏田");
//	方案2
//		response.getOutputStream().write("日向雏田".getBytes("UTF-8"));
//	方案1，2互斥
	}

　　此外在Java web开发过程中，我们还会遇到当进行文件下载时，中文文件名导致的问题，如下图所示：

	public void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		String realPath=this.getServletContext().getRealPath("/src/日向雏田.jpg");
		String fileName=realPath.substring(realPath.lastIndexOf(&#39;\\&#39;)+1);
		response.setHeader("content-disposition", "attachment;filename="+fileName);
		InputStream is=new FileInputStream(new File(realPath));
		OutputStream os=response.getOutputStream();
		byte[] buff=new byte[1024];
		int len=0;
		while((len=is.read(buff))>0){
			os.write(buff, 0, len);
		}
		os.close();
		is.close();
	}

　　采用火狐浏览器进行测试，查看页面效果，及其响应结果如下：

　　经过查看响应头分析，下载文件名存放在响应头中，且对于中文文字没有采用UTF-8、UTF-16、GBK等等能识别中文的编码，那么对于中文文件名导致采用哪种编码方式呢？查看REF 7578得知，在此处采用ASCII编码，但是REF规定，如果不可避免的要使用非ASCII码的字符，程序员应该均匀的使用UTF-8，来最小化交互操作的问题。

　　所以，解决方案就是把文件名编码成UTF-8，传递给响应头，浏览器（部分）默认对该文件名进行UTF-8解码处理。

public void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		String realPath=this.getServletContext().getRealPath("/src/日向雏田.jpg");
		String fileName=realPath.substring(realPath.lastIndexOf(&#39;\\&#39;)+1);
		String utf_8Name=URLEncoder.encode(fileName,"utf-8");//解决方案
		response.setHeader("content-disposition", "attachment;filename="+utf_8Name);
		InputStream is=new FileInputStream(new File(realPath));
		OutputStream os=response.getOutputStream();
		byte[] buff=new byte[1024];
		int len=0;
		while((len=is.read(buff))>0){
			os.write(buff, 0, len);
		}
		os.close();
		is.close();
	}

　　效果如下：其中火狐浏览器并没有对其解码

The above is the detailed content of JAVA WEB notes--Chinese garbled characters. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Web Speech API开发者指南：它是什么以及如何工作Apr 11, 2023 pm 07:22 PM

译者 | 李睿审校 | 孙淑娟Web Speech API是一种Web技术，允许用户将语音数据合并到应用程序中。它可以通过浏览器将语音转换为文本，反之亦然。Web Speech API于2012年由W3C社区引入。而在十年之后，这个API仍在开发中，这是因为浏览器兼容性有限。该API既支持短时输入片段，例如一个口头命令，也支持长时连续的输入。广泛的听写能力使它非常适合与Applause应用程序集成，而简短的输入很适合语言翻译。语音识别对可访问性产生了巨大的影响。残疾用户可以使用语音更轻松地浏览

如何使用Docker部署Java Web应用程序Apr 25, 2023 pm 08:28 PM

docker部署javaweb系统1.在root目录下创建一个路径test/appmkdirtest&&cdtest&&mkdirapp&&cdapp2.将apache-tomcat-7.0.29.tar.gz及jdk-7u25-linux-x64.tar.gz拷贝到app目录下3.解压两个tar.gz文件tar-zxvfapache-tomcat-7.0.29.tar.gztar-zxvfjdk-7u25-linux-x64.tar.gz4.对解

web端是什么意思Apr 17, 2019 pm 04:01 PM

web端指的是电脑端的网页版。在网页设计中我们称web为网页，它表现为三种形式，分别是超文本（hypertext）、超媒体（hypermedia）和超文本传输协议（HTTP）。

web前端和后端开发有什么区别Jan 29, 2023 am 10:27 AM

区别：1、前端指的是用户可见的界面，后端是指用户看不见的东西，考虑的是底层业务逻辑的实现，平台的稳定性与性能等。2、前端开发用到的技术包括html5、css3、js、jquery、Bootstrap、Node.js、Vue等；而后端开发用到的是java、php、Http协议等服务器技术。3、从应用范围来看，前端开发不仅被常人所知，且应用场景也要比后端广泛的太多太多。

深入探讨“高并发大流量”访问的解决思路和方案May 11, 2022 pm 02:18 PM

怎么解决高并发大流量问题？下面本篇文章就来给大家分享下高并发大流量web解决思路及方案，希望对大家有所帮助！

web前端打包工具有哪些Aug 23, 2022 pm 05:31 PM

web前端打包工具有：1、Webpack，是一个模块化管理工具和打包工具可以将不同模块的文件打包整合在一起，并且保证它们之间的引用正确，执行有序；2、Grunt，一个前端打包构建工具；3、Gulp，用代码方式来写打包脚本；4、Rollup，ES6模块化打包工具；5、Parcel，一款速度极快、零配置的web应用程序打包器；6、equireJS，是一个JS文件和模块加载器。

Python轻量级Web框架：Bottle库！Apr 13, 2023 pm 02:10 PM

和它本身的轻便一样，Bottle库的使用也十分简单。相信在看到本文前，读者对python也已经有了简单的了解。那么究竟何种神秘的操作，才能用百行代码完成一个服务器的功能？让我们拭目以待。1. Bottle库安装1）使用pip安装2）下载Bottle文件https://github.com/bottlepy/bottle/blob/master/bottle.py2.“HelloWorld！”所谓万事功成先HelloWorld，从这个简单的示例中，了解Bottle的基本机制。先上代码：首先我们从b

ASGI解释：Python Web开发的未来Apr 12, 2023 pm 10:37 PM

译者 | 李睿审校 | 孙淑娟Python Web应用程序长期以来一直遵循Web服务器网关接口(WSGI)标准，该标准描述了它们如何与Web服务器通信。WSGI最初于2003年推出，并于2010年更新，仅依赖于Python2.2版本中原生可用的、易于实现的功能。因此， WSGI迅速融入了所有主要的Python Web框架，并成为Python Web开发的基石。快进到2022年。Python2已经过时，Python现在具有处理网络调用等异步操作的原生语法。WSGI和其他默认假定同步行为的标准无法

See all articles