Getting UTF-8 Encoding in Java Webapps
Problem: Getting UTF-8 to work end to end in a Java webapp, so that text containing accented Latin characters and non-Latin alphabets is received, stored, and displayed correctly.
Environment:
- Development: Windows XP
- Production: Debian
- Database: MySQL 5.x
- Browsers: Firefox 2 and 3, Opera 9.x, IE7, Google Chrome
Solution:
Configure Tomcat's server.xml:
- Enable UTF-8 decoding of GET parameters by setting URIEncoding="UTF-8" on the HTTP <Connector> element.
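CharsetFilter:
- Force the Java app to handle all requests and responses as UTF-8. A filter of this kind typically sets the request character encoding (taken from its requestEncoding init parameter) when the browser has not specified one, and sets the response encoding to UTF-8.
Add this filter to web.xml: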
<filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>fi.foo.filters.CharsetFilter</filter-class>
    <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
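The filter class named in this mapping is not shown here, so the following is only a minimal sketch of the logic such a filter might contain. To keep it runnable without a servlet container, the nested Request and Response interfaces and the Fake* classes are hypothetical stand-ins for javax.servlet.ServletRequest and ServletResponse. In a real filter the same two calls would sit in doFilter, before the request is passed down the chain and before any parameter is read.

```java
final class CharsetFilterSketch {

    /** Hypothetical stand-in for the ServletRequest methods the filter uses. */
    interface Request {
        String getCharacterEncoding();
        void setCharacterEncoding(String encoding);
    }

    /** Hypothetical stand-in for the ServletResponse method the filter uses. */
    interface Response {
        void setCharacterEncoding(String encoding);
    }

    /** In-memory request so the sketch runs without a container. */
    static final class FakeRequest implements Request {
        private String encoding;
        FakeRequest(String encoding) { this.encoding = encoding; }
        public String getCharacterEncoding() { return encoding; }
        public void setCharacterEncoding(String encoding) { this.encoding = encoding; }
    }

    /** In-memory response that just records the encoding it was given. */
    static final class FakeResponse implements Response {
        String encoding;
        public void setCharacterEncoding(String encoding) { this.encoding = encoding; }
    }

    private final String requestEncoding;

    /** @param configured value of the requestEncoding init-param, or null if absent */
    CharsetFilterSketch(String configured) {
        this.requestEncoding = (configured != null) ? configured : "UTF-8";
    }

    /** What doFilter would do before calling chain.doFilter(request, response). */
    void apply(Request request, Response response) {
        // Respect an encoding the client declared; otherwise fall back to the configured one.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding(requestEncoding);
        }
        // Always answer in UTF-8.
        response.setCharacterEncoding("UTF-8");
    }

    public static void main(String[] args) {
        CharsetFilterSketch filter = new CharsetFilterSketch("UTF-8");
        FakeRequest unspecified = new FakeRequest(null);
        FakeRequest explicit = new FakeRequest("ISO-8859-1");
        FakeResponse response = new FakeResponse();
        filter.apply(unspecified, response);
        filter.apply(explicit, response);
        System.out.println(unspecified.getCharacterEncoding());
        System.out.println(explicit.getCharacterEncoding());
        System.out.println(response.encoding);
    }
}
```

Leaving a client-declared encoding untouched is a design choice in this sketch; a stricter variant could overwrite it unconditionally.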
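JSP Page Encoding:
- Specify the encoding for all JSP pages in web.xml (a <jsp-property-group> with <page-encoding>UTF-8</page-encoding>), or declare it at the top of each page with a page directive such as <%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %> together with an HTML Content-Type meta tag.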
JDBC Connection:
- Append ?useUnicode=true&characterEncoding=UTF-8 to the connection URL. When the URL is written inside an XML file such as context.xml, escape the ampersand as &amp;.
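As a small illustration of the connection URL, the sketch below assembles it with the MySQL Connector/J properties useUnicode and characterEncoding. The class name, the helper names, and the host, port, and database values are placeholders chosen for this example, not part of the original setup.

```java
final class JdbcUrlSketch {

    /** Builds a MySQL Connector/J URL that asks the driver to use UTF-8 on the connection. */
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    /** The same URL as it must appear inside an XML attribute, with '&' escaped. */
    static String forXml(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = utf8Url("localhost", 3306, "exampledb"); // placeholder values
        System.out.println(url);
        System.out.println(forXml(url));
        // With the MySQL driver on the classpath, the URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```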
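MySQL Database and Tables:
- Create the database and tables with DEFAULT CHARACTER SET=utf8 COLLATE=utf8_swedish_ci. The collation only affects sorting and comparison, so choose the one that matches the application's language.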
MySQL Server Configuration:
- Set default-character-set=utf8 in my.ini (Windows) or my.cnf (Linux).
MySQL Procedures and Functions:
- Specify UTF-8 character set explicitly, e.g.:
CREATE FUNCTION ... RETURNS TEXT CHARACTER SET utf8
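Handling GET Requests:
- By default, Tomcat (at least in the versions this setup targets) decodes request URIs as Latin1 (ISO-8859-1), so non-ASCII characters in GET parameters arrive garbled.
- To address this, set the URI encoding to UTF-8 in server.xml (the URIEncoding attribute of the <Connector> element).
- Instruct browsers to read pages as UTF-8 using meta tags and the Content-Type response header.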
UTF-8 vs. Latin1 in GET Requests:
- When the page is served as UTF-8, browsers encode POST form data as UTF-8.
- For GET requests, even though the page is defined as UTF-8, some characters may still be encoded in Latin1. The result is mixed encoding, which makes it difficult for the webapp to handle request parameters correctly.
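To make the mismatch concrete, here is a minimal sketch that percent-encodes a parameter value as UTF-8 and then decodes it once as Latin1 and once as UTF-8, using only java.net.URLEncoder and java.net.URLDecoder. It assumes Java 10 or later for the Charset overloads; the class and method names are made up for this example, and it only imitates what a server does when it decodes a query string with the wrong charset.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class GetParameterDecodingDemo {

    /** Percent-encodes a value the way a UTF-8 page would put it into a query string. */
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Decodes the way a server configured for Latin1 (ISO-8859-1) would. */
    static String decodeAsLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the way a server configured for UTF-8 would. */
    static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E4 is a-umlaut, U+0436 is Cyrillic zhe; escapes keep the source file ASCII-only.
        String original = "\u00E4\u0436";
        String onTheWire = encodeUtf8(original);

        System.out.println(onTheWire);                                  // %C3%A4%D0%B6
        System.out.println(decodeAsUtf8(onTheWire).equals(original));   // true
        System.out.println(decodeAsLatin1(onTheWire).equals(original)); // false: each UTF-8 byte became its own character
    }
}
```

Decoding as Latin1 turns the two characters into four, which is the garbling seen when the server and the browser disagree about the URL encoding.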