Solution to GBK to UTF-8 garbled code in Java

angryTom
2020-02-10 10:53:47

Suppose your side uses GBK encoding while the other party expects UTF-8. Before sending data, you need to convert the GBK-encoded data into UTF-8 so that the other party does not see garbled text.
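As a minimal sketch of that conversion, the bytes can be decoded with an explicitly named charset and then re-encoded. The class name GbkToUtf8 and the helper gbkToUtf8 are hypothetical names chosen for illustration, and the sketch assumes the running JDK provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkToUtf8 {
    // Assumption: the JDK in use provides the GBK charset.
    private static final Charset GBK = Charset.forName("GBK");

    /** Decodes GBK bytes into a String, then encodes that String as UTF-8. */
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, GBK);      // bytes -> characters
        return text.getBytes(StandardCharsets.UTF_8); // characters -> UTF-8 bytes
    }

    public static void main(String[] args) {
        // "\u6211\u6765\u4e86" is the sample string 我来了 written with Unicode escapes
        byte[] gbkBytes = "\u6211\u6765\u4e86".getBytes(GBK);
        byte[] utf8Bytes = gbkToUtf8(gbkBytes);
        System.out.println(gbkBytes.length + " GBK bytes, " + utf8Bytes.length + " UTF-8 bytes");
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

Naming the charset on both the decode and the encode step means the result does not depend on the platform default charset.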

The problem: after converting GBK to UTF-8, a string with an odd number of Chinese characters comes out garbled, while a string with an even number of Chinese characters does not.

Cause analysis:

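import java.io.UnsupportedEncodingException;

// Reproduces the problem. Assumes the platform default charset is GBK.
public static void analyze() throws UnsupportedEncodingException {
    String gbk = "我来了";
    // Bug: new String(byte[]) decodes the UTF-8 bytes with the default charset
    String utf8 = new String(gbk.getBytes("UTF-8"));
    // The correct UTF-8 bytes
    for (byte b : gbk.getBytes("UTF-8")) {
        System.out.print(b + " ");
    }
    System.out.println();
    // The bytes after the round trip through the default charset
    for (byte b : utf8.getBytes()) {
        System.out.print(b + " ");
    }
}
/*
Output when the default charset is GBK:
-26 -120 -111 -26 -99 -91 -28 -70 -122
-26 -120 -111 -26 -99 -91 -28 -70 63 !
*/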

Note that only the last byte differs. The first line is the correct UTF-8 encoding. So why is the last byte of the second line 63 instead of -122? That changed byte is what causes the garbled text.

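A Chinese character takes 2 bytes in GBK and 3 bytes in UTF-8. Calling getBytes("UTF-8") encodes each character into its 3-byte UTF-8 form, so the three Chinese characters in the example produce 9 bytes. The constructor new String(byte[]) names no charset, so it decodes those 9 bytes with the platform default charset, which is GBK here, and GBK consumes them two at a time. Nine is an odd number, so the last byte (-122) has no partner, is not a valid GBK character, and is replaced. When the string is encoded again, the replacement comes out as 63, the character '?'. With an even number of Chinese characters the byte count is even, every byte finds a partner, and the bytes usually survive the round trip, which is why the problem shows up only for odd counts.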
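To see why the parity of the character count matters, the sketch below repeats the faulty round trip with the charsets named explicitly instead of relying on the platform default. The class OddEvenDemo and the method roundTripThroughGbk are hypothetical names, and the sketch assumes the JDK provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OddEvenDemo {
    // Assumption: the JDK in use provides the GBK charset.
    private static final Charset GBK = Charset.forName("GBK");

    /** Encodes text as UTF-8, wrongly decodes those bytes as GBK, then encodes back to GBK bytes. */
    static byte[] roundTripThroughGbk(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8Bytes, GBK); // GBK reads the bytes in pairs
        return misdecoded.getBytes(GBK);
    }

    public static void main(String[] args) {
        String odd = "\u6211\u6765\u4e86"; // 我来了: 3 characters, 9 UTF-8 bytes
        String even = "\u6211\u6765";      // 我来: 2 characters, 6 UTF-8 bytes

        System.out.println(Arrays.toString(odd.getBytes(StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(roundTripThroughGbk(odd)));
        System.out.println(Arrays.toString(even.getBytes(StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(roundTripThroughGbk(even)));
    }
}
```

For the three-character string, the two printed arrays should differ only in the final byte. For the two-character string they should be identical.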

Solution:

What matters is keeping the bytes intact. After calling getBytes("UTF-8") to get the byte array, build the string from it with the ISO-8859-1 charset. ISO-8859-1 maps every byte to exactly one character, so no byte, including the last one, gets changed.

// Fix: carry the UTF-8 bytes in an ISO-8859-1 string so that no byte is altered.
public static void correctEncode() throws UnsupportedEncodingException {
    String gbk = "我来了";
    String iso = new String(gbk.getBytes("UTF-8"), "ISO-8859-1");
    for (byte b : iso.getBytes("ISO-8859-1")) {
        System.out.print(b + " ");
    }
    System.out.println();
    // Simulate how a UTF-8 encoded website would display the data
    System.out.println(new String(iso.getBytes("ISO-8859-1"), "UTF-8"));
}
/*
-26 -120 -111 -26 -99 -91 -28 -70 -122
我来了
*/
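The fix relies on ISO-8859-1 mapping each byte to exactly one character. A rough way to check that property is to push all 256 possible byte values through an ISO-8859-1 string and compare the result with the input. The class Latin1RoundTrip and the method roundTrip are hypothetical names used only for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    /** Wraps the bytes in an ISO-8859-1 string and unwraps them again. */
    static byte[] roundTrip(byte[] bytes) {
        String carrier = new String(bytes, StandardCharsets.ISO_8859_1); // one char per byte
        return carrier.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once
        byte[] allBytes = new byte[256];
        for (int i = 0; i < allBytes.length; i++) {
            allBytes[i] = (byte) i;
        }
        // True when no byte was altered by the round trip
        System.out.println(Arrays.equals(allBytes, roundTrip(allBytes)));
    }
}
```

Note that the ISO-8859-1 string is only a container for bytes. Its characters are not readable Chinese text until the bytes are decoded as UTF-8 again.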
