Fixing garbled text when converting GBK to UTF-8 in Java
If your side uses GBK encoding and the other party expects UTF-8, you need to convert the GBK-encoded data to UTF-8 before sending it, so that the other party does not receive garbled text.
The problem: when converting GBK to UTF-8, a string with an odd number of Chinese characters comes out garbled, while a string with an even number of Chinese characters does not.
Cause analysis:
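import java.io.UnsupportedEncodingException;

public static void analyze() throws UnsupportedEncodingException {
    String gbk = "我来了";
    // No charset is named here, so the UTF-8 bytes are decoded with the
    // platform default charset (GBK in this scenario).
    String utf8 = new String(gbk.getBytes("UTF-8"));
    // First line of output: the correct UTF-8 bytes.
    for (byte b : gbk.getBytes("UTF-8")) {
        System.out.print(b + " ");
    }
    System.out.println();
    // Second line of output: the bytes after the round trip through the default charset.
    for (byte b : utf8.getBytes()) {
        System.out.print(b + " ");
    }
}
/*
-26 -120 -111 -26 -99 -91 -28 -70 -122
-26 -120 -111 -26 -99 -91 -28 -70 63 !
*/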
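Note that the last byte differs. The first line of output is the correct UTF-8 encoding. So why is the last byte in the second line 63 instead of -122? That difference is what causes the garbled text.
In GBK a Chinese character takes 2 bytes, while in UTF-8 these characters take 3 bytes each. Calling getBytes("UTF-8") encodes the string as UTF-8, so the three Chinese characters in the example produce 9 bytes. new String(bytes) without a charset argument then decodes those 9 bytes with the platform default charset, which is GBK here, and GBK reads Chinese text two bytes at a time. Nine is an odd number, so the last byte (-122) has no partner and cannot be decoded. It is replaced, and when the string is encoded again it comes out as '?', which is byte 63. With an even number of Chinese characters the UTF-8 byte count is even, so no byte is left over and the bytes typically survive the round trip.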
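The odd/even behaviour can be reproduced without relying on the machine's default charset by naming GBK explicitly. The following is a minimal sketch rather than the article's own code: the class name GbkDecodeDemo and the helper roundTripThroughGbk are hypothetical, and it assumes the runtime ships the GBK charset (full JDK builds normally do). The Chinese text is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GbkDecodeDemo {
    // GBK is not one of the StandardCharsets constants, so it is looked up by name.
    // Assumption: the runtime provides this charset.
    private static final Charset GBK = Charset.forName("GBK");

    /** Encodes text as UTF-8, wrongly decodes those bytes as GBK, then encodes back to GBK. */
    static byte[] roundTripThroughGbk(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8, GBK);
        return misdecoded.getBytes(GBK);
    }

    public static void main(String[] args) {
        String three = "\u6211\u6765\u4e86"; // the three characters from the example above
        String two = "\u6211\u6765";         // the first two of them
        for (String text : new String[] {three, two}) {
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            byte[] roundTripped = roundTripThroughGbk(text);
            System.out.println(utf8.length + " UTF-8 bytes, preserved = "
                    + Arrays.equals(utf8, roundTripped));
        }
    }
}
```

With three characters the UTF-8 form has 9 bytes and the last one comes back as 63. With two characters it has 6 bytes, which pair up cleanly and survive the round trip.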
Solution:
The key is to keep the bytes intact. After calling getBytes("UTF-8") to obtain the byte array, build the string from it using ISO-8859-1 encoding. ISO-8859-1 maps every byte to exactly one character, so the last byte is not altered.
public static void correctEncode() throws UnsupportedEncodingException {
    String gbk = "我来了";
    // Carry the UTF-8 bytes in an ISO-8859-1 string: one char per byte, so nothing is lost.
    String iso = new String(gbk.getBytes("UTF-8"), "ISO-8859-1");
    for (byte b : iso.getBytes("ISO-8859-1")) {
        System.out.print(b + " ");
    }
    System.out.println();
    // Simulate display on a UTF-8 encoded website
    System.out.println(new String(iso.getBytes("ISO-8859-1"), "UTF-8"));
}
/*
-26 -120 -111 -26 -99 -91 -28 -70 -122
我来了
*/
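The same round trip can also be written with the java.nio.charset.StandardCharsets constants (available since Java 7), which removes the checked UnsupportedEncodingException. This is a sketch under assumptions, not the article's code: the class Latin1Carrier and the methods wrapUtf8Bytes and unwrap are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Carrier {
    /** Wraps the UTF-8 bytes of text in a String, one char per byte. */
    static String wrapUtf8Bytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the original text from a carrier string produced by wrapUtf8Bytes. */
    static String unwrap(String carrier) {
        byte[] utf8 = carrier.getBytes(StandardCharsets.ISO_8859_1);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6211\u6765\u4e86"; // the three characters from the example above
        String carrier = wrapUtf8Bytes(original);
        // The carrier holds one char per UTF-8 byte.
        System.out.println(carrier.length());
        // Decoding the carried bytes as UTF-8 restores the text.
        System.out.println(unwrap(carrier).equals(original));
    }
}
```

The carrier string is only a container for bytes and is not readable text. It has to be turned back into bytes with ISO-8859-1 before those bytes are interpreted as UTF-8.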