
utf-8 - Java UTF-8 to GB2312 conversion error?

Straight to the code, so anyone can copy it and run it:


try {
    String str = "上海上海";
    String gb2312 = new String(str.getBytes("utf-8"), "gb2312");
    String utf8 = new String(gb2312.getBytes("gb2312"), "utf-8");
    System.out.println(str.equals(utf8));
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

It prints false.

JDK 7 and JDK 8 both give this result, and the IDE encoding is UTF-8.

I'd be very grateful if an expert could explain this!

PHP中文网 · 2713 days ago · 668

All replies (3)

  • 阿神

    2017-04-17 15:36:05

    All Strings in Java are Unicode (a sequence of UTF-16 code units). String.getBytes(...) returns the string encoded into a byte array. What your code does is take the UTF-8 encoded byte array and decode it directly as GB2312, which of course is wrong.
    A String itself has one uniform representation. If you need the string's bytes in a specific encoding, call String.getBytes(...) directly to get the byte array in that encoding. There is no such thing as converting a String from one encoding to another.
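A minimal sketch of this point (the class and helper names are illustrative, not from the reply): the same String produces different byte arrays depending on the charset passed to getBytes, and each array decodes back to the original only with the charset that produced it. GB2312 is not a charset every Java platform is required to support, so this assumes a JDK that includes it (standard full JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameStringDifferentBytes {
    // The question's test string, written as Unicode escapes so the result
    // does not depend on the encoding of this source file
    static final String TEXT = "\u4E0A\u6D77\u4E0A\u6D77";

    // Encode the same String with the named charset
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] utf8 = encode(TEXT, "UTF-8");
        byte[] gb2312 = encode(TEXT, "GB2312");
        System.out.println("UTF-8 length:  " + utf8.length);   // 12: 3 bytes per character here
        System.out.println("GB2312 length: " + gb2312.length); // 8: 2 bytes per character here
        // Decoding each array with the charset that produced it gives the original String back
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(TEXT));      // true
        System.out.println(new String(gb2312, Charset.forName("GB2312")).equals(TEXT)); // true
    }
}
```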

    If what you actually want is to convert a UTF-8 encoded byte array into a GB2312 encoded byte array, the code should be:

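    byte[] bytes = ...                        // the UTF-8 encoded input
    String str = new String(bytes, "UTF-8");  // decode with the charset the bytes were written in
    bytes = str.getBytes("GB2312");           // encode with the target charset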
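A self-contained version of the snippet above, as a sketch that assumes the input bytes really are UTF-8 and the runtime provides GB2312. The helper name utf8ToGb2312 is hypothetical. Passing Charset objects instead of charset-name strings also removes the checked UnsupportedEncodingException from the question's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8ToGb2312 {
    // Assumes the JDK ships GB2312 (it is an extended charset, not a required one)
    static final Charset GB2312 = Charset.forName("GB2312");

    // Hypothetical helper: UTF-8 bytes -> String (decode) -> GB2312 bytes (encode)
    static byte[] utf8ToGb2312(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(GB2312);
    }

    public static void main(String[] args) {
        String original = "\u4E0A\u6D77\u4E0A\u6D77"; // the question's test string
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        byte[] gbBytes = utf8ToGb2312(utf8Bytes);
        // To read the GB2312 bytes back, decode them as GB2312, not UTF-8
        String back = new String(gbBytes, GB2312);
        System.out.println(original.equals(back)); // true
    }
}
```

Keep in mind that getBytes(Charset) silently replaces characters GB2312 cannot represent (typically with '?'), so text outside GB2312's repertoire is lost in this conversion.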

  • 大家讲道理

    2017-04-17 15:36:05

    Both the gb2312 and utf8 strings are already garbled. new String(str.getBytes("utf-8"), "gb2312") encodes the string as UTF-8 and then decodes those bytes as GB2312, so the result is bound to be garbled.
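To see why the question's round trip can never come back to true, here is a small sketch (class and method names are made up for illustration). Decoding the UTF-8 bytes as GB2312 already loses information, because byte sequences that are not valid GB2312 are replaced with U+FFFD, and no amount of re-encoding turns that replacement character back into the original bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    static final Charset GB2312 = Charset.forName("GB2312");

    // Reproduces the question's first step: encode as UTF-8, then (wrongly) decode as GB2312
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), GB2312);
    }

    public static void main(String[] args) {
        String str = "\u4E0A\u6D77\u4E0A\u6D77"; // the question's test string
        byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8);
        String garbled = misdecode(str);
        // Invalid GB2312 byte sequences became U+FFFD: the original bytes are gone
        System.out.println("contains U+FFFD: " + (garbled.indexOf('\uFFFD') >= 0)); // true
        // Re-encoding the garbled String does not reproduce the original UTF-8 bytes
        byte[] reEncoded = garbled.getBytes(GB2312);
        System.out.println("bytes recovered: " + Arrays.equals(utf8Bytes, reEncoded)); // false
        System.out.println("round trip equal: "
                + str.equals(new String(reEncoded, StandardCharsets.UTF_8)));         // false
    }
}
```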

  • 怪我咯

    2017-04-17 15:36:05

    The asker probably misunderstands how encoding and decoding work.

    getBytes(String charsetName) encodes the string with the charset named by charsetName and returns the resulting byte array.
    The constructor String(byte bytes[], String charsetName) decodes the byte array with the charset named by charsetName and returns the resulting string.

    In other words, to get a byte array in a particular encoding, just call getBytes(String charsetName). A byte array must be decoded with the same charset that was used to encode it; otherwise the text comes out garbled. And once a string is garbled, re-encoding it often cannot recover the original, correctly encoded byte array.
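If you would rather detect a charset mismatch than end up with garbled text, note that new String(bytes, charset) always substitutes a replacement string for bad input, while a CharsetDecoder set to CodingErrorAction.REPORT throws instead. The sketch below assumes GBK bytes, as in this reply's example; decodeStrict is a hypothetical helper name. The explicit onMalformedInput/onUnmappableCharacter calls only make the REPORT action visible, since that is already the default for a freshly created decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Hypothetical helper: decode bytes, failing loudly instead of inserting U+FFFD
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String str = "\u4E0A\u6D77\u4E0A\u6D77"; // the question's test string
        Charset gbk = Charset.forName("GBK");
        byte[] gbkBytes = str.getBytes(gbk);
        // Same charset for encoding and decoding: succeeds
        System.out.println(decodeStrict(gbkBytes, gbk).equals(str)); // true
        // Mismatched charset: these GBK bytes are not valid UTF-8, so decoding throws
        try {
            decodeStrict(gbkBytes, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```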

    Example:

    import org.junit.Assert; // JUnit 4

    String str = "上海上海"; // file.encoding is set to UTF-8 here
    byte[] utf8Bytes = str.getBytes("utf-8");
    byte[] defaultBytes = str.getBytes();
    Assert.assertArrayEquals(utf8Bytes, defaultBytes);

    byte[] gbkBytes = str.getBytes("GBK");
    // Assert.assertArrayEquals(utf8Bytes, gbkBytes); // Fails here!! array lengths differed, expected.length=12 actual.length=8

    String errorStr = new String(gbkBytes, "utf-8"); // garbled at this point
    Assert.assertNotEquals(str, errorStr); // definitely different
    byte[] errorUtf8Bytes = errorStr.getBytes("utf-8"); // re-encode the garbled string
    // Assert.assertArrayEquals(gbkBytes, errorUtf8Bytes); // Fails! Already differs from the earlier byte array. array lengths differed, expected.length=8 actual.length=16
    // Assert.assertArrayEquals(utf8Bytes, errorUtf8Bytes); // Fails! Even less likely to match utf8Bytes. array lengths differed, expected.length=12 actual.length=16

    Here, errorStr is "�Ϻ��Ϻ�".
    The other byte array is:
