
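A Java char is only two bytes, which gives just over 60,000 possible values, so how can it represent all Chinese characters? I also tested it, and it can hold a single Japanese or Korean character too.

Since char stands for a character, it ought to be able to hold any character. Once you add in the characters of every other language, surely it can't be that char only holds single Chinese, Japanese, and Korean characters?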

高洛峰 · 2887 days ago · 410


  • PHPz

    PHPz 2017-04-17 17:48:01

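    Java uses Unicode for char, so a char can store Chinese characters. Strictly speaking, a char is one 16-bit UTF-16 code unit: it can hold any character in Unicode's Basic Multilingual Plane, which covers the commonly used Chinese, Japanese, and Korean characters, while characters outside that range take two char values. So what is Unicode?
    Unicode (also called Universal Code or International Code in Chinese) is an industry standard in the field of computer science. It organizes and encodes most of the world's writing systems so that computers can present and process text in a simpler way.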

    Unicode has developed alongside the Universal Character Set standard and is also published in book form [1]. It is still being revised, with each new version adding more characters. The latest version cited there is 8.0.0 [1], released on June 17, 2015, which contains more than 100,000 characters (the 100,000th character was adopted in 2005). Besides visual glyphs, encoding methods, and standard character encodings, the data covered by Unicode also includes character properties, such as upper and lower case.
    The above is taken from the Wikipedia article on Unicode.

    As the above shows, a character is not part of Unicode automatically; it has to be admitted by the Unicode organization. Not every Chinese, Japanese, or Korean character has been included so far, so coverage may be incomplete. Because Java uses Unicode, Java supports whatever characters the Unicode organization has included.
    Not a very good answer.
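
    To make the answer above concrete, here is a minimal sketch of how many 16-bit char values a given character needs. The class name `CharCapacity` and the helper `unitsFor` are made up for illustration; `Character.charCount` and `Character.isBmpCodePoint` are standard library calls. Code points are written as hex numbers so the file compiles regardless of source encoding.

```java
public class CharCapacity {
    // How many 16-bit char values UTF-16 needs for the given code point.
    public static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int han = 0x6C49;      // a common Han character
        int hiragana = 0x3042; // Hiragana letter A
        int hangul = 0xD55C;   // a Hangul syllable
        int rareHan = 0x20000; // first code point of CJK Extension B

        int[] samples = {han, hiragana, hangul, rareHan};
        for (int cp : samples) {
            System.out.printf("U+%04X fits in one char: %b (needs %d)%n",
                    cp, Character.isBmpCodePoint(cp), unitsFor(cp));
        }
    }
}
```

    The first three samples each fit in a single char, which matches what the question observed. The last one lies outside the 16-bit range and needs two char values, so it cannot be assigned to a single char variable.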

  • 阿神

    阿神 2017-04-17 17:48:01

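    When a single 16-bit unit isn't enough, UTF-16 uses two of them for one character. (UTF-8 and UTF-16 can both encode every Unicode character; they differ only in how many bytes each character takes.)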

    http://baike.baidu.com/link?url=nkV9FQlo3zIu25zKLF3M1Pjp3Y6377hPnesTlnNqHb19cbkdV4P6JX9_FtCWPQ97j7BukgEZ0TBb66uqEn8rpK
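
    As a rough comparison of the two encodings mentioned in this reply, the sketch below counts the bytes each one needs for a few characters. The class and method names are hypothetical; `StandardCharsets.UTF_16BE` is used rather than `UTF_16` so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Bytes needed to encode one code point in UTF-8.
    public static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in UTF-16 (big-endian, so no byte-order mark is added).
    public static int utf16Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Latin 'A', a common Han character, and a Han character from CJK Extension B.
        int[] samples = {0x41, 0x6C49, 0x20000};
        for (int cp : samples) {
            System.out.printf("U+%04X: UTF-8 %d bytes, UTF-16 %d bytes%n",
                    cp, utf8Bytes(cp), utf16Bytes(cp));
        }
    }
}
```

    Both encodings handle all three samples. The common Han character takes 3 bytes in UTF-8 and 2 in UTF-16, while the Extension B character takes 4 bytes in either.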

  • 黄舟

    黄舟 2017-04-17 17:48:01

    A char is stored in 2 bytes. Two bytes are more than enough for letters and punctuation, but once you add non-English text such as Chinese, they may not be. If 4 bytes were used for each character, the range that could be represented would grow, and 8 bytes is possible in theory.
    This is the need that the Unicode character set standard grew out of.

    Characters in Java use Unicode, and each char is 16 bits.
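
    Because each char is 16 bits, the number of char values in a String can differ from the number of characters a reader would count. The sketch below illustrates this with a hypothetical class `CodePointWalk`; `String.codePointCount`, `String.codePoints` (Java 8 and later), and `StringBuilder.appendCodePoint` are standard library methods.

```java
public class CodePointWalk {
    // Number of Unicode characters (code points) in the string.
    public static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // A three-character sample: Latin 'A', a common Han character,
    // and a Han character from CJK Extension B (outside the 16-bit range).
    public static String sample() {
        return new StringBuilder()
                .append('A')
                .appendCodePoint(0x6C49)
                .appendCodePoint(0x20000)
                .toString();
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println("char values: " + s.length());
        System.out.println("characters:  " + countCodePoints(s));
        // Walk the string one whole character at a time.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

    For this sample, `length()` reports 4 while `countCodePoints` reports 3, because the last character is stored as two char values.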
