Questions about code points and code units for char and String in Java

Question

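Java uses Unicode and encodes it as UTF-16. First, Unicode has 17 code planes, and every character outside the first plane requires 2 code units. So here is the question: 1. The official API documentation states that the length() method of the String class returns...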

高洛峰 · Answer

Unicode has two fixed-width encoding forms: a 16-bit form and a 32-bit form, called UCS-2 and UCS-4 respectively. Java's char is a 16-bit type. It originally matched UCS-2, but since Java 5 a char is a UTF-16 code unit, so a character outside the Basic Multilingual Plane is stored as two chars (a surrogate pair). The first 128 code points are identical to ASCII, followed by other scripts such as Latin, Greek, and Chinese characters.

A char is 2 bytes (16 bits) in Java. It holds one UTF-16 code unit, which is enough for any character in the Basic Multilingual Plane but not for a supplementary character.
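To make the size of a char concrete, here is a minimal sketch using only standard `java.lang.Character` members. The class name `CharWidthDemo` and the helper `unitsFor` are made up for illustration; `Character.BYTES` assumes Java 8 or later.

```java
public class CharWidthDemo {
    // Hypothetical helper: how many 16-bit chars (UTF-16 code units)
    // are needed to store one Unicode code point.
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);     // 16 (bits in a char)
        System.out.println(Character.BYTES);    // 2  (bytes in a char)
        System.out.println(unitsFor(0x56FD));   // 1  (a BMP code point)
        System.out.println(unitsFor(0x20000));  // 2  (a supplementary code point)

        // A supplementary code point becomes a high/low surrogate pair.
        char[] pair = Character.toChars(0x20000);
        System.out.println(Character.isHighSurrogate(pair[0])
                && Character.isLowSurrogate(pair[1])); // true
    }
}
```

So "2 bytes per char" holds for the type itself, while a single character as the user sees it may need one or two chars.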

天蓬老师 · Answer

Not all Chinese characters occupy two code units. The two characters of "国家" (country) are encoded as \u56fd \u5bb6, and each occupies only one code unit. Some Chinese characters do need two code units, such as those in CJK Unified Ideographs Extension B, which lies outside the Basic Multilingual Plane (Extension A is still inside it, so its characters take one code unit each). For example: "
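The difference can be checked with a short sketch that compares `String.length()` (which counts code units) with `String.codePointCount` (which counts code points). The class name `CodePointDemo` and its two helper methods are hypothetical, and U+20000 is used here only as a sample Extension B character; it is not necessarily the example the answer had in mind.

```java
public class CodePointDemo {
    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u56fd\u5bb6";                          // two BMP characters
        String ext = new String(Character.toChars(0x20000)); // one Extension B character

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp)); // 2 2
        System.out.println(codeUnits(ext) + " " + codePoints(ext)); // 2 1

        // Iterate by code point rather than by char to avoid splitting a surrogate pair.
        ext.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp))); // 20000
    }
}
```

When the count of user-visible characters matters, `codePointCount` or `codePoints()` is usually the safer choice than `length()` and `charAt`.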
