
How Does Java Internally Represent Strings: UTF-16 or Modified UTF-8?

DDD
2024-11-11 01:32:03


Unraveling Java's String Representation: UTF-16 or Modified UTF-8?

Descriptions of how Java represents strings internally often appear to conflict. Two seemingly reliable sources give different answers:

One source suggests Java employs UTF-16 for internal text representation, while the other posits a modified version of UTF-8. Which of these claims holds true?

The Answer: UTF-16 for Internal Representation

Java uses UTF-16 for its in-memory text model, which covers String, StringBuilder, char arrays and related types. Text is exposed as a sequence of 16-bit UTF-16 code units (char values): a character in the range U+0000 to U+FFFF takes one code unit, while a character above U+FFFF is stored as a surrogate pair of two code units.
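As a small illustration, the sketch below prints the UTF-16 code units behind a String and checks that encoding it as UTF-16 yields two bytes per char. The class and method names (Utf16Demo, codeUnits, utf16Bytes) are hypothetical; only String.charAt, String.getBytes and StandardCharsets.UTF_16BE are standard APIs.

```java
import java.nio.charset.StandardCharsets;

class Utf16Demo {
    // Encodes the string's chars as big-endian UTF-16 (no byte-order mark).
    static byte[] utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // Formats each char of the string as a 4-digit hex UTF-16 code unit.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Java\u20AC"; // "Java" followed by the euro sign U+20AC
        System.out.println(codeUnits(s));         // 004A 0061 0076 0061 20AC
        System.out.println(utf16Bytes(s).length); // 10: two bytes per code unit
    }
}
```

Even the non-Latin euro sign fits in a single code unit here, because it lies below U+FFFF.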

Modified UTF-8 for Serialization

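While Java uses UTF-16 in memory, it uses a non-standard variant called modified UTF-8 when it writes strings into certain binary formats: DataOutput.writeUTF and DataInput.readUTF, Java object serialization (turning objects into a storable, transmittable byte stream), JNI functions such as GetStringUTFChars, and the constant pool of class files. Modified UTF-8 differs from standard UTF-8 in two ways: the null character U+0000 is encoded as the two bytes 0xC0 0x80 instead of a single zero byte, and a character above U+FFFF is encoded by writing each half of its UTF-16 surrogate pair as a separate 3-byte sequence (6 bytes in total) instead of the standard 4-byte form.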
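The differences between modified and standard UTF-8 can be observed with DataOutputStream.writeUTF, which emits modified UTF-8 preceded by a 2-byte length. The following sketch, with hypothetical helper names (ModifiedUtf8Demo, modifiedUtf8, hex), strips that length prefix and compares the result with String.getBytes(StandardCharsets.UTF_8) for an ASCII letter, the null character, and U+1F600, a character outside the Basic Multilingual Plane.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Returns the modified UTF-8 bytes that DataOutputStream.writeUTF produces,
    // minus the 2-byte big-endian length prefix that writeUTF writes first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";
        String emoji = new String(Character.toChars(0x1F600)); // a surrogate pair in UTF-16
        for (String s : new String[] {"A", nul, emoji}) {
            System.out.println("modified: " + hex(modifiedUtf8(s))
                    + " | standard: " + hex(s.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

Running it should show "41" for both encodings of "A", "C0 80" versus "00" for the null character, and "ED A0 BD ED B8 80" versus "F0 9F 98 80" for U+1F600.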

In-Memory Storage: Compressed Strings

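At the JVM level, strings do not always take two bytes per character. Some Java 6 updates offered an optional flag, -XX:+UseCompressedStrings, that stored strings not needing UTF-16 as 8-bit bytes; it was dropped in later releases. Since Java 9, the Compact Strings feature (JEP 254) does this by default: a String whose characters all fit in ISO-8859-1 (Latin-1) is stored with one byte per character, and any other String uses two bytes per char. The feature can be turned off with -XX:-CompactStrings. It changes only the storage; the String API still behaves as a sequence of UTF-16 code units.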
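The storage rule can be approximated in a few lines. This is a hypothetical helper, not JDK code: it assumes compact strings are enabled (the default since Java 9), applies only the Latin-1 versus UTF-16 decision, and ignores object headers, the backing array's header and alignment padding.

```java
class CompactStringEstimate {
    // Hypothetical helper: estimates the bytes used for a String's character
    // payload under Compact Strings. It mirrors the documented Latin-1/UTF-16
    // rule only; it does not inspect JVM internals and ignores all overhead.
    static int payloadBytes(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        return latin1 ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));      // 5: every char fits in Latin-1
        System.out.println(payloadBytes("h\u00E9llo")); // 5: U+00E9 is still Latin-1
        System.out.println(payloadBytes("5 \u20AC"));   // 6: U+20AC forces two bytes per char
    }
}
```

A single character above U+00FF is enough to switch the whole string to two bytes per char.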

Byte Usage for Char

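A char in Java is always a 16-bit unsigned value, so it holds exactly 2 bytes of data (values 0 to 65535). How much space a char field actually occupies inside an object can still vary with the JVM's field layout and padding.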
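The fixed 16-bit size is visible through the Character.BYTES and Character.SIZE constants, and because char is unsigned, incrementing the largest value wraps around to zero. A minimal sketch, where CharSizeDemo and increment are illustrative names:

```java
class CharSizeDemo {
    // Increments a char; char arithmetic is unsigned 16-bit and wraps around.
    static char increment(char c) {
        c++;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);           // 2
        System.out.println(Character.SIZE);            // 16 (bits)
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println((int) increment(Character.MAX_VALUE)); // 0
    }
}
```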

Code Points and Character Representation

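Note that a char is a UTF-16 code unit, not necessarily a whole character. A Unicode code point up to U+FFFF (65535) is represented by one char (2 bytes), while a code point above that limit, up to U+10FFFF, needs two chars forming a surrogate pair (4 bytes). As a result, String.length() counts code units, not code points.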
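The gap between chars and code points shows up as soon as a string contains a supplementary character. The sketch below, with hypothetical class and method names, compares String.length() with String.codePointCount and walks the string with String.codePoints() and Character.charCount.

```java
class CodePointDemo {
    // Reports UTF-16 code units, code points, and the UTF-16 size (2 bytes per char).
    static String describe(String s) {
        return "chars=" + s.length()
                + " codePoints=" + s.codePointCount(0, s.length())
                + " utf16Bytes=" + 2 * s.length();
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F600)); // 'a' followed by U+1F600
        System.out.println(describe(s)); // chars=3 codePoints=2 utf16Bytes=6
        // Walk the string by code point rather than by char.
        s.codePoints().forEach(cp ->
                System.out.printf("U+%04X uses %d char(s)%n", cp, Character.charCount(cp)));
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```

Iterating with codePoints() is the safer choice whenever text may contain characters above U+FFFF, since charAt alone can split a surrogate pair.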

