How does Java internally represent Strings: UTF-8 or UTF-16?

Patricia Arquette
2024-11-10 07:12:02

What is Java's Internal Representation for String: Modified UTF-8 or UTF-16?

Java uses UTF-16 for its internal text representation, as stated in the Oracle documentation. This applies to the classes that store character sequences in the Java platform, such as String and StringBuilder. A char is a 16-bit unsigned value that holds one UTF-16 code unit: on its own it can represent any code point in the Basic Multilingual Plane (U+0000 to U+FFFF), while a supplementary code point (above U+FFFF) is stored as a surrogate pair of two char values.
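The difference between code units and code points can be checked with the standard String and Character methods. The following is a minimal sketch; the class name CodeUnitDemo and its helper methods are illustrative names, not part of any Java API.

```java
public class CodeUnitDemo {
    // Number of UTF-16 code units (char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                       // U+0041, fits in one char
        String supplementary = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1

        // The first char of the pair is a high surrogate, not a full character.
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));  // true
        // codePointAt combines the pair back into the original code point.
        System.out.println(Integer.toHexString(supplementary.codePointAt(0)));   // 1f600
    }
}
```

Note that length() counts char values, so code that needs to iterate over whole characters would typically use codePointAt or codePoints() rather than charAt.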

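However, Java also uses a non-standard variant of UTF-8, known as modified UTF-8, for string serialization. Strings written through object serialization or DataOutputStream.writeUTF are therefore stored in modified UTF-8, not in standard UTF-8. The two differ in that modified UTF-8 encodes U+0000 as two bytes instead of one, and encodes a supplementary character as two three-byte surrogates (six bytes) instead of a single four-byte sequence.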
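One way to observe the difference between modified UTF-8 and standard UTF-8 is to compare the byte counts produced by DataOutputStream.writeUTF and String.getBytes. This is a sketch only; the class ModifiedUtf8Demo and its two helper methods are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Payload size written by writeUTF, excluding its 2-byte length prefix.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a short string.
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    // Size of the same string in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String nul = "\u0000";
        String supplementary = "\uD83D\uDE00"; // U+1F600

        System.out.println(standardUtf8Length(ascii) + " vs " + modifiedUtf8Length(ascii));                 // 1 vs 1
        System.out.println(standardUtf8Length(nul) + " vs " + modifiedUtf8Length(nul));                     // 1 vs 2
        System.out.println(standardUtf8Length(supplementary) + " vs " + modifiedUtf8Length(supplementary)); // 4 vs 6
    }
}
```

For plain ASCII text the two encodings produce identical bytes, which is why the difference often goes unnoticed until a NUL character or a supplementary character appears in the data.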

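In memory, a char occupies 2 bytes. A code point takes one or two char values, that is, 2 or 4 bytes, depending on whether it lies in the Basic Multilingual Plane. Note that since JDK 9 the String class may store text consisting only of Latin-1 characters in one byte per character internally (compact strings); this is an implementation detail, and the API still exposes UTF-16 code units.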
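The 2-or-4-byte arithmetic can be reproduced with Character.charCount and Character.BYTES, and cross-checked against an explicit UTF-16 encoding. This is a small sketch under the assumption that only the logical UTF-16 size matters, not the JVM's actual heap layout; Utf16SizeDemo and its methods are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class Utf16SizeDemo {
    // Bytes needed for one code point in UTF-16: chars required times 2 bytes per char.
    static int utf16Bytes(int codePoint) {
        return Character.charCount(codePoint) * Character.BYTES;
    }

    // Cross-check by encoding the code point as big-endian UTF-16 without a byte order mark.
    static int encodedBytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(utf16Bytes(0x0041) + " " + encodedBytes(0x0041));   // 2 2
        System.out.println(utf16Bytes(0x1F600) + " " + encodedBytes(0x1F600)); // 4 4
    }
}
```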
