Counting String Bytes in Java
In Java, a String is a sequence of characters (stored internally as UTF-16 `char` values), and it has no single fixed size in bytes. The number of bytes it occupies once encoded depends on the character set used to encode it.
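To make the distinction between characters and bytes concrete, here is a minimal sketch comparing three counts for the same string: `length()` (UTF-16 `char` values), `codePointCount()` (Unicode characters) and the UTF-8 byte count. The class name `CharVsByteCount` is hypothetical; the sample strings use Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class CharVsByteCount {
    // Returns {UTF-16 code units, Unicode code points, UTF-8 bytes} for s.
    static int[] counts(String s) {
        return new int[] {
            s.length(),                               // number of char values
            s.codePointCount(0, s.length()),          // number of Unicode code points
            s.getBytes(StandardCharsets.UTF_8).length // bytes once encoded as UTF-8
        };
    }

    public static void main(String[] args) {
        // "Hello World", "héllo" and the emoji U+1F600 (a surrogate pair in UTF-16)
        String[] samples = {"Hello World", "h\u00e9llo", "\uD83D\uDE00"};
        for (String s : samples) {
            int[] c = counts(s);
            System.out.printf("length()=%d, code points=%d, UTF-8 bytes=%d%n", c[0], c[1], c[2]);
        }
    }
}
```

For the ASCII string all three numbers are 11. For "h\u00e9llo", `length()` is 5 but the UTF-8 encoding takes 6 bytes, and the emoji is 2 `char` values, 1 code point and 4 UTF-8 bytes.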
Getting the Encoded Byte Count
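To find how many bytes a string takes in a given encoding, convert it to a byte array with `String.getBytes()` and read the array's length. `getBytes()` accepts the target character set either as a `Charset` object (for example `StandardCharsets.UTF_8`) or as a charset name such as `"UTF-8"`. The name-based overload throws the checked `UnsupportedEncodingException`, so the `Charset` overload is usually more convenient. The length of the returned array is the number of bytes in the encoded string.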
Example:
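```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringByteCount {
    public static void main(String[] args) {
        String string = "Hello World";

        // Get UTF-8 encoded byte count: 1 byte per ASCII character
        byte[] utf8Bytes = string.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8Bytes.length); // prints 11

        // Get UTF-16 encoded byte count: 2-byte byte-order mark + 2 bytes per character
        byte[] utf16Bytes = string.getBytes(StandardCharsets.UTF_16);
        System.out.println(utf16Bytes.length); // prints 24

        // Get UTF-32 encoded byte count: 4 bytes per character, no byte-order mark
        byte[] utf32Bytes = string.getBytes(Charset.forName("UTF-32"));
        System.out.println(utf32Bytes.length); // prints 44
    }
}
```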
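`getBytes()` allocates a new array just to measure it. If only the UTF-8 length is needed, it can be computed from the code points instead. The sketch below is an alternative technique, not something the standard library provides under this name; `Utf8Length.utf8Length` is a hypothetical helper. It assumes the same behaviour as `getBytes(StandardCharsets.UTF_8)`, which replaces an unpaired surrogate with a single `'?'` byte.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // Counts the bytes s would occupy in UTF-8 without allocating a byte array.
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // a full code point, or a lone surrogate
            if (cp < 0x80) {
                bytes += 1; // U+0000..U+007F (ASCII)
            } else if (cp < 0x800) {
                bytes += 2; // U+0080..U+07FF
            } else if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                bytes += 1; // unpaired surrogate: getBytes(UTF_8) writes '?' instead
            } else if (cp < 0x10000) {
                bytes += 3; // rest of the Basic Multilingual Plane
            } else {
                bytes += 4; // supplementary characters (surrogate pairs)
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return bytes;
    }

    public static void main(String[] args) {
        String[] samples = {"Hello World", "h\u00e9llo", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            int viaArray = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(utf8Length(s) + " (getBytes: " + viaArray + ")");
        }
    }
}
```

The `main` method cross-checks the computed length against `getBytes()`, which is a cheap way to validate such a helper before relying on it.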
Encoding Variations
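As the example shows, even a plain ASCII string like "Hello World" has a different byte count in each encoding. UTF-8 stores each ASCII character in 1 byte, UTF-16 uses 2 bytes per character (and Java's "UTF-16" encoder also writes a 2-byte byte-order mark, which is why the result is 24 rather than 22), and UTF-32 uses 4 bytes per character.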
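The differences become larger for characters outside ASCII. This minimal sketch (class and method names are hypothetical) encodes the euro sign U+20AC, a single `char`, with several charsets, including `UTF-16BE`, which fixes the byte order and therefore writes no byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Number of bytes s occupies when encoded with cs.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN: one char, outside ASCII
        Charset[] charsets = {
            StandardCharsets.UTF_8,    // 3 bytes
            StandardCharsets.UTF_16BE, // 2 bytes, no byte-order mark
            StandardCharsets.UTF_16,   // 4 bytes: 2-byte BOM + 2 bytes
            Charset.forName("UTF-32")  // 4 bytes
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + byteCount(euro, cs));
        }
    }
}
```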
Character Sets
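Choose the character set deliberately. Different character sets map characters to bytes differently: UTF-8 needs 1 to 4 bytes per character depending on the code point, while single-byte sets such as US-ASCII or ISO-8859-1 always use one byte but can represent only a small set of characters. When a character cannot be represented, `getBytes()` silently substitutes the charset's replacement bytes (typically `'?'`), so you still get a byte count, but the original text has been lost.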
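One way to see the silent replacement is to encode a string and decode it again with the same charset. The sketch below is illustrative; `UnmappableDemo`, `encodedLength` and `survivesRoundTrip` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnmappableDemo {
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length; // unmappable chars are replaced, not reported
    }

    // True if encoding and then decoding with the same charset gives back s.
    static boolean survivesRoundTrip(String s, Charset cs) {
        return s.equals(new String(s.getBytes(cs), cs));
    }

    public static void main(String[] args) {
        String s = "na\u00efve"; // "naïve": 5 chars, one outside ASCII
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + encodedLength(s, cs)
                    + " bytes, round trip ok = " + survivesRoundTrip(s, cs));
        }
    }
}
```

US-ASCII reports 5 bytes, the same as ISO-8859-1, but its round trip fails because "\u00ef" was replaced with "?". UTF-8 needs 6 bytes and round-trips correctly.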
Default Character Set
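If you call `getBytes()` without an argument, Java uses the default charset returned by `Charset.defaultCharset()`. Before JDK 18 this default came from the platform and could differ between machines; since JDK 18 (JEP 400) it is UTF-8 unless overridden. Either way, specify the character set explicitly so that byte counts are consistent across environments.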
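The contrast can be sketched as two small helpers (hypothetical names): one depends on the JVM's default charset, the other names UTF-8 explicitly and returns the same value everywhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Uses the no-argument getBytes(), i.e. Charset.defaultCharset();
    // the result can differ between machines and JDK versions.
    static int defaultCount(String s) {
        return s.getBytes().length;
    }

    // Names the charset explicitly: same result on every JVM.
    static int explicitUtf8Count(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "héllo"
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default-charset count: " + defaultCount(s));
        System.out.println("explicit UTF-8 count: " + explicitUtf8Count(s));
    }
}
```

On a JVM whose default is UTF-8 both counts are 6; with a single-byte default such as windows-1252 the first count would be 5.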