Review the past and learn the new (1) In-depth understanding of strings in Java

In Java, data types fall into two major categories: primitive (basic) types and reference types. String is a special case. It is a reference type, yet it behaves differently from every other reference type, which makes it something of an oddball among Java's data types. In this article we take an in-depth look at String in Java.

1. Let's start with how String is allocated in memory

The previous article "Reviewing the Past and Learning the New--JVM Memory Allocation You Don’t Know" analyzed the JVM memory model in detail. In its section on constant pools we met three kinds of constant pool: the string constant pool, the Class file constant pool, and the runtime constant pool. How strings are allocated is closely tied to the string constant pool.

A string can be instantiated in two ways. The first and most common is literal assignment; the other is passing a string to the constructor:

    String str1 = "abc";
    String str2 = new String("abc");

How do these two approaches differ in memory allocation? Our teachers probably explained it when we first learned Java:

1. Creating a String through literal assignment generates a single String object, in the string constant pool.
2. Passing a String argument to the constructor generates a String object in both the heap and the string constant pool, and puts a reference to the heap String on the stack.

Is this answer correct? Not entirely, because the answer depends on the Java version. The previous article "Reviewing the past and learning the new - JVM memory allocation you don't know" described how the HotSpot virtual machine implements the string constant pool in different JDKs. An excerpt:

Before JDK7, the string constant pool lived in the method area (the permanent generation), and the pool stored the string objects themselves. In JDK7 the string constant pool moved from the method area into heap memory: string objects are stored in the Java heap, and the pool stores only references to them.

How should we understand this sentence? Let’s take String str1=new String("abc") as an example to analyze:

1. Memory allocation in JDK6

Let's first look at memory allocation in JDK6, shown in the figure below:

[Figure: memory allocation of new String("abc") in JDK6]

When new String("abc") is called, one "abc" object is created in the Java heap and another in the constant pool, and str1 points to the one in the heap.

2. Memory allocation in JDK7

In JDK7 and later, because the string constant pool moved into heap memory, allocation works differently, as shown in the following figure:

[Figure: memory allocation of new String("abc") in JDK7 and later]

When new String("abc") is called, two "abc" objects are created in heap memory: str1 points to one of them, and the constant pool holds a reference to the other.

As explained in the previous article, the reason for this design is that String is the most frequently used data type in Java. To save memory and improve performance, Java's designers set aside a string constant pool that is shared by all classes, and each virtual machine has only one. So when a string is created by literal assignment and that string already exists in the pool, no new object is created in the heap; the variable simply points to the existing object in the pool.
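A minimal sketch of the sharing just described (the class `LiteralPoolDemo` and its methods are ours, for illustration): two identical literals resolve to the same pooled object, a compile-time constant expression such as `"ab" + "c"` is folded by the compiler and pooled like a literal, while `new String(...)` always produces a distinct object. Remember that `==` compares references and `equals` compares contents.

```java
class LiteralPoolDemo {
    // Two identical literals resolve to the same pooled object.
    static boolean literalsShared() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // "ab" + "c" is a compile-time constant expression, so it is pooled like a literal.
    static boolean constantExpressionShared() {
        String a = "abc";
        String b = "ab" + "c";
        return a == b;
    }

    // new String(...) always allocates a fresh object, even with equal contents.
    static boolean newStringShared() {
        String a = "abc";
        String b = new String("abc");
        return a == b;
    }

    // Contents are still equal, which is what equals() checks.
    static boolean newStringEqual() {
        return "abc".equals(new String("abc"));
    }

    public static void main(String[] args) {
        System.out.println("literal == literal:           " + literalsShared());           // true
        System.out.println("literal == \"ab\" + \"c\":      " + constantExpressionShared()); // true
        System.out.println("literal == new String():      " + newStringShared());          // false
        System.out.println("literal.equals(new String()): " + newStringEqual());           // true
    }
}
```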

2. String’s intern() method

Now that we understand how String is allocated, let's get to know a very important String method: String.intern().

Many readers may not know this method well, but that does not make it unimportant. Let's first look at the source code of intern():

    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class {@code String}.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this {@code String} object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this {@code String} object is added to the
     * pool and a reference to this {@code String} object is returned.
     * <p>
     * It follows that for any two strings {@code s} and {@code t},
     * {@code s.intern() == t.intern()} is {@code true}
     * if and only if {@code s.equals(t)} is {@code true}.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java&trade; Language Specification</cite>.
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();

It turns out to be a native method, but that doesn't matter: even without the source, the comment tells us what we need. When intern() is called, if the string constant pool already contains a string equal to this String object (as determined by equals()), the reference to the pooled string is returned directly. Otherwise, this String object is added to the pool and a reference to it is returned.
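The last property in the comment, `s.intern() == t.intern()` exactly when `s.equals(t)`, can be checked directly. In this sketch (the class `InternContractDemo` and its `build` helper are hypothetical) the two strings are built at runtime with `StringBuilder`, so they start out as separate heap objects and only share an instance once interned.

```java
class InternContractDemo {
    // Builds the string at runtime so it is not a compile-time constant.
    static String build(String left, String right) {
        return new StringBuilder(left).append(right).toString();
    }

    public static void main(String[] args) {
        String s = build("java-", "string");
        String t = build("java-", "string");

        System.out.println(s == t);                      // false: two separate objects
        System.out.println(s.equals(t));                 // true: same contents
        System.out.println(s.intern() == t.intern());    // true: both map to one pooled instance
        System.out.println(s.intern() == "java-string"); // true: literals are interned too
    }
}
```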



    public class Test {
        public static void main(String[] args) {
            String str1 = "hello world";
            String str2 = new String("hello world");
            String str3 = str2.intern();
            System.out.println("str1 == str2:" + (str1 == str2)); // false
            System.out.println("str1 == str3:" + (str1 == str3)); // true
        }
    }




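String str1 = "hello world"; This line first creates an object in the Java heap and puts a reference to it into the string constant pool (the JDK7+ layout described above); str1 holds that same reference.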

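String str2 = new String("hello world"); This line instantiates a String object with new and assigns its reference to str2. It then checks whether the string constant pool already holds an object equal to "hello world": if not, it creates another object with the value "hello world" in heap memory and puts its reference into the pool; otherwise, nothing more is created. Here the first line has already stored a reference to a "hello world" object in the pool, so the second line does not add another one.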

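String str3 = str2.intern(); This line first checks whether the string constant pool already contains a String equal to "hello world", and if so returns that reference directly. Here, str2.intern() returns exactly the "hello world" object created by the first line.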

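Therefore System.out.println("str1 == str3:" + (str1 == str3)); prints true, while the str1 == str2 comparison prints false, because str2 still refers to the separate object created by new.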
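A detail that is easy to miss here: `intern()` does not change the variable it is called on; it only returns a reference that may point to a different object. A small sketch that makes this explicit (class and method names are ours, for illustration):

```java
class InternReturnValueDemo {
    // Calling intern() and ignoring the result leaves str2 pointing at its own heap object.
    static boolean sameAfterDiscardedIntern() {
        String str1 = "hello world";
        String str2 = new String("hello world");
        str2.intern(); // return value ignored
        return str1 == str2;
    }

    // Reassigning the result makes str2 refer to the pooled instance that str1 uses.
    static boolean sameAfterReassignedIntern() {
        String str1 = "hello world";
        String str2 = new String("hello world");
        str2 = str2.intern();
        return str1 == str2;
    }

    public static void main(String[] args) {
        System.out.println(sameAfterDiscardedIntern());  // false
        System.out.println(sameAfterReassignedIntern()); // true
    }
}
```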





    public class Test {
        public static void main(String[] args) {
            String str1 = new String("he") + new String("llo");
            String str2 = str1.intern();
            String str3 = "hello";
            System.out.println("str1 == str2:" + (str1 == str2)); // JDK6: false, JDK7+: true
            System.out.println("str2 == str3:" + (str2 == str3)); // true in both
        }
    }




[Figures: output and memory layout of this example in JDK7 and later]

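String str1 = new String("he") + new String("llo"); Here new String("he") and new String("llo") produce four objects on the heap (the two literals "he" and "llo" plus the two objects created by new). They are not relevant to this example, so the figure omits them. Concatenating them with "+" ultimately produces one "hello" object, which is assigned to str1.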

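String str2 = str1.intern(); This line first checks the string constant pool and finds no reference to a string equal to "hello". Since a "hello" object already exists in the heap, intern() puts a reference to that heap object (the one str1 points to) into the pool and returns it, so str2 and str1 refer to the same object.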

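String str3 = "hello"; This line finds that the pool already holds a reference to the "hello" object, so str3 points to that same object. All three variables therefore refer to one object, and both comparisons print true.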
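The literal "hello" could in principle already be in the pool before `main` runs (for example, if a class loaded at startup uses the same literal), which would make the first comparison print false even on JDK7+. A more robust sketch of the JDK7+ behavior uses a string that cannot already be pooled, built at runtime from a timestamp. On a HotSpot JVM from JDK7 onward we would expect `intern()` to record a reference to the existing heap object and return that same object; on JDK6 it would return a copy in the permanent generation instead. The class `FreshInternDemo` and its methods are hypothetical.

```java
class FreshInternDemo {
    // Builds a string that is very unlikely to be in the pool already.
    static String freshString() {
        return new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
    }

    // True if intern() handed back the very object it was called on.
    static boolean internReturnsSameObject() {
        String heapString = freshString();
        return heapString.intern() == heapString;
    }

    public static void main(String[] args) {
        // Expected: true on JDK7+ (pool stores a reference), false on JDK6 (pool stores a copy).
        System.out.println("intern() returned the same object: " + internReturnsSameObject());
    }
}
```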




[Figures: output and memory layout of this example in JDK6]

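String str1 = new String("he") + new String("llo"); Here new String("he") and new String("llo") each produce one String object in the Java heap and one in the string constant pool; they are unrelated to this example, so the figure omits them. Concatenating them with "+" ultimately produces a "hello" object on the Java heap, which is assigned to str1.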

String str2 = str1.intern(); This line finds no "hello" object in the string constant pool, so it copies the heap "hello" object into the pool and assigns the copy to str2.

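String str3 = "hello"; This line finds that the pool already contains a "hello" object, so str3 points directly to it. At this point str1 points to the "hello" object in the Java heap, while str2 and str3 both point to the object in the string constant pool. That explains the output above: str1 == str2 is false and str2 == str3 is true.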




    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];

        /** Cache the hash code for the string */
        private int hash; // Default to 0

        // ...
    }
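The `hash` field caches the result of `hashCode()`, which the `String` Javadoc defines as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic (overflow wraps). Because a String is immutable, this value can be computed once and reused. A sketch that recomputes it by hand for comparison (the helper `manualHash` is ours):

```java
class StringHashDemo {
    // Horner's rule for s[0]*31^(n-1) + ... + s[n-1]; int overflow wraps as in String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(manualHash(s) == s.hashCode()); // true
    }
}
```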




[Figure: fields of the String class in different Java versions]

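As we can see, up to Java 6 String maintained a char array, an offset, a character count, and a hash value. A String located its characters within the char[] through the offset and count fields. This allowed array objects to be shared efficiently and saved memory, but it could easily cause memory leaks: a short substring of a very long string keeps the entire original array reachable.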

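In Java 7 and Java 8 the offset and count fields were removed (the change arrived in JDK 7u6). As a result, String objects take slightly less memory, and String.substring no longer shares the char[], which fixes the memory leaks the old implementation could cause.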
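To make the difference concrete, here is a simplified model of the two substring strategies. It is not the real JDK source; the class `SliceModel` and its methods are hypothetical. The JDK6-style version keeps a reference to the whole backing array and only records offset and count, so a tiny substring keeps a huge array reachable; the JDK7+-style version copies just the needed range with `Arrays.copyOfRange`.

```java
import java.util.Arrays;

// Hypothetical model of a string as (backing array, offset, count); not java.lang.String.
class SliceModel {
    final char[] value; // backing array, possibly shared with other slices
    final int offset;   // index of this string's first char in value
    final int count;    // number of chars that belong to this string

    SliceModel(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // JDK6-style: O(1), but the result keeps the whole original array reachable.
    SliceModel sharedSubstring(int begin, int end) {
        return new SliceModel(value, offset + begin, end - begin);
    }

    // JDK7+-style: copies only [begin, end), so the original array can be collected.
    SliceModel copiedSubstring(int begin, int end) {
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SliceModel(copy, 0, copy.length);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] data = new char[1_000_006];
        Arrays.fill(data, 'x');
        "needle".getChars(0, 6, data, 1_000_000); // last 6 chars spell "needle"
        SliceModel big = new SliceModel(data, 0, data.length);

        SliceModel shared = big.sharedSubstring(1_000_000, 1_000_006);
        SliceModel copied = big.copiedSubstring(1_000_000, 1_000_006);
        System.out.println(shared + " backed by " + shared.value.length + " chars"); // needle backed by 1000006 chars
        System.out.println(copied + " backed by " + copied.value.length + " chars"); // needle backed by 6 chars
    }
}
```

The trade-off is visible in the comments: sharing makes substring constant-time but ties the lifetime of the large array to the small result, while copying costs time proportional to the substring length but frees the original.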

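Starting with Java 9 (the "compact strings" change), the char[] inside String was replaced by a byte[]. A char occupies two bytes while a byte occupies one, so for a string whose characters all fit in a single byte, a byte[] needs only half the storage of a char[] (one byte per character instead of two). The public String API behaves exactly as before. Java 9 also added a coder field, which identifies the encoding in use: when computing the string's length or calling indexOf(), String consults this field to decide how to interpret the bytes. coder takes one of two values: 0 means Latin-1 (a single-byte encoding) and 1 means UTF-16. If String determines that the string contains only Latin-1 characters, coder is 0; otherwise it is 1.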
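The choice of coder can be sketched as follows. This is a simplified model, not the JDK's code (the real logic lives in package-private classes such as `StringLatin1` and `StringUTF16`); the class `CoderModel` and its helpers are hypothetical. If every char fits in one byte (value at most 0xFF), the string can be stored as Latin-1 with one byte per character; otherwise it needs UTF-16 with two bytes per character. The JVM option `-XX:-CompactStrings` turns this off so that every string uses UTF-16.

```java
class CoderModel {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    // Picks the coder a compact-strings JVM would use for these contents.
    static byte coderFor(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Length of the backing byte[] under that coder (object headers not included).
    static int backingBytes(String s) {
        return coderFor(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(coderFor("hello") + " " + backingBytes("hello"));                 // 0 5
        System.out.println(coderFor("h\u00e9llo") + " " + backingBytes("h\u00e9llo"));       // 0 5 (U+00E9 is still Latin-1)
        System.out.println(coderFor("\u4f60\u597d") + " " + backingBytes("\u4f60\u597d"));   // 1 4
    }
}
```

For comparison, a Java 8 char[] would need 10 bytes for "hello"; the Latin-1 encoding halves that.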





    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }
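Two consequences of the code above can be observed from outside: `substring(0)` returns the same object, and a non-zero begin index produces a new `String` with its own copy of the characters (not the pooled literal, even if the contents match). This relies on current OpenJDK behavior rather than a documented guarantee, so treat it as an illustration; the class `SubstringDemo` is hypothetical.

```java
class SubstringDemo {
    // substring(0) short-circuits to "this" in the source shown above.
    static boolean substringZeroIsSame(String s) {
        return s.substring(0) == s;
    }

    public static void main(String[] args) {
        String s = "hello world";
        String tail = s.substring(6);

        System.out.println(substringZeroIsSame(s)); // true: beginIndex == 0 returns this
        System.out.println(tail.equals("world"));   // true: same contents
        System.out.println(tail == "world");        // false: a new String, not the pooled literal

        try {
            s.substring(-1);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("negative index rejected");
        }
    }
}
```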






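    // Test 1: concatenation with the "+" operator
    public class Test {
        private static final int COUNT = 50000;

        public static void main(String[] args) {
            String str = "";
            for (int i = 0; i < COUNT; i++) {
                str = str + "abc";
            }
        }
    }

    // Test 2: concatenation with String.concat()
    public class Test {
        private static final int COUNT = 50000;

        public static void main(String[] args) {
            String str = "";
            for (int i = 0; i < COUNT; i++) {
                str = str.concat("abc");
            }
        }
    }

    // Test 3: concatenation with StringBuilder.append()
    public class Test {
        private static final int COUNT = 50000;

        public static void main(String[] args) {
            StringBuilder str = new StringBuilder();
            for (int i = 0; i < COUNT; i++) {
                str.append("abc");
            }
        }
    }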


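| Concatenation method     | Min time (ms) | Max time (ms) | Average time (ms) |
|--------------------------|---------------|---------------|-------------------|
| "+" operator             | 4868          | 5146          | 4924              |
| String.concat()          | 2227          | 2456          | 2296              |
| StringBuilder.append()   | 4             | 12            | 6.6               |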
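The figures above are the author's measurements and will differ by machine and JDK. A rough harness for rerunning the comparison might look like this. It times each strategy once with `System.nanoTime()`, which ignores JIT warm-up and GC noise, so a dedicated tool such as JMH is the better choice for serious measurements. The class `ConcatBenchmark` and its methods are ours.

```java
class ConcatBenchmark {
    static String withPlus(int count) {
        String str = "";
        for (int i = 0; i < count; i++) {
            str = str + "abc"; // builds a brand-new String every iteration
        }
        return str;
    }

    static String withConcat(int count) {
        String str = "";
        for (int i = 0; i < count; i++) {
            str = str.concat("abc"); // copies all accumulated chars every iteration
        }
        return str;
    }

    static String withBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("abc"); // appends into one growable buffer
        }
        return sb.toString();
    }

    // Wall-clock time of a single run, in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        final int count = 50000;
        System.out.println("+      : " + timeMillis(() -> withPlus(count)) + " ms");
        System.out.println("concat : " + timeMillis(() -> withConcat(count)) + " ms");
        System.out.println("append : " + timeMillis(() -> withBuilder(count)) + " ms");
    }
}
```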




    public class Test {
        private static final int COUNT = 50000;

        public static void main(String[] args) {
            for (int i = 0; i < COUNT; i++) {
                // ...
            }
        }
    }




    public class Test {
        private static final int COUNT = 50000;

        public static void main(String[] args) {
            String str = "";
            for (int i = 0; i < COUNT; i++) {
                // Equivalent of str = str + "abc"; as compiled by javac up to JDK 8
                str = new StringBuilder(str).append("abc").toString();
            }
        }
    }
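Rewriting the loop body as `new StringBuilder(str).append("abc").toString()` matches how javac compiled `+` up to JDK 8. Since JDK 9 (JEP 280) javac emits an `invokedynamic` call bootstrapped by `StringConcatFactory` instead, but each iteration still produces a brand-new String holding everything accumulated so far. The usual fix is to hoist a single builder out of the loop, as in the StringBuilder test earlier; presizing it to the final length also avoids repeated buffer growth. A sketch (class and method names hypothetical):

```java
class HoistedBuilder {
    // The desugared loop: a new builder and a full copy of str on every iteration.
    static String desugaredLoop(int count) {
        String str = "";
        for (int i = 0; i < count; i++) {
            str = new StringBuilder(str).append("abc").toString();
        }
        return str;
    }

    // One builder for the whole loop, presized to the final length (3 chars per iteration).
    static String hoistedLoop(int count) {
        StringBuilder sb = new StringBuilder(count * 3);
        for (int i = 0; i < count; i++) {
            sb.append("abc");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(desugaredLoop(1000).equals(hoistedLoop(1000))); // true
    }
}
```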



    public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }
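Following the JDK 8 source above, every call to concat allocates a new array of `len + otherLen` chars and copies both parts into it. A loop that appends "abc" N times therefore copies 3 + 6 + ... + 3N = 3N(N+1)/2 characters in total; for N = 50000 that is 3,750,075,000 character copies, which fits the slow concat timings in the table. The sketch below (helper name `copiedChars` is ours) just evaluates that sum, and also shows the early return at the top of concat, which hands back the same object when the argument is empty.

```java
class ConcatCost {
    // Characters copied by n calls of str = str.concat(piece), starting from "" (JDK 8 model).
    static long copiedChars(int n, int pieceLength) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * pieceLength; // the i-th call builds a string of i * pieceLength chars
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(copiedChars(50000, 3)); // 3750075000

        String s = "abc";
        System.out.println(s.concat("") == s); // true: an empty argument returns this
    }
}
```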




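In this article we analyzed String's memory allocation, the intern() method, and the structure and characteristics of the String class. Articles online about this topic vary widely in quality and often contradict each other; I consulted many of them and combined them with my own understanding. Misunderstandings are still possible, so if you find any, please discuss and point them out. StringBuilder came up several times in this article, but space did not allow a detailed analysis; I will cover it in the next article. Either way, I hope this article has given you a deeper understanding of String, especially of the performance problems that substring and concatenation can cause, which you should try to avoid in future development.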

