
In-depth understanding of String in Java

黄舟 (original), 2017-09-20 10:07:48

This article takes an in-depth look at String in Java. I hope it helps readers understand how String is used; those who need it can use it as a reference.


1. Java Memory Model

According to the official specification, the Java virtual machine has a heap. The heap is the runtime data area from which memory for all class instances and arrays is allocated.

The JVM manages two kinds of memory: heap and non-heap. Heap memory is created when the Java virtual machine starts. Non-heap memory is the memory the JVM keeps outside the heap.

Simply put, non-heap memory includes the method area, memory the JVM needs for internal processing or optimization (such as the JIT compiler's code cache for just-in-time compiled code), per-class structures (such as the runtime constant pool and field and method data), and the code for methods and constructors.

Java's heap is a runtime data area from which objects are allocated. These objects are created through instructions such as new, newarray, anewarray and multianewarray, and program code does not need to release them explicitly.

The heap is managed by the garbage collector. The advantage of the heap is that memory can be allocated dynamically and lifetimes need not be known to the compiler in advance, because allocation happens at run time and Java's garbage collector automatically reclaims data that is no longer used. The disadvantage is that, because memory is allocated dynamically at run time, access is slower. The stack, by contrast, is faster to access than the heap, second only to registers, and stack data can be shared. Its disadvantage is that the size and lifetime of the data stored on the stack must be known in advance, so it lacks flexibility. The stack mainly holds variables of the primitive types (int, short, long, byte, float, double, boolean, char) and object handles (references).

The virtual machine must maintain a constant pool for each loaded type. A constant pool is an ordered collection of the constants used by that type, including direct constants (string, integer and floating-point constants) and symbolic references to other types, fields and methods.
For String constants, the values live in the constant pool. The constant pool in the JVM exists in memory as a table. For the String type, a CONSTANT_String_info entry records each literal string value (in the class file it does so by indexing a CONSTANT_Utf8_info entry that holds the characters). Note: this entry only describes literal string values, not symbolic references. At this point, you should have a clear picture of where string values are stored in the constant pool. When the program runs, the runtime constant pool is stored in the method area rather than in the heap (in HotSpot, the pool of interned String objects itself was moved from the permanent generation to the heap in JDK 7). The pool holds many String objects, and because they can be shared, it improves efficiency.

2. Case Analysis


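public static void main(String[] args) {
    /**
     * Scenario 1: the string pool
     * The Java virtual machine (JVM) maintains a string pool that holds many String objects;
     * they can be shared, which improves efficiency.
     * Because the String class is final and immutable, a value cannot change once it is created.
     * The string pool is maintained by the String class, and we can reach it by calling intern().
     */
    String s1 = "abc";
    //↑ creates one object in the string pool
    String s2 = "abc";
    //↑ the pool already contains "abc" (shared), so 0 objects are created; 1 object in total
    System.out.println("s1 == s2 : "+(s1==s2));
    //↑ true: both point to the same object
    System.out.println("s1.equals(s2) : " + (s1.equals(s2)));
    //↑ true: the values are equal
    //↑------------------------------------------------------over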
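    /**
     * Scenario 2: new String("")
     *
     */
    String s3 = new String("abc");
    //↑ taken on its own, this statement creates two objects: one in the string pool and one on the heap
    //↑ (here "abc" was already pooled by s1, so only the heap object is new);
    //↑ the reference s3 itself is stored on the stack
    String s4 = new String("abc");
    //↑ "abc" already exists in the string pool, so only one object is created, on the heap
    System.out.println("s3 == s4 : "+(s3==s4));
    //↑ false: s3 and s4 refer to two different objects on the heap
    System.out.println("s3.equals(s4) : "+(s3.equals(s4)));
    //↑ true: s3 and s4 have the same value
    System.out.println("s1 == s3 : "+(s1==s3));
    //↑ false: different objects; s1 refers to the pooled string, s3 to a separate heap object
    System.out.println("s1.equals(s3) : "+(s1.equals(s3)));
    //↑ true: same value
    //↑------------------------------------------------------over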
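    /**
     * Scenario 3:
     * The value of a constant expression is determined (folded) at compile time.
     * Here "ab" and "cd" are both constants, so the value of str1 is known at compile time.
     * After compilation this line is equivalent to: String str1 = "abcd";
     */
    String str1 = "ab" + "cd"; // 1 object
    String str11 = "abcd";
    System.out.println("str1 == str11 : "+ (str1 == str11)); // true
    //↑------------------------------------------------------over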
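    /**
     * Scenario 4:
     * The local variables str2 and str3 hold the addresses of two interned string objects.
     *
     * How the third line (str2+str3) works:
     * at run time the JVM first creates a StringBuilder on the heap,
     * initialized with the interned string that str2 points to,
     * then calls append to add the interned string that str3 points to,
     * then calls the StringBuilder's toString() to create a String object on the heap,
     * and finally stores the heap address of that new String in the local variable str4.
     * (This describes javac before Java 9; newer compilers emit an invokedynamic call
     * instead, but the result is still a new String on the heap.)
     *
     * str5 holds the address of the interned string "abcd" in the string pool,
     * so str4 and str5 naturally refer to different addresses.
     *
     * There are actually five objects in memory:
     *    three interned strings, one String object and one StringBuilder object.
     */
    String str2 = "ab"; // 1 object
    String str3 = "cd"; // 1 object
    String str4 = str2+str3;
    String str5 = "abcd";
    System.out.println("str4 == str5 : " + (str4==str5)); // false
    //↑------------------------------------------------------over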
    /**
     * Scenario 5:
     * The Java compiler evaluates string + primitive/constant as a constant expression at compile time.
     * Adding two strings at run time produces a new object stored on the heap.
     */
    String str6 = "b";
    String str7 = "a" + str6;
    String str67 = "ab";
    System.out.println("str7 == str67 : "+ (str7 == str67)); // false
    //↑ str6 is a variable, so it is only resolved at run time.
    final String str8 = "b";
    String str9 = "a" + str8;
    String str89 = "ab";
    System.out.println("str9 == str89 : "+ (str9 == str89)); // true
    //↑ str8 is a constant variable, so the expression is folded at compile time
    //↑------------------------------------------------------over
  }

Summary:

1. The String class is immutable after initialization. There is a lot to say about this, but the key point is that a String instance never changes once it has been created. For example, take String str = "kv" + "ill" + " " + "ans"; with its 4 string constants. Conceptually, "kv" and "ill" produce "kvill", then "kvill" and " " produce "kvill ", and finally that is combined with "ans" to produce "kvill ans", whose address is assigned to str. Because every operand here is a literal, the compiler folds the whole expression into the single constant "kvill ans" (as in the "ab" + "cd" case above). When the operands are variables, however, each step really does yield a new String at run time, and it is this immutability that produces so many temporary objects. This is why StringBuffer is recommended for repeated concatenation: a StringBuffer can be modified in place.
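As a minimal sketch of the immutability point above (the class name `ImmutabilityDemo` and its helper methods are hypothetical, chosen only for illustration, and StringBuilder stands in for StringBuffer), the following shows that concatenating onto a String yields a new object and leaves the original untouched, while a builder is modified in place:

```java
// Hypothetical demo class; names are illustrative and not from the original article.
class ImmutabilityDemo {

    // Returns true if concat produced a new String and left the original untouched.
    static boolean originalUnchanged() {
        String original = "kv";
        String combined = original.concat("ill"); // a new String object is returned
        return original.equals("kv")
                && combined.equals("kvill")
                && original != combined;
    }

    // Appends in place: every append returns the same StringBuilder object.
    static String buildInPlace() {
        StringBuilder sb = new StringBuilder("kv");
        StringBuilder same = sb.append("ill").append(" ").append("ans");
        return (same == sb) ? sb.toString() : null;
    }

    public static void main(String[] args) {
        System.out.println(originalUnchanged()); // true
        System.out.println(buildInPlace());      // kvill ans
    }
}
```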

The following are some common problems related to String:

Usage and understanding of final with String
final StringBuffer a = new StringBuffer("111");
final StringBuffer b = new StringBuffer("222");
a = b;           // does not compile: a is final and cannot be reassigned
a.append("222"); // compiles: the object a points to may still change

As this shows, final applies only to the reference's "value" (that is, the memory address): it forces the reference to keep pointing to the object it was initialized with, and trying to point it elsewhere causes a compile-time error. final has no say over changes to the object the reference points to.


2. String constants in the code are collected at compile time and placed in the constant pool of the class file, for example "123" or "123" + "456". Expressions containing variables, such as "123" + a, are not included.

3. When the JVM loads a class, it builds the runtime constant pool from the strings in the class file's constant pool. Each character sequence such as "123" yields one instance, which is placed in the constant pool. On older JVMs this instance does not live in the ordinary heap and is not garbage collected (since JDK 7, HotSpot keeps interned strings on the heap). Judging from the constructors in the source code, the value field of this instance is created with new and holds the characters 1, 2, 3, so as I understand it, the character array that value refers to is on the heap. Corrections are welcome if I am wrong.

4. Using String does not necessarily create an object

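When execution reaches a statement containing a double-quoted string, such as String a = "123", the JVM first looks in the constant pool. If the string is there, it returns a reference to that pooled instance; otherwise it creates a new instance and places it in the constant pool. For String a = "123" + b (assume b is "456"), the first part "123" still comes from the constant pool, but the + operator is actually compiled into StringBuilder.append() calls (StringBuffer before JDK 1.5), so a ends up with a reference to a new instance whose value points to a newly allocated character array (holding "123456"), and "123456" does not necessarily exist in the constant pool at that point.

Note: when we define a variable with something like String str = "abc";, we tend to assume that a String object str has been created. Beware of this trap! No object may have been created; str may simply point to an object that already existed. Only new guarantees that a new object is created every time.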

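5. new String always creates an object

When String a = new String("123") executes, the JVM first obtains a reference to an instance via the constant pool, then creates a new String instance on the heap, assigns its value field using the following constructor (taken from an older JDK in which String still had offset and count fields), and finally assigns the instance reference to a: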


public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself. Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

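From this we can see that although a new String instance is created, its value is the same array as the value of the pooled instance; that is, no new character array is allocated to hold "123".

For String a = new String("123" + b), refer back to point 4: "123" + b first produces an instance, and then the constructor above runs on it.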
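A small sketch of the difference between a literal and new String, assuming a hypothetical class name `NewStringDemo`: the literal refers to the pooled instance, new String always yields a distinct object with equal content, and intern() leads back to the pooled instance.

```java
// Hypothetical demo class; the name and method are illustrative only.
class NewStringDemo {

    // Returns { literal == copy, literal.equals(copy), literal == copy.intern() }.
    static boolean[] compare() {
        String literal = "123";            // refers to the pooled instance
        String copy = new String("123");   // always a new object on the heap
        return new boolean[] {
            literal == copy,               // identity comparison
            literal.equals(copy),          // content comparison
            literal == copy.intern()       // intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0]); // false
        System.out.println(r[1]); // true
        System.out.println(r[2]); // true
    }
}
```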

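6. String.intern()

After intern() is called on a String instance, the JVM checks the constant pool. If the pool has no entry for the character sequence held in the instance's value field, for example "123" (note that it checks the character sequence, not the instance itself), the instance is placed in the constant pool. If the sequence "123" is already in the pool, intern() returns a reference to the pooled "123" instance rather than to the current instance, even though the current instance's value is also "123".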


public native String intern();

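The constant pool stored in the .class file is loaded by the JVM at run time and can be extended. String's intern() method is one way to extend it: when a String instance str calls intern(), Java checks whether the constant pool already holds a string constant with the same Unicode content. If so, it returns a reference to it; if not, it adds a string with the same Unicode content as str to the pool and returns a reference to that. An example makes this clear: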


public static void main(String[] args) {
    String s0 = "kvill";
    String s1 = new String("kvill");
    String s2 = new String("kvill");
    System.out.println( s0 == s1 ); // false
    System.out.println( "**********" );
    s1.intern(); // s1.intern() is executed, but its return value is not assigned to s1
    s2 = s2.intern(); // assign the pooled "kvill" reference to s2
    System.out.println( s0 == s1); // false
    System.out.println( s0 == s1.intern() ); // true: s1.intern() returns the pooled "kvill" reference
    System.out.println( s0 == s2 ); // true
  }

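Finally, let me dispel a misunderstanding. Some people say: "The String.intern() method saves a String in a global String table. If a Unicode string with the same value is already in the table, the method returns the address of the existing string; if there is no string with the same value, it registers its own address in the table." If I take this global String table to mean the constant pool, then the last part, "if there is no string with the same value, it registers its own address in the table", is wrong: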


public static void main(String[] args) {    
    String s1 = new String("kvill"); 
    String s2 = s1.intern(); 
    System.out.println( s1 == s1.intern() ); //false
    System.out.println( s1 + " " + s2 ); //kvill kvill
    System.out.println( s2 == s1.intern() ); //true
  }

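In this class the literal "kvill" passed to the constructor is itself a constant, so "kvill" is already in the constant pool by the time s1 is created, and s1 is a separate copy on the heap. Calling s1.intern() therefore returns the pooled "kvill" rather than s1; the original "kvill" outside the pool still exists, so nothing here "registers its own address in the constant pool". (On JDK 6 and earlier, intern() copies a string that is not yet pooled into the pool; since JDK 7, HotSpot instead records a reference to that very instance, so for strings built at run time the quoted claim does hold.)

s1 == s1.intern() being false shows that the original "kvill" still exists; s2 now holds the address of the pooled "kvill", so s2 == s1.intern() is true.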
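To probe the intern() behaviour discussed above with a string that is built at run time rather than written as a literal, here is a hedged sketch (the class `InternDemo` and its methods are hypothetical). Whether `s == s.intern()` prints true depends on the JDK version and on whether an equal string was pooled earlier, so no expected output is given for that line; the canonical-instance property checked by `internIsCanonical()` holds on any JDK.

```java
// Hypothetical demo class; names are illustrative only.
class InternDemo {

    // Builds "kvill" at run time, so the returned object is not the pooled literal.
    static String runtimeKvill() {
        return new StringBuilder("kv").append("ill").toString();
    }

    // Two distinct objects with equal content intern to the same canonical instance.
    static boolean internIsCanonical() {
        String a = runtimeKvill();
        String b = runtimeKvill();
        return a != b
                && a.intern() == b.intern()
                && a.intern().equals(a);
    }

    public static void main(String[] args) {
        String s = runtimeKvill();
        // Result depends on the JDK version and on what is already in the pool.
        System.out.println(s == s.intern());
        System.out.println(internIsCanonical()); // true
    }
}
```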

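What is the difference between StringBuffer and StringBuilder, and when should each be used?

In the JDK implementation, both StringBuffer and StringBuilder extend AbstractStringBuilder. As for thread safety, the synchronized keyword in front of StringBuffer's methods tells most of the story: StringBuffer is thread-safe, StringBuilder is not.

A brief word on how AbstractStringBuilder works: we use StringBuffer and its relatives to make string concatenation in Java more efficient, because concatenating directly with + makes the JVM create multiple String objects, which carries a cost. AbstractStringBuilder keeps the appended text in a char array with an initial size. When an append would exceed the array's current capacity, the array is grown dynamically: a larger block of memory is allocated and the current contents are copied into it. Because reallocating and copying is relatively expensive, each reallocation requests more memory than is currently needed, about twice the current capacity (the JDK uses twice the old capacity plus 2).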
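The growth strategy described above can be sketched as follows. This is a simplified, hypothetical stand-in (the class `SimpleCharBuffer` is invented for illustration and is not the JDK source); it assumes the "double the capacity plus 2" rule and falls back to the exact size needed when doubling is not enough.

```java
import java.util.Arrays;

// Simplified, hypothetical stand-in for AbstractStringBuilder; not the JDK source.
class SimpleCharBuffer {
    private char[] value; // backing storage
    private int count;    // number of chars actually used

    SimpleCharBuffer(int initialCapacity) {
        value = new char[initialCapacity];
    }

    SimpleCharBuffer append(String s) {
        int needed = count + s.length();
        if (needed > value.length) {
            // Grow to about twice the old capacity, or to the exact need if larger,
            // then copy the existing contents into the new array.
            int newCapacity = Math.max(value.length * 2 + 2, needed);
            value = Arrays.copyOf(value, newCapacity);
        }
        s.getChars(0, s.length(), value, count); // copy s into the free tail
        count = needed;
        return this;
    }

    int capacity() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        SimpleCharBuffer buf = new SimpleCharBuffer(4);
        buf.append("abc").append("defgh");
        System.out.println(buf);            // abcdefgh
        System.out.println(buf.capacity()); // 10
    }
}
```

Over-allocating on each growth keeps the number of reallocations logarithmic in the final length, which is the reason the text gives for requesting more memory than is immediately needed.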

StringBuffer was introduced in JDK 1.0

StringBuilder was introduced in JDK 1.5

Since JDK 1.5, concatenation (+) involving string variables has been implemented
with StringBuilder; before that, it was implemented with StringBuffer.

Let's walk through the execution with a simple program:


public class Buffer {
   public static void main(String[] args) {
      String s1 = "aaaaa";
      String s2 = "bbbbb";
      String r = null;
      int i = 3694;
      r = s1 + i + s2;

      // loop on j (the original condition tested i, so the body never ran)
      for (int j = 0; j < 10; j++) {
        r += "23124";
      }
   }
}

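Use the command javap -c Buffer to view its bytecode:

Comparing Listing 1 with Listing 2: in the bytecode of Listing 2, the ldc instruction loads the string "aaaaa" from the constant pool onto the top of the stack, and astore_1 stores "aaaaa" in variable 1; the following instructions work the same way. sipush pushes a short constant (-32768 to 32767) onto the top of the stack, here the constant 3694.

Let's jump to offset 13. Offsets 13 to 17 create a new StringBuffer object and call its initializer. Offsets 20 to 21 first push variable 1 onto the stack with aload_1 (as noted, variable 1 holds the string constant "aaaaa") and then use invokevirtual to call StringBuffer's append method to append "aaaaa"; offsets 24 to 30 do the same for the remaining operands. Finally, at offset 33, StringBuffer's toString method is called to obtain the String result, which astore stores in variable 3.

At this point some may say: "Since the JVM already uses StringBuffer internally to concatenate strings, we no longer need to use StringBuffer ourselves; we can just use +!" Is that so? Of course not. As the saying goes, whatever exists has its reason. Let's continue with the bytecode for the loop that follows.

Offsets 37 to 42 are preparation before entering the for loop; 37 and 38 set j to 0. At offset 44, if_icmpge compares j with 10: if j is greater than or equal to 10, control jumps straight to offset 73, the return statement that exits the method; otherwise the loop body runs, which is the bytecode at offsets 47 to 66. We only need to look at offsets 47 to 51 to see why we should use StringBuffer ourselves for string concatenation in code: every time the + operation executes, the JVM has to create a new StringBuffer object to handle the concatenation, which becomes very costly when many concatenations are involved.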
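A sketch of the alternative the paragraph above argues for, assuming a hypothetical class `LoopConcatDemo` and using StringBuilder (the unsynchronized sibling of StringBuffer) since the example is single-threaded: one builder is created before the loop and reused, instead of a fresh one being created for every += inside it.

```java
// Hypothetical demo class; the name and method signature are illustrative only.
class LoopConcatDemo {

    // Builds s1 + i + s2 followed by `repeats` copies of "23124" with one builder.
    static String build(String s1, int i, String s2, int repeats) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1).append(i).append(s2);
        for (int j = 0; j < repeats; j++) {
            sb.append("23124"); // appends in place, no new builder per iteration
        }
        return sb.toString();   // a single String is created at the end
    }

    public static void main(String[] args) {
        String r = build("aaaaa", 3694, "bbbbb", 10);
        System.out.println(r.length()); // 64
    }
}
```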
