
Detailed explanation of String class in Java

Y2J
2017-05-04 09:48:28

This article gives a detailed explanation of the Java String class. It was collected, organized, and summarized from many sources, and may serve as a reference for readers who need it.

Introductory questions

Among all data types in the Java language, String is a special one, and it is also a knowledge point that often comes up in interviews together with Java memory allocation. This article combines Java memory allocation with an in-depth analysis of several confusing issues about String. The questions below are the ones covered; readers who are already familiar with them can skip this article.

1. What exactly does "Java memory" refer to? Why is this memory divided into areas, and how is it divided? What is the role of each area? How is the size of each area set?

2. Why is String less efficient than StringBuffer or StringBuilder when performing concatenation? What are the connections and differences between StringBuffer and StringBuilder?

3. What do constants mean in Java? What is the difference between String s = "s" and String s = new String("s")?

This article has been collected, organized and summarized from many sources and finally written. If there are any errors, please let me know!

Java Memory Allocation

1. Introduction to JVM

The Java Virtual Machine (JVM) is an abstract computer on which all Java programs run. It is the runtime environment of the Java language and one of Java's most attractive features. The JVM has its own complete hardware-like architecture (processor, stack, registers, and so on) and a corresponding instruction set. It hides the details of the underlying operating system platform, so a Java program only needs to be compiled to target code (bytecode) for the JVM, and it can then run on many platforms without modification.

The job of a running JVM instance is to run one Java program. When a Java program starts, a JVM instance is created; when the program exits, the instance dies with it. If three Java programs run at the same time on one computer, there are three JVM instances, and each program runs in its own instance.

As shown in the figure below, the JVM architecture includes several major subsystems and memory areas:

Garbage Collector: Recycles objects in the heap memory (Heap) that are no longer in use, that is, objects that are no longer referenced.

Class Loader Subsystem (Classloader Sub-System): Besides locating and importing binary class files, it is also responsible for verifying the correctness of imported classes, allocating and initializing memory for class variables, and helping to resolve symbolic references.

Execution Engine: Responsible for executing the instructions contained in the methods of loaded classes.

Runtime Data Area (Java memory allocation area): Also called virtual machine memory or Java memory. While it is running, the virtual machine sets aside a region of the computer's memory to store many things, for example: bytecode, other information obtained from loaded class files, objects created by the program, parameters passed to methods, return values, local variables, and so on.

2. Java memory partition

As we saw in the previous section, the runtime data area is what is meant by Java memory, and a great deal of data is stored there. Without division and management, this memory area would be disorganized. According to the kind of data stored, Java memory is usually divided into five areas: the Program Counter (Program Count Register), the Native Stack, the Method Area, the Stack, and the Heap.

Program Counter (Program Count Register): Also called the PC register. The JVM supports multiple threads running at the same time, and each new thread gets its own PC register (program counter) when it is created. If the thread is executing a Java (non-native) method, the PC register holds the address of the JVM instruction currently being executed. If the method is native, the value of the PC register is undefined. The PC register is wide enough to hold a return address or a native pointer.

Stack: The JVM allocates a stack for each newly created thread. In other words, a Java program runs through operations on the stack. The stack stores the state of the thread in units of frames, and the JVM performs only two operations on it: pushing and popping frames. The method a thread is currently executing is called the current method of that thread, and the frame used by the current method is called the current frame. When a thread invokes a Java method, the JVM pushes a new frame onto the thread's Java stack, and that frame becomes the current frame. While the method executes, the frame holds parameters, local variables, intermediate results, and other data. From the perspective of Java's allocation mechanism, the stack can be understood as a storage area that the operating system creates for a thread when it creates the process or thread (a thread, in operating systems that support multithreading); this area has last-in, first-out behavior. Its related setting parameter:

• -Xss -- Set the maximum size of each thread's stack

Native Stack: Stores the invocation state of native methods.
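The frame-per-call behavior described above can be observed with a small experiment. The following is only a sketch: the class name StackDepthDemo and its methods are made up for illustration, and the depth reached depends on the JVM, the frame size, and the -Xss value.

```java
// Sketch: count how many frames fit on a thread's stack before it overflows.
class StackDepthDemo {
    private static int depth = 0;

    private static void recurse() {
        depth++;      // every call pushes one more frame onto the thread's stack
        recurse();
    }

    /** Recurses until the stack is exhausted and returns the number of calls made. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack is full; the frames are popped as the error propagates.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Calls before StackOverflowError: " + measureDepth());
    }
}
```

Running it once with a small stack (for example java -Xss256k StackDepthDemo) and once with a larger one (for example -Xss2m) should show a noticeably different depth.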


Method Area: When the virtual machine loads a class file, it parses the type information out of the binary data contained in the class file and puts that information (including class information, constants, static variables, and so on) into the method area. This memory area is shared by all threads, as shown in the figure below. The method area contains a special region called the Constant Pool, which is closely tied to the analysis of the String type later on.

        
Heap: The Java heap is the largest piece of memory managed by the Java virtual machine, and it is shared by all threads. Its only purpose is to store object instances. Almost all object instances are allocated here, while the references to them are allocated on the stack. Therefore, executing String s = new String("s") allocates memory in two places: memory for the String object on the heap, and memory for the reference (the memory address of the heap object, that is, a pointer) on the stack, as shown in the figure below.

The Java virtual machine has an instruction for allocating a new object on the heap but no instruction for releasing memory, just as Java code cannot explicitly free an object. The virtual machine itself decides how and when to release the memory occupied by objects that the running program no longer references. Usually it leaves this task to the garbage collector (Garbage Collection). Its related setting parameters:

• -Xms -- Set the initial size of heap memory

• -Xmx -- Set the maximum value of heap memory

• -XX:MaxTenuringThreshold -- Set the maximum number of minor GCs an object can survive in the young generation before it is promoted to the old generation

• -XX:PretenureSizeThreshold -- Set large objects exceeding the specified size to be directly allocated in the old generation
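To check what heap limits a JVM is actually running with, the standard Runtime API can be queried. This is a minimal sketch (the class name HeapSizeDemo is invented here); the reported values correspond only roughly to -Xms and -Xmx, because the JVM may round or adjust them.

```java
// Sketch: read the current and maximum heap size from inside a running program.
class HeapSizeDemo {
    /** Upper bound the heap may grow to (roughly the -Xmx value). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently reserved by the JVM (starts near the -Xms value). */
    static long currentHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("Current heap: " + currentHeapBytes() / mb + " MB");
        System.out.println("Maximum heap: " + maxHeapBytes() / mb + " MB");
        System.out.println("Free in current heap: " + Runtime.getRuntime().freeMemory() / mb + " MB");
    }
}
```

Starting the program with different settings, for example java -Xms64m -Xmx256m HeapSizeDemo, should change the printed numbers accordingly.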

The Java heap is the main area managed by the garbage collector, so it is also called the "GC heap" (garbage-collected heap). Today's garbage collectors basically use generational collection algorithms, so the Java heap can be subdivided into the Young Generation and the Old Generation, as shown in the figure below. The idea of generational collection is to scan and reclaim young objects (young generation) at a higher frequency, which is called minor collection, and to check and reclaim old objects (old generation) much less often, which is called major collection. This way, not every GC has to examine all objects in memory, which leaves more system resources for the application. In other words, when an allocation runs into insufficient memory, the young generation is collected first (Young GC); if that still cannot satisfy the allocation, a GC of the entire heap and the method area is performed (Full GC).

Some readers may ask: isn't there also a permanent generation? Doesn't it belong to the Java heap? Good question. The permanent generation is in fact the method area described above: it stores type information (class information, constants, static variables, and so on) loaded by the class loader when the JVM initializes. This information is long-lived, and GC generally does not clean PermGen space while the main program is running, so an application that loads many classes is likely to run into PermGen space errors. Its related setting parameters (these apply to the HotSpot JVM up to Java 7; Java 8 replaced the permanent generation with Metaspace, and the flags no longer apply there):
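Which collectors a JVM uses for minor and major collections, and how often they have run, can be inspected through the standard java.lang.management API. The sketch below uses an invented class name, GcInfoDemo; the collector names it prints depend on the JVM version and the GC selected, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: list the garbage collectors of the running JVM with their run counts.
class GcInfoDemo {
    static List<GarbageCollectorMXBean> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so that a young collection is likely.
        for (int i = 0; i < 200_000; i++) {
            byte[] garbage = new byte[1024];
        }
        for (GarbageCollectorMXBean gc : collectors()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```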

• -XX:PermSize --Set the initial size of the Perm area

• -XX:MaxPermSize --Set the maximum value of the Perm area

Young Generation (New Generation): Divided into the Eden area and the Survivor area, and the Survivor area is further divided into From Space and To Space. Objects are initially allocated in Eden; by default, From Space and To Space are equal in size. During a Minor GC, the JVM copies the surviving objects in Eden to the Survivor area, and copies objects in the Survivor area to the Tenured (old) area once they have survived enough collections (see -XX:MaxTenuringThreshold above). Splitting Survivor into From Space and To Space improves GC efficiency, because object reclamation and object promotion can be handled separately. Two parameters control the size of the young generation:

• -Xmn -- Set the memory size of the new generation.

• -XX:SurvivorRatio -- Set the size ratio between Eden and Survivor space
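As a worked example of how these two parameters interact: in HotSpot, -XX:SurvivorRatio is the ratio of Eden to one Survivor space, so the young generation consists of (ratio + 2) equal parts. The sketch below only does this arithmetic; the class and method names are made up, and a real JVM may align or adaptively resize the spaces, so actual sizes can differ.

```java
// Sketch: nominal Eden/Survivor split for a given -Xmn and -XX:SurvivorRatio.
class YoungGenLayoutDemo {
    /** Returns {eden, oneSurvivorSpace} in the same unit as youngSize. */
    static long[] layout(long youngSize, int survivorRatio) {
        // Young generation = Eden + From + To = (ratio + 2) parts of equal size.
        long survivor = youngSize / (survivorRatio + 2);
        long eden = youngSize - 2 * survivor;
        return new long[] { eden, survivor };
    }

    public static void main(String[] args) {
        long[] sizes = layout(100, 8);   // e.g. -Xmn100m -XX:SurvivorRatio=8
        System.out.println("Eden=" + sizes[0] + " MB, each Survivor=" + sizes[1] + " MB");
        // prints: Eden=80 MB, each Survivor=10 MB
    }
}
```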

Old Generation: When the old area runs out of space, the JVM performs a major collection on it. If, after a full garbage collection, the Survivor and old areas still cannot hold the objects copied from Eden, so that the JVM cannot make room for new objects in Eden, an "out of memory" error (OutOfMemoryError) occurs.
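The areas discussed in this section can be listed for a running JVM through the standard java.lang.management API. This is a sketch with an invented class name, MemoryPoolDemo; the pool names (Eden, Survivor, Old Gen, Metaspace or Perm Gen, and so on) vary with the JVM version and the collector in use, so no specific output is assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

// Sketch: print every memory pool of the running JVM with its type and usage.
class MemoryPoolDemo {
    static List<MemoryPoolMXBean> pools() {
        return ManagementFactory.getMemoryPoolMXBeans();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : pools()) {
            MemoryUsage usage = pool.getUsage();   // may be null for an invalid pool
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            // getType() distinguishes HEAP pools from NON_HEAP pools such as the method area.
            System.out.println(pool.getType() + " | " + pool.getName() + " | used=" + usedKb + " KB");
        }
    }
}
```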

In-depth analysis of the String type

Let's start with Java data types. They generally fall into two categories (there are other ways to classify them): primitive types and reference types. A variable of a primitive type holds the value itself, while a variable of a reference type holds a reference to an actual object; its value is usually the memory address of that object.
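The practical consequence of this distinction is easiest to see in code. The following minimal sketch (class and method names are illustrative only) copies a primitive variable and a reference variable and then modifies the copy.

```java
// Sketch: copying a primitive copies the value; copying a reference shares the object.
class ValueVsReferenceDemo {
    static int primitiveCopy() {
        int a = 1;
        int b = a;     // b receives its own copy of the value
        b = 2;
        return a;      // a is unaffected
    }

    static int sharedObject() {
        int[] x = { 1 };
        int[] y = x;   // y receives a copy of the reference, not of the array
        y[0] = 2;
        return x[0];   // the change is visible through x as well
    }

    public static void main(String[] args) {
        System.out.println(primitiveCopy());  // prints 1
        System.out.println(sharedObject());   // prints 2
    }
}
```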

1. The essence of String

Open the source code of String and you will find this passage in the class comment: "Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared." This sums up one of the most important features of String: a String is a constant whose value is immutable, and it is therefore thread-safe (it can be shared).

Next, the String class is declared with the final modifier, which gives its second characteristic: the String class cannot be inherited (subclassed).

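The following member variable definitions of the String class show, at the implementation level, that a String value is immutable. (These fields come from older JDK versions; the field layout differs in later releases.)

    private final char value[];   // the characters of the string
    private final int count;      // the number of characters in use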

Now look at the concat method of the String class. Its first step is to enlarge the storage behind the member variable value: a larger character array buf is created. The second step copies the characters of the original value into buf and then copies the characters of the string being appended, so that buf holds the concatenated result. Here is the crux of the matter. If value were not final, the method could simply point value at buf and return this, with no need for a new String object. But value is final, so it cannot be pointed at the newly created array buf. What then? "return new String(0, count + otherLen, buf);" is the last statement of concat in the String class, and it returns a new String object. Now the truth is revealed!

Summary: A String is essentially a character array with two characteristics: 1. the class cannot be inherited; 2. it is immutable.
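The behavior of concat described above can be checked without reading the JDK source. The sketch below uses an invented class name, ImmutableStringDemo; it relies only on the documented behavior that concat with a non-empty argument returns a new String.

```java
// Sketch: concat leaves the original String untouched and returns a different object.
class ImmutableStringDemo {
    static boolean concatLeavesOriginalUntouched() {
        String original = "my";
        String joined = original.concat("String");
        return original.equals("my")          // the original value did not change
                && joined.equals("myString")  // the result holds the concatenation
                && original != joined;        // and it is a separate object
    }

    public static void main(String[] args) {
        String original = "my";
        String joined = original.concat("String");
        System.out.println(original);            // prints my
        System.out.println(joined);              // prints myString
        System.out.println(original == joined);  // prints false
    }
}
```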

2. Ways to define a String

Before discussing how Strings are defined, we first need the concept of the constant pool, which was already mentioned in the description of the method area. Here is a slightly more formal definition.


The constant pool refers to data that is determined at compile time and saved in the compiled .class file. It includes constants of classes, methods, interfaces, and so on, as well as string constants. The constant pool is also dynamic: new constants can be put into it at run time, and the intern() method of the String class is a typical use of this feature. The virtual machine maintains a constant pool for each loaded type. The pool is an ordered collection of the constants used by the type, including direct constants (string, integer, and float constants) and symbolic references to other types, fields, and methods (how do these differ from object references? Readers can work that out themselves).
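The run-time side of the pool can be illustrated with intern(). This is a minimal sketch under the documented contract of String.intern() (it returns the canonical pooled string equal to the receiver); the class name InternDemo and the helper buildAtRuntime are invented for illustration.

```java
// Sketch: a string built at run time is not the pooled literal until intern() is used.
class InternDemo {
    /** Builds "myString" at run time, so the result is a fresh object, not the literal. */
    static String buildAtRuntime() {
        return new StringBuilder("my").append("String").toString();
    }

    public static void main(String[] args) {
        String literal = "myString";       // resolved to the pooled string
        String built = buildAtRuntime();   // equal content, different object

        System.out.println(built.equals(literal));     // prints true
        System.out.println(built == literal);          // prints false
        System.out.println(built.intern() == literal); // prints true
    }
}
```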


There are three ways in total to define a String:

• Use the keyword new, such as: String s1 = new String("myString");

• Direct definition, such as: String s1 = "myString";

I won't go into details on the remaining way here.

The first way, defining through the keyword new, proceeds as follows. During compilation, the string constant pool is checked for "myString". If it is not there, space is set aside in the constant pool to store "myString"; if it is there, no new space is needed. This guarantees that the constant pool holds only one "myString" constant and saves memory. Then space is allocated on the heap for the new String instance, and a slot named "s1" is created on the stack whose value is the memory address of that heap instance. In effect, the reference s1 points to the newly created String instance.

Now we arrive at the murkiest part: what is the relationship between the new instance on the heap and "myString" in the constant pool? We will come back to this question after analyzing the second way of definition.

The second way, direct definition, proceeds as follows. During compilation, the string constant pool is checked for "myString". If it is not there, space is set aside in the constant pool to store "myString"; if it is there, no new space is needed. Then a slot named "s1" is created on the stack, whose value is the memory address of "myString" in the constant pool. So what is the difference between a string constant in the constant pool and a String object on the heap? And why can a directly defined string also call the various methods of a String object?

With these questions in mind, let us discuss the relationship between String objects on the heap and String constants in the constant pool. Please remember that this is only a discussion, because I am also rather hazy on this topic.
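The observable difference between the two ways of definition can be shown with reference comparison. The sketch below (class name LiteralVsNewDemo is illustrative) demonstrates only what == and equals report; it does not by itself settle how the pool is laid out internally.

```java
// Sketch: identical literals share one object; new String(...) always creates another.
class LiteralVsNewDemo {
    static boolean literalsShareOneObject() {
        String a = "myString";
        String b = "myString";
        return a == b;
    }

    static boolean newCreatesDistinctObject() {
        String a = "myString";
        String c = new String("myString");
        return a != c && a.equals(c);   // different objects, same content
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());    // prints true
        System.out.println(newCreatesDistinctObject());  // prints true
    }
}
```

This is also why string content should be compared with equals rather than ==.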

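First conjecture: Because a directly defined string can also call the various methods of a String object, we may assume that what is created in the constant pool is in fact also a String instance (object). For String s1 = new String("myString");, a String instance is first created in the constant pool at compile time, and then a clone of it is stored on the heap, with the reference s1 pointing to the heap instance. At this point the pooled instance is not referenced. When String s1 = "myString"; is executed next, an instance for "myString" already exists in the pool, so s1 points directly at it; otherwise an instance is first created in the pool and s1 then points to it. As shown in the figure below:

This conjecture holds that a string constant in the constant pool is essentially a String instance, and that it is related to the String instance on the heap by cloning.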

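The second conjecture is the one most often put forward online, but the reasoning is never clear and some points cannot be explained. The following is quoted from the article "JAVA String对象和字符串常量的关系解析" (An analysis of the relationship between Java String objects and string constants).

During the resolution phase, when the virtual machine encounters the string constant "myString", it looks it up in an internal list of string constants. If it is not found, the virtual machine creates on the heap a String object s1 containing the character sequence [myString], and then saves the character sequence and the corresponding String object as a name-value pair ( [myString], s1 ) in the internal string constant list. As shown in the figure below:

If the virtual machine later encounters another identical string constant myString, it finds the same character sequence in the internal list and returns a reference to the corresponding String object. The key to maintaining this internal list is that any given character sequence appears in it only once.

For example, with String s2 = "myString", s2 obtains at run time the reference to s1 from the internal string constant list, so s2 and s1 point to the same String object.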

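This conjecture has a fairly obvious problem, located at the passage marked in red in the original. The proof is simple: every Java developer should know the result of running the following code.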

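    String s1 = new String("myString");
    String s2 = "myString";
    // By the reasoning above this would print true, but the actual result is false:
    // s1 points to the String object on the heap, while s2 points to the String constant in the constant pool.
    System.out.println(s1 == s2);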

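Although that passage is not very convincing, the article does mention one thing, the string constant list, which may be the key to explaining this problem.

For the three questions raised in this article, only conjectures have been offered here; please verify the details yourself!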

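• What is the relationship between the instance created with new on the heap and "myString" in the constant pool?

• What is the difference between a string constant in the constant pool and a String object on the heap?

• Why can a directly defined string also call the various methods of a String object?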
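One small experiment that bears on the last question: at the language level, a string literal is itself a reference to an instance of java.lang.String, which is why methods can be called on it. The sketch below (class name LiteralIsObjectDemo is invented) shows only this observable fact and leaves the internal storage questions open.

```java
// Sketch: a string literal evaluates to an ordinary String object.
class LiteralIsObjectDemo {
    static Class<?> classOfLiteral() {
        return "myString".getClass();
    }

    public static void main(String[] args) {
        System.out.println(classOfLiteral());       // prints class java.lang.String
        System.out.println("myString".length());    // prints 8
    }
}
```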

3. Connections and differences between String, StringBuffer and StringBuilder

The essence of String was analyzed above; now a few words about StringBuffer and StringBuilder.

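Both StringBuffer and StringBuilder extend the abstract class AbstractStringBuilder. Like String, this abstract class defines char[] value and int count, but unlike in String they are not marked final. We can therefore conclude that String, StringBuffer, and StringBuilder are all essentially character arrays. The difference is that in a concatenation, String returns a new String instance every time, whereas the append methods of StringBuffer and StringBuilder return this directly. That is why, for large numbers of string concatenations, StringBuffer and StringBuilder are recommended over String. So when should StringBuffer be used, and when StringBuilder?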
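The difference in concatenation behavior can be sketched as follows. The class and method names are made up for illustration; the example shows that append hands back the same builder, while repeated += on a String produces a new String object on each iteration even though the final text is the same.

```java
// Sketch: append() returns the builder itself; String concatenation creates new objects.
class AppendDemo {
    static boolean appendReturnsThis() {
        StringBuilder sb = new StringBuilder();
        return sb.append("a") == sb;   // same object, modified in place
    }

    static String joinWithString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;                    // each iteration yields a new String
        }
        return s;
    }

    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);              // one builder reused for every step
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendReturnsThis());   // prints true
        System.out.println(joinWithString(5));     // prints 01234
        System.out.println(joinWithBuilder(5));    // prints 01234
    }
}
```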

As for the difference between StringBuffer and StringBuilder, open their source code; the implementations of their append() methods are shown below.

             

The first picture above is the implementation of append() in StringBuffer, and the second is the implementation in StringBuilder. The difference is clear at a glance: StringBuffer puts the synchronized modifier on the method, which provides synchronization and makes it usable in a multithreaded environment, at the cost of lower execution efficiency. Therefore, use StringBuffer for string concatenation in a multithreaded environment, and use the more efficient StringBuilder in a single-threaded environment.
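The effect of the synchronized append can be sketched with two threads writing to one shared StringBuffer. The class name SynchronizedAppendDemo and its method are invented for illustration. Because every append is synchronized, no character is lost; replacing StringBuffer with StringBuilder in this sketch would make the result unpredictable.

```java
// Sketch: two threads append to one StringBuffer; synchronization keeps every append.
class SynchronizedAppendDemo {
    /** Appends perThread characters from each of two threads and returns the final length. */
    static int appendFromTwoThreads(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x');   // synchronized inside StringBuffer
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the workers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(10_000));   // prints 20000
    }
}
```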
