What is the Java garbage collection mechanism?
The author has recently run into many questions about Java's garbage collection mechanism, so I wrote this post to share what garbage collection in Java is. Garbage collection means that when the JVM decides an object no longer needs to exist, it reclaims that object. This raises several questions:
How to determine whether an object needs to be recycled?
How does a typical garbage collection algorithm recycle objects?
What are the typical garbage collectors?
Let's look at these questions one by one.
We start with the first question: how is an object determined to be "garbage"? The garbage collector's job is to reclaim the space occupied by garbage objects so that new objects can use it, so it first has to decide whether an object can be recycled. Objects the program can no longer use only waste JVM memory, and they are cleaned up in the next round of collection.
In Java, objects are accessed through references: to operate on an object, you must go through a reference to it. An obvious and simple approach is therefore to count references. If no reference points to an object, the object cannot be used anywhere else and becomes recyclable. This method is called reference counting.
This method is simple, crude, and very efficient, but the simplicity comes at a price. If objects reference each other in a cycle, this algorithm cannot reclaim them even after every variable that pointed to them has been set to null. Look at the following code:
public class GcTest {
    public Object object = null;

    public static void main(String[] args) {
        GcTest gcTest1 = new GcTest();
        GcTest gcTest2 = new GcTest();
        // Make the two objects refer to each other, forming a cycle.
        gcTest1.object = gcTest2;
        gcTest2.object = gcTest1;
        // Drop the only outside references to both objects.
        gcTest1 = null;
        gcTest2 = null;
    }
}
Although gcTest1 and gcTest2 are now null and the objects they pointed to can never be accessed again, those objects are still referenced through their object fields, forming a reference cycle. Neither reference count is 0, so a collector based on reference counting would never reclaim them.
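To make the counting argument concrete, here is a minimal sketch of a toy reference counter applied to two objects that form a cycle. The `Counted` class, its `refCount` field, and the `setField` helper are hypothetical names invented for this illustration; real JVMs do not expose or maintain counts like this.

```java
// Toy model of reference counting. This is an illustration only,
// not how HotSpot manages memory.
public class RefCountDemo {
    // Hypothetical object with a manually maintained reference count.
    static class Counted {
        int refCount = 0;
        Counted field; // plays the role of GcTest.object
    }

    // Points holder.field at target and updates both counts.
    static void setField(Counted holder, Counted target) {
        if (holder.field != null) {
            holder.field.refCount--;
        }
        holder.field = target;
        if (target != null) {
            target.refCount++;
        }
    }

    // Returns the two counts after the local variables are dropped.
    static int[] countsAfterDroppingLocals() {
        Counted a = new Counted();
        Counted b = new Counted();
        a.refCount++; // local variable gcTest1 refers to a
        b.refCount++; // local variable gcTest2 refers to b
        setField(a, b); // a.field = b
        setField(b, a); // b.field = a
        a.refCount--; // gcTest1 = null
        b.refCount--; // gcTest2 = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        // prints "1 1": neither count reaches 0, so nothing would be freed
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Both counts stay at 1 because each object is still referenced by the other, which is exactly the situation a pure reference-counting collector cannot resolve.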
That is the problem. To solve it, Java uses reachability analysis instead. The basic idea is to start from a set of "GC Roots" objects and search along references. If no path leads from any GC Root to an object, the object is unreachable. Note, however, that an unreachable object is not reclaimed immediately: it must go through at least two marking passes, and only if it has not escaped (become reachable again) during them does it become recyclable. The book "In-depth Understanding of JVM" explains this very carefully. Here the author only briefly introduces the concept of GC Roots; read that book if you want to know more.
The JVM uses the following three kinds of objects as GC Roots when determining whether an object can be recycled (usually you only need to remember the virtual machine stack and static references):
1. Objects referenced from the virtual machine stack (JVM stack), or more precisely from the stack frames in it. Each time a method is invoked, the JVM creates a stack frame for it (containing the operand stack, the local variable table, and a reference to the runtime constant pool). The frame holds the references to all objects used inside the method (and, of course, primitive values). When the method finishes, its frame is popped from the virtual machine stack, so the references to temporarily created objects disappear. No GC Root points to these temporary objects any more, and they are reclaimed during the next GC.
2. Objects referenced by static fields of classes in the method area. A static field belongs to the class itself rather than to any single instance, so it naturally serves as a GC Root: as long as the class exists, the object the field refers to stays alive. Classes themselves can also be reclaimed, as explained later.
3. Objects referenced by the native method stack (Native Stack)
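The reachability analysis described above can be sketched as a graph traversal that starts at the GC Roots. This is a simplified model under stated assumptions: the `Node` class and the `mark` method are hypothetical, and a real collector works on raw heap memory rather than on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReachabilityDemo {
    // Hypothetical heap object that may reference other objects.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns every node that can be reached from the given GC Roots.
    static Set<Node> mark(List<Node> gcRoots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {       // first visit only
                pending.addAll(current.refs); // follow outgoing references
            }
        }
        return marked;
    }

    // Builds root -> a, plus an unreachable cycle b <-> c.
    static boolean[] demo() {
        Node root = new Node("stackLocal");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        root.refs.add(a);
        b.refs.add(c);
        c.refs.add(b);
        Set<Node> live = mark(Collections.singletonList(root));
        return new boolean[] { live.contains(a), live.contains(b), live.contains(c) };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("a reachable: " + result[0]); // true
        System.out.println("b reachable: " + result[1]); // false
        System.out.println("c reachable: " + result[2]); // false
    }
}
```

Unlike reference counting, the cycle between b and c does not keep either object alive, because no path from a root leads to them.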
The following shows how garbage collection handles objects held through soft references (SoftReference) and weak references (WeakReference).
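import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
String str = new String("hello"); // A: strong reference
SoftReference<String> sr = new SoftReference<String>(new String("java")); // B: soft reference
WeakReference<String> wr = new WeakReference<String>(new String("world")); // C: weak reference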
The objects above are reclaimed as follows. A is strongly referenced through str and is not reclaimed while str is still in use. For B, the String is treated as recyclable only when memory runs short. For C, the String is treated as recyclable regardless of memory. In other words, softly referenced objects are reclaimed before the JVM would otherwise throw an OutOfMemoryError (OOM), while weakly referenced objects are reclaimed in the next round of collection no matter what.
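A small experiment, offered as a sketch, can make the difference between reference strengths visible. The class and method names are made up for illustration. Note that System.gc() is only a request, so what the second half of main prints depends on the JVM and is not guaranteed.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceStrengthDemo {
    // While a strong reference exists, a weak reference is never cleared.
    static boolean weakSurvivesWhileStronglyHeld() {
        String strong = new String("world");
        WeakReference<String> weak = new WeakReference<>(strong);
        System.gc(); // only a request to the JVM
        // 'strong' is still used here, so the object must still be alive.
        return weak.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println("held strongly: " + weakSurvivesWhileStronglyHeld());

        // No strong reference is kept to either String below.
        SoftReference<String> soft = new SoftReference<>(new String("java"));
        WeakReference<String> weak = new WeakReference<>(new String("world"));
        System.gc();
        // With plenty of free memory the soft referent usually survives and
        // the weak referent is usually gone, but neither result is guaranteed.
        System.out.println("soft cleared: " + (soft.get() == null));
        System.out.println("weak cleared: " + (weak.get() == null));
    }
}
```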
In general, the JVM will reclaim the following objects:
1. Objects whose reference was explicitly set to null or reassigned to point to a different object, provided no other reference remains.
2. Objects referenced only by local variables, once the method has returned.
3. Objects referenced only through the weak references (WeakReference) mentioned above.
Once it is known which garbage can be recycled, the collector can begin, but this raises another question: how to collect garbage efficiently. The Java Virtual Machine Specification does not prescribe how a garbage collector must be implemented, so each vendor's virtual machine can implement it differently. Taking the most widely used one, HotSpot, as the reference, we only discuss the core ideas of several common garbage collection algorithms here.
1. Mark-Sweep algorithm
This is the most basic garbage collection algorithm, because it is the easiest to implement and the simplest in concept. The Mark-Sweep algorithm has two phases: mark and sweep. The mark phase marks all objects that need to be recycled, and the sweep phase reclaims the space those marked objects occupy. The diagrams (taken from the Internet) illustrate memory before and after Mark-Sweep runs.
All the pictures below are simulated memory blocks. Red represents unused memory blocks, gray represents memory blocks for objects to be recycled, and yellow represents surviving objects
Before recycling
After recycling
The drawback is easy to see. After the marked objects are cleared, free memory is scattered in fragments. When a large object later needs to be allocated, there may be no contiguous block big enough for it, so another garbage collection has to be triggered early to make room.
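The fragmentation problem can be illustrated with a toy heap, sketched here as a character array in which 'L' is a live object, 'G' is an object marked as garbage, and '.' is free space. This encoding and the method names are assumptions made for the example.

```java
public class MarkSweepDemo {
    // Sweep phase: free every cell that was marked as garbage ('G').
    static char[] sweep(char[] heap) {
        char[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] == 'G') {
                result[i] = '.';
            }
        }
        return result;
    }

    // Total number of free cells.
    static int totalFree(char[] heap) {
        int free = 0;
        for (char cell : heap) {
            if (cell == '.') {
                free++;
            }
        }
        return free;
    }

    // Size of the largest contiguous run of free cells.
    static int largestFreeRun(char[] heap) {
        int best = 0;
        int run = 0;
        for (char cell : heap) {
            run = (cell == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        char[] after = sweep("LGLLGGLGL.".toCharArray());
        System.out.println(new String(after));                  // L.LL..L.L.
        System.out.println("free: " + totalFree(after));         // free: 5
        System.out.println("largest: " + largestFreeRun(after)); // largest: 2
    }
}
```

Five cells are free after the sweep, yet an object that needs three contiguous cells still cannot be placed.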
2. Copying algorithm
To address the shortcomings of the Mark-Sweep algorithm, the Copying algorithm was proposed. It divides the available memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block and the used block is cleaned up in one go, so memory fragmentation is much less likely to occur.
Before recycling
After recycling
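The copying step can be sketched with the same kind of toy heap. In this illustration, uppercase letters are surviving objects, 'x' is garbage, and '.' is free space. These conventions and the `collect` method are assumptions for the example, not part of any JVM API.

```java
import java.util.Arrays;

public class CopyingDemo {
    // Copies the survivors from the active half into a fresh, empty half.
    // The caller then discards fromSpace as a whole.
    static char[] collect(char[] fromSpace) {
        char[] toSpace = new char[fromSpace.length];
        Arrays.fill(toSpace, '.');
        int next = 0; // next free slot in the new half
        for (char cell : fromSpace) {
            if (Character.isUpperCase(cell)) {
                toSpace[next++] = cell; // survivors end up side by side
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        char[] toSpace = collect("AxBxxC..".toCharArray());
        System.out.println(new String(toSpace)); // ABC.....
    }
}
```

The free space in the new half is one contiguous block, but a second array of the same size had to be reserved to get there.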
The Copying algorithm keeps half of the memory empty in advance, and during garbage collection the surviving objects are moved to that other half. Memory is not fragmented, but the price is high: half of the space is always unusable, and moving the objects is expensive.
3. Mark-Compact algorithm
To address the shortcomings of the Copying algorithm and make full use of the memory space, the Mark-Compact algorithm was proposed. Its mark phase is the same as in Mark-Sweep, but after marking it does not clean up the recyclable objects directly. Instead, it moves the surviving objects toward one end and then clears all memory beyond the end boundary. The process is shown in the figures below:
Before recycling
After recycling
4. Generational Collection algorithm
The generational collection algorithm is the one most JVM garbage collectors currently use. Its core idea is to divide memory into several areas according to object lifetimes. Typically the heap is divided into the old generation (Tenured Generation) and the new generation (Young Generation). In the old generation, only a small number of objects need to be recycled in each collection. In the new generation, a large number of objects need to be recycled in each collection. The most suitable collection algorithm can therefore be chosen for each generation. You can call System.gc() to request a collection and observe the result.
At present, most garbage collectors use the Copying algorithm for the new generation. Because most objects there are reclaimed in every collection, few objects need to be copied. In practice, however, the new generation is not split in a 1:1 ratio.
Generally speaking, the new generation is divided into one larger Eden space and two smaller Survivor spaces. Eden and one of the Survivor spaces are in use at any time. During a collection, the surviving objects in Eden and that Survivor space are copied to the other Survivor space, and then Eden and the Survivor space just used are cleaned up.
Since only a small number of objects are recycled from the old generation each time, the Mark-Compact algorithm is generally used there.
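The Mark-Compact idea of sliding survivors to one end can be sketched on a toy heap as follows. Uppercase letters stand for marked (surviving) objects, 'x' for garbage, and '.' for free space. The encoding and the `compact` method are assumptions made for this example.

```java
import java.util.Arrays;

public class MarkCompactDemo {
    // Slides every surviving cell toward the start of the heap, then
    // frees everything beyond the boundary in a single step.
    static char[] compact(char[] heap) {
        char[] result = heap.clone();
        int boundary = 0; // end of the compacted, live region
        for (int i = 0; i < result.length; i++) {
            if (Character.isUpperCase(result[i])) {
                result[boundary++] = result[i];
            }
        }
        Arrays.fill(result, boundary, result.length, '.');
        return result;
    }

    public static void main(String[] args) {
        char[] after = compact("AxBxxC.D".toCharArray());
        System.out.println(new String(after)); // ABCD....
    }
}
```

No second memory area is needed here, which suits a generation where most objects survive and copying them elsewhere would waste space.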
Note that there is another generation outside the heap, the permanent generation (Permanent Generation), which stores classes, constants, method descriptions, and so on. Collection in the permanent generation mainly reclaims two things: abandoned constants and useless classes. (HotSpot removed the permanent generation in Java 8 and replaced it with Metaspace.)
The following descriptions of typical garbage collectors are more conceptual. The author does not fully understand them yet, so they are simply collected here to share with you.
The Serial/Serial Old collector is the most basic and oldest collector. It is a single-threaded collector, and when it performs garbage collection, all user threads must be suspended. The Serial collector is a collector for the new generation and uses the Copying algorithm. The Serial Old collector is a collector for the old generation and uses the Mark-Compact algorithm. Its advantage is that it is simple and efficient to implement, but its disadvantage is that it will cause pauses for users.
The ParNew collector is a multi-threaded version of the Serial collector, using multiple threads for garbage collection.
The Parallel Scavenge collector is a multi-threaded (parallel) collector for the new generation that uses the Copying algorithm. Like the previous two collectors, it suspends user threads while it collects. What sets it apart is its goal, which is to achieve a controllable throughput.
Parallel Old is the old generation version of the Parallel Scavenge collector (parallel collector), using multi-threading and Mark-Compact algorithm.
The CMS (Concurrent Mark Sweep) collector is a collector that aims for the shortest collection pause time. It is a concurrent collector that uses the Mark-Sweep algorithm.
The G1 collector is the most cutting-edge development in collector technology. It is a collector for server-side applications and can make full use of multi-CPU and multi-core environments. It is therefore both a parallel and a concurrent collector, and it can establish a predictable pause-time model.
Generally speaking, objects are allocated on the heap, mainly in the Eden Space and From Space of the new generation. In a few cases they are allocated directly in the old generation. If Eden Space and From Space do not have enough room, a GC is triggered. If the object fits in Eden Space and From Space after the GC, it is placed there. During the GC, the surviving objects in Eden Space and From Space are moved to To Space, and then Eden Space and From Space are cleaned. If To Space cannot hold an object during this process, the object is moved to the old generation. After the GC, Eden Space and To Space are the ones in use, and at the next GC the survivors are copied back to From Space. The cycle then repeats. Each time an object survives a GC in the Survivor area, its age increases by 1. By default, once its age reaches 15 it is moved to the old generation.
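The aging and promotion cycle described above can be sketched as a small simulation. Everything here is a simplified model: the `Obj` class, the list-based spaces, and the `minorGc` method are hypothetical, space limits are ignored, and the exact moment at which HotSpot promotes an object differs in detail from this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class MinorGcDemo {
    // Age at which a survivor is promoted; 15 is the default quoted in the text.
    static final int TENURING_THRESHOLD = 15;

    // Hypothetical heap object carrying its GC age.
    static class Obj {
        final String name;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSpace = new ArrayList<>();
    List<Obj> toSpace = new ArrayList<>();
    final List<Obj> oldGen = new ArrayList<>();

    // One simplified minor GC; isLive stands in for reachability analysis.
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSpace);
        for (Obj obj : candidates) {
            if (!isLive.test(obj)) {
                continue; // garbage is simply not copied
            }
            obj.age++;
            if (obj.age >= TENURING_THRESHOLD) {
                oldGen.add(obj);  // old enough: promote
            } else {
                toSpace.add(obj); // otherwise copy to the empty Survivor space
            }
        }
        eden.clear();
        fromSpace.clear();
        // The two Survivor spaces swap roles for the next collection.
        List<Obj> emptied = fromSpace;
        fromSpace = toSpace;
        toSpace = emptied;
    }

    // Counts the minor GCs a permanently live object needs to reach the old generation.
    static int gcsUntilPromoted() {
        MinorGcDemo heap = new MinorGcDemo();
        Obj longLived = new Obj("longLived");
        heap.eden.add(longLived);
        int gcs = 0;
        while (!heap.oldGen.contains(longLived)) {
            heap.eden.add(new Obj("temp" + gcs)); // short-lived garbage
            heap.minorGc(obj -> obj == longLived);
            gcs++;
        }
        return gcs;
    }

    public static void main(String[] args) {
        // prints "promoted after 15 minor GCs"
        System.out.println("promoted after " + gcsUntilPromoted() + " minor GCs");
    }
}
```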
Generally speaking, large objects are allocated directly in the old generation. A large object is one that requires a large amount of contiguous storage space. The most common large object is a large array, such as:
byte[] data = new byte[4*1024*1024];
This usually allocates storage space directly in the old generation.
Of course, the allocation rules are not 100% fixed. It depends on which garbage collector combination and JVM related parameters are currently used.
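Since the behavior depends on the collector combination in use, it can help to check which collectors the running JVM has actually selected. The following sketch uses the standard java.lang.management API; the class name `ActiveCollectors` is made up for this example, and the names it prints vary by JVM version and startup flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Returns the names of the collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined.
            System.out.println(bean.getName() + ": "
                    + bean.getCollectionCount() + " collections");
        }
    }
}
```

Running the program with a different HotSpot flag, for example -XX:+UseSerialGC or -XX:+UseG1GC, should change the names it lists.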