GC Algorithm Garbage Collector
Overview
Garbage Collection Garbage Collection is often called "GC". It was born in MIT's Lisp language in 1960. After more than half a century, it is now very mature.
In jvm, the program counter, virtual machine stack, and local method stack are all born and destroyed with the thread. The stack frame is pushed and popped as the method enters and exits, realizing automatic memory cleaning. Therefore, our memory garbage collection is mainly concentrated in the Java heap and method area. During the running of the program, the allocation and use of this part of memory are dynamic.
Object survival judgment
Generally there are two ways to judge whether an object is alive. Methods:
Reference counting: Each object has a reference count attribute. When a reference is added, the count increases by 1, when the reference is released, the count decreases by 1. When the count reaches 0, it can be recycled. This method is simple and cannot solve the problem of objects circularly referencing each other.
Reachability Analysis: Search downward from GC Roots, and the path traveled by the search is called a reference chain. When an object does not have any reference chain connected to GC Roots, it proves that the object is unavailable. Unreachable object.
In Java language, GC Roots include:
Objects referenced in the virtual machine stack.
The object referenced by the class static property entity in the method area.
The object referenced by the constant in the method area.
Object referenced by JNI in the local method stack.
Garbage Collection Algorithm
Mark-Sweep Algorithm
"Mark-Sweep" algorithm, as its name suggests, the algorithm is divided into two stages: "mark" and "sweep": first mark out All objects that need to be recycled will be recycled uniformly after the marking is completed. The reason why it is said to be the most basic collection algorithm is that subsequent collection algorithms are based on this idea and improve its shortcomings.
Its main disadvantages are two: one is the efficiency problem, the marking and clearing processes are not very efficient; the other is the space problem. After marking and clearing, a large number of discontinuous memory fragments will be generated. Too many space fragments may cause , when the program needs to allocate larger objects in the future, it cannot find enough contiguous memory and has to trigger another garbage collection action in advance.
Copying Algorithm
"Copying" (Copying) collection algorithm, it divides the available memory into two equal-sized blocks according to capacity, and only uses one of them at a time. When this block of memory runs out, copy the surviving objects to another block, and then clean up the used memory space at once.
This allows one block of memory to be recycled every time. There is no need to consider complex situations such as memory fragmentation when allocating memory. Just move the top pointer of the heap and allocate memory in order. It is simple to implement and efficient to run. It's just that the cost of this algorithm is to reduce the memory to half of its original size, and continuous copying of long-lived objects will lead to reduced efficiency.
Mark-Compression Algorithm
The copy collection algorithm will perform more copy operations when the object survival rate is high, and the efficiency will become lower. More importantly, if you don't want to waste 50% of the space, you need to have additional space for allocation guarantee to cope with the extreme situation where all objects in the used memory are 100% alive, so this method generally cannot be directly used in the old generation. algorithm.
Based on the characteristics of the old generation, someone has proposed another "Mark-Compact" algorithm. The marking process is still the same as the "Mark-Clear" algorithm, but the subsequent steps are not to directly clean up the recyclable objects. Instead, all surviving objects are moved to one end, and then the memory outside the end boundary is directly cleaned up
Generational collection algorithm
The basic assumption of GC generation: the life cycle of most objects is very short. Short survival time.
The "Generational Collection" algorithm divides the Java heap into the new generation and the old generation, so that the most appropriate collection algorithm can be used according to the characteristics of each generation. In the new generation, every time a garbage collection is performed, a large number of objects are found to have died, and only a few survive. Then a copy algorithm is used, and the collection can be completed only by paying the cost of copying a small number of surviving objects. In the old generation, because the object survival rate is high and there is no extra space to guarantee its allocation, the "mark-clean" or "mark-clean" algorithm must be used for recycling.
Garbage Collector
If the collection algorithm is the methodology of memory recycling, the garbage collector is the specific implementation of memory recycling
Serial Collector
The serial collector is the oldest, most stable and efficient The collector may produce long pauses and only use one thread to collect. The new generation and the old generation use serial recycling; the new generation copy algorithm, the old generation mark-compression; Stop The World (service suspension) will be stopped during the garbage collection process
Parameter control: -XX:+UseSerialGC Serial collector
ParNew Collector
ParNew Collector is actually a multi-threaded version of Serial Collector. New generation parallel, old generation serial; new generation copy algorithm, old generation mark-compression
Parameter control: -XX:+UseParNewGC ParNew collector
-XX:ParallelGCThreads Limit the number of threads
Parallel Collector
Parallel Scavenge collector is similar to ParNew collector. Parallel collector pays more attention to the throughput of the system. The adaptive adjustment strategy can be turned on through parameters. The virtual machine will collect performance monitoring information based on the current operating conditions of the system and dynamically adjust these parameters to provide the most appropriate pause time or maximum throughput; the GC time can also be controlled through parameters to not exceed How many milliseconds or ratio; new generation copy algorithm, old generation mark-compression
Parameter control: -XX:+UseParallelGC Use Parallel collector + old generation serial
Parallel Old collector
Parallel Old is Parallel Scavenge collection An old generation version of the server, using multi-threading and a "mark-and-sort" algorithm. This collector only started to provide
parameter control in JDK 1.6: -XX:+UseParallelOldGC Use Parallel collector + old generation parallel
CMS collector
CMS (Concurrent Mark Sweep) collector is a Get the collector that targets the shortest collection pause time. At present, a large part of Java applications are concentrated on the servers of Internet websites or B/S systems. Such applications pay special attention to the response speed of the service and hope that the system pause time will be the shortest to provide users with a better experience.
It can be seen from the name (including "Mark Sweep") that the CMS collector is implemented based on the "mark-sweep" algorithm. Its operation process is more complicated than the previous collectors. The entire process Divided into 4 steps, including:
Initial mark (CMS initial mark)
Concurrent mark (CMS concurrent mark)
Remark (CMS remark)
Concurrent sweep (CMS concurrent sweep)
Among them, initial mark, re-mark Marking both steps still requires "Stop The World". The initial marking only marks the objects that GC Roots can directly associate with, and it is very fast. The concurrent marking phase is the process of GC Roots Tracing, and the re-marking phase is to correct the marking caused by the user program continuing to operate during the concurrent marking period. For the marking record of the part of the object that has changed, the pause time in this phase is generally slightly longer than the initial marking phase, but much shorter than the concurrent marking time.
Since the collector thread can work together with the user thread during the longest concurrent marking and concurrent clearing processes in the entire process, in general, the memory recycling process of the CMS collector is executed concurrently with the user thread. Old generation collector (ParNew is used in the new generation)
Advantages: concurrent collection, low pauses
Disadvantages: generating a lot of space fragments, the concurrent phase will reduce throughput
Parameter control: -XX:+UseConcMarkSweepGC Use CMS collector
-XX:+ UseCMSCompactAtFullCollection Full GC后,进行一次碎片整理;整理过程是独占的,会引起停顿时间变长
-XX:+CMSFullGCsBeforeCompaction 设置进行几次Full GC后,进行一次碎片整理
-XX: ParallelCMSThreads Set the number of CMS threads (generally approximately equal to the number of available CPUs)
G1 Collector
G1 is one of the most cutting-edge achievements of current technology development. The mission given to it by the HotSpot development team is to replace the CMS collector released in JDK1.5 in the future. Compared with the CMS collector, the G1 collector has the following characteristics:
1. Space integration. The G1 collector uses a mark sorting algorithm and will not generate memory space fragmentation. When allocating large objects, the next GC will not be triggered in advance because contiguous space cannot be found.
2. Predictable pauses, which is another major advantage of G1. Reducing pause time is a common focus of G1 and CMS. However, in addition to pursuing low pauses, G1 can also establish a predictable pause time model, allowing users to The author explicitly specifies that within a time segment of length N milliseconds, the time spent on garbage collection shall not exceed N milliseconds. This is almost a characteristic of the real-time Java (RTSJ) garbage collector.
The garbage collector mentioned above only collects the entire new generation or old generation, but G1 is no longer like this. When using the G1 collector, the memory layout of the Java heap is very different from other collectors. It divides the entire Java heap into multiple independent regions (Regions) of equal size. Although the concepts of the new generation and the old generation are still retained, But the new generation and the old generation are no longer physically separated. They are both a collection of partial (which may be discontinuous) Regions.
G1’s new generation collection is similar to ParNew. When the new generation occupancy reaches a certain proportion, collection begins. Similar to CMS, there will be a short pause when the G1 collector collects old generation objects.
Collection steps:
1. Marking phase, first initial mark (Initial-Mark), this phase is paused (Stop the World Event), and a normal Mintor GC will be triggered. Corresponding to GC log: GC pause (young) (initial-mark)
2, Root Region Scanning, the survivor area will be recycled during program running (survive to the old generation), this process must be completed before young GC.
3. Concurrent Marking, performs concurrent marking on the entire heap (executed concurrently with the application), this process may be interrupted by young GC. During the concurrent marking phase, if all objects in the area object are found to be garbage, the area will be recycled immediately (X in the picture). At the same time, during the concurrent marking process, the object activity of each region (the proportion of surviving objects in the region) is calculated.
4. Remark, there will be a short pause (STW). The re-marking phase is used to collect new garbage generated by the concurrent marking phase (the concurrent phase runs together with the application); G1 uses an initial snapshot algorithm that is faster than CMS: snapshot-at-the-beginning (SATB).
5. Copy/Clean up, multi-thread clearing inactive objects, there will be STW. G1 copies the surviving objects in the recycling area to the new area, clears the Remember Sets, and concurrently clears the recycling area and returns it to the free area linked list.
6. After the copy/clear process. Active objects in the recycling area have been concentrated into dark blue and dark green areas.
Commonly used collector combinations