
Things about Java GC (2)

黄舟
2017-02-22 10:09:18

Collection algorithm

The main garbage collection algorithms are mark-sweep, copying, and mark-compact.

1. Mark-sweep algorithm

First mark the objects to be reclaimed, then sweep them once marking is complete.

Disadvantages of the algorithm: efficiency, because neither marking nor sweeping is efficient; and space, because collection leaves a large number of memory fragments, which makes it hard to allocate large objects.
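
The two phases can be sketched on a toy heap. This is only an illustration: the `Obj` class, the array standing in for the heap, and the method names are assumptions made for the example, not HotSpot internals. Note how the sweep leaves holes between the survivors, which is the fragmentation problem described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy mark-sweep over an array "heap" (hypothetical model, not HotSpot code). */
final class MarkSweepSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark phase: flag every object reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: free unmarked slots in place; survivors stay where they are. */
    static int sweep(Obj[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;   // reset for the next cycle
            } else {
                heap[i] = null;           // leaves a hole in the heap
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(c);                    // only a and c are reachable from the root a
        Obj[] heap = {a, b, c, d};
        mark(Arrays.asList(a));
        System.out.println("freed slots: " + sweep(heap));
    }
}
```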

2. Copying algorithm

The copying algorithm divides the available memory into two equal-sized blocks, A and B, and uses only one of them at a time. When A is used up, the surviving objects are copied to B and all of A is cleared. This makes marking more efficient, because only the surviving objects need to be marked, and it avoids memory fragmentation, at the cost of halving the usable memory.
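
A minimal sketch of the two-block scheme, assuming a toy heap in which each block is an array of slots. The class and field names (`CopyingSketch`, `from`, `to`, `top`) are invented for the example. Survivors land contiguously at the start of the other block, so there are no holes, but only half of the slots are ever usable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy two-block copying collector (hypothetical model, not HotSpot code). */
final class CopyingSketch {
    static final class Obj {
        final List<Obj> refs = new ArrayList<>();
    }

    Obj[] from;   // the block currently in use (A)
    Obj[] to;     // the idle block (B)
    int top;      // next free slot in the active block

    CopyingSketch(int halfSize) {
        from = new Obj[halfSize];
        to = new Obj[halfSize];
    }

    Obj allocate() {
        if (top == from.length) {
            throw new IllegalStateException("active block is full; collect first");
        }
        Obj o = new Obj();
        from[top++] = o;
        return o;
    }

    /** Copies everything reachable from the roots into the idle block, then swaps the blocks. */
    void collect(List<Obj> roots) {
        Set<Obj> copied = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        int next = 0;
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!copied.add(o)) continue;   // already copied
            to[next++] = o;                 // survivors are packed contiguously
            pending.addAll(o.refs);
        }
        Arrays.fill(from, null);            // clear the old block in one go
        Obj[] tmp = from;
        from = to;
        to = tmp;
        top = next;
    }
}
```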

3. Mark-compact algorithm

In the old generation the object survival rate is high, so the copying algorithm is very inefficient there. The mark-compact algorithm marks all live objects, moves them toward one end of the memory, and then directly clears everything beyond the boundary.
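
The compaction step can be sketched as follows, assuming the mark phase has already produced a `marked` array for a toy array heap (both are assumptions of the example, not HotSpot structures). Live slots slide toward index 0 and everything past the boundary is cleared.

```java
import java.util.Arrays;

/** Toy compaction step of mark-compact (hypothetical model, not HotSpot code). */
final class MarkCompactSketch {
    /**
     * Slides marked objects toward index 0, preserving their order,
     * clears everything past the boundary, and returns the boundary index.
     */
    static int compact(Object[] heap, boolean[] marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[boundary++] = heap[i];
            }
        }
        Arrays.fill(heap, boundary, heap.length, null);
        Arrays.fill(marked, false);   // marks are stale once objects have moved
        return boundary;
    }

    public static void main(String[] args) {
        Object[] heap = {"a", "b", "c", "d"};
        boolean[] marked = {true, false, true, false};
        int boundary = compact(heap, marked);
        System.out.println(boundary + " live objects: " + Arrays.toString(heap));
    }
}
```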

Object marking process

During reachability analysis, to accurately find the objects reachable from GC Roots, the entire execution engine must appear frozen at a single point in time: all running threads are suspended so that object reference relationships cannot keep changing.

How to quickly enumerate GC Roots?

GC Roots are found mainly in global references (constants or class static fields) and in execution contexts (references in local variable tables). In many applications the method area alone is several hundred megabytes, so finding the roots by traversing all of it would be very inefficient.

HotSpot implements this with a set of data structures called OopMaps. When class loading completes, HotSpot computes what type of data sits at each offset within an object and stores it in an OopMap. Code compiled by the JIT also records which stack locations and registers hold references. When GC occurs, the collector can find the references quickly by scanning the OopMap data.
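
The idea of precomputing where references live can be illustrated with a toy model. Here a `BitSet` plays the role of the map and an `Object[]` plays the role of an object's slots; both are assumptions of this sketch and say nothing about HotSpot's actual OopMap layout.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy "which slots hold references" map (hypothetical model, not HotSpot's OopMap format). */
final class OopMapSketch {
    /** Built once, e.g. when a class is loaded: bit i is set when slot i holds a reference. */
    static BitSet buildMap(boolean[] slotIsReference) {
        BitSet map = new BitSet(slotIsReference.length);
        for (int i = 0; i < slotIsReference.length; i++) {
            if (slotIsReference[i]) map.set(i);
        }
        return map;
    }

    /** At GC time, visit only the slots the map flags instead of inspecting every slot. */
    static List<Object> references(Object[] slots, BitSet map) {
        List<Object> found = new ArrayList<>();
        for (int i = map.nextSetBit(0); i >= 0; i = map.nextSetBit(i + 1)) {
            if (slots[i] != null) found.add(slots[i]);
        }
        return found;
    }
}
```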

How to perform GC safely?

A running thread can stop for GC only when it reaches a Safe Point.

With the OopMap data structure, HotSpot can enumerate GC Roots quickly. However, HotSpot does not generate an OopMap for every instruction; it records this information only at Safe Points.

So the choice of Safe Points matters. Too few, and GC may have to wait too long; too many, and runtime performance may suffer. Most instructions execute very quickly, so Safe Points are usually placed where the program may run for a long time, such as method calls, loop jumps, and exception jumps.

For more information about Safe Point, you can read this article: JVM’s Stop The World, Safe Point, Dark Underground World

When GC occurs, how do all threads run to the nearest Safe Point and pause?

When GC occurs, threads are not interrupted directly. Instead, a flag is set, and each thread polls that flag when it reaches a Safe Point. If the flag is true, the thread suspends itself.
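
The polling scheme can be sketched with ordinary Java threads. This is an application-level analogy under stated assumptions, not the JVM's mechanism: `poll()` stands in for the check the JVM places at a Safe Point, and `pauseAll()` stands in for the collector requesting a pause. All names are invented for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy flag-polling pause (hypothetical analogy, not the JVM's safepoint code). */
final class SafepointPollSketch {
    private final Object lock = new Object();
    private volatile boolean pauseRequested;   // the flag set when GC wants to run
    private int parked;                        // threads currently stopped at a Safe Point

    /** Called by worker threads at a Safe Point, e.g. a loop back-edge. */
    void poll() throws InterruptedException {
        if (!pauseRequested) return;           // fast path: a single volatile read
        synchronized (lock) {
            parked++;
            lock.notifyAll();                  // tell the collector this thread has stopped
            while (pauseRequested) lock.wait();
            parked--;
        }
    }

    /** Called by the collector: set the flag, wait for every worker to park, work, resume. */
    void pauseAll(int workers, Runnable gcWork) throws InterruptedException {
        synchronized (lock) {
            pauseRequested = true;
            while (parked < workers) lock.wait();
            gcWork.run();
            pauseRequested = false;
            lock.notifyAll();
        }
    }

    /** Starts the workers, pauses them once, and returns how many were parked during the pause. */
    static int demo(int workers) {
        SafepointPollSketch vm = new SafepointPollSketch();
        AtomicBoolean stop = new AtomicBoolean();
        AtomicInteger parkedDuringGc = new AtomicInteger(-1);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (!stop.get()) vm.poll();   // the loop back-edge is the Safe Point
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            vm.pauseAll(workers, () -> parkedDuringGc.set(vm.parked));
            stop.set(true);
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            stop.set(true);
            Thread.currentThread().interrupt();
        }
        return parkedDuringGc.get();
    }
}
```

Calling `SafepointPollSketch.demo(2)` pauses two busy workers once. The collector never stops a worker from outside; each worker stops itself the next time it polls.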

One issue ignored so far: when GC occurs, running threads can proceed to a Safe Point and suspend, but threads in the Sleep or Blocked state cannot respond to the JVM's request and cannot move to a Safe Point. The Safe Region solves this problem.

A Safe Region is a stretch of code in which object reference relationships do not change, so it is safe to start GC anywhere within it.

1. When a thread enters Safe Region code, it first marks itself as being in a Safe Region. If GC occurs during this time, the JVM does not wait for threads marked as being in a Safe Region;

2. When the thread is about to leave the Safe Region, it checks whether the JVM has finished the GC. If so, it continues running; otherwise it must wait until it receives a signal that it may safely leave the Safe Region.
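
The two steps above can be sketched as a small protocol. As with any user-level model, this is an analogy: `enterSafeRegion`, `leaveSafeRegion`, `beginGc`, and `endGc` are hypothetical names, and the latches in `demo()` exist only to make the ordering deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Toy Safe Region protocol (hypothetical analogy, not the JVM's implementation). */
final class SafeRegionSketch {
    private final Object lock = new Object();
    private boolean gcInProgress;
    private int threadsInSafeRegion;

    /** Step 1: the thread flags itself before sleeping or blocking. */
    void enterSafeRegion() {
        synchronized (lock) { threadsInSafeRegion++; }
    }

    /** Step 2: on the way out, wait until any running GC has finished. */
    void leaveSafeRegion() throws InterruptedException {
        synchronized (lock) {
            while (gcInProgress) lock.wait();
            threadsInSafeRegion--;
        }
    }

    void beginGc() {
        synchronized (lock) { gcInProgress = true; }
    }

    void endGc() {
        synchronized (lock) { gcInProgress = false; lock.notifyAll(); }
    }

    /** Runs one worker through a Safe Region while a "GC" happens; returns the event order. */
    static List<String> demo() {
        SafeRegionSketch vm = new SafeRegionSketch();
        List<String> events = Collections.synchronizedList(new ArrayList<String>());
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch gcStarted = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                vm.enterSafeRegion();
                entered.countDown();
                gcStarted.await();        // stands in for a sleep or a blocking call
                vm.leaveSafeRegion();     // blocks until endGc() signals
                events.add("worker left safe region");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            entered.await();
            vm.beginGc();                 // no need to stop the worker: it is in a Safe Region
            gcStarted.countDown();
            events.add("gc finished");
            vm.endGc();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }
}
```

In `demo()` the worker tries to leave while the collection is still running, so it can only proceed after `endGc()`.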

Garbage Collector

The Java Virtual Machine Specification does not prescribe how a garbage collector should be implemented. Users can combine the collectors used for each generation according to the characteristics of their system.

[Figure: the seven collectors, grouped by generation, with lines connecting collectors that can be combined]

The picture above shows seven collectors that work on different generations. A line between two collectors means they can be used together.
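
To see which combination a running JVM is actually using, the standard `java.lang.management` API can list the active collectors. This is a small sketch; the class name `ActiveCollectors` is made up for the example, and the collector names printed depend on the JVM version and startup flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors the current JVM is running with. */
final class ActiveCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        names().forEach(System.out::println);
    }
}
```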

1. Serial collector (serial GC)

Serial is a single-threaded collector that works in the new generation and is based on the copying algorithm. While it collects garbage, all other worker threads must be suspended. In a single-CPU environment Serial is very efficient because it has no thread-interaction overhead. It is the default new-generation collector in Client mode.

2. ParNew collector (parallel GC)

ParNew is essentially a multi-threaded version of Serial. Apart from using multiple threads for garbage collection, it behaves the same as Serial.

3. Parallel Scavenge collector (parallel recycling GC)

Parallel Scavenge is a multi-threaded collector that works in the new generation and is based on the copying algorithm. Its focus is achieving a controllable throughput, so it is often called a "throughput-first" collector.

Throughput = user code running time / (user code running time + garbage collection time)

Parallel Scavenge provides two parameters for controlling throughput:

1. -XX:MaxGCPauseMillis sets the target maximum pause time for garbage collection

2. -XX:GCTimeRatio sets the throughput target as the ratio of application time to garbage collection time
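
A short worked example of the throughput formula and of how a GCTimeRatio value maps to a GC time budget. With -XX:GCTimeRatio=N, the goal is for garbage collection to take at most 1 / (1 + N) of total time, so N = 99 corresponds to 1% and N = 19 to 5%. The class and method names are invented for the example.

```java
/** Arithmetic behind the throughput formula and -XX:GCTimeRatio. */
final class ThroughputSketch {
    /** Throughput = user code running time / (user code running time + garbage collection time). */
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    /** With -XX:GCTimeRatio=N, GC should take at most 1 / (1 + N) of total time. */
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 minutes of user code and 1 minute of GC give a throughput of 99%.
        System.out.println(throughput(99, 1));
        // GCTimeRatio=19 allows GC to use up to 5% of total time.
        System.out.println(maxGcFraction(19));
    }
}
```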

4. Serial Old collector (serial GC)

Serial Old is a single-threaded collector that works in the old generation and is based on the mark-compact algorithm. It is the default old-generation collector in Client mode.

5. Parallel Old collector (parallel GC)

Parallel Old is a multi-threaded collector that works in the old generation and is based on the mark-compact algorithm. Where throughput matters and CPU resources are limited, the combination of Parallel Scavenge and Parallel Old is worth considering first.

6. CMS Collector (Concurrent GC)

CMS (Concurrent Mark Sweep) is a collector that aims for the shortest possible collection pause. It works in the old generation and is based on the mark-sweep algorithm. The whole process has the following 4 steps:

1. Initial mark: marks only the objects directly reachable from GC Roots, but it still requires Stop The World;

2. Concurrent mark: performs GC Roots tracing and can run alongside user threads;

3. Remark: corrects the mark records for objects that changed because the user program kept running during concurrent marking. This step suspends all threads; its pause is generally somewhat longer than the initial mark but far shorter than concurrent marking;

4. Concurrent sweep: can run alongside user threads.

Disadvantages of the CMS collector:

1. It is sensitive to CPU resources. The concurrent phases do not pause user threads, but they occupy some of the threads and reduce the total throughput of the system.

2. It cannot handle floating garbage. During the concurrent sweep phase, user threads keep running and keep producing new garbage, which can only be collected in the next GC.

3. CMS is based on the mark-sweep algorithm, so a large number of memory fragments remain after collection. The old generation may have plenty of free space overall yet no contiguous block large enough for the current object, which forces a Full GC to be triggered early.

In the JDK 1.5 implementation, CMS is triggered when old-generation usage reaches 68%. If the old generation in your application does not grow too fast, you can raise the trigger percentage with the -XX:CMSInitiatingOccupancyFraction parameter, which reduces the number of collections and improves performance.

In the JDK 1.6 implementation, the trigger threshold was raised to 92%. If the memory reserved while CMS runs cannot meet the needs of user threads, a "Concurrent Mode Failure" occurs. The virtual machine then falls back to the Serial Old collector to collect the old generation, which makes the application pause even longer. So the threshold must not be set too high, because a "Concurrent Mode Failure" hurts performance. Choosing the right threshold requires monitoring old-generation usage over a long period.
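
One way to watch old-generation usage from inside the application is the standard `MemoryPoolMXBean` API. The sketch below prints every heap pool rather than guessing the old-generation pool's name, since pool names differ between collectors. The class name `HeapPoolUsage` is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Snapshot of heap memory pool usage, suitable for periodic logging. */
final class HeapPoolUsage {
    /** One line per heap pool: name, used bytes, and occupancy when a maximum is defined. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue;                 // the pool is no longer valid
            String line = pool.getName() + ": used=" + usage.getUsed() + " bytes";
            if (usage.getMax() > 0) {                    // getMax() is -1 when undefined
                line += String.format(" (%.1f%% of max)", 100.0 * usage.getUsed() / usage.getMax());
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Logging this snapshot at intervals over a long run gives a picture of how fast the old generation fills, which is the information needed before changing the trigger threshold.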

7. G1 collector

G1 (Garbage First) is a collector introduced in JDK 1.7 that works in both the new generation and the old generation. It is based on the mark-compact algorithm, which avoids memory fragmentation after collection.

G1 Advantages:

1. Parallelism and concurrency: makes full use of multiple CPUs to shorten Stop The World pauses;

2. Generational collection: G1 can manage the entire Java heap without cooperating with other collectors, and it handles newly created objects differently from objects that have survived for some time and through multiple GCs, to get better collection results;

3. Space compaction: unlike CMS's mark-sweep algorithm, G1 does not produce memory fragmentation while it runs, which helps long-running applications. Allocating a large object will not trigger an early Full GC for lack of a large enough contiguous block of memory;

4. Pause prediction: G1 can build a predictable pause-time model, letting users specify that within any time slice of M milliseconds, no more than N milliseconds are spent on garbage collection.

With the G1 collector, the memory layout of the Java heap is very different from that of other collectors. The whole heap is divided into multiple independent Regions of equal size. The new generation and the old generation are no longer physically separated; each is a collection of Regions (not necessarily contiguous). G1 tracks the value of collecting each Region (the space that would be reclaimed and the time that would take), maintains a priority list, and, within the allowed collection time, collects the most valuable Regions first. This avoids collecting the entire Java heap at once and lets G1 reclaim as much garbage as possible in a limited time.

But this raises a problem: with G1, an object allocated in one Region can be referenced by any object on the Java heap. To determine whether an object is alive, must the entire Java heap be scanned? In fact, the same problem exists in earlier collectors: if the old generation had to be scanned every time the new generation is collected, Minor GC would be far less efficient.
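
The "most valuable Region first" idea can be sketched as a greedy selection under a pause budget. This is a deliberately simplified assumption: value is taken as reclaimable bytes per millisecond of estimated collection cost, and the `Region` fields and `choose` method are invented for the example. G1's real heuristics are more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy "collect the most valuable Regions first" selection (hypothetical, simplified). */
final class RegionSelectionSketch {
    static final class Region {
        final int id;
        final long reclaimableBytes;   // space that collecting this Region would free
        final double costMillis;       // estimated time to collect this Region
        Region(int id, long reclaimableBytes, double costMillis) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.costMillis = costMillis;
        }
    }

    /** Picks Regions in descending order of bytes reclaimed per millisecond, within the budget. */
    static List<Region> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort(Comparator
                .comparingDouble((Region r) -> r.reclaimableBytes / r.costMillis)
                .reversed());
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : byValue) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r);
                spent += r.costMillis;
            }
        }
        return chosen;
    }
}
```

With a small budget, a Region that holds a lot of garbage but is expensive to collect can be skipped in favor of cheaper ones, which is how the collector stays within the allowed collection time.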

The virtual machine's solution is the Remembered Set: object references between Regions in G1, and between the new generation and the old generation in other collectors, are recorded in Remembered Set data structures so that a full heap scan can be avoided. Each Region in G1 has a corresponding Remembered Set. When the virtual machine detects that the program is writing to a Reference-type field, it triggers a Write Barrier that briefly interrupts the write and checks whether the referenced object is in the same Region as the writer. If not, the reference information is recorded via the CardTable in the Remembered Set of the Region that the referenced object belongs to.
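
The write-barrier check can be sketched as follows. To keep it short, the sketch records the name of the referring object directly in the target Region's Remembered Set and leaves out the CardTable indirection; that simplification, and all class and method names, are assumptions of the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy write barrier feeding per-Region Remembered Sets (hypothetical, CardTable omitted). */
final class RememberedSetSketch {
    static final class Region {
        final int id;
        final Set<String> rememberedSet = new HashSet<>();   // who points into this Region
        Region(int id) { this.id = id; }
    }

    static final class Obj {
        final String name;
        final Region region;
        Obj field;                                           // a single reference field
        Obj(String name, Region region) {
            this.name = name;
            this.region = region;
        }
    }

    /** Runs on every reference store: record cross-Region references on the target's side. */
    static void writeField(Obj holder, Obj value) {
        holder.field = value;
        if (value != null && value.region != holder.region) {
            value.region.rememberedSet.add(holder.name);
        }
    }

    public static void main(String[] args) {
        Region r1 = new Region(1), r2 = new Region(2);
        Obj a = new Obj("a", r1), b = new Obj("b", r2), c = new Obj("c", r2);
        writeField(a, b);   // crosses Regions: recorded in r2's Remembered Set
        writeField(b, c);   // same Region: nothing recorded
        System.out.println("r2 is referenced from: " + r2.rememberedSet);
    }
}
```

When Region 2 is collected, only the entries in its Remembered Set need to be treated as extra roots, so the rest of the heap does not have to be scanned.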
