.net garbage collection (GC) principle

伊谢尔伦 (Original)
2016-11-26 10:22:04

The garbage collector (GC for short) is one of the more advanced parts of .NET, and one you need to understand. This article explains how the garbage collector in the CLR works, keeping the explanation as easy to follow as possible.

Basic knowledge

Managed Heap

Let's first look at MSDN's explanation: when a new process is initialized, the runtime reserves a contiguous region of address space for it. This reserved address space is called the managed heap.

"A managed heap is also a heap." I say this so that nobody is confused by the terminology. This section builds on the difference between value types and reference types, and assumes the reader already knows the usual rule of thumb: value types are stored on the stack and reference types on the heap (with the references to them stored on the stack). Following that rule, the CLR requires every resource other than a value type to be allocated from the managed heap.

The managed heap maintains a pointer, here named NextObjPtr, which points to the allocation location of the next object in the heap.
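The pointer-bump allocation described above can be sketched with a toy model. This is not how the CLR is implemented: `ToyHeap` and `Allocate` are hypothetical names, a byte array stands in for the reserved address range, and an integer offset stands in for NextObjPtr.

```csharp
using System;

// Toy model only: offsets into a byte array play the role of addresses.
var heap = new ToyHeap(capacity: 64);

int a = heap.Allocate(16); // placed at offset 0
int b = heap.Allocate(24); // placed at offset 16, directly behind a
Console.WriteLine($"a at {a}, b at {b}, NextObjPtr = {heap.NextObjPtr}");
// prints "a at 0, b at 16, NextObjPtr = 40"

sealed class ToyHeap
{
    private readonly byte[] _memory;

    // Offset at which the next object will be placed.
    public int NextObjPtr { get; private set; }

    public ToyHeap(int capacity) => _memory = new byte[capacity];

    // Returns the offset of the new object, or -1 when there is not enough
    // room left (the point at which a real runtime would start a collection).
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (NextObjPtr + size > _memory.Length) return -1;

        int address = NextObjPtr;
        NextObjPtr += size; // allocating is just advancing the pointer
        return address;
    }
}
```

Because an allocation is only a bounds check and an addition, objects created one after another end up next to each other in memory.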

CPU Register

This is basic computer knowledge, reviewed here because it helps with the concept of "roots" below.

CPU registers are the CPU's own "temporary memory", and accessing them is faster than accessing main memory. In order of distance from the CPU, registers are closest, then the caches (level 1, level 2, and level 3), and finally main memory.

Roots

Static fields defined in a class, method parameters, and local variables are all roots, provided they are of a reference type. Object pointers held in CPU registers are roots as well. Roots are the entry points, outside the heap, from which the CLR can find objects in it.

Reachable and unreachable objects

An object in the heap is "reachable" if a root refers to it, either directly or through a chain of references from other reachable objects; otherwise it is "unreachable".

The reason for garbage collection

From the perspective of computer architecture, every program must reside in memory to run, and memory is limited in size. The managed heap has a size limit as well. If it did not, allocation in C# would be faster than in C, because the structure of the managed heap lets it allocate objects faster than the C runtime heap. Because address space and storage are limited, the managed heap relies on garbage collection to keep working normally and to ensure that allocating objects does not run out of memory.

Basic principles of garbage collection

A collection has two phases: marking, then compaction.

Marking is the process of determining which objects are reachable. The collector starts from each root and follows references from object to object, marking everything it visits. Once all roots have been checked, the heap contains reachable (marked) and unreachable (unmarked) objects.
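A minimal sketch of the marking phase over a toy object graph. The `Node` and `Marker` types are hypothetical illustrations; the real collector works on raw memory and type metadata rather than on a list of references like this.

```csharp
using System;
using System.Collections.Generic;

// Toy graph: A -> B -> C is rooted; D -> E is referenced by no root.
var a = new Node("A");
var b = new Node("B");
var c = new Node("C");
var d = new Node("D");
var e = new Node("E");
a.References.Add(b);
b.References.Add(c);
d.References.Add(e);

var heap = new List<Node> { a, b, c, d, e };
var roots = new List<Node> { a }; // e.g. one static field or local variable

Marker.Mark(roots);

foreach (Node node in heap)
    Console.WriteLine($"{node.Name}: {(node.Marked ? "reachable" : "unreachable")}");
// A, B and C are reported reachable; D and E unreachable.

sealed class Node
{
    public string Name { get; }
    public bool Marked { get; set; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) => Name = name;
}

static class Marker
{
    // Marks every object reachable from the roots, directly or indirectly.
    public static void Mark(IEnumerable<Node> roots)
    {
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Marked) continue; // already visited; also stops cycles
            current.Marked = true;
            foreach (Node child in current.References)
                pending.Push(child);
        }
    }
}
```

Note that E stays unmarked even though D refers to it: a reference only counts if the chain leading to it starts at a root.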

After marking comes the compaction phase. The garbage collector walks the heap linearly, looking for contiguous blocks of memory occupied by unreachable objects, and moves reachable objects down into those blocks to compact the heap. The process is similar to defragmenting a disk.

[Figure: the managed heap before and after compaction]

As shown in the picture above, green boxes are reachable objects and yellow boxes are unreachable objects. After the unreachable objects are cleared, moving the reachable objects together compacts the memory.

After compaction, the variables and CPU registers that held pointers to the moved objects are invalid, so the garbage collector must revisit all roots and update them to point to the objects' new memory locations. This can carry a significant performance cost, which is the main disadvantage of the managed heap.
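Compaction and the pointer fix-up that follows can be sketched on a toy heap layout. Everything here is an illustrative assumption: `Block` and `Compactor` are hypothetical names, addresses are plain integers, and only roots are updated (references stored inside objects, which a real collector must also fix, are not modeled).

```csharp
using System;
using System.Collections.Generic;

// Heap layout after marking: B and D are unreachable.
var heap = new List<Block>
{
    new Block("A", Address: 0,  Size: 16, Marked: true),
    new Block("B", Address: 16, Size: 32, Marked: false),
    new Block("C", Address: 48, Size: 8,  Marked: true),
    new Block("D", Address: 56, Size: 24, Marked: false),
    new Block("E", Address: 80, Size: 16, Marked: true),
};

// Roots hold addresses, just as variables and registers hold pointers.
var roots = new Dictionary<string, int> { ["local1"] = 48, ["static1"] = 80 };

int nextObjPtr = Compactor.Compact(heap, roots);

foreach (Block block in heap)
    Console.WriteLine($"{block.Name} now at {block.Address}");
int local1 = roots["local1"];
int static1 = roots["static1"];
Console.WriteLine($"local1 -> {local1}, static1 -> {static1}, NextObjPtr = {nextObjPtr}");
// A stays at 0, C moves to 16, E moves to 24; the last line prints
// "local1 -> 16, static1 -> 24, NextObjPtr = 40"

record Block(string Name, int Address, int Size, bool Marked);

static class Compactor
{
    // Slides marked blocks toward address 0, drops unmarked ones, and rewrites
    // every root through a forwarding table. Returns the new NextObjPtr.
    public static int Compact(List<Block> heap, Dictionary<string, int> roots)
    {
        var forwarding = new Dictionary<int, int>(); // old address -> new address
        var survivors = new List<Block>();
        int free = 0;

        foreach (Block block in heap)      // linear walk over the heap
        {
            if (!block.Marked) continue;   // garbage: its space gets reused
            forwarding[block.Address] = free;
            survivors.Add(block with { Address = free });
            free += block.Size;
        }

        heap.Clear();
        heap.AddRange(survivors);

        // The old addresses are stale now, so every root must be updated.
        foreach (string name in new List<string>(roots.Keys))
            roots[name] = forwarding[roots[name]];

        return free;
    }
}
```

The second loop is the cost the text refers to: moving objects is only safe if every pointer to them is found and rewritten.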

Given these costs, how and when to collect is a research topic in its own right: if the collector really waited until the managed heap was full before starting, collection would be very slow.

Garbage Collection Algorithm – Generation Algorithm

Generations are a mechanism used by the CLR garbage collector, and their only purpose is to improve application performance: collecting a single generation is faster than collecting the entire heap.

The CLR managed heap supports three generations: generation 0, generation 1, and generation 2. The space for generation 0 is about 256 KB, for generation 1 about 2 MB, and for generation 2 about 10 MB. Newly constructed objects are allocated in generation 0. When generation 0 is full, the garbage collector starts a collection: unreachable objects (C and E in the figure) are reclaimed, and the surviving objects are promoted to generation 1.

[Figure: a generation 0 collection, in which the unreachable objects C and E are reclaimed]

When generation 0 fills up again and generation 1 has also accumulated many unreachable objects and is nearly full, both generations are collected. Of the surviving (reachable) objects, those in generation 0 are promoted to generation 1, and those in generation 1 are promoted to generation 2.
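Promotion can be observed from C# with `GC.GetGeneration`. The sketch below forces collections with `GC.Collect()`, which is done here purely for demonstration. On a typical run the object is reported in generation 0, then 1, then 2, but the exact numbers depend on the runtime and its configuration, so treat the output as indicative rather than guaranteed.

```csharp
using System;

var survivor = new object();
Console.WriteLine($"After allocation:      generation {GC.GetGeneration(survivor)}");

GC.Collect(); // force a collection; the object is still referenced, so it survives
Console.WriteLine($"After one collection:  generation {GC.GetGeneration(survivor)}");

GC.Collect();
Console.WriteLine($"After two collections: generation {GC.GetGeneration(survivor)}");

Console.WriteLine($"Highest generation:    {GC.MaxGeneration}");

GC.KeepAlive(survivor); // keeps the object rooted up to this point
```

`GC.KeepAlive` is there so that the runtime cannot treat `survivor` as dead before the last measurement.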

The actual CLR generational mechanism is more "intelligent". If newly created objects have short lifetimes, the garbage collector reclaims generation 0 garbage right away, without waiting for the space to be fully allocated. In addition, if a generation 0 collection finds that many objects are still reachable and therefore frees little memory, the generation 0 budget is increased to 512 KB. Collections then happen less often, but each one reclaims more memory. If a collection still does not free enough memory, the garbage collector performs a full collection (all three generations), and if that is still not enough, an out-of-memory exception (OutOfMemoryException) is thrown.

In other words, the garbage collector dynamically adjusts the allocation budget of each generation based on how much memory each collection reclaims, so it tunes itself automatically.
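The budgets themselves are internal to the runtime, but their effect can be observed indirectly by counting collections per generation with `GC.CollectionCount`. The following is a rough sketch, assuming a workload made only of short-lived allocations; the loop size and array size are arbitrary choices, and the counts will differ between machines and runtime versions.

```csharp
using System;

int before0 = GC.CollectionCount(0);
int before1 = GC.CollectionCount(1);
int before2 = GC.CollectionCount(2);

// Allocate many short-lived arrays so that generation 0 fills up repeatedly.
long checksum = 0;
for (int i = 0; i < 1_000_000; i++)
{
    var temp = new byte[256];
    temp[0] = (byte)i;
    checksum += temp[0];
}

Console.WriteLine($"Generation 0 collections: {GC.CollectionCount(0) - before0}");
Console.WriteLine($"Generation 1 collections: {GC.CollectionCount(1) - before1}");
Console.WriteLine($"Generation 2 collections: {GC.CollectionCount(2) - before2}");
Console.WriteLine($"Checksum (uses the arrays so the loop does real work): {checksum}");
```

With a workload like this, generation 0 is usually collected far more often than generations 1 and 2, which is the behaviour the generational design aims for.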

Summary

There is a basic idea behind garbage collection: in most programming languages, the program always appears to have access to unlimited memory, so developers can keep allocating as if by magic, from an inexhaustible supply.

The basic working principle of the .NET garbage collector is this: it identifies unreachable objects by marking and clears them, then compacts the surviving objects much as disk defragmentation does, and finally uses generations to optimize performance.
