.Net Garbage Collection and Large Object Processing

黄舟
2017-02-17 11:17:37

Original English text: Maoni Stephens; translated and compiled by Zhao Yukai (@玉开Sir)

The CLR garbage collector treats objects differently depending on their size, and large objects are handled quite differently from small ones. Compacting memory is one example: moving large objects around in memory is expensive. Let's look at how the garbage collector handles large objects and what impact they can have on program performance.

Large Object Heap and Garbage Collection

In .Net 1.0 and 2.0, an object of 85,000 bytes or more is considered a large object. This number was chosen as a result of performance tuning. When the memory requested for an object reaches this threshold, the object is allocated on the large object heap. What does this mean? To answer that, we need to understand how .Net garbage collection works.

As most people know, the .Net GC collects objects by "generation". A program has three generations of objects: generation 0, generation 1 and generation 2. Generation 0 holds the youngest objects and generation 2 holds the longest-lived ones. The GC collects by generation for performance reasons; most objects are reclaimed in generation 0. In an ASP.NET program, for example, the objects associated with a request should be reclaimed once the request ends. Objects that survive a generation 0 collection become generation 1 objects; in other words, generation 1 is a buffer between long-lived objects and objects that are about to die.

From a generational point of view, large objects belong to generation 2, because they are only collected during a generation 2 collection. Collecting a generation also collects every younger generation. For example, a generation 1 collection collects both generation 1 and generation 0, and a generation 2 collection collects generations 1 and 0 as well.

Generations are the garbage collector's logical view of memory. Physically, objects live on managed heaps. A managed heap is a region of memory that the garbage collector obtains from the operating system (by calling the Windows API VirtualAlloc). When the CLR is loaded, it initializes two managed heaps: the large object heap (LOH) and the small object heap (SOH).

Each allocation request places the managed object on the corresponding heap. If the object is smaller than 85,000 bytes it goes on the SOH; otherwise it goes on the LOH.
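The placement rule can be observed from code by asking the runtime which generation it reports for a freshly allocated array. This is a minimal sketch rather than anything from the original article: `HeapPlacement` and `GenerationOfNewArray` are hypothetical names, and it assumes that `GC.GetGeneration` reports LOH objects as generation 2, in line with the statement above that large objects belong to generation 2.

```csharp
using System;

public static class HeapPlacement
{
    // Allocates a byte array of the given length and returns the generation
    // the runtime reports for it immediately after allocation.
    public static int GenerationOfNewArray(int length)
    {
        byte[] buffer = new byte[length];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer); // keep the array reachable until we have asked
        return generation;
    }

    // Prints the reported generation for one small and one large array.
    public static void Report()
    {
        // 1,000 bytes is far below the 85,000-byte threshold: small object heap.
        Console.WriteLine("byte[1000]:  generation " + GenerationOfNewArray(1000));
        // 85,000 bytes of payload plus the array header reaches the threshold: LOH.
        Console.WriteLine("byte[85000]: generation " + GenerationOfNewArray(85000));
    }
}
```

Calling `HeapPlacement.Report()` from a console program should show the large array already reported as `GC.MaxGeneration`, while the small array starts out in a younger generation.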

On the SOH, an object that survives a garbage collection is promoted to the next generation. An object that survives its first collection moves to generation 1; if it survives a second collection, it becomes a generation 2 object. Generation 2 is the oldest generation, so generation 2 objects are not promoted any further.
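Promotion on the SOH can be watched by forcing collections while holding a reference to a small object. The sketch below is illustrative only; `PromotionDemo` and `TrackGenerations` are hypothetical names, and the exact generations reported depend on the runtime, so the code only records them.

```csharp
using System;

public static class PromotionDemo
{
    // Returns the generation reported for one small object: right after
    // allocation, after one forced collection, and after a second one.
    public static int[] TrackGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();                          // survivor is still referenced, so it survives
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);                // the reference stays live across both collections
        return seen;
    }
}
```

The three recorded values should never decrease and never exceed `GC.MaxGeneration`, which matches the rule that a generation 2 object has nowhere further to go.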

When a garbage collection is triggered, the garbage collector compacts the small object heap by moving the surviving objects together. For the large object heap, moving memory is so expensive that the CLR team chose to sweep it instead: the space of dead objects is put on a free list that is used to satisfy later large object requests, and adjacent dead objects are merged into a single free block.

Keep in mind that, as of .Net 4.0, the large object heap is not compacted, although this may change in the future. So if you allocate large objects and need to be sure they are not moved, use the fixed statement.
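The fixed statement needs an unsafe context and a compiler switch that allows unsafe code. As a sketch that compiles without that switch, the example below pins an array with `GCHandle` and `GCHandleType.Pinned` instead. This is a different pinning mechanism from the fixed statement the text recommends, and `PinningDemo` is a hypothetical name used only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // Pins a large array, forces a full collection, and reports whether the
    // array's address stayed the same while the pin was held.
    public static bool AddressStableWhilePinned()
    {
        byte[] buffer = new byte[100000];      // above the 85,000-byte threshold
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();
            GC.Collect();
            IntPtr after = handle.AddrOfPinnedObject();
            return before == after;
        }
        finally
        {
            handle.Free();                     // always release the pin
        }
    }
}
```

A pinned handle stays in effect until `Free` is called, whereas the fixed statement pins only for the duration of its block, so the handle should be released as early as possible.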

The following is a schematic diagram of the recycling of the small object heap SOH



In the picture above, there are four objects, obj0-3, before the first garbage collection. After the first collection, obj1 and obj3 have been reclaimed, and obj2 and obj0 have been moved together. Before the second collection, three more objects, obj4-6, are allocated. After the second collection, obj2 and obj5 have been reclaimed, and obj4 and obj6 have been moved next to obj0.

The following picture is a schematic diagram of large object heap LOH recycling



As you can see, before garbage collection there are four objects, obj0-3. After the first generation 2 collection, obj1 and obj2 have been reclaimed and the space they occupied has been merged into one block. When obj4 requests memory, it is allocated in the space freed by obj1 and obj2, and a fragment of free memory is left over. If this fragment is smaller than 85,000 bytes, it can never be used again during the lifetime of the program.

If the large object heap does not have enough free memory for a requested large object, the CLR first tries to acquire more memory from the operating system. If that fails, it triggers a generation 2 collection in an attempt to free some memory.

During a generation 2 collection, memory that is no longer needed can be returned to the operating system through VirtualFree. The figure below shows this process:



When are large objects collected?

Before discussing when large objects are collected, let's look at when garbage collection happens in general. A garbage collection occurs in the following situations:

1. An allocation exceeds the threshold of generation 0 or of the large object heap. Most garbage collections on the managed heap are triggered this way.

2. The program calls the GC.Collect method. If GC.MaxGeneration is passed to GC.Collect, all generations are collected, including the large object heap.

3. The operating system is low on memory and the application receives a high-memory notification from it.

4. The garbage collector's own heuristics decide that a generation 2 collection would be productive, in which case it triggers one.

5. Each generation, and the large object heap, has a threshold on the amount of space it may occupy. Allocating into a generation consumes that threshold, and a collection is triggered when an allocation pushes the generation past it. So allocating small objects consumes the generation 0 threshold and allocating large objects consumes the large object heap threshold, while promoting objects into generation 1 or 2 consumes the thresholds of those generations. These thresholds are adjusted dynamically while the program runs.
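The second trigger in the list above can be sketched with `GC.Collect` and `GC.CollectionCount`, both of which are standard `System.GC` methods. `FullCollectionDemo` and `ForceFullCollection` are hypothetical names, and forcing collections like this is meant for experiments, not as a recommendation for production code.

```csharp
using System;

public static class FullCollectionDemo
{
    // Forces a collection of every generation and returns how many
    // generation 2 collections the runtime counted while doing so.
    public static int ForceFullCollection()
    {
        int before = GC.CollectionCount(GC.MaxGeneration);
        GC.Collect(GC.MaxGeneration);   // generations 2, 1 and 0, plus the large object heap
        int after = GC.CollectionCount(GC.MaxGeneration);
        return after - before;
    }
}
```

The returned difference should be at least 1, since the forced collection itself is a generation 2 collection.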

Performance impact of large object heap

Let us first look at the cost of allocating large objects. The CLR guarantees that the memory of every new object it hands out is cleared. This means the cost of an allocation is dominated by the cost of clearing the memory (unless the allocation triggers a garbage collection). If clearing 1 byte takes 2 cycles, clearing the smallest large object takes 170,000 cycles. People do not normally allocate very large objects. Still, as an example, allocating a 16M object on a 2GHz machine takes about 16ms just to clear the memory. That is a rather large cost.
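The arithmetic behind these figures can be written out as a small helper. This is only a worked calculation under the paragraph's own assumption of 2 cycles per byte; `ClearCost` is a hypothetical name, and "16M" is taken to mean 16 × 1024 × 1024 bytes.

```csharp
using System;

public static class ClearCost
{
    // Cycles needed to zero `bytes` bytes at `cyclesPerByte` cycles each.
    public static long Cycles(long bytes, int cyclesPerByte)
    {
        return bytes * cyclesPerByte;
    }

    // Converts a cycle count to milliseconds on a CPU running at `gigahertz` GHz.
    public static double Milliseconds(long cycles, double gigahertz)
    {
        return cycles / (gigahertz * 1e9) * 1000.0;
    }
}
```

`ClearCost.Cycles(85000, 2)` gives 170,000 cycles for the smallest large object. For the 16M case, `ClearCost.Cycles(16L * 1024 * 1024, 2)` gives 33,554,432 cycles, which `ClearCost.Milliseconds(..., 2.0)` converts to roughly 16.8 ms, consistent with the "about 16ms" quoted above.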

Now let's look at the cost of collection. As mentioned earlier, large objects are collected together with generation 2. A generation 2 collection is triggered when either the large object heap or generation 2 exceeds its threshold. If the collection was triggered because the large object heap exceeded its threshold, generation 2 itself may not contain much that can be reclaimed. That is not a big problem when generation 2 is small. But if generation 2 is large and holds many objects, too many generation 2 collections will cause performance problems. Allocating large objects on a temporary basis therefore makes the program spend a lot of time in garbage collection; in other words, repeatedly allocating large objects and letting them go has a significant negative impact on performance.
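One way to check whether temporary large objects are driving generation 2 collections in your own program is to compare `GC.CollectionCount` before and after the allocating code. The sketch below is an assumption-laden experiment, not a benchmark: `TemporaryLargeObjects` is a hypothetical name, and the number of collections observed depends on the runtime version, GC mode and machine.

```csharp
using System;

public static class TemporaryLargeObjects
{
    // Allocates `count` short-lived arrays of `bytesEach` bytes and returns
    // how many generation 2 collections ran during the loop.
    public static int Gen2CollectionsDuring(int count, int bytesEach)
    {
        int before = GC.CollectionCount(GC.MaxGeneration);
        for (int i = 0; i < count; i++)
        {
            byte[] temp = new byte[bytesEach]; // lands on the LOH when bytesEach >= 85,000
            temp[0] = 1;                       // touch the array, then drop the reference
        }
        return GC.CollectionCount(GC.MaxGeneration) - before;
    }
}
```

Comparing, for example, `Gen2CollectionsDuring(200, 1000000)` with `Gen2CollectionsDuring(200, 1000)` gives a rough feel for how much more often generation 2 is collected when the temporary objects are large.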

Very large objects on the large object heap are usually arrays (a single non-array object that large is rare). If the array elements contain many references, the cost of collection is high; if the elements contain no references, the garbage collector does not need to walk the array at all. Take an array that stores the nodes of a binary tree as an example. One way is for each node to hold references to its left and right nodes:

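class Node
{
    Data d;
    Node left;   // reference to the left child
    Node right;  // reference to the right child
}

// One array holds every node of the tree
Node[] binaryTree = new Node[num_nodes];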

If num_nodes is large, the garbage collector has to follow at least two references for every element. An alternative is to store the array indexes of the left and right nodes in each node:


class Node
{
    Data d;
    uint left_index;   // index of the left child in binaryTree
    uint right_index;  // index of the right child in binaryTree
}

This removes the references between elements; the referenced node is retrieved with binaryTree[left_index] instead. The garbage collector no longer has to follow those references during a collection.
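A self-contained version of the index-based layout might look like the sketch below. It departs from the article's snippet in ways that are assumptions for illustration: the node is a struct, so the array elements themselves contain no references; an int `Value` stands in for `Data`; and indexes are signed ints with -1 meaning "no child". `IndexedNode` and `IndexedTree` are hypothetical names.

```csharp
using System;

// A node stored by value: it holds array indexes instead of object references.
public struct IndexedNode
{
    public int Value;        // stands in for the article's Data field
    public int LeftIndex;    // -1 means "no child"
    public int RightIndex;   // -1 means "no child"
}

public static class IndexedTree
{
    public const int None = -1;

    // Builds a three-node tree: node 0 is the root, nodes 1 and 2 are its children.
    public static IndexedNode[] BuildSample()
    {
        return new[]
        {
            new IndexedNode { Value = 10, LeftIndex = 1,    RightIndex = 2 },
            new IndexedNode { Value = 4,  LeftIndex = None, RightIndex = None },
            new IndexedNode { Value = 7,  LeftIndex = None, RightIndex = None },
        };
    }

    // Sums the values of the subtree rooted at `index` by following indexes.
    public static int Sum(IndexedNode[] tree, int index)
    {
        if (index == None) return 0;
        IndexedNode node = tree[index];
        return node.Value + Sum(tree, node.LeftIndex) + Sum(tree, node.RightIndex);
    }
}
```

`IndexedTree.Sum(IndexedTree.BuildSample(), 0)` returns 21. The traversal reads exactly like one over a reference-based tree, but every hop is an array lookup, so the array gives the garbage collector no references to trace.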

Collecting performance data for large object heaps

There are several ways to collect performance data related to large object heaps. Before I explain these methods, let's talk about why you need to collect performance data related to large object heaps.

By the time you start collecting performance data for a specific area, you should either have found evidence that points to a bottleneck in that area, or have gone through the other areas you know of without finding anything that explains the problem.

The .Net CLR Memory performance counters are usually the first tool to reach for when investigating performance problems. The counters related to the LOH are # Gen 2 Collections and Large Object Heap size. # Gen 2 Collections shows the number of generation 2 garbage collections that have occurred since the process started. Large Object Heap size shows the current size of the large object heap, including free space; this counter is updated at the end of each garbage collection, not on every allocation.

You can view the .Net CLR Memory counters in Windows Performance Monitor, as shown in the figure below:


You can also read the values of these counters programmatically; many people collect performance counters from code to help track down performance bottlenecks.

Of course, you can also use the WinDbg debugger to inspect the large object heap.

A final reminder: so far, the large object heap is not compacted as part of garbage collection, but this is only an implementation detail of the CLR, and program code should not rely on it. If you need to make sure an object is not moved by the garbage collector, use the fixed statement.
