.Net garbage collection mechanism principle (1)

黄舟
2017-02-17 11:21:29

English original text: Jeffrey Richter

Compiled by: Zhao Yukai

Link: http://www.php.cn/

With the garbage collection mechanism in the Microsoft .NET CLR, programmers no longer need to track when to release memory. Releasing memory is handled entirely by the GC and is transparent to the programmer. Nonetheless, a .NET programmer should understand how garbage collection works. In this article, we look at how .NET allocates and manages managed memory, and then describe the garbage collector's algorithm step by step.
Designing an appropriate memory management strategy for a program is difficult and tedious, and it distracts you from the problem the program is actually meant to solve. Is there a built-in mechanism that can handle memory management for developers? Yes: in .NET it is the GC, the garbage collector.
Every program uses resources of some kind: screen display, network connections, database resources, and so on. In fact, in an object-oriented environment, every type needs some memory to store its data. Using an object involves the following steps:
1. Allocate memory for the type
2. Initialize the memory to put the object into a usable state
3. Access the members of the object
4. Tear down the object's state to clean up
5. Release the memory
This seemingly simple pattern has been the source of many bugs. Sometimes programmers forget to release objects that are no longer used, and sometimes they try to access objects that have already been released. Both kinds of bugs tend to stay hidden and are hard to find. Unlike logic errors, which can be fixed once discovered, they may leak memory or cause unexpected crashes only after the program has been running for a while. Several tools can help developers detect memory problems, such as Task Manager, the System Monitor ActiveX Control, and Rational's Purify.
With the GC, developers do not need to track when to release memory. However, the garbage collector cannot manage every kind of resource. It does not know how to clean up some resources, and for those developers must write their own cleanup code. In the .NET Framework, developers usually put that code in a Close, Dispose, or Finalize method. We will look at the Finalize method later; it is called automatically by the garbage collector.
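As a minimal sketch of the Dispose convention mentioned above: the `LogFile` class and the temporary file name are hypothetical and exist only for illustration, while `IDisposable`, the `using` statement, and `StreamWriter` are standard .NET.

```csharp
using System;
using System.IO;

// Hypothetical wrapper around a resource the GC cannot clean up by itself
// (here, the open file handle held by a StreamWriter).
public sealed class LogFile : IDisposable
{
    private readonly StreamWriter writer;
    public bool IsDisposed { get; private set; }

    public LogFile(string path)
    {
        writer = new StreamWriter(path);
    }

    public void Write(string message)
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(LogFile));
        writer.WriteLine(message);
    }

    // Releases the file handle deterministically instead of waiting for the GC.
    public void Dispose()
    {
        if (IsDisposed) return;
        writer.Dispose();
        IsDisposed = true;
    }
}

public static class LogFileDemo
{
    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "gc-article-demo.log");

        // The using statement calls Dispose when the block exits,
        // even if an exception is thrown inside it.
        using (var log = new LogFile(path))
        {
            log.Write("hello");
        }

        Console.WriteLine(File.ReadAllText(path).Trim());
    }
}
```

The caller decides when the file handle is released, which is the main difference from relying on Finalize.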
Many objects, however, need no cleanup code of their own. Take Rectangle: to clean it up you only need to discard its left, right, width, and height fields, and the garbage collector can do that by reclaiming its memory. Let's look at how memory is allocated to objects.
Object allocation:

The .NET CLR allocates all reference-type objects on the managed heap. This is very similar to the C-runtime heap, except that you never free objects yourself; they are freed automatically once the application no longer uses them. This raises a question: how does the garbage collector know that an object is no longer in use and can be reclaimed? We will explain this later.
There are several garbage collection algorithms, each tuned for a particular environment. This article focuses on the CLR's garbage collection algorithm. Let's start with a basic concept.
When a process is initialized, the runtime reserves a contiguous region of empty address space. This region is the managed heap. The managed heap maintains a pointer, which we call NextObjPtr, that indicates where the next object will be allocated. Initially, this pointer is set to the base address of the managed heap.
The application uses the new operator to create an object. The operator first checks that the remaining space in the managed heap can hold the object. If it can, the object is placed at the address NextObjPtr points to, the object's constructor is called, and the new operator returns the address of the object.


Figure 1 Managed Heap

At this point, NextObjPtr has been advanced past the new object and indicates where the next object will be placed on the managed heap. Figure 1 shows a managed heap containing three objects: A, B, and C. The next object will be placed at the location NextObjPtr points to (immediately after object C).
Now let's look at how the C-runtime heap allocates memory. In the C-runtime heap, allocating memory requires walking a linked list of data structures until a large enough block is found. That block may have to be split, and after the split the pointers in the linked list must be updated to refer to the remaining space so that the list stays intact. On the managed heap, allocating an object only means adding a value to NextObjPtr, which is very fast. In fact, allocating an object on the managed heap is nearly as fast as allocating memory on a thread's stack.
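The pointer-bump allocation described above can be modelled in a few lines. This is a toy sketch, not the CLR's actual code: `ToyManagedHeap`, the byte sizes, and the use of -1 to signal a full heap are assumptions made for illustration.

```csharp
using System;

// Toy model of the managed heap's allocation path; not the CLR's real code.
public sealed class ToyManagedHeap
{
    private readonly int size;                    // total bytes reserved for the heap
    public int NextObjPtr { get; private set; }   // offset where the next object goes

    public ToyManagedHeap(int size)
    {
        this.size = size;
        NextObjPtr = 0;                           // initially the start of the heap
    }

    // Returns the offset of the new object, or -1 if the heap is full
    // (the point at which a real runtime would trigger a collection).
    public int Allocate(int objectSize)
    {
        if (NextObjPtr + objectSize > size) return -1;
        int address = NextObjPtr;
        NextObjPtr += objectSize;                 // allocation is just a pointer bump
        return address;
    }
}

public static class ToyManagedHeapDemo
{
    public static void Main()
    {
        var heap = new ToyManagedHeap(64);
        Console.WriteLine(heap.Allocate(16)); // 0:  object A
        Console.WriteLine(heap.Allocate(24)); // 16: object B
        Console.WriteLine(heap.Allocate(16)); // 40: object C
        Console.WriteLine(heap.NextObjPtr);   // 56
        Console.WriteLine(heap.Allocate(16)); // -1: does not fit, heap is full
    }
}
```

There is no free list to search and no block to split, which is why the cost is close to that of a stack allocation.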
So far, the managed heap looks both faster and simpler to implement than the C-runtime heap. It gains this advantage by making a big assumption: that address space is unlimited. Clearly this assumption is false, so there must be a mechanism that lets the managed heap behave as if it were true. That mechanism is the garbage collector. Let's see how it works.
When the application calls the new operator to create an object, there may not be enough space left for it. The managed heap detects this by adding the size of the new object to NextObjPtr: if the result is beyond the end of the heap, the managed heap is full and a garbage collection is required.
In reality, a collection is triggered when generation 0 is full. Generations are a mechanism the garbage collector uses to improve performance: newly created objects belong to the young generation, and objects that have survived a collection are older. Dividing objects into generations lets the garbage collector collect just one generation instead of every object in the heap.
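Generations can be observed from code through the `System.GC` class. The sketch below only prints what the runtime reports. The exact numbers depend on the runtime and build configuration, so no output is shown, and forcing a collection with `GC.Collect` is done here purely for demonstration.

```csharp
using System;

public static class GenerationDemo
{
    public static void Main()
    {
        object obj = new object();
        Console.WriteLine("Generation after allocation: " + GC.GetGeneration(obj));

        GC.Collect();  // forcing a collection is for demonstration only
        Console.WriteLine("Generation after one collection: " + GC.GetGeneration(obj));

        Console.WriteLine("Highest generation: " + GC.MaxGeneration);

        GC.KeepAlive(obj);  // keeps obj reachable up to this point
    }
}
```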

Garbage collection algorithm:

The garbage collector checks whether the heap contains objects that the application no longer uses. If so, the memory they occupy can be reclaimed. (If no memory can be made available on the heap, the new operator throws an OutOfMemoryException.) How does the garbage collector determine whether an object is still in use? This is not an easy question to answer.
Every application has a set of roots. Roots are storage locations that either refer to an object on the managed heap or are null. For example, all global and static object pointers are roots of the application. Local variables and parameters on a thread's stack are also roots, as are CPU registers that contain pointers to objects on the managed heap. The list of active roots is maintained by the JIT (just-in-time) compiler and the CLR, and it is made available to the garbage collector.
When the garbage collector starts running, it assumes that every object on the managed heap is garbage. In other words, it assumes that none of the application's roots refers to any object on the heap. It then walks the roots and builds a graph of all objects reachable from them.
Figure 2 shows a managed heap in which the application's roots refer directly to objects A, C, D, and F. These objects become part of the graph. Object D refers to object H, so H is added to the graph as well. The garbage collector continues in this way, recursively, through all reachable objects.


Figure 2 Objects on the managed heap

The garbage collector walks the roots and the objects they reference one by one. If it finds that an object is already in the graph, it stops following that path and moves on to the next one. This serves two purposes: it improves performance, and it avoids infinite loops when objects reference each other in a cycle.
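The traversal described above can be sketched as a simple graph walk. This is a toy model under stated assumptions: `ToyObject` and `MarkPhase` are hypothetical names, the roots A, C, D, F and the D-to-H reference come from the text, and the H-to-D cycle and the B-to-G reference are added only to exercise the "already in the graph" check.

```csharp
using System;
using System.Collections.Generic;

public sealed class ToyObject
{
    public string Name { get; }
    public List<ToyObject> References { get; } = new List<ToyObject>();
    public ToyObject(string name) { Name = name; }
}

public static class MarkPhase
{
    // Returns the set of objects reachable from the roots.
    public static HashSet<ToyObject> Mark(IEnumerable<ToyObject> roots)
    {
        var reachable = new HashSet<ToyObject>();
        var pending = new Stack<ToyObject>();
        foreach (var root in roots)
            if (root != null) pending.Push(root);   // a root may be null

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            // Add returns false if the object is already in the graph:
            // stop walking this path, which also breaks reference cycles.
            if (!reachable.Add(current)) continue;
            foreach (var child in current.References) pending.Push(child);
        }
        return reachable;
    }
}

public static class MarkPhaseDemo
{
    public static void Main()
    {
        var heap = new Dictionary<string, ToyObject>();
        foreach (var name in new[] { "A", "B", "C", "D", "E", "F", "G", "H" })
            heap[name] = new ToyObject(name);

        heap["D"].References.Add(heap["H"]);  // from the text: D refers to H
        heap["H"].References.Add(heap["D"]);  // assumed cycle, to show termination
        heap["B"].References.Add(heap["G"]);  // assumed: garbage referring to garbage

        var roots = new[] { heap["A"], heap["C"], heap["D"], heap["F"] };
        var reachable = MarkPhase.Mark(roots);

        foreach (var name in new[] { "A", "B", "C", "D", "E", "F", "G", "H" })
            Console.WriteLine(name + (reachable.Contains(heap[name]) ? " reachable" : " garbage"));
    }
}
```

In this model B, E, and G end up as garbage even though B still refers to G, because nothing reachable from a root refers to either of them.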
Once all roots have been checked, the garbage collector's graph contains every object reachable from the application. Any object on the managed heap that is not in the graph is garbage to be reclaimed. After building the graph, the garbage collector walks the managed heap linearly, looking for contiguous blocks of garbage objects (which can now be considered free space). It then moves the non-garbage objects down in memory (using the standard C memcpy function), closing all the gaps in the heap. Moving the objects invalidates every pointer to them, so the garbage collector must update the application's roots to point to the objects' new addresses. In addition, if an object contains a pointer to another object, the garbage collector is responsible for correcting that pointer too. Figure 3 shows the managed heap after a collection.


Figure 3 The managed heap after a collection

As Figure 3 shows, after a collection all garbage objects have been identified and all non-garbage objects have been compacted. The pointers to the non-garbage objects have been updated to their new addresses, and NextObjPtr points just past the last non-garbage object. At this point, the new operator can again create objects successfully.
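The compaction step can be illustrated with a toy heap of address ranges. Everything here is an assumption for illustration: `HeapBlock`, `Compactor`, and the sizes are hypothetical, and changing an `Address` field stands in for physically copying the object.

```csharp
using System;
using System.Collections.Generic;

public sealed class HeapBlock
{
    public string Name;
    public int Address;     // offset in the toy heap
    public int Size;
    public bool Reachable;  // result of the marking step
}

public static class Compactor
{
    // Slides reachable blocks toward the start of the heap, in address order.
    // Returns the new NextObjPtr; 'forwarding' maps old addresses to new ones
    // so that roots and object fields can be patched afterwards.
    public static int Compact(List<HeapBlock> heap, Dictionary<int, int> forwarding)
    {
        int nextObjPtr = 0;
        var survivors = new List<HeapBlock>();
        foreach (var block in heap)              // heap is assumed sorted by Address
        {
            if (!block.Reachable) continue;      // garbage: its space is reused
            forwarding[block.Address] = nextObjPtr;
            block.Address = nextObjPtr;          // stands in for the memcpy of the object
            nextObjPtr += block.Size;
            survivors.Add(block);
        }
        heap.Clear();
        heap.AddRange(survivors);
        return nextObjPtr;
    }
}

public static class CompactorDemo
{
    public static void Main()
    {
        var heap = new List<HeapBlock>
        {
            new HeapBlock { Name = "A", Address = 0,  Size = 16, Reachable = true  },
            new HeapBlock { Name = "B", Address = 16, Size = 8,  Reachable = false },
            new HeapBlock { Name = "C", Address = 24, Size = 16, Reachable = true  },
        };
        int rootToC = 24;                        // a root holding C's old address

        var forwarding = new Dictionary<int, int>();
        int nextObjPtr = Compactor.Compact(heap, forwarding);
        rootToC = forwarding[rootToC];           // patch the root after the move

        Console.WriteLine(rootToC);     // 16
        Console.WriteLine(nextObjPtr);  // 32
    }
}
```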
As you can see, a garbage collection carries a significant performance cost, which is the major downside of the managed heap. Remember, however, that a collection is only performed when the managed heap is full. Until then, the managed heap performs better than the C-runtime heap. The runtime's garbage collector also applies some optimizations, which we will discuss in the next article.
The following code illustrates how objects are created and managed:

using System;
using System.Collections;

class Application {
   public static int Main(String[] args) {

      // ArrayList object created in heap, myArray is now a root
      ArrayList myArray = new ArrayList();

      // Create 10000 objects in the heap
      for (int x = 0; x < 10000; x++) {
         myArray.Add(new Object());    // Object object created in heap
      }

      // Right now, myArray is a root (on the thread's stack). So,
      // myArray is reachable and the 10000 objects it points to are also
      // reachable.
      Console.WriteLine(myArray.Count);

      // After the last reference to myArray in the code, myArray is not
      // a root.
      // Note that the method doesn't have to return, the JIT compiler
      // knows to make myArray not a root after the last reference to it
      // in the code.

      // Since myArray is not a root, all 10001 objects are not reachable
      // and are considered garbage.  However, the objects are not
      // collected until a GC is performed.
      return 0;
   }
}

You may ask: if GC is so good, why isn't it part of ANSI C++? The reason is that a garbage collector must be able to find the application's roots and every object pointer. In C++, a pointer can be cast from one type to another, so there is no way to know what kind of object a pointer refers to. In the CLR, the managed heap always knows the actual type of an object, and the metadata is used to determine which members of the object refer to other objects.
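Reflection exposes the same kind of type metadata to ordinary code, which gives a rough feel for what the collector can learn about an object. The sketch below is an illustration only: `Car`, `Engine`, and `FieldInspector` are hypothetical, and the real collector does not use reflection. It also simplifies by ignoring references stored inside value-type fields.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Engine { }

public class Car
{
    public Engine Motor = new Engine();   // reference-type field
    public string Label = "demo";         // reference-type field
    public int Wheels = 4;                // value-type field, holds no reference
}

public static class FieldInspector
{
    // Lists the names of instance fields whose declared type is a reference type.
    public static List<string> ReferenceFields(object obj)
    {
        var names = new List<string>();
        var flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in obj.GetType().GetFields(flags))
        {
            if (!field.FieldType.IsValueType) names.Add(field.Name);
        }
        names.Sort(StringComparer.Ordinal);
        return names;
    }
}

public static class FieldInspectorDemo
{
    public static void Main()
    {
        object unknown = new Car();   // static type is object, actual type is Car
        Console.WriteLine(unknown.GetType().Name);                                      // Car
        Console.WriteLine(string.Join(", ", FieldInspector.ReferenceFields(unknown)));  // Label, Motor
    }
}
```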

Garbage Collection and Finalization

The garbage collector offers an additional feature: it can automatically call an object's Finalize method after the object has been marked as garbage (provided that the object's type overrides the Finalize method of Object).
Finalize is a virtual method of Object. You can override it when necessary, but in C# the override must be written with syntax similar to a C++ destructor. For example:

class Foo {
   ~Foo() {
      Console.WriteLine("Foo Finalize");
   }
}

Programmers with a C++ background should take particular note: the Finalize method is written exactly like a C++ destructor, but in .NET Finalize and a destructor are not the same thing. Managed objects cannot be deterministically destructed; they can only be reclaimed by the garbage collector.
When you design a class, it is best to avoid overriding the Finalize method for the following reasons:
1. Objects that implement Finalize are promoted to an older generation, which increases memory pressure and prevents the object, and the objects it refers to, from being reclaimed the first time they become garbage.
2. Allocating these objects takes longer.
3. Having the garbage collector execute Finalize methods significantly reduces performance. Remember that every object that implements Finalize must have its Finalize method executed: for an array of 10000 such objects, Finalize must run 10000 times.
4. Objects that override Finalize may refer to other objects that do not; reclaiming those objects is delayed as well.
5. You cannot control when the Finalize method runs. If you release a resource such as a database connection in Finalize, the resource may be held long after it is no longer needed.
6. When the program exits while some objects are still referenced, their Finalize methods never get a chance to run. This happens when background threads are using the objects, or when objects are created during application exit or AppDomain unloading. In addition, by default Finalize methods are not executed when the application is forcibly terminated. All operating system resources are reclaimed, of course, but objects on the managed heap get no chance to clean up. The original article notes that you can change this behavior by calling the GC's RequestFinalizeOnShutdown method.
7. The runtime makes no guarantee about the order in which the Finalize methods of different objects run, even though some objects need to be cleaned up in a particular order.
If a type you define must implement Finalize, make sure the method finishes as quickly as possible and avoids anything that could block, including thread synchronization operations. Also make sure the Finalize method lets no exception escape. The original article describes the garbage collector as ignoring such an exception and continuing with the Finalize methods of other objects; in .NET Framework 2.0 and later, an unhandled exception in a finalizer terminates the process by default.
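One common way to follow this advice is to treat the finalizer as a safety net behind an explicit Dispose method. The sketch below assumes a hypothetical `NativeBuffer` class that owns unmanaged memory. `Marshal.AllocHGlobal`, `Marshal.FreeHGlobal`, and `GC.SuppressFinalize` are standard .NET calls, while the overall shape is one possible design rather than the only one.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical class owning unmanaged memory, which the GC cannot reclaim itself.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr memory;
    public bool IsReleased { get { return memory == IntPtr.Zero; } }

    public NativeBuffer(int bytes)
    {
        memory = Marshal.AllocHGlobal(bytes);
    }

    // Preferred path: the caller releases the resource deterministically.
    public void Dispose()
    {
        Release();
        // The finalizer has nothing left to do, so the object no longer
        // needs finalization and can be collected like any other object.
        GC.SuppressFinalize(this);
    }

    // Safety net: runs on the finalizer thread only if Dispose was never called.
    ~NativeBuffer()
    {
        try
        {
            Release();   // short, non-blocking, touches no other managed objects
        }
        catch
        {
            // Swallow: an exception must not escape a finalizer.
        }
    }

    private void Release()
    {
        if (memory == IntPtr.Zero) return;
        Marshal.FreeHGlobal(memory);
        memory = IntPtr.Zero;
    }
}

public static class NativeBufferDemo
{
    public static void Main()
    {
        using (var buffer = new NativeBuffer(256))
        {
            Console.WriteLine(buffer.IsReleased);  // False
        }
    }
}
```

Objects that are disposed explicitly avoid most of the finalization costs listed above, because the collector no longer has to keep them alive for a Finalize call.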
When a compiler generates code for a constructor, it automatically inserts a call to the base class constructor. Similarly, a C++ compiler automatically adds a call to the base class destructor in a destructor. The original article states that Finalize gets no such treatment: the compiler does nothing special, and if you want the base class's Finalize method to run you must call it explicitly. That applies in languages where Finalize is overridden directly. For the C# destructor syntax shown above, however, the C# compiler generates a Finalize override that calls the base class's Finalize method in a finally block, so no explicit call is needed.
Note again that a Finalize method in C# is written like a C++ destructor, but C# does not support deterministic destruction. Do not let the syntax mislead you.

Internal implementation of GC calling the Finalize method

On the surface, finalization looks simple: you create an object, and when the object is collected its Finalize method is called. In fact it is a little more complicated.
When an application creates a new object, the new operator allocates memory on the heap. If the object's type implements Finalize, a pointer to the object is placed in the finalization queue. The finalization queue is an internal data structure controlled by the garbage collector. Each entry in the queue points to an object whose Finalize method must be called before the object's memory can be reclaimed.
The heap shown in the figure below contains several objects, some reachable from the application's roots and some not. When objects C, E, F, I, and J were created, the system detected that they implement Finalize and put pointers to them in the finalization queue.


What a Finalize method does is usually release resources that the garbage collector cannot reclaim, such as file handles and database connections.
When a garbage collection occurs, objects B, E, G, H, I, and J are determined to be garbage. The garbage collector scans the finalization queue for pointers to these objects. When a pointer is found, it is removed from the finalization queue and appended to the freachable queue. The freachable queue is another internal data structure controlled by the garbage collector. Every object in the freachable queue is waiting to have its Finalize method executed.
After the collection, the managed heap looks like Figure 5. Objects B, G, and H have been reclaimed because they have no Finalize method. Objects E, I, and J have not been reclaimed yet because their Finalize methods have not run.


Figure 5 Managed heap after garbage collection

While the program runs, a dedicated runtime thread is responsible for calling the Finalize methods of the objects in the freachable queue. When the freachable queue is empty, this thread sleeps. When entries appear in the queue, the thread wakes up, removes each entry from the queue, and calls the object's Finalize method. For this reason, Finalize code should make no assumptions about the thread that executes it; for example, do not access thread-local storage in a Finalize method.
The interaction between the finalization queue and the freachable queue is quite clever. First, how the freachable queue got its name: the f stands for finalization, since every object in the queue is waiting for its Finalize method to run; reachable means the objects can be reached. In other words, the freachable queue is treated as a root, just like global and static variables. So if an object is in the freachable queue, it is reachable and is not garbage.
To put it simply, when an object is unreachable the garbage collector considers it garbage. But when the garbage collector moves an object's pointer from the finalization queue to the freachable queue, the object is no longer garbage and its memory is not reclaimed. At this point the garbage collector has finished identifying garbage, and some objects that were identified as garbage have been reclassified as non-garbage. The garbage collector compacts the reclaimable memory, and the dedicated runtime thread empties the freachable queue, executing each object's Finalize method.


Figure 6 Managed heap after performing garbage collection again

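Only when garbage collection is triggered again are the objects that implement Finalize truly reclaimed. By then their Finalize methods have already run and the freachable queue has been emptied.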
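The two-collection life cycle described in this section can be replayed with a small simulation. This is a toy model, not CLR code: `ToyFinalizationModel` and its members are hypothetical names, objects are represented by their letters, and objects A and D are assumed to exist alongside the ones named in the text so that the heap runs from A to J.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of the two internal queues; names follow the article, not real CLR types.
public sealed class ToyFinalizationModel
{
    public List<string> Heap = new List<string>();
    public List<string> FinalizationQueue = new List<string>();
    public Queue<string> Freachable = new Queue<string>();

    public void Allocate(string name, bool hasFinalize)
    {
        Heap.Add(name);
        if (hasFinalize) FinalizationQueue.Add(name);  // registered at creation time
    }

    // One collection: 'rooted' lists the objects the application still references.
    public void Collect(ICollection<string> rooted)
    {
        var garbage = Heap.Where(o => !rooted.Contains(o) && !Freachable.Contains(o)).ToList();
        foreach (var obj in garbage)
        {
            if (FinalizationQueue.Remove(obj))
                Freachable.Enqueue(obj);   // kept alive until Finalize has run
            else
                Heap.Remove(obj);          // no pending Finalize: memory reclaimed now
        }
    }

    // Stands in for the dedicated finalizer thread draining the queue.
    public void RunFinalizers()
    {
        while (Freachable.Count > 0)
            Console.WriteLine("Finalize " + Freachable.Dequeue());
    }
}

public static class ToyFinalizationDemo
{
    public static void Main()
    {
        var model = new ToyFinalizationModel();
        var finalizable = new HashSet<string> { "C", "E", "F", "I", "J" };
        foreach (var name in new[] { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J" })
            model.Allocate(name, finalizable.Contains(name));

        var rooted = new[] { "A", "C", "D", "F" };   // B, E, G, H, I, J are garbage

        model.Collect(rooted);
        Console.WriteLine(string.Join(" ", model.Heap));   // A C D E F I J
        model.RunFinalizers();                              // Finalize E, then I, then J
        model.Collect(rooted);
        Console.WriteLine(string.Join(" ", model.Heap));   // A C D F
    }
}
```

The first collection reclaims only B, G, and H; E, I, and J survive until the second collection, which matches Figures 5 and 6.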

Resurrecting objects during garbage collection

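As stated earlier, when the program no longer uses an object, the object is reclaimed. However, if the object implements Finalize, it is considered reclaimable, and its memory is actually reclaimed, only after its Finalize method has run. In other words, such an object is first identified as garbage, then resurrected by being placed in the freachable queue, and only reclaimed after Finalize has executed. It is the call to Finalize that gives the object a chance to come back to life: inside Finalize we can make some other object hold a strong reference to this object. The garbage collector then no longer considers the object garbage, and the object is resurrected.
The following code demonstrates resurrection: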

public class Foo {
   ~Foo() {
      Application.ObjHolder = this;
   }
}

class Application {
   static public Object ObjHolder = null;
}

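In this case, once the object's Finalize method has run, the object is strongly referenced by Application's static field ObjHolder and is reachable from a root. The object is resurrected, and so are the objects it refers to. However, the Finalize methods of those objects may already have run, so unexpected errors can occur.
In fact, when you design your own types, finalization and resurrection of objects can get completely out of your control. This is not a good situation. A common way of dealing with it is to define a bool field in the class that records whether Finalize has run; if it has, the other methods throw an exception when called.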
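A minimal sketch of that guard is shown below. The `Connection` class, its `Query` method, and the choice of `InvalidOperationException` are assumptions made for illustration; the article only prescribes a bool field and an exception.

```csharp
using System;

public class Connection
{
    private bool finalized;   // set once Finalize has run

    ~Connection()
    {
        finalized = true;
        // ... release resources here ...
    }

    public string Query(string text)
    {
        // Guard against use after resurrection: the object's resources are gone.
        if (finalized)
            throw new InvalidOperationException("Object has already been finalized.");
        return "result of " + text;
    }
}

public static class ConnectionDemo
{
    public static void Main()
    {
        var connection = new Connection();
        Console.WriteLine(connection.Query("ping"));  // result of ping
        GC.KeepAlive(connection);
    }
}
```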
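Now, if some other code sets Application.ObjHolder to null, the object becomes unreachable. Eventually the garbage collector treats it as garbage and reclaims its memory. Note that this time the object does not appear in the finalization queue, so its Finalize method is not executed again.
Resurrection has only a few valid uses, and you should avoid it whenever possible. If you do use it, it is best to add the object back to the finalization queue. GC provides the static method ReRegisterForFinalize for this, as in the following code: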

public class Foo {
   ~Foo() {
      Application.ObjHolder = this;
      GC.ReRegisterForFinalize(this);
   }
}

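When the object is resurrected, it is added back to the finalization queue. Note that if an object is already in the finalization queue and GC.ReRegisterForFinalize(obj) is called for it again, the object's Finalize method will be executed more than once.
The purpose of the garbage collection mechanism is to simplify memory management for developers.
In the next article we will discuss weak references, generations in garbage collection, garbage collection in multithreaded applications, and the performance counters related to garbage collection.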
