Detailed explanation of Python garbage collection mechanism
Reference Counting
Python's default garbage collection mechanism is "reference counting": each object maintains an ob_refcnt field that records how many references point to it. Its advantage is simplicity. When a new reference points to the object, the count is increased by 1. When a reference to the object is destroyed, the count is decreased by 1. Once the count reaches 0, the object is recycled immediately and the memory it occupied is released. One disadvantage is the extra space needed to store the counts, but the main problem is that reference counting cannot handle "circular references".
What is a circular reference? A and B refer to each other, and nothing else refers to either of them. Their reference counts are both 1, so neither is reclaimed, even though both are unreachable and obviously should be recycled. Example:
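The counting behaviour described above can be observed with sys.getrefcount. This is a minimal sketch for CPython; the Node class and the variable names are hypothetical, chosen only for illustration. The absolute numbers depend on the interpreter version (the call itself may hold a temporary reference), so only the differences are compared.

```python
import sys


class Node:
    """Hypothetical placeholder class used only for this illustration."""


obj = Node()
base = sys.getrefcount(obj)         # baseline; may include a temporary reference from the call

alias = obj                         # a new reference points to the same object
after_alias = sys.getrefcount(obj)  # one higher than the baseline

del alias                           # that reference is destroyed
after_del = sys.getrefcount(obj)    # back to the baseline

print(after_alias - base, after_del - base)
```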
a = {}        # the first dict has 1 reference (the name a)
b = {}        # the second dict has 1 reference (the name b)
a['b'] = b    # the second dict gains a reference from the first: its count is 2
b['a'] = a    # the first dict gains a reference from the second: its count is 2
del a         # the name a is removed: the first dict's count drops to 1
del b         # the name b is removed: the second dict's count drops to 1
In this example, the del statements remove the names a and b and decrease the reference count of each object by 1. Because each object still holds a reference to the other, neither count drops to zero, even though neither object can be reached by name any more. Under reference counting alone, these objects would never be destroyed and would stay in memory, which is a memory leak. To solve the circular reference problem, Python introduced two further GC mechanisms: mark-sweep and generational collection.
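The following sketch shows a cycle surviving the del statements and then being reclaimed by the cyclic collector. It assumes CPython and uses a hypothetical Node class instead of plain dicts, because dicts do not support weak references and a weak reference is a convenient way to check whether an object is still alive. Automatic collection is switched off during the example so that the result does not depend on timing.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.other = None


gc.disable()   # keep the automatic collector from running mid-example
gc.collect()   # start from a clean state

a = Node()
b = Node()
a.other = b    # a refers to b
b.other = a    # b refers to a: the two objects now form a cycle
probe = weakref.ref(a)   # a weak reference does not keep the object alive

del a
del b
alive_before = probe() is not None   # the cycle keeps both objects alive

unreachable = gc.collect()           # run a full collection by hand
alive_after = probe() is not None    # the cycle has been reclaimed

gc.enable()
print(alive_before, alive_after)
```

gc.collect() returns the number of unreachable objects it found; the exact figure varies between interpreter versions, so it is best treated as informational.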
Mark-Sweep
Mark-sweep is a garbage collection algorithm based on tracing. Objects are connected through references (pointers) to form a directed graph: the objects are the nodes, and the reference relationships are the edges. Starting from the root objects, the collector traverses objects along the directed edges. Reachable objects are marked as live, and unreachable objects are the ones to be cleared. The root objects are global references and references on the function call stack; the objects they refer to cannot be deleted.
As Python's auxiliary garbage collection technique, the mark-sweep algorithm mainly deals with container objects such as lists, dicts, tuples, and class instances, because strings and numeric objects hold no references to other objects and so cannot take part in a circular reference. Python organizes these container objects in a doubly linked list.
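Which objects the cyclic collector looks at can be checked with gc.is_tracked. The sketch below assumes CPython; the samples dictionary and its keys are hypothetical names for illustration. Only types whose tracking status is stable are shown, since CPython applies version-specific optimizations to some containers such as tuples and dicts.

```python
import gc

# Hypothetical sample values: two atomic objects and one container.
samples = {
    "int": 42,
    "str": "hello",
    "list": [1, 2, 3],
}

# gc.is_tracked reports whether the cyclic collector monitors an object.
tracked = {name: gc.is_tracked(value) for name, value in samples.items()}

for name, is_tracked in tracked.items():
    print(name, is_tracked)
```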
Generational Recycling
Generational recycling trades space for time. Python divides memory into collections based on how long objects have survived, and each collection is called a generation. There are 3 generations: the young generation (generation 0), the middle generation (generation 1), and the old generation (generation 2). They correspond to three linked lists, and the longer objects have survived, the less often their generation is collected. Newly created objects are placed in the young generation. When the number of objects in the young generation's linked list reaches its upper limit, garbage collection is triggered: objects that can be recycled are recycled, and the survivors are moved to the middle generation, and so on. Objects in the old generation are the ones that have survived the longest, possibly for the entire lifetime of the program. Generational recycling is built on top of mark-sweep.
Like mark-sweep, generational recycling is an auxiliary garbage collection technique in Python and applies to the same container objects.
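The per-generation limits and counters can be inspected through the gc module. This is a small sketch assuming CPython; the variable names are hypothetical, and the actual values (and how the collector uses the later entries) differ between interpreter versions, so none are quoted here.

```python
import gc

# Collection thresholds for the generations, as a 3-tuple.
thresholds = gc.get_threshold()

# Current collection counters, also a 3-tuple.
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:", counts)
```

If the defaults do not suit a workload, gc.set_threshold accepts new limits in the same order.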