How to master Python's garbage collection mechanism
Thanks to Python's automatic garbage collection, there is no need to manually release the objects you create. This is very developer friendly and frees developers from having to worry about low-level memory management. But if you don't understand how the garbage collector works, the Python code you write can easily end up inefficient.
There are many garbage collection algorithms. The main ones are reference counting, mark-and-sweep, and generational collection.
In Python, garbage collection is based mainly on reference counting, supplemented by two other mechanisms: mark-and-sweep and generational collection.
The principle of reference counting is relatively simple:
Each object carries an integer reference count that records how many references point to it. Take an object A: when another object references A, the reference count of A increases by 1. When that reference is deleted, the reference count of A decreases by 1. When the reference count of A reaches 0, object A can no longer be used and is reclaimed immediately.
In Python, you can read the reference count of a given object with the getrefcount function of the sys module. Let's look at a practical example.
import sys

class A():
    def __init__(self):
        pass

a = A()
print(sys.getrefcount(a))
Running the code above prints 2.
We saw above that after creating an A object and assigning it to the variable a, the reported reference count is 2: one reference held by a, plus a temporary one held by the argument passed to getrefcount. So when does the counter increase by 1, and when does it decrease by 1?
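The reference count increases by 1 when:
An object is created, such as A().
An object is referenced, such as a=A().
An object is passed as an argument to a function, such as func(a).
An object is stored as an element of a container, such as arr=[a,a] (each element adds one reference).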
The reference count decreases by 1 when:
A reference to the object is explicitly deleted, such as del a.
The variable is reassigned to a new object, such as a=0.
The object leaves its scope: when the function func finishes executing, the local variables of func are released (global variables are not).
The container holding the object is destroyed, or the object is removed from the container.
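A minimal sketch of the last step in that chain, assuming CPython: once the final reference goes away, the object is reclaimed on the spot. The class name Node and the events list are hypothetical names made up for this illustration; weakref.finalize is used only to observe the moment of reclamation.

```python
import weakref


class Node:
    """Hypothetical placeholder class used only for this illustration."""


events = []

obj = Node()
# weakref.finalize registers a callback that runs when obj is reclaimed;
# it does not itself keep obj alive.
weakref.finalize(obj, events.append, "reclaimed")

alias = obj          # a second reference to the same object
del obj              # one reference gone, the object is still alive
print(events)        # []

alias = None         # last reference gone, the count drops to zero
print(events)        # ['reclaimed'] on CPython
```

Other Python implementations are free to reclaim objects later, so the second print is only guaranteed to behave this way on a reference-counting interpreter such as CPython.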
To better understand how the counter increases and decreases, let's run some actual code so the changes are visible at a glance.
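import sys

class A():
    def __init__(self):
        pass

# A temporary object: the only reference is getrefcount's own argument
print("Create object 0 + 1 =", sys.getrefcount(A()))

a = A()
print("Create object and assign 0 + 2 =", sys.getrefcount(a))

b = a
c = a
print("Assign to 2 variables 2 + 2 =", sys.getrefcount(a))

b = None
print("Reassign a variable 4 - 1 =", sys.getrefcount(a))

del c
print("del a variable 3 - 1 =", sys.getrefcount(a))

d = [a, a, a]
print("Add to a list 3 times 2 + 3 =", sys.getrefcount(a))

def func(c):
    print("Pass to a function 1 + 2 =", sys.getrefcount(c))

func(A())

The output is as follows (from the author's run on CPython; exact counts, especially inside a function call, can differ between interpreter versions):

Create object 0 + 1 = 1
Create object and assign 0 + 2 = 2
Assign to 2 variables 2 + 2 = 4
Reassign a variable 4 - 1 = 3
del a variable 3 - 1 = 2
Add to a list 3 times 2 + 3 = 5
Pass to a function 1 + 2 = 3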
Advantages of reference counting:
Efficient and logically simple: the counter is just incremented and decremented according to the rules.
Real-time: once an object's counter reaches zero, the object can never be used again, so its memory is released immediately instead of waiting for a collection cycle.
Disadvantages of reference counting:
Every object needs space for its reference count, which increases memory consumption.
When the object being released is large, such as a dictionary, every object it references must be released in turn, possibly through nested calls, which may take a long time.
Circular references. This is the fatal flaw of reference counting: it has no way to handle them on its own, so other garbage collection algorithms must supplement it.
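The circular reference problem can be observed directly. The following is a minimal sketch, assuming CPython; the class Node and the variable names are hypothetical and exist only for this illustration. The automatic cycle collector is switched off for the duration so that only reference counting is at work, and a weak reference is used to check whether the object is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder class used only for this illustration."""

    def __init__(self):
        self.other = None


gc.disable()                 # keep the automatic cycle collector out of the way
try:
    x, y = Node(), Node()
    x.other, y.other = y, x  # x and y now reference each other
    probe = weakref.ref(x)   # lets us observe x without keeping it alive

    del x, y                 # no names are left, but the cycle keeps both counts at 1
    alive_after_del = probe() is not None

    gc.collect()             # run the cycle collector by hand
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del)       # True: reference counting alone could not free the pair
print(alive_after_collect)   # False: the cycle collector reclaimed them
```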
As mentioned in the previous section, reference counting cannot solve the problem of circular references. The counters of objects in a reference cycle never reach 0, so those objects can never be reclaimed.
The mark-and-sweep algorithm mainly addresses potential circular references. It works in 2 steps:
Mark phase. Treat all objects as nodes of a graph and build the graph from the reference relationships between objects. Traverse the graph starting from its root nodes, and mark every visited object as "reachable".
Sweep phase. Traverse all objects, and reclaim any object that is not marked "reachable".
A concrete code example illustrates this:
class A():
    def __init__(self):
        self.obj = None

def func():
    a = A()
    b = A()
    c = A()
    d = A()

    a.obj = b
    b.obj = a

    return [c, d]

e = func()
In the code above, a and b refer to each other, and e refers to c and d. The entire reference relationship is shown in the figure below.
[Figure: reference relationships among a, b, c, d, and e]
With reference counting alone, the two objects a and b would never be reclaimed. With mark-and-sweep, starting from the root node (that is, object e), the three objects c, d, and e are marked as reachable, while a and b cannot be marked. Therefore a and b are reclaimed.
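The two phases can be sketched on the example above. This is a toy model under stated assumptions, not CPython's actual collector: the object graph is represented as a plain dictionary mapping each name to the names it references, and mark_and_sweep is a hypothetical helper written for this illustration.

```python
def mark_and_sweep(references, roots):
    """Return (reachable, garbage) for a graph given as {name: [referenced names]}."""
    reachable = set()
    stack = list(roots)
    # Mark phase: walk the graph from the roots and mark every visited node.
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(references.get(node, []))
    # Sweep phase: everything that was never marked is garbage.
    garbage = set(references) - reachable
    return reachable, garbage


# The reference graph from the example: a <-> b, and e -> c, d.
references = {
    "a": ["b"],
    "b": ["a"],
    "c": [],
    "d": [],
    "e": ["c", "d"],
}

reachable, garbage = mark_and_sweep(references, roots=["e"])
print(sorted(reachable))  # ['c', 'd', 'e']
print(sorted(garbage))    # ['a', 'b']
```

The cycle between a and b makes no difference to the outcome: neither can be reached from e, so both end up in the garbage set.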
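At this point readers may wonder why the root node is e rather than a, b, c, or d. There is a subtlety here: which objects are treated as root nodes? Generally speaking, root nodes include (but are not limited to) the following:
Objects referenced from the local variable table of the current stack frames, such as the parameters, local variables, and temporary variables used in the call stack of each thread.
Global static variables.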
...
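While garbage collection is running, the program is paused, which is known as stop-the-world. This is easy to understand: when your mom is cleaning your room, she certainly won't let you throw trash around in it, otherwise the room would never get clean.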
To reduce the time the program is paused, generational collection is used to lower the cost of garbage collection.
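Generational collection is based on the following rules:
The vast majority of objects are short-lived; most of them die soon after they are created.
The more garbage collections an object has survived, the less likely it is to be garbage, and the less often it should be examined.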
In Python, objects fall into 3 generations: G0, G1, and G2.
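A newly created object belongs to G0.
If it survives a round of GC scanning, it moves to G1; objects in G1 are scanned less often.
If it survives another scan, it enters G2; objects in G2 are scanned even less often.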
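A scan of a generation is triggered when that generation's counter reaches its threshold. For G0 the counter is the number of objects allocated minus the number freed; as documented for CPython's gc module, the counters of the older generations instead track how many times the next-younger generation has been collected. When a generation is scanned, every generation younger than it is scanned as well.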
So what are these thresholds? We can inspect or change them in code, as in the following example:
import gc

threshold = gc.get_threshold()
print("Thresholds for each generation:", threshold)

# Set the threshold for each generation
# gc.set_threshold(threshold0[, threshold1[, threshold2]])
gc.set_threshold(800, 20, 20)

The output is as follows:

Thresholds for each generation: (700, 10, 10)
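Beyond reading the thresholds, the gc module also exposes the current per-generation counters and lets you collect a single generation by hand. The sketch below uses only gc.get_threshold, gc.get_count, gc.set_threshold, and gc.collect; the value 1000 is an arbitrary number chosen for illustration, and the original settings are restored at the end. The printed thresholds and counts depend on the interpreter version and on what has run before, so no specific values are shown for them.

```python
import gc

original = gc.get_threshold()      # remember the current settings
print("thresholds:", original)
print("counts:", gc.get_count())   # current per-generation counters

gc.set_threshold(1000)             # only threshold0 is changed here
raised = gc.get_threshold()[0]
print("threshold0 now:", raised)   # 1000

unreachable = gc.collect(0)        # collect only the youngest generation
print("unreachable objects found:", unreachable)

gc.set_threshold(*original)        # restore the previous settings
```

Raising threshold0 makes automatic collections of the youngest generation less frequent, which trades a little extra memory for fewer pauses; whether that helps is something to measure for the workload at hand.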