How to master Python's garbage collection mechanism.

PHPz

May 08, 2023 pm 10:10 PM

Thanks to Python's automatic garbage collection, there is no need to release objects manually after creating them. This is developer friendly and frees developers from low-level memory management. But without an understanding of how garbage collection works, it is easy to write Python code that holds on to memory longer than necessary or runs inefficiently.

There are many garbage collection algorithms. The main ones are reference counting, mark-and-sweep, and generational collection.

In Python, garbage collection is based mainly on reference counting, supplemented by two other mechanisms: mark-and-sweep and generational collection.

1 Reference counting

1.1 Principle of reference counting algorithm

The principle of reference counting is relatively simple:

Each object has an integer reference count attribute that records how many references point to it. Take an object A: whenever something new references A, A's reference count is incremented by 1. When a reference is removed, A's reference count is decremented by 1. When A's reference count reaches 0, A can no longer be used and is reclaimed immediately.

In Python, you can read an object's reference count with the getrefcount function of the sys module. Let's look at a practical example.

import sys

class A():
    def __init__(self):
        pass
        
a = A()
print(sys.getrefcount(a))

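Running the code above prints 2 on the author's setup. The value is one higher than you might expect because passing a to getrefcount temporarily creates one more reference to the object. Exact values can vary between CPython versions.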

1.2 Counter increase and decrease conditions

We saw above that after creating an A object and assigning it to the variable a, the object's reference count is 2. So when is the counter incremented, and when is it decremented?

1.2.1 Conditions that increment the reference count

The object is created, such as A().
The object is referenced by a variable, such as a=A().
The object is passed to a function as an argument, such as func(a).
The object is stored in a container, such as arr=[a,a].

1.2.2 Conditions that decrement the reference count

A reference to the object is explicitly deleted, such as del a.
The variable is rebound to a different object, such as a=0.
The reference leaves its scope: when func finishes executing, its local variables are released (global variables are not).
The container holding the object is destroyed, or the object is removed from the container.

1.2.3 Code practice

To see how the counter rises and falls, let's run some real code.

import sys

class A():
    def __init__(self):
        pass

# A temporary object: only the getrefcount argument refers to it
print("Create object 0 + 1 =", sys.getrefcount(A()))

a = A()
print("Create and assign 0 + 2 =", sys.getrefcount(a))

b = a
c = a
print("Assign to 2 variables 2 + 2 =", sys.getrefcount(a))

b = None  # rebinding b drops one reference
print("Reassign a variable 4 - 1 =", sys.getrefcount(a))

del c  # deleting the name c drops another reference
print("del a name 3 - 1 =", sys.getrefcount(a))

d = [a, a, a]  # the list holds three references
print("Add to a list 3 times 2 + 3 =", sys.getrefcount(a))


def func(c):
    print('Pass to a function 1 + 2 = ', sys.getrefcount(c))
func(A())

The author's output is as follows (exact values, especially the last one, can vary between CPython versions):

Create object 0 + 1 = 1
Create and assign 0 + 2 = 2
Assign to 2 variables 2 + 2 = 4
Reassign a variable 4 - 1 = 3
del a name 3 - 1 = 2
Add to a list 3 times 2 + 3 = 5
Pass to a function 1 + 2 =  3
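The run above does not exercise two of the decrement conditions from 1.2.2: removal from a container and leaving a function's scope. The following is a minimal sketch of those cases. Because absolute getrefcount values vary by CPython version, it reports changes relative to a baseline; the names base, box, and hold are made up for this illustration.

```python
import sys

class A:
    pass

a = A()
base = sys.getrefcount(a)           # baseline; the absolute value varies by version

box = [a, a]                        # the list holds two references
in_list = sys.getrefcount(a) - base

box.pop()                           # remove one element from the container
after_pop = sys.getrefcount(a) - base

del box                             # destroy the container itself
after_del = sys.getrefcount(a) - base

def hold(obj):
    local_ref = obj                 # extra reference that lives only during the call
    return sys.getrefcount(obj)

during_call = hold(a)
after_call = sys.getrefcount(a) - base   # references made by the call are gone

print(in_list, after_pop, after_del, after_call)  # → 2 1 0 0
```

Measuring differences rather than absolute counts keeps the demonstration independent of how many temporary references the interpreter itself creates.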

1.3 Advantages and Disadvantages of Reference Counting

1.3.1 Advantages of Reference Counting

Efficient. The logic is simple: the counter is just incremented and decremented according to the rules.
Real-time. Once an object's counter reaches zero, the object can never be used again, so its memory is released immediately instead of waiting for a later collection pass.
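The real-time property can be observed directly in CPython. This sketch assumes a hypothetical class Tracked whose __del__ method records when the object is finalized; other Python implementations may defer finalization.

```python
class Tracked:
    def __init__(self, log):
        self.log = log

    def __del__(self):
        # Runs as soon as the reference count reaches zero (CPython behavior)
        self.log.append("finalized")

events = []
t = Tracked(events)
events.append("before del")
del t                      # the last reference disappears here
events.append("after del")

print(events)  # → ['before del', 'finalized', 'after del']
```

The finalizer runs between the two markers, with no separate collection pass involved.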

1.3.2 Disadvantages of reference counting

Every object needs extra space to store its reference count, which increases memory consumption.

Releasing a large object, such as a dictionary, means recursively decrementing the counts of all the objects it references, which can take a long time.

Circular references. This is the fatal flaw of reference counting. Reference counting alone cannot handle them, so other garbage collection algorithms must supplement it.
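A minimal sketch of the circular reference problem, assuming a hypothetical Node class. Automatic collection is switched off so that only reference counting is at work until gc.collect() is called explicitly, and a weak reference serves as a probe that does not keep the object alive.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                      # rely on reference counting alone for now
try:
    a = Node()
    b = Node()
    a.other = b                   # a -> b
    b.other = a                   # b -> a, forming a cycle
    probe = weakref.ref(a)        # a weak reference does not raise the count

    del a, b                      # the names are gone, but the cycle remains
    alive_after_del = probe() is not None

    found = gc.collect()          # run the cycle collector explicitly
    alive_after_gc = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_gc)  # → True False
```

After del, each object is still referenced by the other, so neither count reaches zero; only the cycle collector reclaims them.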

2 Mark-and-sweep

As mentioned in the previous section, reference counting cannot solve the problem of circular references. The counters of objects in a reference cycle never drop to 0, so the objects can never be reclaimed.

The mark-and-sweep algorithm mainly targets these potential circular references. It has 2 steps:

Marking stage. Treat all objects as nodes of a graph and build the graph from the references between objects. Traverse the graph from the root nodes and mark every object visited as "reachable".
Sweep stage. Traverse all objects and reclaim any object that is not marked "reachable".

A concrete code example:

class A():
    def __init__(self):
        self.obj = None
 
def func():
    a = A()
    b = A()
    c = A()
    d = A()

    a.obj = b
    b.obj = a
    return [c, d]

e = func()

In the code above, a and b refer to each other, and e refers to c and d. The reference relationships are shown in the figure below.

[Figure: reference graph in which e points to c and d, while a and b point only to each other]

With reference counting alone, the two objects a and b would never be reclaimed: once func returns, nothing outside the cycle refers to them, yet each still holds a reference to the other. With mark-and-sweep, starting from the root node (i.e. object e), the three objects c, d, and e are marked as reachable, while a and b cannot be marked. Therefore a and b are reclaimed.
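The two stages can be sketched on the example's reference graph. This is a toy model over a plain dictionary, not how CPython's collector is implemented internally; the helper names mark and sweep and the graph encoding are assumptions made for illustration.

```python
def mark(graph, roots):
    """Marking stage: return the set of nodes reachable from the roots."""
    reachable = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(graph[node])   # follow outgoing references
    return reachable

def sweep(graph, reachable):
    """Sweep stage: return the nodes that were never marked."""
    return sorted(node for node in graph if node not in reachable)

# Each key is an object; each value lists the objects it references.
graph = {
    "e": ["c", "d"],
    "a": ["b"],
    "b": ["a"],
    "c": [],
    "d": [],
}

reachable = mark(graph, roots=["e"])
garbage = sweep(graph, reachable)

print(sorted(reachable))  # → ['c', 'd', 'e']
print(garbage)            # → ['a', 'b']
```

The cycle between a and b does not matter to the traversal: neither is reachable from the root, so both end up in the garbage list.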

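At this point readers may wonder why the root node is e rather than a, b, c, or d. There is a rule behind this: which objects are treated as root nodes? In general, root nodes include (but are not limited to) the following:

Objects referenced from the local variable table of the current stack frames, such as the parameters, local variables, and temporaries used in the call stack of each thread.
Global static variables
...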

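3 Generational collection

3.1 Principle of generational collection

While garbage collection runs, the program is paused, which is known as stop-the-world. This is easy to understand: when your mother is cleaning your room, she will not let you keep throwing trash around in it, otherwise the room could never be cleaned.

To reduce these pauses, generational collection is used to lower the time spent on garbage collection.

Generational collection is based on the following rules:

The vast majority of objects are short-lived and die soon after they are created.
The more collections an object has survived, the less likely it is to be garbage, and the less often it should be examined.

In Python, objects belong to one of 3 generations: G0, G1, and G2.

A newly created object starts in G0.
If it survives a round of GC scanning, it moves to G1. Objects in G1 are scanned less often.
If it survives another scan, it moves to G2. Objects in G2 are scanned even less often.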
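The per-generation bookkeeping can be inspected with gc.get_count() and gc.get_stats() from the standard gc module. This is a small sketch; the numbers printed depend on what the interpreter has done so far, and the exact meaning of each entry can differ between Python versions.

```python
import gc

counts = gc.get_count()   # current counter of each generation
stats = gc.get_stats()    # one dictionary of statistics per generation

for generation, (count, info) in enumerate(zip(counts, stats)):
    print(
        f"G{generation}: counter={count}, "
        f"collections so far={info['collections']}, "
        f"objects collected={info['collected']}"
    )
```

In a long-running program the collections figure is typically highest for G0 and lowest for G2, which reflects the rule that older generations are scanned less often.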

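3.2 When GC is triggered

A scan of a generation is triggered when the counter for that generation reaches its threshold. For the youngest generation, the counter is the number of objects allocated minus the number released. For each older generation, it is the number of scans of the next-younger generation since the older one was last scanned. When a generation is scanned, all younger generations are scanned as well.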
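The growth of the youngest generation's counter can be observed with gc.get_count(). The sketch below assumes a standard CPython build; automatic collection is disabled for the duration so that a scan does not reset the counter, and the list named keep is only there to keep the new objects alive.

```python
import gc

gc.disable()                            # prevent a scan from resetting the counter
try:
    before = gc.get_count()[0]
    keep = [[] for _ in range(1000)]    # 1000 new container objects, all still alive
    after = gc.get_count()[0]
finally:
    gc.enable()

print("youngest-generation counter grew by", after - before)
```

With collection enabled, a scan would start once this counter exceeded the first threshold.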

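So what are these thresholds? We can read or change them in code, as in the following example:

import gc
threshold = gc.get_threshold()
print("Thresholds of each generation:", threshold)

# Set the threshold of each generation
# gc.set_threshold(threshold0[, threshold1[, threshold2]])
gc.set_threshold(800, 20, 20)

The author's output is as follows (the default values can differ between Python versions):

Thresholds of each generation: (700, 10, 10)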
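Since gc.set_threshold changes a process-wide setting, it can be convenient to restore the previous values afterwards. The following is one possible sketch; the helper name temporary_threshold is hypothetical and not part of the gc module.

```python
import gc
from contextlib import contextmanager

@contextmanager
def temporary_threshold(*thresholds):
    """Apply GC thresholds inside a with-block, then restore the old ones."""
    previous = gc.get_threshold()
    gc.set_threshold(*thresholds)
    try:
        yield
    finally:
        gc.set_threshold(*previous)   # always restore, even after an exception

original = gc.get_threshold()
with temporary_threshold(800, 20, 20):
    inside = gc.get_threshold()       # thresholds in effect within the block
restored = gc.get_threshold()

print("inside:", inside)
print("restored to original:", restored == original)
```

Raising the first threshold makes scans of the youngest generation less frequent, which trades some memory for fewer pauses in allocation-heavy code.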

Statement

This article is reproduced from 亿速云. In case of infringement, please contact admin@php.cn to have it removed.
