Do you understand how Python memory management works?

WBOY
2023-04-12 16:25:09

Python offers many conveniences to developers, one of the biggest being its virtually worry-free memory management. Developers no longer need to manually allocate, track, and free memory for objects and data structures in Python. The runtime does all this work for you, so you can focus on solving actual problems rather than wrangling machine-level details.

Still, even for inexperienced Python users, it's beneficial to understand how Python's garbage collection and memory management work. Understanding these mechanisms will help you avoid performance issues that can arise in more complex projects. You can also use Python's built-in tools to monitor your program's memory management behavior.

How Python manages memory

Every Python object has a reference count, also called a refcount. The refcount is a count of the total number of other objects that hold references to a given object. As you add or remove references to the object, the number goes up or down. When an object's reference count reaches zero, the object is deallocated and its memory is freed.

What is a reference? Anything that allows access to an object, whether by name or through an accessor in another object.

Here's a simple example:

x = "Hello there"

When we issue this command to Python, two things happen under the hood:

  1. The string "Hello there" is created as a Python object and stored in memory.
  2. The name x is created in the local namespace and points to the object, which raises its reference count to 1.

If we then say y = x, the reference count increases again, to 2.

Whenever x or y goes out of scope or is removed from its namespace, the string's reference count is reduced by 1 for each name. Once both x and y have gone out of scope or been deleted, the string's reference count reaches 0 and the string is deleted.
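The counting described above can be observed with sys.getrefcount(). The following is a minimal sketch that assumes CPython; it uses a fresh object() rather than a string literal, because literals can be shared with the interpreter's own constants. The function name refcount_steps is made up for this illustration, and only the differences between readings are compared, since the absolute number includes a temporary reference held by the call itself.

```python
import sys

def refcount_steps():
    """Return how the reported refcount changes as names are bound and unbound."""
    obj = object()                     # a fresh object that nothing else shares
    base = sys.getrefcount(obj)        # baseline reading
    alias = obj                        # bind a second name to the same object
    after_bind = sys.getrefcount(obj)
    del alias                          # unbind the second name again
    after_del = sys.getrefcount(obj)
    return after_bind - base, after_del - base

print(refcount_steps())
```

On CPython the first number should be 1 (one extra reference while alias exists) and the second 0 (back to the baseline once alias is deleted).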

Now, suppose we create a list containing a string as follows:

x = ["Hello there", 2, False]

The string remains in memory until the list itself is deleted or the element containing the string is removed from the list. Either of these operations makes the only thing holding a reference to the string disappear.

Now consider this example:

x = "Hello there"
y = [x]

If we remove the first element from y, or delete the list y entirely, the string is still in memory. This is because the name x holds a reference to it.
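A rough way to watch this happen is with a weak reference, which observes an object without keeping it alive. The sketch below assumes CPython, where an object is freed as soon as its last reference goes away. Plain str objects cannot be weakly referenced, so a small stand-in class named Greeting (hypothetical, not part of the example above) takes the string's place.

```python
import weakref

class Greeting:
    """Stand-in for the string; plain str objects cannot be weakly referenced."""
    def __init__(self, text):
        self.text = text

def list_and_name_demo():
    x = Greeting("Hello there")
    probe = weakref.ref(x)             # observes the object without keeping it alive
    y = [x]                            # the list holds a second reference
    del y[0]                           # remove the element from the list
    alive_after_list_removal = probe() is not None
    del x                              # drop the last remaining reference
    alive_after_name_removal = probe() is not None
    return alive_after_list_removal, alive_after_name_removal

print(list_and_name_demo())
```

The object survives removal from the list because the name x still refers to it, and it goes away only once x is deleted as well.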

Reference Cycles in Python

In most cases, reference counting works fine. But sometimes you encounter a situation where two objects each hold a reference to the other. This is called a reference cycle. In this case, the objects' reference counts never reach zero, and reference counting alone never removes them from memory.

This is a contrived example:

x = SomeClass()
y = SomeOtherClass()
x.item = y
y.item = x

Since x and y hold references to each other, reference counting never removes them from the system, even if nothing else holds a reference to either of them.
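The following sketch, assuming CPython, makes the contrived example runnable. The class Node is hypothetical, and automatic collection is switched off for the duration so that only reference counting is at work until gc.collect() is called explicitly (the cycle collector itself is described in the next section).

```python
import gc
import weakref

class Node:
    """Hypothetical class used only for this illustration."""
    item = None

def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()                       # keep the automatic collector out of the way
    try:
        x, y = Node(), Node()
        x.item, y.item = y, x          # each object now references the other
        probe = weakref.ref(x)
        del x, y                       # no names are left, but the refcounts stay above zero
        alive_before = probe() is not None
        gc.collect()                   # the cycle detector finds and frees the pair
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after

print(cycle_demo())
```

The pair outlives its names under reference counting alone and disappears only after a collection pass.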

It's actually quite common for Python's own runtime to generate reference cycles for objects. An example is an exception with a traceback object that contains a reference to the exception itself.

In earlier versions of Python, this was a problem. Objects with reference cycles can accumulate over time, which is a big problem for long-running applications. But Python has since introduced cycle detection and garbage collection systems to manage reference cycles.

Python Garbage Collector (gc)

Python's garbage collector detects objects with reference cycles. It does this by keeping track of objects that are "containers" (e.g. lists, dictionaries, custom class instances) and determining which of them cannot be reached from anywhere else.

Once these objects are picked out, the garbage collector deletes them by ensuring that their reference counts can safely drop to zero.

The vast majority of Python objects have no reference cycles, so the garbage collector does not need to run 24/7. Instead, the garbage collector uses some heuristics to run less frequently and run as efficiently as possible every time.

When the Python interpreter starts, it keeps track of the number of objects that have been allocated but not freed. The vast majority of Python objects are short-lived, so they appear and disappear quickly. But over time, more long-lived objects will emerge. Once more than a certain number of such objects accumulate, the garbage collector runs.

Every time the garbage collector runs, it takes all the objects that survived the collection and places them together in a group called a generation. These "first generation" objects are scanned for reference cycles less frequently. Any first-generation objects that survive a further collection are eventually migrated to a second generation, where they are scanned even less often.
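The thresholds and counters behind this scheme can be inspected from the gc module. A small sketch follows; the exact numbers, and how the collector interprets them internally, vary between Python versions, so no particular output is assumed here.

```python
import gc

thresholds = gc.get_threshold()        # allocation thresholds that trigger collections
counts = gc.get_count()                # current counters, one per generation

print("thresholds:", thresholds)
print("counts:    ", counts)
```

Both calls return a tuple of three integers. gc.set_threshold() accepts new values if you ever need to change how often collections run.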

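Likewise, the garbage collector does not track everything. Complex objects, such as instances of user-created classes, are always tracked. But a dictionary that holds only simple objects, such as integers and strings, is not tracked, because nothing in that particular dictionary holds a reference to other objects. Simple objects that cannot hold references to other objects, such as integers and strings, are never tracked.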
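You can check what is tracked with gc.is_tracked(). The sketch below assumes CPython; the class Point is hypothetical. Note that the result for a dictionary holding only simple values has differed between CPython versions, so treat that line as something to observe on your own interpreter rather than a guarantee.

```python
import gc

class Point:
    """Hypothetical user-defined class for the illustration."""
    def __init__(self, x, y):
        self.x, self.y = x, y

tracked = {
    "int": gc.is_tracked(42),
    "str": gc.is_tracked("Hello there"),
    "list": gc.is_tracked([1, 2, 3]),
    "instance": gc.is_tracked(Point(1, 2)),
    "dict of ints": gc.is_tracked({"a": 1}),   # result depends on the CPython version
}

for name, flag in tracked.items():
    print(f"{name:>12}: {flag}")
```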

How to use the gc module

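Generally, the garbage collector works well without tuning. Python's development team chose defaults that reflect the most common real-world scenarios. But if you do need to adjust how garbage collection works, you can use Python's gc module. The gc module provides a programmatic interface to the garbage collector's behavior, and visibility into which objects are being tracked.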

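One useful thing you can do with gc is turn off the garbage collector when you are sure you won't need it. For example, if you have a short-running script that piles up a lot of objects, you don't need the garbage collector; everything is cleaned up when the script ends. To do this, disable the garbage collector with gc.disable(). Later, you can re-enable it with gc.enable().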

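You can also run a collection cycle manually with gc.collect(). A common use is managing a performance-intensive section of your program that generates many temporary objects. You can disable garbage collection during that section, then run a collection manually at the end and re-enable collection.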
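One way to package the disable, collect, and re-enable pattern is a small context manager. This is only a sketch: gc_paused and build_temporaries are names invented for the illustration, not part of the gc module.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Disable automatic collection inside the block, then collect once and restore."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        gc.collect()                   # one manual pass over whatever piled up meanwhile
        if was_enabled:
            gc.enable()                # restore only if collection was on before

def build_temporaries(n):
    """Hypothetical hot section that creates many short-lived containers."""
    return sum(len([i, i + 1]) for i in range(n))

with gc_paused():
    inside = gc.isenabled()            # False while the block runs
    total = build_temporaries(10_000)

print(inside, total)
```

Remembering the previous state means the helper does not accidentally switch collection on in a program that had deliberately turned it off.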

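Another useful garbage collection optimization is gc.freeze(). When you issue this command, everything currently tracked by the garbage collector is "frozen," or marked as exempt from future collection scans, so those scans can skip these objects. If your program imports libraries and sets up a lot of internal state before it starts its real work, you can call gc.freeze() once all that setup is done. This spares the garbage collector from sweeping over things that are unlikely to be removed anyway. (If you want garbage collection performed on frozen objects again, use gc.unfreeze().)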
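A minimal sketch of the freeze and unfreeze round trip, assuming Python 3.7 or later (where gc.freeze() was introduced). The function name freeze_after_startup is hypothetical; gc.get_freeze_count() reports how many objects are currently frozen.

```python
import gc

def freeze_after_startup():
    """Freeze everything currently tracked and report the counts before and after."""
    gc.collect()                       # clear existing garbage before freezing
    gc.freeze()                        # exempt all currently tracked objects from scans
    frozen = gc.get_freeze_count()
    gc.unfreeze()                      # undo it, so this demo leaves no lasting effect
    remaining = gc.get_freeze_count()
    return frozen, remaining

frozen, remaining = freeze_after_startup()
print(f"frozen: {frozen}, after unfreeze: {remaining}")
```

In a real program you would call gc.freeze() once after startup and leave the objects frozen; the unfreeze step is here only to keep the demonstration side-effect free.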

Using gc to debug garbage collection

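You can also use gc to debug garbage collection behavior. If too many objects are piling up in memory without being garbage collected, you can use gc's inspection tools to find out what might be holding references to those objects.

If you want to know which objects hold a reference to a given object, you can list them with gc.get_referrers(obj). You can also use gc.get_referents(obj) to find any objects that a given object refers to.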
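A short sketch of both calls, using a hypothetical Session object that is quietly held by a list named registry (both names are invented for the illustration):

```python
import gc

class Session:
    """Hypothetical object we suspect is being kept alive."""

session = Session()
registry = [session]                   # a container that quietly holds a reference

referrers = gc.get_referrers(session)  # objects that refer to `session`
referents = gc.get_referents(registry) # objects that `registry` refers to

held_by_registry = any(r is registry for r in referrers)
registry_points_to_session = any(r is session for r in referents)

print(held_by_registry, registry_points_to_session)
```

The list of referrers usually contains more than you expect, for instance the namespace dictionary in which the name session lives, so it helps to filter for the containers you care about.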

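If you are not sure whether a given object is a candidate for garbage collection, gc.is_tracked(obj) tells you whether the garbage collector tracks it. As noted earlier, keep in mind that the garbage collector does not track "atomic" objects (such as integers) or containers that hold only atomic objects.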

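If you want to see for yourself which objects are being collected, you can set the garbage collector's debugging flags with gc.set_debug(gc.DEBUG_LEAK | gc.DEBUG_STATS). This writes information about garbage collection to stderr. It also keeps every object collected as garbage in the gc.garbage list, where you can inspect it.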
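The saving behavior comes from the gc.DEBUG_SAVEALL flag, which is one of the flags included in gc.DEBUG_LEAK. The sketch below uses that flag on its own to avoid writing to stderr, and assumes CPython; the Node class and the function name are hypothetical.

```python
import gc

class Node:
    """Hypothetical class used to build a throwaway cycle."""
    other = None

def saved_garbage_demo():
    old_flags = gc.get_debug()
    gc.collect()                       # start from a clean slate
    gc.set_debug(gc.DEBUG_SAVEALL)     # keep unreachable objects instead of freeing them
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a        # build a cycle
        del a, b
        gc.collect()
        found = sum(isinstance(obj, Node) for obj in gc.garbage)
    finally:
        gc.set_debug(old_flags)        # restore the previous debugging flags
        gc.garbage.clear()             # release the saved objects
    return found

print(saved_garbage_demo())
```

Both Node instances should show up in gc.garbage. Clearing the list afterwards matters, because anything left in it stays alive.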

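Avoiding pitfalls in Python memory management

As noted earlier, objects can pile up in memory without being collected if you still hold references to them somewhere. This is not a failure of Python's garbage collection as such; the garbage collector has no way to tell whether you kept a reference to something by accident.

Let's end with a few pointers for keeping objects from escaping collection.

Be mindful of object scope

If you assign object 1 as a property of object 2 (such as a class instance), object 2 must go out of scope before object 1 can: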

obj1 = MyClass()
obj2.prop = obj1

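What's more, if this happens as a side effect of some other operation, such as passing object 2 as an argument to the constructor for object 1, you might not realize that object 1 is holding a reference: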

obj1 = MyClass(obj2)

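Another example: if you push an object into a module-level list and forget about the list, the object stays around until it is removed from the list, or until the list itself no longer has any references. But because the list is a module-level object, it will likely persist until the program terminates.

In short, stay aware of the ways your objects might be held by another object, which are not always obvious.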
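Both situations can be combined in one sketch, assuming CPython's immediate deallocation. Config, Worker, and _registry are hypothetical names: the constructor stores its argument, and a module-level list keeps the resulting object alive.

```python
import weakref

class Config:
    """Hypothetical object we expect to disappear."""

class Worker:
    def __init__(self, config):
        self.config = config           # the constructor quietly stores a reference

_registry = []                         # module-level list that lives until the program exits

def hidden_reference_demo():
    config = Config()
    probe = weakref.ref(config)
    worker = Worker(config)
    _registry.append(worker)
    del config, worker                 # the local names are gone...
    still_alive = probe() is not None  # ...but registry -> worker -> config remains
    _registry.clear()                  # dropping the registry entry releases both
    released = probe() is None
    return still_alive, released

print(hidden_reference_demo())
```

Deleting the names you can see is not enough; the object is released only once the chain through the list and the worker is broken.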

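Use weakref to avoid reference cycles

Python's weakref module lets you create weak references to other objects. Weak references do not increase an object's reference count, so an object that has only weak references is a candidate for garbage collection.

A common use for weakref is an object cache. You don't want the cached object to be kept alive merely because it has a cache entry, so you use a weakref for the cache entry.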
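A cache of this kind can be sketched with weakref.WeakValueDictionary, whose entries vanish when their values are no longer referenced elsewhere. The Image class and the load_image helper are hypothetical, and the immediate eviction shown relies on CPython's reference counting.

```python
import weakref

class Image:
    """Hypothetical expensive-to-build object."""
    def __init__(self, name):
        self.name = name

_cache = weakref.WeakValueDictionary() # entries do not keep their values alive

def load_image(name):
    image = _cache.get(name)
    if image is None:
        image = Image(name)            # stand-in for the expensive work
        _cache[name] = image
    return image

def cache_demo():
    first = load_image("logo")
    second = load_image("logo")
    same_object = first is second      # served from the cache while a strong reference exists
    del first, second
    evicted = "logo" not in _cache     # the entry vanishes with the last strong reference
    return same_object, evicted

print(cache_demo())
```

The cache speeds up repeated lookups for as long as some other part of the program is using the object, without ever being the reason the object stays in memory.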

Manual Breaking of Reference Cycles

Finally, if you know that a given object contains a reference to another object, you can always manually break a reference to that object. For example, if you have instance_of_class.ref = other_object, you can set instance_of_class.ref = None when ready to delete instance_of_class.
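The effect of breaking a cycle by hand can be sketched as follows, assuming CPython. The Holder class is hypothetical, and the cycle collector is switched off during the demo to show that reference counting alone is enough once the cycle is gone.

```python
import gc
import weakref

class Holder:
    """Hypothetical class with a single outgoing reference."""
    ref = None

def manual_break_demo():
    was_enabled = gc.isenabled()
    gc.disable()                       # rule out help from the cycle collector
    try:
        a, b = Holder(), Holder()
        a.ref, b.ref = b, a            # a and b form a cycle
        probe = weakref.ref(a)
        a.ref = None                   # break the cycle by hand
        del a, b
        freed = probe() is None        # True: plain reference counting freed both
    finally:
        if was_enabled:
            gc.enable()
    return freed

print(manual_break_demo())
```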

Understanding how Python memory management works shows how its garbage collection system helps optimize memory in Python programs, and how you can control memory usage and garbage collection using modules provided by the standard library and elsewhere.

Original title: Python garbage collection and the gc module

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn for removal.