Latest open source: efficient Python universal object pooling library

王林
2023-04-17 09:04:02

In programming, most work is done by creating objects. Once an object has served its purpose, it is no longer needed and is destroyed.

Repeatedly creating and destroying objects greatly increases memory consumption. Destroyed objects also often leave residual data behind, which can lead to memory leaks.

In real-world development, programs often need to create and destroy large numbers of near-identical objects. Leaked memory then accumulates faster than the system can reclaim it and occupies more and more of the system's memory. When too many objects exist, it also becomes hard to tell which module instantiated which object, which burdens the system and complicates management and later operations. Left unchecked, this eventually slows the program down or even crashes it.

An object pool stores a batch of already-created objects and maintains them. When the program needs an object, it obtains one directly from the pool instead of instantiating a new one.
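The idea can be sketched in a few lines of Python. This is a minimal illustration, not Pond's implementation; the class name SimplePool and its methods are hypothetical.

```python
import queue


class SimplePool:
    """A tiny illustrative object pool (hypothetical, not Pond's code)."""

    def __init__(self, create, maxsize=4):
        self._create = create               # callable that builds a new object
        self._idle = queue.Queue(maxsize)   # thread-safe FIFO of idle objects
        self.created = 0                    # how many objects were instantiated

    def borrow(self):
        try:
            return self._idle.get_nowait()  # reuse an idle object if one exists
        except queue.Empty:
            self.created += 1
            return self._create()           # otherwise build a new one

    def give_back(self, obj):
        try:
            self._idle.put_nowait(obj)      # keep it for the next borrower
        except queue.Full:
            pass                            # pool is full: drop the object


pool = SimplePool(create=dict)
first = pool.borrow()
pool.give_back(first)
second = pool.borrow()   # the same object comes back; nothing new is created
```

After the second borrow, pool.created is still 1, because the returned object was reused rather than rebuilt.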

When programming, most people focus only on using objects and achieving the desired result. In fact, there is an initialization step between creation and use, but the system combines the two steps into one, which makes it easy for the designer to overlook the cost that creating and destroying objects imposes on the system.

Generally speaking, the cost of creating and destroying a single object is small enough to ignore. However, if a program creates an object many times and each creation takes relatively long, this overhead noticeably limits the speed of the system.

An object pool can be regarded as the preferred way to reduce GC pressure, and it is also the simplest.

The object pool pattern is mainly suited to the following scenarios:

  • Resource-constrained scenarios. For example, in an environment that does not need to scale (physical resources such as CPU and memory are limited), where the CPU is weak and memory is tight, garbage collection and memory churn have a relatively large impact. Memory management needs to be efficient, and responsiveness matters more than throughput.
  • Scenarios where only a limited number of objects may exist in memory.
  • Objects that are expensive to create.
  • Large numbers of short-lived objects with low initialization cost, which are pooled to reduce memory allocation and reallocation costs and to avoid memory fragmentation.
  • In a dynamic language like Python, the GC relies on reference counting to ensure that objects are not reclaimed prematurely. In some scenarios an object may sit idle after creation with nothing using it, so it gets reclaimed. Such an object can be handed to the object pool for safekeeping.

Pond Introduction

Pond is an efficient general-purpose object pool for Python, with good performance, a small memory footprint and a high hit rate. It recycles automatically based on usage frequency, which it tracks with approximate counting, and can therefore adjust the number of idle objects in each object pool automatically.

Pond was created because Python currently lacks an object pooling library with complete test cases, complete code comments and complete documentation, and because the current mainstream object pooling libraries lack a reasonably intelligent automatic recycling mechanism.

Pond may be the first community-published object pooling library in Python with complete test cases, test coverage above 90%, complete code comments and complete documentation.

Pond is inspired by Apache Commons Pool, Netty Recycler, HikariCP and Caffeine, and combines the strengths of each.

Pond uses approximate counting to track the usage frequency of each object pool in a very small amount of memory, and recycles automatically based on it.

When traffic is fairly random and evenly distributed, the default policy and weight reduce memory usage by 48.85% with a borrow hit rate of 100%.

When traffic roughly follows the 80/20 rule, the default policy and weight reduce memory usage by 45.7% with a borrow hit rate of 100%.

Design Overview

Pond consists of three main components, FactoryDict, Counter and PooledObjectTree, plus a separate recycling thread.

FactoryDict

Using Pond requires implementing the object factory PooledObjectFactory. It provides operations such as object creation, initialization, destruction and validation, which Pond calls.

To let the object pool store completely different kinds of objects, Pond uses a dictionary that maps the name of each factory class to the instance of that factory class.

Each PooledObjectFactory should provide four functions: creating an object, destroying an object, validating that an object is still usable, and resetting an object.

Notably, Pond supports resetting objects automatically. In some scenarios an object is first assigned values, passed along, and then recycled afterwards. To avoid contamination from leftover state, implementing the reset function is recommended in such scenarios.
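A rough sketch of the idea, assuming nothing about Pond's internals: a factory interface with the four operations, and a dictionary keyed by factory class name. The names ObjectFactory, ListFactory, factory_dict and register are hypothetical and are not Pond's API.

```python
from abc import ABC, abstractmethod


class ObjectFactory(ABC):
    """Hypothetical stand-in for the four operations described above."""

    @abstractmethod
    def create(self): ...

    @abstractmethod
    def destroy(self, obj): ...

    @abstractmethod
    def validate(self, obj) -> bool: ...

    @abstractmethod
    def reset(self, obj): ...


class ListFactory(ObjectFactory):
    def create(self):
        return []

    def destroy(self, obj):
        obj.clear()

    def validate(self, obj) -> bool:
        return isinstance(obj, list)

    def reset(self, obj):
        obj.clear()          # wipe values left by the previous borrower
        return obj


factory_dict = {}            # factory name -> factory instance


def register(factory, name=None):
    key = name or type(factory).__name__   # default key is the class name
    factory_dict[key] = factory
    return key


key = register(ListFactory())
buf = factory_dict[key].create()
buf.append("stale")
buf = factory_dict[key].reset(buf)         # leftover state is gone after reset
```

Keying by name is what lets one pool hold unrelated object types side by side: each type is reached only through its own factory entry.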

Counter

Counter stores an approximate counter.
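The article does not say which approximate counting algorithm Counter uses. As one possible illustration, the sketch below uses a count-min sketch, a well-known structure that estimates frequencies in fixed memory. The class CountMinSketch and its parameters are assumptions made for this example, not Pond's code.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter in fixed memory (illustrative only)."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, key):
        # One hash per row, derived from a digest salted with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(key.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key):
        for row, col in self._slots(key):
            self.rows[row][col] += 1

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._slots(key))


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("DogFactory")
sketch.add("CatFactory")
```

The estimate never undercounts: estimate("DogFactory") is at least 5 here. It may overcount when keys collide, which is the price of using a fixed amount of memory regardless of how many keys are tracked.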

PooledObjectTree

PooledObjectTree is a dictionary. Each key maps to a first-in, first-out queue, and these queues are thread-safe.

Each queue holds multiple PooledObject instances. A PooledObject stores its creation time, the time it was last lent out, and the actual object that the caller needs.
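The structure described above might look roughly like this. It is a sketch under assumptions: the wrapper class Wrapped and its field names are hypothetical and only mirror the three pieces of information the article lists.

```python
import queue
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Wrapped:
    """Hypothetical wrapper mirroring the fields described above."""
    kept: Any                                            # the object the caller needs
    created_at: float = field(default_factory=time.time)
    last_borrowed_at: float = 0.0


tree = {}                                    # factory name -> FIFO queue of idle wrappers
tree["DogFactory"] = queue.Queue(maxsize=2)
tree["DogFactory"].put(Wrapped("rex"))
tree["DogFactory"].put(Wrapped("fido"))

oldest = tree["DogFactory"].get()            # FIFO: the first one put in comes out first
oldest.last_borrowed_at = time.time()        # stamp the loan time on the way out
```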

Thread safety

Pond's borrowing and recycling are both thread-safe. Python's queue module provides a first-in, first-out (FIFO) data structure suited to multi-threaded programming, which can be used to pass messages or other data safely between producer and consumer threads.

The queue handles locking on behalf of the caller, so multiple threads can work safely and easily on the same Queue instance. Because Pond's borrowing and recycling both operate on these queues, they can basically be considered thread-safe.
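A small demonstration of that property, using only the standard library: eight threads drain one shared Queue without any locking around the queue itself. The variable names are chosen for this example.

```python
import queue
import threading

idle = queue.Queue()
for i in range(100):
    idle.put(i)

taken = []
taken_lock = threading.Lock()        # protects the plain list; the Queue needs no extra lock


def worker():
    while True:
        try:
            item = idle.get_nowait()  # Queue does its own locking internally
        except queue.Empty:
            return
        with taken_lock:
            taken.append(item)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every item is taken exactly once: sorted(taken) equals list(range(100)), with no duplicates and nothing lost.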

Lending mechanism

When Pond lends out an object, it first checks whether an object pool for the requested type already exists in PooledObjectTree. If it exists, Pond checks whether that pool is empty, and creates a new object if it is.

If the object pool has idle objects, Pond pops one from the queue and validates it. If the object is unusable, Pond automatically calls the corresponding factory to clean up and destroy it, and clears its GC count in Python so that the GC can reclaim it sooner. Pond then keeps taking the next object until it finds a usable one.

A usable object is returned directly. Whether the object is taken from the pool or newly created, Counter increments a count.
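The lending steps can be sketched as a single function. This is an illustration of the described flow, not Pond's code: the function borrow and its parameters are hypothetical, and an exact collections.Counter stands in for the approximate counter.

```python
import queue
from collections import Counter


def borrow(idle, create, validate, destroy, counter, name):
    """Return a usable object, preferring idle ones (illustrative only)."""
    counter[name] += 1                   # every borrow is counted, hit or miss
    while True:
        try:
            obj = idle.get_nowait()      # pop the oldest idle object
        except queue.Empty:
            return create()              # pool is empty: build a new one
        if validate(obj):
            return obj                   # usable: hand it out
        destroy(obj)                     # unusable: clean up, then try the next


idle = queue.Queue()
idle.put({"ok": False})                  # a broken object at the head of the queue
idle.put({"ok": True})
destroyed = []
usage = Counter()

got = borrow(idle, dict, lambda o: o["ok"], destroyed.append, usage, "DictFactory")
fresh = borrow(idle, dict, lambda o: o["ok"], destroyed.append, usage, "DictFactory")
```

The first call skips and destroys the broken object and returns the valid one. The second call finds the pool empty and falls back to creating a new object.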

Recycling mechanism

When an object is returned, Pond checks whether the target object pool exists. If it does, Pond checks whether the pool is full, and if so, the returned object is destroyed automatically.

Pond then checks whether the object has been on loan for too long. If the loan exceeded the configured maximum time, the object is also destroyed.
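The two checks can be sketched as follows. The function recycle and its parameters are hypothetical, and calling reset at return time is an assumption made for this example; the article does not say at which point Pond resets objects.

```python
import queue
import time


def recycle(idle, obj, borrowed_at, borrowed_timeout, destroy, reset, now=None):
    """Return True if obj went back into the pool (illustrative only)."""
    now = time.time() if now is None else now
    if idle.full():                               # pool already full
        destroy(obj)
        return False
    if now - borrowed_at > borrowed_timeout:      # on loan for too long
        destroy(obj)
        return False
    try:
        idle.put_nowait(reset(obj))               # clean it, then make it available again
    except queue.Full:                            # another thread filled the pool first
        destroy(obj)
        return False
    return True


idle = queue.Queue(maxsize=2)
destroyed = []
same = lambda o: o                                # no-op reset for the demo

in_time = recycle(idle, ["a"], 100.0, 2, destroyed.append, same, now=101.0)
too_late = recycle(idle, ["b"], 100.0, 2, destroyed.append, same, now=105.0)
fills_up = recycle(idle, ["c"], 100.0, 2, destroyed.append, same, now=101.0)
overflow = recycle(idle, ["d"], 100.0, 2, destroyed.append, same, now=101.0)
```

Here "b" is destroyed because it was on loan for 5 seconds against a 2-second limit, and "d" is destroyed because the pool already holds its maximum of two objects.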

Automatic recycling

Automatic recycling runs at a fixed interval, 300 s by default, and cleans up objects in the object pool that are not used frequently.
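One common way to run such a periodic job in Python is a background thread that sleeps on a threading.Event. The sketch below shows that general pattern only; the class PeriodicTask is hypothetical and is not how Pond is known to implement its recycling thread.

```python
import threading


class PeriodicTask:
    """Run `action` every `interval` seconds on a background thread."""

    def __init__(self, interval, action, daemon=True):
        self._interval = interval
        self._action = action
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=daemon)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop.wait(self._interval):
            self._action()

    def stop(self):
        self._stop.set()
        self._thread.join()


ran = threading.Event()
task = PeriodicTask(interval=0.01, action=ran.set)   # a real pool would use e.g. 300
task.start()
ran.wait(timeout=5)                                  # block until the first run happened
task.stop()
```

Waiting on an Event instead of calling time.sleep means stop() takes effect immediately rather than after the current interval ends.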

Usage

First install the Pond library and import it in your project.

pip install pondpond

from pond import Pond, PooledObjectFactory, PooledObject

Next, declare a factory class for the type of object you want to pool. In the following example the pooled object is a Dog, so we declare a PooledDogFactory class that implements PooledObjectFactory.

class Dog:
    name: str
    validate_result: bool = True


class PooledDogFactory(PooledObjectFactory):
    # Spelling kept as in the source; an override must use the exact
    # method name that PooledObjectFactory defines.
    def creatInstantce(self) -> PooledObject:
        dog = Dog()
        dog.name = "puppy"
        return PooledObject(dog)

    def destroy(self, pooled_object: PooledObject):
        del pooled_object

    def reset(self, pooled_object: PooledObject) -> PooledObject:
        # Restore the default state so the next borrower gets a clean object.
        pooled_object.keeped_object.name = "puppy"
        return pooled_object

    def validate(self, pooled_object: PooledObject) -> bool:
        return pooled_object.keeped_object.validate_result

Then you need to create the Pond object:

pond = Pond(borrowed_timeout=2,
            time_between_eviction_runs=-1,
            thread_daemon=True,
            eviction_weight=0.8)

Pond accepts the following parameters:

borrowed_timeout: in seconds, the maximum time an object may be borrowed. An object that exceeds it is destroyed automatically when returned and is not put back into the object pool.

time_between_eviction_runs: in seconds, the interval between automatic recycling runs.

thread_daemon: whether the recycling thread is a daemon thread. If True, the automatic recycling thread shuts down when the main thread exits.

eviction_weight: the weight used during automatic recycling. It is multiplied by the maximum usage frequency, and objects in object pools whose usage frequency is below the result enter the cleanup step.
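The eviction_weight rule amounts to a simple threshold. The following sketch shows the arithmetic only: the function pools_to_evict and the sample frequencies are hypothetical, and what Pond does inside the cleanup step is not covered.

```python
def pools_to_evict(frequencies, eviction_weight=0.8):
    """Names whose usage frequency is below eviction_weight * max frequency."""
    if not frequencies:
        return []
    threshold = eviction_weight * max(frequencies.values())
    return [name for name, freq in frequencies.items() if freq < threshold]


freqs = {"DogFactory": 100, "CatFactory": 85, "FishFactory": 10}
cold = pools_to_evict(freqs)       # threshold is 0.8 * 100 = 80
```

With these numbers only FishFactory falls below the threshold of 80, so it is the only pool that would enter cleanup. A lower weight makes eviction more conservative, and a higher one makes it more aggressive.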

Instantiate the factory class:

factory = PooledDogFactory(pooled_maxsize=10, least_one=False)

Every class that inherits from PooledObjectFactory gets a constructor that accepts the pooled_maxsize and least_one parameters.

pooled_maxsize: the maximum number of objects that the object pool created by this factory class can hold.

least_one: if True, automatic cleanup keeps at least one object in the object pool created by this factory class.

Register this factory object with Pond. By default, the class name of the factory will be used as the key of PooledObjectTree:

pond.register(factory)

You can also give it a custom name, which is then used as the key in PooledObjectTree:

pond.register(factory, name="PuppyFactory")

After successful registration, Pond automatically starts creating objects, up to the pooled_maxsize set on the factory, until the object pool is full.

Borrowing and returning objects:

pooled_object: PooledObject = pond.borrow(factory)
dog: Dog = pooled_object.use()
pond.recycle(pooled_object, factory)

You can also borrow and return objects by name:

pooled_object: PooledObject = pond.borrow(name="PuppyFactory")
dog: Dog = pooled_object.use()
pond.recycle(pooled_object, name="PuppyFactory")

Completely clean up an object pool:

pond.clear(factory)

Clean an object pool by name:

pond.clear(name="PuppyFactory")

Under normal circumstances these methods are all you need; object creation and recycling are fully automatic.
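Since every borrow should be paired with a recycle, one option is to wrap the pair in a context manager so that the object is returned even if an exception occurs. The helper borrowed below is a hypothetical convenience, not part of Pond. It relies only on the borrow, use and recycle calls shown above, and FakePond and FakePooled are stand-ins so that the sketch runs without the library installed.

```python
from contextlib import contextmanager


@contextmanager
def borrowed(pond, factory):
    """Borrow from `pond` and always recycle, even if the body raises."""
    pooled_object = pond.borrow(factory)
    try:
        yield pooled_object.use()
    finally:
        pond.recycle(pooled_object, factory)


class FakePooled:                      # stand-in for PooledObject
    def __init__(self, obj):
        self._obj = obj

    def use(self):
        return self._obj


class FakePond:                        # stand-in for Pond, records what was recycled
    def __init__(self):
        self.recycled = []

    def borrow(self, factory):
        return FakePooled(factory())

    def recycle(self, pooled_object, factory):
        self.recycled.append(pooled_object)


fake = FakePond()
try:
    with borrowed(fake, list) as items:
        items.append(1)
        raise RuntimeError("boom")     # the object is still recycled
except RuntimeError:
    pass
```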

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to request removal.