(image source: https://www.irasutoya.com/2019/03/blog-post_72.html)
Intro
One day I was hopping around a few popular ML libraries in Python, including MLflow. While glancing at its source code, one class caught my interest: LazyLoader, which MLflow uses in its __init__.py (it is actually mirrored from the wandb project, although the original wandb code has since changed from what MLflow uses now).
You have probably heard about the concept of lazyloading in many contexts, such as image loading on the web frontend, caching strategies, and so on. I think the essence of all those lazyloading concepts is "I am too lazy to load RIGHT NOW" - yes, the hidden words are "right now". Namely, the application loads and uses a resource only when it is needed. So here in the MLflow library, modules are loaded only when the resources in them (variables, functions, and classes) are accessed.
But HOW? This was my main interest. So I read the source code, which looked very simple at first glance. Surprisingly, however, it took a while to understand how it works, and I learned a lot from reading it. This article analyzes this MLflow source code so that we understand how such lazyloading works, using various techniques of the Python language.
Playing around with LazyLoader
For the purpose of our analysis, I created a simple package called lazyloading on my local machine, and placed modules as follows:
lazyloading/
├─ __init__.py
├─ __main__.py
├─ lazy_load.py
├─ heavy_module.py
- __init__.py: This file makes the entire directory into a package.
- __main__.py: This file is the entry point when we want to run the entire package as follows: python -m lazyloading.
- lazy_load.py: LazyLoader is in this file.
- heavy_module.py: This simulates a module that is slow to load, like one that imports heavy packages such as PyTorch:
import time

for i in range(5):
    time.sleep(1)
    print(5 - i, "seconds left before loading")

print("I am heavier than Pytorch!")

HEAVY_ATTRIBUTE = "heavy"
Next, we import this heavy_module inside __main__.py:
if __name__ == "__main__":
    from lazyloading import heavy_module
Let’s run this package and see the result:
python -m lazyloading
5 seconds left before loading
4 seconds left before loading
3 seconds left before loading
2 seconds left before loading
1 seconds left before loading
I am heavier than Pytorch!
Here we can clearly see that simply importing a heavy module (or heavy packages such as PyTorch) adds its full loading cost to the entire application, even before we use anything from it. That's why we need lazyloading here. Let's change __main__.py to look like this:
if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")
    print("nothing happens yet")

    print(heavy_module.HEAVY_ATTRIBUTE)
And the result should be:
python -m lazyloading
nothing happens yet
5 seconds left before loading
4 seconds left before loading
3 seconds left before loading
2 seconds left before loading
1 seconds left before loading
I am heavier than Pytorch!
heavy
Yes, a module wrapped by LazyLoader doesn't execute any of its code or import other packages when the LazyLoader is created. That happens only when one of the module's attributes is first accessed. This is the power of lazyloading!
How does LazyLoader work in MLflow? - source code analysis
The code itself is short and simple. I added type annotations and a few comments (lines enclosed in < >) for explanation. All the other comments are the ones in the original source code.

"""Utility to lazy load modules."""

import importlib
import sys
import types
from typing import Any, TypeVar

T = TypeVar("T")  # <this is added by me>


class LazyLoader(types.ModuleType):
    """Class for module lazy loading.

    This class helps lazily load modules at package level, which avoids pulling in large
    dependencies like `tensorflow` or `torch`. This class is mirrored from wandb's LazyLoader:
    https://github.com/wandb/wandb/blob/79b2d4b73e3a9e4488e503c3131ff74d151df689/wandb/sdk/lib/lazyloader.py#L9
    """

    _local_name: str  # <the name of the package that is used inside code>
    _parent_module_globals: dict[str, types.ModuleType]  # <importing module namespace, accessible by calling globals()>
    _module: types.ModuleType | None  # <actual module>

    def __init__(
        self,
        local_name: str,
        parent_module_globals: dict[str, types.ModuleType],
        name: Any,  # <to be used in types.ModuleType: the full package name, such as pkg.subpkg.subsubpkg>
    ):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        self._module = None

        super().__init__(str(name))

    def _load(self) -> types.ModuleType:
        """Load the module and insert it into the parent's globals."""
        if self._module:
            # If already loaded, return the loaded module.
            return self._module

        # Import the target module and insert it into the parent's namespace
        # <see https:>
        # <absolute import: importing the module itself from a package, rather than only the top-level package as __import__ does>
        # <here self.__name__ is the variable `name` in __init__>
        # <this is why `name` in __init__ must be the full module path>
        module = importlib.import_module(self.__name__)  # this automatically updates sys.modules

        # <add the name of the module to the importing namespace>
        # <so that you can use this module name as a variable inside the importing module, even if it is called in a function defined there>
        self._parent_module_globals[self._local_name] = module

        # <add the module to the list of loaded modules for caching>
        # <see https:>
        # <this makes it possible to import the cached module with the variable _local_name>
        sys.modules[self._local_name] = module

        # Update this object's dict so that if someone keeps a reference to the `LazyLoader`,
        # lookups are efficient (`__getattr__` is only called on lookups that fail).
        self.__dict__.update(module.__dict__)

        return module

    def __getattr__(self, item: T) -> T:
        module = self._load()
        return getattr(module, item)

    def __dir__(self):
        module = self._load()
        return dir(module)

    def __repr__(self):
        if not self._module:
            return f"<module '{self.__name__} (Not loaded yet)'>"
        return repr(self._module)
Now, let’s investigate the code while lazyloading our heavy_module. Since we don’t need to simulate the heaviness of the module anymore, let’s get rid of the time.sleep(1) loop part.
1. Creating an instance of LazyLoader, proxying the original module
Let’s look at __init__() of LazyLoader.
class LazyLoader(types.ModuleType):
    # …
    # code omitted
    # …

    def __init__(
        self,
        local_name: str,
        parent_module_globals: dict[str, types.ModuleType],
        name: Any,  # <to be used in types.ModuleType: the full package name, such as pkg.subpkg.subsubpkg>
    ):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        self._module = None

        super().__init__(str(name))
We provide local_name, parent_module_globals, and name to the constructor __init__(). At the moment, we are not sure what all of these mean, but at least the last line, super().__init__(str(name)), tells us that we are actually creating a module object, since LazyLoader inherits from types.ModuleType. By passing name, the module created by LazyLoader gets name as its __name__ (which is the same as the real heavy_module.__name__).
Printing out the module's __name__ confirms this:
# __main__.py
# run: python -m lazyloading
if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")
    print(heavy_module.__name__)
which gives on our terminal:
lazyloading.heavy_module
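To see the same mechanism in isolation, here is a minimal sketch with a hypothetical NamedProxy class (written for this post, not part of MLflow). Like LazyLoader, it stores extra state on the instance and then calls types.ModuleType.__init__, which is what gives the object its __name__:

```python
import types


class NamedProxy(types.ModuleType):
    """Hypothetical minimal subclass: stores extra state and names itself."""

    def __init__(self, name: str) -> None:
        self._note = "not loaded"  # extra instance state, like LazyLoader's _module
        super().__init__(name)  # ModuleType.__init__ sets __name__


proxy = NamedProxy("lazyloading.heavy_module")
print(isinstance(proxy, types.ModuleType))  # True: it is a real module object
print(proxy.__name__)  # lazyloading.heavy_module
```

Nothing is imported here: the dotted name only labels the object, which is why constructing a LazyLoader is cheap.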
However, in the constructor we only assigned values to the instance variables and gave the name of the module to this proxy module. Now, what happens when we try to access an attribute of the module?
2. Accessing an attribute - __getattribute__, __getattr__, and getattr
This is one of the fun parts of this class. What happens when we access an attribute of a Python object in general? Say we access HEAVY_ATTRIBUTE of heavy_module by writing heavy_module.HEAVY_ATTRIBUTE. From the code here, or from your own experience in Python projects, you might guess that __getattr__() is called, and that's partially correct. Look at the official docs:
Called when the default attribute access fails with an AttributeError (either __getattribute__() raises an AttributeError because name is not an instance attribute or an attribute in the class tree for self; or __get__() of a name property raises AttributeError).
(Please ignore __get__() because it is out of the scope of this post, and our LazyLoader doesn't implement __get__ either.)
So the key method here is __getattribute__. According to the docs, when we try to access an attribute, __getattribute__ is called first, and if it cannot find the attribute we're looking for, it raises AttributeError, which in turn invokes the __getattr__ defined in our code. To verify this, let's override __getattribute__ of the LazyLoader class and change __getattr__() a little bit as follows:
def __getattribute__(self, name: str) -> Any:
    try:
        print(f"__getattribute__ is called when accessing attribute '{name}'")
        return super().__getattribute__(name)
    except Exception as error:
        print(f"an error has occurred when __getattribute__() is invoked as accessing '{name}': {error}")
        raise

def __getattr__(self, item: T) -> T:
    print(f"__getattr__ is called when accessing attribute '{item}'")
    module = self._load()
    return getattr(module, item)
When we access HEAVY_ATTRIBUTE that exists in heavy_module, the result is:
if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")
    print(heavy_module.HEAVY_ATTRIBUTE)
python -m lazyloading
__getattribute__ is called when accessing attribute 'HEAVY_ATTRIBUTE'
an error has occurred when __getattribute__() is invoked as accessing 'HEAVY_ATTRIBUTE': module 'lazyloading.heavy_module' has no attribute 'HEAVY_ATTRIBUTE'
__getattr__ is called when accessing attribute 'HEAVY_ATTRIBUTE'
__getattribute__ is called when accessing attribute '_load'
__getattribute__ is called when accessing attribute '_module'
__getattribute__ is called when accessing attribute '__name__'
I am heavier than Pytorch!
__getattribute__ is called when accessing attribute '_parent_module_globals'
__getattribute__ is called when accessing attribute '_local_name'
__getattribute__ is called when accessing attribute '__dict__'
heavy
So __getattr__ is actually not called directly: __getattribute__ is called first, and it raises AttributeError because our LazyLoader instance doesn't have the attribute HEAVY_ATTRIBUTE. Only then is __getattr__() called as a fallback. Inside it we meet getattr(): getattr(module, item) looks up the attribute whose name is the string item, so here it is equivalent to module.HEAVY_ATTRIBUTE. So eventually, we access HEAVY_ATTRIBUTE in the actual module heavy_module, provided that the module variable in __getattr__() is correctly imported and returned by self._load().
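The lookup order can be reproduced without LazyLoader. This sketch uses a hypothetical Traced class that records which hook runs: an existing attribute is served by __getattribute__ alone, while a missing one makes __getattribute__ raise AttributeError, after which Python falls back to __getattr__:

```python
calls: list[str] = []


class Traced:
    """Hypothetical class that records which lookup hook runs for each attribute."""

    existing = "found normally"

    def __getattribute__(self, name: str):
        calls.append(f"__getattribute__({name})")
        return super().__getattribute__(name)  # raises AttributeError if missing

    def __getattr__(self, name: str):
        calls.append(f"__getattr__({name})")
        return f"fallback for {name}"


obj = Traced()
print(obj.existing)  # found normally
print(obj.missing)  # fallback for missing
print(getattr(obj, "missing") == obj.missing)  # True: getattr(obj, "missing") is obj.missing
print(calls)  # the recorded order of hooks
```

Only the missing attribute ever reaches __getattr__, which is exactly the path HEAVY_ATTRIBUTE takes on its first access.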
But before we move on to the _load() method, let's access HEAVY_ATTRIBUTE once again in __main__.py and run the package:
if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")
    print(heavy_module.HEAVY_ATTRIBUTE)
    print(heavy_module.HEAVY_ATTRIBUTE)
Now we see the additional logs on the terminal:
# … the same log as above
__getattribute__ is called when accessing attribute 'HEAVY_ATTRIBUTE'
heavy
It seems that __getattribute__ can now find HEAVY_ATTRIBUTE by itself in the proxy module (our LazyLoader instance). This is because (!!!spoiler alert!!!) _load copies the attributes of the actual module into the __dict__ of the LazyLoader instance. We'll get back to this in the next section.
3. Loading and caching the actual module
This section covers the core part of the post - loading the actual module in the method _load().
3-1. Module caching at the level of LazyLoader class
First, it checks whether our LazyLoader instance has already imported the module (which is reminiscent of the Singleton pattern).
if self._module:
    # If already loaded, return the loaded module.
    return self._module
3-2. Importing the actual module with importlib.import_module
Otherwise, the method imports the module named self.__name__, which was set in the __init__ constructor:

# <see https:>
# <absolute import: importing the module itself from a package, rather than only the top-level package as __import__ does>
# <here self.__name__ is the variable `name` in __init__>
# <this is why `name` in __init__ must be the full module path>
module = importlib.import_module(self.__name__)  # this automatically updates sys.modules
According to the docs of importlib.import_module, when we provide only the name string without the package argument, the name is resolved as an absolute import. Therefore, when we create a LazyLoader instance, the name argument should be the absolute module path (lazyloading.heavy_module, not heavy_module). You can run your own experiment to see that otherwise it raises ModuleNotFoundError:
if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    heavy_module = LazyLoader("heavy_module", globals(), "heavy_module")
    print(heavy_module.HEAVY_ATTRIBUTE)
# logs omitted
ModuleNotFoundError: No module named 'heavy_module'
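The distinction between absolute and relative names, and the __import__ behavior mentioned in the code comment above, can be checked with the standard json package, used here only as a convenient stand-in for lazyloading.heavy_module:

```python
import importlib

# Absolute name: resolved from the top level, no package context needed.
decoder = importlib.import_module("json.decoder")

# Relative name (leading dot): requires `package` as the anchor.
same_decoder = importlib.import_module(".decoder", package="json")
print(decoder is same_decoder)  # True: both come from the same sys.modules entry

# The __import__ built-in returns the top-level package for a dotted name.
top = __import__("json.decoder")
print(top.__name__)  # json

# A bare submodule name is treated as a top-level module and is not found.
try:
    importlib.import_module("decoder")
except ModuleNotFoundError as error:
    print(error)  # No module named 'decoder'
```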
Notably, invoking importlib.import_module(self.__name__) caches the module under the name self.__name__ in sys.modules, the interpreter-wide module cache. If you run the following lines in __main__.py
if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    heavy_module = LazyLoader("heavy_module", globals(), "lazyloading.heavy_module")

    # check whether the module is cached at the global scope
    import sys

    print("lazyloading.heavy_module" in sys.modules)

    # accessing any attribute to load the module
    heavy_module.HEAVY_ATTRIBUTE
    print("lazyloading.heavy_module" in sys.modules)
and run the package, then the logs should be:
python -m lazyloading
False
I am heavier than Pytorch!
True
This way of caching using sys.modules is related to the next two lines that also cache the module in different ways.
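The same check can be repeated in a single script. This sketch writes a throwaway module with the hypothetical name demo_heavy into a temporary directory, so it is guaranteed not to be cached yet; the second import_module call then returns the object cached in sys.modules without executing the file again:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name `demo_heavy`) into a temp directory.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_heavy.py").write_text('print("executing demo_heavy")\nVALUE = 42\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the import system sees the new file

print("demo_heavy" in sys.modules)  # False: not imported yet
first = importlib.import_module("demo_heavy")  # executes the file (prints once)
print("demo_heavy" in sys.modules)  # True: cached after the first import
second = importlib.import_module("demo_heavy")  # served from sys.modules, no print
print(first is second)  # True
```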
3-3. Caching the module with given local_name

# <add the name of the module to the importing namespace>
# <so that you can use this module name as a variable inside the importing module, even if it is called in a function defined there>
self._parent_module_globals[self._local_name] = module

# <add the module to the list of loaded modules for caching>
# <see https:>
# <this makes it possible to import the cached module with the variable _local_name>
sys.modules[self._local_name] = module

Both lines cache the module, in the dictionaries self._parent_module_globals and sys.modules respectively, but with the key self._local_name (not self.__name__). This is the value we provided as local_name when creating this proxy module instance with __init__(). But what does this caching accomplish?

First, we can use the module under the given _local_name in the "parent module"'s globals. (Judging from the parameter's name and from how MLflow uses it in its top-level __init__.py, we can infer that globals here means the dictionary returned by globals().) This means that loading the module inside a function doesn't restrict its use to that function's scope:

if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    def load_heavy_module() -> None:
        # import the module inside a function
        heavy_module = LazyLoader("heavy_module", globals(), "lazyloading.heavy_module")
        print(heavy_module.HEAVY_ATTRIBUTE)

    # loads the heavy_module inside the function's scope
    load_heavy_module()

    # the module is now in the scope of this module
    print(heavy_module)
Running the package gives:
python -m lazyloading
I am heavier than Pytorch!
heavy
<module 'lazyloading.heavy_module' from '<the path of heavy_module (a Python file)>'>
Of course, if you pass locals() as the second argument instead, you'll get a NameError (give it a try!).
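The difference between passing globals() and locals() can be sketched without LazyLoader. The hypothetical helper load_into below performs the same kind of assignment as self._parent_module_globals[self._local_name] = module, with json as a stand-in module:

```python
import importlib


def load_into(namespace: dict, local_name: str, full_name: str) -> None:
    """Hypothetical helper: import `full_name` and bind it as `local_name` in `namespace`."""
    namespace[local_name] = importlib.import_module(full_name)


def setup_with_globals() -> None:
    # globals() is the module-level namespace, even when called inside a function.
    load_into(globals(), "json_alias", "json")


def setup_with_locals() -> None:
    # locals() describes only this function's variables, so nothing leaks out.
    load_into(locals(), "json_local", "json")


setup_with_globals()
setup_with_locals()
print(json_alias.dumps([1, 2]))  # [1, 2]
print("json_local" in globals())  # False: using json_local here would raise NameError
```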
Second, we can also import the module anywhere else in the program under the given local name. Let's create another module, heavy_module_loader.py, inside the current package lazyloading:
lazyloading/
├─ __init__.py
├─ __main__.py
├─ lazy_load.py
├─ heavy_module.py
├─ heavy_module_loader.py
Note that I passed a custom name, heavy_module_local, as the local_name of the proxy module (the Python variable that holds the proxy is still called heavy_module).
# heavy_module_loader.py
from lazyloading.lazy_load import LazyLoader

heavy_module = LazyLoader("heavy_module_local", globals(), "lazyloading.heavy_module")
heavy_module.HEAVY_ATTRIBUTE
Now let __main__.py be simpler:
from lazyloading import heavy_module_loader

if __name__ == "__main__":
    import heavy_module_local

    print(heavy_module_local)
Your IDE will probably flag import heavy_module_local as an unresolved import, since no file with that name exists, but actually running the package gives us the expected result:
python -m lazyloading
I am heavier than Pytorch!
<module 'lazyloading.heavy_module' from '<the path of heavy_module (a Python file)>'>
Although MLflow seems to use the same string value for both local_name and name when creating LazyLoader instances, we can use the local_name as an alias for the actual package name, thanks to this caching mechanism.
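The aliasing relies only on the fact that an import statement checks sys.modules before searching the file system. A minimal sketch, registering the standard json module under a hypothetical alias the same way sys.modules[self._local_name] = module does:

```python
import json
import sys

# Register an existing module under a hypothetical alias.
sys.modules["json_local_alias"] = json

import json_local_alias  # resolved from sys.modules; no file with this name exists

print(json_local_alias is json)  # True
print(json_local_alias.__name__)  # json: the module keeps its real name
```

This is also why the output above shows lazyloading.heavy_module rather than heavy_module_local.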
3-4. Caching the attributes of the actual module in __dict__
# Update this object's dict so that if someone keeps a reference to the `LazyLoader`,
# lookups are efficient (`__getattr__` is only called on lookups that fail).
self.__dict__.update(module.__dict__)
In Python, the attribute __dict__ gives the dictionary of attributes of the given object. Updating this proxy module's __dict__ with the actual module's attributes means that anyone still holding a reference to the proxy can reach the real module's attributes directly. As we discussed in section 2 (Accessing an attribute - __getattribute__, __getattr__, and getattr) and as noted in the comments of the original source code, this lets __getattribute__ find the target attributes on its own, so __getattr__ (and therefore _load()) is no longer called for them.
In my view, this part is somewhat unnecessary, as we already cache modules and use them whenever their attributes are accessed. However, this could be useful when we need to debug and inspect __dict__.
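One detail worth knowing when inspecting __dict__ this way: update() copies every entry, including dunder attributes such as __name__ and __file__. A quick sketch with a bare types.ModuleType (the name my_proxy is hypothetical) and json as the stand-in real module:

```python
import json
import types

proxy = types.ModuleType("my_proxy")  # hypothetical proxy module
print(proxy.__name__)  # my_proxy

proxy.__dict__.update(json.__dict__)  # the same kind of update that _load() performs
print(proxy.__name__)  # json: dunder entries such as __name__ are overwritten too
print(proxy.__file__ == json.__file__)  # True
print(proxy.dumps is json.dumps)  # True
```

So after loading, the proxy's __dict__ looks like the real module's, which is handy when debugging.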
4. __dir__ and __repr__
Similar to __dict__, these two dunder methods might not be strictly necessary when using LazyLoader modules. However, they could be useful for debugging. __repr__ is particularly helpful as it indicates whether the module has been loaded:

def __repr__(self):
    if not self._module:
        return f"<module '{self.__name__} (Not loaded yet)'>"
    return repr(self._module)
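Here is a compact sketch that puts the pieces together. TinyLazyLoader is a hypothetical, simplified class written for this post, not MLflow's code. As an assumption of this sketch, _load() explicitly stores the imported module in self._module, so the already-loaded check and __repr__ have something to inspect. The json module stands in for a heavy module (it is most likely already in sys.modules, so nothing expensive actually runs):

```python
import importlib
import sys
import types


class TinyLazyLoader(types.ModuleType):
    """Hypothetical, simplified lazy loader in the spirit of the class analyzed above."""

    def __init__(self, local_name: str, parent_module_globals: dict, name: str) -> None:
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        self._module = None
        super().__init__(name)

    def _load(self) -> types.ModuleType:
        if self._module is None:
            module = importlib.import_module(self.__name__)
            self._parent_module_globals[self._local_name] = module
            sys.modules[self._local_name] = module
            self.__dict__.update(module.__dict__)
            self._module = module  # assumption of this sketch: remember the loaded module
        return self._module

    def __getattr__(self, item: str):
        return getattr(self._load(), item)

    def __dir__(self):
        return dir(self._load())

    def __repr__(self) -> str:
        if self._module is None:
            return f"<module '{self.__name__} (Not loaded yet)'>"
        return repr(self._module)


holder: dict = {}  # plays the role of the parent module's globals()
lazy_json = TinyLazyLoader("json_lazy", holder, "json")
print(repr(lazy_json))  # <module 'json (Not loaded yet)'>
print(lazy_json.dumps({"a": 1}))  # {"a": 1}
print(holder["json_lazy"] is sys.modules["json"])  # True
print(repr(lazy_json) == repr(sys.modules["json"]))  # True
```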
Conclusion
Although the source code itself is quite short, we covered several advanced topics, including importing modules, module scopes, and accessing object attributes in Python. Also, the concept of lazyloading is very common in computer science, but we rarely get the chance to examine how it is implemented in detail. By investigating how LazyLoader works, we learned more than we expected. Our biggest takeaway is that short code doesn’t necessarily mean easy code to analyze!