搜索
首页后端开发Python教程[Python] 如何延迟加载 Python 模块? - 从 MLflow 分析 LazyLoader

[Python] How do we lazyload a Python module? - analyzing LazyLoader from MLflow

(image source: https://www.irasutoya.com/2019/03/blog-post_72.html)

Intro

One day I was hopping around a few popular ML libraries in Python, including MLflow. While glancing at its source code, one class attracted my interest, LazyLoader in __init__.py (well, this actually mirrors from the wandb project, but the original code has changed from what MLflow is using now, as you can see).

You probably heard about the concept of lazyloading from many contexts, such as web frontend image loading, caching strategy, and so on. I think the essence of all those lazyloading concepts is, that "I am too lazy to load RIGHT NOW" - yes, the hidden words "right now". Namely, the application will load and use that resource only when it is needed. So here in this MLflow library, the modules are loaded only when the resources in it — variables, functions, and classes — are accessed.

But HOW? This was my main interest. So I read the source code, which looked very simple at first glance. However, surprisingly, it took a bit of time to understand how it works, and I learned a lot from reading the code. This article is about analyzing this source code of MLflow so that we understand how such lazyloading works using various techniques of Python language.

Playing around with LazyLoader

For the purpose of our analysis, I created a simple package called lazyloading on my local machine, and placed modules as follows:


lazyloading/
├─ __init__.py
├─ __main__.py
├─ lazy_load.py
├─ heavy_module.py


  • __init__.py: This file makes the entire directory into a package.
  • __main__.py: This file is the entry point when we want to run the entire package as follows: python -m lazyloading.
  • lazy_load.py: LazyLoader is in this file.
  • heavy_module.py: This represents a module with heavy packages to be loaded (such as PyTorch) for a simulation:

import time

for i in range(5):
    time.sleep(1)
    print(5 - i, " seconds left before loading")

print("I am heavier than Pytorch!")

HEAVY_ATTRIBUTE = "heavy”


Next, we import this heavy_module inside __main__.py:


if __name__ == "__main__":
    from lazyloading import heavy_module 


Let’s run this package and see the result:


python -m lazyloading
5  seconds left before loading
4  seconds left before loading
3  seconds left before loading
2  seconds left before loading
1  seconds left before loading
I am heavier than pytorch!


Here we can clearly see that if we simply import heavy packages such as PyTorch, it could be an overhead for the entire application. That’s why we need lazyloading here. Let’s change __main__.py to look like this:


if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader
    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")
    print("nothing happens yet")
    print(heavy_module.HEAVY_ATTRIBUTE)


And the result should be:


python -m lazyloading
nothing happens yet
5  seconds left before loading
4  seconds left before loading
3  seconds left before loading
2  seconds left before loading
1  seconds left before loading
heavy


Yes, any module imported by LazyLoader doesn’t need to execute any script or import other packages. It happens only when any attribute of the module is accessed. This is the power of lazyloading!

How LazyLoader works in MLflow? - source code analysis

The code itself is short and simple. I added type annotations and a few comments (lines enclosed in ) for explanations. All the other comments are the ones in the original source code.


"""Utility to lazy load modules."""
import importlib
import sys
import types

from typing import Any, TypeVar

T = TypeVar("T") # <this is added by me>

class LazyLoader(types.ModuleType):
    """Class for module lazy loading.

    This class helps lazily load modules at package level, which avoids pulling in large
    dependencies like `tensorflow` or `torch`. This class is mirrored from wandb's LazyLoader:
    https://github.com/wandb/wandb/blob/79b2d4b73e3a9e4488e503c3131ff74d151df689/wandb/sdk/lib/lazyloader.py#L9
    """

    _local_name: str # <the name of the package that is used inside code>
    _parent_module_globals: dict[str, types.ModuleType] # <importing module namespace accessible by calling globals>
    _module: types.ModuleType | None # <actual module>

    def __init__(
        self, 
        local_name: str, 
        parent_module_globals: dict[str, types.ModuleType], 
        name: Any # <to be used in types.moduletype the full package name as pkg.subpkg.subsubpkg>
    ):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        self._module = None

        super().__init__(str(name)) 

    def _load(self) -> types.ModuleType:
        """Load the module and insert it into the parent's globals."""
        if self._module:
            # If already loaded, return the loaded module.
            return self._module

        # Import the target module and insert it into the parent's namespace

        # <see https:>
        # <absolute import importing the module itself from a package rather than top-level only __import__>
        # <here self.__name__ is the variable in __init__>
        # <this is why that in __init__ must be the full module path>
        module = importlib.import_module(self.__name__) # this automatically updates sys.modules

        # <add the name of module to importing namespace>
        # <so that you can use this module name as a variable inside the importing even if it is called function defined in>
        self._parent_module_globals[self._local_name] = module

        # <add the module to list of loaded modules for caching>
        # <see https:>
        # <this makes possible to import cached module with the variable _local_name sys.modules update this object dict so that if someone keeps a reference lookups are efficient is only called on fail self.__dict__.update return def __getattr__ item: t> T:
        module = self._load()
        return getattr(module, item)

    def __dir__(self):
        module = self._load()
        return dir(module)

    def __repr__(self):
        if not self._module:
            return f"<module loaded yet>"
        return repr(self._module)


</module></this></see></add></so></add></this></here></absolute></see></to></actual></importing></the></this>

Now, let’s investigate the code while lazyloading our heavy_module. Since we don’t need to simulate the heaviness of the module anymore, let’s get rid of the time.sleep(1) loop part.

1. Creating an instance of LazyLoader, proxying the original module

Let’s look at __init__() of LazyLoader.


class LazyLoader(types.ModuleType):
    # …
    # code omitted
    # …

    def __init__(
        self, 
        local_name: str, 
        parent_module_globals: dict[str, types.ModuleType], 
        name: Any # <to be used in types.moduletype the full package name as pkg.subpkg.subsubpkg>
    ):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        self._module = None

        super().__init__(str(name)) 


</to>

We provide local_name, parent_module_globals, and name to the constructor __init__(). At the moment, we are not sure what all those means, but at least the last line indicates that we are actually generating a module - super().__init__(str(name)), since LazyLoader inherits types.ModuleType. By providing the variable name, our module created by LazyLoader is recognized as a module with name name(which is the same as heavy_module.__name__).

Printing out the module itself proves this:


# __main__.py
# run python -m lazyloading

if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader
    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")

    print(heavy_module.__name__)


which gives on our terminal:


lazyloading.heavy_module


However, in the constructor we only assigned values to the instance variables and gave the name of the module to this proxy module. Now, what happens when we try to access an attribute of the module?

2. Accessing an attribute - __getattribute__, __getattr__, and getattr

This is one of the fun parts of this class. What happens when we access an attribute of a Python object in general? Say we access HEAVY_ATTRIBUTE of heavy_module by calling heavy_module.HEAVY_ATTRIBUTE. From the code here, or from your own experience in several Python projects, you might guess that __getattr__() is called, and that’s partially correct. Look at the official docs:

Called when the default attribute access fails with an AttributeError (either getattribute() raises an AttributeError because name is not an instance attribute or an attribute in the class tree for self; or get of a name property raises AttributeError).

(Please ignore __get__ because it is out of scope of this post, and our LazyLoader doesn’t implement __get__ either).

So __getattribute__() the key method here is __getattribute__. According to the docs, when we try to access an attribute, __getattribute__ will be called first, and if the attribute we’re looking for cannot be found by __getattribute__, AttributeError will be raised, which will in turn invoke our __getattr__ in the code. To verify this, let’s override __getattribute__ of the LazyLoader class, and change __getattr__() a little bit as follows:


def __getattribute__(self, name: str) -> Any:
    try:
        print(f"__getattribute__ is called when accessing attribute '{name}'")
        return super().__getattribute__(name)

    except Exception as error:
        print(f"an error has occurred when __getattribute__() is invoked as accessing '{name}': {error}")
        raise

def __getattr__(self, item: T) -> T:
    print(f"__getattr__ is called when accessing attribute '{item}'")
    module = self._load()
    return getattr(module, item)


When we access HEAVY_ATTRIBUTE that exists in heavy_module, the result is:


if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader
    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")

    print(heavy_module.HEAVY_ATTRIBUTE)



python -m lazyloading
__getattribute__ is called when accessing attribute 'HEAVY_ATTRIBUTE'
an error has occurred when __getattribute__() is invoked as accessing 'HEAVY_ATTRIBUTE': module 'lazyloading.heavy_module' has no attribute 'HEAVY_ATTRIBUTE'
__getattr__ is called when accessing attribute 'HEAVY_ATTRIBUTE'
__getattribute__ is called when accessing attribute '_load'
__getattribute__ is called when accessing attribute '_module'
__getattribute__ is called when accessing attribute '__name__'
I am heavier than Pytorch!
__getattribute__ is called when accessing attribute '_parent_module_globals'
__getattribute__ is called when accessing attribute '_local_name'
__getattribute__ is called when accessing attribute '__dict__'
heavy


So __getattr__ is actually not called directly, but __getattribute__ is called first, and it raises AttributeError because our LazyLoader instance doesn’t have attribute HEAVY_ATTRIBUTE. Now __getattr__() is called as a failover. Then we meet getattr(), but this code line getattr(module, item) is equivalent to code module.item in Python. So eventually, we access the HEAVY_ATTRIBUTE in the actual module heavy_module, if module variable in __getattr__() is correctly imported and returned by self._load().

But before we move on to investigating _load() method, let’s call HEAVY_ATTRIBUTE once again in __main__.py and run the package:


if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader
    heavy_module = LazyLoader("lazyloading.heavy_module", globals(), "lazyloading.heavy_module")

    print(heavy_module.HEAVY_ATTRIBUTE)
    print(heavy_module.HEAVY_ATTRIBUTE)


Now we see the additional logs on the terminal:


# … the same log as above
__getattribute__ is called when accessing attribute 'HEAVY_ATTRIBUTE'
heavy


It seems that __getattribute__ can access HEAVY_ATTRIBUTE now inside the proxy module(our LazyLoader instance). This is because(!!!spoiler alert!!!) _load caches the accessed attribute in __dict__ attribute of the LazyLoader instance. We’ll get back to this in the next section.

3. Loading and caching the actual module

This section covers the core part the post - loading the actual module in the function _load().

3-1. Module caching at the level of LazyLoader class

First, it checks whether our LazyLoader instance has already imported the module before (which reminds us of the Singleton pattern).


if self._module:
    # If already loaded, return the loaded module.
    return self._module


3-2. Importing the actual module with importlib.import_module

Otherwise, the method tries to import the module named __name__, which we saw in the __init__ constructor:


# <see https:>
# <absolute import importing the module itself from a package rather than top-level only __import__>
# <here self.__name__ is the variable in __init__>
# <this is why that in __init__ must be the full module path>
module = importlib.import_module(self.__name__) # this automatically updates sys.modules


</this></here></absolute></see>

According to the docs of importlib.import_module, when we don’t provide the pkg argument and only the path string, the function tries to import the package in the absolute manner. Therefore, when we create a LazyLoader instance, the name argument should be the absolute term. You can run your own experiment to see it raises ModuleNotFoundError:


if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader
    heavy_module = LazyLoader("heavy_module", globals(), "heavy_module")

    print(heavy_module.HEAVY_ATTRIBUTE)



# logs omitted
ModuleNotFoundError: No module named 'heavy_module'


Notably, invoking importlib.import_module(self.__name__) caches the module with name self.__name__ in the global scope. If you run the following lines in __main__.py


if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader
    heavy_module = LazyLoader("heavy_module", globals(), "lazyloading.heavy_module")

    # check whether the module is cached at the global scope
    import sys
    print("lazyloading.heavy_module" in sys.modules)

    # accessing any attribute to load the module
    heavy_module.HEAVY_ATTRIBUTE

    print("lazyloading.heavy_module" in sys.modules)


and run the package, then the logs should be:


python -m lazyloading
False
I am heavier than Pytorch!
True


This way of caching using sys.modules is related to the next two lines that also cache the module in different ways.

3-3. Caching the module with given local_name


# <add the name of module to importing namespace>
# <so that you can use this module name as a variable inside the importing even if it is called function defined in>
self._parent_module_globals[self._local_name] = module

# <add the module to list of loaded modules for caching>
# <see https:>
# <this makes possible to import cached module with the variable _local_name sys.modules>

<p>Both lines cache the module in the dictionaries self._parent_module_globals and sys.modules respectively, but with the key self._local_name(not self.__name__). This is the variable we provided as local_name when creating this proxy module instance with __init__(). But what does this caching accomplish?</p>

<p>First, we can use the module with the given _local_name in the "parent module"’s globals(from the parameter’s name and seeing how MLflow uses in its uppermost __init__.py, we can infer that here the word <em>globals</em> means (globals()). This means that importing the module inside a function doesn’t limit the module to be used outside the function’s scope:</p>

<pre class="brush:php;toolbar:false">

if __name__ == "__main__":
    from lazyloading.lazy_load import LazyLoader

    def load_heavy_module() -> None:
        # import the module inside a function
        heavy_module = LazyLoader("heavy_module", globals(), "lazyloading.heavy_module")
        print(heavy_module.HEAVY_ATTRIBUTE)

    # loads the heavy_module inside the function's scope
    load_heavy_module()

    # the module is now in the scope of this module
    print(heavy_module)


Running the package gives:


python -m lazyloading
I am heavier than Pytorch!
heavy
<module from> # the path of the heavy_module(a Python file)


</module>

Of course, if you provide the second argument locals(), then you’ll get NameError(give it a try!).

Second, we can also import the module in any other place inside the whole package with the given local name. Let’s create another module heavy_module_loader.py inside the current package lazyloading :


lazyloading/
├─ __init__.py
├─ __main__.py
├─ lazy_load.py
├─ heavy_module.py
├─ heavy_module_loader.py


Note that I used a custom name heavy_module_local for the local variable name of the proxy module.


# heavy_module_loader.py

from lazyloading.lazy_load import LazyLoader

heavy_module = LazyLoader("heavy_module_local", globals(), "lazyloading.heavy_module")
heavy_module.HEAVY_ATTRIBUTE


Now let __main__.py be simpler:


from lazyloading import heavy_module_loader

if __name__ == "__main__":
    import heavy_module_local
    print(heavy_module_local)


Your IDE will probably alert this line as having a syntax error, but actually running it will give us the expected result:


python -m lazyloading
I am heavier than Pytorch!
<module from> # the path of the heavy_module(a Python file)


</module>

Although MLflow seems to use the same string value for both local_name and name when creating LazyLoader instances, we can use the local_name as an alias for the actual package name, thanks to this caching mechanism.

3-4. Caching the attributes of the actual module in __dict__


# Update this object's dict so that if someone keeps a reference to the `LazyLoader`,
# lookups are efficient (`__getattr__` is only called on lookups that fail).
self.__dict__.update(module.__dict__)


In Python, the attribute __dict__ gives the dictionary of attributes of the given object. Updating this proxy module’s attributes with the actual module’s ones makes the user easier to access the attributes of the real one. As we discussed in section 2(2. Accessing an attribute - __getattribute__, __getattr__, and getattr) and noted in the comments of the original source code, this allows __getattribute__ and __getattr__ to directly access the target attributes.

In my view, this part is somewhat unnecessary, as we already cache modules and use them whenever their attributes are accessed. However, this could be useful when we need to debug and inspect __dict__.

4. __dir__ and __repr__

Similar to __dict__, these two dunder functions might not be strictly necessary when using LazyLoader modules. However, they could be useful for debugging. __repr__ is particularly helpful as it indicates whether the module has been loaded.


<p>if not self.<em>module</em>:<br>
    return f"<module>name_} (Not loaded yet)'>"<br>
return repr(self._module)</module></p>




Conclusion

Although the source code itself is quite short, we covered several advanced topics, including importing modules, module scopes, and accessing object attributes in Python. Also, the concept of lazyloading is very common in computer science, but we rarely get the chance to examine how it is implemented in detail. By investigating how LazyLoader works, we learned more than we expected. Our biggest takeaway is that short code doesn’t necessarily mean easy code to analyze!

以上是[Python] 如何延迟加载 Python 模块? - 从 MLflow 分析 LazyLoader的详细内容。更多信息请关注PHP中文网其他相关文章!

声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn
Python和时间:充分利用您的学习时间Python和时间:充分利用您的学习时间Apr 14, 2025 am 12:02 AM

要在有限的时间内最大化学习Python的效率,可以使用Python的datetime、time和schedule模块。1.datetime模块用于记录和规划学习时间。2.time模块帮助设置学习和休息时间。3.schedule模块自动化安排每周学习任务。

Python:游戏,Guis等Python:游戏,Guis等Apr 13, 2025 am 12:14 AM

Python在游戏和GUI开发中表现出色。1)游戏开发使用Pygame,提供绘图、音频等功能,适合创建2D游戏。2)GUI开发可选择Tkinter或PyQt,Tkinter简单易用,PyQt功能丰富,适合专业开发。

Python vs.C:申请和用例Python vs.C:申请和用例Apr 12, 2025 am 12:01 AM

Python适合数据科学、Web开发和自动化任务,而C 适用于系统编程、游戏开发和嵌入式系统。 Python以简洁和强大的生态系统着称,C 则以高性能和底层控制能力闻名。

2小时的Python计划:一种现实的方法2小时的Python计划:一种现实的方法Apr 11, 2025 am 12:04 AM

2小时内可以学会Python的基本编程概念和技能。1.学习变量和数据类型,2.掌握控制流(条件语句和循环),3.理解函数的定义和使用,4.通过简单示例和代码片段快速上手Python编程。

Python:探索其主要应用程序Python:探索其主要应用程序Apr 10, 2025 am 09:41 AM

Python在web开发、数据科学、机器学习、自动化和脚本编写等领域有广泛应用。1)在web开发中,Django和Flask框架简化了开发过程。2)数据科学和机器学习领域,NumPy、Pandas、Scikit-learn和TensorFlow库提供了强大支持。3)自动化和脚本编写方面,Python适用于自动化测试和系统管理等任务。

您可以在2小时内学到多少python?您可以在2小时内学到多少python?Apr 09, 2025 pm 04:33 PM

两小时内可以学到Python的基础知识。1.学习变量和数据类型,2.掌握控制结构如if语句和循环,3.了解函数的定义和使用。这些将帮助你开始编写简单的Python程序。

如何在10小时内通过项目和问题驱动的方式教计算机小白编程基础?如何在10小时内通过项目和问题驱动的方式教计算机小白编程基础?Apr 02, 2025 am 07:18 AM

如何在10小时内教计算机小白编程基础?如果你只有10个小时来教计算机小白一些编程知识,你会选择教些什么�...

如何在使用 Fiddler Everywhere 进行中间人读取时避免被浏览器检测到?如何在使用 Fiddler Everywhere 进行中间人读取时避免被浏览器检测到?Apr 02, 2025 am 07:15 AM

使用FiddlerEverywhere进行中间人读取时如何避免被检测到当你使用FiddlerEverywhere...

See all articles

热AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智能驱动的应用程序,用于创建逼真的裸体照片

AI Clothes Remover

AI Clothes Remover

用于从照片中去除衣服的在线人工智能工具。

Undress AI Tool

Undress AI Tool

免费脱衣服图片

Clothoff.io

Clothoff.io

AI脱衣机

AI Hentai Generator

AI Hentai Generator

免费生成ai无尽的。

热门文章

R.E.P.O.能量晶体解释及其做什么(黄色晶体)
4 周前By尊渡假赌尊渡假赌尊渡假赌
R.E.P.O.最佳图形设置
4 周前By尊渡假赌尊渡假赌尊渡假赌
R.E.P.O.如果您听不到任何人,如何修复音频
4 周前By尊渡假赌尊渡假赌尊渡假赌
WWE 2K25:如何解锁Myrise中的所有内容
1 个月前By尊渡假赌尊渡假赌尊渡假赌

热工具

安全考试浏览器

安全考试浏览器

Safe Exam Browser是一个安全的浏览器环境,用于安全地进行在线考试。该软件将任何计算机变成一个安全的工作站。它控制对任何实用工具的访问,并防止学生使用未经授权的资源。

EditPlus 中文破解版

EditPlus 中文破解版

体积小,语法高亮,不支持代码提示功能

DVWA

DVWA

Damn Vulnerable Web App (DVWA) 是一个PHP/MySQL的Web应用程序,非常容易受到攻击。它的主要目标是成为安全专业人员在合法环境中测试自己的技能和工具的辅助工具,帮助Web开发人员更好地理解保护Web应用程序的过程,并帮助教师/学生在课堂环境中教授/学习Web应用程序安全。DVWA的目标是通过简单直接的界面练习一些最常见的Web漏洞,难度各不相同。请注意,该软件中

Dreamweaver CS6

Dreamweaver CS6

视觉化网页开发工具

适用于 Eclipse 的 SAP NetWeaver 服务器适配器

适用于 Eclipse 的 SAP NetWeaver 服务器适配器

将Eclipse与SAP NetWeaver应用服务器集成。