
Python Caching mutable values

Barbara Streisand
2025-01-26 16:13:10


Caching dramatically accelerates processing, from CPU-level operations to database interfaces. Cache invalidation—determining when to remove cached data—is a complex challenge. This post addresses a simpler, yet insidious, caching issue.

This problem, lurking for 18 months, surfaced only when users deviated from the recommended usage pattern. The issue stemmed from a custom machine learning (ML) framework (built on scikit-learn) within my organization. This framework accesses multiple data sources frequently, necessitating a caching layer for performance and cost optimization (reducing BigQuery egress costs).

Initially, lru_cache was used, but a persistent cache was needed for static data frequently accessed during development. DiskCache, a Python library using SQLite, was chosen for its simplicity and compatibility with our 32-process environment and Pandas DataFrames (up to 500MB). An lru_cache layer was added on top for in-memory access.
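The layered design can be sketched like this. This is a minimal illustration, not the framework's actual code: `persistent_cache` and its key scheme are invented for the example, and the standard library's `shelve` stands in for DiskCache as the on-disk layer.

```python
import shelve
from functools import lru_cache, wraps

def persistent_cache(path: str):
    """Two-layer cache: lru_cache in memory over a shelve file on disk."""
    def decorator(func):
        @lru_cache(maxsize=None)            # layer 1: in-memory hits skip the disk
        @wraps(func)
        def _wrapper(*args):
            key = repr((func.__name__, args))
            with shelve.open(path) as db:   # layer 2: survives process restarts
                if key not in db:
                    db[key] = func(*args)
                return db[key]
        return _wrapper
    return decorator
```

Note that the in-memory layer already exhibits the bug this post is about: whatever object `lru_cache` stores is handed back by reference on every hit.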

The problem emerged as more users experimented with the framework. Randomly incorrect results were reported, difficult to reproduce consistently. The root cause: in-place modification of cached Pandas DataFrames.

Our coding standard dictated creating new DataFrames after any processing. However, some users, out of habit, used inplace=True, modifying the cached object directly. This not only altered their immediate results but also corrupted the cached data, affecting subsequent requests.

To illustrate, consider this simplified example using dictionaries:

<code class="language-python">from functools import lru_cache
import time
import typing as t

@lru_cache
def expensive_func(keys: t.Tuple[str, ...], vals: t.Tuple[t.Any, ...]) -> dict:
    time.sleep(3)  # simulate an expensive computation
    return dict(zip(keys, vals))


def main():
    e1 = expensive_func(('a', 'b', 'c'), (1, 2, 3))
    print(e1)

    e2 = expensive_func(('a', 'b', 'c'), (1, 2, 3))
    print(e2)

    e2['d'] = "amazing"  # mutates the cached dict in place

    print(e2)

    e3 = expensive_func(('a', 'b', 'c'), (1, 2, 3))
    print(e3)


if __name__ == "__main__":
    main()</code>

Running this, e3 prints {'a': 1, 'b': 2, 'c': 3, 'd': 'amazing'}: lru_cache returns a reference to the cached object, not a copy, so mutating e2 mutates the entry inside the cache itself, and every later hit sees the corruption.
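The sharing is easy to verify with an identity check. Here make_dict is a throwaway stand-in for expensive_func, without the sleep:

```python
from functools import lru_cache

@lru_cache
def make_dict(keys: tuple, vals: tuple) -> dict:
    return dict(zip(keys, vals))

a = make_dict(('a', 'b'), (1, 2))
b = make_dict(('a', 'b'), (1, 2))
print(a is b)   # True: both names point at the single cached dict

b['c'] = 3      # "local" edit...
print(make_dict(('a', 'b'), (1, 2)))  # {'a': 1, 'b': 2, 'c': 3}
```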

Solution:

The solution involves returning a deep copy of the cached object:

<code class="language-python">from functools import lru_cache, wraps
from copy import deepcopy

def custom_cache(func):
    """lru_cache variant that hands each caller its own copy."""
    cached_func = lru_cache(func)

    @wraps(func)
    def _wrapper(*args, **kwargs):
        # Deep-copy on every hit so in-place edits never reach the cache.
        return deepcopy(cached_func(*args, **kwargs))

    return _wrapper</code>

This adds overhead (every cache hit now pays for a deep copy of the returned object), but it prevents cache corruption.
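A quick check that the wrapper isolates callers, restating the decorator from above so the snippet runs on its own:

```python
from functools import lru_cache, wraps
from copy import deepcopy

def custom_cache(func):
    cached_func = lru_cache(func)

    @wraps(func)
    def _wrapper(*args, **kwargs):
        return deepcopy(cached_func(*args, **kwargs))

    return _wrapper

@custom_cache
def make_dict(keys: tuple, vals: tuple) -> dict:
    return dict(zip(keys, vals))

a = make_dict(('a', 'b'), (1, 2))
a['c'] = 3                          # mutates the caller's copy only
b = make_dict(('a', 'b'), (1, 2))
print(b)  # {'a': 1, 'b': 2} -- the cached original is untouched
```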

Key Takeaways:

  • lru_cache returns references to cached objects, not copies; any in-place mutation reaches the cache.
  • Adhering to coding standards minimizes bugs.
  • Account for user deviations from best practices in the implementation. Robustness often trumps elegance.

