Caching dramatically accelerates processing, from CPU-level operations to database interfaces. Cache invalidation—determining when to remove cached data—is a complex challenge. This post addresses a simpler, yet insidious, caching issue.
This problem, lurking for 18 months, surfaced only when users deviated from the recommended usage pattern. The issue stemmed from a custom machine learning (ML) framework (built on scikit-learn) within my organization. This framework accesses multiple data sources frequently, necessitating a caching layer for performance and cost optimization (reducing BigQuery egress costs).
Initially, lru_cache was used, but a persistent cache was needed for static data frequently accessed during development. DiskCache, a Python library built on SQLite, was chosen for its simplicity and its compatibility with our 32-process environment and with Pandas DataFrames of up to 500MB. An lru_cache layer was added on top for fast in-memory access.
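To make the layered setup concrete, here is a minimal, stdlib-only sketch. The disk_memoize decorator below is a hypothetical stand-in for DiskCache's memoization (the real library offers this out of the box); the function name load_table and the cache file path are illustrative, not from the original framework:

```python
import pickle
import tempfile
import time
from functools import lru_cache, wraps
from pathlib import Path

CACHE_FILE = Path(tempfile.gettempdir()) / "demo_disk_cache.pkl"


def disk_memoize(path: Path):
    """Persist results to a pickle file keyed by the call arguments
    (a simplified stand-in for DiskCache's SQLite-backed memoization)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            store = pickle.loads(path.read_bytes()) if path.exists() else {}
            if args not in store:
                store[args] = func(*args)
                path.write_bytes(pickle.dumps(store))
            return store[args]
        return wrapper
    return decorator


@lru_cache(maxsize=None)       # fast in-memory layer
@disk_memoize(CACHE_FILE)      # persistent layer, survives process restarts
def load_table(name: str) -> dict:
    time.sleep(0.1)            # stand-in for an expensive data-source fetch
    return {"table": name}
```

The first call pays the fetch cost and populates both layers; repeated calls in the same process are served straight from memory, while a fresh process falls back to the disk layer.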
The problem emerged as more users experimented with the framework. Randomly incorrect results were reported, difficult to reproduce consistently. The root cause: in-place modification of cached Pandas DataFrames.
Our coding standard dictated creating new DataFrames after any processing. However, some users, out of habit, passed inplace=True, modifying the cached object directly. This not only altered their immediate results but also corrupted the cached data, affecting every subsequent request.
To illustrate, consider this simplified example using dictionaries:
<code class="language-python">from functools import lru_cache
import time
import typing as t


@lru_cache
def expensive_func(keys: t.Tuple[str, ...], vals: t.Tuple[t.Any, ...]) -> dict:
    time.sleep(3)  # simulate an expensive computation
    return dict(zip(keys, vals))


def main():
    e1 = expensive_func(('a', 'b', 'c'), (1, 2, 3))
    print(e1)

    e2 = expensive_func(('a', 'b', 'c'), (1, 2, 3))
    print(e2)

    e2['d'] = "amazing"
    print(e2)

    e3 = expensive_func(('a', 'b', 'c'), (1, 2, 3))
    print(e3)


if __name__ == "__main__":
    main()</code>
lru_cache provides a reference to the cached object, not a copy. Modifying e2 therefore alters the data held inside the cache itself, which is why e3 also comes back with the extra 'd' key.
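The reference behavior can be verified directly with the identity operator. This tiny self-contained demo (make_config is an illustrative name, not from the framework) shows that every call returns the very same object:

```python
from functools import lru_cache


@lru_cache
def make_config() -> dict:
    return {"retries": 3}


a = make_config()
b = make_config()
print(a is b)         # True: both names point at the single cached dict
b["retries"] = 99     # mutate the "second" result in place
print(make_config())  # {'retries': 99} — the cache itself was mutated
```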
Solution:
The solution involves returning a deep copy of the cached object:
<code class="language-python">from functools import lru_cache, wraps
from copy import deepcopy


def custom_cache(func):
    cached_func = lru_cache(func)

    @wraps(func)
    def _wrapper(*args, **kwargs):
        return deepcopy(cached_func(*args, **kwargs))

    return _wrapper</code>
This adds a small overhead (each call duplicates the cached data), but it guarantees that callers can never corrupt the cached original.
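With the deep-copy wrapper in place, mutating a returned value no longer touches the cache. A short demo (load_defaults is an illustrative function name):

```python
from functools import lru_cache, wraps
from copy import deepcopy


def custom_cache(func):
    """Wrap lru_cache so callers receive an independent deep copy."""
    cached_func = lru_cache(func)

    @wraps(func)
    def _wrapper(*args, **kwargs):
        return deepcopy(cached_func(*args, **kwargs))

    return _wrapper


@custom_cache
def load_defaults() -> dict:
    return {"retries": 3}


first = load_defaults()
first["retries"] = 99      # mutate the returned copy in place
print(load_defaults())     # {'retries': 3} — the cached original is intact
```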
Key Takeaways:
- lru_cache (and caches like it) hand out references to the cached object, not copies, so in-place modification silently corrupts the cache for every later caller.
- Returning a deep copy from the cache trades a small duplication cost for safety against mutation.
- Bugs of this kind can lurk for a long time and surface only when users deviate from the expected usage pattern.