Django caching mechanism
Static websites serve simple pages stored directly on the server, so they can easily handle an enormous number of visits. Dynamic websites are different: every time a user requests a page, the server runs database queries, renders templates, executes business logic, and finally generates the page the visitor sees. All of this happens on the fly, and from a processing-overhead perspective it is relatively expensive.
For most web applications, this overhead is not a big deal. Most web applications are not washingtonpost.com or Slashdot; they are small to medium-sized sites with modest traffic. But for sites with medium to high traffic, it is essential to cut as much overhead as possible. That is where caching comes in.
The purpose of caching is to avoid repeating a calculation, especially one that is expensive in time or resources, by saving its result. The pseudocode below shows how this works for a dynamically generated page:
given a URL, try finding that page in the cache
if the page is in the cache:
    return the cached page
else:
    generate the page
    save the generated page in the cache (for next time)
    return the generated page
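The same flow can be sketched in a few lines of Python. This is only an illustration: a plain dict stands in for a real cache backend, and render_page, get_page, page_cache, and render_calls are hypothetical names, not part of Django.

```python
# Illustrative only: a plain dict stands in for a real cache backend.
page_cache = {}
render_calls = []  # records each expensive render, to show how often it runs


def render_page(url):
    """Hypothetical stand-in for the expensive work (queries, templates)."""
    render_calls.append(url)
    return "<html>content for %s</html>" % url


def get_page(url):
    # Given a URL, try finding that page in the cache.
    page = page_cache.get(url)
    if page is None:
        # Cache miss: generate the page and save it for next time.
        page = render_page(url)
        page_cache[url] = page
    return page
```

Calling get_page() twice with the same URL runs render_page() only once; the second call is answered from the dict.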
To this end, Django provides a robust caching system that lets you save the results of dynamic pages, so that subsequent identical requests are served from the cache instead of being recalculated. Django also offers different levels of cache granularity: you can cache an entire page, only the parts that are expensive to produce, or your entire site.
Django also works well with "upstream" caches, such as Squid (http://www.squid-cache.org) and browser-based caches. These are caches you don't directly control, but you can give them hints (via HTTP headers) about which parts of your site should be cached, and how.
Read on to learn how to use Django's caching system. You'll be glad you understood this material when your site becomes as busy as Slashdot.
Setting up caching
The caching system requires a small amount of setup: you have to tell it where your cached data should live, whether in a database, on the file system, or directly in memory. This is an important decision that affects your cache's performance, because some cache types are faster than others. In-memory caching is usually faster than file system or database caching, because it avoids the overhead of hitting the file system or a database connection.
Your cache preference goes in the CACHE_BACKEND setting in your settings file. If you use caching but do not specify CACHE_BACKEND, Django uses simple:/// by default. All available values for CACHE_BACKEND are explained below.
Memcached
By far the fastest and most efficient type of cache available to Django is Memcached, an entirely memory-based caching framework. It was originally developed to handle high loads at LiveJournal.com and was subsequently open sourced by Danga Interactive (http://www.danga.com). Sites such as Slashdot and Wikipedia use it to reduce database access and dramatically increase site performance.
Memcached is available for free at http://danga.com/memcached/. It runs as a background process and is allotted a specified amount of RAM. It provides a lightning-fast interface for adding, retrieving, and deleting arbitrary data in the cache. All data is stored directly in memory, so there is no database or file system overhead.
After installing Memcached itself, you will need to install the Memcached Python bindings, which are not bundled with Django. They come as a separate Python module, memcache.py, available at http://www.djangoproject.com/thirdparty/python-memcached.
To use Memcached with Django, set CACHE_BACKEND to memcached://ip:port/, where ip is the IP address of the Memcached background process and port is the port on which Memcached is running.
In this example, Memcached is running on the local host (127.0.0.1), and the port is 11211:
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
One of the great features of Memcached is its ability to share the cache across multiple servers, which means you can run the Memcached process on multiple machines. This group of machines will be treated as a *single* cache, without the need to copy the cache value on each machine. In order for Django to take advantage of this feature, you need to include all server addresses in CACHE_BACKEND separated by semicolons
In this example, the cache is shared between Memcached instances running on the IP addresses of 172.19.26.240 and 172.19.26.242 and port 11211:
CACHE_BACKEND = 'memcached://172.19.26.240:11211;172.19.26.242:11211/'
In this example, the cache is shared between Memcached instances running on 172.19.26.240 (port 11211), 172.19.26.242 (port 11212), and 172.19.26.244 (port 11213):
CACHE_BACKEND = 'memcached://172.19.26.240:11211;172.19.26.242:11212;172.19.26.244:11213/'
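To see why a list of servers can behave like a single cache, here is a minimal sketch. It assumes the simplest possible scheme, hashing the key and taking it modulo the number of servers; the real client library has its own distribution logic, and parse_memcached_backend and pick_server are hypothetical helper names.

```python
import zlib


def parse_memcached_backend(uri):
    """Split 'memcached://host:port;host:port/' into a list of 'host:port'."""
    scheme, rest = uri.split("://", 1)
    if scheme != "memcached":
        raise ValueError("not a memcached backend: %r" % uri)
    return [server for server in rest.rstrip("/").split(";") if server]


def pick_server(key, servers):
    # Simplified assumption: hash the key, then take it modulo the pool size.
    # Every client computes the same answer, so no value needs to be copied.
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


servers = parse_memcached_backend(
    "memcached://172.19.26.240:11211;172.19.26.242:11212;172.19.26.244:11213/"
)
```

Because the choice depends only on the key, each value lives on exactly one machine and every web process looks for it in the same place.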
A final point about Memcached: memory-based caching has one major disadvantage. Because cached data is stored only in memory, it is lost if the server crashes. Memory is clearly not intended for permanent data storage. In fact, none of the Django caching backends should be used for permanent storage; they are all caching solutions, not storage. We point this out here because memory-based caching is particularly short-lived.
Database Cache
In order to use a database table as a cache backend, you need to create a cache table in the database and point Django's cache system to the table
First, use the following statement to create a cache data table:
python manage.py createcachetable [cache_table_name]
Here [cache_table_name] is the name of the database table to create. The name can be anything you want, as long as it is a valid table name that is not already in use in your database. The command creates a single table in the format that Django's database caching system expects.
Once you have created the table, set CACHE_BACKEND to "db://tablename", where tablename is the name of the database table. In this example, the cache table's name is my_cache_table:
CACHE_BACKEND = 'db://my_cache_table'
The database cache backend uses the same database specified in your settings file. You cannot use a different database backend for your cache table.
File system cache
To store cached items on the file system, use the "file://" cache type for CACHE_BACKEND and specify the directory in which cached data should be stored.
For example, use the following settings to store cached data in /var/tmp/django_cache:
CACHE_BACKEND = 'file:///var/tmp/django_cache'
Note that there are three forward slashes toward the beginning of the example. The first two belong to file://, and the third is the first character of the directory path, /var/tmp/django_cache. If you are on Windows, put the drive letter after file://, like this: 'file://c:/foo/bar'.
The directory path should be *absolute*; that is, it should start at the root of your file system. It doesn't matter whether you put a slash at the end of the setting.
Make sure the directory this setting points to exists and is readable and writable by the system user under which your web server runs. Continuing the above example, if your server runs as the user apache, make sure that /var/tmp/django_cache exists and that apache can read and write it.
Each cache value is stored as a separate file whose contents are the cached data saved in serialized ("pickled") form by Python's pickle module. Each file's name is the cache key, escaped for safe file system use.
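The storage scheme just described can be sketched with the standard library alone. This is not Django's implementation: the helper names are hypothetical, percent-encoding is only one assumed way of escaping a key, and a temporary directory replaces /var/tmp/django_cache.

```python
import os
import pickle
import tempfile
from urllib.parse import quote


def key_to_path(cache_dir, key):
    # Escape every reserved character (including '/') so the key is one filename.
    return os.path.join(cache_dir, quote(key, safe=""))


def file_cache_set(cache_dir, key, value):
    # One file per cache value, holding the pickled data.
    with open(key_to_path(cache_dir, key), "wb") as f:
        pickle.dump(value, f)


def file_cache_get(cache_dir, key, default=None):
    try:
        with open(key_to_path(cache_dir, key), "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default  # no file means a cache miss


cache_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
file_cache_set(cache_dir, "/stories/2005/", {"title": "Bank robbed"})
```

After the last line, the directory contains a single file named %2Fstories%2F2005%2F, which is why the web server's user needs both read and write access to it.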
Local memory cache
If you want the speed advantages of in-memory caching but don't have the capability of running Memcached, consider the local-memory cache backend. It is multi-threaded and thread-safe, but because of its simple locking and memory allocation strategies it is not as efficient as Memcached.
To use it, set CACHE_BACKEND to locmem:///, for example:
CACHE_BACKEND = 'locmem:///'
Simple cache (for development phase)
You can use a simple single-process memory cache by configuring 'simple:///', for example:
CACHE_BACKEND = 'simple:///'
This cache merely saves data within the process, so it should be used only in development or testing environments.
Fake cache (for development use)
Finally, Django comes with a fake ("dummy") cache that doesn't actually cache anything; it just implements the cache interface without doing any real work.
This is useful if your production site uses heavy-duty caching but you don't want to cache in your development environment. In that case, simply change the development settings file to set CACHE_BACKEND to 'dummy:///', for example:
CACHE_BACKEND = 'dummy:///'
As a result, your development environment doesn't use caching while your production environment still does.
Each cache backend may take arguments, given in query-string style on the CACHE_BACKEND setting, as in the examples below. Setting cull_frequency to 0 means that the entire cache is dumped when max_entries is reached. This makes culling much faster at the expense of more cache misses. The default value is 3.
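A rough sketch of culling follows. It assumes that a nonzero cull_frequency means roughly 1/cull_frequency of the entries are dropped once max_entries is reached, which the text above does not spell out; the function name cull and the dict standing in for the cache are illustrative only.

```python
def cull(cache, max_entries, cull_frequency=3):
    """Trim `cache` (a dict) once it holds max_entries items or more."""
    if len(cache) < max_entries:
        return  # still room, nothing to do
    if cull_frequency == 0:
        cache.clear()  # dump the entire cache in one cheap step
        return
    # Assumption: drop every cull_frequency-th key, about 1/cull_frequency of them.
    doomed = [k for i, k in enumerate(list(cache)) if i % cull_frequency == 0]
    for k in doomed:
        del cache[k]
```

With the default of 3, a full cache of nine entries shrinks to six; with 0, it shrinks to none, which is fast but means every later lookup starts as a miss.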
In this example, timeout is set to 60:
CACHE_BACKEND = "locmem:///?timeout=60"
In this example, timeout is 30 and max_entries is 400:
CACHE_BACKEND = "locmem:///?timeout=30&max_entries=400"
Invalid arguments and invalid argument values are silently ignored.
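The query-string style of these arguments, and the rule that invalid names and values are ignored, can be sketched as below. The function parse_backend_args and the DEFAULTS dict are hypothetical; only the cull_frequency default of 3 comes from the text, and the other two defaults are placeholder values chosen for the illustration.

```python
from urllib.parse import parse_qsl

# Only cull_frequency=3 is stated in the text; the rest are assumed placeholders.
DEFAULTS = {"timeout": 300, "max_entries": 300, "cull_frequency": 3}


def parse_backend_args(uri, defaults):
    """Return cache options taken from 'scheme://host/?name=value&...'."""
    options = dict(defaults)
    _, _, query = uri.partition("?")
    for name, value in parse_qsl(query):
        if name not in options:
            continue  # unknown argument: ignore it
        try:
            options[name] = int(value)
        except ValueError:
            pass  # invalid value: keep the default
    return options
```

For example, parse_backend_args("locmem:///?timeout=30&max_entries=400", DEFAULTS) overrides two options and leaves cull_frequency at 3.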
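The per-site cache
Once you have specified CACHE_BACKEND, the simplest way to use caching is to cache your entire site. This means that every page that doesn't have GET or POST parameters is cached for a specified amount of time after it is first requested.
To activate the per-site cache, just add 'django.middleware.cache.CacheMiddleware' to your MIDDLEWARE_CLASSES setting, as in this example: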
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.CacheMiddleware',
    'django.middleware.common.CommonMiddleware',
)
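Note
The order of MIDDLEWARE_CLASSES matters. See the section "The order of MIDDLEWARE_CLASSES" later in this chapter.
Then add the following required settings to your Django settings file:
§ CACHE_MIDDLEWARE_SECONDS: the number of seconds each page should be cached.
§ CACHE_MIDDLEWARE_KEY_PREFIX: if the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care.
The cache middleware caches every page that doesn't have GET or POST parameters. That is, if a user requests a page and passes GET parameters in a query string, or passes POST parameters, the middleware will not attempt to retrieve a cached version of the page. If you intend to use the per-site cache, keep this in mind as you design your application; for example, don't use URLs with query strings unless it is acceptable for those pages not to be cached.
The cache middleware supports another setting, CACHE_MIDDLEWARE_ANONYMOUS_ONLY. If you set it to True, the cache middleware caches only anonymous requests, that is, requests made by users who are not logged in. This is a simple and effective way of disabling caching for user-specific pages, such as Django's admin interface. Note that to use CACHE_MIDDLEWARE_ANONYMOUS_ONLY you must activate AuthenticationMiddleware, and AuthenticationMiddleware must appear before CacheMiddleware in your MIDDLEWARE_CLASSES.
Finally, note that CacheMiddleware automatically sets a few headers in each HttpResponse:
§ It sets the Last-Modified header to the current date/time when a fresh (uncached) version of the page is requested.
§ It sets the Expires header to the current date/time plus the defined CACHE_MIDDLEWARE_SECONDS.
§ It sets the Cache-Control header to give the page a maximum age, again from the CACHE_MIDDLEWARE_SECONDS setting.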
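The three headers listed above can be computed with the standard library. This is a sketch of the arithmetic only, not the middleware's code; cache_headers is a hypothetical helper, and the now parameter exists purely so the result is reproducible.

```python
import time
from email.utils import formatdate


def cache_headers(cache_seconds, now=None):
    """Build the three headers for a freshly generated page."""
    now = time.time() if now is None else now
    return {
        # Current date/time in HTTP date format.
        "Last-Modified": formatdate(now, usegmt=True),
        # Current date/time plus the cache lifetime.
        "Expires": formatdate(now + cache_seconds, usegmt=True),
        # The same lifetime expressed as a maximum age in seconds.
        "Cache-Control": "max-age=%d" % cache_seconds,
    }
```

With cache_seconds set to 600, Expires is always exactly ten minutes after Last-Modified, and Cache-Control reads max-age=600.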
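The per-view cache
A more granular way to use the caching framework is to cache the output of individual views. This has the same effect as the per-site cache (including the omission of caching on requests with GET and POST parameters). It applies to whichever views you specify, rather than the whole site.
Do this by using a decorator, which is a wrapper around your view function that alters its behavior to use caching. The per-view cache decorator is called cache_page and is located in the django.views.decorators.cache module, for example:
from django.views.decorators.cache import cache_page
def my_view(request, param):
    # ...
my_view = cache_page(my_view, 60 * 15)
If you are using Python 2.4 or later, you can also use decorator syntax. This example is equivalent to the preceding one:
from django.views.decorators.cache import cache_page
@cache_page(60 * 15)
def my_view(request, param):
    # ...
cache_page takes a single argument: the cache timeout, in seconds. In the preceding example, the result of the my_view() view will be cached for 15 minutes. (Note that we have written it as 60 * 15 for readability; 60 * 15 evaluates to 900, that is, 15 minutes multiplied by 60 seconds per minute.)
The per-view cache, like the per-site cache, is keyed off the URL. If multiple URLs point at the same view, each URL is cached separately. Continuing the my_view example, suppose your URLconf looks like this: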
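from django.conf.urls.defaults import patterns
urlpatterns = patterns('',
    (r'^foo/(\d{1,2})/$', my_view),
)
Then requests to /foo/1/ and /foo/23/ will be cached separately, as you might expect. But once a particular URL (e.g., /foo/23/) has been requested, subsequent requests to that URL will use the cache.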
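To make the per-URL behavior concrete, here is a simplified imitation of a caching view decorator. It is not Django's cache_page: cache_page_sketch, FakeRequest, and the calls list are hypothetical names, and a dict keyed by request path stands in for the cache backend.

```python
import functools
import time


def cache_page_sketch(timeout):
    """Hypothetical, simplified imitation of a per-view cache decorator."""
    def decorator(view):
        store = {}  # maps request path -> (expiry time, response)

        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            entry = store.get(request.path)
            if entry is not None and entry[0] > time.time():
                return entry[1]  # fresh cached copy for this URL
            response = view(request, *args, **kwargs)
            store[request.path] = (time.time() + timeout, response)
            return response
        return wrapper
    return decorator


class FakeRequest:
    """Minimal stand-in for a request object; only `path` is used."""
    def __init__(self, path):
        self.path = path


calls = []  # records every time the real view body runs


@cache_page_sketch(60 * 15)
def my_view(request, param):
    calls.append(param)
    return "page %s" % param
```

Requesting /foo/1/ and /foo/23/ runs the view once for each URL; a second request for /foo/23/ within 15 minutes is answered from the store without running the view again.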
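Specifying the per-view cache in the URLconf
The examples in the previous section hard-code the fact that the view is cached, because cache_page alters the my_view function in place. This approach couples your view to the cache system, which is not ideal for several reasons. For instance, you might want to reuse the view functions on another, cacheless site, or you might want to distribute the views to people who want to use them without caching. The solution is to specify the per-view cache in the URLconf rather than next to the view functions themselves.
Doing so is easy: simply wrap the view function with cache_page when you refer to it in the URLconf. Here is the URLconf from earlier:
from django.conf.urls.defaults import patterns
urlpatterns = patterns('',
    (r'^foo/(\d{1,2})/$', my_view),
)
Here is the same URLconf, with my_view wrapped in cache_page:
from django.conf.urls.defaults import patterns
from django.views.decorators.cache import cache_page
urlpatterns = patterns('',
    (r'^foo/(\d{1,2})/$', cache_page(my_view, 60 * 15)),
)
If you take this approach, don't forget to import cache_page in your URLconf.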
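The low-level cache API
Sometimes caching an entire rendered page doesn't gain you very much and is, in fact, overkill.
Perhaps, for instance, your site includes a view whose results depend on several expensive queries, and those results change at different intervals. In this case, the full-page caching offered by the per-site or per-view cache strategies is not ideal, because you wouldn't want to cache the entire result (some of the data changes often), but you would still want to cache the parts that rarely change.
For cases like this, Django exposes a simple, low-level cache API in the django.core.cache module. You can use it to store objects in the cache at any level of granularity you like. You can cache any Python object that can be pickled safely: strings, dictionaries, lists of model objects, and so forth. (Refer to the Python documentation for more information about pickling.)
Here is how to import the API: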
>>> from django.core.cache import cache
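The basic interface is set(key, value, timeout_seconds) and get(key):
>>> cache.set('my_key', 'hello, world!', 30)
>>> cache.get('my_key')
'hello, world!'
The timeout_seconds argument is optional and defaults to the timeout argument of the CACHE_BACKEND setting.
If the object doesn't exist in the cache, or the cache backend is unreachable, cache.get() returns None:
# Wait 30 seconds for 'my_key' to expire...
>>> cache.get('my_key')
None
>>> cache.get('some_unset_key')
None
We advise against storing the literal value None in the cache, because you won't be able to distinguish between your stored None value and a cache miss signified by a return value of None.
cache.get() can take a default argument, which specifies the value to return if the object doesn't exist in the cache: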
>>> cache.get('my_key', 'has expired')
'has expired'
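If a computed result can legitimately be None, one possible workaround is to wrap every value in a one-element tuple before storing it, so that a bare None coming back can only mean a miss. The sketch below uses a plain dict in place of the cache object, and set_wrapped and get_wrapped are hypothetical helpers, not part of Django.

```python
def set_wrapped(cache, key, value):
    # Store a 1-tuple; with a real cache this would be cache.set(key, (value,)).
    cache[key] = (value,)


def get_wrapped(cache, key):
    """Return (found, value); works even when the stored value is None."""
    boxed = cache.get(key)  # a bare None here can only mean a miss
    if boxed is None:
        return False, None
    return True, boxed[0]
```

A caller checks the found flag first, so a cached None is no longer mistaken for an expired or missing key.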
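To retrieve multiple cache values in one go, use cache.get_many(). If possible for the given cache backend, get_many() hits the cache only once instead of once per cache key. get_many() returns a dictionary with all the keys you asked for that exist in the cache and have not expired:
>>> cache.set('a', 1)
>>> cache.set('b', 2)
>>> cache.set('c', 3)
>>> cache.get_many(['a', 'b', 'c'])
{'a': 1, 'b': 2, 'c': 3}
If a cache key doesn't exist or has expired, it is not included in the dictionary. Continuing the example: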
>>> cache.get_many(['a', 'b', 'c', 'd'])
{'a': 1, 'b': 2, 'c': 3}
Finally, you can delete keys explicitly with cache.delete(). This is an easy way of clearing the cache for a particular object:
>>> cache.delete('a')
cache.delete() has no return value, and it works the same way whether or not a value with the given cache key exists.
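The behavior of set(), get(), get_many(), and delete() described in this section can be modeled with a small in-process class. TinyCache is a hypothetical teaching aid, not one of Django's backends; its default timeout of 300 seconds is an assumed value, and the clock parameter exists only so expiry can be demonstrated without waiting.

```python
import time


class TinyCache:
    """Hypothetical in-process cache using the method names from the text."""

    def __init__(self, default_timeout=300, clock=time.time):
        self._data = {}  # key -> (expiry time, value)
        self._default_timeout = default_timeout
        self._clock = clock  # injectable so expiry can be tested

    def set(self, key, value, timeout_seconds=None):
        if timeout_seconds is None:
            timeout_seconds = self._default_timeout
        self._data[key] = (self._clock() + timeout_seconds, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default  # never stored, or already deleted
        expires, value = entry
        if expires <= self._clock():
            del self._data[key]  # expired entries count as misses
            return default
        return value

    def get_many(self, keys):
        # Only keys that exist and have not expired end up in the result.
        missing = object()
        found = {}
        for key in keys:
            value = self.get(key, missing)
            if value is not missing:
                found[key] = value
        return found

    def delete(self, key):
        # Works the same whether or not the key exists; returns nothing.
        self._data.pop(key, None)
```

Replaying the interactive examples from this section against a TinyCache instance gives the same results, including the None returned once the 30-second timeout has passed.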
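Upstream caches
So far, this chapter has focused on caching your own data. But another type of caching is relevant to web development, too: caching performed by upstream caches. These are systems that cache pages for users even before the request reaches your website.
Here are a few examples of upstream caches:
§ Your ISP (Internet service provider) may cache certain pages, so if you request a page from http://www.infocool.net/, your ISP can send you the page without having to access www.infocool.net directly. The maintainers of www.infocool.net have no knowledge of this caching; the ISP sits between www.infocool.net and your web browser, handling all of the caching transparently.
§ Your Django website may sit behind a proxy cache, such as the Squid web proxy cache (http://www.squid-cache.org/), that caches pages for performance. In this case, each request is handled first by the proxy and is passed to your application only if needed.
§ Your web browser caches pages, too. If a web page sends out the appropriate headers, your browser will use its local cached copy for subsequent requests to that page, without even contacting the web page again to see whether it has changed.
Upstream caching is a nice efficiency boost, but it carries a risk. Many web pages' contents differ based on authentication and a host of other variables, and a cache system that blindly saves pages based purely on URLs could expose incorrect or sensitive data to subsequent visitors to those pages.
For example, say you operate a web email system. The contents of the inbox page obviously depend on which user is logged in. If an ISP blindly cached your site, the first user who logged in through that ISP would have his or her inbox page cached for subsequent visitors to the site. That is not cool.
Fortunately, HTTP provides a solution to this problem. A number of HTTP headers exist to instruct upstream caches to vary their cache contents depending on designated variables, and to tell caching mechanisms not to cache particular pages. We will look at these headers in the sections that follow.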
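Using Vary headers
The Vary header defines which request headers a cache mechanism should take into account when building its cache key. For example, if the contents of a web page depend on a user's language preference, the page is said to "vary on language."
By default, Django's cache system creates its cache keys using the requested path (e.g., "/stories/2005/jun/23/bank_robbed/"). This means every request to that URL will use the same cached version, regardless of user-agent differences such as cookies or language preferences. However, if the page produces different content based on some difference in request headers (such as a cookie, a language, or a user-agent), you need to use the Vary header to tell caching mechanisms that the page output depends on those things.
To do this in Django, use the convenient vary_on_headers view decorator, like so:
from django.views.decorators.vary import vary_on_headers
# Python 2.3 syntax.
def my_view(request):
    # ...
my_view = vary_on_headers(my_view, 'User-Agent')
# Python 2.4+ decorator syntax.
@vary_on_headers('User-Agent')
def my_view(request):
    # ...
In this case, a caching mechanism (such as Django's own cache middleware) will cache a separate version of the page for each unique user-agent.
The advantage of using the vary_on_headers decorator rather than manually setting the Vary header (using something like response['Vary'] = 'user-agent') is that the decorator adds to the Vary header (which may already exist) rather than setting it from scratch and potentially overriding anything that was already there.
You can pass multiple headers to vary_on_headers():
@vary_on_headers('User-Agent', 'Cookie')
def my_view(request):
    # ...
This tells upstream caches to vary on both, which means each combination of user-agent and cookie gets its own cache value. For example, a request with the user-agent Mozilla and the cookie value foo=bar is considered different from a request with the user-agent Mozilla and the cookie value foo=ham.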
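One way to picture what "varying" means is a cache key that folds in the values of the named request headers. The sketch below is an assumption about how such a key could be built, not the key format Django uses; cache_key is a hypothetical function and the path /inbox/ is made up.

```python
import hashlib


def cache_key(path, request_headers, vary_on):
    """Combine the path with the values of the headers named in `vary_on`."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    parts = [path]
    for name in sorted(h.lower() for h in vary_on):
        parts.append("%s=%s" % (name, lowered.get(name, "")))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


bar = {"User-Agent": "Mozilla", "Cookie": "foo=bar"}
ham = {"User-Agent": "Mozilla", "Cookie": "foo=ham"}
```

Without any varied headers the two requests map to the same key and would share one cached page; once Cookie is included, foo=bar and foo=ham produce different keys and are cached separately.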
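Because varying on cookie is so common, there is a vary_on_cookie decorator. These two views are equivalent:
from django.views.decorators.vary import vary_on_cookie, vary_on_headers
@vary_on_cookie
def my_view(request):
    # ...
@vary_on_headers('Cookie')
def my_view(request):
    # ...
The headers you pass to vary_on_headers are not case sensitive; "User-Agent" is the same thing as "user-agent".
You can also use a helper function, django.utils.cache.patch_vary_headers, directly. This function sets, or adds to, the Vary header, for example:
from django.shortcuts import render_to_response
from django.utils.cache import patch_vary_headers
def my_view(request):
    # ...
    response = render_to_response('template_name', context)
    patch_vary_headers(response, ['Cookie'])
    return response
patch_vary_headers takes an HttpResponse instance as its first argument and a list or tuple of case-insensitive header names as its second argument.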
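The "adds to rather than overwrites" behavior can be sketched on plain strings. merge_vary is a hypothetical function written for this illustration; it is not the source of patch_vary_headers, only a model of the merging rule described above.

```python
def merge_vary(existing, new_headers):
    """Return a Vary value that keeps `existing` and appends unseen names."""
    names = [h.strip() for h in existing.split(",") if h.strip()]
    seen = {h.lower() for h in names}  # compare case-insensitively
    for header in new_headers:
        if header.lower() not in seen:
            names.append(header)
            seen.add(header.lower())
    return ", ".join(names)
```

For instance, merging ["user-agent", "Cookie"] into an existing value of "User-Agent" yields "User-Agent, Cookie": the existing entry is kept, and the duplicate is skipped even though its case differs.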
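Other cache headers
The remaining problems with caching are the privacy of data and the question of where data should be stored in a cascade of caches.
A user usually faces two kinds of caches: his or her own browser cache (a private cache) and his or her provider's cache (a public cache). A public cache is used by multiple users and controlled by someone else. This poses problems with sensitive data: you don't want, say, your bank account number stored in a public cache. So web applications need a way to tell caches which data is private and which is public.
The solution is to indicate that a page's cache should be private. To do this in Django, use the cache_control view decorator:
from django.views.decorators.cache import cache_control
@cache_control(private=True)
def my_view(request):
    # ...
This decorator takes care of sending out the appropriate HTTP header behind the scenes.
There are a few other ways to control cache parameters. For example, HTTP allows applications to do the following:
§ Define the maximum time a page should be cached.
§ Specify whether a cache should always check for newer versions, delivering the cached content only when there are no changes. (Some caches might deliver cached content even if the server page changed, simply because the cache copy isn't yet expired.)
In Django, use the cache_control view decorator to specify these cache parameters. In this example, cache_control tells caches to revalidate the cache on every access and to store cached versions for, at most, 3600 seconds:
from django.views.decorators.cache import cache_control
@cache_control(must_revalidate=True, max_age=3600)
def my_view(request):
    ...
Any valid Cache-Control HTTP directive is valid in cache_control(). Here is the full list:
§ public=True
§ private=True
§ no_cache=True
§ no_transform=True
§ must_revalidate=True
§ proxy_revalidate=True
§ max_age=num_seconds
§ s_maxage=num_seconds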
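The keyword arguments in this list map onto the text of the Cache-Control header in a straightforward way. The sketch below shows one plausible mapping (underscores become hyphens, True becomes a bare directive); cache_control_value is a hypothetical function, not the decorator's actual code.

```python
def cache_control_value(**directives):
    """Turn keyword arguments into a Cache-Control header value."""
    parts = []
    for name, value in directives.items():
        name = name.replace("_", "-")  # max_age -> max-age
        if value is True:
            parts.append(name)  # flag directive such as 'private'
        else:
            parts.append("%s=%s" % (name, value))  # valued directive
    return ", ".join(parts)
```

Under these assumptions, cache_control_value(must_revalidate=True, max_age=3600) produces the header value "must-revalidate, max-age=3600".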
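Tip
For an explanation of Cache-Control HTTP directives, see the specification at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.
Note
The caching middleware already sets the cache header's max-age with the value of the CACHE_MIDDLEWARE_SECONDS setting. If you use a custom max_age in a cache_control decorator, the decorator takes precedence, and the header values are merged correctly.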
Other optimizations
Django comes with a few other pieces of middleware that can help optimize your application's performance:
§ django.middleware.http.ConditionalGetMiddleware adds support for conditional GET responses based on the ETag and Last-Modified headers for modern browsers.
§ django.middleware.gzip.GZipMiddleware compresses response content for all modern browsers to save bandwidth and delivery time.
The order of MIDDLEWARE_CLASSES
If you use the cache middleware, it is important to put it in the right place within the MIDDLEWARE_CLASSES setting, because the cache middleware needs to know which headers to vary the cache storage by.
Put CacheMiddleware after any middleware that might add something to the Vary header, including the following:
§ SessionMiddleware, which adds Cookie
§ GZipMiddleware, which adds Accept-Encoding