Python Tutorial

Detailed explanation of memcached operation in Python (picture and text)

黄舟

May 07, 2017 am 10:57 AM

Preface

Many Web applications save data to Relational database management systems such as MySQL, application The server reads the data from it and displays it in the browser. However, as the amount of data increases and access becomes concentrated, there will be adverse effects such as increased burden on the database, deterioration of database response, and delayed website display. Distributed caching is an important means to optimize website performance. A large number of sites provide large-scale hot data caching services through scalable server clusters. By caching data library query results and reducing the number of database accesses, the speed and scalability of dynamic web applications can be significantly improved. Commonly used in the industry are redis, memcached, etc. Today I want to talk about how to use the memcached cache service in the python project.

memcachedIntroduction

memcached is an open source, high-performance, distributed memory object caching system that can be applied Various scenarios require caching, the main purpose of which is to speed up web applications by reducing access to the Database.
Memcached itself does not actually provide a distributed solution. On the server side, the memcached cluster environment is actually the accumulation of memcached servers, and the environment construction is relatively simple; the distribution of the cache is mainly implemented on the client, and is processed through the client's routing Achieve the purpose of distributed solutions. The principle of client routing is very simple. Every time the application server accesses the value of a certain key, it maps the key to a certain memcached server nodeA through the routing algorithm. Therefore, all operations on this key are performed on nodeA. Carried on. As long as the server still caches the data, a cache hit is guaranteed.
Detailed explanation of memcached operation in Python (picture and text)

Routing algorithm

Simple routing algorithm

Simple routing algorithm, using the remainder

Hash: Use the hash value of the cached data key , divided by the number of servers, the remainder is the number in the table below of the server list. This algorithm can evenly distribute cache data throughout the memcached cluster, and can also meet most cache routing requirements. However, when the memcached cluster needs to be expanded, problems will occur. For example: the website needs to expand the capacity of 3 cache servers to 4 cache servers. After changing the server list, if you still use the remainder hash, it is easy to calculate that 75% of the requests will not hit the cache. As the size of the server cluster increases, the miss rate becomes higher.

1%3 = 1    
1%4 = 1
2%3 = 2    
2%4 = 2
3%3 = 0    
3%4 = 3
4%4 = 1    
4%4 = 0
#以此类推

Such an expansion operation is extremely risky, and may bring great instantaneous pressure to the database, and may even cause the database to crash. There are two ways to solve this problem: 1. Expand the capacity when access is low, and warm up the data after the expansion; 2. Use a better routing algorithm. The most commonly used one at present is the consistent Hash algorithm.

Consistent Hash

The memcached client can use the consistent hash algorithm as the routing strategy, as shown in the figure. Compared with the general hash (such as simple modulo) algorithm, the consistent hash algorithm except In addition to calculating the hash value of the key, the hash value corresponding to each server is also calculated, and then these hash values are mapped to a limited value range (such as 0~2^32). By finding the smallest server with a hash value greater than hash(key), it is used as the target server to store the key data. If it cannot be found, the server with the smallest hash value is directly used as the target server. At the same time, to a certain extent, the expansion problem is solved. Adding or

deleting a single node will not have a big impact on the entire cluster.
Detailed explanation of memcached operation in Python (picture and text)

Virtual Layer

Consistent hashing is not perfect and may cause load imbalance during expansion. In the latest version, the design of virtual nodes has been added to further improve usability. When expanding, it affects existing servers in the cluster more evenly and distributes the load evenly. No further details will be given here.

内存管理

存储方式

为了提高性能，memcached中保存的数据都存储在memcached内置的内存存储空间中。由于数据仅存在于内存中，因此重启memcached、重启操作系统会导致全部数据消失。另外，缓存的内容容量达到指定值之后，就基于LRU(Least Recently Used)算法自动删除不使用的缓存。memcached本身是为缓存而设计的服务，因此并没有过多考虑数据的永久性问题。

内存结构

memcached仅支持基础的key-value键值对类型数据存储。在memcached内存结构中有两个非常重要的概念：slab和chunk。
slab是一个内存块，它是memcached一次申请内存的最小单位。在启动memcached的时候一般会使用参数-m指定其可用内存，但是并不是在启动的那一刻所有的内存就全部分配出去了，只有在需要的时候才会去申请，而且每次申请一定是一个slab。Slab的大小固定为1M（1048576 Byte），一个slab由若干个大小相等的chunk组成。每个chunk中都保存了一个item结构体、一对key和value。

虽然在同一个slab中chunk的大小相等的，但是在不同的slab中chunk的大小并不一定相等，在memcached中按照chunk的大小不同，可以把slab分为很多种类（class），默认情况下memcached把slab分为40类（class1～class40），在class 1中，chunk的大小为80字节，由于一个slab的大小是固定的1048576字节（1M），因此在class1中最多可以有13107个chunk（也就是这个slab能存最多13107个小于80字节的key-value数据）。
Detailed explanation of memcached operation in Python (picture and text)

memcached内存管理采取预分配、分组管理的方式，分组管理就是我们上面提到的slab class，按照chunk的大小slab被分为很多种类。内存预分配过程是怎样的呢？向memcached添加一个item时候，memcached首先会根据item的大小，来选择最合适的slab class：例如item的大小为190字节，默认情况下class 4的chunk大小为160字节显然不合适，class 5的chunk大小为200字节，大于190字节，因此该item将放在class 5中（显然这里会有10字节的浪费是不可避免的），计算好所要放入的chunk之后，memcached会去检查该类大小的chunk还有没有空闲的，如果没有，将会申请1M（1个slab）的空间并划分为该种类chunk。例如我们第一次向memcached中放入一个190字节的item时，memcached会产生一个slab class 2（也叫一个page），并会用去一个chunk，剩余5241个chunk供下次有适合大小item时使用，当我们用完这所有的5242个chunk之后，下次再有一个在160～200字节之间的item添加进来时，memcached会再次产生一个class 5的slab（这样就存在了2个pages）。

注意事项

chunk是在page里面划分的，而page固定为1m，所以chunk最大不能超过1m。
chunk实际占用内存要加48B，因为chunk数据结构本身需要占用48B。
如果用户数据大于1m，则memcached会将其切割，放到多个chunk内。
已分配出去的page不能回收。
-对于key-value信息，最好不要超过1m的大小；同时信息长度最好相对是比较均衡稳定的，这样能够保障最大限度的使用内存；同时，memcached采用的LRU清理策略，合理甚至过期时间，提高命中率。

使用场景

key-value能满足需求的前提下，使用memcached分布式集群是较好的选择，搭建与操作使用都比较简单；分布式集群在单点故障时，只影响小部分数据异常，目前还可以通过Magent缓存代理模式，做单点备份，提升高可用；整个缓存都是基于内存的，因此响应时间是很快，不需要额外的序列化、反序列化的程序，但同时由于基于内存，数据没有持久化，集群故障重启数据无法恢复。高版本的memcached已经支持CAS模式的原子操作，可以低成本的解决并发控制问题。

安装启动

$ sudo apt-get install memcached
$ memcached -m 32 -p 11211 -d
# memcached将会以守护程序的形式启动 memcached（-d），为其分配32M内存（-m 32），并指定监听 localhost的11211端口。

python操作memcached

在python中可通过memcache库来操作memcached，这个库使用很简单，声明一个client就可以读写memcached缓存了。

python访问memcached

#!/usr/bin/env pythonimport memcache

mc = memcache.Client([&#39;127.0.0.1:12000&#39;],debug=0)

mc.set("some_key", "Some value")
value = mc.get("some_key")

mc.set("another_key", 3)
mc.delete("another_key")

mc.set("key", "1")   # note that the key used for incr/decr must be a string.
mc.incr("key")
mc.decr("key")

然而，python-memcached默认的路由策略没有使用一致性哈希。

    def _get_server(self, key):
        if isinstance(key, tuple):
            serverhash, key = key        
            else:
            serverhash = serverHashFunction(key)        
            if not self.buckets:            
            return None, None

        for i in range(Client._SERVER_RETRIES):
            server = self.buckets[serverhash % len(self.buckets)]            
            if server.connect():                
            # print("(using server %s)" % server,)
                return server, key
            serverhash = serverHashFunction(str(serverhash) + str(i))        
            return None, None

从源码中可以看到：server = self.buckets[serverhash % len(self.buckets)]，只是根据key进行了简单的取模。我们可以通过重写_get_server方法，让python-memcached支持一致性哈希。

import memcacheimport typesfrom hash_ring import HashRingclass MemcacheRing(memcache.Client):
    """Extends python-memcache so it uses consistent hashing to
    distribute the keys.
    """
    def init(self, servers, *k, **kw):
        self.hash_ring = HashRing(servers)
        memcache.Client.init(self, servers, *k, **kw)
        self.server_mapping = {}        
        for server_uri, server_obj in zip(servers, self.servers):
            self.server_mapping[server_uri] = server_obj    
            def _get_server(self, key):
        if type(key) == types.TupleType:            
        return memcache.Client._get_server(key)        
        for i in range(self._SERVER_RETRIES):
            iterator = self.hash_ring.iterate_nodes(key)            
            for server_uri in iterator:
                server_obj = self.server_mapping[server_uri]                
                if server_obj.connect():                    
                return server_obj, key        
                return None, None

torando项目中使用memcached

这里采用的策略是：1. 应用程序先从cache取数据，没有得到，则从数据库中取数据，成功后，放到缓存中。2. 应用程序从cache中取数据，取到后返回。缓存更新是一个很复杂的问题，一般是先把数据存到数据库中，成功后，再让缓存失效。后面会再写文单独讨论memcached缓存更新的问题。

代码

# coding: utf-8import sysimport tornado.ioloopimport tornado.webimport loggingimport memcacheimport jsonimport urllib# 初始化memcache clientmc = memcache.Client([&#39;127.0.0.1:11211&#39;], debug=0)
mc_prefix = &#39;demo&#39;class BaseHandler(tornado.web.RequestHandler):
    """ 把缓存处理抽象到BaseHandler基类 """
    USE_CACHE = False  # 控制是否使用缓存

    def format_args(self):
        arg_list = []        
        for a in self.request.arguments:            
        for value in self.request.arguments[a]:
                arg_list.append(&#39;%s=%s&#39; % (a, urllib.quote(value.replace(&#39; &#39;, &#39;&#39;))))        
                # 根据请求的URL产生key
        arg_list.sort()
        key = &#39;%s?%s&#39; % (self.request.path, &#39;&&#39;.join(arg_list)) if arg_list else self.request.path
        key = &#39;%s_%s&#39; % (mc_prefix, key)        
        # key太长，不进行缓存处理
        if len(key) > 250:
            logging.error(&#39;key out of length: %s&#39;, key)            
            return None

        return key    def get(self, *args, **kwargs):
        if self.USE_CACHE:            
        try:                
        # 根据请求获取key
                self.key = self.format_args()                
                if self.key:
                    data = mc.get(self.key)  
                    # 若缓存命中，则直接返回数据
                    if data:
                        logging.info(&#39;get data from memecahce&#39;)
                        self.finish(data)                        
                        return
            except Exception, e:
                logging.exception(e)        
                # 若未命中缓存，调用do_get处理请求，获取数据
        data = self.do_get()
        data_str = json.dumps(data)        
        # 把成功获取到的数据，放入memcache缓存
        if self.USE_CACHE and data and data.get(&#39;result&#39;, -1) == 0 and self.key:            
        try:
                mc.set(self.key, data_str, 60)            
                except Exception, e:
                logging.exception(e)

        self.finish(data_str)    def do_get(self):
        return Noneclass DemoHandler(BaseHandler):
    USE_CACHE = True

    def do_get(self):
        a = self.get_argument(&#39;a&#39;, &#39;test&#39;)
        b = self.get_argument(&#39;b&#39;, &#39;test&#39;)        
        # 访问数据库获取数据，此处略去
        data = {&#39;result&#39;: 0, &#39;a&#39;: a, &#39;b&#39;: b}        return datadef make_app():
    return tornado.web.Application([
        (r"/", DemoHandler),
    ])if name == "main":
    logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format=&#39;%(asctime)s %(levelno)s %(message)s&#39;,
                    )

    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

测试结果

在浏览器访问http://127.0.0.1:8888/?a=1&b=3，终端打印的log如下：

2017-02-21 22:45:05,987 20 304 GET /?a=1&b=2 (127.0.0.1) 3.11ms
2017-02-21 22:45:07,427 20 get data from memecahce
2017-02-21 22:45:07,427 20 304 GET /?a=1&b=2 (127.0.0.1) 0.71ms
2017-02-21 22:45:10,350 20 200 GET /?a=1&b=3 (127.0.0.1) 0.82ms
2017-02-21 22:45:13,586 20 get data from memecahce

从日志可以看到，缓存命中的情况。

小结

本文介绍了memcached的路由算法、内存管理、使用场景等基本概念，然后举例说明了在python项目中如何使用memcached缓存。缓存更新的问题还需要进一步分析讨论。

The above is the detailed content of Detailed explanation of memcached operation in Python (picture and text). For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Python vs. C : Learning Curves and Ease of UseApr 19, 2025 am 12:20 AM

Python is easier to learn and use, while C is more powerful but complex. 1. Python syntax is concise and suitable for beginners. Dynamic typing and automatic memory management make it easy to use, but may cause runtime errors. 2.C provides low-level control and advanced features, suitable for high-performance applications, but has a high learning threshold and requires manual memory and type safety management.

Python vs. C : Memory Management and ControlApr 19, 2025 am 12:17 AM

Python and C have significant differences in memory management and control. 1. Python uses automatic memory management, based on reference counting and garbage collection, simplifying the work of programmers. 2.C requires manual management of memory, providing more control but increasing complexity and error risk. Which language to choose should be based on project requirements and team technology stack.

Python for Scientific Computing: A Detailed LookApr 19, 2025 am 12:15 AM

Python's applications in scientific computing include data analysis, machine learning, numerical simulation and visualization. 1.Numpy provides efficient multi-dimensional arrays and mathematical functions. 2. SciPy extends Numpy functionality and provides optimization and linear algebra tools. 3. Pandas is used for data processing and analysis. 4.Matplotlib is used to generate various graphs and visual results.

Python and C : Finding the Right ToolApr 19, 2025 am 12:04 AM

Whether to choose Python or C depends on project requirements: 1) Python is suitable for rapid development, data science, and scripting because of its concise syntax and rich libraries; 2) C is suitable for scenarios that require high performance and underlying control, such as system programming and game development, because of its compilation and manual memory management.

Python for Data Science and Machine LearningApr 19, 2025 am 12:02 AM

Python is widely used in data science and machine learning, mainly relying on its simplicity and a powerful library ecosystem. 1) Pandas is used for data processing and analysis, 2) Numpy provides efficient numerical calculations, and 3) Scikit-learn is used for machine learning model construction and optimization, these libraries make Python an ideal tool for data science and machine learning.

Learning Python: Is 2 Hours of Daily Study Sufficient?Apr 18, 2025 am 12:22 AM

Is it enough to learn Python for two hours a day? It depends on your goals and learning methods. 1) Develop a clear learning plan, 2) Select appropriate learning resources and methods, 3) Practice and review and consolidate hands-on practice and review and consolidate, and you can gradually master the basic knowledge and advanced functions of Python during this period.

Python for Web Development: Key ApplicationsApr 18, 2025 am 12:20 AM

Key applications of Python in web development include the use of Django and Flask frameworks, API development, data analysis and visualization, machine learning and AI, and performance optimization. 1. Django and Flask framework: Django is suitable for rapid development of complex applications, and Flask is suitable for small or highly customized projects. 2. API development: Use Flask or DjangoRESTFramework to build RESTfulAPI. 3. Data analysis and visualization: Use Python to process data and display it through the web interface. 4. Machine Learning and AI: Python is used to build intelligent web applications. 5. Performance optimization: optimized through asynchronous programming, caching and code

Python vs. C : Exploring Performance and EfficiencyApr 18, 2025 am 12:20 AM

Python is better than C in development efficiency, but C is higher in execution performance. 1. Python's concise syntax and rich libraries improve development efficiency. 2.C's compilation-type characteristics and hardware control improve execution performance. When making a choice, you need to weigh the development speed and execution efficiency based on project needs.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

3 weeks agoByDDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

4 weeks agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks agoByDDD

Hot Tools

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Dreamweaver CS6

Visual web development tools

Hot Topics

Where is the login entrance for gmail email?

7591

CakePHP Tutorial

1386

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

123