How Python uses contextvars to manage context variables

WBOY
2022-08-01 14:52:57

This article introduces contextvars, a module added to Python in 3.7. As the name suggests, it deals with context variables. The sections below explain in detail how to use contextvars to manage them.

Python 3.7 introduced a module called contextvars. As the name suggests, it deals with context variables, so before introducing contextvars we first need to understand what a context is.

A context is an object that carries relevant information. For example, suppose you jump straight to episode eight of a 13-episode anime and see the heroine crying in front of the hero. You would have no idea why she is crying, because you have not watched the earlier episodes and are missing the relevant context.

So a context is nothing magical; its job is simply to carry some specified information.

Requests in web frameworks

Let's take FastAPI and Sanic as examples and see how they handle an incoming request.

# fastapi
from fastapi import FastAPI, Request
import uvicorn

app = FastAPI()


@app.get("/index")
async def index(request: Request):
    name = request.query_params.get("name")
    return {"name": name}


uvicorn.run("__main__:app", host="127.0.0.1", port=5555)

# -------------------------------------------------------

# sanic
from sanic import Sanic
from sanic.request import Request
from sanic import response

app = Sanic("sanic")


@app.get("/index")
async def index(request: Request):
    name = request.args.get("name")
    return response.json({"name": name})


app.run(host="127.0.0.1", port=6666)

Send a request to each server and check that the result is correct.

Both requests succeed. In FastAPI and Sanic, the request is bound to the view function: when a request arrives, it is wrapped in a Request object and passed to the view function as an argument.

Flask works differently. Let's look at how Flask receives request parameters.

from flask import Flask, request

app = Flask("flask")


@app.route("/index")
def index():
    name = request.args.get("name")
    return {"name": name}


app.run(host="127.0.0.1", port=7777)

In Flask, the request is obtained by importing request; if you don't need it, you don't import it. I'm not comparing which approach is better here; the point is to lead into today's topic. If I define another view function in Flask, it gets its request parameters in the same way. But this raises a question: if different view functions all use the same request object internally, won't they conflict?

From our experience with Flask, the answer is clearly no, and the reason is thread-local storage (ThreadLocal).

ThreadLocal

As the name suggests, ThreadLocal is related to threads. It is used to create local variables that are bound to the thread that sets them.

import threading

# Create a local object
local = threading.local()

def get():
    name = threading.current_thread().name
    # Read the value bound to local
    value = local.value
    print(f"Thread {name}, value: {value}")

def set_():
    name = threading.current_thread().name
    # Set a different value for each thread
    if name == "one":
        local.value = "ONE"
    elif name == "two":
        local.value = "TWO"
    # Call get
    get()

t1 = threading.Thread(target=set_, name="one")
t2 = threading.Thread(target=set_, name="two")
t1.start()
t2.start()
"""
Thread one, value: ONE
Thread two, value: TWO
"""

The two threads do not affect each other, because each thread has its own unique id. A value is bound to the current thread when it is set and is looked up on the current thread when it is read. You can think of ThreadLocal as a dictionary:

{
    "one": {"value": "ONE"},
    "two": {"value": "TWO"}
}

Strictly speaking, the key should be the thread's id; we use the thread's name here only to make it easier to read. Either way, a lookup only ever sees the values stored under the current thread's key.
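
One consequence of this per-thread keying is that a value set in one thread simply does not exist in another. The following is a minimal sketch of that behaviour; the names seen and worker are made up for illustration.

```python
import threading

local = threading.local()
local.value = "MAIN"  # bound to the main thread only

seen = {}  # hypothetical helper dict used to collect what the worker observed

def worker():
    # This thread has never assigned local.value, so the attribute is missing here.
    seen["has_value"] = hasattr(local, "value")
    local.value = "WORKER"
    seen["worker_value"] = local.value

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen)         # {'has_value': False, 'worker_value': 'WORKER'}
print(local.value)  # MAIN
```

The worker neither sees nor overwrites the main thread's value, which matches the dictionary picture above: each thread reads and writes only its own entry.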

Flask is designed the same way, but instead of using threading.local directly it implements its own Local class, which supports greenlet coroutines as well as threads. So how is it implemented? First, Flask has a "request context" and an "application context", each maintained on its own stack. (The source shown below is from an older Flask/Werkzeug release; newer Werkzeug releases build Local on contextvars instead.)

# flask/globals.py
_request_ctx_stack = LocalStack()
_app_ctx_stack = LocalStack()
current_app = LocalProxy(_find_app)
request = LocalProxy(partial(_lookup_req_object, "request"))
session = LocalProxy(partial(_lookup_req_object, "session"))

Each request is bound to the current context, which is destroyed when the request completes. The framework handles all of this, so developers can simply use request. The details of the request lifecycle are in the source code; here we focus on one object, werkzeug.local.Local, the Local class mentioned above. It is the key to setting and getting variables. Here is part of its source:

# werkzeug/local.py

class Local(object):
    __slots__ = ("__storage__", "__ident_func__")

    def __init__(self):
        # Two members: __storage__ is a dict that holds the values;
        # __ident_func__ just returns the id of the current thread
        object.__setattr__(self, "__storage__", {})
        object.__setattr__(self, "__ident_func__", get_ident)

    def __call__(self, proxy):
        """Create a proxy for a name."""
        return LocalProxy(self, proxy)

    def __release_local__(self):
        self.__storage__.pop(self.__ident_func__(), None)

    def __getattr__(self, name):
        try:
            # Look up the per-thread dict by thread id,
            # then look up name in that dict,
            # so only values bound to the current thread are returned
            return self.__storage__[self.__ident_func__()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        ident = self.__ident_func__()
        storage = self.__storage__
        try:
            # Use the thread id as the key and store the value in that thread's dict,
            # so the value is only set for the current thread
            storage[ident][name] = value
        except KeyError:
            storage[ident] = {name: value}

    def __delattr__(self, name):
        # Deletion follows the same logic
        try:
            del self.__storage__[self.__ident_func__()][name]
        except KeyError:
            raise AttributeError(name)

So Flask's internal logic is actually quite simple: isolation between threads is achieved with a thread-local. Each request is bound to its own context, and values are read back from that same context. The context stores the relevant information and, just as importantly, keeps requests isolated from one another.

By now you should understand what a context is, but here comes the problem. Both threading.local and the Local class shown above are keyed by thread (or greenlet). What if we use coroutines defined with async def? How do we isolate the context of each coroutine? This finally brings us to our protagonist: contextvars.

contextvars

This module provides a set of interfaces for managing, setting, and accessing context-local state in coroutines.

import asyncio
import contextvars

c = contextvars.ContextVar("just a label, used for debugging")

async def get():
    # Get the value
    return c.get() + "~~~"

async def set_(val):
    # Set the value
    c.set(val)
    print(await get())

async def main():
    coro1 = set_("coroutine 1")
    coro2 = set_("coroutine 2")
    await asyncio.gather(coro1, coro2)


asyncio.run(main())
"""
coroutine 1~~~
coroutine 2~~~
"""

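ContextVar provides two methods, get and set, for reading and writing the value. The effect is similar to threading.local: the two coroutines, each run as its own task by asyncio.gather, keep their data isolated and do not affect each other.

But look more closely: we set the value in set_ and read it in get. await get() creates a new coroutine object, so setting and getting do not happen in the same coroutine. Even so, we still get the expected result, because awaiting a coroutine runs it inside the current task, and the context belongs to the task: every coroutine awaited along the chain sees the same context.

Let's add one more layer: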

import asyncio
import contextvars

c = contextvars.ContextVar("just a label, used for debugging")

async def get1():
    return await get2()

async def get2():
    return c.get() + "~~~"

async def set_(val):
    # Set the value
    c.set(val)
    print(await get1())
    print(await get2())

async def main():
    coro1 = set_("coroutine 1")
    coro2 = set_("coroutine 2")
    await asyncio.gather(coro1, coro2)


asyncio.run(main())
"""
coroutine 1~~~
coroutine 1~~~
coroutine 2~~~
coroutine 2~~~
"""

Whether we await get1() or await get2(), we get the value set in set_, which shows that the calls can be nested.

The value can also be set again along the way.

import asyncio
import contextvars

c = contextvars.ContextVar("just a label, used for debugging")

async def get1():
    c.set("overwritten")
    return await get2()

async def get2():
    return c.get() + "~~~"

async def set_(val):
    # Set the value
    c.set(val)
    print("------------")
    print(await get2())
    print(await get1())
    print(await get2())
    print("------------")

async def main():
    coro1 = set_("coroutine 1")
    coro2 = set_("coroutine 2")
    await asyncio.gather(coro1, coro2)


asyncio.run(main())
"""
------------
coroutine 1~~~
overwritten~~~
overwritten~~~
------------
------------
coroutine 2~~~
overwritten~~~
overwritten~~~
------------
"""

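The first await get2() returns the value set in set_, as expected. But get1 sets the value again, so afterwards both await get1() and a direct await get2() return the newly set value.

This shows that when one coroutine awaits another, which in turn awaits yet another, they all see the same value no matter how deep the nesting goes. Any coroutine in the chain can set the value again, and subsequent reads return the most recently set value. Another example: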

import asyncio
import contextvars

c = contextvars.ContextVar("just a label, used for debugging")

async def get1():
    return await get2()

async def get2():
    val = c.get() + "~~~"
    c.set("overwritten!")
    return val

async def set_(val):
    # Set the value
    c.set(val)
    print(await get1())
    print(c.get())

async def main():
    coro = set_("古明地觉")
    await coro

asyncio.run(main())
"""
古明地觉~~~
overwritten!
"""

await get1() runs await get2(), which reads the value set by c.set in set_, so the first print outputs "古明地觉~~~". But get2 then sets the value again, so the second print outputs the newly set value.
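
The behaviour in these examples follows from how asyncio assigns contexts: each Task runs in its own copy of the context, while a plain await runs the awaited coroutine inside the caller's task. The following is a minimal sketch that contrasts the two; the names var and child are chosen only for illustration.

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="unset")

async def child():
    var.set("set in child")
    return var.get()

async def main():
    var.set("set in main")

    # Plain await: child() runs inside main()'s task and shares its context,
    # so the change is still visible after the await.
    await child()
    after_await = var.get()

    var.set("set in main")
    # create_task: the new task gets a copy of the current context,
    # so changes made inside it stay inside it.
    inside_task = await asyncio.create_task(child())
    after_task = var.get()
    return after_await, inside_task, after_task

result = asyncio.run(main())
print(result)
# ('set in child', 'set in child', 'set in main')
```

This is also why the two coroutines passed to asyncio.gather earlier stay isolated from each other: gather wraps each one in its own task.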

If get is called before any set, a LookupError is raised, so ContextVar supports a default value:

import asyncio
import contextvars

c = contextvars.ContextVar("just a label, used for debugging",
                           default="hmph")

async def set_(val):
    print(c.get())
    c.set(val)
    print(c.get())

async def main():
    coro = set_("古明地觉")
    await coro

asyncio.run(main())
"""
hmph
古明地觉
"""

Besides specifying a default in ContextVar, you can also pass one to get:

import asyncio
import contextvars

c = contextvars.ContextVar("just a label, used for debugging",
                           default="hmph")

async def set_(val):
    print(c.get("古明地恋"))
    c.set(val)
    print(c.get())

async def main():
    coro = set_("古明地觉")
    await coro

asyncio.run(main())
"""
古明地恋
古明地觉
"""

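To sum up, if c.get is called before c.set:

  • if neither ContextVar nor get specifies a default, LookupError is raised;
  • if only one of them specifies a default, that default is returned;
  • if both specify a default, the one passed to get wins.

If c.set has been called before c.get, the value set by c.set is returned regardless of whether ContextVar or get specifies a default.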
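
These rules can be checked in one short script. The following is a sketch only; the variable names no_default and with_default are made up for illustration.

```python
import contextvars

no_default = contextvars.ContextVar("no_default")
with_default = contextvars.ContextVar("with_default", default="from ContextVar")

# Case 1: no default anywhere, so get() raises LookupError.
try:
    no_default.get()
    outcome = "no error"
except LookupError:
    outcome = "LookupError"

results = [
    outcome,
    no_default.get("from get"),    # Case 2a: only get supplies a default
    with_default.get(),            # Case 2b: only ContextVar supplies a default
    with_default.get("from get"),  # Case 3: both supply one, get wins
]

# After set(), the stored value wins over every default.
with_default.set("explicit")
results.append(with_default.get("from get"))

print(results)
# ['LookupError', 'from get', 'from ContextVar', 'from get', 'explicit']
```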

All in all this is fairly easy to understand. Besides coroutines, ContextVar also works with threads. That's right, it can replace threading.local. Let's try it:

import threading
import contextvars

c = contextvars.ContextVar("context_var")

def get():
    name = threading.current_thread().name
    value = c.get()
    print(f"Thread {name}, value: {value}")

def set_():
    name = threading.current_thread().name
    if name == "one":
        c.set("ONE")
    elif name == "two":
        c.set("TWO")
    get()

t1 = threading.Thread(target=set_, name="one")
t2 = threading.Thread(target=set_, name="two")
t1.start()
t2.start()
"""
Thread one, value: ONE
Thread two, value: TWO
"""

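The behaviour is the same as with threading.local, but ContextVar is the recommended choice. Note that a threading.local object can hold any number of attributes, whereas a ContextVar holds a single value (you can work around this by storing a dictionary).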
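
The dictionary workaround could look like the sketch below. The names request_state, bind and in_copy are hypothetical. The sketch builds a new dict on every update, on the assumption that you want copied contexts to stay independent: a copied Context shares the same dict object, so mutating it in place would be visible in every copy.

```python
import contextvars

# Hypothetical name: one ContextVar holding several related values.
request_state = contextvars.ContextVar("request_state")

def bind(**values):
    # Build a new dict instead of mutating the stored one, so that
    # other contexts holding the old dict are not affected.
    current = request_state.get({})
    request_state.set({**current, **values})

bind(user="alice")
bind(trace_id="abc123")
print(request_state.get())  # {'user': 'alice', 'trace_id': 'abc123'}

def in_copy():
    bind(user="bob")
    return request_state.get()

snapshot = contextvars.copy_context()
inner = snapshot.run(in_copy)
print(inner)                # {'user': 'bob', 'trace_id': 'abc123'}
print(request_state.get())  # {'user': 'alice', 'trace_id': 'abc123'}
```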

contextvars.Token

Calling c.set actually returns a Token object:

import contextvars

c = contextvars.ContextVar("context_var")
token = c.set("val")
print(token)
"""
<Token var=<ContextVar name='context_var' at 0x00..> at 0x00...>
"""

A Token object has a read-only var attribute that returns the ContextVar object that created the token.

import contextvars

c = contextvars.ContextVar("context_var")
token = c.set("val")

print(token.var is c)  # True
print(token.var.get())  # val

print(
    token.var.set("val2").var.set("val3").var is c
)  # True
print(c.get())  # val3

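A Token object also has an old_value attribute, which returns the value the variable had before that set call. If the variable had no value before (for example, on the first set), it returns Token.MISSING.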

import contextvars

c = contextvars.ContextVar("context_var")
token = c.set("val")

# This token was returned by the first c.set;
# there was no earlier set, so old_value is <Token.MISSING>
print(token.old_value)  # <Token.MISSING>

token = c.set("val2")
print(c.get())  # val2
# Returns the value from the previous set
print(token.old_value)  # val

So what is the Token object for? It may not look very useful so far, but its main use is together with reset, which restores the variable to its earlier state.

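import contextvars

c = contextvars.ContextVar("context_var")
token = c.set("val")
# The value can obviously be read
print(c.get())  # val

# Restore the state from before the set that returned this token.
# This token came from the first set,
# so afterwards it is as if set had never been called
c.reset(token)
try:
    c.get()  # raises LookupError now
except LookupError:
    print("error raised")  # error raised

# But we can still pass a default
print(c.get("default"))  # default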
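
A common way to pair set with reset is a small context manager that restores the previous value on exit. The following is a minimal sketch of that pattern; current_user and as_user are hypothetical names, not part of contextvars.

```python
import contextvars
from contextlib import contextmanager

current_user = contextvars.ContextVar("current_user", default="anonymous")

@contextmanager
def as_user(name):
    # Remember the token so the previous value can be restored afterwards.
    token = current_user.set(name)
    try:
        yield
    finally:
        current_user.reset(token)

seen = [current_user.get()]
with as_user("alice"):
    seen.append(current_user.get())
    with as_user("bob"):
        seen.append(current_user.get())
    seen.append(current_user.get())
seen.append(current_user.get())

print(seen)  # ['anonymous', 'alice', 'bob', 'alice', 'anonymous']
```

Because each token records the value that was in place before its own set, nested blocks unwind in the right order.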

contextvars.Context

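It holds the mapping between ContextVar objects and their values. We usually do not create one directly with contextvars.Context; instead we call contextvars.copy_context, which returns a copy of the current context.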

import contextvars

c1 = contextvars.ContextVar("context_var1")
c1.set("val1")
c2 = contextvars.ContextVar("context_var2")
c2.set("val2")

# This gives the mapping between all ContextVar objects and their values.
# It implements the collections.abc.Mapping interface,
# so we can work with it like a dictionary
context = contextvars.copy_context()
# Each key is a ContextVar object and each value is the value set on it
print(context[c1])  # val1
print(context[c2])  # val2
for ctx, value in context.items():
    print(ctx.get(), ctx.name, value)
"""
val1 context_var1 val1
val2 context_var2 val2
"""

print(len(context))  # 2

Besides that, a Context also has a run method:

import contextvars

c1 = contextvars.ContextVar("context_var1")
c1.set("val1")
c2 = contextvars.ContextVar("context_var2")
c2.set("val2")

context = contextvars.copy_context()

def change(val1, val2):
    c1.set(val1)
    c2.set(val2)
    print(c1.get(), context[c1])
    print(c2.get(), context[c2])

# Set new values inside the change function;
# the values printed inside it are the new ones
context.run(change, "VAL1", "VAL2")
"""
VAL1 VAL1
VAL2 VAL2
"""

print(c1.get(), context[c1])
print(c2.get(), context[c2])
"""
val1 VAL1
val2 VAL2
"""

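The run method accepts a callable and runs it with context as the current context. If the callable changes the value of a ContextVar, then from the caller's point of view the change only takes effect inside the function; once the function returns, c1.get() and c2.get() give the original values again. The Context object itself is affected, though: even after the function returns it holds the newly set values, because the set calls wrote directly into its internal mapping.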
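
This makes copy_context plus run a handy way to execute a function without letting its changes leak into the caller. The following is a minimal sketch under that assumption; lang, render and the contexts dict are made-up names.

```python
import contextvars

lang = contextvars.ContextVar("lang", default="en")

def render(code):
    # This set() writes into whichever Context is current while render runs.
    lang.set(code)
    return f"rendered in {lang.get()}"

contexts = {}
outputs = []
for code in ("fr", "de"):
    ctx = contextvars.copy_context()  # a fresh copy for each job
    outputs.append(ctx.run(render, code))
    contexts[code] = ctx

print(outputs)               # ['rendered in fr', 'rendered in de']
print(contexts["fr"][lang])  # fr
print(contexts["de"][lang])  # de
print(lang.get())            # en (the caller's context was never touched)
```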

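Summary

That covers the usage of the contextvars module. It makes passing data between coroutines very convenient, and it is concurrency-safe. If you have used Go, you will find it similar to the context package that joined Go's standard library in version 1.7. Go's context package is more powerful, though: besides passing data, it also provides a very clear solution for cascading management of multiple goroutines.

In short, the data passed through contextvars should be data that many coroutines need to share, such as a cookie, session, or token: for example, an upstream handler receives a token and passes it all the way down. But don't use contextvars for data that should really be a function argument; that would be putting the cart before the horse.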
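
As an illustration of passing such a value downstream, here is a minimal sketch in which a request id set once at the top of a handler is picked up by a logging helper several calls deeper, without being passed as an argument. All names (request_id, log, query_database, handle_request) are hypothetical.

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default="-")
log_lines = []

def log(message):
    # Reads the id from the current context instead of taking it as a parameter.
    log_lines.append(f"[{request_id.get()}] {message}")

async def query_database():
    await asyncio.sleep(0)  # yield control so the two requests interleave
    log("querying database")

async def handle_request(rid):
    request_id.set(rid)
    log("request received")
    await query_database()
    log("request done")

async def main():
    await asyncio.gather(handle_request("req-1"), handle_request("req-2"))

asyncio.run(main())
print("\n".join(log_lines))
```

Even though the two handlers interleave, every log line carries the id of the request it belongs to, because each handler runs as its own task with its own context.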
