How to implement random values in Python
The ability to generate random values is an important programming skill. Even if you don't need them every day, random values still show up in a surprising number of places. Of course, for many people, when they think of random values, the first things that come to mind are gaming-related use cases like card shuffling, craps, and slot machines.
However, there are many other uses for random values. These include:
Generate a secure password or password reset URL.
Create sample data for test cases or code demonstrations.
Clean personally identifiable information (PII) data to prepare data for analysis.
As with many other important programming tasks, Python offers well-designed, consistent support for generating random values as part of its standard library. The relevant modules include the random module, the secrets module, and the uuid module. The secrets module provides cryptographically strong random numbers, while the random module provides less secure numbers that are suitable for games, test data, simulations, and so on. The differences between these two types of random values are discussed below.
Key differences between “true” random numbers and pseudo-random numbers: Pseudo-random number generators use an algorithm to generate a sequence of numbers that looks random but is deterministic. In fact, they are so deterministic that, given the same seed value (or starting value), they reliably generate the same sequence of numbers. True random number generators, on the other hand, rely on physical processes to create truly random numbers.
While pseudo-random number generators are suitable for many purposes, general-purpose ones are not suitable for applications that require unpredictability, such as cryptography. This is because someone who knows the algorithm and can learn or reconstruct its internal state can predict the next number in the sequence. Unpredictable random sources are therefore essential for security-critical applications.
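To make the predictability concrete, here is a minimal sketch. It assumes the observer has somehow obtained the generator's internal state and simply copies it with getstate and setstate; recovering the state from observed outputs is a separate matter that is not shown. The names victim and clone are illustrative only.

```python
from random import Random

victim = Random()          # seeded unpredictably by the system
victim.randint(1, 1000)    # some output has already been produced

# Anyone holding the internal state can build an identical generator
clone = Random()
clone.setstate(victim.getstate())

# The clone now "predicts" every future value of the original
predicted = [clone.randint(1, 1000) for _ in range(5)]
actual = [victim.randint(1, 1000) for _ in range(5)]

print(predicted == actual)  # True
```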
As we discussed in the previous section, the random module is great if we are writing a game or simulating data. However, if we deal with authorization tokens or other secure data, we need the secrets module.
There are other differences between these two modules. The random module is the older of the two and has been around since Python version 1. It has a large functional interface that is essentially a wrapper around a shared instance of the class random.Random. However, you can always construct your own Random instance and use it. The Random class can be instantiated with a known seed to give a reproducible sequence of random numbers. If the seed is omitted, one is derived from an operating-system randomness source where available, or otherwise from the system time.
This example shows the effect of choosing a seed on the random.Random class:
"""Creating random integers demo"""
from random import Random, randint

seed = 42
seeded_1 = Random(seed)
seeded_2 = Random(seed)
randomly_seeded = Random()

# Get a random number between 1 and 1000, inclusive
print(seeded_1.randint(1, 1000))
print(seeded_2.randint(1, 1000))
print(randomly_seeded.randint(1, 1000))

# Use the functional interface
print(randint(1, 1000))
655
655
161
956
Two instances of the Random class are instantiated with the same seed, and they produce the first two lines of output. Each time I run the program, these lines print 655. (Try it!)
The last two lines come from a randomly seeded instance and from the functional interface (which is also randomly seeded). Their output changes with each run, and any match between those two numbers in a given run is purely coincidental.
At first glance, the secrets module, first described in PEP 506 and first released in Python 3.6, looks very different from the random module. Many of the functional interface wrappers are gone, so you cannot, for example, import randint directly from it. In addition, there is the secrets.SystemRandom class, which ignores any seed value you pass to it.
Beneath these superficial differences, however, the two classes are very similar. In fact, a look at the source code shows that:
The secrets.SystemRandom class is actually an alias for the random.SystemRandom class.
random.SystemRandom is a subclass of random.Random. Generally speaking, therefore, functionality available in one class is also available in the other. (Two exceptions are getstate and setstate, which are not implemented in SystemRandom.)
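These relationships can be checked directly from the interpreter. The following is a minimal sketch; the variable names are illustrative only.

```python
import random
import secrets

# secrets.SystemRandom is the very same class object as random.SystemRandom
same_class = secrets.SystemRandom is random.SystemRandom

# SystemRandom inherits from random.Random
is_subclass = issubclass(random.SystemRandom, random.Random)

# A seed passed to SystemRandom is accepted but ignored, and the
# state-saving methods raise NotImplementedError
sys_rand = secrets.SystemRandom(42)
try:
    sys_rand.getstate()
    state_supported = True
except NotImplementedError:
    state_supported = False

print(same_class, is_subclass, state_supported)  # True True False
```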
The most significant internal difference is that the core randomization behavior in SystemRandom is built on os.urandom. The urandom function, in turn, takes an integer and returns that many random bytes from a platform-dependent, cryptographically strong random value generator.
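As a small illustration of what os.urandom returns, the sketch below requests eight bytes and interprets them as an integer. The conversion step is only an example of how raw bytes can become a number; it is not meant to reproduce what SystemRandom does internally.

```python
import os

# Ask the operating system for 8 cryptographically strong random bytes
raw = os.urandom(8)

# Interpret those bytes as a single unsigned integer (illustrative only)
as_int = int.from_bytes(raw, "big")

print(len(raw), type(raw).__name__)  # 8 bytes
print(as_int)
```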
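We now know that we can use either of these two classes and that they are largely interchangeable, but secrets.SystemRandom is less predictable and is therefore the one to use in security-sensitive contexts.
With that in mind, let's look at some use cases.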
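We have already seen how to use randint to generate random numbers within a given range. The randrange function is very similar, except that it excludes the upper bound, whereas randint includes it. If you run this code often enough, a 5 will eventually appear in the first line of output but never in the second.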
from secrets import SystemRandom

rand = SystemRandom()

# Integers between 1 and 5, inclusive
print(rand.randint(1, 5))

# Integers between 1 and 5, but not including 5
print(rand.randrange(1, 5))
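Rather than rerunning the script by hand, the difference in bounds can be observed by drawing many values and collecting the distinct results. This is a sketch; the draw count of 1000 is an arbitrary choice.

```python
from secrets import SystemRandom

rand = SystemRandom()

# Draw many values from each method and record which ones appear
seen_randint = {rand.randint(1, 5) for _ in range(1000)}
seen_randrange = {rand.randrange(1, 5) for _ in range(1000)}

print(sorted(seen_randint))    # values drawn from 1 through 5
print(sorted(seen_randrange))  # values drawn from 1 through 4, never 5
```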
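Random and SystemRandom share several methods that let you generate random values according to various distributions. These include the uniform distribution (which yields a floating-point value between two endpoints, much as randint does for integers), the Gaussian (normal) distribution, and others.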
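For example, we can create a list of 20 fictitious IQ values that are randomly distributed along the same normal curve as a real population. By definition, IQ has a mean of 100 and a standard deviation of 15. (Incidentally, for the purposes of this example we simply want to model this distribution, setting aside the legitimate criticisms of the concept and of how it is tested.)
Here is the code that randomly creates a "population" of 20 IQs:
"""IQ distribution"""
from secrets import SystemRandom

rand = SystemRandom()
population = [round(rand.gauss(100, 15)) for _ in range(20)]
print(population)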
The output will vary, of course. Here is a representative run:
[102, 90, 88, 82, 102, 93, 127, 121, 94, 107, 103, 80, 106, 106, 84, 107, 108, 88, 123, 121]
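With only 20 values the sample statistics can drift noticeably from 100 and 15. As a rough check that gauss follows the requested parameters, the sketch below draws a larger sample and summarizes it with the statistics module. It uses a seeded random.Random purely so the check is repeatable; the seed 2024 and the sample size are arbitrary assumptions.

```python
from random import Random
from statistics import mean, stdev

# A seeded Random makes this check repeatable; SystemRandom offers the
# same gauss method but cannot be seeded
rand = Random(2024)

sample = [rand.gauss(100, 15) for _ in range(10_000)]

sample_mean = mean(sample)
sample_stdev = stdev(sample)

# Both figures should land close to the requested 100 and 15
print(f"mean  = {sample_mean:.1f}")
print(f"stdev = {sample_stdev:.1f}")
```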
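In other languages, selecting from a list or other sequence often takes two steps. First, you obtain a random number from 0 to the upper bound of the list (its length minus 1). Then you apply that index to the list to select the element. In Python, the choice and choices methods let you perform both steps at once. This makes it very easy to select a random sample of the desired size from any type of sequence.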
For example, given the code above, suppose we want to take the IQ population and select one or more values from it. Here is how we can do that quickly:
# Select one IQ at random
print(rand.choice(population))

# Select four IQs at random
print(rand.choices(population, k=4))
Output (example):
102
[107, 102, 88, 103]
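One detail worth knowing is that choices draws with replacement, so the same element can be picked more than once. When each element should be used at most once, the sample method (not mentioned above, but available on both Random and SystemRandom) is the usual alternative. A brief sketch with a small illustrative list:

```python
from secrets import SystemRandom

rand = SystemRandom()
population = [102, 90, 88, 82, 127]  # small illustrative list

# choices() draws with replacement: k may exceed the population size,
# and the same position can be picked more than once
with_replacement = rand.choices(population, k=8)

# sample() draws without replacement: each position is used at most once
without_replacement = rand.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```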
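Because it is easy in Python to pick random items from a sequence with the Random.choice or Random.choices methods, creating random strings is simple as well. In addition, the secrets module defines some special functions that you can use, depending on your needs.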
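Let's first look at a general approach that you can use to generate many kinds of strings. The string module includes several strings that are essentially hard-coded sequences of characters, such as ascii_lowercase (a-z), ascii_uppercase (A-Z), ascii_letters, punctuation, and digits. Random.choices or SystemRandom.choices can be called on any of these to create a list of the desired length, which can then be converted into a new string with the join method of the str class.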
We combine these steps in the following example:
from string import (ascii_letters, digits, punctuation,
                    ascii_lowercase, ascii_uppercase)
from secrets import SystemRandom

rand = SystemRandom()

four_digits = "".join(rand.choices(digits, k=4))
ten_mixed_case = "".join(rand.choices(ascii_letters, k=10))

assorted = ascii_letters + punctuation
twenty_assorted = "".join(rand.choices(assorted, k=20))

print(four_digits)
print(ten_mixed_case)
print(twenty_assorted)
Output:
8782
PLZYOxFLoQ
!mNsKsF;([I#F(c<jcg>

Generating cryptographically random strings with the secrets module

In addition to making it easy to create random strings as shown above, the secrets module provides several functions for generating random byte sequences in various formats. At the lowest level, we can use the token_bytes function to generate raw bytes objects of various lengths.
from secrets import token_bytes

b = token_bytes(10)
print(type(b))
print(b)
Output:
<class 'bytes'>
b'!\x05P\xc6a\x87\xf9~(\xa9'
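Raw bytes can be useful as input to an encryption algorithm or something similar, but keep in mind that they are not guaranteed to form valid UTF-8, so this function should not be used to generate strings. To get a string, use the technique from the previous section or one of the two functions below.
Instead of getting the bytes in raw form, we can return a string that again consists of random bytes, but in hexadecimal format. This yields two hexadecimal characters per byte: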
from secrets import token_hex

token = token_hex(10)
print(f"Returned a {type(token)} of length: {len(token)}:")
print(token)
Output:
Returned a <class 'str'> of length: 20:
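The third function in this family, and in some ways perhaps the most useful, is token_urlsafe. It converts a string of random bytes into a slightly modified Base64-encoded string. Here each byte yields about 1.3 characters on average, and the result is safe to use in a URL, for example to represent a shortened URL or as a password reset token. Another benefit is that the string is drawn from a larger set of possible characters than the 16 hexadecimal digits used by token_hex.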
from secrets import token_urlsafe

token = token_urlsafe(15)
print(token)
Output:
gfN2nGjO7izMPyXs5tvU
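The length relationships between the three token functions can be verified in a few lines. This is a sketch; n = 15 matches the example above, and the formula for the URL-safe length assumes unpadded Base64, which is what token_urlsafe returns.

```python
import math
from secrets import token_bytes, token_hex, token_urlsafe

n = 15  # number of random bytes requested

raw = token_bytes(n)
hex_token = token_hex(n)
url_token = token_urlsafe(n)

# token_hex: two hexadecimal characters per byte
# token_urlsafe: unpadded Base64, four characters per three bytes
expected_url_length = math.ceil(4 * n / 3)

print(len(raw), len(hex_token), len(url_token))  # 15 30 20
```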
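Although our focus in this article is on random values, we now want to spend some time on values that are, for all practical purposes, both random and unique. A very common way to solve this problem is the concept of a universally unique identifier (UUID). A UUID is a 128-bit number that is not 100% guaranteed to be unique, but is statistically so likely to be unique that the chance of a collision is extremely small.
Besides being large numbers, UUIDs also share a common representation format. A 128-bit number can be written as 32 hexadecimal digits, and a UUID adds four hyphens to form a 36-character string arranged in an 8-4-4-4-12 pattern. For example: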
'967909e3-7231-4040-aae4-8b6b2fb96a0b'
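The layout of the string form can be confirmed by splitting a freshly generated UUID on its hyphens. A minimal sketch:

```python
from uuid import uuid4

text = str(uuid4())

# Length of each hyphen-separated group
group_lengths = [len(group) for group in text.split("-")]

print(len(text))      # 36
print(group_lengths)  # [8, 4, 4, 4, 12]
```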
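Python's uuid module has several different functions, corresponding to a number of recognized algorithms for creating such identifiers, but the recommendation is to use one of the two most common types, uuid1 and uuid4.
A uuid1 value is created by combining a network node id (which usually means the MAC address of the network card) with a small amount of information about the UUID version and variant, plus a number of bits that represent a high-resolution timestamp.
In contrast, a uuid4 value contains 6 bits that store the version and variant information and 122 bits of purely random data. As a result, according to Wikipedia, although a collision between two uuid4 values is possible in principle, in practice "the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion."
You can easily create uuid1 and uuid4 values with Python's uuid module. As we will see, the string representations look alike, although the bits in a uuid1 carry more descriptive field meanings.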
from uuid import uuid1, uuid4

print(uuid1())
print(uuid4())
Output:
bfc89f3e-e6ab-11ec-abfc-4a9b744d17b8
025586c2-50ed-41a6-ae31-bf96b9d79df2
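Although the two strings look alike, the UUID objects expose attributes that reveal the difference. The sketch below reads the version, variant, timestamp, and node fields; it is an illustration of the attributes, not a recommendation to depend on them.

```python
from uuid import RFC_4122, uuid1, uuid4

u1 = uuid1()
u4 = uuid4()

# Both carry a version number and the RFC 4122 variant
print(u1.version, u4.version)  # 1 4
print(u1.variant == RFC_4122, u4.variant == RFC_4122)  # True True

# For uuid1 the remaining fields are meaningful
print(u1.time)       # 60-bit timestamp in 100-nanosecond intervals
print(hex(u1.node))  # 48-bit node id, often derived from the MAC address
```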
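As with most things in this article, when we say "sample output", it is really just the result of one particular run of the code. Of course, at least in the case of uuid4, if you run this code 103 trillion times, you have a one-in-a-billion chance of getting the same result as I did.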
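Before we leave the topic of UUIDs: we cover them here because they are a widely accepted standard, but as we have seen, there is a lot of overlap between what uuid4 does and many of the utility functions in the secrets module. For example, I often see the uuid4 function used to generate primary keys in databases, especially in NoSQL contexts where auto-incrementing fields may not be supported. In principle, one could also use secrets.token_hex for the same task, but uuid4 probably makes the intent of the code clearer.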
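For comparison, here is a minimal sketch of both approaches to generating a key. The variable names are hypothetical, and 16 bytes is chosen for token_hex only so that the two results have the same length.

```python
from secrets import token_hex
from uuid import uuid4

# Hypothetical record keys; both are 32 hexadecimal characters long
uuid_key = uuid4().hex     # 122 random bits plus version and variant bits
token_key = token_hex(16)  # 16 random bytes, i.e. 128 random bits

print(uuid_key)
print(token_key)
print(len(uuid_key), len(token_key))  # 32 32
```

The uuid4 form has the advantage that other tools recognize it as a UUID, while token_hex makes the amount of randomness an explicit parameter.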
Finally, here is a plot of the distribution of generated random numbers, together with the code that produces it:
import random
import matplotlib.pyplot as plt

x = [random.randint(1, 100) for _ in range(100)]
y = [random.randint(1, 100) for _ in range(100)]

plt.figure(figsize=(8, 6), dpi=80)
plt.scatter(x, y)
plt.show()
Result: a scatter plot of the 100 random (x, y) points (image not reproduced here).