Home >Backend Development >Python Tutorial >What is the implementation principle of integers in the Python virtual machine?
The implementation data structure of the int type inside cpython is as follows:
typedef struct _longobject PyLongObject; struct _longobject { PyObject_VAR_HEAD digit ob_digit[1]; }; #define PyObject_VAR_HEAD PyVarObject ob_base; typedef struct { PyObject ob_base; Py_ssize_t ob_size; /* Number of items in variable part */ } PyVarObject; typedef struct _object { _PyObject_HEAD_EXTRA Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject;
The above data structure is represented graphically as shown below:
ob_refcnt represents the number of reference counts of the object. This is very useful for garbage collection. Later we will analyze the garbage collection part of the virtual machine in depth. .
ob_type, indicates the data type of this object. In python, sometimes it is necessary to judge the data type of the data. For example, the two keywords of isinstance and type will be used. field.
ob_size, this field indicates how many elements there are in this integer object array ob_digit.
The digit type is actually a macro definition of the uint32_t type, representing 32-bit integer data.
First of all, we know that integers in python will not overflow. This is why PyLongObject uses arrays. In the internal implementation of cpython, integers include 0, positive numbers, and negative numbers. Regarding this, there are the following regulations in cpython:
0 1⋅21 1⋅22 ... 1⋅229 0⋅230 0⋅231 1⋅232
Because we only use the first 30 bits for each array element, so the second integer data corresponds to 230, everyone You can understand the entire calculation process based on the above results. The above is very simple: −(1⋅20 1⋅21 1⋅22 ... 1⋅229 0⋅230 0⋅231 1⋅2 32)
Small integer poolIn order to avoid frequently creating some commonly used integers and speed up program execution, we can cache some commonly used integers first, if necessary Just return this data directly. The relevant code in cpython is as follows: (The interval of cached data in the small integer pool is [-5, 256])#define NSMALLPOSINTS 257 #define NSMALLNEGINTS 5 static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];We use the following code to test , to see if the data in the small integer pool is used. If so, the return value of their id() is the same for the data in the small integer pool. The id built-in function returns the memory address of the Python object.
>>> a = 1 >>> b = 2 >>> c = 1 >>> id(a), id(c) (4343136496, 4343136496) >>> a = -6 >>> c = -6 >>> id(a), id(c) (4346020624, 4346021072) >>> a = 257 >>> b = 257 >>> id(a), id(c) (4346021104, 4346021072) >>>What we can see from the above results is that for the values in the interval [-5, 256], the return value of id is indeed the same, and the return value not within this interval is different. of. We can also implement a small trick with this feature, which is to find the memory space occupied by a PyLongObject object, because we can use the first memory address of the two data -5 and 256, and then add this address By subtracting, you can get the size of the memory space occupied by 261 PyLongObjects (note that although there are 262 data in the small integer pool, the last data is the first address of the memory, not the last address, so there are only 261 data), so we You can find the memory size of a PyLongObject object.
>>> a = -5 >>> b = 256 >>> (id(b) - id(a)) / 261 32.0 >>>From the above output, we can see that a PyLongObject object occupies 32 bytes. We can use the following C program to view the actual memory space occupied by a PyLongObject.
#include "Python.h" #include <stdio.h> int main() { printf("%ld\n", sizeof(PyLongObject)); return 0; }The output of the above program is as follows: The above two results are equal, so our idea is also verified. The core code for obtaining data from the small integer pool is as follows:
static PyObject * get_small_int(sdigit ival) { PyObject *v; assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS); v = (PyObject *)&small_ints[ival + NSMALLNEGINTS]; Py_INCREF(v); return v; }
如果你了解过大整数加法就能够知道,大整数加法的具体实现过程了,在 cpython 内部的实现方式其实也是一样的,就是不断的进行加法操作然后进行进位操作。
#define Py_ABS(x) ((x) < 0 ? -(x) : (x)) // 返回 x 的绝对值 #define PyLong_BASE ((digit)1 << PyLong_SHIFT) #define PyLong_MASK ((digit)(PyLong_BASE - 1)) static PyLongObject * x_add(PyLongObject *a, PyLongObject *b) { // 首先获得两个整型数据的 size Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b)); PyLongObject *z; Py_ssize_t i; digit carry = 0; // 确保 a 保存的数据 size 是更大的 /* Ensure a is the larger of the two: */ if (size_a < size_b) { { PyLongObject *temp = a; a = b; b = temp; } { Py_ssize_t size_temp = size_a; size_a = size_b; size_b = size_temp; } } // 创建一个新的 PyLongObject 对象,而且数组的长度是 size_a + 1 z = _PyLong_New(size_a+1); if (z == NULL) return NULL; // 下面就是整个加法操作的核心 for (i = 0; i < size_b; ++i) { carry += a->ob_digit[i] + b->ob_digit[i]; // 将低 30 位的数据保存下来 z->ob_digit[i] = carry & PyLong_MASK; // 将 carry 右移 30 位,如果上面的加法有进位的话 刚好可以在下一次加法当中使用(注意上面的 carry) // 使用的是 += 而不是 = carry >>= PyLong_SHIFT; // PyLong_SHIFT = 30 } // 将剩下的长度保存 (因为 a 的 size 是比 b 大的) for (; i < size_a; ++i) { carry += a->ob_digit[i]; z->ob_digit[i] = carry & PyLong_MASK; carry >>= PyLong_SHIFT; } // 最后保存高位的进位 z->ob_digit[i] = carry; return long_normalize(z); // long_normalize 这个函数的主要功能是保证 ob_size 保存的是真正的数据的长度 因为可以是一个正数加上一个负数 size 还变小了 } PyLongObject * _PyLong_New(Py_ssize_t size) { PyLongObject *result; /* Number of bytes needed is: offsetof(PyLongObject, ob_digit) + sizeof(digit)*size. Previous incarnations of this code used sizeof(PyVarObject) instead of the offsetof, but this risks being incorrect in the presence of padding between the PyVarObject header and the digits. */ if (size > (Py_ssize_t)MAX_LONG_DIGITS) { PyErr_SetString(PyExc_OverflowError, "too many digits in integer"); return NULL; } // offsetof 会调用 gcc 的一个内嵌函数 __builtin_offsetof // offsetof(PyLongObject, ob_digit) 这个功能是得到 PyLongObject 对象 字段 ob_digit 之前的所有字段所占的内存空间的大小 result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) + size*sizeof(digit)); if (!result) { PyErr_NoMemory(); return NULL; } // 将对象的 result 的引用计数设置成 1 return (PyLongObject*)PyObject_INIT_VAR(result, &PyLong_Type, size); } static PyLongObject * long_normalize(PyLongObject *v) { Py_ssize_t j = Py_ABS(Py_SIZE(v)); Py_ssize_t i = j; while (i > 0 && v->ob_digit[i-1] == 0) --i; if (i != j) Py_SIZE(v) = (Py_SIZE(v) < 0) ? -(i) : i; return v; }
The above is the detailed content of What is the implementation principle of integers in the Python virtual machine?. For more information, please follow other related articles on the PHP Chinese website!