


What is the implementation principle of dictionary in Python virtual machine
Dictionary data structure analysis
/* The ma_values pointer is NULL for a combined table * or points to an array of PyObject* for a split table */ typedef struct { PyObject_HEAD Py_ssize_t ma_used; PyDictKeysObject *ma_keys; PyObject **ma_values; } PyDictObject; struct _dictkeysobject { Py_ssize_t dk_refcnt; Py_ssize_t dk_size; dict_lookup_func dk_lookup; Py_ssize_t dk_usable; PyDictKeyEntry dk_entries[1]; }; typedef struct { /* Cached hash code of me_key. */ Py_hash_t me_hash; PyObject *me_key; PyObject *me_value; /* This field is only meaningful for combined tables */ } PyDictKeyEntry;
The meaning of each field above is:
- ##ob_refcnt, the reference count of the object.
- ob_type, the data type of the object.
- ma_used, the number of data in the current hash table.
- ma_keys, points to the array holding key-value pairs.
- ma_values, this points to an array of values, but this value is not necessarily used in the specific implementation of cpython, because the objects in the PyDictKeyEntry array in _dictkeysobject can also store values. This The value may only be used when all keys are strings. In this article, the value in PyDictKeyEntry is mainly used to discuss the implementation of the dictionary, so you can ignore this variable.
- dk_refcnt, this is also used to represent reference counting. This is related to the dictionary view. The principle is similar to reference counting, so we will ignore it here for now.
- dk_size, this represents the size of the hash table, which must be 2n. In this case, the modular operation can be turned into a bitwise AND operation.
- dk_lookup, this represents the lookup function of the hash table, it is a function pointer.
- dk_usable, indicates how many key-value pairs are available in the current array.
- dk_entries, hash table, where key-value pairs are actually stored.
static PyObject * dict_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { PyObject *self; PyDictObject *d; assert(type != NULL && type->tp_alloc != NULL); // 申请内存空间 self = type->tp_alloc(type, 0); if (self == NULL) return NULL; d = (PyDictObject *)self; /* The object has been implicitly tracked by tp_alloc */ if (type == &PyDict_Type) _PyObject_GC_UNTRACK(d); // 因为还没有增加数据 因此哈希表当中 ma_used = 0 d->ma_used = 0; // 申请保存键值对的数组 PyDict_MINSIZE_COMBINED 是一个宏定义 值为 8 表示哈希表数组的最小长度 d->ma_keys = new_keys_object(PyDict_MINSIZE_COMBINED); // 如果申请失败返回 NULL if (d->ma_keys == NULL) { Py_DECREF(self); return NULL; } return self; } // new_keys_object 函数如下所示 static PyDictKeysObject *new_keys_object(Py_ssize_t size) { PyDictKeysObject *dk; Py_ssize_t i; PyDictKeyEntry *ep0; assert(size >= PyDict_MINSIZE_SPLIT); assert(IS_POWER_OF_2(size)); // 这里是申请内存的位置真正申请内存空间的大小为 PyDictKeysObject 的大小加上 size-1 个PyDictKeyEntry的大小 // 这里你可能会有一位为啥不是 size 个 PyDictKeyEntry 的大小 因为在结构体 PyDictKeysObject 当中已经申请了一个 PyDictKeyEntry 对象了 dk = PyMem_MALLOC(sizeof(PyDictKeysObject) + sizeof(PyDictKeyEntry) * (size-1)); if (dk == NULL) { PyErr_NoMemory(); return NULL; } // 下面主要是一些初始化的操作 dk_refcnt 设置成 1 因为目前只有一个字典对象使用 这个 PyDictKeysObject 对象 DK_DEBUG_INCREF dk->dk_refcnt = 1; dk->dk_size = size; // 哈希表的大小 // 下面这行代码主要是表示哈希表当中目前还能存储多少个键值对 在 cpython 的实现当中允许有 2/3 的数组空间去存储数据 超过这个数则需要进行扩容 dk->dk_usable = USABLE_FRACTION(size); // #define USABLE_FRACTION(n) ((((n) << 1)+1)/3) ep0 = &dk->dk_entries[0]; /* Hash value of slot 0 is used by popitem, so it must be initialized */ ep0->me_hash = 0; // 将所有的键值对初始化成 NULL for (i = 0; i < size; i++) { ep0[i].me_key = NULL; ep0[i].me_value = NULL; } dk->dk_lookup = lookdict_unicode_nodummy; return dk; }Hash table expansion mechanismFirst of all, let’s take a look at the expansion mechanism of the hash table in dictionary implementation. When we continue to add new data to the dictionary, the dictionary will soon The data in it will reach 23 of the array length. At this time, it needs to be expanded. The array size after expansion is calculated as follows:
#define GROWTH_RATE(d) (((d)->ma_used*2)+((d)->ma_keys->dk_size>>1))The size of the new array is equal to the number of original key-value pairs multiplied by 2 plus half the length of the original array. In general, there are three main steps for expansion:
- Calculate the size of the new array.
- Create a new array.
- Add the data from the original hash table to the new array (that is, the re-hashing process).
static int insertion_resize(PyDictObject *mp) { return dictresize(mp, GROWTH_RATE(mp)); } static int dictresize(PyDictObject *mp, Py_ssize_t minused) { Py_ssize_t newsize; PyDictKeysObject *oldkeys; PyObject **oldvalues; Py_ssize_t i, oldsize; // 下面的代码的主要作用就是计算得到能够大于等于 minused 最小的 2 的整数次幂 /* Find the smallest table size > minused. */ for (newsize = PyDict_MINSIZE_COMBINED; newsize <= minused && newsize > 0; newsize <<= 1) ; if (newsize <= 0) { PyErr_NoMemory(); return -1; } oldkeys = mp->ma_keys; oldvalues = mp->ma_values; /* Allocate a new table. */ // 创建新的数组 mp->ma_keys = new_keys_object(newsize); if (mp->ma_keys == NULL) { mp->ma_keys = oldkeys; return -1; } if (oldkeys->dk_lookup == lookdict) mp->ma_keys->dk_lookup = lookdict; oldsize = DK_SIZE(oldkeys); mp->ma_values = NULL; /* If empty then nothing to copy so just return */ if (oldsize == 1) { assert(oldkeys == Py_EMPTY_KEYS); DK_DECREF(oldkeys); return 0; } /* Main loop below assumes we can transfer refcount to new keys * and that value is stored in me_value. * Increment ref-counts and copy values here to compensate * This (resizing a split table) should be relatively rare */ if (oldvalues != NULL) { for (i = 0; i < oldsize; i++) { if (oldvalues[i] != NULL) { Py_INCREF(oldkeys->dk_entries[i].me_key); oldkeys->dk_entries[i].me_value = oldvalues[i]; } } } /* Main loop */ // 将原来数组当中的元素加入到新的数组当中 for (i = 0; i < oldsize; i++) { PyDictKeyEntry *ep = &oldkeys->dk_entries[i]; if (ep->me_value != NULL) { assert(ep->me_key != dummy); insertdict_clean(mp, ep->me_key, ep->me_hash, ep->me_value); } } // 更新一下当前哈希表当中能够插入多少数据 mp->ma_keys->dk_usable -= mp->ma_used; if (oldvalues != NULL) { /* NULL out me_value slot in oldkeys, in case it was shared */ for (i = 0; i < oldsize; i++) oldkeys->dk_entries[i].me_value = NULL; assert(oldvalues != empty_values); free_values(oldvalues); DK_DECREF(oldkeys); } else { assert(oldkeys->dk_lookup != lookdict_split); if (oldkeys->dk_lookup != lookdict_unicode_nodummy) { PyDictKeyEntry *ep0 = &oldkeys->dk_entries[0]; for (i = 0; i < oldsize; i++) { if (ep0[i].me_key == dummy) Py_DECREF(dummy); } } assert(oldkeys->dk_refcnt == 1); DK_DEBUG_DECREF PyMem_FREE(oldkeys); } return 0; }Inserting data into the dictionaryWhen we continue to insert data into the dictionary, it is likely that When a hash conflict is encountered, the dictionary's method of handling a hash conflict is basically the same as the method used by a collection to handle a hash conflict. They both use the development address method. However, the implementation of this open address method is relatively complicated. The specific procedures are as follows. Display:
static void insertdict_clean(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value) { size_t i; size_t perturb; PyDictKeysObject *k = mp->ma_keys; // 首先得到 mask 的值 size_t mask = (size_t)DK_SIZE(k)-1; PyDictKeyEntry *ep0 = &k->dk_entries[0]; PyDictKeyEntry *ep; i = hash & mask; ep = &ep0[i]; for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) { // 下面便是遇到哈希冲突时的处理办法 i = (i << 2) + i + perturb + 1; ep = &ep0[i & mask]; } assert(ep->me_value == NULL); ep->me_key = key; ep->me_hash = hash; ep->me_value = value; }
The above is the detailed content of What is the implementation principle of dictionary in Python virtual machine. For more information, please follow other related articles on the PHP Chinese website!

ToappendelementstoaPythonlist,usetheappend()methodforsingleelements,extend()formultipleelements,andinsert()forspecificpositions.1)Useappend()foraddingoneelementattheend.2)Useextend()toaddmultipleelementsefficiently.3)Useinsert()toaddanelementataspeci

TocreateaPythonlist,usesquarebrackets[]andseparateitemswithcommas.1)Listsaredynamicandcanholdmixeddatatypes.2)Useappend(),remove(),andslicingformanipulation.3)Listcomprehensionsareefficientforcreatinglists.4)Becautiouswithlistreferences;usecopy()orsl

In the fields of finance, scientific research, medical care and AI, it is crucial to efficiently store and process numerical data. 1) In finance, using memory mapped files and NumPy libraries can significantly improve data processing speed. 2) In the field of scientific research, HDF5 files are optimized for data storage and retrieval. 3) In medical care, database optimization technologies such as indexing and partitioning improve data query performance. 4) In AI, data sharding and distributed training accelerate model training. System performance and scalability can be significantly improved by choosing the right tools and technologies and weighing trade-offs between storage and processing speeds.

Pythonarraysarecreatedusingthearraymodule,notbuilt-inlikelists.1)Importthearraymodule.2)Specifythetypecode,e.g.,'i'forintegers.3)Initializewithvalues.Arraysofferbettermemoryefficiencyforhomogeneousdatabutlessflexibilitythanlists.

In addition to the shebang line, there are many ways to specify a Python interpreter: 1. Use python commands directly from the command line; 2. Use batch files or shell scripts; 3. Use build tools such as Make or CMake; 4. Use task runners such as Invoke. Each method has its advantages and disadvantages, and it is important to choose the method that suits the needs of the project.

ForhandlinglargedatasetsinPython,useNumPyarraysforbetterperformance.1)NumPyarraysarememory-efficientandfasterfornumericaloperations.2)Avoidunnecessarytypeconversions.3)Leveragevectorizationforreducedtimecomplexity.4)Managememoryusagewithefficientdata

InPython,listsusedynamicmemoryallocationwithover-allocation,whileNumPyarraysallocatefixedmemory.1)Listsallocatemorememorythanneededinitially,resizingwhennecessary.2)NumPyarraysallocateexactmemoryforelements,offeringpredictableusagebutlessflexibility.

InPython, YouCansSpectHedatatYPeyFeLeMeReModelerErnSpAnT.1) UsenPyNeRnRump.1) UsenPyNeRp.DLOATP.PLOATM64, Formor PrecisconTrolatatypes.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Dreamweaver Mac version
Visual web development tools

Dreamweaver CS6
Visual web development tools
