Home  >  Article  >  Backend Development  >  Among Tuple and List, why can only the former be used as the key of the dictionary?

Among Tuple and List, why can only the former be used as the key of the dictionary?

爱喝马黛茶的安东尼
爱喝马黛茶的安东尼forward
2019-06-03 14:46:103782browse

ManyPythonBeginners often have this question, why does Python have two types: tuple (tuple) and list (list)? Why can tuple be used as the key of a dictionary, but list cannot? To understand this problem, you must first understand how Python's dictionary works.

Among Tuple and List, why can only the former be used as the key of the dictionary?

1. How Python’s dictionary works

In Python, dictionaries are just " Mapping", maps key to value:

# You can get a value for a specific key

value = d[key]

In order to implement this function, Python must It can be done, given a key, to find which value corresponds to this key. Let's first consider a relatively simple implementation. Store all key-value pairs in a list. Whenever needed, traverse the list and use the key to match the key of the key-value pair. If they are equal, , get the value. However, this implementation becomes inefficient when the amount of data is large. Its algorithm complexity is O(n), where n is the number of key-value pairs stored. (For the specific working principle of the Hash table, you can refer to my article.

To this end, Python uses the hash (hash) method to implement it, requiring each object stored in the dictionary to Implement the hash function. This function can generate an int value, called hash value. Through this int value, you can quickly determine the position of the object in the dictionary. However, due to the existence of Hash collision, there may be two objects The hash value is the same, so in the process of searching the dictionary, the hash value must be compared, and the value of value must be compared.

The general process of this query is as follows:

def lookup(d, key):

The dictionary query process is summarized in the following three steps:

1. Calculate the key as a hash value through the hash function.

2. Determine a position through the hash value. This position is an array that stores elements that may have conflicts (called "bucket" in many places).

Each element is A key-value pair. Ideally, there is only 1 element in this array.

3. Traverse the array, find the target key, and return the corresponding value.

h = hash(key)                  # step 1
    cl = d.data[h]                 # step 2
    for pair in cl:                # step 3
        if key == pair[0]:
            return pair[1]
    else:
        raise KeyError, "Key %s not found." % key

For this search process to work properly, the hash function must meet the conditions: if two keys produce different hash values, then the two key objects are not equal. That is

for all i1, i2, if hash(i1) != hash(i2), then i1 != i2

Otherwise, if the hash value is different but the object is the same, then the same object will produce different hash value. When searching, you will enter the wrong bucket (step 2), and you will never find the value you are looking for in the wrong bucket.

In addition, to maintain high search efficiency in the dictionary, you must also ensure that: If two keys produce the same hash value, then they are equal.

for all i1, i2, if hash(i1) == hash(i2), then i1 == i2

The purpose of this is to try to satisfy that each hash bucket has only one element. Why is this? Consider the following hash function.

def hash(obj):

return 1

This hash function satisfies the first condition we talked about above: if the hash values ​​of the two keys are different, then the two key objects are not the same. Because the hash value generated by all objects is 1, there is no key that can generate different hash values, and there is no unsatisfactory situation. But the disadvantage of this is that because all hash values ​​are the same, all objects are assigned to the same place. When searching, in the third step, the traversal efficiency becomes O(n).

The Hash function should ensure that all elements are evenly distributed to each bucket. The ideal situation is that each There is only one element at a position.

The above two principles, the first ensures that you can get the elements you are looking for from the dictionary, and the second ensures query efficiency.


2. The requirements that dictionary Key must meet

After the above discussion, we should understand that Python Why are there such requirements for dictionary keys:

To be used as a dictionary key, the object must support hash function (i.e. __hash__), equality comparison (__eq__ or __cmp__), and meet the requirements we discussed above passed conditions.

3.Why List cannot be used as key

As for this question, the most direct answer is: list does not support the __hash__ method, so why?

For the hash function of list, we may have the following two ways to implement it:

The first one is based on id. This satisfies the condition - "If the hash values ​​are different, then of course their IDs are different." However, considering that lists are generally used as containers, hashing based on ID may lead to the following two situations:

用相同的list作为key去字典中找某个元素可能会得到不同的结果,因为是基于id hash的,所以即使他们的内容相同,字典依然将他们作为不同的元素对待。创建一个一模一样的list用字典查找永远会得到一个KeyError。

第二种,基于内容。tuple就是这样做的,但是要注意一点,tuple是不可以修改的,但list是可以修改的。当list修改之后,你就永远别想再从字典中拿回来了。见下面的代码。  

>>> l = [1, 2]
>>> d = {}
>>> d[l] = 42
>>> l.append(3)
>>> d[l] # 原来的hash值是基于[1, 2]hash的,
         # 现在是基于[1, 2, 3],所以找不到
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
KeyError: [1, 2, 3]
>>> d[[1, 2]] # 基于hash [1, 2]
              # 但是遍历的时候找不到key相等的键值对
              #(因为字典里的key变成了[1, 2, 3]
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
KeyError: [1, 2]

   

鉴于两种实现的方式都存在一定的副作用,所以Python规定:

内置的list不能作为字典的key.

但tuple是不可变,所以tuple可以作为字典的key。

(2018年1月2日更新,上面我说tuple不可变可以作为字典的key,这句话并不是完全正确的。tuple只是相对不可改变的,如果tuple中有元素是可变对象,那么虽然tuple不可改变,那么其中元素所指向的对象是可变的,所以同样会出现上面“list不能作为字典的key”这个问题,即含有可变对象的tuple也不能作为字典的key,举个例子就很好懂了。)

In [11]: li = [1,2,] 
In [12]: d = dict() 
In [13]: t2 = (1,2,)
In [14]: t3 = (1,2,li,) 
In [15]: d[li] = 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-cc334e53316a> in <module>()
----> 1 d[li] = 1
 
TypeError: unhashable type: &#39;list&#39;
 
In [16]: d[t2] = 2
 
In [17]: d[t3] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-c9021fe91ba8> in <module>()
----> 1 d[t3] = 3
 
TypeError: unhashable type: &#39;list&#39;

   

4.自定义的类型作为字典的Key

用户自定义的类型就可以作为key了,默认的hash(object)是 id(object), 默认的cmp(object1, object2)是cmp(id(object1), id(object2)),同样是可以修改的对象,为什么这里就没有上面说的问题呢?

一般来说,在映射中比较常见的需求是用一个object替换掉原来的,所以id比内容更重要,就可以基于id来hash如果内容重要的话,自定义的类型可以通过覆盖__hash__函数和__cmp__函数或__eq__函数来实现

总结

值得注意的是:将对象和一个value关联起来,更好的做法是将value设置为对象的一个属性。

The above is the detailed content of Among Tuple and List, why can only the former be used as the key of the dictionary?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:kawabangga.com. If there is any infringement, please contact admin@php.cn delete