Home >Backend Development >PHP Tutorial >PHP kernel analysis: Hash table in PHP_PHP tutorial
The most frequently used data types in PHP are strings and arrays. PHP is relatively easy to use and benefits from the very flexible array type. Before starting to introduce these data types in detail, it is necessary to introduce the hash table (HashTable). The hash table is a particularly critical data structure in PHP implementation.
Hash tables are widely used in practice. For example, compilers usually maintain a symbol table to save tags. Many high-level languages also explicitly support hash tables. Hash tables usually provide operations such as Search, Insert, and Delete. In the worst case, the performance of these operations is the same as that of a linked list, which is O(n). But it's usually not that bad. A properly designed hash algorithm can effectively avoid this kind of situation. Usually the time complexity of these operations in a hash table is O(1). This is why it is loved.
It is precisely because of the convenience and efficiency of hash tables that most current implementations of dynamic languages use hash tables.
In order to facilitate readers to read the following content, here are the basic concepts that appear in the implementation of HashTable in advance. A hash table is a data structure that maps specific keys to specific values through a hash function. It maintains a one-to-one correspondence between keys and values.
Key: An indicator used to manipulate data, such as an index in a PHP array, or a string key, etc.
Slot/bucket: A unit in the hash table used to store data, which is the container where the data is actually stored.
Hash function: A function that maps the key to the location of the slot where the data should be stored.
Hash collision: A situation where a hash function maps two different keys to the same index.
A hash table can be understood as an extension of an array or an associative array. Arrays are addressed using numerical subscripts. If the range of the key (key) is small and it is a number, we can directly use the array to complete the hash table. , and if the key range is too large, if we use the array directly we need to apply for space for all possible keys. In many cases this is unrealistic. Even if there is enough space, space utilization will be low, which is not ideal. At the same time, the key may not be a number, especially in PHP, so people use a mapping function (hash function) to map the key to a specific field:
With a properly designed hash function, we can map the key to a suitable range, because our key space can be very large (such as a string key), which may occur when mapping to a smaller space. When two different keys are mapped to the same index, this is what we call a conflict. Currently, there are two main methods to resolve hash conflicts: linking method and open addressing method.
Conflict Resolution
Linking method: The linking method resolves conflicts by using a linked list to save slot values, that is, when different keys are mapped to a slot, a linked list is used to save these values. Therefore, the linking method is used in the worst case, that is, all keys are mapped to the same slot, and the time complexity of operating the linked list is O(n). Therefore, choosing an appropriate hash function is the most critical. The current implementation of HashTable in PHP uses this method to resolve conflicts.
Open addressing method: There is usually another way to resolve conflicts: open addressing method. Using the open addressing method, the slot itself directly stores data. When inserting data, if the index mapped to the key already has data, this indicates that a conflict has occurred, and the next slot will be searched. If the slot is also occupied, then Continue to search for the next slot until an unoccupied slot is found, and the same policy is used when searching.
Implementation of hash table
It is easy to implement a hash table after understanding the principle of hash table. There are only three main tasks that need to be completed:
Implementing the hash function
Resolving conflicts
Operation interface To implement
first we need a container to save our hash table. The main content that needs to be saved in the hash table is the incoming data. At the same time, in order to conveniently know the number of elements stored in the hash table, it needs to be saved. A size field, the second thing needed is the container to save the data. As an example, a simple hash table will be implemented below. There are two main basic data structures. One is used to save the hash table itself, and the other is a singly linked list used to actually save the data. It is defined as follows:
The above definition is similar to the implementation in PHP. Most of the irrelevant details are trimmed for ease of understanding. In this section, for simplicity, the data type of key is string, and the stored data type can be any type.
The Bucket structure is a singly linked list. This is to solve the problem of multiple key hash conflicts, which is the linking method mentioned earlier. Link conflicting elements when multiple keys map to the same index.
The hash function needs to map different keys to different slots (slots or buckets) as much as possible. First, we use the simplest hash algorithm to implement: add up all the characters of the key string, and then The result is modulo the size of the hash table so that the index falls within the range of the array index.
int hash_init(HashTable *ht); // Initialize hash table
int hash_lookup(HashTable *ht, char *key, void **result); // Find content according to key
int hash_remove(HashTable *ht, char *key); // Delete the content pointed to by key
{
// check if we need to resize the hashtableresize_hash_table_if_needed(ht); // The hash table does not have a fixed size. When the inserted content almost fills up the storage space of the hash table
Arrays in PHP are implemented based on hash tables. When adding elements to the array in sequence, there is an order between the elements. However, the hash table here is obviously close to evenly distributed in physical positions. This is impossible. These elements are obtained according to the order of insertion. In the implementation of PHP, the Bucket structure also maintains another pointer field to maintain the relationship between elements. The specific content is explained in detail in the next section HashTable in PHP. The above example is a streamlined version implemented in PHP.