PHP kernel analysis: Hash table in PHP_PHP tutorial-PHP Tutorial-php.cn

Home

Backend Development

PHP Tutorial

PHP kernel analysis: Hash table in PHP_PHP tutorial

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jul 13, 2016 am 10:39 AM

phpHash table

The most frequently used data types in PHP are strings and arrays. PHP is relatively easy to use and benefits from the very flexible array type. Before starting to introduce these data types in detail, it is necessary to introduce the hash table (HashTable). The hash table is a particularly critical data structure in PHP implementation.

Hash tables are widely used in practice. For example, compilers usually maintain a symbol table to save tags. Many high-level languages also explicitly support hash tables. Hash tables usually provide operations such as Search, Insert, and Delete. In the worst case, the performance of these operations is the same as that of a linked list, which is O(n). But it's usually not that bad. A properly designed hash algorithm can effectively avoid this kind of situation. Usually the time complexity of these operations in a hash table is O(1). This is why it is loved.
It is precisely because of the convenience and efficiency of hash tables that most current implementations of dynamic languages use hash tables.

In order to facilitate readers to read the following content, here are the basic concepts that appear in the implementation of HashTable in advance. A hash table is a data structure that maps specific keys to specific values through a hash function. It maintains a one-to-one correspondence between keys and values.
Key: An indicator used to manipulate data, such as an index in a PHP array, or a string key, etc.

Slot/bucket: A unit in the hash table used to store data, which is the container where the data is actually stored.

Hash function: A function that maps the key to the location of the slot where the data should be stored.

Hash collision: A situation where a hash function maps two different keys to the same index.

A hash table can be understood as an extension of an array or an associative array. Arrays are addressed using numerical subscripts. If the range of the key (key) is small and it is a number, we can directly use the array to complete the hash table. , and if the key range is too large, if we use the array directly we need to apply for space for all possible keys. In many cases this is unrealistic. Even if there is enough space, space utilization will be low, which is not ideal. At the same time, the key may not be a number, especially in PHP, so people use a mapping function (hash function) to map the key to a specific field:

Copy code The code is as follows:

h(key) -> index

With a properly designed hash function, we can map the key to a suitable range, because our key space can be very large (such as a string key), which may occur when mapping to a smaller space. When two different keys are mapped to the same index, this is what we call a conflict. Currently, there are two main methods to resolve hash conflicts: linking method and open addressing method.

Conflict Resolution

Linking method: The linking method resolves conflicts by using a linked list to save slot values, that is, when different keys are mapped to a slot, a linked list is used to save these values. Therefore, the linking method is used in the worst case, that is, all keys are mapped to the same slot, and the time complexity of operating the linked list is O(n). Therefore, choosing an appropriate hash function is the most critical. The current implementation of HashTable in PHP uses this method to resolve conflicts.
Open addressing method: There is usually another way to resolve conflicts: open addressing method. Using the open addressing method, the slot itself directly stores data. When inserting data, if the index mapped to the key already has data, this indicates that a conflict has occurred, and the next slot will be searched. If the slot is also occupied, then Continue to search for the next slot until an unoccupied slot is found, and the same policy is used when searching.

Implementation of hash table

It is easy to implement a hash table after understanding the principle of hash table. There are only three main tasks that need to be completed:
Implementing the hash function
Resolving conflicts
Operation interface To implement
first we need a container to save our hash table. The main content that needs to be saved in the hash table is the incoming data. At the same time, in order to conveniently know the number of elements stored in the hash table, it needs to be saved. A size field, the second thing needed is the container to save the data. As an example, a simple hash table will be implemented below. There are two main basic data structures. One is used to save the hash table itself, and the other is a singly linked list used to actually save the data. It is defined as follows:

Copy code The code is as follows:

typedef struct _Bucket
{
char *key;
void *value;
struct _Bucket *next;

} Bucket;

typedef struct _HashTable
{
int size;
Bucket* buckets;
} HashTable;

The above definition is similar to the implementation in PHP. Most of the irrelevant details are trimmed for ease of understanding. In this section, for simplicity, the data type of key is string, and the stored data type can be any type.
The Bucket structure is a singly linked list. This is to solve the problem of multiple key hash conflicts, which is the linking method mentioned earlier. Link conflicting elements when multiple keys map to the same index.
The hash function needs to map different keys to different slots (slots or buckets) as much as possible. First, we use the simplest hash algorithm to implement: add up all the characters of the key string, and then The result is modulo the size of the hash table so that the index falls within the range of the array index.

Copy code The code is as follows:

static int hash_str(char *key)
{
int hash = 0;

char *cur = key;

while(*(cur++) != '
This hash algorithm is relatively simple, but its effect is not good. This hash algorithm will not be used in actual scenarios. For example, the algorithm used in PHP is called DJBX33A. Here are open source software such as Mysql and OpenSSL. The hash algorithm used, interested readers can refer to it.
Implementation of operation interface
In order to operate the hash table, the following operation functions are implemented:

Copy code

The code is as follows:

int hash_init(HashTable *ht); // Initialize hash table
int hash_lookup(HashTable *ht, char *key, void **result); // Find content according to key

int hash_insert(HashTable *ht, char *key, void *value); // Insert content into hash table Medium

int hash_remove(HashTable *ht, char *key); // Delete the content pointed to by key

int hash_destroy(HashTable *ht); The following takes the insertion and acquisition operation functions as an example:

Copy code

The code is as follows:

int hash_insert(HashTable *ht, char *key, void *value)

{

// check if we need to resize the hashtable

resize_hash_table_if_needed(ht); // The hash table does not have a fixed size. When the inserted content almost fills up the storage space of the hash table

// it will The hash table is expanded to accommodate all elements int index = HASH_INDEX(ht, key); // Find the index mapped to key Bucket *org_bucket = ht-> buckets[index];

Bucket *bucket = (Bucket *)malloc(sizeof(Bucket)); // Apply for space for new elements

bucket->key = strdup(key);
//Save the value content in. Here we simply point the pointer to the content to be stored without copying the content.
bucket->value = value;

LOG_MSG("Insert data p: %pn", value);

ht->elem_num += 1; // Record it The number of elements in the hash table now

if(org_bucket != NULL) { // A collision occurred, place the new element at the head of the linked list
LOG_MSG("Index collision found with org hashtable : %pn", org_bucket);
bucket->next = org_bucket;
}

ht->buckets[index]= bucket;

LOG_MSG("Element inserted at index %i, now we have: %i elementsn",
index, ht->elem_num);

return SUCCESS;
}

The insertion operation of the above hash table is relatively simple. Simply use the key to hash, find the location where the element should be stored, and check whether the location already has content. If a collision occurs, the new element will be linked to the original one. The head of the element list. When searching, follow the same strategy to find the location of the element. If an element exists, compare the keys of all elements in the linked list with the key to be found in sequence until a consistent element is found. Otherwise, it means that the value has no matching content. .

Copy code

The code is as follows:

int hash_lookup(HashTable *ht, char *key, void **result)
{
int index = HASH_INDEX(ht, key);
Bucket *bucket = ht-> buckets[index];

if(bucket == NULL) return FAILED;

// Search this linked list to find the correct element. Usually this linked list should have only one element, that is No need to loop multiple times
//. To ensure this, a suitable hash algorithm is required, see the previous link for related hash functions.
while(bucket)
{
if(strcmp(bucket->key, key) == 0)
{
LOG_MSG("HashTable found key in index: %i with key : %s value: %pn",
index, key, bucket->value);
*result = bucket->value;
return SUCCESS;
}

bucket = bucket->next;
}

LOG_MSG("HashTable lookup missed the key: %sn", key);
return FAILED;
}

Arrays in PHP are implemented based on hash tables. When adding elements to the array in sequence, there is an order between the elements. However, the hash table here is obviously close to evenly distributed in physical positions. This is impossible. These elements are obtained according to the order of insertion. In the implementation of PHP, the Bucket structure also maintains another pointer field to maintain the relationship between elements. The specific content is explained in detail in the next section HashTable in PHP. The above example is a streamlined version implemented in PHP.

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn