Home >Backend Development >PHP Tutorial >Variables explored in PHP kernel (3) - hash table_PHP tutorial

Variables explored in PHP kernel (3) - hash table_PHP tutorial

WBOY
WBOYOriginal
2016-07-13 10:11:20892browse

PHP kernel exploration variables (3) - hash table

In PHP, in addition to zval, another important data structure is the hash table. For example, our most common array is the hash table at the bottom. In addition to arrays, there are almost traces of Hash tables in thread safety (TSRM), GC, resource management, Global variables, and ini configuration management (we also mentioned last time that symbol tables are also implemented using Hash tables). So, what is special about this kind of data in PHP, and how is the structure implemented? With these questions, we begin this journey of kernel exploration.
Main content of this article:
Basic introduction to Hash table
The structure and implementation of PHP’s underlying Hash table
Zend Hash table API
1. Basic introduction and background knowledge of Hash table
1. Basic definition
Hash table, also called hash table, hash table, Hash table, the definition of hash table on Wikipedia is: "Hash table is a data that directly accesses the data stored in the memory location based on the keyword (Key value) Structure. That is, it accesses records by mapping the key value to a location in the table through the calculation of a function, which speeds up the search. This mapping function is called a hash function, and the array storing the records is called a hash table. ." Extracting the main stem of the article, we can get the following information:
(1).Hash table is a data structure.
(2). This data structure is an extension of ordinary array.
(3). This data structure makes insertion and search very efficient through the key->value mapping relationship (the array can be directly addressed and any element can be accessed in O(1) time).
We know that in general arrays, linear tables, and trees, the position of records in the structure is relatively random, that is, there is no direct and definite correspondence between records and keywords. In these structures, to find and insert keywords, a series of comparisons are often required, and the search efficiency is usually O(n) or O(lgn). The Hash table establishes the corresponding relationship between keywords and records through the Hash function, so that ordinary search and insertion operations can be completed in O(1) (average time complexity) time. This is obviously the most ideal search method. .
2. Hash function
As mentioned above, the Hash function establishes the correspondence between keywords and records, namely: Record = Hash(key). This correspondence is as follows:
Theoretically, the hash function can be any function such as Crc32, unique_id, MD5, SHA1 or user-defined function. The quality of this function is directly related to the performance of the Hash table (considering the performance of conflicts and searches). Here are several common Hash functions and corresponding implementations. Interested children can take a look. A typical string Hash algorithm is as follows:
function hash( $key ){
$result = 0;
$len = strlen($key);
for($i = 0;$i < $len; $i++ ){
$result += ord($key{$i}) * ((1 << 5) + 1);
}
return $result;
}
3.Conflict resolution
In an ideal situation, we expect that the Hash value calculated by any keyword is unique, so that we can directly locate the record we are looking for through Hash(key). But unfortunately, there is almost no Hash function that can satisfy such characteristics (even if there is such a Hash function, it may be too complicated to be used in practice). In other words, even with a well-designed Hash function, key1 != key2 but hash(key1) = hash(key2) often occurs. This is a Hash conflict (Hash collision). There are many main methods to resolve Hash collisions (see here). As an example, we will only briefly discuss the chaining method to resolve conflicts. The basic idea of ​​this method is: when a conflict occurs in the hash table, all records with the same hash value are linked in the form of a linked list, and only the head pointer of the linked list is stored in the hash table. The underlying Hash table of PHP uses linked lists (double linked lists) to resolve hash conflicts.
The implementation of a simple Hash table is as follows:
Class HashTable{
private $buckets = null;
/* current size */
private $size = 0;
/* max hashtable size */
private $max = 2048;
private $mask = 0;
    public function __construct($size){
        $this->_init_hash($size);
    }
     
    /* hashtable init */
    private function _init_hash($size){
        if($size > $this->max){
            $size = $this->max;
        }
 
        $this->size     = $size;
        $this->mask    = $this->size - 1;
     
        // SplFixedArray is faster when the size is known
        // see http://php.net/manual/en/class.splfixedarray.php
        $this->buckets = new SplFixedArray($this->size);
    }
 
    public function hash( $key ){
        $result = 0;
        $len  = strlen($key);
  
        for($i = 0;$i < $len; $i++ ){
            $result += ord($key{$i}) * ((1 << 5) + 1);
        }
        return $result % ($this->size);
    }
 
    /* 拉链法 */
    public function insert( $key, $val ){
        $h = $this->hash($key);
        if(!isset($this->buckets[$h])){
            $next = NULL;
        }else{
            $next = $this->bucket[$h];
        }
        $this->buckets[$h] = new Node($key, $val, $next);
    }
   
    /* 拉链法 */
    public function lookup( $key ){
        $h   = $this->hash($key);
        $cur = $this->buckets[$h];
  
        while($cur !== NULL){
            if( $cur->key == $key){
                return $cur->value;
            }
            $cur = $cur->next;
        }
        return NULL;
    }
}
 
Class Node{
    public $key;
    public $value;
    public $next = null;
  
    public function __construct($key, $value, $next = null){
        $this->key   = $key;
        $this->value = $value;
        $this->next  = $next;
    }
}
 
$hash = new HashTable(200);
$hash->insert('apple','this is apple');
$hash->insert('orange','this is orange');
$hash->insert('banana','this is banana');
echo $hash->lookup('apple');
       我们知道,在PHP中,数组支持k->v这样的关联数组,也支持普通的数组。不仅支持直接寻址(根据关键字直接定位),而且支持线性遍历(foreach等)。这都要归功于Hash table这一强大和灵活的数据结构。那么,在PHP底层,Hash table究竟是如何实现的呢?我们一步步来看。
 
二、PHP中Hash table的基本结构和实现
 
1.   基本数据结构
 
在PHP底层,与Hash table相关的结构定义、算法实现都位于Zend/zend_hash.c和Zend/zend_hash.h这两个文件中。PHP 的hash table实现包括两个重要的数据结构,一个是HashTable,另一个是bucket.前者是hash table的主体,后者则是构成链表的每个“结点”,是真正数据存储的容器。
 
(1)   HashTable的基本结构
 
定义如下(zend_hash.h):
 
typedef struct _hashtable {
uint nTableSize;
uint nTableMask;
uint nNumOfElements;
ulong nNextFreeElement;
Bucket *pInternalPointer; /* Used for element traversal */
Bucket *pListHead;
Bucket *pListTail;
Bucket **arBuckets;
dtor_func_t pDestructor;
zend_bool persistent;
unsigned char nApplyCount;
zend_bool bApplyProtection;
#if ZEND_DEBUG
int inconsistent;
#endif
} HashTable;
This is a structure, with several important members:
nTableSize This member is used to indicate the size of the Hash table. When the hash table is initialized, the size of nTableSize will be set. When the hash table is expanded, the size of this value will also be adjusted accordingly. Note that this value is not the number of elements in the hash table.
nTableMask is a "mask", mainly used to quickly calculate the index of an element (nIndex = h & ht->nTableMask. In the general Hash function, the index is determined through modulo operation, but obviously, Bit operations are more efficient than modular operations), after arBuckets is initialized, this value is fixed to nTableSize – 1 by default;
nNumOfElements This member saves the number of elements saved in the hashtable. Normally, we use count($arr) in PHP scripts to be consistent with this result (see ext/standard/array.c)
nNextFreeElement This field records the next available index position. When we use $array[] = 'key' in the script, we use the index value given by nNextFreeElement (zend_hash.c):
if (flag & HASH_NEXT_INSERT) {
h = ht->nNextFreeElement;
}
pInternalPointer This is a pointer. In PHP scripts, when we use current, next, key, end and other array-related operations, we use the pInternalPointer pointer to complete it.
pListHead and pListTail The bottom layer of PHP actually maintains two important data structures. In addition to the hash table (and the doubly linked list used to resolve conflicts), there is also a doubly linked list for linear scanning of hash table elements. pListHead and pListTail point to the head and tail of this double linked list.
arBuckets This is an array of bucket * type. Each element in the array is a bucket* pointer. Elements with the same hash value are connected into a double linked list through the pNext and pLast pointers of the bucket (this double linked list is the same as the previous one). The double linked list used for linear traversal is not the same thing). Therefore, the bucket is the container that actually stores the data.
nApplyCount and bApplyProtection provide a protection mechanism, mainly used to prevent infinite recursion caused by circular references.
persistent This is a Boolean variable that will affect the way memory is allocated. This involves some knowledge of PHP memory management. We will not explain more for the time being. For details, please refer to
(2) Another data structure is Bucket
The structure is defined as:
typedef struct bucket {
ulong h;
uint nKeyLength;
void *pData;
void *pDataPtr;
struct bucket *pListNext;
struct bucket *pListLast;
struct bucket *pNext;
struct bucket *pLast;
const char *arKey;
} Bucket;
Among them:
h ,arKey,nKeyLength In PHP arrays, there are two different types of indexes, one is numeric index, which is very similar to arrays in C (such as $arr = array(1=>'cont')), The other type is string index, which uses keywords as the index of array items (such as $arr = array('index'=>'cont');). These two types of indexes use different mechanisms at the bottom of PHP To distinguish: for numeric indexes, h is used directly as the hash value. At the same time, arKey=NULL and nKeyLength=0. For string indexes, arKey saves the string key, nKeyLength saves the length of the key, and h is the character. The hash value of the string calculated by the hash function. In this way, in PHP, h, arKey, nKeyLength are actually used to uniquely determine an element in the array. This can also be seen from the definition of the zend_hash_key structure:
typedef struct _zend_hash_key {
const char *arKey;
uint nKeyLength;
ulong h;
} zend_hash_key;
The position of the array element in the hashtable is determined through h & ht->nTableMask:
/* String index */
h = zend_inline_hash_func(arKey, nKeyLength);
nIndex = h & ht->nTableMask;
/* Numeric index-append $arr[] = 'test'; this form */
if (flag & HASH_NEXT_INSERT) {
h = ht->nNextFreeElement;
}
/* Use h directly when specifying a numerical index */
nIndex = h & ht->nTableMask;
pData and pDataPtr Normally, the data in the Bucket is stored in the memory space pointed to by the pData pointer. But there are exceptions, such as saving a pointer. At this time, pDataPtr points to this pointer, and pData points to pDataPtr. This can be seen from the macro definition of INIT_DATA:
#define INIT_DATA(ht, p, pData, nDataSize);
if (nDataSize == sizeof(void*)) {                                                                                      
memcpy(&(p)->pDataPtr, pData, sizeof(void *));
(p)->pData=&(p)->pDataPtr;
}else{                                                                                                         
(p)->pData = (void *) pemalloc_rel(nDataSize, (ht)->persistent);
if(!(p)->pData){
pefree_rel(p, (ht)->persistent);
return FAILURE;
                                                                            
memcpy((p)->pData,pData,nDataSize);
(p)->pDataPtr=NULL;
}
pListNext and pListLast, pNext and pLast have been introduced before, pListNext and pListLast constitute the entire double linked list for traversal. pNext and pLast are used to link Buckets with the same hash value when a hash conflict occurs. The structures of these two doubly linked lists are as shown below:
a. Double linked list when hash conflict occurs:
b. Double linked list for global use:
It should be noted that these two double linked list structures do not exist alone, but are related to each other. In related operations of HashTable, these two linked lists need to be maintained at the same time:
It can be seen that PHP's hashTable is quite complex. It is this complexity that makes PHP's array operations very flexible (arrays in PHP can be used as arrays, stacks, and queues, which can be said to be very convenient)
3. Implementation of HashTable
1. HashTable related macro definitions
In order to facilitate the operation of HashTable, PHP defines a lot of macros at the bottom level. These macros include:
(1). CONNECT_TO_BUCKET_DLLIST(element, list_head)
This macro is used to insert elements into the head of the Bucket's double-linked list. That is to say, when a conflict occurs, the newly inserted element is always located at the head of the Bucket's linked list. The definition of this macro is:
#define CONNECT_TO_BUCKET_DLLIST(element, list_head)
(element)->pNext = (list_head);
(element)->pLast = NULL;
if ((element)->pNext) {                                                                                                                                
(element)->pNext->pLast = (element);
}
(2). CONNECT_TO_GLOBAL_DLLIST(element, ht)
Different from the above, this one inserts elements to the end of the globally traversed double-linked list. This double-linked list acts like a queue, and it ensures the correct order when we traverse the array. The definition of this macro is:
1 #define CONNECT_TO_GLOBAL_DLLIST(element, ht)
2 (element)->pListLast = (ht)->pListTail;
3 (ht)->pListTail = (element);
4 (element)->pListNext = NULL;
5 if ((element)->pListLast != NULL) {                                                                        
6 (element)->pListLast->pListNext = (element);
7 }                                                                                                  
8
9 if (!(ht)->pListHead) {                                                                       
10 (ht)->pListHead = (element);
11 }                                                                                           
12
13 if ((ht)->pInternalPointer == NULL) {                                                                  
14 (ht)->pInternalPointer = (element);
15 }
(3). HASH_PROTECT_RECURSION(ht)
This macro is mainly used to prevent the HashTable from being too deep when it is recursively traversed. It is a protection mechanism
#define HASH_PROTECT_RECURSION(ht)
if ((ht)->bApplyProtection) {
if ((ht)->nApplyCount++ >= 3) {
zend_error(E_ERROR, "Nesting level too deep - recursive dependency?");
} }
}
(4).ZEND_HASH_IF_FULL_DO_RESIZE(ht)
The size of the HashTable is not fixed. When nNumOfElements > nTableSize, the HashTable will be expanded to accommodate more elements. This is achieved through this macro (actually, it is achieved by calling zend_hash_do_resize of). The macro is defined as:
#define ZEND_HASH_IF_FULL_DO_RESIZE(ht)                                              
if ((ht)->nNumOfElements > (ht)->nTableSize) {
zend_hash_do_resize(ht);
}
(5). INIT_DATA(ht, p, pData, nDataSize)
There are actually two situations here. If the data to be saved itself is a pointer, then pDataPtr saves the pointer and points pData to the address of pDataPtr:
if (nDataSize == sizeof(void*)) {
memcpy(&(p)->pDataPtr, pData, sizeof(void *));
(p)->pData = &(p)->pDataPtr;
}
Otherwise, if ordinary data is saved, apply for allocation of memory of nDataSize bytes, and copy the content pointed to by pData to the memory of p->pData. Here, copying is performed through memcpy, because its src and dest pointers are both void *, so almost any type of data can be copied:
else {
(p)->pData = (void *) pemalloc_rel(nDataSize, (ht)->persistent);
if (!(p)->pData) {
pefree_rel(p, (ht)->persistent);
return FAILURE;
}
memcpy((p)->pData, pData, nDataSize);
(p)->pDataPtr=NULL;
}
The entire macro is defined as:
#define UPDATE_DATA(ht, p, pData, nDataSize)                                                                              
if (nDataSize == sizeof(void*)) {                                                                                  
if ((p)->pData != &(p)->pDataPtr) {
pefree_rel((p)->pData, (ht)->persistent);
                                                                          
memcpy(&(p)->pDataPtr, pData, sizeof(void *));
(p)->pData = &(p)->pDataPtr;
} else {                                                                                       
                                                                                  
(p)->pData = (void *) pemalloc_rel(nDataSize, (ht)->persistent);
(p)->pDataPtr=NULL;
                                                                                                 
          (p)->pData = (void *) perealloc_rel((p)->pData, nDataSize, (ht)->persistent);
                                                                                                                                                                                                                                         
                                                                          
memcpy((p)->pData, pData, nDataSize);
}
(6). UPDATE_DATA(ht, p, pData, nDataSize)
Similar to INIT_DATA, the difference is that more processing needs to be done on the previous memory block (for example, the actual data saved by pData before, but the pointer saved after update, the originally applied for memory needs to be released, otherwise It will cause a memory leak. On the contrary, if the pointer data was saved before and ordinary data is saved after update, pDataPtr should be set to NULL and new memory space should be allocated for pData). The definition of this macro is:
#define UPDATE_DATA(ht, p, pData, nDataSize)                                                                              
if (nDataSize == sizeof(void*)) {                                                                                  
if ((p)->pData != &(p)->pDataPtr) {
pefree_rel((p)->pData, (ht)->persistent);
                                                                          
memcpy(&(p)->pDataPtr, pData, sizeof(void *));
(p)->pData = &(p)->pDataPtr;
} else {                                                                                       
                                                                                  
(p)->pData = (void *) pemalloc_rel(nDataSize, (ht)->persistent);
(p)->pDataPtr=NULL;
                                                                                                 
(p)->pData = (void *) perealloc_rel((p)->pData, nDataSize, (ht)->persistent);
                                                                                                                                                                                                                                         
                                                                          
memcpy((p)->pData, pData, nDataSize);
}
(7). CHECK_INIT(ht)
After calling _zend_hash_init() to initialize the hash table, arBuckets does not actually allocate memory space, and the value of nTableMask is not set. CHECK_INIT will check whether arBuckets has been initialized (nTableMask==0 means not initialized). If not initialized, allocate memory space for arBuckets and set the value of nTableMask to nTableSize – 1. The macro is defined as:
#define CHECK_INIT(ht) do {                                                                                     
if (UNEXPECTED((ht)->nTableMask == 0)) {                                                                      
(ht)->arBuckets = (Bucket **) pecalloc((ht)->nTableSize, sizeof(Bucket *), (ht)->persistent);
(ht)->nTableMask = (ht)->nTableSize - 1;
}                                                                                     
} while (0)
2. Hash function
When I was writing this article, I found that Brother Niao had already written an article "Hash Algorithm in PHP", which gave a more detailed answer to the hash algorithm and ideas, so I won't explain too much here. , just say one thing: unrolled. Unrolled itself means expansion. For keys with nKeyLength length, PHP’s hash algorithm will unroll in units of 8, which is in the form:
for (; nKeyLength >= 8; nKeyLength -= 8) {
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
}
Then why not just use a loop?
For example:
for(;nKeyLength > 0; nKeyLength--){
hash = ((hash << 5) + hash) + *arKey++;
}
There is actually no problem with this, and the reason for unrolling is naturally more efficient: for the CPU, generally sequential instructions are faster than loops (the latter is represented by JMP, JNZ and other jumps in assembly instructions. and the comparison before the loop). At the same time, there will be better efficiency for string indexes below 8 bits.
By the way, the implementation source code of the hash function is posted:
/*
* 1. Inline static is to improve efficiency
* 2. Const qualifies arKey, indicating that the content of arKey should not be modified in the function
*/
static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
{
/* 3. Register variables, also to improve efficiency */
register ulong hash = 5381;
/* 4. variant with the hash unrolled eight times */
for (; nKeyLength >= 8; nKeyLength -= 8) {
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
hash = ((hash << 5) + hash) + *arKey++;
}
switch (nKeyLength) {
case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
case 1: hash = ((hash << 5) + hash) + *arKey++; break;
case 0: break;
EMPTY_SWITCH_DEFAULT_CASE()
}
/* 5. The returned hash value has not undergone modulo operation */
return hash;
}
3. Initialization, add/update, search, delete, etc. API
(1). Initialization
_zend_hash_init is used for the initialization operation of hash table (mainly including assigning initial values ​​to the data members of the hashTable structure). After calling _zend_hash_init, nTableMask defaults to 0 (later assigned to nTableSize-1 when CHECK_INIT), nTableSize is assigned to the smallest integer power of 2 greater than nSize, and the minimum nTableSize is 8, the maximum is 0x80000000, and in _ After zend_hash_init, arBuckets does not allocate memory space (it is also allocated during CHECK_INIT). nTableMask is used to quickly calculate the index corresponding to the hash value, because it has a characteristic, that is, nTableMask = 2^n – 1. After expanding into binary, all bits are 1, so the index position can be quickly obtained through nIndex = h & nTableMask. The implementation source code of this function (the specific implementation of different versions is different, the PHP version of this article is 5.4.24):
ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
/* The minimum size of hashTable is 1<<3 = 8 */
uint i = 3;
SET_INCONSISTENT(HT_OK);
if (nSize >= 0x80000000) {
/* prevent overflow */
ht->nTableSize = 0x80000000;
} else {
while ((1U << i) < nSize) {
i++;
}
ht->nTableSize = 1 << i;
}
ht->nTableMask = 0; /* 0 means that ht->arBuckets is uninitialized */
ht->pDestructor = pDestructor;
ht->arBuckets = (Bucket**)&uninitialized_bucket;
ht->pListHead = NULL;
ht->pListTail = NULL;
ht->nNumOfElements = 0;
ht->nNextFreeElement = 0;
ht->pInternalPointer = NULL;
ht->persistent = persistent;
ht->nApplyCount = 0;
ht->bApplyProtection = 1;
return SUCCESS;
}
(2). Find elements.
For string index and numeric index, two search methods, zend_hash_find and zend_hash_index_find, are provided respectively. There is no essential difference between these two methods. They both find the position of the element in the corresponding Bucket after calculating the hash value. For string indexes, the condition to determine the same is: p->arKey == arKey ||((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p-> ;arKey, arKey, nKeyLength)), that is, either arKey and p->arKey point to the same memory, or the contents pointed to by h, nKeyLength and arKey are exactly the same, so that they can be determined to be the same. For numeric indexes, only (p->h == h) && (p->nKeyLength == 0) is required. The implementation of these two searches is as follows:
/* Search for numeric index */
ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
{
uint nIndex;
Bucket *p;
IS_CONSISTENT(ht);
/* Calculate index */
nIndex = h & ht->nTableMask;
p = ht->arBuckets[nIndex];
/* Traverse the double linked list and return immediately once found */
while (p != NULL) {
if ((p->h == h) && (p->nKeyLength == 0)) {
*pData = p->pData;
return SUCCESS;
}
p = p->pNext;
}
/* If nothing is found after traversing the double linked list, the search fails */
return FAILURE;
}

/* Search for string index */
ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
{
ulong h;
uint nIndex;
Bucket *p;
IS_CONSISTENT(ht);
/* String index needs to calculate the hash value of the string first */
h = zend_inline_hash_func(arKey, nKeyLength);
nIndex = h & ht->nTableMask;
p = ht->arBuckets[nIndex];
/* Search in the Bucket double-linked list. Once found, return immediately. Pay attention to the conditions for successful search */
while (p != NULL) {
if (p->arKey == arKey ||
         ((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) {
*pData = p->pData;
return SUCCESS;
}
p = p->pNext;
}
/* Search failed */
return FAILURE;
}
(3).Insert element
In PHP script, there are three ways to insert elements into the current array, such as:
$arr = array();
$arr['index'] = 'cont';
$arr[2] = 'test';
$arr[] = 10;
The three insertion methods are: "String index insertion", "Number index insertion", "Next available position insertion". In the implementation, "String index insertion" corresponds to _zend_hash_add_or_update, and the latter two correspond to _zend_hash_index_update_or_next_insert. Taking the operation $arr['index'] = 'cont' as an example, PHP will try to update the corresponding data first. If the corresponding Bucket is not found, it means that this is a new element, so insert will be executed. Operation, this is implemented in _zend_hash_add_or_update as follows (non-critical steps omitted):
ZEND_API int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pD est, int flag ZEND_FILE_LINE_DC)
{
/* Since it is a string index, the index key cannot be empty, nKeyLength must be >0 */
if (nKeyLength <= 0) {
return FAILURE;
}
/* Whether ht is initialized, if not, allocate the memory space of arBuckets and set nTableMask */
CHECK_INIT(ht);
/* Calculate the index in the hash table */
h = zend_inline_hash_func(arKey, nKeyLength);
nIndex = h & ht->nTableMask;
/* Scan the Bucket list to see if the element exists. If it exists, update it and return */
p = ht->arBuckets[nIndex];
while (p != NULL) {
 if (p->arKey == arKey ||
((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) {
/* Conflict, cannot be added */
       if (flag & HASH_ADD) {
return FAILURE;
      }
HANDLE_BLOCK_INTERRUPTIONS();
if (ht->pDestructor) {
ht->pDestructor(p->pData);
      }
          /* Perform update operations */
UPDATE_DATA(ht, p, pData, nDataSize);
if (pDest) {
*pDest = p->pData;
      }
HANDLE_UNBLOCK_INTERRUPTIONS();
return SUCCESS;
}
p = p->pNext;
}
/* If the element does not exist, insert */
if (IS_INTERNED(arKey)) {
p = (Bucket *) pemalloc(sizeof(Bucket), ht->persistent);
if (!p) {
return FAILURE;
}
p->arKey = arKey;
} else {
p = (Bucket *) pemalloc(sizeof(Bucket) + nKeyLength, ht->persistent);
if (!p) {
return FAILURE;
}
p->arKey = (const char*)(p + 1);
memcpy((char*)p->arKey, arKey, nKeyLength);
}
p->nKeyLength = nKeyLength;
INIT_DATA(ht, p, pData, nDataSize);
p->h = h;
 /* Insert into the head of the Buckets list */
CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
/* Insert into the global double-linked list for traversal, which is a logical queue */
CONNECT_TO_GLOBAL_DLLIST(p, ht);
ht->arBuckets[nIndex] = p;
/* Increase the number of elements */
ht->nNumOfElements++;
 /* If nNumOfElements > nTableSize, you need to expand the HashTable */
ZEND_HASH_IF_FULL_DO_RESIZE(ht);
}
More operations of HashTable such as zend_hash_do_resize (expansion), zend_hash_rehash (the elements of the original hashTable need to be re-hashed after expansion), zend_hash_del_key_or_index (delete elements in HashTable), zend_hash_destroy (destroy the Hash table), zend_hash_copy (hash table copy), I won’t list them one by one here. Interested students can check the source code.
Previous articlehttp://www.Bkjia.com/kf/201411/356800.html

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/929773.htmlTechArticlePHP kernel exploration variables (3) - hash table In PHP, in addition to zval, another important data The structure is none other than hash table. For example, our most common array is hash t at the bottom...
Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn