search
HomeBackend DevelopmentPHP Tutorial 详解PHP中Array构造HashTable

详解PHP中Array结构HashTable

我们知道PHP中的Array在内部是以Hash的结构进行存储的。本文主要重点也是对PHP中Array的静态结构和动态结构进行分析和记录。

这里的静态结构,是指存储PHP中Array数据时使用的数据结构,即所谓的HashTable。

动态结构,是指程序在运行过程中,Array数据的存储状态。

?

首先PHP中的hashTable的结构如下:

typedef struct bucket {
    ulong h;                        /* Used for numeric indexing */
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;


typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets; ? ? ? ? ?
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

?

一个PHP中的Array在内部对应一个HashTable,HashTable内部的四个Bucket类型的指针数据记录着数组实际存储的元素内容的地址。具体的内容,各字段名都可以自解释,不做多说明了。

?

?

如果只看这几行代码,可能无法理解PHP数组实际的工作原理,接下来,我们可以手工模拟一下PHP数组中的一些最简单的操作。

?

1. 从无到有

HashTable的初始化,首先需要给一个HashTable构造一个内存空间,具体代码如下:

?

//hash_func_t在函数内用不到,hash函数在PHP范围内都是固定的
int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;

    SET_INCONSISTENT(HT_OK);

    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U nTableSize = 1 nTableMask = 0; /* 0 means that ht->arBuckets is uninitialized */
    ht->pDestructor = pDestructor;
    ht->arBuckets = (Bucket**)&uninitialized_bucket; ? //实际的数据存储空间还未创建
    ht->pListHead = NULL;
    ht->pListTail = NULL;
    ht->nNumOfElements = 0; ? ? ? ? ? ? ? ? ? //表示数组内还没有一个元素,
    ht->nNextFreeElement = 0;
    ht->pInternalPointer = NULL;
    ht->persistent = persistent;
    ht->nApplyCount = 0;
    ht->bApplyProtection = 1;
    return SUCCESS;
}
?

上述代码可以理解为,为数组构造了一个总的大门,数据都可以经由这个门进入到自己对应的内存块中。当然现在门里还没有“座位”呢。

?

2. 数据插入

对于一个一无所有的空间,怎么给它加点东西呢?这就是数据的插入,即数据是如何保存到这个HashTable中的。

PHP的数组索引可以是数值或字符串,我们首先看字符串的索引如何存储,代码如下:

int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
{
	ulong h;
	uint nIndex;
	Bucket *p;

	IS_CONSISTENT(ht);

	if (nKeyLength nTableMask;

	p = ht->arBuckets[nIndex];
	while (p != NULL) {
		if (p->arKey == arKey ||
			((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) {
				if (flag & HASH_ADD) {
					return FAILURE;
				}
				HANDLE_BLOCK_INTERRUPTIONS();
#if ZEND_DEBUG
				if (p->pData == pData) {
					ZEND_PUTS("Fatal error in zend_hash_update: p->pData == pData\n");
					HANDLE_UNBLOCK_INTERRUPTIONS();
					return FAILURE;
				}
#endif
				if (ht->pDestructor) {
					ht->pDestructor(p->pData);
				}
				UPDATE_DATA(ht, p, pData, nDataSize);
				if (pDest) {
					*pDest = p->pData;
				}
				HANDLE_UNBLOCK_INTERRUPTIONS();
				return SUCCESS; ?//更新之后直接退出
		}
		p = p->pNext;
	}
	
	if (IS_INTERNED(arKey)) {
		p = (Bucket *) pemalloc(sizeof(Bucket), ht->persistent);
		if (!p) {
			return FAILURE;
		}
		p->arKey = (char*)arKey;
	} else {
		p = (Bucket *) pemalloc(sizeof(Bucket) + nKeyLength, ht->persistent);
		if (!p) {
			return FAILURE;
		}
		p->arKey = (char*)(p + 1);
		memcpy(p->arKey, arKey, nKeyLength);
	}
	p->nKeyLength = nKeyLength;
	INIT_DATA(ht, p, pData, nDataSize);
	p->h = h;
	CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
	if (pDest) {
		*pDest = p->pData;
	}

	HANDLE_BLOCK_INTERRUPTIONS();
	CONNECT_TO_GLOBAL_DLLIST(p, ht);
	ht->arBuckets[nIndex] = p;
	HANDLE_UNBLOCK_INTERRUPTIONS();

	ht->nNumOfElements++;
	ZEND_HASH_IF_FULL_DO_RESIZE(ht);		/* If the Hash table is full, resize it */
	return SUCCESS;
}

首先,检查数组空间是否初始化,代码如下:

?

#define CHECK_INIT(ht) do {                                             \
    if (UNEXPECTED((ht)->nTableMask == 0)) {                                \
        (ht)->arBuckets = (Bucket **) pecalloc((ht)->nTableSize, sizeof(Bucket *), (ht)->persistent);   \
        (ht)->nTableMask = (ht)->nTableSize - 1;                        \
    }                                                                   \
} while (0)
?

?

然后计算要插入的字符串索引的hash值,并与nTableMask做按位与,得到nindex,这个nIndex就是对应的bucket*在二维数组arBucket**中的偏移量。根据代码逻辑,如果nIndex位置不为空,则说明当前计算得到的hash值之前存在。如果连key也相同并且flag为HASH_ADD则失败,否则就是更新操作。如果是更新操作则不会对现有数组结构有任何影响,更新了对应的值之后直接退出即可。

?

在需要有新元素插入到HashTable时,构造好的新元素会经过两步来链入该HashTable

?

第一步代码如下:

?

#define CONNECT_TO_BUCKET_DLLIST(element, list_head)        \
    (element)->pNext = (list_head);                         \
    (element)->pLast = NULL;                                \
    if ((element)->pNext) {                                 \
        (element)->pNext->pLast = (element);                \
    }

?

在这一步中如果新元素的key的hash值之前存在过,则list_head为HashTable.arBucket[nIndex],nIndex怎么来的前面已经说过了。在这一步过后会将HashTable.arBucket[nIndex]赋值为当前的新元素,你懂得。

?

如果新元素的key对应的hash之前没有存在过,则list_head就为NULL,因为HashTable.arBucket[nIndex]为NULL。你也懂得。

?

第二步代码如下:

?

#define CONNECT_TO_GLOBAL_DLLIST(element, ht)               \
    (element)->pListLast = (ht)->pListTail;                 \
    (ht)->pListTail = (element);                            \
    (element)->pListNext = NULL;                            \
    if ((element)->pListLast != NULL) {                     \
        (element)->pListLast->pListNext = (element);        \
    }                                                       \
    if (!(ht)->pListHead) {                                 \
        (ht)->pListHead = (element);                        \
    }                                                       \
    if ((ht)->pInternalPointer == NULL) {                   \
        (ht)->pInternalPointer = (element);                 \
    }
?

关于这一步会对HashTable的内容有什么样的影响,请参看下面的动态示例。相信你也懂得。

?

?

?

动态示例:

现在我们假设数组中没有任何元素,则进行插入操作。现在我们按照代码的逻辑,手动模拟一下数据插入的过程:

?

1.

插入第一个元素A,假设其key对应的hash值为1

则插入之后,内存中的状态如下:

?

HashTable.arBucket[1]=A;

HashTable.pListHead = A

HashTable.pListTail = A

HashTable.pInternalPointer = A

A.pNext = null

A.pLast = null

A.pListLast = null

A.pListNext = null

?

2.

插入第二个元素B,假设其key对应的hash值为2

则插入之后内存的状态如下:

HashTable.arBucket[2] = B;

HashTable.pListHead = A

HashTable.pListTail = B

HashTable.pInternalPointer = A?????? //这个只在第一次的时候设置

A.pNext=null

A.pLast = null

A.pListNext = B

A.pListLast = null

B.pListLast = A

B.pListNext = null

B.pNext = null

B.pLast = null

?

3.

插入第三个元素C,假设其key的hash值为1,和A相同

则插入之后内存状态如下:

HashTable.arBucket[1] = C;

HashTable.pListHead = A

HashTable.pListTail =C

HashTable.pInternalPointer = A?????? //这个只在第一次的时候设置

A.pNext=null

A.pLast = C

A.pListNext = B

A.pListLast = null

?

B.pNext = null

B.pLast = null

B.pListLast = A

B.pListNext = C

C.pNext = A

C.pLast = null

C.pListNext = null

C.pListLast = B

?

插入A,B,C三个值之后的内存中状态即为:

HashTable.arBucket[1] = C;

HashTable.pListHead = A

HashTable.pListTail =C

HashTable.pInternalPointer = A

A.pNext=null

A.pLast = C

A.pListNext = B

A.pListLast = null

?

B.pNext = null

B.pLast = null

B.pListLast = A

B.pListNext = C

C.pNext = A

C.pLast = null

C.pListNext = null

C.pListLast = B

?

OK,A、B、C三个元素都已插入了,现在我们要实现两个任务:

?

1.

查找某key的元素值(value):

如果我们要访问A元素,则提供A的key:key_a,得到对应的hash值为1

然后找HastTable.arBucket[1]。这时HastTable.arBucket[1]其实为C不是A,但由于C的key不等于A的key,因此,要沿着pNext的指针找下去,直到NULL,而此时C.pNext就是A,即找到了key_a对应的值A。

总之由key查找一个元素时,首先要hash,然后顺着hash后的索引位置的pNext指针一直找下去,直到NULL,如果遇到了和要查找的key相同的值,则找到,否则找不到。

?

2.

遍历数组:

由于我们的例子中的key是字符串类型的,全部循环遍历不能用for。只能用foreach,那foreach的遍历是如何实现的呢?

?

简单,根据最后的HashTable的状态,我们从HastTable.pListHead开始沿着pListNext指针顺序找下去即可了。以本文例子为例,则结果为:

?

?

HashTable.pListHead====>A

A.pListNext?????????????????? ====>B

B.pListNext?????????????????? ====>C

?

则最后的遍历顺序就是A,B,C,发现foreach的遍历顺序是和元素插入到数组的顺序相关的。

?

?

如果插入的元素的key不是字符串,而是数值。则可以省去做计算hash值这一步,直接拿数值的key做为hash值使用。

这样就不存在hash冲突的问题,这样也就不会用到每个元素的pNext、pLast两个指针了,这两个指针都只会是NULL。

?

这样我们可以通过使用for循环来遍历数组了,因为不存在hash冲突。

?

同样,如果我们使用foreach来遍历数组的话,遍历顺序还是元素的插入顺序,这个你当然懂得。

?

?

ps:

本文并未对zend中的hash结够做全面的记录,只是对本文主题涉及到的逻辑的重点代码进行了分析和演示。同时也为了能抓住重点。有些代码并未列出,如:再hash的逻辑,和索引为数值类型数据的代码等。这些可在代码文件Zend/zend_hash.c中找到详细内容。

?

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
How can you prevent session fixation attacks?How can you prevent session fixation attacks?Apr 28, 2025 am 12:25 AM

Effective methods to prevent session fixed attacks include: 1. Regenerate the session ID after the user logs in; 2. Use a secure session ID generation algorithm; 3. Implement the session timeout mechanism; 4. Encrypt session data using HTTPS. These measures can ensure that the application is indestructible when facing session fixed attacks.

How do you implement sessionless authentication?How do you implement sessionless authentication?Apr 28, 2025 am 12:24 AM

Implementing session-free authentication can be achieved by using JSONWebTokens (JWT), a token-based authentication system where all necessary information is stored in the token without server-side session storage. 1) Use JWT to generate and verify tokens, 2) Ensure that HTTPS is used to prevent tokens from being intercepted, 3) Securely store tokens on the client side, 4) Verify tokens on the server side to prevent tampering, 5) Implement token revocation mechanisms, such as using short-term access tokens and long-term refresh tokens.

What are some common security risks associated with PHP sessions?What are some common security risks associated with PHP sessions?Apr 28, 2025 am 12:24 AM

The security risks of PHP sessions mainly include session hijacking, session fixation, session prediction and session poisoning. 1. Session hijacking can be prevented by using HTTPS and protecting cookies. 2. Session fixation can be avoided by regenerating the session ID before the user logs in. 3. Session prediction needs to ensure the randomness and unpredictability of session IDs. 4. Session poisoning can be prevented by verifying and filtering session data.

How do you destroy a PHP session?How do you destroy a PHP session?Apr 28, 2025 am 12:16 AM

To destroy a PHP session, you need to start the session first, then clear the data and destroy the session file. 1. Use session_start() to start the session. 2. Use session_unset() to clear the session data. 3. Finally, use session_destroy() to destroy the session file to ensure data security and resource release.

How can you change the default session save path in PHP?How can you change the default session save path in PHP?Apr 28, 2025 am 12:12 AM

How to change the default session saving path of PHP? It can be achieved through the following steps: use session_save_path('/var/www/sessions');session_start(); in PHP scripts to set the session saving path. Set session.save_path="/var/www/sessions" in the php.ini file to change the session saving path globally. Use Memcached or Redis to store session data, such as ini_set('session.save_handler','memcached'); ini_set(

How do you modify data stored in a PHP session?How do you modify data stored in a PHP session?Apr 27, 2025 am 12:23 AM

TomodifydatainaPHPsession,startthesessionwithsession_start(),thenuse$_SESSIONtoset,modify,orremovevariables.1)Startthesession.2)Setormodifysessionvariablesusing$_SESSION.3)Removevariableswithunset().4)Clearallvariableswithsession_unset().5)Destroythe

Give an example of storing an array in a PHP session.Give an example of storing an array in a PHP session.Apr 27, 2025 am 12:20 AM

Arrays can be stored in PHP sessions. 1. Start the session and use session_start(). 2. Create an array and store it in $_SESSION. 3. Retrieve the array through $_SESSION. 4. Optimize session data to improve performance.

How does garbage collection work for PHP sessions?How does garbage collection work for PHP sessions?Apr 27, 2025 am 12:19 AM

PHP session garbage collection is triggered through a probability mechanism to clean up expired session data. 1) Set the trigger probability and session life cycle in the configuration file; 2) You can use cron tasks to optimize high-load applications; 3) You need to balance the garbage collection frequency and performance to avoid data loss.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Safe Exam Browser

Safe Exam Browser

Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Atom editor mac version download

Atom editor mac version download

The most popular open source editor