How big are PHP arrays and values?
This article is about PHP 5 memory usage. For the situation described in this article, memory usage is about 3 times lower in PHP 7.
In this post I want to investigate the memory usage of PHP arrays (and values in general), using the following script as an example. It creates an array of 100000 unique integer elements and measures the resulting memory usage:
    $startMemory = memory_get_usage();
    $array = range(1, 100000);
    echo memory_get_usage() - $startMemory, ' bytes';
How much would you expect it to be? Put simply, an integer is 8 bytes (on a 64-bit Unix machine, using the long type) and you have 100000 integers, so obviously you would need 800000 bytes.
Now try running the code above. It gives us 14649024 bytes. Yes, you read that right: 13.97 MB, about 18 times more than we estimated.
So, where does the extra factor of 18 come from?
Summary
For those who don't want to know the whole story, here is a quick summary of the memory usage of the different components involved:
                                 |  64 bit   | 32 bit
    ---------------------------------------------------
    zval                         |  24 bytes | 16 bytes
    + cyclic GC info             |   8 bytes |  4 bytes
    + allocation header          |  16 bytes |  8 bytes
    ===================================================
    zval (value) total           |  48 bytes | 28 bytes
    ===================================================
    bucket                       |  72 bytes | 36 bytes
    + allocation header          |  16 bytes |  8 bytes
    + pointer                    |   8 bytes |  4 bytes
    ===================================================
    bucket (array element) total |  96 bytes | 48 bytes
    ===================================================
    grand total                  | 144 bytes | 76 bytes
The above numbers will vary depending on your operating system, compiler, and compilation options. For example, if you compile PHP with debug or thread safety, you will get different numbers. But I think the size given above is what you will see in a 64-bit production version of PHP 5.3 on Linux.
If you multiply these 144 bytes by 100,000 elements, you get 14,400,000 bytes, which is 13.73 MB, which is very close to the actual number - most of the rest is uninitialized bucket pointers, but I'll discuss that later.
Now, if you want a more detailed analysis of the values mentioned above, keep reading :)
The zvalue_value union
First, let's look at how PHP stores values. As you know, PHP is a weakly typed language, so it needs some way to switch between types quickly. PHP uses a union for this, which is defined in zend.h as follows:
    typedef union _zvalue_value {
        long lval;                // For integers and booleans
        double dval;              // For floats (doubles)
        struct {                  // For strings
            char *val;            //     consisting of the string itself
            int len;              //     and its length
        } str;
        HashTable *ht;            // For arrays (hash tables)
        zend_object_value obj;    // For objects
    } zvalue_value;
If you don't know C, this is not a problem because the code is very simple: union is a way to make certain values accessible as various types. For example, if you do zvalue_value->lval, you will get a value that is interpreted as an integer. On the other hand, if you use zvalue_value->ht, the value will be interpreted as a pointer to a hash table (i.e. an array).
But let’s not talk too much here. For us, the only thing that matters is that the size of a union is equal to the size of its largest component. The largest component here is the string struct (the zend_object_value struct is the same size as the str struct, but I'll omit it for simplicity). The string struct stores a pointer (8 bytes) and an integer (4 bytes), a total of 12 bytes. Due to memory alignment (12 byte structures are not cool because they are not multiples of 64 bit/8 bytes), the total size of the structure will be 16 bytes, which is also the size of the union as a whole.
Now we know that, because of PHP's dynamic typing, each value needs not 8 bytes but 16. Multiplied by 100000 values that gives 1600000 bytes, or 1.53 MB. But the actual value is 13.97 MB, so we are not there yet.
The structure of zval
That makes sense: the union only stores the value itself, but PHP obviously also needs to store the type and some garbage collection information. The structure that holds this information is called a zval, which you may have heard of. For more information on why PHP needs it, I recommend reading an article by Sara Golemon. Anyway, the structure is defined as follows:
    struct _zval_struct {
        zvalue_value value;     // The value
        zend_uint refcount__gc; // The number of references to this value (for GC)
        zend_uchar type;        // The type
        zend_uchar is_ref__gc;  // Whether this value is a reference (&)
    };
The size of a structure is the sum of the sizes of its components: 16 bytes for zvalue_value (as calculated above), 4 bytes for the zend_uint, and 1 byte for each of the two zend_uchars. The total is 22 bytes. Due to memory alignment, the actual size will be 24 bytes.
So if we store 100000 elements at 24 bytes each, the total is 2400000 bytes, or 2.29 MB. The gap is closing, but the actual value is still more than six times larger.
The cycle collector (as of PHP 5.3)
PHP 5.3 introduced a new garbage collector for cyclic references. To make it work, PHP has to store some additional data. I won't explain how the algorithm works here; the PHP manual has a page on it. What matters for our size calculation is that PHP wraps every zval in a zval_gc_info:
    typedef struct _zval_gc_info {
        zval z;
        union {
            gc_root_buffer *buffered;
            struct _zval_gc_info *next;
        } u;
    } zval_gc_info;
As you can see, Zend just adds a union on top of the zval, and that union consists of two pointers. As you hopefully remember, the size of a union is the size of its largest component. Both components are pointers, so both are 8 bytes in size, and the union is 8 bytes as well.
If we add this to 24 bytes we already have 32 bytes. Multiplied by 100000 elements we get a memory usage of 3.05 MB.
Zend MM Allocator
C, unlike PHP, does not manage memory for you. You need to keep track of your allocations yourself. To do this, PHP uses a custom memory manager optimized specifically for its needs: the Zend Memory Manager. Zend MM is based on Doug Lea's malloc and adds some PHP-specific optimizations and features (such as memory limits, cleanup after each request, etc.).
The important thing for us here is that MM adds an allocation header for every allocation done through it. The definition is as follows:
    typedef struct _zend_mm_block {
        zend_mm_block_info info;
    #if ZEND_DEBUG
        unsigned int magic;
    # ifdef ZTS
        THREAD_T thread_id;
    # endif
        zend_mm_debug_info debug;
    #elif ZEND_MM_HEAP_PROTECTION
        zend_mm_debug_info debug;
    #endif
    } zend_mm_block;

    typedef struct _zend_mm_block_info {
    #if ZEND_MM_COOKIES
        size_t _cookie;
    #endif
        size_t _size; // size of the allocation
        size_t _prev; // previous block (not sure what exactly this is)
    } zend_mm_block_info;
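As you can see, these definitions are cluttered with compile option checks. If you build PHP with heap protection, multi-threading, debug and MM cookies, the allocation header will be much bigger.
For this example we assume that all of these options are disabled. In that case only the two size_ts _size and _prev remain. A size_t has 8 bytes (on 64 bit), so the allocation header has a total size of 16 bytes, and that header is added to every allocation.
So now we need to adjust our zval size again. In reality it is not 32 bytes but 48, because of the allocation header. Multiplied by 100000 elements that is 4.58 MB. The actual value is 13.97 MB, so we are already about a third of the way there.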
Buckets
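So far we have only considered individual values. But the array structure in PHP also takes up a lot of space. "Array" is actually a poorly chosen term here: PHP arrays are really hash tables (dictionaries). So how do hash tables work? Basically, a hash is generated for every key, and that hash is used as an offset into a "real" C array. Since hashes can collide, all elements that share a hash are stored in a linked list. When accessing an element, PHP first computes the hash, looks up the right bucket and then walks the linked list, comparing the exact key element by element. A bucket is defined as follows:
    typedef struct bucket {
        ulong h;                  // The hash (or for int keys the key)
        uint nKeyLength;          // The length of the key (for string keys)
        void *pData;              // The actual data
        void *pDataPtr;           // ??? What's this ???
        struct bucket *pListNext; // PHP arrays are ordered. This gives the next element in that order
        struct bucket *pListLast; // and this gives the previous element
        struct bucket *pNext;     // The next element in this (doubly) linked list
        struct bucket *pLast;     // The previous element in this (doubly) linked list
        const char *arKey;        // The key (for string keys)
    } Bucket;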
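As you can see, a lot of data has to be stored to get the kind of abstract array data structure that PHP uses (PHP arrays are arrays, dictionaries and linked lists at the same time, and that obviously needs a lot of bookkeeping). The sizes of the individual components are 8 bytes for the unsigned long, 4 bytes for the unsigned int, and 7 times 8 bytes for the pointers. That is 68 bytes in total. Add alignment and you get 72 bytes.
Buckets, like zvals, have to be allocated on the heap, so we again need to add the 16 bytes of allocation header, which gives 88 bytes. We also need to store pointers to those buckets in the "real" C array (Bucket **arBuckets;) that I mentioned above, which adds another 8 bytes per element. So altogether every bucket needs 96 bytes of storage.
So if every value needs a bucket, that is 96 bytes for the bucket and 48 bytes for the zval, 144 bytes in total. For 100000 elements that is 14400000 bytes, or 13.73 MB.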
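Mystery solved.
Wait, there is still 0.24 MB left!
Those last 0.24 MB are due to uninitialized buckets. Ideally, the size of the real C array that stores the buckets should be approximately the same as the number of array elements stored. That way you get the fewest collisions (unless you want to waste a lot of memory). But PHP obviously cannot reallocate the whole array every time an element is added; that would be really slow. Instead, PHP always doubles the size of the internal bucket array when it hits the limit. So the size of the array is always a power of two.
In our case it is 2^17 = 131072. But we only need 100000 of those buckets, so we leave 31072 buckets unused. Those buckets are never allocated (so we do not spend the full 96 bytes on them), but the memory for the bucket pointer (the one stored in the internal bucket array) still has to be allocated. So we additionally use 8 bytes (one pointer) * 31072 elements. That is 248576 bytes, or about 0.24 MB, which matches the missing memory. (Of course a few bytes are still missing, but I don't want to go into them here: things like the hash table structure itself, variables, and so on.)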
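Mystery really solved.
What does this tell us?
PHP is not C. That is all this should tell us. You cannot expect a super dynamic language like PHP to use memory as efficiently as C does. You just can't.
But if you do want to save memory, you could consider using an SplFixedArray for large, static arrays.
Have a look at this modified script: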
    $startMemory = memory_get_usage();
    $array = new SplFixedArray(100000);
    for ($i = 0; $i < 100000; ++$i) {
        $array[$i] = $i;
    }
    echo memory_get_usage() - $startMemory, ' bytes';
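It basically does the same thing, but if you run it you will notice that it uses "only" 5600640 bytes. That is 56 bytes per element, and thus much less than the 144 bytes per element that a normal array uses. This is because a fixed array does not need the bucket structure: it only requires one zval (48 bytes) and one pointer (8 bytes) per element, which gives the observed 56 bytes.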