In-depth understanding of PHP5.3's garbage collection mechanism (dynamic storage allocation scheme)_PHP tutorial

WBOY
2016-07-21 15:14:42

Garbage collection is a dynamic storage allocation scheme that automatically releases allocated memory blocks the program no longer needs. This process of automatically reclaiming memory is called garbage collection. It frees programmers from worrying too much about memory allocation, so they can devote more energy to business logic. Garbage collection is a common feature of the newer generation of popular languages: Python, PHP, Eiffel, C#, Ruby and others all use it. Although garbage collection is popular now, it is not a young technique. It was already present in the Lisp system developed at MIT as early as the 1960s. However, because the technical conditions of the time were immature, garbage collection remained a technology that only looked attractive on paper, and it was not until the emergence of Java in the 1990s that it came into wide use.

PHP also implements dynamic memory management at the language layer, as explained in detail in the previous chapters. Dynamic memory management saves developers from cumbersome manual memory handling. Alongside it, PHP provides a garbage collection mechanism at the language layer, so that programmers do not have to worry too much about memory allocation.

Before PHP5.3, PHP only had simple garbage collection based on reference counting: when the reference count of a variable drops to 0, PHP destroys the variable in memory. Strictly speaking, what is reclaimed this way can hardly be called garbage. PHP also releases all the memory occupied by a process/thread at the end of its life cycle, which is why PHP did not need to pay much attention to memory leaks in its early days. However, as PHP developed, the number of PHP developers grew and the range of business it carries expanded, a more complete garbage collection mechanism was introduced in PHP5.3. The new mechanism solves the memory leaks caused by circular references, which plain reference counting cannot handle. The garbage collection mechanism in PHP5.3 uses the synchronous algorithm from the paper Concurrent Cycle Collection in Reference Counted Systems. We won't go into the details of this algorithm here; PHP's official documentation has an illustrated introduction: Collecting Cycles.
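The leak that pure reference counting cannot handle can be reproduced in a few lines of C. The following is only a minimal sketch: toy_node and the toy_* functions are hypothetical names made up for illustration and are not part of the PHP source, and "freeing" is modelled by clearing an alive flag instead of releasing real memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy container, not PHP's zval: a refcounted node that can
 * hold one reference to another node. */
typedef struct toy_node {
    int refcount;
    int alive;                 /* 1 until the node is "freed" */
    struct toy_node *child;
} toy_node;

static void toy_init(toy_node *n) {
    n->refcount = 1;           /* one reference held by the creator */
    n->alive = 1;
    n->child = NULL;
}

/* Drop one reference; destroy the node when the count reaches 0. */
static void toy_release(toy_node *n) {
    if (n == NULL) return;
    assert(n->refcount > 0);
    if (--n->refcount == 0) {
        toy_release(n->child); /* give up the reference this node held */
        n->alive = 0;          /* stands in for freeing the memory */
    }
}

/* Make parent reference child (parent must not reference anything yet). */
static void toy_link(toy_node *parent, toy_node *child) {
    assert(parent->child == NULL);
    child->refcount++;
    parent->child = child;
}

/* Chain a -> b: releasing both creator references frees everything. */
int toy_leaked_in_chain(void) {
    toy_node a, b;
    toy_init(&a);
    toy_init(&b);
    toy_link(&a, &b);
    toy_release(&b);
    toy_release(&a);
    return a.alive + b.alive;  /* number of nodes never destroyed */
}

/* Cycle a -> b -> a: each node keeps the other's count above 0. */
int toy_leaked_in_cycle(void) {
    toy_node a, b;
    toy_init(&a);
    toy_init(&b);
    toy_link(&a, &b);
    toy_link(&b, &a);
    toy_release(&a);
    toy_release(&b);
    return a.alive + b.alive;  /* number of nodes never destroyed */
}
```

In this sketch toy_leaked_in_chain() returns 0 while toy_leaked_in_cycle() returns 2: after the creator gives up both references, each node in the cycle still has a count of 1, so neither is ever destroyed.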
As mentioned before, reference counting is the main memory management method in PHP. The garbage collection mechanism was introduced to break circular references in reference counting and thereby prevent the memory leaks they cause. The garbage collection mechanism builds on PHP's dynamic memory management. To introduce it, PHP5.3 made some changes to the basic structure used to store variables, as shown below:

The code is as follows:

struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;        /* active type */
    zend_uchar is_ref__gc;
};

Compared with versions before PHP5.3, both the reference count field refcount and the reference flag field is_ref have the suffix __gc appended, for use by the new garbage collection mechanism. Heavy use of macros is a very distinctive feature of the PHP source code style. These macros act as an interface layer that hides parts of the underlying implementation; the ALLOC_ZVAL macro is one example. Before PHP5.3, this macro directly called emalloc, PHP's memory management allocation function, to allocate the memory for the variable container. After the garbage collection mechanism was introduced, the ALLOC_ZVAL macro allocates the new garbage collection unit structure instead. Every allocation has the same size, namely the size of the zval_gc_info structure, and after the memory is allocated the garbage collection information of the structure is initialized, as in the following code:
The code is as follows:

/* The following macroses override macroses from zend_alloc.h */
#undef ALLOC_ZVAL
#define ALLOC_ZVAL(z) \
    do { \
        (z) = (zval*)emalloc(sizeof(zval_gc_info)); \
        GC_ZVAL_INIT(z); \
    } while (0)

zend.h includes the zend_gc.h file at line 749 (#include "zend_gc.h"), which replaces ALLOC_ZVAL and related macros from the zend_alloc.h file included at line 237. The key change in the new macro is the size and content of the allocation: garbage collection information is added to what used to be a pure memory allocation. All of it is contained in the zval_gc_info structure:
The code is as follows:

typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;

For any variable stored in a zval container, a zval_gc_info structure is allocated. Because the zval is the first member, the structure starts at the same address as the zval it contains, so a zval_gc_info pointer can be cast and used as a zval pointer. The zval field is followed by a union, u, which holds the buffered field, a pointer to a gc_root_buffer structure, and the next field, a pointer to a zval_gc_info structure. One of these two fields points to the root node buffered by the garbage collection mechanism, and the other points to the next node in a zval_gc_info list. Either way, whether the node is being used as a buffered root or as a list node is recorded here. After allocating memory, ALLOC_ZVAL calls GC_ZVAL_INIT to initialize the zval_gc_info that takes the place of the zval, setting the buffered field of member u to NULL. This field only has a value while the zval is in the garbage collection buffer; otherwise it is always NULL. Since all variables in PHP exist as zvals, replacing zval with zval_gc_info integrates the garbage collection mechanism into the existing system.
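The first-member layout described above can be tried out in isolation. This is a sketch under simplifying assumptions: the toy_* types are hypothetical stand-ins with made-up fields, and plain malloc is used in place of emalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; the field layout is illustrative, not PHP's. */
typedef struct toy_zval {
    long value;
    unsigned refcount;
    unsigned char type;
} toy_zval;

typedef struct toy_root toy_root;        /* opaque, only used via pointer */

typedef struct toy_zval_gc_info {
    toy_zval z;                          /* must stay the first member */
    union {
        toy_root *buffered;              /* slot in the root buffer, if any */
        struct toy_zval_gc_info *next;   /* next node in a list */
    } u;
} toy_zval_gc_info;

/* Allocate the larger structure but hand out a pointer to the inner zval. */
toy_zval *toy_alloc_zval(void) {
    toy_zval_gc_info *info = malloc(sizeof *info);
    if (info == NULL) return NULL;
    info->u.buffered = NULL;             /* not in the root buffer yet */
    return &info->z;
}

/* Recover the GC information from a plain zval pointer. */
toy_root *toy_buffered(toy_zval *z) {
    return ((toy_zval_gc_info *)z)->u.buffered;
}

void toy_free_zval(toy_zval *z) {
    free(z);   /* same address as the enclosing toy_zval_gc_info */
}

/* Returns 1 when the layout behaves as the text describes. */
int toy_layout_demo(void) {
    toy_zval *z = toy_alloc_zval();
    if (z == NULL) return 0;
    z->value = 42;
    int ok = offsetof(toy_zval_gc_info, z) == 0
          && toy_buffered(z) == NULL
          && z->value == 42;
    toy_free_zval(z);
    return ok;
}
```

Code that only knows about toy_zval keeps working unchanged, while the collector can reach the extra union through the same pointer. That is the reason the inner structure has to be the first member.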
PHP's garbage collection mechanism is enabled by default in PHP5.3, but it can be disabled through the configuration file. The corresponding configuration directive is zend.enable_gc. The php.ini file does not contain this directive by default; to disable the feature, add zend.enable_gc=0 or zend.enable_gc=off to php.ini. Besides changing zend.enable_gc in php.ini, you can also turn the garbage collection mechanism on or off by calling the gc_enable() and gc_disable() functions, which has the same effect as changing the configuration directive. In addition to these two functions, PHP provides gc_collect_cycles(), which forces a cycle collection even when the root buffer is not full. The PHP source code contains some operations and fields related to whether the garbage collection mechanism is enabled. The zend.c file contains the following code:
The code is as follows:

static ZEND_INI_MH(OnUpdateGCEnabled) /* {{{ */
{
    OnUpdateBool(entry, new_value, new_value_length, mh_arg1, mh_arg2, mh_arg3, stage TSRMLS_CC);
    if (GC_G(gc_enabled)) {
        gc_init(TSRMLS_C);
    }
    return SUCCESS;
}
/* }}} */
ZEND_INI_BEGIN()
    ZEND_INI_ENTRY("error_reporting", NULL, ZEND_INI_ALL, OnUpdateErrorReporting)
    STD_ZEND_INI_BOOLEAN("zend.enable_gc", "1", ZEND_INI_ALL, OnUpdateGCEnabled, gc_enabled, zend_gc_globals, gc_globals)
#ifdef ZEND_MULTIBYTE
    STD_ZEND_INI_BOOLEAN("detect_unicode", "1", ZEND_INI_ALL, OnUpdateBool, detect_unicode, zend_compiler_globals, compiler_globals)
#endif
ZEND_INI_END()

The handler for zend.enable_gc is ZEND_INI_MH(OnUpdateGCEnabled). If the garbage collection mechanism is enabled, that is, GC_G(gc_enabled) is true, it calls the gc_init function to initialize the garbage collection mechanism. The gc_init function is at line 121 of Zend/zend_gc.c. It checks whether the garbage collection mechanism is enabled and, if so, initializes the whole mechanism: it calls malloc directly to allocate memory for 10000 gc_root_buffer entries, which back the entire buffer list. The value 10000 is hard-coded as the macro GC_ROOT_BUFFER_MAX_ENTRIES; changing it requires modifying the source code and recompiling PHP. After pre-allocating the memory, gc_init calls gc_reset to reset the global variables used by the mechanism, for example setting the count of gc runs (gc_runs) and the count of garbage collected (collected) to 0, and pointing the previous and next pointers of the head node of the doubly linked list at the head node itself. Besides these, the garbage collection mechanism uses several other global variables, some of which are explained below:
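The initialization described above can be sketched as follows. This is a toy model built only from the description in the text, not the actual gc_init and gc_reset code: toy_gc, toy_root and TOY_MAX_ENTRIES are hypothetical names, and the real gc_root_buffer carries more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ENTRIES 10000   /* mirrors the hard-coded 10000 in the text */

typedef struct toy_root {
    struct toy_root *prev;
    struct toy_root *next;
} toy_root;

typedef struct toy_gc {
    int enabled;
    toy_root *buf;            /* preallocated array of root slots */
    toy_root roots;           /* head node of the doubly linked list */
    toy_root *unused;         /* recycled slots */
    toy_root *first_unused;   /* first never-used slot */
    toy_root *last_unused;    /* one past the last slot */
    unsigned runs;
    unsigned collected;
} toy_gc;

/* Reset counters and make the head node point at itself (empty list). */
void toy_gc_reset(toy_gc *gc) {
    gc->runs = 0;
    gc->collected = 0;
    gc->roots.next = &gc->roots;
    gc->roots.prev = &gc->roots;
    gc->unused = NULL;
    gc->first_unused = gc->buf;
    gc->last_unused = gc->buf ? gc->buf + TOY_MAX_ENTRIES : NULL;
}

/* Allocate the buffer once, and only when the collector is enabled. */
int toy_gc_init(toy_gc *gc) {
    if (gc->buf == NULL && gc->enabled) {
        gc->buf = malloc(sizeof(toy_root) * TOY_MAX_ENTRIES);
        if (gc->buf == NULL) return 0;
        toy_gc_reset(gc);
    }
    return 1;
}

void toy_gc_shutdown(toy_gc *gc) {
    free(gc->buf);
    gc->buf = NULL;
}

/* Returns 1 if an enabled collector starts empty with 10000 free slots. */
int toy_gc_init_demo(void) {
    toy_gc gc = {0};
    gc.enabled = 1;
    if (!toy_gc_init(&gc)) return 0;
    int ok = gc.roots.next == &gc.roots
          && gc.roots.prev == &gc.roots
          && gc.last_unused - gc.first_unused == TOY_MAX_ENTRIES
          && gc.runs == 0 && gc.collected == 0;
    toy_gc_shutdown(&gc);
    return ok;
}

/* Returns 1 if a disabled collector allocates nothing. */
int toy_gc_disabled_demo(void) {
    toy_gc gc = {0};
    return toy_gc_init(&gc) && gc.buf == NULL;
}
```

Pointing the head node at itself means that inserting into and removing from the list never needs a special case for an empty list.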
The code is as follows:

typedef struct _zend_gc_globals {
    zend_bool gc_enabled;           /* Whether the garbage collection mechanism is enabled */
    zend_bool gc_active;            /* Whether a collection is in progress */
    gc_root_buffer *buf;            /* Preallocated array of buffers, 10000 entries by default */
    gc_root_buffer roots;           /* Root node of the list (list of possible roots of cycles) */
    gc_root_buffer *unused;         /* List of unused buffers */
    gc_root_buffer *first_unused;   /* Pointer to first unused buffer */
    gc_root_buffer *last_unused;    /* Pointer to last unused buffer */
    zval_gc_info *zval_to_free;     /* Temporary list of zvals to free */
    zval_gc_info *free_list;        /* Temporary variable: start of the list to be released */
    zval_gc_info *next_to_free;     /* Temporary variable: next variable to be released */
    zend_uint gc_runs;              /* Number of times gc has run */
    zend_uint collected;            /* Number of garbage items collected */
    /* Omitted... */
} zend_gc_globals;

When we use unset to clear the memory occupied by a variable (which may only decrease its reference count by one), the entry for the variable name is deleted from the current symbol hash table. Once that is done, a destructor is called for the item removed from the symbol table: temporary variables call zval_dtor, and ordinary variables call zval_ptr_dtor.
Of course, we cannot find unset in PHP's set of functions, because it is a language construct. Its intermediate code is ZEND_UNSET, and the related implementation can be found in the Zend/zend_vm_execute.h file.
zval_ptr_dtor is not a function either; it is a macro that merely looks like one. In the Zend/zend_variables.h file, this macro points to the function _zval_ptr_dtor. The function is at line 424 of Zend/zend_execute_API.c, and its code is as follows:
The code is as follows:

ZEND_API void _zval_ptr_dtor(zval **zval_ptr ZEND_FILE_LINE_DC) /* {{{ */
{
#if DEBUG_ZEND>=2
    printf("Reducing refcount for %x (%x): %d->%d\n", *zval_ptr, zval_ptr, Z_REFCOUNT_PP(zval_ptr), Z_REFCOUNT_PP(zval_ptr) - 1);
#endif
    Z_DELREF_PP(zval_ptr);
    if (Z_REFCOUNT_PP(zval_ptr) == 0) {
        TSRMLS_FETCH();
        if (*zval_ptr != &EG(uninitialized_zval)) {
            GC_REMOVE_ZVAL_FROM_BUFFER(*zval_ptr);
            zval_dtor(*zval_ptr);
            efree_rel(*zval_ptr);
        }
    } else {
        TSRMLS_FETCH();
        if (Z_REFCOUNT_PP(zval_ptr) == 1) {
            Z_UNSET_ISREF_PP(zval_ptr);
        }
        GC_ZVAL_CHECK_POSSIBLE_ROOT(*zval_ptr);
    }
}
/* }}} */

The code clearly shows how a zval is destroyed. The reference count field is handled in one of two ways:
If the reference count of the variable is 1, that is, it becomes 0 after being decremented, the variable is destroyed directly; if the variable is currently in the garbage collection buffer, it is removed from the buffer first. If the reference count is greater than 1, that is, it is still greater than 0 after being decremented, the variable is put into the garbage list as a possible root. In this case, if the reference count after decrementing is exactly 1, the variable's is_ref flag is also cleared.
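The two branches can be condensed into a small decision function. This is a sketch rather than the Zend implementation: toy_var, toy_outcome and the is_container flag (standing in for the array and object types) are assumptions made for illustration, and no memory is actually freed.

```c
#include <assert.h>

/* What the destructor decided to do with the variable. */
typedef enum {
    TOY_FREED,           /* count reached 0: destroy immediately */
    TOY_POSSIBLE_ROOT,   /* still referenced: hand over to the collector */
    TOY_KEPT             /* still referenced, but cannot form a cycle */
} toy_outcome;

/* Hypothetical stand-in for a zval; only the fields the logic needs. */
typedef struct toy_var {
    unsigned refcount;
    int is_ref;
    int is_container;    /* 1 for types that can hold other variables */
} toy_var;

toy_outcome toy_ptr_dtor(toy_var *v) {
    assert(v->refcount > 0);
    v->refcount--;
    if (v->refcount == 0) {
        return TOY_FREED;
    }
    if (v->refcount == 1) {
        v->is_ref = 0;   /* a single remaining holder is no longer a reference set */
    }
    return v->is_container ? TOY_POSSIBLE_ROOT : TOY_KEPT;
}

/* Returns 1 if dropping from 2 holders to 1 clears the reference flag. */
int toy_isref_cleared_demo(void) {
    toy_var v = { 2, 1, 1 };
    toy_outcome out = toy_ptr_dtor(&v);
    return out == TOY_POSSIBLE_ROOT && v.refcount == 1 && v.is_ref == 0;
}
```

The point to notice is that a variable only becomes a candidate for cycle collection when its count is decremented to a non-zero value. A count that reaches zero is handled by plain reference counting and never involves the collector.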

The operation that puts a variable into the garbage list is GC_ZVAL_CHECK_POSSIBLE_ROOT. It is also a macro, corresponding to the function gc_zval_check_possible_root, which only applies garbage collection operations to arrays and objects. For array and object variables, it calls the gc_zval_possible_root function.
The code is as follows:

ZEND_API void gc_zval_possible_root(zval *zv TSRMLS_DC)
{
    if (UNEXPECTED(GC_G(free_list) != NULL &&
                   GC_ZVAL_ADDRESS(zv) != NULL &&
                   GC_ZVAL_GET_COLOR(zv) == GC_BLACK) &&
                   (GC_ZVAL_ADDRESS(zv) < GC_G(buf) ||
                    GC_ZVAL_ADDRESS(zv) >= GC_G(last_unused))) {
        /* The given zval is a garbage that is going to be deleted by
         * currently running GC */
        return;
    }
    if (zv->type == IS_OBJECT) {
        GC_ZOBJ_CHECK_POSSIBLE_ROOT(zv);
        return;
    }
    GC_BENCH_INC(zval_possible_root);
    if (GC_ZVAL_GET_COLOR(zv) != GC_PURPLE) {
        GC_ZVAL_SET_PURPLE(zv);
        if (!GC_ZVAL_ADDRESS(zv)) {
            gc_root_buffer *newRoot = GC_G(unused);
            if (newRoot) {
                GC_G(unused) = newRoot->prev;
            } else if (GC_G(first_unused) != GC_G(last_unused)) {
                newRoot = GC_G(first_unused);
                GC_G(first_unused)++;
            } else {
                if (!GC_G(gc_enabled)) {
                    GC_ZVAL_SET_BLACK(zv);
                    return;
                }
                zv->refcount__gc++;
                gc_collect_cycles(TSRMLS_C);
                zv->refcount__gc--;
                newRoot = GC_G(unused);
                if (!newRoot) {
                    return;
                }
                GC_ZVAL_SET_PURPLE(zv);
                GC_G(unused) = newRoot->prev;
            }
            newRoot->next = GC_G(roots).next;
            newRoot->prev = &GC_G(roots);
            GC_G(roots).next->prev = newRoot;
            GC_G(roots).next = newRoot;
            GC_ZVAL_SET_ADDRESS(zv, newRoot);
            newRoot->handle = 0;
            newRoot->u.pz = zv;
            GC_BENCH_INC(zval_buffered);
            GC_BENCH_INC(root_buf_length);
            GC_BENCH_PEAK(root_buf_peak, root_buf_length);
        }
    }
}

As mentioned earlier, the gc_zval_check_possible_root function only applies garbage collection operations to arrays and objects. Within gc_zval_possible_root, the GC_ZOBJ_CHECK_POSSIBLE_ROOT macro is called for variables of object type. For the other variable types that take part in garbage collection, the process is as follows:
Check whether the zval is garbage that the currently running collection is already about to delete. If so, return directly, which optimizes performance.
Handle object nodes separately and return directly, without performing the subsequent steps.
Check whether the node is already marked purple. If it is, it is not added to the node buffer again, which ensures that a node is added to the buffer only once.
Mark the node purple, indicating that it has been added to the buffer and does not need to be added again next time.
Find a slot for the new node. If the buffer is full, perform a garbage collection.
Add the new node to the doubly linked list that the buffer belongs to.
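The slot lookup and list insertion in the last two steps can be sketched with a tiny buffer. All names here are hypothetical, the buffer has only two slots so that it fills up quickly, and toy_collect is a placeholder that simply recycles every buffered root instead of running a real cycle collection.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 2   /* tiny on purpose, so the buffer fills quickly */

typedef struct toy_root {
    struct toy_root *prev;
    struct toy_root *next;
    void *payload;            /* the variable this root refers to */
} toy_root;

typedef struct toy_gc {
    toy_root buf[TOY_SLOTS];
    toy_root roots;           /* head node of the doubly linked list */
    toy_root *unused;         /* recycled slots, chained through prev */
    toy_root *first_unused;   /* next never-used slot */
    toy_root *last_unused;    /* one past the end of buf */
    int collections;          /* how often the buffer overflowed */
} toy_gc;

void toy_gc_setup(toy_gc *gc) {
    gc->roots.next = gc->roots.prev = &gc->roots;
    gc->unused = NULL;
    gc->first_unused = gc->buf;
    gc->last_unused = gc->buf + TOY_SLOTS;
    gc->collections = 0;
}

/* Placeholder for a collection: recycle every buffered root. */
static void toy_collect(toy_gc *gc) {
    toy_root *r = gc->roots.next;
    while (r != &gc->roots) {
        toy_root *next = r->next;
        r->prev = gc->unused;   /* push the slot onto the unused list */
        gc->unused = r;
        r = next;
    }
    gc->roots.next = gc->roots.prev = &gc->roots;
    gc->collections++;
}

/* Find a slot (recycled, fresh, or freed by a collection) and link it in. */
toy_root *toy_add_root(toy_gc *gc, void *payload) {
    toy_root *slot = gc->unused;
    if (slot) {
        gc->unused = slot->prev;
    } else if (gc->first_unused != gc->last_unused) {
        slot = gc->first_unused++;
    } else {
        toy_collect(gc);                 /* buffer full */
        slot = gc->unused;
        if (slot == NULL) return NULL;
        gc->unused = slot->prev;
    }
    slot->next = gc->roots.next;         /* insert right after the head */
    slot->prev = &gc->roots;
    gc->roots.next->prev = slot;
    gc->roots.next = slot;
    slot->payload = payload;
    return slot;
}

int toy_root_count(const toy_gc *gc) {
    int n = 0;
    for (const toy_root *r = gc->roots.next; r != &gc->roots; r = r->next) n++;
    return n;
}

/* Adds three roots to a two-slot buffer; returns collections * 10 + roots. */
int toy_overflow_demo(void) {
    toy_gc gc;
    int a = 1, b = 2, c = 3;
    toy_gc_setup(&gc);
    toy_add_root(&gc, &a);
    toy_add_root(&gc, &b);
    assert(toy_root_count(&gc) == 2 && gc.collections == 0);
    toy_add_root(&gc, &c);               /* triggers the placeholder collection */
    return gc.collections * 10 + toy_root_count(&gc);
}
```

In this sketch the third insertion finds no free slot, triggers one placeholder collection, and then reuses a recycled slot, so toy_overflow_demo() returns 11: one collection and one buffered root.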
In the gc_zval_possible_root function, when the buffer is full, the program calls the gc_collect_cycles function to perform garbage collection. The most critical steps are:
Line 628 is step B of the algorithm in the official documentation. The algorithm runs a depth-first search from every possible root and decrements by 1 the reference count of each variable container it reaches. To make sure the same variable container is not decremented twice, containers that have already been processed are marked gray.
Line 629 is step C of the algorithm. It again runs a depth-first search from each root node and checks the reference count of each variable container. If the reference count is 0, the variable container is marked white. If the reference count is greater than 0, the decrement made by the earlier depth-first search is undone from that point on (that is, the reference counts are incremented by 1 again), and the containers are re-marked black.
Line 630 is step D, the last step of the algorithm. It traverses the root buffer, removes the variable container roots (zval roots) from it, and at the same time checks whether any variable containers were marked white in the previous step. Every variable container marked white is freed. In [gc_collect_cycles() -> gc_collect_roots() -> zval_collect_white()] we can see that nodes marked white are added to the global zval_to_free list, which is used later.
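Steps B, C and D can be tried on a tiny object graph. The code below is a simplified sketch of the synchronous cycle collection idea, not PHP's zend_gc.c: toy_obj, its fixed-size refs array and the freed flag are assumptions made for illustration, and root-buffer bookkeeping is left out.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_BLACK, TOY_WHITE, TOY_GREY, TOY_PURPLE };
#define TOY_MAX_REFS 2

/* Hypothetical container: a refcount, a color, and outgoing references. */
typedef struct toy_obj {
    int refcount;
    int color;
    int freed;
    struct toy_obj *refs[TOY_MAX_REFS];
} toy_obj;

/* Step B: subtract the references that come from inside the subgraph. */
static void toy_mark_grey(toy_obj *o) {
    if (o->color == TOY_GREY) return;    /* already visited */
    o->color = TOY_GREY;
    for (int i = 0; i < TOY_MAX_REFS; i++) {
        if (o->refs[i]) {
            o->refs[i]->refcount--;
            toy_mark_grey(o->refs[i]);
        }
    }
}

/* Undo step B for everything reachable from a live container. */
static void toy_scan_black(toy_obj *o) {
    o->color = TOY_BLACK;
    for (int i = 0; i < TOY_MAX_REFS; i++) {
        if (o->refs[i]) {
            o->refs[i]->refcount++;
            if (o->refs[i]->color != TOY_BLACK) toy_scan_black(o->refs[i]);
        }
    }
}

/* Step C: count 0 means only internal references existed, so mark white. */
static void toy_scan(toy_obj *o) {
    if (o->color != TOY_GREY) return;
    if (o->refcount > 0) {
        toy_scan_black(o);
    } else {
        o->color = TOY_WHITE;
        for (int i = 0; i < TOY_MAX_REFS; i++)
            if (o->refs[i]) toy_scan(o->refs[i]);
    }
}

/* Step D: free every white container; returns how many were freed. */
static int toy_collect_white(toy_obj *o) {
    if (o->color != TOY_WHITE) return 0;
    o->color = TOY_BLACK;
    o->freed = 1;
    int n = 1;
    for (int i = 0; i < TOY_MAX_REFS; i++)
        if (o->refs[i]) n += toy_collect_white(o->refs[i]);
    return n;
}

int toy_collect_cycles(toy_obj **roots, int nroots) {
    int freed = 0;
    for (int i = 0; i < nroots; i++) toy_mark_grey(roots[i]);
    for (int i = 0; i < nroots; i++) toy_scan(roots[i]);
    for (int i = 0; i < nroots; i++) freed += toy_collect_white(roots[i]);
    return freed;
}

/* a <-> b with `external` extra references to b from outside the cycle. */
int toy_cycle_demo(int external) {
    toy_obj a = { 1, TOY_PURPLE, 0, { NULL, NULL } };
    toy_obj b = { 1 + external, TOY_BLACK, 0, { NULL, NULL } };
    toy_obj *roots[] = { &a };
    a.refs[0] = &b;
    b.refs[0] = &a;
    int freed = toy_collect_cycles(roots, 1);
    if (external > 0) {
        /* live cycle: counts must be restored exactly */
        assert(a.refcount == 1 && b.refcount == 1 + external);
    }
    return freed;
}
```

With no outside reference, toy_cycle_demo(0) frees both containers. With one outside reference to b, toy_cycle_demo(1) frees nothing and the reference counts end up exactly as they started, which is the effect of the "increase the reference count by 1 and re-mark black" part of step C.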
During execution, PHP's garbage collection mechanism marks node state with four colors:
GC_WHITE: white, the node is garbage
GC_PURPLE: purple, the node has been put into the buffer
GC_GREY: gray, the node's refcount has already been decremented by one
GC_BLACK: black, the default color, a normal node
The related flags and operations are defined as follows:
The code is as follows:

#define GC_COLOR  0x03
#define GC_BLACK  0x00
#define GC_WHITE  0x01
#define GC_GREY   0x02
#define GC_PURPLE 0x03
#define GC_ADDRESS(v) \
    ((gc_root_buffer*)(((zend_uintptr_t)(v)) & ~GC_COLOR))
#define GC_SET_ADDRESS(v, a) \
    (v) = ((gc_root_buffer*)((((zend_uintptr_t)(v)) & GC_COLOR) | ((zend_uintptr_t)(a))))
#define GC_GET_COLOR(v) \
    (((zend_uintptr_t)(v)) & GC_COLOR)
#define GC_SET_COLOR(v, c) \
    (v) = ((gc_root_buffer*)((((zend_uintptr_t)(v)) & ~GC_COLOR) | (c)))
#define GC_SET_BLACK(v) \
    (v) = ((gc_root_buffer*)(((zend_uintptr_t)(v)) & ~GC_COLOR))
#define GC_SET_PURPLE(v) \
    (v) = ((gc_root_buffer*)(((zend_uintptr_t)(v)) | GC_PURPLE))

Marking state with bits, as above, is used frequently in the PHP source code, for example in memory management. It is an efficient and economical approach. When designing database fields, however, this method may not be suitable; a more intuitive and readable representation is preferable there.
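The color macros work because a pointer to a suitably aligned structure always has its two lowest bits clear, so those bits can carry the color. Here is a standalone sketch of the same trick using uintptr_t from the C standard library. The toy_* names are hypothetical, and the assumption that toy_root is at least 4-byte aligned holds on common platforms and is checked with an assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_COLOR_MASK ((uintptr_t)0x03)
enum { TOY_BLACK = 0, TOY_WHITE = 1, TOY_GREY = 2, TOY_PURPLE = 3 };

/* Contains pointers, so its address is at least 4-byte aligned in practice. */
typedef struct toy_root {
    void *payload;
    void *reserved;
} toy_root;

/* Strip the color bits to get the real pointer back. */
static toy_root *toy_get_address(toy_root *tagged) {
    return (toy_root *)((uintptr_t)tagged & ~TOY_COLOR_MASK);
}

static int toy_get_color(toy_root *tagged) {
    return (int)((uintptr_t)tagged & TOY_COLOR_MASK);
}

/* Replace the color bits, keep the address bits. */
static toy_root *toy_set_color(toy_root *tagged, int color) {
    return (toy_root *)(((uintptr_t)tagged & ~TOY_COLOR_MASK)
                        | ((uintptr_t)color & TOY_COLOR_MASK));
}

/* Replace the address bits, keep the color bits. */
static toy_root *toy_set_address(toy_root *tagged, toy_root *addr) {
    return (toy_root *)(((uintptr_t)tagged & TOY_COLOR_MASK) | (uintptr_t)addr);
}

/* Returns 1 if address and color survive being packed into one field. */
int toy_tag_demo(void) {
    static toy_root slot;
    assert(((uintptr_t)&slot & TOY_COLOR_MASK) == 0);   /* alignment assumption */

    toy_root *field = NULL;                  /* black, no address */
    field = toy_set_address(field, &slot);
    field = toy_set_color(field, TOY_PURPLE);
    int ok = toy_get_address(field) == &slot
          && toy_get_color(field) == TOY_PURPLE;

    field = toy_set_color(field, TOY_GREY);  /* recolor without losing the address */
    ok = ok && toy_get_address(field) == &slot
            && toy_get_color(field) == TOY_GREY;

    field = toy_set_color(field, TOY_BLACK); /* black leaves the plain pointer */
    ok = ok && field == &slot;
    return ok;
}
```

One pointer-sized field thus stores both where the root lives and what state it is in. This saves memory per variable, at the cost of having to mask the field every time it is read.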
