
A detailed discussion of PHP garbage collection mechanism

不言
Original, 2018-04-17 09:10:44

This article discusses PHP's garbage collection mechanism in detail and may serve as a useful reference.

Reference counting basics

Every PHP variable is stored in a container called a "zval". Besides the variable's type and value, a zval container holds two additional pieces of information. The first is "is_ref", a boolean that indicates whether the variable is part of a "reference set". With this flag the PHP engine can tell normal variables from references. Because PHP allows user-land references, created with the & operator, a zval container also has an internal reference counting mechanism to optimize memory usage. This second piece of information, "refcount", holds the number of variable names (also called symbols) that point to this zval container. All symbols are stored in a symbol table, and there is one symbol table per scope: one for the main script (for example, the script requested through the browser) and one for every function or method.

A zval container is created when a variable is assigned a constant value, as in the following example:

Example #1 Creating a new zval container

<?php
$a = "new string";
?>


In this example a new symbol, a, is created in the current scope, together with a new variable container of type string and value new string. "is_ref" is set to FALSE by default because no user-land reference has been created. "refcount" is set to 1 because only one symbol uses this variable container. Note that a reference ("is_ref" is TRUE) with a "refcount" of 1 is treated as if it were not a reference. If you have Xdebug installed, you can display the values of "refcount" and "is_ref" by calling xdebug_debug_zval().

Example #2 Display zval information

<?php
xdebug_debug_zval('a');
?>

The above example will output:

a: (refcount=1, is_ref=0)='new string'

Assigning one variable to another variable will increase the number of references (refcount).

Example #3 Increase the reference count of a zval

<?php
$a = "new string";
$b = $a;
xdebug_debug_zval( 'a' );
?>


The above example will output:

a: (refcount=2, is_ref=0)='new string'

The refcount is now 2, because the same variable container is linked to both a and b. PHP does not copy the container when it does not have to. A variable container is destroyed when its "refcount" reaches 0. The "refcount" is decreased by 1 whenever a symbol linked to the container leaves its scope (for example, when a function ends) or when unset() is called on the symbol, as the following example shows:

Example #4 Reduce the reference count

<?php
$a = "new string";
$c = $b = $a;
xdebug_debug_zval( 'a' );
unset( $b, $c );
xdebug_debug_zval( 'a' );
?>


The above example will output:

a: (refcount=3, is_ref=0)='new string'
a: (refcount=1, is_ref=0)='new string'

If we now call unset($a);, the variable container, including its type and value, is removed from memory.
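Xdebug is not always installed. As a rough substitute, PHP's built-in debug_zval_dump() also prints a refcount for refcounted values. The sketch below wraps it in a helper; the name observed_refcount is hypothetical. The absolute numbers include temporary references created by the function calls themselves and differ between PHP versions, so only the change between two measurements is meaningful.

```php
<?php
// Hypothetical helper: extracts the refcount printed by debug_zval_dump().
// Returns null for values that are not reference counted (for example
// integers, or interned strings on newer PHP versions).
function observed_refcount($value)
{
    ob_start();
    debug_zval_dump($value);
    $dump = ob_get_clean();
    return preg_match('/refcount\((\d+)\)/', $dump, $m) ? (int) $m[1] : null;
}

// Build the string at run time so it is a fresh, refcounted value
// rather than an interned literal.
$a = 'new string ' . uniqid();
$before = observed_refcount($a);

$b = $a;                           // second symbol for the same container
$afterAssign = observed_refcount($a);

unset($b);                         // drop the second symbol again
$afterUnset = observed_refcount($a);

printf("assign: %+d, unset: %+d\n", $afterAssign - $before, $afterUnset - $afterAssign);
```

On PHP 7 and later this should report a change of +1 after the assignment and -1 after unset(), matching the Xdebug output of Example #3 and Example #4.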

Compound Types

Things get a little more complex with compound types such as array and object. Unlike scalar values, arrays and objects store their members or properties in a symbol table of their own. This means that the following example creates three zval containers.

Example #5 Creating an array zval

<?php
$a = array( 'meaning' => 'life', 'number' => 42 );
xdebug_debug_zval( 'a' );
?>


The above example will output something similar to:

a: (refcount=1, is_ref=0)=array ( 'meaning' => (refcount=1, is_ref=0)='life', 'number' => (refcount=1, is_ref=0)=42)

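The result above is what PHP 5 produces. When I ran the same test under PHP 7, however, the output was different:

a:
(refcount=1, is_ref=0)
array (size=2)
  'meaning' => (refcount=2, is_ref=0)string 'life' (length=4)
  'number' => (refcount=0, is_ref=0)int 42


These three zval containers are: a, meaning, and number. The rules for increasing and decreasing "refcount" are the same as described above. Next, we add another element to the array and set its value to that of an already existing element: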

Example #6 Adding an already existing element to the array

<?php
$a = array( 'meaning' => 'life', 'number' => 42 );
$a['life'] = $a['meaning'];
xdebug_debug_zval( 'a' );
?>


The above example will output something similar to:

a: (refcount=1, is_ref=0)=array ( 'meaning' => (refcount=2, is_ref=0)='life', 'number' => (refcount=1, is_ref=0)=42, 'life' => (refcount=2, is_ref=0)='life')

Result under PHP 7:

a:
(refcount=1, is_ref=0)
array (size=3)
  'meaning' => (refcount=3, is_ref=0)string 'life' (length=4)
  'number' => (refcount=0, is_ref=0)int 42
  'life' => (refcount=3, is_ref=0)string 'life' (length=4)

From the Xdebug output above we can see that both the old and the new array element now point to a zval container whose "refcount" is 2. Although Xdebug's output shows two zval containers with the value 'life', they are the same one. xdebug_debug_zval() does not show this, but you could see it by also displaying the memory pointer.

Removing an element from the array is like removing a symbol from a scope. The "refcount" of the container the element points to is decreased, and once it reaches 0 the container is removed from memory. Here is another example:

Example #7 Removing an element from an array

<?php
$a = array( 'meaning' => 'life', 'number' => 42 );
$a['life'] = $a['meaning'];
unset( $a['meaning'], $a['number'] );
xdebug_debug_zval( 'a' );
?>


The above example will output something similar to:

a: (refcount=1, is_ref=0)=array ( 'life' => (refcount=1, is_ref=0)='life')

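Result under PHP 7:

a:
(refcount=1, is_ref=0)
array (size=1)
  'life' => (refcount=2, is_ref=0)string 'life' (length=4)

Things get interesting when we add the array itself as an element of the array, as the next example does. It uses the reference operator, since otherwise PHP would create a copy.

Example #8 Adding the array itself as an element of the array

<?php
$a = array( 'one' );
$a[] =& $a;
xdebug_debug_zval( 'a' );
?>


The above example will output something similar to: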

a: (refcount=2, is_ref=1)=array ( 0 => (refcount=1, is_ref=0)='one', 1 => (refcount=2, is_ref=1)=...)

Result under PHP 7:

a:
(refcount=2, is_ref=1)
array (size=2)
  0 => (refcount=2, is_ref=0)string 'one' (length=3)
  1 => (refcount=2, is_ref=1)&array

(refcount=1, is_ref=0)='one', 1 => (refcount=1, is_ref=1)=...)

Comparing the results under PHP 5 and PHP 7 shows that memory management changed in PHP 7. Why? I looked up some material, which is summarized below for reference.

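zval in PHP 7

In PHP 7, zvals have a new implementation. The most fundamental change is that the memory for a zval is no longer allocated individually on the heap, and the zval no longer stores a reference count itself. Instead, complex data types (such as strings, arrays and objects) store their own reference counts. This has the following advantages:

Simple data types need no separate allocation and no reference counting;

There is no more double counting. For objects, only the count stored in the object itself is used;

Because the count is now stored in the value itself, the value can be shared with data that is not a zval structure, for example between a zval and a hashtable key;

Fewer pointers have to be followed for indirect access.

Let's look at the current definition of the zval struct (now located in zend_types.h):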

struct _zval_struct {
 zend_value  value;   /* value */
 union {
  struct {
   ZEND_ENDIAN_LOHI_4(
    zend_uchar type,   /* active type */
    zend_uchar type_flags,
    zend_uchar const_flags,
    zend_uchar reserved)  /* call info for EX(This) */
  } v;
  uint32_t type_info;
 } u1;
 union {
  uint32_t  var_flags;
  uint32_t  next;     /* hash collision chain */
  uint32_t  cache_slot;   /* literal cache slot */
  uint32_t  lineno;    /* line number (for ast nodes) */
  uint32_t  num_args;    /* arguments number for EX(This) */
  uint32_t  fe_pos;    /* foreach position */
  uint32_t  fe_iter_idx;   /* foreach iterator index */
 } u2;
};


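The first member is largely unchanged; it is still a value union. The second member is a union of an integer holding the type information and a struct of four character-sized fields (the ZEND_ENDIAN_LOHI_4 macro can be ignored; it only ensures a consistent layout across platforms with different endianness). The important parts of this substructure are type (similar to before) and type_flags, which is explained below.

There is one small catch here: value is 8 bytes, but because of alignment, adding even a single byte grows the zval to 16 bytes. We obviously do not need 8 bytes just to store a type field, so a second union, u2, follows u1. It is unused by default, but surrounding code can use it to store 4 bytes of data. Its members serve different use cases.

In PHP 7 the value union is defined as follows: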

typedef union _zend_value {
 zend_long   lval;    /* long value */
 double   dval;    /* double value */
 zend_refcounted *counted;
 zend_string  *str;
 zend_array  *arr;
 zend_object  *obj;
 zend_resource *res;
 zend_reference *ref;
 zend_ast_ref  *ast;
 zval    *zv;
 void    *ptr;
 zend_class_entry *ce;
 zend_function *func;
 struct {
  uint32_t w1;
  uint32_t w2;
 } ww;
} zend_value;
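
The first thing to note is that the value union is now 8 bytes instead of 16. It only stores integers (lval) and doubles (dval) directly; everything else is a pointer (as mentioned above, a pointer takes 8 bytes, and the struct at the bottom consists of two 4-byte unsigned integers). All of the pointer types above (except the specially marked ones) share the same header, zend_refcounted, which stores the reference count:
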
typedef struct _zend_refcounted_h {
 uint32_t   refcount;   /* reference counter 32-bit */
 union {
  struct {
   ZEND_ENDIAN_LOHI_3(
    zend_uchar type,
    zend_uchar flags, /* used for strings & objects */
    uint16_t  gc_info) /* keeps GC root number (or 0) and color */
  } v;
  uint32_t type_info;
 } u;
} zend_refcounted_h;



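This structure naturally contains a field for the reference count. In addition it has type, flags and gc_info. type holds the same type as the zval, so that the GC can work with the refcounted structure on its own without storing a zval. flags is used for different purposes depending on the data type, which is covered in the next part.

gc_info plays the same role as buffered in PHP 5; however, instead of a pointer into the root buffer it now stores an index. Because the root buffer used to have a fixed size (10,000 elements), a 16-bit (2-byte) number is enough in place of a 64-bit (8-byte) pointer. gc_info also contains the "color" of the node, which is used to mark nodes during collection.

zval memory management

As mentioned above, zvals are no longer allocated individually on the heap. They obviously still have to be stored somewhere, so where? Most of the time they still live on the heap (so the point of the earlier statement is not "heap" but "individually"); they are simply embedded into other data structures. For example, a hashtable bucket now contains a zval directly instead of a pointer to one. Likewise, a function's compiled-variable table and an object's property table are stored as zval arrays in one contiguous chunk of memory rather than as zval pointers scattered around. What used to be zval * is now generally zval.

Previously, when a zval was used in a new place, the zval * was copied and the reference count was incremented. Now the contents of the zval are copied (ignoring u2), and in some cases the reference count of the structure it points to is incremented (if that value is refcounted).

How does PHP know whether a value is refcounted? This cannot be determined from the type alone, because some types (such as strings and arrays) are not always refcounted. Instead, the type_info field records whether the value is refcounted. It can carry the following flags: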

#define IS_TYPE_CONSTANT   (1<<0) /* special */
#define IS_TYPE_IMMUTABLE   (1<<1) /* special */
#define IS_TYPE_REFCOUNTED   (1<<2)
#define IS_TYPE_COLLECTABLE   (1<<3)
#define IS_TYPE_COPYABLE   (1<<4)
#define IS_TYPE_SYMBOLTABLE   (1<<5) /* special */

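Note: in the 7.0.0 release, the comment above these macro definitions says they are used with zval.u1.v.type_flags, a zend_uchar that occupies one byte of type_info.

The three primary properties are "refcounted", "collectable" and "copyable". Refcounting was covered above. "Collectable" indicates whether the value can take part in a cycle. For example, strings are usually refcounted, but there is no way to create a cycle with a string.

"Copyable" indicates whether the value has to be duplicated when a copy is made (the original uses the word "duplication"). A duplication is a hard copy: duplicating a zval that points to an array does not merely increment the array's refcount, it creates a new, independent array with the same contents. For some types (such as objects and resources), however, even a "duplication" only increments the refcount; such types are non-copyable. This matches the existing semantics of objects and resources (which are the same in PHP 7 as in PHP 5).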
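The difference between copyable and non-copyable types is visible from userland. The following minimal sketch (variable names are illustrative only) contrasts an array, which is duplicated on the first write after an assignment, with an object, where assignment only adds another reference to the same object and a second object has to be requested with clone.

```php
<?php
$array1 = ['n' => 1];
$array2 = $array1;          // shares the array until one side writes
$array2['n'] = 2;           // the write duplicates it; $array1 keeps its value

$object1 = new stdClass();
$object1->n = 1;
$object2 = $object1;        // both variables refer to the same object
$object2->n = 2;            // visible through $object1 as well

$object3 = clone $object1;  // an explicit copy gives a second object
$object3->n = 3;

var_dump($array1['n'], $object1->n, $object3->n); // int(1) int(2) int(3)
```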

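The table below shows which flags each type uses (x marks a property the type has). "Simple types" are types such as integers and booleans that do not use a pointer to a separate structure. The table also has a column for the "immutable" flag, which marks immutable arrays and is covered in more detail in the next part.

Interned strings have not been mentioned so far: they are strings such as function and variable names that are not refcounted and are stored only once (deduplicated).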

                | refcounted | collectable | copyable | immutable
----------------+------------+-------------+----------+----------
simple types    |            |             |          |
string          |      x     |             |     x    |
interned string |            |             |          |
array           |      x     |      x      |     x    |
immutable array |            |             |          |     x
object          |      x     |      x      |          |
resource        |      x     |             |          |
reference       |      x     |             |          |

To understand this, let's look at a few examples of how zval memory management works in practice.

First the behavior of integers, a simplified variant of the corresponding PHP 5 example:

<?php
$a = 42;   // $a = zval_1(type=IS_LONG, value=42)
$b = $a;   // $a = zval_1(type=IS_LONG, value=42)
           // $b = zval_2(type=IS_LONG, value=42)
$a += 1;   // $a = zval_1(type=IS_LONG, value=43)
           // $b = zval_2(type=IS_LONG, value=42)
unset($a); // $a = zval_1(type=IS_UNDEF)
           // $b = zval_2(type=IS_LONG, value=42)


This is fairly simple. Integers are no longer shared; both variables get their own zval. Because zvals are now embedded, no separate allocation is needed, which is why the comments use = rather than the pointer arrow ->. On unset the variable's zval is marked IS_UNDEF. Now a more complex case:

<?php
$a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
$b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
           // $b = zval_2(type=IS_ARRAY) ---^
// zval separation happens here
$a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
unset($a); // $a = zval_1(type=IS_UNDEF), zend_array_2 is destroyed
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])


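In this case each variable still has its own zval, but both point to the same (refcounted) zend_array structure. The array is only copied when one of the variables is modified. This is similar to how PHP 5 behaves.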
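The copy-on-write behavior described above can be observed indirectly through memory_get_usage(). The sketch below is only a rough measurement under the assumption of a PHP 7 or later runtime; the exact byte counts depend on the PHP version and build.

```php
<?php
$a = range(1, 100000);      // one large zend_array
$m0 = memory_get_usage();

$b = $a;                    // only the refcount goes up, nothing is copied
$m1 = memory_get_usage();

$b[] = 0;                   // first write: the array is separated (copied)
$m2 = memory_get_usage();

printf("after assignment: %d bytes, after write: %d bytes\n", $m1 - $m0, $m2 - $m1);
```

The assignment should cost next to nothing, while the first write to $b should cost at least as much as the array itself.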

Types

Let's take a look at the types PHP 7 supports (the type tags used by zvals):

/* regular data types */
#define IS_UNDEF     0
#define IS_NULL     1
#define IS_FALSE     2
#define IS_TRUE      3
#define IS_LONG     4
#define IS_DOUBLE    5
#define IS_STRING    6
#define IS_ARRAY    7
#define IS_OBJECT    8
#define IS_RESOURCE    9
#define IS_REFERENCE    10
/* constant expressions */
#define IS_CONSTANT     11
#define IS_CONSTANT_AST    12
/* internal types */
#define IS_INDIRECT     15
#define IS_PTR      17


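This list is similar to the one used in PHP 5, with a few additions and changes:

IS_UNDEF takes the place of what used to be a NULL zval pointer (it does not conflict with IS_NULL). In the examples above, for instance, unset marks the variable as IS_UNDEF;

IS_BOOL has been split into IS_FALSE and IS_TRUE. The boolean value is now encoded directly in the type, which makes type checks cheaper. The change is transparent to userland, where there is still only a single "boolean" type;

PHP references no longer use an is_ref flag but the IS_REFERENCE type instead. This is also covered in the next section;

IS_INDIRECT and IS_PTR are special internal types.

The list above should actually contain two more fake types, which are omitted here.

IS_LONG holds a zend_long value rather than a native C long. The reason is that on 64-bit Windows (LLP64) long is only 32 bits wide, so PHP 5 on Windows could only use 32-bit integers. PHP 7 lets you use 64-bit integers on any 64-bit operating system, including Windows.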
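The integer width of the running build can be checked from userland. The short sketch below uses only the predefined constants PHP_INT_SIZE and PHP_INT_MAX; on a 64-bit PHP 7 build, including Windows, it should report 64 bits.

```php
<?php
// PHP_INT_SIZE is the width of the integer type in bytes for this build.
printf("integer width: %d bits, PHP_INT_MAX = %d\n", PHP_INT_SIZE * 8, PHP_INT_MAX);

// Going past PHP_INT_MAX does not wrap around; PHP switches to float.
$overflowed = PHP_INT_MAX + 1;
var_dump(is_float($overflowed)); // bool(true)
```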

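zend_refcounted is covered in the next part. Now let's look at how PHP references are implemented.

References

PHP 7 handles PHP & references in a completely different way than PHP 5 (this change was also the source of a large number of bugs during PHP 7 development). Let's start with how references were implemented in PHP 5.

Normally, the copy-on-write principle means that a zval must be separated before it is modified, so that only the value of that one PHP variable changes. This is what by-value semantics means.

For PHP references this rule does not apply. If a PHP variable is a reference, you want several PHP variables to point to the same value. The is_ref flag in PHP 5 records whether a variable is a PHP reference, and therefore whether it has to be separated on modification. For example:

<?php
$a = [];    // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b =& $a;   // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

$b[] = 1;   // $a = $b = zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
            // Because is_ref is 1, PHP does not separate the zval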



A significant problem with this design is that a value cannot be shared between a variable that is a PHP reference and one that is not. Consider the following case:

<?php
$a = [];    // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b = $a;    // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
$c = $b;    // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])
$d =& $c;   // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
            // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
            // $d is a reference to $c, but not to $a and $b, so the zval has to be copied here.
            // We now have two zvals, one with is_ref=0 and one with is_ref=1.
$d[] = 1;   // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
            // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
            // Because the two zvals are separate, $d[] = 1 does not modify $a and $b.



This behavior also makes using references in PHP slower than using plain values. Consider this example:

<?php
$array = range(0, 1000000);
$ref =& $array;
var_dump(count($array)); // <-- separation happens here



Because count() takes its argument by value while $array is a PHP reference, a full copy of the array is made before it is passed to count(). If $array were not a reference, this would not happen.
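A rough way to check this on a given PHP version is to compare the peak memory usage before and after the call. This is only a sketch: on PHP 5 one would expect the peak to grow by roughly the size of the array, while on PHP 7 and later (whose reference implementation is discussed later in this article) it should stay flat. The array is smaller than in the example above to keep the run light.

```php
<?php
$array = range(0, 200000);
$ref = &$array;                        // $array is now part of a reference set

$peakBefore = memory_get_peak_usage();
$n = count($array);                    // by-value call on a reference
$peakAfter = memory_get_peak_usage();

printf("count = %d, peak grew by %d bytes\n", $n, $peakAfter - $peakBefore);
```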

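Now let's turn to the PHP 7 implementation of references. Because zvals are no longer allocated individually, the PHP 5 approach cannot be used any more. Instead, an IS_REFERENCE type was added, which uses the zend_reference structure to store the referenced value: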

struct _zend_reference {
 zend_refcounted gc;
 zval    val;
};


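In essence a zend_reference is simply a refcounted zval. Every variable in a reference set stores a zval of type IS_REFERENCE that points to the same zend_reference. The val member behaves like any other zval; in particular it can share the complex value it points to. For example, an array can be shared between a reference variable and a value variable.

Let's go through the examples again, this time with PHP 7 semantics. For brevity the individual zvals are no longer written out; only the structures they point to are shown:

<?php
$a = [];    // $a     -> zend_array_1(refcount=1, value=[])
$b =& $a;   // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])
$b[] = 1;   // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])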


In the example above, the by-reference assignment creates a zend_reference. Note that its refcount is 2 (because two variables use this PHP reference), while the refcount of the value itself is 1 (because only the zend_reference points to it). Now the case where references and non-references are mixed:

<?php
$a = [];    // $a         -> zend_array_1(refcount=1, value=[])
$b = $a;    // $a, $b     -> zend_array_1(refcount=2, value=[])
$c = $b;    // $a, $b, $c -> zend_array_1(refcount=3, value=[])
$d =& $c;   // $a, $b                                 -> zend_array_1(refcount=3, value=[])
            // $c, $d -> zend_reference_1(refcount=2) ---^
            // Note that all variables share the same zend_array, even though some are PHP references and some are not
$d[] = 1;   // $a, $b                                 -> zend_array_1(refcount=2, value=[])
            // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
            // Only at this point, when the write happens, is the zend_array duplicated



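The biggest difference from PHP 5 is that all variables can share the same array, even though some are PHP references and some are not. The array is only separated when one of them is modified. This also means that passing a large referenced array to count() is safe and no longer causes a copy. References are still slower than plain values, because the zend_reference structure has to be allocated (and followed as an extra indirection), and because the engine does not handle them particularly fast.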
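The userland result of the mixed example is the same in PHP 5 and PHP 7; only the internal sharing differs. A minimal check of the observable values:

```php
<?php
$a = [];
$b = $a;
$c = $b;
$d = &$c;      // $c and $d form a reference set; $a and $b stay plain values
$d[] = 1;      // separates the array for the reference set only

var_dump($a === [], $b === [], $c === [1], $d === [1]); // four times bool(true)
```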

Conclusion

To summarize, the most important change in PHP 7 is that zvals are no longer allocated individually on the heap and no longer store a reference count themselves. Instead, the complex values a zval points to (such as strings, arrays and objects) store their own reference counts. This generally results in fewer allocations, less indirection and lower memory usage.


Cleanup Problems

Suppose $a from Example #8 is unset. Although no symbol in any scope points to the structure (the variable container) any longer, it cannot be cleaned up, because array element "1" still points to the array itself. And because no external symbol points to it, the user has no way to clean up the structure either: the result is a memory leak. Fortunately, PHP cleans up this data structure at the end of the request, but until then it takes up memory. This situation is common when implementing parsing algorithms, or anything else where a child element points back to its parent. The same can of course happen with objects, where it is actually more likely, because objects are always implicitly used by reference.

This may not matter if it happens only once or twice, but if memory leaks like this thousands or even hundreds of thousands of times, it obviously becomes a problem. It is especially problematic in long-running scripts, such as daemons where the request basically never ends, or in large unit test suites. An example of the latter: running the unit tests for the Template component of the large eZ Components library (a well-known PHP library) caused problems. In some cases the tests required more than 2 GB of memory, which the test server did not have.
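The leak can be reproduced with the self-referencing array from Example #8. The sketch below is an illustration under assumptions: the function name and the payload size are made up, the collector is switched off with gc_disable() so that it cannot run on its own, and the exact byte counts will vary by PHP version.

```php
<?php
function make_self_referencing_array($payloadBytes)
{
    $a = [str_repeat('x', $payloadBytes)]; // payload makes the leak visible
    $a[] = &$a;                            // element 1 points back to the array
}   // $a leaves scope here, but the refcount never reaches 0

gc_disable();                       // keep the collector from running by itself
$baseline = memory_get_usage();
for ($i = 0; $i < 1000; $i++) {
    make_self_referencing_array(10000);
}
$leaked = memory_get_usage();       // roughly 10 MB above the baseline

gc_enable();
$freed = gc_collect_cycles();       // force a cycle collection
$after = memory_get_usage();        // the leaked memory has been returned

printf("leaked: %d bytes, reclaimed: %d bytes\n", $leaked - $baseline, $leaked - $after);
```

Plain reference counting alone never frees these arrays; only the cycle collector described in the next section reclaims them.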


Collecting Cycles

Traditionally, the reference counting mechanism used in PHP could not deal with memory leaks caused by circular references. As of PHP 5.3.0, however, PHP implements the synchronous algorithm from the paper "Concurrent Cycle Collection in Reference Counted Systems" to address this problem.

A full explanation of the algorithm is beyond the scope of this section, so only the basics are covered here. First, some ground rules. If a refcount is increased, the container is still in use and therefore not garbage. If a refcount is decreased and reaches zero, the variable container can be freed. This means that garbage cycles can only be created when a refcount is decreased to a non-zero value. Second, within a garbage cycle it is possible to find out which parts are garbage by checking whether their refcounts can be decreased by 1 and then checking which variable containers end up with a refcount of zero.

To avoid checking for garbage cycles on every possible refcount decrease, the algorithm instead puts all possible roots (zval variable containers) into the root buffer, marking them purple (suspected garbage). It also makes sure that each possible garbage root appears in the buffer only once. Only when the root buffer is full does the collection run for all the different variable containers in it. This is step A.

In step B, each purple variable is deleted in simulation. During simulated deletion, the refcount of ordinary (non-purple) variables it points to may be decreased by 1. If the refcount of such an ordinary variable drops to 0, that variable is in turn deleted in simulation. Each variable is deleted in simulation only once and is marked gray afterwards (the original text describes this as ensuring that the same variable container is not decremented twice, which is wrong).

In step C, each purple variable is restored in simulation. Restoration is conditional: it is only performed when the variable's refcount is greater than 0. Again, each variable is restored only once and is marked black afterwards. This is essentially the inverse of step B. The nodes that remain and could not be restored are the blue nodes, which are the ones that should really be deleted. Step D traverses them and deletes them.

All of these phases (simulated deletion, simulated restoration and real deletion) are simple traversals, typically depth-first. The cost is proportional to the number of nodes touched by the simulation, not only to the number of purple suspected-garbage roots.
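The steps above can be sketched as a toy simulation. This is not PHP's actual collector: it follows the mark-gray, scan and collect structure of the paper cited above on a hand-built graph, and the class name, node names and data layout are all made up for illustration. Garbage nodes are marked 'white' here, which corresponds to the blue nodes in the description.

```php
<?php
// Toy model: $refcount maps node name => reference count,
// $children maps node name => list of nodes it points to.
final class ToyCycleCollector
{
    public $refcount;
    private $children;
    private $color = [];

    public function __construct(array $refcount, array $children)
    {
        $this->refcount = $refcount;
        $this->children = $children;
    }

    // Step B: simulated deletion. Visit each node once, mark it gray and
    // decrement the refcount of everything it points to.
    private function markGray($node)
    {
        if (($this->color[$node] ?? 'black') === 'gray') {
            return;
        }
        $this->color[$node] = 'gray';
        foreach ($this->children[$node] as $child) {
            $this->refcount[$child]--;
            $this->markGray($child);
        }
    }

    // Step C: gray nodes that are still referenced are restored (black),
    // the others are marked white, i.e. garbage.
    private function scan($node)
    {
        if (($this->color[$node] ?? 'black') !== 'gray') {
            return;
        }
        if ($this->refcount[$node] > 0) {
            $this->scanBlack($node);
            return;
        }
        $this->color[$node] = 'white';
        foreach ($this->children[$node] as $child) {
            $this->scan($child);
        }
    }

    // Undo the simulated deletion for a live node and all it reaches.
    private function scanBlack($node)
    {
        $this->color[$node] = 'black';
        foreach ($this->children[$node] as $child) {
            $this->refcount[$child]++;
            if (($this->color[$child] ?? 'black') !== 'black') {
                $this->scanBlack($child);
            }
        }
    }

    // $roots plays the role of the root buffer (the purple nodes).
    public function collect(array $roots)
    {
        foreach ($roots as $root) {
            $this->markGray($root);
        }
        foreach ($roots as $root) {
            $this->scan($root);
        }
        // Step D: everything still white would now be freed.
        return array_keys($this->color, 'white', true);
    }
}

// A <-> B is an unreachable cycle; C <-> D is a cycle that a variable
// outside the graph still points to (hence refcount 2 for C).
$collector = new ToyCycleCollector(
    ['A' => 1, 'B' => 1, 'C' => 2, 'D' => 1],
    ['A' => ['B'], 'B' => ['A'], 'C' => ['D'], 'D' => ['C']]
);
$garbage = $collector->collect(['A', 'C']);
print_r($garbage);              // A and B
print_r($collector->refcount);  // C and D are back at 2 and 1
```

Only the A/B cycle is reported as garbage; the refcounts of C and D are restored because C is still referenced from outside the cycle.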

Now that you have a basic understanding of the algorithm, let's look at how it is integrated with PHP. By default, PHP's garbage collector is turned on. There is a php.ini setting that lets you change this: zend.enable_gc.

When the garbage collector is turned on, the cycle-finding algorithm described above is executed whenever the root buffer is full. The root buffer has a fixed size of 10,000 possible roots. You can change this value by modifying the constant GC_ROOT_BUFFER_MAX_ENTRIES in the PHP source file Zend/zend_gc.c and recompiling PHP. When garbage collection is turned off, the cycle-finding algorithm never runs. However, possible roots are always recorded in the root buffer, regardless of whether garbage collection is activated in the configuration.

If the root buffer becomes full of possible roots while the garbage collector is turned off, further possible roots are simply not recorded. Possible roots that are not recorded are never analyzed by the algorithm. If they are part of a circular reference cycle, they are never cleaned up and cause a memory leak.

The reason possible roots are recorded even when garbage collection is turned off is that recording them is faster than checking whether garbage collection is turned on every time a possible root is found. The garbage collection and analysis mechanism itself, however, can take a considerable amount of time.
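That roots are recorded while the collector is switched off can be observed with gc_status(). The following is a sketch assuming PHP 7.3 or later, where gc_status() is available and reports the current number of roots; the class and function names are made up.

```php
<?php
final class Link
{
    public $other = null;
}

function make_cycle_pair()
{
    $a = new Link();
    $b = new Link();
    $a->other = $b;
    $b->other = $a;
}   // both objects drop to refcount 1 here and become possible roots

gc_enable();
gc_collect_cycles();                 // start with an empty root buffer
gc_disable();                        // no automatic runs from here on

$rootsBefore = gc_status()['roots'];
for ($i = 0; $i < 100; $i++) {
    make_cycle_pair();
}
$rootsAfter = gc_status()['roots'];  // roots were recorded although GC is off

gc_enable();
$freed = gc_collect_cycles();        // the recorded roots can still be collected

printf("roots: %d -> %d, freed: %d\n", $rootsBefore, $rootsAfter, $freed);
```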

Besides changing the zend.enable_gc setting, you can also turn the garbage collector on and off at run time by calling gc_enable() and gc_disable() respectively. Calling these functions has the same effect as turning the mechanism on or off through the configuration setting. It is also possible to force a collection of cycles even if the root buffer is not yet full, by calling gc_collect_cycles(). This function returns the number of cycles the algorithm collected.

The reason you can turn the mechanism on and off, and trigger a collection yourself, is that some parts of your application may be time-sensitive. In those parts you may not want the garbage collector to kick in. Of course, turning off the garbage collector for parts of your application carries the risk of memory leaks, because some possible roots may not fit into the limited root buffer. It is therefore probably wise to call gc_collect_cycles() just before you call gc_disable(). This clears all possible roots already recorded in the root buffer, so that while the collector is off, an empty buffer leaves more room for recording possible roots.
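A minimal sketch of this pattern follows. The helper name run_without_gc is hypothetical; it collects cycles first, disables the collector for the duration of a callback, and restores the previous setting afterwards.

```php
<?php
// Hypothetical helper: runs $criticalSection with the cycle collector off.
function run_without_gc(callable $criticalSection)
{
    $wasEnabled = gc_enabled();
    gc_collect_cycles();      // empty the root buffer before switching off
    gc_disable();
    try {
        return $criticalSection();
    } finally {
        if ($wasEnabled) {
            gc_enable();      // restore the previous setting
        }
    }
}

$sum = run_without_gc(function () {
    return array_sum(range(1, 1000));   // stand-in for time-sensitive work
});
var_dump($sum);               // int(500500)
```

Restoring the setting in a finally block keeps the collector from staying off if the callback throws.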
