Analyzing the evolution of the garbage collection mechanism in PHP5

Preface

PHP is a managed language. PHP programmers do not need to allocate and release memory by hand (except when writing PHP or Zend extensions in C), because PHP implements garbage collection itself. On the official PHP website (php.net), two PHP5 branches, PHP5.2 and PHP5.3, are currently maintained separately. This is because many projects still run on PHP5.2, and PHP5.3 is not fully compatible with it. PHP5.3 makes many improvements over PHP5.2, and the garbage collection algorithm is one of the larger changes. This article discusses the garbage collection mechanisms of PHP5.2 and PHP5.3 in turn, then looks at how this change affects programmers writing PHP and what they should watch out for.

Internal representation of PHP variables and associated memory objects

Garbage collection ultimately operates on variables and their associated memory objects, so before discussing PHP’s garbage collection mechanism, let’s briefly look at how variables and their memory objects are represented inside PHP (that is, in the C source code).

The official PHP documentation describes eight primitive types in three groups. The scalar types are booleans, integers, floating-point numbers and strings; the compound types are arrays and objects; and the two special types are resources and NULL.

All these types are uniformly represented by a structure called zval within PHP. In the PHP source code, the name of this structure is "_zval_struct". The specific definition of zval is in the "Zend/zend.h" file of the PHP source code. The following is an excerpt of the relevant code.

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;
 
struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;    /* active type */
    zend_uchar is_ref__gc;
};

The union "_zvalue_value" holds the value of any PHP variable. A union is used because a zval only ever represents one type at a time. _zvalue_value has only 5 members, yet PHP has 8 data types including NULL. So how does PHP represent 8 types with 5 members? It reuses them. Booleans, integers and resources (only the resource identifier needs to be stored) all go in lval; dval stores floating-point numbers; str stores strings; ht stores arrays (a PHP array is actually a hash table); and obj stores objects. NULL needs no value member at all, because the type field alone identifies it. In this way 5 members cover all 8 types.
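As a rough illustration from the PHP side, the sketch below creates one value of each of the eight types and prints the name that gettype() reports for it. The union member named in each comment is the PHP5 mapping described above; later engine versions lay out zvals differently, so the comments should be read as describing PHP5 only.

```php
<?php
// One sample value per PHP type. The comments name the PHP5 union member
// described in the text; they are not checked by the code itself.
$samples = [
    true,                        // boolean  -> lval
    42,                          // integer  -> lval
    3.14,                        // double   -> dval
    'text',                      // string   -> str
    [1, 2],                      // array    -> ht
    new stdClass(),              // object   -> obj
    fopen('php://memory', 'r'),  // resource -> lval (resource identifier)
    null,                        // NULL     -> no value member needed
];

$names = [];
foreach ($samples as $sample) {
    $names[] = gettype($sample);
}

echo implode(', ', $names), "\n";
```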

Which member of value is currently valid is determined by the type field of "_zval_struct". _zval_struct is the C implementation of zval, and each zval is the memory object behind a variable. Besides value and type, _zval_struct has two more fields, refcount__gc and is_ref__gc. Their suffix suggests that they are related to garbage collection, and indeed PHP's garbage collection relies entirely on these two fields. refcount__gc records how many variables currently refer to this zval, and is_ref__gc records whether the zval is shared by reference (through the & operator). The latter is tied to the copy-on-write mechanism of zvals, which is not the focus of this article, so I won't go into detail here. Readers only need to remember the role of refcount__gc.
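Although copy-on-write is outside the scope of this article, its visible effect is easy to observe. The following minimal sketch contrasts a plain assignment, where the two variables separate as soon as one of them is written to, with a reference assignment, where both names stay bound to the same value. It only shows the behavior seen from a script, not the zval fields themselves.

```php
<?php
// Plain assignment: $b shares $a's value until one of them is modified.
$a = [1, 2, 3];
$b = $a;
$b[] = 4;        // the write separates $b; $a is left untouched

// Reference assignment: $c and $a are two names for the same value.
$c = &$a;
$c[] = 5;        // the change is visible through $a as well

echo 'a: ', implode(',', $a), "\n";
echo 'b: ', implode(',', $b), "\n";
echo 'c: ', implode(',', $c), "\n";
```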

Garbage collection algorithm in PHP5.2 - Reference Counting

The memory reclamation algorithm used in PHP5.2 is the well-known reference counting. The idea is intuitive and concise: every memory object gets a counter. When a memory object is created, the counter is initialized to 1 (at that point exactly one variable refers to the object). Each time another variable starts referring to the object, the counter is incremented by 1; each time a variable that refers to it goes away, the counter is decremented by 1. As soon as the counter reaches 0, the memory object is destroyed and the memory it occupies is reclaimed. In PHP the memory object is the zval, and the counter is refcount__gc.

For example, the following PHP code demonstrates the working principle of the PHP5.2 counter (the counter value is obtained through xdebug):

<?php
 
$val1 = 100; // zval(val1).refcount__gc = 1
$val2 = $val1; // zval(val1).refcount__gc = 2, zval(val2).refcount__gc = 2 (copy-on-write: val2 and val1 currently share one zval)
$val2 = 200; // zval(val1).refcount__gc = 1, zval(val2).refcount__gc = 1 (val2 now gets a new zval of its own)
unset($val1); // zval(val1).refcount__gc = 0 (the zval that $val1 referred to is no longer reachable and is reclaimed)
 
?>

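The counter values above need xdebug to inspect. A way to watch the same principle without any extension is to use an object with a destructor, since the destructor runs at the moment the last reference disappears. The Tracker class below is a hypothetical helper written for this sketch. Objects add a handle indirection on top of zvals, so this illustrates the counting principle rather than the exact refcount__gc values.

```php
<?php
// Hypothetical helper class: records the moment its instance is destroyed.
class Tracker
{
    public static $log = [];

    public function __destruct()
    {
        self::$log[] = 'destroyed';
    }
}

$first = new Tracker();   // one variable refers to the object
$second = $first;         // two variables refer to the same object

unset($first);            // one reference is left, so the object survives
Tracker::$log[] = 'after unset($first)';

unset($second);           // last reference gone: the destructor runs right away
Tracker::$log[] = 'after unset($second)';

echo implode("\n", Tracker::$log), "\n";
```

The log shows "destroyed" between the two unset markers, which is the immediate reclamation that plain reference counting provides.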
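Reference counting is simple, intuitive and easy to implement, but it has one fatal flaw: it leaks memory easily. Many readers may have already realized that if circular references exist, reference counting can leak memory. Consider the following code:
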
<?php
$a = array();
$a[] = & $a;
unset($a);
 
?>

This code first creates the array a, and then makes the first element of a point to a by reference. At this point the refcount of a's zval becomes 2. Then we destroy the variable a. The refcount of the zval that a originally pointed to is now 1, but we can no longer operate on it, because it has formed a circular self-reference, as shown in the following figure:

[Figure: after unset($a), the zval of the array is still referenced by its own first element]

The gray part indicates that it no longer exists. Since the refcount of the zval pointed to by a is 1 (referenced by the first element of its HashTable), this zval will not be destroyed by GC, and this part of the memory will be leaked.
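The leak can be observed on PHP5.3 or later by switching the cycle collector off, which roughly approximates the PHP5.2 behavior. The sketch below assumes the default zend.enable_gc=1 at startup; SelfRef is a hypothetical class whose destructor counter shows whether the object has been destroyed.

```php
<?php
// Hypothetical class whose instances can hold a reference to themselves.
class SelfRef
{
    public static $destroyed = 0;
    public $self;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_disable();                         // no automatic cycle collection from here on

$node = new SelfRef();
$node->self = $node;                  // the object now references itself
unset($node);                         // the counter drops to 1, never to 0

$afterUnset = SelfRef::$destroyed;    // plain counting could not free the object

gc_collect_cycles();                  // an explicit cycle collection run finds it
$afterCollect = SelfRef::$destroyed;

gc_enable();
echo $afterUnset, ' ', $afterCollect, "\n";
```

After the unset the destructor has not run, which is the leak described above; it only runs once the cycle collector is invoked.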

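It is worth pointing out that PHP stores variable symbols in symbol tables. There is one global symbol table, and each complex value such as an array or an object has its own symbol table. So in the code above, a and a[0] are two symbols: a is stored in the global symbol table, while a[0] is stored in the array's own symbol table, and both refer to the same zval (until the symbol a is destroyed, of course). Readers should take care to distinguish a symbol from the zval it refers to.

When PHP is only used for dynamic page scripts, such leaks may not matter much, because those scripts are short-lived and PHP guarantees that all resources are released when a script finishes. But PHP is no longer used only for dynamic pages. In long-running scenarios, such as automated test scripts or daemon processes, the leaks that accumulate over many iterations can become serious. This is not alarmism: a company where I once interned used daemon processes written in PHP to talk to its data storage servers.

Because of this flaw in reference counting, PHP5.3 improved the garbage collection algorithm.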

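Garbage collection algorithm in PHP5.3 - Concurrent Cycle Collection in Reference Counted Systems

The garbage collection algorithm in PHP5.3 is still based on reference counting, but a plain counter is no longer the only criterion for reclamation. Instead it uses a synchronous cycle collection algorithm proposed by IBM engineers in the paper Concurrent Cycle Collection in Reference Counted Systems.

The algorithm is quite complex, as the paper's 29 pages suggest, so I do not intend (and am not able) to describe it in full. Interested readers can read the paper mentioned above (strongly recommended; it is an excellent paper).

Here I can only outline the basic idea of the algorithm.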

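First, PHP allocates a fixed-size "root buffer" that holds a fixed number of zvals. The default is 10,000; to change it, you have to modify the constant GC_ROOT_BUFFER_MAX_ENTRIES in Zend/zend_gc.c in the source code and recompile.

From the above we know that if a zval is referenced at all, it is referenced either by a symbol in the global symbol table or by a symbol inside another zval of a complex type. Some zvals are therefore possible roots. We won't discuss here how PHP finds these possible roots, which is a complicated matter; suffice it to say that PHP has a way to find them and puts them into the root buffer.

When the root buffer is full, PHP runs garbage collection. The collection algorithm is as follows: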

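1. For each root zval in the root buffer, traverse every reachable zval depth-first and decrement its refcount by 1. To avoid decrementing the same zval more than once (different roots may reach the same zval), each zval is marked as "decremented" once its refcount has been decremented.

2. Traverse depth-first from each root zval in the buffer again. If a zval's refcount is not 0, increment it by 1; otherwise leave it at 0.

3. Remove all roots from the root buffer (the zvals are only removed from the buffer, not destroyed), then destroy every zval whose refcount is 0 and reclaim its memory.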
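The three steps can be sketched as a toy simulation. This is not the engine's code: it follows the synchronous mark/scan/collect outline from the paper on a small hand-built graph, and the node layout, the color names and all function names are assumptions made for illustration. Unlike the simplified wording above, the sketch decrements once per internal reference (edge), and the colors keep each node from being traversed twice.

```php
<?php
// Toy model: each node has a reference count, a color and a list of child ids.

function markGray(array &$heap, string $id): void
{
    if ($heap[$id]['color'] === 'gray') {
        return;                               // already visited in this run
    }
    $heap[$id]['color'] = 'gray';
    foreach ($heap[$id]['children'] as $child) {
        $heap[$child]['refcount']--;          // step 1: trial-remove the internal reference
        markGray($heap, $child);
    }
}

function scanBlack(array &$heap, string $id): void
{
    $heap[$id]['color'] = 'black';
    foreach ($heap[$id]['children'] as $child) {
        $heap[$child]['refcount']++;          // step 2: undo the trial decrement
        if ($heap[$child]['color'] !== 'black') {
            scanBlack($heap, $child);
        }
    }
}

function scan(array &$heap, string $id): void
{
    if ($heap[$id]['color'] !== 'gray') {
        return;
    }
    if ($heap[$id]['refcount'] > 0) {
        scanBlack($heap, $id);                // still referenced from outside: keep it
        return;
    }
    $heap[$id]['color'] = 'white';            // only internal references: garbage candidate
    foreach ($heap[$id]['children'] as $child) {
        scan($heap, $child);
    }
}

function collectWhite(array &$heap, string $id): void
{
    if (!isset($heap[$id]) || $heap[$id]['color'] !== 'white') {
        return;
    }
    $children = $heap[$id]['children'];
    unset($heap[$id]);                        // step 3: free the node
    foreach ($children as $child) {
        collectWhite($heap, $child);
    }
}

function collectCycles(array &$heap, array $roots): void
{
    foreach ($roots as $root) { markGray($heap, $root); }
    foreach ($roots as $root) { scan($heap, $root); }
    foreach ($roots as $root) { collectWhite($heap, $root); }
}

// 'a' mimics the self-referencing array after unset($a): its only reference is its own.
// 'b' and 'c' form a cycle, but 'b' is also referenced by a live variable.
$heap = [
    'a' => ['refcount' => 1, 'color' => 'black', 'children' => ['a']],
    'b' => ['refcount' => 2, 'color' => 'black', 'children' => ['c']],
    'c' => ['refcount' => 1, 'color' => 'black', 'children' => ['b']],
];

collectCycles($heap, ['a', 'b']);
echo implode(', ', array_keys($heap)), "\n";  // prints "b, c"
```

Node a is freed because its count falls to 0 once its own reference is subtracted, while b and c survive with their original counts restored.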

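If you don't fully understand this, it doesn't matter. Just remember that the PHP5.3 garbage collection algorithm has the following properties:

1. A collection cycle is not started every time a refcount decreases; garbage collection only starts once the root buffer is full.

2. It solves the circular reference problem.

3. It always keeps leaked memory below a threshold.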

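Performance comparison of the PHP5.2 and PHP5.3 garbage collection algorithms

Given my current constraints, I will not design new experiments but directly cite the ones in the PHP Manual. For the performance comparison, see the relevant chapter of the PHP Manual: http://www.php.net/manual/en/features.gc.performance-considerations.php.

First, the memory leak experiment. The experiment code and result graphs below are taken directly from the PHP Manual: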

<?php
class Foo
{
    public $var = '3.1415962654';
}
 
$baseMemory = memory_get_usage();
 
for ( $i = 0; $i <= 100000; $i++ )
{
    $a = new Foo;
    $a->self = $a;
    if ( $i % 500 === 0 )
    {
        echo sprintf( '%8d: ', $i ), memory_get_usage() - $baseMemory, "\n";
    }
}
?>

[Figures: memory usage of the script above under PHP5.2 and under PHP5.3]

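As you can see, in a scenario that can cause cumulative memory leaks, PHP5.2 leaks memory continuously, while PHP5.3 always keeps the leaked memory below a threshold (which depends on the size of the root buffer).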
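Without a PHP5.2 installation, a similar contrast can be approximated on one interpreter by running the same loop with the cycle collector enabled and disabled. This is only a sketch: leakedBytes() and Node are hypothetical names, the default zend.enable_gc=1 at startup is assumed, and the absolute numbers depend on the PHP version and platform.

```php
<?php
// The declared property avoids creating a dynamic property on each object.
class Node
{
    public $self;
}

// Hypothetical helper: creates $count self-referencing objects and returns
// how many bytes are still held right after the loop.
function leakedBytes(bool $gcEnabled, int $count): int
{
    if ($gcEnabled) {
        gc_enable();
    } else {
        gc_disable();
    }
    gc_collect_cycles();                 // start from an empty root buffer
    $base = memory_get_usage();

    for ($i = 0; $i < $count; $i++) {
        $a = new Node();
        $a->self = $a;                   // each iteration orphans the previous cycle
    }
    $leaked = memory_get_usage() - $base;

    unset($a);
    gc_collect_cycles();                 // clean up so the next run starts fresh
    gc_enable();

    return $leaked;
}

$withGc = leakedBytes(true, 50000);
$withoutGc = leakedBytes(false, 50000);

echo "GC enabled:  $withGc bytes\n";
echo "GC disabled: $withoutGc bytes\n";
```

With the collector disabled, every orphaned object stays in memory until the explicit cleanup, so the second figure should be clearly larger than the first.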

Next, the comparison of run time:

<?php
class Foo
{
    public $var = '3.1415962654';
}
 
for ( $i = 0; $i <= 1000000; $i++ )
{
    $a = new Foo;
    $a->self = $a;
}
 
echo memory_get_peak_usage(), "\n";
?>

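This script runs the loop 1,000,000 times, so that the run time is long enough to compare.

Then run the script from the CLI, first with garbage collection turned off and then with it turned on: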

time php -dzend.enable_gc=0 -dmemory_limit=-1 -n example2.php
# and
time php -dzend.enable_gc=1 -dmemory_limit=-1 -n example2.php

On my machine the run times were 6.4s and 7.2s respectively. PHP5.3's garbage collection makes the script somewhat slower, but the impact is small.

You can turn PHP's garbage collection mechanism on or off by setting zend.enable_gc in php.ini, or at run time by calling gc_enable() or gc_disable(). Even when garbage collection is turned off in PHP5.3, PHP still records possible roots in the root buffer; it just does not run garbage collection automatically when the root buffer is full. You can still force a collection manually at any time by calling gc_collect_cycles().
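A minimal sketch of these run-time switches follows. It assumes the collector was enabled at startup (the default) and uses stdClass objects as throwaway cycles; the exact number returned by gc_collect_cycles() varies between PHP versions, so only its sign is used here.

```php
<?php
gc_disable();
$disabled = gc_enabled();            // false: automatic collection is off

// Build a few unreachable cycles while the collector is off.
for ($i = 0; $i < 3; $i++) {
    $cycle = new stdClass();
    $cycle->self = $cycle;
    unset($cycle);
}

$collected = gc_collect_cycles();    // a manual run still works while disabled
gc_enable();
$enabled = gc_enabled();             // true again

var_dump($disabled, $collected > 0, $enabled);
```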
