This chapter is a brief introduction to memory debugging of PHP source code. It is not a complete course: memory debugging is not difficult, but it takes experience and a lot of practice, which you will need anyway whenever you write code in C. We introduce a very well-known memory debugger, valgrind, and show how to use it with PHP to debug memory issues.
Valgrind is a well-known tool, available in many Unix environments, for debugging common memory problems in any software written in C/C++. Valgrind is a versatile front-end for several memory debugging tools. The most commonly used underlying tool is called "memcheck". It works by replacing each libc heap allocation function with its own and keeping track of what you do with the allocated blocks. You may also be interested in "massif": a memory tracker that is useful for understanding a program's overall heap usage.
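To make the idea of "replacing heap allocations and keeping track of them" more concrete, here is a toy sketch in plain C. It is not how memcheck is implemented: the real tool works on the compiled binary and tracks far more than a counter. The names `tracked_malloc`, `tracked_free` and `tracked_outstanding` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks handed out and not yet released. */
static size_t live_blocks = 0;

/* Wrapper around malloc() that records each successful allocation. */
void *tracked_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        live_blocks++;
    }
    return ptr;
}

/* Wrapper around free() that records the release; NULL is ignored. */
void tracked_free(void *ptr)
{
    if (ptr != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
        free(ptr);
    }
}

/* Blocks still allocated: a non-zero value at exit hints at a leak. */
size_t tracked_outstanding(void)
{
    return live_blocks;
}
```

A program that routes its allocations through these wrappers could print `tracked_outstanding()` just before exiting; anything other than zero means some block was never freed, which is the simplest form of what a leak report tells you.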
NOTE
You should read the Valgrind documentation to understand further. It's well written, with some great examples.
To replace the allocation functions, valgrind must launch the program you want to analyze (PHP in this case): you start the valgrind binary and pass it the PHP command line.
Because valgrind replaces and tracks every libc heap allocation, it slows down the debugged program significantly. With PHP the slowdown is not dramatic, but it is clearly noticeable; don't worry, this is normal.
Valgrind is not the only tool you might use, but it is the most commonly used. There are other tools like Dr.Memory, LeakSanitizer, Electric Fence, AddressSanitizer.
Follow these steps to get a good memory debugging experience, reduce the chance of missing a defect, and shorten debugging time:
- You should always use a debug build of PHP. Trying to debug memory in production builds is irrelevant.
- You should always start the debugger with USE_ZEND_ALLOC=0 in the environment. As you may have learned in the Zend Memory Manager chapter, this environment variable disables ZendMM when the current process starts. Doing so is highly recommended when starting the memory debugger: bypassing ZendMM completely makes the traces generated by valgrind much easier to understand.
- It is strongly recommended to start the memory debugger with ZEND_DONT_UNLOAD_MODULES=1 in the environment. This prevents PHP from unloading the extension's .so file at the end of the process, which gives you better valgrind reports: if PHP unloads the extension just before valgrind displays its errors, the report will be incomplete, because the file the information came from is no longer part of the process memory image.
- You may need some suppressions. When you tell PHP not to unload its extensions at the end of the process, you may get false positives in the valgrind output. PHP extensions are checked for leaks, and if you get false positives on your platform you can silence them with a suppressions file (passed to valgrind with --suppressions). Feel free to write your own.
- Compared to the Zend Memory Manager, valgrind is clearly a better tool for finding leaks and other memory-related issues. You should always run valgrind on your code; it is a step virtually every C programmer must perform. Whether your code crashes and you want to find out why, or you run it as a quality check even though nothing looks wrong, valgrind points out hidden flaws that are ready to blow up sooner or later. Use it even if everything seems fine with your code: you might be surprised.
Warning
You must use valgrind (or any memory debugger) on your program. As with any serious C program, it is impossible to be 100% confident without memory debugging. Memory errors can cause harmful security issues and program crashes that depend on many parameters and often appear at random.
Valgrind is a complete heap memory debugger. It can also debug the process memory map and function stacks. See its documentation for more information.
Let's detect dynamic memory leaks, starting with a simple and very common one:
    PHP_RINIT_FUNCTION(pib)
    {
        void *foo = emalloc(128);   /* never released: leaks on every request */

        return SUCCESS;
    }
The code above leaks 128 bytes per request because no efree() call ever releases the buffer. Since the allocation goes through emalloc(), it is handled by the Zend Memory Manager, which will warn us about the leak later, as we saw in the ZendMM chapter. Let's also see whether valgrind notices the leak:
> ZEND_DONT_UNLOAD_MODULES=1 USE_ZEND_ALLOC=0 valgrind --leak-check=full --suppressions=/path/to/suppression --show-reachable=yes --track-origins=yes ~/myphp/bin/php -dextension=pib.so /tmp/foo.php
We use valgrind to start the PHP-CLI process. Let's assume here an extension called "pib". Here is the output:
==28104== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==28104==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28104==    by 0xA3701E: __zend_malloc (zend_alloc.c:2820)
==28104==    by 0xA362E7: _emalloc (zend_alloc.c:2413)
==28104==    by 0xE896F99: zm_activate_pib (pib.c:1880)
==28104==    by 0xA79F1B: zend_activate_modules (zend_API.c:2537)
==28104==    by 0x9D31D3: php_request_startup (main.c:1673)
==28104==    by 0xB5909A: do_cli (php_cli.c:964)
==28104==    by 0xB5A423: main (php_cli.c:1381)
==28104== LEAK SUMMARY:
==28104==    definitely lost: 128 bytes in 1 blocks
==28104==    indirectly lost: 0 bytes in 0 blocks
==28104==    possibly lost: 0 bytes in 0 blocks
==28104==    still reachable: 0 bytes in 0 blocks
==28104==    suppressed: 7,883 bytes in 40 blocks
From our perspective, "definitely lost" is what we have to focus on.
Note
For details on the different fields output by memcheck, see the Valgrind documentation.
Note
We use USE_ZEND_ALLOC=0 to disable and completely bypass the Zend Memory Manager. Every call to its API (e.g. emalloc()) then results directly in a libc call, as the stack frames in the valgrind output show.
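Why bypassing a memory manager helps can be sketched with a toy allocator in plain C. This is only an illustration under simplifying assumptions, not the code in zend_alloc.c: the names `pool_enabled`, `my_alloc`, `my_free` and the fixed 4096-byte arena are all hypothetical. When the pool is active, blocks are carved out of one large region and libc never sees them individually; when it is bypassed, each request becomes exactly one malloc() call that a tool hooking malloc() can follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fixed arena standing in for a memory manager's private pool. */
static union {
    max_align_t   align;          /* forces suitable alignment */
    unsigned char bytes[4096];
} arena;
static size_t arena_used = 0;

/* Decide from an environment value: "0" means "bypass the pool". */
int pool_enabled(const char *env_value)
{
    return !(env_value != NULL && strcmp(env_value, "0") == 0);
}

/* Bump allocation inside the arena: no libc call is made here. */
static void *pool_alloc(size_t size)
{
    const size_t unit = sizeof(max_align_t);
    size_t remaining = sizeof(arena.bytes) - arena_used;

    if (size > remaining) {
        return NULL;
    }
    size_t rounded = (size + unit - 1) / unit * unit;
    if (rounded > remaining) {
        return NULL;
    }
    void *ptr = arena.bytes + arena_used;
    arena_used += rounded;
    return ptr;
}

/* With the pool disabled, every request is one malloc() call. */
void *my_alloc(size_t size, int use_pool)
{
    assert(size > 0);
    return use_pool ? pool_alloc(size) : malloc(size);
}

/* Pool blocks are reclaimed with the arena; libc blocks need free(). */
void my_free(void *ptr, int use_pool)
{
    if (!use_pool) {
        free(ptr);
    }
}
```

A caller would typically compute the flag once at startup, for example `int use_pool = pool_enabled(getenv("MY_USE_POOL"));` (the variable name is made up), and pass it to every allocation.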
Valgrind caught our leak.
That was easy. Now let's generate a leak with a persistent allocation, that is, one that bypasses ZendMM and uses libc's traditional dynamic memory allocation:
    PHP_RINIT_FUNCTION(pib)
    {
        void *foo = malloc(128);   /* never released: leaks on every request */

        return SUCCESS;
    }
This is the report:
==28758== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==28758==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28758==    by 0xE896F82: zm_activate_pib (pib.c:1880)
==28758==    by 0xA79F1B: zend_activate_modules (zend_API.c:2537)
==28758==    by 0x9D31D3: php_request_startup (main.c:1673)
==28758==    by 0xB5909A: do_cli (php_cli.c:964)
==28758==    by 0xB5A423: main (php_cli.c:1381)
It was caught as well.
Note
Valgrind catches everything. Every forgotten byte anywhere in the huge process memory map is reported; nothing slips past it.
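The usual cure for leaks like the two above is to pair every allocation with exactly one release on every exit path. The sketch below shows that pattern in plain C, outside of PHP; `handle_request` and `SCRATCH_SIZE` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 128

/* Hypothetical per-request handler: copies the payload into a scratch
 * buffer and returns its length, or -1 on error. */
int handle_request(const char *payload)
{
    assert(payload != NULL);

    char *scratch = malloc(SCRATCH_SIZE);
    int result = -1;

    if (scratch == NULL) {
        return -1;               /* nothing allocated, nothing to free */
    }

    size_t len = strlen(payload);
    if (len < SCRATCH_SIZE) {
        memcpy(scratch, payload, len + 1);   /* includes the '\0' */
        result = (int)len;
    }

    free(scratch);               /* single release point for all paths */
    return result;
}
```

Because both the success path and the "payload too long" path fall through to the same `free()`, a leak check on a program calling this function should report nothing for it.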
This is a more complex setup. Can you spot the leak in the code below?
    static zend_array ar;

    PHP_MINIT_FUNCTION(pib)
    {
        zend_string *str;
        zval string;

        str = zend_string_init("yo", strlen("yo"), 1);
        ZVAL_STR(&string, str);

        zend_hash_init(&ar, 8, NULL, ZVAL_PTR_DTOR, 1);
        zend_hash_next_index_insert(&ar, &string);

        return SUCCESS;
    }
There are two leaks here. First, we allocate a zend_string but never free it. Second, we initialize a hash table that allocates memory on first insertion, and we never free that either. Let's run it under valgrind and see the results:
==31316== 296 (264 direct, 32 indirect) bytes in 1 blocks are definitely lost in loss record 1 of 2
==32006==    by 0xA3701E: __zend_malloc (zend_alloc.c:2820)
==32006==    by 0xA814B2: zend_hash_real_init_ex (zend_hash.c:133)
==32006==    by 0xA816D2: zend_hash_check_init (zend_hash.c:161)
==32006==    by 0xA83552: _zend_hash_index_add_or_update_i (zend_hash.c:714)
==32006==    by 0xA83D58: _zend_hash_next_index_insert (zend_hash.c:841)
==32006==    by 0xE896AF4: zm_startup_pib (pib.c:1781)
==32006==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==32006==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==32006==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==32006==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==31316== 32 bytes in 1 blocks are indirectly lost in loss record 2 of 2
==31316==    by 0xA3701E: __zend_malloc (zend_alloc.c:2820)
==31316==    by 0xE880B0D: zend_string_alloc (zend_string.h:122)
==31316==    by 0xE880B76: zend_string_init (zend_string.h:158)
==31316==    by 0xE896F9D: zm_activate_pib (pib.c:1781)
==31316==    by 0xA79F1B: zend_activate_modules (zend_API.c:2537)
==31316==    by 0x9D31D3: php_request_startup (main.c:1673)
==31316==    by 0xB5909A: do_cli (php_cli.c:964)
==31316==    by 0xB5A423: main (php_cli.c:1381)
==31316== LEAK SUMMARY:
==31316==    definitely lost: 328 bytes in 2 blocks
As expected, both leaks are reported. As you can see, valgrind is accurate and points you exactly where you need to look.
Fix them now:
    PHP_MSHUTDOWN_FUNCTION(pib)
    {
        zend_hash_destroy(&ar);   /* runs the destructor on every item */

        return SUCCESS;
    }
We destroy the persistent array in MSHUTDOWN, at the end of the PHP process. When we created it, we passed ZVAL_PTR_DTOR as its destructor, so that callback runs on every inserted item. It is the zval destructor: it destroys each zval by inspecting its content. For the IS_STRING type, the destructor releases the zend_string and frees it if necessary. Done.
Note
As you can see, PHP, like any serious C program, is full of nested pointers. The zend_string is encapsulated in a zval, which is itself part of the zend_array. Leaking the array obviously leaks the zval and the zend_string, but the zval is not heap-allocated (we allocated it on the stack), so there is no leak to report for it. You should get used to the fact that forgetting to release a compound structure like zend_array causes many leaks at once, since structures often embed other structures, which embed others, and so on.
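The ownership idea behind a container created with a destructor callback can be sketched in plain C. This is a loose analogy, not the Zend hash table: `owned_list` and its functions are hypothetical names. The container stores pointers, and destroying it first runs the callback on every item, then frees its own storage, so one forgotten destroy call leaks everything reachable from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*item_dtor)(void *item);

/* A tiny growable list that owns the items it stores. */
typedef struct {
    void    **items;
    size_t    count;
    size_t    capacity;
    item_dtor dtor;
} owned_list;

void owned_list_init(owned_list *list, item_dtor dtor)
{
    assert(list != NULL);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
    list->dtor = dtor;
}

/* Returns 0 on success, -1 if memory could not be obtained. */
int owned_list_push(owned_list *list, void *item)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        void **grown = realloc(list->items, new_cap * sizeof(*grown));
        if (grown == NULL) {
            return -1;
        }
        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count++] = item;
    return 0;
}

/* Releases every item through the callback, then the storage itself. */
void owned_list_destroy(owned_list *list)
{
    for (size_t i = 0; i < list->count; i++) {
        if (list->dtor != NULL) {
            list->dtor(list->items[i]);
        }
    }
    free(list->items);
    owned_list_init(list, list->dtor);
}
```

Passing `free` as the destructor is enough for items that are plain heap buffers; items that own further allocations would need their own callback, which is how nested structures end up being released level by level.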
Memory leaks are bad. Sooner or later they make your program trigger an OOM condition, and they dramatically slow down the host, which has less and less free memory as time goes by. Those are the symptoms of a memory leak.
But there is worse: out-of-bounds buffer access. Accessing memory beyond the limits of an allocation is the root of many nasty exploits (such as getting a root shell on your machine), so you should definitely prevent it. Even a small out-of-bounds access often crashes the program through memory corruption. It all depends on the target hardware, the compiler and options used, the OS memory layout, the libc used, and many other factors.
So out-of-bounds accesses are very nasty: they are bombs that may or may not explode, perhaps in a minute or, if you are very lucky, never.
Let's look at a simple example:
    PHP_MINIT_FUNCTION(pib)
    {
        char *foo = malloc(16);
        foo[16] = 'a';   /* one byte past the end of the buffer */
        foo[-1] = 'a';   /* one byte before its start */

        return SUCCESS;
    }
This code allocates a buffer and intentionally writes one byte past its end and one byte before its start. If you run such code, you have roughly a one-in-two chance of crashing, immediately or at some random later point. You may also have created a security hole in PHP, though it may not be remotely exploitable (that remains rare).
Warning
Out-of-bounds access results in undefined behavior. There is no way to predict what will happen, but be sure that it is either bad (immediate crash) or terrible (security issue). Remember that.
Let's ask valgrind, starting it with the exact same command line as before; only the output changes:
==12802== Invalid write of size 1
==12802==    at 0xE896A98: zm_startup_pib (pib.c:1772)
==12802==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==12802==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==12802==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==12802==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==12802==    by 0x9D4541: php_module_startup (main.c:2260)
==12802==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==12802==    by 0xB5A367: main (php_cli.c:1348)
==12802==  Address 0xeb488f0 is 0 bytes after a block of size 16 alloc'd
==12802==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12802==    by 0xE896A85: zm_startup_pib (pib.c:1771)
==12802==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==12802==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==12802==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==12802==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==12802==    by 0x9D4541: php_module_startup (main.c:2260)
==12802==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==12802==    by 0xB5A367: main (php_cli.c:1348)
==12802==
==12802== Invalid write of size 1
==12802==    at 0xE896AA6: zm_startup_pib (pib.c:1773)
==12802==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==12802==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==12802==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==12802==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==12802==    by 0x9D4541: php_module_startup (main.c:2260)
==12802==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==12802==    by 0xB5A367: main (php_cli.c:1348)
==12802==  Address 0xeb488df is 1 bytes before a block of size 16 alloc'd
==12802==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12802==    by 0xE896A85: zm_startup_pib (pib.c:1771)
==12802==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==12802==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==12802==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==12802==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==12802==    by 0x9D4541: php_module_startup (main.c:2260)
==12802==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==12802==    by 0xB5A367: main (php_cli.c:1348)
These two invalid writes have been detected, now your goal is to track them down and fix them.
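Here we used an example where we write to memory out of bounds. This is the worst scenario, because the write, if it succeeds (it may also trigger an immediate SIGSEGV), overwrites some critical area next to that pointer. Since we allocated with libc's malloc(), we overwrite the critical head and tail blocks that libc uses to manage and track its allocations. Depending on many factors (platform, libc used, how it was compiled, and so on), this will lead to a crash.
Valgrind may also report invalid reads. This means you perform a memory read outside the bounds of an allocated pointer. That is a better scenario than overwriting a block, but you still should not access memory areas you are not supposed to; again, this may crash immediately, or later, or never. Don't do that.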
Note
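As soon as you read "Invalid" in valgrind's output, that is really bad for you. Whether it is an invalid read or an invalid write, there is a problem in your code, and you should treat it as high risk: fix it right now.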
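One common way to avoid out-of-bounds writes such as `foo[16]` and `foo[-1]` on a 16-byte buffer is to route writes through a helper that checks the index first. The sketch below is a minimal illustration; `checked_store` is a hypothetical name, and real code would more often restructure its loops than call a helper for every byte.

```c
#include <assert.h>
#include <stddef.h>

/* Stores `value` at buf[index] only when the index is inside the
 * buffer. Returns 0 on success, -1 when the write would be out of
 * bounds. Using size_t for the index means a "negative" index such
 * as -1 arrives as a huge value and is rejected too. */
int checked_store(char *buf, size_t size, size_t index, char value)
{
    assert(buf != NULL);

    if (index >= size) {
        return -1;
    }
    buf[index] = value;
    return 0;
}
```

With a 16-byte buffer, index 15 is the last valid position; index 16 and an index converted from -1 are both refused instead of corrupting neighbouring memory.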
Here is a second example, about string concatenation:
    char *foo = strdup("foo");
    char *bar = strdup("bar");

    char *foobar = malloc(strlen("foo") + strlen("bar"));

    memcpy(foobar, foo, strlen(foo));
    memcpy(foobar + strlen("foo"), bar, strlen(bar));

    fprintf(stderr, "%s", foobar);

    free(foo);
    free(bar);
    free(foobar);
Can you spot the problem?
Let's ask valgrind:
==13935== Invalid read of size 1
==13935==    at 0x4C30F74: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13935==    by 0x768203E: fputs (iofputs.c:33)
==13935==    by 0xE896B91: zm_startup_pib (pib.c:1779)
==13935==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==13935==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==13935==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==13935==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==13935==    by 0x9D4541: php_module_startup (main.c:2260)
==13935==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==13935==    by 0xB5A367: main (php_cli.c:1348)
==13935==  Address 0xeb48986 is 0 bytes after a block of size 6 alloc'd
==13935==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13935==    by 0xE896B14: zm_startup_pib (pib.c:1774)
==13935==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==13935==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==13935==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==13935==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==13935==    by 0x9D4541: php_module_startup (main.c:2260)
==13935==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==13935==    by 0xB5A367: main (php_cli.c:1348)
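Line 1779 points to the fprintf() call. That call ends up in fputs(), which itself calls strlen() (both from libc), and here strlen() performs an invalid read of 1 byte.
We simply forgot the \0 that terminates our string. We pass fprintf() an invalid string. It first tries to compute the string's length by calling strlen(). strlen() then scans the buffer until it finds a \0, and it scans past the buffer's bounds because we forgot to zero-terminate it. We were lucky here: strlen() only read one byte past the end. It could have been many more, and it could have crashed, because we really don't know where the next \0 is in memory; that is random.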
Solution:
    size_t len = strlen("foo") + strlen("bar") + 1;   /* note the +1 for \0 */
    char *foobar = malloc(len);

    /* ... ... same code ... ... */

    foobar[len - 1] = '\0'; /* terminate the string properly */
Note
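The error above is one of the most common in C programming. It is called an off-by-one error: you forgot to allocate just one byte, and because of that you create a whole class of problems in your code.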
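The corrected concatenation can be packaged as a small self-contained helper. This is one possible way to write it, assuming plain libc strings; the function name `concat` is hypothetical. Copying `len_b + 1` bytes in the second memcpy() brings the terminating `\0` of `b` along, so the result is always terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, NUL-terminated copy of a followed by b,
 * or NULL on allocation failure. The caller must free() the result. */
char *concat(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *out = malloc(len_a + len_b + 1);   /* +1 for the '\0' */

    if (out == NULL) {
        return NULL;
    }
    memcpy(out, a, len_a);
    memcpy(out + len_a, b, len_b + 1);       /* also copies b's '\0' */
    return out;
}
```

Calling `concat("foo", "bar")` yields a 7-byte block holding `foobar` plus its terminator, which can safely be passed to functions such as fprintf() or strlen().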
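Finally, here is a last example showing a use-after-free scenario. This is also a very common error in C programming, and it is just as bad as invalid memory accesses: it creates security flaws that can lead to very nasty behaviors. Obviously, valgrind can detect use-after-free. Here is one: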
    char *foo = strdup("foo");
    free(foo);

    memcpy(foo, "foo", sizeof("foo"));
Again, this is a scenario that has nothing to do with PHP itself. We free a pointer and then reuse it afterwards. That is a big mistake. Let's ask valgrind:
==14594== Invalid write of size 1
==14594==    at 0x4C3245C: memcpy@GLIBC_2.2.5 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14594==    by 0xE896AA1: zm_startup_pib (pib.c:1774)
==14594==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==14594==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==14594==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==14594==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==14594==    by 0x9D4541: php_module_startup (main.c:2260)
==14594==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==14594==    by 0xB5A367: main (php_cli.c:1348)
==14594==  Address 0xeb488e0 is 0 bytes inside a block of size 4 free'd
==14594==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14594==    by 0xE896A86: zm_startup_pib (pib.c:1772)
==14594==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==14594==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==14594==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==14594==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==14594==    by 0x9D4541: php_module_startup (main.c:2260)
==14594==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==14594==    by 0xB5A367: main (php_cli.c:1348)
==14594==  Block was alloc'd at
==14594==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14594==    by 0x769E8D9: strdup (strdup.c:42)
==14594==    by 0xE896A70: zm_startup_pib (pib.c:1771)
==14594==    by 0xA774F7: zend_startup_module_ex (zend_API.c:1843)
==14594==    by 0xA77559: zend_startup_module_zval (zend_API.c:1858)
==14594==    by 0xA85AF5: zend_hash_apply (zend_hash.c:1508)
==14594==    by 0xA77B25: zend_startup_modules (zend_API.c:1969)
==14594==    by 0x9D4541: php_module_startup (main.c:2260)
==14594==    by 0xB5802F: php_cli_startup (php_cli.c:427)
==14594==    by 0xB5A367: main (php_cli.c:1348)
Once again, everything here is clear.
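A defensive habit that makes this kind of use-after-free easier to catch is to reset a pointer to NULL at the moment it is freed and to check it before reuse. The sketch below shows the idea; `free_and_clear` and `copy_if_alive` are hypothetical helper names. It only protects the pointer variable that is cleared, not other copies of the same address, so it complements a memory debugger rather than replacing it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees *ptr and resets it to NULL so later code can detect that the
 * buffer is gone instead of silently touching freed memory. */
void free_and_clear(char **ptr)
{
    assert(ptr != NULL);
    free(*ptr);      /* free(NULL) is a no-op, so repeated calls are safe */
    *ptr = NULL;
}

/* Copies src into *dst only while *dst still owns a buffer of
 * dst_size bytes. Returns 0 on success, -1 otherwise. */
int copy_if_alive(char **dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t needed = strlen(src) + 1;

    if (*dst == NULL || needed > dst_size) {
        return -1;
    }
    memcpy(*dst, src, needed);
    return 0;
}
```

After `free_and_clear(&foo)`, a later `copy_if_alive(&foo, 4, "foo")` returns -1 instead of writing into the released block.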
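Use a memory debugger before going to production. As you learned in this chapter, a tiny byte you forget in a computation can lead to an exploitable security hole. It also often (very often) leads to a plain crash. That means your cool extension can take down an entire server (or servers) along with every one of its clients.
C is a very unforgiving programming language. You are given billions of bytes of memory to program with, and you must arrange them to perform some computation. But please don't mess with that power: in the best case (rare) nothing happens, in worse cases (very common) you crash randomly here and there, and in the worst case you create a vulnerability in the program that happens to be remotely exploitable...
You have the tools and you are smart: do take care of the machine's memory.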