
[PHP Extensions and Embedding] Memory Management

WBOY | Original: 2016-07-14 10:08:21 | 1115 views

Memory management

The most important difference between PHP and C is whether you, the programmer, are in control of memory pointers.

Memory

In PHP, setting a string variable is very simple (for example, $str = 'hello world';), and the string can be freely modified, copied, and moved. In C it is another matter. Although you can simply initialize a pointer with a static string, char *str = "hello world";, that string cannot be modified because it lives in the program's code segment. To create a modifiable string, you need to allocate a block of memory and copy the content into it with a function such as strdup().

[cpp]

{
    char *str;

    str = strdup("hello world");
    if (!str) {
        fprintf(stderr, "Unable to allocate memory!");
    }
}

Traditional memory management functions (malloc(), free(), strdup(), realloc(), calloc(), and so on) are, as a rule, not called directly by the PHP source code. This chapter explains why.
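As a minimal sketch of the traditional allocate, modify, release cycle described above, the helper below makes a modifiable heap copy of a string. The name capitalized_copy() is hypothetical, and malloc() plus memcpy() are used in place of strdup() only so the sketch stays within standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, modifiable copy of src
 * with its first character upper-cased, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *capitalized_copy(const char *src)
{
    size_t len;
    char *copy;

    assert(src != NULL);
    len = strlen(src);
    copy = malloc(len + 1);
    if (!copy) {
        return NULL;
    }
    memcpy(copy, src, len + 1);   /* len + 1 copies the terminating NUL */
    if (len > 0) {
        /* Safe to modify: this memory belongs to us, unlike a literal */
        copy[0] = (char)toupper((unsigned char)copy[0]);
    }
    return copy;
}
```

A caller would use it as char *s = capitalized_copy("hello world"); and must call free(s) when done. Forgetting that free() is exactly the kind of leak the following sections discuss.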

Release allocated memory

On nearly all platforms, memory management follows a request-and-release model. The application tells the layer above it (usually the operating system), "I want some memory to use." If space allows, the operating system provides it to the program and keeps a record of the memory it handed out.

When the application is done with the memory, it should return it to the OS so that it can be allocated elsewhere. If the program does not return the memory, the OS has no way of knowing that it is no longer in use, so it cannot allocate it to other processes. If a block of memory is not released and the application that owns it loses its handle to it, we call that a "leak", because nobody can reach the memory directly any more.

In typical client applications, small, infrequent leaks are usually tolerated because the process will terminate after a while, at which point the leaked memory is reclaimed by the OS. This is not because the OS is good at detecting leaked memory; it simply knows that memory allocated to a terminated process will never be used again.

Long-running server-side daemons, including webservers such as Apache, are designed to run for a long time, usually indefinitely. The OS therefore never gets the chance to clean up their memory, and any leak, no matter how small, can accumulate until system resources are exhausted.

Consider the userspace stristr() function. To find a string case-insensitively, it actually creates a lowercase copy of both haystack and needle and then performs an ordinary case-sensitive search to find the relevant offset. Once the offset has been located, the lowercase copies of haystack and needle are no longer needed. If these copies were not released, every script that uses stristr() would leak some memory each time it was called. Eventually the webserver process would own all of the system's memory without being able to use any of it.
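The copy, search, release pattern described for stristr() might look roughly like the following sketch. It is not PHP's actual implementation: ci_find() and lower_copy() are hypothetical names, and plain malloc()/free() stand in for the engine's allocators.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a lower-cased heap copy of s, or NULL. */
static char *lower_copy(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);

    if (!out) {
        return NULL;
    }
    for (size_t i = 0; i <= len; i++) {   /* <= also copies the NUL */
        out[i] = (char)tolower((unsigned char)s[i]);
    }
    return out;
}

/* Returns the offset of needle in haystack, ignoring case, or -1 if it
 * is not found or a temporary copy could not be allocated. */
long ci_find(const char *haystack, const char *needle)
{
    long offset = -1;
    char *h, *n;

    assert(haystack != NULL && needle != NULL);
    h = lower_copy(haystack);
    n = lower_copy(needle);
    if (h && n) {
        const char *hit = strstr(h, n);   /* ordinary case-sensitive search */
        if (hit) {
            offset = (long)(hit - h);
        }
    }
    /* Both temporaries are released on every path; free(NULL) is a no-op.
     * Dropping these two lines would leak two buffers per call. */
    free(n);
    free(h);
    return offset;
}
```

For example, ci_find("Hello World", "WORLD") yields 6, and no memory remains allocated after the call returns.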

The ideal solution is to write good, clean, consistent code that is absolutely correct. In an environment like the PHP interpreter, however, that is only half the solution.

Error handling

To let userspace scripts, and the extension functions they rely on, bail out of an active request, there must be a way to jump out of the request entirely. The Zend Engine handles this by setting a bailout address at the start of the request; whenever die() or exit() is called, or a critical error (E_ERROR) is encountered, it performs a longjmp() to that preset address.

Although this bailout mechanism simplifies program flow, it has a problem: resource cleanup code (such as free() calls) is skipped, which causes leaks. Consider the following simplified version of the engine code that handles a function call:

[cpp]

void call_function(const char *fname, int fname_len TSRMLS_DC)
{
    zend_function *fe;
    char *lcase_fname;

    /* PHP function names are case-insensitive. To simplify locating them
     * in the function table, all names are implicitly lowercased. */
    lcase_fname = estrndup(fname, fname_len);
    zend_str_tolower(lcase_fname, fname_len);

    if (zend_hash_find(EG(function_table),
            lcase_fname, fname_len + 1, (void **)&fe) == SUCCESS) {
        /* Found: execute the function's op_array */
        zend_execute(&fe->op_array TSRMLS_CC);
    } else {
        php_error_docref(NULL TSRMLS_CC, E_ERROR,
                         "Call to undefined function: %s()", fname);
    }
    efree(lcase_fname);
}

When the php_error_docref() line is executed, the internal error handler sees that the error level is critical and calls longjmp() to interrupt the current program flow and leave call_function(), so the efree(lcase_fname) line is never reached. You might think of moving the efree() line above php_error_docref(), but then what about calls that take the first branch (the function name is found and executed normally)? Another point: fname itself is an allocated string, and it is used in the error message, so it cannot be freed until the message has been produced.
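The effect of a bailout on cleanup code can be reproduced outside the engine. The following is a minimal sketch in plain C, assuming a counter of live allocations as a stand-in for a leak detector; every name in it (run_request(), call_function_sketch(), fatal_error(), and so on) is hypothetical and none of it is Zend API.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf bailout_target;   /* set at the start of the "request" */
static int live_blocks = 0;      /* allocations that were never freed */

static void *counted_alloc(size_t n)
{
    void *p = malloc(n);
    if (p) {
        live_blocks++;
    }
    return p;
}

static void counted_free(void *p)
{
    if (p) {
        live_blocks--;
        free(p);
    }
}

/* Stand-in for a fatal error: never returns to its caller. */
static void fatal_error(void)
{
    longjmp(bailout_target, 1);
}

static void call_function_sketch(const char *fname, int found)
{
    size_t len = strlen(fname);
    char *lcase = counted_alloc(len + 1);

    assert(lcase != NULL);
    memcpy(lcase, fname, len + 1);
    if (!found) {
        fatal_error();           /* jumps away: the free below is skipped */
    }
    counted_free(lcase);
}

/* Runs one "request" and returns how many blocks it left unfreed. */
int run_request(const char *fname, int found)
{
    live_blocks = 0;
    if (setjmp(bailout_target) == 0) {
        call_function_sketch(fname, found);
    }
    return live_blocks;
}
```

With found set to 1 the request ends with zero live blocks; with found set to 0 the longjmp() skips counted_free() and one block is left behind, which is the leak the text describes.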

php_error_docref() is the internal equivalent of trigger_error(). The first parameter is an optional documentation reference, which is appended to docref.root if that is enabled in php.ini. The third parameter (counting the thread-safety argument passed by TSRMLS_CC as the second) can be any E_* constant and marks the severity of the error. The fourth and subsequent parameters are a printf()-style format string and its variable argument list.

Zend Memory Management

The solution to memory leaks caused by request bailouts is the Zend Memory Management (ZendMM) layer. This part of the engine plays much the same role the operating system normally plays, allocating memory to the calling application. The difference is that it sits low enough in the process space to be request-aware: when a request dies, it can do what the OS does when a process dies, that is, implicitly release all memory the request allocated. The following figure shows the relationship between ZendMM and the OS in the PHP process:

[Figure: relationship between ZendMM and the OS in the PHP process]

In addition to providing implicit memory cleanup, ZendMM also controls the memory usage of each request according to the memory_limit setting in php.ini. If a script tries to request more memory than the system as a whole has available, or more than remains under its per-process limit, ZendMM automatically raises an E_ERROR message and starts the bailout process. An added benefit is that the result of most memory allocations does not need to be checked, because on failure longjmp() immediately jumps to the shutdown part of the engine.
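To illustrate why callers need no NULL checks under such a scheme, here is a toy allocator that enforces a per-request limit and bails out with longjmp() instead of returning failure. It is a sketch under simplifying assumptions (a fixed-size block table, a single global request); the names limited_alloc() and run_limited_request() are hypothetical and this is not how ZendMM is implemented.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS 16

static jmp_buf request_bailout;        /* set at the start of each request */
static size_t request_limit;           /* toy stand-in for memory_limit */
static size_t request_used;
static void *request_blocks[MAX_BLOCKS];
static int request_block_count;

/* Either returns usable memory or does not return to the caller at all. */
static void *limited_alloc(size_t size)
{
    void *p;

    if (request_block_count == MAX_BLOCKS ||
        size > request_limit - request_used) {
        longjmp(request_bailout, 1);   /* over the limit: bail out */
    }
    p = malloc(size);
    if (!p) {
        longjmp(request_bailout, 1);   /* the system is out of memory */
    }
    request_used += size;
    request_blocks[request_block_count++] = p;
    return p;
}

static void release_request_blocks(void)
{
    while (request_block_count > 0) {
        free(request_blocks[--request_block_count]);
    }
    request_used = 0;
}

/* Allocates `count` blocks of `size` bytes under `limit`.
 * Returns 1 if the request ran to completion, 0 if it bailed out. */
int run_limited_request(size_t limit, size_t size, int count)
{
    assert(count >= 0);
    request_limit = limit;
    request_used = 0;
    request_block_count = 0;

    if (setjmp(request_bailout) != 0) {
        release_request_blocks();      /* implicit cleanup after bailout */
        return 0;
    }
    for (int i = 0; i < count; i++) {
        char *buf = limited_alloc(size);   /* no NULL check needed */
        if (size > 0) {
            buf[0] = '\0';
        }
    }
    release_request_blocks();
    return 1;
}
```

With a limit of 1024 bytes, ten 100-byte blocks fit, while the eleventh request for 100 bytes exceeds the remaining 24 bytes and triggers the bailout path.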

Hooking ZendMM in between PHP's internal code and the OS's real memory management layer requires nothing more complex than having all internal memory allocations go through an alternative set of functions. For example, a 16-byte memory block is not allocated with malloc(16); PHP code should use emalloc(16) instead. Besides performing the actual allocation, ZendMM tags the block with information about the request it belongs to, so that it can implicitly release the block when the request bails out.
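One way to picture this per-request bookkeeping is a header placed in front of every block that links it into a list owned by the current request. The sketch below assumes that design purely for illustration; toy_emalloc(), toy_efree() and toy_request_shutdown() are hypothetical names, and the real ZendMM uses its own, more elaborate heap structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bookkeeping header stored in front of every block (toy version). */
typedef union toy_header {
    struct {
        union toy_header *prev;
        union toy_header *next;
    } link;
    max_align_t align;           /* keeps the payload suitably aligned */
} toy_header;

static toy_header *toy_blocks = NULL;   /* blocks owned by the request */

void *toy_emalloc(size_t size)
{
    toy_header *h;

    if (size > SIZE_MAX - sizeof *h) {
        return NULL;
    }
    h = malloc(sizeof *h + size);
    if (!h) {
        return NULL;
    }
    h->link.prev = NULL;
    h->link.next = toy_blocks;
    if (toy_blocks) {
        toy_blocks->link.prev = h;
    }
    toy_blocks = h;
    return h + 1;                /* the caller sees only the payload */
}

void toy_efree(void *ptr)
{
    toy_header *h;

    if (!ptr) {
        return;
    }
    h = (toy_header *)ptr - 1;   /* step back to the header */
    if (h->link.prev) {
        h->link.prev->link.next = h->link.next;
    } else {
        toy_blocks = h->link.next;
    }
    if (h->link.next) {
        h->link.next->link.prev = h->link.prev;
    }
    free(h);
}

/* Frees every block the request still owns; returns how many there were. */
int toy_request_shutdown(void)
{
    int freed = 0;

    while (toy_blocks) {
        toy_header *next = toy_blocks->link.next;
        free(toy_blocks);
        toy_blocks = next;
        freed++;
    }
    assert(toy_blocks == NULL);
    return freed;
}
```

If a request allocates three blocks, frees one explicitly, and then bails out, toy_request_shutdown() reclaims the remaining two without the code that allocated them ever running again.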

Memory often needs to be allocated and used beyond the lifetime of a single request. Such allocations are called persistent because they persist after the request ends, and they are made with the traditional memory allocators because ZendMM cannot tag them with per-request information. Sometimes, however, it is only known at run time whether a particular allocation needs to be persistent, so ZendMM exposes a set of helper macros that stand in for the other allocation functions but take an additional parameter at the end indicating persistence.

If you really want a persistent allocation, set this parameter to 1, in which case the request is passed on to the traditional malloc() family of allocators. If the run-time logic determines that the block does not need to persist, set this parameter to 0, and the call is redirected to the per-request allocator functions.

For example, pemalloc(buffer_len, 1) maps to malloc(buffer_len), and pemalloc(buffer_len, 0) maps to emalloc(buffer_len), as follows:

[cpp]

/* Defined in Zend/zend_alloc.h: */
#define pemalloc(size, persistent) \
    ((persistent) ? malloc(size) : emalloc(size))

The allocator functions provided by ZendMM are listed below, next to the traditional allocator each one corresponds to.

Traditional allocator                        Allocator in PHP

void *malloc(size_t count);                  void *emalloc(size_t count);
                                             void *pemalloc(size_t count, char persistent);

void *calloc(size_t nmemb, size_t size);     void *ecalloc(size_t nmemb, size_t size);
                                             void *pecalloc(size_t nmemb, size_t size, char persistent);

void *realloc(void *ptr, size_t count);      void *erealloc(void *ptr, size_t count);
                                             void *perealloc(void *ptr, size_t count, char persistent);

char *strdup(const char *s);                 char *estrdup(const char *s);
                                             char *pestrdup(const char *s, char persistent);

void free(void *ptr);                        void efree(void *ptr);
                                             void pefree(void *ptr, char persistent);

You may have noticed that pefree() requires the persistence flag to be passed in. This is because pefree() has no way of knowing, when it is called, whether ptr was allocated persistently. Calling free() on a non-persistent allocation may cause a double free, while calling efree() on a persistent allocation will usually cause a segfault, because the memory manager tries to read management information that does not exist. Your code needs to remember whether the data structures it allocated are persistent.
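A common way to "remember" persistence is to store the flag next to the pointer it describes. The sketch below assumes that approach; sk_buffer and the sk_* allocators are hypothetical stand-ins that only mimic the shape of pemalloc() and pefree(), so that the example compiles without the PHP headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the two allocator families, with counters for illustration */
static int request_live = 0;
static int persistent_live = 0;

static void *sk_emalloc(size_t n)
{
    void *p = malloc(n);
    if (p) { request_live++; }
    return p;
}
static void sk_efree(void *p) { request_live--; free(p); }

static void *sk_malloc(size_t n)
{
    void *p = malloc(n);
    if (p) { persistent_live++; }
    return p;
}
static void sk_free(void *p) { persistent_live--; free(p); }

#define sk_pemalloc(n, persistent) \
    ((persistent) ? sk_malloc(n) : sk_emalloc(n))
#define sk_pefree(p, persistent) \
    ((persistent) ? sk_free(p) : sk_efree(p))

typedef struct {
    char *data;
    size_t len;
    char persistent;   /* remembered so release matches allocation */
} sk_buffer;

/* Returns 0 on success, -1 if memory could not be allocated. */
int sk_buffer_init(sk_buffer *buf, const char *src, char persistent)
{
    assert(buf != NULL && src != NULL);
    buf->len = strlen(src);
    buf->persistent = persistent;
    buf->data = sk_pemalloc(buf->len + 1, persistent);
    if (!buf->data) {
        return -1;
    }
    memcpy(buf->data, src, buf->len + 1);
    return 0;
}

void sk_buffer_destroy(sk_buffer *buf)
{
    assert(buf != NULL);
    if (buf->data) {
        /* The stored flag routes the block back to the right allocator */
        sk_pefree(buf->data, buf->persistent);
        buf->data = NULL;
    }
}
```

Because the flag travels with the structure, the code that destroys a buffer never has to guess which allocator produced it.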

In addition to the core allocator, ZendMM also adds special functions:

[cpp]

void *estrndup(void *ptr, int len);

It allocates len + 1 bytes of memory and copies len bytes from ptr into the newly allocated block. estrndup() behaves roughly as follows:

[cpp]

void *estrndup(void *ptr, int len)

{

char *dst = emalloc(len + 1);

memcpy(dst, ptr, len);

dst[len] = 0;

return dst;

}

The terminating NUL byte is quietly placed at the end of the buffer. This ensures that any function that uses estrndup() to duplicate a string does not have to worry about passing the result to functions that expect a NUL-terminated string, such as printf(). When estrndup() is used to copy non-string data, that last byte is wasted, but compared with the convenience it brings, this small waste is negligible.

[cpp]

void *safe_emalloc(size_t size, size_t count, size_t addtl);

void *safe_pemalloc(size_t size, size_t count, size_t addtl, char persistent);

The size allocated by these two functions is ((size * count) + addtl). You may ask, "Why add such a function? Why not use emalloc()/pemalloc() and do the calculation yourself?" The reason is in the name: "safe". Although it is rare, the result of the calculation can exceed the integer limit of the host platform, and the consequences are bad: it may lead to an attempt to allocate a negative number of bytes or, worse, to a successful allocation of a positive size that is smaller than what the caller expects. safe_emalloc() avoids this pitfall by checking for integer overflow and explicitly reporting failure if it occurs.
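The overflow check itself is simple to sketch in portable C. The helpers below are hypothetical (checked_total() and checked_alloc() are not Zend functions), and they return NULL on overflow where the text says safe_emalloc() reports failure explicitly; only the arithmetic is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Computes size * count + addtl into *total.
 * Returns 0 on success, -1 if the result would not fit in a size_t. */
int checked_total(size_t size, size_t count, size_t addtl, size_t *total)
{
    assert(total != NULL);
    if (size != 0 && count > SIZE_MAX / size) {
        return -1;                       /* size * count would wrap */
    }
    if (addtl > SIZE_MAX - size * count) {
        return -1;                       /* adding addtl would wrap */
    }
    *total = size * count + addtl;
    return 0;
}

/* malloc() wrapper that refuses to allocate a wrapped-around size. */
void *checked_alloc(size_t size, size_t count, size_t addtl)
{
    size_t total;

    if (checked_total(size, count, addtl, &total) != 0) {
        return NULL;
    }
    return malloc(total);
}
```

Without the check, a request such as SIZE_MAX / 2 elements of 3 bytes each would silently wrap around to a much smaller number, and the caller would then write past the end of an undersized buffer.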

Not every memory allocation routine has a p* counterpart. For example, pestrndup() and safe_pemalloc() did not exist before PHP 5.1. Sometimes you need to work around such gaps in the Zend API.

Reference Count

Carefully allocating and releasing memory is very important in a long-running, multi-request process like PHP, but it is only half the job. For a highly concurrent server to be efficient, each request needs to use as little memory as possible and avoid unnecessary copies of data. Consider the following PHP code snippet:

[php]

$a = 'Hello World';

$b = $a;

unset($a);

After the first line, a variable has been created and given a 12-byte memory block holding the string "Hello World" and its trailing NUL. Now look at the next two lines: $b is set to the same value as $a, and then $a is unset (released).

If PHP treated every variable assignment as a reason to copy the variable's contents, an extra 12 bytes would be copied for the duplicate string, with the extra processor load that goes with it. Once the third line appears, this looks a little ridiculous: the original variable is unset, so the copy was completely unnecessary. Now take it a step further and think about what happens when the contents of a 10MB file are loaded into the two variables. That would take 20MB of memory when 10MB is enough. Would the engine really waste so much time and memory on such useless work?

You already know that PHP is smarter than that.

Remember that in the engine a variable's name and its value are two separate concepts. The value itself is a nameless zval *, which is attached to the name $a using zend_hash_add(). So why shouldn't two variable names point to the same value?

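[cpp]

{
    zval *helloval;

    MAKE_STD_ZVAL(helloval);
    ZVAL_STRING(helloval, "Hello World", 1);
    /* Attach the same zval * to both names */
    zend_hash_add(EG(active_symbol_table), "a", sizeof("a"),
                  &helloval, sizeof(zval *), NULL);
    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"),
                  &helloval, sizeof(zval *), NULL);
}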

At this point, if you inspect $a or $b, you will see that both do contain the string "Hello World". Unfortunately, the third line comes next: unset($a);. Here unset() does not know that the data $a points to is also referenced by another name; it simply releases the memory. Any later access to $b will read memory that has already been released and crash the engine. You certainly do not want the engine to crash.

This is solved by the third member of the zval: refcount. When a variable is first created, its refcount is initialized to 1, because only the variable it was created for points to it. When your code gets to assigning helloval to $b, it needs to increase refcount to 2, because the value is now "referenced" by two variables.

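[cpp]

{
    zval *helloval;

    MAKE_STD_ZVAL(helloval);
    ZVAL_STRING(helloval, "Hello World", 1);
    zend_hash_add(EG(active_symbol_table), "a", sizeof("a"),
                  &helloval, sizeof(zval *), NULL);
    /* A second name is about to point at the value: refcount becomes 2 */
    ZVAL_ADDREF(helloval);
    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"),
                  &helloval, sizeof(zval *), NULL);
}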

Now, when unset() removes $a's link to the value, it sees from refcount that something else is still interested in the data, so it simply decrements refcount by 1 and does nothing more.
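The same bookkeeping can be modelled without the engine. The sketch below is a toy, assuming a stripped-down value type with just a string and a counter; toy_value and its functions are hypothetical and are not the zval API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *str;
    unsigned refcount;      /* how many names point at this value */
} toy_value;

static int values_destroyed = 0;   /* for illustration only */

/* "$a = 'Hello World';" creates a value owned by one name. */
toy_value *toy_value_new(const char *s)
{
    toy_value *v;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    v = malloc(sizeof *v);
    if (!v) {
        return NULL;
    }
    v->str = malloc(len + 1);
    if (!v->str) {
        free(v);
        return NULL;
    }
    memcpy(v->str, s, len + 1);
    v->refcount = 1;
    return v;
}

/* "$b = $a;" shares the value instead of copying the string. */
toy_value *toy_value_addref(toy_value *v)
{
    assert(v != NULL);
    v->refcount++;
    return v;
}

/* "unset($a);" drops one name; the data is freed only with the last one. */
void toy_value_release(toy_value *v)
{
    assert(v != NULL && v->refcount > 0);
    if (--v->refcount == 0) {
        free(v->str);
        free(v);
        values_destroyed++;
    }
}
```

Releasing the first name leaves the string intact for the second; only the second release actually frees memory.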

Copy on write

Saving memory by reference counting is a good idea, but what do you do when you only want to modify one of the variables? Consider the following code snippet:

[php]

$a = 1;

$b = $a;

$b += 5;

Look at the logic of this code. Afterwards, $a is expected to still equal 1, and $b to equal 6. You now know that, to save as much memory as possible, Zend has $a and $b point to the same zval after the second line, so what happens when the third line is reached? Will $a be modified as well?

The answer is that Zend looks at refcount, sees that it is greater than 1, and isolates the value, an operation the engine calls separation. Separation in the Zend Engine means breaking up a reference pair, the opposite of the process you just saw:

[cpp]

zval *get_var_and_separate(char *varname, int varname_len TSRMLS_DC)
{
    zval **varval, *varcopy;

    if (zend_hash_find(EG(active_symbol_table),
            varname, varname_len + 1, (void **)&varval) == FAILURE) {
        /* Variable does not exist */
        return NULL;
    }
    if ((*varval)->refcount < 2) {
        /* Only one name refers to this value; no isolation needed */
        return *varval;
    }
    /* Otherwise, make a shallow copy of the zval itself */
    MAKE_STD_ZVAL(varcopy);
    *varcopy = **varval;
    /* Deep-copy any structures the zval owns (strings, arrays, ...) */
    zval_copy_ctor(varcopy);
    /* Break the link between varname and varval.
     * This decrements varval's reference count by 1. */
    zend_hash_del(EG(active_symbol_table), varname, varname_len + 1);
    /* Initialize the reference count of the new value
     * and associate the new value with varname */
    varcopy->refcount = 1;
    varcopy->is_ref = 0;
    zend_hash_add(EG(active_symbol_table), varname, varname_len + 1,
                  &varcopy, sizeof(zval *), NULL);
    /* Return the new zval * */
    return varcopy;
}

Now that the engine has a zval * that is only referenced by the $b variable, it can convert it to a long and increase its value by 5 as requested by the script.
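Stripped of the symbol table, the copy-on-write step comes down to a few lines. The following toy model assumes a value holding only a long and a reference count; cow_value, cow_new() and cow_separate() are hypothetical names, not engine functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long lval;
    unsigned refcount;
} cow_value;

cow_value *cow_new(long n)
{
    cow_value *v = malloc(sizeof *v);
    if (v) {
        v->lval = n;
        v->refcount = 1;
    }
    return v;
}

/* Ensures *slot points at a value that no other name shares, copying it
 * first if necessary. Returns that value, or NULL if the copy failed. */
cow_value *cow_separate(cow_value **slot)
{
    cow_value *copy;

    assert(slot != NULL && *slot != NULL);
    if ((*slot)->refcount < 2) {
        return *slot;            /* already exclusive: write in place */
    }
    copy = cow_new((*slot)->lval);
    if (!copy) {
        return NULL;
    }
    (*slot)->refcount--;         /* this name lets go of the shared value */
    *slot = copy;                /* and now owns a private copy */
    return copy;
}
```

Modelling the script above: after $b = $a both slots point at one value with a refcount of 2; calling cow_separate() on $b's slot before the += 5 gives $b its own copy, so $a stays at 1 while $b becomes 6.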

Change on write

The concept of reference counting also creates a new way of maintaining data, which user-space scripts call "references". Consider the following user-space code snippet:

[php]

$a = 1;

$b = &$a;

$b += 5;

From your experience with PHP, you may intuitively expect $a to be 6 now, even though it was initialized to 1 and never (directly) modified. This happens because when the engine increments $b by 5, it notices that $b is a reference to $a and says, "It's fine for me to change this value without isolating it, because I want every referencing variable to see the change."

But how does the engine know? Quite simply, it looks at the last element of the zval structure: is_ref. This is a simple on/off flag that says whether the zval is part of a userspace reference set or is a plain value. In the previous snippet, after the first line executes, the zval created for $a has a refcount of 1 and an is_ref of 0, because it belongs to a single variable ($a) and no other variable has a reference to it. When the second line executes, the zval's refcount is raised to 2, and this time, because the script used an ampersand (&) to request a full reference, is_ref is set to 1.

Finally, on the third line, the engine fetches the zval associated with $b and checks whether it needs to be isolated. This time it is not isolated, because of a check that was left out of the earlier listing. Where get_var_and_separate() checks refcount, there is one more condition:

[cpp]

if ((*varval)->is_ref || (*varval)->refcount < 2) {

/* varname will not be isolated only if it is really a reference, or is only referenced by one variable */

return *varval;

}

At this time, even if refcount is 2, isolation processing will be short-circuited, because this value is passed by reference. The engine can modify it freely without worrying about other variables that reference it being accidentally modified.
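The short-circuit can be modelled the same way as copy-on-write, with one extra flag. This is a toy sketch under the same assumptions as before; ref_value, ref_bind() and ref_separate() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;    /* 1 when the names form a reference set */
} ref_value;

ref_value *ref_new(long n)
{
    ref_value *v = malloc(sizeof *v);
    if (v) {
        v->lval = n;
        v->refcount = 1;
        v->is_ref = 0;
    }
    return v;
}

/* "$b = &$a;" for a value that is not shared copy-on-write. */
ref_value *ref_bind(ref_value *v)
{
    assert(v != NULL && (v->refcount == 1 || v->is_ref));
    v->is_ref = 1;
    v->refcount++;
    return v;
}

/* Same idea as separation on write, plus the is_ref short-circuit. */
ref_value *ref_separate(ref_value **slot)
{
    ref_value *copy;

    assert(slot != NULL && *slot != NULL);
    if ((*slot)->is_ref || (*slot)->refcount < 2) {
        return *slot;        /* a reference set is modified in place */
    }
    copy = ref_new((*slot)->lval);
    if (!copy) {
        return NULL;
    }
    (*slot)->refcount--;
    *slot = copy;
    return copy;
}
```

Here a write through $b's slot returns the very value $a points at, so adding 5 is visible through both names.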

Isolation issues

For these copies and references, there are some combinations that is_ref and refcount cannot handle well. Consider the following code:

[php]

$a = 1;

$b = $a;

$c = &$a;

Here a single value needs to be associated with three variables: two of them ($a and $c) form a change-on-write reference set, while the third ($b) is in a separate copy-on-write context. How can this relationship be described using only is_ref and refcount?

The answer is: it can't. In this case the value must be duplicated into two separate zval*s, even though both contain the same data. As shown below:

Similarly, the following code block will cause the same conflict and force the value to be isolated into a copy (as shown below)

[php]

$a = 1;

$b = &$a;

$c = $a;

Note that in both cases here, $b is associated with the original zval object, because when isolation occurs, the engine does not know the name of the third variable involved in the operation.
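Both conflicts can be traced with a toy model of the two assignment forms. The sketch below is only an approximation of the behaviour described in this section, not engine code: mix_value, mix_assign_copy() and mix_assign_ref() are hypothetical names, and the model ignores everything except a long payload, refcount and is_ref.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} mix_value;

static mix_value *mix_new(long n)
{
    mix_value *v = malloc(sizeof *v);
    if (v) {
        v->lval = n;
        v->refcount = 1;
        v->is_ref = 0;
    }
    return v;
}

/* "$dst = $src;": a plain value is shared, but a value that belongs to
 * a reference set must be duplicated for the new name. */
mix_value *mix_assign_copy(mix_value *src)
{
    assert(src != NULL);
    if (src->is_ref) {
        return mix_new(src->lval);
    }
    src->refcount++;
    return src;
}

/* "$dst = &$src;": *src_slot is the value the source name points at.
 * A value shared copy-on-write is isolated first, so the other plain
 * names keep the original and the source name moves to the copy. */
mix_value *mix_assign_ref(mix_value **src_slot)
{
    mix_value *v;

    assert(src_slot != NULL && *src_slot != NULL);
    v = *src_slot;
    if (!v->is_ref && v->refcount > 1) {
        mix_value *copy = mix_new(v->lval);
        if (!copy) {
            return NULL;
        }
        v->refcount--;
        *src_slot = v = copy;
    }
    v->is_ref = 1;
    v->refcount++;
    return v;
}
```

Running the first snippet through this model ($b = $a; then $c = &$a;) leaves $b on the original value with a refcount of 1, while $a and $c share a new value with is_ref set. In the second snippet, mix_assign_copy() hands $c a duplicate and $a and $b keep the original.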

Summary

PHP is a managed language. On the userspace side, this careful control of resources and memory means easier prototyping and fewer crashes. Once you dig below the surface and into the engine, however, those guarantees no longer apply: it is up to you, as a responsible developer, to maintain the integrity of the runtime environment.
