Anyone who uses PHP knows that the php.ini configuration stays in effect for the entire SAPI life cycle. If you edit the ini file by hand while a PHP script is executing, the change does not take effect. If you cannot restart Apache or nginx at that moment, the only option is to call ini_set explicitly in the PHP code. ini_set is a function PHP provides for modifying the configuration dynamically. Note that a setting made with ini_set and a setting made in the ini file have different lifetimes: once the PHP script finishes executing, the ini_set settings become invalid immediately.
Therefore, this article is divided into two parts. The first part explains the principle of php.ini configuration, and the second part talks about dynamically modifying the php configuration.
The php.ini configuration roughly involves three pieces of data: configuration_hash, EG(ini_directives), and the module globals accessed through PG, BG, PCRE_G, JSON_G, XXX_G, and so on. It doesn't matter if these three kinds of data are unfamiliar; they are explained in detail below.
1. Parsing the INI configuration file
Since php.ini has to be in effect for the whole SAPI life cycle, parsing the ini file and building the PHP configuration from it must happen at the very start of the SAPI. In other words, it must occur during PHP startup. PHP needs these settings to exist internally before any actual request arrives.
In the PHP core, this corresponds to the php_module_startup function.
php_module_startup is mainly responsible for starting PHP, and it is usually called when the SAPI starts. Another common function is php_request_startup, which initializes each request when it arrives. php_module_startup and php_request_startup are two landmark steps, but analyzing them is beyond the scope of this article.
For example, when PHP is loaded as a module under Apache, then when Apache starts, all of its modules are activated, including the PHP module. Activating the PHP module calls php_module_startup. php_module_startup does a great deal of work. Once the call returns, PHP has been started and can accept requests and respond to them.
Within php_module_startup, the ini file is parsed by a call to the php_init_config function. The parsing mainly performs lexical and grammar analysis, and extracts and saves the key and value pairs in the ini file. The format of php.ini is very simple, with the key on the left side of the equal sign and the value on the right side. Whenever a key-value pair is extracted, where does PHP store it? The answer is the configuration_hash mentioned earlier.
static HashTable configuration_hash;
configuration_hash is declared in php_ini.c and is a data structure of type HashTable. As the name suggests, it is a hash table. As an aside, extensions could not reach configuration_hash in versions before PHP 5.3, because it is a static variable in php_ini.c. PHP 5.3 added the php_ini_get_configuration_hash interface, which returns &configuration_hash directly, so PHP extensions can now easily inspect configuration_hash, which is a real convenience.
Note four points:
First, php_init_config performs no validation beyond the lexical and syntax checks. In other words, if we add the line hello=world to the ini file, then as long as it is a correctly formatted configuration item, the final configuration_hash will contain an element with the key hello and the value world. configuration_hash mirrors the ini file as faithfully as possible.
Second, the ini file allows array-style configuration. For example, write the three lines drift.arr[]=1, drift.arr[]=2, and drift.arr[]=3 in the ini file. The generated configuration_hash will then contain an element with the key drift.arr, whose value is an array containing the three numbers 1, 2, and 3. This is an extremely rare way to configure things.
Third, PHP also allows additional ini files besides the default php.ini file (php-%s.ini, to be precise). These ini files are placed in an extra directory, which is specified by the environment variable PHP_INI_SCAN_DIR. After php_init_config has parsed php.ini, it scans this directory and parses every .ini file it finds there. The key-value pairs from these additional ini files are also added to configuration_hash.
This feature is occasionally useful. If we develop a PHP extension ourselves but don't want to mix its configuration into php.ini, we can write a separate ini file and tell PHP where to find it through PHP_INI_SCAN_DIR. The drawback is just as obvious: it depends on an extra environment variable being set. A better solution is for developers to call php_parse_user_ini_file or zend_parse_ini_file themselves in the extension to parse the corresponding ini file.
Fourth, in configuration_hash the key is a string, so what is the type of the value? The answer is also a string (except for the very special array case mentioned above). Take, for example, a configuration with the two lines log_errors = Off and log_errors_max_len = 1024.
The key-value pairs actually stored in the final configuration_hash are then:
key: "log_errors"
val : ""
key: "log_errors_max_len"
val : "1024"
Pay attention to log_errors: the value stored for it is not even "0", it is a genuinely empty string. Likewise, log_errors_max_len is not stored as a number but as the string "1024".
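The points above, that values are stored as unvalidated strings, can be illustrated with a minimal sketch in C. It is not the PHP parser: kv_table, kv_add_line and kv_find are hypothetical names, the table is a flat array rather than a HashTable, and the sketch ignores sections, comments, arrays, and the normalization of Off to an empty string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_LEN 64

/* Hypothetical stand-in for configuration_hash: a flat table of strings. */
typedef struct {
    char key[MAX_LEN];
    char val[MAX_LEN];
} kv_pair;

typedef struct {
    kv_pair items[MAX_ENTRIES];
    size_t count;
} kv_table;

/* Copy src[0..len) into dst with surrounding whitespace removed. */
static void copy_trimmed(char *dst, const char *src, size_t len)
{
    while (len > 0 && isspace((unsigned char)src[0])) { src++; len--; }
    while (len > 0 && isspace((unsigned char)src[len - 1])) { len--; }
    if (len >= MAX_LEN) { len = MAX_LEN - 1; }
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Store one "key = value" line. Returns 0 on success, -1 if the line has
 * no '=' or the table is full. The value is kept as a string and is not
 * checked against any list of known settings. */
int kv_add_line(kv_table *t, const char *line)
{
    assert(t != NULL && line != NULL);
    const char *eq = strchr(line, '=');
    if (eq == NULL || t->count >= MAX_ENTRIES) {
        return -1;
    }
    kv_pair *p = &t->items[t->count];
    copy_trimmed(p->key, line, (size_t)(eq - line));
    copy_trimmed(p->val, eq + 1, strlen(eq + 1));
    t->count++;
    return 0;
}

/* Look up a key; returns the stored string, or NULL when absent. */
const char *kv_find(const kv_table *t, const char *key)
{
    assert(t != NULL && key != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->items[i].key, key) == 0) {
            return t->items[i].val;
        }
    }
    return NULL;
}
```

Feeding it the line hello = world stores the pair even though no module knows a setting called hello, and 1024 comes back from kv_find as a string, not as a number.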
At this point in the analysis, basically everything related to parsing the ini file has been explained clearly. To briefly summarize:
1. Parsing ini occurs in the php_module_startup stage
2. The parsing results are stored in configuration_hash.
2. Applying the configuration to modules
The general structure of PHP can be seen as the zend engine at the bottom, which is responsible for interacting with the OS, compiling PHP code, managing memory, and so on. Many modules sit on top of the zend engine. The central one is the Core module, and others include Standard, PCRE, Date, Session, and so on. These modules are also called PHP extensions. Put simply, each module provides a set of functions for developers to call. For example, commonly used built-in functions such as explode, trim, array, etc. are provided by the Standard module.
We need to discuss this because php.ini contains, in addition to settings for PHP itself, that is, for the Core module (such as safe_mode, display_errors, max_execution_time, etc.), quite a few settings for other modules.
For example, the date module provides the common date, time, strtotime and other functions, and php.ini has settings of its own for it, such as date.timezone.
Besides these modules having their own configurations, the zend engine is also configurable, but it has very few configurable items: only error_reporting, zend.enable_gc and detect_unicode.
As mentioned in the previous section, php_module_startup calls php_init_config to parse the ini file and build configuration_hash. What does php_module_startup do next? Naturally, it applies the configuration in configuration_hash to the different modules such as Zend, Core, Standard, and SPL. This does not happen in a single step, because PHP usually contains many modules, and they are started one after another during PHP startup. The configuration of module A therefore takes place while module A itself is starting.
Readers with experience in extension development will point out right away that module A is started in PHP_MINIT_FUNCTION(A).
Yes. If module A needs to be configured, it can call REGISTER_INI_ENTRIES() in PHP_MINIT_FUNCTION to do so. REGISTER_INI_ENTRIES looks up, by the name of each configuration item the current module requires, the value the user set in configuration_hash, and updates the module's own global space with it.
2.1. Global space of a module
To understand how the ini configuration is applied from configuration_hash to each module, we first need to understand the global space of a PHP module. Each PHP module can set aside a storage area of its own, and this area is globally visible within the module. It is generally used to store the ini configuration the module requires. In other words, the configuration items in configuration_hash eventually end up in this global space. While the module is executing, it only needs to access this global space directly to get the user's settings for the module. The space is also often used to record intermediate data during the module's execution.
Let's take the bcmath module as an example. bcmath is a PHP module that provides an interface for mathematical calculations. Its ini declaration contains a single configuration item, bcmath.scale, which is what we set in php.ini to configure the bcmath module.
Next, look at the global space definition of the bcmath module. php_bcmath.h declares it with the ZEND_BEGIN_MODULE_GLOBALS(bcmath) and ZEND_END_MODULE_GLOBALS(bcmath) macros, which expand to a struct type named zend_bcmath_globals.
zend_bcmath_globals is the type of the global space in the bcmath module. The header only declares the structure; the actual instance is defined in bcmath.c:
// After expansion, this is: zend_bcmath_globals bcmath_globals;
ZEND_DECLARE_MODULE_GLOBALS(bcmath)
It can be seen that the definition of the variable bcmath_globals is completed with ZEND_DECLARE_MODULE_GLOBALS.
bcmath_globals is a real global space, which contains four fields. Its last field, bc_precision, corresponds to bcmath.scale in the ini configuration. We set the value of bcmath.scale in php.ini, and then when starting the bcmath module, the value of bcmath.scale is updated to bcmath_globals.bc_precision.
Copying the values in configuration_hash into the xxx_globals variable defined by each module is what "applying the ini configuration to the module" means. Once the module has started, these settings are in place. In the subsequent execution phase, the PHP module therefore does not need to access configuration_hash again. It only needs to access its own XXX_globals to get the configuration set by the user.
Besides the one field for the ini configuration item, what are the other three fields of bcmath_globals for? This is the second role of the module global space: in addition to holding ini configuration, it can also store data produced while the module is executing.
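The idea of filling a module's globals once at startup and reading them directly afterwards can be sketched in C as follows. All names (demo_globals_t, demo_module_startup, demo_current_scale) are hypothetical and only loosely modeled on the bcmath example; the declared default "0" is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical module globals: one field mirrors an ini setting,
 * the others hold runtime state unrelated to ini. */
typedef struct {
    long scale;    /* filled from the module's ini setting */
    long reads;    /* runtime bookkeeping */
    int started;   /* set once startup has run */
} demo_globals_t;

demo_globals_t demo_globals;

/* Startup step: convert the string taken from the ini table (or the
 * declared default "0" when the user set nothing) into the typed field. */
void demo_module_startup(const char *user_value)
{
    const char *effective = (user_value != NULL) ? user_value : "0";
    demo_globals.scale = strtol(effective, NULL, 10);
    demo_globals.reads = 0;
    demo_globals.started = 1;
}

/* Execution phase: read the typed field only, with no lookup by name.
 * Reading the globals before startup would be a bug in this sketch. */
long demo_current_scale(void)
{
    assert(demo_globals.started);
    demo_globals.reads++;
    return demo_globals.scale;
}
```

After startup, every call to demo_current_scale is a plain struct access, which is the reason the string table never needs to be consulted again during execution.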
Another example is the json module, which is also very commonly used in PHP. The json module does not require any ini configuration, and its global space has only one field, error_code. error_code records the error that occurred in the last execution of json_decode or json_encode. The json_last_error function returns this error_code to help users locate the cause of the error.
To make module global variables easy to access, PHP uses a set of macros by convention. For example, to access error_code in json_globals we could write json_globals.error_code directly (which does not work in a multi-threaded environment), but the more general way is to go through the JSON_G macro.
We use JSON_G(error_code) to access json_globals.error_code. The beginning of this article mentioned PG, BG, JSON_G, PCRE_G, XXX_G, and so on. These macros are very common in the PHP source code, and they are now easy to understand: PG accesses the global variables of the Core module, BG accesses those of the Standard module, and PCRE_G accesses those of the PCRE module.
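As a rough illustration of such an accessor macro, here is a sketch of the single-threaded case only. DEMO_JSON_G, demo_json_globals and the two functions are hypothetical names; the real JSON_G also has a thread-safe variant that this sketch does not reproduce.

```c
#include <assert.h>

typedef struct {
    int error_code;   /* last error recorded by the module */
} demo_json_globals_t;

/* Single-threaded build: one plain global instance. */
demo_json_globals_t demo_json_globals;

/* Accessor macro in the style of JSON_G: callers name the field only.
 * A thread-safe build could redefine this one macro to go through
 * per-thread storage without touching any call site. */
#define DEMO_JSON_G(v) (demo_json_globals.v)

/* Record the outcome of an operation, as a decode routine might.
 * Error codes are assumed to be non-negative in this sketch. */
void demo_set_error(int code)
{
    assert(code >= 0);
    DEMO_JSON_G(error_code) = code;
}

/* Counterpart of json_last_error in this sketch. */
int demo_last_error(void)
{
    return DEMO_JSON_G(error_code);
}
```

The design point is indirection: because every access goes through the macro, the storage behind it can change without any change to the code that reads or writes the field.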
2.2. How to determine what configuration a module requires?
Each module defines the INI configuration it requires. The Core module's configuration item definitions, for example, can be found in the php-src\main\main.c file at about line 450. They involve many macros, including ZEND_INI_BEGIN, ZEND_INI_END, PHP_INI_ENTRY_EX, STD_PHP_INI_BOOLEAN, etc. This article will not go through them one by one; interested readers can analyze them on their own.
After macro expansion, we can see that defining configuration items essentially amounts to defining an array of type zend_ini_entry. The fields of the zend_ini_entry structure have the following meanings:
char *value; // The value of the configuration item
uint value_length;
char *orig_value; // The original value of the configuration item
uint orig_value_length;
int orig_modifiable; // The original modifiable flag of the configuration item
int modified; // Whether it has been modified; if so, orig_value holds the value before modification
void (*displayer)(zend_ini_entry *ini_entry, int type);
};
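The orig_value and modified fields suggest how a runtime change can be undone later. The following is a minimal sketch of that bookkeeping, under the assumption that the startup value is put back once a request is over, as the introduction's remark about ini_set implies. demo_ini_entry, demo_alter and demo_restore are hypothetical names, and the strings are borrowed rather than copied to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down entry with only the fields discussed above. */
typedef struct {
    const char *value;       /* current value */
    const char *orig_value;  /* value before the first runtime change */
    int modified;            /* non-zero once a runtime change happened */
} demo_ini_entry;

/* Runtime change: remember the startup value the first time only,
 * so repeated changes never overwrite the saved original. */
void demo_alter(demo_ini_entry *e, const char *new_value)
{
    assert(e != NULL && new_value != NULL);
    if (!e->modified) {
        e->orig_value = e->value;
        e->modified = 1;
    }
    e->value = new_value;
}

/* End of request: put the startup value back if anything changed. */
void demo_restore(demo_ini_entry *e)
{
    assert(e != NULL);
    if (e->modified) {
        e->value = e->orig_value;
        e->orig_value = NULL;
        e->modified = 0;
    }
}
```

Saving the original only on the first change is what makes the restore step correct no matter how many times the value is altered in between.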
2.3. Applying the configuration to a module: REGISTER_INI_ENTRIES
REGISTER_INI_ENTRIES can often be seen in the PHP_MINIT_FUNCTION of different extensions. It is mainly responsible for two things. First, it fills the module's global space XXX_G by synchronizing the values in configuration_hash into it. Second, it also builds EG(ini_directives).
REGISTER_INI_ENTRIES is also a macro; after expansion it is a call to the zend_register_ini_entries function. Let's look at the implementation of zend_register_ini_entries:
        // If not found in configuration_hash, the default value is used
        if (!config_directive_success && hashed_ini_entry->on_modify) {
            hashed_ini_entry->on_modify(hashed_ini_entry, hashed_ini_entry->value, hashed_ini_entry->value_length, hashed_ini_entry->mh_arg1, hashed_ini_entry->mh_arg2, hashed_ini_entry->mh_arg3, ZEND_INI_STAGE_STARTUP TSRMLS_CC);
        }
        p++;
    }
    return SUCCESS;
}
To put it simply, the logic of the above code can be expressed as:
1. Add the ini configuration items declared by the module to EG(ini_directives). Note that the value of an ini configuration item may be modified later.
2. Try to find each ini item the module requires in configuration_hash.
If it is found, the user configured this value in the ini file, and the user's configuration is used.
If it is not found, that is fine, because the module supplies a default value when it declares the ini item.
3. Synchronize the value of the ini item to XXX_G. After all, it is these XXX_globals that are used while PHP executes. Concretely, this is done by calling the on_modify function that belongs to each ini item. on_modify is specified by the module when it declares the ini item.
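The three steps above can be sketched in C roughly as follows. This is a simplified model, not zend_register_ini_entries itself: the entries array stands in for EG(ini_directives), a plain array of pairs stands in for configuration_hash, the handler signature is reduced to two arguments, and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: receives the effective string value. */
typedef int (*demo_on_modify)(const char *value, void *target);

typedef struct {
    const char *name;
    const char *value;        /* starts out as the declared default */
    demo_on_modify on_modify;
    void *target;             /* where the typed copy lives */
} demo_entry;

typedef struct {
    const char *key;
    const char *val;
} demo_user_setting;

/* Look up a user-supplied value by name; NULL when not configured. */
const char *demo_lookup(const demo_user_setting *cfg, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cfg[i].key, name) == 0) {
            return cfg[i].val;
        }
    }
    return NULL;
}

/* For every declared entry: prefer the user's value, otherwise keep the
 * default, then push the effective value to the module via on_modify.
 * Returns the number of entries overridden by the user. */
int demo_register_entries(demo_entry *entries, size_t n_entries,
                          const demo_user_setting *cfg, size_t n_cfg)
{
    assert(entries != NULL || n_entries == 0);
    int overridden = 0;
    for (size_t i = 0; i < n_entries; i++) {
        demo_entry *e = &entries[i];
        const char *user = demo_lookup(cfg, n_cfg, e->name);
        if (user != NULL) {
            e->value = user;
            overridden++;
        }
        if (e->on_modify != NULL) {
            e->on_modify(e->value, e->target);
        }
    }
    return overridden;
}

/* Example handler: parse a decimal string into a long field. */
int demo_update_long(const char *value, void *target)
{
    *(long *)target = strtol(value, NULL, 10);
    return 0;
}
```

Note that a user setting nobody declared, such as hello=world, is simply never looked up, so it has no effect on any module.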
Let's take a closer look at on_modify, which is a function pointer. Consider the declarations of two specific Core module settings. For log_errors, on_modify is set to OnUpdateBool, and for log_errors_max_len, on_modify is set to OnUpdateLong.
Further assume that php.ini sets log_errors = On and log_errors_max_len = 1024.
Let's look at the OnUpdateBool function in detail:
    // p is the address of core_globals plus the offset of the log_errors field,
    // that is, the address of the log_errors field itself
    p = (zend_bool *) (base + (size_t) mh_arg1);
    if (new_value_length == 2 && strcasecmp("on", new_value) == 0) {
        *p = (zend_bool) 1;
    }
    else if (new_value_length == 3 && strcasecmp("yes", new_value) == 0) {
        *p = (zend_bool) 1;
    }
    else if (new_value_length == 4 && strcasecmp("true", new_value) == 0) {
        *p = (zend_bool) 1;
    }
    else {
        // The value stored in configuration_hash is the string "1", not "On",
        // so atoi is used here to convert it into the number 1
        *p = (zend_bool) atoi(new_value);
    }
    return SUCCESS;
}
The most puzzling parts are probably mh_arg1 and mh_arg2. Checked against the zend_ini_entry definition mentioned above, they are easy to understand: mh_arg1 is the byte offset, and mh_arg2 is the address of XXX_globals. The result of (char *)mh_arg2 + mh_arg1 is therefore the address of a field in XXX_globals. In this case, it is the address of log_errors in core_globals. So when OnUpdateBool is finally executed, its effect is equivalent to assigning 1 to core_globals.log_errors.
With OnUpdateBool analyzed, OnUpdateLong is clear at a glance:
    // Get the address of log_errors_max_len
    p = (long *) (base + (size_t) mh_arg1);
    // Convert "1024" into a long and assign it to core_globals.log_errors_max_len
    *p = zend_atol(new_value, new_value_length);
    return SUCCESS;
}
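The base-plus-offset addressing used by both handlers can be reproduced in a small self-contained C sketch. It uses offsetof to obtain the byte offset that plays the role of mh_arg1 and a pointer to the struct in the role of mh_arg2. The struct and the demo_ functions are hypothetical and far simpler than the real handlers, which take more arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical globals with the two fields used in the text. */
typedef struct {
    unsigned char log_errors;
    long log_errors_max_len;
} demo_core_globals;

/* Field address = base address of the struct + byte offset,
 * the same arithmetic as (char *)mh_arg2 + mh_arg1. */
static void *demo_field_addr(void *base, size_t offset)
{
    return (char *)base + offset;
}

/* Case-insensitive string equality (portable stand-in for strcasecmp). */
static int demo_ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Bool-style handler: "on", "yes" and "true" mean 1; anything else
 * goes through atoi, so "1" gives 1 and "" gives 0. */
int demo_update_bool(const char *new_value, size_t offset, void *base)
{
    assert(new_value != NULL && base != NULL);
    unsigned char *p = demo_field_addr(base, offset);
    if (demo_ieq(new_value, "on") || demo_ieq(new_value, "yes") ||
        demo_ieq(new_value, "true")) {
        *p = 1;
    } else {
        *p = (unsigned char)atoi(new_value);
    }
    return 0;
}

/* Long-style handler: parse the string as a decimal number. */
int demo_update_long(const char *new_value, size_t offset, void *base)
{
    assert(new_value != NULL && base != NULL);
    long *p = demo_field_addr(base, offset);
    *p = strtol(new_value, NULL, 10);
    return 0;
}
```

Because the handler only receives an offset and a base address, the same function can update any boolean or long field of any module's globals, which is why a handful of generic handlers is enough for most settings.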
Finally, note that in zend_register_ini_entries, if a configuration exists in configuration_hash, the value and value_length in hashed_ini_entry are updated when on_modify is called. In other words, if the user has configured the item in php.ini, EG(ini_directives) stores the value actually configured. If the user has not configured it, EG(ini_directives) stores the default value given when the zend_ini_entry was declared.
The default_value variable in zend_register_ini_entries is poorly named and can easily cause misunderstanding. In fact, default_value does not represent the default value, but the value actually configured by the user.
3. Summary
At this point, the three pieces of data, configuration_hash, EG(ini_directives), and PG, BG, PCRE_G, JSON_G, XXX_G..., have all been explained.
To summarize:
1. configuration_hash stores the configuration from the php.ini file. No validation is performed, and its values are strings.
2. EG(ini_directives) stores the zend_ini_entry items defined in each module. If the user has configured an item in php.ini (so that it exists in configuration_hash), its value is replaced by the value in configuration_hash, and the type is still a string.
3. XXX_G is a macro used to access the global space of a module. This memory can be used to store ini configuration and is updated through the function specified by on_modify. Its data types are determined by the field declarations in XXX_globals.