
PHP Kernel Introduction and Extension Development Guide: Basic Knowledge

WBOY
2016-07-21 15:24:41

1. Basic knowledge
This chapter briefly introduces some of the internal mechanisms of the Zend engine. This knowledge is closely related to Extensions and can also help us write more efficient PHP code.
 1.1 Storage of PHP variables
 1.1.1 zval structure
Zend uses the zval structure to store the value of PHP variables. The structure is as follows:


typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount;
    zend_uchar type;            /* active type */
    zend_uchar is_ref;
};

typedef struct _zval_struct zval;

Zend determines which member of value to access based on the type field. The possible values are as follows:

Type          Member of value
IS_NULL       N/A
IS_LONG       value.lval
IS_DOUBLE     value.dval
IS_STRING     value.str
IS_ARRAY      value.ht
IS_OBJECT     value.obj
IS_BOOL       value.lval
IS_RESOURCE   value.lval

Two interesting points can be seen from this table. First, a PHP array is actually a HashTable, which explains why PHP can support associative arrays. Second, a resource is a long value, which usually stores a pointer, an index into an internal array, or some other value that only its creator can interpret; it can be regarded as a handle.
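
As a minimal sketch of how a type tag selects the union member, the following self-contained C model mirrors the layout above on a smaller scale. The names toy_zval, TOY_IS_* and toy_describe are hypothetical and exist only for illustration; they are not part of the Zend API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type tags, modeled on IS_NULL, IS_LONG and so on. */
enum toy_type { TOY_IS_NULL, TOY_IS_LONG, TOY_IS_DOUBLE, TOY_IS_STRING, TOY_IS_BOOL };

typedef struct {
    union {
        long lval;                                  /* TOY_IS_LONG and TOY_IS_BOOL */
        double dval;                                /* TOY_IS_DOUBLE */
        struct { const char *val; int len; } str;   /* TOY_IS_STRING */
    } value;
    enum toy_type type;                             /* says which member is valid */
} toy_zval;

/* Write a printable form of z into buf and return buf. */
const char *toy_describe(const toy_zval *z, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    switch (z->type) {
    case TOY_IS_NULL:
        snprintf(buf, size, "null");
        break;
    case TOY_IS_LONG:
        snprintf(buf, size, "long(%ld)", z->value.lval);
        break;
    case TOY_IS_DOUBLE:
        snprintf(buf, size, "double(%g)", z->value.dval);
        break;
    case TOY_IS_STRING:
        snprintf(buf, size, "string(%d) \"%.*s\"",
                 z->value.str.len, z->value.str.len, z->value.str.val);
        break;
    case TOY_IS_BOOL:
        /* A boolean reuses lval, just as IS_BOOL does in the table. */
        snprintf(buf, size, "bool(%s)", z->value.lval ? "true" : "false");
        break;
    }
    return buf;
}
```

Reading any member other than the one named by type gives meaningless data, which is why every access goes through a switch of this kind.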

1.1.2 Reference counting

Reference counting is widely used in garbage collection, memory pools, strings and so on, and Zend implements a typical reference-counting scheme. Multiple PHP variables can share the same zval through this mechanism. The remaining two members of zval, is_ref and refcount, support this sharing.

Obviously, refcount is the counter. Whenever a reference to the zval is added or removed, this value is incremented or decremented accordingly. Once it drops to zero, Zend frees the zval.

What about is_ref?

1.1.3 zval status

PHP has two kinds of variables, reference and non-reference, and Zend stores both using reference counting. Non-reference variables must be independent of each other: modifying one must not affect the others. Sharing a zval appears to conflict with this requirement, and the conflict is resolved with copy-on-write. When a variable is about to be written and Zend finds that its zval is shared by several variables, it copies the zval into a new one with a refcount of 1, points the variable at the copy, and decrements the refcount of the original. This process is called "zval separation". Reference variables have the opposite requirement: variables assigned by reference must stay bound together, so modifying one modifies all of them.

Zend therefore needs to record the state of each zval so that it can handle the two cases differently, and is_ref serves this purpose. It indicates whether the variables currently pointing to the zval were assigned by reference: either all of them are references or none of them is. When a variable is modified, Zend performs copy-on-write only if the is_ref of its zval is 0, that is, only if the zval is not a reference.
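
The separation rule described here can be sketched in a few lines of plain C. This is a toy model under stated assumptions, not Zend's implementation: the value is reduced to a single long, and toy_zval, toy_var, toy_share and toy_write are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long value;
    unsigned int refcount;
    unsigned char is_ref;
} toy_zval;

/* A variable is just a slot that points at a (possibly shared) toy_zval. */
typedef struct {
    toy_zval *zv;
} toy_var;

toy_zval *toy_zval_new(long value)
{
    toy_zval *zv = malloc(sizeof *zv);
    assert(zv != NULL);
    zv->value = value;
    zv->refcount = 1;
    zv->is_ref = 0;
    return zv;
}

/* Make dst share the zval of src and count the new user. */
void toy_share(toy_var *dst, const toy_var *src)
{
    src->zv->refcount++;
    dst->zv = src->zv;
}

/* Write through a variable, separating the zval first if necessary. */
void toy_write(toy_var *var, long value)
{
    toy_zval *zv = var->zv;
    if (!zv->is_ref && zv->refcount > 1) {
        /* Shared and not a reference: copy-on-write ("zval separation"). */
        zv->refcount--;
        var->zv = toy_zval_new(zv->value);
    }
    /* Either we own a private copy now, or the zval is a reference and
       every bound variable is supposed to see the new value. */
    var->zv->value = value;
}
```

With two variables sharing a non-reference zval, a write through one of them leaves the other untouched and ends with two zvals of refcount 1. If is_ref is set instead, the same write is visible through both variables.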

1.1.4 zval state switching

When every assignment involving a zval is by reference, or every one is by value, is_ref alone is enough to cope. However, the world is not always so tidy, and PHP cannot impose such a restriction on users. When reference and non-reference assignments are mixed, special handling is required.

Case I: consider PHP code consisting of four assignment statements that involve the variables a, b, c and d.

The first three statements point a, b and c to a single zval with is_ref=1 and refcount=3. The fourth statement is a non-reference assignment to d. Normally this would only require incrementing the reference count. However, the target zval is a reference, so simply incrementing the count would be wrong: d would become bound to a, b and c. Zend's solution is to create a separate copy of the zval for d.
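
Case I can be traced with a small C model. It is only a sketch: the value is a single long, the destination of each assignment is assumed to be a fresh variable, and the names toy_assign_ref and toy_assign_value are hypothetical rather than Zend functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long value;
    unsigned int refcount;
    unsigned char is_ref;
} toy_zval;

typedef struct {
    toy_zval *zv;
} toy_var;

toy_zval *toy_zval_new(long value)
{
    toy_zval *zv = malloc(sizeof *zv);
    assert(zv != NULL);
    zv->value = value;
    zv->refcount = 1;
    zv->is_ref = 0;
    return zv;
}

/* Reference assignment: bind dst to the zval of src and mark it is_ref. */
void toy_assign_ref(toy_var *dst, toy_var *src)
{
    /* This sketch only covers a source that is not shared by value. */
    assert(src->zv->is_ref || src->zv->refcount == 1);
    src->zv->is_ref = 1;
    src->zv->refcount++;
    dst->zv = src->zv;
}

/* Non-reference assignment: share a plain zval, but copy a reference one. */
void toy_assign_value(toy_var *dst, toy_var *src)
{
    if (src->zv->is_ref) {
        /* Sharing would bind dst to the reference set, so copy instead. */
        dst->zv = toy_zval_new(src->zv->value);
    } else {
        src->zv->refcount++;
        dst->zv = src->zv;
    }
}
```

Creating a, binding b and c to it with toy_assign_ref, and then calling toy_assign_value for d leaves a, b and c on one zval with is_ref=1 and refcount=3, while d owns a separate zval with is_ref=0 and refcount=1.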

1.1.5 Parameter passing

Passing parameters to PHP functions works the same way as variable assignment: passing by value is equivalent to a non-reference assignment, and passing by reference is equivalent to a reference assignment. Either may trigger zval state switching. This will be mentioned later.

 1.2 HashTable structure

 HashTable is the most important and widely used data structure in Zend engine. It is used to store almost everything.

 1.2.1 Data structure

The HashTable data structure is defined as follows:

typedef struct bucket {
    ulong h;                    /* hash value */
    uint nKeyLength;
    void *pData;                /* points to the value, which is a copy of the user data */
    void *pDataPtr;
    struct bucket *pListNext;   /* pListNext and pListLast link every bucket of the */
    struct bucket *pListLast;   /* HashTable into one doubly linked list */
    struct bucket *pNext;       /* pNext and pLast link the buckets that share a */
    struct bucket *pLast;       /* hash slot into a doubly linked list */
    char arKey[1];              /* key; must be the last member */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;         /* hash array */
    dtor_func_t pDestructor;    /* set when the HashTable is initialized; called when a Bucket is destroyed */
    zend_bool persistent;       /* whether to use the C memory allocation routines */
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

In general, Zend's HashTable is a chained hash table that is also optimized for linear traversal.


HashTable contains two data structures: a chained hash table and a doubly linked list. The former provides fast key-value lookup, and the latter makes linear traversal and sorting convenient. Every Bucket belongs to both structures at the same time.

Several notes on this data structure:

• Why are doubly linked lists used for the hash chains?
A typical chained hash table only operates by key, so singly linked lists are enough. However, Zend sometimes needs to delete a given Bucket from its chain, which a doubly linked list makes very efficient.

• What does nTableMask do?
This value converts a hash value into an index into the arBuckets array. When initializing a HashTable, Zend first allocates nTableSize slots for the arBuckets array. nTableSize is the smallest power of two (2^n) that is not less than the size requested by the user, so in binary it is a 1 followed by zeros (100...0). nTableMask = nTableSize - 1, which in binary is a run of ones (011...1). Consequently h & nTableMask always falls in [0, nTableSize - 1], and Zend uses it as the index into arBuckets.

• What does pDataPtr do?
Normally, when the user inserts a key-value pair, Zend copies the value and points pData at the copy. The copy requires a call to Zend's internal routine emalloc to allocate memory, which is a relatively expensive operation and consumes more memory than the value itself (the extra memory stores a cookie). If the value is small, this is quite wasteful. Since HashTable is mostly used to store pointer values, Zend introduces pDataPtr: when the value is as small as a pointer, Zend copies it directly into pDataPtr and points pData at pDataPtr. This avoids the emalloc call and also helps improve the cache hit rate.

• Why is the size of arKey only 1? Why not use a pointer to manage the key?
arKey is the array that stores the key, but its declared size is only 1, which is not enough to hold a key. The following code can be found in the routine that adds an element to a HashTable:

    p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);

Zend allocates a single block of memory that is large enough for both the Bucket and the key. The upper part is the Bucket and the lower part is the key, and arKey "happens" to be the last member of Bucket, so the key can be accessed through arKey. This technique is most common in memory management routines: when memory is requested, a block larger than the requested size is actually allocated, and the extra upper part, usually called a cookie, stores information about the block such as its size and pointers to the previous and next blocks. Baidu's Transmit program uses this method.

The key is not managed through a pointer in order to save one emalloc call and to improve the cache hit rate. Another essential reason is that the key is fixed in most cases, so the Bucket never has to be reallocated because the key has grown. This also explains why the value is not stored inline in the same way: the value is mutable.
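
Several of the points above (a power-of-two table size, indexing with h & nTableMask, and a key stored inline after the bucket) can be sketched together in a small chained hash table. This is an illustrative model, not Zend's code: toy_bucket, toy_table and the toy_* functions are hypothetical, the value is a plain long, plain malloc stands in for pemalloc, and the times-33 string hash is simply chosen here for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct toy_bucket {
    unsigned long h;            /* full hash of the key */
    unsigned int nKeyLength;    /* key length including the trailing NUL */
    long value;                 /* payload (a real table stores void *) */
    struct toy_bucket *pNext;   /* next bucket in the same slot */
    char arKey[1];              /* must be last: the key bytes follow */
} toy_bucket;

typedef struct {
    unsigned int nTableSize;    /* always a power of two */
    unsigned int nTableMask;    /* nTableSize - 1 */
    toy_bucket **arBuckets;
} toy_table;

/* Smallest power of two that is not less than n (at least 1). */
unsigned int toy_round_up_pow2(unsigned int n)
{
    unsigned int size = 1;
    assert(n <= (1u << 30));    /* keeps the loop below from overflowing */
    while (size < n)
        size <<= 1;
    return size;
}

unsigned long toy_hash(const char *key, unsigned int len)
{
    unsigned long h = 5381;
    unsigned int i;
    for (i = 0; i < len; i++)
        h = h * 33 + (unsigned char)key[i];
    return h;
}

int toy_table_init(toy_table *ht, unsigned int size_hint)
{
    ht->nTableSize = toy_round_up_pow2(size_hint);
    ht->nTableMask = ht->nTableSize - 1;
    ht->arBuckets = calloc(ht->nTableSize, sizeof *ht->arBuckets);
    return ht->arBuckets != NULL;
}

/* Insert without checking for duplicates; returns the bucket or NULL. */
toy_bucket *toy_table_add(toy_table *ht, const char *key, long value)
{
    unsigned int len = (unsigned int)strlen(key) + 1;
    unsigned int idx;
    /* One allocation holds the bucket and its key. arKey[1] already
       accounts for one byte of the key, hence the "- 1". */
    toy_bucket *p = malloc(sizeof(toy_bucket) - 1 + len);
    if (p == NULL)
        return NULL;
    p->h = toy_hash(key, len);
    p->nKeyLength = len;
    p->value = value;
    memcpy(p->arKey, key, len);
    idx = (unsigned int)(p->h & ht->nTableMask);    /* always < nTableSize */
    p->pNext = ht->arBuckets[idx];
    ht->arBuckets[idx] = p;
    return p;
}

toy_bucket *toy_table_find(const toy_table *ht, const char *key)
{
    unsigned int len = (unsigned int)strlen(key) + 1;
    unsigned long h = toy_hash(key, len);
    toy_bucket *p = ht->arBuckets[h & ht->nTableMask];
    for (; p != NULL; p = p->pNext) {
        if (p->h == h && p->nKeyLength == len && memcmp(p->arKey, key, len) == 0)
            return p;
    }
    return NULL;
}
```

For a size hint of 10, toy_round_up_pow2 returns 16 and the mask is 15, so every index falls in [0, 15]. In C99 and later, a flexible array member (char arKey[];) expresses the same inline-key idea without the "- 1" adjustment.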
1.2.2 PHP Array
One question about HashTable remains unanswered: what does nNextFreeElement do?
Unlike an ordinary hash table, Zend's HashTable lets the user specify the hash value directly and omit the key altogether (in which case nKeyLength is 0). HashTable also supports an append operation, where the user supplies only the value and not even a hash value. In that case Zend uses nNextFreeElement as the hash and then increments nNextFreeElement.
This behavior looks strange, because such a value cannot be looked up by key and the structure hardly behaves like a hash at all. The key to understanding it is that PHP arrays are implemented with HashTable. Associative arrays add elements with the normal key-value mapping, and their keys are the strings supplied by the user. Non-associative arrays use the array subscript directly as the hash value, and there is no key. nNextFreeElement is needed when associative and non-associative elements are mixed in one array, or when an operation such as array_push is used.
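
The role of nNextFreeElement can be illustrated with a deliberately simplified model that ignores hashing and only tracks integer keys. The names toy_array, toy_index_update and toy_next_insert are hypothetical. The rule that an explicit subscript at or beyond nNextFreeElement moves it to that subscript plus one is an assumption based on how PHP arrays behave, and negative subscripts are left out.

```c
#include <assert.h>

#define TOY_CAPACITY 16

typedef struct {
    unsigned long keys[TOY_CAPACITY];   /* integer subscripts, used as hash */
    long values[TOY_CAPACITY];
    unsigned int nNumOfElements;
    unsigned long nNextFreeElement;     /* subscript the next append will use */
} toy_array;

void toy_array_init(toy_array *a)
{
    a->nNumOfElements = 0;
    a->nNextFreeElement = 0;
}

/* Store value under the explicit integer subscript h. */
void toy_index_update(toy_array *a, unsigned long h, long value)
{
    unsigned int i;
    for (i = 0; i < a->nNumOfElements; i++) {
        if (a->keys[i] == h) {          /* existing element: overwrite */
            a->values[i] = value;
            return;
        }
    }
    assert(a->nNumOfElements < TOY_CAPACITY);
    a->keys[i] = h;                     /* i == nNumOfElements here */
    a->values[i] = value;
    a->nNumOfElements++;
    if (h >= a->nNextFreeElement)
        a->nNextFreeElement = h + 1;    /* appends continue after h */
}

/* Append: no key and no hash supplied. Returns the subscript used. */
unsigned long toy_next_insert(toy_array *a, long value)
{
    unsigned long h = a->nNextFreeElement;
    toy_index_update(a, h, value);
    return h;
}
```

Two appends to an empty toy_array use subscripts 0 and 1. After an explicit store at subscript 5, the next append uses subscript 6.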
Now consider the value. The values of a PHP array use the general-purpose zval structure: pData points to a zval*, and, as described in the previous section, this zval* is stored directly in pDataPtr. Because zval is used directly, array elements can be of any PHP type.
Array traversal operations such as foreach and each walk the doubly linked list of the HashTable, using pInternalPointer as a cursor that records the current position.
 1.2.3 Variable symbol table
Besides arrays, HashTable is also used to store many other kinds of data, such as PHP functions, variable symbols, loaded modules and class members.
A variable symbol table is equivalent to an associative array: its keys are variable names (so very long variable names are not a good idea) and its values are zval*.
At any time, PHP code can see two variable symbol tables, symbol_table and active_symbol_table. The former stores global variables and is called the global symbol table. The latter is a pointer to the currently active variable symbol table, which is usually the global symbol table. However, each time execution enters a PHP function (meaning a function written by the user in PHP code), Zend creates a symbol table local to that function and points active_symbol_table at it. Zend always accesses variables through active_symbol_table, which is how the scope of local variables is enforced.
If a function accesses a variable declared as global, Zend handles it specially: it creates, in active_symbol_table, a reference to the variable of the same name in symbol_table. If symbol_table has no variable of that name, it is created first.
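
The interplay of the two tables can be modeled in plain C. This is a rough sketch under simplifying assumptions: a symbol table is a small fixed array instead of a HashTable, values are longs instead of zval*, and toy_executor, toy_fetch and toy_bind_global are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 8

typedef struct {
    const char *name;
    long *slot;                         /* where the value actually lives */
} toy_symbol;

typedef struct {
    toy_symbol syms[TOY_MAX_VARS];
    long storage[TOY_MAX_VARS];         /* values owned by this table */
    int count;
} toy_symtab;

typedef struct {
    toy_symtab symbol_table;            /* global symbol table */
    toy_symtab *active_symbol_table;    /* table used for every access */
} toy_executor;

/* Return the symbol called name, or NULL if the table has none. */
toy_symbol *toy_find(toy_symtab *t, const char *name)
{
    int i;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->syms[i].name, name) == 0)
            return &t->syms[i];
    }
    return NULL;
}

/* Return the symbol called name, creating it with value 0 if needed. */
toy_symbol *toy_fetch(toy_symtab *t, const char *name)
{
    toy_symbol *s = toy_find(t, name);
    if (s == NULL) {
        assert(t->count < TOY_MAX_VARS);
        s = &t->syms[t->count];
        s->name = name;
        s->slot = &t->storage[t->count];
        *s->slot = 0;
        t->count++;
    }
    return s;
}

/* Model of declaring a variable global inside a function. */
void toy_bind_global(toy_executor *eg, const char *name)
{
    /* Create the global first if it does not exist yet. */
    toy_symbol *g = toy_fetch(&eg->symbol_table, name);
    toy_symbol *l = toy_fetch(eg->active_symbol_table, name);
    l->slot = g->slot;                  /* the local name now aliases the global */
}
```

Entering a function corresponds to pointing active_symbol_table at a fresh toy_symtab with count set to 0, and leaving it corresponds to pointing it back at symbol_table. A write made through a name bound with toy_bind_global is then visible in the global table after the function returns.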
 1.3 Memory and files
The resources owned by a program generally include memory and files. For ordinary programs these resources belong to the process: when the process ends, the operating system or the C library automatically reclaims any resources that were not explicitly released.
PHP programs are different because they are page-based. A page also acquires resources such as memory or files while it runs, but when the page finishes, the operating system or C library does not necessarily know that these resources should be reclaimed. For example, suppose PHP is compiled into Apache as a module and Apache runs in prefork or worker mode. The Apache process or thread is then reused, and memory allocated by a PHP page stays allocated for as long as that process lives.
To solve this problem, Zend provides a set of memory allocation APIs. They behave like the corresponding C functions, except that they allocate from Zend's own memory pool and support automatic, page-based reclamation. In a module, memory allocated on behalf of a page should use these APIs rather than the C routines; otherwise Zend will try to efree that memory at the end of the page, and the result is usually a crash.
emalloc()
efree()
estrdup()
estrndup()
ecalloc()
erealloc()
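
To make page-based reclamation concrete, here is a toy request-scoped allocator in plain C. It is not Zend's memory manager: toy_pool, toy_emalloc, toy_efree and toy_request_shutdown are hypothetical names, and the only idea borrowed from the text is that each block carries a header (a "cookie") that links it into a per-request list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every user block. */
typedef union toy_block {
    struct {
        union toy_block *prev;
        union toy_block *next;
        size_t size;
    } hdr;
    long double force_align;    /* keeps the user area suitably aligned */
} toy_block;

typedef struct {
    toy_block *head;            /* blocks still owned by the current request */
    size_t live_blocks;
} toy_pool;

void *toy_emalloc(toy_pool *pool, size_t size)
{
    toy_block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->hdr.size = size;
    b->hdr.prev = NULL;
    b->hdr.next = pool->head;
    if (pool->head != NULL)
        pool->head->hdr.prev = b;
    pool->head = b;
    pool->live_blocks++;
    return b + 1;               /* user memory starts right after the header */
}

void toy_efree(toy_pool *pool, void *ptr)
{
    toy_block *b;
    if (ptr == NULL)
        return;
    assert(pool->live_blocks > 0);
    b = (toy_block *)ptr - 1;   /* step back to the header */
    if (b->hdr.prev != NULL)
        b->hdr.prev->hdr.next = b->hdr.next;
    else
        pool->head = b->hdr.next;
    if (b->hdr.next != NULL)
        b->hdr.next->hdr.prev = b->hdr.prev;
    pool->live_blocks--;
    free(b);
}

/* End of request: release whatever was never freed. Returns the count. */
size_t toy_request_shutdown(toy_pool *pool)
{
    size_t reclaimed = 0;
    while (pool->head != NULL) {
        toy_efree(pool, pool->head + 1);
        reclaimed++;
    }
    return reclaimed;
}
```

Because toy_efree steps back from the pointer to find a header, handing it memory obtained from plain malloc would make it read and unlink garbage. The same kind of mismatch is why mixing the C routines with the e* family tends to end in a crash.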
In addition, Zend provides a group of macros named VCWD_xxx that replace the corresponding file APIs of the C library and the operating system. These macros support PHP's virtual working directory and should always be used in module code. For their definitions, see "TSRM/tsrm_virtual_cwd.h" in the PHP source code. You may notice that these macros do not include a close operation. This is because close operates on an already opened resource and does not involve a file path, so the C or operating system routine can be used directly. Likewise, operations such as read and write use the C or operating system routines directly.
