Detailed explanation of the underlying operating mechanism of PHP

小云云 (original)
2018-03-21 15:43:42

This article explains the underlying operating mechanism of PHP in detail. It starts with PHP's design concepts and characteristics and its four-layer system, then covers SAPI, the execution process and opcodes, the HashTable, PHP variables, and the startup and shutdown life cycle. We hope it helps you.

1. The design concept and characteristics of PHP

  1. Multi-process model: Because PHP uses a multi-process model, requests do not interfere with one another, so one request crashing does not affect the service as a whole. PHP has since also gained support for a multi-threaded model.

  2. Weakly typed language: Unlike C/C++, Java, C# and similar languages, PHP is weakly typed. A variable's type is not fixed when it is declared; it is determined at run time, and implicit or explicit type conversions may occur. This flexibility is convenient and efficient in web development. The details are covered in the section on PHP variables below.

  3. Interpreted language: PHP runs differently from compiled languages such as C/C++, Java and C#. PHP source must first go through lexical and syntactic analysis and be translated into intermediate instructions (opcodes) before it can run. PHP is therefore a poor fit for performance-critical work such as heavy or big-data computation: a browser user will not notice the difference between 0.001 seconds and 0.1 seconds, but in other fields that difference matters.

  4. The engine (Zend) + component (ext) mode reduces internal coupling.

  5. The middle layer (sapi) isolates the web server and PHP.

  6. The syntax is simple and flexible, with few enforced conventions. The drawback is inconsistent coding styles; on the other hand, even a poor programmer is unlikely to write a program so outrageous that it endangers the whole system.

2. PHP’s four-layer system

The core architecture of PHP is shown in the figure below:

As the figure shows, PHP is a four-layer system. From bottom to top:

  • Zend engine: Zend is implemented entirely in C and is the core of PHP. It translates PHP code into executable opcodes (through lexical analysis, syntax parsing and the rest of the compilation process) and implements the corresponding handlers. It also implements the basic data structures (such as the hash table and the object model), handles memory allocation and management, and exposes an API for external callers. It is the core of everything, and all peripheral functionality is built around Zend.

  • Extensions: Built around the Zend engine, extensions provide basic services as components. The common built-in functions (such as the array family), the standard library and so on are all implemented as extensions. Users can also write their own extensions for feature expansion, performance optimization and other purposes (for example, the PHP middle layer and rich text parsing currently used by Tieba are typical applications of extensions).

  • Sapi: SAPI stands for Server Application Programming Interface. Through a series of hook functions, SAPI lets PHP exchange data with the outside world. This is a very elegant and successful design: through SAPI, PHP itself is decoupled and isolated from the upper-layer application. PHP no longer has to worry about being compatible with different applications, and each application can implement its own handling according to its characteristics.

  • Upper-layer application: This is the PHP program we usually write. Different SAPIs give us different application modes, such as serving web applications through a web server or running scripts from the command line.

If PHP is a car, then the frame of the car is PHP itself, Zend is the engine, and the components under ext are the wheels. SAPI can be seen as the road: the car can run on different types of roads, and executing a PHP program means the car is driving on the road. We therefore need a high-performance engine, the right wheels and the right road.

3. Sapi

As mentioned above, SAPI lets external applications exchange data with PHP through a series of interfaces, and lets each application implement its own handling according to its characteristics. Some commonly seen SAPIs are:

  • apache2handler: The handling method used when Apache is the web server and PHP runs in mod_php mode. It is also the most widely used one now.

  • cgi: Another way for the web server and PHP to interact directly, based on the well-known FastCGI protocol. FastCGI + PHP has been used more and more in recent years, and it is the only method supported by asynchronous web servers.

  • cli: The application mode for command-line invocation.
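To make the "series of hook functions" idea concrete, here is a minimal sketch in C. It is not the real Zend SAPI structure: the names sapi_hooks, cli_write, web_write and engine_echo are hypothetical, and a single output hook stands in for the full set of hooks. The point is only that the engine calls through a function pointer and never needs to know which server it is running under.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hook table: every SAPI supplies its own functions. */
typedef struct sapi_hooks {
    const char *name;
    /* Deliver script output; returns the number of characters produced. */
    int (*write_output)(char *sink, size_t cap, const char *text);
} sapi_hooks;

/* Command-line style: output is passed through unchanged. */
int cli_write(char *sink, size_t cap, const char *text) {
    return snprintf(sink, cap, "%s", text);
}

/* Web style: the body is preceded by a response header. */
int web_write(char *sink, size_t cap, const char *text) {
    return snprintf(sink, cap, "Content-Type: text/html\r\n\r\n%s", text);
}

const sapi_hooks cli_sapi = { "cli", cli_write };
const sapi_hooks web_sapi = { "web", web_write };

/* The "engine" side: it only knows the hook table, not the server. */
int engine_echo(const sapi_hooks *sapi, char *sink, size_t cap, const char *text) {
    assert(sapi != NULL && sapi->write_output != NULL);
    return sapi->write_output(sink, cap, text);
}
```

Calling engine_echo with &cli_sapi or &web_sapi runs the same engine code but produces different output, which is the decoupling the SAPI layer is meant to provide.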

4. PHP execution process and opcodes

Let's first look at the process through which PHP code is executed.

As the figure shows, PHP implements a typical dynamic-language execution process: a piece of source code goes through lexical analysis, syntax analysis and other stages and is translated into instructions (opcodes), and the Zend virtual machine then executes these instructions in sequence. PHP itself is implemented in C, so the functions that finally get called are all C functions. In fact, we can regard PHP as a piece of software written in C.

The core of PHP execution is the translated instructions, that is, the opcodes.

An opcode is the most basic unit of PHP program execution. It consists of two operands (op1, op2), a return value and a handler function. A PHP program is ultimately translated into the sequential execution of a set of opcode handler functions.

Several common handler functions:

ZEND_ASSIGN_SPEC_CV_CV_HANDLER: variable assignment ($a = $b)
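The opcode layout described above (two operands, a result and a handler) can be sketched as a tiny virtual machine in C. This is an illustration under simplifying assumptions, not Zend's actual structures: the types op and vm and the three handlers are hypothetical, and variables are plain long slots instead of zvals.

```c
#include <assert.h>
#include <stddef.h>

typedef struct vm vm;
typedef struct op op;
typedef void (*op_handler)(vm *m, const op *o);

struct op {
    op_handler handler; /* the processing function for this instruction */
    int op1;            /* first operand: index of a variable slot */
    int op2;            /* second operand: a constant, if the handler uses it */
    int result;         /* index of the slot that receives the result */
};

struct vm {
    long slots[8];      /* stand-in for variables and temporaries */
};

/* result = op1, in the spirit of an assignment handler */
void handle_assign(vm *m, const op *o) {
    m->slots[o->result] = m->slots[o->op1];
}

/* result = op1 + constant, in the spirit of $a + 2 */
void handle_add_const(vm *m, const op *o) {
    m->slots[o->result] = m->slots[o->op1] + o->op2;
}

/* result = (op1 == constant), in the spirit of $a == 1 */
void handle_is_equal_const(vm *m, const op *o) {
    m->slots[o->result] = (m->slots[o->op1] == o->op2);
}

/* Execute the instructions in sequence by calling each handler.
   Slot indexes are trusted here; a real engine validates them at compile time. */
void execute(vm *m, const op *ops, size_t count) {
    for (size_t i = 0; i < count; i++) {
        assert(ops[i].handler != NULL);
        ops[i].handler(m, &ops[i]);
    }
}
```

A program is then just an array of op values; execute walks it in order, which mirrors the statement that a PHP program ends up as the sequential execution of handler functions.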


5. HashTable - core data structure

HashTable is the core data structure of Zend. It is used to implement almost all common functionality in PHP, and the PHP array we know is its typical application. In addition, inside Zend, structures such as the function symbol table and global variables are also implemented on top of the hash table.

PHP's hash table has the following characteristics:

  • Supports typical key->value lookup

  • Can be used as an array

  • Adding and deleting nodes is O(1)

  • Keys support mixed types: associative (string) keys and index (integer) keys can coexist in the same array

  • Values support mixed types: array("string", 2332)

  • Supports linear traversal, such as foreach

The Zend hash table implements the typical hash structure and, by attaching a doubly linked list, also supports forward and reverse traversal of the array. Its structure is shown below:

As can be seen, the hash table contains both a key->value hash structure and a doubly linked list, so it conveniently supports both fast lookup and linear traversal.

  • Hash structure: Zend's hash structure is a typical hash table that resolves collisions with linked lists. Note that it grows automatically: when the table is full, it doubles in size and repositions its elements. The initial size is 8. Zend also trades space for time to speed up key->value lookups; for example, each element stores the length of its key in nKeyLength so that mismatches can be rejected quickly.

  • Doubly linked list: The Zend hash table implements linear traversal of its elements through a linked list. In theory a singly linked list would be enough for traversal; the main reason for using a doubly linked list is to allow fast deletion without walking the list. The Zend hash table is a composite structure: used as an array, it supports ordinary associative arrays, sequential index arrays, and even a mixture of the two.

  • PHP associative array: An associative array is a typical hash table application. A lookup is an ordinary hash query with a few quick checks added to speed up the search.

The remaining common opcode handlers from section 4:

ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER: function call

ZEND_CONCAT_SPEC_CV_CV_HANDLER: string concatenation ($a . $b)

ZEND_ADD_SPEC_CV_CONST_HANDLER: addition ($a + 2)

ZEND_IS_EQUAL_SPEC_CV_CONST: equality comparison ($a == 1)

ZEND_IS_IDENTICAL_SPEC_CV_CONST: identity comparison ($a === 1)


  • PHP index array: The index array is the ordinary array accessed by subscript, for example $arr[0]. Zend HashTable normalizes such keys internally: an index key is also given a hash value and an nKeyLength (0). The internal member nNextFreeElement tracks the next free integer id and is incremented by one after each push. It is this normalization that lets PHP mix associative and non-associative data. Because of the way push works, the order of index keys in a PHP array is determined not by the size of the subscript but by the order of insertion. For example: $arr[1] = 2; $arr[2] = 3; Keys of type double are also treated as index keys by Zend HashTable.
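The combination described in this section (chained hash slots plus a doubly linked list that remembers insertion order) can be sketched in C as follows. This is a simplified illustration, not Zend's code: the table has a fixed size of 8 and never grows, values are plain longs, and the names table, bucket, table_set, table_find and table_del are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 8   /* a power of two, so (h & mask) selects a slot */

typedef struct bucket {
    unsigned long h;           /* cached hash of the key */
    size_t key_len;            /* cached key length, for quick rejection */
    char *key;
    long value;
    struct bucket *next;       /* collision chain inside one slot */
    struct bucket *list_prev;  /* doubly linked list in insertion order */
    struct bucket *list_next;
} bucket;

typedef struct table {
    bucket *slots[TABLE_SIZE];
    bucket *head, *tail;
    size_t count;
} table;

unsigned long hash_key(const char *key, size_t len) {
    unsigned long h = 5381;    /* simple multiply-and-add string hash */
    for (size_t i = 0; i < len; i++) h = h * 33 + (unsigned char)key[i];
    return h;
}

bucket *table_find(const table *t, const char *key) {
    size_t len = strlen(key);
    unsigned long h = hash_key(key, len);
    for (bucket *p = t->slots[h & (TABLE_SIZE - 1)]; p; p = p->next)
        if (p->h == h && p->key_len == len && memcmp(p->key, key, len) == 0)
            return p;
    return NULL;
}

/* Insert or overwrite. Returns 0 on success, -1 if allocation fails. */
int table_set(table *t, const char *key, long value) {
    assert(t != NULL && key != NULL);
    bucket *p = table_find(t, key);
    if (p) { p->value = value; return 0; }
    size_t len = strlen(key);
    p = malloc(sizeof *p);
    if (!p) return -1;
    p->key = malloc(len + 1);
    if (!p->key) { free(p); return -1; }
    memcpy(p->key, key, len + 1);
    p->h = hash_key(key, len);
    p->key_len = len;
    p->value = value;
    bucket **slot = &t->slots[p->h & (TABLE_SIZE - 1)];
    p->next = *slot;                 /* push onto the collision chain */
    *slot = p;
    p->list_prev = t->tail;          /* append to the insertion-order list */
    p->list_next = NULL;
    if (t->tail) t->tail->list_next = p; else t->head = p;
    t->tail = p;
    t->count++;
    return 0;
}

/* Remove a key. Returns 1 if it existed, 0 otherwise. */
int table_del(table *t, const char *key) {
    bucket *p = table_find(t, key);
    if (!p) return 0;
    bucket **link = &t->slots[p->h & (TABLE_SIZE - 1)];
    while (*link != p) link = &(*link)->next;   /* short walk within one slot */
    *link = p->next;
    /* The order list is doubly linked, so unlinking needs no traversal. */
    if (p->list_prev) p->list_prev->list_next = p->list_next; else t->head = p->list_next;
    if (p->list_next) p->list_next->list_prev = p->list_prev; else t->tail = p->list_prev;
    free(p->key);
    free(p);
    t->count--;
    return 1;
}
```

Walking from head through list_next visits elements in insertion order regardless of which slot they hashed to, which is the property foreach relies on; reading count is O(1).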

6. PHP variables

PHP is a weakly typed language and does not strictly distinguish variable types. Variables are declared without a type, and PHP may convert types implicitly while the program runs. As in strongly typed languages, explicit type conversions can also be written in the program. PHP variables can be divided into simple types (int, string, bool), collection types (array, resource, object) and constants (const). All of them share the same underlying structure, the zval.

The zval is another very important data structure in Zend, used to identify and implement PHP variables. Its data structure is as follows:

A zval consists mainly of three parts:

  • type: specifies the type of the variable (integer, string, array, etc.)

  • refcount and is_ref: used to implement reference counting (described in detail later)

  • value: the core part, which stores the actual data of the variable

The zvalue holds the actual data of a variable. Because it has to store several different types, the zvalue is a union, and this is how weak typing is implemented.

The corresponding relationship between PHP variable types and their actual storage is as follows:

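IS_LONG -> lvalue

The hash lookup for the associative-array query described in section 5 can be written as the following pseudocode:

h = getKeyHashValue(key);      /* hash value of the key */
index = h & nTableMask;        /* slot in the bucket array */
Bucket *p = arBucket[index];
while (p) {
    /* quick checks: cached hash value first, then key length */
    if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
        RETURN p->data;
    }
    p = p->next;
}
RETURN FAILURE;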

Reference counting is widely used in memory reclamation, string operations and so on, and PHP variables are a typical application of it. A zval's reference counting is implemented through the members is_ref and refcount. With reference counting, multiple variables can share the same data, which avoids the heavy cost of frequent copying.

On assignment, Zend points the new variable at the same zval and increments refcount; on unset, it decrements refcount. The zval is destroyed only when refcount drops to 0. For a reference assignment, Zend sets is_ref to 1.

PHP variables share data through reference counting, so what happens when one of them is changed? When a variable is written to and Zend finds that its zval is shared by several variables, it copies the zval into a new one with a refcount of 1 and decrements the refcount of the original. This process is called "zval separation". Zend therefore copies only when a write occurs, which is why the scheme is called copy-on-write.

For reference variables the requirement is the opposite of non-reference variables: variables bound by reference assignment must stay bound together, so modifying one of them modifies all of them.
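The sharing, separation and reference rules above can be sketched in C. This is a toy model under stated assumptions, not the Zend implementation: the value holds only a long, and the names value, variable, var_assign, var_assign_ref, var_write and var_unset are hypothetical. The two assignment helpers also assume simple starting states, which the asserts spell out.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct value {
    long data;
    int refcount;   /* how many variables point at this value */
    int is_ref;     /* 1 when the sharers are bound by reference */
} value;

typedef struct variable { value *v; } variable;

/* $a = <literal>: a fresh value with a refcount of 1. */
int var_init(variable *a, long data) {
    a->v = malloc(sizeof *a->v);
    if (!a->v) return -1;
    a->v->data = data;
    a->v->refcount = 1;
    a->v->is_ref = 0;
    return 0;
}

/* $b = $a: share the value and bump the counter (b must be unset). */
void var_assign(variable *b, const variable *a) {
    assert(!a->v->is_ref);   /* sketch only covers non-reference sources */
    b->v = a->v;
    b->v->refcount++;
}

/* $b = &$a: share the value and mark the set as references. */
void var_assign_ref(variable *b, const variable *a) {
    assert(a->v->refcount == 1 || a->v->is_ref);
    b->v = a->v;
    b->v->refcount++;
    b->v->is_ref = 1;
}

/* Write to a variable, separating first if the value is shared by value. */
int var_write(variable *a, long data) {
    if (a->v->refcount > 1 && !a->v->is_ref) {
        value *copy = malloc(sizeof *copy);
        if (!copy) return -1;
        copy->refcount = 1;
        copy->is_ref = 0;
        a->v->refcount--;    /* leave the original to the other sharers */
        a->v = copy;
    }
    a->v->data = data;
    return 0;
}

/* unset($a): drop one reference and free the value when none are left. */
void var_unset(variable *a) {
    value *v = a->v;
    a->v = NULL;
    if (--v->refcount == 0) free(v);
    else if (v->refcount == 1) v->is_ref = 0;
}
```

After var_assign both variables point at one value; the first var_write through either of them triggers the copy, while variables joined with var_assign_ref keep sharing and see each other's writes.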

Integers and floating-point numbers are among the basic types in PHP and are simple variables. Their values are stored directly in the zvalue, as long and double respectively.

The zvalue structure shows that, unlike strongly typed languages such as C, PHP does not distinguish between int, unsigned int, long, long long and so on. For PHP there is only one integer type, long. It follows that the value range of a PHP integer is not fixed but depends on the size of long on the platform PHP was compiled for.

Floating-point numbers are similar: PHP does not distinguish between float and double and has only double.

What happens in PHP when an integer goes out of range? It is automatically converted to double. Be careful about this, because many subtle bugs come from it.
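The overflow behaviour just described can be sketched in C with a small tagged union. This is an illustration only: the type number and the function add_longs are hypothetical names, and the range check is the portable textbook test rather than whatever the engine actually uses.

```c
#include <assert.h>
#include <limits.h>

typedef enum { NUM_LONG, NUM_DOUBLE } num_type;

/* A type tag plus a union, in the spirit of type + zvalue. */
typedef struct number {
    num_type type;
    union { long lval; double dval; } value;
} number;

/* Add two integers; fall back to double when the sum does not fit in long. */
number add_longs(long a, long b) {
    number r;
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        r.type = NUM_DOUBLE;
        r.value.dval = (double)a + (double)b;
    } else {
        r.type = NUM_LONG;
        r.value.lval = a + b;
    }
    return r;
}

/* Small self-check that doubles as a usage example. */
void add_longs_demo(void) {
    assert(add_longs(2, 3).type == NUM_LONG);
    assert(add_longs(LONG_MAX, 1).type == NUM_DOUBLE);
}
```

The caller has to look at type before reading the union, and a result that silently became a double loses integer precision near the limits, which is the kind of surprise the text warns about.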

Like integers, strings are basic types and simple variables in PHP. The zvalue structure shows that a PHP string consists of a pointer to the actual data plus a length field, similar to string in C++. Because the length is stored in an actual variable, a PHP string, unlike a C string, can hold binary data (including \0 bytes). For the same reason, getting the length of a string with strlen is an O(1) operation in PHP.
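A pointer-plus-length string is easy to sketch in C. The names lstring, lstring_init, lstring_length and lstring_append are hypothetical and this is not Zend's string code; the sketch only shows why embedded \0 bytes are harmless, why the length is O(1), and what appending in place with realloc looks like.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct lstring {
    char *val;    /* the bytes; '\0' may appear anywhere inside */
    size_t len;   /* length in bytes, not counting the trailing '\0' */
} lstring;

/* Build a string from raw bytes. Returns 0 on success, -1 on failure. */
int lstring_init(lstring *s, const char *bytes, size_t len) {
    assert(s != NULL && bytes != NULL);
    s->val = malloc(len + 1);
    if (!s->val) return -1;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0';       /* terminator kept for C functions that expect it */
    s->len = len;
    return 0;
}

/* O(1): the length is stored, not searched for. */
size_t lstring_length(const lstring *s) {
    return s->len;
}

/* a = a . b, growing a's buffer in place with realloc. */
int lstring_append(lstring *a, const lstring *b) {
    size_t old_len = a->len, add = b->len;
    char *grown = realloc(a->val, old_len + add + 1);
    if (!grown) return -1;
    a->val = grown;           /* also keeps b valid when a and b are the same string */
    memcpy(grown + old_len, b->val, add);
    grown[old_len + add] = '\0';
    a->len = old_len + add;
    return 0;
}
```

A string built from the five bytes a, b, \0, c, d reports a length of 5 through lstring_length, while C's strlen on the same buffer would stop at 2.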

When a string is added to, modified or appended to, PHP reallocates memory to produce the new string. Finally, for safety, PHP still appends a \0 at the end when it creates a string.

Common string concatenation methods and a comparison of their speed:

Assume the following 4 variables: $strA = '123'; $strB = '456'; $intA = 123; $intB = 456;

We will now compare and explain several ways of concatenating them.

The remaining entries of the type-to-storage correspondence for zvals (continuing from IS_LONG -> lvalue):

IS_DOUBLE -> dvalue

IS_ARRAY -> ht

IS_STRING -> str

IS_RESOURCE -> lvalue


PHP arrays are naturally implemented through Zend HashTable.

How is foreach implemented? A foreach over an array is done by walking the doubly linked list in the hash table. For index arrays, traversal with foreach is much more efficient than a for loop, because it avoids a key->value lookup on every step. count() simply reads HashTable->nNumOfElements, an O(1) operation. For a string key such as '123', Zend converts it to its integer form, so $arr['123'] and $arr[123] are equivalent.
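The key normalization mentioned above, where $arr['123'] and $arr[123] end up as the same key, can be sketched in C as a check for "canonical" decimal integers. This is an approximation written for illustration: the function name key_as_index is hypothetical, and the exact rules (no leading zeros, no "-0", must fit in long) are our assumption about what counts as integer-like rather than a copy of Zend's macro.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 and stores the integer in *out when key looks like a canonical
   decimal integer ("123", "-7", "0"); returns 0 when it must stay a string. */
int key_as_index(const char *key, long *out) {
    assert(key != NULL && out != NULL);
    size_t len = strlen(key);
    size_t i = 0;
    int negative = 0;
    if (len == 0) return 0;
    if (key[0] == '-') {
        negative = 1;
        i = 1;
        if (len == 1) return 0;                   /* "-" alone */
    }
    if (key[i] == '0' && len > i + 1) return 0;   /* leading zero: "0123" */
    if (negative && key[i] == '0') return 0;      /* "-0" */
    long value = 0;   /* accumulated as a non-positive number so LONG_MIN fits */
    for (; i < len; i++) {
        if (key[i] < '0' || key[i] > '9') return 0;
        int digit = key[i] - '0';
        if (value < (LONG_MIN + digit) / 10) return 0;   /* would overflow */
        value = value * 10 - digit;
    }
    if (!negative) {
        if (value == LONG_MIN) return 0;          /* magnitude does not fit */
        value = -value;
    }
    *out = value;
    return 1;
}
```

A hash table front end would call this on every string key and, when it returns 1, store the element under the integer index instead of the string.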

The resource type is the most complex variable type in PHP and is also a composite structure.

PHP's zval can represent a wide range of data types, but it cannot fully describe custom data types. Because there is no efficient way to represent these composite structures, traditional operators cannot be applied to them either. The solution is to refer to the underlying pointer through an essentially arbitrary identifier (label), and this identifier is called a resource.

In a zval, a resource uses lval to hold the identifier that leads to the resource's actual data. A resource can be any composite structure. The familiar mysqli, fsock, memcached, etc. are all resources.

How to use resources:

  • Registration: To use a custom data type as a resource, you first register it, and Zend assigns it a globally unique identifier.

  • Getting a resource variable: For resources, Zend maintains a hash table that maps id -> actual data. The zval of a resource records only its id. To fetch the resource, Zend looks up the id in this hash table and returns the actual value.

  • Resource destruction: Resource data types are diverse, and Zend itself has no way of knowing how to destroy them. The user therefore supplies a destructor function when registering the resource type. When a resource is unset, Zend calls the corresponding function to destroy it and also removes it from the global resource table.
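The register / fetch / destroy cycle above can be sketched in C. To keep it short, the sketch swaps the id -> data hash table for a small fixed array, so it is a simplification and not how Zend stores resources; the names resource_table, resource_register, resource_fetch and resource_destroy are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RESOURCES 16

typedef void (*resource_dtor)(void *ptr);

typedef struct resource_entry {
    void *ptr;            /* the actual data, opaque to the engine */
    resource_dtor dtor;   /* supplied by the user at registration */
    int in_use;
} resource_entry;

typedef struct resource_table {
    resource_entry entries[MAX_RESOURCES];
} resource_table;

/* Register a resource. Returns its id (1 or greater), or 0 if the table is full. */
long resource_register(resource_table *t, void *ptr, resource_dtor dtor) {
    assert(t != NULL && dtor != NULL);
    for (long id = 1; id <= MAX_RESOURCES; id++) {
        resource_entry *e = &t->entries[id - 1];
        if (!e->in_use) {
            e->ptr = ptr;
            e->dtor = dtor;
            e->in_use = 1;
            return id;
        }
    }
    return 0;
}

/* Fetch the actual data for an id, or NULL if the id is not registered. */
void *resource_fetch(const resource_table *t, long id) {
    if (id < 1 || id > MAX_RESOURCES || !t->entries[id - 1].in_use) return NULL;
    return t->entries[id - 1].ptr;
}

/* Destroy a resource: call the user's destructor, then drop the table entry.
   Returns 1 if something was destroyed, 0 if the id was unknown. */
int resource_destroy(resource_table *t, long id) {
    if (id < 1 || id > MAX_RESOURCES || !t->entries[id - 1].in_use) return 0;
    resource_entry *e = &t->entries[id - 1];
    e->dtor(e->ptr);
    e->ptr = NULL;
    e->in_use = 0;
    return 1;
}
```

The variable on the script side would hold only the returned id, so the engine never needs to understand the data behind it; it just calls whatever destructor was registered.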

Resources can be long-lived: they can survive not only after every variable referencing them has gone out of scope, but even after one request ends and a new one begins. Such resources are called persistent resources, because they persist for the whole life cycle of the SAPI unless explicitly destroyed. In many cases persistent resources improve performance to some extent, as with the familiar mysql_pconnect. Persistent resources allocate memory through pemalloc so that it is not released when the request ends.
For Zend itself, there is no difference between the two kinds of resources.

How are local and global variables implemented in PHP? For a given request, PHP can see two symbol tables at any time: symbol_table and active_symbol_table. The former maintains global variables. The latter is a pointer to the currently active symbol table. When the program enters a function, Zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are kept apart.

Getting a variable's value: PHP's symbol table is implemented with a hash table. Each variable is assigned a unique identifier, and a lookup finds the corresponding zval in the table by that identifier and returns it.

Using global variables in functions: Inside a function, we can use a global variable by explicitly declaring it with global. This creates, in active_symbol_table, a reference to the variable of the same name in symbol_table. If symbol_table has no variable with that name, it is created first.

PHP running mechanism process:

1. We never start a PHP process by hand; it starts together with Apache;

2. PHP is connected to Apache through the mod_php5.so module (specifically through SAPI, the Server Application Programming Interface);

3. PHP has three modules in total: the kernel, the Zend engine and the extension layer;

4. The PHP kernel handles requests, file streams, error handling and other related operations;

5. The Zend engine (ZE) converts source files into opcodes and then runs them on its virtual machine;

6. The extension layer is a set of functions, libraries and streams that PHP uses to perform specific operations. For example, we need the mysql extension to connect to a MySQL database;

7. While ZE executes a program it may need to call into several extensions. ZE then hands control to the extension and takes it back once the extension has finished its task;

8. Finally, ZE returns the execution result to the PHP kernel, which passes it to the SAPI layer, which in turn outputs it to the browser.

In-depth discussion of the PHP operating mechanism

PHP operating mechanism: The first step of PHP startup

Not sure what the first and second steps are? Don't worry, we will discuss them in detail next.

Let's look at the first and most important step. Remember that it happens before any request arrives. After Apache starts, the PHP interpreter starts as well, and PHP calls the MINIT method of each extension, switching these extensions into a usable state. (Look in php.ini to see which extensions are enabled.) MINIT stands for "module initialization". Each module defines a set of functions, class libraries and so on that are used to handle later requests.

A typical MINIT method is as follows:

PHP_MINIT_FUNCTION(extension_name){ /* Initialize functions, classes etc */ }

PHP operating mechanism: The second step of PHP startup

When a page request arrives, the SAPI layer hands control to the PHP layer. PHP then sets up the environment variables needed to answer the request and creates a variable table to store the variable names and values produced during execution. PHP calls the RINIT method of each module; RINIT stands for "request initialization". A classic example is the RINIT of the Session module: if the Session module is enabled in php.ini, the $_SESSION variable is initialized and its content is read in when the module's RINIT is called. The RINIT method can be regarded as a preparation step that runs automatically before each program execution.

A typical RINIT method is as follows:

PHP_RINIT_FUNCTION(extension_name) { /* Initialize session variables, pre-populate variables, redefine global variables etc */ }

PHP operating mechanism: The first step of PHP shutdown

Like startup, PHP shutdown has two steps. Once the page has finished executing (whether it reached the end of the file or was terminated with exit or die), PHP starts its cleanup procedure. It calls the RSHUTDOWN method of each module in turn. RSHUTDOWN clears the symbol table built while the program was running, that is, it calls unset on each variable.

A typical RSHUTDOWN method is as follows:

PHP_RSHUTDOWN_FUNCTION(extension_name) { /* Do memory management, unset all variables used in the last PHP call etc */ }

PHP operating mechanism: The second step of PHP shutdown

Finally, when all requests have been processed and the SAPI is ready to shut down, PHP executes the second step: it calls the MSHUTDOWN method of each extension, which is each module's last chance to free memory.

A typical MSHUTDOWN method is as follows:

PHP_MSHUTDOWN_FUNCTION(extension_name) { /* Free handlers and persistent memory etc */ }

With that, the entire PHP life cycle is over. Note that the first startup step and the second shutdown step are executed only while the server is not handling any request.
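The four phases can be summarized with a small simulation in C. It does not use the real PHP_MINIT_FUNCTION family of macros; the types module and counters and the function run_server are hypothetical, and the hooks only count how often they run. The sketch shows the key point of the life cycle: the module hooks run once per server lifetime, while the request hooks run once per request.

```c
#include <assert.h>
#include <stddef.h>

typedef struct counters {
    int minit, rinit, rshutdown, mshutdown;
} counters;

/* A module is a name plus its four life-cycle hooks. */
typedef struct module {
    const char *name;
    void (*minit)(counters *c);      /* module initialization */
    void (*rinit)(counters *c);      /* request initialization */
    void (*rshutdown)(counters *c);  /* request shutdown */
    void (*mshutdown)(counters *c);  /* module shutdown */
} module;

void demo_minit(counters *c)     { c->minit++; }
void demo_rinit(counters *c)     { c->rinit++; }
void demo_rshutdown(counters *c) { c->rshutdown++; }
void demo_mshutdown(counters *c) { c->mshutdown++; }

const module demo_module = {
    "demo", demo_minit, demo_rinit, demo_rshutdown, demo_mshutdown
};

/* Simulate one server lifetime that serves the given number of requests. */
void run_server(const module *mods, size_t nmods, int requests, counters *c) {
    assert(mods != NULL && c != NULL);
    for (size_t i = 0; i < nmods; i++) mods[i].minit(c);          /* startup, step 1 */
    for (int r = 0; r < requests; r++) {
        for (size_t i = 0; i < nmods; i++) mods[i].rinit(c);      /* startup, step 2 */
        /* ... the script for this request would execute here ... */
        for (size_t i = 0; i < nmods; i++) mods[i].rshutdown(c);  /* shutdown, step 1 */
    }
    for (size_t i = 0; i < nmods; i++) mods[i].mshutdown(c);      /* shutdown, step 2 */
}
```

Running the simulation with one module and three requests leaves minit and mshutdown at 1 and rinit and rshutdown at 3.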

String concatenation methods compared (using $strA, $strB, $intA and $intB as defined in the string discussion above):

$res = $strA . $strB and $res = "$strA$strB"

In this case Zend mallocs a new block of memory and processes the operands accordingly. The speed is average.

$strA = $strA . $strB

This is the fastest. Zend reallocs directly on the current $strA, which avoids repeated copying.

$res = $intA . $intB

This is slower because it requires implicit type conversion; it should be avoided in real programs.

$strA = sprintf("%s%s", $strA, $strB);

This is the slowest way, because in PHP sprintf is a function rather than a language construct, and recognizing and processing the format string takes considerable time. It also allocates new memory. However, sprintf is the most readable, and in practice you can choose flexibly according to the circumstances.
