Detailed explanation of the overall framework of PHP7 source code
1. PHP 7 language execution principle
There are many commonly used high-level languages, which can be roughly divided into two types according to the way they are run: compiled languages and interpreted languages.
Compilation refers to "translating" the program source code into assembly language before the program is executed, and then further compiling it into a target file according to the software and hardware environment. The tool that does this work is generally called a compiler.
Interpreted languages are "translated" into machine language while the program is running. The translation is repeated every time the program runs, so execution efficiency is lower. The program responsible for "translating" the source code of an interpreted language is called an interpreter.
A piece of C code must be preprocessed, compiled, assembled and linked before it becomes an executable binary file.
For compiled languages such as C, every code update must go through all of these steps again.
Figure: how instructions are executed for compiled languages.
The difference between compiled and interpreted languages comes down to when the source code is turned into instructions for the target platform's CPU. For compiled languages, the compilation result is already a set of instructions for the target CPU. Interpreted languages are first compiled into intermediate code, which the language's own virtual machine then translates into instructions for a specific CPU at runtime. This runtime translation is the main reason interpreted languages are often said to be "slow".
In PHP 7, the source code first goes through lexical analysis, which cuts it into string units called Tokens. A single Token cannot express complete semantics, so a syntax analysis stage converts the Tokens into an abstract syntax tree (AST). The abstract syntax tree is then compiled into instructions for execution. These are instructions for PHP's virtual machine rather than native CPU instructions, and in PHP they are called opcodes.
Step 1: Lexical analysis splits the source code into Tokens.
Step 2: The syntax analyzer (parser) builds an abstract syntax tree (AST) from the Tokens.
Step 3: The abstract syntax tree is compiled into opcodes (the opcode instruction set), which PHP interprets and executes.
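The same three stages can be observed in any bytecode-interpreted language. As an analogy only (this is Python's pipeline, not PHP's), the sketch below uses Python's standard library to tokenize a statement, build its AST, compile it to bytecode and execute it. The variable names are illustrative.

```python
import ast
import dis
import io
import tokenize

source = "a = 1 + 2\n"

# Step 1: lexical analysis turns the source text into tokens.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Step 2: syntax analysis produces an abstract syntax tree.
# (ast.parse runs its own tokenizer internally; step 1 is shown for inspection.)
tree = ast.parse(source)

# Step 3: the AST is compiled to bytecode, which the virtual machine executes.
code = compile(tree, filename="<example>", mode="exec")
namespace = {}
exec(code, namespace)

print(tokens[:3])              # the first three tokens
print(ast.dump(tree.body[0]))  # the AST node for the assignment
dis.dis(code)                  # the bytecode, Python's counterpart of opcodes
print(namespace["a"])          # the result of executing the bytecode
```

The printed bytecode differs between Python versions, which is a reminder that such intermediate code is an implementation detail of the virtual machine.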
1.Token
A Token is one of the meaningful units that PHP code is cut into. PHP provides the token_get_all() function to obtain the Tokens of a piece of PHP code.
token_get_all() returns an array of Tokens. Each multi-character Token is itself a three-element array: the first value is the enumeration value (ID) of the Token, the second is the original string content, and the third is the line number in the code. Single-character tokens such as = or ; are returned as plain strings.
Tokens are individual "chunks". A chunk on its own cannot express complete semantics, so the chunks have to be organized and connected according to rules. The parser is this organizer: it checks the syntax, matches the Tokens, and relates them to one another.
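To make the shape of that result concrete, here is a toy tokenizer written in Python. It is a simplified model, not PHP's lexer: the function name toy_token_get_all and the regular expressions are assumptions of this sketch, token names stand in for the numeric IDs, and only a handful of token types are covered.

```python
import re

# Hypothetical token table for this sketch. The names echo PHP's T_* constants,
# but the patterns are simplified and no numeric IDs are assigned.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php\s"),
    ("T_VARIABLE", r"\$[A-Za-z_]\w*"),
    ("T_LNUMBER", r"\d+"),
    ("T_WHITESPACE", r"\s+"),
    ("CHAR", r"."),  # any other single character, e.g. "=" or ";"
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def toy_token_get_all(code):
    """Split code into [name, text, line] triples; lone characters stay plain strings."""
    tokens, line = [], 1
    for match in MASTER.finditer(code):
        text = match.group()
        if match.lastgroup == "CHAR":
            tokens.append(text)
        else:
            tokens.append([match.lastgroup, text, line])
        line += text.count("\n")  # later tokens start on a later line
    return tokens


for token in toy_token_get_all("<?php\n$a = 1;"):
    print(token)
```

The variable $a is reported on line 2 because the opening tag consumed the first line break.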
2.AST
The AST is new in PHP 7. In earlier versions, the execution of PHP code had no AST generation step.
AST nodes are divided into multiple types, corresponding to PHP syntax.
The PHP-Parser tool can be used to view an AST generated from PHP code.
Note: PHP-Parser is a tool written by nikic (Nikita Popov), one of the core developers of PHP 7, that generates an AST from PHP source code. It is itself written in PHP, so its node names are its own rather than those of the engine's internal AST. The source code is available at https://github.com/nikic/PHP-Parser.
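As a rough illustration of how a parser organizes Tokens into a tree, the following Python sketch parses integer expressions with + and * using two grammar rules. The Node class, the kind names and the grammar are assumptions made for this example; PHP's real grammar and AST node kinds are far richer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a kind, an optional value, and child nodes."""
    kind: str
    value: object = None
    children: list = field(default_factory=list)


def parse(source):
    """Parse integers combined with + and * into an AST ("*" binds tighter)."""
    tokens = re.findall(r"\d+|[+*]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def number():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return Node("NUMBER", int(token))

    def binary(operators, operand):
        # Grammar rule: operand (operator operand)*
        nonlocal pos
        left = operand()
        while peek() in operators:
            op = tokens[pos]
            pos += 1
            left = Node("BINARY_OP", op, [left, operand()])
        return left

    def term():
        return binary({"*"}, number)

    def expr():
        return binary({"+"}, term)

    return expr()


def dump(node, depth=0):
    """Render the tree one node per line, indented by depth."""
    label = node.kind if node.value is None else f"{node.kind}({node.value})"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse("1 + 2 * 3"))))
```

The multiplication ends up as a child of the addition, so the tree itself encodes operator precedence that the flat Token list could not express.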
3.opcodes
An opcode is a single instruction. Opcodes are a collection of such instructions and form the intermediate code used during PHP execution. Once generated, the opcodes are executed by the virtual machine.
One of the more common PHP project optimization measures is to "turn on opcache", which refers to caching these opcodes. By skipping the steps from source code to opcodes, the engine can directly execute the cached opcodes, thereby improving performance.
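The idea behind an opcode cache can be sketched in a few lines of Python. This is a conceptual model only: ToyOpcodeCache, toy_compile and the opcode format are hypothetical, and invalidating entries by file modification time is an assumption of this sketch rather than a description of opcache's internals.

```python
def toy_compile(source):
    """Stand-in for the lex/parse/compile steps (hypothetical opcode format)."""
    return [("ECHO", word) for word in source.split()]


class ToyOpcodeCache:
    """Keeps compiled opcodes per script path and reuses them while the file is unchanged."""

    def __init__(self, compiler):
        self._compiler = compiler
        self._entries = {}  # script path -> (mtime, opcodes)
        self.compilations = 0

    def get_opcodes(self, path, mtime, source):
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: compilation is skipped entirely
        self.compilations += 1
        opcodes = self._compiler(source)
        self._entries[path] = (mtime, opcodes)
        return opcodes


cache = ToyOpcodeCache(toy_compile)
cache.get_opcodes("index.php", 100, "hello world")  # miss: compiles
cache.get_opcodes("index.php", 100, "hello world")  # hit: reuses cached opcodes
cache.get_opcodes("index.php", 200, "hello again")  # file changed: recompiles
print(cache.compilations)  # prints 2
```

Three requests cost only two compilations here. In a real deployment most requests hit an unchanged file, so nearly all compilation work disappears.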
With the vld extension, you can see the opcodes generated for a piece of PHP code.
An opcode is one of a set of instruction identifiers defined by PHP 7, and each instruction corresponds to a handler (processing function). When the virtual machine executes an opcode, it looks up the handler behind that opcode and performs the actual processing there.
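The relationship between opcodes and handlers can be modeled as a lookup table. The Python sketch below is a minimal toy virtual machine: the opcode names loosely echo Zend's, but the operand layout, the HANDLERS table and the execute function are all invented for illustration.

```python
def handle_assign(vm, name, value):
    vm["vars"][name] = value


def handle_add(vm, target, left, right):
    vm["vars"][target] = vm["vars"][left] + vm["vars"][right]


def handle_echo(vm, name):
    vm["output"].append(str(vm["vars"][name]))


# Each opcode identifier maps to the handler that does the real work.
HANDLERS = {
    "ASSIGN": handle_assign,
    "ADD": handle_add,
    "ECHO": handle_echo,
}


def execute(opcodes):
    """Run each instruction by dispatching to the handler registered for its opcode."""
    vm = {"vars": {}, "output": []}
    for opcode, *operands in opcodes:
        HANDLERS[opcode](vm, *operands)
    return vm["output"]


program = [
    ("ASSIGN", "a", 1),
    ("ASSIGN", "b", 2),
    ("ADD", "c", "a", "b"),
    ("ECHO", "c"),
]
print(execute(program))  # prints ['3']
```

The execution loop knows nothing about what an instruction does. Adding a new instruction only means registering another handler.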
2. Kernel Architecture
The Zend engine contains a compiler and an interpreter. The whole path from PHP code to opcodes, and the execution of those opcodes, is handled by the Zend engine.
In addition to implementing the core functions of PHP, the Zend engine also provides a set of interfaces that allow PHP to be used in more scenarios, such as command line environments, Web environments, etc.
The architecture diagram is roughly divided into four parts.
1) Zend engine: The lexical and syntax analysis, AST compilation and opcode execution introduced above are all implemented in the Zend engine. PHP's variable design, memory management, process management, etc. are also implemented at the engine layer. The engine provides basic services for PHP, and PHP's reliability and performance rely on it. The extensibility of the Zend engine is also one of the important reasons for PHP's large-scale adoption.
2) PHP layer: The Zend engine provides basic capabilities for PHP (such as memory allocation and recycling), while interactions with the outside world are handled by the PHP layer.
3) SAPI: SAPI is the abbreviation of Server API, which includes the common cli SAPI and fpm SAPI. PHP defines input/output specifications, and any party that interacts with PHP according to these specifications can be called a Server.
4) Extension part: The Zend engine provides core capabilities and interface specifications. Extensions developed on this basis give PHP code richer options in both performance and functionality.
3. PHP source code directory
sapi directory source code
The sapi directory is the abstraction of the input and output layers, and is the specification through which PHP provides services to the outside world.
The input to a PHP program can be the standard input of the command line or a network request based on the CGI/FastCGI protocol. In the same way, the output can be written to the standard output of the command line or returned to the client as a network response based on the CGI/FastCGI protocol.
The command line mode corresponds to the binary program bin/php. The built-in module mode does not need a binary program of its own and can be called by Apache or any C/C++ program as an ordinary function. The CGI mode corresponds to the binary program bin/php-cgi, and the FastCGI mode corresponds to the binary program sbin/php-fpm.
Several commonly used SAPIs:
1) apache2handler: an Apache extension. It is compiled into a dynamic link library and configured under Apache. When an HTTP request reaches Apache, the library is called according to the configuration, executes the PHP code, and completes the interaction with PHP.
2) cgi-fcgi: after compilation, an executable program that supports the CGI protocol is generated. The web server (usually Apache or Nginx) passes the request to the CGI process through the CGI protocol. The process executes the code, returns the result to the web server, and exits.
3) fpm-fcgi: fpm stands for FastCGI Process Manager, the FastCGI process manager officially provided by PHP. Taking Nginx as an example, when an HTTP request is sent to the Nginx server, Nginx hands the request to the php-fpm process for processing according to the FastCGI protocol.
4) cli: short for Command Line Interface, PHP’s command line interactive interface.
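The input/output abstraction that the SAPI layer provides can be pictured as an interface with interchangeable implementations. The Python sketch below is purely conceptual: the ToySapi classes, their two methods and the run_script function are hypothetical and far simpler than PHP's real SAPI module structure.

```python
import io


class ToySapi:
    """Hypothetical interface: how the engine reads input and writes output."""
    name = "abstract"

    def read_input(self):
        raise NotImplementedError

    def write_output(self, data):
        raise NotImplementedError


class ToyCliSapi(ToySapi):
    """Reads from a standard-input-like stream and writes to a standard-output-like stream."""
    name = "cli"

    def __init__(self, stdin, stdout):
        self.stdin, self.stdout = stdin, stdout

    def read_input(self):
        return self.stdin.read()

    def write_output(self, data):
        self.stdout.write(data)


class ToyFpmSapi(ToySapi):
    """Takes input from a request body and collects output as a network response."""
    name = "fpm-fcgi"

    def __init__(self, request_body):
        self.request_body = request_body
        self.response = []

    def read_input(self):
        return self.request_body

    def write_output(self, data):
        self.response.append(data)


def run_script(sapi):
    """Stand-in for the engine: it only ever talks to the SAPI interface."""
    sapi.write_output(f"[{sapi.name}] received: {sapi.read_input().strip()}")


cli = ToyCliSapi(io.StringIO("hello\n"), io.StringIO())
run_script(cli)
print(cli.stdout.getvalue())

fpm = ToyFpmSapi("hello")
run_script(fpm)
print("".join(fpm.response))
```

The same run_script function serves both environments, which is the point of the abstraction: the engine does not need to know whether it was started from a terminal or by a web server.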
Zend directory source code
The Zend directory contains the core code of PHP.
1. Memory management module
2. Garbage collection
3. Array implementation
main directory source code
The main directory is the glue between the SAPI layer and the Zend layer.
The Zend layer implements the compilation and execution of PHP scripts, and the sapi layer implements the abstraction of input and output. The main directory links the two: it parses the SAPI request to determine the script file to execute and its parameters, and at startup it completes the necessary initialization before calling the Zend engine.
ext directory source code
ext is the directory for PHP extensions. Commonly used families of functions, such as the array, string and pdo functions, are defined here.
TSRM directory source code
In the early days, PHP mostly ran in a single-process, single-thread model. The thread safety mechanism ZTS (Zend Thread Safety) was only introduced later.
TSRM is the abbreviation of Thread Safe Resource Manager.
The thread safety mechanism mainly exists to protect shared resources. PHP's approach is simple and intuitive: in a multi-threaded environment, each thread is given its own independent copy of the global variables. In the implementation, TSRM allocates an independent, auto-incrementing ID to each thread (taking a lock before the allocation), and that ID serves as the index of the thread's global variable memory area. All later global variable accesses go through this index, which keeps the threads completely independent of one another.
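The allocation scheme described above can be modeled in Python as follows. This is a conceptual sketch, not TSRM's implementation: ToyResourceManager and its methods are hypothetical, and keeping each thread's ID in thread-local storage is an assumption of this sketch.

```python
import threading


class ToyResourceManager:
    """Gives every thread its own copy of the globals, indexed by an auto-increment ID."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._lock = threading.Lock()
        self._next_id = 0
        self._tables = []                # index = thread ID -> that thread's globals
        self._local = threading.local()  # remembers the ID of the calling thread

    def thread_id(self):
        if not hasattr(self._local, "id"):
            with self._lock:  # lock before allocating a new ID and table
                self._local.id = self._next_id
                self._next_id += 1
                self._tables.append(dict(self._defaults))
        return self._local.id

    def globals(self):
        # After allocation, each thread only touches its own table.
        return self._tables[self.thread_id()]


manager = ToyResourceManager({"counter": 0})
results = {}


def worker(increments):
    for _ in range(increments):
        manager.globals()["counter"] += 1
    results[manager.thread_id()] = manager.globals()["counter"]


threads = [threading.Thread(target=worker, args=(n,)) for n in (1, 2, 3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(results.values()))  # prints [1, 2, 3]
```

Each worker sees only its own counter, so the totals never mix even though every thread uses the same "global" name. The lock is needed only while an ID is being allocated, not on ordinary accesses.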