The execution principle of PHP7 language (PHP7 source code analysis)

藏色散人 (reposted), 2019-03-20 11:04:50

We commonly use many high-level languages; well-known ones include C/C++, Python, PHP, Go, and Pascal. By the way they run, these languages fall roughly into two groups: compiled languages and interpreted languages.

Compiled languages include C/C++, Pascal, and Go. Compilation here means that, before the program is executed, its source code is "translated" into assembly language and then further compiled into an object file for the target software and hardware environment. The tool that does this work is called a compiler. An interpreted language is "translated" into machine language while the program is running, and the translation is repeated on every run, so execution efficiency is lower. The program responsible for "translating" the source code of an interpreted language is called an interpreter.

Let's discuss in more detail how compiled and interpreted languages operate.

1. Compiled Language and Interpreted Language

We know that a piece of C code must be preprocessed, compiled, assembled, and linked before it becomes an executable binary file. Take hello.c as an example:

#include <stdio.h>

int main(void) {
    printf("hello world");
    return 0;  /* 0 signals successful termination */
}

For this C code, main is the program entry function, and its function is to print the string "hello world" to the screen. The compilation and execution process is shown in Figure 1.

Figure 1 Schematic diagram of compiled language execution

Step 1: Preprocess. The C code is preprocessed (dependency handling, macro replacement, and so on). In the example above, the #include directive is replaced during this stage.

Step 2: Compile. The compiler translates the C program into an assembly language program. A single C statement usually corresponds to several lines of assembly code. The compiler also optimizes the program and produces the target assembly program.

Step 3: Assemble. The assembler turns the compiled assembly program into the object file hello.o.

Step 4: Link. Programs often depend on code in other object files or libraries, such as the printf() function in the sample program, which lives in a library and must be linked in by a linker (such as the Unix linker ld).

For compiled languages such as C, every code change must go through the steps above again before the program can run.

We distinguish compiled languages from interpreted languages mainly by when the source code is compiled into CPU instructions for the target platform. For a compiled language, the output of compilation is already instructions for the current CPU architecture. An interpreted language is first compiled into intermediate code, which the language's virtual machine then translates into instructions for a specific CPU architecture in order to execute it. Because this translation into target-platform instructions happens at run time, interpreted languages are often said to be "slow".

In PHP7, the source code first undergoes lexical analysis, which cuts it into multiple string units called tokens. A single token cannot express complete semantics, so the syntax analysis stage converts the tokens into an abstract syntax tree (AST). The abstract syntax tree is then converted into instructions for execution. In PHP these instructions are called opcodes (explained in more detail later; for now, readers can think of them as analogous to CPU instructions).

Up to the generation of the AST, compiled and interpreted languages go through a similar process. The differences begin after the abstract syntax tree.

Figure 2 shows the simplified steps by which PHP code is executed (unless otherwise noted, PHP in this chapter means PHP7). The left branch of the last step is the path taken by a compiled language.

Figure 2 Execution of an interpreted language, using PHP as an example

Step 1: Lexical analysis turns the source code into tokens;

Step 2: The parser generates an abstract syntax tree (AST) from the tokens;

Step 3: The abstract syntax tree is converted into opcodes (the opcode instruction set), which PHP interprets and executes.
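The three steps above can be sketched end to end for one statement. This is only a toy under stated assumptions: it uses PHP's real token_get_all() for step 1, but the helpers toyLex, toyParse, toyCompile, and toyRun, the array layouts, and the single supported pattern (echo followed by a quoted string) are all hypothetical and are not how the Zend engine is written.

```php
<?php
// Toy pipeline for one statement of the form: echo "<string>";
// Every function name and array layout here is hypothetical.

// Step 1: lexical analysis with PHP's built-in tokenizer.
function toyLex(string $code): array
{
    // Drop whitespace tokens; the toy parser does not need them.
    $tokens = array_filter(token_get_all($code), function ($t) {
        return !(is_array($t) && $t[0] === T_WHITESPACE);
    });
    return array_values($tokens);
}

// Step 2: syntax analysis for the pattern T_OPEN_TAG T_ECHO string ';'.
function toyParse(array $tokens): array
{
    $ok = count($tokens) === 4
        && is_array($tokens[0]) && $tokens[0][0] === T_OPEN_TAG
        && is_array($tokens[1]) && $tokens[1][0] === T_ECHO
        && is_array($tokens[2]) && $tokens[2][0] === T_CONSTANT_ENCAPSED_STRING
        && $tokens[3] === ';';
    if (!$ok) {
        throw new InvalidArgumentException('toy parser only accepts: echo "<string>";');
    }
    // Strip the surrounding quotes; escape sequences are ignored in this sketch.
    $value = substr($tokens[2][1], 1, -1);
    return ['kind' => 'ECHO', 'children' => [['kind' => 'STRING', 'value' => $value]]];
}

// Step 3a: turn the tree into a flat list of instructions.
function toyCompile(array $ast): array
{
    return [
        ['op' => 'ECHO', 'operand' => $ast['children'][0]['value']],
        ['op' => 'RETURN', 'operand' => 1],
    ];
}

// Step 3b: interpret the instructions one by one, collecting the output.
function toyRun(array $opcodes): string
{
    $out = '';
    foreach ($opcodes as $ins) {
        if ($ins['op'] === 'ECHO') {
            $out .= $ins['operand'];
        } elseif ($ins['op'] === 'RETURN') {
            break;
        }
    }
    return $out;
}

echo toyRun(toyCompile(toyParse(toyLex('<?php echo "hello world";')))), "\n";
```

Keeping each stage as a separate function mirrors the figure: the parser sees only tokens, the compiler sees only the tree, and the interpreter sees only instructions.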

Next, building on these basic steps, we refine the execution principle of the PHP language and try to establish a clearer understanding.

2. Overview of the execution principle of PHP7

First, we expand on the execution process of a PHP7 program described above; see Figure 3.

Figure 3 Execution process of a program written in PHP7

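Step 1: Lexical analysis converts the PHP code into meaningful tokens. The lexer for this step is implemented with re2c.

Step 2: Syntax analysis builds an abstract syntax tree from the tokens, for code that conforms to the grammar rules. The parser is implemented with Bison. The grammar rules are expressed in Backus-Naur Form (BNF); Bison generates the abstract syntax tree using a state machine, state transition tables, and a series of stack push and pop operations.

Step 3: The abstract syntax tree from the previous step is turned into the corresponding opcodes, which the virtual machine executes. Opcodes are a set of instruction identifiers defined by PHP7, and each instruction has a corresponding handler (processing function). When the virtual machine runs an opcode, it looks up the handler behind that opcode and performs the actual work. For the familiar echo statement, for example, the corresponding opcode is ZEND_ECHO.

Note: Lexical analysis and syntax analysis are described separately here to make them easier to understand. In practice, for efficiency reasons, the two processes are not completely independent.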

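Next, we use a piece of sample code to build an initial understanding of how PHP7 works.

The sample code is as follows:

<?php
echo "hello world";

As Figure 3 shows, this code is first split into tokens.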

1. Token

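Tokens are the meaningful units that PHP code is split into. The PHP7 version covered in this book has 137 kinds of token, defined in the file zend_language_parser.h:

/* Tokens.  */
#define END 0
#define T_INCLUDE 258
#define T_INCLUDE_ONCE 259
…
#define T_ERROR 392

Readers interested in the meaning of more tokens can refer to the appendix of 《PHP 7底层设计与源码实现》.

PHP provides the token_get_all() function to obtain the tokens that PHP code is split into, so you can take a rough look at them before digging into the source code. For example: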

/home/vagrant/php7/bin/php -r 'print_r(token_get_all("<?php echo \"hello world\";"));'

The output is:

Array
(
   [0] => Array
       (
           [0] => 379
           [1] => <?php
           [2] => 1
       )
   [1] => Array
       (
           [0] => 328
           [1] => echo
           [2] => 1
       )
   [2] => Array
       (
           [0] => 382
           [1] =>
           [2] => 1
       )
   [3] => Array
       (
           [0] => 323
           [1] => "hello world"
           [2] => 1
       )
   [4] => ;
)

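In the output above, each member array of the two-dimensional array holds three values: the first is the token's enumerated value, the second is the token's original string content, and the third is the line number in the code. As can be seen, the lexer splits the code into the following tokens:

1) The text "<?php" corresponds to the token T_OPEN_TAG, whose value is 379:

#define T_OPEN_TAG 379

It is easy to see that this is the opening tag of PHP code, that is, <?php.

2) echo corresponds to the token T_ECHO:

#define T_ECHO 328

3) The space in the source code corresponds to the token T_WHITESPACE, whose value is 382:

#define T_WHITESPACE 382

4) The string "hello world" corresponds to the token T_CONSTANT_ENCAPSED_STRING, whose value is 323:

#define T_CONSTANT_ENCAPSED_STRING 323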
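The numeric values listed here (379, 328, and so on) belong to the PHP build used in this article and can differ between PHP versions, so it is often easier to look at symbolic names. A minimal sketch, assuming the tokenizer extension is enabled: token_get_all() and token_name() are standard PHP functions, while describeTokens is a hypothetical helper introduced only for illustration.

```php
<?php
// Tokenize a snippet and return symbolic token names instead of numeric IDs.
function describeTokens(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Array tokens are [id, text, line]; single characters such as ';'
        // are returned as plain strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

print_r(describeTokens('<?php echo "hello world";'));
```

For the sample statement this yields T_OPEN_TAG, T_ECHO, T_WHITESPACE, T_CONSTANT_ENCAPSED_STRING and the plain string ";", matching the five entries of the print_r output shown earlier.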

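As shown, tokens are individual "word chunks", but a chunk on its own cannot express complete semantics; the chunks still have to be organized and connected according to rules. The parser is that organizer: it checks the syntax, matches tokens, and relates tokens to one another.

In PHP7, the product of this organization is the abstract syntax tree (Abstract Syntax Tree, AST).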

2. AST

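The AST is a new feature of PHP7. In earlier versions, the execution of PHP code had no AST generation step. PHP7's support for the abstract syntax tree decouples the PHP parser from the compiler, which noticeably improves maintainability.

As the name suggests, an abstract syntax tree has a tree structure. AST nodes come in many types, corresponding to different PHP syntax. For the purposes of this chapter, a node type can be regarded as an abstraction of a grammar rule. For example, an assignment statement produces an AST node of type ZEND_AST_ASSIGN, and the left and right operands of the assignment become children of that ZEND_AST_ASSIGN node. Node relationships like these build up the abstract syntax tree.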
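To make the parent and child relationship concrete, here is a hedged sketch that models the tree for `$a = 1 + 2;` as nested PHP arrays and prints it. Only ZEND_AST_ASSIGN comes from the text above; the other kind names (ZEND_AST_VAR, ZEND_AST_BINARY_OP, ZEND_AST_ZVAL), the array layout, and the dumpAst helper are assumptions made for illustration and do not reflect the engine's actual C structures.

```php
<?php
// Hypothetical, simplified AST for `$a = 1 + 2;` as nested arrays.
$ast = [
    'kind' => 'ZEND_AST_ASSIGN',
    'children' => [
        // Left operand: the variable being assigned to.
        ['kind' => 'ZEND_AST_VAR', 'children' => [
            ['kind' => 'ZEND_AST_ZVAL', 'value' => 'a'],
        ]],
        // Right operand: the expression whose value is assigned.
        ['kind' => 'ZEND_AST_BINARY_OP', 'op' => '+', 'children' => [
            ['kind' => 'ZEND_AST_ZVAL', 'value' => 1],
            ['kind' => 'ZEND_AST_ZVAL', 'value' => 2],
        ]],
    ],
];

// Render the tree as text, two spaces of indentation per level.
function dumpAst(array $node, int $depth = 0): string
{
    $line = str_repeat('  ', $depth) . $node['kind'];
    if (isset($node['op'])) {
        $line .= ' (' . $node['op'] . ')';
    }
    if (array_key_exists('value', $node)) {
        $line .= ' ' . var_export($node['value'], true);
    }
    $out = $line . "\n";
    foreach ($node['children'] ?? [] as $child) {
        $out .= dumpAst($child, $depth + 1);
    }
    return $out;
}

echo dumpAst($ast);
```

The printed tree has the assignment node at the root with its two operands indented beneath it, which is the shape the paragraph above describes.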

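Readers eager for a preview can jump straight to Chapter 13 of the book, on the implementation of functions, where a figure depicts the abstract syntax tree generated from a simple piece of PHP code.

Here we recommend that readers take a look at the PHP-Parser tool, which can be used to view the AST generated from PHP code.

Note: PHP-Parser is a tool written by nikic, one of the authors of the PHP7 core, that generates an AST from PHP source code. The source is at https://github.com/nikic/PHP-...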

3. Opcodes

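The AST serves as temporary storage between the source code and the intermediate code; it still has to be converted into opcodes before the engine can execute it directly. An opcode is a single instruction, and opcodes, the collection of them, are the intermediate code in PHP's execution process, similar to bytecode in Java. Once generated, they are executed by the virtual machine.

A common optimization in PHP projects is "enabling Opcache", which refers to caching these opcodes (Opcodes Cache). By skipping the stage from source code to opcodes, the engine can execute the cached opcodes directly, which improves performance.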
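Whether the opcode cache is actually active can be checked from PHP itself. The sketch below uses the standard opcache_get_status() function; opcacheSummary is a hypothetical helper, and the wording of its messages is illustrative. Note that on the command line the cache is commonly disabled unless opcache.enable_cli is turned on, so the result depends on the environment.

```php
<?php
// Report whether the opcode cache is available and active for this process.
function opcacheSummary(): string
{
    if (!function_exists('opcache_get_status')) {
        return 'Opcache extension is not loaded';
    }
    // Passing false skips per-script details; the call returns false
    // when the cache is disabled for this process.
    $status = opcache_get_status(false);
    if ($status === false || empty($status['opcache_enabled'])) {
        return 'Opcache is loaded but not enabled';
    }
    $stats = $status['opcache_statistics'];
    return sprintf(
        'Opcache enabled: %d cached scripts, %d hits, %d misses',
        $stats['num_cached_scripts'],
        $stats['hits'],
        $stats['misses']
    );
}

echo opcacheSummary(), "\n";
```

A rising hit count across requests indicates that scripts are being served from cached opcodes rather than recompiled from source.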

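With the vld extension, you can see directly which opcodes a piece of PHP code generates:

php -dvld.active=1 hello.php

After filtering and tidying, the corresponding opcodes are:

line     op              
 1      ECHO            
 2      RETURN

In the source implementation, the opcodes and handlers generated for the code above are:

ZEND_ECHO    // handler: ZEND_ECHO_SPEC_CONST_HANDLER
ZEND_RETURN  // handler: ZEND_RETURN_SPEC_CONST_HANDLER

As shown, the handler for ZEND_ECHO is ZEND_ECHO_SPEC_CONST_HANDLER. What this handler implements is the expected output of the "hello world" statement.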
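The relationship between an opcode and its handler can be illustrated with a small dispatch table. This is a toy model only: the real handlers are C functions inside the Zend engine, and the $handlers table, the runOpArray function, and the [opcode, operand] instruction layout below are hypothetical names and shapes chosen for the sketch.

```php
<?php
// Toy virtual machine: each opcode name maps to a handler closure.
// A handler returns true to continue and false to stop execution.
$handlers = [
    'ZEND_ECHO' => function ($operand, array &$state) {
        $state['output'] .= $operand;   // stands in for writing to the output layer
        return true;
    },
    'ZEND_RETURN' => function ($operand, array &$state) {
        $state['return'] = $operand;    // remember the return value
        return false;
    },
];

// Walk the instruction list, look up the handler for each opcode, and call it.
function runOpArray(array $opArray, array $handlers): array
{
    $state = ['output' => '', 'return' => null];
    foreach ($opArray as $instruction) {
        list($opcode, $operand) = $instruction;
        if (!isset($handlers[$opcode])) {
            throw new RuntimeException("No handler for opcode $opcode");
        }
        if (!$handlers[$opcode]($operand, $state)) {
            break;
        }
    }
    return $state;
}

// The two instructions vld reported for hello.php, in toy form.
$state = runOpArray([['ZEND_ECHO', 'hello world'], ['ZEND_RETURN', 1]], $handlers);
echo $state['output'], "\n";
```

Looking handlers up in a table keeps the execution loop independent of any particular instruction, so supporting a new opcode in this toy means adding one table entry.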

Statement:
This article is reproduced from imooc.com. If there is any infringement, please contact admin@php.cn to have it removed.