# Analyzing the Zend VM engine from PHP syntactic sugar
## 1.

Let's start with a piece of syntactic sugar from PHP 5.3+. Usually we write a fallback expression as a full ternary, for example `$b = $a ? $a : 1;`.

With the sugar, the middle operand can be omitted: `$b = $a ?: 1;`.

The second form is more concise, but it is usually better not to lean too heavily on syntactic sugar, especially when it is easy to misread. Take PHP 7's new `??` operator: `$a ?? 1` is equivalent to `isset($a) ? $a : 1;`.

Do you easily confuse `?:` and `??`? If so, I suggest you not use them at all. Readable, maintainable code matters more.
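
To make the difference between `?:` and `??` concrete, here is a minimal C sketch of the two operators' semantics. The `toy_var` type and both function names are hypothetical, invented for this illustration. The sketch models a PHP variable only as "set or not" plus an integer value, and it ignores the notice PHP raises when `?:` reads an undefined variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PHP variable: it may be unset, and if set it
 * holds an integer (0 is falsy, anything else is truthy). */
typedef struct {
    bool is_set;
    long value;
} toy_var;

/* Models `$v ?: $fallback`: keep the value only if it is truthy. */
long short_ternary(toy_var v, long fallback)
{
    return (v.is_set && v.value != 0) ? v.value : fallback;
}

/* Models `$v ?? $fallback`: keep the value whenever the variable is set,
 * even if the value itself is falsy. */
long null_coalesce(toy_var v, long fallback)
{
    return v.is_set ? v.value : fallback;
}
```

The two only disagree for a variable that is set but falsy: with a value of 0 and a fallback of 1, `short_ternary` yields 1 while `null_coalesce` yields 0.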
Syntactic sugar is not the focus of this article. Our purpose is to start with syntactic sugar and talk about the parsing principle of Zend VM.
## 2.
The PHP source branch analyzed here is remotes/origin/PHP-5.6.14. For how to view opcodes with vld, please read an article I wrote earlier.

    compiled vars:  !0 = $a, !1 = $b
    line     #* E I O op                 fetch      ext  return  operands
    ...
    branch: #  0; line:     2-    4; sop:     0; eop:     4; out1:  -2
    path #1: 0,

The corresponding grammar rule, vim Zend/zend_language_parser.y +834:
~~~.bash
    |   expr '?' ':' { zend_do_jmp_set(&$1, &$2, &$3 TSRMLS_CC); }
        expr         { zend_do_jmp_set_else(&$$, &$5, &$2, &$3 TSRMLS_CC); }
~~~
If you like, you can redefine the `?:` sugar yourself. The grammar is written as BNF rules and processed by bison; if you are interested, search for the background material and dig deeper.

From the vld opcode output we can tell that zend_do_jmp_set_else is executed. Its code is in Zend/zend_compile.c:
~~~.c
void zend_do_jmp_set_else(znode *result, const znode *false_value, const znode *jmp_token, const znode *colon_token TSRMLS_DC)
{
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    SET_NODE(opline->result, colon_token);
    if (colon_token->op_type == IS_TMP_VAR) {
        if (false_value->op_type == IS_VAR || false_value->op_type == IS_CV) {
            CG(active_op_array)->opcodes[jmp_token->u.op.opline_num].opcode = ZEND_JMP_SET_VAR;
            CG(active_op_array)->opcodes[jmp_token->u.op.opline_num].result_type = IS_VAR;
            opline->opcode = ZEND_QM_ASSIGN_VAR;
            opline->result_type = IS_VAR;
        } else {
            opline->opcode = ZEND_QM_ASSIGN;
        }
    } else {
        opline->opcode = ZEND_QM_ASSIGN_VAR;
    }
    opline->extended_value = 0;
    SET_NODE(opline->op1, false_value);
    SET_UNUSED(opline->op2);
    GET_NODE(result, opline->result);

    CG(active_op_array)->opcodes[jmp_token->u.op.opline_num].op2.opline_num = get_next_op_number(CG(active_op_array));

    DEC_BPC(CG(active_op_array));
}
~~~
## 3.
The two key opcodes are ZEND_JMP_SET_VAR and ZEND_QM_ASSIGN_VAR. How do we keep reading the code from here? Let's first talk about PHP's opcodes.

PHP 5.6 has 167 opcodes, which means it can perform 167 different kinds of operation. They are listed in the official documentation.

Internally, PHP represents an opcode with the _zend_op structure, vim Zend/zend_compile.h +111:

    struct _zend_op {
        opcode_handler_t handler;
        znode_op op1;
        znode_op op2;
        znode_op result;
        ulong extended_value;
        uint lineno;
        zend_uchar opcode;
        zend_uchar op1_type;
        zend_uchar op2_type;
        zend_uchar result_type;
    };

PHP 7.0 is slightly different. The main change, made with 64-bit systems in mind, is that uint is replaced with uint32_t, so the width in bytes is stated explicitly.

Think of an opcode as a calculator: it accepts only two operands (op1, op2), performs one operation (handler, such as addition, subtraction, multiplication or division), and returns a result (result) to you, with a little extra handling for cases such as arithmetic overflow (extended_value).

Zend's VM works exactly this way for every opcode. The handler (a function pointer) points to the processing function, a C function that contains the code for executing the opcode and uses op1 and op2 as its operands. When it finishes, it produces a result (result) and sometimes attaches an extra piece of information (extended_value).
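
The calculator analogy can be sketched in a few lines of C. This is a toy under stated assumptions, not Zend's real layout: `calc_op`, `calc_add_handler` and `calc_execute` are hypothetical names, the operands are plain `long` values instead of `znode_op`, and using `extended_value` as an overflow flag simply follows the analogy above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for _zend_op (not the real layout). */
typedef struct calc_op calc_op;
typedef void (*calc_handler)(calc_op *op);

struct calc_op {
    calc_handler handler;          /* C function that executes the opcode */
    long op1;                      /* first operand */
    long op2;                      /* second operand */
    long result;                   /* written by the handler */
    unsigned long extended_value;  /* extra info; in this toy, 1 means overflow */
};

/* Toy ADD handler: result = op1 + op2, flagging overflow instead of wrapping. */
void calc_add_handler(calc_op *op)
{
    int overflow = (op->op2 > 0 && op->op1 > LONG_MAX - op->op2) ||
                   (op->op2 < 0 && op->op1 < LONG_MIN - op->op2);

    op->extended_value = overflow ? 1UL : 0UL;
    op->result = overflow ? 0 : op->op1 + op->op2;
}

/* Run one op through its function pointer; the caller does not need to
 * know which operation the op stands for. */
long calc_execute(calc_op *op)
{
    assert(op != NULL && op->handler != NULL);
    op->handler(op);
    return op->result;
}
```

Filling a `calc_op` with `calc_add_handler`, 2 and 3 and passing it to `calc_execute` stores 5 in `result`. The point of the design is that the executor only ever calls `op->handler`, so adding a new operation means adding a new handler, not changing the executor.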
Let's use the opcode ZEND_JMP_SET_VAR from our example to illustrate, vim Zend/zend_vm_def.h +4942:

    ZEND_VM_HANDLER(158, ZEND_JMP_SET_VAR, CONST|TMP|VAR|CV, ANY)
    {
        USE_OPLINE
        zend_free_op free_op1;
        zval *value, *ret;

        SAVE_OPLINE();
        value = GET_OP1_ZVAL_PTR(BP_VAR_R);

        if (i_zend_is_true(value)) {
            if (OP1_TYPE == IS_VAR || OP1_TYPE == IS_CV) {
                Z_ADDREF_P(value);
                EX_T(opline->result.var).var.ptr = value;
                EX_T(opline->result.var).var.ptr_ptr = &EX_T(opline->result.var).var.ptr;
            } else {
                ALLOC_ZVAL(ret);
                INIT_PZVAL_COPY(ret, value);
                EX_T(opline->result.var).var.ptr = ret;
                EX_T(opline->result.var).var.ptr_ptr = &EX_T(opline->result.var).var.ptr;
                if (!IS_OP1_TMP_FREE()) {
                    zval_copy_ctor(EX_T(opline->result.var).var.ptr);
                }
            }
            FREE_OP1_IF_VAR();
    #if DEBUG_ZEND>=2
            printf("Conditional jmp to %d\n", opline->op2.opline_num);
    #endif
            ZEND_VM_JMP(opline->op2.jmp_addr);
        }

        FREE_OP1();
        CHECK_EXCEPTION();
        ZEND_VM_NEXT_OPCODE();
    }

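i_zend_is_true checks whether the operand is true, so ZEND_JMP_SET_VAR is a kind of conditional assignment. I trust that much is easy to follow; now for the key points.

Note that `zend_vm_def.h` is not a C header that can be compiled directly. It is better described as a template. The header that actually gets compiled is `zend_vm_execute.h` (a file of more than 45,000 lines), which is not written by hand but generated by the PHP script `zend_vm_gen.php` after it parses `zend_vm_def.h`. (Amusing, isn't it? Chicken or egg: without PHP, where would this script come from?) My guess is that this is a later development and that early PHP versions did not work this way.

From the ZEND_JMP_SET_VAR code above, the operand types `CONST|TMP|VAR|CV` ultimately produce handler functions of different types but identical functionality:

    static int ZEND_FASTCALL ZEND_JMP_SET_VAR_SPEC_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    static int ZEND_FASTCALL ZEND_JMP_SET_VAR_SPEC_TMP_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    static int ZEND_FASTCALL ZEND_JMP_SET_VAR_SPEC_VAR_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    static int ZEND_FASTCALL ZEND_JMP_SET_VAR_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)

The purpose is to fix the handler at compile time and so improve run-time performance. Choosing by operand type at run time would also work, but it would perform worse. This approach does sometimes generate junk (seemingly useless) code, but there is no need to worry: the C compiler optimizes it further.

zend_vm_gen.php also accepts some options. The details are documented in the README file `Zend/README.ZEND_VM` in the PHP source.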
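
The idea of generating one handler per operand type can be sketched in plain C, with a macro standing in for the generator script. Everything here is hypothetical and much simpler than what `zend_vm_gen.php` emits: the `spec_*` names, the frame layout and the "is the operand true" operation are assumptions made for this illustration.

```c
#include <assert.h>

/* Operand kinds named after Zend's CONST/TMP/VAR/CV; values are arbitrary. */
enum spec_kind { SPEC_CONST = 0, SPEC_TMP, SPEC_VAR, SPEC_CV, SPEC_KIND_COUNT };

typedef struct {
    enum spec_kind kind;
    long constant;  /* used when kind == SPEC_CONST */
    int slot;       /* index into the matching table for the other kinds */
} spec_operand;

typedef struct {
    long tmps[4];
    long vars[4];
    long cvs[4];
} spec_frame;

/* Generic version: inspects the operand kind every time it runs. */
int spec_is_true_generic(const spec_frame *f, const spec_operand *o)
{
    switch (o->kind) {
    case SPEC_CONST: return o->constant != 0;
    case SPEC_TMP:   return f->tmps[o->slot] != 0;
    case SPEC_VAR:   return f->vars[o->slot] != 0;
    case SPEC_CV:    return f->cvs[o->slot] != 0;
    default:         return 0;
    }
}

typedef int (*spec_handler)(const spec_frame *f, const spec_operand *o);

/* The macro plays the role of the generator: one branch-free handler per kind. */
#define SPEC_IS_TRUE_HANDLER(SUFFIX, FETCH) \
    int spec_is_true_##SUFFIX(const spec_frame *f, const spec_operand *o) \
    { \
        (void)f; \
        return (FETCH) != 0; \
    }

SPEC_IS_TRUE_HANDLER(CONST, o->constant)
SPEC_IS_TRUE_HANDLER(TMP, f->tmps[o->slot])
SPEC_IS_TRUE_HANDLER(VAR, f->vars[o->slot])
SPEC_IS_TRUE_HANDLER(CV, f->cvs[o->slot])

/* Done once, when the op is compiled: pick the handler from the operand kind. */
spec_handler spec_select_handler(enum spec_kind kind)
{
    static const spec_handler table[SPEC_KIND_COUNT] = {
        spec_is_true_CONST, spec_is_true_TMP, spec_is_true_VAR, spec_is_true_CV
    };
    assert((int)kind >= 0 && (int)kind < SPEC_KIND_COUNT);
    return table[kind];
}
```

The generic function and the selected handler always agree on the answer. The difference is where the `switch` cost is paid: on every execution in the generic version, or once in `spec_select_handler` for the specialized ones.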
## 4.
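At this point we know how an opcode maps to its handler. But the overall picture has one more step, namely parsing: once parsing is done, how are all the opcodes chained together?

I won't go into the details of parsing. After it, there is one big array containing all the opcodes (calling it a linked list might be more accurate). As the code above shows, every handler calls ZEND_VM_NEXT_OPCODE() when it finishes, which fetches the next opcode and continues executing until the final exit. The loop code is at vim Zend/zend_vm_execute.h +337: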
~~~.c
ZEND_API void execute_ex(zend_execute_data *execute_data TSRMLS_DC)
{
    DCL_OPLINE
    zend_bool original_in_execution;

    original_in_execution = EG(in_execution);
    EG(in_execution) = 1;

    if (0) {
zend_vm_enter:
        execute_data = i_create_execute_data_from_op_array(EG(active_op_array), 1 TSRMLS_CC);
    }

    LOAD_REGS();
    LOAD_OPLINE();

    while (1) {
        int ret;
#ifdef ZEND_WIN32
        if (EG(timed_out)) {
            zend_timeout(0);
        }
#endif

        if ((ret = OPLINE->handler(execute_data TSRMLS_CC)) > 0) {
            switch (ret) {
                case 1:
                    EG(in_execution) = original_in_execution;
                    return;
                case 2:
                    goto zend_vm_enter;
                    break;
                case 3:
                    execute_data = EG(current_execute_data);
                    break;
                default:
                    break;
            }
        }
    }
    zend_error_noreturn(E_ERROR, "Arrived at end of main loop which shouldn't happen");
}
~~~
The macro definitions, vim Zend/zend_execute.c +1772:

    #define ZEND_VM_NEXT_OPCODE() \
        CHECK_SYMBOL_TABLES() \
        ZEND_VM_INC_OPCODE(); \
        ZEND_VM_CONTINUE()

    #define ZEND_VM_CONTINUE()   return 0
    #define ZEND_VM_RETURN()     return 1
    #define ZEND_VM_ENTER()      return 2
    #define ZEND_VM_LEAVE()      return 3

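The while is an infinite loop that executes one handler function per iteration. Apart from a few special cases, most handler functions end by calling ZEND_VM_NEXT_OPCODE() -> ZEND_VM_CONTINUE(), which returns 0, and the loop continues.

> Note: yield (coroutines) is one of the exceptions. It returns 1 and so returns straight out of the loop. We will analyze yield separately when we get the chance.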
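
The loop and its return codes can be modeled with a tiny C interpreter that runs the ops for an expression like `$b = $a ?: 1`. This is only a sketch: the `vm_*` names, the register array and the operand encoding are hypothetical, and the real engine's operand handling, memory management and ENTER/LEAVE codes are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modeled on ZEND_VM_CONTINUE() and ZEND_VM_RETURN(). */
#define VM_CONTINUE 0
#define VM_RETURN   1

typedef struct vm_state vm_state;
typedef int (*vm_handler)(vm_state *vm);

typedef struct {
    vm_handler handler;
    long op1;     /* register index or literal, depending on the handler */
    size_t op2;   /* jump target, used by vm_jmp_set only */
    int result;   /* register index that receives the result */
} vm_op;

struct vm_state {
    const vm_op *ops;  /* all ops of the script, in one array */
    size_t count;
    size_t pc;         /* index of the op being executed */
    long regs[3];      /* 0 = $a, 1 = temporary, 2 = $b */
};

/* If regs[op1] is truthy, copy it to the result and jump; else fall through. */
int vm_jmp_set(vm_state *vm)
{
    const vm_op *op = &vm->ops[vm->pc];
    if (vm->regs[op->op1] != 0) {
        vm->regs[op->result] = vm->regs[op->op1];
        vm->pc = op->op2;
    } else {
        vm->pc++;
    }
    return VM_CONTINUE;
}

/* Store the literal op1 in the result register. */
int vm_qm_assign(vm_state *vm)
{
    const vm_op *op = &vm->ops[vm->pc];
    vm->regs[op->result] = op->op1;
    vm->pc++;
    return VM_CONTINUE;
}

/* Copy register op1 into the result register. */
int vm_assign(vm_state *vm)
{
    const vm_op *op = &vm->ops[vm->pc];
    vm->regs[op->result] = vm->regs[op->op1];
    vm->pc++;
    return VM_CONTINUE;
}

/* The only handler here that ends the loop. */
int vm_return(vm_state *vm)
{
    (void)vm;
    return VM_RETURN;
}

/* The dispatch loop: call the current handler until one returns > 0. */
void vm_execute(vm_state *vm)
{
    while (1) {
        assert(vm->pc < vm->count);
        if (vm->ops[vm->pc].handler(vm) > 0) {
            return;
        }
    }
}

/* Build and run the ops for "b = a ?: fallback" and return b. */
long vm_run_short_ternary(long a, long fallback)
{
    const vm_op ops[] = {
        { vm_jmp_set,   0,        2, 1 },  /* 0: if (a) { tmp = a; goto 2; } */
        { vm_qm_assign, fallback, 0, 1 },  /* 1: tmp = fallback */
        { vm_assign,    1,        0, 2 },  /* 2: b = tmp */
        { vm_return,    0,        0, 0 },  /* 3: leave the loop */
    };
    vm_state vm = { ops, sizeof ops / sizeof ops[0], 0, { a, 0, 0 } };

    vm_execute(&vm);
    return vm.regs[2];
}
```

Calling `vm_run_short_ternary(0, 1)` falls through the jump and yields 1, while `vm_run_short_ternary(5, 1)` jumps over the fallback and yields 5. Every handler except `vm_return` reports `VM_CONTINUE`, so the loop itself never needs to know what any individual op does.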
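I hope that after reading the above you have a detailed picture of how the PHP Zend engine works. Building on this analysis, let's now talk briefly about PHP optimization.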
## 5. PHP optimization notes

### 5.1 echo output

    $foo = 'foo';
    $bar = 'bar';
    echo $foo . $bar;

View the opcodes with vld:

    number of ops:  5
    compiled vars:  !0 = $foo, !1 = $bar
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                     !0, 'foo'
       3     ...      ASSIGN                                     ...
    ...
    branch: ...; eop:     4; out1:  -2
    path #1: 0,

ZEND_CONCAT concatenates the values of $foo and $bar, saves the result in the temporary variable ~2, and then echoes it. This involves allocating memory for the temporary variable, which must be released after use, and calling the concatenation function to do the work.

If you change it to this:

    $foo = 'foo';
    $bar = 'bar';
    echo $foo, $bar;

The corresponding opcodes:

    number of ops:  5
    compiled vars:  !0 = $foo, !1 = $bar
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                     ...
       3     1        ASSIGN                                     ...
       ...            ECHO                                       ...
       5     4        ...
    branch: ...; eop:     4; out1:  -2
    path #1: 0,

There is no need to allocate memory or call the concatenation function. Isn't that more efficient? If you want to understand the concatenation process, you can use the approach in this article to find the handler for the ZEND_CONCAT opcode. It does quite a lot.
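
The same trade-off exists in plain C, which makes the extra work easy to see. The two functions below are a hypothetical analogy, not what the Zend handlers do internally: `echo_concat` mirrors `echo $foo . $bar;` and `echo_each` mirrors `echo $foo, $bar;`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Like `echo $foo . $bar;`: build a temporary string, print it, free it. */
int echo_concat(FILE *out, const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *tmp = malloc(la + lb + 1);   /* the temporary variable */
    int rc;

    if (tmp == NULL) {
        return -1;
    }
    memcpy(tmp, a, la);
    memcpy(tmp + la, b, lb + 1);       /* also copies the terminating '\0' */
    rc = (fputs(tmp, out) == EOF) ? -1 : 0;
    free(tmp);                         /* must be released after use */
    return rc;
}

/* Like `echo $foo, $bar;`: print each argument directly, no temporary. */
int echo_each(FILE *out, const char *a, const char *b)
{
    if (fputs(a, out) == EOF) {
        return -1;
    }
    if (fputs(b, out) == EOF) {
        return -1;
    }
    return 0;
}
```

Both functions write the same bytes to `out`, but only `echo_concat` pays for an allocation, two copies and a free on every call.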
### 5.2 define() and const
The const keyword was introduced in PHP 5.3. It is very different from define(), and closer in meaning to `#define` in C.

* define() is a function call and has function call overhead.
* const is a keyword that generates an opcode directly; it can be resolved at compile time and needs no dynamic allocation at run time.

The value of a const is fixed and cannot be changed at run time, which is why it resembles #define in C: it is determined at compile time, and the types of value it accepts are restricted.

Let's look at the code directly and compare the opcodes.

define example:

    compiled vars:  none
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   SEND_VAL                                   ...
             1        SEND_VAL                                   'foo'
             2        DO_FCALL                                   'define'
       3     3        FETCH_CONSTANT                             ...
    ...

const example:

    const FOO = 'foo';
    echo FOO;

const opcode:

    number of ops:  4
    compiled vars:  none
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   DECLARE_CONST                              ...
       ...   1        FETCH_CONSTANT                             ...
    ...
       4     3      > RETURN                                     1

### 5.3 The cost of dynamic function calls

    <?php
    function foo() {}
    foo();

opcode:

    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       3     1        DO_FCALL                      0            'foo'
       4     2      > RETURN                                     1

Code with a dynamic call:

    <?php
    function foo() {}
    $a = 'foo';
    $a();

opcode:

    number of ops:  5
    compiled vars:  !0 = $a
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       3     1        ASSIGN                                     !0, 'foo'
       4     2        INIT_FCALL_BY_NAME                         ...
    ...
       5     4      > RETURN                                     ...

Take a look at what INIT_FCALL_BY_NAME does. The code is too long, so I won't list it here. Although dynamic features are convenient, they inevitably cost performance, so weigh the pros and cons before using them.

### 5.4 The cost of delayed class declaration

With the parent class declared first (`class bar {}` followed by `class foo extends bar {}`), the opcodes are:

    compiled vars:  none
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       3     1        NOP
             2        NOP
       4     3      > RETURN                                     1

Change the statement order:

    <?php
    class foo extends bar {}
    class bar {}

Corresponding opcodes:

    number of ops:  4
    compiled vars:  none
    line     #* E I O op                 fetch      ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   FETCH_CLASS                                ...
             1        DECLARE_INHERITED_CLASS                    '%00foo%2FUsers%2Fqisen%2Ftmp%2Fvld.php0x103d58020', 'foo'
       3     2        NOP
       4     3      > RETURN                                     ...

Being a dynamic language, PHP defers the class declaration until run time in this case. If you don't pay attention, you may step into this trap.

So once we understand how the Zend VM works, we should take more care to use fewer dynamic features. Where they are dispensable, don't use them.
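
The cost of the dynamic call in 5.3 comes from resolving a name at run time, which a small C analogy can show. The `fn_*` names and the linear-scan table are hypothetical simplifications for this sketch; the engine's own function table is a hash table and its lookup does considerably more.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fn_ptr)(void);

/* The function being called; it stands in for a user function foo(). */
int fn_foo(void)
{
    return 42;
}

typedef struct {
    const char *name;
    fn_ptr fn;
} fn_entry;

/* Hypothetical function table, searched by name at run time. */
static const fn_entry fn_table[] = {
    { "foo", fn_foo },
};

/* Direct call: the target is fixed when the code is compiled. */
int fn_call_direct(void)
{
    return fn_foo();
}

/* Call by name: every call pays for a lookup before it can run anything.
 * Sets *found to 0 and returns 0 if no function has that name. */
int fn_call_by_name(const char *name, int *found)
{
    size_t i;

    for (i = 0; i < sizeof fn_table / sizeof fn_table[0]; i++) {
        if (strcmp(fn_table[i].name, name) == 0) {
            *found = 1;
            return fn_table[i].fn();
        }
    }
    *found = 0;
    return 0;
}
```

`fn_call_direct()` and `fn_call_by_name("foo", &found)` return the same value, but the second has to compare strings first and can fail at run time if the name does not exist.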