In-depth understanding of opcode optimization in PHP
PHP (all examples in this article use PHP 7.1.3) is a dynamic scripting language. In the Zend virtual machine a script is executed as follows: the source text is read, the lexer converts it into tokens, the parser recognizes the grammatical structure and builds an abstract syntax tree (AST), the static compiler generates opcodes from the AST, and finally the interpreter executes each opcode by simulating machine instructions.
Throughout this process, the generated opcodes can be streamlined with compiler optimizations such as dead code elimination, conditional constant propagation, and function inlining, which improves the execution performance of the code.
The opcache extension for PHP caches the generated opcodes in shared memory and, on top of that, adds static compile-time optimization of the opcodes. These optimizations are usually managed by an optimizer. In compiler theory, each optimization is generally described as an optimization pass (opt pass).
Generally speaking, there are two types of optimization passes:
One is the analysis pass, which gathers data flow and control flow information for use by transformation passes;
The other is the transformation pass, which changes the generated code by adding and deleting instructions, changing and replacing instructions, reordering instructions, and so on. The generated code can usually be dumped before and after each pass to see what changed.
This article starts from compiler theory and the optimizer provided by the opcache extension, taking the op_array (the basic unit of PHP compilation) and the opcode (the smallest unit of PHP execution) as its entry points. It describes how compiler optimization techniques are applied in the Zend virtual machine and walks through how each optimization pass improves the opcodes step by step to raise execution performance. It closes with some thoughts on the future of execution in the PHP virtual machine.
Static compilation, also known as ahead-of-time compilation (AOT): the source code is compiled into target code in advance, and the target code later runs on a platform that supports it.
Dynamic compilation, in contrast to static compilation, means "compiling at run time". It is usually carried out by an interpreter, which interprets and executes the source program statement by statement.
JIT compilation (just-in-time compilation), in the narrow sense, means compiling a piece of code when it is about to be executed for the first time and then executing the compiled code directly without compiling it again. It is a special case of dynamic compilation.
The above three types of different compilation execution processes can be roughly described as follows:
Compiler optimization depends on extracting enough information from the program; this is the foundation of all compiler optimization.
The result generated by the compiler front-end can be a syntax tree or some kind of low-level intermediate code. But whatever form the result takes, it still doesn't tell you much about what the program does or how it does it. The compiler leaves to control flow analysis the task of discovering the control flow hierarchy within each procedure, and to data flow analysis the task of determining global information relevant to data processing.
Control flow analysis is a formal method for obtaining information about a program's control structure. It is the basis for data flow analysis and dependency analysis. The basic model of control flow is the control flow graph (CFG). There are two approaches to analyzing the control flow of a single procedure: using dominators to find loops, and interval analysis.
Data flow analysis collects semantic information from the program code and uses algebraic methods to determine, at compile time, where variables are defined and used. The basic model of data flow is the data flow graph (DFG). A common form is control-tree-based data flow analysis, whose algorithms are divided into interval analysis and structural analysis.
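As a small illustration of the control flow side, the first step in building a CFG is usually to split the instruction stream into basic blocks by finding their "leaders". The C sketch below does this for a toy instruction set. The names (toy_insn, mark_leaders) are hypothetical and are not taken from opcache; it only shows the general textbook technique.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction kinds; these are not Zend opcodes. */
enum toy_kind { TOY_OP, TOY_JMP, TOY_JMPZ, TOY_RET };

struct toy_insn {
    enum toy_kind kind;
    size_t target;  /* jump target index, used by TOY_JMP and TOY_JMPZ */
};

/*
 * Mark basic-block leaders and return the number of basic blocks.
 * A leader is: the first instruction, any jump target, and any
 * instruction that follows a jump or a return.
 * is_leader must have room for n entries.
 */
size_t mark_leaders(const struct toy_insn *code, size_t n,
                    unsigned char *is_leader)
{
    size_t i, blocks = 0;

    if (n == 0) {
        return 0;
    }
    assert(code != NULL && is_leader != NULL);

    for (i = 0; i < n; i++) {
        is_leader[i] = 0;
    }
    is_leader[0] = 1;

    for (i = 0; i < n; i++) {
        if (code[i].kind == TOY_JMP || code[i].kind == TOY_JMPZ) {
            if (code[i].target < n) {
                is_leader[code[i].target] = 1;  /* jump target */
            }
            if (i + 1 < n) {
                is_leader[i + 1] = 1;           /* instruction after a jump */
            }
        } else if (code[i].kind == TOY_RET && i + 1 < n) {
            is_leader[i + 1] = 1;               /* instruction after a return */
        }
    }

    for (i = 0; i < n; i++) {
        blocks += is_leader[i];
    }
    return blocks;
}
```

Each leader starts a block that runs up to the next leader; edges between blocks then follow from the jump targets and fall-through paths.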
An op_array is similar to the concept of a stack frame in C: it is the basic unit (one frame) of a running program, and usually corresponds to a single function call. A function or method, an entire PHP script file, and a string of PHP code passed to eval are each compiled into an op_array.
In the implementation, op_array is a structure that holds all the information about this basic unit of execution. The opcode array is its most important field, but it also contains variable types, comment information, exception-handling information, jump information, and so on.
The interpreter (ZendVM) executes the final, optimized opcodes of one op_array at a time. It walks them in order, executing the current opcode and prefetching the next one, until the special RETURN opcode exits the unit.
An opcode is comparable to the intermediate representation of a static compiler (such as LLVM IR). It usually takes the form of three-address code: an operator, two operands, and a result. Both operands carry type information. There are five operand types:
Compiled variable (CV): a compile-time variable, that is, a variable defined in the PHP script.
Internal reusable variable (VAR): a temporary variable used by ZendVM that can be shared with other opcodes.
Internal non-reusable variable (TMP_VAR): a temporary variable used by ZendVM that cannot be shared with other opcodes.
Constant (CONST): a read-only constant whose value cannot be changed.
Unused (UNUSED): because opcodes use three-address code, not every opcode needs every operand field; UNUSED fills the fields that are not needed.
The operand types together with the operator let the executor match and select a specific handler from the compiled C function templates, which then simulates the machine instructions to execute.
In ZendVM an opcode is represented by the zend_op structure. Its main structure is as follows:
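The real zend_op definition is not reproduced here. As a rough illustration only, the following C sketch models a three-address instruction with typed operands, plus a minimal fetch-and-execute loop of the kind described above. All names (toy_op, toy_execute, and so on) are hypothetical and are not the ZendVM API; the real structure and handlers are considerably richer.

```c
#include <assert.h>
#include <stddef.h>

/* Operand kinds loosely modelled on the five types listed above. */
enum toy_operand_type { TOY_CONST, TOY_TMP, TOY_VAR, TOY_CV, TOY_UNUSED };
enum toy_opcode { TOY_ADD, TOY_MUL, TOY_RETURN };

struct toy_operand {
    enum toy_operand_type type;
    long value;  /* literal for TOY_CONST, slot index for TMP/VAR/CV */
};

/* Three-address form: operator, two operands, one result. */
struct toy_op {
    enum toy_opcode opcode;
    struct toy_operand op1, op2, result;
};

#define TOY_SLOTS 8

/* Read an operand: constants carry their value, others name a slot. */
static long toy_fetch(const struct toy_operand *o, const long *slots)
{
    if (o->type == TOY_CONST) {
        return o->value;
    }
    assert(o->type != TOY_UNUSED && o->value >= 0 && o->value < TOY_SLOTS);
    return slots[o->value];
}

/* Execute ops in order until TOY_RETURN; returns the value of its op1. */
long toy_execute(const struct toy_op *ops, long *slots)
{
    const struct toy_op *op = ops;

    for (;;) {
        long a, b;

        switch (op->opcode) {
        case TOY_RETURN:
            return toy_fetch(&op->op1, slots);
        case TOY_ADD:
        case TOY_MUL:
            a = toy_fetch(&op->op1, slots);
            b = toy_fetch(&op->op2, slots);
            assert(op->result.type != TOY_CONST &&
                   op->result.type != TOY_UNUSED &&
                   op->result.value >= 0 && op->result.value < TOY_SLOTS);
            slots[op->result.value] = (op->opcode == TOY_ADD) ? a + b : a * b;
            break;
        }
        op++;  /* move on to the next opcode */
    }
}
```

A program for (2 + 3) * 4 would be three ops: an ADD into a temporary slot, a MUL into a CV slot, and a RETURN of that slot.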
After lexical and syntax analysis have turned a PHP script into an abstract syntax tree, static compilation generates the opcodes. Opcodes act as a common instruction format whose execution depends on the specific virtual machine implementation (for PHP, this almost always means ZendVM).
If the opcodes are optimized before the virtual machine executes them, the resulting code runs more efficiently. That is the job of a pass: it operates on opcodes, analyzes them, looks for optimization opportunities, and modifies them to produce more efficient code.
In the Zend virtual machine (ZendVM), opcache's static code optimizer is the Zend opcode optimizer.
To make the effect of the optimizations observable and to ease debugging, it also provides optimization and debug options:
optimization_level (opcache.optimization_level=0xFFFFFFFF): the optimization level. Most optimization passes are turned on by default, and users can turn passes off by passing the setting on the command line.
opt_debug_level (opcache.opt_debug_level=-1): the debug level. It is off by default; when enabled, it shows how the opcodes are transformed before and after each optimization.
The script context information required for static optimization is encapsulated in the structure zend_script, as follows:
typedef struct _zend_script {
    zend_string   *filename;        // file name
    zend_op_array  main_op_array;   // stack frame
    HashTable      function_table;  // symbol table information for functions
    HashTable      class_table;     // symbol table information for classes
} zend_script;
These three pieces of information (the optimization level, the debug level, and the script context) are passed to the optimizer as input parameters for analysis and optimization. Like an ordinary PHP extension, the optimizer together with the opcode cache module (zend_accel) makes up the opcache extension. The cache accelerator embeds three internal optimizer APIs:
zend_optimizer_startup: starts the optimizer
zend_optimize_script: the main optimization logic implemented by the optimizer
zend_optimizer_shutdown: cleans up the resources created by the optimizer
Opcode caching is itself a very important optimization. Its basic principle is roughly as follows:
As a dynamic scripting language, PHP does not directly invoke a full compiler toolchain such as GCC/LLVM, nor a pure front-end compiler such as javac. Even so, every request to execute a PHP script goes through the complete life cycle of lexical analysis, syntax analysis, compilation to opcodes, and VM execution.
The first three steps, everything except execution, amount to the complete process of a front-end compiler, and this compilation is not fast. If the same script is executed repeatedly, the time spent in these three steps seriously limits efficiency, even though the opcodes produced by each compilation are identical. The opcodes can therefore be cached when the script is first compiled. The opcache extension caches them in shared memory (Java, by comparison, saves its compiled output to files). The next time the same script is executed, the opcodes are fetched directly from shared memory, which saves the compilation time.
The opcode caching process of the opcache extension is roughly as follows:
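The caching idea itself can be sketched in a few lines of C. This is only an illustration under simplifying assumptions, not opcache's implementation: an in-process array stands in for shared memory, toy_compile is a hypothetical stand-in for the expensive lex/parse/compile steps, and there is no invalidation when a file changes.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_SLOTS 16
#define TOY_NAME_LEN    64

struct toy_cache_entry {
    char filename[TOY_NAME_LEN];
    int  compiled;  /* stand-in for a compiled op_array */
    int  used;
};

static struct toy_cache_entry cache[TOY_CACHE_SLOTS];
static int compile_count;

/* Hypothetical "compiler": counts calls and returns a dummy result. */
static int toy_compile(const char *filename)
{
    compile_count++;
    return (int)strlen(filename);
}

/* Return the compiled result, compiling only on a cache miss. */
int cached_compile(const char *filename)
{
    size_t i, free_slot = TOY_CACHE_SLOTS;
    int result;

    assert(filename != NULL);
    for (i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].filename, filename) == 0) {
            return cache[i].compiled;  /* hit: skip compilation */
        }
        if (!cache[i].used && free_slot == TOY_CACHE_SLOTS) {
            free_slot = i;             /* remember the first free slot */
        }
    }

    result = toy_compile(filename);    /* miss: compile once */
    if (free_slot < TOY_CACHE_SLOTS && strlen(filename) < TOY_NAME_LEN) {
        strcpy(cache[free_slot].filename, filename);
        cache[free_slot].compiled = result;
        cache[free_slot].used = 1;
    }
    return result;
}

/* Number of real compilations performed so far. */
int toy_compile_count(void)
{
    return compile_count;
}
```

Calling cached_compile twice with the same file name triggers only one call to toy_compile; the second call is served from the cache.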
Since this article mainly focuses on static optimization passes, the specific implementation of cache optimization will not be discussed here.
According to the "Whale Book" ("Advanced Compiler Design and Implementation"), a reasonable ordering of optimization passes for an optimizing compiler is as follows:
The optimizations in the figure range from simple constant and dead code handling to loops and branches, from function calls to interprocedural optimization, and from prefetching and caching to software pipelining and register allocation. They also include data flow and control flow analysis.
The current opcode optimizer does not implement all of these passes, and it has no need for machine-dependent optimizations on a low-level intermediate representation, such as register allocation.
After receiving the script information described above, the opcache optimizer locates the minimum compilation unit. From there, each pass is guarded by a pass macro and its corresponding optimization level macro, which is how the registration of an individual pass is controlled.
The registered optimizations are chained in a fixed order. They include transformation passes such as constant optimization, redundant nop removal, and function call optimization, as well as analysis passes such as data flow analysis, control flow analysis, and call graph analysis.
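One common way to implement this kind of switchable registration is to give each pass one bit in the optimization level and test that bit before running the pass. The C sketch below assumes exactly that scheme. The macro TOY_PASS, the context structure, and the pass order are hypothetical, and the "passes" only record that they ran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheme: pass N is enabled when bit N-1 of the level is set. */
#define TOY_PASS(n)  (1UL << ((n) - 1))
#define TOY_MAX_RUNS 32

struct toy_ctx {
    unsigned long optimization_level;
    int ran[TOY_MAX_RUNS];  /* ids of the passes that ran, in order */
    size_t ran_count;
};

/* Stand-in for a real pass: it only records that it was executed. */
static void toy_run_pass(struct toy_ctx *ctx, int pass_id)
{
    assert(ctx->ran_count < TOY_MAX_RUNS);
    ctx->ran[ctx->ran_count++] = pass_id;
}

/* Run the enabled passes in a fixed order. */
void toy_optimize(struct toy_ctx *ctx)
{
    static const int order[] = { 1, 2, 3, 4, 5, 9, 10, 11 };
    size_t i;

    for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
        if (ctx->optimization_level & TOY_PASS(order[i])) {
            toy_run_pass(ctx, order[i]);
        }
    }
}
```

With a level of TOY_PASS(1) | TOY_PASS(10), only passes 1 and 10 run, in that order; clearing a bit in the level switches the corresponding pass off without touching the others.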
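The flow of zend_optimize_script, and where it calls zend_optimize to run the registered optimizations, is as follows:
zend_optimize_script(zend_script *script, zend_long optimization_level, zend_long debug_level)
  |zend_optimize_op_array(&script->main_op_array, &ctx);
      walk the constant operands of binary operators, converting them from run-time form to compile-time form (reverse pass2)
      run the actual optimization passes: zend_optimize
      walk the constant operands of binary operators, converting them from compile-time form to run-time form (pass2)
  |for each function in the op_array: zend_optimize_op_array(op_array, &ctx);
  |for each non-user function in each class: zend_optimize_op_array(op_array, &ctx); (for user functions, set static_variables)
  |if the DFA pass and the call graph pass are enabled and the call graph is built successfully
      walk the constant operands of binary operators, converting them from run-time form to compile-time form (reverse pass2)
      set function return value information for use by SSA data flow analysis
      for each op_array in the call graph, run DFA analysis: zend_dfa_analyze_op_array
      for each op_array in the call graph, run DFA optimization: zend_dfa_optimize_op_array
      if debugging is enabled, dump every op_array in the call graph (after transformation)
      if stack size adjustment is enabled, adjust the stack sizes: adjust_fcall_stack_size_graph
      walk all op_arrays in the call graph again and rerun constant optimization (pass2) for constants newly produced by the DFA pass
      clean up the call graph op_array resources
  |if stack size adjustment is enabled
      adjust the stack size of main_op_array
      walk the op_arrays and adjust their stack sizes
  |clean up resources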
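This part mainly invokes the SSA/DFA/CFG passes used for opcode analysis. The passes involved are basic blocks (BB), CFG, and DFA (CFG, DOMINATORS, LIVENESS, PHI-NODE, SSA).
The passes used for opcode transformation are concentrated in the function zend_optimize, as follows: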
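zend_optimize
  |op_array of type ZEND_EVAL_CODE: not optimized
  |with debug enabled, the content before optimization can be dumped
  |pass1: constant substitution, compile-time evaluation of constant operations, simple operation conversion
  |pass2: constant operation conversion, conditional jump optimization
  |pass3: jump optimization, increment conversion
  |pass4: function call optimization
  |pass5: control flow graph (CFG) optimization
      |build the flow graph
      |compute data dependencies
      |split into basic blocks (BB, the basic unit of data flow analysis)
      |optimize within each BB based on data flow analysis
      |optimize jumps between BBs
      |remove unreachable BBs
      |merge BBs
      |check variables used outside BBs
      |rebuild the optimized op_array (from the CFG)
      |destroy the CFG
  |pass6/7: data flow analysis optimization
      |data flow analysis (based on static single assignment, SSA)
          |build SSA
              |build the CFG: find the index of each BB, manage the BB array, compute BB successors, mark reachable BBs, compute BB predecessors
              |compute the dominator tree
              |identify whether loops are reducible (mainly based on loop back edges)
              |complete SSA construction from phi nodes: def sets, phi node placement, SSA renaming
              |compute use-def chains
              |find improper dependencies and successors; infer types and value ranges
      |data flow optimization: a series of opcode optimizations within BBs based on the SSA information
      |destroy SSA
  |pass9: temporary variable optimization
  |pass10: redundant nop removal
  |pass11: literal (constant) table compaction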
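There are some other optimization passes as well:
pass12: stack size adjustment
pass15: constant information collection
pass16: function call optimization, mainly function inlining
In addition, pass 8/13/14 may be reserved pass ids. This shows that 13 opcode transformation passes are currently exposed to users through options. That count does not include the data flow and control flow analysis passes they depend on.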
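To give a feel for what a simple transformation pass does, here is a C sketch of redundant nop removal on a toy instruction array. It is an assumption-laden illustration, not the opcache code: the instruction type and the name toy_remove_nops are hypothetical. The point it shows is that deleting instructions shifts positions, so jump targets have to be remapped.

```c
#include <assert.h>
#include <stddef.h>

enum toy_opcode { TOY_NOP, TOY_WORK, TOY_JMP, TOY_RETURN };

struct toy_op {
    enum toy_opcode opcode;
    size_t target;  /* jump target index, used by TOY_JMP only */
};

#define TOY_MAX_OPS 64

/*
 * Remove TOY_NOP instructions in place and remap jump targets.
 * Returns the new number of instructions. A jump that pointed at a
 * nop ends up pointing at the next surviving instruction.
 */
size_t toy_remove_nops(struct toy_op *ops, size_t n)
{
    size_t new_index[TOY_MAX_OPS + 1];  /* old index -> new index */
    size_t i, out = 0;

    assert(n <= TOY_MAX_OPS);

    /* First walk: compute where each old index lands after compaction. */
    for (i = 0; i < n; i++) {
        new_index[i] = out;
        if (ops[i].opcode != TOY_NOP) {
            out++;
        }
    }
    new_index[n] = out;

    /* Second walk: compact the array and fix up jump targets. */
    out = 0;
    for (i = 0; i < n; i++) {
        if (ops[i].opcode == TOY_NOP) {
            continue;
        }
        ops[out] = ops[i];
        if (ops[out].opcode == TOY_JMP) {
            assert(ops[out].target <= n);
            ops[out].target = new_index[ops[out].target];
        }
        out++;
    }
    return out;
}
```

Dumping the array before and after such a pass is the easiest way to check that only the nops disappeared and that every jump still reaches the same instruction.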
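A function call normally involves switching between stack frames, so it carries a series of costs: allocating stack space, saving the return address, jumping, returning to the caller, passing back the return value, and reclaiming the stack space. When the function body is of a suitable size, embedding the whole body into the caller, so that the callee is never actually called, is a powerful way to improve performance.
Because function calls are tightly coupled to the application binary interface (ABI) of the target machine, static compilers such as GCC/LLVM perform function inlining essentially before instruction generation.
In ZendVM, inlining happens after opcode generation, as a replacement optimization on the FCALL instructions. Its pass id is 16, and it works roughly as follows: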
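| walk the opcodes in the op_array and find one of the four DO_XCALL opcodes
    | opcode ZEND_INIT_FCALL
    | opcode ZEND_INIT_FCALL_BY_NAME
        | create a new opcode with the operator set to ZEND_INIT_FCALL, compute the stack size, update the cache slot, destroy the literal in the constant pool, and replace the opcode of the current opline
    | opcode ZEND_INIT_NS_FCALL_BY_NAME
        | create a new opcode with the operator set to ZEND_INIT_FCALL, compute the stack size, update the cache slot, destroy the literal in the constant pool, and replace the opcode of the current opline
    | try function inlining
        | filter on the optimization conditions (each optimization pass usually has many restrictions; some cases cannot be optimized for lack of information, or are excluded for cost reasons)
            | method call ZEND_INIT_METHOD_CALL: return without inlining
            | arguments passed by reference: return without inlining
            | default argument is a named constant: return without inlining
        | callee has a return value: add a ZEND_QM_ASSIGN assignment opcode
        | callee has no return value: insert a ZEND_NOP opcode
        | delete the call opcode of the inlined function (the opcode just before the current opline)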
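In the sample code below, the call $fname() invokes the function foo dynamically through the string variable $fname instead of calling it directly. The generated opcodes can be inspected with the VLD extension, or by turning on the opcache debug option (opcache.opt_debug_level=0xFFFFFFFF).
function foo() { }
$fname = 'foo';
$fname();
With debug enabled, the dump shows the following opcode sequence before function call optimization (excerpt only):
ASSIGN CV0($fname) string("foo")
INIT_FCALL_BY_NAME 0 CV0($fname)
DO_FCALL_BY_NAME
The execution logic of the INIT_FCALL_BY_NAME opcode is relatively complex. With aggressive inlining enabled, the sequence above can be merged into a single DO_FCALL string("foo") instruction, which removes the overhead of the indirect call. The result is also identical to the opcode generated for a direct call.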
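From the description above, adding a pass to the current optimizer is not too difficult. The general steps are:
First, register a pass macro with the zend_optimize optimizer (for example, add pass17) and decide its optimization level.
Call the new pass before or after some existing pass in the optimization manager (for example, add a tail recursion optimization pass). Adding it after the DFA/SSA analysis passes is recommended, because more optimization information is available at that point.
Implement the new pass and perform the custom code transformation (for example, implement tail recursion optimization alongside zend_optimize_func_calls). Given the passes that already exist, new passes are mostly transformation passes, and they can generally make use of the SSA/DFA information. Unlike static compilers, which usually optimize on a low-level, machine-dependent intermediate representation, the work here consists mainly of opcode and operand transformations at the opcode level.
Before implementing the pass, as with function inlining, first collect the information the optimization needs, then exclude the cases where it does not apply (for example, calls that are not true tail-recursive calls, or argument problems that prevent the optimization). After implementing it, dump the opcodes before and after the pass to check that the transformation is correct and matches expectations (for tail recursion optimization, the final effect is to turn the function call into a for loop).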
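The intended effect of the tail recursion example can be shown at source level. The following C sketch is only an illustration of the transformation's before and after shape, written by hand with hypothetical function names; it is not a ZendVM pass and does not operate on opcodes.

```c
#include <assert.h>

/* Before: tail-recursive form. The recursive call is the last action,
 * so nothing in the current frame is needed after it returns. */
unsigned long sum_to_rec(unsigned long n, unsigned long acc)
{
    if (n == 0) {
        return acc;
    }
    return sum_to_rec(n - 1, acc + n);
}

/* After: the loop form a tail recursion pass would aim for. The call
 * is replaced by updating the parameters and jumping back to the top. */
unsigned long sum_to_loop(unsigned long n, unsigned long acc)
{
    for (;;) {
        if (n == 0) {
            return acc;
        }
        acc += n;
        n -= 1;
    }
}

/* Sanity check that both forms agree for a given input. */
void check_equivalent(unsigned long n)
{
    assert(sum_to_rec(n, 0) == sum_to_loop(n, 0));
}
```

A call that is followed by further work in the caller, such as n + f(n - 1), is not a true tail call and must be excluded by the pass's condition filter.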
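The following are some personal views on the execution of dynamic PHP scripts, offered for reference only.
Because LLVM supports a whole toolchain framework from front end to back end and from static compilation to JIT, many language virtual machines have tried to integrate it. In the current PHP 7 era the official ZendVM has not adopted it. One reason is that the virtual machine's opcodes carry a considerable amount of complex analysis work. Each machine instruction produced by a static compiler usually does just one thing (typically on the order of CPU clock cycles), whereas the operands of an opcode have no fixed type, so a large number of type checks and conversions are needed at run time before any computation can take place, and this severely hurts execution efficiency. Even if a JIT were used at run time and compiled at byte code granularity, the compiled code would behave much like the existing interpreter handling opcodes one at a time: types would still have to be handled, and zval values could not be kept directly in registers.
Taking a function call as an example, the figure below compares the execution of the existing opcodes with the execution of statically compiled machine code:
Without changing the existing opcode design, strengthening type inference so that more type information is available when opcodes execute is one possible way to improve performance.
Since opcodes carry such complex analysis work, could they be decomposed into a multi-layer, normalized intermediate representation (IR)? Each optimization could then choose which layer to work on. Traditional compilers divide their intermediate representations by the amount of information carried, from the abstract high-level language down to something close to machine code, into high-level (HIR), mid-level (MIR), and low-level (LIR) intermediate representations.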
As discussed above, the management of opcode optimization passes also has room for improvement. Although the current analyses rely on data flow and control flow analysis, interprocedural analysis and optimization are still missing. Pass management (run order, number of runs, registration, dumping of information from complex analysis passes, and so on) still lags well behind mature frameworks such as LLVM.
ZendVM implements a large number of zval value operations, type conversions, and similar operations. With the help of LLVM these could be compiled into machine code for use at run time, but at the cost of sharply increased compilation time. libjit is another option.