search
HomeBackend DevelopmentPHP TutorialPHP kernel analysis-Zend virtual machine detailed explanation

PHP is an interpreted language. For interpreted languages ​​such as Java, Python, Ruby, Javascript, the code we write will not be compiled into machine code for running, but will be compiled into intermediate code and run on the virtual machine ( VM) on. The virtual machine that runs PHP is called the Zend virtual machine. Today we will go deep into the kernel and explore the operating principles of the Zend virtual machine.

OPCODE

What is OPCODE? It is an instruction that the virtual machine can recognize and process. The Zend virtual machine contains a series of OPCODEs. The OPCODE virtual machine can do many things. Here are a few examples of OPCODEs:

  • ##ZEND_ADD Combine two operands Add up.

  • ZEND_NEW Create a PHP object.

  • ZEND_ECHO Output the content to standard output.

  • ZEND_EXIT Exit PHP.

PHP defines 186 such operations (with the update of PHP, more types of OPCODE will definitely be supported). The definition and implementation of all OPCODE can be found in the source code.

zend/zend_vm_def.h file (the content of this file is not native C code, but a template, the reason will be explained later).

Let’s take a look at how PHP designs the OPCODE data structure:

struct _zend_op {
	const void *handler;
	znode_op op1;
	znode_op op2;
	znode_op result;
	uint32_t extended_value;
	uint32_t lineno;
	zend_uchar opcode;
	zend_uchar op1_type;
	zend_uchar op2_type;
	zend_uchar result_type;
};

Look closely at the data structure of OPCODE and see if you can find the feeling of assembly language. Each OPCODE contains two operands,

op1 and op2, and the handler pointer points to the function that performs the OPCODE operation. The result processed by the function will be saved in result.

Let’s take a simple example:

<?php
$b = 1;
$a = $b + 2;

We see through the vld extension that after compilation, the above code generates the OPCODE of the ZEND_ADD instruction.

compiled vars:  !0 = $b, !1 = $a
line     #* E I O op                           
fetch          
ext  
return  
operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, 1
   3     1        ADD                                              ~3      !0, 2
         2        ASSIGN                                                   !1, ~3
   8     3      > RETURN                                                   1

Among them, the second line is the OPCODE of the

ZEND_ADD instruction. We see that it receives 2 operands, op1 is the variable $b, op2 is the numeric constant 1, and the returned result is stored in a temporary variable. In the zend/zend_vm_def.h file, we can find the function implementation corresponding to the ZEND_ADD instruction:

ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
{
	USE_OPLINE
	zend_free_op free_op1, free_op2;
	zval *op1, *op2, *result;

	op1 = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_R);
	op2 = GET_OP2_ZVAL_PTR_UNDEF(BP_VAR_R);
	if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_LONG)) {
		if (EXPECTED(Z_TYPE_INFO_P(op2) == IS_LONG)) {
			result = EX_VAR(opline->result.var);
			fast_long_add_function(result, op1, op2);
			ZEND_VM_NEXT_OPCODE();
		} else if (EXPECTED(Z_TYPE_INFO_P(op2) == IS_DOUBLE)) {
			result = EX_VAR(opline->result.var);
			ZVAL_DOUBLE(result, ((double)Z_LVAL_P(op1)) + Z_DVAL_P(op2));
			ZEND_VM_NEXT_OPCODE();
		}
	} else if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_DOUBLE)) {

	...
}

The above code is not native C code, but a template.

Why did you do this? Because PHP is a weakly typed language, and the C language it implements is a strongly typed language. Weakly typed languages ​​support automatic type matching, and the implementation of automatic type matching, just like the above code, handles parameters of different types through judgment. Just imagine, if each OPCODE needs to determine the type of incoming parameters when processing, then performance will inevitably become a huge problem (one request may need to process tens of thousands of OPCODEs).

Is there any way? We found that at compile time, we can already determine the type of each operand (may be a constant or a variable). Therefore, when PHP actually executes the C code, different types of operands will be divided into different functions for direct call by the virtual machine. This part of the code is placed in

zend/zend_vm_execute.h. The expanded file is quite large, and we noticed that there is also such code:

if (IS_CONST == IS_CV) {

It makes no sense at all, right? But it doesn't matter, the C compiler will automatically optimize and judge this way. In most cases, if we want to understand the logic of a certain OPCODE processing, it is easier to read the template file

zend/zend_vm_def.h. By the way, the program that generates C code based on templates is implemented in PHP.

Execution process

To be precise, the execution of PHP is divided into two parts: compilation and execution. I will not expand on the compilation part in detail here, but focus on the execution process.

After a series of compilation processes such as grammar and lexical analysis, we got a data called OPArray, whose structure is as follows:

struct _zend_op_array {
	/* Common elements */
	zend_uchar type;
	zend_uchar arg_flags[3]; /* bitset of arg_info.pass_by_reference */
	uint32_t fn_flags;
	zend_string *function_name;
	zend_class_entry *scope;
	zend_function *prototype;
	uint32_t num_args;
	uint32_t required_num_args;
	zend_arg_info *arg_info;
	/* END of common elements */

	uint32_t *refcount;

	uint32_t last;
	zend_op *opcodes;

	int last_var;
	uint32_t T;
	zend_string **vars;

	int last_live_range;
	int last_try_catch;
	zend_live_range *live_range;
	zend_try_catch_element *try_catch_array;

	/* static variables support */
	HashTable *static_variables;

	zend_string *filename;
	uint32_t line_start;
	uint32_t line_end;
	zend_string *doc_comment;
	uint32_t early_binding; /* the linked list of delayed declarations */

	int last_literal;
	zval *literals;

	int  cache_size;
	void **run_time_cache;

	void *reserved[ZEND_MAX_RESERVED_RESOURCES];
};

There is a lot of content, right? Simple understanding, its essence is an OPCODE array plus a collection of environmental data required during execution. Introducing several relatively important fields:

  • opcodes Array to store OPCODE.

  • #filename The file name of the currently executed script.

  • function_name The name of the currently executed method.

  • static_variables Static variable list.

  • last_try_catch try_catch_array In the current context, if an exception occurs try-catch-finally jumps to the required information.

  • literals A collection of all constant literals such as the string foo or the number 23.

Why do we need to generate such huge data? Because the more information generated during compilation, the less time is required during execution.

接下来,我们看下 PHP 是如何执行 OPCODE。OPCODE 的执行被放在一个大循环中,这个循环位于 zend/zend_vm_execute.h 中的 execute_ex 函数:

ZEND_API void execute_ex(zend_execute_data *ex)
{
	DCL_OPLINE

	zend_execute_data *execute_data = ex;

	LOAD_OPLINE();
	ZEND_VM_LOOP_INTERRUPT_CHECK();

	while (1) {
		if (UNEXPECTED((ret = ((opcode_handler_t)OPLINE->handler)(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU)) != 0)) {
			if (EXPECTED(ret > 0)) {
				execute_data = EG(current_execute_data);
				ZEND_VM_LOOP_INTERRUPT_CHECK();
			} else {
				return;
			}
		}
	}

	zend_error_noreturn(E_CORE_ERROR, "Arrived at end of main loop which shouldn&#39;t happen");
}

这里,我去掉了一些环境变量判断分支,保留了运行的主流程。可以看到,在一个无限循环中,虚拟机会不断调用 OPCODE 指定的 handler 函数处理指令集,直到某次指令处理的结果 ret 小于0。注意到,在主流程中并没有移动 OPCODE 数组的当前指针,而是把这个过程放到指令执行的具体函数的结尾。所以,我们在大多数 OPCODE 的实现函数的末尾,都能看到调用这个宏:

ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();

在之前那个简单例子中,我们看到 vld 打印出的执行 OPCODE 数组中,最后有一项指令为 ZEND_RETURN 的 OPCODE。但我们编写的 PHP 代码中并没有这样的语句。在编译时期,虚拟机会自动将这个指令加到 OPCODE 数组的结尾。ZEND_RETURN 指令对应的函数会返回 -1,判断执行的结果小于0时,就会退出循环,从而结束程序的运行。

方法调用

如果我们调用一个自定义的函数,虚拟机会如何处理呢?

<?php
function foo() {
    echo &#39;test&#39;;
}

foo();

我们通过 vld 查看生成的 OPCODE。出现了两个 OPCODE 指令执行栈,是因为我们自定义了一个 PHP 函数。在第一个执行栈上,调用自定义函数会执行两个 OPCODE 指令:INIT_FC<a href="http://www.php.cn/wiki/1483.html" target="_blank">ALL</a> 和 DO_FCALL

compiled vars:  none
line     
#* E I O op                           
fetch          
ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   NOP
   6     1        INIT_FCALL                                               &#39;foo&#39;
         2        DO_FCALL                                      
         0
         3      > RETURN                                                   1

compiled vars:  none
line     #* E I O op                           
fetch          
ext  
return  
operands
-------------------------------------------------------------------------------------
   3     0  E >   ECHO                                                     &#39;test&#39;
   4     1      > RETURN                                                   null

其中,INIT_FCALL 准备了执行函数时所需要的上下文数据。DO_FCALL 负责执行函数。DO_FCALL 的处理函数根据不同的调用情况处理了大量逻辑,我摘取了其中执行用户定义的函数的逻辑部分:

ZEND_VM_HANDLER(60, ZEND_DO_FCALL, ANY, ANY, SPEC(RETVAL))
{
    USE_OPLINE
    zend_execute_data *call = EX(call);
    zend_function *fbc = call->func;
    zend_object *object;
    zval *ret;

    ...

    if (EXPECTED(fbc->type == ZEND_USER_FUNCTION)) {
        ret = NULL;
        if (RETURN_VALUE_USED(opline)) {
            ret = EX_VAR(opline->result.var);
            ZVAL_NULL(ret);
        }

        call->prev_execute_data = execute_data;
        i_init_func_execute_data(call, &fbc->op_array, ret);

        if (EXPECTED(zend_execute_ex == execute_ex)) {
            ZEND_VM_ENTER();
        } else {
            ZEND_ADD_CALL_FLAG(call, ZEND_CALL_TOP);
            zend_execute_ex(call);
        }
    }

    ...

    ZEND_VM_SET_OPCODE(opline + 1);
    ZEND_VM_CONTINUE();
}

可以看到,DO_FCALL 首先将调用函数前的上下文数据保存到 call->prev_execute_data,然后调用 i_init_func_execute_data 函数,将自定义函数对象中的 op_array(每个自定义函数会在编译的时候生成对应的数据,其数据结构中包含了函数的 OPCODE 数组) 赋值给新的执行上下文对象。

然后,调用 zend_execute_ex 函数,开始执行自定义的函数。zend_execute_ex 实际上就是前面提到的 execute_ex 函数(默认是这样,但扩展可能重写 zend_execute_ex 指针,这个 API 让 PHP 扩展开发者可以通过覆写函数达到扩展功能的目的,不是本篇的主题,不准备深入探讨),只是上下文数据被替换成当前函数所在的上下文数据。

我们可以这样理解,最外层的代码就是一个默认存在的函数(类似 C 语言中的 main()函数),和用户自定义的函数本质上是没有区别的。

逻辑跳转

我们知道指令都是顺序执行的,而我们的程序,一般都包含不少的逻辑判断和循环,这部分又是如何通过 OPCODE 实现的呢?

<?php
$a = 10;
if ($a == 10) {
    echo &#39;success&#39;;
} else {
    echo &#39;failure&#39;;
}

我们还是通过 vld 查看 OPCODE(不得不说 vld 扩展是分析 PHP 的神器)。

compiled vars:  !0 = $a
line     #* E I O op                           
fetch          ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, 10
   3     1        IS_EQUAL                                         
   ~2      !0, 10
         2      > JMPZ                                                     ~2, ->5
   4     3    >   ECHO                                                     &#39;success&#39;
         4      > JMP                                                      ->6
   6     5    >   ECHO                                                     &#39;failure&#39;
   7     6    > > RETURN                                                   1

我们看到,JMPZ 和 JMP 控制了执行流程。JMP 的逻辑非常简单,将当前的 OPCODE 指针指向需要跳转的 OPCODE。

ZEND_VM_HANDLER(42, ZEND_JMP, JMP_ADDR, ANY)
{
	USE_OPLINE

	ZEND_VM_SET_OPCODE(OP_JMP_ADDR(opline, opline->op1));
	ZEND_VM_CONTINUE();
}

JMPZ 仅仅是多了一次判断,根据结果选择是否跳转,这里就不再重复列举了。而处理循环的方式与判断基本上是类似的。

<?php
$a = [1, 2, 3];
foreach ($a as $n) {
    echo $n;
}
compiled vars:  !0 = $a, !1 = $n
line     #* E I O op                           
fetch          
ext  return  
operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, <array>
   3     1      > FE_RESET_R                                       
   $3      !0, ->5
         2    > > FE_FETCH_R                                               $3, !1, ->5
   4     3    >   ECHO                                                     !1
         4      > JMP                                                      ->2
         5    >   FE_FREE                                                  $3
   5     6      > RETURN                                                   1

循环只需要 JMP 指令即可完成,通过 FE_FETCH_R 指令判断是否已经到达数组的结尾,如果到达则退出循环。

结语

通过了解 Zend 虚拟机,相信你对 PHP 是如何运行的,会有更深刻的理解。想到我们写的一行行代码,最后机器执行的时候会变成数不胜数的指令,每个指令又建立在复杂的处理逻辑之上。那些从前随意写下的代码,现在会不会在脑海里不自觉的转换成 OPCODE 再品味一番呢?

The above is the detailed content of PHP kernel analysis-Zend virtual machine detailed explanation. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
The Continued Use of PHP: Reasons for Its EnduranceThe Continued Use of PHP: Reasons for Its EnduranceApr 19, 2025 am 12:23 AM

What’s still popular is the ease of use, flexibility and a strong ecosystem. 1) Ease of use and simple syntax make it the first choice for beginners. 2) Closely integrated with web development, excellent interaction with HTTP requests and database. 3) The huge ecosystem provides a wealth of tools and libraries. 4) Active community and open source nature adapts them to new needs and technology trends.

PHP and Python: Exploring Their Similarities and DifferencesPHP and Python: Exploring Their Similarities and DifferencesApr 19, 2025 am 12:21 AM

PHP and Python are both high-level programming languages ​​that are widely used in web development, data processing and automation tasks. 1.PHP is often used to build dynamic websites and content management systems, while Python is often used to build web frameworks and data science. 2.PHP uses echo to output content, Python uses print. 3. Both support object-oriented programming, but the syntax and keywords are different. 4. PHP supports weak type conversion, while Python is more stringent. 5. PHP performance optimization includes using OPcache and asynchronous programming, while Python uses cProfile and asynchronous programming.

PHP and Python: Different Paradigms ExplainedPHP and Python: Different Paradigms ExplainedApr 18, 2025 am 12:26 AM

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

PHP and Python: A Deep Dive into Their HistoryPHP and Python: A Deep Dive into Their HistoryApr 18, 2025 am 12:25 AM

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

Choosing Between PHP and Python: A GuideChoosing Between PHP and Python: A GuideApr 18, 2025 am 12:24 AM

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

PHP and Frameworks: Modernizing the LanguagePHP and Frameworks: Modernizing the LanguageApr 18, 2025 am 12:14 AM

PHP remains important in the modernization process because it supports a large number of websites and applications and adapts to development needs through frameworks. 1.PHP7 improves performance and introduces new features. 2. Modern frameworks such as Laravel, Symfony and CodeIgniter simplify development and improve code quality. 3. Performance optimization and best practices further improve application efficiency.

PHP's Impact: Web Development and BeyondPHP's Impact: Web Development and BeyondApr 18, 2025 am 12:10 AM

PHPhassignificantlyimpactedwebdevelopmentandextendsbeyondit.1)ItpowersmajorplatformslikeWordPressandexcelsindatabaseinteractions.2)PHP'sadaptabilityallowsittoscaleforlargeapplicationsusingframeworkslikeLaravel.3)Beyondweb,PHPisusedincommand-linescrip

How does PHP type hinting work, including scalar types, return types, union types, and nullable types?How does PHP type hinting work, including scalar types, return types, union types, and nullable types?Apr 17, 2025 am 12:25 AM

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values ​​and handle functions that may return null values.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Tools

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft