Re-Implementing the Range Operator in PHP

Christopher Nolan
2025-02-15 09:36:12

SitePoint recommended article: an improved implementation of the PHP range operator

This article is reproduced from SitePoint with the author's permission. It was written by Thomas Punt and presents an improved implementation of a range operator for PHP. If you are interested in PHP internals and in adding features to your favorite programming language, now is a good time to learn!

This article assumes that readers can build PHP from source. If that is not the case, please first read the "Building PHP" chapter of the PHP Internals Book.

In the previous article (hint: make sure you have read it), I showed one way to implement a range operator in PHP. Initial implementations are rarely the best, though, so this article explores how the previous implementation can be improved.

Thanks again to Nikita Popov for proofreading this article!

Key Points

  • Thomas Punt reimplements the range operator in PHP, moving the computation logic out of the Zend virtual machine so that the operator can be used in constant expression contexts.
  • The new implementation can be evaluated at compile time (for literal operands) or at runtime (for dynamic operands). This brings a small benefit to Opcache users and, more importantly, allows constant expressions to use the range operator.
  • The reimplementation touches the lexer, the parser, the compilation stage, and the Zend virtual machine. The lexer implementation is unchanged, and the parser changes only in the semantic action of the new production rule. The compilation stage needs no changes to Zend/zend_compile.c, because that file already contains the logic for handling binary operations. The Zend virtual machine is updated to handle execution of the ZEND_RANGE opcode at runtime.
  • In the third part of this series, Punt plans to build on this implementation by explaining how to overload the operator. This will allow objects to be used as operands and add proper support for strings.

Drawbacks of the previous implementation

The initial implementation put all of the range operator's logic in the Zend virtual machine, which forced the computation to happen purely at runtime, when the ZEND_RANGE opcode is executed. This means not only that the computation cannot be shifted to compile time for literal operands, but also that some features simply do not work.

In this implementation, we move the range operator logic out of the Zend virtual machine to be able to perform calculations at compile time (for literal operands) or runtime (for dynamic operands). This not only brings a little benefit to Opcache users, but more importantly, allows constant expression functionality to be used with range operators.

Example:

<code class="language-php">// as a constant definition
const AN_ARRAY = 1 |> 100;

// as an initial property definition
class A
{
    private $a = 1 |> 2;
}

// as a default value for an optional parameter:
function a($a = 1 |> 2)
{
    //
}</code>

So, without further ado, let's reimplement the range operator.

Update Lexical Analyzer

The lexer implementation remains completely unchanged. The token is first registered in Zend/zend_language_scanner.l (around line 1200):

<code class="language-c"><ST_IN_SCRIPTING>"|>" {
    RETURN_TOKEN(T_RANGE);
}</code>

Then it is declared in Zend/zend_language_parser.y (around line 220):

<code class="language-c">%token T_RANGE           "|> (T_RANGE)"</code>

The tokenizer extension data must then be regenerated by changing into the ext/tokenizer directory and running the tokenizer_data_gen.sh script.

Update parser

The parser implementation is partly the same as before. Again, we declare the operator's precedence and associativity by adding the T_RANGE token to the end of the following line:

<code class="language-c">%nonassoc T_IS_EQUAL T_IS_NOT_EQUAL T_IS_IDENTICAL T_IS_NOT_IDENTICAL T_SPACESHIP T_RANGE</code>

Then we update the expr_without_variable production rule again, but this time the semantic action (the code inside the braces) is slightly different. Update it with the following code (I put it just below the T_SPACESHIP rule, around line 930):

<code class="language-c">    |   expr T_RANGE expr
            { $$ = zend_ast_create_binary_op(ZEND_RANGE, $1, $3); }</code>

This time, we use the zend_ast_create_binary_op function (rather than zend_ast_create), which creates a ZEND_AST_BINARY_OP node for us. zend_ast_create_binary_op takes an opcode name, which is used to distinguish between binary operations during the compilation stage.

Since we are now reusing the ZEND_AST_BINARY_OP node type, there is no need to define a new ZEND_AST_RANGE node type as before in the Zend/zend_ast.h file.
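To make the role of the stored opcode concrete, here is a minimal standalone sketch in plain C. It is not Zend code: toy_ast, toy_create_literal, toy_create_binary_op, TOY_BINARY_OP and the TOY_* opcode values are hypothetical stand-ins invented for illustration. It only shows the general idea that a single node kind can serve every binary operator when the opcode travels with the node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcode values, for illustration only. */
enum { TOY_ADD = 1, TOY_RANGE = 2 };

/* Hypothetical node kinds: a literal leaf and one shared binary-op kind. */
enum { TOY_LITERAL = 0, TOY_BINARY_OP = 1 };

typedef struct toy_ast {
    uint16_t kind;             /* which kind of node this is */
    uint16_t attr;             /* for binary ops: the opcode */
    long value;                /* used by literal nodes only */
    struct toy_ast *child[2];  /* left and right operands */
} toy_ast;

toy_ast *toy_create_literal(long value)
{
    toy_ast *ast = calloc(1, sizeof(*ast));
    if (ast == NULL) {
        abort();
    }
    ast->kind = TOY_LITERAL;
    ast->value = value;
    return ast;
}

/* Every binary operator shares TOY_BINARY_OP; the opcode is kept in attr. */
toy_ast *toy_create_binary_op(uint16_t opcode, toy_ast *left, toy_ast *right)
{
    toy_ast *ast = calloc(1, sizeof(*ast));
    if (ast == NULL) {
        abort();
    }
    ast->kind = TOY_BINARY_OP;
    ast->attr = opcode;
    ast->child[0] = left;
    ast->child[1] = right;
    return ast;
}

/* Recursively release a node and its children. */
void toy_ast_free(toy_ast *ast)
{
    if (ast == NULL) {
        return;
    }
    toy_ast_free(ast->child[0]);
    toy_ast_free(ast->child[1]);
    free(ast);
}
```

A later compilation step in such a design would only need to read attr to know which operation the node represents, which is why no dedicated node kind per operator is required.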

Update compilation phase

This time, there is no need to update the Zend/zend_compile.c file, as it already contains the logic needed to handle binary operations. We simply reuse that logic by making our operator a ZEND_AST_BINARY_OP node.

The following is a simplified version of the zend_compile_binary_op function:

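<code class="language-c">void zend_compile_binary_op(znode *result, zend_ast *ast) /* {{{ */
{
    zend_ast *left_ast = ast->child[0];
    zend_ast *right_ast = ast->child[1];
    uint32_t opcode = ast->attr;

    znode left_node, right_node;
    zend_compile_expr(&left_node, left_ast);
    zend_compile_expr(&right_node, right_ast);

    if (left_node.op_type == IS_CONST && right_node.op_type == IS_CONST) {
        if (zend_try_ct_eval_binary_op(&result->u.constant, opcode,
                &left_node.u.constant, &right_node.u.constant)
        ) {
            result->op_type = IS_CONST;
            zval_ptr_dtor(&left_node.u.constant);
            zval_ptr_dtor(&right_node.u.constant);
            return;
        }
    }

    do {
        // redacted code
        zend_emit_op_tmp(result, opcode, &left_node, &right_node);
    } while (0);
}
/* }}} */</code>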

As we can see, it's very similar to the zend_compile_range function we created last time. The two important differences are how to get the opcode type and what happens when both operands are literals.

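The opcode type is taken from the AST node this time (rather than being hardcoded as before), because the ZEND_AST_BINARY_OP node stores this value (as shown in the semantic action of the new production rule) to distinguish between binary operations. When both operands are literals, the zend_try_ct_eval_binary_op function is called. This function looks like this:

<code class="language-c">static inline zend_bool zend_try_ct_eval_binary_op(zval *result, uint32_t opcode, zval *op1, zval *op2) /* {{{ */
{
    binary_op_type fn = get_binary_op(opcode);

    /* don't evaluate division by zero at compile-time */
    if ((opcode == ZEND_DIV || opcode == ZEND_MOD) &&
        zval_get_long(op2) == 0) {
        return 0;
    } else if ((opcode == ZEND_SL || opcode == ZEND_SR) &&
        zval_get_long(op2) < 0) {
        return 0;
    }

    fn(result, op1, op2);
    return 1;
}
/* }}} */</code>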

This function obtains a callback from the get_binary_op function in Zend/zend_opcode.c according to the opcode type. This means we need to update that function next to cater for the ZEND_RANGE opcode. Add the following case statement to the get_binary_op function (around line 750):

<code class="language-c">case ZEND_RANGE:
    return (binary_op_type) range_function;</code>
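As a rough standalone illustration of the dispatch-and-fold idea described in this section, the sketch below maps an opcode to a callback and evaluates the operation early when that is safe. It is plain C with hypothetical names (toy_binary_op, toy_get_binary_op, toy_try_ct_eval and the TOY_* opcodes) and plain long operands in place of zvals, so it should be read as an analogy rather than as engine code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum { TOY_ADD = 1, TOY_DIV = 2 };

/* Callback type shared by all binary operations in this sketch. */
typedef int (*toy_binary_op)(long *result, long op1, long op2);

int toy_add(long *result, long op1, long op2)
{
    *result = op1 + op2;
    return 0;
}

int toy_div(long *result, long op1, long op2)
{
    *result = op1 / op2;
    return 0;
}

/* Map an opcode to its callback; unknown opcodes yield NULL. */
toy_binary_op toy_get_binary_op(int opcode)
{
    switch (opcode) {
        case TOY_ADD:
            return toy_add;
        case TOY_DIV:
            return toy_div;
        default:
            return NULL;
    }
}

/* Try to evaluate early. Returns 1 if *result was computed,
 * 0 if the operation has to be left for run time. */
int toy_try_ct_eval(long *result, int opcode, long op1, long op2)
{
    toy_binary_op fn = toy_get_binary_op(opcode);

    if (fn == NULL) {
        return 0;
    }
    /* leave division by zero to run time, where the error is reported */
    if (opcode == TOY_DIV && op2 == 0) {
        return 0;
    }

    fn(result, op1, op2);
    return 1;
}
```

Supporting a new operator in a design like this only takes one more case in the lookup function, which mirrors why a single case statement is enough for the range operator here.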

Now we have to define the range_function function. This is done in the Zend/zend_operators.c file, alongside all the other operators:

<code class="language-c">ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2) /* {{{ */
{
    ZVAL_DEREF(op1);
    ZVAL_DEREF(op2);

    /* Range-building logic carried over from the ZEND_RANGE handler in the
     * previous article, with each HANDLE_EXCEPTION() call replaced by
     * `return FAILURE;` */

    return SUCCESS;
}
/* }}} */</code>

The function prototype contains two new macros: ZEND_API and ZEND_FASTCALL. ZEND_API controls the visibility of the function, making it available to extensions that are compiled as shared objects. ZEND_FASTCALL ensures that a more efficient calling convention is used, where the first two arguments are passed in registers rather than on the stack (more relevant for 32-bit builds than for 64-bit builds on x86).
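For readers who have not met such macros before, the following sketch shows how visibility and calling-convention macros of this kind can be defined. TOY_API, TOY_FASTCALL and toy_sum are hypothetical names, and the conditions are simplified assumptions. The real ZEND_API and ZEND_FASTCALL definitions live in the Zend headers and cover more compilers and configurations.

```c
#include <assert.h>

/* Export the symbol from a shared object when the compiler supports it. */
#if defined(__GNUC__)
# define TOY_API __attribute__((visibility("default")))
#else
# define TOY_API
#endif

/* Request register-based argument passing on 32-bit x86 only. On x86-64
 * the default calling conventions already pass the leading arguments in
 * registers, so the macro expands to nothing there. */
#if defined(__GNUC__) && defined(__i386__)
# define TOY_FASTCALL __attribute__((fastcall))
#elif defined(_MSC_VER) && defined(_M_IX86)
# define TOY_FASTCALL __fastcall
#else
# define TOY_FASTCALL
#endif

/* Prototype, as it would appear in a header. */
TOY_API int TOY_FASTCALL toy_sum(int a, int b, int c);

/* Definition, repeating the same macros as the prototype. */
TOY_API int TOY_FASTCALL toy_sum(int a, int b, int c)
{
    return a + b + c;
}
```

Callers are unaffected by either macro: toy_sum(1, 2, 3) is written the same way whichever branch of the preprocessor conditions is taken.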

The function body is very similar to what we had in the Zend/zend_vm_def.h file in the previous article. The VM-specific parts are no longer present: the HANDLE_EXCEPTION macro calls have been replaced with return FAILURE;, and the ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION macro call has been removed entirely (this check and operation needs to stay in the VM, so the macro is invoked later from the VM code).

Another notable difference is that we apply ZVAL_DEREF to both operands to ensure that references are handled correctly. This was previously done inside the VM with the GET_OPn_ZVAL_PTR_DEREF pseudo-macro, but has now been moved into this function. This is not done because it is needed at compile time (for compile-time evaluation both operands must be literals, which cannot be references), but because it lets range_function be called safely from elsewhere in the code base without worrying about reference handling. For this reason, most operator functions (except where performance is critical) perform reference handling themselves, rather than in their VM opcode definitions.
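The shape of such an operator function can be sketched in standalone C. Everything here is hypothetical and simplified: toy_val stands in for a zval, TOY_DEREF for ZVAL_DEREF, and toy_range for range_function. It handles integers only, and the failure rules (a minimum greater than the maximum, and an arbitrary size cap) are assumptions made for this sketch rather than a copy of the article's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { TOY_LONG, TOY_REFERENCE };

typedef struct toy_val {
    int type;
    long lval;            /* used when type == TOY_LONG */
    struct toy_val *ref;  /* used when type == TOY_REFERENCE */
} toy_val;

/* Follow a reference to the value behind it. */
#define TOY_DEREF(v) \
    do { if ((v)->type == TOY_REFERENCE) { (v) = (v)->ref; } } while (0)

/* Arbitrary cap on the number of elements, chosen for this sketch. */
#define TOY_MAX_RANGE_SIZE 1000000UL

/* Build the inclusive range [op1, op2] into a newly allocated array.
 * Returns 0 on success and -1 on failure; the caller frees *out. */
int toy_range(long **out, size_t *out_len, toy_val *op1, toy_val *op2)
{
    long min, max, *items;
    unsigned long size, i;

    /* Dereference here so that callers never need to care. */
    TOY_DEREF(op1);
    TOY_DEREF(op2);

    if (op1->type != TOY_LONG || op2->type != TOY_LONG) {
        return -1;
    }

    min = op1->lval;
    max = op2->lval;
    if (min > max) {
        return -1;
    }

    /* one less than the element count of the inclusive range */
    size = (unsigned long) max - (unsigned long) min;
    if (size >= TOY_MAX_RANGE_SIZE) {
        return -1;
    }
    ++size;

    items = malloc(size * sizeof(*items));
    if (items == NULL) {
        return -1;
    }
    for (i = 0; i < size; ++i) {
        items[i] = min + (long) i;
    }

    *out = items;
    *out_len = size;
    return 0;
}
```

Because the dereferencing happens inside toy_range, it can be handed either a plain value or a reference to one, which is the property the paragraph above describes for range_function.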

Finally, we have to add the range_function prototype to the Zend/zend_operators.h file:

<code class="language-c">ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2);</code>

Update Zend virtual machine

Now we have to update the Zend virtual machine again to handle execution of the ZEND_RANGE opcode at runtime. The handler definition goes at the bottom of Zend/zend_vm_def.h.

(Again, the opcode number must be one larger than the current highest opcode number, which can be seen at the bottom of the Zend/zend_vm_opcodes.h file.)

The definition this time is much shorter, because all of the work is handled in range_function. We just need to call this function and pass in the result operand of the current opline to hold the computed value. The exception check that was removed from range_function, along with the jump to the next opcode, is still handled in the VM by the call to ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION. Also, as mentioned earlier, we avoid handling references in the VM by using the GET_OPn_ZVAL_PTR pseudo-macros (rather than GET_OPn_ZVAL_PTR_DEREF).
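The division of labour between a thin handler and an operator function can be illustrated with a small standalone sketch. The names toy_vm, toy_range_size and toy_handle_range are hypothetical, the operator computes only the element count of an inclusive integer range, and the failure rule is an assumption made for this sketch. It is an analogy for the structure described above, not the ZEND_RANGE handler itself.

```c
#include <assert.h>

enum { TOY_SUCCESS = 0, TOY_FAILURE = -1 };

typedef struct toy_vm {
    int pc;         /* index of the current instruction */
    int exception;  /* non-zero once an error has been raised */
} toy_vm;

/* Operator function living outside the handler: all the logic is here.
 * It reports an error by flagging the exception and returning failure. */
int toy_range_size(toy_vm *vm, long long *result, int min, int max)
{
    if (min > max) {
        vm->exception = 1;
        return TOY_FAILURE;
    }
    *result = (long long) max - min + 1;
    return TOY_SUCCESS;
}

/* Thin handler: delegate the work, then check for an exception before
 * moving on. Returns 1 if execution continues, 0 if it must stop. */
int toy_handle_range(toy_vm *vm, long long *result, int op1, int op2)
{
    toy_range_size(vm, result, op1, op2);

    if (vm->exception) {
        return 0;  /* pc is left untouched so the error can be handled */
    }
    vm->pc++;
    return 1;
}
```

Keeping the exception check in the handler means the operator function never needs to know how the instruction pointer is advanced, so it stays usable from code that has nothing to do with instruction dispatch.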

Now regenerate the VM by executing the Zend/zend_vm_gen.php file.

Finally, the pretty printer in the Zend/zend_ast.c file needs updating again. Update the priority table comment (around line 520) to include the new operator, then insert a case statement in the zend_ast_export_ex function to handle the ZEND_RANGE opcode (around line 1300).

Conclusion

This article has shown an alternative way of implementing the range operator, where the computation logic is moved out of the VM. This has the advantage that the range operator can be used in constant expression contexts.

The third part of this series will build on this implementation by explaining how to overload the operator. This will allow objects to be used as operands (such as objects from the GMP library or objects that implement the __toString method). It will also show how to add proper support for strings (unlike the support seen in PHP's current range function). For now, though, I hope this has been a good demonstration of some of the deeper aspects of the Zend Engine when implementing operators in PHP.
