
To double quote or not, that's the question!

王林
2024-08-16 16:34:49

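Just recently I heard again that PHP folks still talk about single quotes vs. double quotes, and that using single quotes is only a micro-optimisation, but if you get used to using single quotes all the time you would save a bunch of CPU cycles!

"Everything has already been said, but not yet by everyone" – Karl Valentin

It is in this spirit that I am writing an article about the same topic Nikita Popov already covered 12 years ago (if you are reading his article, you can stop reading here).

What is the fuss all about?

PHP performs string interpolation: it searches a string for variables and replaces them with the values of those variables: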

$juice = "apple";
echo "They drank some $juice juice.";
// will output: They drank some apple juice.

This feature is limited to double-quoted strings and heredoc. Using single quotes (or nowdoc) yields a different result:

$juice = "apple";
echo 'They drank some $juice juice.';
// will output: They drank some $juice juice.
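
To make the four string syntaxes easier to compare side by side, here is a minimal sketch. The variable names ($double, $heredoc, $single, $nowdoc) and the TXT marker are chosen only for illustration.

```php
<?php
$juice = 'apple';

// Double quotes and heredoc interpolate variables.
$double = "They drank some $juice juice.";
$heredoc = <<<TXT
They drank some $juice juice.
TXT;

// Single quotes and nowdoc keep the text literally.
$single = 'They drank some $juice juice.';
$nowdoc = <<<'TXT'
They drank some $juice juice.
TXT;

var_dump($double === $heredoc); // bool(true)
var_dump($single === $nowdoc);  // bool(true)
var_dump($double === $single);  // bool(false)
```

Heredoc behaves like a double-quoted string and nowdoc like a single-quoted one, which is why the first two comparisons hold.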

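Look at that: PHP does not search for variables in that single-quoted string. So we could just start using single quotes everywhere. So people started suggesting changes like this ..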

- $juice = "apple";
+ $juice = 'apple';

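.. because it will be faster and will save a bunch of CPU cycles on every execution of that code, since PHP does not look for variables in single-quoted strings (which do not exist in this example anyway). Everyone is happy, case closed.

Case closed?

Obviously there is a difference between using single quotes and double quotes, but in order to understand what is going on we need to dig a bit deeper.

Even though PHP is an interpreted language, it uses a compile step in which several parts work together to produce something the virtual machine can actually execute: opcodes. So how do we get from PHP source code to opcodes?

The lexer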

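The lexer scans the source code file and breaks it down into tokens. A simple example of what this means can be found in the token_get_all() function documentation. A PHP source file containing just <?php echo ""; becomes these tokens: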

T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")

You can see this in action and play with it on 3v4l.org.
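
If you would rather reproduce the token list locally, something along these lines should work. It is a minimal sketch that assumes the tokenizer extension is available (it is enabled by default); dumpTokens() is a hypothetical helper name, not something from the PHP manual.

```php
<?php
/**
 * Return one "NAME (text)" line per token of the given PHP source.
 * Hypothetical helper for illustration.
 */
function dumpTokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [token id, source text, line number].
            $lines[] = token_name($token[0]) . ' (' . $token[1] . ')';
        } else {
            // Single characters such as ";" come back as plain strings.
            $lines[] = $token;
        }
    }
    return $lines;
}

echo implode("\n", dumpTokens('<?php echo "";')), "\n";
```

Note that the source is passed in a single-quoted string here, so the script prints the tokens of the inner snippet rather than interpreting it.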

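The parser

The parser takes these tokens and generates an abstract syntax tree (AST) from them. Represented as JSON, the AST of the above example looks like this: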

{
  "data": [
    {
      "nodeType": "Stmt_Echo",
      "attributes": {
        "startLine": 1,
        "startTokenPos": 1,
        "startFilePos": 6,
        "endLine": 1,
        "endTokenPos": 4,
        "endFilePos": 13
      },
      "exprs": [
        {
          "nodeType": "Scalar_String",
          "attributes": {
            "startLine": 1,
            "startTokenPos": 3,
            "startFilePos": 11,
            "endLine": 1,
            "endTokenPos": 3,
            "endFilePos": 12,
            "kind": 2,
            "rawValue": "\"\""
          },
          "value": ""
        }
      ]
    }
  ]
}

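In case you want to play with this as well and see what the AST of other code looks like, I found https://phpast.com/ by Ryan Chandler and https://php-ast-viewer.com/ , both of which show the AST of a given piece of PHP code.

The compiler

The compiler takes the AST and creates opcodes. The opcodes are what the virtual machine executes, and they are also what gets stored in OPcache if you have it set up and enabled (which I highly recommend).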

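To view the opcodes we have multiple options (maybe more, but I do know these three):

  1. Use the Vulcan Logic Dumper (VLD) extension. It is also baked into 3v4l.org.
  2. Use phpdbg -p script.php to dump the opcodes.
  3. Or use OPcache's opcache.opt_debug_level INI setting to make it print the opcodes:
    • a value of 0x10000 outputs the opcodes before optimisation
    • a value of 0x20000 outputs the opcodes after optimisation

$ echo '<?php echo "";' > foo.php
$ php -dopcache.opt_debug_level=0x10000 foo.php
$_main:
...
0000 ECHO string("")
0001 RETURN int(1)

If nothing is printed, OPcache is probably not active on the CLI; passing -dopcache.enable_cli=1 as well should help.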

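The hypothesis

Coming back to the initial idea of saving CPU cycles by using single quotes instead of double quotes, I think we all agree that this would only be true if PHP evaluated these strings at runtime on every single request.

What happens at runtime?

So let's see which opcodes PHP creates for the two different versions.

Double quotes:

<?php echo "apple";

0000 ECHO string("apple")
0001 RETURN int(1)

vs. single quotes:

<?php echo 'apple';

0000 ECHO string("apple")
0001 RETURN int(1)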

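Hey wait, something weird happened. This looks identical! Where did my micro-optimisation go?

Well maybe, just maybe, the implementation of the ECHO opcode handler parses the given string, although there is no flag or anything else telling it to do so ... hmm?

Let's try a different approach and see what the lexer does in those two cases:

Double quotes: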

T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")

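vs. single quotes:

T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ('')

The tokens still distinguish between double and single quotes, but the AST gives us essentially the same result for both cases. The difference is confined to the Scalar_String node attributes: rawValue still carries the original single or double quotes, while value is identical in both cases.

The new hypothesis

Could it be that string interpolation is actually done at compile time?

Let's check with a slightly more "sophisticated" example: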

<?php
$juice="apple";
echo "juice: $juice";

The tokens for this file are:

T_OPEN_TAG (<?php)
T_VARIABLE ($juice)
T_CONSTANT_ENCAPSED_STRING ("apple")
T_WHITESPACE ()
T_ECHO (echo)
T_WHITESPACE ( )
T_ENCAPSED_AND_WHITESPACE (juice: )
T_VARIABLE ($juice)

Look at the last two tokens! String interpolation is handled in the lexer and as such is a compile time thing and has nothing to do with runtime.
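
The same observation can be checked from a script. The following is a small sketch, again assuming the tokenizer extension is loaded; tokenNames() is a hypothetical helper that drops whitespace and single-character tokens so that only the named tokens remain.

```php
<?php
/**
 * List the token names the lexer produces for a PHP snippet,
 * skipping whitespace and single-character tokens.
 * Hypothetical helper for illustration.
 */
function tokenNames(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] !== T_WHITESPACE) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// The snippets are single-quoted, so $juice is literal text for this script.
$interpolated = tokenNames('<?php echo "juice: $juice";');
$plainDouble  = tokenNames('<?php echo "juice";');
$plainSingle  = tokenNames('<?php echo \'juice\';');

print_r($interpolated);
var_dump($plainDouble === $plainSingle); // bool(true)
```

Only the string that actually contains a variable is split into T_ENCAPSED_AND_WHITESPACE and T_VARIABLE; a double-quoted string without variables yields the same single T_CONSTANT_ENCAPSED_STRING token as its single-quoted twin.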


For completeness, let's have a look at the opcodes generated by this (after optimisation, using 0x20000):

0000 ASSIGN CV0($juice) string("apple")
0001 T2 = FAST_CONCAT string("juice: ") CV0($juice)
0002 ECHO T2
0003 RETURN int(1)

These are different opcodes than we had in our simple echo example, which is expected: the string now has to be assembled from a variable at runtime.

Get to the point: should I concat or interpolate?

Let's have a look at these three different versions:

<?php
$juice = "apple";
echo "juice: $juice $juice";
echo "juice: ", $juice, " ", $juice;
echo "juice: ".$juice." ".$juice;

  • the first version uses string interpolation
  • the second uses comma separation (which AFAIK only works with echo and not with assigning variables or anything else)
  • and the third uses string concatenation
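
Before looking at the opcodes, it may help to confirm that the three variants really print the same thing. This is a minimal sketch using output buffering; the variable names are made up for illustration.

```php
<?php
$juice = 'apple';

// Capture what each echo variant prints so the results can be compared.
ob_start();
echo "juice: $juice $juice";
$interpolated = ob_get_clean();

ob_start();
echo "juice: ", $juice, " ", $juice;
$commaSeparated = ob_get_clean();

ob_start();
echo "juice: " . $juice . " " . $juice;
$concatenated = ob_get_clean();

var_dump($interpolated === $commaSeparated && $commaSeparated === $concatenated); // bool(true)
```

So the observable result is the same; the variants only differ in how the engine gets there.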

The first opcode assigns the string "apple" to the variable $juice:

0000 ASSIGN CV0($juice) string("apple")

The first version (string interpolation) uses a rope as the underlying data structure, which is optimised to make as few string copies as possible.

0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1

The second version is the most memory-efficient, as it does not create an intermediate string. Instead it issues multiple ECHO calls, each of which is a blocking call from an I/O perspective, so depending on your use case this might be a downside.

0006 ECHO string("juice: ")
0007 ECHO CV0($juice)
0008 ECHO string(" ")
0009 ECHO CV0($juice)

The third version uses CONCAT/FAST_CONCAT to create an intermediate string representation and as such might use more memory than the rope version.

0010 T1 = CONCAT string("juice: ") CV0($juice)
0011 T2 = FAST_CONCAT T1 string(" ")
0012 T1 = CONCAT T2 CV0($juice)
0013 ECHO T1

So ... what is the right thing to do here and why is it string interpolation?

String interpolation compiles to either a FAST_CONCAT, as in echo "juice: $juice";, or to the highly optimised ROPE_* opcodes, as in echo "juice: $juice $juice";. More importantly, it communicates the intent clearly. None of this has been a bottleneck in any of the PHP applications I have worked with so far, so none of this actually matters.
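
If you want to measure this on your own machine rather than take anyone's word for it, a rough micro-benchmark could look like the sketch below. It assumes PHP 7.4 or newer (for hrtime() and arrow functions); timeIt() and the $variants array are hypothetical names, and the numbers will vary with PHP version, OPcache settings and hardware.

```php
<?php
/**
 * Time $iterations calls of $build and return the elapsed milliseconds.
 * Hypothetical helper for illustration.
 */
function timeIt(callable $build, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $build('apple');
    }
    return (hrtime(true) - $start) / 1e6;
}

$variants = [
    'interpolation' => fn(string $juice): string => "juice: $juice $juice",
    'concatenation' => fn(string $juice): string => 'juice: ' . $juice . ' ' . $juice,
];

$results = [];
foreach ($variants as $name => $build) {
    $results[$name] = timeIt($build, 100000);
    printf("%-14s %.2f ms\n", $name, $results[$name]);
}
```

The closure call overhead is part of every measurement, so treat the output as a rough comparison only.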

TLDR

String interpolation is a compile-time thing. Granted, without OPcache the lexer has to check for variables in double-quoted strings on every request, even if there aren't any, wasting CPU cycles. But honestly: the problem is not the double-quoted strings, but not using OPcache!

However, there is one caveat: PHP up to version 4 (and I believe even 5.0 and maybe 5.1, I don't know for sure) did string interpolation at runtime, so on those versions ... hmm. I guess if anyone really still uses PHP 5, the same as above applies: the problem is not the double-quoted strings, but the use of an outdated PHP version.

Final advice

Update to the latest PHP version, enable OPcache and live happily ever after!
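
To check whether OPcache is actually active where your code runs, a small sketch like the following may help. opcacheIsEnabled() is a hypothetical helper; it relies on opcache_get_status(), which returns false when OPcache is disabled for the current SAPI (on the CLI that is the default unless opcache.enable_cli is set).

```php
<?php
/**
 * Report whether OPcache is loaded and enabled for the current SAPI.
 * Hypothetical helper for illustration.
 */
function opcacheIsEnabled(): bool
{
    if (!function_exists('opcache_get_status')) {
        return false; // extension not loaded at all
    }
    // false skips the per-script details; @ hides the warning that
    // appears when opcache.restrict_api forbids the call.
    $status = @opcache_get_status(false);

    return is_array($status) && !empty($status['opcache_enabled']);
}

echo opcacheIsEnabled() ? "OPcache is enabled\n" : "OPcache is NOT enabled\n";
```

Run it through the same SAPI that serves your application (for example via a web request), since the CLI and the web server can have different settings.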
