PHP Tokenizer Study Notes

Jun 01, 2016, 12:21 PM

Overview

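In one project I needed to analyze PHP code and extract its function calls, along with their positions in the source. Regular expressions could do this, but in terms of both efficiency and code complexity they are not the best approach.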

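After checking the PHP manual, I found that PHP already ships with a built-in interface to its own lexer: PHP Tokenizer, which is exactly the tool I wanted. With it, the components of PHP source code can be analyzed simply, efficiently, and accurately.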

Examples

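The official site has little documentation for Tokenizer, but that does not stop us from understanding it. The Tokenizer extension provides just two functions, token_get_all and token_name: the first splits PHP source code into tokens, and the second returns the symbolic name of a token ID.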

Here is a simple example showing how to use these two functions:

$code = '<?php echo "string1"."string2"; ?>';
$tokens = token_get_all($code);
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Line number, token name, matching source text
        printf("%d - %s\t%s\n", $token[2], token_name($token[0]), $token[1]);
    }
}

The corresponding output is:

1 - T_OPEN_TAG    <?php
1 - T_ECHO    echo
1 - T_WHITESPACE     
1 - T_CONSTANT_ENCAPSED_STRING    "string1"
1 - T_CONSTANT_ENCAPSED_STRING    "string2"
1 - T_WHITESPACE     
1 - T_CLOSE_TAG    ?>

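As a side note: when $token is an array, its three elements are, in order, the token ID at index 0 (pass it to token_name to get its name), the matching source text at index 1, and the line number at index 2.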

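The other case is that $token is a plain string. This happens for single-character tokens such as ;, ., ( or {, which carry no token ID and no line number; that is why the . and ; from the sample code are missing from the output above. Literals such as "string1" are not in this category: they arrive as T_CONSTANT_ENCAPSED_STRING arrays. Keep this in mind when analyzing code. If it is a concern, consider using the helper code that the original post links to here.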
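One way to avoid handling the two cases separately is to convert every token to the array form first. The following is only a minimal sketch: normalize_tokens is a hypothetical helper name, not part of the Tokenizer extension, and using null as the ID of a single-character token is an assumption made for illustration. It fills in the missing line number by counting newlines in the tokens seen so far.

```php
<?php
// Hypothetical helper (not part of the Tokenizer extension): returns every
// token from token_get_all() in the same [id, text, line] shape.
function normalize_tokens(string $source): array
{
    $line = 1;          // line on which the next token starts
    $normalized = [];

    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens already carry [id, text, line].
            $normalized[] = $token;
            // A token may span several lines; move past its newlines.
            $line = $token[2] + substr_count($token[1], "\n");
        } else {
            // Single-character tokens are plain strings. A null id marks
            // them here (an assumption of this sketch).
            $normalized[] = [null, $token, $line];
        }
    }

    return $normalized;
}

// Usage: print every token, including the ones the first example skipped.
foreach (normalize_tokens('<?php echo "string1"."string2"; ?>') as [$id, $text, $line]) {
    $name = $id === null ? 'CHAR' : token_name($id);
    printf("%d - %s\t%s\n", $line, $name, $text);
}
```

With this shape, later code can always read index 1 for the text and index 2 for the line, and only needs to check index 0 for null when the token type matters.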

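Yes, calling it is very simple, but our ambitions go well beyond writing a simple loop. We can use this extension for real work. For example, the code below "compacts" PHP code by removing unnecessary line breaks, whitespace, and comments: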

/**
 * "Compact" PHP source code
 *
 * @see http://c7y.phparch.com/c/entry/1/art,practical_uses_tokenizer
 */
class CompactCode
{
    protected static $out;
    protected static $tokens;

    public static function compact($source)
    {
        // Tokenize the PHP source code
        self::$tokens = token_get_all($source);
        self::$out = '';

        reset(self::$tokens);

        // Walk the tokens one by one and check each token's type
        // (current() returns false once the array is exhausted)
        while (($t = current(self::$tokens)) !== false) {
            if (is_array($t)) {
                // Filter out whitespace and comments
                if ($t[0] == T_WHITESPACE || $t[0] == T_DOC_COMMENT || $t[0] == T_COMMENT) {
                    self::skipWhiteAndComments();
                    continue;
                }
                self::$out .= $t[1];
            } else {
                // Single-character tokens arrive as plain strings
                self::$out .= $t;
            }

            next(self::$tokens);
        }

        return self::$out;
    }

    private static function skipWhiteAndComments()
    {
        // Emit a single space so that adjacent keywords stay separated
        self::$out .= ' ';
        while (($t = current(self::$tokens)) !== false) {
            // Greedily skip any further whitespace or comments
            if (is_array($t) && ($t[0] == T_WHITESPACE || $t[0] == T_DOC_COMMENT || $t[0] == T_COMMENT)) {
                next(self::$tokens);
            } else {
                return;
            }
        }
    }
}

Calling it is simple; just use

CompactCode::compact($source_code);

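and the returned string is the compacted source. More examples of using Tokenizer can be found in the article linked from the original post here, which is recommended reading.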
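Returning to the task from the overview, the same token stream can be used to list function calls and the lines they appear on. This is only a rough sketch under stated assumptions: find_function_calls is a hypothetical name, and the heuristic (a T_STRING followed by an opening parenthesis, and not preceded by function, new, -> or ::) is a simplification. It misses namespaced calls, which PHP 8 tokenizes as qualified-name tokens, and it can misreport by-reference declarations such as function &foo().

```php
<?php
// Hypothetical helper: returns plain function calls as [name, line] pairs.
function find_function_calls(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $skippable = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    // A name right after one of these is a declaration, an instantiation,
    // or a method call, not a plain function call.
    $notCalls = [T_FUNCTION, T_NEW, T_OBJECT_OPERATOR, T_DOUBLE_COLON];
    $previous = null; // id of the last significant array token, else null
    $calls = [];

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token)) {
            // Single-character token such as ';' or '('.
            $previous = null;
            continue;
        }
        if (in_array($token[0], $skippable, true)) {
            continue;
        }
        if ($token[0] === T_STRING && !in_array($previous, $notCalls, true)) {
            // Look ahead past whitespace and comments for '('.
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j])
                && in_array($tokens[$j][0], $skippable, true)) {
                $j++;
            }
            if ($j < $count && $tokens[$j] === '(') {
                $calls[] = [$token[1], $token[2]];
            }
        }
        $previous = $token[0];
    }

    return $calls;
}

// Usage with a small sample source (sample code is illustrative only).
$source = <<<'PHP'
<?php
function greet($name) { return strtoupper(trim($name)); }
echo greet(' world ');
PHP;

foreach (find_function_calls($source) as [$name, $line]) {
    printf("%s called on line %d\n", $name, $line);
}
```

Because language constructs such as echo, isset and array have their own token IDs rather than T_STRING, they are left out of the result without any extra filtering.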
