search
HomeBackend DevelopmentPHP Tutorial使用PHP-Parser生成AST抽象语法树

0、前言

最近项目的流程逐渐清晰,但是很多关键性的技术没有掌握,也只能一步一步摸索。

由于要做基于数据流分析的静态代码分析,所以前端的工作如:词法分析、语法分析必不可少。Yacc和Lex什么的就不再考虑了,查了一天的资料,发现两款比较适合,一款是Java下的ANTLR,另一款是专门做PHP AST生成的PHP-Parser。

ANTLR是编译原理领域比较著名的工具了,相对于Yacc和Lex,更加实用。但是对PHP的语法文件只有一个,折腾了半天才生成调通,发现不太适合,对于”$a=1”生成tokens竟然是[$,a,=,1],无法识别assignment,做得过于粗糙,令人无比失望。

相比之下,PHP-Parser更加专业一些,毕竟专注PHP的词法、语法分析工作。

 

1、介绍

PHP-Parser的项目主页是https://github.com/nikic/PHP-Parser。可以对多版本的PHP进行完美解析,生成一颗抽象语法树。

对于词法分析,PHP有个内置函数token_get_all()可以用来获取TOKENS,作为语法分析的输入,这个开源项目也是用的token_get_all()生成的token流。

 

2、安装

安装也很简单,这里我是使用的PHP中的包管理工具composer添加的,在项目目录中执行以下命令即可:

php composer.phar require nikic/php-parser

如果没有下载Composer,应该先执行下面的命令:

Curl -s http://getcomposer.org/installer | php  

 

3、生成AST

使用composer添加php-parser之后,就可以方便使用。

首先介绍一下PHP-Parser中定义的一些节点类型:

(1)PhpParser\Node\Stmt是语句节点,不带任何返回信息(return)的结构,如赋值语句”$a = $b” ;

(2)PhpParser\Node\Expr是表达式节点,可以返回一个值的语言结构,如$var和func()。

(3)PhpParser\Node\Scalar是常量节点,可以用来表示任何常量值。如’string’,0,以及常量表达式。

(4)还有一些节点没有包括进去,如参数节点(PhpParser\Node\Arg)。

一些节点类的名称使用了下划线,这是为了避免和PHP关键字冲突。

PHP-parser的HelloWorld程序如下,该代码片段会生成AST:

输出结果为:


<span style="font-size:12px;">Array(    [0] => PhpParser\Node\Stmt\Echo_ Object    (            [subNodes:protected] => Array                (                    [exprs] => Array                        (                            [0] => PhpParser\Node\Scalar\String Object                                (                                    [subNodes:protected] => Array                                        (                                            [value] => 1+2                                        )                                    [attributes:protected] => Array                                        (                                            [startLine] => 1                                            [endLine] => 1                                        )                                )                            [1] => PhpParser\Node\Scalar\String Object                                (                                    [subNodes:protected] => Array                                        (                                            [value] => chongrui                                        )                                    [attributes:protected] => Array                                        (                                            [startLine] => 1                                            [endLine] => 1                                        )                                )                        )                )            [attributes:protected] => Array                (                    [startLine] => 1                    [endLine] => 1                )        ))</span>

可以看到,这课AST只有一个节点Echo_,此节点有一个子节点exprs,可以使用$stmts[0]->exprs进行访问。

对于节点中的attributes信息是用来存储startLine和endLine以及comments的。可以使用getAttributes(),getAttribute(‘startLine’),setAttribute(),hasAttribute()方法进行访问。

开始行号startLine可以通过getLine()/setLine()方法进行访问(也可以getAttribute(‘startLine’))。注释信息可以使用getDocComment()获取。

访问节点上的值:如访问值“chongrui”,使用$stmts[0]->exprs[1]->value;即可。

 

 

4、节点遍历

对抽象语法树的遍历非常方便,使用PhpParser\NodeTraverser类即可。同时,支持自定义的Visitor对象。因为在实际应用中,对PHP源码进行分析,往往是不知道AST的具体结构,这时需要动态的去判断每个节点的类型信息。

这些判断统一写到MyNodeVisitor中,该类继承了一个父类NodeVisitorAbstract,这个类中有一些方法:

(1)beforeTraverse()方法用于遍历之前,通常用来在遍历前对值进行重置。

(2)afterTraverse()方法和(1)相同,唯一不同的地方是遍历之后才触发。

(3)enterNode()和leaveNode()方法在对每个节点访问时触发。

enterNode在进入节点时触发,比如在访问节点的子节点之前。这个方法可以返回NodeTraverser::DONT_TRAVERSER_CHILDREN,用来跳过该节点的孩子节点。

leaveNode在遍历节点完成之后触发。它可以返回

NodeTraverser::REMOVE_NODE,这种情况下,当前节点会被删除。如果返回一个节点的集合,那么这些节点会并入到父节点的array中,比如array(A,B,C),B节点被array(X,Y,Z)替换,变成array(A,X,Y,Z,C) .

下面的代码片段对$code进行解析,生成AST,并且在遍历时,当发现遍历节点时String类型时,就进行输出。

结果会输出1,2。

5、其他AST表示

有时候会将AST进行文本化持久保存,这个功能PHP-Parser也支持。

(1)简单的进行序列化

使用serialize()和unserialize()进行序列化和反序列化操作,可以对AST进行持久保存。 

(2)易于阅读的保存形式

分别是完美打印和XML持久存储,在这里不做详细介绍,有需要的时候可以看项目的文档:

https://github.com/nikic/PHP-Parser/blob/master/doc/3_Other_node_tree_representations.markdown

 

 

6、总结

至少在PHP静态分析方面,PHP-Parser在功能方面大大优于ANTLR。如何构建一个PHP自动化审计系统,这个PHP-Parser肯定会发挥不小的作用:)~


Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Optimize PHP Code: Reducing Memory Usage & Execution TimeOptimize PHP Code: Reducing Memory Usage & Execution TimeMay 10, 2025 am 12:04 AM

TooptimizePHPcodeforreducedmemoryusageandexecutiontime,followthesesteps:1)Usereferencesinsteadofcopyinglargedatastructurestoreducememoryconsumption.2)LeveragePHP'sbuilt-infunctionslikearray_mapforfasterexecution.3)Implementcachingmechanisms,suchasAPC

PHP Email: Step-by-Step Sending GuidePHP Email: Step-by-Step Sending GuideMay 09, 2025 am 12:14 AM

PHPisusedforsendingemailsduetoitsintegrationwithservermailservicesandexternalSMTPproviders,automatingnotificationsandmarketingcampaigns.1)SetupyourPHPenvironmentwithawebserverandPHP,ensuringthemailfunctionisenabled.2)UseabasicscriptwithPHP'smailfunct

How to Send Email via PHP: Examples & CodeHow to Send Email via PHP: Examples & CodeMay 09, 2025 am 12:13 AM

The best way to send emails is to use the PHPMailer library. 1) Using the mail() function is simple but unreliable, which may cause emails to enter spam or cannot be delivered. 2) PHPMailer provides better control and reliability, and supports HTML mail, attachments and SMTP authentication. 3) Make sure SMTP settings are configured correctly and encryption (such as STARTTLS or SSL/TLS) is used to enhance security. 4) For large amounts of emails, consider using a mail queue system to optimize performance.

Advanced PHP Email: Custom Headers & FeaturesAdvanced PHP Email: Custom Headers & FeaturesMay 09, 2025 am 12:13 AM

CustomheadersandadvancedfeaturesinPHPemailenhancefunctionalityandreliability.1)Customheadersaddmetadatafortrackingandcategorization.2)HTMLemailsallowformattingandinteractivity.3)AttachmentscanbesentusinglibrarieslikePHPMailer.4)SMTPauthenticationimpr

Guide to Sending Emails with PHP & SMTPGuide to Sending Emails with PHP & SMTPMay 09, 2025 am 12:06 AM

Sending mail using PHP and SMTP can be achieved through the PHPMailer library. 1) Install and configure PHPMailer, 2) Set SMTP server details, 3) Define the email content, 4) Send emails and handle errors. Use this method to ensure the reliability and security of emails.

What is the best way to send an email using PHP?What is the best way to send an email using PHP?May 08, 2025 am 12:21 AM

ThebestapproachforsendingemailsinPHPisusingthePHPMailerlibraryduetoitsreliability,featurerichness,andeaseofuse.PHPMailersupportsSMTP,providesdetailederrorhandling,allowssendingHTMLandplaintextemails,supportsattachments,andenhancessecurity.Foroptimalu

Best Practices for Dependency Injection in PHPBest Practices for Dependency Injection in PHPMay 08, 2025 am 12:21 AM

The reason for using Dependency Injection (DI) is that it promotes loose coupling, testability, and maintainability of the code. 1) Use constructor to inject dependencies, 2) Avoid using service locators, 3) Use dependency injection containers to manage dependencies, 4) Improve testability through injecting dependencies, 5) Avoid over-injection dependencies, 6) Consider the impact of DI on performance.

PHP performance tuning tips and tricksPHP performance tuning tips and tricksMay 08, 2025 am 12:20 AM

PHPperformancetuningiscrucialbecauseitenhancesspeedandefficiency,whicharevitalforwebapplications.1)CachingwithAPCureducesdatabaseloadandimprovesresponsetimes.2)Optimizingdatabasequeriesbyselectingnecessarycolumnsandusingindexingspeedsupdataretrieval.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

mPDF

mPDF

mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use