PHP automated white box audit technology and implementation

WBOY
2016-08-08 09:23:31

0x00 Foreword


Few technical materials on automated PHP auditing have been published in China, whereas fairly mature automated audit tools have appeared abroad. RIPS, for example, performs a series of code analyses based on the token stream. Traditional static analysis techniques such as data flow analysis and taint propagation analysis are rarely applied to dynamic scripting languages such as PHP, yet they are key to automating white-box auditing. This article introduces the author's recent research and implementation results, in the hope that more domestic security researchers will put their energy into the worthwhile field of automated PHP auditing.

0x01 Basic knowledge


Automated auditing can be implemented in many ways. The simplest is to locate and match code directly against a library of regular expression rules, but this method has the lowest accuracy. The most reliable approach is to build on established static analysis techniques. Most static analysis security tools follow the same general process: model the source code, run the analysis, and produce the results.

The first step in static analysis is to model the source code. Put simply, the source code string is converted into an intermediate representation, a set of data structures representing the code, that is convenient for the subsequent vulnerability analysis. Modeling generally borrows methods from compiler technology, such as lexical analysis to generate tokens, abstract syntax trees, and control flow graphs. The quality of the modeling directly affects the results of the subsequent taint propagation analysis and data flow analysis.
The analysis step then applies security knowledge to find vulnerabilities in the loaded code. Finally, the static analysis tool generates its findings, which ends this phase of the work.
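The lexical analysis step can be illustrated with PHP's built-in tokenizer. This is only a minimal sketch of turning a source string into tokens; the helper name tokenNames is made up for this example, and it is not the modeling code of the tool described here.

```php
<?php
// Lexical analysis sketch: turn a PHP source string into a flat token list
// using the built-in tokenizer extension. tokenNames() is a hypothetical
// helper written for this illustration.
function tokenNames(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens have the form [token id, text, line number].
            if ($token[0] === T_WHITESPACE) {
                continue; // whitespace carries no information for the audit
            }
            $names[] = token_name($token[0]) . ':' . $token[1];
        } else {
            // Single-character tokens such as ";" or "(" are plain strings.
            $names[] = $token;
        }
    }
    return $names;
}

$tokens = tokenNames('<?php $a = $_GET["a"]; echo $a;');
echo implode("\n", $tokens), "\n";
```

A token list like this is enough for simple matching, but it carries no structure, which is why an AST and a control flow graph are built on top of it.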

0x02 Implementation Idea


After a period of hard work, the author and friends have roughly implemented an automated static analysis tool. The implementation is built on static analysis techniques; for a deeper understanding of the ideas behind it, see the previously published articles.
In the tool, the automated audit process is as follows:

  • First, load all PHP files in the project directory supplied by the user and classify them. If a file is a Main file, that is, a PHP file that actually processes user requests, it undergoes vulnerability analysis. Files that are not Main files, such as class definition or utility function files, are skipped.
  • Second, collect global data. The key information is the class definitions in the project: the file path of each class, its attributes, its methods and their parameters, and so on. A file summary is also generated for each file, focusing on each assignment statement and on the sanitization and encoding information of the variables involved.
  • After global initialization, run the compiler front-end module: use the open source tool PHP-Parser to build an abstract syntax tree (AST) for the PHP code under analysis. From the AST, a CFG construction algorithm builds the control flow graph and generates summary information for each basic block as it goes.
  • During the front-end pass, if a call to a sensitive function is found, stop and perform taint propagation analysis, both interprocedural and intraprocedural, to find the corresponding tainted data. Then use the sanitization and encoding information collected during data flow analysis to decide whether the code is vulnerable.
  • If the previous step finds vulnerable code, hand over to the vulnerability reporting module, which collects the vulnerable code segments. It works by maintaining a singleton result set context object in the system environment; each vulnerability record is added to the result set. After the whole project has been scanned, Smarty outputs the result set to the front end, which visualizes the scan results.
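The singleton result set mentioned in the last step might look roughly like the sketch below. The class name ResultContext, its methods, and the record fields are assumptions made for illustration; the article does not show the tool's actual class.

```php
<?php
// Hypothetical result-set context kept as a singleton, so that every
// analysis module appends its findings to the same list.
final class ResultContext
{
    private static ?ResultContext $instance = null;
    private array $records = [];

    // A private constructor forces callers to go through getInstance().
    private function __construct()
    {
    }

    public static function getInstance(): ResultContext
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    // Field names are illustrative, not taken from the original tool.
    public function addRecord(string $file, int $line, string $type, string $sink): void
    {
        $this->records[] = compact('file', 'line', 'type', 'sink');
    }

    public function getRecords(): array
    {
        return $this->records;
    }
}

ResultContext::getInstance()->addRecord('index.php', 12, 'SQLI', 'mysql_query');
$records = ResultContext::getInstance()->getRecords();
```

Once the scan finishes, the array returned by getRecords() is the kind of data that would be handed to the template layer for display.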

0x03 Initialization work


In a manual PHP audit, when we encounter a call to a sensitive function such as mysql_query, we instinctively check whether its first parameter is controllable. In practice, many CMSs wrap their database query methods, for example in a class MysqlDB, to make them convenient to call and to keep the program logic clear. In that case we do not search for the mysql_query keyword during the audit but look for calls such as db->getOne.
So the question is: how does an automated analyzer know that db->getOne is a method of a database access class?
This requires collecting all classes and their methods across the whole project early in the analysis, so that the program can find the method bodies it needs to follow.
Class and method information should be collected as part of framework initialization and stored in the singleton context.
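As a rough illustration of this collection step, the sketch below gathers class names and their methods with the built-in tokenizer. The tool itself works on an AST; the function collectClasses and the shape of the returned table are assumptions for this example, and the sketch assumes a file that contains only class definitions.

```php
<?php
// Sketch: build a table of class name => file path and method names.
// Assumption: the file holds only class definitions, so every named
// function that follows a class keyword is treated as one of its methods.
function collectClasses(string $source, string $path): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $classes = [];
    $current = null;
    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token) || ($token[0] !== T_CLASS && $token[0] !== T_FUNCTION)) {
            continue;
        }
        // Skip whitespace after the keyword; the declared name, if any, follows.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $count || !is_array($tokens[$j]) || $tokens[$j][0] !== T_STRING) {
            continue; // anonymous class, closure, or other unnamed construct
        }
        $name = $tokens[$j][1];
        if ($token[0] === T_CLASS) {
            $current = $name;
            $classes[$current] = ['path' => $path, 'methods' => []];
        } elseif ($current !== null) {
            $classes[$current]['methods'][] = $name;
        }
    }
    return $classes;
}

$source = '<?php class MysqlDB { public function getOne($sql) { return mysql_query($sql); } }';
$classes = collectClasses($source, 'lib/MysqlDB.php');
```

With such a table in the context, a call like db->getOne can be resolved to the body of MysqlDB::getOne and followed.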

The analyzer must also identify whether a PHP file actually handles user requests. In some CMSs, encapsulated classes such as database or file operation classes are generally written into separate files. Taint propagation analysis of these files is pointless, so they need to be identified when the framework is initialized. The principle is simple: compute the proportion of call-type statements to definition-type statements and compare it with a threshold. The error rate is very low.
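One way to approximate this ratio test is sketched below. The function isMainFile, the choice of which tokens count as calls or definitions, and the 0.5 threshold are all assumptions for illustration; the article does not give the tool's actual rule or threshold.

```php
<?php
// Hypothetical Main-file heuristic: the share of call-like tokens among
// call-like and definition tokens must reach a threshold.
function isMainFile(string $source, float $threshold = 0.5): bool
{
    // Drop whitespace and comments so neighbouring tokens are easy to inspect.
    $tokens = array_values(array_filter(
        token_get_all($source),
        fn($t) => !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)
    ));
    $calls = 0;
    $definitions = 0;
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            continue;
        }
        $previousId = ($i > 0 && is_array($tokens[$i - 1])) ? $tokens[$i - 1][0] : null;
        $next = $tokens[$i + 1] ?? null;
        if ($token[0] === T_CLASS || $token[0] === T_FUNCTION) {
            $definitions++;
        } elseif ($token[0] === T_ECHO) {
            $calls++;
        } elseif ($token[0] === T_STRING && $next === '(' && $previousId !== T_FUNCTION) {
            // A name followed by "(" that is not being declared is a call.
            $calls++;
        }
    }
    $total = $calls + $definitions;
    return $total > 0 && ($calls / $total) >= $threshold;
}

$library = '<?php class MysqlDB { function getOne($sql) { return mysql_query($sql); } }';
$main = '<?php $row = getOne($_GET["id"]); echo htmlspecialchars($row);';
$libraryIsMain = isMainFile($library);
$mainIsMain = isMainFile($main);
```

In this sketch the library file has two definitions and one call, so it falls below the threshold, while the request-handling file consists only of calls.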
Finally, summarize each file. The summary enables inter-file analysis when require, include, and similar statements are encountered later. It mainly collects variable assignment, variable encoding, and variable sanitization information.

0x04 User function processing


Common web vulnerabilities are generally caused by user-controllable data reaching dangerous parameters. These are called taint-style vulnerabilities; common examples are SQLI and XSS.
Some PHP built-ins are inherently dangerous, such as echo, which may cause reflected XSS. In real code, however, developers often do not call such built-ins directly but wrap them in custom functions, such as:

<code>function myexec($cmd)
{
    exec($cmd) ;
}</code>

In the implementation, the processing flow is:

  • Use the context information gathered during initialization to locate the corresponding method's code
  • Analyze this code and find the dangerous function (here, exec)
  • Locate the dangerous parameter of the dangerous function (here, cmd)
  • If no sanitization of this parameter is encountered during the analysis, the parameter can be tainted: map it to the first parameter cmd of the user function myexec, and store this user-defined function in the context structure as a dangerous function
  • Return from the recursion and start the taint analysis process

In short, we follow the corresponding class methods, static methods, and functions and check their code for calls to dangerous functions with dangerous parameters. PHP's built-in dangerous functions and their parameter positions are defined in the configuration file. If such a function and parameter are found and the dangerous parameter is not filtered, the user-defined function is treated as a user-defined dangerous function. Whenever a call to it is found in subsequent analysis, taint analysis starts immediately.
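The promotion of wrappers to dangerous functions can be sketched as follows. The summary format, the sink table, and the function markDangerousFunctions are hypothetical, and the sketch repeats a simple pass until nothing changes rather than recursing into each callee as the tool is described as doing.

```php
<?php
// Built-in sinks: function name => 0-based positions of dangerous
// parameters. The entries are illustrative, not the tool's configuration.
$sinks = ['exec' => [0], 'mysql_query' => [0]];

// Hypothetical summaries of user functions: parameter names, parameters
// that are sanitized in the body, and the calls made with their arguments.
$functions = [
    'myexec' => ['params' => ['cmd'], 'sanitized' => [],
        'calls' => [['name' => 'exec', 'args' => ['cmd']]]],
    'run' => ['params' => ['label', 'command'], 'sanitized' => [],
        'calls' => [['name' => 'myexec', 'args' => ['command']]]],
    'safeexec' => ['params' => ['cmd'], 'sanitized' => ['cmd'],
        'calls' => [['name' => 'exec', 'args' => ['cmd']]]],
];

function markDangerousFunctions(array $functions, array $sinks): array
{
    do {
        $changed = false;
        foreach ($functions as $name => $summary) {
            foreach ($summary['calls'] as $call) {
                foreach ($sinks[$call['name']] ?? [] as $position) {
                    $argument = $call['args'][$position] ?? null;
                    $index = array_search($argument, $summary['params'], true);
                    if ($index === false || in_array($argument, $summary['sanitized'], true)) {
                        continue; // not one of our parameters, or already filtered
                    }
                    if (!in_array($index, $sinks[$name] ?? [], true)) {
                        $sinks[$name][] = $index; // the wrapper becomes a sink itself
                        $changed = true;
                    }
                }
            }
        }
    } while ($changed);
    return $sinks;
}

$result = markDangerousFunctions($functions, $sinks);
```

Here myexec is recorded with dangerous parameter 0, run with parameter 1 because it forwards its second argument to myexec, and safeexec is left out because its parameter is sanitized.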

0x05 Processing the purification and encoding of variables


In a manual audit, once we find that a dangerous parameter is controllable, the next step is to check whether the programmer has effectively filtered or encoded the variable. This determines whether a vulnerability exists.
Automated auditing follows the same idea. The implementation first requires cataloguing and configuring each security function in PHP. During analysis, the sanitization and encoding information for each piece of data flow information is collected by backtracking, for example:

<code>$a = $_GET['a'] ;
$a = intval($a) ;
echo $a ;
$a = htmlspecialchars($a) ;
mysql_query($a) ;</code>

This snippet looks a little odd, but it is only for demonstration. Variable a has been sanitized by intval and htmlspecialchars, and thanks to the configuration file this information is collected. Backtracking then merges the sanitization and encoding information upward from the current line of code.
For example, at the third line the sanitization information of variable a contains only intval, but at the fifth line it must be merged into the list of intval and htmlspecialchars. This is done by collecting all the data flow information in the preceding code and backtracking through it.

One detail: when matching functions such as base64_encode and base64_decode are both applied to the same variable, the base64 encoding is cancelled out. Likewise, escaping followed by unescaping cancels out. But if the calls are in the wrong order, or only decoding is performed, the result is quite dangerous.
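The merging and cancellation rules can be sketched with a small stack. The inverse table and the function mergeTransformations are assumptions for illustration, and the sketch only cancels a decoding call that directly follows its matching encoding call.

```php
<?php
// Illustrative table: decoding or unescaping function => the function it undoes.
$inverse = [
    'base64_decode' => 'base64_encode',
    'urldecode' => 'urlencode',
    'stripslashes' => 'addslashes',
];

// Merge the functions applied to one variable, in source order, and cancel
// an encode/decode pair when the decode directly follows the encode.
function mergeTransformations(array $applied, array $inverse): array
{
    $stack = [];
    foreach ($applied as $function) {
        $top = end($stack); // false when the stack is empty
        if (isset($inverse[$function]) && $top === $inverse[$function]) {
            array_pop($stack); // the pair cancels out
        } else {
            $stack[] = $function;
        }
    }
    return $stack;
}

$merged = mergeTransformations(['intval', 'htmlspecialchars'], $inverse);
$cancelled = mergeTransformations(['base64_encode', 'base64_decode'], $inverse);
$wrongOrder = mergeTransformations(['base64_decode', 'base64_encode'], $inverse);
```

For the snippet above, the merged list at the fifth line is intval followed by htmlspecialchars. A decode that comes first is kept in the list, so the later judgement can still treat it as a risk.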

0x06 Variable backtracking and taint analysis


1. Variable backtracking

To resolve the parameter (traceSymbol) of each dangerous sink, the analysis traces back through all basic blocks connected to the current block. The process is as follows:

  • Loop over all entry edges of the current basic block, and for each traceSymbol that has not been sanitized, look up its name in the DataFlow property of the basic block.
  • If it is found, replace it with the mapped symbol and copy all of that symbol's sanitization and encoding information. Tracing then continues along all entry edges.
  • Finally, the results for the different paths on the CFG are returned.

The algorithm stops when traceSymbol maps to a static value such as a string or number literal, or when the current basic block has no entry edge. If traceSymbol is a variable or an array, check whether it is a superglobal array.
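The backtracking steps above can be sketched on a toy control flow graph. The block layout, the array keys (preds, dataflow, from, sanitizers), and the function traceSymbol are simplifications invented for this example; each block is assumed to hold at most one summarized assignment per variable.

```php
<?php
const SUPERGLOBALS = ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'];

// Hypothetical CFG for:
//   $id = $_GET['id']; if (...) { $id = intval($id); } $sql = $id; mysql_query($sql);
$blocks = [
    'entry' => ['preds' => [], 'dataflow' => ['id' => ['from' => '$_GET', 'sanitizers' => []]]],
    'then' => ['preds' => ['entry'], 'dataflow' => ['id' => ['from' => 'id', 'sanitizers' => ['intval']]]],
    'else' => ['preds' => ['entry'], 'dataflow' => []],
    'sink' => ['preds' => ['then', 'else'], 'dataflow' => ['sql' => ['from' => 'id', 'sanitizers' => []]]],
];

// Trace a symbol backwards and return one result per CFG path that
// reaches a literal or a superglobal.
function traceSymbol(array $blocks, string $blockId, string $symbol, array $sanitizers = [], array $visited = []): array
{
    if (isset($visited[$blockId])) {
        return []; // block already on this path: do not follow loops forever
    }
    $visited[$blockId] = true;
    $block = $blocks[$blockId];
    if (isset($block['dataflow'][$symbol])) {
        $entry = $block['dataflow'][$symbol];
        // Copy the sanitization information attached to the mapped symbol.
        $sanitizers = array_merge($entry['sanitizers'], $sanitizers);
        if ($entry['from'] === null) {
            return [['source' => 'literal', 'sanitizers' => $sanitizers]];
        }
        $symbol = $entry['from'];
        if (in_array($symbol, SUPERGLOBALS, true)) {
            return [['source' => $symbol, 'sanitizers' => $sanitizers]];
        }
    }
    $results = [];
    foreach ($block['preds'] as $pred) {
        $results = array_merge($results, traceSymbol($blocks, $pred, $symbol, $sanitizers, $visited));
    }
    return $results;
}

$paths = traceSymbol($blocks, 'sink', 'sql');
```

The two returned paths differ only in their sanitizers: the path through the then block carries intval, while the path through the else block reaches the superglobal unsanitized.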

2. Taint analysis

Taint analysis starts during interprocedural analysis and during the processing of built-in and user-defined functions. If a sensitive function call is encountered, the dangerous parameter nodes are obtained by backtracking or from the context, and taint analysis begins. Put simply, it decides whether the dangerous parameters can lead to a vulnerability. Taint analysis is implemented in TaintAnalyser. Once the dangerous parameters have been obtained, the steps are as follows:

  • First, look for an assignment to the dangerous parameter in the current basic block, and check whether the right-hand node of the DataFlow entry is a user-input source, such as $_GET
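The final judgement could be sketched as below: a sink parameter is reported only if it comes from user input and none of the collected sanitizers protects against that vulnerability type. The tables and the function isVulnerable are assumptions for illustration and do not reproduce the logic of TaintAnalyser.

```php
<?php
// Illustrative configuration: user-input sources and, per vulnerability
// type, the functions assumed to neutralize it.
const USER_SOURCES = ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'];
const PROTECTS = [
    'SQLI' => ['intval', 'addslashes', 'mysql_real_escape_string'],
    'XSS' => ['intval', 'htmlspecialchars'],
];

// Decide whether one backtracked path into a sink is exploitable.
function isVulnerable(string $type, string $source, array $sanitizers): bool
{
    if (!in_array($source, USER_SOURCES, true)) {
        return false; // literals and other non-user data cannot be tainted
    }
    // Vulnerable only when no collected sanitizer covers this type.
    return array_intersect($sanitizers, PROTECTS[$type] ?? []) === [];
}

$onlyHtmlEscaped = isVulnerable('SQLI', '$_GET', ['htmlspecialchars']);
$castToInt = isVulnerable('SQLI', '$_GET', ['intval', 'htmlspecialchars']);
```

Under these assumptions, a query built from input that was only passed through htmlspecialchars is reported, while input that also went through intval is not.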
