


PHP Kernel Exploration: Principle of Hash Table Collision Attack, Kernel
The following uses pictures and texts to show you PHP Kernel Exploration: Principle of Hash Table Collision Attack.
Recently, the topic of Hashtable collisions as DOS attack has been constantly brought up, and various languages have been affected. This article combines the PHP kernel source code to talk about the principle and implementation of this attack.
Basic principles of hash table collision attack
Hash table is a data structure with extremely high search efficiency. Many languages implement hash tables internally. The hash table in PHP is an extremely important data structure. It is not only used to represent the Array data type, but also used to store context information inside the Zend virtual machine (the variables and functions of the execution context are all stored using the hash table structure) .
Ideally, the time complexity of hash table insertion and search operations is O(1). Any data item can calculate a hash value (key) in a time independent of the length of the hash table. Then a bucket (the term bucket means a position in the hash table) is located in constant time. Of course, this is an ideal situation, because the length of any hash table is limited, so there must be a situation where different data items have the same hash value. At this time, different data items are assigned to the same bucket, which is called a collision. (collision). The implementation of the hash table needs to solve the collision problem. There are generally two ideas for solving the collision. The first is to assign the collided data to other buckets according to a certain principle, such as linear detection - if the data collides during insertion, Then search the buckets behind this bucket sequentially and put them into the first unused bucket; the second strategy is that each bucket is not a location that can only hold a single data item, but a location that can hold multiple data. Data structure (such as a linked list or a red-black tree), all collision data is organized in the form of some data structure.
No matter which collision resolution strategy is used, the time complexity of the insertion and search operations is no longer O(1). Take search as an example. You cannot end by locating the bucket by key. You must also compare whether the original key (that is, the key before hashing) is equal. If not, you must use the same algorithm as the insertion to continue searching until it is found. The matching value or confirmation data is not in the hash table.
PHP uses a singly linked list to store collision data, so in fact the average search complexity of the PHP hash table is O(L), where L is the average length of the bucket linked list; and the worst complexity is O(N) , at this time all the data collides, and the hash table degenerates into a singly linked list. The following figure is a schematic diagram of a normal hash table and a degenerate hash table in PHP.
The hash table collision attack is to carefully construct the data so that all the data collides, artificially turning the hash table into a degenerated singly linked list. At this time, the time of various operations on the hash table is increased by an order of magnitude, so It will consume a lot of CPU resources, causing the system to be unable to respond to requests quickly, thereby achieving the purpose of a denial of service attack (DoS).
As you can see, the premise for a hash collision attack is that the hash algorithm is particularly easy to find collisions. If it is MD5 or SHA1, it is basically out of the question. Fortunately (or unfortunately) most programming languages The hashing algorithms used are very simple (this is for efficiency reasons), so the attack data can be constructed effortlessly. The next section will analyze Zend related kernel code to find out how to attack hash table collision and attack PHP.
Internal implementation data structure of Zend hash table
PHP uses a structure called Backet to represent buckets, and all buckets with the same hash value are organized into a singly linked list. The hash table is represented by the HashTable structure. The relevant source code is under zend/Zend_hash.h:
typedef struct bucket { ulong h; /* Used for numeric indexing */ uint nKeyLength; void *pData; void *pDataPtr; struct bucket *pListNext; struct bucket *pListLast; struct bucket *pNext; struct bucket *pLast; char arKey[1]; /* Must be last element */ } Bucket; typedef struct _hashtable { uint nTableSize; uint nTableMask; uint nNumOfElements; ulong nNextFreeElement; Bucket *pInternalPointer; /* Used for element traversal */ Bucket *pListHead; Bucket *pListTail; Bucket **arBuckets; dtor_func_t pDestructor; zend_bool persistent; unsigned char nApplyCount; zend_bool bApplyProtection; #ifZEND_DEBUG int inconsistent; #endif } HashTable;
The field name clearly indicates its purpose, so no further explanation is required. Focus on the following fields: "h" in Bucket is used to store the original key; nTableMask in HashTable is a mask, generally set to nTableSize - 1, which is closely related to the hash algorithm. When discussing the hash algorithm later Will elaborate; arBuckets points to an array of pointers, where each element is a pointer to the head of the Bucket list.
Hash Algorithm
The minimum capacity of the PHP hash table is 8 (2^3), the maximum capacity is 0×80000000 (2^31), and is rounded to the integer power of 2 (that is, the length will automatically expand to the integer power of 2, such as 13 The length of the hash table of elements is 16; the length of the hash table of 100 elements is 128). nTableMask is initialized to the length of the hash table (after rounding) minus 1. The specific code is in the _zend_hash_init function of zend/Zend_hash.c. Here, the parts related to this article are intercepted and added with a few comments.
ZEND_API int_zend_hash_init(HashTable *ht, uintnSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC) { uinti = 3; Bucket **tmp; SET_INCONSISTENT(HT_OK); //长度向2的整数次幂圆整 if(nSize >= 0x80000000) { /* prevent overflow */ ht->nTableSize = 0x80000000; } else{ while((1U << i) < nSize) { i++; } ht->nTableSize = 1<< i; } ht->nTableMask = ht->nTableSize - 1; /*此处省略若干代码…*/ returnSUCCESS; }
It is worth mentioning that PHP’s method of rounding to integer powers of 2 is very clever and can be memorized and used when needed.
Zend HashTable’s hash algorithm is extremely simple:
Copy code The code is as follows:
hash(key)=key&nTableMask
That is, simply perform a bitwise AND between the original key of the data and the nTableMask of the HashTable.
If the original key is a string, first use the Times33 algorithm to convert the string into an integer and then bitwise AND it with nTableMask.
复制代码 代码如下:
hash(strkey)=time33(strkey)&nTableMask
下面是Zend源码中查找哈希表的代码:
ZEND_API int zend_hash_index_find(constHashTable *ht, ulong h, void **pData) { uint nIndex; Bucket *p; IS_CONSISTENT(ht); nIndex = h & ht->nTableMask; p = ht->arBuckets[nIndex]; while(p != NULL) { if((p->h == h) && (p->nKeyLength == 0)) { *pData = p->pData; returnSUCCESS; } p = p->pNext; } returnFAILURE; } ZEND_API int zend_hash_find(constHashTable *ht, constchar *arKey, uint nKeyLength, void **pData) { ulong h; uint nIndex; Bucket *p; IS_CONSISTENT(ht); h = zend_inline_hash_func(arKey, nKeyLength); nIndex = h & ht->nTableMask; p = ht->arBuckets[nIndex]; while(p != NULL) { if((p->h == h) && (p->nKeyLength == nKeyLength)) { if(!memcmp(p->arKey, arKey, nKeyLength)) { *pData = p->pData; returnSUCCESS; } } p = p->pNext; } returnFAILURE; }
其中zend_hash_index_find用于查找整数key的情况,zend_hash_find用于查找字符串key。逻辑基本一致,只是字符串key会通过zend_inline_hash_func转为整数key,zend_inline_hash_func封装了times33算法,具体代码就不贴出了。
攻击 基本攻击
知道了PHP内部哈希表的算法,就可以利用其原理构造用于攻击的数据。一种最简单的方法是利用掩码规律制造碰撞。上文提到Zend HashTable的长度nTableSize会被圆整为2的整数次幂,假设我们构造一个2^16的哈希表,则nTableSize的二进制表示为:1 0000 0000 0000 0000,而nTableMask = nTableSize - 1为:0 1111 1111 1111 1111。接下来,可以以0为初始值,以2^16为步长,制造足够多的数据,可以得到如下推测:
0000 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0
0001 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0
0010 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0
0011 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0
0100 0000 0000 0000 0000 & 0 1111 1111 1111 1111 = 0
……
概况来说只要保证后16位均为0,则与掩码位于后得到的哈希值全部碰撞在位置0。
下面是利用这个原理写的一段攻击代码:
<?php $size= pow(2, 16); $startTime= microtime(true); $array= array(); for($key= 0, $maxKey= ($size- 1) * $size; $key<= $maxKey; $key+= $size) { $array[$key] = 0; } $endTime= microtime(true); echo $endTime- $startTime, " seconds";
这段代码在我的VPS上(单CPU,512M内存)上用了近88秒才完成,并且在此期间CPU资源几乎被用尽:
而普通的同样大小的哈希表插入仅用时0.036秒:
<?php $size= pow(2, 16); $startTime= microtime(true); $array= array(); for($key= 0, $maxKey= ($size- 1) * $size; $key<= $size; $key+= 1) { $array[$key] = 0; } $endTime= microtime(true); echo $endTime- $startTime, " seconds";
可以证明第二段代码插入N个元素的时间在O(N)水平,而第一段攻击代码则需O(N^2)的时间去插入N个元素。
POST攻击
当然,一般情况下很难遇到攻击者可以直接修改PHP代码的情况,但是攻击者仍可以通过一些方法间接构造哈希表来进行攻击。例如PHP会将接收到的HTTP POST请求中的数据构造为$_POST,而这是一个Array,内部就是通过Zend HashTable表示,因此攻击者只要构造一个含有大量碰撞key的post请求,就可以达到攻击的目的。具体做法不再演示。
防护 POST攻击的防护
针对POST方式的哈希碰撞攻击,目前PHP的防护措施是控制POST数据的数量。在>=PHP5.3.9的版本中增加了一个配置项max_input_vars,用于标识一次http请求最大接收的参数个数,默认为1000。因此PHP5.3.x的用户可以通过升级至5.3.9来避免哈希碰撞攻击。5.2.x的用户可以使用这个patch: http://www.laruence.com/2011/12/30/2440.html 。
另外的防护方法是在Web服务器层面进行处理,例如限制http请求body的大小和参数的数量等,这个是现在用的最多的临时处理方案。具体做法与不同Web服务器相关,不再详述。
其它防护
上面的防护方法只是限制POST数据的数量,而不能彻底解决这个问题。例如,如果某个POST字段是一个json数据类型,会被PHP json_decode ,那么只要构造一个超大的json攻击数据照样可以达到攻击目的。理论上,只要PHP代码中某处构造Array的数据依赖于外部输入,则都可能造成这个问题,因此彻底的解决方案要从Zend底层HashTable的实现动手。一般来说有两种方式,一是限制每个桶链表的最长长度;二是使用其它数据结构如 红黑树 取代链表组织碰撞哈希(并不解决哈希碰撞,只是减轻攻击影响,将N个数据的操作时间从O(N^2)降至O(NlogN),代价是普通情况下接近O(1)的操作均变为O(logN))。
目前使用最多的仍然是POST数据攻击,因此建议生产环境的PHP均进行升级或打补丁。至于从数据结构层面修复这个问题,目前还没有任何方面的消息。
以上所述就是本文的全部内容,希望大家喜欢。

php把负数转为正整数的方法:1、使用abs()函数将负数转为正数,使用intval()函数对正数取整,转为正整数,语法“intval(abs($number))”;2、利用“~”位运算符将负数取反加一,语法“~$number + 1”。

实现方法:1、使用“sleep(延迟秒数)”语句,可延迟执行函数若干秒;2、使用“time_nanosleep(延迟秒数,延迟纳秒数)”语句,可延迟执行函数若干秒和纳秒;3、使用“time_sleep_until(time()+7)”语句。

php除以100保留两位小数的方法:1、利用“/”运算符进行除法运算,语法“数值 / 100”;2、使用“number_format(除法结果, 2)”或“sprintf("%.2f",除法结果)”语句进行四舍五入的处理值,并保留两位小数。

判断方法:1、使用“strtotime("年-月-日")”语句将给定的年月日转换为时间戳格式;2、用“date("z",时间戳)+1”语句计算指定时间戳是一年的第几天。date()返回的天数是从0开始计算的,因此真实天数需要在此基础上加1。

php字符串有下标。在PHP中,下标不仅可以应用于数组和对象,还可应用于字符串,利用字符串的下标和中括号“[]”可以访问指定索引位置的字符,并对该字符进行读写,语法“字符串名[下标值]”;字符串的下标值(索引值)只能是整数类型,起始值为0。

方法:1、用“str_replace(" ","其他字符",$str)”语句,可将nbsp符替换为其他字符;2、用“preg_replace("/(\s|\ \;||\xc2\xa0)/","其他字符",$str)”语句。

php判断有没有小数点的方法:1、使用“strpos(数字字符串,'.')”语法,如果返回小数点在字符串中第一次出现的位置,则有小数点;2、使用“strrpos(数字字符串,'.')”语句,如果返回小数点在字符串中最后一次出现的位置,则有。

在php中,可以使用substr()函数来读取字符串后几个字符,只需要将该函数的第二个参数设置为负值,第三个参数省略即可;语法为“substr(字符串,-n)”,表示读取从字符串结尾处向前数第n个字符开始,直到字符串结尾的全部字符。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

SublimeText3 Linux new version
SublimeText3 Linux latest version

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

Notepad++7.3.1
Easy-to-use and free code editor
