Home >Backend Development >PHP Tutorial >nginx data structure 1 - ngx_int_t and ngx_rbtree_t
Faced with 71 source files in the ./src/core subdirectory, it’s a bit hard to start. Browsing the nginx.c file containing the main function, I found that nginx uses a lot of self-encapsulated data structures. It is difficult to understand the meaning of the operations in the main function without knowing what kind of data structures these are. . So we picked a seemingly basic data structure and started studying it. Organizing all data structures in nginx is the ngx_core.h file. It first contains ngx_config.h, where we found three type definitions. 1, ngx_int_t,
ngx_uint_t, ngx_flag_t The first unfamiliar data type seen in nginx.c is
ngx_int_t, in nginx_config.h Found its definition.
typedef intptr_t ngx_int_t; typedef uintptr_t ngx_uint_t; typedef intptr_t ngx_flag_t;followed the clues and found the definitions of three data types. There is no introduction to
intptr_t/uintptr_t
in the introductory teaching of undergraduate c. I found their definitions in the stdint.h header file of c./* Types for `void *' pointers. */ #if __WORDSIZE == 64 # ifndef __intptr_t_defined typedef long int intptr_t; # define __intptr_t_defined # endif typedef unsigned long int uintptr_t; #else # ifndef __intptr_t_defined typedef int intptr_t; # define __intptr_t_defined # endif typedef unsigned int uintptr_t; #endifFirst of all, comment that these two types are pointer types of
"void *", although literally,
intptr_t and uintptr_t are indeed integer pointer types and unsigned Integer pointer type, but people are confused, why use integer as the pointer type of integer? Play it first and look at the macro at the back. The machine is 64bit word length, so intptr_t is long int, uintptr_t is unsigned long int , just me The machine is a 64 compiler. After sizeof(), it is 8 bytes64bits, which is less than 64bits long intptr_t is int, uintptr_t is unsigned int, and the table shows that 32bit compiler under int and unsigned is 4 bytes, 16 bit compiler is 2 bytes. Then intptr_t/uintptr_t should be an integer type that changes accordingly as the platform word length changes. After understanding, I found that the explanation in "In-depth Analysis of LinuxKernel Source Code" is that when the system kernel operates memory, it treats the memory as a large array, and the pointer is the array index / subscript, kernel Programmers use this special integer type to accept memory address values and operate memory more intuitively than using pointers, and are less likely to make mistakes. It seems that in nginx, we just want to use some platform-related int, unsigned int type variables.2, ngx_rbtree_t
2.1, What is a red-black tree
Retired players paddling in the competition, yes Famous data structures like red-black trees are relatively sensitive. The red-black tree is a balanced binary search tree implementation under a special constraint form. Students who have studied the data structure class should know that the earliest self-balancing binary tree in the textbook
AVLtree strictly requires that the height difference of the subtrees does not exceed 2 in order to obtain the root node to all leaf nodes Distances are essentially the same (balanced) properties. Red-black trees do not pursue strict balance, but achieve basic balance through 5 constraints:
① Nodes are red or black; ② Roots are black;
③ Leaf nodes are Black; ④The child nodes of the red node are all black; ⑤The number of black nodes in the simple path from any node to its leaf node is the same. The ratio of the longest distance to the shortest distance from the root of the AVL tree to the leaf node does not exceed 2. The constraints of the red-black tree also guarantee this characteristic (the longest path is red and black, and the shortest path is all black. In this case, the longest path is exactly twice as long as the shortest path). Since it is an implementation of a balanced binary search tree, the red-black tree is naturally internally ordered and supports O(log2n) time complexity search, insertion and deletion like the AVL tree.相比AVL树,红黑可以保证在每次插入或删除操作之后的重平衡过程中,全树拓扑结构的更新仅涉及常数个结点。尽管最坏情况下需对O(log2n)个结点重染色,但就分摊意义(平均效率)而言,仅为O(1)个。但是因为没有严格约束树的平衡特性,红黑树的左右子树高度差比AVL树要大。
2.2、ngx_rbtree.h
机会难得,我们就把nginx的源码作为素材来深入了解一下红黑树的实现。首先是结点的结构:
typedef ngx_uint_t ngx_rbtree_key_t; typedef ngx_int_t ngx_rbtree_key_int_t; typedef struct ngx_rbtree_node_s ngx_rbtree_node_t; struct ngx_rbtree_node_s { ngx_rbtree_key_t key;//平台相关的无符号整型关键字 ngx_rbtree_node_t *left;//左子结点指针 ngx_rbtree_node_t *right;//<span style="font-family:宋体;">右</span>子结点指针 ngx_rbtree_node_t *parent;//父结点指针 u_char color;//结点颜色 u_char data;//结点数据 };
然后是红黑树的结构定义:
typedef struct ngx_rbtree_s ngx_rbtree_t; //“_s”是结构体“_t”是类型 //下面是一个函数指针变量类型的定义,是红黑树插入函数的指针 //参数有树根结点、插入结点和哨兵结点的指针 typedef void (*ngx_rbtree_insert_pt) (ngx_rbtree_node_t *root, ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel); struct ngx_rbtree_s { ngx_rbtree_node_t *root; //根节点指针 ngx_rbtree_node_t *sentinel; //哨兵结点指针 ngx_rbtree_insert_pt insert; //插入函数指针 };将函数指针变量作为结构体成员变量以达成可以把结构体当做类来使用(既有成员变量又有成员方法)的效果,这种手法在nginx的源码中相当普遍。关于函数,nginx还有一种更神奇的手段——宏:
#define ngx_rbtree_init(tree, s, i) \ ngx_rbtree_sentinel_init(s); \ (tree)->root = s; \ (tree)->sentinel = s; \ (tree)->insert = i//这里insert函数指针的赋值实现了多态
借助宏来达成内联函数的效果(函数实现如果比较简单,就干脆把实现过程整个搬到类中),令人费解的是,C不是没有内联关键字,甚至同一个头文件中就有一个内联函数的定义。研究内联函数之前,下面还有几个宏要看一看:
#define ngx_rbt_red(node) ((node)->color = 1) #define ngx_rbt_black(node) ((node)->color = 0) #define ngx_rbt_is_red(node) ((node)->color) #define ngx_rbt_is_black(node) (!ngx_rbt_is_red(node)) #define ngx_rbt_copy_color(n1, n2) (n1->color = n2->color) /* a sentinel must be black */ #define ngx_rbtree_sentinel_init(node) ngx_rbt_black(node)
nginx源码中的变量都很容易看懂以至于我们不怎么需要查资料或找注释。color置1染红置0染黑,color为1则结点为红色,不为红色的则为黑色,复制结点颜色即复制color值,哨兵结点一定要染成黑色。
static ngx_inline ngx_rbtree_node_t * ngx_rbtree_min(ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel) { while (node->left != sentinel) { node = node->left; } return node; }
ngx_inline是一个宏,实际值就是关键字inline。这个内联函数非常好懂,目的看起来是寻找以任意结点为根结点的子树中结点值最小的结点。实现方法是找到红黑树子树最边缘的左子结点。那么我们有理由猜测,哨兵结点是空结点或边缘标识。
2.3、红黑树的结点插入
接下来我们来深入ngx_rbtree.c看看nginx如何实现几个关键的红黑树方法。
void ngx_rbtree_insert(ngx_rbtree_t *tree, ngx_rbtree_node_t *node) { //根结点指针的指针,或者根结点指针数组,会有多个根结点吗,令人费解 //临时结点指针 //哨兵结点指针,推测哨兵在每次查询时可能都不一样,也许指待插位置 //变量不分行,我写注释都很不方便 ngx_rbtree_node_t **root, *temp, *sentinel; /* a binary tree insert */ root = (ngx_rbtree_node_t **) &tree->root;//树根指针的指针赋给了root sentinel = tree->sentinel;//哨兵指针赋给了哨兵指针 if (*root == sentinel) {//特判,如果根是哨兵,即树是空的 node->parent = NULL;//新插入的结点变成了根 node->left = sentinel;//新结点的左子结点是哨兵 node->right = sentinel;//新结点的右子结点也是哨兵 ngx_rbt_black(node);//新根染黑 *root = node;//确认新结点为新根 return;//插入结束 } //树初始化时给了insert指针一个函数地址 //查看前面的宏ngx_rbtree_init(tree, s, i) //发现只是把指定结点染黑,同时赋为根和哨兵,给insert指针指定一个函数 //ngx_rbtree.c中有两个参数表符合的可选函数:插入值、插入计时器值 //稍后来看两种插入分别如何实现又有什么区别 tree->insert(*root, node, sentinel); /* re-balance tree */ //如果新结点不是根且其父结点是红的,循环 while (node != *root && ngx_rbt_is_red(node->parent)) { //如果父结点是左子结点,获得父结点的右兄弟 if (node->parent == node->parent->parent->left) { temp = node->parent->parent->right; //如果父结点的右兄弟是红的 if (ngx_rbt_is_red(temp)) { ngx_rbt_black(node->parent);//父结点染黑 ngx_rbt_black(temp);//父结点的右兄弟染黑 ngx_rbt_red(node->parent->parent);//父结点的父结点染红 node = node->parent->parent;//父结点的父结点成为当前结点 } else {//如果父结点的右兄弟是黑的 if (node == node->parent->right) {//如果新结点是右子结点 node = node->parent;//父结点成为新node ngx_rbtree_left_rotate(root, sentinel, node);//node左旋 } ngx_rbt_black(node->parent);//node的父结点染黑 //node的父结点的父结点染红 ngx_rbt_red(node->parent->parent); ngx_rbtree_right_rotate(root, sentinel, node->parent->parent);//node的父结点的父结点右旋 } } else {//如果父结点是右子结点,获得父结点的左兄弟 temp = node->parent->parent->left; //如果父结点的左兄弟是红的 if (ngx_rbt_is_red(temp)) { ngx_rbt_black(node->parent);//父结点染黑 ngx_rbt_black(temp);//父结点的左兄弟染黑 ngx_rbt_red(node->parent->parent);//父结点的父结点染红 node = node->parent->parent; } else {//如果父结点的左兄弟是黑的 if (node == node->parent->left) {//如果新结点是左子结点 node = node->parent;//父结点成为当前结点 ngx_rbtree_right_rotate(root, sentinel, node); //当前结点右旋 } ngx_rbt_black(node->parent);//当前结点染黑 //当前结点父结点的父结点染红 ngx_rbt_red(node->parent->parent); ngx_rbtree_left_rotate(root, sentinel, node->parent->parent);//当前结点的父结点的父结点左旋 } } } ngx_rbt_black(*root);//根结点染黑 }
然后是对应ngx_rbtree_insert_pt指针的基础的结点插入函数:
void ngx_rbtree_insert_value(ngx_rbtree_node_t *temp, ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel) { ngx_rbtree_node_t **p;//虽然无关紧要,但两层指针令人费解 for ( ;; ) {//无条件循环或者说死循环,等同于while(1)但节省了一个字符 p = (node->key < temp->key) ? &temp->left : &temp->right; if (*p == sentinel) {//在二叉树中查找新结点合适的叶结点位置 break; } temp = *p; } //令新结点占据合适的哨兵位置成为新的叶结点,染红,产生新哨兵 *p = node; node->parent = temp; node->left = sentinel; node->right = sentinel; ngx_rbt_red(node); }
ngx_rbtree_insert_timer_value函数跟ngx_rbtree_insert_value函数唯一区别就是判断大小时,采用了两个值相减,避免溢出。
以上是插入结点涉及的函数,老实说我不太喜欢这么长的函数实现,换我自己写肯定分块了。分支操作太多,看代码逻辑已经乱了,我们需要画几个图。首先,如果树为空:
如果树中只有一个根结点:
如果C>A:
如果C,C染红,B染黑A染红,A右旋。右旋函数如下:
static ngx_inline void ngx_rbtree_right_rotate(ngx_rbtree_node_t **root, ngx_rbtree_node_t *sentinel, ngx_rbtree_node_t *node) { ngx_rbtree_node_t *temp; temp = node->left; node->left = temp->right;//左子结点指向原左子结点的右结点 if (temp->right != sentinel) {//如果左子结点的右结点不为哨兵 temp->right->parent = node;//左子结点的右子结点挂在右旋结点上 } temp->parent = node->parent;//左子结点挂在右旋结点的父结点上 if (node == *root) {//如果右旋结点为根节点 *root = temp;//根节点赋为左子结点 } else if (node == node->parent->right) {//如果右旋结点为右子结点 node->parent->right = temp;//左子结点挂父结点右边 } else {//否则左子结点挂父结点左边 node->parent->left = temp; } temp->right = node;//右旋结点挂左子结点右边 node->parent = temp; } <p> 显然<span>B</span><span>将成为新的根,左</span><span>C</span><span>右</span><span>A</span><span>:</span></p> <img src="http://image.codes51.com/Article/image/20150806/20150806084733_7641.png" alt=""><br><p> 如果<span>B<C<A</span><span>,会先做一次左旋再做一次右旋,其实除开染色过程,我觉得这跟</span><span>AVL</span><span>树的插入过程没有什么区别:</span></p><img src="http://image.codes51.com/Article/image/20150806/20150806084733_8423.png" alt=""><p> 其他的插入情景要么与以上几个对称,要么发生在树的其他子树中,实际过程完全一样。LL<span>型右旋,</span><span>RR</span><span>型左旋,</span><span>LR</span><span>型先右旋后左旋,</span><span>RL</span><span>型先左旋后右旋。</span>与<span>AVL</span><span>树不同的是,插入结点时红黑树左旋或右旋的判定条件明确为附近一两个结点的颜色,其他过程没有任何区别。</span></p><p>2.4、红黑树的结点删除</p><p> 据说红黑树和<span>AVL</span><span>树的区别主要体现在删除节点时,我们就来看一看。</span>我刚说什么来着,删除结点的函数体更长了,足足<span>165</span><span>行,我决定分段研究,</span>先看第一部分:</p><pre name="code">if (node->left == sentinel) {//如果左子结点是哨兵或左右子结点都是哨兵 temp = node->right;//获得右子结点,后面让它接替node位置 subst = node;//node赋给subst } else if (node->right == sentinel) {//如果右子结点是哨兵 temp = node->left;//获得左子结点,后面让它接替node位置 subst = node;//node赋给subst } else {//如果左右子结点都不是哨兵 subst = ngx_rbtree_min(node->right, sentinel);//获得右子树中最小的结点 if (subst->left != sentinel) {//如果右子树的最小结点的左子结点不是哨兵 temp = subst->left;//获得右子树的最小结点的左子结点 } else {//否则获得右子树最小结点的右子结点 temp = subst->right; }//看起来subst将被从原位置删掉然后接替node的位置 } <p> 下面我们来看看<span>temp</span><span>和</span><span>subst</span><span>要干什么用:</span></p> <pre name="code">if (subst == *root) {//如果subst是根 *root = temp;//temp接替根 ngx_rbt_black(temp);//染黑temp /* DEBUG stuff */ node->left = NULL;//清空了待删结点 node->right = NULL; node->parent = NULL; node->key = 0; return; } red = ngx_rbt_is_red(subst);//获得subst是否是红色 if (subst == subst->parent->left) {//如果subst是左子结点 subst->parent->left = temp;//把接替结点挂到subst位置 } else {//如果subst是右子结点 subst->parent->right = temp;//把接替结点挂到subst位置 }下一段:
if (subst == node) {//如果subst是待删结点 temp->parent = subst->parent;//接替结点直接接替,删除完成 } else {//如果subst不是待删结点 if (subst->parent == node) {//如果subst的父结点就是待删结点 temp->parent = subst;//接替结点挂在subst上 } else {//如果待删结点比subst的父结点更高 temp->parent = subst->parent;//把接替结点挂在subst的父结点上 } //subst接替待删结点node的位置,复制待删结点跟周围结点的关系 subst->left = node->left; subst->right = node->right; subst->parent = node->parent; ngx_rbt_copy_color(subst, node);//复制颜色 if (node == *root) {//如果待删结点是根 *root = subst;//subst接替根 } else {//如果待删结点不是根,subst接替它 if (node == node->parent->left) { node->parent->left = subst; } else { node->parent->right = subst; } } if (subst->left != sentinel) {//如果subst左子结点不是哨兵 subst->left->parent = subst;//subst的左子结点放弃node,挂上来 } if (subst->right != sentinel) {//如果subst右子结点不是哨兵 subst->right->parent = subst;//subst右子结点放弃node,挂上来 } } //清空待删结点node /* DEBUG stuff */ node->left = NULL; node->right = NULL; node->parent = NULL; node->key = 0; //如果subst是红色,红黑树约束依然被遵守,删除工作就可以结束了 if (red) { return; }
看起来结点的删除过程已经顺利完成了,但是如果subst是黑色,我们需要修复红黑树的约束。下面这一段代码的主角是接替subst位置的temp结点:
//当subst的接替结点不是根且为黑色,循环 while (temp != *root && ngx_rbt_is_black(temp)) { if (temp == temp->parent->left) {//如果temp是左子结点 w = temp->parent->right;//获得其右兄弟 if (ngx_rbt_is_red(w)) {//如果temp的右兄弟是红色 ngx_rbt_black(w);//染黑temp的右兄弟 ngx_rbt_red(temp->parent);//染红temp的父结点 //temp的父结点左旋 ngx_rbtree_left_rotate(root, sentinel, temp->parent); w = temp->parent->right;//获得temp的新右兄弟 } //如果temp右兄弟的左右子结点都是黑的 if (ngx_rbt_is_black(w->left) && ngx_rbt_is_black(w->right)) { ngx_rbt_red(w);//染红temp的右兄弟 temp = temp->parent;//获得temp的父结点为新temp } else {//如果temp右兄弟的子结点不全为黑 if (ngx_rbt_is_black(w->right)) {//如果其右子结点是黑色 ngx_rbt_black(w->left);//染黑左子结点 ngx_rbt_red(w);//染红temp的右兄弟 ngx_rbtree_right_rotate(root, sentinel, w);//右兄弟右旋 w = temp->parent->right;//获得temp的新右兄弟 } //temp右兄弟复制temp父结点颜色 ngx_rbt_copy_color(w, temp->parent); ngx_rbt_black(temp->parent);//染黑temp父结点 ngx_rbt_black(w->right);//染黑temp右兄弟的右子结点 //temp父结点左旋 ngx_rbtree_left_rotate(root, sentinel, temp->parent); temp = *root;//获得根 } } else {//如果temp是右子结点,做对称的事 w = temp->parent->left; if (ngx_rbt_is_red(w)) { ngx_rbt_black(w); ngx_rbt_red(temp->parent); ngx_rbtree_right_rotate(root, sentinel, temp->parent); w = temp->parent->left; } if (ngx_rbt_is_black(w->left) && ngx_rbt_is_black(w->right)) { ngx_rbt_red(w); temp = temp->parent; } else { if (ngx_rbt_is_black(w->left)) { ngx_rbt_black(w->right); ngx_rbt_red(w); ngx_rbtree_left_rotate(root, sentinel, w); w = temp->parent->left; } ngx_rbt_copy_color(w, temp->parent); ngx_rbt_black(temp->parent); ngx_rbt_black(w->left); ngx_rbtree_right_rotate(root, sentinel, temp->parent); temp = *root; } } } ngx_rbt_black(temp);//染黑当前temp
跟插入结点时一样乱,我们梳理一下。
首先忽略红黑树的约束进行删除:
①如果删除的是一个叶结点,即没有后继或后继全为哨兵的结点,直接删除即可;
②如果只有一个后继,让其替换待删除结点即可;
③如果有两个后继,需要从树的边缘选择一个结点,有两种等价的选择,待删结点左子树的最大结点和右子树的最小结点,nginx选择的是后者,以这个结点的键与值(key与value/data)替换待删结点的键与值,然后删除这个替身。
不论是①、②情景中的待删结点还是③情景中替身,在源码中都是subst。下面要围绕着它来进行讨论。
以上是不考虑红黑树平衡性的纯拓扑结构变动。下面要考虑是否调整树的拓扑结构使树重新平衡,是否调整结点的颜色使树重新符合红黑树的约束条件。我们知道红黑树有一条关键约束是任意结点到其子树中叶结点的简单路径中黑色结点数相同。那么如果subst是一个红色结点,我们不需要对红黑树做任何调整,它仍是一棵红黑树;如果subst是黑色的,所有经过subst的简单路径上都会少一个黑色结点数,所以需要进行调整。
下面来根据不同情景分情况讨论,因为二叉树的情景左右颠倒时调整方式也可以左右颠倒,我们只讨论subst是左子结点的情况。设刚接替subst的temp为X,X的新右兄弟为W。从经过简化的源码来看,关于结点颜色的变化很令人费解,我们不妨先来看一看:
①W为红色:将W染黑,将X与W的父结点X->parent染红,X->parent左旋,W重设为X的新右兄弟,然后转入情景①、②或③;
②W为黑色,W两个后继都是黑色:将W染红,X重设为X->parent;
③W为黑色,W右子结点为黑色:将W左子结点染黑,将W染红,W右旋,W重设为X的新右兄弟,然后将X->parent的颜色赋给W,将X->parent染黑,X->parent左旋,根赋给temp;
④W为黑色,W右子结点为红色:将W左子结点染黑,将W染红,W右旋,W重设为X的新右兄弟,然后将X->parent的颜色赋给W,将X->parent染黑,将W右子结点染黑,X->parent左旋,根赋给temp。
最后还要把temp染黑。我们可以看到情景①中进行了一次左旋,情景②只进行了染色,情景③、④都进行了一次右旋和一次左旋。情景①处理结束时一定还要转入别的情景,情景②、③、④的出现则标志着本次调整的结束。那么,红黑树删除结点后的调整过程中,依情景①循环出现的次数,调整过程中旋转的最多见的次数将是1次、2次、3次,再往上次数越多越罕见(依情景①循环出现的次数),最多旋转次数将可能到达树高即log2n次。生产环境中,删除结点后平均每次调整中旋转的次数就像分析源码之前提到的,将是常数规模的。
Next, I plan to rewrite the red-black tree in a step-by-step renovation version to gain a more detailed and intuitive understanding of the data structure of the red-black tree. Before rewriting, we need to understand that all leaf nodes in nginx's red-black are sentinels, which achieves an optimization of the red-black tree when adjusting the red-black tree. By adding a layer of all-black sub-nodes, red nodes are allowed to appear in the sub-trees that actually have values in the red-black tree. Although I have not proved it, this constant scale increases the number of rotations when deleting nodes, and also promotes the probability of adjustment when inserting new nodes (increasing the probability of inserting new nodes under red nodes), and also increases the probability of inserting new nodes under red nodes. the number of rotations. Rotation will compress the height of the red-black tree subtree and improve query efficiency.
In the process of rewriting the red-black tree from simplicity to refinement, I will consider using nginx to optimize the red-black tree from least to most, or add my own optimization.
Copyright Statement: This article is an original article by the blogger and may not be reproduced without the blogger's permission.
The above introduces the data structure 1 of nginx - ngx_int_t and ngx_rbtree_t, including the relevant content. I hope it will be helpful to friends who are interested in PHP tutorials.