
Self-Balancing Binary Search Trees

From the document Linux Kernel Development (pages 132-135)

The depth of a node is measured by how many parent nodes it is from the root. Nodes at the “bottom” of the tree, those with no children, are called leaves. The height of a tree is the depth of the deepest node in the tree. A balanced binary search tree is a binary search tree in which the depth of all leaves differs by at most one (see Figure 6.8). A self-balancing binary search tree is a binary search tree that attempts, as part of its normal operations, to remain (semi) balanced.
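The definitions above can be checked mechanically. The following is a minimal userspace sketch, not kernel code; struct node and the helper names are hypothetical and exist only for illustration. It records the depth of every leaf, then derives the height and applies the "leaf depths differ by at most one" test literally.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not a kernel structure. */
struct node {
    int key;
    struct node *left, *right;
};

/* Track the smallest and largest depth at which a leaf (no children) occurs. */
static void leaf_depths(const struct node *n, int depth, int *min, int *max)
{
    assert(min && max);
    if (!n)
        return;
    if (!n->left && !n->right) {
        if (depth < *min)
            *min = depth;
        if (depth > *max)
            *max = depth;
        return;
    }
    leaf_depths(n->left, depth + 1, min, max);
    leaf_depths(n->right, depth + 1, min, max);
}

/* Height: depth of the deepest node, which is always a leaf. -1 if empty. */
static int tree_height(const struct node *root)
{
    int min = INT_MAX, max = -1;

    leaf_depths(root, 0, &min, &max);
    return max;
}

/* Balanced, per the definition above: all leaf depths differ by at most one. */
static bool is_balanced(const struct node *root)
{
    int min = INT_MAX, max = -1;

    leaf_depths(root, 0, &min, &max);
    return max < 0 || max - min <= 1;
}
```

Note that this follows the wording of the definition exactly, so a degenerate chain with a single leaf passes the test. Stricter formulations compare subtree heights at every node instead.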

Red-Black Trees

A red-black tree is a type of self-balancing binary search tree. Linux’s primary binary tree data structure is the red-black tree. Each node in a red-black tree carries a color attribute, which is either red or black. Red-black trees remain semi-balanced by enforcing that the following six properties remain true:

1. All nodes are either red or black.

2. Leaf nodes are black.

3. Leaf nodes do not contain data.

4. All non-leaf nodes have two children.

5. If a node is red, both of its children are black.

6. The path from a node to one of its leaves contains the same number of black nodes as the shortest path to any of its other leaves.

Taken together, these properties ensure that the deepest leaf has a depth of no more than double that of the shallowest leaf. Consequently, the tree is always semi-balanced.

Why this is true is surprisingly simple. First, by property five, a red node cannot be the child or parent of another red node. By property six, all paths through the tree to its leaves have the same number of black nodes. The longest possible path through the tree therefore alternates red and black nodes, while the shortest possible path, which must have the same number of black nodes, contains only black nodes. Therefore, the longest path from the root to a leaf is no more than double the shortest path from the root to any other leaf.
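The argument can be made concrete with a small checker. This is a userspace sketch under stated assumptions, not the kernel's rbtree code: struct rbnode and all function names are hypothetical, and a NULL pointer stands in for the black, data-less leaves of properties two and three. It verifies properties five and six, then compares the shortest and longest root-to-leaf paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum color { RED, BLACK };

/* Hypothetical node type; NULL children represent black, data-less leaves. */
struct rbnode {
    enum color color;
    struct rbnode *left, *right;
};

/*
 * Number of black nodes on any path from n (inclusive) down to a leaf
 * (inclusive), or -1 if property five or six is violated at or below n.
 */
static int black_height(const struct rbnode *n)
{
    int lh, rh;

    if (!n)
        return 1;                       /* a leaf is black (property two) */
    if (n->color == RED &&
        ((n->left && n->left->color == RED) ||
         (n->right && n->right->color == RED)))
        return -1;                      /* red node with a red child */
    lh = black_height(n->left);
    rh = black_height(n->right);
    if (lh < 0 || rh < 0 || lh != rh)
        return -1;                      /* black counts differ */
    return lh + (n->color == BLACK ? 1 : 0);
}

/* Number of edges from n to its nearest leaf. */
static int shortest_path(const struct rbnode *n)
{
    int l, r;

    if (!n)
        return 0;
    l = shortest_path(n->left);
    r = shortest_path(n->right);
    return 1 + (l < r ? l : r);
}

/* Number of edges from n to its farthest leaf. */
static int longest_path(const struct rbnode *n)
{
    int l, r;

    if (!n)
        return 0;
    l = longest_path(n->left);
    r = longest_path(n->right);
    return 1 + (l > r ? l : r);
}

/* For a valid red-black tree, the deepest leaf is at most twice as deep. */
static bool depth_bound_holds(const struct rbnode *root)
{
    assert(black_height(root) >= 0);    /* only meaningful for valid trees */
    return longest_path(root) <= 2 * shortest_path(root);
}
```

A black root with a leaf on the left and a red node on the right hits the bound exactly: the shortest path has one edge and the longest has two.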

Figure 6.8 A balanced binary search tree (root 11; the remaining nodes shown are 7, 18, 3, 9, 16, 8, 42, and 10).

If the insertion and removal operations enforce these six properties, the tree remains semi-balanced. Now, it might seem odd to require insert and remove to maintain these particular properties. Why not implement the operations such that they enforce other, simpler rules that result in a balanced tree? It turns out that these properties are relatively easy to enforce (although complex to implement), allowing insert and remove to guarantee a semi-balanced tree without burdensome extra overhead.

Describing how insert and remove enforce these rules is beyond the scope of this book. Although the rules are simple, the implementation is complex. Any good undergraduate-level data structures textbook ought to give a full treatment.

rbtrees

The Linux implementation of red-black trees is called rbtrees. They are defined in lib/rbtree.c and declared in <linux/rbtree.h>. Aside from optimizations, Linux’s rbtrees resemble the “classic” red-black tree as described in the previous section. They remain balanced such that inserts are always logarithmic with respect to the number of nodes in the tree.

The root of an rbtree is represented by the rb_root structure. To create a new tree, we allocate a new rb_root and initialize it to the special value RB_ROOT:

struct rb_root root = RB_ROOT;

Individual nodes in an rbtree are represented by the rb_node structure. Given an rb_node, we can move to its left or right child by following the pointer of the same name off the node (rb_left or rb_right).

The rbtree implementation does not provide search and insert routines. Users of rbtrees are expected to define their own. This is because C does not make generic programming easy, and the Linux kernel developers believed the most efficient way to implement search and insert was to require each user to do so manually, using provided rbtree helper functions but their own comparison operators.

The best way to demonstrate search and insert is to show a real-world example. First, let’s look at search. The following function implements a search of Linux’s page cache for a chunk of a file (represented by an inode and offset pair). Each inode has its own rbtree, keyed off of page offsets into the file. This function thus searches the given inode’s rbtree for a matching offset value:

struct page * rb_search_page_cache(struct inode *inode,
                                   unsigned long offset)
{
        /* start at the root node of this inode's tree */
        struct rb_node *n = inode->i_rb_page_cache.rb_node;

        while (n) {
                /* recover the page that embeds this rb_node */
                struct page *page = rb_entry(n, struct page, rb_page_cache);

                if (offset < page->offset)
                        n = n->rb_left;
                else if (offset > page->offset)
                        n = n->rb_right;
                else
                        return page;    /* matching offset found */
        }
        return NULL;                    /* no page at this offset */
}

In this example, the while loop iterates over the rbtree, traversing as needed to the left or right child in the direction of the given offset. The if and else statements implement the rbtree’s comparison function, thus enforcing the tree’s ordering. If the loop finds a node with a matching offset, the search is complete, and the function returns the associated page structure. If the loop reaches the end of the rbtree without finding a match, one does not exist in the tree, and the function returns NULL.
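The same search pattern can be tried outside the kernel. The sketch below is a userspace stand-in, not the kernel API: struct tnode, struct troot, struct cached_page, and tnode_entry() are hypothetical names chosen for illustration. tnode_entry() mimics what rb_entry() does, which is to step back from an embedded node to the structure that contains it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rb_node and rb_root (no color, no parent). */
struct tnode {
    struct tnode *left, *right;
};

struct troot {
    struct tnode *node;
};

/* The tree node is embedded in the object it indexes. */
struct cached_page {
    unsigned long offset;
    struct tnode link;
};

/* Step back from an embedded member to its containing structure. */
#define tnode_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Descend left or right by comparing offsets until a match or a dead end. */
static struct cached_page *search_cache(struct troot *root,
                                        unsigned long offset)
{
    struct tnode *n;

    assert(root != NULL);
    n = root->node;
    while (n) {
        struct cached_page *page = tnode_entry(n, struct cached_page, link);

        if (offset < page->offset)
            n = n->left;
        else if (offset > page->offset)
            n = n->right;
        else
            return page;
    }
    return NULL;
}
```

Because the comparison lives in the caller's loop rather than in the tree code, each user can key the tree on whatever field suits it, which is the design choice the text describes.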

Insert is even more complicated because it implements both search and insertion logic. The following isn’t a trivial function, but if you need to implement your own insert routine, this is a good guide:

struct page * rb_insert_page_cache(struct inode *inode,
                                   unsigned long offset,
                                   struct rb_node *node)
{
        /* p tracks the child pointer that would hold the new node */
        struct rb_node **p = &inode->i_rb_page_cache.rb_node;
        struct rb_node *parent = NULL;
        struct page *page;

        while (*p) {
                parent = *p;
                page = rb_entry(parent, struct page, rb_page_cache);

                if (offset < page->offset)
                        p = &(*p)->rb_left;
                else if (offset > page->offset)
                        p = &(*p)->rb_right;
                else
                        return page;    /* offset already present */
        }

        /* link the new node in at *p, then rebalance and recolor */
        rb_link_node(node, parent, p);
        rb_insert_color(node, &inode->i_rb_page_cache);

        return NULL;
}

As with our search function, the while loop is iterating over the tree, moving in the direction of the provided offset. Unlike with search, however, the function is hoping not to find a matching offset but, instead, reach the leaf node that is the correct insertion point for the new offset. When the insertion point is found, rb_link_node() is called to insert the new node at the given spot. rb_insert_color() is then called to perform the complicated rebalancing dance. The function returns NULL if the page was added to the page cache and the address of an existing page structure if the page is already in the cache.
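The pointer-to-pointer descent is the least obvious part of the insert routine, so here is a userspace sketch of just that step. It is not the kernel API: struct tnode, struct troot, struct cached_page, tnode_entry(), and insert_cache() are hypothetical names, the node carries no parent pointer or color, and the rebalancing that rb_insert_color() performs is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rb_node and rb_root (no color, no parent). */
struct tnode {
    struct tnode *left, *right;
};

struct troot {
    struct tnode *node;
};

struct cached_page {
    unsigned long offset;
    struct tnode link;
};

/* Step back from an embedded member to its containing structure. */
#define tnode_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Returns NULL if item was linked into the tree, or the existing entry
 * if one with the same offset is already present.
 */
static struct cached_page *insert_cache(struct troot *root,
                                        struct cached_page *item)
{
    struct tnode **p;

    assert(root != NULL && item != NULL);
    p = &root->node;

    /* p always points at the slot that would receive the new node */
    while (*p) {
        struct cached_page *page = tnode_entry(*p, struct cached_page, link);

        if (item->offset < page->offset)
            p = &(*p)->left;
        else if (item->offset > page->offset)
            p = &(*p)->right;
        else
            return page;
    }

    /* Link step only; a real red-black tree would now recolor and rotate. */
    item->link.left = NULL;
    item->link.right = NULL;
    *p = &item->link;
    return NULL;
}
```

Tracking the address of the child slot, rather than the child itself, means the empty-tree case and the left and right cases all end with the same single assignment through p. Without the rebalancing step, this sketch behaves as a plain binary search tree and can degenerate into a chain.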
