Database Systems
( 資料庫系統 )
November 1, 2004
Lecture #8
Announcement
• Next week reading: Chapter 11 Hash-based Index • Assignment #5 is due 11/4 (Thur).
• Assignment #6 is available on the course homepage tom orrow (due 11/24)
Cool Ubicomp Project
Information Art (Georgia Tech)
• Pleasant paintings that can give “information” (from the Internet)
• You can customize information & their presentation.
(1) Today’s weather forecast (2) Tomorrow weather forecast (3) Temperature
(5) Stock market index (6) Traffic
Tree-Structured Indexing
Outline
• Motivation for tree-structured indexes • ISAM index
• B+ tree index
• Key compression • B+ tree bulk-loading • Clustered index
Review: Three Alternatives for Data
Entries
• As for any index, 3 alternatives for data entries k*:
(1) Clustered Index: Data record with key value k
(2) Unclustered Index: <k, rid of data record with search key value
k>
(3) Unclustered Index: <k, list of rids of data records with search key k>, useful when search key is not unique (not a candidate key).
• Choice of data entries is independent to the indexing
technique used to locate data entries k*.
– Two general indexing techniques: hash-structured indexing or tree-structured indexing
Tree vs. Hash-Structured Indexing
• Tree-structured indexing supports both range searches
and equality searches efficiently.
– Why efficient range searches?
• Data entries (on the leaf nodes of the tree) are sorted.
• Perform equality search on the first qualifying data entry + scan to find the rests.
• Data records also need to be sorted by search key in case that the range searches access record fields other than the search key.
• Hash-structured indexing supports equality search very efficiently, but not range searches.
– Why inefficient range searches?
Range Searches
• ``Find all students with gpa > 3.0’’
– If data is in sorted file, do binary search to find first such student, then
scan to find others.
– Cost of binary search over data file can still be quite high (proportional
to the number of page I/Os)
• Simple solution: create a smaller index file.
– Cost of binary search over the index file is reduced.
k2 kN
Motivation for Tree-Structure Index
• But, the index file can still be large.
– The cost of binary search over the index file can still be large. Can we reduce the cost further?
• Apply the simple solution again: create multiple levels of indexes.
– Each index level is much smaller than the lower index level. This index structure is a tree.
– A tree node is an index page that can hold, e.g.,100 indexes. – A tree with a depth of 4 (from the root index page to the leaf
index page) can hold over 100,000,000 records. – The cost of search is 3~4 page access.
ISAM and B+ Tree
• Two tree-structured indexings:
– ISAM (Indexed Sequential Access Method): static structure.
• Assuming that the file does not grow or shrink too much.
– B+ tree: dynamic structure
• Tree structure adjusts gracefully under inserts and deletes. • B+ tree is a variant of B tree.
• Analyze cost of the following operations:
– Search
– Insertion of data entries – Deletion of data entries – Concurrent access.
12
ISAM
Leaf pages contain data entries.
P0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf Pages Pages Overflow page Primary pages Leaf Index Pages
Example
Root
10 0 12 0 15 0 18 0 30 3 5 11 30 35 100 101 110 120 013 150 156 179 180 20057 81 95
to keys to keys to keys to keys
< 57 57 k<81 81
k<95 k>=95Leaf node
57 81 95 To r ec or d w ith k ey 5 7 To r ec or d w ith k ey 8 1 To r ec or d w ith k ey 8 5Comments on ISAM
• File creation:
– Assume that data records are available and will not change much in the future.
– Sort data records. Allocate data pages for the sorted data records.
– Sort data entries based on the search keys.
Allocate leaf index pages for sorted data entries sequentially.
ISAM Operations
• Search: Start at root; use key comparisons to go to leaf.
– Cost = log F N + #overflow pages
– F = # entries/index page and N = # leaf pages
• Insert: Find the leaf page and put it there. If the leaf
page is full, put it in the overflow page.
– Cost = search cost + constant (assuming little or no overflow pages)
• Delete: Find and remove from the leaf page; if empty
overflow page, de-allocate.
– Cost = search cost + constant (assuming little or no overflow pages)
Example ISAM Tree
• Each node can hold 2 entries; no need for `next-leaf-page’ pointers in primary pages.
– Why? Primary pages are allocated sequentially at file creation time.
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
20 33 51 63
40
After Inserting 23*, 48*, 41*, 42*
...
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root 23* 48* 41* Overflow Pages Leaf Index Pages Pages Primary... Then Deleting 42*, 51
*, 97*
Note that 51* appears in index levels, but not in leaf!
10* 15* 20* 27* 33* 37* 40* 46* 55* 63*
20 33 51 63
40
Root
Properties of ISAM Tree
• Insertions and deletions affect only the leaf pages, not the non-leaf pages
– index in the tree is static.
• Static index tree has both advantages & disadvantages.
– Advantage: No locking and waiting on index pages for concurrent access.
– Disadvantage: when a file grows, it creates large overflow chains, leading to poor performance.
• ISAM tree is good when data does not change much.
– To accommodate some insertions, can leave the primarily pages 20% empty.
22
B+ Tree
• It is similar to ISAM tree-structure, except:
– It has no overflow chains (this is the cause of poor performance in ISAM).
• When an insertion goes to a leaf page becomes full, a new leaf page is created.
– Leaf pages are not allocated sequentially. Leaf pages are
sorted and organized into doubly-linked list.
– Index pages can grow and shrink with size of data file.
• What is the difference between B+ tree and B tree?
Index Entries
Data Entries (Direct search)
Properties of B+ Tree
• Keep tree height-balanced.
– Balance means that
distance from root to all leaf nodes are the same .
• Minimum 50% occupancy
(except for root)
– Each index page node must contain d <= m <= 2d
entries.
– The parameter m is the number of occupied entries. – The parameter d is called
the order of the tree (or ½
• What is the value of m? • What is the value of d?
More Properties of B+ Tree
• Cost of search, insert, and delete (disk page I/O
s):
– O (height of the tree) = O(log d+1 N) (N = # leaf page
s)
• Supports equality and range-searches efficiently.
• B+ tree is the most widely used index in DBMS.
Example B+ Tree
• Search begins at root, and key comparisons direct it to a leaf (as in ISAM).
• Search for 5*, 15*, all data entries >= 24* ...
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
• Typical capacities:
– Height 4: 1334 = 312,900,700 records
– Height 3: 1333 = 2,352,637 records
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes – Level 2 = 133 pages = 1 Mbyte – Level 3 = 17,689 pages = 133 Mbytes
Inserting a Data Entry into a B+
Tree
• Find correct leaf L. • Put data entry into L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key. • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
– To split index node, redistribute entries evenly, but push up
middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
Inserting 8*
• Observe how
minimum occupancy is guaranteed in both leaf and index pg
splits.
• Note difference between copy-up
and push-up; be sure
you understand the reasons for this.
2* 3* 5* 7* 8*
5
Entry to be inserted in parent node. (Note that 5 is
continues to appear in the leaf.)s copied up and
appears once in the index. Contrast
5 24 30
17
13
Entry to be inserted in parent node. (Note that 17 is pushed up and only this with a leaf split.)
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13
Example B+ Tree After Inserting
8*
Notice that root was split, leading to increase in height.
In this example, we can avoid split by re-distrib
2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8*
Redistribution after Inserting 8*
Root 17 24 30 2* 3* 5* 7* 8* 14* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 8 Root 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 16*Check sibling leaf node to see if it has space. Copy up 8 (new low key value on the 2nd
Deleting a Data Entry from a B+
Tree
• Start at root, find leaf L where entry belongs. • Remove the entry.
– If L is at least half-full, done! – If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling (adjacent node with
same parent as L).
• If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
Tree After Deleting 19* and 20* ...
• Deleting 19* is easy.
• Deleting 20* is done with re-distribution. Notice how middle key is copied up.
39* 2* 3* 17 30 14* 16* 33* 34* 38* 13 5 7* 5* 8* 22* 24* 27 27* 29* 2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8*
• And then deleting 24* • Must merge.
• Observe `toss’ of index entry (27), and `pull down’ of index entry (17). 30 22* 27* 29* 33* 34* 38* 39* Root 30 13 5 17 2* 3* 17 30 14* 16* 33* 34* 38* 13 5 7* 5* 8* 22* 24* 27 27* 29* toss Pull down 39*
Example of Non-leaf
Re-distribution
• Tree is shown below during deletion of 24*. (What could be a possible initial tree?)
• In contrast to previous example, can re-distribute entry from left child of root to right child.
Root 13 5 17 20 22 30 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 7* 5* 8* 3* 2*
After Re-distribution
• Intuitively, entries are re-distributed by `pushing through’ the splitting entry in the parent node.
• It suffices to distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.
Root 13 5 17 30 20 22
Prefix Key Compression
• Important to increase fan-out. (Why?)
• Key values in index entries only `direct traffic’; can often compress them.
– Compress “David Smith” to “Dav”? How about “Davi”?
– In general, while compressing, must leave each index entry greater than every key value (in any subtree) to its left.
• Insert/delete must be suitably modified.
Daniel Lee David Smith Devarakonda …
Bulk Loading of a B+ Tree
• If we have a large collection of records, and we want to create a B+ tree index on a field, doing so by repeatedly inserting records is very slow.
– Cost = # entries * logF(N), where F = fan-out, N = # index pages
• Bulk Loading can be done much more efficiently.
– Step 1: Sort data entries. Insert pointer to first (leaf) page in a new (root) page.
Sorted pages of data entries; not yet in B+ tree Root
Bulk Loading (Contd.)
• Step 2: Build Index entries for leaf
pages.
– Always entered into right-most index
page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.)
– Cost = # index pages, which is much faster than repeated inserts.
3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35*36* 38*41* 44*
Root
Data entry pages not yet in B+ tree
35 23 12 6 10 20 3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35*36* 38*41* 44* 6 Root 10 12 23 20 35 38
not yet in B+ tree Data entry pages
Summary of Bulk Loading
• Option 1: multiple inserts.
– More I/Os during build.
– Does not give sequential storage of leaves.
• Option 2: Bulk Loading
– Fewer I/Os during build.
– Leaves will be stored sequentially (and linked, of course). – Can control “fill factor” on pages.
A Note on `Order’
• Order (the parameter d) concept denote minimum occupancy on the number of entries per index page.
– But it is not practical in real implementation. Why?
– Index pages can typically hold many more entries than leaf pages. – Variable sized records and search keys mean different nodes will
contain different numbers of entries.
– Even with fixed length fields, multiple records with the same
search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).
• Order is replaced by physical space criterion (`at least
Summary
• Tree-structured indexes are ideal for range-searches, also good for equality searches.
• ISAM is a static structure.
– Only leaf pages modified; overflow pages needed.
– Overflow chains can degrade performance unless size of data set
and data distribution stay constant.
• B+ tree is a dynamic structure.
– Inserts/deletes leave tree height-balanced; log F N cost. – High fanout (F) means depth rarely more than 3 or 4. – Almost always better than maintaining a sorted file.
Summary (Contd.)
– Typically, 67% occupancy on average.
– Usually preferable to ISAM, modulo locking considerations;
adjusts to growth gracefully.
• Key compression increases fanout, reduces height.
• Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.