Database Systems
( 資料庫系統 )
November 21, 2005
Lecture #8
Announcement
• Next week reading: Chapters 11
Hash-based Index
• Assignment #3 is available for pickup later
today.
• Assignment #4 and practicum #2 will be
available on the course homepage.
Intelligent Furniture
• Weight table +
history tablecloth
• Dietary-aware
Tree-Structured Indexing
Outline
• Motivation for tree-structured indexes
• ISAM index
• B+ tree index
• Key compression
• B+ tree bulk-loading
• Clustered index
Review: Three Alternatives for
Data Entries
• As for any index, 3 alternatives for data entries
k*:
(1) Clustered Index: Data record with key value k
(2) Unclustered Index: <k, rid of data record with search
key value k>
(3) Unclustered Index: <k, list of rids of data records with
search key k>, useful when search key is not unique
(not a candidate key).
• Choice of data entries is orthogonal to the
indexing technique used to locate data entries k*.
– Two general indexing techniques: hash-structured
indexing or
tree-structured indexing
Tree vs. Hash-Structured
Indexing
• Tree index supports both
range searches
and
equality searches
efficiently.
– Why efficient range searches?
• Data entries (on the leaf nodes of the tree) are sorted.
• Perform equality search on the first qualifying data entry + scan to find the rests.
• Data records also need to be sorted by search key in case that the range searches access record fields other than the search key.
• Hash index supports equality search efficiently,
but not range search.
– Why inefficient range searches?
Step back: Range Searches
• ``Find all students with gpa > 3.0’’
– If data is in sorted file, do binary search to find first such
student, then scan to find others.
– Cost of binary search over data file can still be quite high
(proportional to the number of page I/Os)
• Simple solution: create a smaller
index file
.
– Cost of binary search over index file is reduced.
Page 1 Page 2 Page 3 Page N Data File
k2 kN
Motivation for Tree-Structure
Index
• But, the index file can still be large.
– The cost of binary search over the index file can still be large.
– Can we further reduce search cost?
• Apply the simple solution again: create multiple
levels of indexes.
– Each index level is much smaller than the lower index level. This index structure is a tree.
– A tree node is an index page that can hold, e.g.,100 indexes.
– A tree with a depth of 4 (from the root index page to the leaf index page) can hold over 100,000,000 records.
ISAM and B+ Tree
• Two tree-structured indexings:
– ISAM
(Indexed Sequential Access Method):
static
structure.
• Assuming that the file does not grow or shrink too much.
– B+ tree: dynamic structure
• Tree structure adjusts gracefully under inserts and deletes.
• Analyze cost of the following operations:
– Search
– Insertion of data entries
– Deletion of data entries
– Concurrent access.
11
ISAM
Leaf pages contain data entries
.
P0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf Pages Pages Overflow page Primary pages Leaf Index Pages
Example
10
0
12
0
15
0
18
0
30
3 5 11
30 35
10
0
10
1
11
0
12
0
0
13
15
0
15
6
17
9
18
0
20
0
57
81
95
to keys
to keys
to keys
to keys
< 57
57 k<81
81
k<95
k>=95
Leaf node
57
81
95
To
r
e
co
rd
w
it
h
k
e
y
5
7
To
r
e
co
rd
w
it
h
k
e
y
8
1
To
r
e
co
rd
w
it
h
k
e
y
8
5
Comments on ISAM
• File creation:
– Assume that data records are present
and will not change much in the future.
– Sort data records. Allocate data pages
for the sorted data records.
– Sort data entries based on the search
keys.
Allocate leaf index pages for
ISAM Operations
• Search
: Start at root; use key comparisons
to go to leaf.
– Cost = log
FN, where F = # entries/index page, N
= # leaf pages
• Insert
: Find the leaf page and put it there. If
the leaf page is full, put it in the overflow
page.
– Cost = search cost + constant (assuming little or
no overflow pages)
• Delete
: Find and remove from the leaf page;
if empty overflow page, de-allocate.
– Cost = search cost + constant (assuming little or
no overflow pages)
Example ISAM Tree
• Each node can hold 2 entries; no need for
`next-leaf-page’ pointers in primary pages. Why not?
– Primary pages are allocated sequentially at file creation
time.
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
20 33 51 63
40 Root
After Inserting 23*, 48*, 41*,
42* ...
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root 23* 48* 41* 42* Overflow Pages Leaf Index Pages Pages Primary... Then Deleting
42*, 51*, 97*
Note that 51* appears in index levels, but not in leaf!
10* 15* 20* 27* 33* 37* 40* 46* 55* 63*
20 33 51 63
40
Root
Properties of ISAM Tree
• Insertions and deletions affect only the leaf pages,
not the non-leaf pages
– index in the tree is static.
• Static index tree has both advantages &
disadvantages.
– Advantage: No locking and waiting on index pages for concurrent access.
– Disadvantage: when a file grows, it creates large overflow chains, leading to poor performance.
• ISAM tree is good when data does not change much.
– To accommodate some insertions, can leave the primarily pages 20% empty.
• B+ tree can support file growth & shrink efficiently,
but at the cost of locking overhead.
B+ Tree
• It is similar to ISAM tree-structure, except:
– It has no overflow chains (this is the cause of poor
performance in ISAM).
• When an insertion goes to a leaf page becomes full, a new leaf page is created.
– Leaf pages are
not allocated sequentially
. Leaf pages
are
sorted
and organized into
doubly-linked list.
– Index pages can grow and shrink with size of data file.
Index Entries
Data Entries ("Sequence set") (Direct search)
Properties of B+ Tree
• Keep tree
height-balanced
.
– Balance means that distance
from root to all leaf nodes are
the same .
• Minimum
50% occupancy
(except for root)
– Each index page node must
contain
d <= m <= 2d
entries.
– The parameter
m
is the
number of occupied entries.
– The parameter
d
is called the
order of the tree (or ½ node
capacity)
More Properties of B+ Tree
• Cost of search, insert, and delete (disk pa
ge I/Os):
– Θ(height of the tree) = Θ(log
m+1N), where N =
# leaf pages
• Supports equality and range-searches effici
ently.
Example B+ Tree
• Search begins at root, and key comparisons direct
it to a leaf (as in ISAM).
• Search for 5*, 15*, all data entries >= 24* ...
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
–average fanout = 133
• Typical capacities:
–
Height 4: 133
4= 312,900,700 records
–
Height 3: 133
3= 2,352,637 records
• Can often hold top levels in buffer pool:
–Level 1 = 1 page = 8 Kbytes
–
Level 2 = 133 pages = 1 Mbyte
–Level 3 = 17,689 pages = 133 Mbytes
Inserting a Data Entry into a
B+ Tree
• Find correct leaf L.
• Put data entry onto L.
–
If L has enough space, done!
–
Else, must
split
L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key. • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
–
To split index node
, redistribute entries evenly, but
push
up
middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
–Tree growth: gets
wider
or
one level taller at top.
Inserting 8*
• Observe how
minimum occupancy is guaranteed in
both leaf and index pg splits.
• Note difference between copy-up
and push-up; be
sure you understand the reasons for this.
2* 3* 5* 7* 8*
5 Entry to be inserted in parent node. (Note that 5 is copied
up and continues to appear in the leaf.) 5 24 30 17 13 Root 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 Entry to be inserted in parent node. (Note that 17 is pushed up and only appears once in the
index. Contrast this with a leaf split.
Example B+ Tree After
Inserting 8*
Root was split, leading to increase in height.
Avoid split by re-distributing entries.
2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8*
Redistribution after Inserting
8*
Root 17 24 30 2* 3* 5* 7* 8* 14* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 8 Root 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 16*Check sibling leaf node to see if it has space.
Copy up 8 (new low key value on the 2nd
Deleting a Data Entry from
a B+ Tree
• Start at root, find leaf L where entry belongs.
• Remove the entry.
–
If L is at least half-full, done!
–If L is less than half-full,
• Try to
re-distribute
, borrowing from
sibling
(adjacent node with same parent as L).
• If re-distribution fails,
merge
L and sibling.
• If merge occurred, must delete entry
(pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing
Tree After Deleting 19* and 20* ...
• Deleting 19* is easy.
• Deleting 20* is done with re-distribution. Notice how middle key is copied up.
39* 2* 3* 17 30 14* 16* 33* 34* 38* 13 5 7* 5* 8* 22* 24* 27 27* 29* 2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8*
• And then deleting 24*
• Must merge.
• Observe
toss
of index
entry (27), and
pull down
of index entry (17).
30 22* 27* 29* 33* 34* 38* 39* Root 30 13 5 17 2* 3* 17 30 14* 16* 33* 34* 38* 13 5 7* 5* 8* 22* 24* 27 27* 29* toss Pull down 39*Example of Non-leaf
Re-distribution
• Tree is shown below during deletion of 24*. (What
could be a possible initial tree?)
• In contrast to previous example, can re-distribute
entry from left child of root to right child.
Root 13 5 17 20 22 30 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 7* 5* 8* 3* 2*
After Re-distribution
• Intuitively, entries are
re-distributed by pushing
through
the splitting entry in the parent node.
• It suffices to re-distribute index entry with key 20;
we’ve re-distributed 17 as well for illustration.
Root 13 5 17 30 20 22
Prefix Key Compression
• Important to increase fan-out. (Why?)
• Key values in index entries only `direct traffic’;
can often compress them.
– Compress “David Smith” to “Dav”? How about “Davi”?
– In general, while compressing, must leave each index
entry greater than every key value (in any subtree) to
its left.
Daniel Lee David Smith Devarakonda …
Bulk Loading of a B+ Tree
• If we have a large collection of records, and we
want to create a B+ tree on some field, doing so
by repeatedly inserting records is very slow.
– Cost = # entries * log
F(N), where F = fan-out, N = #
index pages
• Bulk Loading can be done much more efficiently.
– Step 1: Sort data entries. Insert pointer to first (leaf)
page in a new (root) page.
Sorted pages of data entries; not yet in B+ tree
Bulk Loading(Conti)
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Sorted pages of data entries; not yet in B+ tree
Root
6* 10*
3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35*36* 38*41* 44* Root
Data entry pages not yet in B+ tree 12
6
Bulk Loading (Contd.)
• Step 2: Build Index
entries for leaf
pages.
– Always entered into right-most index
page just above leaf level. When this
fills up, it splits. (Split may go up right-most path to the root.)
– Cost = # index pages, which is much faster than
3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35*36* 38*41* 44* Root
Data entry pages not yet in B+ tree 35 23 12 6 10 20 6 Root 10 12 23 20 35 38
not yet in B+ tree Data entry pages
Summary of Bulk Loading
• Option 1: multiple inserts.
–More I/Os during build.
–
Does not give sequential storage of leaves.
• Option 2: Bulk Loading
–
Fewer I/Os during build.
–
Leaves will be stored sequentially (and linked, of
course).
A Note on `Order’
• Order (the parameter d) concept denote
minimum occupancy on the number of
entries per index page.
– But it is not practical in real implementation. Why?
• Index pages can typically hold many more entries than
leaf pages.
• Variable sized records and search keys mean different
nodes will contain different numbers of entries.