Database Systems (資料庫系統) Lecture #8

(1)

Database Systems

( 資料庫系統 )

November 1, 2004

Lecture #8

(2)

Announcement

• Next week reading: Chapter 11 Hash-based Index • Assignment #5 is due 11/4 (Thur).

• Assignment #6 is available on the course homepage tom orrow (due 11/24)

(3)

Cool Ubicomp Project

Information Art (Georgia Tech)

• Pleasant paintings that can give “information” (from the Internet)

• You can customize information & their presentation.

(1) Today’s weather forecast (2) Tomorrow weather forecast (3) Temperature

(5) Stock market index (6) Traffic

(4)

(5)

Tree-Structured Indexing

(6)

Outline

• Motivation for tree-structured indexes • ISAM index

• B+ tree index

• Key compression • B+ tree bulk-loading • Clustered index

(7)

Review: Three Alternatives for Data

Entries

• As for any index, 3 alternatives for data entries k*:

(1) Clustered Index: Data record with key value k

(2) Unclustered Index: <k, rid of data record with search key value

k>

(3) Unclustered Index: <k, list of rids of data records with search key k>, useful when search key is not unique (not a candidate key).

• Choice of data entries is independent to the indexing

technique used to locate data entries k*.

– Two general indexing techniques: hash-structured indexing or tree-structured indexing

(8)

Tree vs. Hash-Structured Indexing

• Tree-structured indexing supports both range searches

and equality searches efficiently.

– Why efficient range searches?

• Data entries (on the leaf nodes of the tree) are sorted.

• Perform equality search on the first qualifying data entry + scan to find the rests.

• Data records also need to be sorted by search key in case that the range searches access record fields other than the search key.

• Hash-structured indexing supports equality search very efficiently, but not range searches.

– Why inefficient range searches?

(9)

Range Searches

• ``Find all students with gpa > 3.0’’

– _{If data is in sorted file, do binary search to find first such student, then}

scan to find others.

– _{Cost of binary search over data file can still be quite high (proportional}

to the number of page I/Os)

• Simple solution: create a smaller index file.

– Cost of binary search over the index file is reduced.

k2 kN

(10)

Motivation for Tree-Structure Index

• But, the index file can still be large.

– The cost of binary search over the index file can still be large. Can we reduce the cost further?

• Apply the simple solution again: create multiple levels of indexes.

– Each index level is much smaller than the lower index level. This index structure is a tree.

– A tree node is an index page that can hold, e.g.,100 indexes. – A tree with a depth of 4 (from the root index page to the leaf

index page) can hold over 100,000,000 records. – The cost of search is 3~4 page access.

(11)

ISAM and B+ Tree

• Two tree-structured indexings:

– ISAM (Indexed Sequential Access Method): static structure.

• Assuming that the file does not grow or shrink too much.

– B+ tree: dynamic structure

• Tree structure adjusts gracefully under inserts and deletes. • B+ tree is a variant of B tree.

• Analyze cost of the following operations:

– Search

– Insertion of data entries – Deletion of data entries – Concurrent access.

(12)

12

ISAM

_{Leaf pages contain}_{data entries}_.

P₀ K ₁ P ₁ _{K 2 P 2} _{K m P m} index entry Non-leaf Pages Pages Overflow page Primary pages Leaf Index Pages

(13)

Example

Root

10 0 12 0 15 0 18 0 30 3 5 11 _{30 35} ₁₀0 ₁₀1 ₁₁0 ₁₂0 0₁₃ ₁₅0 ₁₅6 ₁₇9 ₁₈0 ₂₀0

(14)

57 81 95

to keys to keys to keys to keys

< 57 57 k<81 81



k<95 k>=95

(15)

Leaf node

57 81 95 To r ec or d w ith k ey 5 7 To r ec or d w ith k ey 8 1 To r ec or d w ith k ey 8 5

(16)

Comments on ISAM

• File creation:

– Assume that data records are available and will not change much in the future.

– Sort data records. Allocate data pages for the sorted data records.

– Sort data entries based on the search keys.

Allocate leaf index pages for sorted data entries sequentially.

(17)

ISAM Operations

• Search: Start at root; use key comparisons to go to leaf.

– Cost = log F N + #overflow pages

– F = # entries/index page and N = # leaf pages

• Insert: Find the leaf page and put it there. If the leaf

page is full, put it in the overflow page.

– Cost = search cost + constant (assuming little or no overflow pages)

• Delete: Find and remove from the leaf page; if empty

overflow page, de-allocate.

– Cost = search cost + constant (assuming little or no overflow pages)

(18)

Example ISAM Tree

• Each node can hold 2 entries; no need for `next-leaf-page’ pointers in primary pages.

– Why? Primary pages are allocated sequentially at file creation time.

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* _63* 97*

20 33 51 63

40

(19)

After Inserting 23, 48, 41, 42

...

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* _63* 97* 20 33 51 63 40 Root 23* 48* 41* Overflow Pages Leaf Index Pages Pages Primary

(20)

... Then Deleting 42*, 51

, 97

 Note that 51* appears in index levels, but not in leaf!

10* 15* 20* 27* 33* 37* 40* 46* 55* _63*

20 33 51 63

40

Root

(21)

Properties of ISAM Tree

• Insertions and deletions affect only the leaf pages, not the non-leaf pages

– index in the tree is static.

• Static index tree has both advantages & disadvantages.

– Advantage: No locking and waiting on index pages for concurrent access.

– Disadvantage: when a file grows, it creates large overflow chains, leading to poor performance.

• ISAM tree is good when data does not change much.

– To accommodate some insertions, can leave the primarily pages 20% empty.

(22)

22

B+ Tree

• It is similar to ISAM tree-structure, except:

– It has no overflow chains (this is the cause of poor performance in ISAM).

• When an insertion goes to a leaf page becomes full, a new leaf page is created.

– Leaf pages are not allocated sequentially. Leaf pages are

sorted and organized into doubly-linked list.

– Index pages can grow and shrink with size of data file.

• What is the difference between B+ tree and B tree?

Index Entries

Data Entries (Direct search)

(23)

Properties of B+ Tree

• Keep tree height-balanced.

– Balance means that

distance from root to all leaf nodes are the same .

• Minimum 50% occupancy

(except for root)

– Each index page node must contain d <= m <= 2d

entries.

– The parameter m is the number of occupied entries. – The parameter d is called

the order of the tree (or ½

• What is the value of m? • What is the value of d?

(24)

More Properties of B+ Tree

• Cost of search, insert, and delete (disk page I/O

s):

– O (height of the tree) = O(log d+1 N) (N = # leaf page

s)

• Supports equality and range-searches efficiently.

• B+ tree is the most widely used index in DBMS.

(25)

Example B+ Tree

• Search begins at root, and key comparisons direct it to a leaf (as in ISAM).

• Search for 5*, 15*, all data entries >= 24* ...

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13

(26)

B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.

– _{average fanout = 133}

• Typical capacities:

– _{Height 4: 133}4 = 312,900,700 records

– Height 3: 1333 = 2,352,637 records

• Can often hold top levels in buffer pool:

– _{Level 1 = 1 page = 8 Kbytes} – Level 2 = 133 pages = 1 Mbyte – _{Level 3 = 17,689 pages = 133 Mbytes}

(27)

Inserting a Data Entry into a B+

Tree

• Find correct leaf L. • Put data entry into L.

– _{If L has enough space, done!}

– _{Else, must}_split_{L (into L and a new node L2)}

• Redistribute entries evenly, copy up middle key. • Insert index entry pointing to L2 into parent of L.

• This can happen recursively

– _{To split index node}_{, redistribute entries evenly, but}_{push up}

middle key. (Contrast with leaf splits.)

• Splits “grow” tree; root split increases height.

(28)

Inserting 8*

• Observe how

minimum occupancy is guaranteed in both leaf and index pg

splits.

• Note difference between copy-up

and push-up; be sure

you understand the reasons for this.

2* 3* 5* 7* 8*

5

Entry to be inserted in parent node. (Note that 5 is

continues to appear in the leaf.)s copied up and

appears once in the index. Contrast

5 24 30

17

13

Entry to be inserted in parent node. (Note that 17 is pushed up and only this with a leaf split.)

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13

(29)

Example B+ Tree After Inserting

8*

_{Notice that root was split, leading to increase in height.}

_{In this example, we can avoid split by re-distrib}

2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8*

(30)

Redistribution after Inserting 8*

Root 17 24 30 2* 3* 5* 7* 8* 14* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 8 Root 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 16*

Check sibling leaf node to see if it has space. Copy up 8 (new low key value on the 2nd

(31)

Deleting a Data Entry from a B+

Tree

• Start at root, find leaf L where entry belongs. • Remove the entry.

– _{If L is at least half-full, done!} – _{If L has only d-1 entries,}

• Try to re-distribute, borrowing from sibling (adjacent node with

same parent as L).

• If re-distribution fails, merge L and sibling.

• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

(32)

Tree After Deleting 19* and 20* ...

• Deleting 19* is easy.

• Deleting 20* is done with re-distribution. Notice how middle key is copied up.

39* 2* 3* 17 30 14* 16* 33* 34* 38* 13 5 7* 5* 8* 22* 24* 27 27* 29* 2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8*

(33)

• And then deleting 24* • Must merge.

• Observe `toss’ of index entry (27), and `pull down’ of index entry (17). 30 22* 27* 29* 33* 34* 38* 39* Root 30 13 5 17 2* 3* 17 30 14* 16* 33* 34* 38* 13 5 7* 5* 8* 22* 24* 27 27* 29* toss Pull down 39*

(34)

Example of Non-leaf

Re-distribution

• Tree is shown below during deletion of 24*. (What could be a possible initial tree?)

• In contrast to previous example, can re-distribute entry from left child of root to right child.

Root 13 5 17 20 22 30 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 7* 5* 8* 3* 2*

(35)

After Re-distribution

• Intuitively, entries are re-distributed by `pushing through’ the splitting entry in the parent node.

• It suffices to distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.

Root 13 5 17 30 20 22

(36)

Prefix Key Compression

• Important to increase fan-out. (Why?)

• Key values in index entries only `direct traffic’; can often compress them.

– Compress “David Smith” to “Dav”? How about “Davi”?

– In general, while compressing, must leave each index entry greater than every key value (in any subtree) to its left.

• Insert/delete must be suitably modified.

Daniel Lee David Smith Devarakonda …

(37)

Bulk Loading of a B+ Tree

• If we have a large collection of records, and we want to create a B+ tree index on a field, doing so by repeatedly inserting records is very slow.

– Cost = # entries * logF(N), where F = fan-out, N = # index pages

• Bulk Loading can be done much more efficiently.

– Step 1: Sort data entries. Insert pointer to first (leaf) page in a new (root) page.

Sorted pages of data entries; not yet in B+ tree Root

(38)

Bulk Loading (Contd.)

• Step 2: Build Index entries for leaf

pages.

– Always entered into right-most index

page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.)

– Cost = # index pages, which is much faster than repeated inserts.

3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35*36* 38*41* 44*

Root

Data entry pages not yet in B+ tree

35 23 12 6 10 20 3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35*36* 38*41* 44* 6 Root 10 12 23 20 35 38

not yet in B+ tree Data entry pages

(39)

Summary of Bulk Loading

• Option 1: multiple inserts.

– _{More I/Os during build.}

– Does not give sequential storage of leaves.

• Option 2: Bulk Loading

– Fewer I/Os during build.

– _{Leaves will be stored sequentially (and linked, of course).} – Can control “fill factor” on pages.

(40)

A Note on `Order’

• Order (the parameter d) concept denote minimum occupancy on the number of entries per index page.

– But it is not practical in real implementation. Why?

– _{Index pages can typically hold many more entries than leaf pages.} – _{Variable sized records and search keys mean different nodes will}

contain different numbers of entries.

– _{Even with fixed length fields, multiple records with the same}

search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).

• Order is replaced by physical space criterion (`at least

(41)

Summary

• Tree-structured indexes are ideal for range-searches, also good for equality searches.

• ISAM is a static structure.

– Only leaf pages modified; overflow pages needed.

– _{Overflow chains can degrade performance unless size of data set}

and data distribution stay constant.

• B+ tree is a dynamic structure.

– Inserts/deletes leave tree height-balanced; log _F N cost. – _{High fanout (F) means depth rarely more than 3 or 4.} – Almost always better than maintaining a sorted file.

(42)

Summary (Contd.)

– Typically, 67% occupancy on average.

– _{Usually preferable to ISAM, modulo}_locking_{considerations;}

adjusts to growth gracefully.

• Key compression increases fanout, reduces height.

• Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large data set.

Database Systems (資料庫系統) Lecture #8