Database Dystems (資料庫系統) Lecture #11

(1)

Database Systems

( 資料庫系統 )

December 12, 2005 Lecture #11

(2)

Announcement

• Next week reading: Chapter 14

• Assignment #5 is out on the course

homepage.

– It is a written assignment.

– It is due 12/19 (next Monday).

• Practicum #3 is also out on the

course homepage.

(3)

To Frame or not to Frame?

Architecture, Psychology, &

(4)

External Sorting

(5)

Why learn sorting again?

• O (n*n): bubble, insertion, selection, …

sorts

• O (n log n): heap, merge, quick, … sorts • Sorting huge dataset (say 10 GB)

• CPU time complexity may mean little on

practical systems

(6)

“External” Sorting Defined

• Refer to sorting methods when the data is too l arge to fit in main memory.

– E.g., sort 10 GB of data in 100 MB of main memory. • During sorting, some intermediate steps may req

uire data to be stored externally on disk.

• Disk I/O cost is much greater than CPU instruct ion cost

– Average disk page I/O cost: 10 ms vs. 3.8 GHz CPU clo ck: 0.25 ns.

– Minimize the disk I/Os (rather than number of compari sons).

(7)

Outline (easy chapter)

• Why does a DMBS sort data? • Simple 2-way merge sort

• Generalize B-way merge sort • Optimization

– Replacement sort

– Blocked I/O optimization – Double buffering

• Using an existing B+ tree index vs.

(8)

• Users may want answers to query in some

order

– E.g., students sorted by increasing age

• Sorting is the first step in bulk loading a B+

tree index

• Sorting is used for eliminating duplicate

copies

• Join requires a sorting step.

– Sort-join algorithm requires sorting.

When does a DBMS sort data?

(9)

Bulk Loading of a B+ Tree

• Step 1: Sort data entries. Insert pointer to first

(leaf) page in a new (root) page.

• Step 2: Build Index entries for leaf pages.

3* 4* 6* 9* 10*11* 12*13* 20*22* 23*31* 35*36* 38*41* 44*

Sorted pages of data entries; not yet in B+ tree Root

(10)

Example of Sort-Merge Join

(11)

A Simple Two-Way Merge Sort

• It uses only 3 buffer pages.

– Basic idea is divide and conquer.

– Sort smaller runs

and merge them into bigger runs.

• Pass 0: read each

page, sort records in each page, and write the page out to disk. (1 buffer page is used) Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2, 5,6 6 4,9 7,8 1,3 2 2,3 4,6 4,7 8,9 5,61,3 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8

(12)

A Simple Two-Way Merge Sort

• Pass 1: read two p ages, merge them, and write them out to disk. (3 buffe r pages are used)

• Pass 2-3: repeat a bove step till one sorted 8-page run.

• Each run is define d as a sorted subf ile. Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2, 5,6 6 4,9 7,8 1,3 2 2,3 4,6 4,7 8,9 5,61,3 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6

(13)

2-Way Merge Sort • Say the number of pages in a file is 2k:

– Pass 0 produces 2k_{sorted runs of one page each}

– Pass 1 produces 2k-1_{sorted runs of two pages each}

– Pass 2 produces 2k-2_{sorted runs of four pages each}

– Pass k produces one sorted runs of 2k_pages.

• Each pass requires read + write each page in file: 2*N

• For a N pages file,

– the number of passes = ceiling ( log₂ N) + 1

• So total cost (disk I/Os) is

(14)

General External Merge Sort More than 3 buffer pages. How can we

utilize them?

• To sort a file with N pages using B buffer

pages:

– Pass 0: use B buffer pages. Produce N / B

sorted runs of B pages each.

– Pass 1..k: use B-1 buffer pages to merge B-1

runs, and use 1 buffer page for output.

B Main memory INPUT 1 INPUT B-1 OUTPUT Disk Disk INPUT 2 . . . . . . . . .

(15)

15

General External Merge Sort (B=4) 8,7 6,2 9,4 3,4 5,6 3,12,1015,3 16,513,819,132,8 8,9 4,4 6,7 2,3 1,2 3,3 5,610,15 1,5 8,813,1619,32 1,1

Pass 0: Read four unsorted pages, sort

them, and write them out. Produce 3 runs of 4 pages each.

Pass 1: Read three pages, one page from

each of 3 runs, merge them, and write them out. Produce 1 run of 12 pages.

2,2 3,3 …

merging

(16)

Cost of External Merge Sort

• # of passes: 1 + ceiling(log _B-1ceiling(N/B))

• Disk I/O Cost = 2*N*(# of passes)

• E.g., with 5 buffer pages, to sort 108 page file:

– Pass 0: ceiling(108/5) = 22 sorted runs of length 5 pag

es each (last run is only 3 pages)

– Pass 1: ceiling(22/4) = 6 sorted runs of length 20 page

s each (last run is only 8 pages)

– Pass 2: 2 sorted runs, of length 80 pages and 28 pages – Pass 3: Sorted file of 108 pages

– # of passes = 1 + ceiling (log ₄ceiling(108/5)) = 4 – Disk I/O costs = 2*108*4 = 864

(17)

(18)

Further Optimize Merge Sort

• Opportunity #1: sorting the first sorted-runs at pass 0

– First sorted run has length B.

– Is it possible to create bigger length in the first sorted

runs? How?

• Opportunity #2: consider block I/Os

– Block I/O: reading & writing consecutive blocks – Can the merge passes use block I/O?

• Opportunity #3: minimize CPU/disk idle time – How to keep both CPU & disks busy at the

(19)

Replacement Sort (Optimize Merge Sort)

• Pass 0 can output approx. 2B sorted pages on average. How?

• Divide B buffer pages into 3 parts:

– Current set buffer (B-2): unsorted or unmerged pages. – Input buffer (1 page): one unsorted page.

– Output buffer (1 page): output sorted page.

• Algorithm:

– Pick the tuple in the current set with the smallest k value > large st value in output buffer.

– Append k to output buffer.

– This creates a hole in current set, so move a tuple from input buff er to current set buffer.

– When the input buffer is empty of tuples, read in a new unsorted pa ge.

(20)

Replacement Sort Example (B=4) 8,7 6,2 9,4 3,4 5,6 3,1 2,1015,316,513,8 19,132,8 3,4 6,2 9,4 Current Set Buffer Input Buffer Output Buffer 3,4 6,2 9,4 3,4 6,9 4 2 4,4 6,9 2,3 8,7 8,4 6,9 4 7 2,3 On disk

When do you start a new run?

(21)

Minimizing I/O Cost vs. Number of I/Os

• So far, the cost metric is the number of disk

I/Os.

• This is inaccurate for two reasons:

– (1) Blocked I/O is a much cheaper (per I/O request) than equal number of individual I/O requests.

• Blocked I/O: read/write several consecutive pages at

the same time.

– (2) CPU cost may be significant.

• Keep CPU busy while we wait for disk I/Os. • Double Buffering Technique.

(22)

Blocked I/O

• Blocked access: read/write b pages as a unit.

• Assume the buffer pool has B pages, and file ha s N pages.

• Look at cost of external merge-sort (with repla cement optimization) using Blocked I/O:

– Blocked I/O has little affect on pass 0.

•Pass 0 produces initial N’ (= N/2B) runs of length 2B pag es.

– Pass 1..k, we can merge F = B/b – 1 runs.

– The total number of passes (to create one run of N p ages) is 1 + log_F (N’).

(23)

Double Buffering

• Keep CPU busy, minimizes waiting for I/O requests. – While the CPU is working on the current run, start to p

refetch data for the next run (called shadow blocks). • Potentially, more passes; in practice, most files

still sorted in 2-3 passes.

OUTPUT OUTPUT' Disk _Disk INPUT 1 INPUT k INPUT 2 INPUT 1' INPUT 2' INPUT k' block sizeb

(24)

Using B+ Trees for Sorting

• Assumption: Table to be sorted has B+ tree index on sorting column(s).

• Idea: Can retrieve records in order by traversing leaf pages.

• Is this a good idea?

• Cases to consider:

– B+ tree is clustered

(25)

Clustered B+ Tree Used for Sorting

• Cost: root to the

left-most leaf, then retrieve all leaf

pages (Alternative 1)

• If Alternative 2 is

used? Additional cost of retrieving data records: each page fetched just once.

• Cost better than

external sorting? (Directs search) Data Records Index Data Entries ("Sequence set")

(26)

Unclustered B+ Tree Used for

Sorting

• Alternative (2) for data entries;

each data entry contains rid of a

data record. In general, one I/O

per data record.

(Directs search)Index

Data Entries

(27)

27

External Sorting vs. Unclustered In dex

 p: # of records per page