Database Systems
( 資料庫系統 )
December 12, 2005 Lecture #11
Announcement
• Next week reading: Chapter 14
• Assignment #5 is out on the course
homepage.
– It is a written assignment.
– It is due 12/19 (next Monday).
• Practicum #3 is also out on the
course homepage.
To Frame or not to Frame?
Architecture, Psychology, &
External Sorting
Why learn sorting again?
• O (n*n): bubble, insertion, selection, …
sorts
• O (n log n): heap, merge, quick, … sorts • Sorting huge dataset (say 10 GB)
• CPU time complexity may mean little on
practical systems
“External” Sorting Defined
• Refer to sorting methods when the data is too l arge to fit in main memory.
– E.g., sort 10 GB of data in 100 MB of main memory. • During sorting, some intermediate steps may req
uire data to be stored externally on disk.
• Disk I/O cost is much greater than CPU instruct ion cost
– Average disk page I/O cost: 10 ms vs. 3.8 GHz CPU clo ck: 0.25 ns.
– Minimize the disk I/Os (rather than number of compari sons).
Outline (easy chapter)
• Why does a DMBS sort data? • Simple 2-way merge sort
• Generalize B-way merge sort • Optimization
– Replacement sort
– Blocked I/O optimization – Double buffering
• Using an existing B+ tree index vs.
• Users may want answers to query in some
order
– E.g., students sorted by increasing age
• Sorting is the first step in bulk loading a B+
tree index
• Sorting is used for eliminating duplicate
copies
• Join requires a sorting step.
– Sort-join algorithm requires sorting.
When does a DBMS sort data?
Bulk Loading of a B+ Tree
• Step 1: Sort data entries. Insert pointer to first
(leaf) page in a new (root) page.
• Step 2: Build Index entries for leaf pages.
3* 4* 6* 9* 10*11* 12*13* 20*22* 23*31* 35*36* 38*41* 44*
Sorted pages of data entries; not yet in B+ tree Root
Example of Sort-Merge Join
A Simple Two-Way Merge Sort
• It uses only 3 buffer pages.
– Basic idea is divide and conquer.
– Sort smaller runs
and merge them into bigger runs.
• Pass 0: read each
page, sort records in each page, and write the page out to disk. (1 buffer page is used) Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2, 5,6 6 4,9 7,8 1,3 2 2,3 4,6 4,7 8,9 5,61,3 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8
A Simple Two-Way Merge Sort
• Pass 1: read two p ages, merge them, and write them out to disk. (3 buffe r pages are used)
• Pass 2-3: repeat a bove step till one sorted 8-page run.
• Each run is define d as a sorted subf ile. Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2, 5,6 6 4,9 7,8 1,3 2 2,3 4,6 4,7 8,9 5,61,3 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6
2-Way Merge Sort • Say the number of pages in a file is 2k:
– Pass 0 produces 2k sorted runs of one page each
– Pass 1 produces 2k-1 sorted runs of two pages each
– Pass 2 produces 2k-2 sorted runs of four pages each
– Pass k produces one sorted runs of 2k pages.
• Each pass requires read + write each page in file: 2*N
• For a N pages file,
– the number of passes = ceiling ( log2 N) + 1
• So total cost (disk I/Os) is
General External Merge Sort More than 3 buffer pages. How can we
utilize them?
• To sort a file with N pages using B buffer
pages:
– Pass 0: use B buffer pages. Produce N / B
sorted runs of B pages each.
– Pass 1..k: use B-1 buffer pages to merge B-1
runs, and use 1 buffer page for output.
B Main memory INPUT 1 INPUT B-1 OUTPUT Disk Disk INPUT 2 . . . . . . . . .
15
General External Merge Sort (B=4) 8,7 6,2 9,4 3,4 5,6 3,12,1015,3 16,513,819,132,8 8,9 4,4 6,7 2,3 1,2 3,3 5,610,15 1,5 8,813,1619,32 1,1
Pass 0: Read four unsorted pages, sort
them, and write them out. Produce 3 runs of 4 pages each.
Pass 1: Read three pages, one page from
each of 3 runs, merge them, and write them out. Produce 1 run of 12 pages.
2,2 3,3 …
merging
Cost of External Merge Sort
• # of passes: 1 + ceiling(log B-1 ceiling(N/B))
• Disk I/O Cost = 2*N*(# of passes)
• E.g., with 5 buffer pages, to sort 108 page file:
– Pass 0: ceiling(108/5) = 22 sorted runs of length 5 pag
es each (last run is only 3 pages)
– Pass 1: ceiling(22/4) = 6 sorted runs of length 20 page
s each (last run is only 8 pages)
– Pass 2: 2 sorted runs, of length 80 pages and 28 pages – Pass 3: Sorted file of 108 pages
– # of passes = 1 + ceiling (log 4 ceiling(108/5)) = 4 – Disk I/O costs = 2*108*4 = 864
Further Optimize Merge Sort
• Opportunity #1: sorting the first sorted-runs at pass 0
– First sorted run has length B.
– Is it possible to create bigger length in the first sorted
runs? How?
• Opportunity #2: consider block I/Os
– Block I/O: reading & writing consecutive blocks – Can the merge passes use block I/O?
• Opportunity #3: minimize CPU/disk idle time – How to keep both CPU & disks busy at the
Replacement Sort (Optimize Merge Sort)
• Pass 0 can output approx. 2B sorted pages on average. How?
• Divide B buffer pages into 3 parts:
– Current set buffer (B-2): unsorted or unmerged pages. – Input buffer (1 page): one unsorted page.
– Output buffer (1 page): output sorted page.
• Algorithm:
– Pick the tuple in the current set with the smallest k value > large st value in output buffer.
– Append k to output buffer.
– This creates a hole in current set, so move a tuple from input buff er to current set buffer.
– When the input buffer is empty of tuples, read in a new unsorted pa ge.
Replacement Sort Example (B=4) 8,7 6,2 9,4 3,4 5,6 3,1 2,1015,316,513,8 19,132,8 3,4 6,2 9,4 Current Set Buffer Input Buffer Output Buffer 3,4 6,2 9,4 3,4 6,9 4 2 4,4 6,9 2,3 8,7 8,4 6,9 4 7 2,3 On disk
When do you start a new run?
Minimizing I/O Cost vs. Number of I/Os
• So far, the cost metric is the number of disk
I/Os.
• This is inaccurate for two reasons:
– (1) Blocked I/O is a much cheaper (per I/O request) than equal number of individual I/O requests.
• Blocked I/O: read/write several consecutive pages at
the same time.
– (2) CPU cost may be significant.
• Keep CPU busy while we wait for disk I/Os. • Double Buffering Technique.
Blocked I/O
• Blocked access: read/write b pages as a unit.
• Assume the buffer pool has B pages, and file ha s N pages.
• Look at cost of external merge-sort (with repla cement optimization) using Blocked I/O:
– Blocked I/O has little affect on pass 0.
•Pass 0 produces initial N’ (= N/2B) runs of length 2B pag es.
– Pass 1..k, we can merge F = B/b – 1 runs.
– The total number of passes (to create one run of N p ages) is 1 + logF (N’).
Double Buffering
• Keep CPU busy, minimizes waiting for I/O requests. – While the CPU is working on the current run, start to p
refetch data for the next run (called shadow blocks). • Potentially, more passes; in practice, most files
still sorted in 2-3 passes.
OUTPUT OUTPUT' Disk Disk INPUT 1 INPUT k INPUT 2 INPUT 1' INPUT 2' INPUT k' block sizeb
Using B+ Trees for Sorting
• Assumption: Table to be sorted has B+ tree index on sorting column(s).
• Idea: Can retrieve records in order by traversing leaf pages.
• Is this a good idea?
• Cases to consider:
– B+ tree is clustered
Clustered B+ Tree Used for Sorting
• Cost: root to the
left-most leaf, then retrieve all leaf
pages (Alternative 1)
• If Alternative 2 is
used? Additional cost of retrieving data records: each page fetched just once.
• Cost better than
external sorting? (Directs search) Data Records Index Data Entries ("Sequence set")
Unclustered B+ Tree Used for
Sorting
• Alternative (2) for data entries;
each data entry contains rid of a
data record. In general, one I/O
per data record.
(Directs search)Index
Data Entries
27
External Sorting vs. Unclustered In dex
p: # of records per page