Database Systems (資料庫系統) Lecture #12

(1)

Database Systems

( 資料庫系統 )

November 29, 2004

Lecture #12

(2)

Announcement

• Next week reading: Chapter 14

• Assignment #8 is out on the course homepage.

• Graded Assignment #6 & midterm exams are ready for pickup.

(3)

Cool Ubicomp Project

(4)

External Sorting

(5)

“External” Sorting Defined

• Refer to sorting methods when the data is too large to fit in main memory.

– E.g., sort 10 GB of data in 100 MB of main memory.

• During sorting, some intermediate steps may require data to be stored externally on disk.

• Disk I/O cost is much greater than CPU comparison cost

– Average disk page I/O cost: 10 ms vs. 3.8 GHz CPU clock: 250

ns.

(6)

Outline (easy chapter)

• Why does a DMBS sort data?

• Simple 2-way merge sort

• Generalize B-way merge sort

• Optimization

– Replacement sort

– Blocked I/O optimization – Double buffering

• Using an existing B+ tree index vs. external

sorting

(7)

• Users may want answers in some order

– E.g., students sorted by increasing age

• Sorting is the first step in bulk loading a B+ tree

index

• Sorting is used for eliminating duplicate copies

• Join requires a sorting step.

– Sort-join algorithm requires sorting.

(8)

Bulk Loading of a B+ Tree

• Step 1: Sort data entries. Insert pointer to first (leaf) page in a new (root) page.

• Step 2: Build Index entries for leaf pages.

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

Sorted pages of data entries; not yet in B+ tree Root

(9)

(10)

A Simple Two-Way Merge Sort

• It uses only 3 buffer

pages.

– Basic idea is divide and

conquer.

– Sort smaller runs and

merge them into bigger runs.

• Pass 0: read each page, sort records in each page, and write the page out to disk. (1

buffer page is used)

Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2,6 4,9 7,8 5,6 1,3 2 2,3 4,6 4,7 8,9 1,3 5,6 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5

(11)

A Simple Two-Way Merge Sort

• Pass 1: read two pag es, merge them, and write them out to disk.

(3 buffer pages are u

sed)

• Pass 2-3: repeat abov e step till one sorted 8-page run.

• Each run is defined a s a sorted subfile. Input file 1-page runs 2-page runs 4-page runs 8-page runs PASS 0 PASS 1 PASS 2 PASS 3 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2,6 4,9 7,8 5,6 1,3 2 2,3 4,6 4,7 8,9 1,3 5,6 2 2,3 4,4 6,7 8,9 1,2 3,5 6 1,2 2,3 3,4 4,5 6,6 7,8

(12)

2-Way Merge Sort

• Say the number of pages in a file is 2k:

– Pass 0 produces 2k_{sorted runs of one page each}

– Pass 1 produces 2k-1_{sorted runs of two pages each}

– Pass 2 produces 2k-2_{sorted runs of four pages each}

– Pass k produces one sorted runs of 2k_pages.

• Each pass requires read + write each page in file: 2*N • For a N pages file,

– the number of passes = ceiling ( log₂ N) + 1

• So total cost (disk I/Os) is

(13)

General External Merge Sort

More than 3 buffer pages. How can we utilize them? • To sort a file with N pages using B buffer pages:

– _{Pass 0: use B buffer pages. Produce}_{N / B sorted runs of B pages}

each.

– _{Pass 1..k: use B-1 buffer pages to}_{merge B-1 runs}_{, and use 1}

buffer page for output.

B Main memory INPUT 1 INPUT B-1 OUTPUT Disk Disk INPUT 2

. . .

. .

.

. . .

(14)

General External Merge Sort (B=4)

8,7 6,2 9,4 3,4 5,6 3,1 2,10 15,3 16,5 13,8 19,1 32,8 8,9 4,4 6,7 2,3 1,2 3,3 5,6 10,15 1,5 8,8 13,1619,32 1,1

Pass 0: Read four unsorted pages, sort them, and write them out. Produce 3 runs of 4 pages each.

Pass 1: Read three pages, one page from each of 3 runs, merge them,

2,2 3,3 …

merging

(15)

Cost of External Merge Sort

• # of passes:

• Disk I/O Cost = 2*N*(# of passes)

• E.g., with 5 buffer pages, to sort 108 page file:

– Pass 0: = 22 sorted runs of length 5 pages each (last

run is only 3 pages)

– _{Pass 1: = 6 sorted runs of length 20 pages each (last}

run is only 8 pages)

– Pass 2: 2 sorted runs, of length 80 pages and 28 pages

– _{Pass 3: Sorted file of 108 pages}

– # of passes = 1 + ceiling (log ₄ceiling(108/5)) = 4

– _{Disk I/O costs = 2*108*4 = 864}









1 

log

_B_₁

N B

/



108 5

/



(16)

# Passes of External Sort

N

B=3 B=5 B=9 B=17 B=129 B=257

100

7

4

3

2

1

1 1,000

10

5

4

3

2

2 10,000

13

7

5

4

2

2 100,000

17

9

6

5

3

3 1,000,000

20

10

7

5

3

3 10,000,000

23

12

8

6

4

3 100,000,000

26

14

9

7

4

4 1,000,000,000 30

15

10

8

5

4

(17)

Replacement Sort (Optimize Merge Sort)

• Pass 0 can output approx. 2B sorted pages on average. How? • Divide B buffer pages into 3 parts:

– Current set buffer (B-2): unsorted or unmerged pages.

– Input buffer (1 page): one unsorted page.

– Output buffer (1 page): output sorted page. • Algorithm:

– Pick the tuple in the current set with the smallest k value > largest value in output buffer.

– Append k to output buffer.

– This creates a hole in current set, so move a tuple from input buffer to c urrent set buffer.

(18)

Replacement Sort Example (B=4)

8,7 6,2 9,4 3,4 5,6 3,1 2,10 15,3 16,5 13,8 19,1 32,8 3,4 6,2 9,4 Current Set Buffer Input Buffer Output Buffer 3,4 6,2 9,4 3,4 6,9 4 2 4,4 6,9 2,3 8,7 8,4 6,9 4 7 2,3 On disk

When do you start a new run?

(19)

Minimizing I/O Cost vs. Number of

I/Os

• So far, the cost metric is the number of disk I/Os. • This is inaccurate for two reasons:

– (1) Blocked I/O is a much cheaper (per I/O request) than equal number of individual I/O requests.

• Blocked I/O: read/write several consecutive pages at the same time.

– (2) CPU cost is significant.

• Keep CPU busy while we wait for disk I/Os. • Double Buffering Technique.

(20)

Blocked I/O

• Blocked access: read/write b pages as a unit.

• Assume the buffer pool has B pages, and file has N pag es.

• Look at cost of external merge-sort (with replacement op timization) using Blocked I/O:

– Blocked I/O has little affect on pass 0.

• Pass 0 produces initial N’ (= N/2B) runs of length 2B pages.

– Pass 1..k, we can merge F = B/b – 1 runs.

– The total number of passes (to create one run of N pages) is 1 + logF (N’).

(21)

Double Buffering

• Keep CPU busy, minimizes waiting for I/O requests.

– While the CPU is working on the current run, start to prefetch da

ta for the next run (called `shadow blocks’).

• Potentially, more passes; in practice, most files still sorte d in 2-3 passes. OUTPUT OUTPUT' Disk _Disk INPUT 1 INPUT k INPUT 2 INPUT 1' INPUT 2' INPUT k' block sizeb

(22)

Sorting Records!

• Sorting has become a blood sport!

Year 2003 Results

Daytona Indy

Penny 10GB (105 million records)

THsort pdf, doc (2002..2003)

1098 seconds on a $857 Linux/AMD system

Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang,

ZunChong Tian, Hao Wang, Xiaoge Wang Tsinghua University, Beijing, China

(new) 42 GB (433 M records)

SheenkSort.pdf

1541 seconds on a 614$ Linux/AMD system

Lei Yang, Hui Huang, Zheng Wan, Tao Song

Tsinghua University, Beijing, China

Minute 12 GB in 60 seconds

Ordinal Nsort

SGI 32 cpu Origin IRIX

Also Y2000..Y2003 winner

21.8 GB 218 million records in 56.51 sec

also the Y2000..Y2003 winner

NOW+HPVMsort 64 nodes WinNT pdf (170KB).

Luis Rivera , Xianan Zhang, Andrew Chien UCSD

TeraByte 49 minutes also Y2000..Y2003 winner

Daivd Cossock , Sam Fineberg, Pankaj Mehra , John Peck 1057 seconds also the Y2000..Y2003 winner SPsort 1952 SP cluster 2168 disks Jim Wyllie

(23)

Using B+ Trees for Sorting

• Assumption: Table to be sorted has B+ tree index on sorti ng column(s).

• Idea: Can retrieve records in order by traversing leaf pag es.

• Is this a good idea?

• Cases to consider:

– _{B+ tree is}_clustered _{Good idea!}

(24)

Clustered B+ Tree Used for

Sorting

• Cost: root to the left-mo st leaf, then retrieve all l eaf pages (Alternative 1)

• If Alternative 2 is used? Additional cost of retriev ing data records: each page fetched just once.

(Directs search)

Data Records Index

Data Entries

(25)

Unclustered B+ Tree Used for Sorting

• Alternative (2) for data entries; each data entry

contains rid of a data record. In general, one I/O per data record.

(Directs search) Index

Data Entries

(26)

External Sorting vs. Unclustered Index

N

Sorting

p=1

p=10

p=100

100

200

100 1,000

10,000

1,000

2,000

1,000

10,000

100,000

10,000

40,000

10,000

100,000

1,000,000

100,000

600,000

100,000

1,000,000

10,000,000

1,000,000 8,000,000 1,000,000 10,000,000 100,000,000

10,000,000 80,000,000 10,000,000 100,000,000 1,000,000,000

(27)

Summary

• External sorting is important; DBMS may dedicate part of b uffer pool for sorting!

• External merge sort minimizes disk I/O cost:

– Pass 0: Produces sorted runs of size B (# buffer pages). Later pass

es: merge runs.

– _{# of runs merged at a time depends on B, and block size.}

– Larger block size means less I/O cost per page.

– _{Larger block size means smaller # runs merged.}

– In practice, # of runs rarely more than 2 or 3.

• Clustered B+ tree is good for sorting; unclustered tree is us ually very bad.