Database Systems (資料庫系統) Lecture #12

(1)

Database Systems

( 資料庫系統 )

December 19, 2004

Lecture #12

(2)

Announcement

• Next week reading: Chapter 16

– Skip Chapter 15

(3)

Evaluating Relational

Operators

(4)

Outline

• Continue from Chapter 12 (Overview of

Query Evaluation) and discuss more

details on the different possible algorithms

to evaluate relational operators efficiently.

• Selection Operation

• Projection Operation

• Join Operation

• Set Operations

(5)

Relational Operations

• We will consider how to implement:

–

Selection

: Selects a subset of rows from re

lation.

–

Projection

: Deletes unwanted columns from r

elation.

–

Join:

Allows us to combine two relations.

–

Set-difference

: Tuples in reln. 1, but not

in reln. 2.

(6)

Schema for Examples

Sailors (sid: integer, sname: string, rating: integer, ag e: real) Reserves (sid: integer, bid: integer, day: dates, rname:

string)

• Similar to old schema; rname added for variations.

– Rname is the name of the agent making the reservation on beha lf of a sailor.

• Reserves:

– Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. • Sailors:

– Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

(7)

Selection Operation

SELECT * FROM Reserves R WHERE R.rname=‘Joe’

• Several alternative algorithms for implementing selection. • No algorithm is always better (disk I/O costs) than the ot

hers

• Depend on the file organizations (sorted, clustered index, etc.).

– No Index, Unsorted Data – No Index, Sorted Data – B+ Tree Index

– Hash Index

• Two types of selection conditions:

– Conjunctions (term_1 AND term_2)

(8)

No Index for Selections

SELECT * FROM Reserves R WHERE R.rname=‘Joe’

• Unsorted Data, no Index (on rname)

– Scan the file

– Cost is 1000 page I/Os.

• Sorted Data (on rname), no Index (on rname)

– Binary search to find the first qualifying tuple, the n scan to find remaining qualifying tuples.

– Cost of binary search is O(log₂ 1000) ~ 10 page I/Os.

– Cost of scanning is the number of pages containing qu alifying tuples.

(9)

Using B+ Tree Index for

Selections

• Cost depends on (#qualifying tuples), and (whether the index is clustered or not).

– _{Cost of finding qualifying data entries (typically small) pl}

us cost of retrieving tuples (could be large without cluster ing).

– Assume uniform distribution of names, about 10% of tuples qu

alify (100 pages, 10,000 tuples).

– _{With clustered index, cost is little more than 100 page I/Os}

.

– With unclustered index, cost can be up to 10,000 page I/Os. • _{Why? Each qualified tuple may be on a different page, worse than the}

file scan (1,000 page I/Os).

(10)

Refinement for Unclustered I

ndexes

1. Find qualifying data entries.

2. Sort the rid’s of the data tuples to be r

etrieved (e.g., by their pageNos)

RID = <pageNO, slotNo>

3. Fetch rids in order. This ensures that ea

ch data page is looked at just once.

• The cost is bounded by the number of page I

(11)

Using Hash Index for

Selections

• It is good only for equality selection condit

ions.

– It does not apply to range selections.

• The cost of using the hash index is a sum of

– Retrieving the bucket (overflow) page in the index requires only 1~2 page I/Os.

– Retrieving all qualified tuples requires up to (# qualified tuples) page I/Os, given that each quali fied tuple can be on different page.

– Can apply the refinement of sorting of rids on Pag eNo.

(12)

General Selections

• So far, have considered simple selection condition in the form of (R.rname=‘Joe’).

• A general selection condition is a boolean combination of

– (term_1 op term_2 … op term_n), where op can be AND | OR.

• Given a general selection condition:

(1) rewrite the general condition in conjunctive normal form (C NF):

(day < 8/9/02 AND rname = ‘Joe’) OR bid=5 OR sid=3

Can be transformed into

(day < 8/9/02 OR bid=5 OR sid=3) AND (rname=‘Joe’ OR bid=5 OR sid=3)

(2) Try to match conjunct with index (Chapter 12)

(13)

Two Approaches in

Evaluating Selections with

Conjunction

• First approach:

Find the

most selective access

path

, retrieve tuples using it, and apply any r

emaining terms that don’t

match

the index:

– Most selective access path: An index or file scan req uiring fewest page I/Os.

– Selectivity of a conjunct is proportional to the redu ction factor of applying this conjunct to the tuples.

– Consider day<8/9/94 AND bid=5 AND sid=3_. A B+ tree in

dex on day can be used; then, bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day<8/9/94 must th en be checked.

(14)

Intersection of Rids

• Second approach:

(if we have 2 or more matchi

ng indexes that use Alternatives (2) or (3) f

or data entries):

– Get sets of rids of data tuples using each matchin g index.

– Then intersect these sets of rids

– Retrieve the tuples and apply any remaining terms. – (Intersection on rids, not on the tuples)

•

Example:

day<8/9/94

AND bid=5 AND

sid=3.

– Assume a B+ tree index on day and an index on sid, both using Alternative (2)

(15)

Examples: what is the most

selective path?

Ex#1: (day < 8/9/02 OR rname=‘Joe’), index on rname

– Could check (rname=‘Joe’) using index, but (day < 8 /9/02) requires a file scan, so might as well as use file scan.

– File scan is the most selective path.

Ex#2: (day < 8/9/02 OR rname=‘Joe’) AND sid=3, indexes on rname and sid

– (sid = 3) is the most selective path, similar to appr oach #1.

Ex#3: (day < 8/9/02 OR rname=‘Joe’), indexes on day and rname

– Union of Rids matching (day < 8/9/02) and matching (r name=‘Joe’), similar to approach #2.

(16)

The Projection Operation

• A simple approach based on sorting:

(1) Scan R and produce a set of tuples that contain only the proje cted attributes

(2) External Merge-Sort this set of tuples.

(3) Scan the sorted result, comparing adjacent tuples to discard d uplicates.

• What is the cost?

– _{Step (1) requires reading1,000 page I/Os, and writing out (assum} ing that projected attributes is ¼ size of a original tuple) 250 page I/Os.

– Step (2) requires (assuming 20 buffer pages) 2 passes (1 + log_B-1 250/B). The cost is 2*2*250 = 1000 page I/Os.

– Step (3) requires 250 page I/Os. – _{Total cost is 2,500 page I/Os.}

SELECT DISTINCT

R.sid,

R.bid

(17)

The Projection Operation

• An improved approach on sorting:

– (Combine step 1 with step 2): modify Pass 0 of exte

rnal sort to eliminate unwanted fields. Also use r eplacement sort to produce runs of about 2B pages.

– (Combine step 2 with step 3): Modify merging passes

to eliminate duplicates.

– Cost: In Pass 0, read original relation (1000 page

I/Os), write out same number of smaller tuples (250 page I/Os). In the one merging pass, read the runs (250 page I/Os). The total cost is 1500 page I/Os (compared to 2500 page I/Os in the simple approach)

SELECT DISTINCT

R.sid,

R.bid

(18)

Projection Based on

Hashing

• The basic idea is that duplicate tuples will be hashed to the same bucket. So you can look at tuples in each bucket and eliminate dup licates.

• Partitioning phase: Read R using one input buffer. For each tuple , discard unwanted fields, apply hash function h1 to choose one of B-1 output buffers.

– _{Result is B-1 partitions (of tuples with no unwanted fields). 2 tuples from}

different partitions guaranteed to be distinct.

• Duplicate elimination phase: For each partition, read it and build an in-memory hash table, using hash fn h2 (<> h1) on all fields, wh ile discarding duplicates.

– Why hashing 2nd time? To minimizing collisions (two unique tuples hashing to

the same bucket).

• Cost: In partitioning phase, read 1000 pages I/Os and write out 250 page I/Os. In elimination phase, read 250 page I/Os. Total is 1500.

(19)

Discussion of Projection

• Sort-based approach is the standard, why?

– No need to worry if distribution of hash function is

uniform.

– Result is sorted.

• If an index on the relation contains all

projected attributes in its search key, can do

index-only

scan.

– Apply projection techniques to data entries (much

(20)

Equality Joins With One Join

Column

SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid

• R ∞ S is very common, so must be carefully optimize d.

• R X S is large; so, R X S followed by a selection is inefficient.

• Assume: M pages in R, p_R tuples per page, N pages in S, pS tuples per page.

– In our examples, R is Reserves and S is Sailors.

(21)

Two Classes of Algorithms to

Implement Join Operation

• Algorithms in class 1 require enumerating al

l tuples in the cross-product and discard tu

ples that do not meet the join condition.

– Simple Nested Loops Join

– Blocked Nested Loops Join

• Algorithms in class 2 avoid enumerating the

cross-product.

– Index Nested Loops Join

– Sort-Merge Join

(22)

Simple Nested Loops Join

foreach tuple r in R do

foreach tuple s in S do

if ri == sj then add <r, s> to result

• For each tuple in the

outer

relation R, sca

n the entire

inner

relation S (scan S total

of

p

_R

* M

times!).

– Cost: M + p_R * M * N = 1000 + 100*1000*500

I/Os => very huge.

•

How can we improve simple nested loops join

(23)

Page Oriented Nested Loops

Join

foreach page of R do

foreach page of S do

for all matching tuples r in R-block and

s in S-page, add <r, s> to result

• Cost: M + MN = 1000 + 1000500 = 501,000

=> still huge.

•

If smaller relation (S) is outer, cost = 50

(24)

Block Nested Loops Join

foreach block of B-2 pages of R do

foreach page of S do

for all matching tuples r in R-block and

s in S-page, add <r, s> to result

• Use one page as an input buffer for scanning

the inner S, one page as the output buffer,

and use all remaining pages to hold ``block

’’ of outer R.

– For each matching tuple r in R-block, s in S-page

, add <r, s> to result. Then read next R-block, scan S, etc.

(25)

Block Nested Loops Join:

Efficient Matching Pairs

• If B is large, it may be slow to find match

ing pairs between tuples in S-page and R-bl

ock (R-block has B-2 pages).

• The solution is to build a main-memory hash

table for R-block.

. . .

. . . R & S

Hash table for block of R

Input buffer for SOutput buffer

. . .

(26)

Examples of Block Nested

Loops

• Cost: Scan of outer + #outer blocks * scan of inner

– #outer blocks = ceiling (# of pages of outer / blocksize)

• With Reserves (R) as outer, and 102 buffer pages:

– Cost of scanning R is 1000 I/Os; a total of 10 _blocks. – _{Per block of R, scan Sailors (S); 10*500 I/Os.}

– Total cost = 1000 + 10 * 500 = 6000 page I/Os => huge impro

vement over page-oriented nested loops join.

• With 100-page block of Sailors as outer:

– _{Cost of scanning S is 500 I/Os; a total of 5 blocks.} – Per block of S, we scan Reserves; 5*1000 I/Os.

– Total cost = 500 + 5*1000 = 5500 page I/Os

• For blocked access (block I/Os are more efficient), i

t may be best to divide buffers evenly between R and S.

(27)

Index Nested Loops Join

foreach tuple r in R do

foreach tuple s in S where ri == sj do add <r, s> to result

• If there is an index on the join column of one rel ation (say S), can make it the inner and exploit t he index.

– Cost: M + ( (M*p_R) * cost of finding matching S tuples)

• For each R tuple, cost of probing S index is about 1.2 for hash index, 2-4 for B+ tree. Cost of find ing S tuples (assume Alt. (2) or (3) for data entr ies) depends on clustering.

– _{Clustered index: 1 I/O (typical)}

(28)

Examples of Index Nested

Loops

• Hash-index (Alt. 2) on sid of Sailors (as inner):

– _{Scan Reserves: 1000 page I/Os, 100*1000 tuples.}

– _{For each Reserves tuple: 1.2 I/Os to get data entry in index, plus 1 I/O to get}

(the exactly one) matching Sailors tuple.

– _{Total: 100 0+ 100,000 * 2.2 = 221,000 I/Os.}

• Hash-index (Alt. 2) on sid of Reserves (as inner):

– _{Scan Sailors: 500 page I/Os, 80*500 tuples.}

– _{For each Sailors tuple: 1.2 I/Os to find index page with data entries, plus cos}

t of retrieving matching Reserves tuples.

– _{Assume uniform distribution, 2.5 reservations per sailor (100,000 / 40,000). Co}

st of retrieving them is 1 or 2.5 I/Os depending on whether the index is cluste red.

– _{Total (Clustered): 500 + 40,000 * 2.2 = 88,500 I/Os.}

• _{Given choices, put the relation with higher # tuples as inner}

loop.

(29)

Sort-Merge Join

• Sort R and S on the join column [merge-sort], then scan them to do a ``merge’’ (on join col.) [scan-merge], and output result tuples.

• Scan-merge:

– _{Advance scan of R until current R-tuple >= current S tuple}

, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple.

– At this point, all R tuples with same value in Ri (current

R group) and all S tuples with same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.

(30)

Review External Merge Sort

(B=4)

8,7 6,2 9,4 3,4 5,6 3,12,1015,3 16,513,819,132,8 8,9 4,4 6,7 2,3 1,2 3,3 5,610,15 1,5 8,813,1619,32 1,1

Pass 0: Read four unsorted pages, sort

them, and write them out. Produce 3 runs

of 4 pages each.

Pass 1: Read three pages, one page from

each of 3 runs, merge them, and write them

2,2 3,3 …

merging

(31)

Scan-Merge

Gs Tr 1 1 2 3 4 5 6 2a 2b Ts 3a 3b 5a 5b 6a 6b

(32)

Cost of Merge-Sort Join

• R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group ar e likely to find needed pages in buffer.)

• Cost: 2 M log M + 2 N log N + (M+N)

– Assume enough buffer pages to sort both Reserves and Sail

ors in 2 passes

– The cost of merge-sorting two relations is 2 M log M + 2

N log N.

– The cost of scan-merge two sorted relations is M+N.

– Total join cost: 2*2*1000 + 2*2*500 + 1000 + 500 = 7500 p

age I/Os.

(33)

Refinement of Sort-Merge

Join

• Combine the merging phase in

sorting

with the

scan-merge for the join.

– Allocate one buffer space for each run (in the mer ge pass) in R & S.

– Buffer size B > squar_root(L), where _Lis the siz e of the larger relation. Why?

• # runs = 2(L/2B) = L/B < B [replacement sort]

– Cost: read+write each relation in Pass 0 + read ea

ch relation in (only) merging pass (not counting the writing of result tuples).

– Cost goes down from 7500 to 4500 I/Os L

R S

scan-merge sort-merge

(34)

Hash Join

• Hash both relations on the join attribute u

sing the same hash function h.

– Tuples in R-partition_i (bucket) can only match with tuples in S-partition_i.

• For i=1..k, check for matching pairs in R-p

artition_i and S-partition_i.

• For efficient matching pairs, apply hashing

to tuples of R-partition using another hash

function h2.

(35)

Hash-Join

• Partition both rel ations using hash fn h: R tuples in partition i will o nly match S tuples in partition i. • Read in a partitio n of R, hash it us ing h2 (<> h!). Sc an matching partit ion of S, search f or matches. Partitions of R & S Input buffer

Hash table for partition Ri Output buffer Join Result hash fn h2 h2

B main memory buffers _Disk Disk Original Relation OUTPUT 2 INPUT 1 hash function h _B-1 Partitions 1 2 B-1

. . .

(36)

Observations on Hash-Join

• #partitions k < B-1 (need one buffer page for reading), and B-2 > size of largest partition to be held in memory .

• Assume uniformly sized partitions, and maximize k, we ge t:

– k= B-1, and B-2 > M/(B-1), i.e., B must be > square_root(M)

• If we build an in-memory hash table to speed up the matc hing of tuples, a little more memory is needed.

• If the hash function does not partition uniformly, one o r more R partitions may not fit in memory. Can apply ha sh-join technique recursively to do the join of this R-p artition with corresponding S-partition.

(37)

Cost of Hash-Join

• In partitioning phase, read+write both relns;

2 (M+N).

In matching phase, read both relns;

M+N

I/Os.

• In our running example, this is a total of 4500

I/Os.

• Sort-Merge Join vs. Hash Join:

– Same amount of buffer pages. – Same cost of 3(M+N) I/Os.

– Hash Join shown to be highly parallelizable.

– Sort-Merge less sensitive to data skew; result is sor

(38)

General Join Conditions

• So far, we have only discussed single equality join con dition.

• How about equalities over several attributes? (e.g., R .sid=S.sid AND R.rname=S.sname):

– _{For Index Nested Loops join, build index on} _{<sid, sname> on R}

(if R is inner); or use existing indexes on sid or sname.

– For Sort-Merge and Hash Join, sort/partition on combination o

f the two join columns.

• How about inequality conditions? (e.g., R.rname < S.sna me):

– For Index Nested Loops join, need (clustered!) B+ tree index. – _{Hash Join, Sort Merge Join not applicable.}

– _{Block Nested Loops join, quite likely to be the best join met}

(39)

Set Operations

• Intersection and cross-product are special cases of join.

– Intersection is equality on all fields in join condition. None f or cross-product.

• Union (Distinct) and Except similar; we’ll do union.

• Sorting based approach to union:

– _{Sort both relations (on combination of all attributes).} – Scan sorted relations and merge them.

– _Alternative_{: Merge runs from Pass 0 for}_both_relations.

• Hash based approach to union:

– _{Partition R and S using hash function}_h_.

– _{For each S-partition, build in-memory hash table (using}_h2_{), scan} corr. R-partition and add tuples to table while discarding duplic ates.

(40)

Aggregate Operations (

AVG,

MIN

, etc.)

• Without grouping:

– _{In general, requires scanning the relation and maintain s}

ome running information (sum of age & number of sailors).

– Given index whose search key includes all attributes in t

he SELECT or WHERE clauses, can do index-only scan. • With grouping:

– Sort on group-by attributes (rating), then scan relation

and compute aggregate for each group.

– Similar approach based on hashing on rating.

– Given tree index whose search key includes all attributes

in SELECT, WHERE and GROUP BY clauses, can do index-only sca

n; if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order.

SELECT AVG(S.age) FROM Sailors S

(41)

Impact of Buffering

• If several operations are executing concurrently,

they share buffer pool; therefore it is difficult to

estimate the number of available buffer pages.

• Repeated access patterns interact with buffer

replacement policy.

– e.g., Inner relation is scanned repeatedly in Simple

Nested Loop Join. With enough buffer pages to hold inner, replacement policy does not matter. Otherwise, replacement policy is critical. MRU is best, LRU is

worst (sequential flooding).