Database Systems (資料庫系統) Chapter 14

(1)

Database Systems

( 資料庫系統 )

December 6, 2004

Chapter 14

(2)

Announcement

• Next week reading: Chapter 15

• Assignment #9 will be out today.

(3)

Evaluating Relational Operators

(4)

Outline

• Continue from Chapter 12 (Overview of Query

Evaluation)

• Discuss more on the different evaluation algorithms for

relational operators:

– Selection – Projection – Join – Set Operations – Aggregate

(5)

Relational Operations

• We will consider how to implement:

– _Selection_{: Selects a subset of rows from relation.} – Projection: Deletes unwanted columns from relation. – _Join:_{Allows us to combine two relations.}

– Set-difference: Tuples in reln. 1, but not in reln. 2. – _Union_{: Tuples in reln. 1 and in reln. 2.}

– _Aggregation₍_{SUM, MIN}_{, etc.) and}_{GROUP BY}

• Since each op returns a relation, ops can be

composed

!

After we cover the operations, we will discuss how to opti

mize queries formed by composing them.

(6)

Schema for Examples

• Similar to old schema; rname added for variations.

– Rname is the name of the agent making the reservation on behalf of a sailor.

• Reserves:

– _{Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.}

• Sailors:

– Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

•

Again, we will look at cost measured in

# of disk I/Os

.

Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string)

(7)

Selection Operation

• Several alternative algorithms for selection

• No algorithm is always better (disk I/O costs) than the others,

depending on the file organizations (sorted, clustered index, etc.). • Consider four situations:

– No Index, Unsorted Data – No Index, Sorted Data – B+ Tree Index

– Hash Index

• Two types of selection conditions:

– Conjunctions (term_1 AND term_2)

SELECT *

FROM Reserves R WHERE R.rname=‘Joe’

(8)

No Index for Selections

• Unsorted Data, no Index (on rname)

– Algorithm: scan the file

– Cost: 1000 page I/Os

• Sorted Data (on rname), no Index (on rname)

– Algorithm: binary search to find the first qualifying tuple, then sca n to find remaining qualifying tuples.

– Cost of binary search is O(log2 1000) ~ 10 page I/Os.

– Cost of scanning: number of pages containing qualifying tuples.

SELECT *

FROM Reserves R WHERE R.rname=‘Joe’

(9)

Using B+ Tree Index for

Selections

• Cost depends on (#qualifying tuples), and (whether the inde

x is clustered or not).

– _{Cost of finding qualifying data entries (typically small) plus cost of ret}

rieving tuples (could be large without clustering).

– _{In example, assuming uniform distribution of names, about 10% of tu}

ples qualify (100 pages, 10,000 tuples).

– With clustered index, cost is little more than 100 page I/Os. – _{With unclustered index, cost can be up to 10,000 page I/Os!}

• _{Why? Each qualified tuple may be on a different page. This is worse th} an the naïve file scan (1,000 page I/Os).

(10)

Refinement for Unclustered Indexe

s

1. Find qualifying data entries.

2. Sort the rid’s of the data tuples to be retrieved (e.g., by their pageNo s)

RID = <pageNO, slotNo>

3. Fetch rids in order. This ensures that each data page is looked at ju st once.

• The cost is bounded by the number of page I/Os (no wor

se than file scan).

(11)

Using Hash Index for Selections

• Good only for equality selection conditions.

– Does not apply to range selections.

• The cost of using the hash index is a sum of

– Retrieving the bucket (overflow) page in the index requires only 1~2 page I/Os.

– Retrieving all qualified tuples requires up to (# qualified tuples) p age I/Os, given that each qualified tuple can be on different page .

(12)

General Selections

• So far, we have considered simple selection condition in th

e form of (

R.rname=‘Joe’

).

• A general selection condition is a boolean combination of

– (term_1 op term_2 … op term_n), where op can be AND | OR.

• Given a general selection condition:

(1) rewrite the general condition in conjunctive normal form (CNF):

(day < 8/9/02 AND rname = ‘Joe’) OR bid=5 OR sid=3 Can be transformed into

(day < 8/9/02 OR bid=5 OR sid=3) AND (rname=‘Joe’ OR bid=5 OR sid= 3)

(2) Try to match conjunct with index (Chapter 12)

(13)

Two Approaches in Evaluating

Selections with Conjunction

• First approach:

Find the

most selective access path

, retrie

ve tuples using it, and apply any remaining terms that don’

t

match

the index:

– Most selective access path: An index or file scan that we estimate

will require the fewest page I/Os.

– The selectivity of a conjunct is proportional to the reduction factor

of applying this conjunct to the tuples.

– Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on d

ay can be used; then, bid=5 and sid=3 must be checked for each r

etrieved tuple. Similarly, a hash index on <bid, sid> could be used ; day<8/9/94 must then be checked.

(14)

Intersection of Rids

• Second approach

(if we have 2 or more matching indexes th

at use Alternatives (2) or (3) for data entries):

– _{Get sets of rids of data tuples using each matching index.} – _Then_intersect_these_{sets of rids.}

– Retrieve the tuples and apply any remaining terms. – _{(Intersection on rids,}_not_{on the tuples)}

•

Example:

day<8/9/94 AND bid=5 AND sid=3

.

– If we have a B+ tree index on day and an index on sid, both using Alt

ernative (2), we can retrieve rids of tuples satisfying day<8/9/94 usin g the first, rids of recs satisfying sid=3 using the second, intersect, re trieve tuples and check bid=5.

(15)

Selections with Disjunction

• Consider the example #1:

(day < 8/9/02 OR rname=‘Joe’), index on rname

– We could check (rname=‘Joe’) using index, but evaluating (day < 8/9/02) requires a file scan, so might as well as checking for co ndition during file scan.

– File scan is the most selective path.

• Consider example #2:

(day < 8/9/02 OR rname=‘Joe’) AND sid=3, indexes on rname and sid

– (sid = 3) is the most selective path, similar to approach #1.

• Consider example #3:

(day < 8/9/02 OR rname=‘Joe’), indexes on day and rname

(16)

The Projection Operation

• A simple approach based on sorting:

(1) Scan R and produce a set of tuples that contain only the projected attributes

(2) External Merge-Sort this set of tuples.

(3) Scan the sorted result, comparing adjacent tuples to discard duplic ates.

•

What is the cost?

– Step (1) requires reading1,000 page I/Os, and writing out (assuming

that projected attributes is ¼ size of a original tuple) 250 page I/Os.

– Step (2) requires (assuming 20 buffer pages) 2 passes (1 + log_B-1 2

50/B). The cost is 2*2*250 = 1000 page I/Os.

– _{Step (3) requires 250 page I/Os.} – _{Total cost is 2,500 page I/Os.}

SELECT

DISTINCT

R.s

id, R.bid

(17)

The Projection Operation

• An improved approach on sorting:

– _{(Combine step 1 with step 2):}_{modify Pass 0 of external sort to elimi}

nate unwanted fields. Also use replacement sort to produce runs o f about 2B pages.

– (Combine step 2 with step 3): Modify merging passes to eliminate d

uplicates.

– Cost: In Pass 0, read original relation (1000 page I/Os), write out s

ame number of smaller tuples (250 page I/Os). In the one merging pass, read the runs (250 page I/Os). The total cost is 1500 page I/ Os (compared to 2500 page I/Os in the simple approach).

SELECT DISTINCT

R.sid, R.bid

(18)

Projection Based on Hashing

• The basic idea is that duplicate tuples will be hashed to the same bucke

t. So you can look at tuples in each bucket and eliminate duplicates.

• Partitioning phase: Read R using one input buffer. For each tuple, disc ard unwanted fields, apply hash function h1 to choose one of B-1 outpu t buffers.

– _{Result is B-1 partitions (of tuples with no unwanted fields). 2 tuples from diff}

erent partitions guaranteed to be distinct.

• Duplicate elimination phase: For each partition, read it and build an in-memory hash table, using hash fn h2 (<> h1) on all fields, while discardi ng duplicates.

– Why hashing 2nd time? Minimize collisions (two unique tuples hashing to the

same bucket).

• Cost: In partitioning phase, read 1000 pages I/Os and write out 250 page I/Os. In elimination phase, read 250 page I/Os. Total is 1500.

(19)

Discussion of Projection

• Sort-based approach is the standard, why?

– No need to worry if distribution of hash function is uniform. – Result is sorted.

• If an index on the relation contains all projected attributes in

its search key, can do

index-only

scan.

(20)

Equality Joins With One Join Column

• R ⋈ S is very common, so must be carefully optimized.

• R x S is large; so, R x S followed by a selection is ineffici

ent.

• Assume: M tuples in R, p

R

tuples per page, N tuples in S,

p

S

tuples per page.

– In our examples, R is Reserves and S is Sailors.

• We will consider more complex join conditions later.

• Cost metric

: # of page I/Os.

SELECT *

FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid

(21)

Two Classes of Algorithms to

Implement Join Operation

• Algorithms in class 1 require enumerating all tuples in th

e cross-product and discard tuples that do not meet the j

oin condition.

– Simple Nested Loops Join – Blocked Nested Loops Join

• Algorithms in class 2 avoid enumerating the cross-produ

ct.

– Index Nested Loops Join – Sort-Merge Join

(22)

Simple Nested Loops Join

• For each tuple in the outer relation R, scan the inner relation S

– Scan outer table R once, but inner table S pR * M times!

– Cost: M + p_R * M * N = 1000 + 100*1000*500 I/Os => very huge.

– _{Want to select the smaller table as the inner table}

• Simple improvement Page-oriented Nested Loops join

– For each page of R, get each page of S, and write out matching pairs of tu ples (scan S total of M times) => require 3 buffer pages

– Cost: M + M*N = 1000 + 1000*500 => still huge.

– If smaller relation (S) is outer, cost = 500 + 500*1000

• Given buffer pages, what improvement can you make? foreach tuple r in R do

foreach tuple s in S do

(23)

Block Nested Loops Join

• Use one page as an input buffer for scanning the inner S.

• Use one page as the output buffer.

• Use remaining (B-2) pages to hold ``block’’ of outer R.

– _{For each matching tuple r in R-block & s in S-page, add <r, s> to result.} – _{Continue with the next R-block.}

• How many times is the inner table (S) scanned?

– Ceiling ( M / (B-2) )

foreach block of B-2 pages of R do foreach page of S do

for all matching tuples r in R-block and s in S-page, add <r, s> to result

(24)

Block Nested Loops Join: Efficient

Matching Pairs

• Large B -> slow to match S-tuple with tuples in R-block (R-block has B-2 pages).

• Solution: build in-memory hash table for R-block.

. . .

R & S

Hash table for block of R

Input buffer for S Output buffer

. . .

(25)

Examples of Block Nested Loops

• Cost: Scan of outer + #outer blocks * scan of inner

• With Reserves (R) as outer, and 102 buffer pages:

– _{Cost of scanning R is 1000 I/Os; a total of 10 R-blocks.} – _{For each R-block, S is scanned once; 10*500 I/Os.}

– Total cost = 1000 + 10 * 500 = 6000 page I/Os => huge improvement over page-oriented nested loops join.

• With 100-page block of Sailors as outer:

– Cost of scanning S is 500 I/Os; a total of 5 blocks. – _{For each S-block, R is scanned once; 5*1000 I/Os.} – _{Total cost = 500 + 5*1000 = 5500 page I/Os}

• _{For blocked access (block I/Os are more efficient), it may be best to}

(26)

Index Nested Loops Join

• Block/simple nested loops joins do not use existing (or create) index. • If there is an index on the join column of one relation (say S), can mak

e it the inner and exploit the index.

– Cost: M + ( (M*pR) * cost of finding matching S tuples)

• For each R tuple, cost of probing S index is about 1.2 for hash index, 2-4 for B+ tree. Cost of then finding S tuples (assuming Alt. (2) or (3) f or data entries) depends on clustering.

– _{Clustered index: 1 I/O (typical) for all matching S tuples} – _{Unclustered index: up to 1 I/O per matching S tuple.}

foreach tuple r in R do

foreach tuple s in S where ri == sj do

(27)

Examples of Index Nested Loops

• Hash-index (Alt. 2) on sid of Sailors (as inner):

– _{Scan Reserves: 1000 page I/Os, 100*1000 tuples.}

– _{For each Reserves tuple: 1.2 I/Os to get data entry in index + 1 I/O to get (the ex}

actly one) matching Sailors tuple.

– _{Total: 1000 + 100,000 * 2.2 = 221,000 I/Os.}

• Hash-index (Alt. 2) on sid of Reserves (as inner): – _{Scan Sailors: 500 page I/Os, 80*500 tuples.}

– _{For each Sailors tuple: 1.2 I/Os to find index page with data entries + cost of retri}

eving matching Reserves tuples.

– _{Assuming uniform distribution, 2.5 reservations per sailor (100,000 / 40,000). Co}

st of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered.

– _{Total (Clustered): 500 + 40,000 * 2.2 = 88,500 I/Os.}

• _{Given choices, put the relation with higher # tuples as inner loop.} • _{Index Nested Loop performs better than simple nested loop.}

• _{How does it compare to block nested loop?} – _{Depend on the # matching tuples}

(28)

Sort-Merge Join

• Sort R and S on the join column, then scan them to do a ``m

erge’’ (on join col.), and output result tuples.

– _{Advance scan of R until current R-tuple >= current S tuple, then adv}

ance scan of S until current S-tuple >= current R tuple; do this until c urrent R tuple = current S tuple.

– _{At this point, all R tuples with same value in Ri (current R group) and}

all S tuples with same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.

(29)

Example of Sort-Merge Join

(Scan-Merge Two Sorted Relations)

sid bid day rname

28 103 12/4/96 guppy 28 103 11/3/96 yuppy 31 101 10/10/96 dustin 31 102 10/12/96 lubber Gs Tr 1 1 2 3 4 5 6 2a 2b Ts 3a 3b 5a 5b 6a 6b

(30)

Cost of Merge-Sort Join

• R is scanned once; each S group is scanned once per m

atching R tuple. (Multiple scans of an S group are likely t

o find needed pages in buffer.)

• Cost: 2 M log M + 2 N log N + (M+N)

– _{The cost of merge-sorting two relations is 2 M log M + 2 N log N}

.

– With 35, 100 or 300 buffer pages, both Reserves and Sailors ca

n be sorted in 2 passes

– _{The cost of scan-merge two sorted relations is M+N.}

– _{Total join cost: 2*2*1000 + 2*2*500 + 1000 + 500 = 7500 page I/}

(31)

Refinement of Sort-Merge Join

• We can combine the merging phases in the sorting of R an

d S with the merging required for the join.

– _{We need to have buffer size B > , where L is the size of the larg}

er relation. Why? It means that #runs after Pass 0 for both relation s will be < B (#runs is 2(L/2B) = L/B < B), then you can do the next t hing …

– _{Allocate 1 page per run of each relation, and `merge’ while checking}

the join condition.

– Cost: read+write each relation in Pass 0 + read each relation in (onl

y) merging pass (+ writing of result tuples).

– _{In example, cost goes down from 7500 to 4500 I/Os.}

L

(32)

Hash Join

• Hash both relations (R & S) on the join attribute using the s

ame hash function h.

– Create two hash tables called Partition R & Partition S

• Tuples in Partition R

i

only match tuples in Partition S

i

!

– Similar idea to sort merge join

• For each bucket i, locate matching tupes in Partition R

i

and

Partition S

i

– For efficient matching pairs, (1) hash tuples in Partition Ri using anot her hash function h2, and (2) for each tuple in Partition Si , use h2 to find matching R tuples.

(33)

Hash-Join

• Partition both relations u sing hash fn h: R tuples in partition i will only ma tch S tuples in partition i .

• Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matche s.

Partitions of R & S

Hash table for partition Ri Join Result hash fn h2 h2

B main memory buffers _Disk Disk Original Relation OUTPUT 2 INPUT 1 hash function h _B-1 Partitions 1 2 B-1

. . .

(34)

Observations on Hash-Join

• #partitions k < B-1

(need one buffer page for reading), and

B-2

> size of largest partition

to be held in memory. Assuming unif

ormly sized partitions, and maximizing k, we get:

– k= B-1, and M/(B-1) < B-2, i.e., B must be >

• If we build an in-memory hash table to speed up the matching

of tuples, a little more memory is needed.

• If the hash function does not partition uniformly, one or more R

partitions may not fit in memory. Can apply hash-join techniqu

e recursively to do the join of this R-partition with correspondin

g S-partition.

(35)

Cost of Hash-Join

• In partitioning phase, read+write both relns;

2(M+N)

. In m

atching phase, read both relns;

M+N

I/Os.

• In our running example, this is a total of 4500 I/Os.

• Sort-Merge Join vs. Hash Join:

– Same amount of buffer pages. – _{Same cost of}_3(M+N)_I/Os.

– Hash Join shown to be highly parallelizable.

(36)

General Join Conditions

• So far, we have only discussed single equality join condition.

• How about equalities over several attributes? (e.g., R.sid=S.sid AND R.

rname=S.sname):

– _{For Index Nested Loops join, build index on}_{<sid, sname> on R}_{(if R is inne} r); or use existing indexes on sid or sname.

– _{For Sort-Merge and Hash Join, sort/partition on combination of the two join} columns.

• How about inequality conditions? (e.g., R.rname < S.sname):

– _{For Index Nested Loops join, need (clustered!) B+ tree index.} – Hash Join not applicable.

(37)

Set Operations

• Intersection and cross-product are special cases of join.

– Intersection is equality on all fields in join condition. None for cross-prod uct.

• Union (Distinct) and Except similar; we’ll do union. • Sorting based approach to union:

– _{Sort both relations (on combination of all attributes).} – _{Scan sorted relations and merge them.}

• Hash based approach to union:

– _{Partition R and S using hash function h.}

– _{For each S-partition, build in-memory hash table (using h2), scan corr.} R-partition and add tuples to table while discarding duplicates.

(38)

Aggregate Operations (

AVG, MIN

,

etc.)

• Without grouping:

– _{In general, requires scanning the relation and maintain some running infor} mation (sum of age & number of sailors).

– Given index whose search key includes all attributes in the _SELECT or _WHE

RE clauses, can do index-only scan.

• With grouping:

– _{Sort on group-by attributes (rating), then scan relation and compute aggreg} ate for each group.

– Similar approach based on hashing on rating.

– _{Given tree index whose search key includes all attributes in}_{SELECT, WHER}

E and GROUP BY clauses, can do index-only scan; if group-by attributes for m prefix of search key, can retrieve data entries/tuples in group-by order.

SELECT AVG(S.age) FROM Sailors S

(39)

Impact of Buffering

• If several operations are executing concurrently, they share

buffer pool; therefore it is difficult to estimate the number of

available buffer pages.

• Repeated access patterns interact with buffer replacement

policy.

– _{e.g., Inner relation is scanned repeatedly in Simple Nested Loop}

Join. With enough buffer pages to hold inner, replacement policy does not matter. Otherwise, replacement policy is critical. MRU is best, LRU is worst (sequential flooding).

– Does replacement policy matter for Block Nested Loops? (No,