Database Systems (資料庫系統) Lecture #10

(1)

December 5, 2005

Lecture #10

(2)

Announcement

• The submission deadlines for Assignment #4 & Practicum #2 are extended to Wednesday.

• Assignment #5 will be out on the course homepage.

– It is due in two weeks.

– TA will explain assignment #5 at the end

(3)

Ubicomp Project of the Week

• Cellular Squirrel (MIT Media Lab)

• Find cell phones ringing during a meeting disruptive?

– But you don’t want to lose the calls

(4)

Overview of Query Evaluation

(5)

Outline

• Query evaluation (Overview)

• Relational Operator Evaluation Algorithms

(Overview)

• Statistics and Catalogs

• Query Optimization (Overview)

• Example

(6)

Tables

Sailors(sid, sname, rating, age)

Reserves(sid, bid, day, rname)

(7)

Overview of Query Evaluation

(1)

• Given a SQL query, we would like to find an efficient plan (minimal number of disk I/Os) to evaluate it.

• What are the general steps of SQL query evaluation?

• Step 1: a query is translated into a relational algebra tree

– σ (selection), π (projection), and ⋈ (join)

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

RA tree (bottom to top): Reserves, Sailors → ⋈_{sid=sid} → σ_{bid=100 ^ rating>5} → π_{sname}

(8)

Overview of Query Evaluation

(2)

• Step 2: Find a good evaluation plan (Query Optimization)

– Estimate costs for several alternative equivalent evaluation plans.

– Different orders of evaluation give different costs (e.g., push selection (bid=100) before join)

• How does it affect the cost (assume join is computed by cross-product + selection)?

RA tree (bottom to top): Reserves, Sailors → ⋈_{sid=sid} → σ_{bid=100 ^ rating>5} → π_{sname}

(9)

Overview of Query Evaluation

(3)

• (Step 2, continued)

– An evaluation plan consists of a choice of access method and evaluation algorithm.

– Select an access method to retrieve records for each table on the tree (e.g., file scan, index search, etc.)

– Select an evaluation algorithm for each relational operator on the tree (e.g., index nested loop join, sort-merge join, etc.)

Plan (bottom to top): Reserves (file scan), Sailors (file scan) → ⋈_{sid=sid} (index nested loop) → σ_{bid=100 ^ rating>5} (on-the-fly) → π_{sname} (on-the-fly)

(10)

Overview of Query Evaluation

(4)

• Two main issues in query optimization:

– For a given query, what plans are considered?

• Consider a few plans (enumerating all of them is too expensive), and find the one with the cheapest (estimated) cost.

– How is the cost of a plan estimated?

• Examine the catalog tables, which hold data schemas and statistics.

• System-wide factors can also affect cost, such as the size of the buffer pool and the buffer replacement algorithm.

(11)

Statistics and Catalogs

• Need information about the relations and indexes involved. Catalogs typically contain at least:

– # tuples (NTuples) and # pages (NPages) for each table.

– # distinct key values (NKeys) and NPages for each index.

– Index height, low/high key values (Low/High) for each tree index.

• How are they used to estimate the cost? Consider:

– Reserves ⋈_{reserves.sid = sailors.sid} Sailors (assume simple nested loop join)

Foreach tuple r in reserves
  Foreach tuple s in sailors
    If (r.sid = s.sid) then add <r,s> to the results

– Sailors ⋈ (σ_{bid = 10} Reserves)
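The simple nested loop join above can be sketched in Python as follows. Lists of dicts stand in for the tables, and the cost function uses the common textbook estimate for a tuple-at-a-time nested loop (scan the outer table once, then scan the inner table once per outer tuple). The function names and the sample NPages/NTuples values are assumptions made for illustration.

```python
def simple_nested_loop_join(reserves, sailors):
    """Pair every Reserves tuple with every Sailors tuple that has the same sid."""
    result = []
    for r in reserves:            # outer table
        for s in sailors:         # inner table, rescanned for each outer tuple
            if r["sid"] == s["sid"]:
                result.append((r, s))
    return result


def simple_nested_loop_cost(npages_outer, ntuples_outer, npages_inner):
    """Estimated page I/Os: scan outer once, scan inner once per outer tuple."""
    return npages_outer + ntuples_outer * npages_inner


# Hypothetical catalog statistics, chosen only for illustration.
cost = simple_nested_loop_cost(npages_outer=1000, ntuples_outer=100_000, npages_inner=500)
```

With these assumed statistics the estimate is dominated by the repeated scans of the inner table, which is why the catalog numbers matter when comparing plans.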

(12)

Statistics and Catalogs

• Catalogs are updated periodically.

– Update whenever lots of data changes; there is lots of approximation anyway, so slight inconsistency is ok.

• More detailed information (e.g., histograms of the values in some fields) is sometimes stored.

– It can be used to estimate # tuples matching certain conditions (e.g., bid > 5)
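As a rough sketch of how a histogram could feed such an estimate, the function below sums the buckets that lie above the constant and takes a proportional share of the bucket that contains it. The bucket layout, the counts, and the assumption that values are uniform within a bucket are all illustrative and not taken from the lecture.

```python
def estimate_greater_than(histogram, value):
    """Estimate # tuples with field > value from (low, high, count) buckets.

    Assumes values are spread uniformly inside each bucket [low, high).
    """
    estimate = 0.0
    for low, high, count in histogram:
        if value <= low:
            estimate += count                                   # whole bucket qualifies
        elif value < high:
            estimate += count * (high - value) / (high - low)   # partial bucket
    return estimate


# Hypothetical histogram on Reserves.bid: (low, high, tuple count).
bid_histogram = [(0, 10, 100), (10, 20, 50)]
matching = estimate_greater_than(bid_histogram, 5)
```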

(13)

Relational Operator Evaluation

• There are several alternative algorithms for implementing each relational operator (selection, projection, join, etc.).

• No algorithm is always better (in disk I/O cost) than the others. It depends on the following factors:

– Sizes of the tables involved

– Existing indexes and sort orders

– Size of buffer pool (buffer replacement policy)

• Describe (1) common techniques for relational operator algorithms, (2) access paths, and (3) details of these algorithms.

(14)

Some Common Techniques

• Algorithms for evaluating relational operators use some simple ideas repeatedly:

– Indexing: If a selection or join condition is specified (e.g., σ_{bid = 10} Reserves), use an index (<bid>) to retrieve the tuples that satisfy the condition.

– Iteration: Sometimes it is faster to scan all tuples even if there is an index (e.g., σ_{bid ≠ 10} Reserves, with bid = 1 .. 1000). Sometimes, scan the data entries in an index instead of the table itself (e.g., π_{bid} Reserves).

– Partitioning: By using sorting or hashing, partition the input tuples and replace an expensive operation by similar operations on smaller inputs (e.g., π_{sid, rname} Reserves).

(15)

Access Paths

• An access path is a method of retrieving tuples.

– Note that every relational operator takes one or two tables as its input.

– There are two possible methods: (1) file scan, or (2) index that matches a selection condition.

• Can we use an index for a selection condition? How does an index match a selection condition?

– A selection condition can be rewritten into Conjunctive Normal Form (CNF), i.e., a set of terms (conjuncts) connected by ^ (and).

• Example: (rname=‘Joe’) ^ (bid = 5) ^ (sid = 3)

– Intuitively, an index matches a conjunct if it can be used to retrieve (just) the tuples that satisfy the conjunct.

(16)

Access Paths for Tree Index

• A tree index matches conjuncts that involve only attributes in a prefix of its index search key.

– E.g., a tree index on <a, b, c> matches the selection conditions (a=5 ^ b=3) and (a=5 ^ b>6), but not (b=0).

– Entries of a tree index on <a, b, c> are ordered as: <a0, b0, c0>, <a0, b0, c1>, <a0, b0, c2>, …, <a0, b1, c0>, <a0, b1, c1>, …, <a1, b0, c0>, …

– Can match range condition (a=5 ^ b>3).

– How about (a=5 ^ b=3 ^ c=2 ^ d=1)?

– (a=5 ^ b=3 ^ c=2) are called the primary conjuncts. Use the index to get tuples satisfying the primary conjuncts, then check the remaining condition (d=1) for each retrieved tuple.

– How about two indexes <a,b> & <c,d>?

– Many access paths: (1) use <a,b> index, (2) use <c,d> index, …
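The prefix rule for tree indexes can be illustrated with a small Python sketch. It looks only at which attributes appear in the condition and ignores the comparison operators, which is a simplification; the function names are hypothetical.

```python
def split_conjuncts(index_key, conjunct_attrs):
    """Split conjunct attributes into (primary, remaining) for a tree index.

    Primary conjuncts are those on the longest prefix of the search key
    whose attributes all appear in the selection condition.
    """
    primary = []
    for attr in index_key:
        if attr not in conjunct_attrs:
            break                      # the prefix ends at the first missing attribute
        primary.append(attr)
    remaining = [a for a in conjunct_attrs if a not in primary]
    return primary, remaining


def tree_index_matches(index_key, conjunct_attrs):
    """The index is usable if at least one conjunct falls on a key prefix."""
    primary, _ = split_conjuncts(index_key, conjunct_attrs)
    return len(primary) > 0
```

For the condition (a=5 ^ b=3 ^ c=2 ^ d=1) and an index on <a, b, c>, the split returns a, b, c as primary conjuncts and d as the condition left to check on each retrieved tuple.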

(17)

Access Paths for Hash Index

• A hash index matches a conjunction that has a term attribute = value for every attribute in the search key of the index.

– E.g., a hash index on <a, b, c> matches (a=5 ^ b=3 ^ c=5), but it does not match (b=3), (a=5 ^ b=3), or (a>5 ^ b=3 ^ c=5).

• Compared to a tree index:

– Cannot match range conditions.
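The hash index rule is stricter than the tree index rule, and a short Python check makes the difference concrete. Representing each term as an (attribute, operator, value) triple is an assumption for this sketch, as is the function name.

```python
def hash_index_matches(index_key, conjuncts):
    """Return True if every search-key attribute has an equality term.

    conjuncts: list of (attribute, operator, value) terms joined by AND.
    """
    equality_attrs = {attr for attr, op, _ in conjuncts if op == "="}
    return all(attr in equality_attrs for attr in index_key)


key = ("a", "b", "c")
full_equality = [("a", "=", 5), ("b", "=", 3), ("c", "=", 5)]
with_range = [("a", ">", 5), ("b", "=", 3), ("c", "=", 5)]
```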

(18)

A Note on Complex Selections

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3

• Selection conditions are first converted to conjunctive normal form (CNF):

(day<8/9/94 OR bid=5 OR sid=3) AND (rname=‘Paul’ OR bid=5 OR sid=3)

• We only discuss the case with no ORs; see the text if you are curious about the general case.
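One way to convince yourself that the CNF form is equivalent to the original condition is to compare the two over every truth assignment. In the sketch below each atomic term (day<8/9/94, rname=‘Paul’, bid=5, sid=3) is replaced by a boolean variable; the function names are made up for the example.

```python
from itertools import product


def original(day_lt, rname_paul, bid5, sid3):
    """(day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3"""
    return (day_lt and rname_paul) or bid5 or sid3


def cnf(day_lt, rname_paul, bid5, sid3):
    """(day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3)"""
    return (day_lt or bid5 or sid3) and (rname_paul or bid5 or sid3)


def equivalent(f, g, n_vars):
    """Check that f and g agree on all 2**n_vars truth assignments."""
    return all(f(*vals) == g(*vals) for vals in product([False, True], repeat=n_vars))
```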

(19)

Selectivity of Access Paths

• Selectivity of an access path is the number of page I/Os needed to retrieve the tuples satisfying the desired condition.

– Obviously, we want to use the most selective access path (with the fewest page I/Os).

• Possible access paths for selections:

– Use an index that matches the selection condition.

– Scan the file records.

– Scan the index (e.g., π_{bid} Reserves, index on bid)

• An access path using an index may not be the most selective!

(20)

Selection

1. Find the most selective access path

2. Retrieve tuples using it

3. Apply any remaining terms that don’t match the index

• Consider σ_{day<8/9/94 ^ bid=5 ^ sid=3}.

– A B+ tree index on day can be used; then, (bid=5 ^ sid=3) must be checked for each retrieved tuple.

– A hash index on <bid, sid> could be used; day<8/9/94 must then be checked.

SELECT *
FROM Reserves R

(21)

Example

• Use the following example to estimate the page I/O cost of different algorithms.

• Sailors(sid:integer, sname:string, rating:integer, age:real)

– Each Sailors tuple is 50 bytes long.

– A page is 4KB. It can hold 80 Sailors tuples.

– We have 500 pages of Sailors (total 40,000 Sailors tuples).

• Reserves(sid:integer, bid:integer, day:dates, rname:string)

– Each Reserves tuple is 40 bytes long.

– A page is 4KB. It can hold 100 Reserves tuples.

(22)

Reduction Factor & Catalog Stats

• Reduction factor is the fraction of tuples in the table that satisfy a given conjunct.

• Example #1:

– Index H on Reserves with search key <bid>

– Selection condition (bid=5)

– Stats from Catalog: NKeys(H) = # of distinct key values = 10

– Reduction factor = 1 / NKeys(H) = 1/10

• Example #2:

– Index on Reserves <bid, sid> (not a key, no stats on them)

– Selection condition (bid=5 ^ sid=3)

– Typically, can use a default fraction of 1/10 for each conjunct.

(23)

More on Reduction Factor

• Example #3:

– Range condition such as (day > 9/1/2002)

– Index T with search key day

– Stats from Catalog: High(T) = highest day value, Low(T) = lowest day value.

– Reduction factor = (High(T) - value) / (High(T) - Low(T))

– Say: High(T) = 12/31/2002, Low(T) = 1/1/2002

– Reduction factor ≈ 1/3
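The reduction factor estimates from the examples can be worked through in a few lines of Python. The function names are hypothetical, and multiplying the per-conjunct factors assumes the conjuncts are independent, which is a common simplification rather than something stated on the slides.

```python
from datetime import date
from fractions import Fraction


def rf_equality(nkeys):
    """Equality on an indexed attribute: 1 / NKeys."""
    return Fraction(1, nkeys)


def rf_range_greater(value, low, high):
    """Range condition attr > value: (High - value) / (High - Low)."""
    return Fraction((high - value).days, (high - low).days)


def rf_conjunction(factors):
    """Combine per-conjunct factors, assuming the conjuncts are independent."""
    result = Fraction(1)
    for factor in factors:
        result *= factor
    return result


rf_bid = rf_equality(10)                                          # Example #1
rf_default = rf_conjunction([Fraction(1, 10), Fraction(1, 10)])   # Example #2
rf_day = rf_range_greater(date(2002, 9, 1), date(2002, 1, 1), date(2002, 12, 31))  # Example #3
```

The range estimate comes out as 121/364, which is where the approximate value of 1/3 comes from.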

(24)

Using an Index for Selections

• Cost depends on #matched tuples and clustering.

– Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large w/o clustering)

• Why large? Each matched entry could be on a different page.

– Assume uniform distribution, 10% of tuples matched (100 pages, 10,000 tuples).

• With a clustered index on <rid>, cost is little more than 100 I/Os;

• If unclustered, worst case is 10,000 I/Os.

• Faster to do file scan => 1,000 I/Os

SELECT * FROM Reserves R WHERE R.rid mod 10 = 1
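The comparison above can be reproduced with a small cost sketch. It ignores the (typically small) cost of finding the qualifying data entries in the index, and it takes the table size to be 1,000 pages and 100,000 tuples, as implied by the 10% figures; the function name is an assumption.

```python
import math
from fractions import Fraction


def selection_costs(npages, ntuples, reduction_factor):
    """Rough page I/O estimates for three access paths (index lookup cost ignored)."""
    matched_pages = math.ceil(npages * reduction_factor)
    matched_tuples = math.ceil(ntuples * reduction_factor)
    return {
        "clustered index": matched_pages,                 # matches sit on adjacent pages
        "unclustered index (worst case)": matched_tuples, # one page I/O per matching tuple
        "file scan": npages,                              # read every page once
    }


costs = selection_costs(npages=1000, ntuples=100_000, reduction_factor=Fraction(1, 10))
cheapest = min(costs, key=costs.get)
```

Under these numbers the unclustered index is ten times worse than simply scanning the file, which is why an index-based access path is not automatically the most selective one.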

(25)

Projection

• Projection drops columns not in the select attribute list.

• The expensive part is removing duplicates.

• If no duplicate removal,

– Simple iteration through table.

– Given index <sid, bid>, scan the index entries.

• If duplicate removal, use sorting (partitioning)

(1) Scan table to obtain <sid, bid> pairs

(2) Sort pairs based on <sid, bid>

(3) Scan the sorted list to remove adjacent duplicates.

SELECT DISTINCT R.sid, R.bid
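The three sorting steps for duplicate removal can be sketched in Python as below. This is an in-memory illustration only (a real system would use an external sort over pages), and the function name is hypothetical.

```python
def sort_project_distinct(rows, attrs):
    """Project rows onto attrs and remove duplicates by sorting."""
    # (1) Scan the table to obtain the projected pairs.
    pairs = [tuple(row[a] for a in attrs) for row in rows]
    # (2) Sort the pairs so that equal pairs become adjacent.
    pairs.sort()
    # (3) Scan the sorted list and keep a pair only if it differs from the previous one.
    result = []
    for pair in pairs:
        if not result or result[-1] != pair:
            result.append(pair)
    return result


reserves = [
    {"sid": 2, "bid": 101, "day": "d1"},
    {"sid": 1, "bid": 100, "day": "d2"},
    {"sid": 2, "bid": 101, "day": "d3"},
]
distinct_pairs = sort_project_distinct(reserves, ["sid", "bid"])
```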

(26)

More on Projection

• Some optimization by combining sorting with projection (more on sorting in Chapter 13)

• Hash-based projection:

– Hash on <sid, bid> (#buckets = #buffer pages).

– Load buckets into memory one at a time and eliminate duplicates.
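A minimal in-memory sketch of hash-based projection follows. The bucket count stands in for the number of buffer pages, and Python's built-in `hash` plays the role of the partitioning hash function; both choices, and the function name, are assumptions for illustration.

```python
def hash_project_distinct(rows, attrs, n_buckets=4):
    """Project rows onto attrs and remove duplicates by hash partitioning."""
    # Partition phase: equal pairs always land in the same bucket.
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        key = tuple(row[a] for a in attrs)
        buckets[hash(key) % n_buckets].append(key)
    # Duplicate elimination phase: handle one bucket at a time.
    result = []
    for bucket in buckets:
        result.extend(set(bucket))
    return result


reserves = [
    {"sid": 2, "bid": 101},
    {"sid": 1, "bid": 100},
    {"sid": 2, "bid": 101},
    {"sid": 3, "bid": 100},
]
distinct_pairs = hash_project_distinct(reserves, ["sid", "bid"])
```

Because duplicates can only occur within a bucket, each bucket can be deduplicated independently, which is what lets the real algorithm work on one memory-sized partition at a time.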

(27)

Join: Index Nested Loops

Reserves (R) ⋈ Sailors (S)

• There exists an index <sid> for Sailors.

• Index Nested Loops: Scan R; for each tuple in R, use the index to find matching tuples in S.

foreach tuple r in R do
  foreach tuple s in S where r.sid = s.sid do
    add <r, s> to result

• Say we have an unclustered hash index <sid> on Sailors. What is the cost of the join operation?

– Scan R = 1000 I/Os

– R has 100,000 tuples. For each R tuple, retrieve index page (1.2 I/Os on average for hashing) and data page (1 I/O).

– Total cost = 1,000 + 100,000 * (1.2 + 1) = 221,000 I/Os.
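As an illustration of index nested loops, the sketch below uses a Python dict as a stand-in for the hash index on Sailors.sid and repeats the cost arithmetic from the slide. The helper names are hypothetical, and the in-memory dict says nothing about real page I/O; only the cost function models that.

```python
from collections import defaultdict


def build_hash_index(rows, key):
    """Stand-in for a hash index: maps a key value to the rows holding it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop_join(r_rows, s_index, key):
    """Scan R; for each R tuple, probe the index on S instead of scanning S."""
    result = []
    for r in r_rows:
        for s in s_index.get(r[key], []):
            result.append((r, s))
    return result


def index_nested_loop_cost(npages_r, ntuples_r, index_ios, data_ios):
    """Scan R once, then pay one index probe plus one record fetch per R tuple."""
    return npages_r + ntuples_r * (index_ios + data_ios)


cost = index_nested_loop_cost(npages_r=1000, ntuples_r=100_000, index_ios=1.2, data_ios=1)
```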

(28)

Join: Sort-Merge

• It does not use any index.

• Sort R and S on the join column

• Scan sorted lists to find matches, like a merge on join column

• Output resulting tuples.

• More details about Sort-Merge in Chapter 14.
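The sort and merge steps can be sketched in Python as follows. This is an in-memory version that uses the built-in sort rather than an external merge sort, and it handles repeated join values on either side; the function name is an assumption.

```python
def sort_merge_join(r_rows, s_rows, key):
    """Join two lists of dicts on `key` by sorting both and merging."""
    r_sorted = sorted(r_rows, key=lambda t: t[key])
    s_sorted = sorted(s_rows, key=lambda t: t[key])
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][key], s_sorted[j][key]
        if rk < sk:
            i += 1                       # advance the side with the smaller key
        elif rk > sk:
            j += 1
        else:
            # Find the end of the group of S tuples sharing this key.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key] == rk:
                j_end += 1
            # Pair every R tuple with this key against the whole S group.
            while i < len(r_sorted) and r_sorted[i][key] == rk:
                for s in s_sorted[j:j_end]:
                    result.append((r_sorted[i], s))
                i += 1
            j = j_end
    return result
```

The output comes out ordered on the join column, which is a side benefit of this method.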

(29)

Example of Sort-Merge Join

• The cost of a merge sort is 2*M*log_{B-1} M, where M = #pages and B = size of buffer.

– B-1 is quite large, so log_{B-1} M is generally just 2.

• Total cost = cost of sorting R & S + cost of merging = 2*2*(1000+500) + (1000+500) = 7500 I/Os (a lot less than index nested loops join!)
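The arithmetic can be checked with a short Python sketch. It takes the number of sort passes as a parameter and defaults it to 2, following the rule of thumb above; the function names are hypothetical.

```python
def external_sort_cost(npages, passes=2):
    """Each pass reads and writes every page once: 2 * M * passes."""
    return 2 * npages * passes


def sort_merge_join_cost(npages_r, npages_s, passes=2):
    """Sort both inputs, then read each sorted input once during the merge."""
    sorting = external_sort_cost(npages_r, passes) + external_sort_cost(npages_s, passes)
    merging = npages_r + npages_s
    return sorting + merging


total = sort_merge_join_cost(npages_r=1000, npages_s=500)
```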

(30)

Index Nested Loop Join vs. Sort-Merge Join

Reserves (R) ⋈ Sailors (S)

• Sort-merge join does not require a pre-existing index, and …

– Performs better than index nested loop join.

– Resulting tuples are sorted on sid.

• Index nested loop join has a nice property: incremental.

– The cost is proportional to the number of Reserves tuples.

• Cost = #tuples(R) * Cost(accessing index + record)

– Say we have a very selective condition on Reserves tuples.

• Additional selection: R.bid=101

• Cost is small for index nested loop, but large for sort-merge join (sorting Sailors & merging)

– Example of considering the cost of a query (including the selection) as a whole, rather than just the join operation → query optimization.

(31)

Other Relational Operators

• Discussed simple evaluation algorithms for

– Projection, Selection, and Join

• How about other relational operators?

– Set-union, set-intersection, set-difference, etc.

– The expensive part is duplicate elimination (same as in projection).

– How to do R1 ∪ R2 ?

• SQL aggregation (group-by, max, min, etc.)

– Group-by is typically implemented through sorting (without search index on group-by attribute).

– Aggregation operators are implemented using counters as tuples are retrieved.
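A small Python sketch of group-by through sorting, with running counters per group, might look like the following. The function name and the choice of COUNT and MAX as the aggregates are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_by_count_max(rows, group_attr, value_attr):
    """Sort on the grouping attribute, then keep running counters per group."""
    result = {}
    ordered = sorted(rows, key=itemgetter(group_attr))   # brings each group together
    for key, group in groupby(ordered, key=itemgetter(group_attr)):
        count, maximum = 0, None
        for row in group:                                # update counters tuple by tuple
            count += 1
            if maximum is None or row[value_attr] > maximum:
                maximum = row[value_attr]
        result[key] = (count, maximum)
    return result


reserves = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 101}, {"sid": 1, "bid": 103}]
summary = group_by_count_max(reserves, "sid", "bid")
```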

(32)

Query Optimization

• Find a good plan for an entire query consisting of many relational operators.

– So far, we have seen evaluation algorithms for each relational operator.

• Query optimization has two basic steps:

– Enumerate alternative plans for evaluating the query

– Estimate the cost of each enumerated plan & choose the one with the lowest estimated cost.

• Consult the catalog for estimation.

• Estimate size of result for each operation in tree (reduction factor).

(33)

Motivating Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

RA Tree (bottom to top): Reserves, Sailors → ⋈_{sid=sid} → σ_{bid=100 ^ rating>5} → π_{sname}

Plan (bottom to top): Reserves (file scan), Sailors (file scan) → ⋈_{sid=sid} (block nested loop) → σ_{bid=100 ^ rating>5} (on-the-fly) → π_{sname} (on-the-fly)

• Cost: 1000+1000*500 I/Os

• Misses several opportunities: selections could have been ‘pushed’ earlier, no use is made of any available indexes, etc.

• Goal of optimization: To find more efficient plans that compute the same answer.

(34)

Pipeline Evaluation

• How to avoid the cost of physically writing out intermediate results between operators?

– Example: between join and selection

– Use pipelining: each join tuple (produced by the join) is (1) checked against the selection condition, and (2) projected on the fly, before being physically written out (materialized).

Plan (bottom to top): Reserves (file scan), Sailors (file scan) → ⋈_{sid=sid} (simple nested loop) → σ_{bid=100 ^ rating>5} (on-the-fly) → π_{sname} (on-the-fly)
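Python generators give a convenient way to picture pipelining: each operator pulls one tuple at a time from the operator below it, so the join result is never stored as a whole. The sketch below is only an analogy for the plan above, with hypothetical function names and in-memory lists standing in for file scans.

```python
def nested_loop_join(reserves, sailors):
    """Yield joined tuples one at a time instead of building a temp table."""
    for r in reserves:
        for s in sailors:
            if r["sid"] == s["sid"]:
                yield {**r, **s}


def select(tuples, predicate):
    """Pass along only the tuples that satisfy the predicate (on-the-fly)."""
    for t in tuples:
        if predicate(t):
            yield t


def project(tuples, attrs):
    """Keep only the requested attributes (on-the-fly)."""
    for t in tuples:
        yield tuple(t[a] for a in attrs)


def run_plan(reserves, sailors):
    joined = nested_loop_join(reserves, sailors)
    selected = select(joined, lambda t: t["bid"] == 100 and t["rating"] > 5)
    return project(selected, ["sname"])
```

Nothing is computed until the caller iterates over the result of `run_plan`, and each join tuple is filtered and projected before the next one is produced.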

(35)

Alternative Plan 1 (No Indexes)

Plan (bottom to top): Reserves → σ_{bid=100} (scan: write to temp T1) → π_{sid}; Sailors → σ_{rating>5} (scan: write to temp T2) → π_{sid, sname}; then ⋈_{sid=sid} (sort-merge join) → π_{sname} (on-the-fly)

• Main difference: push selects.

• With 5 buffers, cost of plan:

– Scan Reserves (1000 pages) for selection + write temp T1 (10 pages, if we have 100 boats, uniform distribution).

– Scan Sailors (500 pages) for selection + write temp T2 (250 pages, if we have 10 ratings, uniform distribution).

– Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250)

– Total: (1000 + 500 + 10 + 250) + (40 + 2000 + 260) = 4060 page I/Os.

• If we used BNL join, join cost = 10+4*250, total cost = 2770.

• If we ‘push’ projection, T1 has only sid, T2 only sid and sname:

– T1 fits in 3 pages, cost of BNL drops to under 250 pages, total <
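The cost figures for this plan can be re-derived with a few lines of Python. The block nested loop formula assumes that, out of the 5 buffers, one is reserved for the inner table and one for output, leaving 3 pages per block of the outer table; that assumption is consistent with the 10+4*250 figure but is not spelled out on the slide. The function name is hypothetical.

```python
import math


def bnl_join_cost(outer_pages, inner_pages, buffers):
    """Block nested loop: scan outer once, scan inner once per outer block."""
    block_size = buffers - 2                       # assumed: 1 buffer for inner, 1 for output
    blocks = math.ceil(outer_pages / block_size)
    return outer_pages + blocks * inner_pages


# Selections: scan both tables and write the temps T1 (10 pages) and T2 (250 pages).
scan_and_write = 1000 + 500 + 10 + 250

# Sort-merge join on the temps: sort T1, sort T2, then merge.
sort_merge = 2 * 2 * 10 + 2 * 4 * 250 + (10 + 250)
total_sort_merge = scan_and_write + sort_merge

# Same selections, but joining T1 and T2 with block nested loops instead.
total_bnl = scan_and_write + bnl_join_cost(outer_pages=10, inner_pages=250, buffers=5)
```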

(36)

Alternative Plan 2 With Indexes

Plan (bottom to top): Reserves → σ_{bid=100} (use hash index: do not write result to temp); ⋈_{sid=sid} with Sailors (index nested loops, with pipelining) → σ_{rating>5} (on-the-fly) → π_{sname} (on-the-fly)

• With a clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.

• Use pipelining (the join tuples are not materialized).

• Join column sid is a key for Sailors.

• Projecting out unnecessary fields from Sailors may not help.

– Why not? Need Sailors.sid for join.

• Do not push rating>5 before the join because,

– Want to use sid index on Sailors.

• Cost: Selection of Reserves tuples (10 I/Os); for each,

– Must get matching Sailors tuple (1000*1.