1
Database Systems
( 資料庫系統
)
October 17, 2005
Lecture #4
Course Administration
• Assignment #1 is due today.
• Assignment #2 is out on the home
webpage.
– It is due two weeks from today.
• Next week reading:
– Chapter 8: Overview of Storage and
Indexing
3
Vision 2010 (NTT
DoCoMo)
Reflection: DB design
• Step 1: Requirements Analysis
– What data to store in the database?
• Step 2: Conceptual Database Design
– Come up with the design: Entity-Relation (ER) model
– Sketch the design with entity-relationship diagrams
• Step 3: Logical Database Design
– Implement the design: relational data model
5
What’s next?
• How to ask questions about the
[relational] database?
– How much money in account XYZ?
– Who are valuable customers [∑ depost >
1M]?
• Two query languages
– Relational algebra [CH4] : Math Language
– SQL [CH5] : a Real Language
Relational Algebra
7
Relational Query
Languages
• What are query languages?
– For asking questions about the database
• Relational Algebra
– Mathematical Query Languages form
the basis for “real” languages (e.g.
SQL), and for implementation.
– A query is composed from a small set of
operators
Preliminaries
• A query is applied to table(s), and the
result of a query is also a table.
– Schema of input table(s) for a query is fixed. – Schema for the result of a given query is also
fixed! Determined by definition of query language constructs.
• Example:
– Find the names of sailors who have reserved
boat 103
9
Example Tables
R1S1
S2
• Sailors and Reserves are tables.
• Can refer to the fields by their
positions or names: • Assume names of
fields in the result table are inherited
from names of fields in input tables.
Relational Algebra
• Basic relational algebra operators:
– Selection (σ, pronounced sigma): Select a
subset of rows from a table.
– Projection (π): Delete unwanted columns
from a table.
– Cross-product ( X ): Combine two tables.
– Set-difference ( - ): Tuples in table 1, but not
in table 2.
– Union ( U ): Tuples in tables 1 or 2.
• Each operator can take one or two input table(s), and returns one table.
11
Relational Algebra (more)
• Additional relational algebra operators:
–
Intersection (∩)
: Tuples in tables 1 and 2.
–Join (∞): conditional cross product
–
Division (/):
–
Renaming
(p , pronounced “row”)
• Operations can be composed
to form a
very complex query
π
sid(
σ
age > 20Sailors
)
–
Relational Operators
• Projection
• Selection
• Union
• Intersection
• Set difference
• Cross product
• Rename
operator
• Join
• Division
13
Projection
)
2
(
,
rating
S
sname
age S
( )
2
• Delete attributes not
in projection list.
• Duplicates eliminated
S 2
Selection
rating
S
8
( )
2
sname rating
rating
S
,
(
8
( ))
2
• Selects rows satisfying
selection condition
.
– with duplicate removal
• Result table can be fed
into other operations
15
Relational Operators
• Projection
• Selection
• Union
• Intersection
• Set difference
• Cross product
• Rename
operator
• Join
• Division
Union
• Take two input tables, which must be union-compatible:
– Same number of fields. – `Corresponding’ fields
have the same type. • What is the schema of
result?
S1S2 S1
17
Intersection
S
1
S
2
Set-Difference
S S
1
2
19
Relational Operators
• Projection
• Selection
• Union
• Intersection
• Set difference
• Cross product
• Rename
operator
• Join
• Division
20
Cross-Product
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1, with field names `inherited’ if possible.
– Conflict: Both S1 and R1 have a field called sid.
Renaming
R1 S1 S1 x R1)
1
1
),
2
5
,
1
1
(
(
C
sid
sid
S
R
21
Condition Joins
• Cross-product, followed by a selection
• Result schema same as that of cross-product.
• Fewer tuples than cross-product, reduce tuples not meeting the condition.
R
c S
c R S
(
)
S
R
S sid R sid
1
1
1
1
.
.
sid bid day 22 101 10/10/96 58 103 11/12/96
R
Equi-Joins
• A special case of condition join where
the condition c contains only equalities.
• Result schema
similar to cross-product,
but only
one copy of fields
for which
equality is specified.
• Natural Join ( ):
Equi-join on all
common fields.
S1sid R1
sid bid day 22 101 10/10/96 58 103 11/12/96 R
23
Relational Operators
• Projection
• Selection
• Union
• Intersection
• Set difference
• Cross product
• Rename
operator
• Join
• Division
Division
• Reserves(sailor_name, boat_id); Boats
(boat_id)
– Useful for expressing queries like:
Find sailors who have reserved all boats => Reserves / Boats
• Let A have 2 fields, x and y; B have only
field y:
–
A/B
=– A[xy]/B[y] contains all x tuples (sailor_name)
such that for every y tuple (boat_id) in B, there is an xy tuple in A.
25
Examples of Division A/B
A
B1
B2
B3
Expressing A/B Using Basic
Operators
• Idea:
For A/B, compute all x values that
are not
disqualified
by some y value in B.
– x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A. – 1) Iterate through each x value
– 2) Check: combined with each y value, xy in A? If not, disqualify.
•
Disqualified x values:
A/B =
x
((
x A B A
( )
)
)
x A
( )
all disqualified tuples
27
10 Practices with
Relational Operators
(Q1) Find names of sailors
who’ve reserved boat #103
Reserves(sid, bid, day) Sailors(sid, sname,
ratting, age)
• Solution 1:
πsname(σbid = 103 (Reserves
∞ Sailors)) • Solution 2 (more efficient) πsname((σbid = 103 Reserves) ∞ Sailors)
• Solution 3 (using
rename operator)
P(Temp1, σbid = 103 Reserves) P(Temp2, Temp1 ∞ Sailors) πsname(Temp2)29
(Q2) Find names of sailors
who’ve reserved a red boat
Reserves(sid, bid, day) Sailors(sid, sname, rating, age) Boats(bid, bname, color)
• Solution?
πsname((σcolor = ‘red’ Boats) ∞ Reserves ∞ Sailors )
• A more efficient solution?
πsname(πsid ((πbidσcolor = ‘red’ Boats)∞ Reserves )∞ Sailors )
A query optimizer can find this, given the first solution!
(Q3) Find the colors of boats
reserved by Lubber
Reserves(sid, bid, day)
Sailors(sid,
sname
, rating, age)
Boats(bid, bname,
color
)
• Solution?
π
color((σ
sname = ‘Lubber’Sailor)∞
31
(Q4) Find the names of sailors
who have reserved at least one
boat
Reserves(sid, bid, day)
Sailors(sid,
sname
, rating, age)
Boats(bid, bname, color)
• Solution?
(Q5) Find the names of sailors
who have reserved a red
or
a
green boat
Reserves(sid, bid, day)
Sailors(sid,
sname
, rating, age)
Boats(bid, bname,
color
)
• Solution?
π
sname(σ
color=‘red’ or color = ‘green’Boats ∞
33
(Q6) Find the names of sailors
who have reserved a red
and
a
green boat
Reserves(sid, bid, day)
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
• Wrong solution:
πsname(σcolor=‘red’ and color = ‘green’ Boats ∞ Reserves ∞
Sailors)
• Correct solution?
πsname(σcolor=‘red’ Boats ∞ Reserves ∞ Sailors)
∩ πsname(σcolor = ‘green’ Boats ∞ Reserves ∞
(Q7) Find the names of sailors
who have reserved at
least two
boats
Reserves(sid, bid, day) Sailors(sid, sname, rating, age) Boats(bid, bname, color)
• Strategy?
– Join a table (sid, bid): sailors reserving at least one
boat
– Cross-product the table with itself
– Select sailors with two different boats reserved
P (Reservations, C(1->sid1, 2->sid2, 3->bid1, 4->bid2) π sid,sname,bid (Sailors ∞ Reserves))
πsname(σsid1=sid2 & (bid1≠bid2) Reservations x
35
(Q8) Find the sids of sailors with
age
over 20 who have not
reserved a
red boat
Reserves(sid, bid, day)
Sailors(sid, sname, rating,
age
)
Boats(bid, bname,
color
)
• Strategy
– Find all sailors (sids) with age over 20
– Find all sailors (sids) who have
reserved a red boat
– Take their set differences
π
sid(σ
age>20Sailors) – π
sid((σ
color=‘red’(Q9) Find the names of
sailors who have reserved all
boats
Reserves(sid, bid, day)
Sailors(sid, sname, rating,
age
)
Boats(bid, bname,
color
)
• Strategy
– all = division
π
sname((π
sid,bid(Reserves) / π
bid(Boats))
37