Database Systems
Instructor: Winston Hsu TA: Daniel Hsieh, Ryan Chang
December 22, 2014
DUE DATE 1/7/2015, Hand in PAPER PRINTOUT
LATE POLICY We will not accept any assignment submissions.
CHEATING POLICY If you are caught cheating, your grade is 0.
SUBMISSION Please include your name and ID along with your answers to questions.
Assignment 6: Query Evaluation, External Sorting and Join Evaluation Algorithms 1. (20%) Consider the following schema:
Sailor(sid :integer, sname :string, rating :integer, age :real)
Note that sid is the primary key. Assume that each tuple of Sailors is 50 bytes long, that a page can hold 80 Sailors tuples, and that we have 500 pages of such tuples. For each of the following selection conditions, estimate the number of pages retrieved, given the catalog information in the question.
(a) Assume that we have a B+-tree index T on the search key Sailors.sid , and assume that IHeight(T) = 4, INPages(T) = 50, Low(T) = 1, and High(T) = 100,000.
i. σSailors.sid<50,000(Sailors) ii. σSailors.sid=50,000(Sailors)
(a) Assume that we have a hash index T on the search key Sailors.sid , and assume that IHeight(T) = 2, INPages(T) = 50, Low(T) = 1, and High(T) = 100,000.
i. σSailors.sid<50,000(Sailors) ii. σSailors.sid=50,000(Sailors)
2. (5%) Answer the following questions about query optimization:
(a) (1%) What are the benefits of storing the system catalogs as relations?
(b) (2%) What is the goal of query optimization? Why is optimization important?
(c) (2%) What role do statistics gathered from the database play in query optimization?
3. (5%) Explain the terms:
(a) (2%) What is external sorting?
(b) (3%) What is double buffering? What is the motivation for using it?
1
4. (10%) Answer the following questions for each of these scenarios, assuming that our most general external sorting algorithm is used: (1) A file with 10,000 pages and three available buffer pages.
(2) A file with 20,000 pages and five available buffer pages. (3) A file with 2,000,000 pages and 17 available buffer pages.
(a) (2%) How many runs will you produce in the first pass?
(b) (2%) How many passes will it take to sort the file completely?
(c) (3%) What is the total I/O cost of sorting the file?
(d) (3%) How many buffer pages do you need to sort the file completely in just two passes?
5. (10%) Consider the join R∞R.a=S.bS, given the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of writing out the result should be uniformly ignored.
Relation R contains 10,000 tuples and has 10 tuples per page.
Relation S contains 2,000 tuples and also has 10 tuples per page.
Attribute b of relation S is the primary key for S.
Both relations are stored as simple heap files.
Neither relation has any indexes built on it.
52 buffer pages are available.
(a) What is the cost of joining R and S using a page-oriented simple nested loops join?
(b) What is the cost of joining R and S using a block nested loops join?
(c) What is the cost of joining R and S using a sort-merge join?
(d) What is the cost of joining R and S using a hash join?
(e) How many tuples does the join of R and S produce at most?
2