I herewith state that I understand and will adhere to the following academic integrity: “I will not use or attempt to use any unauthorized assistance, material, or study aids in this midterm examination

(1)

Database Systems ( ) Name: ___________________________________

Midterm, November 20, 2004

Time: 2:30 PM ~ 5:30 PM Student ID: _______________________________

I herewith state that I understand and will adhere to the following academic integrity: “I will not use or attempt to use any unauthorized assistance, material, or study aids in this midterm

examination. If I violate academic integrity, I may receive a grade of 0.”

___________________________________________________________

Signature (2 points)

The following table will be used for grading your exam. Do not write anything below.

PARTS PART Total Score

Name, ID, Signature 2

A. 30

B 20

C 12

D 8

E 18

F 20

Total 110

(2)

PART A: ER Model & Translation to Relational Model (30 Points)

You have been hired by Fubon ( ) bank as a multi-million dollar database consultant. After talking to the business people about what data to store in a database, you come up with the first version of the ER diagram below.

1) You explained the ER diagram to the business people. They told you that loan payment information is missing in the first version of your ER diagram. Each loan payment has a payment number, payment date, and payment amount. The payment number does not uniquely identify a particular payment among those for all the bank’s loans; however, it does uniquely identify a particular payment for a specific loan. Add the loan-payment information into the above ER diagram. (5 points) -> // changed from original 4 points

Double-line = “thick line”

(3)

2) You need to realize the ER diagram in a relational database. Translate the following entity sets & relationship set to tables in the relational model: “loan”, “loan-branch”, and

“branch”. Use the following SQL commands and keywords “create table”, “primary key”,

“foreign key”, “char[20]”, “integer”, “date”, “references”, and “null”. (9 points) //

changed from original 6 points

This is only one possible translation.

create table branch (

branch-name CHAR[20], branch-city CHAR[20], assets INTEGER,

primary key branch-name) create table loan-branch (

branch-name CHAR[20], loan-number INTEGER,

primary key (branch-name, loan-number), foreign key branch-name references branch, foreign key loan-number references loan) create table loan (

loan-number INTEGER, amount INTEGER,

primary key loan-number)

Fubon bank has recently acquired ( ) Taipei Bank ( ). You are put in charge of merging two relational databases from two banks into one single relational database. Assume that both banks use exactly the same ER diagram shown above. There are several potential problems:

The possibility that the two original banks have branches with the same name.

The possibility that some customers are customers of both banks.

The possibility that some loan or account numbers were used at both banks.

3) For each of the potential problems, describe why there is indeed a potential for difficulties.

Hint: think about the effects of merging tables from two databases. (6 points) // changed from original 10 points

Branch-name is the primary-key of the branch entity set. Therefore, while merging the two banks’ entity sets, if both banks have a branch with the same name, one of them will be lost.

Customers participate in the relationship sets cust-banker, borrower and depositor. While merging the two banks’ customer entity sets, duplicate tuples of the same customer will be deleted. Therefore those relations in the three mentioned relationship sets which involved these deleted tuples will have to be updated. Note that if the tabular representation of a relationship set is obtained by taking a union of the primary keys of the participating entity sets, no modification to these relationship sets is required.

The problem caused by loans or accounts with the same number in both the banks is similar to the problem caused by branches in both the banks with the same branch-name.

(4)

4) (Difficult: Extra Points) Propose a solution to fix these problems. For your solution, explain any changes that would have to be made to the ER-diagram and data. To receive maximum partial credits, you may want to describe what are the effects of these changes on the ER- diagram and data. (10 points)

To solve the problems caused by the merger, no schema changes are required. Merge the customer entity sets removing duplicate tuples with the same social security field. Before merging the branch entity sets, prepend the old bank name to the branch-name attribute in each tuple. The employee entity sets can be merged directly, and so can the payment entity sets. No duplicate removal should be performed. Before merging the loan and account entity sets, whenever there is a number common in both the banks, the old number is replaced by a new unique number, in one of the banks.

Next the relationship sets can be merged. Any relation in any relationship set which involves a tuple which has been modified earlier due to the merger, is itself modified to retain the same meaning. For example let 1611 be a loan number common in both the banks prior to the merger and let it be replaced by a new unique number 2611 in one of the banks, say bank 2. Now all the relations in borrower, loan-branch and loan-payment of bank 2 which refer to loan number 1611 will have to be modified to refer to 2611. Then the merger with bank 1’s corresponding

relationship sets can take place.

(5)

PART B: Relational Algebra & SQL (20 Points)

Consider the following relational database for Chinatrust ( ) bank. The primary keys are underlined. Assume that a customer can hold multiple deposit accounts and take out multiple loans.

customer(customer-name, street, city) branch(branch-name, city)

loan(loan-number, customer-name, amount, branch-name) deposit(account-number, customer-name, balance, branch-name)

1) Find the names of suspicious customers who have taken out loans in all branches. Write this query in a relational algebra expression. (5 points)

( customer-name, branch-name loan) /( branch-name branch)

2) Find the names of valuable customers who have deposit balance larger than every customer in the “central” branch. Write this query in a SQL statement. (5 points)

select D.customer-name from deposit D

where D.balance > (select max(D2.balance) from deposit D2

where D2.branch-name = “central”)

3) Find all low-risk customers whose total deposit amount is larger than total loan amount across all branches. Write this query in a SQL statement. (5 points)

select CD.customer-name

from (select D.customer-name, sum(D.balance) as total-balance from deposit D

group by D.customer-name) as CD where CD.total-balance > (optional all)

(select sum(L.balance) from loan L

where L.customer-name = CD.customer-name)

4) Find the branch that has the most customers. Write this query in a SQL statement. (5 points) select CB.branch-name

from (select L.customer-name,L.branch-name from loan L) union (select D.customer-name, D.branch-name from deposit D) as CB group by CB.branch-name

having count(distinct customer-name) >= all (select count (distinct customer-name) from CB

group-by branch-name)

(6)

PART C: Buffer Manager (12 Points)

Consider the buffer manager and love/hate replacement policy in assignment #5. Consider a new policy that maintains an MRU (most recently used) list for the loved pages and an LRU (least recently used) list for the hated pages. If a page is needed for replacement, the buffer manager selects from the list of hated pages first, and then from the loved pages if no hated ones exist.

Consider the number of buffer frames is 3, and the number of pages of a file is 4. Denote these four pages as P1, P2, P3, and P4. Say we want to do a scan of file in the following order: P1, P2, unpin P1 as love page, unpin P2 as love page, P3, P4, unpin P3 as hated page, unpin P4 as hated page, P1, P2, unpin P1 as love page, unpin P2 as love page.

1) The following buffer pool table shows which buffer frame holds which page (P1, P2, P3, or P4) after scanning the page shown on the left-most column. Please fill the below table according to the love/hate replacement policy described above. (6 points)

Buffer Pool Scan Buffer Frame

#1 Buffer Frame

#2 Buffer Frame

#3

P1 P1 FREE FREE

P2 P1 P2 FREE

P3 P1 P2 P3

P4 P1 P4 P3

P1 P1 P4 P3

P2 P1 P4 P2

2) What is the total number of disk page I/Os? (2 points) 5

3) When is the "dirtybit” set on a page? (2 points) The page is modified.

4) Why do frames in the buffer pool have a pin count instead of a pin flag? (2 points) A page in a frame can be requested by many different users or transactions at the same time.

(7)

PART D: Disk & RAID (8 Points)

1) Rank the closeness of the three disk blocks to the disk head based on their access times (1 being the closest). (3 points)

2) Consider the following data and parity-block arrangement on four disks. Bi’s represent data blocks; Pi’s represent parity blocks. Parity block Pi is the parity block for data blocks B4i-3 to B4i. What if any, problem, might this arrangement present? (5 points)

Disk 1 Disk 2 Disk 3 Disk 4

B1 B2 B3 B4

P1 B5 B6 B7

B8 P2 B9 B10

B11 B12 P3 B13

. . .

. .

.

This arrangement has the problem that Pi and B4i 3 are on the same disk. So if that disk fails, reconstruction of B4i 3 is not possible, since data and parity are both lost.

Platters Spindle

Disk head

Arm movement

Arm assembly

Tracks

Sector 1

2

3

(8)

PART E: Tree-based Index (18 Points)

1) List two advantages of the B+ tree over the ISAM tree. List one advantage of the ISAM tree over the B+ tree. (6 points)

Two advantages of B+ tree over ISAM tree: (1) B+ tree can support file growth and shrink efficiently. (2) No need to maintain overflow pages. [saying the same thing …]

One advantage of ISAM tree over B+ tree: In ISAM tree allows better concurrent access during insertions and deletions – no locking on the index pages.

2) Draw the B+ tree for the following set of key values: (1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 16, 17, 18). Assume that the tree is initially empty and values are added in ascending order.

Each node in the tree can fit five pointers and four key values. (3 points)

3) Draw the B+ tree that results from further inserting keys (7, 8). (3 points)

10

1 2 3 4 5 6 10 11 12 13 14 15 16 17 18

4 10 13 16

1 2 3 4 5 6 10 11 12 13 14 15 16 17 18

7 8

4 7 13 16

(9)

4) Draw the B+ tree that results from further deleting keys (13, 14). (3 points) Below is one possible tree.

5) Draw the B+ tree that results from further deleting keys (11, 12). (3 points) Below is one possible tree.

4 7 10 15

1 2 3 4 5 6 7 8 10 11 12 15 16 17

4 7 10 16

1 2 3 4 5 6 7 8 10 15 16 17

(10)

PART F: Hashing-based Index (20 points)

1) Why does the hash structure perform poorly for range queries? Derive the worst-case cost bound (measured in number of page I/Os) for the range query on a heap file with unclustered extendible hash index. Assume the entire directory fits in memory. B = number of data pages, R = number of records per page, and M = number of matched records in range query. (5 points)

In a hash structure, entries are not sorted on the index key (unlike tree index, there are no next &

previous pointers between adjacent buckets).

Worst case bound: M. (for each matched record, in the worse case, we may have to access a disk page).

2) Draw an extendible hashing index such that insertion of data entry with hashed key value 2*

causes the directory to double twice. Show the local depth of each bucket and the global depth of the directory in your sketch. (5 points)

Below is one possible EH index. Each bucket can hold 2 entries. Inserting 2* will double the directory twice.

3) Draw a linear hashing index such that insertion of an arbitrary data entry will cause the Next Pointer to move to the first bucket. That is, in your index the Next pointer should not point to the first bucket, but insertion of an arbitrary data entry should cause movement to the first bucket. (5 points)

Below is one possible LH index. Each bucket can hold 2 entries. Adding any data entry will cause the Next pointer to move to the first bucket.

h1 | h0 | data entries 00 | 0 | 0, 4

01 | 1 | 1, 3 <- next 10 | | 2, 6

11 | |

6* 10*

1

1*

1 1

0 1

(11)

4) (Difficult) Assume you are building an index over an attribute that is not a primary key and contains many duplicates. You have decided to build a hash index. Would you rather build an extendible hashing index or a linear hashing index? Give a short reason. (5 points)

Build a linear hashing index.

The reason is that extensible hashing may have the following problem – say many duplicates get hashed to the same bucket and that bucket (containing only duplicates) overflows. In this case, doubling directory does not resolve bucket overflow, even with infinite number of directory splitting. Note that linear hashing can have overflow pages, so it does not have this problem.