Database Systems
( 資料庫系統
)
Practicum in Database
Systems
( 資料庫系統實驗
)
September 19, 2005
Lecture #1
Course Goals
• First course in database systems.
• Main Course (3 units) - Learn
– Use
a relational database
– Build
a relational database
• Practicum Course (2 units) – Gain
– Hands-on experiences by implementing
key DB components
– Must take the main course
– No lectures
Topics
• Fundamentals
– ER
(Entity-Relationship) Model
– SQL (Structured
Query Language)
• Storage and indexing
– Disks & Files
– Tree-structure
indexing
– Hash-based indexing
• Query evaluation
– External sorting
– Evaluating
relational operators
• Transaction
management:
– Concurrency
control
– Crash recovery
Prerequisite
• Data structure and algorithms
• Practicum course – C++ programming
– Considerable amount
of C++ coding
– If you don’t know C++, you need to learn it on
your own.
• English skill
– Taught in English
– If I speak too fast, please tell me to slow down.
– You can ask questions in English or Chinese.
Textbook
• Required textbook:
“Database Management
Systems,
Third Edition
”,
by Ramakrishnan and
Gehrke.
• The textbook is available
from 新月 and (and
perhaps other)
bookstores.
General Comments on
Textbook
• Good
– Come with programming assignments that
implement algorithms described in the
textbook.
• Bad
– Ambiguous writing, inconsistent wording
– “More like an experience report from
researchers rather than introductory
textbooks for beginners”
Course Format
• Lecture (9:10 ~ 10:40, 80 minutes), break
(10:40 ~ 11:00, 20 minutes), & lecture
(11:00 ~ 12:20, 80 minutes).
• 6 Assignments
– SQL + Written assignments
• Practicum course: 6 programming
assignments
– Implement major database components
[teams of
2 students
]
Grading Breakdown
(Tentative)
• Tentative means that I may change it
later.
• Main Course:
– 6 Assignments (30% of Grade)
– Midterm Exam (35% of Grade)
– Final Exam (35% of Grade)
• Practicum Course:
– 6 Programming Assignments (100% of
Grade)
Office Hours & Contact
Info
• Instructor: 朱浩華 “ Hao”
– Mon 2:00~3:00, Room 518 or by appointment – Email: hchu@csie.ntu.edu.tw
• Teaching Assistant #1: 陳奕超 (YiChao)
– CSIE 336/338
– Email: b89902066@ntu.edu.tw
• Teaching Assistant #2, 張耿豪 (Kenghao)
– CSIE 336/338
– Email: kenghao@gmail.com
• Teaching Assistant #3, 田知本 (Ben)
– CSIE 336/338
Means of Communications
• Course homepage
– http://mll.csie.ntu.edu.tw/course/database_f05/index.ht
ml
• BBS
– ptt.cc, under “CSIE_DBMS” board
– Post your questions on BBS.
– Read posted messages before posting new questions.
– No SPAM.
– TAs respond to your questions as quickly as possible.
• Send email to TAs or me.
• Come to office hours
Lecture Notes
• Available on the course homepage
before each lecture
– Complements,
not replacement of
attending lecture and reading
Academic Integrity
• Write your own solutions & code: every
single word/line
• Plagiarism :
– Receive 0 grade for the assignment
+ Penalty
!
• Plagiarism (Cheating) includes:
– copy any answers/code from other students
– let other students copy your solution/code.
– see other students’ answers/code.
– copy any answers/code from the Internet &
upper class
Any Question(s) on
Administrative
Introduce an interesting
project in
Ubiquitous
Computing
KY:iyashikei-net (Urico Fujii, Ann
Poochareon, NYU)
Expressing
&
Chapter 1:
Overview of Database
Systems
Outline
• Why do we need a DBMS (Database
Management System)?
• What can a DBMS do for an application?
• Why study database systems?
• Data Models: Overview of a Relational Model
• Levels of Abstraction in a DBMS
• Sample Queries in DBMS
• Transaction Management Overview
• Structure of a DBMS
Why DBMS?
• Suppose that you want to build an
university database. It must store
the following information:
– Entities: Students, Professors,
Classes, Classrooms
– Relationships: Who teaches what?
Who teaches where? Who teaches
whom?
What can DBMS do for
applications?
• Store huge amount of data (e.g., TB+) over a long
period of time
• Allow apps to query and update data
– Query: what is Mary’s grade in the “Operating System” course?
– Update: enroll Mary in the “Database” course
• Protect from unauthorized access.
– Students cannot change their course grades.
• Protect from system crashes
– When some system components fail (hard drive,
More on what can DBMS
do for applications?
• Protect from incorrect inputs
– Mary is registering for 100 courses
• Support concurrent access from multiple
users
• Allow administrators to easily change
data schema
– At a later time, add TA info to courses.
• Efficient database operations
Alternative to Using a
DBMS
• Store data as files in operating systems.
• Applications have to deal with the following
issues:
– 32-bit addressing (5GB) is insufficient to address 100GB+ data file
– Write special code to support different queries
– Write special code to protect data from concurrent access
– Write special code to protect against system crashes
– Optimize applications for efficient access and query
– May often rewrite applications
Database Management
System (DBMS)
• DBMS is software to store and manage
data, so applications don’t have to
worry about them.
• What can a DBMS do for applications?
– Data Definition Language - DDL – Data Manipulation Language - DML
• Query language
– Storage management – Transaction Management
• Concurrency control • Crash recovery
– Provide good security, efficiency, and
Abstraction &
Interface
(Database
language: SQL)
Applications
S
y
st
e
m
(D
B
M
S
)
work that you
Perform dirty
Why Study Database
Systems?
• They are everywhere.
– Online stores, real stores
– Banks, credit card companies
– Passport control
– Police (criminal records)
– Airlines and hotels (reservations)
• DBMS vendors & products
– Oracle, Microsoft (Access and SQL server),
IBM (DB2), Sybase, …
Data Models
• A
data model
is a collection of concepts for
describing data.
– Entity-relation (ER) model
– Relational model (main focus of this course)
• A
schema
is a description of data.
• The
relational model
is the most widely used
data model.
– A relation is basically a table with rows and columns of
records.
Relational Model
sid
name
age
gpa
53666
Jones
Jones@cs 18
3.4
53688
Smith Smith@e
e
18
3.2
53650
Joe
Joe@cs
19
2.5
• The entire table shows an instance of the Students relation. • The Students schema is the column heads
Levels of Abstractions in
DBMS
• Many views, one
conceptual
schema and
one physical schema
.
– Conceptual schema
defines logical structure • Relation tables
– Physical schema describes
the file and indexing used • Sorted file with B+ tree
index
– Views describe how
applications (users) see the data
• Relation tables but not store explicitly
Physical Schema
Conceptual Schema
View 1 View 2 View 3
Example: University
Database
• Conceptual schema:
–
Students (sid: string, name: string, login: string,
age: integer, gpa:real)
–
Courses (cid: string, cname:string, credits:integer)
–Enrolled (sid:string, cid:string, grade:string)
• Physical schema:
–
Relations stored as unordered files.
–Index on first column of Students.
• View (External Schema):
–
Course_info(cid:string, enrollment:integer)
–Why?
Data Independence
• Three levels of
abstraction provides
data independence
.
– Changes in one layer
only affect one
upper layer.
– E.g., applications are
not affected by
changes in
conceptual &
physical schema.
Physical Schema
Conceptual Schema
View 1 View 2 View 3
Queries in DBMS
• Sample queries on university database:
– What is the name of the student with student
ID 123456?
• The key benefits of using a relational
database are
– Easy to specify queries using a
query
language
: Structured Query Language (SQL)
SELECT S.name FROM Students S
WHERE S.sid = 123456
Transaction Management
• A transaction is an execution of a user
program in a DBMS.
• Transaction management deals with two
things:
– Concurrent execution of transactions
Concurrency Control
• Example: two travel agents (A, B) are
trying to book one remaining airline
seat (two transactions), only one
transaction can succeed in booking.
// num_seats is 1
Transactions A and B: if num_seats > 0, book the seat & num_seat--; // overbook!
Concurrency Control
(Solution)
// num_seats is 1
Transactions A and B: if num_seats > 0, book the seat & num_seat--; // overbook!
• Solution: use locking protocol
Transaction A: get exclusive lock on num_seatsTransaction B: wait until A releases lock on num_seats Transaction A: if num_seats > 0, book & num_seat--;
// book the seat, num_seat is set to 0
Transaction A: release exclusive lock on num_seats
Transaction B: num_seats = 0, no booking; // does not book the seat
Crash Recovery
• Example: a bank transaction
transfers $100 from account A to
account B
A = A - $100
<system crashes> // good for the bank!
B = B + $100
Crash Recovery (Solution)
A = A - $100
<system crashes>
// good for the bank!
B = B + $100
• Solution: use logging, meaning that all
write operations are recorded in a log on
a stable storage.
A = A - $100 // recorded A value (checkpoint) in a log
<system crashes>
// start recovery: read the log from disk
//analyze, undo, & redo
Layered Architecture
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
These layers must consider concurrency control and crash recovery