Database Systems (資料庫系統) Lecture #9

(1)

Database Systems

( 資料庫系統 )

November 28, 2005

Lecture #9

(2)

Announcement

• Next week's reading: Chapter 12.

• Pick up your midterm exams at the end of the class.

• Pick up your assignments #1~3 outside the TA office (336/338).

• Assignment #4 & Practicum #2 are due in one week.

(3)

Interesting Talk

Rachel Kern, “From Cell Phones To Monkeys: Research Projects in the Speech Interface Group at the M.I.T. Media Lab”, CSIE 102, Friday 2:20 ~ 3:30

(4)

Midterm Exam Score Distribution

(5)

Ubicomp project of the week

• From Pervasive to Persuasive Computing

• Pervasive Computing (smart objects)
  – Designed to be aware of people’s behaviors
  – Examples: smart dining table, smart chair, smart wardrobe, smart mirror, smart shoes, smart spoon, …

• Persuasive Computing
  – Designed to change people’s behaviors

(6)
(7)

Smart Device: Credit Card Barbie Doll (from Accenture)

• Barbie gets a wireless implant of a chip and sensors and becomes a decision-making object.

• When one Barbie meets another Barbie …
  – She detects the presence of the other Barbie’s clothing.
  – If she does not have it … she can automatically send an online order through the wireless connection!
  – You can give her a credit card limit.

• Good that this is just a concept toy.

• It illustrates the concept of an autonomous purchasing object: car, home, refrigerator, …

(8)

Hash-Based Indexing

(9)

Introduction

• Hash-based indexes are best for equality selections. They cannot support range searches.
  – Equality selections are useful for join operations.

• Static and dynamic hashing techniques; trade-offs similar to ISAM vs. B+ trees.
  – Static hashing technique
  – Two dynamic hashing techniques:
    • Extendible Hashing
    • Linear Hashing

(10)

Static Hashing

• # of primary pages is fixed, allocated sequentially, never de-allocated; overflow pages if needed.

• h(key) mod N = bucket to which the data entry with key k belongs. (N = # of buckets)

[Figure: h(key) mod N maps a key to one of the primary bucket pages 0 .. N-1; each primary page may chain to overflow pages.]

(11)

Static Hashing (Contd.)

• Buckets contain data entries.

• The hash function works on the search key field of record r. It must distribute values over the range 0 ... N-1.
  – h(key) = (a * key + b) usually works well; a and b are constants, and a lot is known about how to tune h.
  – Cost for insertion/delete/search: 2/2/1 disk page I/Os (with no overflow chains).

• Long overflow chains can develop and degrade performance.
  – Why poor performance? A search must scan through the overflow chain linearly.
  – Extendible and Linear Hashing: dynamic techniques to fix this.
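The static scheme above can be sketched in a few lines of Python. This is an illustrative in-memory model, not from the lecture: lists stand in for disk pages, and `PAGE_SIZE` and the constants `a`, `b` are arbitrary choices.

```python
# Static-hashing sketch: N fixed primary buckets; each bucket is a list of
# "pages" (first = primary), and a full page chains into an overflow page.
N = 4
PAGE_SIZE = 2

def h(key, a=3, b=1):
    """h(key) = a*key + b, then mod N picks the bucket."""
    return (a * key + b) % N

buckets = [[[]] for _ in range(N)]   # each bucket starts with one empty primary page

def insert(key):
    pages = buckets[h(key)]
    if len(pages[-1]) == PAGE_SIZE:  # last page full -> add an overflow page
        pages.append([])
    pages[-1].append(key)

def search(key):
    # walks the overflow chain linearly -- the source of poor performance
    return any(key in page for page in buckets[h(key)])

for k in [5, 9, 13, 17, 21]:         # these all hash to the same bucket,
    insert(k)                        # building a long overflow chain
```

Because every one of those keys lands in the same bucket, the chain grows to three pages and each search must scan it page by page, which is exactly the degradation the dynamic schemes below avoid.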

(12)

Extendible Hashing

• Simple solution (no overflow chain):
  – When a bucket (primary page) becomes full …
  – Re-organize the file by doubling the # of buckets. Cost concern?
  – High cost: rehashing all entries – reading and writing all pages – is expensive!

• How to reduce the high cost?
  – Use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
  – The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split.

• How to adjust the hash function? Before doubling the directory, h(r) → 0 .. N-1 buckets. After doubling the directory, h(r) → 0 .. 2N-1.

(13)

Example

• The directory is an array of size 4.

• To find the bucket for r, take the last "global depth" # of bits of h(r).
  – Example: if h(r) = 5, 5’s binary is 101, so it is in the bucket pointed to by 01.

• Global depth: # of bits used for hashing directory entries.

• Local depth of a bucket: # of bits used for hashing within that bucket.

• When can global depth be different from local depth?

[Figure: directory entries 00, 01, 10, 11 (global depth 2) point to data pages Bucket A (4*, 12*, 32*, 16*), Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), Bucket D (15*, 7*, 19*), each with local depth 2.]
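The "take the last global-depth bits" rule is a one-line bit mask. A tiny illustrative helper (the name `bucket_index` is mine, not the lecture's):

```python
def bucket_index(h_value: int, global_depth: int) -> int:
    """Keep only the last `global_depth` bits of the hashed value."""
    return h_value & ((1 << global_depth) - 1)

# h(r) = 5 -> binary 101; with global depth 2 the last two bits are 01,
# so r goes to the bucket pointed to by directory entry 01.
```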

(14)

Insert 20 = 10100 (Causes Doubling)

[Figure: before the insert, the directory (00-11, global depth 2) points to Buckets A-D, all with local depth 2; Bucket A (4*, 12*, 32*, 16*) is full. After the insert, the directory has doubled to 000-111 (global depth 3); Bucket A (32*, 16*) and its `split image' Bucket A2 (4*, 12*, 20*) have local depth 3, while Buckets B, C, D keep local depth 2.]

To double the directory:
- Increment the global depth.
- Rehash bucket A.
- Increment the local depth. Why track local depth?

(15)

Insert 9 = 1001 (No Doubling)

[Figure: only Bucket B splits. Bucket B (1*, 9*) keeps the entries whose last three bits are 001; its split image Bucket B2 (5*, 21*, 13*) gets those ending in 101. The global depth stays 3; both B and B2 now have local depth 3.]

Only split the bucket:
- Rehash bucket B.

(16)

Points to Note

• Global depth of the directory: max # of bits needed to tell which bucket an entry belongs to.

• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.

• When does a bucket split cause directory doubling?
  – Before the insert, the bucket is full and local depth = global depth.
  – The directory is doubled by copying it over and `fixing’ the pointer to the split image page.
  – You can do this only by using the least significant bits in the directory.
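The splitting and doubling steps above can be sketched as a minimal in-memory extendible hash. This is my own illustrative code, not the lecture's: a dict holding a local depth and a key list stands in for a disk bucket page.

```python
# Minimal extendible-hashing sketch; directory entries are indexed by the
# last global_depth bits of the key.
class ExtendibleHash:
    def __init__(self, bucket_size=4):
        self.global_depth = 1
        self.bucket_size = bucket_size
        self.directory = [{"depth": 1, "keys": []}, {"depth": 1, "keys": []}]

    def _index(self, key):
        # the last global_depth bits pick the directory entry
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket["keys"]) < self.bucket_size:
            bucket["keys"].append(key)
            return
        if bucket["depth"] == self.global_depth:
            # full bucket with local depth == global depth: double the
            # directory by copying it; entries i and i+N alias the same buckets
            self.directory = self.directory * 2
            self.global_depth += 1
        # split just the overflowed bucket
        bucket["depth"] += 1
        image = {"depth": bucket["depth"], "keys": []}
        mask = 1 << (bucket["depth"] - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & mask:   # repoint half of the aliases
                self.directory[i] = image
        old_keys, bucket["keys"] = bucket["keys"] + [key], []
        for k in old_keys:                 # rehash entries of the split bucket;
            self.insert(k)                 # may split again if values are skewed
```

Doubling via `self.directory * 2` mirrors the "doubling via copying" trick: only pointers are copied, and the recursive re-insert naturally handles the skewed case where one insertion causes multiple splits.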

(17)

Directory Doubling

Why use the least significant bits in the directory?
 It allows doubling via copying!

[Figure: a directory of size 4 (00-11, global depth 2) doubles to size 8 (000-111, global depth 3) by copying itself: entries i and i+4 initially point to the same buckets, and only the pointers for the split buckets are then fixed.]

(18)

Comments on Extendible

Hashing

• If the directory fits in memory, an equality search is answered with one disk access; else two.

• Problem with extendible hashing:
  – If the distribution of hash values is skewed (concentrated on a few buckets), the directory can grow large.
  – Can you come up with one insertion leading to multiple splits?

• Delete: if the removal of a data entry makes a bucket empty, the bucket can be merged with its `split image’. If each directory element points to the same bucket as its split image, the directory can be halved.
(19)

Skewed Data Distribution (Multiple Splits)

• Assume each bucket holds one data entry.

• Insert 2 (binary 10) – how many splits?

• Insert 16 (binary 10000) – how many splits?

[Figure: starting state with global depth 1: directory entries 0 and 1; the first bucket holds 0* and 8*, with local depth 1.]

(20)

Delete 10*

[Figure: before, the directory (00-11, global depth 2) points to Bucket A (4*, 12*, 32*, 16*), Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), and Bucket D (15*, 7*, 19*), all with local depth 2. Deleting 10* empties Bucket C, which is merged with its split image Bucket A; the merged bucket keeps 4*, 12*, 32*, 16* with local depth 1, and directory entries 00 and 10 both point to it.]

(21)

Delete 15*, 7*, 19*

[Figure: deleting 15*, 7*, 19* empties that bucket, which is merged with its split image (1*, 5*, 21*, 13*), whose local depth drops to 1. Now every directory element points to the same bucket as its split image, so the directory is halved: global depth 2 → 1, with entry 0 pointing to 32*, 16*, 4*, 12* and entry 1 pointing to 1*, 5*, 21*, 13*.]

(22)

Linear Hashing (LH)

• This is another dynamic hashing scheme, an alternative to Extendible Hashing.
  – LH fixes the problem of long overflow chains (in static hashing) without using a directory (as in extendible hashing).

• Basic Idea: use a family of hash functions h0, h1, h2, ...
  – Each function’s range is twice that of its predecessor.
  – Pages are split when overflows occur – but not necessarily the page with the overflow.
  – Splitting occurs in turn, in a round-robin fashion.
  – When all the pages at one level (the current hash function) have been split, a new level is applied.
  – Splitting occurs gradually.

(23)

Levels of Linear Hashing

• Initial stage.
  – The initial level distributes entries into N0 buckets.
  – Call the hash function that performs this h0.

• Splitting buckets.
  – If a bucket overflows, its primary page is chained to an overflow page (same as in static hashing).
  – Also, when a bucket overflows, some bucket is split.
    • The first bucket to be split is the first bucket in the file (not necessarily the bucket that overflows).
    • The next bucket to be split is the second bucket in the file … and so on, until the Nth has been split.
    • When buckets are split, their entries (including those in overflow pages) are distributed using h1.
  – To access split buckets, the next-level hash function (h1) is applied.

(24)

Levels of Linear Hashing (Cont.)

• Level progression:
  – Once all Ni buckets of the current level (i) are split, the hash function hi is replaced by hi+1.
  – The splitting process starts again at the first bucket, and hi+2 is applied to find entries in split buckets.
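The level/next mechanics just described can be sketched as a small in-memory linear hash. This is my own illustrative code: `N0` and `PAGE_SIZE` match the example on the following slides, and a Python list that grows past `PAGE_SIZE` stands in for a primary page plus its overflow chain.

```python
# Linear-hashing sketch: h_i(key) = key % (N0 * 2**i); buckets split in
# round-robin order, driven by `next`, whenever any bucket overflows.
N0, PAGE_SIZE = 4, 3

class LinearHash:
    def __init__(self):
        self.level = 0   # current level i
        self.next = 0    # next bucket to split (round robin)
        self.buckets = [[] for _ in range(N0)]

    def _h(self, key, i):
        return key % (N0 * (2 ** i))

    def _bucket(self, key):
        b = self._h(key, self.level)
        if b < self.next:                    # bucket already split this round:
            b = self._h(key, self.level + 1) # use the next-level hash function
        return b

    def insert(self, key):
        b = self._bucket(key)
        self.buckets[b].append(key)
        if len(self.buckets[b]) > PAGE_SIZE:  # overflow triggers a split
            self._split()

    def _split(self):
        old = self.buckets[self.next]         # split the bucket `next` points to,
        self.buckets[self.next] = []          # not necessarily the overflowed one
        self.buckets.append([])               # split image: the new last bucket
        for k in old:                         # redistribute using h_{level+1}
            self.buckets[self._h(k, self.level + 1)].append(k)
        self.next += 1
        if self.next == N0 * (2 ** self.level):  # all buckets of this level split:
            self.level += 1                      # advance to the next level
            self.next = 0
```

Inserting the example keys 64, 36, 1, 17, 5, 6, 31, 15 and then 9 overflows the second bucket but splits the first: 64 stays in bucket 0 and 36 moves to the new fifth bucket, as on the following slides.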

(25)

Linear Hashing Example

• Initially, the index level equals 0 and N0 equals 4 (three entries fit on a page).

• h0 maps index entries to one of four buckets.

• h0 is used and no buckets have been split.

• Now consider what happens when 9 (1001) is inserted (which will not fit in the second bucket).

• Note that next indicates which bucket is to split next (round robin).

[Figure: four buckets under h0, with next pointing at the first. Bucket 00: 64, 36; bucket 01: 1, 17, 5; bucket 10: 6; bucket 11: 31, 15.]

(26)

Linear Hashing Example 2

• An overflow page is chained to the primary page to contain the inserted value.

• The page indicated by next is split (the first one).

• Note that the split page is not necessarily the overflowing page – round robin.

• If h0 maps a value from zero to next - 1 (just the first page in this case), h1 must be used to insert the new entry.

• Note how the new page falls naturally into the sequence as the fifth page.

[Figure: after inserting 9 and splitting, bucket 000 (h1): 64; bucket 01 (h0): 1, 17, 5 with an overflow page holding 9; bucket 10 (h0): 6; bucket 11 (h0): 31, 15; new bucket 100 (h1): 36. next now points to the second bucket.]

(27)

Linear Hashing

• Assume inserts of 8, 7, 18, 14, 11 (1st split), 32, 16 (2nd split), 10, 13, 23 (3rd split).

• After the 2nd split, the base level is 1 (N1 = 8); use h1.

• Subsequent splits will use h2 for inserts between the first bucket and next-1.

[Figure: bucket contents after each split, with h0/h1/h2 labels on each bucket and the position of next (next1, next2, next3) after the 1st, 2nd, and 3rd splits.]

(28)

LH Described as a Variant of EH

• The two schemes are similar:
  – Begin with an EH index where the directory has N elements.
  – Use overflow pages; split buckets round-robin.
  – The first split is at bucket 0. (Imagine the directory being doubled at this point.) But elements <1, N+1>, <2, N+2>, ... are the same, so we need only create directory element N, which differs from element 0, now.
  – When bucket 1 splits, create directory element N+1, etc.

• So, the directory can double gradually. Also, primary bucket pages are created in order. If they are allocated in sequence too (so that finding the i'th page is easy), no directory is needed at all.
