Energy-Efficient and Performance-Enhanced Disks Using Flash-Memory Cache


Jen-Wei Hsieh

Department of Computer Science and Information Engineering

National Chiayi University, Chiayi, Taiwan 60004, ROC

[email protected]

Tei-Wei Kuo

Department of Computer Science and Information Engineering

Graduate Institute of Networking and Multimedia

National Taiwan University, Taipei, Taiwan 106, ROC

[email protected]

Po-Liang Wu

Department of Computer Science and Information Engineering

National Taiwan University, Taipei, Taiwan 106, ROC

[email protected]

Yu-Chung Huang

Genesys Logic, Inc. Taipei, Taiwan 231, R.O.C.

[email protected]

ABSTRACT

This work explores the unique characteristics of flash memory in serving as a cache layer for disks. The experiments show that the proposed management scheme can reduce energy consumption by up to 20% while cutting the read response time by up to two-thirds and the write response time by up to five-sixths relative to their counterparts. The estimated lifetime of the flash-memory cache is significantly improved as well.

Categories and Subject Descriptors

C.0 [Computer Systems Organization]: General; B.3.2 [Memory Structure]: Design Styles—Cache memory

General Terms

Management

Keywords

Flash memory, cache, energy efficient, performance

1. INTRODUCTION

Flash memory has recently gained a lot of attention, both as a storage-system alternative (e.g., [1, 3, 4, 8]) and as a cache for hard disks. In particular, Windows ReadyBoost [6] lets users use a removable flash-memory device to improve system performance, while ReadyDrive [6] enables Windows Vista PCs equipped with a hybrid hard disk (a new type of disk with integrated non-volatile flash memory) to boot up faster, resume from hibernation in less time, preserve battery power, and improve disk reliability.

Supported in part by research grants from Taiwan, ROC National Science Council under Grants NSC95-2219-E-002-014 and NSC 95R0062-AE00-07.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ISLPED’07, August 27–29, 2007, Portland, Oregon, USA.

Copyright 2007 ACM 978-1-59593-709-4/07/0008 ...$5.00.

However, flash memory has several unique characteristics that make its management challenging. A NAND flash memory is organized in terms of blocks, where each block consists of a fixed number of pages. Data must be written to free space on the flash memory. Once a flash-memory page is written, it cannot be written again until it is erased. As a result, out-place update is usually adopted in the management. A block is the basic unit for erase operations, while reads and writes are processed in terms of pages.¹ The typical block size and page size of a NAND flash memory are 16KB and 512B, respectively.² After a large number of page writes have been processed, the number of free pages on flash memory becomes low. Garbage collection is then needed to reclaim invalid pages scattered over blocks (due to out-place update) so that they can become free pages again. A flash-memory block can endure only a limited number of erasures: a block erased more than 10^6 times might suffer from frequent write errors. “Wear-levelling” is usually adopted to erase blocks evenly so that a longer overall lifetime is achieved.

¹ Note that the terms “page” and “block” used here are different from those used for disks. As can be seen, the term “page” refers to a unit that is smaller than the unit referred to by the term “block.”

² Some flash memory adopts 128KB blocks and 2KB pages.

One of the pioneering works on adopting flash memory as a disk cache was done by Marsh et al. [5]. Given the state of the art at that time, the study used 20MB of NOR flash memory as a cache for a 40MB hard disk, so an efficient lookup mechanism to locate the cache space of given Logical Block Addresses (LBAs) on large-capacity flash memory was not considered. Another issue in adopting flash memory as a disk cache is its robustness, since flash memory wears out. Different from past work, this work is motivated by the needs of managing cached data for disks, especially when the characteristics of flash memory are considered. Note that well-known caching strategies, such as direct mapped cache and set associative cache [2], would suffer significant deterioration in read/write performance (due to the write-once and wear-levelling features of flash memory) if they were implemented without considering the characteristics of flash memory. This paper presents an efficient lookup mechanism to locate the cached data of given LBAs on flash memory and integrates it with an LRU-based caching strategy. It also considers read and write requests jointly with respect to energy efficiency and performance. A garbage collection strategy is proposed that considers the hotness of data and the system performance in an integrated way. The capability of the proposed strategies is evaluated by a series of experiments based on realistic workloads.

The rest of this paper is organized as follows: Section 2 presents our management schemes, including a joint lookup and caching mechanism, a garbage-collection strategy, and a replacement policy, for a flash-memory cache. Other implementation remarks are also presented. The capability of the proposed management schemes is evaluated by a series of experiments in Section 3. Section 4 is the conclusion.

2. MANAGEMENT SCHEMES

2.1 Overview

The management of a flash-memory cache should consider the characteristics of flash memory and the access patterns of users over disks. Three potential situations in caching are considered: (1) When a read request arrives, the LBA of the request must be checked to see if the corresponding data are in the cache. If the answer is “yes,” the read request can be satisfied without accessing any hard disk. (2) If the answer is “no,” the data are retrieved from the corresponding hard disk and then cached in the flash memory for future access. (3) When a write request arrives, the data are cached in the flash memory. No extra action is taken unless a data write-back is required.
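The three situations can be sketched as follows. This is a minimal illustration, assuming one plain dictionary stands in for the flash-memory cache and another for the disk. The class and method names are hypothetical and not from the paper, and flash-specific constraints (write-once pages, garbage collection) are deliberately ignored.

```python
class CachedDisk:
    """Toy model of a disk fronted by a cache (hypothetical names)."""

    def __init__(self, disk):
        self.disk = disk        # LBA -> data, stands in for the hard disk
        self.cache = {}         # LBA -> data, stands in for the flash cache
        self.dirty = set()      # LBAs written to the cache but not yet on disk
        self.disk_reads = 0     # number of reads that had to go to the disk

    def read(self, lba):
        if lba in self.cache:           # situation (1): hit, no disk access
            return self.cache[lba]
        self.disk_reads += 1            # situation (2): miss, fetch and cache
        data = self.disk[lba]
        self.cache[lba] = data
        return data

    def write(self, lba, data):
        self.cache[lba] = data          # situation (3): cache the write
        self.dirty.add(lba)

    def write_back(self):
        for lba in sorted(self.dirty):  # flush dirty data when required
            self.disk[lba] = self.cache[lba]
        self.dirty.clear()


cached = CachedDisk({7: "old"})
cached.read(7)           # miss: goes to the disk
cached.read(7)           # hit: served from the cache
cached.write(7, "new")   # cached only; the disk still holds "old"
```

Only the first read of LBA 7 touches the disk in this toy run, and the write reaches the disk only when `write_back` is called.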

The three potential situations introduce several design and implementation issues. One critical issue is an efficient lookup strategy for a given LBA. Such a strategy is needed to look for any data corresponding to a given LBA on the flash memory, regardless of whether the request is a read or a write. When a write request is considered, we must invalidate the existing copy if the corresponding data exist in the cache. Another critical issue is the replacement strategy that applies when the cache is full or the flash memory needs garbage collection. A good replacement strategy should reduce the chance of cache misses. Other important issues include an energy-efficient strategy for flushing written data to disks, cache robustness, and cache utilization. In Section 2.2, we shall present data structures and strategies for the management of the flash-memory cache, especially for efficient data lookup when the user access pattern changes dynamically. Section 2.3 proposes our garbage collection and replacement strategies. Section 2.4 discusses a rebuilding procedure for the entry table.

2.2 Data Lookup and Caching

2.2.1 Management Information

The management of the flash-memory cache is based on the idea of set associative caching [2]. An entry table is used to do bookkeeping for data in the cache. Each given LBA is hashed by a hash function to an entry in the table, where an example hash function is H(LBA) = (LBA / (K × NPB)) mod EN. Here the stretch factor K is any constant no less than 1, and NPB and EN are the number of pages in a block and the number of entries in the table, respectively. A link of caching buffers is attached to each entry, and the length of each link might change with the access patterns. Each caching buffer, which corresponds to a range of LBAs (LBA_i, LBA_i + R), consists of a primary block and, if it exists, an overflow block, where R is any fixed multiple of the number of pages in a block, e.g., K × NPB. (Note that each primary/overflow block maps to a physical flash-memory block.) The lookup of a given LBA fails if it is not in the LBA range of any caching buffer associated with the hash entry.

Figure 1: The Organization of Management Information for Cached Data.
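The example hash function can be written out as a short sketch. It assumes that the division in H(LBA) is integer division and that buffer ranges are aligned to multiples of R = K × NPB; both are assumptions of this illustration, and the function names are hypothetical.

```python
def entry_index(lba, k, npb, en):
    """H(LBA) = (LBA / (K * NPB)) mod EN, with integer division assumed."""
    return (lba // (k * npb)) % en


def buffer_range_start(lba, k, npb):
    """Start of the R-page range that contains lba, assuming R = K * NPB
    and ranges aligned to multiples of R (an assumption of this sketch)."""
    r = k * npb
    return (lba // r) * r


NPB = 32      # 16KB block / 512B page, the typical sizes quoted above
EN = 16384    # table size used in the experiments of Section 3.1

first = entry_index(0, 1, NPB, EN)             # LBAs 0..31 share entry 0
wrapped = entry_index(NPB * EN, 1, NPB, EN)    # wraps around to entry 0 as well
```

With K = 1, LBAs 0 and NPB × EN hash to the same entry but belong to different LBA ranges, which is why each entry carries a link of caching buffers rather than a single one.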

The lookup of an LBA starts by hashing to a specific hash entry, followed by a search of the caching buffers associated with the entry, as shown in Figure 1. The LBA is then hashed again with a pre-defined hash function to a specific page in the primary block of the corresponding caching buffer. An example hash function for locating the target page is PageIndex = LBA mod NPB. An overflow block is attached to a caching buffer if there is an attempt to overwrite the data in the hashed page of the primary block and no overflow block is allocated yet. Free pages in an overflow block are written sequentially.
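A possible shape of this two-level lookup is sketched below. The `CachingBuffer` layout, the half-open interpretation of the LBA range, and the bottom-up scan of the overflow block (so that the most recently written copy is found first) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

NPB = 4  # pages per block, kept tiny for illustration

Page = Tuple[int, str, bool]  # (lba, data, valid)


@dataclass
class CachingBuffer:
    start: int                       # first LBA of the covered range
    span: int                        # R, the number of LBAs covered
    primary: List[Optional[Page]] = field(default_factory=lambda: [None] * NPB)
    overflow: List[Page] = field(default_factory=list)  # filled sequentially


def lookup(chain, lba):
    """Search the buffers linked to one table entry for a valid copy of lba."""
    for buf in chain:
        if not (buf.start <= lba < buf.start + buf.span):
            continue                      # lba is outside this buffer's range
        page = buf.primary[lba % NPB]     # PageIndex = LBA mod NPB
        if page is not None and page[0] == lba and page[2]:
            return page[1]
        for page in reversed(buf.overflow):   # newest overflow page first
            if page[0] == lba and page[2]:
                return page[1]
        return None                       # range matched, data not cached
    return None                           # no buffer covers lba


buf = CachingBuffer(start=0, span=8)
buf.primary[1] = (1, "a", False)      # old version, already invalidated
buf.overflow.append((1, "b", True))   # current version lives in the overflow
found = lookup([buf], 1)
```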

2.2.2 Read Requests and Write Requests

When a read request arrives, the LBA of the request is checked to see if the required data is cached in the flash memory. The corresponding entry of the given LBA is first derived by hashing. The corresponding caching buffer of the LBA is then derived by searching over the buffers associated with the entry. If the target caching buffer is not found, the data must be retrieved from the corresponding disk and cached for any future access by allocating a new caching buffer to the entry. If such a caching buffer is found, then the given LBA is searched for in the primary block and the overflow block to locate the data. If the data is available in the cache, the read request can be satisfied immediately without accessing any disk. If it is not found in any of the blocks, then a read operation to the proper disk is needed to retrieve the data, and the retrieved data must be cached. When data is retrieved from a disk, the system knows the status of that device and might take the opportunity to write dirty data back to the corresponding disk, which helps prevent the disk from being disturbed later while it is spun down.


When a write request arrives, it is checked to see if its LBA falls in any corresponding caching buffer. The caching buffer of the LBA is derived by searching over the buffers associated with the hashed entry. If no such caching buffer exists, a new caching buffer is allocated and attached to the corresponding entry of the entry table. We always try to cache the data in the primary block first. If the corresponding page of the primary block is occupied by an old version of the data of the LBA, the page is invalidated. We then try to cache the data in the first available page of the overflow block. An overflow block is allocated for the caching buffer if it does not exist.

If an overflow block exists, it must be checked to see if any free page is available. Garbage collection is invoked to reclaim invalid pages of the primary and overflow blocks if there is no free page left in the overflow block. When the data is cached in the overflow block, any page holding the old version of the data is invalidated. After garbage collection, the data is written to the primary or overflow block, as described above. Note that the corresponding page of the data in the primary block might still be occupied because of a hash collision. In other words, an overflow block might still be needed, in which case the data are cached in the overflow block.
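The write path can be illustrated with a small in-memory model of one caching buffer. This is only a sketch under simplifying assumptions: pages are Python lists, garbage collection simply copies valid pages into fresh blocks (the write-back decisions of Section 2.3.1 are left out), and the class and method names are hypothetical.

```python
NPB = 4  # pages per block, kept tiny for illustration


class CachingBuffer:
    def __init__(self):
        self.primary = [None] * NPB   # each used page is [lba, data, valid]
        self.overflow = None          # allocated lazily, filled sequentially
        self.gc_runs = 0

    def _invalidate(self, lba):
        for page in self.primary + (self.overflow or []):
            if page is not None and page[0] == lba:
                page[2] = False       # written pages stay occupied until erased

    def _place(self, lba, data):
        slot = lba % NPB              # hashed page in the primary block
        if self.primary[slot] is None:
            self.primary[slot] = [lba, data, True]
            return True
        if self.overflow is None:     # allocate the overflow block on demand
            self.overflow = []
        if len(self.overflow) < NPB:  # first free page of the overflow block
            self.overflow.append([lba, data, True])
            return True
        return False                  # no free page left

    def _garbage_collect(self):
        old = list(reversed(self.overflow or [])) + self.primary
        live = [p for p in old if p is not None and p[2]]
        self.primary, self.overflow = [None] * NPB, None   # fresh blocks
        self.gc_runs += 1
        for lba, data, _ in live:
            self._place(lba, data)

    def write(self, lba, data):
        self._invalidate(lba)         # any old version becomes invalid
        if self._place(lba, data):
            return
        self._garbage_collect()       # overflow block is full
        if not self._place(lba, data):
            raise RuntimeError("no reclaimable page; replacement is needed")

    def read(self, lba):
        for page in self.primary + (self.overflow or []):
            if page is not None and page[0] == lba and page[2]:
                return page[1]
        return None


buf = CachingBuffer()
for version in "abcdef":   # six updates of the same LBA
    buf.write(0, version)
```

In this run the first write lands in the primary block, the next four fill the overflow block, and the sixth triggers one garbage collection before being placed in the new primary block.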

2.3 Garbage Collection and Data Replacement

2.3.1 Garbage Collection

When there is no free page in an overflow block, garbage collection should start to reclaim the invalid pages of the overflow block and its corresponding primary block. If the disk is neither spun down nor idle during the garbage collection, data in the blocks that correspond to write requests should be written back to the disk. The proposed garbage collection strategy is based on two major ideas: (1) If the disk is spun down or idle, the system should avoid writing data cached in the blocks back to the disk whenever possible. (2) When valid pages of the two blocks are written back to the caching buffer (with new primary and overflow blocks), they are written back in an LRU fashion. That is, valid pages in the overflow block are written back to the buffer earlier than those in the primary block, and valid pages in the overflow block are written back from the bottom to the top of the overflow block.

A new primary block is allocated and associated with the caching buffer, and an overflow block is not allocated until necessary. If the disk is spun down, all of the data that correspond to writes are kept in the cache whenever possible. Otherwise, that data is written to the corresponding disk, and the remaining valid pages of the previous primary and overflow blocks (which correspond to reads) are written back to the new primary and overflow blocks of the caching buffer in an LRU fashion. The previous primary and overflow blocks are then inserted into a queue for erasure.
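The two ideas can be sketched as a function that decides, for one caching buffer, which valid pages are copied into the new blocks and which are flushed to the disk. The page representation (dictionaries with `lba`, `valid`, and `dirty` keys), the single `disk_sleeping` flag covering both "spun down" and "idle", and the convention that the last list element is the bottom of the overflow block are all assumptions of this illustration.

```python
def garbage_collect(primary, overflow, disk_sleeping):
    """Return (pages_to_keep, pages_to_flush) for one caching buffer.

    primary/overflow are lists of pages; a page is a dict with the keys
    'lba', 'valid', 'dirty', and None marks a free page.
    """
    # LRU-style order: overflow pages from bottom to top, then primary pages.
    ordered = list(reversed(overflow)) + [p for p in primary if p is not None]
    valid = [p for p in ordered if p["valid"]]
    if disk_sleeping:
        return valid, []          # keep everything, do not wake the disk
    keep = [p for p in valid if not p["dirty"]]    # data cached by reads
    flush = [p for p in valid if p["dirty"]]       # data cached by writes
    return keep, flush


primary = [{"lba": 0, "valid": True, "dirty": False}, None]
overflow = [
    {"lba": 1, "valid": False, "dirty": True},   # stale version of LBA 1
    {"lba": 1, "valid": True, "dirty": True},
    {"lba": 2, "valid": True, "dirty": False},
]
keep_asleep, flush_asleep = garbage_collect(primary, overflow, disk_sleeping=True)
keep_awake, flush_awake = garbage_collect(primary, overflow, disk_sleeping=False)
```

When the disk is asleep, all three valid pages are retained; when it is active, the dirty copy of LBA 1 is flushed and only the two read-cached pages are rewritten.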

2.3.2 Replacement Strategy

The entry table of caching buffers, which changes over time, is used to do bookkeeping for data in the cache. Whenever a new block cannot be allocated, we must execute a replacement strategy to recycle one or more caching buffers and their associated blocks. The basic idea is to pick the LRU caching buffer for replacement so as to keep cache misses low. Blocks of the flash memory are considered as a circular array, and a free pointer always points to a free block, as shown in Figure 2(a). Whenever a free block is needed, the free block pointed to by the pointer is returned, and the pointer moves along the circular array to the next free block. Examples of the allocation of free blocks are shown in Figure 2(b).

Figure 2: Allocations of Free Blocks. (a) Before Allocation. (b) After Allocation.

To speed up the search for a free block and to help locate the LRU caching buffer, an access map, which is an array of bits, is introduced to keep the access record. Each bit in the access map corresponds to a unique block in the circular array. When any block of a caching buffer is accessed, the corresponding bit is set to 1. A replacement pointer, which initially equals the free pointer, moves along the circular array whenever there is a need to locate an LRU caching buffer or to recycle used blocks. When the replacement pointer moves, it stops at a bit with value 0. The caching buffer corresponding to that block is considered the LRU buffer and is recycled. If the replacement pointer moves onto a bit with value 1, the bit is set to 0, and the pointer moves to the next bit, as shown in Figure 3.
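The free pointer and the access map behave much like the well-known clock (second-chance) approximation of LRU. The sketch below models only block indices. It returns a single victim block rather than recycling a whole caching buffer, and it skips blocks that are not in use; these simplifications, and all names, are assumptions of this illustration.

```python
class BlockPool:
    def __init__(self, n_blocks):
        self.in_use = [False] * n_blocks
        self.access = [0] * n_blocks      # the access map, one bit per block
        self.free_ptr = 0
        self.repl_ptr = 0                 # initially equal to the free pointer

    def allocate(self):
        """Return the next free block along the circular array, or None."""
        n = len(self.in_use)
        for step in range(n):
            i = (self.free_ptr + step) % n
            if not self.in_use[i]:
                self.in_use[i] = True
                self.free_ptr = (i + 1) % n
                return i
        return None                       # no free block: replacement needed

    def touch(self, i):
        self.access[i] = 1                # block i was accessed

    def release(self, i):
        self.in_use[i] = False
        self.access[i] = 0

    def pick_victim(self):
        """Advance the replacement pointer to a used block whose bit is 0."""
        if not any(self.in_use):
            return None
        n = len(self.in_use)
        while True:
            i = self.repl_ptr
            self.repl_ptr = (i + 1) % n
            if not self.in_use[i]:
                continue
            if self.access[i]:
                self.access[i] = 0        # clear the bit and move on
            else:
                return i


pool = BlockPool(4)
blocks = [pool.allocate() for _ in range(4)]   # the pool is now full
pool.touch(0)
pool.touch(1)
victim = pool.pick_victim()   # bits of blocks 0 and 1 are cleared on the way
```

The loop in `pick_victim` terminates because every bit it passes is cleared, so a second sweep always finds a used block with value 0.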

Figure 3: The Access Map and Replacement. (a) Before Replacement. (b) After Replacement.

2.4 Rebuilding Procedure of the Entry Table

When a computer shuts down normally, there exist many strategies for accelerating the rebuilding of the entry table. This section illustrates a simple procedure for rebuilding the entry table after a system crash by scanning blocks on the flash memory without any auxiliary information.


To create the entry table, we examine blocks with valid pages. If the written pages in a block are scattered, then the block must be a primary block. We restore the information of the primary block for its corresponding caching buffer and then associate the buffer with the corresponding entry. On the other hand, if all of the written pages in a block are written in sequential order, then the block might be either a primary block or an overflow block. Each written page in the block must be checked to see if its page index is consistent with the one derived from the page-index hashing of its LBA. If there is any inconsistency, then the block must be an overflow block, and the information of the overflow block must be restored for the corresponding caching buffer; otherwise, the block can be either a primary block or an overflow block, depending on whether another block is found to be associated with its corresponding caching buffer.
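The classification step can be sketched as below. The sketch assumes that each written page records its LBA (for example in a spare area), that "sequential" means the written pages occupy physical pages 0, 1, 2, ... without gaps, and that PageIndex = LBA mod NPB is the page-index hash. The function name and return labels are hypothetical.

```python
def classify_block(pages, npb):
    """pages[i] is the LBA stored in physical page i, or None if unwritten."""
    written = [i for i, lba in enumerate(pages) if lba is not None]
    if not written:
        return "free"
    sequential = written == list(range(len(written)))
    if not sequential:
        return "primary"                  # scattered pages: hashed placement
    if any(pages[i] % npb != i for i in written):
        return "overflow"                 # index disagrees with the hash
    return "primary-or-overflow"          # needs the buffer's other block


kind_a = classify_block([None, 5, None, 7], npb=4)
kind_b = classify_block([6, 6, 1, None], npb=4)
kind_c = classify_block([4, 5, None, None], npb=4)
```

The ambiguous case is resolved only after the scan shows whether another block belongs to the same caching buffer, as the text describes.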

3. PERFORMANCE EVALUATION

3.1 Experiment Setup

This section evaluates the performance of the proposed implementation strategies in terms of energy efficiency, read/write response time, and number of block erasures. Four different capacities of the flash-memory cache were simulated for the performance evaluation, and the impacts of the stretch factor K were explored. The size of a flash-memory block was 16KB, and the number of entries M in the entry table (denoted EN in Section 2.2.1) was set to 16384. In addition to comparisons between different cache sizes and stretch factors, two well-known caching-management mechanisms, a direct mapped cache and a set associative cache, were simulated for comparison.

The trace of data accesses for the performance evaluation was collected over an 80GB hard disk of a personal computer with 1GB of RAM and an AMD Athlon64 K8-3000+ 939 CPU. The operating system was Windows XP SP2, and the hard disk was formatted as NTFS. Traces were collected by DiskMon³, and the duration of trace collection was one month. The workload of the personal computer in accessing the hard disk corresponded to the daily use of most people, i.e., web surfing, movie playing, peer-to-peer file sharing, e-mail sending/receiving, and document typesetting/reading/editing. To evaluate the flash-memory cache in a steady state, we used the first week of the trace to fill up the flash-memory cache and collected statistics for the rest of the trace, so that the effect of garbage collection could be observed.

3.2 Experiment Results

3.2.1 The Total Idle Time

Before we demonstrate the energy efficiency under various caching implementation strategies, the distribution of disk idle times, which affects the energy consumption, is worth noting. Figure 4 shows the impact of different implementation strategies on disk idle times. Time intervals between any two consecutive disk accesses were compiled and ranked into six degrees according to their lengths. Note that idle-time intervals of less than two seconds were filtered out, since spinning the disk down and then spinning it up again within two seconds does not help in power saving.
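The compilation of idle intervals can be sketched as follows. The six ranges are the ones shown in the legend of Figure 4; how fractional intervals between two ranges are assigned (here, to the first range whose upper bound is not exceeded) is an assumption of this illustration, as are the function and variable names.

```python
# (lower, upper) bounds in seconds, taken from the legend of Figure 4.
BINS = [(2, 15), (16, 60), (61, 240), (241, 1200), (1201, 2400), (2401, None)]


def idle_time_by_bin(access_times, min_idle=2.0):
    """Sum the idle time between consecutive disk accesses per range."""
    totals = [0.0] * len(BINS)
    for prev, cur in zip(access_times, access_times[1:]):
        gap = cur - prev
        if gap < min_idle:
            continue                      # too short to be worth a spin-down
        for i, (_, upper) in enumerate(BINS):
            if upper is None or gap <= upper:
                totals[i] += gap
                break
    return totals


# Disk accesses at t = 0, 1, 11, 111, and 5111 seconds.
totals = idle_time_by_bin([0, 1, 11, 111, 5111])
```

For this toy trace the 1-second gap is dropped, and the gaps of 10, 100, and 5000 seconds fall into the first, third, and last ranges.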

As the cache size became larger, more data could be retained in the flash-memory cache. As a result, many access requests to the disk could be fulfilled by accessing the flash-memory cache, and time intervals between two consecutive disk accesses could be prolonged. Different stretch factors K resulted in different data placements. As K became larger, the disk idle time improved. It can be observed that the total idle time achieved by setting K = 8 for a 1024MB flash-memory cache was almost comparable to that achieved by a 4096MB flash-memory cache with K = 1. This was because a large K can prevent a huge but infrequently accessed file, e.g., a movie clip, from spreading over numerous caching buffers. In other words, the chances of swapping out frequently accessed data while sequentially accessing such a huge file were reduced when the stretch factor was large. Due to its flexible management of data placement, the proposed implementation strategy outperformed a direct mapped cache and a set associative cache.

Figure 4: The Distribution of Idle Times. (Total idle time in seconds for each implementation strategy: No Cache; 512MB, 1024MB, 2048MB, and 4096MB with K=1; 1024MB with K=2, K=4, and K=8; 1024MB Direct Mapped; 1024MB Set Associative. Idle intervals are grouped into 2sec~15sec, 16sec~60sec, 61sec~240sec, 241sec~1200sec, 1201sec~2400sec, and 2401sec~.)

³ http://www.sysinternals.com/Utilities/Diskmon.html

3.2.2 The Energy Efficiency

In our simulation, the energy consumption under various implementation strategies was derived from the statistics of disk idle times, the number of disk spin-ups/spin-downs, and the number of flash-memory read/write/erase operations. To simplify the estimation, we assume the disk has only two modes, namely active and standby. No matter what action (seek/rotation/transfer) the disk was taking, we assume the consumed power was the same. When no action was taken for 30 seconds, the disk turned from the active mode into the standby mode. Note that a mode transition of the disk requires extra energy. The power consumption parameters are listed in Table 1.

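Device                     Parameter    Value
IBM Ultrastar 36Z15 [9]    Spin-down    13J
IBM Ultrastar 36Z15 [9]    Spin-up      135J
IBM Ultrastar 36Z15 [9]    Active       13.5W
IBM Ultrastar 36Z15 [9]    Standby      2.5W
Flash Memory [7]           Read         30mW
Flash Memory [7]           Write        60mW
Flash Memory [7]           Erase        60mW

Table 1: Power Consumption Parameters.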

Figure 5 compares the energy efficiency of the various implementation strategies for the 23-day trace. Suppose the energy consumed by the disk without any flash-memory cache was x, and the energy consumed by the disk with some implementation strategy was y. The saved energy in the figure was x − y, and we accordingly derived the saved energy ratio, which is (x − y)/x. The energy efficiency was dominated by idle times. A long idle-time interval was superior to several short ones because of lower spin-up and spin-down overheads, even when the total idle times were the same. The longer the idle-time interval in which a disk can stay, the better the energy efficiency it can achieve. As shown in Figure 5, we could save about 20% of the energy consumption by adopting a 4GB flash-memory cache for an 80GB disk.

Figure 5: Comparison of Energy Efficiencies. (Saved energy in joules for each implementation strategy. The saved energy ratios printed in the figure, in the order the strategies are listed, are 512MB K=1: 7.9%; 1024MB K=1: 8.97%; 2048MB K=1: 14.65%; 4096MB K=1: 19.94%; 1024MB K=2: 10.84%; 1024MB K=4: 12.34%; 1024MB K=8: 15.7%; 1024MB Direct Mapped: 5.54%; 1024MB Set Associative: 8.42%.)
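The disk part of this energy model can be sketched with the parameters of Table 1. The sketch assumes that spin-up and spin-down take no time, that the disk stays active during the first 30 seconds of every idle interval, and that flash-memory energy is left out because it would need per-operation durations; the function names are hypothetical.

```python
ACTIVE_W = 13.5      # power in active mode (Table 1)
STANDBY_W = 2.5      # power in standby mode (Table 1)
SPIN_DOWN_J = 13.0   # energy of one spin-down (Table 1)
SPIN_UP_J = 135.0    # energy of one spin-up (Table 1)
TIMEOUT_S = 30.0     # idle time before the disk enters standby


def disk_energy(total_time, idle_intervals):
    """Disk energy in joules for a run of total_time seconds."""
    standby = 0.0
    transitions = 0
    for idle in idle_intervals:
        if idle > TIMEOUT_S:
            standby += idle - TIMEOUT_S   # time actually spent in standby
            transitions += 1              # one spin-down plus one spin-up
    active = total_time - standby
    return (active * ACTIVE_W + standby * STANDBY_W
            + transitions * (SPIN_DOWN_J + SPIN_UP_J))


def saved_ratio(x, y):
    """Saved energy ratio (x - y) / x from the text."""
    return (x - y) / x


always_on = disk_energy(100, [])      # x: the disk never leaves active mode
one_long = disk_energy(100, [50])     # y: one 50-second idle interval
one_short = disk_energy(100, [35])    # only 5 seconds in standby
ratio = saved_ratio(always_on, one_long)
```

Under these assumptions, a 50-second idle interval saves energy, whereas a 35-second one costs more than it saves because the 148J transition overhead exceeds the 55J gained in standby. This is in line with the observation that one long idle interval beats several short ones.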

3.2.3 The Number of Block Erasures

Since flash memory has a limitation on the block-erasure count, the distribution of erase counts over flash-memory blocks was a major evaluation metric. The number of erasures of each flash-memory block was accumulated separately, and flash-memory blocks were sorted into groups according to their erase counts. The number of groups and the range of erase counts covered by each group indicated the quality of the achieved wear-levelling, from which the life cycle of a flash-memory cache can be estimated.

A large cache size improved not only the idle time but also the quality of wear-levelling. Erasures over flash-memory blocks were amortized when the cache size became large. Unlike for idle times, the impact of the cache size on the distribution of erase counts was more predictable. When the size of the flash-memory cache was doubled, the peak of the distribution roughly doubled, and the range of erase counts was roughly halved. On the other hand, although a large stretch factor was beneficial to idle times, it greatly deteriorated the quality of wear-levelling. When K became larger, the range of erase counts over flash-memory blocks expanded. In addition, the total number of erasures over flash-memory blocks increased as well. Table 2 lists the total erasures for the 23-day trace in the experiment. Since neither a direct mapped cache nor a set associative cache takes the out-place-update nature of flash memory into consideration, the deviations of erase counts among flash-memory blocks were large. In addition, their erasure overheads were also enormous, as shown in Table 2. Note that when K was set large in the proposed strategy, the performance gap in idle times could widen even further while the erasure overhead was still lower than that of a direct mapped cache or a set associative cache.

Configuration               Total Erasure
512MB, K = 1                15,662,090
1024MB, K = 1               14,995,330
2048MB, K = 1               14,265,300
4096MB, K = 1               14,317,930
1024MB, K = 2               73,866,860
1024MB, K = 4               272,317,700
1024MB, K = 8               406,628,000
1024MB, Direct Mapped       418,747,600
1024MB, Set Associative     296,467,000

Table 2: Comparison of Total Erasures.

In the simulation over the 23-day trace, the maximum erase counts among all flash-memory blocks for the various implementation strategies are listed in Table 3. Based on this information, the life cycle of the flash-memory cache under different implementation strategies could be estimated. The flash-memory cache under the proposed implementation strategy (for cache size = 1024MB and K = 1) could last over 203 years, while a direct mapped cache could only work for 2.4 months and a set associative cache did not function well after 7 months.

Configuration               Maximum Erasure Counts (23-day)    Estimated Product Lifetime
512MB, K = 1                570                                110.6 years
1024MB, K = 1               310                                203.3 years
2048MB, K = 1               180                                350 years
4096MB, K = 1               150                                420 years
1024MB, K = 2               2,780                              22.67 years
1024MB, K = 4               26,600                             28.4 months
1024MB, K = 8               59,000                             12.8 months
1024MB, Direct Mapped       314,500                            2.4 months
1024MB, Set Associative     121,000                            6.25 months

Table 3: The Estimated Product Lifetime.
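The paper does not spell out how the lifetimes are computed, but the values in Table 3 appear consistent with a simple linear extrapolation: the most-erased block keeps wearing at the rate observed over the 23 days until it reaches the 10^6 erasure limit mentioned in Section 1. The sketch below is written under that assumption, with a 365-day year and months of 365/12 days; the function names are hypothetical.

```python
ENDURANCE = 10**6   # erasure limit per block, as stated in Section 1
TRACE_DAYS = 23     # length of the measured part of the trace


def lifetime_days(max_erase_count):
    """Days until the most-erased block reaches the erasure limit."""
    return ENDURANCE / max_erase_count * TRACE_DAYS


def lifetime_years(max_erase_count):
    return lifetime_days(max_erase_count) / 365


def lifetime_months(max_erase_count):
    return lifetime_days(max_erase_count) / (365 / 12)


proposed_years = lifetime_years(310)           # 1024MB, K = 1
direct_mapped_months = lifetime_months(314500)
set_assoc_months = lifetime_months(121000)
```

Under this reading, the lifetime is inversely proportional to the maximum erase count, so the roughly thousandfold gap between 310 and 314,500 erasures becomes the gap between centuries and months.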

3.2.4 The Read/Write Response Time

Since flash memory is a kind of EEPROM, a flash-memory cache has an intrinsic limitation in improving the performance of data accesses. In addition to the penalty of a disk access on a cache miss, the flash-memory cache suffered from erasure overhead when the utilization of the cache space was high. Without proper space management, read/write requests could suffer from a series of page reads, page writes, block erasures, and even disk accesses. In the proposed implementation strategy, the garbage collection was designed such that block erasures could be postponed until the system was idle. To illustrate the read/write performance, the simulation adopted the access parameters of the Samsung K9F6408U0A 8MB NAND Flash Memory and the Western Digital Caviar WD800JB 80GB 7200RPM 8MB IDE Ultra ATA-100 Hard Drive. Their performance characteristics are listed in Table 4.

Device             Read       Write       Erase
K9F6408U0A         36.55μs    226.65μs    2ms
Caviar WD800JB     13.1ms     13.1ms      N/A

Table 4: Performance Characteristics.

Figures 6(a) and 6(b) compare the average daily read/write response times among different cache sizes. As the cache size got larger, a better read/write response time was achieved. Figures 6(c) and 6(d) show the impact of the stretch factor on the average read/write response time. When the stretch factor became larger, the average read/write response time deteriorated quickly. As shown in the figure, when K = 8, the average write response time was even worse than that of the disk without any flash-memory cache. This was because a great number of erase operations were introduced.

Figure 6(e) compares the average daily read response times among different implementation strategies. As shown in the figure, the proposed strategy could save up to two-thirds of the read response time on a given day and one-third on average. On the other hand, a direct mapped cache did not improve the read response time in most cases, while the average read performance of a set associative cache lay between those of the proposed strategy and a direct mapped cache. Figure 6(f) compares the average write response times among different implementation strategies. As shown in the figure, the proposed strategy could save up to five-sixths of the write response time on a given day and two-thirds on average. Although a set associative cache is superior to a direct mapped cache in write response, the improvements of both in write response time were minor.

Figure 6: The Impacts of Cache Size, Stretch Factor, and Different Implementation Strategies (Cache Size = 1024MB) on the Average Read/Write Response Time. Panels: (a) Cache Size/Read, (b) Cache Size/Write, (c) Stretch Factor/Read, (d) Stretch Factor/Write, (e) Implementation/Read, (f) Implementation/Write. Each panel plots the average response time (ms) against the day of the trace (days 8 to 30).

4. CONCLUSION AND FUTURE WORK

This work targets the unique characteristics of flash memory in serving as a cache layer for disks. An efficient data lookup and caching strategy is proposed based on the idea of set associative caching, but the proposed strategy is more flexible and takes the nature of flash memory into consideration. The coupled garbage collection and replacement strategies are designed accordingly. Different stretch factors result in different data placements, through which a trade-off between energy efficiency and life cycle can be tuned.

Our trace-driven simulation shows that the length and frequency of disk idle times could be improved under the proposed strategy, from which up to 20% of the energy consumption could be saved. In addition, the flash-memory cache under the proposed implementation strategy could last over 203 years, while a direct mapped cache could only work for less than three months and a set associative cache did not function well after seven months. For data accesses, the proposed strategy could save up to two-thirds of the read response time on a given day and one-third on average over the 23-day trace. The performance improvement was even better for writes: the proposed strategy could save up to five-sixths of the write response time on a given day and two-thirds on average over the 23-day trace.

For future work, we shall implement a prototype of the proposed flash-memory caching scheme, so that more realistic experimental results and comparisons with related work (such as ReadyBoost and ReadyDrive of Windows Vista) can be obtained.

5. REFERENCES

[1] Aleph One Company. Yet Another Flash Filing System.

[2] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 1996.

[3] J.-W. Hsieh, L.-P. Chang, and T.-W. Kuo. Efficient On-Line Identification of Hot Data for Flash-Memory Management. In ACM SAC, pages 838–842, Mar 2005.

[4] M-Systems. Flash-memory Translation Layer for NAND flash (NFTL), 1998.

[5] B. Marsh, F. Douglis, and P. Krishnan. Flash Memory File Caching for Mobile Computers. In HICSS, pages 451–460, 1994.

[6] Microsoft Corporation. Windows Vista.

[7] Spansion. 3.0 Volt-only Flash Memory Technology.

[8] Y.-L. Tsai, J.-W. Hsieh, and T.-W. Kuo. Configurable NAND Flash Translation Layer. In IEEE SUTC, June 2006.

[9] J. Zedlewski, S. Sobti, N. Garg, F. Zheng, A. Krishnamurthy, and R. Wang. Modeling Hard-Disk Power Consumption. In FAST’03, pages 217–230, Mar 2003.
