
Department of Computer Science, National Chiao-Tung University, Hsin-Chu, Taiwan 300, ROC

kellemes13@gmail.com

Abstract

Multilevel flash memory cells double or even triple storage density, producing affordable solid-state disks for end users. However, flash lifetime is becoming a critical issue as solid-state disks gain popularity. Wear-leveling methods can prevent flash-storage devices from prematurely retiring any portion of flash memory. The two practical challenges of wear-leveling design are implementation cost and tuning complexity. This study proposes a new wear-leveling design that features both simplicity and adaptiveness. This design requires no new data structures, but utilizes the intelligence available in sector-translating algorithms. Using an on-line tuning method, this design adaptively tunes itself to reach a good balance between wear evenness and overhead. A series of trace-driven simulations shows that the proposed design outperforms a competitive existing design in terms of wear evenness and overhead reduction.

This study also presents a prototype that proves the feasibility of this wear-leveling design in real solid-state disks.

Categories and Subject Descriptors D.4.2 [Operating Systems]:

Garbage collection; B.3.2 [Memory Structures]: Mass Storage

General Terms Design, Performance, Algorithm.

Keywords Flash memory, wear leveling, solid-state disks.

1. Introduction

Solid-state disks are storage devices that employ solid-state memory like flash as the storage medium. The physical characteristics of flash memory differ from those of mechanical hard drives, necessitating different methods for memory accessing. Solid-state disks hide flash memory from host systems by emulating a typical disk geometry, allowing systems to switch from a hard drive to a solid-state disk without modifying existing software and hardware. Solid-state disks are superior to traditional hard drives in terms of shock resistance, energy conservation, random-access performance, and heat dissipation, attracting vendors to deploy such storage devices in laptops, smart phones, and portable media players.

This work is in part supported by research grant NSC-98-2220-E-009-048 from National Science Council, Taiwan, ROC.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

LCTES’11, April 11–14, 2011, Chicago, Illinois, USA.

Copyright © 2011 ACM 978-1-4503-0555-6/11/04…$10.00

Flash memory is a kind of erase-before-write memory. Because any one part of flash memory can only withstand a limited number of erase-write cycles, approximately 100K cycles under the current technology [17], frequent erase operations can prematurely retire a region of flash memory. This limitation affects the lifetime of solid-state disks in applications such as laptops and desktop PCs, which write to disks at very high frequencies. Even worse, recent advances in flash manufacturing technologies exacerbate this lifetime issue. In an attempt to break the entry-cost barrier, modern flash devices now use multilevel cells for double or even triple density. Compared to standard single-level-cell flash, multilevel-cell flash degrades the erase endurance by one or two orders of magnitude [18].

Localities of data access inevitably degrade wear evenness in flash. Partially wearing out a piece of flash memory not only decreases its total effective capacity, but also increases the frequency of its housekeeping activities, which further speeds up the wearing out of the rest of the memory. A solid-state drive ceases to function when the amount of its worn-out space in flash exceeds what the drive can manage. The wear-leveling technique ensures that the entire flash wears evenly, postponing the first appearance of a worn-out memory region. However, wear leveling is not free, as it moves data around in flash to prevent solid-state disks from excessively wearing any one part of the memory. These extra data movements contribute to overall wear.

Wear-leveling algorithms include rules defining when data movement is necessary and where data should move to and from. These rules monitor wear in the entire flash, and intervene when flash wear becomes unbalanced. Solid-state disks implement wear leveling at the firmware level, subjecting wear-leveling algorithms to crucial resource constraints. Prior research explores various wear-leveling designs under such tight resource budgets, revealing three major design challenges: First, monitoring the entire flash's wear requires considerable time and space overheads, which most controllers in present solid-state disks cannot afford. Second, algorithm tuning for environment adaptation and performance definition requires prior knowledge of flash access patterns, on-line human intervention, or both. Third, high implementation complexity discourages firmware programmers from adopting sophisticated wear-leveling algorithms.

Standard solid-state-disk microcontrollers (controllers in the rest of this paper) cannot afford the RAM space required to store the entire flash's wear information. Chang et al. [2] proposed caching only portions of wear information. However, periodic synching between the wear information in RAM and in flash introduces extra write traffic to flash. Jung et al. [9] proposed a low-resolution wear-information method based on the average wear of large memory regions. Nevertheless, this approach suffers from distortion whenever flash wearing is severely biased. Chang et al. [5] introduced bit-indicated recent wear history. However, recent wear history blinds wear leveling because recency and frequency are independent in terms of flash wear.

Almost all wear-leveling designs subject wear evenness to tunable threshold parameters [2, 5, 9]. The system environment in which wear leveling takes place includes many conditions, such as sector-translating algorithms, flash geometry, and host disk workloads. Even if the wear-leveling threshold remains unchanged, the results of using a wear-leveling algorithm under various system environments can be very different. Using inadequately tuned parameters can cause unexpectedly high wear-leveling overhead or unsatisfactory wear evenness. Existing approaches require human intervention or prior knowledge of the system environment for threshold tuning.

From a firmware point of view, implementation complexity largely determines the applicability of wear-leveling algorithms. The dual-pool algorithm [2] uses five priority queues of wear information and a caching method to reduce the RAM footprints of these queues. The group-based algorithm [9] and the static wear-leveling algorithm [5] add extra data structures to maintain coarse-grained wear information and the recent history of flash wear, respectively. These approaches ignore the information already available in sector-translating algorithms, which are firmware modules accompanying wear leveling, and unnecessarily increase their design complexity.

This study presents a new wear-leveling design, called the lazy wear-leveling algorithm, to tackle the three design challenges mentioned above. First, this design does not store wear information in RAM, but leaves all of this information in flash instead. Second, even though this algorithm uses a threshold parameter, it adopts an analytical model to estimate its overhead with respect to different threshold settings, and then automatically selects a good threshold for good balance between wear evenness and overhead. Third, the proposed algorithm utilizes the address-mapping information available in the sector-translating algorithms, eliminating the need to add extra data structures for wear leveling.

The rest of this paper is organized as follows: Section 2 reviews flash characteristics and the existing algorithms for sector translating and wear leveling. Section 3 presents the proposed wear-leveling algorithm, and Section 4 describes an adaptive tuning strategy for this algorithm. Section 5 reports the results of trace-driven simulations, and Section 6 presents an implementation of the proposed algorithm based on a real solid-state disk. Section 7 concludes this paper.

2. Problem Formulation

2.1 Flash-Memory Characteristics

Solid-state disks use NAND-type flash memory (flash memory for short) as a storage medium. A piece of flash memory is a physical array of blocks, and each block contains the same number of pages. In a typical flash geometry, a flash page is 2048 plus 64 bytes. The 2048-byte portion stores user data, while the 64-byte portion is a spare area for storing housekeeping data. Flash memory reads and writes in terms of pages, and it must erase a page before overwriting it. Flash erases in terms of blocks, each of which consists of 64 pages. Under the current technology, a flash block can sustain only a limited number of write-erase cycles before it becomes unreliable. This cycle limit depends on the flash manufacturing technology: a single-level-cell flash block endures 100K cycles [17], while this limit is 10K or less for multilevel-cell flash [18].
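To make these constraints concrete, the following minimal sketch models the geometry just described (2048+64-byte pages, 64-page blocks, and a per-block endurance limit). The class and field names are illustrative, not taken from the paper.

```python
# A toy model of the flash constraints described above; names are illustrative.
PAGE_DATA_BYTES = 2048      # user-data portion of a page
PAGE_SPARE_BYTES = 64       # spare area for housekeeping data
PAGES_PER_BLOCK = 64

class Block:
    def __init__(self, endurance=100_000):     # ~100K cycles for SLC flash
        self.pages = [None] * PAGES_PER_BLOCK  # None marks a free (erased) page
        self.erase_count = 0
        self.endurance = endurance

    def program(self, page_no, data):
        # Erase-before-write: a page may only be programmed while it is free.
        if self.pages[page_no] is not None:
            raise RuntimeError("page must be erased before being overwritten")
        self.pages[page_no] = data

    def erase(self):
        # Erasing applies to the whole block and consumes one endurance cycle.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1
        if self.erase_count > self.endurance:
            raise RuntimeError("block has exceeded its write-erase cycle limit")
```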

The rest of this paper uses the terms “flash blocks”, “physical blocks”, and simply “blocks” interchangeably.

Figure 1. The set-associative mapping scheme whose group size is two. Each data-block group is associated with up to one log-block group.

Solid-state disks emulate disk geometry using a firmware layer called the flash translation layer (FTL). FTLs update existing data out of place and invalidate old copies of the data to avoid erasing a flash block every time a piece of data is rewritten. Thus, FTLs require a mapping scheme to translate logical disk-sector numbers into physical locations in flash. Updating data out of place consumes free space in flash, and FTLs must recycle memory space occupied by invalid data with erase operations. Before erasing a block, FTLs copy all valid data from this block to other free space. This series of copy and erase operations for reclaiming free space is called garbage collection. Reducing data-copy overhead during garbage collection is a priority in FTL designs.
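As a rough illustration of out-of-place updating, the toy fragment below (a sketch with hypothetical structure names, not any particular FTL) redirects each write to a free page, merely marks the stale copy invalid, and leaves reclamation of invalid pages to garbage collection.

```python
# Toy FTL fragment: writes go to the next free page, the stale copy is only
# marked invalid, and reclaiming invalid pages is left to garbage collection.
NUM_PAGES = 8 * 64                 # a small flash of 8 blocks, 64 pages each

flash = [None] * NUM_PAGES         # page contents
state = ["free"] * NUM_PAGES       # each page is "free", "valid", or "invalid"
mapping = {}                       # lsn -> flash page number
next_free = 0

def write_sector(lsn, data):
    global next_free
    old = mapping.get(lsn)
    if old is not None:
        state[old] = "invalid"     # invalidate the old copy instead of erasing
    flash[next_free] = data
    state[next_free] = "valid"
    mapping[lsn] = next_free
    next_free += 1                 # garbage collection must replenish free pages

def read_sector(lsn):
    return flash[mapping[lsn]]
```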

2.2 Flash Translation Layers

FTLs are part of the firmware in solid-state disks. They use RAM-resident index structures to translate logical sector numbers into physical flash locations. Mapping resolutions have a direct impact on RAM-space requirements and write performance. Block-level mapping [21], adopted in many entry-level flash-storage devices like USB thumb drives, requires only small mapping structures. However, low-resolution mapping suffers from slow response when servicing non-sequential write patterns. Sector-level mapping [3, 6, 7] better handles random write requests, but requires large mapping structures, making its implementation infeasible in high-capacity solid-state disks.

Hybrid mapping combines both sector and block mapping for a good balance between RAM-space requirements and write performance. This method groups consecutive logical sectors into logical blocks as large as physical blocks. It maps logical blocks to physical blocks on a one-to-one basis using a block mapping table. If a physical block is mapped to a logical block, then this physical block is called the data block of this logical block. Any unmapped physical blocks are spare blocks. Hybrid mapping uses spare blocks as log blocks to serve new write requests, and uses a sector mapping table to redirect read requests to the newest versions of data in spare blocks.

Hybrid mapping requires two policies: the first policy forms groups of data blocks and groups of log blocks, and the second policy associates these two kinds of groups with each other. Figures 1 and 2 show two FTL designs that use different policies. Let lbn and pbn stand for a logical-block number and a physical-block number, respectively. The term lsn represents a logical-sector number, and disp is the page offset in a physical block.

The bold boxes stand for physical blocks, each of which has four pages. The numbers in the pages indicate the lsns of the data they store. White pages, shadowed pages, and pages with diagonal lines represent pages containing valid data, invalid data, and free space, respectively. The BMT and the SMT are the block mapping table and the sector mapping table, respectively.
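Putting these pieces together, the read path of a hybrid-mapping FTL can be sketched as follows. This is our own simplified rendering; the four-page blocks match Fig. 1, but the example bmt/smt contents are illustrative values that only loosely follow the figure.

```python
# Hybrid-mapping read path, simplified: sectors rewritten into log blocks are
# resolved through the sector mapping table (SMT); all other sectors go
# through the block mapping table (BMT) plus the in-block page offset.
PAGES_PER_BLOCK = 4                  # matches the four-page blocks of Fig. 1

def lookup(lsn, smt, bmt):
    """Return the (pbn, page offset) holding the newest copy of sector lsn."""
    if lsn in smt:                   # newest copy lives in a log block
        return smt[lsn]
    lbn = lsn // PAGES_PER_BLOCK     # logical block containing this sector
    disp = lsn % PAGES_PER_BLOCK     # page offset within that block
    return (bmt[lbn], disp)          # data block mapped one-to-one by the BMT

# Example mappings (illustrative values only, loosely following Fig. 1).
bmt = {0: 0, 1: 2, 2: 5}             # lbn -> pbn of its data block
smt = {5: (3, 1)}                    # lsn 5 was rewritten into log block 3
print(lookup(5, smt, bmt))           # -> (3, 1), served from the log block
print(lookup(9, smt, bmt))           # -> (5, 1), served from its data block
```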

Figure 2. The fully-associative mapping scheme. All data blocks are in one group and all log blocks are in the other.

Let the group size denote the number of blocks in a group. In Fig. 1, the group size of data blocks is exactly two, while the group size of log blocks is no larger than two. This mapping scheme, called set-associative mapping, associates a data-block group with one log-block group or none. This design has two important variants: set-associative sector translation (SAST), developed by Park et al. [15], and block-associative sector translation (BAST), developed by Chung et al. [22]. SAST uses two variables, N and K, to set the group sizes of data blocks and log blocks, respectively. BAST is simpler, always fixing N=1 and K=1. Figure 2 depicts another mapping scheme, called fully-associative mapping. This method has only two groups associated with each other, one for all data blocks and the other for all log blocks. Fully-associative sector translation (FAST), developed by Lee et al. [12], is based on this design.
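The grouping policies can be summarized by how a logical-block number maps to a data-block group. The sketch below is our own formulation of that rule, not code from the cited designs.

```python
# Our formulation of the grouping rules: under SAST, N consecutive logical
# blocks form one data-block group, associated with at most K log blocks;
# BAST is the special case N = K = 1; under FAST every data block belongs to
# a single group that shares one group of log blocks.
def sast_group(lbn, N):
    """Data-block group index of logical block lbn under SAST."""
    return lbn // N

def bast_group(lbn):
    return sast_group(lbn, 1)        # BAST: one data block per group

def fast_group(lbn):
    return 0                         # FAST: all data blocks share group 0
```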

2.3 The Need for Wear Leveling

FTLs write new data in log blocks allocated from spare blocks. When they run low on spare blocks, FTLs start erasing log blocks. Before erasing a log block, FTLs collect the valid data from the log block and from the data block associated with this log block, copy this valid data to a blank block, remove the sector-mapping information related to the log block, redirect the block-mapping information to the copy-destination block, and finally erase the old data block and log block into spare blocks. This procedure is called a merge operation or garbage collection.
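The sketch below walks through those merge steps for one logical block and its log blocks, reusing the hypothetical smt/bmt structures from the earlier lookup sketch. It is a simplification of the procedure just described, not an FTL implementation.

```python
# Simplified merge of one logical block: collect the newest copy of each
# sector, copy it to a blank block, fix the mappings, and erase the old
# data block and log blocks back into the spare pool.
PAGES_PER_BLOCK = 4
flash = {}                                   # (pbn, disp) -> sector data

def merge(lbn, log_pbns, smt, bmt, spares):
    old_pbn = bmt[lbn]
    dest_pbn = spares.pop()                  # blank block receiving the copy
    for disp in range(PAGES_PER_BLOCK):
        lsn = lbn * PAGES_PER_BLOCK + disp
        src = smt.pop(lsn, (old_pbn, disp))  # newest copy: log block if mapped
        flash[(dest_pbn, disp)] = flash.get(src)
    bmt[lbn] = dest_pbn                      # redirect the block mapping
    for pbn in [old_pbn] + log_pbns:         # old blocks return as spare blocks
        for disp in range(PAGES_PER_BLOCK):
            flash.pop((pbn, disp), None)
        spares.append(pbn)
```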

For example, in Fig. 1, the FTL decides to erase the group consisting of the log blocks at pbns 3 and 6. This log-block group is associated with the group of data blocks at pbns 0 and 2. The FTL prepares a group of two blank blocks at pbns 7 and 8. Next, the FTL collects the four valid sectors at lsns 0 through 3, and writes them to the blank block at pbn 7. Similarly, the FTL copies the valid sectors at lsns 4 through 7 to the blank block at pbn 8. Finally, the FTL erases the physical blocks at pbns 0, 2, 3, and 6 into spare blocks, and then re-maps lbns 0 and 1 to the physical blocks at pbns 7 and 8, respectively.

Log-block-based FTLs exhibit some common behaviors in the garbage-collection process regardless of their grouping and associating policies. FTLs never erase a data block if none of its sector data have been updated. In the set-associative mapping illustration in Fig. 1, erasing the data block at pbn 5 does not reclaim any free space. Similarly, in the fully-associative mapping illustration in Fig. 2, erasing any of the log blocks does not involve the data block at pbn 5. This is a potential cause of uneven flash wear.

Figure 3. Flash wear in a solid-state disk under the disk workload of a laptop. (a) A fragment of the disk-write workload and (b) the final distribution of flash blocks’ erase counts.

Figure 3(a) shows a fragment of the disk-write traces recorded from a laptop PC’s daily use (the NOTEBOOK workload in Section 5.1). The X-axis and the Y-axis of this figure represent the logical time and the lsns of write requests, respectively. This pattern biases write requests toward a small collection of disk sectors. Let a physical block’s erase count denote how many write-erase cycles this block has undergone. After replaying the trace set on a real solid-state disk that adopts a FAST-based FTL (Section 6.1 describes this product in more detail), Fig. 3(b) shows that the final distribution of erase counts is severely unbalanced. The X-axis and Y-axis of Fig. 3(b) represent the pbns and erase counts of physical blocks, respectively. Nearly 60% of all physical blocks have zero erase counts, as the horizontal line at the bottom of Fig. 3(b) shows. In other words, this workload wears only 40% of all blocks, while the rest remain fresh. Evenly distributing erase operations could double the flash lifespan compared to that without wear leveling.
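A skew like that of Fig. 3(b) is easy to quantify from per-block erase counts; the sketch below shows one way to do so. The sample counts are made up for illustration and are not measured data from the trace.

```python
# Summarize the unevenness of an erase-count distribution.
import statistics

def wear_summary(erase_counts):
    never = sum(1 for c in erase_counts if c == 0) / len(erase_counts)
    return {
        "fraction_never_erased": never,
        "mean": statistics.mean(erase_counts),
        "stdev": statistics.pstdev(erase_counts),  # a simple evenness indicator
        "max": max(erase_counts),
    }

# Roughly a 60%/40% split like the one observed in Fig. 3(b).
print(wear_summary([0] * 60 + [50] * 40))
```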

2.4 Prior Wear-Leveling Strategies

This section provides a conceptual overview of existing wear-leveling designs. Static wear leveling moves static/immutable data away from lesser-worn flash blocks, encouraging FTLs to start erasing these blocks. Flash vendors including Numonyx [14], Micron [13], and Spansion [20] suggest using static wear leveling for flash lifetime enhancement. Chang et al. [5] described a static wear-leveling design, and later Chang et al. [2] showed that this design is competitive with existing approaches. However, the experiments in this study reveal that static wear leveling suffers from uneven flash wear in the long term.
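A generic sketch of this idea follows; it is our own simplification, assuming per-block erase counts are somehow available, and does not reproduce any specific cited design.

```python
# Generic static wear leveling: when the least-worn block lags far behind the
# average, migrate its (static) data so the FTL will eventually erase and
# reuse that block.
def static_wear_level(erase_counts, threshold, migrate_block):
    avg = sum(erase_counts) / len(erase_counts)
    coldest = min(range(len(erase_counts)), key=lambda b: erase_counts[b])
    if avg - erase_counts[coldest] > threshold:
        migrate_block(coldest)   # its static data moves; the block becomes erasable
```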

Hot-cold swapping exchanges data in a lesser worn block with data from a badly worn block. Jung et al. [9] presented a hot-cold swapping design. However, Chang et al. [2] showed that hot-cold swapping risks erasing the most worn flash block pathologically.

Cold-data migration relocates immutable data to excessively worn blocks and then isolates these worn blocks from wear leveling until they are no longer excessively worn compared to other blocks. Chang et al. [2] described a design based on this idea. This design adopts five priority queues to sort blocks in terms of their wear information and a cache mechanism to store only frequently accessed wear information. However, synching the wear information between the cache and flash introduces extra write traffic to flash, and its higher implementation complexity may be a concern for firmware designers.
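The basic mechanism can be sketched as follows; this is a generic simplification of cold-data migration under our own naming, not the dual-pool algorithm itself.

```python
# Generic cold-data migration: park immutable data on the most-worn block and
# exclude that block from further wear leveling until other blocks catch up.
def cold_data_migrate(erase_counts, isolated, threshold, relocate_cold_data_to):
    avg = sum(erase_counts) / len(erase_counts)
    # Release isolated blocks whose wear is no longer excessive.
    isolated.difference_update(
        {b for b in isolated if erase_counts[b] <= avg + threshold})
    candidates = [b for b in range(len(erase_counts)) if b not in isolated]
    hottest = max(candidates, key=lambda b: erase_counts[b])
    if erase_counts[hottest] - avg > threshold:
        relocate_cold_data_to(hottest)   # immutable data lands on the worn block
        isolated.add(hottest)            # stop erasing it for a while
```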

Unlike the wear-leveling designs above, which treat wear leveling and garbage collection as independent activities, Chiang et al. [6] and Kim et al. [11] proposed heuristic functions that score flash blocks by considering both garbage collection and wear leveling. In this case, FTLs erase the block with the highest score. However, erasing a block can require re-scoring all flash blocks. This task excessively stresses the controllers and delays ordinary read/write requests.
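The sketch below shows one possible shape of such a scoring function, weighing reclaimable space against wear. The weighting and the block fields are our own illustrative choices, not the cited heuristics.

```python
# One possible combined score: favor blocks whose erasure reclaims much
# invalid space, and penalize blocks that are already heavily worn.
def score(block, avg_erase_count, wear_weight=0.5):
    reclaimed = block["invalid_pages"] / block["pages_per_block"]
    wear_penalty = block["erase_count"] - avg_erase_count
    return reclaimed - wear_weight * wear_penalty

def pick_victim(blocks):
    avg = sum(b["erase_count"] for b in blocks) / len(blocks)
    return max(blocks, key=lambda b: score(b, avg))   # erase the top-scoring block
```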

Figure 4. Physical blocks and their erase recency and erase counts. An upward arrow indicates that a block has recently increased its erase count.

There are compromises between algorithm concept and implementation, because the controllers can offer only very limited resources. Even though different wear-leveling designs are based on the same concept, they can have very different resource demands and performance characteristics. For example, among the different designs of static wear leveling, Chang et al. [5] proposed using a periodically refreshed bitmap to indicate blocks that have not recently been erased. In contrast, the designs from Numonyx [14] and Chang and Kuo [4] store blocks’ erase counts in RAM, and involve the block with the smallest erase count in wear leveling.
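The two bookkeeping styles can be contrasted with a short sketch; this is our own simplified rendering, not the cited implementations.

```python
# A periodically cleared bitmap of recently erased blocks versus keeping full
# per-block erase counts in RAM.
class RecencyBitmap:
    def __init__(self, num_blocks):
        self.recently_erased = [False] * num_blocks

    def on_erase(self, pbn):
        self.recently_erased[pbn] = True

    def not_recently_erased(self):
        return [b for b, hit in enumerate(self.recently_erased) if not hit]

    def refresh(self):                      # periodic reset of the history
        self.recently_erased = [False] * len(self.recently_erased)

class EraseCountTable:
    def __init__(self, num_blocks):
        self.counts = [0] * num_blocks      # full wear information kept in RAM

    def on_erase(self, pbn):
        self.counts[pbn] += 1

    def least_worn(self):
        return min(range(len(self.counts)), key=self.counts.__getitem__)
```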

Lazy wear leveling (the proposed approach) is rooted in cold-data migration. However, different from the dual-pool algorithm [2], which is also based on cold-data migration, lazy wear leveling adopts the following innovative designs. First, lazy wear leveling does not store blocks’ wear information in RAM. It leaves this information in flash instead, and utilizes the mapping information available in FTLs to assist wear leveling. In contrast, the dual-pool algorithm requires RAM space to store blocks’ wear information and to monitor the wear of the entire flash.

