THESIS ORGANIZATION - Design and Implementation of a Log-Based Flash Memory Storage System

CHAPTER 1 INTRODUCTION

1.2 THESIS ORGANIZATION

The rest of this thesis is organized as follows. Chapter 2 describes the background of flash memory and the cleaning process. Chapter 3 introduces related work. Chapter 4 describes the design and implementation of LFSS. Performance results are given in Chapter 5. Finally, Chapter 6 presents the conclusion and future work.

Chapter 2 Background

2.1 Flash Characteristics

Flash is a form of Electrically Erasable Programmable Read-Only Memory (EEPROM). It is available in two major types: the traditional NOR flash and the newer, cheaper NAND flash.

Flash chips are divided into blocks, whose sizes are typically 64 KB or 128 KB on NOR flash and 8 KB on NAND flash. Both types of flash share one important characteristic: a write operation can only be performed on a clean block (i.e., a block in which all bits are logical one).

Read Cycle Time         120 ~ 200 ns
Write Cycle Time        6 ~ 9 us/byte
Block Erase Time        60 ~ 80 ms
Erase Block Size        64 KB or 128 KB
Erase Cycles / Block    100,000 ~ 1,000,000 times

Table 2.1 NOR Flash Characteristics

Table 2.1 lists the typical NOR flash memory characteristics. The read cycle time is comparable to the time of a DRAM read operation (i.e., less than 200ns). However, writing a byte to a clean block needs about 6us. Moreover, writing to a non-cleaned block requires the erase operation to be performed first, which takes around 70ms.

The lifetime of a block is measured in erase cycles, with a typical value of 100,000 to 1,000,000 erases. To balance the lifetime of all the blocks, most flash-based storage systems attempt to ensure that erase operations are evenly distributed around the whole flash chip. This is a well-known process named wear leveling or even wearing [12].

2.2 Data Update Problem

For flash memory, in-place update is not efficient since a block must be erased before being updated. Figure 2.1 shows the detailed operations for in-place update. For a flash block (e.g., 64 Kbytes or 128 Kbytes for Intel Series 2+ Flash Memory and 512 bytes for SanDisk flash memory), all data in the to-be-updated block must first be copied to a system buffer (a). Then, the data is updated in the buffer and the dirty block is erased (b). After the block has been erased, the block in the buffer is written back to the flash (c). It is worth mentioning that even a one-byte update requires a block read, a slow erase, and a block write. Therefore, in-place update results in poor performance. Moreover, in-place update violates the rule of even wearing: hot data blocks will soon reach their erase cycle limits.
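To give a feeling for the cost described above, the following is a rough estimate only. It uses the midpoints of the ranges in Table 2.1 and assumes, for illustration, that the read cycle time applies per byte; the constant and function names are hypothetical.

```python
# Rough cost of updating ONE byte in place on NOR flash.
# Timing values are midpoints of the ranges in Table 2.1; treating the read
# cycle time as a per-byte cost is an assumption made for this estimate.
BLOCK_SIZE = 64 * 1024        # bytes per erase block (64 KB case)
READ_NS_PER_BYTE = 160        # midpoint of 120 ~ 200 ns (assumed per byte)
WRITE_US_PER_BYTE = 7.5       # midpoint of 6 ~ 9 us/byte
ERASE_MS = 70                 # midpoint of 60 ~ 80 ms


def in_place_update_ms(block_size: int = BLOCK_SIZE) -> float:
    """Time in ms to read the whole block, erase it, and write it back."""
    read_ms = block_size * READ_NS_PER_BYTE / 1e6
    write_ms = block_size * WRITE_US_PER_BYTE / 1e3
    return read_ms + ERASE_MS + write_ms


def append_update_ms(num_bytes: int = 1) -> float:
    """Time in ms to write the new bytes into space that is already clean."""
    return num_bytes * WRITE_US_PER_BYTE / 1e3


if __name__ == "__main__":
    print(f"in-place update of one byte: {in_place_update_ms():.1f} ms")
    print(f"write to clean space:        {append_update_ms():.4f} ms")
```

Under these assumptions the block write-back dominates the total, and the in-place path costs hundreds of milliseconds for a single changed byte, while writing the same byte to clean space costs only a few microseconds.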

To avoid these problems, the non-in-place-update scheme was proposed. Figure 2.2 shows the detailed operations for non-in-place-update. Instead of updating data at the same address, new data is written to an empty space in the flash memory and the obsolete data is left as garbage. A software cleaner is triggered later to reclaim this garbage by migrating the valid data from the block to be cleaned to another block, and then erasing the block. The block then becomes available for storing new data.
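The non-in-place-update scheme can be sketched as a toy model. This is an illustration under simplifying assumptions (fixed-size pages, an in-memory mapping table), not the LFSS implementation; the class and method names are hypothetical.

```python
class LogFlash:
    """Toy model of non-in-place update: every write goes to the write pointer."""

    def __init__(self, num_pages: int):
        self.pages = [None] * num_pages      # physical pages; None means clean
        self.valid = [False] * num_pages     # True while a page holds live data
        self.location = {}                   # logical id -> physical page index
        self.write_ptr = 0                   # next clean page

    def write(self, logical_id: str, data: bytes) -> int:
        if self.write_ptr >= len(self.pages):
            raise RuntimeError("no clean space left; the cleaner must run first")
        old = self.location.get(logical_id)
        if old is not None:
            self.valid[old] = False          # old copy becomes garbage, no erase
        page = self.write_ptr
        self.pages[page] = data
        self.valid[page] = True
        self.location[logical_id] = page
        self.write_ptr += 1
        return page

    def read(self, logical_id: str) -> bytes:
        return self.pages[self.location[logical_id]]

    def garbage_pages(self) -> int:
        """Number of written pages that now hold obsolete data."""
        return sum(1 for i in range(self.write_ptr) if not self.valid[i])
```

Updating the same logical item twice leaves one obsolete page behind and never erases anything on the write path; reclaiming that page is left to the cleaner.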

Figure 2.1 : Updating Data in Place [figure omitted; (a) copy the block to the buffer; (b) update the data and erase this block; (c) write data to the same block in the flash]

Figure 2.2 : Non-in-place-update [figure omitted; labels: write pointer, data to be updated]

2.3 Flash Memory Cleaning Policies

2.3.1 Free Space Management : Segments

In order to get more free space for storing new data, the invalid data must be reclaimed. Many data management approaches divide the flash memory into large, fixed-size segments for ease of reclaiming invalid data. A segment is made up of a number of contiguous blocks, where the number may be different for different data management approaches. When the number of free segments is less than a certain threshold, a software cleaning process (i.e., the cleaner) will be triggered to reclaim the invalid data.

2.3.2 Three-stage Operations of Cleaning Process

The cleaning of the invalid data involves three stages, as shown in Figure 2.3. First, as Figure 2.3(a) illustrates, the cleaning process selects a victim segment and then identifies the valid data in it. In the next stage, shown in Figure 2.3(b), the valid data is copied into another free space. Finally, the victim segment is erased, as shown in Figure 2.3(c).
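The three stages can be sketched as follows. This is a minimal illustration: segments are modeled as Python lists of (payload, is_valid) pairs, and the victim is chosen as the segment with the most invalid entries, which is an assumption made here because victim selection is a separate policy question. All names are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, bool]          # (payload, is_valid)
Segment = List[Entry]


def clean_once(segments: List[Segment], free: Segment) -> int:
    """Run one cleaning pass and return the index of the victim segment."""
    # Stage 1: select a victim segment and identify its valid data.
    # Picking the segment with the most invalid entries is an assumption.
    victim_idx = max(
        range(len(segments)),
        key=lambda i: sum(1 for _, ok in segments[i] if not ok),
    )
    victim = segments[victim_idx]
    live = [entry for entry in victim if entry[1]]

    # Stage 2: copy the valid data into free space.
    free.extend(live)

    # Stage 3: erase the victim so that it can store new data.
    victim.clear()
    return victim_idx
```

Only the live entries are copied, so the amount of valid data in the victim directly determines how much work a cleaning pass costs.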

Figure 2.3 : Three Stages of Cleaning Process [figure omitted; legend: invalid, used, free; (a) select a segment to be cleaned; (b) copy valid data to free space; (c) erase block]

2.3.3 Issues of Cleaning Policies

In this section, we describe four issues that must be addressed by a cleaning policy.

(1) When should the cleaner be triggered? One approach is to continuously run it as a low-priority task in the background. Another approach is to trigger it at night, or when disk space is nearly exhausted.

(2) How many segments should the cleaner process at a time? Segment cleaning offers an opportunity to reorganize data on flash memory. In general, the more segments we clean at once, the more free space we get.

(3) Which segments should be cleaned? This is referred to as the segment selection algorithm. One may select a segment with the largest amount of garbage or select segments based on their attributes, such as age and update times.

(4) How should the valid blocks be arranged when they are copied out? One may try to enhance the locality by grouping files in the same directory into a single segment. Another possibility is to group the blocks with similar last-modified time into new segments. This can help cleaning policies execute more effectively.

2.3.4 Segment Selection Algorithm

In this section we shall introduce three segment selection algorithms.

2.3.4.1 Greedy Policy

The greedy method is the simplest algorithm, which selects a segment with the largest amount of garbage. According to the previous study [19], the greedy policy works well in the case of uniform access. However, it performs poorly under high locality of reference.

2.3.4.2 Cost-Benefit Policy

The cost-benefit policy [6] chooses to clean a segment that maximizes the formula:

$$\frac{\text{age} \times (1 - u)}{2u}$$

where u is the percentage of valid data in the segment, and therefore (1-u) stands for the amount of free space that can be reclaimed. The age indicates the time elapsed since the most recent modification (i.e., the last block invalidation or writing), and it is used to represent the hotness of the valid data. The 2u reflects the overheads of cleaning a segment (i.e., reading valid blocks and writing them to another segment). The cost-benefit policy performs well under high locality of reference. However, it does not perform as well as the greedy policy under uniform access.
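The difference between the two policies can be illustrated with a short sketch. It assumes that all segments have the same size (so the largest amount of garbage means the smallest u) and that the cost-benefit value takes the form age * (1 - u) / 2u, which is consistent with the terms described above. The type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentInfo:
    name: str
    u: float      # fraction of the segment holding valid data, 0 <= u <= 1
    age: float    # time elapsed since the most recent modification


def greedy_pick(segments: List[SegmentInfo]) -> SegmentInfo:
    """Greedy: the segment with the most garbage, i.e. the smallest u."""
    return min(segments, key=lambda s: s.u)


def cost_benefit_score(s: SegmentInfo) -> float:
    """age * (1 - u) / 2u; a segment without valid data costs nothing to clean."""
    if s.u == 0:
        return float("inf")
    return s.age * (1 - s.u) / (2 * s.u)


def cost_benefit_pick(segments: List[SegmentInfo]) -> SegmentInfo:
    """Cost-benefit: the segment that maximizes the score."""
    return max(segments, key=cost_benefit_score)
```

For a recently modified segment with u = 0.4 and age 1 and a long-untouched segment with u = 0.6 and age 100, the greedy policy picks the former while the cost-benefit policy picks the latter, because the large age outweighs the somewhat higher copying cost.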

2.3.4.3 Cost Age Times (CAT) Policy

A similar policy to the previous one is the Cost Age Times (CAT) policy [5,6], which chooses to clean segments that minimize the following formula:

$$\text{Cleaning Cost} \times \frac{1}{\text{Age}} \times \text{Number of Cleaning}$$

The cleaning cost is defined as u/(1-u), where u is the percentage of valid data in a segment. The cleaning cost reflects the ratio of overheads to the benefit, which should be minimized. The definition of age is similar to that in the cost-benefit policy. The number of cleaning stands for the number of times a segment has been erased. The basic idea of CAT is to minimize the cleaning costs and to give recently cleaned segments more time to accumulate garbage for reclamation. In addition, to achieve the goal of wear leveling, the segments with the fewest erases are given more chances to be selected for cleaning.
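A minimal sketch of CAT selection, assuming the score is Cleaning Cost * (1 / Age) * Number of Cleaning with Cleaning Cost = u / (1 - u) as defined above. The function names, the tuple layout, and the handling of the corner cases (u = 1, non-positive age) are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def cat_score(u: float, age: float, erase_count: int) -> float:
    """Cleaning Cost * (1 / Age) * Number of Cleaning; lower is better."""
    if u >= 1.0 or age <= 0:
        # Assumption: a full segment or a segment modified just now is
        # never a good candidate, so give it the worst possible score.
        return float("inf")
    cleaning_cost = u / (1.0 - u)
    return cleaning_cost * (1.0 / age) * erase_count


def cat_pick(segments: Iterable[Tuple[str, float, float, int]]) -> str:
    """segments holds (name, u, age, erase_count); returns the chosen name."""
    return min(segments, key=lambda s: cat_score(s[1], s[2], s[3]))[0]
```

With equal utilization, a segment that is both older and erased less often wins, which reflects the two goals named above: leaving recently cleaned segments alone and spreading erases evenly. Note that in this plain form a segment that has never been erased always scores zero.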

Chapter 3 Related Work

In this chapter, we introduce four kinds of related work. Section 3.1 describes the MTD subsystem, which is an interface layer between file systems and memory device drivers. Section 3.2 presents two well-known flash file systems, JFFS and MFFS. Section 3.3 introduces four flash-memory based storage systems that are related to our design. Section 3.4 describes cleaning policies used in flash memory.

3.1 Memory Technology Device (MTD) Subsystem for Linux

The Memory Technology Device (MTD) subsystem for Linux [8] provides generic support for various types of memory devices, especially for flash devices such as the M-Systems DiskOnChip and Common Flash Interface (CFI) onboard flash.

Figure 3.1 : MTD Subsystem [figure omitted; labels: File System; MTD User Modules (FTL, LFSS); MTD layer; MTD Hardware Device Drivers (CFI NOR Flash, NAND Flash)]

The aim of this subsystem is to provide a generic interface between the hardware drivers and the upper layers of the system. Hardware drivers only need to supply simple routines such as reading, writing, erasing, and querying the device. Data presentation of the device is handled by the upper layer components, such as FTL (Flash Translation Layer) and JFFS2, which are called MTD user modules (as shown in Figure 3.1). As the figure shows, LFSS is also implemented as an MTD user module.
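The division of labor described above can be illustrated with a small RAM-backed device that offers only the four kinds of routines a hardware driver supplies. This is a hypothetical Python model for illustration and not the Linux MTD API, which is a C interface; the class name, method names, and the returned dictionary are all assumptions.

```python
class RamBackedFlash:
    """Hypothetical driver-level interface: read, write, erase, and query only."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.num_blocks = num_blocks
        # An erased flash block reads as all ones.
        self.mem = bytearray(b"\xff" * (num_blocks * block_size))
        self.erase_counts = [0] * num_blocks

    def query(self) -> dict:
        """Report the device geometry to the upper layer."""
        return {"size": len(self.mem), "erase_size": self.block_size}

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.mem[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        # Flash programming can only clear bits (1 -> 0), modeled with AND.
        for i, byte in enumerate(data):
            self.mem[offset + i] &= byte

    def erase(self, block: int) -> None:
        start = block * self.block_size
        self.mem[start:start + self.block_size] = b"\xff" * self.block_size
        self.erase_counts[block] += 1
```

Everything else (mapping, cleaning, wear leveling) would live in a user module layered on top of such a device, which is why writing twice to the same location without an erase in between corrupts data in this model.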

3.2 Flash File Systems

3.2.1 JFFS

Figure 3.2 : JFFS in the Linux File System Framework [figure omitted; labels: User space (Applications, Shared library); Kernel space (Virtual File System, Ext2, Ext3, FAT, JFFS, Buffer Cache, Disk Driver, MTD, Flash Driver); Hardware (Disk Device, Flash Device)]

JFFS is a log-structured file system designed by Axis Communications AB in Sweden. It is especially used for flash devices on embedded systems. Figure 3.2 shows how JFFS fits into the file system framework in Linux. As the figure shows, JFFS sits between the VFS and the MTD layers. In addition, a major difference between JFFS and ordinary file systems is that the former does not rely on the buffer cache.

JFFS arranges a flash device as a circular area, as shown in Figure 3.3. Modifications to the file system are written at the tail (i.e., the start of the free chunks). Invalid data blocks are reclaimed from the head. The basic data structure used for storing data on the flash device is the raw node. Each raw node is divided into two parts, metadata and real data. Similar to JFFS, LFSS also uses a log-like structure to store data. However, the main difference is that LFSS separates the metadata and real data into different segments to reduce the system initialization time.

Figure 3.3 : Data Arrangement in JFFS [figure omitted; a circular area with a head pointer and a tail pointer; legend: valid data, invalid data, free space]

3.2.2 Microsoft Flash File System (MFFS)

Microsoft Flash File System (MFFS) [23] provides complete file system capabilities for DOS. It uses linked lists to store and manage data in flash memory. Data are allocated as variable-sized regions instead of fixed-size blocks, and the greedy policy is used for reclaiming invalid data. Previous research [13] reported that MFFS performs poorly when accessing large files. Specifically, its write performance degrades linearly with the growth of file size.

3.3 Flash-Memory Based Storage Systems

In this section, we introduce four flash-memory based storage systems.

3.3.1 Dynamic Data Clustering Server (DAC Server)

DAC server [5] uses the DAC approach to cluster data on flash memory. It manages flash memory as fixed-size blocks and uses the non-in-place-update scheme for data blocks to avoid per-update erasing. DAC server classifies data according to their write access frequencies and dynamically clusters them when the data is updated or when the segments are cleaned. However, the DAC approach is not suitable for our environment, which has an additional battery-backed SDRAM buffer.

3.3.2 eNVy

eNVy [26] is a large flash memory-based storage system, which provides a memory interface rather than a block-based disk interface. The hardware consists of the flash memory, a small battery-backed SRAM for write-buffering, a high-bandwidth parallel data path between them, and a controller for page mapping and cleaning. Figure 3.4 shows the page-remapping technique of eNVy.

Figure 3.4 : Page Remapping in eNVy (for a Write to Page 2)

Data update is performed with the help of page-remapping in order to prevent update in place. eNVy uses a hybrid cleaning policy that combines FIFO and locality gathering to minimize the cleaning costs for uniform access and high locality of reference. Simulation results show that it can handle 30,000 transactions per second at a flash utilization of 80%. Similar to eNVy, our design also uses a small battery-backed memory buffer (SDRAM in LFSS). However, LFSS uses the DEB data clustering approach so that hot data is always updated in SDRAM. Besides, the CAT cleaning policy is used in LFSS. Therefore, LFSS can further reduce erase operations and achieve more even wearing.

3.3.3 Flash Translation Layer (FTL)

M-Systems's TrueFFS [10] allows flash memory to emulate a hard disk. Basically, it is a software block device driver to be used with an existing file system. Flash memory is divided into fixed-size blocks. The data presentation, which is patented by M-Systems, is called the Flash Translation Layer (FTL) standard. Some research work [24,25] is implemented on top of the flash translation layer for compatibility with existing applications and systems.

3.3.4 Large-Scale Flash Memory Storage System

Chang, et al. [3] proposed a flexible management scheme for large-scale flash-memory storage systems. It efficiently manages high-capacity flash-memory storage systems based on the behaviors of realistic access patterns. Besides, it uses the real-time garbage collection mechanism [4] to manage its invalid data. Their proposed scheme could significantly reduce main-memory usage without noticeable performance degradation.

3.4 Cleaning Policies

Cleaning policies are another important issue in flash storage systems. Rosenblum and Ousterhout [21] suggested that the Log-structured File System (LFS), which writes data as an appended log instead of updating data in place, can be applied to flash memory. In the paper, the authors showed that the greedy policy performs poorly under high locality of reference. Therefore, the cost-benefit policy was proposed. As we mentioned in Section 2.3.4.2, it tries to clean segments with cold data. As a result, it performs well under high locality of reference.

The Cost Age Times (CAT) [5,6] policy, which we mentioned in Section 2.3.4.3, provides better wear leveling than the cost-benefit policy because the number of erase operations performed on each segment is considered.

The Linux PCMCIA [11] flash memory driver also uses the greedy policy for cleaning. However, to avoid concentrating erasures on a few segments, it sometimes chooses to clean the segment that has been erased the fewest number of times. This is called the revised greedy policy.

Chapter 4 Design and Implementation

In this chapter, we describe the design and implementation of the Log Flash Storage System (LFSS). LFSS uses an additional battery-backed SDRAM buffer as an extension of the flash memory. Moreover, it integrates two techniques that we propose for improving the performance of the SDRAM-embedded flash memory system. The first technique is a data clustering method, Dynamic data clustering with Extra Buffer region (DEB). It causes hot data to be updated in the extra RAM buffer so as to reduce the number of erasures of flash blocks. The second technique is a data layout approach that separates the flash memory into two parts, super segments and data segments. Super segments contain a number of checkpoint nodes, each of which holds all of the metadata in the flash. Therefore, we can simply scan the super segments, instead of the entire flash memory, during system initialization. As a result, LFSS can reduce the initialization time. The real data is stored in data segments sequentially, and LFSS manages flash memory as variable-sized blocks, as log-structured file systems do. The non-in-place-update scheme is used when data blocks are updated.

In addition to the two techniques, we also implement three cleaning policies in LFSS in order to evaluate the cleaning effectiveness.

The system is implemented in Linux 2.4.20. Unlike JFFS, which provides an interface to the virtual file system, LFSS provides its interface directly to user space. As shown in Figure 4.1, LFSS is implemented as an MTD user module. It provides functions such as read, write, erase, and update to application programs. For ease of experimentation in the PC environment, we use SDRAM, instead of flash, for performance evaluation. Therefore, we implement an SDRAM MTD driver to connect to the MTD layer. All MTD user modules regard the SDRAM MTD driver as a normal flash.

Figure 4.1 : LFSS in the Linux [figure omitted; labels: User space (Applications); Kernel space (Virtual File System, LFSS, JFFS, FAT, FTL, MTD Layer, SDRAM driver, NAND driver, NOR driver); Hardware (SDRAM, NAND, NOR)]

The rest of this chapter is organized as follows. We first describe the DEB data clustering approach in Section 4.1. Section 4.2 introduces the flash data layout of LFSS. The cleaning policies we implemented in LFSS are presented in Section 4.3.

4.1 Dynamic Data Clustering with Extra Buffer Region

When a segment is selected to be cleaned, the valid data in it should be migrated to another segment. If the system migrates the valid data to a segment that will be cleaned soon, the migration becomes useless and wasteful. Therefore, data reorganization is important to flash-memory based storage systems. Previous research [6,17,21,24] on data reorganization pointed out that separating hot data from cold data can reduce cleaning overhead. Hot data is data that is updated frequently; cold data, by contrast, is stable.

The DAC (Dynamic dAta Clustering) approach [5] dynamically clusters data during segment cleaning and data updating. Therefore, the hot data and cold data can be separated by migrating them to different flash memory spaces.

Figure 4.2 : Data Clustering in DAC [figure omitted; labels: Region 1, Region 2, ..., Region N; Coldest, Hottest; transitions labeled "Young & updated" and "Too old"]

DAC partitions the flash memory into several logical regions that contain data with different degrees of hotness. Each region includes a set of flash segments, which need not be physically contiguous. The basic idea of DAC is to cluster data segments with similar write access frequencies in the same region. Because data access frequencies may change over time, a data segment will be migrated among regions when its write access frequency changes. Figure 4.2 shows that if the update frequency increases, the data will be moved toward the hottest region, and it will be moved toward the coldest region if the update frequency decreases. Besides, when a segment is selected for cleaning, all of its valid old data will be moved to the free space in the next colder region. Therefore, the DAC approach is more fine-grained and more effective in data clustering than other research that just separates data into two classes, hot and cold.
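The region transitions of DAC can be sketched as a small state model. This is an illustration only: the thresholds that decide when data counts as "young" or "too old", and the choice to measure them by residence time in the current region, are assumptions not taken from the text. All names are hypothetical.

```python
class DacClusters:
    """Toy model of DAC regions: index 0 is the coldest, n - 1 the hottest."""

    def __init__(self, num_regions: int, young_limit: float, old_limit: float):
        self.num_regions = num_regions
        self.young_limit = young_limit   # hypothetical: max residence time to be "young"
        self.old_limit = old_limit       # hypothetical: min residence time to be "too old"
        self.region = {}                 # block id -> region index
        self.entered = {}                # block id -> time it entered its region

    def add(self, block, now: float, region: int = 0) -> None:
        self.region[block] = region
        self.entered[block] = now

    def on_update(self, block, now: float) -> int:
        """Young and updated: promote one region toward the hottest."""
        if now - self.entered[block] <= self.young_limit:
            self._move(block, min(self.region[block] + 1, self.num_regions - 1), now)
        return self.region[block]

    def on_clean(self, block, now: float) -> int:
        """Valid data found too old during cleaning: demote one region."""
        if now - self.entered[block] >= self.old_limit:
            self._move(block, max(self.region[block] - 1, 0), now)
        return self.region[block]

    def _move(self, block, new_region: int, now: float) -> None:
        self.region[block] = new_region
        self.entered[block] = now
```

In this model a block has to be updated repeatedly, one region at a time, before it reaches the hottest region, and it drifts back only when the cleaner finds it unchanged for a long time.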

On the basis of adding an SDRAM buffer as the extension of the flash, we propose two policies. First, we make the SDRAM the hottest region in the DAC approach. Because the hottest data will be updated in SDRAM, we can eliminate many erase operations. However, this policy is neither aggressive nor effective enough, because the hottest data must move through all the regions to reach the hottest region. Moreover, some hot data may never reach the hottest region, because it is not hot enough to move through all the regions.

Therefore, we propose the second data clustering policy, DEB (Dynamic data clustering with Extra Buffer region). The basic idea is to make hot data be updated in
