CHAPTER 2 BACKGROUND
2.3 Flash Memory Cleaning Policies
2.3.1 Free Space Management : Segments
In order to get more free space for storing new data, the invalid data must be reclaimed. Many data management approaches divide the flash memory into large, fixed-size segments for ease of reclaiming invalid data. A segment is made up of a number of contiguous blocks, where the number may differ among data management approaches. When the number of free segments falls below a certain threshold, a software cleaning process (i.e., the cleaner) is triggered to reclaim the invalid data.
2.3.2 Three-stage Operations of Cleaning Process
The cleaning of the invalid data involves three stages, as shown in Figure 2.3. First, as Figure 2.3(a) illustrates, the cleaning process selects a victim segment and then identifies the valid data in it. In the next stage, shown in Figure 2.3(b), the valid data is copied into another free space. Finally, the victim segment is erased, as shown in Figure 2.3(c).
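The trigger condition and the three stages described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of any particular system; the names Segment, needs_cleaning, and clean, and the representation of a block as a (data, valid) pair, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Each block is a (data, valid) pair; an empty list models an erased segment.
    blocks: list = field(default_factory=list)


def needs_cleaning(free_segments, threshold):
    """The cleaner is triggered when free segments fall below the threshold."""
    return free_segments < threshold


def clean(victim, destination):
    """Run the three cleaning stages on an already selected victim segment."""
    # Stage 1: identify the valid data in the victim segment.
    valid = [data for data, is_valid in victim.blocks if is_valid]
    # Stage 2: copy the valid data into free space.
    destination.blocks.extend((data, True) for data in valid)
    # Stage 3: erase the victim so that it becomes a free segment again.
    victim.blocks.clear()
    return len(valid)
```

The return value is the number of valid blocks that had to be copied, which is the overhead that the segment selection algorithms try to keep small.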
Figure 2.3 : Three Stages of Cleaning Process. (a) 1: Select a segment to be cleaned. (b) 2: Copy valid data to free space. (c) 3: Erase block. (Legend: invalid, used, free.)
2.3.3 Issues of Cleaning Policies
In this section, we describe four issues that must be addressed by a cleaning policy.
(1) When should the cleaner be triggered? One approach is to continuously run it as a low-priority task in the background. Another approach is to trigger it at night, or when disk space is nearly exhausted.
(2) How many segments should the cleaner process at a time? Segment cleaning offers an opportunity to reorganize data on flash memory. In general, the more segments we clean at once, the more free space we get.
(3) Which segments should be cleaned? This is referred to as the segment selection algorithm. One may select the segment with the largest amount of garbage, or select segments based on their attributes, such as age and update times.
(4) How should the valid blocks be arranged when they are copied out? One may try to enhance locality by grouping files in the same directory into a single segment. Another possibility is to group blocks with similar last-modified times into new segments. This can help cleaning policies execute more effectively.
2.3.4 Segment Selection Algorithm
In this section we introduce three segment selection algorithms.
2.3.4.1 Greedy Policy
The greedy method is the simplest algorithm, which selects the segment with the largest amount of garbage. According to a previous study [19], the greedy policy works well in the case of uniform access. However, it performs poorly under high locality of reference.
2.3.4.2 Cost-Benefit Policy
The cost-benefit policy [6] chooses to clean the segment that maximizes the formula:
age * (1-u) / 2u
where u is the percentage of valid data in the segment, and therefore (1-u) stands for the amount of free space that can be reclaimed. The age indicates the time elapsed since the most recent modification (i.e., the last block invalidation or writing), and it is used to represent the hotness of the valid data. The term 2u reflects the overhead of cleaning a segment (i.e., reading valid blocks and writing them to another segment). The cost-benefit policy performs well under high locality of reference. However, it does not perform as well as the greedy policy under uniform access.
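The difference between the greedy and cost-benefit policies can be illustrated with a small sketch. It assumes that the cost-benefit score is the benefit-to-cost ratio age * (1-u) / 2u built from the terms described above; the function names and the dictionary layout of a segment are hypothetical and chosen only for this example.

```python
def greedy_pick(segments):
    """Greedy: choose the segment with the largest amount of garbage (1 - u)."""
    return max(segments, key=lambda seg: 1.0 - seg["u"])


def cost_benefit(u, age):
    """Assumed benefit-to-cost ratio age * (1 - u) / (2u); larger is better."""
    if u == 0:
        return float("inf")  # no valid data to copy, so cleaning is free
    return age * (1.0 - u) / (2.0 * u)


def cost_benefit_pick(segments):
    """Cost-benefit: choose the segment with the highest ratio."""
    return max(segments, key=lambda seg: cost_benefit(seg["u"], seg["age"]))


segments = [
    {"name": "A", "u": 0.5, "age": 100},  # older (colder) data, less garbage
    {"name": "B", "u": 0.4, "age": 10},   # recently modified, more garbage
]
```

With these two segments, greedy_pick selects B because it holds more garbage, whereas cost_benefit_pick selects A because its valid data is much older and is therefore less likely to be invalidated again soon.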
2.3.4.3 Cost Age Times (CAT) Policy
A policy similar to the previous one is the Cost Age Times (CAT) policy [5,6], which chooses to clean segments that minimize the following formula:
Cleaning Cost * (1 / Age) * Number of Cleaning
The cleaning cost is defined as u/(1-u), where u is the percentage of valid data in a segment. The cleaning cost reflects the ratio of overhead to benefit, which should be minimized. The definition of age is similar to that of the cost-benefit policy. The number of cleaning stands for the number of times a segment has been erased. The basic idea of CAT is to minimize the cleaning cost while giving recently cleaned segments more time to accumulate garbage for reclamation. In addition, to achieve the goal of wear-leveling, segments with the fewest erases are given more chances to be selected for cleaning.
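As a rough illustration, the CAT score can be computed as below. The sketch assumes the formula Cleaning Cost * (1 / Age) * Number of Cleaning with Cleaning Cost = u/(1-u); the function names, the dictionary layout, and the handling of the corner cases u = 1 and age = 0 are assumptions of this example rather than part of the policy's definition.

```python
def cat_score(u, age, erase_count):
    """Assumed CAT score: u/(1-u) * (1/age) * erase_count; smaller is better."""
    if u >= 1.0 or age <= 0:
        # Assumption: a segment with no garbage, or one that was just
        # modified, is never a good candidate for cleaning.
        return float("inf")
    cleaning_cost = u / (1.0 - u)
    return cleaning_cost * (1.0 / age) * erase_count


def cat_pick(segments):
    """Choose the segment that minimizes the CAT score."""
    return min(
        segments,
        key=lambda seg: cat_score(seg["u"], seg["age"], seg["erases"]),
    )


segments = [
    {"name": "X", "u": 0.5, "age": 10, "erases": 4},
    {"name": "Y", "u": 0.5, "age": 10, "erases": 2},  # erased less often
    {"name": "Z", "u": 0.8, "age": 10, "erases": 2},  # mostly valid data
]
```

Here cat_pick selects Y: it has the same utilization and age as X but has been erased fewer times, and it is cheaper to clean than Z.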
Chapter 3
Related Work
In this chapter, we introduce four kinds of related work. Section 3.1 describes the MTD subsystem, which is an interface layer between file systems and memory device drivers. Section 3.2 presents two well-known flash file systems, JFFS and MFFS. Section 3.3 introduces four flash-memory based storage systems that are related to our design. Section 3.4 describes cleaning policies used for flash memory.
3.1 Memory Technology Device (MTD) Subsystem for Linux
The Memory Technology Device (MTD) subsystem for Linux [8] provides generic support for various types of memory devices, especially for flash devices such as the M-Systems DiskOnChip and Common Flash Interface (CFI) onboard flash.
Figure 3.1 : MTD Subsystem (labels: File System; MTD User Modules: FTL, LFSS; MTD layer; MTD Hardware Device Drivers; CFI NOR Flash, NAND Flash)
The aim of this subsystem is to provide a generic interface between the hardware drivers and the upper layers of the system. Hardware drivers only need to supply simple routines such as reading, writing, erasing, and querying the device. Data presentation of the device is handled by the upper-layer components, such as FTL (Flash Translation Layer) and JFFS2, which are called MTD user modules (as shown in Figure 3.1). From the figure we can see that LFSS is also implemented as an MTD user module.
3.2 Flash File Systems
3.2.1 JFFS
Figure 3.2 : JFFS in the Linux File System Framework (user space: Applications, Shared library; kernel space: Virtual File System, Ext2, Ext3, FAT, JFFS, Buffer Cache, Disk Driver, MTD, Flash Driver; hardware: Disk Device, Flash Device)
JFFS is a log-structured file system designed by Axis Communications AB in Sweden. It is especially used for flash devices on embedded systems. Figure 3.2 shows how JFFS fits into the file system framework in Linux. From the figure we can see that JFFS sits between the VFS and the MTD layers. In addition, a major difference between JFFS and ordinary file systems is that the former does not rely on the buffer cache.
JFFS arranges a flash device as a circular area, as shown in Figure 3.3. Modifications to the file system are written at the tail (i.e., the start of the free chunks). Invalid data blocks are reclaimed from the head. The basic data structure used for storing data on the flash device is the raw node. Each raw node is divided into two parts, metadata and real data. Similar to JFFS, LFSS also uses a log-like structure to store data. The main difference is that LFSS stores the metadata and the real data in different segments in order to reduce the system initialization time.
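The circular arrangement with a head and a tail pointer can be sketched as follows. This is only an illustration of the idea of appending at the tail and reclaiming from the head; it is not JFFS code, and the class CircularLog, its fields, and the choice to rewrite a still-valid node at the tail during reclamation are assumptions of the example.

```python
class CircularLog:
    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = [None] * capacity  # None marks a free slot
        self.head = 0                   # oldest node, reclaimed first
        self.tail = 0                   # start of the free space
        self.used = 0

    def append(self, name, data):
        """Write a new version at the tail and invalidate the older one."""
        if self.used == self.capacity:
            raise RuntimeError("log is full; reclaim from the head first")
        for node in self.nodes:
            if node is not None and node["name"] == name:
                node["valid"] = False
        self.nodes[self.tail] = {"name": name, "data": data, "valid": True}
        self.tail = (self.tail + 1) % self.capacity
        self.used += 1

    def reclaim_head(self):
        """Free the slot at the head; valid data is rewritten at the tail."""
        node = self.nodes[self.head]
        if node is None:
            return
        self.nodes[self.head] = None
        self.head = (self.head + 1) % self.capacity
        self.used -= 1
        if node["valid"]:
            self.append(node["name"], node["data"])
```

Reclaiming a node that is already invalid simply frees its slot, while reclaiming a valid node costs one extra write at the tail.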
Figure 3.3 : Data Arrangement in JFFS (labels: Head pointer, Tail pointer; legend: valid data, invalid data, free space)
3.2.2 Microsoft Flash File System (MFFS)
Microsoft Flash File System (MFFS) [23] provides complete file system capabilities for DOS. It uses linked lists to store and manage data in flash memory. Data are allocated as variable-sized regions instead of fixed-size blocks, and the greedy policy is used for reclaiming invalid data. Previous research [13] reported that MFFS performs poorly when accessing large files. Specifically, its write performance degrades linearly with the growth of file size.
3.3 Flash-Memory Based Storage Systems
In this section, we introduce four flash-memory based storage systems.
3.3.1 Dynamic Data Clustering Server (DAC Server)
DAC server [5] uses the DAC approach to cluster data on flash memory. It manages flash memory as fixed-size blocks and uses the non-in-place-update scheme for data blocks to avoid per-update erasing. The idea of the DAC server is to classify data according to their write access frequencies and to cluster them dynamically when the data is updated or when the segments are cleaned. However, the DAC approach is not suitable for our environment, which has an additional battery-backed SDRAM buffer.
3.3.2 eNVy
eNVy [26] is a large flash memory-based storage system, which provides a memory interface rather than a block-based disk interface. The hardware consists of the flash memory, a small battery-backed SRAM for write-buffering, a high-bandwidth parallel data path between them, and a controller for page mapping and cleaning. Figure 3.4 shows the page-remapping technique of eNVy.
Figure 3.4 : Page Remapping in eNVy (for a Write to Page 2)
Data update is performed with the help of page-remapping in order to prevent update in place. eNVy uses a hybrid cleaning policy that combines FIFO and locality gathering to minimize the cleaning costs for uniform access and high locality of reference. Simulation results show that it can handle 30,000 transactions per second at a flash utilization of 80%. Similar to eNVy, our design also uses a small battery-backed SRAM. However, LFSS uses the DEB data clustering approach so that hot data is always updated in SDRAM. Besides, the CAT cleaning policy is used in LFSS. Therefore, LFSS can further reduce erase operations and even out wear.
3.3.3 Flash Translation Layer (FTL)
M-Systems’s TrueFFS [10] allows flash memory to emulate a hard disk. Basically, it is a software block device driver to be used with an existing file system. Flash memory is divided into fixed-size blocks. The data presentation, which is patented by M-Systems, is called the Flash Translation Layer (FTL) standard. Some research work [24,25] is implemented over the flash translation layer for the compatibility of the applications and systems.
3.3.4 Large-Scale Flash Memory Storage System
Chang et al. [3] proposed a flexible management scheme for large-scale flash-memory storage systems. It efficiently manages high-capacity flash-memory storage systems based on the behaviors of realistic access patterns. Besides, it uses the real-time garbage collection mechanism [4] to manage its invalid data. Therefore, the proposed scheme could significantly reduce main-memory usage without noticeable performance degradation.
3.4 Cleaning Policies
The cleaning policy is another important issue in flash storage systems. Rosenblum and Ousterhout [21] suggested that the Log-structured File System (LFS), which writes data as an appended log instead of updating data in place, can be applied to flash memory. In the paper, the authors showed that the greedy policy performs poorly under high locality of reference. Therefore, the cost-benefit policy was proposed. As we mentioned in Section 2.3.4.2, it tries to clean segments with cold data. As a result, it performs well under high locality of reference.
The Cost Age Times (CAT) policy [5,6], which we mentioned in Section 2.3.4.3, provides better wear leveling than the cost-benefit policy because the number of erase operations performed on each segment is considered.
The Linux PCMCIA [11] flash memory driver also uses the greedy policy for cleaning. However, to avoid concentrating erasures on a few segments, it sometimes chooses to clean the segment that has been erased the fewest number of times. This is called the revised greedy policy.
Chapter 4
Design and Implementation
In this chapter, we describe the design and implementation of the Log Flash Storage System (LFSS). LFSS uses an additional battery-backed SDRAM buffer as an extension of the flash memory. Moreover, it integrates two techniques that we propose for improving the performance of the SDRAM-embedded flash memory system. The first technique is a data clustering method, Dynamic data clustering with Extra Buffer region (DEB). It lets hot data be updated in the extra RAM buffer so as to reduce the number of erasures of flash blocks. The second technique is a data layout approach that separates the flash memory into two parts, super segments and data segments. Super segments contain a number of checkpoint nodes, each of which holds all the metadata in the flash. Therefore, we can simply scan the super segments, instead of the entire flash memory, during system initialization. As a result, LFSS can reduce the initialization time. The real data is stored in data segments sequentially, and LFSS manages flash memory as variable-sized blocks, as log-structured file systems do. The non-in-place-update scheme is used when data blocks are updated.
In addition to the two techniques, we also implement three cleaning policies in LFSS in order to evaluate the cleaning effectiveness.
The system is implemented in Linux 2.4.20. Unlike JFFS, which provides an interface to the virtual file system, LFSS provides its interface directly to user space. As shown in Figure 4.1, LFSS is implemented as an MTD user module. It provides functions such as read, write, erase, and update to application programs. For ease of experiment in the PC environment, we use SDRAM, instead of flash, for performance evaluation. Therefore, we implement an SDRAM MTD driver to connect to the MTD layer. All MTD user modules regard the SDRAM MTD driver as a normal flash device.
Figure 4.1 : LFSS in the Linux (user space: Applications; kernel space: Virtual File System, LFSS, JFFS, FAT, FTL, MTD Layer, SDRAM driver, NAND driver, NOR driver; hardware: SDRAM, NAND, NOR)
The rest of this chapter is organized as follows. We first describe the DEB data clustering approach in Section 4.1. Section 4.2 introduces the flash data layout of LFSS. The cleaning policies we implemented in LFSS are presented in Section 4.3.
4.1 Dynamic Data Clustering with Extra Buffer Region
When a segment is selected to be cleaned, the valid data in it should be migrated to another segment. If the system migrates the valid data to a segment that will be cleaned soon, the migration becomes useless and wasteful. Therefore, data reorganization is important to flash-memory based storage systems. Previous research [6,17,21,24] on data reorganization pointed out that separating hot data from cold data can reduce cleaning overhead. Hot data is data that is updated frequently. On the contrary, cold data is stable.
The DAC (Dynamic dAta Clustering) approach [5] dynamically clusters data during segment cleaning and data updating. Therefore, hot data and cold data can be separated by migrating them to different flash memory spaces.
Figure 4.2 : Data Clustering in DAC (regions 1 to N between the coldest and hottest ends; transitions labeled "Young & updated" and "Too old")
DAC partitions the flash memory into several logical regions that contain data with different degrees of hotness. Each region includes a set of flash segments, which need not be physically contiguous. The basic idea of DAC is to cluster data segments with similar write access frequencies in the same region. Because data access frequencies may change over time, a data segment is migrated among regions when its write access frequency changes. Figure 4.2 shows that if the update frequency increases, the data is moved toward the hottest region, and if the update frequency decreases, it is moved toward the coldest region. Besides, when a segment is selected for cleaning, all of its valid old data is moved to the free space in the next colder region. Therefore, the DAC approach is more fine-grained and more effective in data clustering than other research that just separates data into two classes, hot and cold.
On the basis of adding an SDRAM buffer as an extension of the flash, we propose two policies. In the first policy, we make the SDRAM the hottest region in the DAC approach. Because the hottest data is then updated in SDRAM, many erase operations can be avoided. However, this policy is neither aggressive nor effective enough, because hot data must move through all the regions before reaching the hottest one. Moreover, some fairly hot data may never reach the hottest region because it is not hot enough to pass through all of the regions.
Therefore, we propose the second data clustering policy, DEB (Dynamic data clustering with Extra Buffer region). The basic idea is to make hot data be updated in SDRAM instead of the flash memory, so as to reduce the number of erase operations.
Similar to DAC, DEB also partitions the flash memory into several logical regions. In addition, we always associate the extra RAM buffer with the Extra Buffer Region (EBR). As we mentioned before, each region contains a set of segments, which need not be physically contiguous. Each segment is associated with a single region at any given time.
Each flash region has a corresponding stable time interval, as shown in Figure 4.3, which defines the range of the appropriate stable time¹ for the data in the region. Let sst(n) denote the shortest stable time and lst(n) the longest stable time of the interval belonging to region n. From the figure we can see that the value of sst(i) is equal to the value of lst(i+1). Moreover, both sst and lst of a colder region are larger than the corresponding values of a hotter region, because the data in the former is more stable.
¹ We define the stable time as the time period between the two most recent updates of the data.
Figure 4.3 : Stable time interval
Basically, an update involves two entities, the block and the region that the block is associated with. In order to simplify the description, we say an update is fast if the time between the update and the last update of the block (i.e., the stable time of the block) is less than the sst value of the region. Similarly, an update is said to be slow if the time is more than the lst value of the region. Figure 4.4 shows the data reorganization diagram in DEB. The data reorganization happens when data blocks are updated or when segment cleaning occurs. The rules of the data reorganization can be summarized as follows:
1. Newly created data blocks are placed in the RAM buffer, and thus they are associated with the EBR.
2. If a data block is to be updated and its stable time falls in the interval of the current region, the new data is written to the free space of the current region, and the obsolete data block is invalidated as garbage.
3. If a fast update happens on a block, we check the last update of this block. If the last update was not a fast one, the new data is written to the free space of the next hotter region (denoted as f1 in Figure 4.4). Otherwise, the new data is written to the free space in the EBR (denoted as f2 in Figure 4.4). After writing the new data, the obsolete data block in the original region is invalidated as garbage.
4. If a slow update happens on a block, the new data is written to the free space of the next colder region (denoted as s in Figure 4.4), and the obsolete data block in the original region is invalidated as garbage.
5. If the used space in the EBR is greater than a certain threshold, we write back the oldest data in it to the suitable region until the used space in the EBR is lower than half of the threshold (denoted as w in Figure 4.4). The suitable region for the data is the one whose stable time interval contains the time elapsed since the last update of the data.
6. If a data block update happens in the EBR, the block is updated in place.
Figure 4.4 : Data Clustering in DEB
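The update rules above can be summarized in a short sketch that decides where the new copy of an updated block is written. It is an illustration under stated assumptions and not the LFSS implementation: the names Region, classify, destination, and suitable_region are hypothetical, regions are assumed to be stored in a list ordered from coldest (index 0) to hottest, and the clamping at both ends of the list is an assumption for cases the rules do not spell out.

```python
import math
from dataclasses import dataclass

EBR = "EBR"  # marker for the Extra Buffer Region (the RAM buffer)


@dataclass
class Region:
    sst: float  # shortest stable time of the region's interval
    lst: float  # longest stable time of the region's interval


def classify(stable_time, region):
    """Label an update as fast, slow, or within the region's interval."""
    if stable_time < region.sst:
        return "fast"
    if stable_time > region.lst:
        return "slow"
    return "in-interval"


def destination(regions, current, stable_time, last_was_fast):
    """Return the region index (or EBR) that receives the new data."""
    if current == EBR:
        return EBR                      # rule 6: updated in place in the EBR
    kind = classify(stable_time, regions[current])
    if kind == "in-interval":
        return current                  # rule 2: stay in the current region
    if kind == "fast":
        if last_was_fast:
            return EBR                  # rule 3, path f2
        # rule 3, path f1 (clamped at the hottest region by assumption)
        return min(current + 1, len(regions) - 1)
    return max(current - 1, 0)          # rule 4, path s (clamped by assumption)


def suitable_region(regions, elapsed):
    """Rule 5: region whose interval contains the elapsed time."""
    for index, region in enumerate(regions):
        if region.sst <= elapsed <= region.lst:
            return index
    return 0  # assumption: fall back to the coldest region


# Example intervals satisfying sst(i) == lst(i+1), coldest region first.
regions = [Region(100, math.inf), Region(10, 100), Region(0, 10)]
```

For a block in the middle region, a stable time of 50 keeps the data in place, a first fast update (stable time 5) moves it to the next hotter region, a second consecutive fast update moves it to the EBR, and a slow update (stable time 500) moves it to the next colder region.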
in the same region.
data duri
data duri