Chapter 1 Introduction
1.1 Motivation
Processor speed doubles roughly every eighteen months, in line with Moore’s Law, but overall system performance grows much more slowly because of slow I/O. Disk I/O time is dominated by seek time and rotational delay. Moreover, because data in volatile memory (DRAM) is not persistent, the system must periodically write dirty data back to the disk to avoid data loss on power failure. These periodic write-backs add disk I/O and degrade performance further.
With advances in non-volatile memory (NVRAM) technologies, many kinds of non-volatile RAM have been proposed, such as MRAM [9] (Magnetoresistive RAM), FeRAM (Ferroelectric RAM), PRAM (Phase-change RAM) and OUM (Ovonic Unified Memory) [10]. NVRAM is an emerging technology that can address the problem of slow disk I/O. As semiconductor technology progresses, we can expect NVRAM to become a common component of computer systems. Since MRAM is comparable with DRAM in terms of capacity, speed and cost, it is considered a potential replacement for DRAM as the main memory of computer systems. We can therefore regard data in memory as persistent and exploit NVRAM to improve disk I/O performance.
Several problems arise when a traditional DRAM-based system design is used with NVRAM as the main memory. Firstly, according to previous research [37][38][46], many small files are short-lived. Once a file is created, the file system must perform disk I/O operations, such as reading the metadata of the file. Even files that are deleted soon after creation incur this disk I/O. In addition, deleting these files leaves fragmentation on the disk. Secondly, a traditional DRAM-based system periodically writes dirty pages back to the disk, file by file, to limit data loss when a power outage occurs. This policy has two disadvantages. First, the dirty pages of a file may be scattered far apart on the disk, so writing them back costs considerable seek time and rotational delay. Second, because it writes back all dirty pages of a file, the system cannot avoid writing back recently updated pages.
According to temporal locality, recently updated pages are likely to be accessed and modified again soon, so writing them back wastes disk I/O. Lastly, file system consistency has become an important issue. Many file systems use techniques such as journaling to support data consistency, but journaling requires extra I/O to write logs to the disk in advance.
Therefore, we propose three mechanisms, one for each of these three problems, to improve file system performance.
1.2 Our Three Mechanisms
In this thesis, we propose buffer cache management and transaction support on file system operations for NVRAM systems. Buffer cache management comprises two mechanisms, the temporary-file file system (TempFFS) and an intelligent write-back policy (WB). The third mechanism, transaction support on file system operations (trans), maintains file system consistency.
Firstly, we add TempFFS between the VFS and the file systems so that delayed allocation applies to all existing file systems at once. Unlike file-system-specific implementations that maintain newly created files on their own, such as XFS [47] and Ext4, TempFFS maintains newly created files for all file systems. Upon memory pressure or sync operations, the files are transferred to their original file systems, and block allocation for these files takes place. Therefore, an existing file system can enjoy the benefit of delayed allocation without any code modifications.
Secondly, because data in NVRAM is persistent, we modify the original write-back policy, which works file by file and does not consider recency. We take the on-disk location of dirty pages into account and write back contiguous or neighboring dirty pages together to reduce disk seek time and rotational delay. We also take recency into account: recently updated dirty pages are not written back.
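The selection step of such a policy can be illustrated with a minimal Python sketch. It is not the kernel implementation: the `DirtyPage` record, the `recency_window` threshold and the `max_gap` neighbor distance are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirtyPage:
    block: int          # on-disk block number the page maps to (assumed field)
    last_update: float  # time of the most recent modification, in seconds


def select_writeback_runs(pages, now, recency_window=30.0, max_gap=1):
    """Group cold dirty pages into runs of contiguous or neighboring blocks."""
    # Skip recently updated pages: they are likely to be modified again soon.
    cold = [p for p in pages if now - p.last_update >= recency_window]
    cold.sort(key=lambda p: p.block)

    runs = []
    for page in cold:
        # Extend the current run when the next block is at most max_gap away.
        if runs and page.block - runs[-1][-1].block <= max_gap:
            runs[-1].append(page)
        else:
            runs.append([page])
    return runs
```

Each returned run could then be submitted as one disk request, so that pages in neighboring blocks share a single seek. Pages inside the recency window stay in memory, which is safe only because the memory is assumed to be non-volatile.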
Lastly, our transaction support mechanism ensures file system consistency without inducing any extra disk I/O. Since data in NVRAM is persistent, there is no need to write a journal to the disk beforehand; we only need to make sure that each file operation is atomic. To achieve atomicity, we copy all data and metadata into undo logs before a file operation modifies them. Once the file operation has finished successfully, the undo logs can be removed immediately. If a crash happens during the file operation, the undo logs can be used to restore the consistent state. Since the undo logs are kept in NVRAM and deleted afterwards, no extra disk I/O is needed.
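The undo-log idea can be sketched in a few lines of Python. This is a simplified illustration, not the kernel code: the `Transaction` class and the dictionary standing in for data and metadata held in NVRAM are hypothetical.

```python
import copy

_ABSENT = object()  # marks a key that did not exist before the operation


class Transaction:
    """One file operation whose modifications can be undone after a crash."""

    def __init__(self, store):
        self.store = store   # hypothetical dict standing in for NVRAM contents
        self.undo_log = {}   # key -> value as it was before the operation

    def write(self, key, value):
        # Save the old value once, before the first modification of this key.
        if key not in self.undo_log:
            old = self.store.get(key, _ABSENT)
            self.undo_log[key] = old if old is _ABSENT else copy.deepcopy(old)
        self.store[key] = value

    def commit(self):
        # The operation finished successfully: drop the undo log immediately.
        self.undo_log.clear()

    def rollback(self):
        # Restore every saved value and remove keys the operation created.
        for key, old in self.undo_log.items():
            if old is _ABSENT:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.undo_log.clear()
```

In this sketch, recovery after a crash corresponds to calling `rollback()` on any transaction whose undo log is still present. Nothing is written to disk at any point, which mirrors the claim that the log lives only in non-volatile memory.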
We implement our three mechanisms in Linux 2.6.12. Since large-capacity MRAM is not generally available on the market, and the performance characteristics of DRAM and MRAM are comparable, as shown in Table 1.1, we use DRAM to emulate MRAM.
Table 1.1 MRAM and DRAM Characteristics
Device Type               MRAM    DRAM
Characteristic
  Volatility              No      Yes
  Erase Needed            No      No
Performance
  Access Time             50ns    ~5ns
  Read Time               50ns    50ns
  Write Time              50ns    50ns
Operation Power Supply    1.8V    1.8-5V

According to our experimental results, TempFFS improves performance by about 35% compared to Ext2, the intelligent write-back policy by about 65% compared to Ext2, and transaction support by about 80% compared to Ext3. The combination of the three proposed mechanisms improves performance by about 90% compared to Ext3.
1.3 Structure of the Thesis
The remainder of this thesis is organized as follows. Chapter 2 describes the related work about NVRAM. Chapter 3 presents the design and implementation details of the proposed mechanisms. The performance results are shown in Chapter 4. Finally, we give conclusions in Chapter 5.
Chapter 2 Related Work
In this chapter, we review research related to NVRAM. Section 2.1 covers work that exploits NVRAM to recover a system after a crash. Section 2.2 covers work that uses NVRAM as a storage device to improve file system performance. Section 2.3 covers work that uses NVRAM as a buffer to reduce disk I/O, especially write operations. Section 2.4 covers work on providing file system consistency.
2.1 System Recovery
Ren Ohmura [33] of Keio University exploits the characteristics of NVRAM for system recovery. The work proposes a scheme to recover the state of peripheral devices in NVRAM systems so that the system can resume execution after an unpredictable power failure. All messages between the CPU and the devices are recorded in NVRAM, and after a power failure the system re-sends the recorded messages to bring the devices back to their previous state.
Harp [25] records all file updates on server nodes. Files on an individual node can survive a failure because file operations are logged in memory at several nodes.
The Recovery Box [2] stores system state in NVRAM and protects the region holding that state from being overwritten when the system crashes. After a crash, it uses the protected state to recover the system.
Rio [8] enables data in memory to survive operating system crashes and power outages. It uses write protection to guard files in the file cache so that the file cache is not accidentally overwritten while the system is crashing. Therefore, files in Rio are persistent and safe when the system crashes.
2.2 NVRAM as Storage Device
Because flash memory is cheap and non-volatile, more and more file systems designed for flash memory have been proposed, such as JFFS2 [44] and Microsoft Flash [24]. However, flash memory has some limitations. Firstly, a block must be erased before it can be written. Secondly, each block endures only a finite number of erase-write cycles. Therefore, flash systems usually write data with out-of-place updates and use wear leveling to keep the number of erase-write cycles similar across blocks.
To speed up writes in flash systems, eNVy [45] uses a small amount of battery-backed SRAM as a write buffer. It uses copy-on-write to copy the corresponding data from flash into SRAM and then modifies the data in SRAM. Finally, it writes the data back to flash memory when the SRAM is full. Hwan Doh [11] also exploits non-volatile memory to enhance the performance of a flash file system, proposing a flash-based file system that stores all metadata in NVRAM and all file data in flash memory. The advantages of using NVRAM as a metadata store are that the flash mount time is reduced to a minimum and that metadata access becomes faster.
MRAMFS [12] is a prototype in-memory file system that places all data and metadata in NVRAM. However, NVRAM capacity may not be enough to hold a large number of files. Therefore, MRAMFS uses compression to reduce the occupied space, with different compression methods for metadata and data because metadata often has a fixed format. It saves about 60% of the space for metadata and about 40% to 60% for file data.
There are also recent works on hybrid disk/NVRAM file systems, such as the HeRMES file system [31] and the Conquest file system [42]. HeRMES observes that metadata is frequently modified by file system requests. It therefore places all metadata in NVRAM to speed up file system requests and suggests compression techniques to minimize the amount of memory the metadata requires. Conquest assumes that the system has a sufficient amount of NVRAM. It therefore stores all small files and metadata in NVRAM, and the disk holds only the data content of the remaining large files. This has two advantages: it avoids the overhead of accessing small files and metadata on disk, and, because only large files reside on disk, it can optimize their layout to reduce disk fragmentation.
The above works have some disadvantages. Firstly, most of them place all metadata in NVRAM, but the space occupied by metadata and data keeps growing as users continually create files. Secondly, although metadata is frequently accessed in a file system, not all metadata is frequently accessed. Placing all metadata in NVRAM therefore lets metadata that has not been used recently occupy NVRAM space, which decreases performance.
2.3 NVRAM as Buffer
Besides serving as a storage device, NVRAM is commonly used as a write buffer. eNVy [45], mentioned in Section 2.2, uses a small amount of battery-backed SRAM as a write buffer to improve the performance of write operations on flash. Mark Baker [1] reports that an NVRAM write buffer can reduce disk accesses by about 20% on most file systems and by about 90% on one frequently accessed file system.
Theodore R. Haining [19] notes that non-volatile write caches provide two benefits: some writes are avoided because dirty blocks are overwritten in the cache, and physically contiguous dirty blocks can be grouped into a single I/O operation. The work also presents several write-back strategies for managing a non-volatile write buffer, such as least recently used (LRU), shortest access time first (STF) and largest segment per track (LST), and finds that a write buffer can eliminate a large number of write requests and thereby improve system performance.
Robert Y. Hou [20] exploits non-volatile memory to improve the performance of RAID5. For each write request, RAID5 must execute a “read-modify-write”: a single-block write requires reading the old data block and the old parity block, combining them with the new data to generate the new parity block, and then writing the new data and the new parity to their respective locations. Read-modify-writes reduce the performance of RAID5 arrays because each write request needs four disk accesses. Therefore, non-volatile memory is used as the write buffer of RAID5 to improve the performance of write operations.
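The parity arithmetic behind a read-modify-write can be shown with a short Python sketch. RAID5 parity is the bytewise XOR of the data blocks in a stripe, so the new parity follows from the old parity, the old data and the new data alone. The function names here are illustrative, not taken from [20].

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write_parity(old_data: bytes, old_parity: bytes,
                             new_data: bytes) -> bytes:
    """Return the new parity block for a single-block update.

    new_parity = old_parity XOR old_data XOR new_data
    """
    # XOR-ing out the old data and XOR-ing in the new data updates the
    # parity without reading the other data blocks of the stripe.
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)
```

The two inputs `old_data` and `old_parity` correspond to the two disk reads, and the new data block plus the returned parity block correspond to the two disk writes, which gives the four accesses per write request mentioned above.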
The above works all use a write buffer to improve write operations. Alex Batsakis [3] points out that read operations may depend on write operations, because buffered dirty pages occupy memory that could otherwise be used for read caching. The work addresses this problem by partitioning memory between write buffering and read caching and by writing dirty pages to disk opportunistically before the operating system submits them for write-back. It also writes back dirty pages that are nearly adjacent, but it does not consider whether those dirty pages have been recently updated.
Because the capacity of MRAM is increasing continuously, it may replace DRAM as the main memory of computing systems in the future. We not only use a non-volatile write buffer to delay writes, but also use a better write-back policy to improve the performance of file operations.
2.4 Transaction Support
Traditionally, file system consistency has been maintained by using synchronous writes to enforce the proper ordering of metadata updates, but this approach degrades file system performance because the progress of metadata updates is bound by disk speed. Soft updates [30] eliminates the need for synchronous disk I/O. It is an implementation mechanism that tracks and enforces the dependencies among metadata updates, which allows metadata to be cached and written back later.
The log-structured file system [39], proposed by Mendel Rosenblum, treats the file system as a segmented log and always writes modified data blocks and metadata to the end of the log. File system changes are buffered in the cache and then written to the disk sequentially in a single disk I/O operation. This improves write performance. However, it cannot always write all related metadata in a single write operation, so if a crash happens while a disk operation is in progress, the file system may be left in an inconsistent state.
Journaling [35][44][47] is nowadays a widely used technique for file system consistency. It logs metadata and data updates to stable storage before the updates are performed on the disk. Hence, it produces extra journaling I/O traffic, which has a significant impact on system performance.
Kevin M. Greenan [17] introduces two approaches to reliably storing file system structures in NVRAM. The first strengthens memory consistency by using page-level write protection and error-correcting codes. The second periodically invokes an online consistency checker that replays all transaction logs to check for file system inconsistency. If it finds an inconsistency, it immediately recovers the state of the file system. However, it has to replay all transaction logs periodically even when the file system is healthy and no failure has occurred.
Henry Mashburn [40] proposes recoverable virtual memory (RVM), a simple user-level library that handles atomic file operations and data persistence. It first copies the range of memory to be updated to an undo log in memory, then updates the data, and lastly writes the updated data to a redo log on disk. Therefore, it needs three copy operations for each file operation.
Vista [27], proposed by David Lowell, is a simple user-level library that runs on Rio, mentioned in Section 2.1. Because Rio keeps files in memory persistent, Vista can eliminate the redo log to speed up disk operations, and it uses only an undo log to make sure that each file operation is atomic. However, it depends on Rio, and because it is a user-level library, Vista is not transparent to users.
We propose simple, lightweight transaction support on file system operations in an NVRAM environment. It requires adding only about 40 lines of code to the kernel and about 300 lines of code for the implementation. It also provides the same level of consistency as the journaling mode of Ext3.
Chapter 3 Design and Implementation
In this chapter, we describe the design and implementation of the proposed mechanisms. In Section 3.1, we first introduce the three mechanisms for improving the performance and ensuring the consistency of file systems on NVRAM based computer systems, namely Temporary-File File System (TempFFS), intelligent write-back policy, and transaction support on file system operations. In Section 3.2, we show the details of implementing and integrating the mechanisms and provide an analysis on the integration of the mechanisms.
3.1 Background
In this section, we describe the proposed NVRAM-based buffer cache management mechanisms, which include Temporary-File File System and intelligent write-back policy. Both mechanisms aim at improving the file system performance based on the non-volatility feature of main memory. Moreover, we also describe a lightweight transaction support mechanism on file system operations, which takes advantage of the non-volatility feature of main memory for ensuring the consistency and data integrity of the file system.
3.1.1 Temporary-File File System (TempFFS)
The first goal of TempFFS is to reduce the fragmentation of the underlying file systems. With numerous concurrent file creation, deletion and appending activities, a file system can easily become fragmented, which leads to performance degradation.
Moreover, according to the previous studies [37][38][46], many files are short-lived, meaning that they are deleted soon after their creation. Allocating disk space for these files, which involves disk IO operations for reading the file system metadata (e.g.
To reduce file system fragmentation and unnecessary disk I/O operations, some advanced file systems such as XFS [47] and ext4 support delayed allocation, which delays the disk block allocation of a newly created file until the data needs to be flushed back to the disk due to memory pressure or sync operations. However, the delayed allocation feature is not shared among all file systems. Only the file systems that implement the feature can benefit from it.
Instead of integrating the delayed allocation feature into a specific file system, we implement a RAM-based file system named TempFFS in order to apply the feature simultaneously to existing file systems such as ext3 and NTFS. Based on the concept of stackable file systems, TempFFS sits between the VFS (virtual file system) and the file system implementations and is transparent to the latter, as shown in Figure 3.1. All new files are initially written to TempFFS and are associated with their original file systems when they are created. TempFFS uses the page cache as the file store, and the files are transferred into their corresponding file systems upon memory pressure or sync operations. In this way, existing file systems can benefit from delayed allocation without code modifications. Note that a file can stay in TempFFS for a long time. This raises the risk of data loss if the main memory is volatile. On systems with non-volatile main memory, however, data in memory can survive power failures. For ease of implementation, TempFFS was built by modifying the code of an existing RAM file system (i.e., RamFS [34]).
Figure 3.1 Architecture of the Temporary-File File System
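The life cycle of a file in TempFFS can be illustrated with a small user-space model in Python. It is only a sketch of the idea, not the kernel code: the `BackingFS` and `TempFFS` classes and their methods are hypothetical, and a dictionary stands in for the page cache.

```python
class BackingFS:
    """Hypothetical stand-in for an on-disk file system such as ext3."""

    def __init__(self, name):
        self.name = name
        self.files = {}        # path -> bytes stored "on disk"
        self.allocations = 0   # number of block allocations performed

    def allocate_and_write(self, path, data):
        # Block allocation happens only here, when the file is handed over.
        self.allocations += 1
        self.files[path] = data


class TempFFS:
    """Holds newly created files in memory until sync or memory pressure."""

    def __init__(self):
        self.pending = {}      # path -> (owning file system, file contents)

    def create(self, fs, path):
        # The file is associated with its original file system at creation.
        self.pending[path] = (fs, bytearray())

    def append(self, path, data):
        self.pending[path][1].extend(data)

    def unlink(self, path):
        # A short-lived file disappears without ever touching the disk.
        self.pending.pop(path, None)

    def sync(self):
        # Transfer every pending file to the file system it belongs to.
        for path, (fs, buf) in self.pending.items():
            fs.allocate_and_write(path, bytes(buf))
        self.pending.clear()
```

In this model, a file that is created and unlinked before `sync()` causes no allocation in the backing file system, while a surviving file is allocated once with its full contents known, which is the benefit delayed allocation aims for.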
TempFFS stores files in kernel memory, which cannot be paged out in traditional