Chapter 1 Introduction
1.1 Motivation
NAND flash memory was invented in the 1980s by Dr. Fujio Masuoka. Owing to their high cell density and relatively simple array architecture, NAND flash devices have found a strong foothold in solid-state mass storage [1]. In 2005, Toshiba and SanDisk developed a NAND flash chip capable of storing 1GB of data. Today, NAND flash based storage devices are widely used in removable storage, embedded storage, and consumer electronic devices such as memory cards, multimedia players, and mobile phones [1], [2].
As integrated circuit technology advances with process miniaturization, the cell size of flash memory has been scaled down by 40%-50% per year [3], [4], and memory capacity doubles every year [5]. The array organization of the NAND flash device has also evolved from a single-plane, single-die array into a multi-plane, multi-die array, whereas the instruction setup time and I/O transfer time have remained almost the same as the technology advances. To improve the performance of the NAND flash device, the NAND flash controller must fully utilize the advanced features that current NAND flash arrays provide.
When sequential data access is performed on chips driven by a conventional flash memory controller, only one page of data is accessed at a time. Other commands in the queue wait while the non-executing chips and dies remain idle, which indicates that the controller has poor parallelism [6]. For a conventional flash controller, the data throughput is determined entirely by the I/O frequency and the command setup time. For example, the performance of the Micron MT29F32G08Q is 26MB/s for read and 4MB/s for program [7].
1.2 Overview of the proposed NAND flash controller
We propose a performance-enhancing flash controller with a simple hardware implementation, based on the following three methods:
Parallel data access between chips and dies
Two-plane addressing
Out-of-order execution
With these techniques, we can maximize the I/O data transfer rate and also reduce the command execution time of a single access to achieve better performance.
The experimental results show that the proposed flash controller with 2 planes and 2 dies per chip can outperform the basic flash controller in all kinds of data accesses.
1.3 Thesis organization
This thesis introduces a high-performance flash controller design and is organized as follows. Chapter 2 explains the physical properties of NAND flash. Chapter 3 introduces the basic operations and the disadvantages of a traditional flash controller. Chapter 4 presents our high-performance flash controller architecture. Chapter 5 shows the experimental results. Finally, Chapter 6 concludes this thesis.
Chapter 2
Physical Properties of NAND Flash
In this chapter, we introduce the physical properties of the NAND flash cell and its differences from traditional storage devices such as hard drives. We also introduce the overall array architecture of a complete NAND flash device.
2.1 Cell organization
Figure 2.1 shows the transistor configuration of a single flash cell [2]. By adding a floating gate to a traditional MOSFET, the flash cell uses Fowler-Nordheim (FN) tunneling to store electrons on the floating gate, which retains the data after power-off. This non-volatile memory (NVM) characteristic and the small cell area have made flash a popular choice for data storage.
Figure 2.1: Layout of a single flash memory cell
The cell array diagrams of NAND and NOR memories are shown in Figure 2.2. Since the NAND array shares both source lines and bit lines, its area per unit cell is only 40% of that of the NOR array. Every word line of the NAND array represents a page entry, and the number of word lines that share the same source line and bit line is the number of pages in a block. The block is the minimum erase unit of a NAND memory.
Figure 2.2: Array diagram of NAND and NOR memories (panels: NAND, NOR)
Although the NAND array shares source and bit lines to achieve higher density and lower cost, this sharing also degrades its random access capability. NOR flash memories, on the other hand, are a better choice for random access applications such as EEPROM replacement and code memory. Other comparisons between NAND and NOR memories are shown in Table 2.1. Since our target application is large file storage, we choose NAND flash as the target of our controller architecture design.
Table 2.1: Comparisons between NAND and NOR flash
Flash Memory Type       NAND                       NOR
Cell Size               4F^2                       10F^2
Byte Write              Difficult                  Possible
Random Access Read      25μs (first byte)          ~0.12μs
Sustained Read          23MB/s                     20.5MB/s
Random Program          ~300μs/2112 bytes          180μs/32 bytes
Sustained Program       5MB/s                      0.178MB/s
Erase Time (Typical)    2ms/128KB                  750ms/128KB
Applications            File                       Replace EEPROM
                        Video                      Execute directly from NVM
                        Large sequential data
2.2 SLC VS. MLC
The FN tunneling program process changes the threshold voltage (Vt) of the NAND cell. A Single-Level Cell (SLC) has only one reference point and stores 1 bit per memory cell. In a Multi-Level Cell (MLC), on the other hand, the number of stored states is determined by the number of Vt reference points; for instance, the most popular MLC cell has 3 reference points and stores 2 bits of data per cell.
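The relation between reference points and stored bits can be made concrete with a short sketch. It assumes that n reference points separate n + 1 distinguishable Vt states and that the number of states is a power of two; the function name `bits_per_cell` is hypothetical and used only for illustration.

```python
def bits_per_cell(reference_points: int) -> int:
    """Bits stored per cell, assuming n reference points split the
    Vt range into n + 1 states (illustrative assumption)."""
    if reference_points < 1:
        raise ValueError("at least one reference point is required")
    states = reference_points + 1
    # A whole number of bits requires a power-of-two number of states.
    if states & (states - 1):
        raise ValueError("number of states is not a power of two")
    return states.bit_length() - 1


print(bits_per_cell(1))  # SLC: one reference point
print(bits_per_cell(3))  # MLC: three reference points
```

With one reference point the sketch yields 1 bit per cell, and with three reference points it yields 2 bits per cell, matching the SLC and MLC cases described above.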
Since SLC has fewer reference points, it offers higher performance and reliability. The endurance of SLC chips is ten times that of MLC chips, and SLC chips also offer a shorter array operation time.
Generally, SLC chips require only 1-bit error correction code (ECC) per 512 bytes, whereas MLC chips require 4 or more bits of ECC [2]. SLC chips can also operate at a lower voltage while achieving the same data integrity as MLC chips, which makes them a better choice for mobile applications. MLC chips have the advantage of low manufacturing cost: the capacity of a single chip doubles because an MLC cell stores twice as many bits as an SLC cell. MLC is therefore mainly used in consumer memory cards, media players, and other applications where performance and reliability are less critical.
Some applications also combine SLC and MLC chips to balance these trade-offs.
2.3 Operation principles
All array operations of NAND flash memories are similar to those of volatile memories: the address of the block to be accessed is decoded by mux selection signals, after which the word lines are charged to different voltage levels to perform read/program operations.
During a read, all unselected word lines are charged to 5V while the selected word line is left uncharged. If read current flows from the bit line to the source, the data read out from the cell is a “1”. Figure 2.3 shows the read voltages of an 8-page NAND memory block.
When a NAND cell is programmed, the selected word line is charged to 20V and the word lines of all other cells sharing the same bit line are charged to 10V; the bit line is then discharged to 0V to program a “0”. To erase the data in a block, all word lines are discharged to 0V and 20V is applied to the source line to erase all cells to “1”. Therefore, all pages that share the same source line must be erased at the same time. Figure 2.4 shows the array configuration while erasing.
Figure 2.4: Voltage applied while erasing
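The word-line voltages described in this section can be summarized in a small sketch. It follows the values given in the text (read: 5V unselected, 0V selected; program: 20V selected, 10V unselected; erase: all word lines at 0V). Reading the 10V level as a word-line voltage is our interpretation, and the function name `word_line_bias` is hypothetical.

```python
from typing import List, Optional


def word_line_bias(op: str, pages: int, selected: Optional[int] = None) -> List[float]:
    """Return the voltage applied to each word line of one block.

    Illustrative only: voltages are taken from the text of Section 2.3.
    """
    if op == "erase":
        # All word lines at 0V; the 20V erase voltage goes to the source line.
        return [0.0] * pages
    if selected is None or not 0 <= selected < pages:
        raise ValueError("read/program need a selected word line in range")
    if op == "read":
        selected_v, unselected_v = 0.0, 5.0
    elif op == "program":
        selected_v, unselected_v = 20.0, 10.0
    else:
        raise ValueError("unknown operation: " + op)
    return [selected_v if i == selected else unselected_v for i in range(pages)]


# Read of page 2 in an 8-page block.
print(word_line_bias("read", 8, selected=2))
```

Because the erase case applies the same bias to every word line of the block, the sketch also reflects why a single page cannot be erased on its own.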
2.4 Basic NAND flash array
The array architecture of a basic NAND flash chip contains three functional blocks: I/O control, control logic, and the NAND flash array.
The NAND flash array includes all NAND flash cells and all column and row address decode multiplexers. The data to be programmed, or the data read out from the NAND flash, is stored in the data register and the cache register. The traditional NAND flash cell array is organized as a single plane on a single die, so it only supports commands that access one page at a time. The NAND flash array uses two pages of registers to improve the performance of sequential access. The I/O control logic controls the I/O signals of each access; it relies on the status register, which is controlled by the control logic, to determine whether the current I/O signal is a command, an address, or data to be transferred. The control logic receives the input control signals and, from their combination, determines which action the I/O control and the NAND flash array should take. Figure 2.5 shows the internal configuration of a NAND flash chip [7].
Figure 2.5: The NAND flash functional block diagram
NAND flash does not have dedicated address pins; addresses are therefore loaded using a five-cycle sequence. The detailed memory mapping and addressing are shown in Figure 2.6. The lower 13 bits are used to represent the column address of NAND flash
Figure 2.6: Memory mapping of a NAND flash chip
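As an illustration of five-cycle addressing, the sketch below splits an address into five 8-bit cycles. The exact bit layout is an assumption, not taken from Figure 2.6: the 13-bit column address is assumed to occupy the first two cycles (upper bits of the second cycle zero) and the row address the last three. The function name `address_cycles` is hypothetical.

```python
from typing import List


def address_cycles(column: int, row: int) -> List[int]:
    """Split (column, row) into five 8-bit address cycles.

    Assumed layout (illustrative): cycles 1-2 carry the 13-bit column
    address, cycles 3-5 carry a row address of up to 24 bits.
    """
    if not 0 <= column < (1 << 13):
        raise ValueError("column address must fit in 13 bits")
    if not 0 <= row < (1 << 24):
        raise ValueError("row address must fit in 24 bits")
    return [
        column & 0xFF,         # cycle 1: column bits 7..0
        (column >> 8) & 0x1F,  # cycle 2: column bits 12..8
        row & 0xFF,            # cycle 3: row bits 7..0
        (row >> 8) & 0xFF,     # cycle 4: row bits 15..8
        (row >> 16) & 0xFF,    # cycle 5: row bits 23..16
    ]


print([hex(b) for b in address_cycles(0x1ABC, 0x012345)])
```

The actual assignment of page, block, and die bits within the row address depends on the device and should be taken from its datasheet.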
2.5 Advanced NAND flash array
The capacity of NAND flash devices increases significantly as integrated circuit technology advances, and the array organization of the NAND flash device changes along with this capacity growth. The internal architecture of an advanced NAND flash device is similar to that of a basic one, but the NAND flash array has evolved into a multi-plane, multi-die array.
A multi-plane array packs more NAND cells onto the same row entry and separates them into different groups. When the row entry is selected, all data on this row can be accessed at the same time, so more than one page of data can be read or programmed within a single flash array busy time, which reduces the total access time. Figure 2.7 compares the basic NAND flash command with the advanced two-plane command. A multi-plane command must meet the following address restrictions: the two accesses must use the same command, the page addresses of both commands must be identical, and the least significant bits of the two block addresses must differ [7], [8].
Figure 2.7: Comparison between basic command and two-plane command
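The address restriction for two-plane commands can be expressed as a simple predicate. The sketch below is a minimal illustration of the three conditions stated above; the `Command` record and the function name `can_pair_two_plane` are hypothetical.

```python
from typing import NamedTuple


class Command(NamedTuple):
    op: str     # e.g. "read", "program", "erase"
    block: int  # block address
    page: int   # page address within the block


def can_pair_two_plane(a: Command, b: Command) -> bool:
    """True if two accesses may be merged into one two-plane command."""
    same_command = a.op == b.op
    same_page = a.page == b.page
    # The least significant block address bit must differ between the two.
    different_plane = (a.block & 1) != (b.block & 1)
    return same_command and same_page and different_plane


print(can_pair_two_plane(Command("read", 10, 3), Command("read", 11, 3)))
print(can_pair_two_plane(Command("read", 10, 3), Command("read", 12, 3)))
```

The first pair satisfies all three conditions; the second fails because both block addresses have the same least significant bit.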
A multi-die array packs several sets of arrays together. Each set of NAND flash arrays behaves like a basic one, but all sets share the same I/O signals. Basic commands such as read, program, and erase can be performed in parallel when those dies are available. Multi-die control can reduce the NAND flash array setup time through parallel access. Figure 2.8 shows the difference between a basic NAND flash device and a multi-die device [6].
Figure 2.8: Difference between basic command and two-die command
Chapter 3
Functionalities of a Basic NAND Flash Controller
This chapter describes how a basic NAND flash controller works. The hardware architecture shows how the NAND flash chips and the controller are connected, as well as the interface to the higher-level host.
3.1 System architecture of NAND flash chip
A complete NAND flash system may include several NAND flash chips, a NAND flash controller, an error-correction unit, bus connections, and so on. Figure 3.1 shows an example of a solid state disk (SSD) system architecture [3].
Figure 3.1: The SSD system architecture
The NAND flash controller receives commands from the host and translates them into control signals that access each NAND flash chip. Data read out from the NAND flash is sent to the ECC unit to ensure the correctness of the desired information. The physical address translation block translates the virtual address from the file system into the physical address to be accessed [9]. The bad block management unit determines which blocks should not be accessed and how to proceed when the translated physical address falls in a damaged block. The wear-leveling module dynamically changes the block to be programmed so that the write cycles of all good blocks stay about the same.
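To illustrate how bad block management and wear leveling interact, the following sketch selects the next block to program. It uses a simple least-erased-first policy, which is one common approach and not necessarily the policy of the system in Figure 3.1; the function name `pick_block` and its data structures are hypothetical.

```python
from typing import Dict, Set


def pick_block(erase_counts: Dict[int, int], bad_blocks: Set[int]) -> int:
    """Choose the good block with the fewest erase cycles.

    Illustrative policy only: ties are broken by the lower block number.
    """
    candidates = [b for b in erase_counts if b not in bad_blocks]
    if not candidates:
        raise RuntimeError("no good block available")
    return min(candidates, key=lambda b: (erase_counts[b], b))


# Block 3 has the lowest count but is marked bad, so block 1 is chosen.
print(pick_block({0: 5, 1: 2, 2: 2, 3: 0}, bad_blocks={3}))
```

Always programming the least-worn good block keeps the erase counts of the good blocks close to each other over time.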
3.2 The ONFi standard
Since the interface, addressing, and device behavior of NAND flash chips may differ by vendor, the Open NAND Flash Interface (ONFi) tries to standardize these specifications to shorten NAND flash system development time [10]. The main philosophy of the ONFi standard is as follows:
Ensure that no pre-association with the NAND flash is required at host design time. All NAND flash parameters must be self-described through a parameter page; features that cannot be self-described shall be host discoverable, e.g. the number of CE#.
ONFi should leverage existing NAND flash behavior to the extent possible.
3.3 Basic command management
In order to control the NAND flash chip, four basic operations are required. Although the NAND flash array also supports advanced command sets, such as one-time programmable (OTP) commands for data protection, these commands are seldom used and are not introduced in this section.
The basic NAND flash commands are reset, read, program, and erase. As mentioned in Section 2.3, NAND flash memory has to be erased before new data is programmed, so a normal disk write command is divided into an erase and a program. Also, because of the architecture of the NAND flash array, more than one page is erased at a time, while the minimum programming unit is one page. Therefore, sector-based access is a better choice for a NAND flash file system.
The reset command is used to put the memory device into a known condition and to abort a command sequence in progress. This command is used in simulation, since the initial state of the NAND flash chip is unknown. The reset command clears all bytes in the NAND flash chip to 8’hff. Since all bytes are reset to the unprogrammed state, no access address is needed. The controller sends command 8’hff and waits tRST for the reset to finish.
The ERASE command is used when a new set of data needs to be programmed into an already programmed page: the old data must be erased before the new data is written. Erasing occurs at the block level, one block at a time. Since erasing is block aligned, only three address cycles are required for a block erase operation; if a five-cycle address is transferred, only the last three cycles are sent to the NAND flash chip. The actual command sequence is a two-step process. The ERASE SETUP (60h) command is first written to the command register. Then three cycles of address are written to the device. Next, the ERASE CONFIRM (D0h) command is written to the command register. At the rising edge of WE#, R/B# goes low and the control logic automatically controls the timing of the ERASE operation; R/B# stays low for tBERS, the entire erase time. Figure 3.2 shows the timing diagram of an ERASE operation [7].
Figure 3.2: Timing diagram of ERASE operation
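The ERASE command sequence described above can be sketched as a list of bus transactions. The opcodes 60h and D0h are taken from the text; the tuple representation and the function name `erase_sequence` are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence, Tuple

ERASE_SETUP = 0x60    # first command cycle
ERASE_CONFIRM = 0xD0  # second command cycle, starts the erase


def erase_sequence(address: Sequence[int]) -> List[Tuple[str, int]]:
    """Build the bus cycles of a block erase.

    Accepts a three-cycle row address, or a five-cycle address from
    which only the last three cycles are used.
    """
    if len(address) == 5:
        address = address[2:]  # drop the two column cycles
    if len(address) != 3:
        raise ValueError("expected a three- or five-cycle address")
    cycles = [("cmd", ERASE_SETUP)]
    cycles += [("addr", byte) for byte in address]
    cycles.append(("cmd", ERASE_CONFIRM))
    return cycles


print(erase_sequence([0xBC, 0x1A, 0x45, 0x23, 0x01]))
```

After the last cycle, a controller following this sequence would wait for R/B# to return high (after tBERS) before issuing the next command to the same die.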
The PROGRAM PAGE operation requires loading the SERIAL DATA INPUT (80h) command into the command register, followed by five address cycles and then the data. Serial data is loaded on consecutive WE# cycles starting at the given address. The PROGRAM (10h) command is written after the input data has been loaded. The control logic automatically executes the proper algorithm and controls the timing required to program and verify the operation. After tPROG, R/B# goes back to high and finishes the program
monitor the R/B# signal. After tR, the READ command can be re-issued; pulsing RE# then outputs the data, starting from the initial column address. Figure 3.3 shows the timing diagram of a READ operation [7].
Figure 3.3: The READ timing diagram
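For comparison with the erase sequence, the PROGRAM PAGE sequence described earlier in this section can be sketched in the same way. The opcodes 80h and 10h come from the text; the tuple representation and the function name `program_page_sequence` are hypothetical.

```python
from typing import List, Sequence, Tuple

SERIAL_DATA_INPUT = 0x80  # opens the program sequence
PROGRAM = 0x10            # confirms and starts programming


def program_page_sequence(address: Sequence[int], data: bytes) -> List[Tuple[str, int]]:
    """Build the bus cycles of a page program: 80h, five address
    cycles, the serial data, then 10h."""
    if len(address) != 5:
        raise ValueError("PROGRAM PAGE needs a five-cycle address")
    cycles = [("cmd", SERIAL_DATA_INPUT)]
    cycles += [("addr", byte) for byte in address]
    cycles += [("data", byte) for byte in data]  # one byte per WE# cycle
    cycles.append(("cmd", PROGRAM))
    return cycles


seq = program_page_sequence([0x00, 0x00, 0x45, 0x23, 0x01], b"\xde\xad")
print(len(seq))  # 1 command + 5 address + 2 data + 1 command cycles
```

The data phase dominates the sequence length for a full page, which is why the I/O transfer time matters as much as the array busy time tPROG.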
3.4 Basic NAND flash interface controller
The basic interface controller of a basic NAND flash device translates an input command into the corresponding control signals that drive the desired NAND flash chip. Only one command is processed at a time, since the basic NAND flash device can only access one page of data at a time. Under this control architecture, no command queues are needed, and input and output can share the same data buffer because data input and output never happen at the same time. The finite state machine of this interface controller is designed to suspend while the NAND flash chip is executing; it simply ignores all requests during execution.
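The behavior of this basic interface controller can be modeled with a two-state machine. The sketch below is a simplified software model, not the hardware implementation; the class name `BasicInterfaceController` and its methods are hypothetical.

```python
class BasicInterfaceController:
    """Simplified model: one command at a time, no queue, and
    requests arriving while the chip is busy are ignored."""

    def __init__(self) -> None:
        self.state = "IDLE"
        self.current = None

    def request(self, command: str) -> bool:
        """Try to start a command; returns False if it was ignored."""
        if self.state != "IDLE":
            return False  # FSM is suspended while the chip executes
        self.state = "BUSY"
        self.current = command
        return True

    def chip_ready(self):
        """Called when R/B# returns high; completes the current command."""
        finished = self.current
        self.state = "IDLE"
        self.current = None
        return finished


ctrl = BasicInterfaceController()
print(ctrl.request("read"))     # accepted
print(ctrl.request("program"))  # ignored: chip is busy
print(ctrl.chip_ready())        # "read" completes
```

Every ignored request in this model corresponds to time during which other chips could have been working, which motivates the queue-based design of Chapter 4.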
Chapter 4
Proposed NAND Flash Controller
The proposed NAND flash controller architecture is introduced in this chapter. To achieve a higher operation speed, the proposed controller uses out-of-order (OOO) execution when the commands in the queues are independent. We also use two-plane addressing to shorten the busy time of sequential data access. The controller uses OOO execution to increase parallel access, while in-order commitment is used to simplify the protocol between the host and the controller.
4.1 Concept
The maximum performance of a NAND flash device is reached when data is transferred at the maximum rate with no chip busy waiting time. The minimum data transfer cycle time of the NAND flash chip we use is 25ns, so the maximum performance is 40MB/s. In the following sections, we introduce several ways to bring the execution performance closer to this bound.
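The 40MB/s bound follows directly from the cycle time. The sketch below reproduces the arithmetic under two assumptions that the text implies but does not state: one byte is transferred per cycle (an 8-bit I/O bus), and 1MB is taken as 10^6 bytes. The function name `peak_throughput_mb_s` is hypothetical.

```python
def peak_throughput_mb_s(cycle_time_ns: float, bytes_per_cycle: int = 1) -> float:
    """Upper bound on transfer rate in MB/s (1 MB = 10**6 bytes).

    bytes per ns * 10**9 ns/s / 10**6 bytes/MB = bytes per ns * 1000.
    """
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return bytes_per_cycle * 1000.0 / cycle_time_ns


print(peak_throughput_mb_s(25))  # 25ns cycle, one byte per cycle
```

Any array busy time that is not hidden behind data transfer lowers the achieved throughput below this bound.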
In the block diagram of the basic controller, all chips are connected to one interface control module, and only one command is processed at a time. Therefore, only one small data buffer is needed during execution. The proposed controller uses a sequencer to schedule parallel accesses across chips, while separate interface controllers, which use two-plane commands and distributed finite state machines, improve the performance within a single chip. The I/O buffer of the controller is also separated into two sets so that input and output data can be transferred at the same time. The block diagram of the proposed controller is shown in Figure 4.1.
Figure 4.1: The block diagram of advanced NAND flash device
The sequencer is designed to enhance NAND flash device performance by increasing the probability of parallel access. In a basic NAND flash controller, only one interface controller is used, and control signals are sent to each NAND flash chip through a de-multiplexer. Therefore, only one NAND flash chip is accessed at a time while the other chips remain idle. In the advanced controller, a sequencer module is added to detect opportunities for parallel access, and such commands are pre-loaded to other NAND flash chips to shorten the busy waiting time. In addition, each NAND flash chip has its own interface controller to enable parallel command control.
The interface controller extends its command set to shorten the average execution time of each access. A basic NAND flash controller uses only simple page read, program, and erase commands. By the principle of locality, the data to be accessed is usually grouped together. The proposed NAND flash controller uses two-plane addressing and adds a two-plane command set that can access two pages of data at a time. Thus the average data access time per page is cut in half, and the performance of the controller increases.
4.2 Out-of-order execution
the same die must still be accessed in order. Only commands that access different dies may execute in parallel, provided there is no data hazard.
For further hazard prevention, we separately discuss the four scenarios in which commands may appear in the queues:
All commands in the queues are read-type commands.
All commands in the queues are write-type commands.
Write-type commands followed by read-type commands.
Read-type commands followed by write-type commands.
When all commands in the queues are read-type commands, no data hazard can occur, since there is no read after read (RAR) hazard. The execution order of read commands does not affect the data read out from the NAND flash. Therefore, a read command for an idle chip can be sent first, which shortens the data waiting time by increasing the number of parallel accesses among chips.
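The read-only case can be illustrated with a small scheduling sketch. It selects the oldest queued command whose die is idle, while keeping commands to the same die in order. This is a simplified software model under our own assumptions, not the hardware sequencer itself; the function name `next_issuable` and the (operation, die) tuple format are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def next_issuable(queue: List[Tuple[str, int]], busy_dies: Set[int]) -> Optional[int]:
    """Return the queue index of the next command to issue, or None.

    queue holds (operation, die) pairs, oldest first. A command may
    bypass older ones only if none of them targets the same die.
    """
    dies_with_older_command = set()
    for index, (_op, die) in enumerate(queue):
        if die not in dies_with_older_command and die not in busy_dies:
            return index
        # Either the die is busy or an older command is still waiting
        # for it; later commands to this die must stay behind.
        dies_with_older_command.add(die)
    return None


pending = [("read", 0), ("read", 0), ("read", 1)]
print(next_issuable(pending, busy_dies={0}))  # die 0 busy, so index 2 issues
```

In this model the read for die 1 is issued while die 0 is still busy, which is the parallelism gain described above; results would still be returned to the host in the original order to preserve in-order commitment.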
If the commands in queues are write-type, the data in NAND flash device may change after each write, so we must ensure that the data written to NAND flash chip is in the correct order. In order to avoid all data hazards, first we need to know the data