
Chapter 1 Introduction

1.2 Outlines

Chapter 2 introduces the basic reliability problems of DRAM and flash memory, describes key features of error-correcting codes for various memory-chip systems, and briefly reviews the fundamentals of existing error-correcting code circuits.

Chapter 3 explains the construction methods and interleaving mechanisms of the proposed SEC-SoddEC-SBED-DED code and the Multi-Bit-Layer SEC-SoddEC-SBED-DED interleaved codes, which are designed for fast and flexible programmability across the different page sizes of various memory chips.

Chapter 4 describes the hardware implementation of the programmable architecture and the circuit design of the proposed FEC codec. The proposed ECC codes have also been implemented in C-language software for any (n, k, m) parameters. In addition, the chapter compares performance with existing ECC schemes in terms of throughput rate (maximum operating transfer rate), complexity (area overhead), decoded error rate, and so on.

Chapter 5 presents the hardware and software simulation results, such as encoding-decoding waveforms, the read-write data flow of the proposed FEC codec, and the decoded error rate.

Chapter 6 summarizes our error-correcting codec.

Chapter 2

Basic Concepts for Memory Reliability Issues and the Existing ECC Codes

2.1 The DRAM and Flash memory reliability issues

We first introduce the common reliability problems of DRAM and flash memory.

Flash memory generally suffers from two types of errors [4]:

(1) After write and erase cycles, stored electrons can leak away from the floating gate through the tunnel oxide during aging. The charge loss decreases the threshold voltage of the memory transistor, which may result in random 0-to-1 errors.

(2) During a read operation, the floating gate slowly gains electrons while the control gate is held at Vcc. The charge gain increases the threshold voltage of the memory transistor, which may result in random 1-to-0 errors. Reliability problems (1) and (2) are shown in Fig. 2.1.

DRAM memories also suffer mainly from two types of errors [5]:

(1) Memory cell upset: the cell or node capacitance is scaled down in deep sub-micron processes, so the capacitor is highly susceptible to being discharged by noise electrons.

(2) Bit-line upset: the sensing margin of the sense amplifier is a very small signal, so the bit-line differential voltage may degrade because of noise coupling, and the resulting read operation may be erroneous.

These flash and DRAM reliability problems become a significant concern in deep sub-micron MLC technology (multi-level cell; a $2^q$-level cell stores q bits). A bi-level memory cell must distinguish between only two voltage states, whereas an MLC cell places more levels in a voltage window of similar size. The distance between adjacent threshold voltage levels in an MLC is therefore much smaller than in a traditional bi-level memory, which makes the reliability problems of MLC memories more critical than those of conventional bi-level cell (BLC) memories [3], [8], as shown in Fig. 2.2 (a), (b).

Most of the foregoing reliability issues are caused mainly by soft errors due to alpha particles. Soft errors are defined broadly and include transient errors, power-supply noise spikes, thermal effects, and man-made states. These errors are called soft because they do not permanently damage the physical function of a cell, and they can easily be corrected by complementing the data in the faulty cells [2], [5]. In a DRAM chip, more than 98% of single-bit failures are radiation-induced soft errors [20]-[21], and in NAND-flash memory over 99% of failures are attributed to single-bit soft errors [22]. The DRAM storage unit is a trench or stacked capacitor, whereas the flash storage unit is a floating gate, which is a solid-state memory, so the alpha-particle-induced soft error rate affects DRAM more significantly than flash memory. Reliability test results for DRAM and flash memory are reported in [23]-[27], which give the average FIT (Failure in Time) and bit error rate (BER) for different processes, chip sizes, and operating conditions.

The soft error rate of different memory-chips is listed as follows [23]-[27], where 1-FIT = 1 failure per billion device-hours.

Type            # bits    Process       FITs/Mbit (sea level)          FITs/Mbit (aircraft)
BLC NOR-Flash   16M/64M   0.23/0.17um   6E-9/3E-6                      2E-6/1E-5
MLC NOR-Flash   64M       0.23um        3E-7                           1E-4
BLC NAND-Flash  256M      0.16um        1E-3 (read), 1E-4 (program)    -
MLC NAND-Flash  256M      0.16um        1.0 (read), 1E-2 (program)     -
BLC DRAM        ~512M     0.25~0.13um   500~1000                       -
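To make the FIT figures concrete, the following minimal sketch converts a FITs/Mbit rating into an expected number of failures for one device. The function name and the example operating point (a 512-Mbit device at 1000 FITs/Mbit for one year) are illustrative assumptions, not measurements from [23]-[27].

```python
def expected_failures(fit_per_mbit, capacity_mbit, hours):
    """Expected soft-error failures for one device.

    1 FIT = 1 failure per 1e9 device-hours, so the expected count scales
    linearly with capacity (in Mbit) and operating time (in hours).
    """
    return fit_per_mbit * capacity_mbit * hours / 1e9


# Illustrative operating point (assumed, not taken from the cited papers):
# a 512-Mbit DRAM rated at 1000 FITs/Mbit, running for one year.
HOURS_PER_YEAR = 24 * 365
print(expected_failures(1000, 512, HOURS_PER_YEAR))
```

Under this linear model, a rating of several hundred FITs/Mbit corresponds to a few expected soft errors per device per year, which is why correction rather than detection alone is of interest.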

Consequently, error-correcting code techniques can help deep sub-micron, high-capacity MLC memories reach adequate reliability and immunity to soft errors.

2.2 A discussion on the existing ECC codes

Many ECC schemes have been proposed to enhance the reliability of dynamic RAM, NAND-type flash, and solid-state disks [1]-[9], [15]-[19]. These papers [1]-[19] show that applying ECC to a memory control system requires a balance between performance (access time penalty, operating frequency, throughput rate, encoding-decoding cycle count, error-correcting ability, and other features such as interleaving), chip-size overhead (circuit complexity and parity check-bit overhead), and reliability enhancement (low decoded error rate or error probability, high detected error rate, improved soft-error rate or yield, and improved mean time to failure). For these reasons, the proposed error-correcting code circuit must satisfy the following conditions for most memory chips.

1) Reliability in page-oriented memory systems. Memory chips usually cannot include built-in error-correcting code circuits (ECC circuits) because of the access time penalty and the additional area cost; commodity chips without ECC include NAND-type flash and specific mobile DRAM. The external memory control system therefore needs a system-level or board-level ECC to ensure the validity of the data received from page/sector-oriented memories. In general, memory reliability depends on both the error correcting-detecting capability and the soft-error rate or failure rate of the memory chips.

2) High throughput. The memory control system needs high-speed FEC codec hardware to minimize access latency and maximize the operating clock speed. To meet the demands of execute-in-place, the error-correcting code circuit must correct any erroneous bit at a randomly read address immediately after the program code has been serially downloaded from external memory, as shown in Fig. 2.3. In other words, once the n-byte serial program-code data has been received, the ECC circuit must find the error address and error value at once so that the CPU can execute the program code immediately, as real-time applications require. In general, a high-speed page access time is about 5 ns to 15 ns for DDR/SDR SDRAM, 10 ns to 70 ns for NOR flash, and 50 ns for NAND memory.

3) Low cost, as in Fig. 1.1. We need a compact, flexible FEC codec that minimizes codec complexity and parity check-bit overhead and that offers a programmable code length to match the different page sizes of various memory chips. A page or sector in a single memory chip is organized with a data width of m bits (one m-bit unit is one byte) and a data length of n bytes, so a page or sector holds $m \times n$ bits. Programmable (n, k, m) parameters are necessary so that users of memory chips can define an arbitrary data length with ECC parity check-bits.

In general, the memory-chip data width m is a multiple of 4, such as 4, 8, 16, or 32 bits, although some special memories have a specific data width. The page length is usually a multiple of 8, such as 8, 16, 32, 64, or 128 bytes. Furthermore, the page size of NAND flash is 528 or 2112 bytes.

Among the papers [1]-[19], we compare these error control codes to find the best coding style and to investigate the range of page sizes, data widths, and acceptable bit error rates that correspond to the transition error probabilities of DRAM and flash memories under practical conditions. The goal is a low-cost, low-complexity, high-speed FEC codec that provides good performance and moderate reliability and meets conditions 1) to 3) above. The existing ECC generation methods still restrict the programmable coding length and width, whereas the ECC generation methods we propose have almost no such restrictions. Here we analyze the existing ECC codes with a view to programmable (n, k, m) parameters, where n is the code length, k is the data (message or information) length, and m is the data I/O width, or the byte/symbol size in bits. The parity check length is $r = n - k$, so the number of parity check-bits is $R = (n - k) \times m$, the number of information bits is $K = k \times m$, and the total number of coded bits is $N = K + R = n \times m$. N is the user-defined memory block size, which includes both parity check-bits and information bits and equals one page of memory space; in general memory-chip applications, k >> m.
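As a quick check of these definitions, the sketch below computes r, R, K, and N from (n, k, m). The function name and the example parameters (a 528-byte page carrying 512 data bytes on an 8-bit I/O) are illustrative assumptions.

```python
def code_sizes(n, k, m):
    """Bit counts of an (n, k, m) code with n code bytes, k data bytes, m-bit bytes."""
    r = n - k  # parity check length, in m-bit bytes
    return {
        "r": r,
        "R": r * m,  # parity check-bits
        "K": k * m,  # information bits
        "N": n * m,  # total coded bits, equal to K + R
    }


# Illustrative parameters (assumed): 528-byte page, 512 data bytes, 8-bit I/O.
print(code_sizes(528, 512, 8))
```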

SEC/SEC-DED Hamming codes and odd-weight-column modified Hamming codes were presented in [4], [8], [15], [16]; they are suitable for on-chip ECC designs with a fixed code length.

For SEC-DED, such a code has a moderate number of parity bits, $R = \log_2 k + 2$ with $R + K = N$, and it suits serial data-bit coding using a cyclic Hamming code. It is difficult and complex to support a variable code length n and data width m in a modular decoding circuit, unless a multi-SEC-DED code with multiple decoding circuits is used.

However, the cost overhead then clearly increases, because the number of parity check-bits grows to about $R = (\log_2 k + 2) \times m$.

Other traditional SEC-DED codes are the bidirectional cross-parity (product) codes presented in [1], [19] for on-chip ECC applications. Although this type of code suits programmable n and m parameters because its encoding-decoding circuit is simple, it also needs a large number of parity check-bits in proportion to k and m, namely $R = k + m + 1$.

Some DEC-TED codes and TEC-QED codes are presented in [2], [5] and in [7], respectively.

They have good correcting capability, and programmable (n, k, m) circuits are feasible, but they require the largest number of parity check-bits. The TEC-QED design [7] combines odd-weight-column SEC-DED codes with a vertical parity-bit technique for a memory array in which all word-lines run along the column direction and all bit-lines run along the row direction. Each column employs an odd-weight-column SEC-DED code, and each row employs a parity bit, so the number of parity check-bits is $R = m \times (\log_2 k + 2) + k$, or $R = k \times (\log_2 m + 2) + m$. The DEC-TED designs [2], [5] use orthogonal Latin-square codes, which belong to the class of majority-logic decodable codes. For a square arrangement of an $m^2$ data array, they also need a large number of parity check-bits, R= m3 +1.
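The parity-bit formulas quoted above for the multi-SEC-DED Hamming code, the cross-parity code, and the TEC-QED code can be compared numerically. The sketch below is only an illustration: the function name is hypothetical, and rounding $\log_2 k$ up to an integer is an assumption for values of k that are not powers of two.

```python
from math import ceil, log2


def parity_bits(k, m):
    """Parity check-bits R of three existing schemes for k bytes of m bits."""
    lg = ceil(log2(k))  # assumption: round up when k is not a power of two
    return {
        "multi_sec_ded_hamming": (lg + 2) * m,  # R = (log2 k + 2) x m
        "cross_parity": k + m + 1,              # R = k + m + 1
        "tec_qed": m * (lg + 2) + k,            # R = m x (log2 k + 2) + k
    }


# Illustrative page (assumed): k = 512 bytes of m = 8 bits.
for name, bits in parity_bits(512, 8).items():
    print(name, bits, "bits,", round(100 * bits / (512 * 8), 2), "% of the data")
```

For long pages the term proportional to k dominates the cross-parity and TEC-QED overheads, which is the cost the hierarchical row parity of Chapter 3 aims to reduce.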

Reed-Solomon codes (RS codes) and BCH codes have powerful multiple-byte error correcting and detecting capability and need only a small number of check-bits for single or double byte correction, but their decoding hardware is complex and their decoding time is long. RS codes defined over GF($2^m$) are feasible for programmable (n, k, m), but their parameters are constrained by the symbol size m and by t, the number of bytes that can be corrected. In the RS-code studies [9]-[14], the versatility of the RS decoder is achieved either by changing only the information length k, with the block length n and symbol size m fixed, in order to change the error-correcting capability t [10], [11], [14], or by fixing the symbol size m so that both n and k are variable [12]-[13]. These architectures incur the largest area cost for the decoding circuit, and their clock period and decoding latency are also larger.

A class of multiple-bit error correcting and detecting codes based on Fujiwara codes was presented in [28]-[31]; an odd-weight-column-matrix code over GF($2^b$) is an SbEC-DbED code, where b denotes the number of bits in a byte and is equal to m. Codes of this kind may have arbitrary code and byte lengths, and they have a moderate number of parity check-bits as follows. (3 log ( 2))

Codes based on Fujiwara codes are feasible for programmable (n, k, m) but still require a complex decoding circuit, although they have adequate error-correcting capability: [30] provides random double-bit error correction within a block and single-byte error detection, [29] provides single-bit error correction, double-bit error detection, and fixed b-bit byte error detection, and [28] provides t-bit error correction within a single b-bit byte and single b-bit byte error detection, with t=3 and b=8. They nevertheless have poor detection capability for random double-bit and triple-bit failures.

Chapter 3

The Proposed ECC Codes Constructed Methods and Interleaved Mechanisms

3.1 Constructed method of the proposed SEC-SoddEC-SBED-DED ECC code

We propose a systematic error-correcting code, obtained by modifying the bidirectional cross-parity code and called the SEC-SoddEC-SBED-DED code, with the following capabilities: random Single-bit Error Correction, Single odd-bit Error Correction within a single byte, Single Byte Error Detection, and random Double-bit Error Detection. Traditional bidirectional cross-parity (product) SEC-DED codes need $R = k + m + 1$ parity check-bits; we use a hierarchical structure to reduce this number. The proposed (n, k, m) systematic code is constructed as shown in Fig. 3.1 by the following steps:

Step 1: Define the encoded data page or sector size as $m \times k$ bits, for a data width of m bits and an information length of k, with column index i and row index j, where $0 \le i \le m-1$ and $0 \le j \le k-1$.

Step 2: Compute a vertical parity bit over all bits of each column i. This generates m column parity-bits:

$C_i = \sum_{j=0}^{k-1} b_{ij}$, for $0 \le i \le m-1$,

where addition is the XOR logic operation and $b_{ij}$ is the coordinate of one bit position.

Step 3: Compute horizontal parity bits over the k rows of m-bit bytes using a hierarchical method. This generates $2 \times \log_2 k$ row parity-bits. The generating expressions for these parity check-bits are as follows.

for ,i.e. (j mod 2)=1, and for

If j=0, 1, 2….k-1, let is the number of a pair of row parity-bits, data page size of bits, hence the bits number of code-length n = k (the number of data-bits) + r (the number of parity check-bits) =

log

) and then the r parity check-bytes encoded by Steps 1 to 3 are written to memory after the k data-bytes, which completes the proposed systematic error-correcting code. A serial access operation therefore needs only $n = k + r$ clock cycles.
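The encoding steps can be sketched in software as follows. The column parity follows Step 2 directly. The row-parity layout is an assumption made for illustration, because the generating expressions of Step 3 are not legible here: level x is assumed to split the rows by bit (x-1) of the row address j, with $R_x$ covering the rows where that bit is 1 (for the first level, the rows with (j mod 2)=1) and $R'_x$ covering the remaining rows. The function name is hypothetical, and k is assumed to be a power of two.

```python
def encode(data, m):
    """Return (column parity byte, list of (R_x, R'_x) row-parity pairs).

    `data` is a list of k integers, each holding one m-bit byte (row).
    """
    k = len(data)
    assert k > 1 and k & (k - 1) == 0, "sketch assumes k is a power of two"
    mask = (1 << m) - 1

    # Step 2: column parity, C_i = XOR over all rows j of bit b_ij.
    col = 0
    for byte in data:
        col ^= byte & mask

    # Parity of each row: 1 if the m-bit byte has odd weight.
    row_par = [bin(byte & mask).count("1") & 1 for byte in data]

    # Step 3 (assumed layout): one pair of parity bits per address bit.
    pairs = []
    for x in range(k.bit_length() - 1):  # log2(k) levels
        r_set = 0  # rows whose address bit x is 1
        r_clr = 0  # rows whose address bit x is 0
        for j, p in enumerate(row_par):
            if (j >> x) & 1:
                r_set ^= p
            else:
                r_clr ^= p
        pairs.append((r_set, r_clr))
    return col, pairs


# Example with k = 4 rows of m = 4 bits.
print(encode([0b1010, 0b0111, 0b0001, 0b1000], 4))
```

With this layout the encoder produces m column parity-bits and $2 \times \log_2 k$ row parity-bits, matching the counts stated in Steps 2 and 3.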

In a read operation, we first read the n bytes from external memory in sequence. Applying Steps 1 to 3 to the k received data-bytes then yields new column parity check-bits $C_i$ for $0 \le i \le m-1$ and new row parity check-bit pairs $(R_x, R'_x)$ for $1 \le x \le \log_2 k$. The decoding process consists of the following steps.

Step 5: The syndromes are generated as follows.

Cycles 0 to (k-1) read the information bytes, cycle k reads the old column parity check-byte, and cycles (k+1) to (n-1) read the old row parity check-bytes. The column syndrome-bits are therefore generated at cycle k, and the row syndrome-bits during cycles (k+1) to (n-1), according to the following expressions.

Let $b_{ik}$ denote one bit position of the old column parity check-byte, read as the k-th byte, for $0 \le i \le m-1$. Here addition is the XOR logic operation.

Column syndrome-bits: $S_{col}(i) = b_{ik} + C_i$

Let $b_{ij}$ denote one bit position of an old row parity check-byte, for $(k+1) \le j \le (n-1)$, $0 \le i \le m-1$, and $1 \le x \le \log_2 k$.

Row syndrome-bits: X , for i= odd integer.

m

Step 6: Errors are corrected and detected as follows.

(a) No error: all $S_{col}(i) = S_{row}(x) = S'_{row}(x) = 0$, for $0 \le i \le m-1$ and $1 \le x \le \log_2 k$. A related case is a single-bit error that falls in the ECC area: when exactly one bit among the three syndrome results equals logic 1, we assume that no error has occurred in the information area.

(b) SEC-SoddEC: an odd number of bit errors has occurred within a single byte, and these error bits can be corrected. This case is recognized from the column and row syndromes together: the column syndrome-bits indicate the error-bit positions (the error value), and the row syndrome-bits indicate the error address (the error location). Knowing the error value and the error location, we invert the erroneous bits to correct them when the error address is read.

for0

x

log2

k

, indicate that a single-byte error at least are detected.

In other words, an even number of bit errors has occurred in a single byte or in multiple bytes.

(d) DED: This case covers double errors and errors of more than two bits. If none of (a), (b), and (c) applies, then any $S_{col}(i) \ne 0$, $S_{row}(x) \ne 0$, or $S'_{row}(x) \ne 0$ indicates that at least a double error exists.

Cases (a) and (b) are correctable, whereas cases (c) and (d) are detectable but not correctable.
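The decoding cases (a) to (d) can be illustrated with the following sketch. It is not the circuit of Chapter 4: the row-parity layout (one pair $(R_x, R'_x)$ per address bit of the row index) and the test used for case (b) (every pair of row syndrome-bits complementary, and a column syndrome of odd weight) are assumptions chosen to be consistent with the description above. All function names are hypothetical, and k is assumed to be a power of two.

```python
def parities(data, m):
    """Column parity byte and (R_x, R'_x) pairs under the assumed layout."""
    mask = (1 << m) - 1
    col = 0
    for byte in data:
        col ^= byte & mask
    row_par = [bin(byte & mask).count("1") & 1 for byte in data]
    pairs = []
    for x in range(len(data).bit_length() - 1):
        r_set = r_clr = 0
        for j, p in enumerate(row_par):
            if (j >> x) & 1:
                r_set ^= p
            else:
                r_clr ^= p
        pairs.append((r_set, r_clr))
    return col, pairs


def decode(data, stored_col, stored_pairs, m):
    """Classify the error pattern and correct it when possible."""
    col, pairs = parities(data, m)
    s_col = stored_col ^ col                                  # S_col
    s_row = [old[0] ^ new[0] for old, new in zip(stored_pairs, pairs)]
    s_row_c = [old[1] ^ new[1] for old, new in zip(stored_pairs, pairs)]
    col_weight = bin(s_col).count("1")
    weight = col_weight + sum(s_row) + sum(s_row_c)

    if weight == 0:
        return "no_error", data
    if weight == 1:
        # (a): a single flipped bit in the parity area; data assumed intact.
        return "parity_bit_error", data
    if all(s ^ c for s, c in zip(s_row, s_row_c)) and col_weight & 1:
        # (b): odd number of bit errors in one byte; s_row spells its address.
        addr = sum(bit << x for x, bit in enumerate(s_row))
        fixed = list(data)
        fixed[addr] ^= s_col
        return "corrected", fixed
    # (c)/(d): error detected but not correctable.
    return "detected", data


data = [0b1010, 0b0111, 0b0001, 0b1000]
col, pairs = parities(data, 4)
bad = list(data)
bad[2] ^= 0b0100                      # single-bit error in byte 2
print(decode(bad, col, pairs, 4))
```

A usage note: a two-bit error inside one byte leaves every row syndrome-bit at zero while the column syndrome has weight two, so the sketch reports it as detected rather than attempting a correction.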

The foregoing code generation methods are well suited to software or hardware implementation, especially for programmable (n, k, m) ECC. Based on this construction, we present low-complexity, high-speed hardware in Chapter 4.

Fig. 3.2 shows the four capabilities of the proposed code for a 16-bit memory page with information length k=4 and data width m=4. Each “X” denotes one failed bit position.

Fig. 3.3 shows the logical scheme of the SEC-SoddEC-SBED-DED code for a 16-bit memory page with information length k=4 and data width m=4. The parity check-bits are generated by Steps 1 to 3 above.

3.2 Constructed methods of Multi Bit-Layer SEC-SoddEC-SBED-DED ECC code

Most interleaving techniques are used to solve burst-error problems [32]. Error patterns involving two or more adjacent cells are generally recovered by a suitable physical interleaving of the cells that belong to the same codeword, which increases overall memory reliability [18]. Multi-SEC/DED codes interleaved over each word-line for on-chip ECC, or over each data I/O for off-chip ECC, are presented in [3], [4], [7], [16], [18]. Papers [3], [18] present an on-chip ECC scheme for MLC flash memories that is based on a binary single-bit-correcting code organized in different bit-layers. Paper [4] uses a (522,512) SEC cyclic Hamming code for each data I/O; these multi-ECC (n, k) codes are optimized for the balance between reliability improvement and redundant-cell area overhead, but their weakness is a fixed 2n decoding latency even when the data contains no error. Paper [7] presents a TEC-QED code that combines an odd-weight-column SEC-DED Hamming code with the vertical parity-bit technique, but it has a large redundant-cell overhead, with $R = m \times (\log_2 k + 2) + k$ parity check-bits. Paper [16] presents a multi SEC-DED (n, k) Hamming code in which the k-bit information data is split into two SEC-DED Hamming codewords, so that it can correct a two-bit error in a two-bits-per-cell MLC DRAM.

The schemes in [3], [7], [16], [18] suit only specific on-chip ECC coding styles. In practice, we need a compact, flexible, and fast interleaved coding method that supports programmable coding and real-time mapping-out operation. We therefore propose an interleaving method called the Multi-Bit-Layer SEC-SoddEC-SBED-DED code. Its principle is to encode a block of $(k \times m)$ data bits by generating a separate $(n_l, k_l, m_l)$ SEC-SoddEC-SBED-DED code on each data I/O. An m-bit data I/O thus carries m such $(n_l, k_l, m_l)$ SEC-SoddEC-SBED-DED codes, and the result is called the $(n_l, k_l, m_l, m)$ Multi-Bit-Layer SEC-SoddEC-SBED-DED code.
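The bit-layer idea can be sketched as a regrouping of the data before encoding. The mapping below is an assumption made for illustration: layer i collects bit i of every word (one data I/O line), and the resulting k-bit serial stream is packed, first bit in the least significant position, into symbols of $m_l$ bits, so that each layer can be encoded as a separate code. The function name is hypothetical, and $m_l$ is assumed to divide k.

```python
def split_bit_layers(data, m, m_l):
    """Split k words of m bits into m layers of m_l-bit symbols."""
    k = len(data)
    assert k % m_l == 0, "sketch assumes m_l divides k"
    layers = []
    for i in range(m):
        # Serial bit stream seen on data I/O line i.
        stream = [(word >> i) & 1 for word in data]
        symbols = []
        for start in range(0, k, m_l):
            sym = 0
            for pos, bit in enumerate(stream[start:start + m_l]):
                sym |= bit << pos
            symbols.append(sym)
        layers.append(symbols)
    return layers


# Example: k = 4 words of m = 2 bits, regrouped into 2-bit symbols per layer.
print(split_bit_layers([0b01, 0b11, 0b10, 0b00], 2, 2))
```

Under this mapping, a multi-bit error confined to one word touches each layer in at most one bit, so every layer sees at most a single-bit error.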

For a programmable $(n_l, k_l, m_l, m)$ Multi-Bit-Layer SEC-SoddEC-SBED-DED code, the code length on each data I/O is $n_l$ bytes, the user-defined coding-data width on each data I/O is $m_l$ bits, the number of data I/Os is m, and the encoded information length on each data I/O is $k_l$ bytes, corresponding to

When m=1 (only one data I/O) and k is 64, 512, or 4096 bits, the dependence of the number of parity check-bits R on the user-defined coding-data width $m_l$ is shown in Fig. 3.4.

For a Multi-Bit-Layer SEC-SoddEC-SBED-DED code, if the number of data I/O wide is m, the

