Design and Implementation of a (9153,8256) LDPC Decoder with 2-bit Soft Input for NAND Flash Memory

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

Student: Kin-Chu Ho

Advisor: Dr. Hsie-Chia Chang

A Thesis

Submitted to the Department of Electronics Engineering and Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics Engineering

August 2010

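Abstract (translated from the Chinese)

BCH codes are currently the mainstream error correcting codes in flash memory systems because their hardware architecture is very simple. As advanced process technologies and the large increase in storage capacity reduce reliability, BCH codes, which rely on algebraic decoding, can only improve their decoding performance by adding more parity bits, which in turn reduces the space available for data. This thesis therefore proposes a low-density parity-check (LDPC) code for flash memory systems together with its decoder architecture. With 2-bit soft input, the LDPC code provides better error correcting capability than a BCH code at the same code rate.

Since the page size of next-generation flash memory is 1024 bytes, we use the permutation matrix algorithm to construct a (9153,8256) LDPC code with code rate 0.9, and use variable-node-centric sequential scheduling (VSS) to reduce the check node unit's ...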


Abstract

This thesis proposes an LDPC decoder architecture for NAND flash memory systems. BCH codes are widely used in NAND flash memory systems because of their simple hardware architecture. However, technology scaling and storing more bits per NAND flash cell degrade reliability, so more parity bits are required to improve the correcting capability of BCH codes. This greatly reduces the storage capacity and is infeasible for commercial products. Soft input is required to improve the correcting capability of an error correcting code, but BCH codes gain little when soft input is provided. This thesis proposes a 2-bit soft-input LDPC decoder that outperforms a BCH code at the same code rate.

The (9153, 8256) LDPC code with code rate 0.9 is constructed by the permutation matrix algorithm. The variable-node-centric sequential scheduling (VSS) architecture is adopted, and the CNU is modified to reduce hardware complexity. Compared to the conventional Min-Sum two-stage pipelined architecture, the proposed architecture reduces the combinational circuits of the VNU by approximately 96% and the registers by 76.8%. In 90 nm CMOS technology, the maximum throughput reaches 2.78 Gbps at an operating frequency of 100 MHz with 10 iterations.

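Acknowledgements

Two years of master's study have passed before I knew it, and I would like to thank the many people who have looked after and helped me. First and foremost, I thank my parents for their support. During the six years of my undergraduate and master's studies I could not spend much time with them and could only go home for short visits during the winter and summer breaks, yet they never complained and supported me in doing what I wanted to do. I also thank my advisor, Professor Hsie-Chia Chang, who not only guided my research but also cared about my daily life; I am grateful for his patience with me. I further thank my senior labmates 陳志龍 and 嚴紹維 of the LDPC GROUP, who carefully guided my research, often took me out to try the food of Hsinchu, and took very good care of me. Finally, I thank every member of OCEAN and OASIS. We worked hard on research together, chatted and ate together, and gradually grew close. All good things must come to an end, and many members are leaving the OCEAN family this year. Although I am sad to see them go, I sincerely wish everyone a bright future. There are many more people I would like to thank, but space is limited. Let me once again thank every one of you for your care and help. Thank you!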

List of Figures

2.1 The Block Diagram of Flash Memory System
2.2 NAND Flash Cell Programming [1]
2.3 Threshold voltage distribution of a Single Level Cell of NAND Flash Memory [1]
2.4 NAND Flash Cell Erasing [1]
2.5 NAND Flash Cell Reading [1]
2.6 Program Disturb
2.7 Read Disturb
2.8 Threshold voltage distribution of a 2bits/cell NAND flash cell
2.9 Threshold voltage distribution of a 2bits/cell NAND flash cell
3.1 Illustration of standard BP
3.2 Illustration of VSS
3.3 An example of a Tanner graph with cycle-6
3.4 Performance of LDPC code with different column degree
3.7 An example of QC LDPC code, dc = 3, dv = 2 and p = 4
3.8 Demonstration of cycle-4
3.9 Parity check matrix H
3.10 Performance of (9153, 8256) LDPC code
4.1 Architecture and scheduling for VSS algorithm
4.2 Conventional accumulative sorter
4.3 Demonstration of conventional accumulative sorter
4.4 Accumulative sorter w/o 2nd min
4.5 Demonstration of accumulative sorter w/o 2nd min
4.6 Performance of (9153, 8256) LDPC code with different global 2nd min compensation, MS - MinSum, MS-VSS - MinSum with variable-node-centric sequential scheduling
4.7 Variable node unit architecture
4.8 Illustration of messages shifted between CNUs
4.9 Parity Check Matrix of (9153,8256) LDPC code
5.1 2 bits (4 levels) non-linear quantization
5.2 Performance of (9153, 8256) (Column deg = 8) LDPC code with different parameters
5.4 Performance comparison, Iteration = 40
5.5 Layout of Place and Route

List of Tables

5.1 Synthesis result of CNU and VNU with technology UMC90
5.2 Summary of implementation result (Place and Route)

Chapter 1 Introduction

1.1 Motivation

Error correcting codes are important to NAND flash memory systems since errors are unavoidable [1]. BCH codes [2] [3] are widely used in NAND flash memory systems because of their simple hardware architecture and hard-input requirement. As technology scales down and more bits are stored per NAND flash cell, more errors are introduced. With a limited number of parity bits, the correcting capability of BCH codes is not enough to meet the requirements of next-generation NAND flash memory systems. Soft input is required to improve the correcting capability of an error correcting code. However, BCH codes gain little when soft input is provided [4] [5]. The LDPC code [6] is a good candidate because of its powerful correcting capability and simple decoding algorithm. An LDPC code with 2-bit soft input can outperform a BCH code with the same code rate.

The low-density parity-check (LDPC) code is a well-known error correcting code with near-Shannon-limit performance [7]. Its parity check matrix H can be described by a Tanner graph [8], in which the rows and columns of H are mapped to check nodes and variable nodes respectively. In the standard belief propagation (BP) algorithm, an LDPC decoder iteratively exchanges messages between check nodes and variable nodes in a fully parallel manner.

A high code rate is a necessary condition for an error correcting code applied to NAND flash memory systems. A high-code-rate LDPC code has a large row degree, which makes implementation difficult. The proposed LDPC code has a row degree of 81. The

This greatly reduces the routing complexity and storage memory. A (9153, 8256) LDPC code with code rate 0.9 is constructed by the permutation matrix algorithm. The proposed LDPC decoder performs better than a BCH code with the same code rate when 2-bit soft input is provided. In 90 nm CMOS technology, the maximum throughput reaches 2.78 Gbps at an operating frequency of 100 MHz with 10 iterations.

1.2 Thesis organization

The rest of this thesis is organized as follows. Chapter II introduces NAND flash memory. Chapter III introduces the decoding algorithm, the performance-related code parameters and the code construction. Chapter IV presents the decoder architecture. Simulation results are given in Chapter V and the conclusion in Chapter VI.

Chapter 2 NAND Flash Memory

2.1 Introduction of NAND Flash Memory

This section introduces the flash memory system and its basic operations: programming, erasing and reading.

2.1.1 Flash Memory System

Flash memory is widely used for data storage in portable devices. Since flash memory is non-volatile, no power is needed to maintain the stored information. In addition, flash memory offers fast read access times compared to hard disks. In this thesis, NAND flash memory is the target flash memory.

There are three basic operations in NAND flash memory: programming, erasing and reading. NAND flash memory is programmed and erased block by block, and each block contains a number of pages. NAND flash memory is read page by page. More details of these three operations are presented in the following subsections.

Fig. 2.1 shows the flash memory system. Data are transmitted in pages, where the page size is 4K or 8K bytes. A page consists of a data area and a spare area. The data area stores the user data, and the spare area stores system-control signals and the parity bits of the error correcting code (ECC). Pages are encoded before programming and decoded after being read from the flash memory.

Figure 2.1: The Block Diagram of Flash Memory System. (Blocks: System, ECC, Buffer, Flash Memory.)

2.1.2 NAND Flash Cell Programming

Fig. 2.2 shows the programming of a NAND flash cell. In a NAND flash cell, there is a Floating Gate between the Gate and the Substrate. When data is written into the cell, 0 V is applied to the Source and Drain, and a high voltage (VG) is applied to the Gate. Electrons in the Substrate are attracted to the Floating Gate. Different values of VG can be applied to control the amount of electrons injected into the Floating Gate, which determines the threshold voltage of the cell.

Figure 2.2: NAND Flash Cell Programming [1]. (Labels: Substrate, 0V, 0V, VG.)

A Single Level Cell (SLC) stores only 1 bit of data per cell. Therefore, the threshold voltage region of an SLC is divided into two levels. Fig. 2.3 shows the threshold voltage distribution of an SLC. For example, the threshold voltage is set to 2.5 V if data 1 is stored, or 5.5 V if data 0 is stored. The threshold voltage varies because of noise and disturbance, which are discussed in Section 2.2.

2.1.3 NAND Flash Cell Erasing

Figure 2.3: Threshold voltage distribution of a Single Level Cell of NAND Flash Memory [1].

Electrons in the Floating Gate must be erased before reprogramming. When a NAND flash cell is erased, 0 V is applied to the Source, Drain and Gate, and a high voltage (VS) is applied to the Substrate. Electrons in the Floating Gate are attracted to the Substrate, and no electrons are left in the Floating Gate.

Figure 2.4: NAND Flash Cell Erasing [1]. (Labels: Substrate, 0V, 0V, 0V, VS.)

2.1.4 NAND Flash Cell Reading

Fig. 2.5 shows the reading of a NAND flash cell. To read a NAND flash cell, the selected wordlines are grounded and a high voltage (VD) is applied to the unselected wordlines. A bias is applied to the bitlines. Current flows through the transistor if there is no charge stored in the cell.

Figure 2.5: NAND Flash Cell Reading [1]. (Labels: unselected wordlines at VD, selected wordline at 0V, bitline at VBIAS.)

2.2 Reliability of NAND Flash Memory

Electron leakage, program disturb and read disturb cause the threshold voltage of a NAND flash cell to vary. Errors may be introduced if the threshold voltage shifts to another level. These noise sources are described in this section.

2.2.1 Electron Leakage

The number of electrons stored in the Floating Gate decreases over time because electrons may leak from the NAND flash cell. This problem can be solved by erasing and reprogramming periodically. However, a NAND flash cell may be damaged as the number of Program / Erase cycles increases, and leakage becomes more serious once the cell is damaged. Errors therefore become unavoidable if a NAND flash cell is to be used for a long time.

2.2.2 Program Disturb

Fig. 2.6 shows the program disturb of a NAND flash cell. Unselected cells on the same wordline as the programmed cell, or on adjacent wordlines, may suffer from voltage stress that results in unwanted programming. The threshold voltage of those unselected cells therefore increases and may shift to another level.

Figure 2.6: Program Disturb. (Labels: unselected wordlines at 10V, selected wordline at 20V, 0V, VCC, program-disturbed cells, programmed cell.)

2.2.3 Read Disturb

Unselected cells adjacent to the cells being read may suffer from voltage stress that results in unwanted programming. As in the program disturb case, the threshold voltage of those unselected cells increases and may shift to another level.

Figure 2.7: Read Disturb. (Labels: unselected pages at 4.5V, selected page at 0V, VBIAS, read-disturbed cells.)

In Fig. 2.3, a threshold voltage below 4 V means that data 1 is stored, and a threshold voltage above 4 V means that data 0 is stored. There is a tolerance range for the variation of the threshold voltage: the data is still correct as long as the threshold voltage does not shift to the other level.

Figure 2.8: Threshold voltage distribution of a 2bits/cell NAND flash cell.

Fig. 2.8 shows a 2bits/cell NAND flash cell. The storage capacity is doubled compared to a 1bit/cell NAND flash cell. The threshold voltage region is divided into 4 levels, and the region of each level is narrower. Therefore, the probability that the threshold voltage shifts to another level increases, which degrades reliability.

Nowadays, NAND flash memory systems only provide hard input to the error correcting code. For example, in Fig. 2.8, only three voltages (3.2 V, 4 V and 5.1 V) are applied to check which level the threshold voltage is in. The system does not provide any information about how likely a bit is to be '0' or '1'; the information received by the error correcting code is exactly '0' or '1'. We call this hard input.

BCH codes are feasible because of their simple hardware architecture and hard-input-only requirement. However, technology scaling and storing more bits per NAND flash cell degrade reliability, so more parity bits are required to improve the correcting capability of BCH codes. The increase of the spare area (the area for parity bit storage) greatly reduces the data storage capacity and is infeasible for commercial products. To overcome this problem, NAND flash memory systems will provide more information (soft input) in the next-generation standard, so that a much more powerful error correcting code can be adopted.

Figure 2.9: Threshold voltage distribution of a 2bits/cell NAND flash cell.

In Fig. 2.9, if data 01 is stored and the threshold voltage shifts to 5.5 V, hard input only indicates that the second bit is a '0'. More information can be provided if one more voltage (5.8 V) is applied to the Gate. We then know that the threshold voltage is less than 5.8 V, so the second bit has a high probability of being '1'. This provides the error correcting code with more information about each data bit, and we call this soft input.
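The difference between a hard read and a read with one extra reference voltage can be sketched as a quantizer that reports which region between reference voltages the threshold voltage falls in. The following Python sketch uses the reference voltages quoted above (3.2 V, 4 V, 5.1 V, plus the extra 5.8 V read). The function name `read_region` and the region numbering are illustrative assumptions; the mapping from regions to bit labels or LLRs is not given in the text and is left out.

```python
from bisect import bisect_right

HARD_REFS = [3.2, 4.0, 5.1]        # reference voltages of a hard read (from the text)
SOFT_REFS = [3.2, 4.0, 5.1, 5.8]   # one extra reference voltage, as in the Fig. 2.9 example


def read_region(vth, refs):
    """Return the index of the region that threshold voltage vth falls in.

    Region k means that exactly k reference voltages are at or below vth.
    The numbering is an illustrative assumption, not the thesis's encoding.
    """
    return bisect_right(refs, vth)


for vth in (5.5, 6.0):
    print(vth, read_region(vth, HARD_REFS), read_region(vth, SOFT_REFS))
```

Both 5.5 V and 6.0 V fall in the same hard-read region, while the extra reference voltage separates them. That separation is the additional reliability information that soft input gives the decoder.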

BCH codes gain little when soft input is provided [4] [5]. LDPC codes are probability-based and can make good use of soft information. Therefore, the LDPC code is a good candidate for next-generation NAND flash memory systems. Providing soft input increases the reading latency of the flash memory system, so there is a trade-off between correcting capability and system latency. This thesis shows that an LDPC code with only 2-bit soft input can outperform a BCH code at the same code rate. Therefore, the degradation of system latency is minimized.

Chapter 3 Low Density Parity Check Code

The LDPC code was first discovered by Gallager [6] in the early 1960s, but it did not attract great attention until the 1990s. The main reason is its high routing complexity, which makes implementation very difficult. The decoding algorithm of LDPC codes is iterative message-passing decoding: messages are passed between the Check Node Unit (CNU) and the Variable Node Unit (VNU) during the decoding process. This iterative message-passing algorithm provides superior correcting ability and has led to LDPC codes being widely adopted in communication applications.

In this chapter, the decoding algorithm is introduced and the performance-related code parameters are discussed. Finally, a code construction algorithm is introduced.

3.1 Decoding Algorithm

3.1.1 Standard Belief Propagation (BP) Algorithm

The log-likelihood ratio (LLR) of the intrinsic information of the $n$th variable node is denoted by $P_n$. The message from the $n$th variable node to the $m$th check node is denoted by $z_{mn}$. The message from the $m$th check node to the $n$th variable node is denoted by $\epsilon_{mn}$. The a posteriori LLR of the $n$th bit is denoted by $z_n$. $N(m)$ denotes the set of variable nodes connected to check node $m$, and $M(n)$ denotes the set of check nodes connected to variable node $n$. The current iteration number and the maximum number of iterations are denoted by $i$ and $I_{Max}$ respectively. The standard BP is carried out as follows.

1. Initialization:

Set $i = 1$. For each $m$, $n$, set $z^0_{mn} = P_n$.

2. Iterative Decoding:

(a) check node to variable node update step, for $1 \le m \le M$ and each $n \in N(m)$, process

$$\epsilon^i_{mn} = 2\tanh^{-1}\Big(\prod_{n' \in N(m)\setminus n} \tanh\Big(\frac{z^{i-1}_{mn'}}{2}\Big)\Big) \tag{3.1}$$

(b) variable node to check node update step, for $1 \le n \le N$ and each $m \in M(n)$, process

$$z^i_{mn} = P_n + \sum_{m' \in M(n)\setminus m} \epsilon^i_{m'n} \tag{3.2}$$

$$z^i_n = P_n + \sum_{m' \in M(n)} \epsilon^i_{m'n} \tag{3.3}$$

3. Hard Decision:

Let $X_n$ be the $n$th bit of the decoded codeword. If $z^i_n \ge 0$, $X_n = 0$; otherwise, if $z^i_n < 0$, $X_n = 1$. If $H(x^{(i)})^t = 0$ or $I_{Max}$ is reached, the decoder stops and outputs the codeword. Otherwise, it sets $i = i + 1$ and continues iterative decoding.

The decoding process for one iteration of standard BP is shown in Fig. 3.1. The messages between check nodes and variable nodes are updated in parallel.
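The three steps of standard BP can be sketched in a few lines of Python. This is a minimal floating-point sketch of the update rules (3.1) to (3.3), not the decoder of this thesis: the function name `bp_decode`, the dictionary-based message storage, the clipping constant and the small (7,4) Hamming matrix in the usage lines are all assumptions made for illustration.

```python
import math


def bp_decode(H, P, max_iter=20):
    """Standard (flooding) BP on parity-check matrix H with channel LLRs P."""
    M, N = len(H), len(H[0])
    N_m = [[n for n in range(N) if H[m][n]] for m in range(M)]  # N(m)
    M_n = [[m for m in range(M) if H[m][n]] for n in range(N)]  # M(n)
    z = {(m, n): P[n] for m in range(M) for n in N_m[m]}        # z_mn^0 = P_n
    eps = {}
    x = [0 if p >= 0 else 1 for p in P]
    for _ in range(max_iter):
        # (3.1) check node to variable node update
        for m in range(M):
            for n in N_m[m]:
                prod = 1.0
                for n2 in N_m[m]:
                    if n2 != n:
                        prod *= math.tanh(z[(m, n2)] / 2.0)
                # clip to keep atanh finite (numerical safeguard, an assumption)
                prod = max(min(prod, 1.0 - 1e-12), -(1.0 - 1e-12))
                eps[(m, n)] = 2.0 * math.atanh(prod)
        # (3.2) and (3.3) variable node update, followed by the hard decision
        for n in range(N):
            total = P[n] + sum(eps[(m, n)] for m in M_n[n])     # z_n
            for m in M_n[n]:
                z[(m, n)] = total - eps[(m, n)]                 # z_mn
            x[n] = 0 if total >= 0 else 1
        # stop when every parity check is satisfied
        if all(sum(x[n] for n in N_m[m]) % 2 == 0 for m in range(M)):
            break
    return x


# Usage: all-zero codeword of a (7,4) Hamming code with one unreliable bit.
H_EXAMPLE = [[1, 1, 0, 1, 1, 0, 0],
             [1, 0, 1, 1, 0, 1, 0],
             [0, 1, 1, 1, 0, 0, 1]]
P_EXAMPLE = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(bp_decode(H_EXAMPLE, P_EXAMPLE))
```

Every check node reads messages from the previous iteration only, which is why all nodes can be updated in parallel and why the routing becomes expensive for large row degrees.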

3.1.2 Variable-node-centric Sequential Scheduling (VSS) Algorithm

A high-code-rate LDPC code has a high row degree. This makes implementation difficult because of the large number of inputs to the sorter: the hardware cost and the critical path of the Check Node Unit (CNU) are greatly increased. Shuffle decoding algorithm [9] [11] with

Figure 3.1: Illustration of standard BP. (Panels: (a) Check node to variable node update of BP algorithm; (b) Variable node to check node update of BP algorithm.)

BP algorithm. The only difference between the two algorithms is the updating procedure. Assume the $N$ bits of a codeword are divided into $G$ groups, so that each group contains $N/G = N_G$ bits. Messages are only exchanged between the variable nodes of one group and the check nodes connected to that group, and the groups are updated in order. One iteration therefore takes $G$ cycles. For $G = 1$, VSS scheduling becomes standard BP.

The normalized min-sum (NMS) algorithm, which compensates for the approximation error in the check node update step, can also be applied to the VSS approach, with normalization factor $\beta = 0.5$. The updating procedure of the NMS algorithm with the VSS approach is carried out as follows.

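1. Initialization: For each $m$, $n$, set $z^0_{mn} = P_n$.

2. Iterative Decoding:

(a) check node to variable node update step, for … $n \in N(m)$, process

$$\epsilon^i_{mn} = \prod_{\substack{n' \in N(m)\setminus n \\ n' \le g\cdot N_G - 1}} \mathrm{sign}(z^i_{mn'}) \times \prod_{\substack{n' \in N(m)\setminus n \\ n' \ge g\cdot N_G}} \mathrm{sign}(z^{i-1}_{mn'}) \times \min\Big\{ \min_{\substack{n' \in N(m)\setminus n \\ n' \le g\cdot N_G - 1}} |z^i_{mn'}|,\ \min_{\substack{n' \in N(m)\setminus n \\ n' \ge g\cdot N_G}} |z^{i-1}_{mn'}| \Big\} \times \beta \tag{3.4}$$

(b) variable node to check node update step, for $g \cdot N_G \le n \le (g+1) \cdot N_G - 1$ and each $m \in M(n)$, process

$$z^i_{mn} = P_n + \sum_{m' \in M(n)\setminus m} \epsilon^i_{m'n} \tag{3.5}$$

$$z^i_n = P_n + \sum_{m' \in M(n)} \epsilon^i_{m'n} \tag{3.6}$$

3. Hard Decision:

Let $X_n$ be the $n$th bit of the decoded codeword. If $z^i_n \ge 0$, $X_n = 0$; otherwise, if $z^i_n < 0$, $X_n = 1$. If $H(x^{(i)})^t = 0$ or $I_{Max}$ is reached, the decoder stops and outputs the codeword. Otherwise, it sets $i = i + 1$ and continues iterative decoding.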

The decoding process for one iteration of VSS is illustrated in Fig. 3.2 with G = 3 as example. The arrows with blue color represent check node to variable node messages to be updated. The arrows with red color represent variable node to check node messages to be updated. On the other hand, black lines represent that messages are not updated in that cycle.

Figure 3.2: Illustration of VSS. (Panels: (a) 1st group's message updated; (b) 2nd group's message updated; (c) 3rd group's message updated. Each panel shows check nodes C1 to C3 and variable nodes V1 to V6.)
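The group-by-group update order of equations (3.4) to (3.6) can be sketched as follows. This is a floating-point illustration under stated assumptions, not the hardware decoder: the name `vss_nms_decode`, the dictionary-based storage and the (7,4) Hamming matrix in the usage lines are hypothetical, and every check node is assumed to have degree at least 2.

```python
def vss_nms_decode(H, P, G, beta=0.5, max_iter=20):
    """Normalized min-sum with variable-node-centric sequential scheduling."""
    M, N = len(H), len(H[0])
    assert N % G == 0, "N must be divisible by the number of groups"
    NG = N // G
    N_m = [[n for n in range(N) if H[m][n]] for m in range(M)]  # N(m)
    M_n = [[m for m in range(M) if H[m][n]] for n in range(N)]  # M(n)
    z = {(m, n): P[n] for m in range(M) for n in N_m[m]}        # z_mn^0 = P_n
    eps = {}
    x = [0 if p >= 0 else 1 for p in P]
    for _ in range(max_iter):
        for g in range(G):
            group = range(g * NG, (g + 1) * NG)
            # (3.4): z of earlier groups already holds this iteration's values,
            # z of the current and later groups still holds the previous ones.
            for n in group:
                for m in M_n[n]:
                    others = [z[(m, n2)] for n2 in N_m[m] if n2 != n]
                    s = 1.0
                    for v in others:
                        if v < 0:
                            s = -s
                    eps[(m, n)] = s * beta * min(abs(v) for v in others)
            # (3.5) and (3.6): update only the variable nodes of group g
            for n in group:
                total = P[n] + sum(eps[(m, n)] for m in M_n[n])
                for m in M_n[n]:
                    z[(m, n)] = total - eps[(m, n)]
                x[n] = 0 if total >= 0 else 1
        if all(sum(x[n] for n in N_m[m]) % 2 == 0 for m in range(M)):
            break
    return x


# Usage: same toy example for fully serial (G = N) and flooding (G = 1) schedules.
H_EXAMPLE = [[1, 1, 0, 1, 1, 0, 0],
             [1, 0, 1, 1, 0, 1, 0],
             [0, 1, 1, 1, 0, 0, 1]]
P_EXAMPLE = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(vss_nms_decode(H_EXAMPLE, P_EXAMPLE, G=7))
print(vss_nms_decode(H_EXAMPLE, P_EXAMPLE, G=1))
```

Because later groups see messages that were refreshed earlier in the same iteration, information propagates faster than in the flooding schedule, while each step only touches one group of variable nodes.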

3.2 Performance-Related Parameters

3.2.1 Cycles in Tanner Graph

An LDPC code with cycle-4 introduces smaller trapping sets [12], which cause performance degradation in the waterfall region. For LDPC codes, this performance degradation is called the error floor [13]. Therefore, cycle-4 should be avoided when constructing an LDPC code, and the cycles should be as long as possible. Fig. 3.3 illustrates a Tanner graph with a cycle of length 6 and its corresponding parity check matrix.

Figure 3.3: An example of a Tanner graph with cycle-6. (Panels: (a) A Tanner graph with cycle-6, with check nodes C1 to C4 and variable nodes V1 to V6; (b) Parity check matrix H corresponding to (a).)
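Whether a parity check matrix contains a cycle-4 can be checked directly: a cycle of length 4 exists exactly when two columns share a 1 in two or more rows. The sketch below illustrates this test in Python; the function name `has_cycle4` and the two small matrices are assumptions chosen for the example and are not taken from Fig. 3.3.

```python
from itertools import combinations


def has_cycle4(H):
    """Return True if two columns of H have ones in at least two common rows."""
    rows, cols = len(H), len(H[0])
    supports = [{m for m in range(rows) if H[m][n]} for n in range(cols)]
    return any(len(a & b) >= 2 for a, b in combinations(supports, 2))


# Columns 0 and 1 share rows 0 and 1, which forms a cycle of length 4.
H_WITH_CYCLE4 = [[1, 1, 0],
                 [1, 1, 0],
                 [0, 0, 1]]

# Every pair of columns shares exactly one row; the shortest cycle has length 6.
H_CYCLE6_ONLY = [[1, 1, 0],
                 [0, 1, 1],
                 [1, 0, 1]]

print(has_cycle4(H_WITH_CYCLE4), has_cycle4(H_CYCLE6_ONLY))
```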

3.2.2 Column Degree

An LDPC code with a higher column degree has better performance in the waterfall region; that is, it suppresses the error floor to a lower bit error rate region. Fig. 3.4 shows the performance of LDPC codes with different column degrees. S represents the scaling factor in this thesis.

In Fig. 3.4, (672, 588) is an LDPC code from the IEEE 802.15.3c Standard, with column degree 3. It performs poorly in the waterfall region because of its low column degree. The LDPC codes with column degree 8 and 12 perform better in the waterfall region.

Figure 3.4: Performance of LDPC code with different column degree. (Plot of BER versus Eb/No (dB); N=10^7, AWGN Channel, Iteration = 25, Normalized Min-Sum. Curves: (9409,8256), Column Deg=12, S=0.4, R=0.877; (9153,8256), Column Deg=8, S=0.5, R=0.9; (672,588), Column Deg=3, S=0.4, R=0.875.)

Fig. 3.5 shows that an LDPC code with a higher column degree has better performance in the waterfall region. Both the (2071,1746) and (2033,1714) LDPC codes are constructed by the permutation matrix algorithm [14], which is introduced in Section 3.3. LDPC codes constructed by the permutation matrix algorithm have no cycle-4. They are QC codes [15] and their column degree is 4. For the (2048,1723) LDPC code (IEEE 802.3an Standard [16]), the error floor does not appear until the BER is down to $10^{-10}$. Thus, high column degree LDPC code is

Figure 3.5. (Plot of BER versus Eb/No (dB); N=10^7, Iteration = 50, Normalized Min-Sum. Curves: (2071, 1746), Column Deg=3, S=0.75; (2033, 1714), Column Deg=3, S=0.75; (2048, 1723), Column Deg=6, S=0.75.)

In Fig. 3.6, the improvement in the waterfall region from a higher column degree is not clear. Since the codeword length is very long, the improvement is expected to appear at a deeper bit error rate. Software simulation is not fast enough to investigate the error floor, so FPGA simulation will be done in the future. An error correcting code applied to NAND flash memory systems requires a high code rate and no performance degradation down to a bit error rate near $10^{-12}$. Therefore, an LDPC code with a higher column degree and no cycle-4 is preferred. The proposed LDPC code in this thesis is (9153, 8256), with column degree 8 and no cycle-4.

Figure 3.6. (Plot of BER versus Eb/No (dB); 10^5 codewords, Iteration = 50, Normalized Min-Sum. Curves: (9160,8247), R=0.9, Column Deg=4, S=0.75; (9050,8149), R=0.9, Column Deg=5, S=0.75; (9153,8256), R=0.9, Column Deg=8, S=0.5.)

3.3 Code Construction

3.3.1 Permutation Matrix Algorithm

The permutation matrix algorithm [14] is a construction of QC LDPC codes. The parity check matrix H of a QC code is composed of many sub-matrices, each of which is an identity matrix or a cyclic shift of an identity matrix. An example of a QC code is shown in Fig. 3.7. The number inside a sub-matrix represents the amount of cyclic shift.

Cycle-4 causes performance degradation, and this code construction avoids any cycle-4. The construction algorithm is described in [14]. In this thesis, we provide another view of this algorithm. There are 3 parameters to be decided: row degree ($d_c$), variable node degree ($d_v$) and size of sub-matrix ($p$). The row degree determines the number of sub-matrices in one sub-matrix row, and the variable node degree determines the number of sub-matrices in one sub-matrix column.

$$H = \left[\begin{array}{cccc|cccc|cccc}
1&0&0&0&0&0&0&1&0&0&1&0\\
0&1&0&0&1&0&0&0&0&0&0&1\\
0&0&1&0&0&1&0&0&1&0&0&0\\
0&0&0&1&0&0&1&0&0&1&0&0\\ \hline
1&0&0&0&0&1&0&0&0&0&0&1\\
0&1&0&0&0&0&1&0&1&0&0&0\\
0&0&1&0&0&0&0&1&0&1&0&0\\
0&0&0&1&1&0&0&0&0&0&1&0
\end{array}\right],
\qquad
H = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 3 & 1 \end{bmatrix} \text{ (cyclic shift amounts)}$$

Figure 3.7: An example of QC LDPC code, dc = 3, dv = 2 and p = 4.

Another view of permutation matrix algorithm:

Let $S_{i,j}$ represent the amount of cyclic shift in the sub-matrix of the $i$th sub-matrix row and $j$th sub-matrix column. $d_c$ represents the row degree, $d_v$ represents the variable node degree, and $p$ represents the size of a sub-matrix and must be a prime number.

1. Initialization:

$S_{0,j} = j, \quad 0 \le j \le d_c - 1$

2. Completion of the remaining $S_{i,j}$:

$S_{i,j} = (j + (j + 1) \cdot i) \bmod p, \quad 0 \le j \le d_c - 1, \; 0 \le i \le d_v - 1$
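The two steps above can be sketched as a short Python routine that builds the shift table and expands it into a binary parity check matrix. The function names `shift_table` and `expand` are hypothetical, and the direction of the cyclic shift (row $r$ of a block with shift $s$ has its 1 in column $(r - s) \bmod p$) is an assumption chosen so that the shift table of Fig. 3.7 reproduces the binary matrix shown there.

```python
def shift_table(dc, dv, p):
    """S[i][j] = (j + (j + 1) * i) mod p; row 0 reduces to S[0][j] = j."""
    return [[(j + (j + 1) * i) % p for j in range(dc)] for i in range(dv)]


def expand(S, p):
    """Expand a table of cyclic shifts into a binary QC parity check matrix."""
    dv, dc = len(S), len(S[0])
    H = [[0] * (dc * p) for _ in range(dv * p)]
    for i in range(dv):
        for j in range(dc):
            for r in range(p):
                # assumed shift direction, matched against Fig. 3.7
                H[i * p + r][j * p + (r - S[i][j]) % p] = 1
    return H


# Small example with a prime sub-matrix size.
S_small = shift_table(dc=4, dv=3, p=5)
H_small = expand(S_small, 5)
print(S_small)
print(len(H_small), len(H_small[0]))

# Dimensions implied by the parameters of the proposed code (dc = 81, dv = 8, p = 113).
print(81 * 113, 8 * 113)
```

With $d_c = 81$, $d_v = 8$ and $p = 113$, the expanded matrix has 9153 columns and 904 rows, which agrees with the code length and with the number of check node units reported in Chapter 4.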

Fig. 3.8 demonstrates the condition under which cycle-4 occurs. For any 4 numbers forming a square (the red dashed box), if the difference between the cyclic shift amounts in one sub-matrix column is equal to the difference between the cyclic shift amounts in the other sub-matrix column, a cycle-4 is formed. For example, in Fig. 3.8, the difference between 1 and 2 is equal to the difference between 2 and 3.

$$H = \left[\begin{array}{cccc|cccc|cccc}
1&0&0&0&0&0&0&1&0&0&1&0\\
0&1&0&0&1&0&0&0&0&0&0&1\\
0&0&1&0&0&1&0&0&1&0&0&0\\
0&0&0&1&0&0&1&0&0&1&0&0\\ \hline
1&0&0&0&0&0&1&0&0&1&0&0\\
0&1&0&0&0&0&0&1&0&0&1&0\\
0&0&1&0&1&0&0&0&0&0&0&1\\
0&0&0&1&0&1&0&0&1&0&0&0
\end{array}\right],
\qquad
H = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 2 & 3 \end{bmatrix} \text{ (cyclic shift amounts)}$$

Figure 3.8: Demonstration of cycle-4.

The number in a small square represents the cyclic shift amount of that sub-matrix. There are $n$ sub-matrices in one sub-matrix row and $m$ sub-matrices in one sub-matrix column. $p$ is the size of a sub-matrix.

Figure 3.9: Parity check matrix H.

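$$C = [A + (x_1 + 1)m'] \bmod p, \qquad D = [B + (x_2 + 1)m'] \bmod p \tag{3.7}$$

$$C - A = [(x_1 + 1)m'] \bmod p, \qquad D - B = [(x_2 + 1)m'] \bmod p \tag{3.8}$$

where $m' = m_2 - m_1$; $0 \le x_1 < x_2 \le n - 1$; $0 \le m_1 < m_2 \le m - 1$; $n, m \le p$.

The two differences are equal only if $(x_2 - x_1)\,m' \equiv 0 \pmod p$. Since $p$ is a prime number, $x_1 < x_2 \le n - 1$ and $n, m \le p$, both $x_2 - x_1$ and $m'$ lie strictly between 0 and $p$, so their product is not a multiple of $p$ and $(C - A)$ will never be equal to $(D - B)$. Therefore, no cycle-4 is formed.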
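The argument can also be checked numerically by testing the difference condition on every pair of sub-matrix rows and columns of the shift table. The sketch below does this for the parameters of the proposed code and for a non-prime sub-matrix size, which shows why primality matters. The names `shift_table` and `cycle4_squares` are hypothetical helpers written for this illustration.

```python
from itertools import combinations


def shift_table(dc, dv, p):
    """S[i][j] = (j + (j + 1) * i) mod p, as in the construction above."""
    return [[(j + (j + 1) * i) % p for j in range(dc)] for i in range(dv)]


def cycle4_squares(S, p):
    """List every (i1, i2, j1, j2) whose shift differences coincide modulo p."""
    bad = []
    for i1, i2 in combinations(range(len(S)), 2):
        for j1, j2 in combinations(range(len(S[0])), 2):
            if (S[i2][j1] - S[i1][j1]) % p == (S[i2][j2] - S[i1][j2]) % p:
                bad.append((i1, i2, j1, j2))
    return bad


# Parameters of the proposed code: prime p, so no square violates the condition.
print(cycle4_squares(shift_table(81, 8, 113), 113))

# Non-prime p = 4 with three sub-matrix rows: here (x2 - x1) * m' = 2 * 2 = 4.
print(cycle4_squares(shift_table(3, 3, 4), 4))
```

The second call reports one offending square, formed by sub-matrix rows 0 and 2 and sub-matrix columns 0 and 2, where the product $(x_2 - x_1)\,m'$ is a multiple of the non-prime size 4.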

3.3.2 Code Performance

The proposed LDPC code in this thesis is (9153, 8256) with code rate 0.9. The column degree is 8, the size of a sub-matrix is 113, and the decoding algorithm is Normalized Min-Sum. S represents the scaling factor, and the number of iterations is 40. Fig. 3.10 shows its performance.

Figure 3.10: Performance of (9153, 8256) LDPC code. (Plot of BER versus Eb/No (dB); N=10^7, R=0.9, S=0.5, Iteration=40. Curve: (9153,8256), AWGN Channel, Floating.)

Chapter 4 LDPC Decoder Architecture

4.1 Single Pipelined Architecture for VSS Algorithm

The variable-node-centric sequential scheduling (VSS) algorithm [10] was introduced in the previous chapter. The hardware architecture is explained in this section.

The entire decoder depicted in Fig. 4.1(a) is composed of fully parallel CNUs and partially parallel VNUs. Variable nodes are divided into 27 groups ($G = 27$). There are 904 Check Node Units (CNU) and 339 Variable Node Units (VNU). Let $\alpha^i_{g,m}$ denote the sorted messages (1st min, 2nd min and indices) from the variable nodes in the $g$th group to the $m$th check node at the $i$th iteration, that is:

$$\alpha^i_{g,m} = \min_{\substack{n' \in N(m)\setminus n \\ g\cdot N_G \le n' \le (g+1)\cdot N_G - 1}} |z^i_{mn'}| \tag{4.1}$$

Then the magnitude part of the check node to variable node message in equation 3.4, up to the normalization factor $\beta$, can be computed by the following equation:

$$|\epsilon^i_{mn}| = \min\big\{\alpha^i_{j,m}\ (j < g),\ \alpha^i_{g,m},\ \alpha^{i-1}_{k,m}\ (k > g)\big\} \tag{4.2}$$

Fig. 4.1(b) shows the timing diagram of the proposed decoder. $G$ initialization cycles are required to calculate $\alpha^0_{g,m}$ for $0 \le g \le G - 1$. Since only one subgroup of the messages $z^i_{mn}$ is updated in each cycle of an iteration, the main operation of the CNU can be simplified to calculating $\alpha^i_{g,m}$ (local sorting) in each cycle and then performing global sorting as in equation 4.2. In the single pipelined architecture, only messages $\alpha^i$ …

while the variable node to check node message $z^i_{mn}$ is calculated on the fly. The CNU can be updated immediately after the VNU's operations in the VSS approach, and no variable node to check node messages need to be stored.
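As a rough consistency check of this schedule, the decoder throughput can be estimated from the cycle count. The sketch below assumes one clock cycle per group, $G$ initialization cycles followed by $G$ cycles per iteration, and throughput counted over the 8256 information bits. These accounting choices and the function name `throughput_bps` are assumptions made for illustration, not statements from the thesis, although they give a value close to the 2.78 Gbps quoted in Chapter 1.

```python
def throughput_bps(info_bits, groups, iterations, f_clk_hz):
    """Estimated throughput, assuming one cycle per group plus G initialization cycles."""
    cycles_per_codeword = groups * (iterations + 1)
    return info_bits * f_clk_hz / cycles_per_codeword


estimate = throughput_bps(info_bits=8256, groups=27, iterations=10, f_clk_hz=100e6)
print(round(estimate / 1e9, 2), "Gbps")
```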

Figure 4.1: Architecture and scheduling for VSS algorithm. (Panels: (a) Single pipelined architecture for VSS algorithm, showing 1st min and 2nd min storage and routing networks; (b) Timing Schedule, showing the initialization cycles for groups 1 to G followed by iteration 1, with alternating C and V operations per clock cycle.)

4.2 Check Node Unit (CNU)

This section presents the detailed CNU architecture based on VSS scheduling. The CNU architecture is further optimized to reduce the storage requirement and the number of sorters. Different CNU architectures affect the convergence speed and the performance, which are discussed in the next chapter. The messages sent from the VNU are converted from two's complement format to sign-magnitude format for efficient computation in the CNU. The check node to variable node update can therefore be divided into a magnitude part and a sign part. For the proposed LDPC code with row degree 81 and the VSS approach with G = 27, the number of messages that need to be computed in each CNU for each group is 3.

4.2.1 Accumulative Sorter

Fig. 4.2 illustrates the magnitude part of the CNU, which is an accumulative sorter composed of a local sorter and a global sorter. The local sorter finds the local 1st min and 2nd min values in each subgroup, and the global sorter finds the global 1st min and 2nd min values of a row. $G - 1$ registers are required to store the local 1st min values of the different groups, and the same number is required for the local 2nd min values. The global sorter has 27 × 2 = 54 inputs in total. The number of registers increases as $G$ becomes larger, which increases the number of inputs to the global sorter and the critical path.

Figure 4.2: Conventional accumulative sorter. (Labels: local sorter producing 1st min and 2nd min; G-1 registers holding the local 1st min of different groups; Global Sorter producing Global 1st min and Global 2nd min.)
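The reason the sorter only needs the 1st min, the 2nd min and the position of the 1st min is that every outgoing magnitude of a min-sum check node is one of these two values. The sketch below illustrates this for a single check node in Python. The function name `cnu_min_sum` and the list-based interface are assumptions for illustration, and the sketch models the arithmetic only, not the accumulative hardware structure.

```python
def cnu_min_sum(z_in, beta=0.5):
    """Normalized min-sum outputs of one check node from its incoming messages.

    The edge that holds the 1st min receives the 2nd min; every other edge
    receives the 1st min. Requires at least two incoming messages.
    """
    mags = [abs(v) for v in z_in]
    idx1 = min(range(len(mags)), key=mags.__getitem__)        # position of 1st min
    min1 = mags[idx1]
    min2 = min(v for k, v in enumerate(mags) if k != idx1)    # 2nd min
    total_sign = 1
    for v in z_in:
        if v < 0:
            total_sign = -total_sign
    out = []
    for k, v in enumerate(z_in):
        own_sign = -1 if v < 0 else 1
        mag = min2 if k == idx1 else min1
        # multiplying by own_sign removes this edge's sign from the product
        out.append(total_sign * own_sign * beta * mag)
    return out


print(cnu_min_sum([0.5, -1.0, 2.0]))
```

Dropping the stored 2nd min, as discussed in the next subsection, saves registers at the cost of approximating the message sent on the edge that holds the 1st min.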

Fig. 4.3 demonstrates the conventional accumulative sorter with number of groups (G) = 3. R1st min and R2nd min represent the local 1st min and 2nd min of each group respectively. The values in the registers are reset to infinity before initialization. Since G = 3, there are three variable nodes in each group, and they provide new values to the sorter every cycle. The local 1st min and 2nd min are obtained and stored in the registers, and the values in each register are shifted to the right. The global sorter chooses the global 1st min and 2nd min from these 7 values (3 new inputs, plus the local 1st min and 2nd min from 2 local groups). The red number represents the global 1st min in that cycle.

                 Group 1          Group 2          Group 3          Group 1
R1st min         ∞, ∞             0.1, ∞           0.4, 0.1         0.7, 0.4
R2nd min         ∞, ∞             0.2, ∞           0.5, 0.2         0.8, 0.5
Inputs           0.1, 0.2, 0.3    0.4, 0.5, 0.6    0.7, 0.8, 0.9    0.5, 0.6, 0.7

Figure 4.3: Demonstration of conventional accumulative sorter.

4.2.2 Accumulative Sorter without 2nd minimum value

To reduce storage memory, the local 2nd min values and the global 2nd min value are not stored. The local 1st min value is the minimum value from $G - 1$ groups, and the global 2nd min value is taken directly from the local 1st min value. This may cause some performance loss.

When the local 1st min value is smaller than the global 1st min value, the global 1st min value is replaced by the local 1st min value. The value stored in the local 1st min register is then set to a maximum value, and the local sorter starts to find the new local 1st min value.

Conventionally, when the group currently being updated is the same as the group that the global 1st min value comes from, the global 2nd min value should be sent to the variable nodes. Since

There are some methods for compensating the global 2nd min, such as multiplying the original global 1st min by a scalar or adding a scalar to it. But these methods provide only limited improvement. Since the local 1st min value from $G - 1$ groups contains updated information, taking the local 1st min value as the global 2nd min value provides a better improvement.

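Figure 4.4: Accumulative sorter w/o 2nd min. (Diagram: a local sorter with register Rlocal for the local 1st min, followed by a global sorter with register Rglobal for the global 1st min.)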

A demonstration is provided in Fig 4.4. We assume row degree = 9 and number of groups (G) = 3. Rlocal represents the local 1st min and Rglobal represents the global 1st min. The number of registers is independent of G. The number of inputs to the local sorter is N/G + 1, and the number of inputs to the global sorter is 2. The new global 1st min comes from the three new inputs, the local 1st min and the previous global 1st min. The red number in the figure represents the global 1st min in that cycle. After the initialization, the global 1st min stored in the register comes from group 1. At the 4th cycle (group 1 update of the 1st iteration), there are new values from group 1, so the global 1st min in the register should be cleared. Therefore, the global 1st min is replaced by the local 1st min.

Update group   Group 1        Group 2        Group 3        Group 1
Rlocal         ∞              ∞              0.4            ∞
Rglobal        ∞              0.1            0.1            0.4
Inputs         0.1 0.2 0.3    0.4 0.5 0.6    0.7 0.8 0.9    0.5 0.6 0.7
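The two-register behaviour described above can be sketched in a few lines. This is an illustrative model only, not the thesis RTL: the function name `sorter_without_second_min` is hypothetical, and the bookkeeping of which group produced each stored value (`local_group`, `global_group`) is an assumption made so that the "clear the global 1st min when its own group updates again" rule can be expressed. Handling of a stale local value is not modelled.

```python
import math


def sorter_without_second_min(group_inputs, num_groups):
    """Return (Rlocal, Rglobal) as seen at the start of every cycle."""
    r_local, local_group = math.inf, None    # local 1st min and its origin
    r_global, global_group = math.inf, None  # global 1st min and its origin
    trace = []
    for cycle, inputs in enumerate(group_inputs):
        group = cycle % num_groups
        if group == global_group:
            # The stored global 1st min came from the group being updated,
            # so it is cleared and replaced by the local 1st min.
            r_global, global_group = r_local, local_group
            r_local, local_group = math.inf, None
        trace.append((r_local, r_global))

        # Local sorter: N/G new inputs plus the local register.
        new_min = min(inputs)
        if new_min < r_local:
            local_min, source = new_min, group
        else:
            local_min, source = r_local, local_group

        # Global sorter: two inputs, the local sorter output and Rglobal.
        if local_min < r_global:
            r_global, global_group = local_min, source
            r_local, local_group = math.inf, None  # reset to maximum value
        else:
            r_local, local_group = local_min, source
    return trace


if __name__ == "__main__":
    demo = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [0.5, 0.6, 0.7]]
    print(sorter_without_second_min(demo, num_groups=3))
```

For the four cycles of the demonstration, the sketch yields the same Rlocal and Rglobal values as the table above.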


Figure 4.6: Performance of (9153, 8256) LDPC code with different global 2nd min compensation. MS: MinSum; MS-VSS: MinSum with variable-node-centric sequential scheduling. (Plot: BER versus iteration; N = 1000627200, SNR = 5.1 dB, (9153,8256), S = 0.5, Q = (4,2). Curves: MS; MS-VSS with 2nd min; MS-VSS, global 2nd min = local 1st min + 0.25; MS-VSS, global 2nd min = local 1st min + 0.5; MS-VSS, global 2nd min = local 1st min.)

Figure 4.6 shows the performance of the (9153, 8256) LDPC code with different global 2nd min compensation. The MinSum algorithm with VSS converges faster than the MinSum algorithm. If the global 2nd min is not stored, there is some performance degradation and the convergence speed decreases, but the reduced-storage version is preferred for the FPGA simulation. Compensation on the global 2nd min (local 1st min) does not provide any improvement. Thus, no compensation on the global 2nd min is preferred. BER decreases slowly after the 10th iteration due to the absence of the original global 2nd min value.


4.3 Variable Node Unit (VNU)

Fig. 4.7 shows the architecture of a VNU. "SM to TC" represents sign-magnitude to two's-complement conversion, and "TC to SM" represents two's-complement to sign-magnitude conversion. The registers correspond to the channel values of the different groups. Since G = 27, there are 27 2-bit registers to store channel values in one VNU. The bit width of the messages passed between CNU and VNU is 4. The variable node degree is 8, so the adder has 9 inputs. The 2-bit channel value is mapped to a 4-bit value by non-linear quantization. More details of the non-linear quantization are discussed in the next chapter.

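Figure 4.7: Architecture of a VNU. (Diagram: the 4-bit incoming messages pass through SM to TC conversion and are added to the channel value selected from the registers R; the outputs are clipped and converted back by TC to SM into 4-bit messages, and the MSB of the sum gives the decoded bit.)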
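The VNU data path described above can be sketched in software. This is a minimal sketch under stated assumptions, not the thesis hardware: the names `clip` and `vnu_update` are hypothetical, messages are modelled as signed integers, the clipping range of ±7 assumes a 4-bit sign-magnitude format, and each outgoing message is assumed to exclude the contribution of the check node it is sent to, as in standard message passing.

```python
def clip(value, limit=7):
    """Saturate a value to the assumed 4-bit sign-magnitude range."""
    return max(-limit, min(limit, value))


def vnu_update(channel, incoming):
    """Variable node update for one variable node.

    channel  -- quantized channel value (integer)
    incoming -- check-to-variable messages, one per connected check node
    Returns the variable-to-check messages and the decoded bit.
    """
    # Adder with (variable node degree + 1) inputs: 8 messages + channel.
    total = channel + sum(incoming)
    # Message to each check node excludes that check node's own message.
    outgoing = [clip(total - message) for message in incoming]
    # Data '0' is mapped to a positive value, so a negative sum decodes to 1.
    decoded_bit = 1 if total < 0 else 0
    return outgoing, decoded_bit


if __name__ == "__main__":
    messages = [1, -3, 4, 0, 2, -1, 3, 1]  # variable node degree 8
    print(vnu_update(2, messages))
```

The decoded bit corresponds to the sign (MSB) of the full sum, while the clipping step keeps every outgoing message within the 4-bit message width.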


4.4 Shifting Network

The high complexity of the routing network between Check Node Units (CNU) and Variable Node Units (VNU) is the main difficulty in the hardware implementation of LDPC codes. Shifting networks [17] [18] [19] [20] have been proposed to reduce the routing complexity. There are two routing networks between CNU and VNU: one in the direction from CNUs to VNUs, and the other in the direction from VNUs to CNUs.

The shifting network of an LDPC code constructed by the permutation matrix algorithm can be simplified. The wire connection from CNUs to VNUs is fixed and no shifting network is needed. Instead, the messages of each CNU are shifted between CNUs. The idea is explained in Fig 4.8.

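\[
H = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0
\end{bmatrix}
\]

(a) Parity Check Matrix of a LDPC code, with variables divided into 6 groups

CNU arrangement around each variable group in Fig 4.8:

C1 C2 C3   V1  V2    C4 C5 C6
C2 C3 C4   V3  V4    C5 C6 C1
C3 C4 C5   V5  V6    C6 C1 C2
C4 C5 C6   V7  V8    C1 C2 C3
C5 C6 C1   V9  V10   C2 C3 C4
C6 C1 C2   V11 V12   C3 C4 C5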
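The property that makes the fixed wiring possible can be checked numerically on the small matrix of Fig 4.8(a): each variable group's columns are the previous group's columns cyclically shifted by one check node. The sketch below is only an illustration; the helper names `column` and `cyclic_shift_down` are hypothetical, and the row-by-row reading of the 6 × 12 matrix is an assumption about the figure layout.

```python
# Parity check matrix of Fig 4.8(a): 6 check nodes, 12 variable nodes,
# variables divided into 6 groups of 2 columns each.
H = [
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0],
]
GROUP_SIZE = 2  # variable nodes per group


def column(matrix, index):
    """Extract one column of the matrix as a list."""
    return [row[index] for row in matrix]


def cyclic_shift_down(values, amount):
    """Cyclically shift a column downwards by `amount` positions."""
    amount %= len(values)
    return values[-amount:] + values[:-amount] if amount else list(values)


def groups_are_shifted_copies(matrix, group_size):
    """True if every group equals the previous group shifted down by one."""
    num_columns = len(matrix[0])
    return all(
        column(matrix, j + group_size) == cyclic_shift_down(column(matrix, j), 1)
        for j in range(num_columns - group_size)
    )


if __name__ == "__main__":
    print(groups_are_shifted_copies(H, GROUP_SIZE))
```

Because the shift between consecutive groups is constant, rotating the CNU messages by one position per cycle has the same effect as re-routing the VNU outputs, which is why the wiring itself can stay fixed.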


Figure 4.9: Parity Check Matrix of (9153,8256) LDPC code.

Fig. 4.9 shows the cyclic shift amounts of some sub-matrices in the parity check matrix of the (9153,8256) LDPC code. Since G = 27, 3 sub-matrices are processed in each decoding cycle. The difference between the cyclic shift amounts of consecutive groups is a constant. Thus, messages are shifted between CNUs after each decoding cycle and the routing network can be eliminated.


4.5 Comparison with Conventional Architectures

For the accumulative sorter in Fig. 4.2, a larger subgroup number G results in fewer inputs to the local sorter but more inputs to the global sorter, and the amount of storage for the 1st min, 2nd min, and index values increases. In addition, the critical path is shorter when G is larger because the sorter is smaller. In the traditional two-stage pipelined architecture, both check node to variable node messages and variable node to check node messages are kept in registers or memory. Assume the bit width w of messages is 4 and the variable node degree is dv; then the required memory size (or number of registers) is as follows:

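Conventional Min-Sum two-stage pipelined architecture:

\[
\begin{aligned}
\mathrm{Reg}_{VNU} + \mathrm{Reg}_{CNU}
&= N \cdot d_v \cdot w + m \cdot (\text{1st min} + \text{2nd min} + \text{Index} + \text{Sign}) \\
&= 9153 \cdot 8 \cdot 4 + 904 \cdot (3 + 3 + 7 + 81) \\
&= 377872 \text{ bits}
\end{aligned}
\tag{4.3}
\]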

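VSS architecture with conventional accumulative sorter (Fig. 4.2):

\[
\begin{aligned}
\mathrm{Reg}_{VNU} + \mathrm{Reg}_{CNU}
&= 0 + m \cdot (\text{local 1st min} + \text{local 2nd min} + \text{global 1st min} + \text{global 2nd min} + \text{Index} + \text{Sign}) \\
&= 904 \cdot (3 \cdot 26 + 3 \cdot 26 + 3 + 3 + 7 + 81) \\
&= 226000 \text{ bits}
\end{aligned}
\tag{4.4}
\]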


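Proposed VSS architecture with No 2nd min accumulative sorter (Fig. 4.4):

\[
\begin{aligned}
\mathrm{Reg}_{VNU} + \mathrm{Reg}_{CNU}
&= 0 + m \cdot (\text{local 1st min} + \text{global 1st min} + \text{Index} + \text{Sign}) \\
&= 904 \cdot (3 + 3 + 5 \cdot 2 + 81) \\
&= 87688 \text{ bits}
\end{aligned}
\tag{4.5}
\]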

Compared to the conventional Min-Sum two-stage pipelined architecture, the proposed architecture reduces the number of register bits by 76.8%. Compared to the VSS architecture with the conventional accumulative sorter, the proposed architecture reduces the number of register bits by 61.2%, at the cost of some performance loss. Since G = 27, the reduction of the combinational circuit of the VNU is approximately 96%.
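The register counts and the quoted reductions can be re-derived with a few lines of arithmetic. The sketch below simply restates the three formulas of this section; the constant and function names are hypothetical, and the bit widths (3-bit magnitudes, 7-bit or 5-bit indices, one sign bit per edge of a degree-81 check node) are read off the numeric terms of the equations.

```python
N = 9153          # code length (variable nodes)
M = 904           # number of check nodes
DV = 8            # variable node degree
W = 4             # message bit width
ROW_DEGREE = 81   # one sign bit per edge of a check node
G = 27            # number of variable node groups
MAG_BITS = 3      # magnitude bits of a stored minimum


def conventional_min_sum_bits():
    """Two-stage pipelined Min-Sum: VNU message storage plus CNU state."""
    return N * DV * W + M * (MAG_BITS + MAG_BITS + 7 + ROW_DEGREE)


def vss_conventional_sorter_bits():
    """VSS with the conventional accumulative sorter (G - 1 local pairs)."""
    local = MAG_BITS * (G - 1) + MAG_BITS * (G - 1)
    return M * (local + MAG_BITS + MAG_BITS + 7 + ROW_DEGREE)


def vss_no_second_min_bits():
    """Proposed VSS sorter: local and global 1st min plus two 5-bit indices."""
    return M * (MAG_BITS + MAG_BITS + 5 * 2 + ROW_DEGREE)


def reduction_percent(new_bits, old_bits):
    """Relative saving of new_bits with respect to old_bits, in percent."""
    return 100.0 * (1.0 - new_bits / old_bits)


if __name__ == "__main__":
    proposed = vss_no_second_min_bits()
    print(conventional_min_sum_bits(), vss_conventional_sorter_bits(), proposed)
    print(round(reduction_percent(proposed, conventional_min_sum_bits()), 1))
    print(round(reduction_percent(proposed, vss_conventional_sorter_bits()), 1))
```

The three totals evaluate to 377872, 226000 and 87688 bits, which gives the 76.8% and 61.2% reductions stated above.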


Chapter 5 Simulation and Implementation Result

5.1 Quantization

Belief Propagation (BP) is a probability-based message passing algorithm. When soft input is available, an LDPC code can provide powerful correcting ability, and an LDPC code with 2-bit soft input can outperform a BCH code at the same code rate. An Additive White Gaussian Noise (AWGN) channel with Binary Phase Shift Keying (BPSK) modulation is used for demonstration and simulation. We assume that data '0' is mapped to '1' and data '1' is mapped to '-1'. A 2-bit quantization represents 4 levels. A bit with a channel value near 0 has a high probability of being an error bit. Therefore, a non-linear quantization is preferred. We use a threshold f to divide the channel value into 4 levels.

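Figure 5.1: Non-linear 2-bit quantization. (Diagram: thresholds at −f, 0 and f divide the channel value axis, with the transmitted values −1 and 1 marked, into four levels mapped to −Vmax, −Vmin, Vmin and Vmax.)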
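The 4-level mapping with threshold f can be written as a small function. This is a hedged sketch rather than the thesis code: the name `quantize_2bit` is hypothetical, the default parameters are the values reported as best in this chapter (f = 0.35, Vmin = 0.5, Vmax = 1.75), and the treatment of values exactly on a threshold is an assumption.

```python
def quantize_2bit(channel_value, f=0.35, v_min=0.5, v_max=1.75):
    """Map a received channel value to one of four reliability levels.

    Values with magnitude below the threshold f are unreliable and get the
    small magnitude v_min; the rest get the large magnitude v_max.
    """
    if channel_value >= f:
        return v_max
    if channel_value >= 0:
        return v_min
    if channel_value >= -f:
        return -v_min
    return -v_max


if __name__ == "__main__":
    for y in (1.2, 0.1, -0.2, -0.9):
        print(y, quantize_2bit(y))
```

With Q(4,2) messages, the levels 0.5 and 1.75 are both representable in steps of 0.25, so the 2-bit input expands to a 4-bit message without further rounding.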


Fig. 5.2 shows the performance of LDPC code with different parameters f, Vmin and Vmax.

The input LLR after non-linear quantization and the messages passed between CNUs and VNUs in the decoder are both 4 bits wide. The decoding algorithm is the Normalized Min-Sum algorithm with a scaling factor of 0.5.
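For reference, a check node update of the Normalized Min-Sum algorithm with scaling factor 0.5 can be sketched as below. This is a generic floating-point illustration of the algorithm, not the fixed-point CNU of the decoder; the function name `normalized_min_sum_cnu` is hypothetical. It also shows why a 2nd min is conventionally needed: the edge that supplied the 1st min must receive the 2nd min instead.

```python
def normalized_min_sum_cnu(messages, scale=0.5):
    """Check node update: one outgoing message per incoming message.

    Each output carries the product of the signs of all *other* inputs and
    the scaled minimum magnitude of all *other* inputs.
    """
    magnitudes = [abs(m) for m in messages]
    min_index = min(range(len(magnitudes)), key=magnitudes.__getitem__)
    first_min = magnitudes[min_index]
    second_min = min(m for i, m in enumerate(magnitudes) if i != min_index)

    # Product of all signs; dividing out one sign equals multiplying by it.
    sign_product = 1
    for m in messages:
        if m < 0:
            sign_product = -sign_product

    outgoing = []
    for i, m in enumerate(messages):
        magnitude = second_min if i == min_index else first_min
        own_sign = -1 if m < 0 else 1
        outgoing.append(scale * sign_product * own_sign * magnitude)
    return outgoing


if __name__ == "__main__":
    print(normalized_min_sum_cnu([1.0, -0.5, 2.0]))
```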

Figure 5.2: Performance of (9153, 8256) (Column deg = 8) LDPC code with different parameters. (Plot: BER versus Eb/No (dB); N = 10^7, Iteration = 40. Curves: f = 0.35 and f = 0.50, each with Vmin = 0.50, 0.75 and 1.00, all with Vmax = 1.75.)

The parameters f = 0.35, Vmin = 0.5 and Vmax = 1.75 provide the best performance.

In Fig. 5.3, the performance loss between floating-point input and 2-bit non-linear input quantization is 0.3 dB. The 2-bit non-linear input quantization provides better performance than the 4-bit linear input quantization. Since more bits of input information require more READ operations on a NAND flash cell, the latency of reading data increases with the number of input bits. Therefore, the 2-bit non-linear input quantization is chosen.


Figure 5.3: Performance of LDPC code with different input quantization. (Plot: BER versus Eb/No (dB); N = 10^7, R = 0.9, Iteration = 40, (9153,8256). Curves: Soft Input, Floating; 2 bits non-linear Input, Q(4,2); 4 bits linear Input, Q(4,2); 4 bits linear Input, Q(4,1); 5 bits linear Input, Q(5,1); Hard Input, Q(4,2).)

5.2 Performance

In Fig. 5.4, the 2-bit non-linear soft input LDPC code has a 0.7 dB coding gain over the BCH code at BER = $10^{-4}$. The 2-bit non-linear soft input LDPC code therefore has great potential to replace the BCH code in NAND flash memory systems. The simulation parameters of the LDPC code are 4-bit quantization (2-bit integer and 2-bit fraction) with scaling factor 0.5. The bit width of the messages passed between CNU and VNU is 4.

Not storing the global 2nd min value introduces a 0.1 dB performance loss. But the Variable-node-centric Sequential Scheduling (VSS) architecture with no 2nd min value reduces the number of registers, as shown in Section 4.5.


Figure 5.4: Performance comparison, Iteration = 40. (Plot: BER versus Eb/No (dB); N = 10^7, S = 0.5, Iteration = 40, R = 0.9. Curves: (9153,8256), Soft Input, NMS, Floating; (9153,8256), 2 bits Soft Input, VSS w/o 2nd min, Q(4,2); (9153,8256), 2 bits Soft Input, VSS w 2nd min, Q(4,2); (9153,8256), 2 bits Soft Input, NMS, Q(4,2); (9153,8256), Hard Input, NMS, Q(4,2); (9032,8192), BCH code, t = 60.)

5.3 Throughput

The gate count and critical path of the CNU and VNU after synthesis are listed in Table 5.1. The critical path of CNU + VNU is 5 ns. We assume that the critical path of the control circuit is 2 ns, so the estimated clock period is 7 ns. The LDPC decoder can therefore operate at a frequency of 125 MHz, whose 8 ns period leaves some margin over the 7 ns estimate.

Table 5.1: Synthesis result of CNU and VNU with technology UMC90.

              CNU (sign bit register is not included)   VNU
Gate count    225                                       620


The number of iterations is 10 and the clock frequency after Place and Route is 100 MHz.

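\[
\begin{aligned}
\text{Throughput}
&= \frac{\text{Information length}}{\text{Cycles per iteration} \cdot (\text{Number of iterations} + 1) \cdot \text{Cycle length}} \\
&= \frac{8256}{27 \cdot (10 + 1) \cdot 10\,\text{ns}} \approx 2.78\ \text{Gbps}
\end{aligned}
\]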
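The throughput figure follows directly from the formula above and can be re-checked, or re-evaluated for other iteration counts, with a short helper. The function name `throughput_bps` and its parameter names are hypothetical; the numbers are those used in the formula.

```python
def throughput_bps(info_bits, cycles_per_iteration, iterations, clock_hz):
    """Decoded information bits per second for one codeword at a time.

    The total cycle count is cycles_per_iteration * (iterations + 1),
    following the throughput formula of this section.
    """
    total_cycles = cycles_per_iteration * (iterations + 1)
    return info_bits * clock_hz / total_cycles


if __name__ == "__main__":
    rate = throughput_bps(info_bits=8256, cycles_per_iteration=27,
                          iterations=10, clock_hz=100e6)
    print(f"{rate / 1e9:.2f} Gbps")
```

With G = 27 cycles per iteration, 10 iterations and a 100 MHz clock, a codeword takes 297 cycles (2.97 µs), which gives about 2.78 Gbps.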

5.4 Implementation Results

Table 5.2: Summary of implementation result (Place and Route).

                     Proposed LDPC Decoder
Technology           UMC 90nm 1P9M
Code Spec            (9153,8256)
Code Rate            0.9
Row Degree           81
Column Degree        8
Algorithm            Variable-node-centric Sequential Scheduling
Area                 4.82 mm^2 (No IO Pad)
Gate Count           1100k
Iteration            10
Input Quantization   2 bits
Clock Frequency      100 MHz
Maximum Throughput   2.78 Gbits/s
Power                437 mW

Table 5.2 shows the post-layout result. The gate count after synthesis is 1100k and the core area is 4.82 mm^2 without IO pads. Using 90nm CMOS technology, the maximum throughput reaches 2.78 Gbps at an operating frequency of 100 MHz with 10 iterations. The power consumption is 437 mW.


Chapter 6 Conclusion and Future Work

6.1 Conclusion

This thesis proposes a (9153, 8256) LDPC code with code rate 0.9 for NAND flash memory system. (9153,8256) LDPC code is constructed by permutation matrix algorithm, with column degree 8. Simulations show that LDPC code with 2-bit soft input can outperform BCH code under same code rate. Therefore, LDPC code is a good candidate to replace BCH code in the next generation standard.

A high code rate LDPC code has a high row degree. This makes implementation difficult because of the large number of inputs to the sorter, and the routing complexity also increases. Variable-node-centric sequential scheduling (VSS) is a good solution to this problem. Variable nodes are divided into G groups, and the check node update is processed in G cycles, which reduces the number of inputs to the sorter. The CNU is further modified to reduce the hardware cost. Compared to the conventional Min-Sum two-stage pipelined architecture, the proposed design saves approximately 96% of the combinational circuits of the VNU and reduces the number of register bits by 76.8%. Using 90nm CMOS technology, the maximum throughput reaches 2.78 Gbps at an operating frequency of 100 MHz with 10 iterations.


6.2 Future Work

A flash memory system requires a Bit Error Rate (BER) down to $10^{-12}$, and this thesis proposes a high column degree LDPC code in order to suppress the error floor. Simulating BER down to $10^{-12}$ takes years on a computer. Therefore, we will run simulations on an FPGA to investigate the performance of the LDPC code down to $10^{-12}$ in the future.

There is no standard flash memory channel model for simulation. A standard flash memory channel is therefore desired in order to compare the performance of different error correcting codes on flash memory. This is a new challenge, and more details about flash memory will be studied.


References

[1] D. M. Greg Atwood, Al Fazio and B. Reaves, “Intel StrataFlash™ Memory Technology Overview,” Intel Technology Journal, pp. 1–8, 4th Quarter 1997.

[2] R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error-correcting binary group codes,” Inform. and Contr., vol. 3, pp. 68–79, March 1960.

[3] A. Hocquenghem, “Codes correcteurs d’erreurs,” Chiffres, vol. 2, pp. 117–156, September 1959.

[4] W. J. Reid III, L. L. Joiner, and J. J. Komo, “Soft Decision Decoding of BCH Codes Using Error Magnitudes,” IEEE Int. Symp. on Info. Theory, p. 303, June 1997.

[5] Y. M. Lin, C. L. Chen, H. C. Chang, and C. Y. Lee, “A 26.9K 314.5Mbps Soft (32400, 32208) BCH Decoder Chip for DVB-S2 System,” in IEEE Asian Solid-State Circuits Conference, Nov. 2009, pp. 373–376.

[6] R. G. Gallager, “Low-Density Parity-Check Codes,” Cambridge, MA: MIT Press, 1963.

[7] D. MacKay and R. Neal, “Near Shannon limit performance of low density parity check codes,” Electron. Lett., vol. 33, no. 6, pp. 457–458, March 1997.

[8] X.-Y. Hu, E. Eleftheriou, and D.-M. Arnold, “Progressive edge-growth Tanner graphs,” in Proc. IEEE Global Telecommunications Conf. (GLOBECOM), San Antonio, TX, Nov. 2001, pp. 995–1001.

[9] J. Zhang and M. Fossorier, “Shuffled iterative decoding,” IEEE Transactions on Communications, vol. 53, no. 2, pp. 209–213, Feb. 2005.

[10] C.-L. Chen, K.-S. Lin, H.-C. Chang, W.-C. Fang, and C.-Y. Lee, “A 11.5-Gbps LDPC Decoder Based on CP-PEG Code Construction,” in ESSCIRC, 2009, pp. 412–415.

[11] J. Sha, Z. Wang, M. Gao, and L. Lio, “Multi-Gb/s LDPC Code Design and Implementation,” IEEE Transactions on VLSI Systems, vol. 17, no. 2, pp. 262–268, Feb. 2009.


[14] H. Song and B. V. K. Vijaya Kumar, “Low-density parity check codes for partial response channels,” IEEE Signal Processing Magazine, pp. 56–66, Jan. 2004.

[15] M. Fossorier, “Quasicyclic low-density parity-check codes from circulant permutation matrices,” IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1785–1793, Aug. 2004.

[16] IEEE Std. 802.3an, Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications Std., 2006.

[17] D. Oh and K. Parhi, “Area Efficient Controller Design of Barrel Shifters for Reconfigurable LDPC Decoders,” IEEE International Symposium on Circuits and Systems, pp. 240–243, May 2008.

[18] C.-H. Liu, C.-C. Lin, H.-C. Chang, and Y. C.-Y. Lee, “Multi-Mode Message Passing Switch Networks Applied for QC-LDPC Decoder,” IEEE International Symposium on Circuits and Systems, vol. 18, no. 1, pp. 85–94, Jan. 2010.

[19] D. Oh and K. Parhi, “Low-Complexity Switch Network for Reconfigurable LDPC Decoders,” IEEE Transactions on Very Large Scale Integration Systems, pp. 752–755, May 2008.

[20] J. Lin, Z. Wang, L. Li, J. Sha, and M. Gao, “Efficient Shuffle Network Architecture and Application for WiMAX LDPC Decoders,” IEEE Transactions on Circuits and Systems, vol. 56, no. 3, pp. 215–219, March 2009.
