Improved Residual-Based Dynamic Scheduling for Decoding of Low-Density Parity-Check Convolutional Codes


Improved Residual-Based Dynamic Scheduling for Decoding of Low-Density Parity-Check Convolutional Codes

Student: Mu-Chen Wu

Advisor: Chung-Hsuan Wang

National Chiao Tung University

Department of Communication Engineering, Master's Program

Master's Thesis

A Thesis

Submitted to Department of Communication Engineering

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of

Master of Science

in

Communication Engineering

August, 2010

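Improved Residual-Based Dynamic Scheduling for Decoding of Low-Density Parity-Check Convolutional Codes

Student: Mu-Chen Wu

Advisor: Dr. Chung-Hsuan Wang

National Chiao Tung University

Department of Communication Engineering, Master's Program

Abstract (Chinese)

Previous studies have shown that low-density parity-check convolutional codes with rational parity-check matrices have poor bit error rates because their Tanner graphs contain cycles of length 4. Our recent work, however, found that if the original Tanner graph is transformed, through the concept of puncturing, into a Tanner graph with a larger girth, the bit error rate of such codes can be as good as, or better than, that of low-density parity-check convolutional codes with polynomial parity-check matrices. Because scheduling is often used in the decoding of punctured low-density parity-check codes to further improve the bit error rate or to speed up the convergence of decoding, we select a well-performing scheduling algorithm among the existing ones (the EDS algorithm) to decode low-density parity-check convolutional codes with rational parity-check matrices and examine whether a better bit error rate can be obtained. In this thesis, we first modify the residual function of the EDS algorithm so that the updating order during decoding is more appropriate. In addition, we observe that decoding with scheduling may fail to converge or may converge to a non-optimal codeword. We therefore propose two methods, based on perturbation and bit-flipping respectively, to address these problems. According to the simulation results, for low-density parity-check convolutional codes with either rational or polynomial parity-check matrices, the proposed scheduling algorithm achieves a better bit error rate than several existing scheduling algorithms.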


Improved Residual-Based Dynamic Scheduling for Decoding of Low-Density Parity-Check Convolutional Codes

Student: Mu-Chen Wu

Advisor: Chung-Hsuan Wang

Department of Communication Engineering

National Chiao Tung University

Abstract

Previous studies on low-density parity-check convolutional codes (LDPC-CCs) revealed that LDPC-CCs with rational parity-check matrices (RPCMs) have poor bit-error-rate (BER) performance due to the existence of length-4 cycles in their Tanner graphs. In our recent work, we found that the original Tanner graph of an LDPC-CC with an RPCM can be transformed, based on the concept of puncturing, into a new Tanner graph with larger girth, such that the LDPC-CC achieves BER performance comparable to or even better than that of LDPC-CCs with polynomial parity-check matrices (PPCMs). For the decoding of punctured LDPC codes, sequential schedules are usually used to improve BER performance or to speed up the convergence of the decoding. Among the available sequential schedules, we select the well-performing efficient dynamic scheduling (EDS) to decode LDPC-CCs with RPCMs in order to obtain better BER performance. In this thesis, we first modify the residual function of EDS to obtain a more appropriate updating order. In addition, since several observations indicate that decoding based on the original EDS or our improved EDS may fail to converge or may converge to non-optimal codewords, two refined strategies based on perturbation and bit-flipping are proposed to mitigate these problems. The simulation results show that, for both RPCMs and PPCMs, the proposed algorithm provides better BER performance than several existing schemes.

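Acknowledgements

First, I would like to thank my advisor, Dr. Chung-Hsuan Wang, for his guidance over the past two years. I also thank my senior labmates 翁健家 and 張力仁 for their help in both study and research and in daily life, and my fellow and junior labmates for their support and encouragement, which made my two years as a graduate student fulfilling. Finally, to my family, who have always quietly supported me, and to the friends who have stayed by my side: I love you. I sincerely thank you all.

August 2010

Mu-Chen Wu, graduate student, National Chiao Tung University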

List of Figures

2.1 The Tanner graph of a rate R = 1/2 LDPC code.

2.2 A cycle of length 6.

3.1 A rate R = 1/3 and syndrome former memory $m_s$ = 3 LDPC-CC encoder.

3.2 Decoding window of LDPC-CC.

3.3 Cycles of length 4 in H.

4.1 A subgraph in which most check nodes connect to an even number of wrong variable nodes.

4.2 Block diagram of combination of perturbation algorithm and modified EDS.

4.3 A sketch map of codeword space.

4.4 A special subgraph of the LDPC-CC in Ex. 3.1.

4.5 The block diagram of the proposed algorithm.

5.1 The BER performances of an R = 0.5 LDPC-CC with a PPCM, where $m_s$ = 203.

5.2 The BER performances of an R = 0.4 LDPC-CC with a PPCM, where $m_s$ = 395.

5.3 The BER performances of an R = 0.5 LDPC-CC with an RPCM, where

Chapter 1 Introduction

In the 1990s, low-density parity-check (LDPC) codes were rediscovered and shown to achieve near-Shannon-limit performance with iterative message-passing decoding and sufficiently long block lengths. In 1999, Jiménez Felström and Zigangirov proposed low-density parity-check convolutional codes (LDPC-CCs) [1], which can be considered the convolutional counterparts of LDPC block codes (LDPC-BCs). They showed that LDPC-CCs have performance comparable to that of LDPC-BCs. Furthermore, an LDPC-CC can be easily encoded in a systematic way using only adders and shift registers, and it can encode data of arbitrary length.

Previous studies of LDPC-CCs revealed that LDPC-CCs with rational parity-check matrices have poor bit-error-rate (BER) performance due to the existence of length-4 cycles in their Tanner graphs. To achieve better performance with the sum-product algorithm, in our recent work we proposed a procedure [6], based on the concept of puncturing, for obtaining an equivalent Tanner graph with larger girth. To further enhance the BER performance and simultaneously accelerate convergence, many researchers suggest applying sequential schedules for decoding. Sequential schedules can be partitioned into two classes, deterministic scheduling and dynamic scheduling: the former decides its updating order before the decoding, while the latter continuously adjusts its updating order during the decoding [3][4][5]. In previous research, dynamic scheduling has been shown to perform better than deterministic scheduling. Among the many sequential schedules, we apply one well-performing dynamic sequential schedule, named efficient dynamic scheduling (EDS) [2], to decode LDPC-CCs.

In this thesis, we first modify the residual function of EDS to obtain a more appropriate updating order. The original residual function aims to speed up the convergence of the decoding, while our improved residual function also considers which variable nodes can help other variable nodes. Moreover, we find that decoding based on the original EDS or our improved EDS may fail to converge or may converge to non-optimal codewords. Two refined strategies, based on perturbation and bit-flipping, are therefore proposed to solve these problems. The former adds small noise to the received sequence to help the decoding process escape from the decoding trap, while the latter searches several codeword candidates around the originally decoded codeword to obtain a better decoded result. The simulation results show that, for both rational and polynomial parity-check matrices, the proposed algorithm provides better BER performance than several existing schemes.

The rest of this thesis is organized as follows. Chapters 2 and 3 briefly describe LDPC-BCs and LDPC-CCs. The proposed algorithm and the simulation results are given in Chapters 4 and 5. Finally, Chapter 6 concludes the work.

Chapter 2 Overview of Low-Density Parity-Check Codes

In this chapter, we first give a review of LDPC codes and Tanner graphs. Then the sum-product algorithm for the decoding of LDPC codes is also described.

2.1 LDPC Codes and Tanner Graphs

An LDPC code is a linear binary block code whose parity-check matrix $H$ has a low density of ones. If there are $J$ ones in every column and $K$ ones in every row, and the number of ones in common between any two columns is smaller than 2, it is called a $(J, K)$ regular LDPC code with column weight $w_c = J$ and row weight $w_r = K$. However, if the column weight or the row weight of an LDPC code is not constant, the code is called an irregular LDPC code.

An LDPC code is usually described by its Tanner graph. A Tanner graph is a bipartite graph whose nodes are partitioned into two classes: variable nodes $v_i$ and check nodes $c_j$, which represent the codeword bits and the check equations, respectively. In a Tanner graph, there is no edge connecting two nodes from the same class. There is an edge connecting a variable node and a check node if and only if the corresponding bit is included in the corresponding parity check. The neighbors of a node are the nodes connected to it. $N(v_i)$ and $N(c_j)$ denote the neighbors of the variable node $v_i$ and the check node $c_j$, respectively.

Figure 2.1: The Tanner graph of a rate R = 1/2 LDPC code.

Figure 2.2: A cycle of length 6.

Example 2.1 Consider a rate R = 1/2 irregular LDPC code with

\[
H = \begin{pmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix}.
\]

Its Tanner graph is composed of 10 variable nodes and 5 check nodes, as shown in Fig. 2.1, where white squares represent check nodes and black circles represent variable nodes.

In a Tanner graph, if one can start from a node and return to the same node through $l$ edges without passing through any other node more than once, there is a length-$l$ cycle. Equivalently, in a parity-check matrix $H$, if one can start from some “1”, alternately walk vertically to another “1” and horizontally to another “1”, and return to the original “1” in $l$ steps, there is a length-$l$ cycle. A length-6 cycle in the Tanner graph of Ex. 2.1 is shown in Fig. 2.2. The girth of a Tanner graph is the minimum length of all its cycles. The girth of the LDPC code in Ex. 2.1 is 6.
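The girth claim for Ex. 2.1 can be checked numerically. The following is a minimal sketch, not part of the original thesis: it builds the Tanner graph from the matrix $H$ of Ex. 2.1 and finds the shortest cycle with a breadth-first search from every node. The function names `tanner_adjacency` and `girth` are illustrative.

```python
from collections import deque

# Parity-check matrix of Ex. 2.1 (rows: check nodes, columns: variable nodes).
H = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 1, 1],
]

def tanner_adjacency(H):
    """Nodes 0..n-1 are variable nodes, nodes n..n+m-1 are check nodes."""
    m, n = len(H), len(H[0])
    adj = [[] for _ in range(n + m)]
    for j in range(m):
        for i in range(n):
            if H[j][i]:
                adj[i].append(n + j)
                adj[n + j].append(i)
    return adj

def girth(adj):
    """Length of the shortest cycle, or None if the graph is cycle-free."""
    best = None
    for root in range(len(adj)):
        dist = {root: 0}
        parent = {root: -1}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    queue.append(w)
                elif parent[u] != w:
                    # A non-tree edge closes a cycle through the root's BFS tree.
                    length = dist[u] + dist[w] + 1
                    if best is None or length < best:
                        best = length
    return best

g = girth(tanner_adjacency(H))
```

For this matrix, rows 1, 3, and 5 together with columns 2, 4, and 9 form the length-6 cycle, and no two rows share more than one column, so no length-4 cycle exists.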


2.2 Sum-Product Algorithm

The sum-product algorithm (SPA) is an iterative decoding algorithm that is often used to decode LDPC codes. It operates on a Tanner graph in which variable nodes and check nodes exchange extrinsic information iteratively. SPA assumes that the messages received by every node in the Tanner graph are independent. If a Tanner graph is cycle-free, SPA is optimal for decoding the corresponding LDPC code. However, short cycles in a Tanner graph increase the dependence among the messages and worsen the BER performance of the corresponding LDPC code. SPA is also called flooding because all check nodes or all variable nodes are processed at the same time.

Assume that the transmitted bit $x_i$ is a priori equally likely to be $+1$ or $-1$ under binary phase-shift keying (BPSK). Then the log a posteriori probability (log-APP) ratio based on the channel output $y_i$ is

\[
X_{v_i} = \log \frac{P(x_i = +1 \mid y_i)}{P(x_i = -1 \mid y_i)} = \frac{2 y_i}{\sigma^2}.
\]

At the beginning of the algorithm, we initialize the message passed from variable node $v_i$ to check node $c_j$ as $m^{(0)}_{v_i \to c_j} = X_{v_i}$, and the message passed from check node $c_j$ to variable node $v_i$ as $m^{(0)}_{c_j \to v_i} = 0$. In the $l$th iteration of the algorithm, first, each check node $c_j$ computes the message

\[
m^{(l)}_{c_j \to v_i} = 2 \tanh^{-1}\left( \prod_{v_k \in N(c_j) \setminus v_i} \tanh\left( \tfrac{1}{2}\, m^{(l-1)}_{v_k \to c_j} \right) \right)
\]

and sends it to its neighbors $N(c_j)$. Each variable node $v_i$ then computes the message

\[
m^{(l)}_{v_i \to c_j} = \sum_{c_k \in N(v_i) \setminus c_j} m^{(l)}_{c_k \to v_i} + X_{v_i}
\]

and sends it to its neighbors $N(v_i)$. Second, we compute the log-APP ratio

\[
Q_i = X_{v_i} + \sum_{c_j \in N(v_i)} m^{(l)}_{c_j \to v_i}
\]

and make a hard decision

\[
\hat{x}_i = \begin{cases} +1, & \text{if } Q_i > 0 \\ -1, & \text{otherwise} \end{cases}
\]

for each variable node $v_i$. If all parity checks are satisfied, that is, $\hat{x} H^T = 0$, or the maximum number of iterations is reached, the algorithm terminates.
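The update rules of this section can be sketched in a few lines of code. This is an illustrative sketch rather than the decoder used in the thesis. It assumes that bit 0 is mapped to $+1$, so a check is satisfied when the product of its hard decisions is $+1$, and it clamps the argument of the inverse hyperbolic tangent as a numerical guard. The function name `spa_decode` and the test vector are illustrative.

```python
import math

def spa_decode(H, y, sigma2, max_iter=50):
    """Flooding SPA on the Tanner graph of H; returns hard decisions in {+1, -1}."""
    m, n = len(H), len(H[0])
    N_c = [[i for i in range(n) if H[j][i]] for j in range(m)]  # N(c_j)
    N_v = [[j for j in range(m) if H[j][i]] for i in range(n)]  # N(v_i)
    X = [2.0 * yi / sigma2 for yi in y]                         # channel log-APP ratios
    m_vc = {(i, j): X[i] for i in range(n) for j in N_v[i]}     # initial v -> c messages
    m_cv = {(j, i): 0.0 for j in range(m) for i in N_c[j]}      # initial c -> v messages
    x_hat = [1 if x > 0 else -1 for x in X]
    limit = 1.0 - 1e-12  # keeps atanh finite (numerical guard, an added assumption)
    for _ in range(max_iter):
        # Check-node update: product of tanh over all neighbors except the target.
        for j in range(m):
            for i in N_c[j]:
                prod = 1.0
                for k in N_c[j]:
                    if k != i:
                        prod *= math.tanh(0.5 * m_vc[(k, j)])
                prod = max(-limit, min(limit, prod))
                m_cv[(j, i)] = 2.0 * math.atanh(prod)
        # Log-APP ratios; the v -> c message is Q_i minus the incoming c -> v message.
        Q = [X[i] + sum(m_cv[(j, i)] for j in N_v[i]) for i in range(n)]
        for i in range(n):
            for j in N_v[i]:
                m_vc[(i, j)] = Q[i] - m_cv[(j, i)]
        x_hat = [1 if q > 0 else -1 for q in Q]
        # Stop as soon as every parity check is satisfied.
        if all(math.prod(x_hat[i] for i in N_c[j]) == 1 for j in range(m)):
            break
    return x_hat

# Matrix of Ex. 2.1; all-(+1) codeword with one unreliable, wrong-signed sample.
H = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 1, 1],
]
y = [1.0] * 10
y[1] = -0.2
decoded = spa_decode(H, y, sigma2=0.5)
```

Subtracting the incoming message from $Q_i$ gives the same value as the sum over $N(v_i) \setminus c_j$ in the variable-node rule and avoids recomputing the sum for every edge.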


Chapter 3 LDPC Convolutional Codes

3.1 Definition

Let $u_{[0,t]} = (u_0, u_1, \ldots, u_t)$, where $u_i = (u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(k)})$ and $u_i^{(\cdot)} \in GF(2)$, be the information sequence, and let $v_{[0,t]} = (v_0, v_1, \ldots, v_t)$, where $v_i = (v_i^{(1)}, v_i^{(2)}, \ldots, v_i^{(n)})$ and $v_i^{(\cdot)} \in GF(2)$, be the encoded sequence. A time-invariant LDPC-CC is defined as the set of all sequences $v_{[0,\infty]}$ satisfying the equation $v_{[0,\infty]} H^T_{[0,\infty]} = 0$, where

\[
H^T_{[0,\infty]} = \begin{pmatrix}
H_0^T & \cdots & H_{m_s}^T & & \\
 & H_0^T & \cdots & H_{m_s}^T & \\
 & & \ddots & & \ddots
\end{pmatrix}
\]

is a semi-infinite transposed parity-check matrix, called the syndrome former. For a rate $R = k/n$ code, the submatrices $H_i$, $i = 0, \ldots, m_s$, have dimension $(n - k) \times n$, and $m_s$ is the syndrome former memory.

As with a convolutional code, the information sequence $u_{[0,\infty]}$ and the encoded sequence $v_{[0,\infty]}$ can be represented by the polynomial vectors

\[
U(D) = (U_1(D), U_2(D), \ldots, U_k(D)) \quad \text{and} \quad V(D) = (V_1(D), V_2(D), \ldots, V_n(D)),
\]

where $U_i(D) = u_0^{(i)} + u_1^{(i)} D + \cdots$ and $V_i(D) = v_0^{(i)} + v_1^{(i)} D + \cdots$, respectively. The parity-check matrix can also be written as

\[
H(D) = H_0 + H_1 D + \cdots + H_{m_s} D^{m_s},
\]

and $V(D)$ is a codeword if and only if $V(D) H^T(D) = 0$.

3.2 Encoding

We usually require $H_0$ to be full rank, since this property makes it easy to encode an LDPC-CC with only shift registers and adders.

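Example 3.1 Consider an LDPC-CC with $R = 1/3$, which can be specified by

\[
H(D) = \begin{pmatrix} 1 & D & D^3 \\ D^3 & D^2 & 1 \end{pmatrix}.
\]

First, we decompose it into a superposition of matrices with different degrees of delay:

\[
H(D) = H_0 + H_1 D + \cdots + H_{m_s} D^{m_s}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
+ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} D
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} D^2
+ \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} D^3.
\]

Then, since $v_{[0,\infty]} H^T_{[0,\infty]} = 0$, we relate the encoded bits at time $t$ to past bits by the equation

\[
v_t H_0^T + v_{t-1} H_1^T + \cdots + v_{t-m_s} H_{m_s}^T = 0,
\]

and obtain the following equations:

\[
\begin{cases}
v_t^{(1)} + v_{t-1}^{(2)} + v_{t-3}^{(3)} = 0 \\
v_t^{(3)} + v_{t-2}^{(2)} + v_{t-3}^{(1)} = 0.
\end{cases}
\]

By setting $v_t^{(2)}$ to be the information bit $u_t$, we can solve the simultaneous equations for $v_t^{(1)}$ and $v_t^{(3)}$ and obtain the encoder shown in Fig. 3.1.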

Figure 3.1: A rate R = 1/3 and syndrome former memory $m_s$ = 3 LDPC-CC encoder
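The encoder of Ex. 3.1 can be written as a short shift-register simulation. The sketch below is illustrative and assumes an all-zero initial state. Over GF(2), the two parity equations give $v_t^{(1)} = v_{t-1}^{(2)} + v_{t-3}^{(3)}$ and $v_t^{(3)} = v_{t-2}^{(2)} + v_{t-3}^{(1)}$, with $v_t^{(2)} = u_t$. The function names `encode` and `syndromes` are illustrative.

```python
def encode(u):
    """Encode bits u with H(D) = [[1, D, D^3], [D^3, D^2, 1]] from Ex. 3.1."""
    v1, v2, v3 = [], [], []

    def past(seq, t, delay):
        # Bits before time 0 are taken as 0 (assumed all-zero initial state).
        return seq[t - delay] if t - delay >= 0 else 0

    for t, ut in enumerate(u):
        v2.append(ut)                                # systematic bit v_t^(2) = u_t
        v1.append(past(v2, t, 1) ^ past(v3, t, 3))   # first parity equation
        v3.append(past(v2, t, 2) ^ past(v1, t, 3))   # second parity equation
    return list(zip(v1, v2, v3))

def syndromes(v):
    """Evaluate both parity equations at every time instant."""
    def bit(t, delay, pos):
        return v[t - delay][pos] if t - delay >= 0 else 0

    out = []
    for t in range(len(v)):
        s1 = bit(t, 0, 0) ^ bit(t, 1, 1) ^ bit(t, 3, 2)
        s2 = bit(t, 0, 2) ^ bit(t, 2, 1) ^ bit(t, 3, 0)
        out.append((s1, s2))
    return out

codeword = encode([1, 0, 1, 1, 0, 0, 1, 0])
```

Every syndrome pair of an encoded sequence is zero, which is the time-domain form of $V(D) H^T(D) = 0$.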

3.3 Decoding

The Viterbi algorithm is rarely used to decode an LDPC-CC because the syndrome former memory $m_s$ of an LDPC-CC is usually large. Since $H$ is low-density, SPA is considered instead. Unlike the decoding of an LDPC block code, it is hard to process the whole codeword at one time, since the codeword length can go to infinity. As shown in Fig. 3.2, a sliding window stores the data being processed. The Tanner graph in Fig. 3.2 is derived from the $H(D)$ in Ex. 3.1, for which $m_s + 1 = 4$. The window is composed of $I$ processors, where $I$ equals the number of iterations, and each processor spans $(m_s + 1)$ time instants. Every time we receive $n$ channel outputs, we put $n$ variable nodes and $n - k$ check nodes into the window and pop the last $n$ variable nodes and $n - k$ check nodes out of it. We first activate the front $n - k$ check nodes in each processor, and then activate the last $n$ variable nodes in each processor. Once the check nodes at time $t + 4I - 1$ are updated, all check nodes connected to the variable nodes at time $t + 4(I - 1)$ have been updated once. Thus, all those variable nodes can compute $m^{(1)}_{v_i \to c_j}$ and complete their first iteration. On the other hand, when the check nodes at time $t + 4(I - 1) - 1$ are updated, all variable nodes connected to them have already been updated once. Thus, all those check nodes can compute $m^{(2)}_{c_j \to v_i}$. Moreover, these processors operate independently, because a variable node at time $t$ connects only to check nodes located between time $t$ and time $t + m_s$, and a check node at time $t + m_s$ connects only to variable nodes located between time $t$ and time $t + m_s$. Thus, the active check nodes in a processor only access variable nodes located in their own processor, and vice versa. After those active nodes are updated, we put the next $n$ received bits into the window and pop out the last $n$ bits again. Each variable node is therefore updated $I$ times in the window. The complete flooding algorithm is described in Algorithm 1.

Figure 3.2: Decoding window of LDPC-CC

Algorithm 1 Flooding for LDPC-CC

1: Pop in n variable nodes and n − k check nodes.

2: for i = 1 to I do

3: Activate the front n − k check nodes in processor i.

4: Activate the last n variable nodes in processor i.

5: end for

6: Pop out the last n variable nodes and n − k check nodes in the window.

7: if termination criterion is satisfied then

8: Leave the algorithm.

9: else

10: Go back to step 1.

11: end if

3.4 Girth and Rational Parity-Check Matrices

Figure 3.3: Cycles of length 4 in H

other nonzero term $D^{n_2}$, and repeat these two steps sequentially until a cycle is found. Define $d(n_i, n_{i+1}) = n_{i+1} - n_i$ to be the displacement of walking from $D^{n_i}$ to $D^{n_{i+1}}$. A path $(D^{n_0}, D^{n_1}, \ldots, D^{n_{2l}}, D^{n_{2l+1}}, D^{n_0})$ forms a cycle of length $2l + 2$ if and only if the sum of all vertical displacements is 0:

\[
D_v = d(n_0, n_1) + d(n_2, n_3) + \cdots + d(n_{2l}, n_{2l+1}) = 0.
\]

Example 3.2 Consider

\[
H(D) = \begin{pmatrix} 1 + D & D^4 & D^6 \\ D^2 & D^5 & D^3 \end{pmatrix}.
\]

There is a path $(D^2, D, D^4, D^5, D^2)$ forming a cycle of length 4 with $D_v = (1 - 2) + (5 - 4) = 0$, as shown in Figure 3.3.

For an LDPC-CC, multinomial terms with more than two terms in $H(D)$ guarantee a girth of at most 6. For example, a trinomial term $D^a + D^b + D^c$, where $a < b < c$, contains a cycle $(D^a, D^b, D^c, D^a, D^b, D^c, D^a)$ of length 6. Moreover, if $b - a = c - b$, the length of the cycle is 4. Thus, in the past, only monomial and binomial terms were discussed. In addition, rational $H(D)$ was rarely discussed, because the traditional Tanner graph, obtained either by multiplying the corresponding row by the denominator or by expanding the rational term into an infinite geometric series, has small girth. Nevertheless, we can obtain a new Tanner graph with larger girth by using dummy variable nodes [6] to replace rational or multinomial terms.

Example 3.3 Consider

\[
H(D) = \begin{pmatrix} 1 & D & D^3 \\ D^3 & D^2 & \frac{1}{1+D} \end{pmatrix}.
\]

If we multiply the second row of $H(D)$ by $(1 + D)$, we obtain

\[
H'(D) = \begin{pmatrix} 1 & D & D^3 \\ D^3 + D^4 & D^2 + D^3 & 1 \end{pmatrix}.
\]

Unfortunately, there is a cycle $(D^3, D^2, D^3, D^4, D^3)$ of length 4. If we instead expand $\frac{1}{1+D}$ into $1 + D + D^2 + D^3 + \cdots$, there is also a cycle $(1, D, D^2, D, 1)$ of length 4. Both methods therefore yield a girth of 4. Instead, let the dummy variable node be $M(D) = \frac{1}{1+D} V_3(D)$ and obtain

\[
H''(D) = \begin{pmatrix} 1 & D & D^3 & 0 \\ D^3 & D^2 & 0 & 1 \\ 0 & 0 & 1 & 1 + D \end{pmatrix}.
\]

One can easily check that there is no cycle of length 4 in $H''$.

Using dummy variable nodes is equivalent to replacing the original code with a super code of rate $R = k/(n + n')$, where $n'$ is the number of dummy variable nodes. To maintain the same rate, previous researchers directly punctured the dummy variable nodes. To improve performance, we determine through simulations which columns should be punctured to achieve the best performance.
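The length-4 checks in Ex. 3.2 and Ex. 3.3 can be automated. The sketch below is illustrative and not taken from the thesis. It represents each entry of a polynomial $H(D)$ by its set of exponents and uses the zero-displacement condition restricted to four terms: two distinct check nodes share two distinct variable nodes exactly when exponents $e_1, e_2$ from one row and $e_3, e_4$ from the other row, taken in the same two columns, satisfy $e_1 - e_2 = e_3 - e_4$. The function name `has_length4_cycle` is illustrative.

```python
from itertools import product

def has_length4_cycle(H):
    """H[i][a] is the set of exponents of entry (i, a); an empty set means 0."""
    rows, cols = len(H), len(H[0])
    for i, j in product(range(rows), repeat=2):
        for a, b in product(range(cols), repeat=2):
            for e1, e2, e3, e4 in product(H[i][a], H[i][b], H[j][a], H[j][b]):
                same_check = (i == j and e1 == e3)   # both checks would coincide
                same_var = (a == b and e1 == e2)     # both variables would coincide
                if not same_check and not same_var and e1 - e2 == e3 - e4:
                    return True
    return False

# Ex. 3.2: H(D) = [[1 + D, D^4, D^6], [D^2, D^5, D^3]]
H_ex32 = [[{0, 1}, {4}, {6}], [{2}, {5}, {3}]]
# Ex. 3.3: H'(D), second row multiplied by (1 + D)
H_prime = [[{0}, {1}, {3}], [{3, 4}, {2, 3}, {0}]]
# Ex. 3.3: H''(D), with one dummy variable node
H_dprime = [[{0}, {1}, {3}, set()], [{3}, {2}, set(), {0}], [set(), set(), {0}, {0, 1}]]

results = [has_length4_cycle(M) for M in (H_ex32, H_prime, H_dprime)]
```

The same test also covers a single trinomial entry with equally spaced exponents, which matches the case $b - a = c - b$ discussed before Ex. 3.3.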


Chapter 4 The Proposed Algorithm

4.1 Residual-Based Dynamic Scheduling

For the decoding of LDPC-CCs, the flooding algorithm is usually used. However, one troublesome problem is that the hardware complexity of the decoder is proportional to the number of iterations, which can be a few hundred. Thus, we hope to develop another decoding algorithm that still performs well when the hardware complexity of the decoder is limited. For LDPC-BCs, scheduling is a common way to accelerate convergence and improve performance, especially for punctured codes. Since we originally focus on rational $H(D)$, which is processed with dummy variable nodes and punctured to maintain the same code rate, scheduling is considered.

Sequential schedules can be partitioned into two classes: deterministic ones and dynamic ones. Deterministic sequential schedules decide their updating order before the decoding, while dynamic sequential schedules continuously adjust their updating order during the decoding. Dynamic sequential schedules have been shown to have better BER performance and faster convergence than deterministic sequential schedules. Dynamic sequential schedules can be further partitioned into several classes based on how they decide their updating order. A residual-based dynamic sequential schedule defines a residual function and uses it to decide which variable node, check node, or edge should be updated first. Residual belief propagation (RBP) [5], node-wise residual belief propagation (NWRBP) [5], and efficient dynamic scheduling (EDS) [2] are residual-based dynamic sequential schedules. RBP defines its residual function

\[
F^{(l)}(m_{c_j \to v_i}) = \left| m^{(l+1)}_{c_j \to v_i} - m^{(l)}_{c_j \to v_i} \right|
\]

based on messages passed from check nodes to variable nodes and iteratively updates the edge with the largest residual. NWRBP also defines its residual function

\[
F^{(l)}(c_j) = \max_{v_i \in N(c_j)} \left| m^{(l+1)}_{c_j \to v_i} - m^{(l)}_{c_j \to v_i} \right|
\]

based on messages passed from check nodes to variable nodes. However, it updates not only the edge with the largest residual but also the other edges connected to the same check node. EDS defines its residual function

\[
F^{(l)}(v_i) = \frac{\left| Q_i^{(l)} - Q_i^{(l-1)} \right|}{\left| Q_i^{(l)} + Q_i^{(l-1)} \right|}
\]

based on the log-APP ratios of variable nodes and iteratively updates the variable node with the largest residual. In [2], EDS was shown to perform better than RBP and NWRBP.

For EDS, the residual of variable node $v_i$ is defined by

\[
F^{(l)}(v_i) = \frac{\left| Q_i^{(l)} - Q_i^{(l-1)} \right|}{\left| Q_i^{(l)} + Q_i^{(l-1)} \right|}.
\]

Each time, EDS picks the variable node with the largest residual and updates the neighboring check nodes of that variable node. The neighbors of those check nodes are also updated. Then it computes new residuals for the updated variable nodes and resets the residual of the selected one to 0. Now we apply EDS to decode LDPC-CCs. Unlike the flooding algorithm, the window is composed of only one processor spanning $K \times (m_s + 1)$ time instants. Here, $K$ does not guarantee that every node is updated $K$ times. For convenience, we partition the window into $K$ blocks of $(m_s + 1)$ time instants each. Every time we receive $(m_s + 1) \times n$ channel outputs, we shift the corresponding variable nodes and check nodes into the window, set the initial residual $F^{(0)}_{v_i} = X_{v_i}$, and sort all variable nodes' residuals. Then we pick the variable node with the largest residual, update its neighboring check nodes, and also update the neighbors of those check nodes. Afterwards, we compute the residuals of the updated variable nodes and reset the residual of the selected one to 0. We call this sequence of operations the hybrid operation PUC and repeat it $N$ times. Then, we shift the variable nodes and the check nodes again.

For RBP and NWRBP, the residuals are defined by

\[
F^{(l)}(m_{c_j \to v_i}) = \left| m^{(l+1)}_{c_j \to v_i} - m^{(l)}_{c_j \to v_i} \right|
\quad \text{and} \quad
F^{(l)}(c_j) = \max_{v_i \in N(c_j)} \left| m^{(l+1)}_{c_j \to v_i} - m^{(l)}_{c_j \to v_i} \right|,
\]

respectively. Both functions only describe the difference between a message before and after it is updated. For EDS, however, when more than one variable node has the same $|Q_i^{(l)} - Q_i^{(l-1)}|$, it picks the variable node with the smallest reliability. The idea is that updating variable nodes with small reliability first can accelerate convergence. Nevertheless, updating unreliable variable nodes first may influence other reliable variable nodes and even cause errors. Thus, we modify the residual function to

\[
F^{(l)}(v_i) = \frac{\left| Q_i^{(l)} - Q_i^{(l-1)} \right|}{\left| Q_i^{(l)} / Q_i^{(l-1)} \right|}.
\]

For variable nodes with the same $|Q_i^{(l)} - Q_i^{(l-1)}|$, we pick the one with the largest reliability. In addition, we want to give a higher residual to a variable node whose sign has changed. Thus, we further modify the residual function to

\[
F^{(l)}(v_i) = \frac{\left| Q_i^{(l)} - Q_i^{(l-1)} \right|}{\left| Q_i^{(l)} / Q_i^{(l-1)} + 1 \right|}.
\]

For variable nodes with the same $|Q_i^{(l)} - Q_i^{(l-1)}| / |Q_i^{(l)} / Q_i^{(l-1)}|$, we pick the one whose sign has changed. We call the algorithm with the modified residual function the modified EDS; it is described completely in Algorithm 2.
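The three variable-node residuals of this section can be compared side by side. This is an illustrative sketch: the function names are not from the thesis, and the small constant `EPS` that guards against division by zero is an added assumption, since the text does not say how a zero denominator is handled.

```python
EPS = 1e-12  # guard against a zero denominator (assumption, not from the text)

def residual_eds(q_new, q_old):
    """Original EDS residual: |Q(l) - Q(l-1)| / |Q(l) + Q(l-1)|."""
    return abs(q_new - q_old) / max(abs(q_new + q_old), EPS)

def residual_intermediate(q_new, q_old):
    """First modification: |Q(l) - Q(l-1)| / |Q(l) / Q(l-1)|."""
    return abs(q_new - q_old) / max(abs(q_new / q_old), EPS)

def residual_modified(q_new, q_old):
    """Final modified residual: |Q(l) - Q(l-1)| / |Q(l) / Q(l-1) + 1|."""
    return abs(q_new - q_old) / max(abs(q_new / q_old + 1.0), EPS)

# Two nodes with the same first-modification residual; only node A changes sign.
node_a = (-0.5, 1.5)   # (Q(l), Q(l-1)), sign changed
node_b = (1.0, 3.0)    # (Q(l), Q(l-1)), sign kept
f_a = residual_modified(*node_a)
f_b = residual_modified(*node_b)
```

In this example both nodes have the same value of the first modified residual, and the final residual ranks the sign-changing node first, as intended. Algebraically, for $Q_i^{(l-1)} \neq 0$, the final residual equals $|Q_i^{(l-1)}|$ times the original EDS residual.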

Algorithm 2 Modified EDS

1: Pop $(m_s + 1) \times n$ variable nodes and $(m_s + 1) \times (n - k)$ check nodes into the window.

2: Pop out the last $(m_s + 1) \times n$ variable nodes and $(m_s + 1) \times (n - k)$ check nodes and shift the variable nodes and check nodes in the window to the left.

3: Set the messages passed from these check nodes to 0.

4: Let $F^{(0)}_{v_i} = X_{v_i}$.

5: Sort all variable nodes' residuals.

6: for i = 1 to N do

7: Pick the variable node with the largest residual, update the neighboring check nodes, and update the neighbors of those check nodes.

8: Compute the residuals of the updated variable nodes.

9: Set the residual of the selected variable node to 0.

10: Sort all variable nodes' residuals again.

11: end for

12: Go back to step 1.

In the modified EDS, each time we pick one variable node from the window and update its neighboring check nodes. However, the neighboring check nodes of the variable nodes in the last block may connect to variable nodes that have already left the window. On the other side, the neighboring check nodes of the variable nodes in the first block may not have entered the window yet. Thus, we exclude the last and the first $(m_s + 1) \times n$ variable nodes from the selection. In addition, instead of using one huge stack to sort all variable nodes' residuals, we use $K - 2$ stacks, one per block, to sort the residuals of the variable nodes in each block, and another stack to sort the largest residuals of these stacks. Thus, there is no need to clear the stacks and sort all residuals again when new data enters. This is why we propose popping in $(m_s + 1) \times n$ channel outputs at a time instead of $n$. However, once the residual of a variable node is modified, $\log((m_s + 1) \times n) + \log(K - 2) \approx \log((m_s + 1) \times n)$ comparisons are needed.
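The two-level selection described above can be illustrated with a small data structure. The sketch below is a simplification and not the structure used in the thesis: it keeps one best index per selectable block and picks the overall maximum among the block maxima, but it uses linear scans in place of the sorted stacks, so it does not reproduce the logarithmic comparison count. The class name `BlockedResiduals` and its methods are illustrative.

```python
class BlockedResiduals:
    """Residuals of the selectable blocks, with one tracked maximum per block."""

    def __init__(self, residuals, block_size):
        self.blocks = [list(residuals[s:s + block_size])
                       for s in range(0, len(residuals), block_size)]
        self.block_best = [self._argmax(b) for b in self.blocks]

    @staticmethod
    def _argmax(block):
        return max(range(len(block)), key=block.__getitem__)

    def update(self, b, i, value):
        """Change one residual; only block b has to be re-examined."""
        self.blocks[b][i] = value
        self.block_best[b] = self._argmax(self.blocks[b])

    def pick(self):
        """Return (block, index) of the largest residual over all blocks."""
        b = max(range(len(self.blocks)),
                key=lambda k: self.blocks[k][self.block_best[k]])
        return b, self.block_best[b]

sched = BlockedResiduals([0.1, 0.9, 0.3, 0.7, 0.2, 0.5], block_size=2)
order = []
for _ in range(4):
    b, i = sched.pick()
    order.append((b, i))
    sched.update(b, i, 0.0)   # step 9 of Algorithm 2: reset the selected residual
```

When a new block of data enters the window, only the structure of that block has to be built, which is the reason given above for popping in a whole block at a time.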

The flooding algorithm costs $I$ processors and $I \times (m_s + 1) \times (n + (n - k) \times \sum w_r)$ storage elements. The modified EDS algorithm, in contrast, costs one powerful processor, $K \times (m_s + 1) \times (n + (n - k) \times \sum w_r)$ storage elements, $K - 2$ stacks of size $(m_s + 1) \times n$, and one stack of size $K - 2$. Additionally, the former needs a buffer of size $n$ to hold newly received values, whereas the buffer of the latter should be expanded to $(m_s + 1) \times n$.

Figure 4.1: A subgraph in which most check nodes connect to an even number of wrong variable nodes.

4.2 Improved Scheme Based on Perturbation

For decoding based on the original EDS or the modified EDS, we find that the decoder may fail to converge or may converge to non-optimal codewords. These problems degrade performance in the waterfall region and the error-floor region. To mitigate them, we propose two improved schemes, based on perturbation and bit-flipping, respectively, and introduce them in this section and the next.

During decoding, the sequential schedule may sometimes cause many check nodes to be satisfied even though each of them connects to an even number of wrong variable nodes, as shown in Fig. 4.1. In Fig. 4.1, unless the wrong variable node connected to the unsatisfied check node is corrected, the other wrong variable nodes cannot be corrected. The numerous variable nodes in this subgraph that cannot converge result in errors. Rather than trying to prevent this problem from occurring, we try to identify those questionable variable nodes and let them converge. We assume that if the transmitted bits of those questionable variable nodes suffered a different set of noise, they might converge and be decoded successfully.

With the modified EDS, every time before popping in new data, we compute the average log-APP of the variable nodes in each block of the window to decide the state of convergence. If some blocks have not converged after being decoded for a while, we suppose that these blocks will not be decoded correctly. Then, we activate the perturbation algorithm to try to let the variable nodes in these blocks converge. Specifically, we add zero-mean Gaussian noise $\Delta_i$ with variance $\delta^2$ to the received value $y_i$ of each questionable variable node $v_i$. Then we reset the log-APPs of the variable nodes in these blocks to

\[
Q_i = 2(y_i + \Delta_i)/\bar{\sigma}^2 = 2(x_i + (n_i + \Delta_i))/\bar{\sigma}^2 = 2\bar{y}_i/\bar{\sigma}^2,
\]

where $\bar{\sigma}^2 = \sigma^2 + \delta^2$ and $n_i$ is the channel noise. After that, we execute the hybrid operation PUC $N_e$ times and check whether these blocks have converged. Here, $\delta^2$ should be chosen carefully: it should be neither too large nor too small. Since $\bar{y}_i$ can be viewed as $y_i$ having suffered larger noise, the decoding result may get worse if $\delta^2$ is too large. However, if $\delta^2$ is too small, it may make no difference to the original result. The perturbation noise $\Delta_i$ may also fail to produce convergence. If it does, we generate another set of perturbation noise $\Delta'_i$, until $N_f$ successive attempts have failed. A block diagram of the combination of the perturbation algorithm and the modified EDS is shown in Fig. 4.2.

Previously, the perturbation algorithm (PA) was usually used to generate more candidate codewords, and the perturbation noise was added to the whole codeword. In this thesis, however, PA is used to help variable nodes escape from the decoding trap, in which parts of the Tanner graph cannot converge. In addition, perturbation noise is added only to the questionable variable nodes.
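The perturbation step can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the simulations. The callback `run_puc` is hypothetical: it stands in for executing the hybrid operation PUC $N_e$ times and reporting whether the questionable blocks have converged. The function names and the parameter `n_f` (the bound $N_f$ on failed attempts) are illustrative.

```python
import random

def perturb_log_app(y, questionable, sigma2, delta2, rng=random):
    """Reset Q_i = 2 (y_i + Delta_i) / (sigma^2 + delta^2) for questionable nodes."""
    sigma_bar2 = sigma2 + delta2
    q = {}
    for i in questionable:
        y_bar = y[i] + rng.gauss(0.0, delta2 ** 0.5)  # Delta_i ~ N(0, delta^2)
        q[i] = 2.0 * y_bar / sigma_bar2
    return q

def retry_with_perturbation(y, questionable, sigma2, delta2, run_puc, n_f, rng=random):
    """Draw a fresh noise set per attempt; give up after n_f failed attempts."""
    for attempt in range(1, n_f + 1):
        q = perturb_log_app(y, questionable, sigma2, delta2, rng)
        if run_puc(q):        # hypothetical: True when the blocks have converged
            return q, attempt
    return None, n_f

y = [0.9, -0.1, 0.3]
q, attempts = retry_with_perturbation(
    y, [1, 2], sigma2=0.5, delta2=0.1,
    run_puc=lambda q: True, n_f=5, rng=random.Random(7))
```

Only the entries listed in `questionable` are perturbed, which reflects the difference from earlier uses of PA noted above.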

4.3 Improved Scheme Based on Bit-Flipping

Several observations indicate that decoding based on the original EDS or our improved EDS may converge to non-optimal codewords, which means that there is another codeword whose Euclidean distance to the received sequence is smaller than that of the decoded codeword, as shown in Fig. 4.3. To solve this problem, the idea is to generate several codeword candidates and choose the codeword with the best metric as the decoding result.

Figure 4.2: Block diagram of combination of perturbation algorithm and modified EDS.

Figure 4.3: A sketch map of codeword space.

For LDPC-CCs, the difference between a codeword and one of its neighboring codewords is composed of one or several groups of variable nodes. Each group of variable nodes and all check nodes connected to them form a special subgraph, in which every check node connects to an even number of variable nodes in the subgraph. For convenience, we call the relationship between the variable nodes in a group a pattern $\Gamma_i$. For an LDPC-CC, there can be infinitely many groups of variable nodes with the same pattern. Thus, $\Gamma_i(t_1)$ denotes the group of variable nodes with pattern $\Gamma_i$ whose variable nodes are located at time instants $t \ge t_1$. Using different patterns, we continuously check whether it is possible to obtain another codeword with a smaller Euclidean distance. Now assume that $y$ is decoded into a wrong codeword with a larger Euclidean distance, and that the difference between the decoded codeword and the transmitted codeword is composed of one group of variable nodes. The wrong variable nodes in the group are usually caused by the decoding; their received values are not necessarily unreliable. Since it is more difficult to influence a group with many variable nodes than a group with few, decoding errors are more likely to occur at variable nodes in small groups. Thus, only patterns of small groups are needed to improve the performance.

Figure 4.4: A special subgraph of the LDPC-CC in Ex. 3.1.

Example 4.1 Consider a group composed of the following variable nodes:

$$v^{(4)}_{t},\ v^{(4)}_{t+74},\ v^{(4)}_{t+84},\ v^{(5)}_{t+90},\ v^{(2)}_{t+106},\ v^{(1)}_{t+144},\ v^{(2)}_{t+154},\ v^{(2)}_{t+192},\ v^{(4)}_{t+194},\ v^{(4)}_{t+206},\ v^{(5)}_{t+212},\ v^{(1)}_{t+218},\ v^{(1)}_{t+228},\ v^{(2)}_{t+238},$$

$$v^{(5)}_{t+250},\ v^{(1)}_{t+264},\ v^{(4)}_{t+268},\ v^{(2)}_{t+276},\ v^{(4)}_{t+278},\ v^{(5)}_{t+284},\ v^{(4)}_{t+290},\ v^{(5)}_{t+296},\ v^{(2)}_{t+300},\ v^{(1)}_{t+312},\ v^{(2)}_{t+348},\ v^{(1)}_{t+396},\ v^{(5)}_{t+406},\ v^{(2)}_{t+432},$$

in an LDPC-CC with $R = 0.5$ and $m_s = 203$, whose parity-check matrix is

$$H(D) = \begin{bmatrix}
1 + D^{194} & D^{158} & D^{166} & D^{144} & 0 & D^{65} & 0 & 0 \\
D^{97} & D^{49} & 0 & D^{203} & D^{65} & D^{37} & 1 & 0 \\
0 & D^{106} & D^{83} & D^{138} & D^{48} + D^{132} & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & D^{20} & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 + D^{76} & 1
\end{bmatrix}.$$

Assume that there is a group $\Gamma_i(t)$ about to leave the window. For $\Gamma_i(t)$, we first check its state of convergence and whether all related check nodes are satisfied: if these variable nodes have not converged, or some of the related check nodes are not satisfied, the decoded sequence is not a codeword, and there is no need to check whether there is a codeword with a smaller Euclidean distance. Then, we make hard decisions $z(\Gamma_i(t))$ for the variable nodes in $\Gamma_i(t)$. In addition, we compute the Euclidean distance $D_1(\Gamma_i(t))$ between $z(\Gamma_i(t))$ and the received values $y(\Gamma_i(t))$, together with

[Block diagram not reproduced. Blocks: Execute PUC N times; Shift data; Check convergence; Are check nodes satisfied?; Add perturbation noise; Execute PUC α times and check the state of convergence; F < β?; Compute D1(i,t) and D2(i,t); D1(i,t) > D2(i,t)?; Flip z(Γ_i(t)); i < ζ?]

Figure 4.5: The block diagram of the proposed algorithm.

the Euclidean distance $D_2(\Gamma_i(t))$ between $z'(\Gamma_i(t))$ and $y(\Gamma_i(t))$. If $D_1(\Gamma_i(t)) > D_2(\Gamma_i(t))$, we flip all variable nodes in $\Gamma_i(t)$.

If the decoded codeword is farther from the received sequence than the correct codeword, as shown in Fig. 4.3, we may recover the correct one by bit-flipping. However, if the decoded codeword is closer to the received sequence than the correct codeword, we cannot correct it. Worse still, if the decoded codeword is the correct one but is farther from the received sequence than another codeword, bit-flipping introduces an error. Nevertheless, the first case occurs with the highest probability, so this method can further improve the performance.
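The flipping rule can be sketched in a few lines of Python. The sketch rests on assumptions made here for illustration: BPSK maps bit 0 to +1 and bit 1 to -1, the distances are squared Euclidean distances restricted to the positions of the group, and $z'(\Gamma_i(t))$ is taken to be $z(\Gamma_i(t))$ with every bit of the group flipped. The helper names `bpsk`, `sq_distance`, and `flip_if_closer` are hypothetical.

```python
def bpsk(bit):
    """Map a hard-decision bit to a BPSK symbol (assumed: 0 -> +1, 1 -> -1)."""
    return 1.0 - 2.0 * bit


def sq_distance(bits, received, positions):
    """Squared Euclidean distance on the given positions only."""
    return sum((received[p] - bpsk(bits[p])) ** 2 for p in positions)


def flip_if_closer(z, y, group):
    """Return (decisions, D1, D2); the group is flipped only when D1 > D2."""
    z_flipped = list(z)
    for p in group:
        z_flipped[p] = 1 - z_flipped[p]
    d1 = sq_distance(z, y, group)          # distance of the current decisions
    d2 = sq_distance(z_flipped, y, group)  # distance after flipping the group
    return (z_flipped if d1 > d2 else list(z)), d1, d2


# Toy data: the decoder decided all zeros, but the received values at
# positions 1 and 2 are negative, i.e. closer to bit 1.
y = [0.9, -0.2, -0.4, 1.1]
z = [0, 0, 0, 0]
group = [1, 2]

decisions, d1, d2 = flip_if_closer(z, y, group)
print(decisions, d1, d2)
```

Because the same positions enter both distances, comparing squared distances gives the same decision as comparing the distances themselves.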


Chapter 5 Simulation Results

In this chapter, we will show the performances of the proposed algorithm on a BPSK-modulated AWGN channel for LDPC-CCs with PPCMs and RPCMs.

Example 5.1 Consider a rate $R = 0.5$ LDPC-CC with a PPCM

$$H_1(D) = \begin{bmatrix}
1 + D^{194} & D^{158} & D^{166} & D^{144} & 0 & D^{65} \\
D^{97} & D^{49} & 1 & D^{203} & D^{65} & D^{37} \\
0 & D^{106} & D^{83} & D^{138} & D^{48} + D^{132} & 1
\end{bmatrix}$$

with syndrome former memory $m_s = 203$.

In Fig. 5.1(a), we show the performances of the flooding algorithm, the original EDS, our modified EDS, and the proposed algorithm. The number of iterations of the flooding algorithm is 100. For the original EDS, our modified EDS, and the proposed algorithm, we let N = 20400 so that the number of check-node updates is the same as that of the flooding algorithm. In addition, K is 66 for the original EDS and our modified EDS, and K is 50 for the proposed algorithm. Let α = 1000000 and β = 5. Perturbation increases the computational complexity by approximately 50%. As we can see, since the modified EDS has a more appropriate updating order, it performs better than the original EDS. In addition, with the aid of perturbation and bit-flipping, the proposed algorithm further improves the performance. Compared with the flooding algorithm, the proposed algorithm performs better. In Fig. 5.1(b), we also show the performance of the flooding algorithm with 200 iterations, which is sufficient for the flooding algorithm to converge. The simulation results show that the proposed algorithm still outperforms the flooding algorithm.

Example 5.2 Consider a rate $R = 0.4$ LDPC-CC with a PPCM

$$H_2(D) = \begin{bmatrix}
1 & D^{251} & D^{353} & D^{376} & D^{278} \\
D^{32} & 1 & D^{356} & D^{395} & D^{119} \\
D^{256} & D^{37} & 1 & D^{359} & D^{312}
\end{bmatrix}$$

with syndrome former memory $m_s = 395$.
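As a quick consistency check on the PPCMs of Examples 5.1 and 5.2, note that for a polynomial parity-check matrix the syndrome former memory equals the largest exponent of D appearing in H(D), and that, assuming H(D) has full rank, a matrix with r rows and c columns gives the rate R = 1 - r/c. The sketch below encodes each matrix by the exponents of its entries; the names `H1`, `H2`, `syndrome_former_memory`, and `design_rate` are illustrative.

```python
from fractions import Fraction

# Each entry lists the exponents of D it contains; [] is a zero entry
# and [0] is the constant 1.
H1 = [
    [[0, 194], [158], [166], [144], [], [65]],
    [[97], [49], [0], [203], [65], [37]],
    [[], [106], [83], [138], [48, 132], [0]],
]
H2 = [
    [[0], [251], [353], [376], [278]],
    [[32], [0], [356], [395], [119]],
    [[256], [37], [0], [359], [312]],
]


def syndrome_former_memory(h):
    """Largest exponent of D over all entries of a polynomial H(D)."""
    return max(e for row in h for entry in row for e in entry)


def design_rate(h):
    """Rate 1 - r/c of an r x c parity-check matrix (full rank assumed)."""
    rows, cols = len(h), len(h[0])
    return Fraction(cols - rows, cols)


for name, h in (("H1", H1), ("H2", H2)):
    print(name, "m_s =", syndrome_former_memory(h), "R =", design_rate(h))
```

Both values agree with the parameters stated in the two examples.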

In Fig. 5.2(a), we show the BER performances of an R = 0.4 LDPC-CC with a PPCM. The number of iterations of the flooding algorithm is 100. For the scheduling-based schemes, we let N = 39500 so that the number of check-node updates is the same as that of the flooding algorithm. In addition, K is 66 for the original EDS and our modified EDS, and K is 50 for the proposed algorithm. Let α = 1000000 and β = 5. Perturbation again increases the computational complexity by approximately 50%. As we can see, the modified EDS performs better than the original EDS, and the proposed algorithm further improves the performance. Compared with the flooding algorithm, the proposed algorithm performs better. In Fig. 5.2(b), the performance of the flooding algorithm with 500 iterations, which is sufficient for its convergence, is also shown. The simulation results show that the proposed algorithm still outperforms the flooding algorithm.

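Example 5.3 Consider $H_1(D)$ in Example 5.1. We now replace one of its polynomial entries by a rational one and obtain

$$H'_2(D) = \begin{bmatrix}
1 + D^{194} & D^{158} & D^{166} & D^{144} & 0 & D^{65} \\
D^{97} & D^{49} & \dfrac{1}{1 + D^{20} + D^{76}} & D^{203} & D^{65} & D^{37} \\
0 & D^{106} & D^{83} & D^{138} & D^{48} + D^{132} & 1
\end{bmatrix}.$$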

[Plot, panels (a) and (b): information BER versus SNR. Curves in (a): Flooding Algorithm I=100, EDS, Modified EDS, Proposed Algorithm. Curves in (b): Flooding Algorithm I=100, Flooding Algorithm I=200, EDS, Modified EDS, Proposed Algorithm.]

[Plot, panels (a) and (b): information BER versus SNR. Curves in (a): Flooding Algorithm I=100, EDS, Modified EDS, Proposed Algorithm.]

By using two additional variable nodes, $M_1(D)$ and $M_2(D)$, we expand it into a super code. Let

$$M_1(D) = \frac{1}{1 + D^{20} + D^{76}} V_3(D),$$

rewrite the check equation

$$D^{97}V_1(D) + D^{49}V_2(D) + \frac{1}{1 + D^{20} + D^{76}} V_3(D) + D^{203}V_4(D) + D^{65}V_5(D) + D^{37}V_6(D) = 0$$

into

$$D^{97}V_1(D) + D^{49}V_2(D) + D^{203}V_4(D) + D^{65}V_5(D) + D^{37}V_6(D) + M_1(D) = 0,$$

and add the new check equation

$$V_3(D) + (1 + D^{20} + D^{76})M_1(D) = 0.$$

Then, we obtain

$$H''_2(D) = \begin{bmatrix}
1 + D^{194} & D^{158} & D^{166} & D^{144} & 0 & D^{65} & 0 \\
D^{97} & D^{49} & 0 & D^{203} & D^{65} & D^{37} & 1 \\
0 & D^{106} & D^{83} & D^{138} & D^{48} + D^{132} & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 + D^{20} + D^{76}
\end{bmatrix}.$$

However, the trinomial term $1 + D^{20} + D^{76}$ results in a small girth, so we let $M_2(D) = (1 + D^{76})M_1(D)$ and transform $H''_2(D)$ into

$$H'''_2(D) = \begin{bmatrix}
1 + D^{194} & D^{158} & D^{166} & D^{144} & 0 & D^{65} & 0 & 0 \\
D^{97} & D^{49} & 0 & D^{203} & D^{65} & D^{37} & 1 & 0 \\
0 & D^{106} & D^{83} & D^{138} & D^{48} + D^{132} & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & D^{20} & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 + D^{76} & 1
\end{bmatrix}.$$

To maintain the same code rate, we puncture the variable nodes $V_3(D)$ and $M_1(D)$.

In Fig. 5.3(a), we show the BER performances of an R = 0.5 LDPC-CC with an RPCM. The number of iterations of the flooding algorithm is 100. Let N = 34000 such that the amounts

[Figure 5.3 plot, panels (a) and (b): information BER versus SNR. Curves in (a): Flooding Algorithm I=100, EDS, Modified EDS, Proposed Algorithm.]

Figure 5.3: The BER performances of an R = 0.5 LDPC-CC with an RPCM, where ms =

of check nodes updated by the other algorithms equal those of the flooding algorithm. In addition, K is 66 for the original EDS and our modified EDS, and K is 50 for the proposed algorithm. Let α = 1000000 and β = 5. Perturbation again increases the computational complexity by approximately 50%. As we can see, the modified EDS performs better than the original EDS, and the proposed algorithm further improves the performance. Compared with the flooding algorithm, the proposed algorithm performs better. In Fig. 5.3(b), the performance of the flooding algorithm with 300 iterations, which is sufficient for its convergence, is also shown. The proposed algorithm still outperforms the flooding algorithm.
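The super-code expansion used in Example 5.3 can be checked numerically. The sketch below assumes that $D^e$ denotes a delay of $e$ time units and that all sequences are zero before time 0. It generates $M_1(D) = V_3(D)/(1 + D^{20} + D^{76})$ by the GF(2) recursion $M_{1,t} = V_{3,t} + M_{1,t-20} + M_{1,t-76}$, sets $M_2(D) = (1 + D^{76})M_1(D)$, and confirms that the two added check equations (the last two rows of the expanded matrix) hold at every time instant. The function names and the random test input are illustrative.

```python
import random


def delayed(seq, t, e):
    """Value of seq at time t - e, taken as 0 before time 0."""
    return seq[t - e] if t - e >= 0 else 0


def expand(v3):
    """Compute M1 = V3 / (1 + D^20 + D^76) and M2 = (1 + D^76) M1 over GF(2)."""
    m1, m2 = [], []
    for t in range(len(v3)):
        m1_t = v3[t] ^ delayed(m1, t, 20) ^ delayed(m1, t, 76)
        m1.append(m1_t)
        m2.append(m1_t ^ delayed(m1, t, 76))
    return m1, m2


def added_checks_satisfied(v3, m1, m2):
    """Check V3 + D^20 M1 + M2 = 0 and (1 + D^76) M1 + M2 = 0 at every time."""
    for t in range(len(v3)):
        row4 = v3[t] ^ delayed(m1, t, 20) ^ m2[t]
        row5 = m1[t] ^ delayed(m1, t, 76) ^ m2[t]
        if row4 or row5:
            return False
    return True


random.seed(0)
v3 = [random.randint(0, 1) for _ in range(500)]
m1, m2 = expand(v3)
print("added check equations satisfied:", added_checks_satisfied(v3, m1, m2))
```

The check succeeds for any input because $D^{20}M_1(D) + M_2(D) = (1 + D^{20} + D^{76})M_1(D)$ over GF(2), so the two binomial and trinomial rows together carry the same constraint as the single rational entry.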


Chapter 6 Conclusion

In this thesis, we apply EDS to the decoding of LDPC-CCs and modify the residual function of EDS to improve the BER performance. In addition, we analyze the decoded results of dynamic scheduling and propose two refinements, based on perturbation and bit-flipping, to further improve the BER performance. The simulation results show that the proposed algorithm provides a better BER performance than the flooding algorithm for LDPC-CCs with rational parity-check matrices. The proposed algorithm also performs well for polynomial parity-check matrices.


Bibliography

[1] A. Jiménez Felström and K. Zigangirov, “Time-varying periodic convolutional codes with low-density parity-check matrix,” IEEE Trans. Inf. Theory, vol. 45, pp. 2181-2191, Sep. 1999.

[2] Guojun Han and Xingcheng Liu, “An efficient dynamic schedule for layered belief-propagation decoding of LDPC codes,” IEEE Commun. Lett., vol. 13, pp. 950-952, Dec. 2009.

[3] Valentin Savin, “Iterative LDPC decoding using neighborhood reliabilities,” in Proc. IEEE Int. Symp. Inform. Theory, Nice, France, June 2007, pp. 221-225.

[4] Hua Xiao and Amir H. Banihashemi, “Graph-based message-passing schedules for decoding LDPC codes,” IEEE Trans. Commun., vol. 52, pp. 2098-2105, Dec. 2004.

[5] Andres I. Vila Casado, Miguel Griot, and Richard D. Wesel, “Informed dynamic scheduling for belief-propagation decoding of LDPC codes,” in Proc. IEEE Int. Conf. Commun., Glasgow, Scotland, June 2007, pp. 932-937.

[6] Chih-Chieh Lai, “A study on LDPC-CC with rational parity-check matrices and related decoding algorithms,” Master's thesis, National Chiao Tung University, Hsinchu, Taiwan, R.O.C., 2009.

[7] Arvind Sridharan, “Design and analysis of LDPC convolutional codes,” Ph.D. dissertation, University of Notre Dame, Indiana, U.S.A., 2005.

[8] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inf. Theory, vol. 42, pp. 425-449, Mar. 1996.
