
Chapter 2 Low-Density Parity-Check Code

2.4 Conventional LDPC Decoding Algorithm



2.4.2 Message Passing Algorithm [11]

Since the bit-flipping algorithm is hard to implement in hardware, the message passing algorithm is extensively used for LDPC decoding. The message passing algorithm is an iterative decoding process in which messages are exchanged back and forth between variable nodes and check nodes. The decoder expects errors to be corrected progressively by this iterative message passing. At present, two types of iterative decoding algorithms are generally applied to LDPC codes:

• Sum-product algorithm, also known as the belief propagation algorithm.

• Min-sum algorithm.

Both the sum-product algorithm and the min-sum algorithm are message passing algorithms. In the following, we discuss these two algorithms in detail. First, we explain the decoding procedure in Tanner graph form.

Decoding Procedure in Tanner Graph Form

Now we describe the message passing algorithm in Tanner graph form, using a simple example of an irregular LDPC code. The parity-check matrix is

$H = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}$

The Tanner graph of this parity-check matrix is shown in Figure 2.7.

Figure 2.7 Tanner graph of the given example parity-check matrix

Assume every edge in the Tanner graph carries two messages: one is expressed as a solid line and the other as a dotted line. We use these messages to decode the received signal. For convenience of explanation, we take the part of the Tanner graph shown below, with check nodes $s_1$, $s_2$ and variable nodes $x_1$, $x_2$, $x_3$, $x_4$.

The solid-line and dotted-line messages are denoted $q_{sx}$ and $r_{xs}$, respectively. In this example, we can obtain $q_{s_1x_1}$ from $r_{x_2s_1}$ and $r_{x_4s_1}$. Equation (2.34) shows how to compute $q_{s_1x_1}$:

$q_{s_1x_1} = \mathrm{CHK}(r_{x_2s_1}, r_{x_4s_1})$ (2.34)

On the other hand, we can obtain $r_{x_1s_1}$ from $q_{s_2x_1}$ and $L_1$, where $L_1$ is the initialization value. The initialization value will be discussed later. Equation (2.35) shows how to compute $r_{x_1s_1}$:

$r_{x_1s_1} = \mathrm{VAR}(q_{s_2x_1}, L_1)$ (2.35)

There is a CHK function in equation (2.34) and a VAR function in equation (2.35). These two special functions will be introduced in the following. In the Tanner graph, we compute the solid-line message $q_{sx}$ from the dotted-line messages $r_{xs}$ connected to the same check node. In the same way, we compute the dotted-line message $r_{xs}$ from the solid-line messages $q_{sx}$ connected to the same bit node. The values of $r_{xs}$ and $q_{sx}$ are thus updated iteratively; we call this iterative decoding.

Decoding Procedure in Matrix Form

Because the Tanner graph is a representation of the parity-check matrix H, we can also use the matrix form in place of the Tanner graph form. Let us take the same parity-check matrix from the previous section,

$H = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}$

as an example. In equations (2.36) and (2.37), we define matrices Q and R. The positions of the nonzero values in R and Q are the same as those of the ones in H.

The elements of matrix Q are computed from the elements of matrix R, for example $q_{s_1x_1} = \mathrm{CHK}(r_{x_2s_1}, r_{x_4s_1})$. Conversely, the elements of matrix R are computed from the elements of matrix Q, for example $r_{x_1s_1} = \mathrm{VAR}(q_{s_2x_1}, L_1)$, where $L_1$ is the initialization value. The elements of matrices R and Q are thus updated iteratively. We can also regard the CHK function as the horizontal step and the VAR function as the vertical step of the decoding procedure.

In the LDPC iterative decoding procedure, there are two main functions: VAR and CHK. Equation (2.38) shows the VAR function with two inputs, and equation (2.39) is its general form. The VAR function is the same regardless of the decoding algorithm; it is just a summation operation:

$\mathrm{VAR}(L_1, L_2) = L_1 + L_2$ (2.38)

$\mathrm{VAR}(L_1, L_2, \ldots, L_l) = \sum_{i=1}^{l} L_i$ (2.39)

The CHK function with two inputs can be reformulated in different forms. When the CHK function is in the form of equation (2.40) or equation (2.42), we call the decoding algorithm the sum-product algorithm; the fourth term of equation (2.42) is called the correction factor. When the check node computation is in the approximate form of equation (2.43), we call it the min-sum algorithm.
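To make the two-input forms concrete, here is a small Python sketch with helper names of our own; the exact and correction-factor identities below are the standard pairwise CHK formulas, which we take to correspond to the forms referenced as equations (2.40)-(2.43):

```python
import math

def var2(a, b):
    """Two-input VAR function: the variable node update is a plain summation."""
    return a + b

def chk_sum_product(a, b):
    """Two-input CHK, exact (sum-product) form: 2*atanh(tanh(a/2)*tanh(b/2))."""
    p = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    p = max(min(p, 1.0 - 1e-15), -1.0 + 1e-15)  # guard the atanh domain
    return 2.0 * math.atanh(p)

def chk_sum_product_corrected(a, b):
    """The same value written as min-sum plus a correction factor
    (the standard Jacobian-logarithm identity)."""
    core = math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))
    return core + math.log1p(math.exp(-abs(a + b))) - math.log1p(math.exp(-abs(a - b)))

def chk_min_sum(a, b):
    """Two-input CHK in the min-sum approximation: drop the correction factor."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))
```

For example, inputs (2.0, 3.0) give about 1.69 under the exact form but 2.0 under min-sum; the min-sum magnitude is always the larger of the two, which is exactly the overestimation that the normalization techniques of chapter 3 compensate.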

The above discussion of check node computation is only about the CHK function with two inputs. Now, we will discuss the general form of the CHK function. The general form of the CHK function can be expressed in equation (2.44).

$\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_l) = \mathrm{CHK}(\mathrm{CHK}(\ldots\mathrm{CHK}(\mathrm{CHK}(L_1, L_2), L_3)\ldots), L_l)$ (2.44)

The purpose of equation (2.44) is to unfold $\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_l)$. The procedure is: first compute $a_1 = \mathrm{CHK}(L_1, L_2)$, then $a_2 = \mathrm{CHK}(a_1, L_3)$, ..., and finally $a_{l-1} = \mathrm{CHK}(a_{l-2}, L_l)$. The computation result of equation (2.44) is $a_{l-1}$. This can be viewed as serial computation. Figure 2.8 shows the serial configuration for the general form of the CHK function.

Figure 2.8 Serial configuration for check node update function

The serial computation has a long critical path in the check node update unit.
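A minimal sketch of this serial schedule, assuming any two-input chk2 function such as the pairwise forms given earlier:

```python
from functools import reduce

def chk_serial(chk2, llrs):
    """Unfold eq. (2.44) serially: a1 = CHK(L1, L2), a2 = CHK(a1, L3), ...
    Each step depends on the previous result, so latency grows linearly with l."""
    return reduce(chk2, llrs)
```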

From equations (2.40), (2.43), and (2.44), we can generalize the CHK function as equation (2.45) for sum-product algorithm, and equation (2.46) for min-sum algorithm.

$\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_l) = \prod_{i=1}^{l} \mathrm{sign}(L_i) \cdot \phi\big[\phi(|L_1|) + \phi(|L_2|) + \cdots + \phi(|L_l|)\big]$ (2.45)

where $\phi(x) = \ln\left(\dfrac{e^x + 1}{e^x - 1}\right)$.

$\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_l) = \mathrm{sign}(L_1)\,\mathrm{sign}(L_2) \cdots \mathrm{sign}(L_l) \cdot \min\big[|L_1|, |L_2|, \ldots, |L_l|\big]$ (2.46)

Equations (2.45) and (2.46) tell us that the check node update function can also be arranged in a parallel configuration. If we derive the check node update function in a parallel configuration, its critical path will be reduced. Figures 2.9 and 2.10 show the check node update functions of the sum-product algorithm and the min-sum algorithm, respectively. These two figures omit the multiplication of the sign symbols for visual clarity.

Figure 2.9 Check node update function of sum-product algorithm

Figure 2.10 Check node update function of min-sum algorithm
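As a sketch of the parallel forms (helper names ours; phi(0) is unbounded, so nonzero inputs are assumed), equations (2.45) and (2.46) separate the sign product from a magnitude computation whose sum or minimum can be evaluated in a tree:

```python
import math

def phi(x):
    """phi(x) = ln((e^x + 1) / (e^x - 1)); note that phi is its own inverse."""
    return math.log((math.exp(x) + 1.0) / (math.exp(x) - 1.0))

def chk_parallel_sum_product(llrs):
    """Check node update in the parallel sum-product form of eq. (2.45)."""
    sign = math.prod(math.copysign(1.0, L) for L in llrs)
    return sign * phi(sum(phi(abs(L)) for L in llrs))   # the sum can be a tree

def chk_parallel_min_sum(llrs):
    """Check node update in the parallel min-sum form of eq. (2.46)."""
    sign = math.prod(math.copysign(1.0, L) for L in llrs)
    return sign * min(abs(L) for L in llrs)             # the min can be a tree
```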

Iterative Decoding Procedure [12]

The discussion so far covers only part of the whole iterative decoding procedure. Now we consider the actual decoding procedure, which involves many iterations. First, let us describe some notation, illustrated in Figure 2.11. $M(l)$ denotes the set of check nodes connected to variable node $l$, i.e., the positions of the 1s in the $l$-th column of the parity-check matrix. $L(m)$ denotes the set of variable nodes that participate in the $m$-th parity-check equation, i.e., the positions of the 1s in the $m$-th row of the parity-check matrix. $L(m)\backslash l$ represents the set $L(m)$ excluding variable node $l$. The iterative decoding procedure is given below.

Figure 2.11 Notations for the iterative decoding procedure
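As a small illustration of this notation (a sketch using the example matrix from earlier in this section), the sets can be read directly off H:

```python
H = [[1, 1, 0, 1],
     [1, 0, 1, 1]]  # the example parity-check matrix from section 2.4.2

def M(H, l):
    """M(l): check nodes connected to variable node l (rows with a 1 in column l)."""
    return {m for m in range(len(H)) if H[m][l] == 1}

def Lset(H, m):
    """L(m): variable nodes in the m-th parity-check equation (1s in row m)."""
    return {l for l in range(len(H[0])) if H[m][l] == 1}

def Lset_excl(H, m, l):
    """L(m)\\l: the set L(m) with variable node l excluded."""
    return Lset(H, m) - {l}

print(M(H, 0))            # {0, 1}
print(Lset(H, 0))         # {0, 1, 3}
print(Lset_excl(H, 0, 0)) # {1, 3}
```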

1. Initialization: Assume a BPSK-modulated codeword is transmitted over a Gaussian channel with noise variance $\sigma^2$; from each received value, the channel log-likelihood ratio (LLR) $L_l$ of variable node $l$ is computed. For every position $(m, l)$ such that $H_{m,l} = 1$, $q_{ml}$ is initialized as

$q_{ml} = L_l$ (2.47)

2. Message passing

Step 1 (message passing from check nodes to variable nodes): Each check node m gathers all the incoming messages $q_{ml}$ and updates the message to variable node l based on the messages from all the other variable nodes connected to check node m.

$L(m)$ denotes the set of variable nodes that participate in the $m$-th parity-check equation; $L(m)$ can also be viewed as the horizontal set in the parity-check matrix H.

Step 2 (message passing from variable nodes to check nodes): Each variable node l passes its probability message to all the check nodes connected to it. For each check node m in $M(l)$, the messages from the check nodes in $M(l)\backslash m$ that are connected to variable node l are summed up with the channel value to form $q_{ml}$.

Step 3 (decision): The messages from all the check nodes in $M(l)$ are summed up with the channel value to form $q_l$, and a hard decision is made on $q_l$ to obtain the decoded vector $\hat{x}$. If $H\hat{x}^T = 0$, decoding stops; otherwise, the procedure returns to Step 1 until the parity-check equations are satisfied or the specified maximum iteration number is reached. The whole LDPC decoding procedure can be expressed as in Figure 2.12.

Figure 2.12 The whole LDPC decoding procedure
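The whole procedure can be sketched compactly in Python. This is an illustrative min-sum variant (the BPSK/AWGN initialization $L_l = 2y_l/\sigma^2$ is the standard one and is assumed here, and the helper names are ours):

```python
import numpy as np

def decode_min_sum(H, y, sigma2, max_iter=10):
    """Iterative decoding: initialization (2.47), check node (horizontal) step,
    variable node (vertical) step, hard decision, and parity check."""
    m_rows, n_cols = H.shape
    L = 2.0 * y / sigma2                     # channel LLRs (BPSK over AWGN, assumed)
    q = H * L                                # q_ml = L_l wherever H_ml = 1 (eq. 2.47)
    r = np.zeros_like(q, dtype=float)
    for _ in range(max_iter):
        for m in range(m_rows):              # Step 1: check -> variable messages
            idx = np.flatnonzero(H[m])
            for l in idx:
                others = [q[m, j] for j in idx if j != l]
                sign = np.prod(np.sign(others))
                r[m, l] = sign * min(abs(v) for v in others)
        for l in range(n_cols):              # Step 2: variable -> check messages
            idx = np.flatnonzero(H[:, l])
            for m in idx:
                q[m, l] = L[l] + sum(r[j, l] for j in idx if j != m)
        q_total = L + r.sum(axis=0)          # sum all incoming check messages
        x_hat = (q_total < 0).astype(int)    # hard decision: negative LLR -> bit 1
        if not np.any((H @ x_hat) % 2):      # stop when H x^T = 0
            return x_hat
    return x_hat
```

Table 2.4 below summarizes the same flow for the sum-product algorithm; swapping the Step 1 magnitude computation for the phi-based form of equation (2.45) turns this sketch into sum-product decoding.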

Table 2.4 Summary of the sum-product algorithm

1. Initialization:

2. Message passing:

Step 1: Message passing from check nodes to variable nodes. For each $(m, l)$, ...

If $H\hat{x}^T = 0$, then $\hat{x}$ is the estimated codeword, or if the iteration number has reached a predetermined threshold, the algorithm stops.


Chapter 3

Modified Min-Sum Algorithms

In this chapter, we will introduce modified LDPC decoding algorithms. As mentioned in chapter 2, the sum-product algorithm has better performance than the min-sum algorithm. In the following, we describe the difference between the sum-product algorithm and the min-sum algorithm. Our final goal is to modify the min-sum algorithm so that its decoding performance approaches that of the sum-product algorithm.

3.1 Normalization Technique for Min-Sum Algorithm [14]

Equation (3.1) is the check node updating function of the sum-product algorithm. Its major component is the function $\phi(x) = \ln\left(\dfrac{e^x + 1}{e^x - 1}\right)$, whose plot is shown in Figure 3.1. Implementation of the nonlinear function $\phi(x)$ is complicated. Even the commonly adopted table-look-up scheme suffers a loss in error performance because of the large quantization error, especially when x is small.

$\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_w) = \prod_{i=1}^{w} \mathrm{sign}(L_i) \cdot \phi\big[\phi(|L_1|) + \phi(|L_2|) + \cdots + \phi(|L_w|)\big]$ (3.1)

Figure 3.1 Function plot of $\phi(x) = \ln\left(\dfrac{e^x + 1}{e^x - 1}\right)$
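The quantization-error issue can be seen with a toy look-up table; the 16-entry LUT with step 0.5 below is illustrative only, not the thesis's actual scheme:

```python
import math

def phi(x):
    return math.log((math.exp(x) + 1.0) / (math.exp(x) - 1.0))

STEP = 0.5
LUT = [phi(STEP * (k + 1)) for k in range(16)]   # hypothetical 16-entry table

def phi_lut(x):
    """Table look-up approximation of phi: truncate x to a table index."""
    k = min(int(x / STEP), len(LUT) - 1)
    return LUT[k]

for x in (0.1, 0.6, 2.0, 5.0):
    print(f"x={x}: exact={phi(x):.4f}  lut={phi_lut(x):.4f}")
# The mismatch is largest near x = 0, where phi changes most rapidly.
```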

Equation (3.2) is the check node update function used in the min-sum algorithm. The key part of equation (3.2) is finding the minimum among w magnitudes, $\min[|L_1|, |L_2|, \ldots, |L_w|]$. The value of w is determined by the row weight of the parity-check matrix H and is usually small (say, 6 or 7). Therefore, the min-sum algorithm is more suitable for hardware implementation.

$\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_w) = \prod_{i=1}^{w} \mathrm{sign}(L_i) \cdot \min\big[|L_1|, |L_2|, \ldots, |L_w|\big]$ (3.2)

As mentioned in chapter 2, equation (3.2) is an approximate form of equation (3.1). Assume the result of equation (3.1) is A and that of equation (3.2) is B. Reference [14] proves the following two statements about the relationship between A and B.

Statements:

(1) A and B have the same sign, i.e., $\mathrm{sign}(A) = \mathrm{sign}(B)$;

(2) The magnitude of B is always greater than that of A, i.e., $|B| > |A|$.

Statement (1) is quite straightforward because $\phi(x)$ and $\min(\cdot)$ are both positive functions. For convenience in proving statement (2), assume $|B| = |L_i|$, where i is an arbitrary index between 1 and w. Since $\phi(|L_1|) + \cdots + \phi(|L_w|) > \phi(|L_i|)$ (the sum has more than one positive term), and because $\phi(x)$ is a decreasing function that is its own inverse, applying $\phi$ to both sides reverses the inequality and gives $|A| = \phi[\phi(|L_1|) + \cdots + \phi(|L_w|)] < \phi(\phi(|L_i|)) = |L_i| = |B|$. Hence statement (2) is proved.

These two statements suggest using normalization to obtain more accurate soft values from $|B|$. In other words, one can multiply $|B|$ by a factor β smaller than 1 to get a better approximation of $|A|$. To determine the normalization factor β, one can adopt the criterion of forcing the mean of the normalized magnitude $\beta \cdot |B|$ to equal the mean of the magnitude $|A|$ [14], i.e.,

$E[\beta \cdot |B|] = E[|A|]$ (3.3)

This criterion may not be the best, but it is a quite reasonable choice. In the following, a theoretical value of β is derived.

It is assumed that the channel is a Gaussian channel with noise variance $\sigma^2$. Denote the check node inputs by the set $\{L_i : i = 1, 2, \ldots, w\}$; the $L_i$ are independent and identically distributed (i.i.d.) random variables, whose probability density function (p.d.f.) depends on the SNR and the code rate. One can then express the expectations of $|A|$ and $|B|$ statistically, as in equations (3.4) and (3.5), and the normalization factor is obtained from equation (3.3). Equations (3.4) and (3.5) can be calculated by the theory of probabilities.

First, one can calculate $E[|B|]$. Let $M_i = |L_i|$, $i = 1, 2, \ldots, w$, so that the p.d.f. of $M_i$ can be written down. The second integration in (3.8) can be omitted, and one finally obtains $E[|B|]$ in equation (3.9).

A few lower-order terms of equation (3.12) are enough to give a very good estimate of $E[|A|]$ in most cases. Combined with the value of $E[|B|]$ given in equation (3.9), one can obtain the theoretical value of the normalization factor β. In practice, however, the theoretical value of β is hard to compute, and using different theoretical values of β at different SNRs seems impractical. Thus, for a specific LDPC code, one can instead determine a fixed normalization factor through simulations.
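The simulation route can be sketched as a Monte-Carlo estimate of equation (3.3). The Gaussian LLR model below (mean $2/\sigma^2$, variance $4/\sigma^2$, i.e. BPSK over AWGN with the all-zero codeword) is a standard assumption we add for illustration, not a detail stated in this section:

```python
import math
import numpy as np

def phi(x):
    return np.log((np.exp(x) + 1.0) / (np.exp(x) - 1.0))

def estimate_beta(sigma2, w=6, trials=100_000, seed=0):
    """Monte-Carlo estimate of beta = E[|A|] / E[|B|], per equation (3.3)."""
    rng = np.random.default_rng(seed)
    L = rng.normal(2.0 / sigma2, 2.0 / math.sqrt(sigma2), size=(trials, w))
    M = np.abs(L)
    A = phi(np.sum(phi(M), axis=1))   # magnitude of the sum-product result (3.13)
    B = np.min(M, axis=1)             # magnitude of the min-sum result (3.14)
    return float(A.mean() / B.mean())

print(estimate_beta(sigma2=1.0))      # compare with the beta = 0.75 used in the text
```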

Now, let’s set the number of w (the input number of a check node updating function) to 6. This is because the row-weight of H is 6 or 7 in 802.16e standard (see appendix A). Assume

1 2 6

[ ( ) ( ) ( )]

A=φ φ LL + +L φ L (3.13)

1 2 6

min[ , , , ]

B= L L L L (3.14)

The purpose of Figure 3.2 is to find the normalization factor β: its vertical axis is $|\beta \cdot B - A|$ and its horizontal axis is β. In hardware, only certain values of β can be used under a finite-precision representation; for example, one can restrict β to multiples of 0.125 for simple implementation. From Figure 3.2, the objective is to choose the β that makes $|\beta \cdot B - A|$ as small as possible. From simulations, β = 0.75 is found to be a suitable value; when β is 0.75, $|\beta \cdot B - A|$ is less than 0.2.

Figure 3.2 The absolute difference between the normalization technique and the sum-product algorithm vs. the normalization factor β

3.2 Dynamic Normalization Technique for Min-Sum Algorithm [23]

In section 3.1, the normalization factor β compensates the result of equation (3.2) so that it approximates equation (3.1) more accurately. Reference [23] proposes adjusting the normalization factor β dynamically to obtain better decoding performance. The normalization factor β then takes the form

$\beta = \begin{cases} \beta_1, & \text{when } |B| < K \\ \beta_2, & \text{when } |B| \ge K \end{cases}$ (3.15)

In [23], two normalization factors β1 and β2 are selected first. For convenience of hardware implementation, only certain simple values of β1 and β2 should be chosen for finite-precision realizations; for a check node degree of 6, β1 = 0.75 and β2 = 0.875 are found to be good choices. Then, through simulations, one finds the optimum threshold value K that gives the lowest decoder BER. The detailed simulation results are in chapter 4.
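A minimal sketch of the resulting check node update (helper names ours; the default K below is a placeholder to be tuned by simulation):

```python
def chk_dynamic_normalized(llrs, beta1=0.75, beta2=0.875, K=1.0):
    """Min-sum check node update with the dynamic normalization of eq. (3.15)."""
    sign = 1.0
    for L in llrs:
        sign = -sign if L < 0 else sign
    B = min(abs(L) for L in llrs)
    beta = beta1 if B < K else beta2        # switch the factor on |B| vs. K
    return sign * beta * B
```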

3.3 Proposed Dynamic Normalized-Offset-Compensation Technique for Min-Sum Algorithm

Compared to the dynamic normalization technique, one can extend the idea by adding an offset factor α to equation (3.2) [6] in order to obtain even more accurate check node updating values. Equation (3.16) shows the normalized-offset technique for the min-sum algorithm.

$\mathrm{CHK}(L_1 \oplus L_2 \oplus \cdots \oplus L_w) = \prod_{i=1}^{w} \mathrm{sign}(L_i) \cdot \big\{\beta \cdot \min\big[|L_1|, |L_2|, \ldots, |L_w|\big] + \alpha\big\}$ (3.16)

In section 3.1, we determined β = 0.75 for a check node degree of 6. Through the simulations in chapter 4, we find that a fixed value of α does not always perform better than α = 0. This motivates adjusting the offset factor α dynamically to obtain better performance. Equation (3.17) shows the dynamic offset factor α:

$\alpha = \begin{cases} \alpha_1, & \text{when } |B| < K \\ \alpha_2, & \text{when } |B| \ge K \end{cases}$ (3.17)

Through simulations, we can decide the best values of α1 and α2. As discussed in section 3.1, only certain simple values of α1 and α2 will be chosen in hardware for finite-precision realizations. For a check node degree of 6, we found that α1 = 0 and α2 = 0.125 are good choices.
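Combining equations (3.16) and (3.17), the proposed dynamic normalized-offset min-sum update can be sketched as follows (β = 0.75, α1 = 0, and α2 = 0.125 from the text; K = 1.5 as selected below):

```python
def chk_dnoms(llrs, beta=0.75, alpha1=0.0, alpha2=0.125, K=1.5):
    """Proposed dynamic normalized-offset min-sum check update (eqs. 3.16, 3.17)."""
    sign = 1.0
    for L in llrs:
        sign = -sign if L < 0 else sign
    B = min(abs(L) for L in llrs)
    alpha = alpha1 if B < K else alpha2     # dynamic offset, eq. (3.17)
    return sign * (beta * B + alpha)
```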

In the following, we decide the threshold K for a particular LDPC code. Figure 3.3 shows the selection of K for the rate-1/2 LDPC code at various SNRs. K = 0 means a fixed offset factor α; otherwise the offset factor α is dynamic. From Figure 3.3, a threshold value K of 1.5 is found to be a good choice. The detailed simulation results will be shown in chapter 4.

Figure 3.3 BER performance vs. threshold values K for rate 1/2 LDPC code

Chapter 4

Simulation Results and Analysis

In the beginning of this chapter, we compare the error correction performance of different parity-check matrix structures, such as randomly constructed codes and the block-LDPC code of the 802.16e standard. We then compare the error correction performance of the major LDPC decoding algorithms: the sum-product algorithm, the min-sum algorithm, and the proposed improved min-sum algorithm. Finally, we analyze the finite-precision effects on decoding performance and decide proper word lengths of the variables, considering the tradeoff between performance and hardware cost.

Before proceeding to the following simulations, some parameters should be described here:

1: The randomly constructed codes are derived from [22], and they have a regular column weight and row weight.

2: The block-LDPC code used is for 802.16e standard.

3: For the decoding algorithm, we adopt the sum-product algorithm, min-sum algorithm, and the proposed modified min-sum algorithm.

4: We assume AWGN channels and BPSK modulation as our test environment conditions.

4.1 Floating-Point Simulations

One of the most important factors when decoding the received signals is the iteration number. As the number of iterations grows, correct codewords are more likely to be decoded; however, more iterations imply higher computation cost and latency. Therefore, we need to choose a proper iteration number for the decoding process. Figure 4.1 shows BER simulation results vs. SNR for different iteration numbers, for the rate-1/2, length-576 LDPC code with BPSK modulation and sum-product decoding. The performance improvement beyond 10 iterations tends to be insignificant (about 0.2 dB). As a result, LDPC decoding with 10 iterations is considered a good choice for practical implementation.

Figure 4.1 Decoding performance (BER vs. Eb/No) at different iteration numbers (1, 10, 20, 30, and 50).

Figure 4.2 BER performance of the rate-1/2 code at different codeword lengths (576 and 2304), in AWGN channel, maximum iteration=10.

Figure 4.3 Floating-point BER simulations of the min-sum and sum-product decoding algorithms in AWGN channel with code length=576, code rate=1/2, maximum iteration=10.

Figure 4.4 Floating-point BER simulations of the normalized min-sum decoding algorithm (β = 0.5, 0.75, 0.875) and the sum-product algorithm, in AWGN channel with code length=576, code rate=1/2, maximum iteration=10.

Figure 4.5 Floating-point BER simulations of the normalized-offset technique for the min-sum decoding algorithm (α = −0.25, 0, 0.125, 0.25), in AWGN channel with code length=576, code rate=1/2, maximum iteration=10.

Figure 4.6 Floating-point BER simulations of the proposed dynamic normalized-offset min-sum (DNOMS) decoding algorithm compared with the normalized min-sum (β = 0.75) and sum-product algorithms, in AWGN channel with code length=576, code rate=1/2.

Figure 4.7 Floating-point BER simulations of the dynamic normalization technique, the dynamic normalization technique with offset factor, the proposed DNOMS, and the sum-product algorithm.

4.2 Fixed-Point Simulations

In this section, we further analyze the finite-word-length performance of the LDPC decoder. Possible tradeoffs between hardware complexity and decoding performance will be discussed. Let [t:f] denote the quantization scheme in which a total of t bits are used, f of them for the fractional part of the values.

Various quantization configurations such as [6:3], [7:3], [8:4] are investigated here.
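A [t:f] quantizer can be sketched as follows (a simple round-and-saturate model of our own; the thesis's exact rounding convention may differ):

```python
def quantize(x, t=7, f=3):
    """Quantize x to a [t:f] fixed-point value: t total bits, f fractional bits."""
    scale = 1 << f                      # 2**f quantization steps per unit
    lo = -(1 << (t - 1))                # most negative code (two's complement)
    hi = (1 << (t - 1)) - 1             # most positive code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale

print(quantize(1.37, 7, 3))   # -> 1.375 (nearest multiple of 0.125)
print(quantize(100.0, 7, 3))  # -> 7.875 (saturated at the [7:3] maximum)
```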

Figure 4.8 Fixed-point BER simulations of three quantization configurations ([6:3], [7:3], [8:4]) of the min-sum decoding algorithm vs. floating point, in AWGN channel, code length=576, code rate=1/2, maximum iteration=10.

Figure 4.9 Floating-point vs. fixed-point [7:3] BER simulations of the normalized min-sum (β = 0.75) and the dynamic normalized-offset min-sum algorithms.

Chapter 5

Architecture Designs of LDPC Code Decoders

In this chapter, we will introduce the hardware architecture of the LDPC decoder in our design and discuss the implementation of an irregular LDPC decoder for the 802.16e standard. The decoder has a code rate of 1/2 and a code length of 576 bits. The parity-check matrix of this code is listed in Appendix A.

5.1 The Whole Decoder Architecture

The parity-check matrix H in our design is in block-LDPC form, as discussed in section 2.2. The matrix is composed of $m_b \times n_b$ sub-matrices, each of which is either a zero matrix or a permutation matrix of the same size $z \times z$. The permutations used are circular right shifts, so the set of permutation matrices contains the $z \times z$ identity matrix and circularly right-shifted versions of it.

$H = \begin{bmatrix} P_{0,0} & P_{0,1} & \cdots & P_{0,n_b-1} \\ P_{1,0} & P_{1,1} & \cdots & P_{1,n_b-1} \\ \vdots & \vdots & \ddots & \vdots \\ P_{m_b-1,0} & P_{m_b-1,1} & \cdots & P_{m_b-1,n_b-1} \end{bmatrix}$

Figure 5.1 The parity-check matrix H of a block-LDPC code
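For illustration, expanding a base matrix of shift values into H can be sketched as below; the convention that −1 marks a zero sub-matrix and a non-negative entry s marks the identity circularly right-shifted by s is assumed here, following common block-LDPC practice:

```python
import numpy as np

def expand_block_ldpc(base, z):
    """Expand an m_b x n_b base matrix of circular-shift values into the
    (m_b*z) x (n_b*z) parity-check matrix H. Entry -1 denotes the z x z
    zero matrix; entry s >= 0 denotes the identity right-shifted by s."""
    mb, nb = base.shape
    H = np.zeros((mb * z, nb * z), dtype=np.uint8)
    I = np.eye(z, dtype=np.uint8)
    for i in range(mb):
        for j in range(nb):
            s = base[i, j]
            if s >= 0:
                H[i*z:(i+1)*z, j*z:(j+1)*z] = np.roll(I, s, axis=1)  # right shift
    return H

# Toy example (not the 802.16e table): a 2 x 4 base matrix with z = 3.
base = np.array([[0, 1, -1, 2],
                 [1, -1, 0, 1]])
print(expand_block_ldpc(base, 3).shape)  # (6, 12)
```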

In our design, we consider an LDPC code for the 802.16e standard with code rate 1/2 and a 288-by-576 parity-check matrix. Considering circuit complexity, the 288-by-576 parity-check matrix H is divided into four 144-by-288 sub-matrices to fit a partial-parallel architecture, as shown in Figure 5.2. The LDPC decoder architecture of our design is illustrated in Figure 5.4. It contains 144 CNUs, 288 BNUs, and two dedicated message memory units (MMUs). The sets of data processed by the CNUs are $\{h_{00}, h_{01}\}$ and $\{h_{10}, h_{11}\}$, whereas the data fed into the BNUs are $\{h_{00}, h_{10}\}$ and $\{h_{01}, h_{11}\}$. Note that the two MMUs are employed to process two different codewords concurrently without stalls. Therefore, the LDPC decoder is not only area-efficient but also achieves a decoding speed comparable with fully parallel architectures.

Figure 5.2 The partition of parity-check matrix H

Figure 5.3 I/O pin of the decoder IP

Figure 5.4 The whole LDPC decoder architecture for the block LDPC code

The I/O pins of the decoder chip are shown in Figure 5.3, and Figure 5.4 shows the block diagram of the decoder architecture; its modules will be described explicitly in the following. We adopt a partial-parallel architecture [19], so the decoder can handle two codewords at a time.

Input Buffer [19]

The input buffer is a storage component that receives and keeps channel values
