
Chapter 5 Architecture Designs of LDPC Code Decoders

5.1 The Whole Decoder Architecture

The parity-check matrix H in our design is in block-LDPC form, as discussed in Section 2.2. It is composed of mb × nb sub-matrices, each of which is either a zero matrix or a permutation matrix of size z × z. The permutations used are circular right shifts, so the set of permutation matrices contains the z × z identity matrix and its circularly right-shifted versions.

$$H = \begin{bmatrix}
P_{0,0} & P_{0,1} & \cdots & P_{0,n_b-1} \\
P_{1,0} & P_{1,1} & \cdots & P_{1,n_b-1} \\
\vdots & \vdots & \ddots & \vdots \\
P_{m_b-1,0} & P_{m_b-1,1} & \cdots & P_{m_b-1,n_b-1}
\end{bmatrix}$$

Figure 5.1 The parity check matrix H of block-LDPC Code
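The expansion of a base matrix of shift values into the full binary matrix H can be sketched as follows. This is a minimal illustration: the base-matrix entries and the expansion factor z below are toy values, not the actual 802.16e parameters.

```python
import numpy as np

def expand_base_matrix(base, z):
    """Expand an m_b x n_b base matrix of shift values into the full
    binary parity-check matrix H.  An entry of -1 denotes an all-zero
    z x z sub-matrix; an entry s >= 0 denotes the z x z identity
    matrix circularly right-shifted by s columns."""
    mb, nb = base.shape
    H = np.zeros((mb * z, nb * z), dtype=np.uint8)
    I = np.eye(z, dtype=np.uint8)
    for i in range(mb):
        for j in range(nb):
            s = base[i][j]
            if s >= 0:
                # np.roll along axis=1 shifts every row's 1 right by s
                H[i*z:(i+1)*z, j*z:(j+1)*z] = np.roll(I, s, axis=1)
    return H

# toy 2 x 2 base matrix with z = 4 (shift values are illustrative)
base = np.array([[0, 1],
                 [2, -1]])
H = expand_base_matrix(base, 4)
```

Each non-negative base entry contributes exactly one 1 per row and column of its sub-matrix, so the sparsity of H follows directly from the base matrix.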

In our design, we consider an LDPC code with code rate 1/2 and a 288-by-576 parity-check matrix for the 802.16e standard. To limit circuit complexity, the 288-by-576 parity-check matrix H is divided into four 144-by-288 sub-matrices to fit the partial-parallel architecture, as shown in Figure 5.2. The LDPC decoder architecture in our design is illustrated in Figure 5.4. It contains 144 CNUs, 288 BNUs, and two dedicated message memory units (MMUs). The sets of data processed by the CNUs are {h00, h01} and {h10, h11}, whereas the data fed into the BNUs are {h00, h10} and {h01, h11}. Note that the two MMUs are employed to process two different codewords concurrently without stalls. Therefore, the LDPC decoder is not only area-efficient, but its decoding speed is also comparable with fully parallel architectures.
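The four-way partition of H can be sketched as plain array slicing; a placeholder zero matrix stands in here for the actual 802.16e parity-check matrix.

```python
import numpy as np

# Partition sketch: split the 288 x 576 parity-check matrix into the
# four 144 x 288 sub-matrices h00, h01, h10, h11 used by the
# partial-parallel schedule (CNUs work on {h00, h01} and {h10, h11};
# BNUs work on {h00, h10} and {h01, h11}).
H = np.zeros((288, 576), dtype=np.uint8)  # placeholder for the real H
h00, h01 = H[:144, :288], H[:144, 288:]
h10, h11 = H[144:, :288], H[144:, 288:]
```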

Figure 5.2 The partition of parity-check matrix H

Figure 5.3 I/O pin of the decoder IP

Figure 5.4 The whole LDPC decoder architecture for the block LDPC code

The I/O pins of the decoder chip are shown in Figure 5.3. Figure 5.4 shows the block diagram of the decoder architecture; its modules are described in detail in the following. We adopt the partial-parallel architecture of [19], so the decoder can handle two codewords at a time.

Input Buffer [19]

The input buffer is a storage component that receives and holds channel values for iterative decoding. The channel values are fed into the COPY module during initialization and during BNU processing.

COPY, INDEX, and ROM modules

The parity-check matrix H is sparse, which means there are few ones in the matrix, so it is not worthwhile to store the whole matrix in memory. Instead, we use the module INDEX to keep the structural information of H. We take a simple example to explain how these modules work; Figure 5.5 shows the simple parity-check matrix.

Figure 5.5 A simple parity-check matrix example based on shifted identity matrices.

This parity-check matrix is composed of four sub-matrices, each of which is a right-circular-shifted identity matrix. The shift amounts are given in Figure 5.5. Since the parity-check matrix in this example is 8-by-8, we receive 8 channel values.

The channel values are assumed to be v = [v1 v2 v3 v4 v5 v6 v7 v8], and they are fed to the module "COPY". Figures 5.6(a) and 5.6(b) show how the modules "COPY", "INDEX", and "ROM" work. The outputs of the module "INDEX" are iv1, iv2, iv3, and iv4: they preserve the channel values and prepend the indices of the shift amounts. The indices of the shift amounts are stored in the module "ROM."
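The COPY/INDEX/ROM interaction can be sketched as follows. This is an assumed layout: each non-zero sub-matrix yields one indexed stream, and the (block_row, block_col, shift) entries below are illustrative, not taken from Figure 5.5.

```python
def copy_and_index(v, shifts, z):
    """Sketch of COPY + INDEX: the channel values are split into block
    columns of length z, COPY duplicates each block column once per
    non-zero sub-matrix in that column, and INDEX prepends the shift
    amount read from ROM.  `shifts` lists (block_row, block_col, shift)
    for the non-zero sub-matrices (hypothetical layout)."""
    out = []
    for (_, col, s) in shifts:
        block = v[col * z:(col + 1) * z]
        out.append([s] + block)   # shift index placed in front
    return out

v = [1, 2, 3, 4, 5, 6, 7, 8]                            # v1 .. v8
shifts = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 0)]   # illustrative
streams = copy_and_index(v, shifts, 4)
```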

Figure 5.6 (a) The sub-modules of the whole decoder

Figure 5.6 (b) The outputs of the module INDEX

The indices represent the shift amounts, i.e., the structural information of H, so we place the indices in front of the channel values.

SHUFFLE1, SHUFFLE2 modules

Before sending the values to the check-node update unit, we have to shuffle the values left so that they are in the correct positions for the check-node computation, and shuffle them back right before the bit-node computation. The shuffle amount is determined by the index numbers. Figures 5.7(a) and 5.7(b) show how the modules SHUFFLE1 and SHUFFLE2 work. In this example, (v2, v7), (v3, v8), (v4, v5), (v1, v6) are the input pairs of the check-node update unit.

Before sending the values to the bit-node update unit, we have to shuffle the values back so that the results are aligned correctly.
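The two shuffles can be sketched as a pair of inverse circular rotations, assuming the shuffle amount equals the sub-matrix shift index:

```python
def shuffle_left(values, s):
    """SHUFFLE1: circularly rotate a block left by its shift index s so
    that messages line up for the check-node computation (sketch)."""
    s %= len(values)
    return values[s:] + values[:s]

def shuffle_right(values, s):
    """SHUFFLE2: the inverse rotation, restoring the original order
    before the bit-node computation."""
    return shuffle_left(values, -s)

# round trip: shuffling left then right recovers the original order
block = ['v1', 'v2', 'v3', 'v4']
assert shuffle_right(shuffle_left(block, 3), 3) == block
```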

Figure 5.7(a) Values shuffling before sending to check-node update unit

Figure 5.7(b) Values shuffling before sending to bit-node update unit

CNU [15]

Check node update units (CNUs) are used to compute the check node equation.

The check-to-bit message $r_{m,l}$ for check node m and bit node l is computed by the CNU from the incoming bit-to-check messages $q_{m,l'}$ as follows:

$$r_{m,l} = \left( \prod_{l' \in L(m)\setminus l} \operatorname{sign}(q_{m,l'}) \right) \times \min_{l' \in L(m)\setminus l} |q_{m,l'}| \qquad (5.1)$$

where $L(m)\setminus l$ denotes the set of bit nodes connected to check node m except l. Figure 5.8(a) shows the architecture of the CNU using the min-sum algorithm. The check-node update unit has 6 inputs and 6 outputs. In Figures 5.8(a) and 5.8(b), the output of each "MIN" block is the minimum of its 2 inputs, so for each output the circuit finds the minimum magnitude among the other 5 inputs. This architecture is quite straightforward.
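The min-sum update of equation (5.1) for one 6-input check node can be sketched as:

```python
def cnu_min_sum(q):
    """Min-sum check-node update (equation 5.1): for each edge l, the
    outgoing message r_l is the product of the signs of all the OTHER
    incoming messages times the minimum of their magnitudes."""
    n = len(q)
    r = []
    for l in range(n):
        others = [q[k] for k in range(n) if k != l]
        sign = 1
        for x in others:
            if x < 0:
                sign = -sign
        r.append(sign * min(abs(x) for x in others))
    return r

# 6-input CNU, matching the unit described in the text
r = cnu_min_sum([2.0, -1.0, 3.0, -4.0, 0.5, 1.5])
```

Note that every output is either the overall minimum magnitude or, for the edge that itself carries the minimum, the second-minimum magnitude; this observation motivates the alternative implementation below.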

Figure 5.8(b) shows the architecture of the CNU using the proposed modified min-sum algorithm.

Figure 5.8(a) The architecture of CNU using min-sum algorithm

Figure 5.8(b) The architecture of CNU using modified min-sum algorithm

The other way to implement equation (5.1) is to search for the minimum and the second-minimum values among the inputs. Figure 5.9 shows the block diagram of the compare-select unit (CS6). The detailed architecture of CMP-6 in Figure 5.9 is illustrated in Figure 5.10; it consists of two kinds of comparators, CMP-2 and CMP-4. CMP-4 finds the minimum and the second-minimum values among its four inputs a, b, c, and d, while CMP-2 is a two-input comparator that is much simpler than CMP-4.
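The minimum/second-minimum search that the CMP-4/CMP-6 tree performs in parallel hardware can be sketched sequentially as:

```python
def two_smallest(values):
    """Find the minimum and second-minimum in one pass; this is the
    same information the CMP-4/CMP-6 comparator tree extracts."""
    m1 = m2 = float('inf')   # m1 = minimum, m2 = second minimum
    for x in values:
        if x < m1:
            m1, m2 = x, m1   # new minimum; old minimum becomes second
        elif x < m2:
            m2 = x
    return m1, m2
```

With m1 and m2 known, each CNU output magnitude is simply m1, except for the edge that holds m1 itself, which receives m2.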

Figure 5.9 Block diagram of CS6 module

Figure 5.10(a) Block diagram of CMP-4 module

Figure 5.10(b) Block diagram of CMP-6 module

The whole architecture of the 6-input CNU is shown in Figure 5.11.

Figure 5.11 CNU architecture using min-sum algorithm

Table 5.1 compares the hardware performance of the two CNU architectures. We call the architecture in Figure 5.8(a) the direct CNU architecture and the architecture in Figure 5.11 the backhanded CNU architecture. The direct CNU architecture requires only 45% of the area of the backhanded CNU architecture, so we choose the direct CNU architecture.

Table 5.1 Comparison of direct and backhanded CNU architectures

                           Direct CNU architecture   Backhanded CNU architecture
Area (gate count)          0.52k                     1.16k
Speed (MHz)                100                       100
Power consumption (mW)     4.82                      10.85

BNU

Figure 5.12 shows the architecture of the bit-node update unit with 4 inputs. "SM" means the sign-magnitude representation and "2's" means the two's complement representation. For finding the minimum absolute value of two inputs, sign-magnitude representation is more suitable for hardware implementation than two's complement. In contrast, for addition, two's complement representation is more suitable than sign-magnitude.
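The representation conversions and the bit-node update can be sketched as follows; the 4-bit width and the sum-minus-self update form are illustrative assumptions, not the exact datapath of Figure 5.12.

```python
def sm_to_int(word, bits):
    """Interpret a sign-magnitude word (MSB = sign) as a signed
    integer; the hardware converts to two's complement like this
    before the additions."""
    sign = (word >> (bits - 1)) & 1
    mag = word & ((1 << (bits - 1)) - 1)
    return -mag if sign else mag

def int_to_sm(value, bits):
    """Inverse conversion, used before the compare/minimum stages."""
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)

def bnu(channel, r):
    """Sketch of a bit-node update: each outgoing message is the
    channel value plus the sum of the OTHER incoming check messages,
    computed as (total sum) - (own message)."""
    total = channel + sum(r)
    return [total - x for x in r]
```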

Figure 5.12 The architecture of the bit node updating unit with 4 inputs

MMU0 and MMU1 [19]

Reference [19] introduces a partial-parallel decoder architecture that increases the decoder throughput with moderate decoder area. We adopt this partial-parallel architecture in our design and improve the message memory units.

Message memory units (MMUs) store the message values generated by the CNUs and BNUs. To increase the decoding throughput, two MMUs are employed to process two different codewords concurrently in the decoder. The register exchange scheme based on four sub-blocks (RE-4B) is proposed, as shown in Figure 5.13(a). In the MMU, sub-blocks A, B, and D capture the outputs from the CNUs while sub-blocks C and D deliver the message data to SHUFFLE2. The detailed timing diagrams of MMU0 and MMU1 are illustrated in Figure 5.13(b), where hxy(0) denotes the copied message of codeword 0 and hxy(1) that of codeword 1.
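The two-MMU ping-pong idea can be sketched in miniature; this is an assumed alternating schedule for illustration only, not the exact RE-4B timing of Figure 5.13(b).

```python
def schedule(cycles):
    """Toy ping-pong schedule: in each cycle, one codeword's messages
    feed the CNU stage while the other codeword's messages feed the
    BNU stage, so neither pipeline ever stalls (assumed schedule)."""
    log = []
    for t in range(cycles):
        cnu_codeword = t % 2         # codeword served by the CNUs
        bnu_codeword = (t + 1) % 2   # codeword served by the BNUs
        log.append((t, cnu_codeword, bnu_codeword))
    return log

log = schedule(4)
```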

Figure 5.13(a) The architecture of RE-4B based MMU

Figure 5.13(b) The timing diagram of the message memory units

During the iterative decoding procedure, MMU0 and MMU1 pass messages to each other through the SHUFFLE1, CNU, SHUFFLE2, and BNU modules. Disregarding the combinational circuits, the detailed relationship between MMU0 and MMU1 and snapshots of the message passing are shown in Figure 5.14.

Figure 5.14 The message passing snapshots between MMU0 and MMU1