Simulation and DSP Implementation of LDPC Encoder and Decoder

5.2 Performance in AWGN Channel with Fixed-Point Processing

In the preceding section, we selected the offset BP-based algorithm for conversion from floating-point to fixed-point processing. The original floating-point values are multiplied by 1000 and rounded to integers, and 12 bits are used to represent the result. Note that we only change the number of bits at the decoder input: the integer part is fixed at 4 bits and the number of fraction bits is varied. The intermediate results of the decoding computation are still kept at 16-bit precision.
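The split into a fixed 4-bit integer part and a variable fraction part can be illustrated with a minimal sketch. The function names, the power-of-two scaling, the saturation behaviour and the choice to count the sign bit inside the 4 integer bits are all assumptions made for illustration; the sketch does not reproduce the multiply-by-1000 scaling described above.

```c
#include <stdint.h>

/* Hypothetical helper: quantize one decoder-input value to a signed
 * fixed-point word of total_bits bits (5..12 in this section).
 * Assumption: the 4-bit integer part includes the sign bit, and the
 * remaining total_bits - 4 bits hold the fraction. */
int16_t quantize_input(double x, int total_bits)
{
    const int  frac_bits = total_bits - 4;               /* 1..8 fraction bits */
    const long max_code  = (1L << (total_bits - 1)) - 1; /* largest code       */
    const long min_code  = -(1L << (total_bits - 1));    /* smallest code      */

    double scaled = x * (double)(1L << frac_bits);       /* scale by 2^frac    */
    long code = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5); /* round  */

    if (code > max_code) code = max_code;                /* saturate high      */
    if (code < min_code) code = min_code;                /* saturate low       */
    return (int16_t)code;
}

/* Convert a quantized code back to a real value for inspection. */
double dequantize_input(int16_t code, int total_bits)
{
    return (double)code / (double)(1L << (total_bits - 4));
}
```

With 5 bits only one fraction bit remains, so the input resolution is 0.5; with 12 bits it is 1/256. In both cases the representable range stays roughly between -8 and 8 under these assumptions.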

In Fig. 5.6, we compare the performance when the number of bits used at the decoder input ranges from 5 to 12, for offset BP-based decoding at rate 1/2, length 576, and three different modulations.

When we use 8 to 12 bits, the BER curves are almost the same for QPSK, 16QAM and 64QAM. For QPSK, the BER curve with 5 bits is still within our acceptable bound, but for 16QAM and 64QAM, 6 bits is the lowest acceptable limit. Table 5.3 shows the coding gain between floating-point and fixed-point.

[Plot legend: LDPC code with offset BP-based decoding (β = 0.125, rate 1/2, length 576, 20 iterations) versus the uncoded modulation, for QPSK, 16QAM and 64QAM.]

Figure 5.6: LDPC decoding performance at different bit numbers with different modulations employing fixed-point computation.

[Plot legend: LDPC code with offset BP-based decoding (β = 0.125, QPSK, length 576, 20 iterations) at rate 1/2 and at rate 5/6, each versus uncoded QPSK.]

Figure 5.7: LDPC decoding performance at different bit numbers at two different coding rates employing fixed-point computation.

[Plot legend: LDPC code with offset BP-based decoding (β = 0.125, QPSK, rate 1/2, 20 iterations) at length 576 and at length 2304, each versus uncoded QPSK.]

Figure 5.8: LDPC decoding performance at different bit numbers at two different codeword lengths employing fixed-point computation.

In Fig. 5.7, we compare the performance between coding rates 1/2 and 5/6. At coding rate 5/6 with 7 to 12 bits, the SNR must be at least 6.5 dB for the performance to stay better than uncoded QPSK. 6 bits is the lowest acceptable setting, for which the SNR must exceed 7 dB to stay better than uncoded QPSK.

In Fig. 5.8, we compare the performance between codeword lengths 576 and 2304. As discussed above, 5 bits is not enough when the length is 576. For long codewords, however, the BP-based algorithm is optimum, and the performance at codeword length 2304 is very good. At this length we can therefore also use 5 bits to implement the decoder; the performance is only about 1 dB worse than with 6 bits.

5.2.1 Profile of the DSP code

Encoder

First, we optimize our code and show its profile. For codeword length 576 and code rate 1/2, the original code needs 21715443 cycles to encode one block, which is far too slow. As discussed in Chapter 2, the LDPC encoder needs to compute the shift sizes and perform the circular shifts. Encoding one block at rate 1/2 and codeword length 576 uses the circular shift 1002 times. As Table 5.4 shows, 96.3% of the execution time is spent on the circular shift. Fig. 5.9 shows the C code of the circular shift, and Figs. 5.10 and 5.11 show the corresponding assembly code.

For every codeword length, the value of "ZZ" is known, so we can calculate the circular shift values and compute the circularly shifted matrices ourselves in advance. We therefore remove the run-time C code for the circular shift, precompute p(f, i, j), and write the circularly shifted matrices into a table, from which the encoder reads them. As Table 5.5 shows, the encoder then needs only 812491 cycles to encode one block.
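The idea of replacing run-time circular shifts by a precomputed table can be sketched as follows. This is a minimal illustration, not the code of Fig. 5.9: the function names, the table layout (index permutations rather than stored matrices), the shift direction and the floor-scaling rule with base expansion factor 96 are assumptions, and the rule that actually applies to each code rate should be taken from the standard.

```c
#define Z_MAX 96   /* assumed largest expansion factor (codeword length 2304) */

/* Assumed shift scaling: p(f,i,j) = floor(p(i,j) * zf / 96) for p > 0.
 * Zero and negative entries are passed through unchanged. */
int scale_shift(int p, int zf)
{
    return (p <= 0) ? p : (p * zf) / Z_MAX;
}

/* Built once per codeword length: table[s][k] is the source index that
 * lands at position k after a circular shift by s positions. */
void build_shift_table(int zf, unsigned char table[][Z_MAX])
{
    for (int s = 0; s < zf; ++s)
        for (int k = 0; k < zf; ++k)
            table[s][k] = (unsigned char)((k + s) % zf);
}

/* Encoder inner step: the shift is applied by table lookup only, so no
 * modulo arithmetic or shift-size computation remains at run time. */
void shift_block(const unsigned char *in, unsigned char *out,
                 int zf, const unsigned char *row)
{
    for (int k = 0; k < zf; ++k)
        out[k] = in[row[k]];
}
```

For codeword length 576 the sub-block size would be 24 under the assumption that the codeword consists of 24 sub-blocks, so a call such as `build_shift_table(24, table)` is made once at start-up and `shift_block(in, out, 24, table[s])` replaces each run-time shift by s.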

Table 5.4: Original Profile of LDPC Encoder (Cycles)

Areas            Cycles     Percentage (%)   Processing Rate (kbits/sec)
LDPC Encoder     21684592   100              13.3
Circular Shift   20881567   96.3

Table 5.5: Profile of LDPC Encoder with Matrix Table (Cycles)

Areas                       Cycles     Processing Rate (kbits/sec)   Improvement (%)
LDPC Encoder (Original)     21684592   13.3                          N/A
LDPC Encoder (with Table)   812491     354.5                         96.3
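The processing rates and the improvement figure in Tables 5.4 and 5.5 can be reproduced from the cycle counts with a short calculation. The two constants below are assumptions inferred from the table values, not stated in this section: 288 information bits per block (rate 1/2, length 576) and a 1 GHz DSP clock.

```c
/* Assumed constants, inferred from the table values. */
#define INFO_BITS 288.0   /* information bits per encoded block */
#define CLOCK_HZ  1.0e9   /* assumed DSP clock frequency in Hz  */

/* Processing rate in kbits/sec for a given cycle count per block. */
double processing_rate_kbps(double cycles)
{
    return INFO_BITS * CLOCK_HZ / cycles / 1000.0;
}

/* Relative reduction of the cycle count, in percent. */
double improvement_percent(double cycles_before, double cycles_after)
{
    return 100.0 * (cycles_before - cycles_after) / cycles_before;
}
```

Under these assumptions, 21684592 cycles correspond to about 13.3 kbits/sec and 812491 cycles to about 354.5 kbits/sec, and the cycle count drops by about 96.3%.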

Table 5.6: Profile of LDPC Encoder with Different Coding Rates

Coding Rate              1/2      2/3A     2/3B     3/4A     3/4B     5/6
Cycles                   812491   482041   477639   316731   319386   1748774
Processing Rate (kbps)   354.5    597.5    603.1    909.3    901.7    1646.9

Figure 5.9: The C code of the circular shift.

Table 5.7: Profile of LDPC Decoder with Different Coding Rates

Coding Rate              1/2        2/3A       2/3B       3/4A       3/4B       5/6
Cycles                   37714286   56177704   58141064   76294841   85273741   93146880
Processing Rate (kbps)   7.6        5.1        5.0        3.8        3.4        3.1

Table 5.6 shows the encoder profile for different coding rates when the codeword length is 576.

In this case, a lower coding rate requires more cycles, because more parity bits must be computed and the computational complexity is higher.

Decoder

In this subsection, we show the profile of the LDPC decoder for codeword length 576 and 20 iterations. Table 5.7 shows the execution speed and the processing rate of our LDPC decoder on the DSP. As expected, the LDPC decoder is much more complex than the encoder.

In our code, at coding rate 1/2 and codeword length 576, one iteration needs the following numbers of loop passes:

• 288*N(m)*N(m).

• 576*M(n)*M(n).

• 576*M(n).

Figure 5.10: The assembly code of the circular shift (1/2).

Figure 5.11: The assembly code of the circular shift (2/2).

Figure 5.12: The C code for computing from check nodes to bit nodes.

Fig. 5.12 shows the C code that computes the values passed from check nodes to bit nodes. There are 576*M(n)*M(n) loops in this code. Here 576 is the number of bit nodes and 288 is the number of check nodes; M(n) denotes the number of check nodes connected to bit node n, and N(m) denotes the number of bit nodes connected to check node m. The number of loops in one iteration therefore depends on M(n) and N(m). At coding rate 1/2, N(m) is 6 or more and M(n) is 3 or more, so one iteration has at least 288 * 6 * 6 + 576 * 3 * 3 + 576 * 3 = 17280 loops. Each loop computes a bit-node or check-node value, reads values from memory, and incurs a few stall or NOP cycles. We approximate this as 90 cycles per loop, so one iteration needs about 90 * 17280 = 1555200 cycles, and 20 iterations need about 31104000 cycles.

Table 5.8: Final Profile of LDPC Code (Code Size)

          Code Size (bytes)
Encoder   6028
Decoder   2688
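The cycle estimate for the decoder can be checked with a few lines of C. The function names are hypothetical; the node counts, the degrees N(m) = 6 and M(n) = 3, and the figure of 90 cycles per loop are the approximations used in the estimate for rate 1/2 and length 576.

```c
/* Loop passes in one iteration:
 * checks*N(m)*N(m) + bits*M(n)*M(n) + bits*M(n). */
long loops_per_iteration(long checks, long bits, long n_m, long m_n)
{
    return checks * n_m * n_m + bits * m_n * m_n + bits * m_n;
}

/* Total cycles for a given loop count, cost per loop and iteration count. */
long estimate_cycles(long loops, long cycles_per_loop, long iterations)
{
    return loops * cycles_per_loop * iterations;
}
```

Calling `loops_per_iteration(288, 576, 6, 3)` gives 17280, and `estimate_cycles(17280, 90, 20)` gives 31104000, which is of the same order as the 37714286 cycles measured for rate 1/2 in Table 5.7.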

Figs. 5.13, 5.14 and 5.15 show the assembly code that computes the values passed from check nodes to bit nodes. The parallelism is not good: in our code, the values at the check nodes and bit nodes are read from memory many times. Each memory read costs several cycles and reduces the parallelism of the code.
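To make the N(m)*N(m) loop structure and the repeated memory reads concrete, here is a minimal sketch of an offset BP-based check-node update. It is not the code of Fig. 5.12: the function name, the message layout (one array entry per edge) and the use of `float` instead of the fixed-point format are assumptions made for readability. The offset 0.125 is the β value quoted in the figure legends.

```c
#define BETA 0.125f   /* offset value quoted in the figure legends */

/* Naive update for one check node of degree deg.
 * in[j]  : incoming bit-to-check messages (assumed layout)
 * out[k] : outgoing check-to-bit messages
 * For every outgoing edge k, all other incoming edges are read again,
 * which gives deg*deg inner passes and repeated memory reads. */
void check_node_update(const float *in, float *out, int deg)
{
    for (int k = 0; k < deg; ++k) {
        float min_mag = 1.0e30f;   /* running minimum magnitude   */
        int negative = 0;          /* parity of the sign product  */
        for (int j = 0; j < deg; ++j) {
            if (j == k) continue;              /* exclude the target edge */
            float v = in[j];                   /* memory read per pass    */
            float mag = (v < 0.0f) ? -v : v;
            if (v < 0.0f) negative ^= 1;
            if (mag < min_mag) min_mag = mag;
        }
        float mag_out = min_mag - BETA;        /* subtract the offset */
        if (mag_out < 0.0f) mag_out = 0.0f;    /* clip at zero        */
        out[k] = negative ? -mag_out : mag_out;
    }
}
```

With a check-node degree of 6 or more, the inner loop reads each incoming message five or more times per update, which is one plausible source of the memory traffic visible in the assembly listing.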

Table 5.8 shows only the code size of the encoder and decoder, excluding the randomizer, interleaver and modulator.

Figure 5.13: The assembly code for computing from check nodes to bit nodes (1/3).

Figure 5.14: The assembly code for computing from check nodes to bit nodes (2/3).

Figure 5.15: The assembly code for computing from check nodes to bit nodes (3/3).

Figure 5.16: Software pipeline information for LDPC decoder.

Chapter 6

From the document: Research on IEEE 802.16e OFDMA Channel Coding Techniques and Digital Signal Processor Implementation (pp. 96-111)