
Chapter 4 Solving Memory Collision Problem For Parallel Turbo Decoder

4.4 Proposed Memory Contention Free Scheme for Parallel Turbo Decoder

4.4.4 Simulation and Experiment Results

Since quantization is a trade-off between coding performance loss and hardware cost, the fixed-point format can be determined via Monte-Carlo simulation. The primary specifications of the Turbo decoder are given in Table 4-1, where the code polynomial follows the Consultative Committee for Space Data Systems (CCSDS) standard [5]. Fig. 4-7 shows the Turbo decoding performance after 8 iterations for different sliding window lengths (SW) and fixed-point bit-widths. The curve for (SW=32 & fixed point) has the smallest performance loss relative to the floating-point simulation. On the other hand, the curve for (SW=32 & fixed point & 4-bit non-linear encoded [25]) shows a larger performance loss relative to the floating-point simulation, but requires less extrinsic memory in the Turbo decoder.
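The fixed-point formats compared above can be emulated in a Monte-Carlo simulation with a saturating uniform quantizer. The following is a minimal sketch, assuming a signed two's-complement format; the function name `quantize` and the parameters `total_bits` and `frac_bits` are hypothetical, and the actual bit-widths are those of Table 4-1, which are not reproduced here.

```python
import math


def quantize(value, total_bits, frac_bits):
    """Map a real value onto a signed fixed-point grid with saturation.

    total_bits and frac_bits are illustrative parameters (assumptions),
    not the bit-widths used in the thesis.
    """
    scale = 1 << frac_bits                     # number of steps per unit
    max_code = (1 << (total_bits - 1)) - 1     # largest two's-complement code
    min_code = -(1 << (total_bits - 1))        # smallest two's-complement code
    code = math.floor(value * scale + 0.5)     # round to the nearest step
    code = max(min_code, min(max_code, code))  # saturate instead of wrapping
    return code / scale
```

Applying such a quantizer to the channel values and extrinsic values inside the simulation loop, and comparing the resulting BER against the floating-point run, is one way to expose the performance loss of a candidate format.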

Our proposed contention-free algorithm can be used for parallel Turbo decoders with an arbitrary number of SISO decoders and with high-radix VLSI architectures. Here, we use the specification of Table 4-1 and obtain contention-free memory mappings for 8, 16 and 32 SISO decoders.

Table 4-1 Summary of parameters for Turbo code simulation

Fig. 4-7 BER performance of the Turbo decoder

Fig. 4-8 shows how the cost function of the contention-free algorithm evolves for various parallel Turbo decoder configurations. The algorithm terminates when the cost function reaches zero. The solution (color sets) then assigns the nodes (extrinsic values) to memory banks such that no node is involved in a conflicting access. Figs. 4-9 to 4-11 show the solutions of the contention-free algorithm for the 8-, 16- and 32-parallel Turbo decoders, where the horizontal axis denotes the time index (i.e., the order of the output sequence of a SISO decoder) and the vertical axis corresponds to the position of each of the P SISO decoders in the Turbo decoder. Within each column, every entry is drawn in a different color, so no two SISO decoders access the same memory bank at the same time index. Thus, the proposed algorithm achieves contention-free memory access for these configurations.
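The zero-cost condition can be illustrated with a small checker. This is only a sketch under stated assumptions, not the cost function defined in this thesis: it assumes that the frame is split into equal sub-blocks, that SISO decoder p accesses address p·W + t at time index t in natural order and the interleaved image of that address in interleaved order, and that the cost is the number of surplus accesses per bank per time index. The names `conflict_cost`, `bank` and `interleaver` are hypothetical.

```python
from collections import Counter


def conflict_cost(bank, interleaver, num_siso):
    """Count memory-bank conflicts of a bank assignment (illustrative cost).

    bank[a]        -- bank (color) assigned to extrinsic address a
    interleaver[a] -- interleaved address of a (assumed access model)
    num_siso       -- number of parallel SISO decoders, P
    """
    n = len(bank)
    if n % num_siso != 0:
        raise ValueError("frame length must be a multiple of num_siso")
    window = n // num_siso  # sub-block length W handled by each SISO decoder
    cost = 0
    for t in range(window):
        natural = [p * window + t for p in range(num_siso)]
        interleaved = [interleaver[a] for a in natural]
        for addresses in (natural, interleaved):
            hits = Counter(bank[a] for a in addresses)
            # every access beyond the first to the same bank is one conflict
            cost += sum(count - 1 for count in hits.values())
    return cost
```

A return value of zero means that, under this access model, the P decoders reach P distinct banks at every time index in both access orders, which corresponds to all-distinct colors in each column of the solution figures.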

In general, the circuit function needs to be checked by functional verification after synthesis or on an FPGA (Field Programmable Gate Array) platform. Because IC manufacturing is expensive, an FPGA provides a cheaper, programmable and reconfigurable way to verify a circuit. The FPGA platform also produces real electrical signals that can interact with other system platforms (e.g., ARM) or be measured with an oscilloscope. Fig. 4-12 shows the output signals of the Turbo decoder on the Xilinx Virtex-IV XC4VLX60 FPGA. For functional verification, we first store all output signals in a text file and then compare the output values of the golden model from MATLAB® with the output values of the FPGA, as shown in Fig. 4-13.

When the error signal is raised, the golden model and the FPGA output signals differ. Otherwise, the output signals of the FPGA platform are correct.
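The comparison against the golden model can be sketched as follows. This is a minimal illustration, assuming that both the golden model and the FPGA dump are text files of whitespace-separated integers; the dump format and the helper names `parse_dump` and `find_mismatches` are assumptions and are not taken from the thesis.

```python
def parse_dump(text):
    """Parse a text dump of whitespace-separated integers (assumed format)."""
    return [int(token) for token in text.split()]


def find_mismatches(golden, fpga):
    """Return the indices at which the FPGA output differs from the golden model.

    Samples missing from the shorter sequence are reported as mismatches too.
    """
    mismatches = [i for i, (g, f) in enumerate(zip(golden, fpga)) if g != f]
    shorter = min(len(golden), len(fpga))
    longer = max(len(golden), len(fpga))
    mismatches.extend(range(shorter, longer))
    return mismatches
```

An empty result plays the role of an error signal that is never raised; any returned index points to the first samples worth inspecting in the waveform.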

We have simulated and verified the design logic by comparing the output results with the MATLAB® fixed-point simulation, and performed synthesis targeting UMC 130 nm CMOS technology with Synopsys® Design Compiler.

Fig. 4-14 shows the proposed architecture of the contention-free parallel Turbo decoder, which mainly consists of multiple double input buffers, SISO decoders, Look-Up Tables (LUTs) and some control circuits. Each SISO core consists of three recursion units, for acquisition, forward recursion and backward recursion, which require additional controllers for the state and branch metric memories. We assume that eight iterations are performed for Turbo decoding and that the clock rate is set to 200 MHz. Two input memory banks are used so that the decoder can continuously decode noisy codewords of successive frames [20], and the extrinsic storage employs P distinct memories to achieve memory-collision-free access. The resulting collision-free mapping is stored in the LUT memories: one LUT memory is used by the arbitrator device, and the other by the decision device. Finally, a control circuit is employed to make the design more flexible.


Table 4-2 lists the area requirement of the proposed collision-free parallel Turbo decoder implementation for various numbers of SISO decoders. A highly parallel Turbo decoder has a larger total area, but its throughput is correspondingly higher than that of a less parallel one. In practice, a hardware implementation should choose the parallelism parameter P that meets the throughput requirement with minimum area. In either case, the proposed algorithm supports an arbitrary parallelism P, so that no memory conflict degrades the overall throughput of the Turbo decoder.
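The selection rule for P can be written down as a short helper. This is a sketch only: the function name `choose_parallelism` and the dictionary layout are hypothetical, and the area and throughput figures must be taken from synthesis results such as Table 4-2, which are not reproduced here.

```python
def choose_parallelism(candidates, required_throughput):
    """Pick the smallest-area P whose throughput meets the requirement.

    candidates maps each P to a pair (area, throughput) in consistent units
    (hypothetical layout). Returns None if no candidate is fast enough.
    """
    feasible = {
        p: area
        for p, (area, throughput) in candidates.items()
        if throughput >= required_throughput
    }
    if not feasible:
        return None
    # among the feasible options, the minimum area wins
    return min(feasible, key=feasible.get)
```

Because the contention-free mapping exists for any P, the candidate set is limited only by the configurations that have actually been synthesized.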

Fig. 4-8 The change of cost functions of contention free algorithm for various parallel Turbo decoder applications.

Fig. 4-9 The solution of contention free algorithm for 8-parallel Turbo decoder

Fig. 4-10 The solution of contention free algorithm for 16-parallel Turbo decoder

Fig. 4-11 The solution of contention free algorithm for 32-parallel Turbo decoder

Fig. 4-12 The VLSI architecture implementation of Turbo decoder in the FPGA platform

Fig. 4-13 The comparison of the output values of golden model from MATLAB® with the output values of FPGA

Fig. 4-14 The block diagram for the proposed contention free parallel Turbo decoder

Table 4-2 Parallel Turbo decoder area and throughput for various numbers of SISO decoders at clock frequency 200MHz.

4.5 An Approach for Reducing Memory Area of Parallel
