VLSI Implementation - Complexity Analysis, Simulation Results, and Implementation

Chapter 5 Complexity Analysis, Simulation Results, and Implementation

5.4 VLSI Implementation

In this section, we show how to implement a multi-mode MIMO detector using the proposed GPGI-T1 algorithm. We replace the block diagram of the GPGI framework in Fig. 3.2 with that of our proposed GPGI-T1 algorithm in Fig. 5.13.


In addition, the channel matrix H stays the same within each frame, so the variables that depend only on H need to be computed just once per frame. We therefore divide the detection flow of the GPGI-T1 algorithm into two parts: a preprocessing part and a decision part. The preprocessing part is computed only once while the channel matrix H is unchanged, and the decision part operates in every symbol period. In this thesis, we implement only the decision part, with the maximum number of iterations I_max set to zero. The block diagram of the GPGI-T1 implementation is shown in Fig. 5.14.
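The split between per-frame preprocessing and per-symbol decision can be sketched in a few lines of Python. This is only an illustration of the control flow: a simple zero-forcing detector for a (2,2) system with QPSK inputs stands in for GPGI-T1, and the names `preprocess`, `decide`, `slice_qpsk`, and `detect_frame` are hypothetical.

```python
# Illustrative sketch only: zero-forcing stands in for GPGI-T1 here.
# All function names are hypothetical and not taken from the thesis.

def preprocess(H):
    """Run once per frame: invert the 2x2 complex channel matrix H."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def slice_qpsk(z):
    """Map z to the nearest QPSK point with components in {-1, +1}."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

def decide(W, y):
    """Run every symbol period: reuse W, then slice each stream."""
    return [slice_qpsk(W[i][0] * y[0] + W[i][1] * y[1]) for i in range(2)]

def detect_frame(H, received):
    """Detect all received vectors of one frame that share the same H."""
    W = preprocess(H)                        # H-dependent work, done once
    return [decide(W, y) for y in received]  # per-symbol work
```

Only `decide` has to run at the symbol rate, which is the reason the decision part is the natural target for a hardware pipeline.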

Fig. 5.14: Block diagram of the implementation of the GPGI-T1 algorithm.

Moreover, we aim to design a multi-mode GPGI-T1 detector that works under many practical conditions, including (2,2) and (4,4) MIMO systems with QPSK, 16-QAM, and 64-QAM inputs. The detector also has a power-aware feature in the (4,4) MIMO system. Before designing the hardware architecture, we simulate the BER performance of the floating-point GPGI-T1 algorithm and the modified fixed-point GPGI-T1 algorithm in the (4,4) MIMO system with 64-QAM inputs, which is the critical mode in our implementation, as shown in Fig. 5.15. The modified GPGI-T1 algorithm changes the MED function from the 2-norm to the 1-norm. The results show that GPGI-T1(2,8,0) is a good candidate setting for the trade-off between BER performance and computational complexity. Therefore, we set the maximum list length to eight and the maximum input word length to ten bits in the multi-mode GPGI-T1 detector. Table 5.3 illustrates how the multi-mode BER performance is attained by adjusting the list length.
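To illustrate the change of the MED function from the 2-norm to the 1-norm, the following Python sketch compares the two metrics on complex residual vectors. The exact form of the 1-norm used in the chip is not given in this section, so the sum of absolute real and imaginary parts is an assumption, and the function names are hypothetical.

```python
# Illustrative sketch only: metric forms and names are assumptions.

def metric_l2(residual):
    """Squared 2-norm: two real multiplications per complex component."""
    return sum(r.real * r.real + r.imag * r.imag for r in residual)

def metric_l1(residual):
    """1-norm approximation: only absolute values and additions."""
    return sum(abs(r.real) + abs(r.imag) for r in residual)

def pick_min(residuals, metric):
    """Index of the candidate whose residual has the smallest metric."""
    return min(range(len(residuals)), key=lambda i: metric(residuals[i]))

# The two metrics usually agree, but not always:
candidates = [[3 + 0j], [2 + 2j]]
print(pick_min(candidates, metric_l2))  # squared 2-norm: 9 vs 8
print(pick_min(candidates, metric_l1))  # 1-norm: 3 vs 4
```

The 1-norm removes the multipliers from the distance calculation at the cost of occasionally ranking candidates differently, which is why the BER of the modified algorithm is checked by simulation before the hardware design.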

The pipeline architecture of the multi-mode GPGI-T1 detector is depicted in Fig. 5.16.

The two-group input buffers are used to store inputs and process data simultaneously.
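The role of the two-group input buffers can be modeled in software as a double (ping-pong) buffer. The sketch below is a behavioral model under that assumption, not a description of the actual circuit; the class `PingPongBuffer` and the function `run` are hypothetical names.

```python
# Behavioral sketch only: assumes the two buffer groups alternate roles.

class PingPongBuffer:
    def __init__(self):
        self.banks = [None, None]
        self.write_bank = 0

    def store(self, block):
        """Write the incoming block into the bank reserved for input."""
        self.banks[self.write_bank] = block

    def read(self):
        """Return the block held in the other bank (the one being processed)."""
        return self.banks[self.write_bank ^ 1]

    def swap(self):
        """Exchange the roles of the two banks."""
        self.write_bank ^= 1

def run(blocks, process):
    """In each step, process the previous block while storing the new one."""
    buf = PingPongBuffer()
    results = []
    for block in list(blocks) + [None]:  # one extra step drains the last block
        ready = buf.read()
        if ready is not None:
            results.append(process(ready))
        buf.store(block)
        buf.swap()
    return results
```

In hardware the store and the processing happen in the same clock period; the sequential loop above only models which bank plays which role in each step.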

The Pre-U and Pre-B-Chase parts, implemented with multiply-accumulate (MAC) units, compute the variables that are reused by the B-Chase and DF&MED parts. The B-Chase part is divided into four stages. The first two stages are in charge of the parallel search and the Euclidean distance calculation. The last two stages form the sorting network, which is implemented with a bitonic sort. The DF&MED part is divided into four stages: interference cancellation (IC), decision feedback 1 (DF1), decision feedback 2 (DF2), and minimum Euclidean distance (MED). The IC stage cancels the interference from the two symbols obtained by B-Chase. The DF1 and DF2 stages decide the other two symbols and calculate the temporary variable for the Euclidean distance calculation. The MED stage calculates the final Euclidean distance and, after comparison, stores the symbols with the minimum Euclidean distance.
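As a software reference for the sorting network, the following Python sketch implements a generic bitonic sort for a power-of-two number of inputs, such as the maximum list length of eight. It is a textbook formulation, not the two-stage pipelined circuit of the chip, and the name `bitonic_sort` is hypothetical.

```python
# Generic bitonic sorting network, written as a software reference model.

def bitonic_sort(values):
    """Return the values in ascending order; the length must be a power of two."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Compare-exchange: swap when the pair violates the
                    # direction required for this part of the network.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

All compare-exchange operations inside one pass of the inner `for` loop touch disjoint pairs, so in hardware they can run in parallel, which makes this network a natural fit for a pipelined detector.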

Fig. 5.15: BER performance of the GPGI-T1 algorithm and B-Chase algorithm in (4,4) MIMO system with 64-QAM inputs.

Table 5.3: Performance selection by choosing different list lengths

Antenna           4×4
Modulation        QPSK                            16-QAM                          64-QAM
BER performance   Close to optimal / Sub-optimal  Close to optimal / Sub-optimal  Close to optimal / Sub-optimal
List length       2 / 1                           4 / 1                           8 / 1

Fig. 5.16: Pipeline architecture of the multi-mode GPGI-T1 detector.

For the chip implementation, we adopt a cell-based design flow with the Artisan standard cell library, and the multi-mode GPGI-T1 detector is implemented in the TSMC 0.18-µm CMOS process. Synopsys Design Compiler is used to synthesize the RTL design of the proposed detector, Cadence SOC Encounter is used for placement and routing (P&R), and Synopsys PrimePower is used to analyze the power consumption. The active chip layout area of the proposed multi-mode GPGI-T1 detector, shown in Fig. 5.17, is 1.41 mm × 1.39 mm. Table 5.4 summarizes the chip characteristics of the multi-mode GPGI-T1 detector, and Table 5.5 summarizes the supported modes and their respective power consumption. The chip can work in nine modes, of which three belong to the (2,2) system and six to the (4,4) system. The multi-mode functions of the GPGI-T1 detector have been verified by post-layout simulation, as shown in Fig. 5.18.

Table 5.6 provides a comprehensive comparison of the relevant ASIC implementations for MIMO detection. In [23] and [24], the BER performance of the implemented algorithms is optimal and close to optimal, respectively, but the power consumption is large. An implementation of the BODF algorithm by the square-root method [9], which shows low computational complexity but poor BER performance, was proposed in [25]. The above chip design [25], which includes the preprocessing part, has better

defined as the ratio of the normalized throughput to the normalized power. Our implementation of the GPGI-T1 algorithm has the best power efficiency among the compared designs. For example, in the (4,4) MIMO system with 16-QAM inputs, our design shows seven times the power efficiency of the implementation in [24].

Furthermore, our design combines low-complexity computation and multi-mode operation with better power efficiency than the other reference designs.

Fig. 5.17: Chip layout of the multi-mode GPGI-T1 detector.

Table 5.4: Chip characteristics of the multi-mode GPGI-T1 detector

Process technology   TSMC 0.18-µm CMOS

Table 5.5: Supported modes of the GPGI-T1 chip implementation

Antenna            2×2                    4×4
Throughput (Mbps)  50     100    150      100      200      300
Power (mW)         79     95     126      118/116  137/130  177/161
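As a quick reading aid for Table 5.5, the short Python sketch below computes the raw throughput per milliwatt of each listed mode. This is plain arithmetic on the table entries and does not apply the technology normalization used for the power-efficiency comparison in Table 5.6; the names `TABLE_5_5` and `mbps_per_mw` are hypothetical.

```python
# Raw throughput per milliwatt from the Table 5.5 entries (no normalization).
# Each row: (antenna, throughput in Mbps, power figures in mW as listed).
TABLE_5_5 = [
    ("2x2", 50, [79]),
    ("2x2", 100, [95]),
    ("2x2", 150, [126]),
    ("4x4", 100, [118, 116]),
    ("4x4", 200, [137, 130]),
    ("4x4", 300, [177, 161]),
]

def mbps_per_mw(table):
    """Return one (antenna, Mbps, mW, Mbps/mW) tuple per listed power figure."""
    rows = []
    for antenna, mbps, powers in table:
        for power in powers:
            rows.append((antenna, mbps, power, round(mbps / power, 3)))
    return rows

for antenna, mbps, power, ratio in mbps_per_mw(TABLE_5_5):
    print(f"{antenna}  {mbps:>3} Mbps  {power:>3} mW  {ratio:.3f} Mbps/mW")
```

The nine printed rows correspond to the nine supported modes, and the ratio grows with the throughput in both antenna configurations.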

Table 5.6: Comparison of ASIC implementations for MIMO detection

*Note that the implementation includes the preprocessing part. If the preprocessing part is removed, the power consumption will decrease greatly.

Chapter 6 Conclusion and Future Work

In this thesis, the GPGI framework, which generates many MIMO detection algorithms, has been presented. Based on the GPGI framework, we propose the GPGI-T1 detection algorithm, which trades off complexity against performance by adjusting the number of symbols detected first, the list length, and the maximum number of iterations. The GPGI-T1 detection algorithm significantly reduces the multiplication complexity while achieving BER performance comparable to that of existing detection algorithms. For example, in the (8,8) system with 16-QAM inputs, at the low-complexity end, GPGI-T1(4,1,3) reduces the multiplication complexity by 33.9% and outperforms BODF detection by 10 dB. At the high-performance end, GPGI-T1(4,16,3) and GPGI-T1(4,2,3) reduce the multiplication complexity by 21.5% and 39.3% at a penalty of 0.3 dB and 0.8 dB, respectively, compared with B-Chase(16) detection. With its low complexity, satisfactory BER performance, and parallel processing, the GPGI-T1 algorithm is suitable for modern high-speed communication systems. Based on the proposed GPGI-T1 algorithm, we implement a multi-mode MIMO detector in the TSMC 0.18-µm CMOS process. The resulting implementation supports QPSK, 16-QAM, and 64-QAM modulation modes, and can work in nine modes, where three and six modes
