Motivation and Problem Statement

Chapter 2 System Assumptions

2.2 Motivation and Problem Statement

When a large number of antennas and high-order QAM constellations are employed in MIMO-OFDM systems, designing a MIMO detector with acceptable complexity and near-optimal performance becomes a challenge. The maximum-likelihood detector (MLD) [3] in particular requires a prohibitive amount of computation, because it exhaustively searches all combinations of candidate symbols.

To overcome the complexity problem, the Variable and Overlapped Cluster-based MIMO algorithm restricts the constellation points that are expanded according to the pre-estimated signal in the N-QAM constellation, and then keeps only the K shortest paths among the candidates, which significantly reduces the search space and the computational complexity.

In this thesis, we compare the complexity and performance of the proposed algorithm with those of the well-known K-best SD. The K-best SD is among the most attractive MIMO detection algorithms in recent research because of its near-optimal performance and because its complexity, which is proportional to the number of transmit antennas, is lower than that of the optimal maximum-likelihood detector.

The aim of the Variable and Overlapped Cluster-based method is to provide a MIMO detection algorithm with near-ML performance and low complexity for large numbers of antennas and high-order QAM constellations.

Chapter 3 Variable and Overlapped Cluster-based MIMO Detection

3.1 Introduction

We first describe the basic idea of the cluster-based MIMO detection algorithm. It employs a standard linear detector, ZFD or MMSED, to estimate the N transmitted symbols, and then picks out the possible constellation points for each antenna. After the search space has been pruned in this way, the transmitted signal vector is detected by evaluating only the candidates left in the corresponding clusters.

One cluster-based method is the multilevel cluster-based MIMO detection algorithm, which partitions the transmitted MIMO signal vectors into clusters using the multilevel N-QAM structure in each dimension.

Figure 3.1 The multilevel cluster-based MIMO detection algorithm


Figure 3.2 (a) Example of multilevel partitions with mean symbols in 64-QAM constellation. (b) Example of multilevel cluster tree in 64-QAM constellation.

Because the clusters do not share candidates and each cluster holds a fixed number of candidates, the SNR loss becomes significant in some environments, such as low channel gain, where more candidates are needed. In addition, the algorithm insists on square clusters and hierarchical clustering. This is not a good choice, because it increases the probability of choosing the wrong cluster when the pre-estimated I/Q falls on the boundary between two neighboring clusters.

To overcome this problem, we devise a flexible clustering method that removes the exclusiveness of clusters by allowing more than one cluster to contain the same candidate constellation points. The simplest example is shown in Fig. 3.3(a) and (b): we increase the cluster diversity by adding one more cluster at the center of the original 4 clusters.

In this thesis, we abandon the square cluster shape and increase the cluster diversity substantially. The remainder of this chapter introduces our Variable and Overlapped Cluster-based MIMO Detection algorithm.


Figure 3.3 (a) Example of 4 clusters in 16-QAM constellation. (b) Example of 5 overlapped clusters in 16-QAM constellation.

3.2 Variable and Overlapped Cluster-based MIMO Detection

3.2.1 Steps of the Variable and Overlapped Cluster-based MIMO Detection

We propose the Variable and Overlapped Cluster-based MIMO Detection algorithm, which partitions the transmitted MIMO signal vectors into variable clusters around the estimated symbol in each dimension for 64-QAM/256-QAM, and obtains the result by comparing the received signal with all the resulting candidates. The proposed method consists of steps A), B) and C), which are described in the following.

Step A) contains the two pre-processing blocks of the proposed algorithm. One is the sorted QR decomposition, which computes the unitary matrix Q and the upper-triangular matrix R for later use in the SD algorithm. The other is a linear detector, such as ZFD or MMSED, which produces the pre-estimated signals.

Step 1 of B) is the Overlap Clustering Algorithm: it takes the signal estimated by the linear detector (ZFD or MMSED) and picks out the possible constellation points for each antenna according to the range in which the estimated signal lies.

After the Overlap Clustering Algorithm in step 1 of B), the Dynamic Clustering Algorithm enlarges or narrows the set of possible constellation points according to the column norm of H, which carries the channel gain information.

In step C), all the candidate signals are compared with the received signal by applying a breadth-first sphere decoder that keeps the best K candidates in the search space of the MMSE-SQRD. Finally, the detected signal with the smallest accumulated squared Euclidean distance is delivered.

Figure 3.4 The workflow of the Variable and Overlapped Cluster-based MIMO Detection (flowchart blocks include: Sorted-QRD; Pre-estimate; "Reach the end node?"; "Calculate the left Z with the survivor path x")

Figure 3.5 The graphic representation of the Variable and Overlapped Cluster-based MIMO Detection

3.2.2 Sorted QR Decomposition

The QR decomposition, which computes the unitary matrix Q and the upper-triangular matrix R, is commonly used as the preprocessing step of the SD algorithm.

To reduce the complexity of the SD algorithm, a common preprocessing approach prunes the search tree by sorting the streams so that stronger streams, in terms of effective SNR, correspond to levels closer to the root.

This is known as the sorted QR decomposition (SQRD) algorithm. It is basically an extension of the modified Gram-Schmidt procedure that reorders the columns of the channel matrix H by their norms, in ascending order, prior to each orthogonalization step. That is, SQRD makes the diagonal elements $R_{ii}$ as large as possible at the higher levels and therefore reduces the number of nodes visited during tree traversal.

In the sequel, we use a version of this heuristic algorithm adapted to MMSE detection (MMSE-SQRD) in both the K-best algorithm and the proposed algorithm.
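As a minimal sketch of the sorting idea described above, the following pure-Python routine applies modified Gram-Schmidt to a real-valued matrix and, before each orthogonalization step, moves the remaining column with the smallest residual norm to the front. It is an illustration under simplifying assumptions (real entries, full column rank, plain SQRD rather than the MMSE-SQRD variant used in this thesis). The function name `sorted_qr` and the returned permutation list are hypothetical choices made for this example.

```python
import math


def sorted_qr(H):
    """Sorted QR decomposition of a real matrix H (list of rows).

    Returns (Q, R, perm) such that column j of Q*R equals column perm[j]
    of H, Q has orthonormal columns and R is upper triangular.
    Assumes H has full column rank (no zero residual column).
    """
    m, n = len(H), len(H[0])
    cols = [[H[i][j] for i in range(m)] for j in range(n)]  # column-wise copy
    R = [[0.0] * n for _ in range(n)]
    perm = list(range(n))

    for i in range(n):
        # Choose the remaining column with the smallest residual norm.
        k = min(range(i, n), key=lambda l: sum(v * v for v in cols[l]))
        cols[i], cols[k] = cols[k], cols[i]
        perm[i], perm[k] = perm[k], perm[i]
        for row in R:  # keep R consistent with the column swap
            row[i], row[k] = row[k], row[i]

        R[i][i] = math.sqrt(sum(v * v for v in cols[i]))
        cols[i] = [v / R[i][i] for v in cols[i]]
        # Modified Gram-Schmidt: remove the q_i component from later columns.
        for l in range(i + 1, n):
            R[i][l] = sum(a * b for a, b in zip(cols[i], cols[l]))
            cols[l] = [b - R[i][l] * a for a, b in zip(cols[i], cols[l])]

    Q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R, perm
```

The permutation must be undone on the detected symbol vector after detection, since the decomposition is computed for the column-reordered channel matrix.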

3.2.3 Pre-estimating

Pre-estimation is a second preprocessing block, which estimates the transmitted signal vector; its computation is much less complex than that of the squared Euclidean distances. The transmitted signal vector can be estimated with the minimum mean-squared error (MMSE) approach as $\hat{x}_{MMSE} = (H^H H + \sigma^2 I)^{-1} H^H y$, where $\sigma^2$ is the noise variance and $I$ is the identity matrix, which requires very little computation.
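To make the MMSE pre-estimate concrete, the sketch below forms $H^H H + \sigma^2 I$ and $H^H y$ and solves the resulting linear system by Gaussian elimination in pure Python. It is only an illustration: the function name `mmse_estimate` and the list-of-rows matrix layout are assumptions of this example, and a hardware or production implementation would use a different solver.

```python
def mmse_estimate(H, y, noise_var):
    """Return x_hat = (H^H H + noise_var * I)^(-1) H^H y.

    H is a list of rows (real or complex entries), y a list of received
    samples. Assumes the regularized Gram matrix is non-singular.
    """
    m, n = len(H), len(H[0])
    # A = H^H H + noise_var * I (n x n) and b = H^H y (length n).
    A = [[sum(H[r][i].conjugate() * H[r][j] for r in range(m))
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(H[r][i].conjugate() * y[r] for r in range(m)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]

    # Back substitution.
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

With `noise_var` set to zero and an invertible H, the result reduces to the zero-forcing estimate, which is why ZFD and MMSED can be used interchangeably as the pre-estimation block.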

3.2.4 Overlap Clustering Algorithm

The Overlap Clustering Algorithm employs a standard detector, ZFD or MMSED, to estimate the N transmitted symbols, and then picks out the possible constellation points $\{C_i^1, C_i^2, \ldots, C_i^k\}$ for each antenna.

To increase the cluster diversity, we take the real or imaginary part of the pre-estimated I/Q as the reference value $x'$, and then confine the range of the spanning candidates $\{C_i^1, C_i^2, \ldots, C_i^k\}$ according to the boundary values $x_1$, $x_2$, ... around $x'$ given in (3.1) and Fig. 3.5. That is, we first separate the I/Q into its real and imaginary parts and then compute the distances for each part individually within the confined range of spanning candidates. Here $x'$ is the estimated signal and $x_1$, $x_2$, ... are the boundary values.

The more pronounced this characteristic is, the larger the candidate set. The Overlap Clustering Algorithm, with its variable clusters and flexible number of spanning candidates, is therefore well suited to deployment in practical communication environments.

For example, the candidate I/Qs are (+5,+5), (+5,+7), (+7,+5) and (+7,+7) when the real and imaginary values of the estimated I/Q are both larger than 6.5. The candidate I/Qs are (+5,-3), (+5,-1), (+5,+1), (+5,+3), (+5,+5), (+7,-3), (+7,-1), (+7,+1), (+7,+3) and (+7,+5) when the real value is greater than 6.5 and the imaginary value is between 0 and 3.5.
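The example above can be reproduced with a small lookup sketch. Only the two ranges mentioned in the example (below 3.5 and above 6.5) are taken from the text; the middle range and its candidate set, the handling of values exactly on a boundary, and the names `CLUSTERS`, `axis_candidates` and `overlap_cluster` are hypothetical and serve only to complete the illustration for 64-QAM.

```python
from itertools import product

# (lower bound, upper bound, candidate levels) per range of |x'| on one axis.
CLUSTERS = [
    (0.0, 3.5, (-3, -1, 1, 3, 5)),   # from the text: between 0 and 3.5
    (3.5, 6.5, (1, 3, 5, 7)),        # ASSUMED: not given in the text
    (6.5, float("inf"), (5, 7)),     # from the text: larger than 6.5
]


def axis_candidates(x):
    """Candidate levels for one real-valued pre-estimate x."""
    mag = abs(x)
    for lo, hi, levels in CLUSTERS:
        if lo <= mag < hi:
            # Clusters for negative estimates are mirrored about zero.
            return levels if x >= 0 else tuple(sorted(-v for v in levels))
    raise ValueError("pre-estimate must be a finite number")


def overlap_cluster(x_est):
    """Candidate I/Q points for a complex pre-estimate x_est."""
    return list(product(axis_candidates(x_est.real),
                        axis_candidates(x_est.imag)))


print(overlap_cluster(7 + 7j))        # [(5, 5), (5, 7), (7, 5), (7, 7)]
print(len(overlap_cluster(7 + 2j)))   # 10
```

Neighboring ranges share levels (for instance +5 appears in all three rows), which is the overlap that keeps the correct point available when the pre-estimate lies near a range boundary.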

Figure 3.6 (a) Illustration of Overlap Clustering Algorithm applying in first quadrant in 64QAM

Figure 3.6 (b) Illustration of Overlap Clustering Algorithm in 64QAM

Fig. 3.6(a) shows the Overlap Clustering Algorithm applied only in the first quadrant, where the reference values are positive. Combining it with the remaining quadrants gives the complete Overlap Clustering Algorithm, which is shown in Fig. 3.6(b).

3.2.5 Dynamic Clustering Algorithm

To perform efficiently in a changing wireless environment, we introduce the Dynamic Clustering Algorithm to enhance the Overlap Clustering Algorithm described previously. While the Overlap Clustering Algorithm is applied, we simultaneously enlarge or narrow the set of possible constellation points $\{C_i^1, C_i^2, \ldots, C_i^k\}$ according to the column norm $h'$ of H, which carries the channel gain information.

For instance, the original candidate I/Qs (+7,+7), (+7,+5), (+7,+3), (+7,+1), (+5,+7), (+5,+5), (+5,+3), (+5,+1), (+3,+7), (+3,+5), (+3,+3), (+3,+1), (+1,+7), (+1,+5), (+1,+3) and (+1,+1) are narrowed down to (+3,+3), (+3,+5), (+5,+3) and (+5,+5) when the column norm $h'$ is greater than 25.
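The narrowing step of this example can be sketched per axis as follows. The threshold of 25 and the before and after candidate sets come from the example; the rule used here to decide which levels survive (keep the levels nearest to the pre-estimate), the parameter `keep`, and the function name `dynamic_cluster` are assumptions of this sketch, and the enlarging case is not shown.

```python
def dynamic_cluster(levels, x, col_norm, threshold=25.0, keep=2):
    """Narrow the candidate levels of one axis when the channel gain is high.

    levels:   candidate levels from the Overlap Clustering step
    x:        real or imaginary part of the pre-estimate
    col_norm: column norm h' of H for this antenna
    """
    if col_norm <= threshold:
        return list(levels)          # weak channel: keep every candidate
    # ASSUMED rule: keep the `keep` levels closest to the pre-estimate.
    nearest = sorted(levels, key=lambda v: abs(v - x))[:keep]
    return sorted(nearest)


print(dynamic_cluster([1, 3, 5, 7], 4.2, col_norm=30.0))  # [3, 5]
print(dynamic_cluster([1, 3, 5, 7], 4.2, col_norm=10.0))  # [1, 3, 5, 7]
```

Applying the function to the real and imaginary axes separately turns the 16 original candidates of the example into the 4 candidates built from the levels +3 and +5.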

…

(3.2)

Figure 3.7 Illustration of Dynamic Clustering Algorithm in 64QAM

3.2.6 Detail Matching

The Detail Matching technique used here is based on one of the well-known SD algorithms, the K-best algorithm. It is a breadth-first algorithm on a tree decoding structure that searches only in the forward direction and keeps the best K candidates at each level. We make a distinct change to the original K-best algorithm by drastically reducing the search space of the extended child nodes. The principle of Detail Matching is outlined below.

1) At the root node, initialize all paths with PED (Partial Euclidean Distance) zero.

2) Apply Variable and Overlapped Cluster-based Algorithm to prune the search space of the extending child nodes.

3) Extend each survivor path, retained from the previous node, to contender paths, and then update the accumulated PEDs for each path.

4) Sort the contender paths according to their accumulated PEDs, and select the K paths with the smallest PEDs.

5) Update the path history for each retained path, and discard the other paths.

6) If the iteration arrives at the end node, stop the algorithm. Otherwise, go to 2).

The best path at the final iteration is the hard-decision output of the decoder. The advantage of the K-best algorithm over sequential algorithms is its fixed decoding throughput, because it is easily implemented in a parallel and pipelined fashion.
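Steps 1) to 6) can be sketched for a real-valued, upper-triangular system z = R x as follows. This is a simplified illustration rather than the thesis implementation: the `candidates` callback stands in for the cluster-based pruning of step 2), and its signature as well as the function name `k_best_detect` are hypothetical.

```python
def k_best_detect(R, z, K, candidates):
    """Breadth-first K-best search on z = R x with pruned child nodes.

    R is upper triangular (list of rows), z the rotated received vector.
    candidates(level, path) returns the symbols allowed for x[level];
    path holds the symbols already decided for x[level + 1:], in order.
    """
    n = len(z)
    survivors = [(0.0, [])]                      # step 1: PED = 0 at the root
    for level in range(n - 1, -1, -1):
        contenders = []
        for ped, path in survivors:
            # Contribution of the symbols already fixed on this path.
            interf = sum(R[level][level + 1 + j] * s
                         for j, s in enumerate(path))
            for s in candidates(level, path):    # step 2: pruned children
                err = z[level] - interf - R[level][level] * s
                contenders.append((ped + err * err, [s] + path))  # step 3
        contenders.sort(key=lambda c: c[0])      # step 4: sort by PED
        survivors = contenders[:K]               # step 5: keep K paths
    return survivors[0][1]                       # step 6: best path


LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)            # 64-QAM levels per axis
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.4], [0.0, 0.0, 1.0]]
x = [3, -1, 5]
z = [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]
print(k_best_detect(R, z, K=4, candidates=lambda level, path: LEVELS))
```

Passing a callback that returns fewer symbols per level reduces the number of contenders to sort, which is where the complexity saving of the proposed method comes from.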

Meanwhile, a strict K-best algorithm should keep K as large as possible so as not to compromise optimality compared with the exhaustive-search ML algorithm. However, limiting K reduces the complexity of the breadth-first algorithm. Therefore, selecting a proper value of K involves a tradeoff between complexity and performance.

Figure 3.8 Illustration of Detail Matching in 64QAM

Chapter 4 Simulation Results

This chapter compares the performance and complexity of the K-best SD and the Variable and Overlapped Cluster-based algorithm in MIMO detection. Note that the performance comparison is made at a packet error rate (PER) of 0.08 and is normalized to the ML detection method.

The reference design platform is a typical MIMO-OFDM system based on the IEEE 802.11n wireless LAN TGn Sync Proposal Technical Specification [10]. The simulation model is mainly based on TGn channel model E, a multipath fast-fading channel model with 15 taps and a root-mean-square (RMS) delay spread of 100 ns. The major simulation parameters are shown in Table 4.1.

Environment Description

Parameter                   Value
Simulation Platform         IEEE 802.11n
Signal Bandwidth            40 MHz
Number of subcarriers       108 subcarriers
FFT size                    128 points
Number of antennas          4 Tx 4 Rx / 8 Tx 8 Rx
Forward Error Correction    Convolutional coding and Viterbi decoding (Coding Rate 2/3)
Packet size                 1024 Bytes per Tx antenna
Channel Model               TGN-E with AWGN
RMS delay spread            100 ns
Subcarrier modulation       64QAM/256QAM
Preprocessing Block         SQRD, ZFD
Signal Detection            K-best SD Algorithm; Variable and Overlapped Cluster-based

Table 4.1 Simulation parameters

4.1 Performance Evaluation

Since the K-best sphere decoder has been accepted in practical implementations, the goal of our Variable and Overlapped Cluster-based algorithm is to reduce complexity while maintaining performance. To compare with the K-best sphere decoder, we tune the K-best parameter K and the cluster parameters, Spanning Cluster Candidate and Boundary, so that the different methods have nearly the same performance.

For the purpose of performance comparison, Fig. 4.1 and Fig. 4.2 present the PER of ML, the Variable and Overlapped Cluster-based algorithm and the K-best sphere decoder for 4 x 4 and 8 x 8 MIMO-OFDM systems. The proposed Variable and Overlapped Cluster-based method and the K-best sphere decoder keep the SNR degradation within 0.57 dB in Fig. 4.1 and between 0.58 dB and 1.02 dB in Fig. 4.2.

Table 4.2 summarizes the performance of Fig. 4.1, normalized to the ML detection method, and the complexity relative to the K-best SD algorithm. The proposed Variable and Overlapped Cluster-based algorithm keeps the SNR loss within 0.57 dB, which makes the method suitable for practical systems. The algorithm complexity is reduced to 27.29% ~ 56.25% of that of the K-best SD in the average case and to 39.06% ~ 56.25% in the worst case, which translates into lower hardware cost in a practical implementation.

For the 8 x 8 MIMO-OFDM system in Table 4.3, the proposed method keeps the SNR loss within 1.02 dB. The algorithm complexity is still reduced to 35% ~ 56.25% in the average case and to 56.25% in the worst case.
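The text does not state how the worst-case percentages are obtained. As a hedged observation, the worst-case figures in Tables 4.2 and 4.3 are consistent with squaring the ratio of the largest spanning candidate count to the number of levels per axis (8 for 64-QAM, 16 for 256-QAM). The short check below illustrates this reading; the function name `worst_case_ratio` is our own.

```python
def worst_case_ratio(max_span, levels_per_axis):
    """Fraction of child nodes kept when each axis keeps max_span levels."""
    return (max_span / levels_per_axis) ** 2


# 64-QAM, largest spanning candidate 6 and 5 (Table 4.2)
print(f"{worst_case_ratio(6, 8):.2%}")     # 56.25%
print(f"{worst_case_ratio(5, 8):.2%}")     # 39.06%
# 256-QAM, largest spanning candidate 12 (Table 4.3)
print(f"{worst_case_ratio(12, 16):.2%}")   # 56.25%

# Scaling the K-best multiplication count of Table 4.2 by the same ratios.
print(round(36864 * worst_case_ratio(6, 8)))   # 20736
print(round(36864 * worst_case_ratio(5, 8)))   # 14400
```

The average-case figures depend on how often each range is selected by the pre-estimate, so they cannot be reproduced from the candidate counts alone.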

It is clear that the performance is better in the 4 x 4 MIMO-OFDM system than in the 8 x 8 one. However, as the number of antennas increases, the remarkable growth in complexity becomes a critical issue. Hence, Variable and Overlapped

Figure 4.1 Performance in the VACO, 4T4R 64QAM

Figure 4.2 Performance in the VACO, 8T8R 256QAM

4 x 4 MIMO-OFDM system, 64 QAM

Method                        ML      K-best SD MMSE-SQRD  Variable and Overlapped Cluster-based
K                             ╳       K=12                 K=12
Spanning Cluster Candidate    8       8                    6        6,4      6,4,2      5,4,3,2
Boundary                      ╳       ╳                    0        0,3.5    0,4.5,6.5  0,3.5,5.5,7.5
SNR at PER 0.08 (dB)          28.55   28.62                28.74    28.70    28.77      29.13
SNR loss (dB)                 0       0.07                 0.19     0.15     0.22       0.57
Average case
  Candidate Number Reduction  ╳       100%                 56.25%   38.25%   38.17%     27.29%
  Multiplication              ╳       36864                20736    14100    14071      10059
  Addition                    ╳       35008                19692    13390    13362      9553
Worst case
  Candidate Number Reduction  ╳       100%                 56.25%   56.25%   56.25%     39.06%
  Multiplication              ╳       36864                20736    20736    20736      14400
  Addition                    ╳       35008                19692    19692    19692      13675

Table 4.2 Performance & complexity reduction table, 4T4R 64QAM

8 x 8 MIMO-OFDM system, 256 QAM

Method                        ML      K-best SD MMSE-SQRD  Variable and Overlapped Cluster-based
K                             ╳       K=12                 K=12
Spanning Cluster Candidate    16      16                   12        12,10,8   12,10,8,6
Boundary                      ╳       ╳                    0         0,7,10    0,5,10,14
SNR at PER 0.08 (dB)          35.23   35.81                36.25     36.15     36.15
SNR loss (dB)                 0       0.58                 1.02      0.92      0.92
Average case
  Candidate Number Reduction  ╳       100%                 56.25%    39.94%    35%
  Multiplication              ╳       294,912              165,888   117,798   102,407
  Addition                    ╳       289,536              162,864   115,651   100,540
Worst case
  Candidate Number Reduction  ╳       100%                 56.25%    56.25%    56.25%
  Multiplication              ╳       36,864               165,888   165,888   165,888
  Addition                    ╳       35,008               162,864   162,864   162,864

Table 4.3 Performance & complexity reduction table, 8T8R 256QAM

4.2 Complexity Evaluation

In Section 4.1, we compared the complexity of the Variable and Overlapped Cluster-based algorithm and the K-best SD at nearly the same performance. In this section, we instead compare their performance at nearly the same complexity.

Fig. 4.3 clearly shows that the Variable and Overlapped Cluster-based algorithm performs better than the K-best SD. Meanwhile, it keeps the SNR loss within 0.5 dB, which makes the method suitable for practical systems.

With the same complexity for both methods, detailed statistics are shown in Table 4.4. The proposed method is 0.25 dB better than the K-best SD.

Figure 4.3 Performance in the VACO with the same complexity, 4T4R 64QAM

4 x 4 MIMO-OFDM system, 64 QAM

Method    ML    K-best SD MMSE-SQRD    Variable and Overlapped Cluster-based
K         ╳     K=4                    K=8

Table 4.4 Performance & complexity table, 4T4R 64QAM

Chapter 5 Hardware Implementation and Measurement

5.1 Introduction

The Variable and Overlapped Cluster-based algorithm is a modification of the K-best SD, so it inherits the advantage of the K-best SD that it is well suited to parallel and pipelined design. In this chapter, the proposed hardware architecture is presented.

5.2 Design Flow

Figure 5.1 The design flow (stages shown: Design Specification; Algorithm Model (Floating Point); Algorithm Model (Fixed Point); RTL Model; Synthesis; Gate-Level Simulation)

Fig. 5.1 shows the design flow of the hardware architecture for the Variable and Overlapped Cluster-based algorithm. In the algorithm design step, Matlab is used to build and experiment with the detection algorithm. After the algorithm model is determined, the bit length and accuracy are measured so that the variables can be converted from floating point to fixed point. Meanwhile, the performance loss is carefully monitored and the golden pattern is generated for logic design. After the algorithm simulations, the hardware design is implemented at the Register Transfer Level (RTL) in Verilog, which lets us describe the design behaviorally and confirm the correctness of the hardware design. Then the RTL code is synthesized by Design Compiler into a gate-level netlist. Finally, gate-level simulation verifies whether the gate-level behavior meets our requirements.

5.3 Proposed Architecture

Table 5.1 gives the detailed specification of the Variable and Overlapped Cluster-based algorithm, where our goal is to achieve GigaLAN.

Fig. 5.2 gives an overview of the VACO. The top-level architecture contains a preprocessing block that includes the common sorted QR decomposition (SQRD), and the MIMO detection is implemented with the Variable and Overlapped Cluster-based algorithm.

Fig. 5.3 shows the parallel form of the proposed architecture. With only one MIMO detector there is not enough time to process the input I/Qs: roughly 4 clock cycles would be available to process one level of I/Qs, which is not feasible. With 14 MIMO detectors in parallel, there is enough time to finish this work (up to 56 clock cycles).

As shown in the block diagram of Fig. 5.2, the architecture consists of twelve pipeline stages. Each stage has a processing element (PE), which implements the operations corresponding to steps 2) to 5) of Detail Matching in Section 3.2.6. Stages 1 to 12 correspond to the twelfth to the first level of computation in the algorithm. The buffers R, Z, D, U and E between adjacent PEs correspond to the upper-triangular matrix, the updated received signal, the K-best PEDs, the K-best paths and the estimated signal in the algorithm, respectively.

Design Specification

Parameter                   Value
Simulation Platform         IEEE 802.11n
Signal Bandwidth            50 MHz
Number of subcarriers       108 subcarriers
FFT size                    128 points
Number of antennas          6 Tx 6 Rx
Forward Error Correction    Convolutional coding and Viterbi decoding (Coding Rate 3/4)
Packet size                 1024 Bytes per Tx antenna
Subcarrier modulation       256QAM
Preprocessing Block         ZFD
Signal Detection            Variable and Overlapped Cluster-based

Table 5.1 The proposed design specification

Figure 5.2 VLSI architecture of the VACO for 6T6R 256-QAM MIMO system (block labels: Preprocessing (SQRD); MIMO Detection; Control Unit; FFT Output; Channel Frequency Responses (CFR); Diagonal Matrix (R); Receive Signal (Z); Estimated Signal (E); Detection Signal (U); The Left Receive Signal (Z); Hard Decision)

Figure 5.3 The parallel architecture of the VACO (labels: MIMO Detection #1; Processing; Buffer Time; 14x6)

5.3.1 Word-Length Determination

In the mathematical model, all variables and computations use floating-point numbers, whereas practical hardware computations use fixed-point numbers. To translate the floating-point model into a fixed-point model, measurement simulations are required. The measurement covers the length (width) and the depth (accuracy) of each word. The longer the word length, the better the performance, so a tradeoff between hardware cost and performance is needed. Fig. 5.4 illustrates the signal distribution of variable R; the word length and the depth of variable R are roughly 15 and 2^-10. These values are rough estimates, and detailed simulations are run to obtain the proper parameters.
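The effect of such a word-length choice can be illustrated with a small quantization sketch. The word length of 15 and the resolution of 2^-10 are the rough values quoted for variable R; the signed representation, round-to-nearest behavior and saturation used here, as well as the function name `quantize`, are assumptions of this example and not a description of the actual hardware.

```python
def quantize(x, word_len=15, frac_bits=10):
    """Map x onto a signed fixed-point grid with saturation.

    word_len:  total number of bits, including the sign bit (assumed)
    frac_bits: number of fractional bits, i.e. a resolution of 2**-frac_bits
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_len - 1))          # most negative integer code
    hi = (1 << (word_len - 1)) - 1       # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


print(quantize(0.5))      # 0.5
print(quantize(100.0))    # 15.9990234375 (saturated)
```

Running the floating-point algorithm model with every buffer passed through such a function, and comparing the resulting PER curve with the unquantized one, is one way to measure the SNR degradation of a candidate word length.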

Table 5.2 (a) gives the number of buffers needed, while Table 5.2 (b) collects the word-length information of all buffers. Finally, the performance comparison between floating point and fixed point is shown in Fig. 5.5 for a 256-QAM 6 x 6 MIMO system. The SNR degradation due to word-length determination is less than 0.2 dB.

Figure 5.4 The signal distribution of variable R.

        Stage
Buffer  1     2     3     4     5     6     7     8     9     10    11    12
R       1x12  1x11  1x10  1x9   1x8   1x7   1x6   1x5   1x4   1x3   1x2   1x1
D       8     8     8     8     8     8     8     8     8     8     8     1
Z       8     8     8     8     8     8     8     8     8     8     8     1
U       8x1   8x2   8x3   8x4   8x5   8x6   8x7   8x8   8x9   8x10  8x11  8x12
E       1     1     1     1     1     1     1     1     1     1     1     1

Table 5.2 (a) Number of buffers needed in each stage

Buffer  Word-Length
R       15
D       20
Z       15
U       3
E       15

Table 5.2 (b) Word length needed for each buffer

Figure 5.5 Performance comparisons between floating point and fixed point

5.3.2 Sorting Design

Each PE has to sort 16 PEDs, or at most 96 PEDs. Sorting the PEDs is the most time-consuming part of the MIMO detection and is a critical issue in our VLSI implementation. To overcome the sorting problem, we propose 3 sorting designs, which combine different numbers of sorting units.

The sorting unit shown in Fig. 5.4 employs the insertion sort algorithm, so it can sort one input per clock cycle. The first design, with one sorting unit, is shown in Fig. 5.5 (a); it takes 16 to 96 clock cycles to finish the ordering procedure. And the
