
Chapter 2 Receiver Structures for MIMO System

2.2 MIMO Receivers

2.3.2 Algorithm

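Assume the number of receive antennas is no smaller than the number of transmit antennas, and that the real-valued channel matrix $H \in \mathbb{R}^{n \times m}$ ($n \ge m$) has linearly independent columns, so that it can be QR-decomposed as:

$$H = Q \begin{bmatrix} R \\ 0 \end{bmatrix} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}$$

where $R$ is an $m \times m$ upper-triangular matrix, $Q$ is orthogonal, and $Q_1$ and $Q_2$ hold the first $m$ and the remaining $n - m$ columns of $Q$.

We now derive the SD algorithm [8]. The condition that the lattice point $Hs$ lies inside the hypersphere D of radius $d$ centered at the received vector $y$ is described by Eq. 2.4:

$$d^2 \ge \|y - Hs\|^2$$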

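Let $z = Q_1^T y$ and $d'^2 = d^2 - \|Q_2^T y\|^2$, where $R$ is the $m \times m$ upper-triangular factor of the QR decomposition of $H$, $Q_1$ holds the first $m$ columns of $Q$, and $Q_2$ the remaining columns. The sphere condition then becomes

$$d'^2 \ge \sum_{i=1}^{m} \Big( z_i - \sum_{j=i}^{m} r_{i,j} s_j \Big)^2$$

Therefore one necessary condition for $Hs$ to lie inside the hypersphere D is

$$d'^2 \ge (z_m - r_{m,m} s_m)^2 \qquad (2.9)$$

For every $s_m$ satisfying (2.9), we can then derive a stronger necessary condition as

$$d'^2 \ge (z_m - r_{m,m} s_m)^2 + (z_{m-1} - r_{m-1,m-1} s_{m-1} - r_{m-1,m} s_m)^2$$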

The procedure continues in the same way, layer by layer, and terminates once $s_1$ is found. Finally, we obtain a set of candidate symbol vectors and choose the closest one as the decoding result [8].

Figure 2.3 shows the search tree that illustrates the concept of the sphere decoding algorithm.

Figure 2.3 Tree associated with the sphere decoding algorithm

The dotted lines represent candidates that lie outside the radius.
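The depth-first search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DSP implementation of this thesis: H is a real-valued matrix given as a list of rows, the symbols come from a finite alphabet, and the helper names (`qr_mgs`, `sphere_decode`, `brute_force`) are hypothetical. Whenever a complete candidate is found inside the sphere, the radius is shrunk to its distance, so the last candidate kept is the closest one.

```python
import math
from itertools import product

def qr_mgs(H):
    """Thin QR factorization of a tall real matrix (list of rows), modified Gram-Schmidt."""
    n, m = len(H), len(H[0])
    V = [[H[i][j] for i in range(n)] for j in range(m)]  # working copies of the columns
    Q, R = [], [[0.0] * m for _ in range(m)]
    for k in range(m):
        R[k][k] = math.sqrt(sum(v * v for v in V[k]))
        q = [v / R[k][k] for v in V[k]]
        Q.append(q)
        for j in range(k + 1, m):
            R[k][j] = sum(a * b for a, b in zip(q, V[j]))
            V[j] = [b - R[k][j] * a for a, b in zip(q, V[j])]
    return Q, R  # Q is a list of orthonormal columns

def sphere_decode(H, y, alphabet, d):
    """Closest point Hs to y with ||y - Hs|| <= d, or None if the sphere is empty."""
    Q, R = qr_mgs(H)
    m = len(R)
    z = [sum(a * b for a, b in zip(q, y)) for q in Q]         # z = Q1^T y
    residual = sum(v * v for v in y) - sum(v * v for v in z)  # ||Q2^T y||^2
    best = {"dist": d * d - residual, "s": None}              # current squared radius
    s = [None] * m

    def search(k, partial):
        # partial: squared distance accumulated over layers k+1 .. m-1
        for cand in alphabet:
            s[k] = cand
            e = z[k] - sum(R[k][j] * s[j] for j in range(k, m))
            dist = partial + e * e
            if dist > best["dist"]:
                continue                                      # outside the sphere: prune
            if k == 0:
                best["dist"], best["s"] = dist, list(s)       # shrink the radius
            else:
                search(k - 1, dist)

    search(m - 1, 0.0)
    return best["s"]

def brute_force(H, y, alphabet):
    """Exhaustive maximum-likelihood search, used here only as a reference."""
    def dist2(s):
        return sum((yi - sum(h * sj for h, sj in zip(row, s))) ** 2
                   for row, yi in zip(H, y))
    return list(min(product(alphabet, repeat=len(H[0])), key=dist2))
```

With a sufficiently large radius, `sphere_decode` returns the same vector as `brute_force` while visiting far fewer candidates; with a radius that is too small it returns `None`.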

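The variables above are real-valued. If H, y, s, and n are complex-valued (denoted here by $\tilde{H}$, $\tilde{y}$, $\tilde{s}$, and $\tilde{n}$), we only need to convert them into the following real-valued form:

$$y = \begin{bmatrix} \mathrm{Re}(\tilde{y}) \\ \mathrm{Im}(\tilde{y}) \end{bmatrix}, \quad s = \begin{bmatrix} \mathrm{Re}(\tilde{s}) \\ \mathrm{Im}(\tilde{s}) \end{bmatrix}, \quad n = \begin{bmatrix} \mathrm{Re}(\tilde{n}) \\ \mathrm{Im}(\tilde{n}) \end{bmatrix}, \quad H = \begin{bmatrix} \mathrm{Re}(\tilde{H}) & -\mathrm{Im}(\tilde{H}) \\ \mathrm{Im}(\tilde{H}) & \mathrm{Re}(\tilde{H}) \end{bmatrix}$$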
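The complex-to-real conversion can be checked numerically. The sketch below assumes the usual stacking of real parts over imaginary parts; the function names `to_real_model` and `stack_real_imag` are hypothetical.

```python
def to_real_model(H, y):
    """Return (H_r, y_r) such that y = H s over C is equivalent to y_r = H_r s_r over R."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]     # [Re(H), -Im(H)]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]   # [Im(H),  Re(H)]
    y_r = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_r

def stack_real_imag(s):
    """Stack the real parts of s on top of its imaginary parts."""
    return [v.real for v in s] + [v.imag for v in s]
```

A 2x2 complex system therefore becomes a 4x4 real system, which is the size the real-valued search operates on.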

From Figure 2.2, we see that the choice of the initial radius is very important. We adopt a simple and effective method:

$$d_{\text{initial}} = \min_i \|H_i\| \qquad (2.13)$$

That is, we choose the minimum column norm of H as the initial search radius, where $H_i$ denotes the $i$th column of H. More details can be found in [9].
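As a small illustration of (2.13), the following sketch computes the minimum column norm of a real-valued H stored as a list of rows. The function name `initial_radius` is hypothetical.

```python
import math

def initial_radius(H):
    """Minimum Euclidean column norm of H (list of rows), as in Eq. (2.13)."""
    n_cols = len(H[0])
    return min(math.sqrt(sum(row[j] ** 2 for row in H)) for j in range(n_cols))
```

For example, for H = [[3, 1], [4, 1]] the column norms are 5 and √2, so the initial radius is √2.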

Chapter 3 DSP Board Implementation Result and Discussion

3.1 DSP Hardware and Software Implementation Environment

3.1.1 Quixote DSP Board [10]

The sphere decoding algorithm was developed on the Quixote DSP board, which this section briefly introduces. Quixote is a 64-bit cPCI 6U board that combines one 1 GHz TMS320C6416 DSP with a two-million-gate Xilinx Virtex-II FPGA, making it a powerful and flexible platform for implementing communication systems. Figure 3.1 is a picture of the Quixote DSP board.

Figure 3.1 Quixote DSP board [10]

Below are some applications for Quixote:

· Software Defined Radio

· Wireless IP Development & Hardware Testing

· Physical Layer Field Testing

· Ultra-Fast Flexible Data Acquisition

· Vector Signal Generation/Signal Identification

· RADAR

· Electronic Warfare

3.1.2 TMS320 C6416 DSP Processor [11] [12]

The DSP chip on the Quixote board is the TMS320C6416, a fixed-point DSP based on the VelociTI™ architecture. VelociTI is a high-performance, advanced very-long-instruction-word (VLIW) architecture, which makes the C6416 an excellent choice for multichannel and multifunctional applications. Because of the VLIW architecture, the C6416 can execute up to eight 32-bit instructions per cycle. To write an efficient DSP program, instructions should therefore be made as independent of each other as possible.

Features of the DSP device include [12] :

1. Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic units.

2. Efficient code execution on independent functional units

3. 8/16/32-bit data support, providing efficient memory support for a variety of applications. 40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications

4. Special communication-specific instructions have been added to address common operations in error-correcting codes.

5. Instruction packing: Gives code size equivalence for eight instructions executed serially or in parallel. Reduces code size, program fetches, and power consumption

Figure 3.2 The block diagram of this DSP chip.

3.1.3 Flow of developing a DSP program

The traditional flow of developing a DSP program focuses on hand-coding DSP assembly language. This approach is time-consuming and makes large projects hard to maintain.

A main feature of the TMS320C6416 is its abundant code generation tools and libraries, which help us optimize programs quickly rather than forcing the programmer to code by hand in assembly. Figure 3.3 shows the code development flow used when constructing the DSP program.

Figure 3.3 The code developing flow when constructing the DSP program. [11]

Internal benchmarking efforts at Texas Instruments have shown that most loops achieve maximal throughput after steps 1 and 2.

3.2 System Structure for DSP Implementation

Figure 3.4 shows the 2x2 MIMO system that has been implemented. The input bits are generated from a pseudo-random sequence and then mapped to a 16-QAM constellation with Gray coding. The mapped symbols are split into two streams, which are transmitted independently from the two transmit antennas and received synchronously at the two receive antennas. Here we assume a quasi-static Rayleigh flat-fading channel. The received signals are detected by the sphere decoder, which achieves the performance of the brute-force method at lower complexity.
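The transmitter side of this system can be sketched as follows. The bit-to-level labeling in `GRAY_PAM4` is one common Gray labeling and is an assumption, since the exact mapping used on the board is not given here; the function names are hypothetical as well.

```python
# Assumed Gray labeling per axis: neighbouring levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def map_16qam(bits):
    """Map groups of four bits to 16-QAM symbols (two bits per axis)."""
    assert len(bits) % 4 == 0
    return [complex(GRAY_PAM4[tuple(bits[i:i + 2])], GRAY_PAM4[tuple(bits[i + 2:i + 4])])
            for i in range(0, len(bits), 4)]

def split_streams(symbols, n_tx=2):
    """Distribute the symbols alternately over the transmit antennas."""
    return [symbols[k::n_tx] for k in range(n_tx)]
```

With this labeling, the eight bits 0010 1101 map to the symbols -3+3j and 1-1j, which `split_streams` assigns to antenna 1 and antenna 2 respectively.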

Figure 3.4 A simple structure of 2x2 MIMO system

3.3 Result

Figure 3.5 shows that our method of choosing the initial radius is simple and efficient. We compare our method with the algorithm proposed by Hassibi and Vikalo [5], in which the initial radius is d² = α σn². In Figure 3.5, we see that Hassibi's algorithm has lower computational complexity than ours only at low SNR, whereas our method performs better than his at high SNR.

When α = 3.5, a lattice point is found inside the sphere with probability 0.99. If α = 1.5, the probability is 0.8, and if α = 1, the probability is 0.59.
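The quoted probabilities can be reproduced under the following assumptions, which are not stated explicitly above: the radius rule is read as d² = α·n·σ² as in [5], the noise has variance σ² per real dimension, and n = 4 real dimensions (a 2x2 complex system). The squared noise norm divided by σ² is then chi-square distributed with n degrees of freedom, and for even n its distribution function has a closed form. The function name `prob_inside` is hypothetical.

```python
import math

def prob_inside(alpha, n):
    """P(chi-square with n degrees of freedom <= alpha * n), valid for even n."""
    assert n % 2 == 0
    x = alpha * n / 2.0
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(n // 2))
```

For n = 4 this gives approximately 0.594, 0.801, and 0.993 for α = 1, 1.5, and 3.5, consistent with the values 0.59, 0.8, and 0.99 above.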

Figure 3.5 A complexity comparison between our algorithm and Hassibi’s algorithm.

3.3.1 Floating-Point Result

Figure 3.6 BER curves of brute-force detector and SD using full precision.

Assume Mt = 2, Mr = 2, 16 QAM.

Figure 3.7 The comparison of execution speed between brute-force and SD.

The bit error ratio (BER) of the brute-force detector and of SD as a function of the signal-to-noise ratio (SNR) is shown in Figure 3.6. It can be seen that the SD detector with full precision achieves the performance of the brute-force detector.

In Figure 3.7, we can see that SD is about 120 times faster than the brute-force method at high SNR. Even at low SNR, SD is still about 50 times faster.

As mentioned before, the C6416 is a fixed-point processor, and it therefore needs a fixed-point version of the DSP program to run at full speed.

3.3.2 Fixed-Point Result

In this section, we develop a fixed-point DSP program to reduce the execution time. The numerical format adopted here is two's complement. Figure 3.8 shows the BER performance of SD using 16 and 32 bits compared with the brute-force detector. We can observe that the fixed-point SD has worse BER performance than the brute-force detector at high SNR: the quantization steps result in an error floor. The error floor appears because of the fixed-point precision used for the input data and for the operations performed to obtain the input matrices (QR decomposition). We will therefore compare two QR decomposition algorithms to see which one is more suitable for fixed-point implementation.

Figure 3.8 BER curves of brute-force detector and SD using 16 bits and 32 bits.

Assume Mt = 2, Mr = 2, 16 QAM.
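The effect of a finite word length can be illustrated with a small two's-complement quantizer. The number of fractional bits (15 for a 16-bit word, i.e. a Q15-style format) is an assumption made for this sketch; the scaling actually used on the board is not specified here. The function names are hypothetical.

```python
def to_fixed(x, frac_bits, word_bits=16):
    """Quantize x to a two's-complement integer with saturation."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = int(round(x * (1 << frac_bits)))   # Python rounds exact halves to even
    return max(lo, min(hi, q))

def to_float(q, frac_bits):
    """Value represented by the fixed-point integer q."""
    return q / (1 << frac_bits)
```

With 15 fractional bits, the quantization error of an in-range value is at most 2^-16, and values at or above 1.0 saturate at the largest code 32767. Errors of this kind accumulate through the QR decomposition and the distance computations, which is consistent with the error floor described above.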

3.3.3 QR Decomposition Algorithms

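We introduce two QR decomposition algorithms and compare their performance. The first is the classical Gram-Schmidt algorithm [13], in which each column of H is orthogonalized against all previously computed orthonormal vectors using projections of the original column.

The second is the modified Gram-Schmidt (MGS) algorithm, in which the projections are instead computed from, and subtracted from, the running residual one at a time.

Figure 3.9 shows that the classical Gram-Schmidt algorithm has worse BER performance than MGS. MGS is therefore more suitable for implementation because of its better numerical stability.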
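The difference between the two variants can be sketched as follows. This is a generic textbook formulation in double precision, not the fixed-point code used on the board, and the function names are hypothetical. The only difference is whether each projection coefficient is computed from the original column (classical) or from the running residual (modified).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(cols, modified):
    """Orthonormalize a list of column vectors; returns the list of q vectors."""
    Q = []
    for a in cols:
        v = list(a)
        for q in Q:
            # Classical: project the original column a. Modified: project the residual v.
            r = dot(q, v if modified else a)
            v = [vi - r * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        Q.append([vi / norm for vi in v])
    return Q

def orthogonality_error(Q):
    """Largest absolute inner product between two different q vectors."""
    return max(abs(dot(Q[i], Q[j])) for i in range(len(Q)) for j in range(i))
```

On a nearly rank-deficient test matrix with columns (1, e, 0, 0), (1, 0, e, 0), (1, 0, 0, e) and e = 1e-8, the classical variant returns two vectors whose inner product is about 0.5, whereas the modified variant keeps the orthogonality error near 1e-8.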

Figure 3.9 BER curves of SD using classic Gram-Schmidt algorithm and Modified Gram-Schmidt. (Mt = 2; Mr = 2; 16QAM)

3.4 Discussion

3.4.1 Implementation issues

When using the C6416 to implement the sphere decoder, several points require care. If a for loop must be used, try to replace it with a while loop. When writing the C program, equations 2.10 and 2.11 deserve particular attention because they are computed at each layer. We can also use the .NOW function to estimate the execution time or to read the clock cycles.

Table 3.1 Comparison of different implementations for sphere decoders Implementati

Detector Depth-first sphere decoder

Max freq. 1000MHz 200

Table 3.1 shows implementation results of sphere decoders (depth-first algorithm) in recent years. It is obvious that the DSP implementation of the sphere decoder has the lowest data rate. This is because the depth-first sphere decoding algorithm needs many branch instructions and has strong data dependencies between states, so it cannot be processed in parallel on a single DSP processor. To achieve a real-time implementation, we need to analyze the dependency relations between states and adopt a parallel structure.

The data dependency graph was proposed in [14]. Figure 3.10 shows the flow chart of depth-first sphere decoding, and the strong data dependency between states is demonstrated in Figure 3.11. A solid line indicates that two states are dependent, and a dotted line denotes that they are independent. From Figure 3.11, we realize that a parallel structure like that of Figure 3.12 can overcome the dependency between states and achieve a high data rate.

Figure 3.10 Flow chart of the depth-first sphere decoder [14].

Figure 3.11 Dependency graph of the depth-first sphere decoder [14].

Figure 3.12 A parallel structure for the sphere decoder [14].

In section 3.1.1, we introduced the Quixote DSP board, which combines one 1 GHz TMS320C6416 DSP with a two-million-gate Xilinx Virtex-II FPGA. The DSP is good at multiply and add operations, so we can use it to compute the QR decomposition. The data rate we can achieve is 13 Mb/s. The FPGA can implement the sphere decoder with a parallel structure to achieve a high data rate. Figure 3.13 shows the hardware structure of the sphere decoder.

Figure 3.13 Hardware structure of the sphere decoder on Quixote DSP board.

3.4.2 Conclusion

This chapter has analyzed the implementation results on the DSP board and proposed a hardware structure that is suitable for the Quixote DSP board and the sphere decoder.

The main conclusions are summarized as follows:

The BER performance of the sphere decoder with full precision is the same as that of the brute-force decoder.

SD is about 120 times faster than the brute-force method at high SNR and about 50 times faster at low SNR.

The BER performance of the sphere decoder on the DSP board with fixed-point precision approximately matches that of the brute-force method, except at high SNR. The difference is due to the fixed-point precision.

Modified Gram-Schmidt is better for implementation due to its stable numerical property.

The parallel structure can overcome the strong data dependency.

A hardware structure (Figure 3.13) is proposed that is suitable for the Quixote DSP board and allows the sphere decoder to achieve a high data rate.

Chapter 4 Applications of the Sphere Decoder in Space-Time-Frequency codes

4.1 MIMO-OFDM System Model

In this chapter, we discuss the applications of the sphere decoder to space-time-frequency codes over different fading channels. These channels include flat fading, frequency-selective fading, and time-selective fading channels. We first introduce the system model of MIMO-OFDM (orthogonal frequency-division multiplexing) for space-time-frequency codes. OFDM converts transmission over a frequency-selective MIMO channel into an equivalent set of flat fading MIMO channels. Figure 4.1 shows a block diagram of the MIMO-OFDM system.

Figure 4.1 MIMO-OFDM system block diagram
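The statement that OFDM turns a frequency-selective channel into a set of flat fading subchannels can be checked numerically. The sketch below is a single-antenna simplification with hypothetical function names. It assumes that, after cyclic prefix removal, the channel acts as a circular convolution; the DFT of the received block then equals the per-subcarrier product of the channel frequency response and the transmitted symbols.

```python
import cmath

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circular_convolve(h, x):
    """Channel output for taps h after cyclic-prefix removal (no noise)."""
    N = len(x)
    return [sum(h[l] * x[(n - l) % N] for l in range(len(h))) for n in range(N)]
```

For a channel of order L = 2 and a block of eight samples, `dft(circular_convolve(h, x))` agrees with the element-wise product of `dft` of the zero-padded taps and `dft(x)`, so each subcarrier sees a single complex gain.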

There are $N_T$ transmit antennas and $N_R$ receive antennas. The channel is assumed to be a frequency-selective fading channel. The input symbol $x_n^{\mu}(p)$ is indexed by three variables: (i) $\mu \in [1, N_T]$ is the space index specifying the transmit antenna; (ii) $n$ is the time index denoting the OFDM block symbol; (iii) $p \in [0, N_c - 1]$ is the subcarrier index. After cyclic prefix removal and FFT processing, the input-output relationship per subcarrier is given by Eq. (4.1), in which the subchannel gain from the $\mu$th transmit antenna to the $\nu$th receive antenna on the $p$th subcarrier is circularly symmetric, zero-mean, complex Gaussian with unit variance, and $\gamma$ is introduced to control the transmission power.

E[tr( )]

4.2 Space Time Coding

In this section, we deal with space-time codes over flat fading and frequency-selective fading MIMO channels. Flat fading channels are typically encountered in narrowband communication systems, and frequency-selective fading channels in broadband systems.

4.2.1 Flat Fading Channel

We can derive the input-output relationship of space-time codes over a flat fading channel as Eq. (4.4) [17]. This input-output equation has a form to which the sphere decoding algorithm can be applied to decode the symbol matrix $X(s)$.

Based on Eq. (4.4), we will discuss the orthogonal space-time block code (OSTBC) and the quasi-orthogonal space-time block code (QOSTBC), both of which fit Eq. (4.4).

Let $s = [s_1, \ldots, s_{N_I}]^T$ denote the information symbol block. The code of Eq. (4.5) is called the Alamouti code and is easy to decode over a flat fading channel with Alamouti's decoding method [18]. If $N_T$ is larger than two, there is no fast, low-complexity decoding algorithm like Alamouti's, and the sphere decoder is a good choice for decoding $X(s)$.
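Alamouti's decoding method can be illustrated for one receive antenna and a flat fading channel that stays constant over two time slots. The matrix convention below (rows are time slots, columns are transmit antennas) is an assumption, because the layout of Eq. (4.5) is not reproduced here, and the function names are hypothetical.

```python
def alamouti_encode(s1, s2):
    """2x2 code matrix: rows are time slots, columns are transmit antennas."""
    return [[s1, s2], [-s2.conjugate(), s1.conjugate()]]

def alamouti_receive(X, h1, h2, noise=(0j, 0j)):
    """Received samples in the two time slots for channel gains h1 and h2."""
    return [X[t][0] * h1 + X[t][1] * h2 + noise[t] for t in range(2)]

def alamouti_combine(r, h1, h2):
    """Linear combining that decouples the two symbols."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r[0] + h2 * r[1].conjugate()) / gain
    s2_hat = (h2.conjugate() * r[0] - h1 * r[1].conjugate()) / gain
    return s1_hat, s2_hat
```

Without noise the combiner returns the transmitted symbols exactly, up to rounding. Each symbol can then be detected separately, which is why no tree search is needed for this code.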

OSTBC achieves the maximum possible diversity gain, but the code rate is relatively low when there are more than two transmit antennas. Table 4.1 shows the code rate of OSTBC for different numbers of transmit antennas.

Code rate of OSTBC

Transmit antennas    Code rate
N_T = 2              1
N_T = 3, 4           3/4
N_T > 4              1/2

Table 4.1 code rate of OSTBC with different transmit antennas

In many applications, however, achieving a high data rate is more important than achieving a high diversity gain, so quasi-orthogonal space-time block codes (QOSTBC) were developed. For example, with $s = [s_1, s_2, s_3, s_4]^T$ and $N_T = 4$, a QOSTBC matrix can be constructed whose code rate is one. The input-output relationship of a QOSTBC system has the form of Eq. (4.4) and can therefore be decoded by the sphere decoder.

4.2.2 Frequency Selective Fading Channel

Wireless broadband systems usually encounter frequency-selective fading channels. We will derive the channel model and examine whether this model can be decoded by the sphere decoder. Assume a single-carrier multi-antenna (i.e. MIMO) system with $N_T$ transmit and $N_R$ receive antennas. The fading channel between the $\mu$th transmit antenna and the $\nu$th receive antenna is frequency-selective and can be modeled as:

$$h_{\nu\mu} = [h_{\nu\mu}(0), \ldots, h_{\nu\mu}(L)]^T \in \mathbb{C}^{(L+1) \times 1} \qquad (4.9)$$

where $L$ is the channel order. The input-output relationship is as follows:

1 0 and unit variance and γ denotes the average SNR per receive-antenna. We write the transmitted symbols and received symbols into (4.11)

x 1

Using (4.11) and (4.12), we can obtain (4.13):

Equation (4.13) is not in the form of (1.1), which the sphere decoder can decode directly. It can be converted into that form, as derived below [17].

Assume the information block $s = [s_1, \ldots, s_{N_I}]^T$ is mapped to the space-time code matrix $X$, which is transmitted through the $N_T$ transmit antennas in $N_x$ time slots. For the frequency-selective channel of order $L$, we define a set of channel matrices and derive the received space-time block code matrix $Y$, which is of size $N_R \times (N_x + L)$.

From equation (4.18), we know that space-time codes over frequency-selective fading channels can be decoded by the sphere decoder.

4.3 Space Frequency Coding

MIMO-OFDM systems have been identified as a promising approach for wideband systems with high transmission rate and high spectral efficiency [20]. Space-frequency (SF) codes and space-time-frequency (STF) codes are widely used in MIMO-OFDM systems. In section 4.3, SF codes are introduced first; STF codes are presented in section 4.4.

Assume a frequency-selective fading channel, and let $G$ denote the number of subcarriers spanned by a space-frequency block code. The input-output relationship is given by (4.19) [21].

There are two kinds of space-frequency block codes (SFBC): full-rate SFBC and full-diversity SFBC.

Full Rate SFBC – orthogonal design

For example, in case of the 2x2 orthogonal design [18]:

1 2

The symbol rate is one, and the received symbol at receive antenna $\nu$ can be expressed as (4.21), which we can rewrite in matrix form:

$$y^{\nu} = H^{\nu} x + w^{\nu} \qquad (4.22)$$

Stacking the $N_R$ receive antennas gives $y = Hx + w$, where $y = [y^1, y^2, \ldots, y^{N_R}]^T$, $w = [w^1, w^2, \ldots, w^{N_R}]^T$, and $H = [H^1, H^2, \ldots, H^{N_R}]^T$.

Full Rate SFBC – quasi-orthogonal design

Assuming four transmit antennas, the quasi-orthogonal code has the form [22]:

1 2 3 4

We can also write (4.25) in a matrix form that can be decoded by the sphere decoder. Define $y^{\nu} = [y^{\nu}(0), y^{\nu}(1)^*, y^{\nu}(2), y^{\nu}(3)^*]^T$ and $w^{\nu} = [w^{\nu}(0), w^{\nu}(1)^*, w^{\nu}(2), w^{\nu}(3)^*]^T$. Combining the received symbols, we obtain

$$y = Hx + w \qquad (4.27)$$

where $y = [y^1, y^2, \ldots, y^{N_R}]^T$, $w = [w^1, w^2, \ldots, w^{N_R}]^T$, and $H = [H^1, H^2, \ldots, H^{N_R}]^T$. From (4.27), we know that the quasi-orthogonal space-frequency code can be decoded by the sphere decoder.

The full-rate space-frequency codes can achieve only full spatial diversity (diversity order $N_R \times N_T$) and do not exploit frequency diversity. Full-diversity space-frequency codes were therefore proposed in [23]. We will check whether the full-diversity codes can be decoded by the sphere decoder.

Full Diversity SFBC

In [23], a full-diversity space-frequency code was proposed that can achieve a diversity order of $N_R \times N_T \times l$ ($1 \le l \le L$; $L$ is the channel order). For example, for $N_T = 2$, $N_R = 1$, and $L = 2$ with two-fold repetition, the diversity order is four.

So, we can decode full diversity SFBC by using the sphere decoder.

4.4 Space Time Frequency Coding

In this section, we examine space-time-frequency (STF) codes to see whether they are suitable for the sphere decoder. Assume the information symbols $s = [s_1, \ldots, s_{N_I}]^T \in \mathbb{C}^{N_I \times 1}$, and let $X$ be the space-time-frequency codeword comprising the symbols $x_n^{\mu}(p)$ with $\mu \in [1, N_T]$, $n \in [0, N_x - 1]$, and $p \in [0, N_c - 1]$, where $N_T$ denotes the number of transmit antennas, $N_x$ the number of OFDM blocks, and $N_c$ the number of subcarriers. Each STF codeword contains $N_T N_x N_c$ symbols. Figure 4.2 shows the STF codes.

Figure 4.2 Space Time Frequency codes [17]

Let $X = [X(0), X(1), \ldots, X(N_c - 1)] \in \mathbb{C}^{N_T \times N_c N_x}$, where $X(p)$ collects the symbols sent on the $p$th subcarrier. On each subcarrier, the channel is a flat fading MIMO channel. With the received matrix $Y(p) \in \mathbb{C}^{N_R \times N_x}$, whose entries are $[Y(p)]_{\nu n} = y_n^{\nu}(p)$, we can write the input-output relationship at the $p$th subcarrier as

[ ]

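For example, suppose there are four subcarriers, two transmit antennas, and two receive antennas. We can rewrite (4.32) in matrix form as:

$$\begin{bmatrix} Y(0) \\ Y(1) \\ Y(2) \\ Y(3) \end{bmatrix} = \begin{bmatrix} H(0) & 0 & 0 & 0 \\ 0 & H(1) & 0 & 0 \\ 0 & 0 & H(2) & 0 \\ 0 & 0 & 0 & H(3) \end{bmatrix} \begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \end{bmatrix} + \begin{bmatrix} W(0) \\ W(1) \\ W(2) \\ W(3) \end{bmatrix} \qquad (4.33)$$

Here 0 is the zero matrix of size 2x2, and we can rewrite (4.33) as Y = HX + W. This can be decoded by the sphere decoder, but if the number of subcarriers is large (512, 1024, 2048), the matrix is too large to decode. Subcarrier grouping of STF codes is therefore used to reduce the dimension. Suppose the number of subcarriers is an integer multiple of the channel length:

$$N_c = N_g (L + 1) \qquad (4.34)$$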

$N_g$ is the number of groups and $L$ is the channel order. The information symbols used in group $g$ are $\{s_g\}_{g=0}^{N_g - 1}$, obtained by dividing the original information symbol block $s = [s_1, \ldots, s_{N_I}]^T$. The codeword of the grouped STF code is

( 1)

From (4.36) we can decode grouped STF codes in a smaller dimension by the sphere decoder.

4.5 Discussion

In this chapter, we discussed ST, SF, and STF codes over flat fading and frequency-selective fading channels and derived the general form of the input-output relationship. We find that the sphere decoder can detect the following codes over different channels:

1. Space time block codes at flat fading and frequency selective fading MIMO channel.

2. Space frequency codes on MIMO-OFDM systems at frequency selective fading channel and flat fading channel.

3. Space time frequency codes at frequency selective fading channel.

4. Grouped Space time frequency codes in order to reduce the decoder complexity.

Chapter 5 Conclusion and Future Work

This thesis has analyzed the implementation of the sphere decoder on the Quixote DSP board for Rayleigh flat fading MIMO systems. We compared implementations of sphere decoding on DSP, FPGA, and ASIC to find a suitable hardware structure for our Quixote DSP board. Finally, we discussed the applications of the sphere decoder to space-time-frequency codes in MIMO-OFDM systems. The main conclusions can be summarized as follows:

The BER performance of the sphere decoder with full precision is the same as that of the brute-force decoder.

The sphere decoder we implemented is about 120 times faster than the brute-force method at high SNR and about 50 times faster at low SNR for 2x2 MIMO systems.

The fixed-point implementation can result in BER performance degradation at high SNR.

Since there are one TI DSP chip and one Xilinx Virtex-II FPGA on the Quixote DSP board, we proposed a hardware structure to realize the sphere decoder (Figure 3.13).

The sphere decoder is suitable for decoding space-time-frequency codes in MIMO-OFDM systems under Rayleigh flat fading and frequency-selective fading channels.

Chapter 6 Reference

[1] E. Viterbo and J. Boutros, “A universal lattice code decoder for fading channels,” IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1639–1642, July 1999.

[2] M. O. Damen, H. E. Gamal, and G. Caire, “On maximum likelihood detection and the search for the closest lattice point,” IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2389–2402, Oct. 2003.

[3] P. Wolniansky, G. Foschini, G. Golden, and R. Valenzuela, “V-BLAST: An Architecture for Realizing Very High Data Rates over the Rich-Scattering Wireless Channel,” Proc. ISSSE, pp. 295–300, Sept. 1998.

[4] U. Fincke and M. Pohst, “Improved Methods for Calculating Vectors of Short Length in a Lattice, Including a Complexity Analysis,” Mathematics of Computation, vol. 44, no. 170, pp. 463–471, Apr. 1985.

[5] B. Hassibi and H. Vikalo, “On the Sphere Decoding Algorithm. I. Expected Complexity,” IEEE Trans. Signal Processing, vol. 53, no. 8, pp. 2805–2818, Aug. 2005.

[6] M. O. Damen, H. E. Gamal, and G. Caire, “On Maximum-Likelihood Detection and the Search for the Closest Lattice Point,” IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2389–2402, Oct. 2003.

[7] O. Damen, A. Chkeif, and J.-C. Belfiore, “Lattice Code Decoder for Space-Time Codes,” IEEE Communications Letters, vol. 4, no. 5, pp. 161–163, May 2000.

[8] L. M. Davis, “Scaled and Decoupled Cholesky and QR Decompositions with Application to Spherical MIMO Detection,” Proc. IEEE WCNC, pp. 326–331, Mar. 2003.

[9] Chin-Yun Hung, “A Sphere Decoding Algorithm for MIMO Channels”, 2006.

[10] Innovative Quixote data sheet

[11] TMS320C6000 Programmer’s Guide, Literature Number SPRU198I, March

