Theory and design of near-optimal MIMO OFDM transmission system for correlated multipath Rayleigh fading channels


Kun-Chien Hung and David W. Lin

Abstract: We consider channel-coded multi-input multi-output (MIMO) orthogonal frequency-division multiplexing (OFDM) transmission and obtain a condition on its signal for it to attain the maximum diversity and coding gain. As this condition may not be realizable, we propose a suboptimal design that employs an orthogonal transform and a space-frequency interleaver between the channel coder and the multi-antenna OFDM transmitter. We propose a corresponding receiving method based on block turbo equalization. Attention is paid to some detailed design of the transmitter and the receiver to curtail the computational complexity and yet deliver good performance. Simulation results demonstrate that the proposed transmission technique can outperform the conventional coded MIMO OFDM and the MIMO block single-carrier transmission with cyclic prefixing.

Index Terms: Frequency-domain equalization, multi-input multi-output (MIMO), orthogonal frequency-division multiplexing (OFDM), precoding, space-time coding, turbo equalization.

I. INTRODUCTION

Orthogonal frequency-division multiplexing (OFDM) with multiple transmit and receive antennas has drawn much recent attention in research on high-speed transmission over multipath fading channels. To exploit more fully the inherent diversity under multi-input multi-output (MIMO) OFDM in fading channels, one usually needs to employ space-time and/or space-frequency coding [1], [2]. There is now abundant literature on space-time/frequency coding. Taking space-time coding as an example, the approaches can be divided into two broad categories: the coding approach (as represented by space-time trellis coding and space-time block coding) and the linear processing approach (as represented by linear constellation precoding for signal space diversity [3], [4]).

It has been shown that a key factor influencing the performance of space-time/frequency coded transmission is the determinant of the correlation matrix of pairwise codeword differences [5], [6]. In this paper, we show that the determinant is maximized when the correlation matrix is a multiple of the identity matrix; or in other words, when the codeword differences are “white.” In attempting to use this result in system design, however, we find that, when the block size of the system is large (the meaning of “block size” will become clear later), there is difficulty in achieving such whiteness. As a result, we resort to an approximately white design.

Manuscript received December 16, 2006.

The authors are with the Department of Electronics Engineering and Center for Telecommunications Research, National Chiao Tung University, Hsinchu, Taiwan 30010, ROC, email: hkc.ee90g@nctu.edu.tw, dwlin@mail.nctu.edu.tw.

This work was supported by the National Science Council of R.O.C. under grant no. NSC 95-2219-E-009-003.

For signal reception, we consider block-based turbo decision-feedback equalization (DFE) [7]–[9] to exploit the available diversity while keeping the complexity in check. Moreover, we propose a multi-stage technique for turbo DFE to further reduce the complexity. In association with the reduced-complexity turbo DFE, we propose a particular space-frequency transform (SFT) technique for quasi-whitening of the transmitted signal. The SFT combines an orthogonal transform (for which fast computational algorithms exist) and a certain way of space-frequency interleaving (SFI) whose details will be given later. Simulation results show that the proposed transmission technique significantly outperforms the conventional coded MIMO OFDM and the cyclic-prefixed single-carrier transmission in various channel conditions.

In summary, this paper considers the theory of optimal coded MIMO OFDM transmission and derives a near-optimal design. The contributions of the work include: 1) derivation of the necessary and sufficient condition on the space-time/frequency code to maximize the diversity gain in coded MIMO OFDM transmission over correlated multipath Rayleigh fading channels, 2) a particular technique of SFT that is nearly optimal in the above sense and facilitates signal reception employing reduced-complexity turbo DFE, and 3) an associated turbo DFE design for signal reception.

This paper is organized as follows. Section II describes the basic transmission system structure and the channel model. Section III derives the signal condition that achieves maximum diversity gain. Section IV proposes a transmitter design principle based on SFT that approximately yields the above signal condition. Section V discusses the receiver design based on turbo DFE. Built on it, Section VI proposes a particular way of SFT for the transmitter that facilitates reduced-complexity turbo DFE in the receiver. Section VII presents some simulation results. Section VIII concludes the paper.

II. SYSTEM MODEL

A. Transmission System Structure

Fig. 1 illustrates the structure of the considered transmission system. There are $N$ transmit and $M$ ($\geq N$) receive antennas. The binary source data $S$ are grouped into blocks containing $rKN\log_2(Q)$ bits each, where $r$ is the channel code rate, $K$ is the IDFT size, and $Q$ is the modulation order. After channel coding and modulation mapping, therefore, each block contains $KN$ signal samples. In principle, the channel coding is not restricted to any particular type. For example, it could be bit-interleaved coded modulation (BICM) employing convolutional coding, trellis coded modulation, or turbo coding. The coded signal samples then pass through the transform block whose design lies at the heart of the present study. After the transform, the signal samples are divided into $N$ streams of $K$ samples each, with each stream undergoing a $K$-point IDFT and cyclic prefixing prior to being transmitted through one of the $N$ antennas. In the receiver, the $K$ received samples at each of the $M$ antennas for each signal block are subject to cyclic prefix removal and DFT before entering the signal detector. Possible approaches to signal detection include maximum-likelihood (ML) detection and turbo equalization (including turbo DFE) [7]–[9].

Fig. 1. Structure of the MIMO OFDM system considered.

1229-2370/07/$10.00 © 2007 KICS

B. Channel Model

Assume perfect synchronization. The signal samples in each block, omitting the cyclic prefix, can be organized in several ways for mathematical manipulation. For example, we may arrange the concurrently transmitted or received samples into a column vector, or we may arrange the K-sample time sequence transmitted or received by one antenna into a column vector. Each of the two representations has its advantage; we consider both below.

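First, consider representing the sample sequence at each antenna by a column vector. Let $\mathbf{x}_n$ be the vector transmitted by antenna $n$ and $\mathbf{r}_m$ that received by antenna $m$. Then,

$$\mathbf{r}_m = \sum_{n=1}^{N} \mathbf{H}_{mn}\mathbf{x}_n + \mathbf{n}_m = \mathbf{H}_m\mathbf{x} + \mathbf{n}_m \tag{1}$$

where (assuming that the channel stays unchanged over one signal block and that the cyclic prefix is long enough to cover the delay spread) $\mathbf{H}_{mn}$ is a circulant matrix whose first column gives the channel impulse response from transmit antenna $n$ to receive antenna $m$, $\mathbf{n}_m$ is the vector of additive noise, $\mathbf{H}_m = [\mathbf{H}_{m1}, \mathbf{H}_{m2}, \cdots, \mathbf{H}_{mN}]$, and $\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \cdots, \mathbf{x}_N^T]^T$, with superscript $T$ denoting matrix transpose. We assume that the noise is white Gaussian. Alternatively, since $\mathbf{H}_{mn}\mathbf{x}_n = \mathbf{X}_n\mathbf{h}_{mn}$, where $\mathbf{X}_n$ is the circulant matrix whose first column is given by $\mathbf{x}_n$ and $\mathbf{h}_{mn}$ is the first column of $\mathbf{H}_{mn}$, we have

$$\mathbf{r}_m = \sum_{n=1}^{N} \mathbf{X}_n\mathbf{h}_{mn} + \mathbf{n}_m = \mathbf{X}\mathbf{h}_m + \mathbf{n}_m \tag{2}$$

where $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \cdots, \mathbf{X}_N]$ and $\mathbf{h}_m = [\mathbf{h}_{m1}^T, \mathbf{h}_{m2}^T, \cdots, \mathbf{h}_{mN}^T]^T$.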
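The identity that swaps the roles of the channel and the signal, i.e., that the circulant channel matrix times the signal vector equals the circulant signal matrix times the channel vector, is the commutativity of circular convolution. A minimal numerical check is sketched below, assuming a toy block length of 4 and made-up tap and sample values; the helper names `circulant` and `matvec` are illustrative and do not come from the paper.

```python
def circulant(first_col):
    """Return the K x K circulant matrix whose first column is first_col."""
    K = len(first_col)
    return [[first_col[(i - j) % K] for j in range(K)] for i in range(K)]


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Assumed toy values: a 3-tap channel zero-padded to K = 4, and one signal block.
h = [1.0, 0.5, 0.25, 0.0]
x = [1.0, -1.0, 1.0, 1.0]

Hx = matvec(circulant(h), x)   # channel matrix acting on the signal
Xh = matvec(circulant(x), h)   # signal matrix acting on the channel
```

Both products are the circular convolution of `h` and `x`, so they agree entry by entry; this is what allows the received vector to be written with either the channel or the signal in matrix form.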

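Assume that the length of the channel response is $L$ samples where $L < K$. Then the last $K - L$ elements of each of $\mathbf{h}_{mi}$, $i = 1, 2, \cdots, N$, are zero. They can be excised from $\mathbf{h}_m$ to result in a vector, denoted $\tilde{\mathbf{h}}_m$, of smaller dimension, yielding

$$\mathbf{r}_m = \mathbf{X}\mathbf{D}\tilde{\mathbf{h}}_m + \mathbf{n}_m \tag{3}$$

where $\mathbf{D} = \mathbf{I}_{K\times L} \otimes \mathbf{I}_N$ with $\mathbf{I}_N$ being the $N \times N$ identity matrix, $\mathbf{I}_{K\times L}$ the $K \times L$ matrix composed of the first $L$ columns of $\mathbf{I}_K$, and $\otimes$ the Kronecker product. Stacking up $\mathbf{r}_m$, $\tilde{\mathbf{h}}_m$, and $\mathbf{n}_m$, respectively, into higher-dimensional vectors as $\mathbf{r} = [\mathbf{r}_1^T, \mathbf{r}_2^T, \cdots, \mathbf{r}_M^T]^T$, $\mathbf{h} = [\tilde{\mathbf{h}}_1^T, \tilde{\mathbf{h}}_2^T, \cdots, \tilde{\mathbf{h}}_M^T]^T$, and $\mathbf{n} = [\mathbf{n}_1^T, \mathbf{n}_2^T, \cdots, \mathbf{n}_M^T]^T$, we obtain

$$\mathbf{r} = (\mathbf{X}\mathbf{D} \otimes \mathbf{I}_M)\,\mathbf{h} + \mathbf{n}. \tag{4}$$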

Now consider representing the concurrently transmitted or received samples by a column vector. Then, the input-output relation of the MIMO channel can be written as

$$\mathbf{r}(k) = \sum_{l=0}^{L-1} \mathbf{H}_l\,\mathbf{x}(k-l) + \mathbf{n}(k) \tag{5}$$

where $\mathbf{r}(k)$ is the received $M$-vector sequence, $\mathbf{x}(k)$ is the transmitted $N$-vector sequence, $\mathbf{H}_l$ is the $M \times N$ matrix channel impulse response (whose $(m, n)$th element is the $(l+1)$th element in $\mathbf{h}_{mn}$), and $\mathbf{n}(k)$ is the $M$-vector sequence of noise. For quasi-static correlated multipath Rayleigh fading channels, $\mathbf{H}_l$ can be modeled as [10], [11]

$$\mathbf{H}_l = \mathbf{R}_l^{1/2}\mathbf{A}_l\mathbf{T}_l^{1/2}, \quad l = 0, 1, \cdots, L-1 \tag{6}$$

where $\mathbf{A}_l$ is an $M \times N$ matrix with zero-mean complex Gaussian entries, $\mathbf{T}_l$ is the $N \times N$ matrix summarizing the pairwise correlations between the channel responses associated with different transmit antennas at delay $l$, and $\mathbf{R}_l$ is the $M \times M$ matrix summarizing the pairwise correlations between the channel responses associated with different receive antennas at delay $l$.
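To make the correlated tap model concrete, the following sketch draws one 2 x 2 channel tap as the product of a receive-side square root, an i.i.d. complex Gaussian matrix, and a transmit-side square root. It rests on several assumptions not stated in the text: both correlation matrices are 2 x 2 with unit diagonal and a single real off-diagonal value, the Gaussian entries have unit variance (the per-tap power scaling is left out), and the function names are hypothetical.

```python
import math
import random


def sqrt_equicorr_2x2(rho):
    """Symmetric square root of [[1, rho], [rho, 1]] in closed form."""
    s, t = math.sqrt(1 + rho), math.sqrt(1 - rho)
    a, b = (s + t) / 2, (s - t) / 2
    return [[a, b], [b, a]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def correlated_tap(rho_rx, rho_tx, rng):
    """One channel tap: R^(1/2) * A * T^(1/2) with i.i.d. complex Gaussian A."""
    sigma = math.sqrt(0.5)  # assumed: unit total variance per complex entry
    A = [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(2)]
         for _ in range(2)]
    return matmul(matmul(sqrt_equicorr_2x2(rho_rx), A), sqrt_equicorr_2x2(rho_tx))


rng = random.Random(0)            # fixed seed so the sketch is repeatable
H0 = correlated_tap(0.5, 0.5, rng)

# Sanity check of the square root: squaring it should give back the correlation matrix.
R_half = sqrt_equicorr_2x2(0.5)
R_check = matmul(R_half, R_half)
```

The closed-form square root only applies to the 2 x 2 equal-diagonal case; a general correlation matrix would need an eigendecomposition or a Cholesky factor instead.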

III. CONDITION FOR MAXIMUM DIVERSITY GAIN

In this section, we look into the condition on transmitted signals which would yield the maximum diversity gain under multipath MIMO Rayleigh fading. We approach this by minimizing the pairwise codeword error probability (PEP). The result will be used to guide the system design in later sections.

A. PEP of Coded and Transformed MIMO OFDM under Multipath Rayleigh Fading

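In designing coded transmission systems, one often seeks to minimize the worst PEP rather than the average error probability because the latter is often difficult to compute and because the transmission performance is often dominated by the worst PEP. Consider two codewords $\mathbf{c}$ and $\hat{\mathbf{c}}$. With slight abuse of the notation $\mathbf{X}$ defined in the last section, let $\mathbf{X}(\mathbf{c})$ and $\mathbf{X}(\hat{\mathbf{c}})$ denote the transmitted signal matrices corresponding to $\mathbf{c}$ and $\hat{\mathbf{c}}$, respectively. After channel transmission, the PEP between these codewords under ML detection is given by [1]

$$P(\mathbf{c} \to \hat{\mathbf{c}}) = Q\left(\sqrt{\frac{E_s}{2N_0}}\,\left\|(\mathbf{E}\mathbf{D} \otimes \mathbf{I}_M)\,\mathbf{h}\right\|\right) \tag{7}$$

where $E_s$ is the average symbol energy, $N_0$ is the power spectral density (PSD) of the additive white Gaussian noise (AWGN), and $\mathbf{E} = \mathbf{X}(\mathbf{c}) - \mathbf{X}(\hat{\mathbf{c}})$. The matrix $\mathbf{E}$ has been termed the error matrix of the codeword pair. Therefore, the worst PEP is minimized by maximizing the Euclidean distance

$$d(\mathbf{c}, \hat{\mathbf{c}}) \triangleq \left\|(\mathbf{E}\mathbf{D} \otimes \mathbf{I}_M)\,\mathbf{h}\right\|. \tag{8}$$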

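To continue, let $\mathbf{h}(l) = \mathrm{vec}\{\mathbf{H}_l\}$ and $\boldsymbol{\alpha}(l) = \mathrm{vec}\{\mathbf{A}_l\}$ where $\mathrm{vec}\{\cdot\}$ means stacking up of the columns of its matrix argument into a vector. Then, from (6) we have

$$\mathbf{h}(l) = \left(\mathbf{T}_l^{1/2} \otimes \mathbf{R}_l^{1/2}\right)\boldsymbol{\alpha}(l). \tag{9}$$

Now, let

$$\mathbf{h}' = \left[\mathbf{h}^T(0), \mathbf{h}^T(1), \cdots, \mathbf{h}^T(L-1)\right]^T = \left(\bigoplus_{l=0}^{L-1}\mathbf{G}_l^{1/2}\right)\left[\boldsymbol{\alpha}^T(0), \boldsymbol{\alpha}^T(1), \cdots, \boldsymbol{\alpha}^T(L-1)\right]^T \triangleq \mathbf{G}^{1/2}\boldsymbol{\alpha} \tag{10}$$

where $\mathbf{G}_l^{1/2} = \mathbf{T}_l^{1/2} \otimes \mathbf{R}_l^{1/2}$ and $\bigoplus_{l=0}^{L-1}\mathbf{G}_l^{1/2}$ stands for a block-diagonal matrix with block-diagonal entries $\mathbf{G}_0^{1/2}, \mathbf{G}_1^{1/2}, \cdots, \mathbf{G}_{L-1}^{1/2}$. It is easy to see that $\mathbf{h}'$ and $\mathbf{h}$ are related by permutation. Therefore,

$$\mathbf{h} = \mathbf{Z}\mathbf{G}^{1/2}\boldsymbol{\alpha} \tag{11}$$

where $\mathbf{Z}$ is an $MNL \times MNL$ permutation matrix. We remark that the above way of decomposing a correlated channel response $\mathbf{h}$ into the product of a vector of i.i.d. (independent and identically distributed) random elements and a matrix that characterizes the correlation has been adopted in some studies addressing channel conditions closely related to that considered in the present work.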

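Putting the above results together, we have

$$d^2(\mathbf{c}, \hat{\mathbf{c}}) = \left\|(\mathbf{E}\mathbf{D} \otimes \mathbf{I}_M)\,\mathbf{h}\right\|^2 = \mathbf{h}^H\left(\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D} \otimes \mathbf{I}_M\right)\mathbf{h} = \boldsymbol{\alpha}^H\mathbf{C}_e\boldsymbol{\alpha} \tag{12}$$

where superscript $H$ denotes Hermitian transpose and

$$\mathbf{C}_e \triangleq \left(\mathbf{G}^{1/2}\right)^H\mathbf{Z}^H\left(\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D} \otimes \mathbf{I}_M\right)\mathbf{Z}\mathbf{G}^{1/2}. \tag{13}$$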

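Taking the average over all channel realizations and applying known results about the Chernoff bound of the PEP under Rayleigh fading, we have [1]

$$\bar{P}(\mathbf{c} \to \hat{\mathbf{c}}) = E_{\boldsymbol{\alpha}}\{P(\mathbf{c} \to \hat{\mathbf{c}})\} \leq \prod_{i=1}^{r(\mathbf{C}_e)}\left(1 + \frac{E_s}{4N_0}\lambda_i(\mathbf{C}_e)\right)^{-1} = \left|\mathbf{I} + \frac{E_s}{4N_0}\mathbf{C}_e\right|^{-1} \tag{14}$$

where $r(\mathbf{C}_e)$ denotes the rank of $\mathbf{C}_e$, $\lambda_i(\mathbf{C}_e)$ denotes the $i$th eigenvalue of $\mathbf{C}_e$, $\mathbf{I}$ denotes an identity matrix, and the last equality holds if $\mathbf{C}_e$ has full rank. Moreover, when $\mathbf{C}_e$ has full rank and the signal-to-noise ratio (SNR) is high, we have the approximate Chernoff bound

$$\bar{P}(\mathbf{c} \to \hat{\mathbf{c}}) \leq \left(\frac{E_s}{4N_0}\right)^{-MNL}|\mathbf{C}_e|^{-1}. \tag{15}$$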
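As a quick illustration of how the product-form Chernoff bound relates to its high-SNR approximation, the sketch below evaluates both for a hypothetical set of eigenvalues. The eigenvalues, the 30 dB operating point, and the function names are assumptions chosen only for illustration.

```python
import math


def chernoff_bound(eigs, es_over_n0):
    """Product of 1 / (1 + (Es / 4N0) * lambda_i) over the nonzero eigenvalues."""
    g = es_over_n0 / 4.0
    return math.prod(1.0 / (1.0 + g * lam) for lam in eigs if lam > 0)


def high_snr_bound(eigs, es_over_n0):
    """(Es / 4N0)^(-n) / det, valid only when all n eigenvalues are nonzero."""
    g = es_over_n0 / 4.0
    return g ** (-len(eigs)) / math.prod(eigs)


eigs = [2.0, 1.0, 0.5, 0.5]     # hypothetical full-rank spectrum, so MNL = 4 here
snr = 10 ** (30 / 10)           # Es/N0 of 30 dB, an assumed operating point

exact = chernoff_bound(eigs, snr)
approx = high_snr_bound(eigs, snr)
ratio = exact / approx          # tends to 1 as the SNR grows
```

The approximation drops the "1 +" in every factor, so it always sits slightly above the product form and the two converge as the SNR increases. The exponent equals the number of nonzero eigenvalues, which is why full rank is needed for the maximum diversity order.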

B. Principle of Design for Maximum Diversity Gain

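Now consider how to design the system to minimize the average PEP under a given channel coding scheme. From the above results, we see that it is appropriate to consider maximizing $|\mathbf{C}_e|$. Now, given any unitary linear transform, if $d(\mathbf{c}, \hat{\mathbf{c}}) = \|\mathbf{c} - \hat{\mathbf{c}}\| = d_0$, then it can be easily shown that $\mathrm{tr}\left(\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}\right) = Ld_0^2$ where $\mathrm{tr}(\cdot)$ takes the trace of its argument. Therefore, the optimization problem can be stated as

$$\max\,|\mathbf{C}_e| \quad \text{s.t.} \quad \mathrm{tr}\left(\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}\right) = Ld_0^2. \tag{16}$$

Towards a solution, note first that

$$|\mathbf{C}_e| = \left|\left(\mathbf{G}^{1/2}\right)^H\mathbf{Z}^H\mathbf{Z}\mathbf{G}^{1/2}\right|\,\left|\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D} \otimes \mathbf{I}_M\right| = |\mathbf{G}|\,\left|\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}\right|^M. \tag{17}$$

Hence, maximization of $\left|\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}\right|$ maximizes $|\mathbf{C}_e|$. Now,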

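the Hadamard inequality gives

$$\left|\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}\right| \leq \prod_{j=1}^{NL}\epsilon_j \tag{18}$$

where $\epsilon_j$ is the $j$th diagonal term of $\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}$ and the equality holds if and only if $\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}$ is diagonal. By the arithmetic-geometric mean inequality, we have

$$\prod_{j=1}^{NL}\epsilon_j \leq \left(\frac{1}{NL}\sum_{j=1}^{NL}\epsilon_j\right)^{NL} = \left(\frac{\mathrm{tr}\left(\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}\right)}{NL}\right)^{NL} = \left(\frac{d_0^2}{N}\right)^{NL} \tag{19}$$

where equality in the first relation holds if and only if $\epsilon_j$ is the same for all $j$. In summary, therefore, combining (17)–(19),

$$|\mathbf{C}_e| \leq \left(\frac{d_0^2}{N}\right)^{MNL}|\mathbf{G}| = \left(\frac{d_0^2}{N}\right)^{MNL}\prod_{l=0}^{L-1}|\mathbf{R}_l|^N|\mathbf{T}_l|^M \tag{20}$$

where equality in the first relation holds iff $\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}$ is a constant times the identity matrix or equivalently iff $\mathbf{E}^H\mathbf{E}$ is a constant times the identity matrix.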
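The two inequalities used in this argument can be seen at work on a tiny example. The sketch below compares three hypothetical 2 x 2 positive definite matrices with the same trace, standing in for the matrix whose determinant is being maximized (so the product NL equals 2 here); the matrices and helper names are made up for illustration.

```python
def det2(a):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def trace2(a):
    """Trace of a 2 x 2 matrix."""
    return a[0][0] + a[1][1]


# Three candidates with identical trace (the constraint in the optimization).
candidates = {
    "white": [[1.0, 0.0], [0.0, 1.0]],        # constant times identity
    "uneven": [[1.5, 0.0], [0.0, 0.5]],       # diagonal, unequal entries
    "correlated": [[1.0, 0.5], [0.5, 1.0]],   # equal diagonal, nonzero off-diagonal
}

traces = {name: trace2(m) for name, m in candidates.items()}
dets = {name: det2(m) for name, m in candidates.items()}
best = max(dets, key=dets.get)
```

The "uneven" matrix loses determinant through the arithmetic-geometric mean step, the "correlated" one through the Hadamard step, and only the scaled identity attains the upper bound.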

In conclusion, to attain the maximum diversity and coding gain, the system should be designed such that $\mathbf{E}^H\mathbf{E}$ is a constant times the identity matrix.

Fig. 2. Transmitter with two-stage transform.

C. Performance of Coded MIMO OFDM without Transform

In light of the above results, we comment on the performance of coded MIMO OFDM that does not have a transform between channel coding and the IDFT units in the transmitter. This design can be shown to yield relatively poor performance.

First, when there is no transform, the highest rank of $\mathbf{E}^H\mathbf{E}$ is given by the minimum Hamming distance of the channel code. If this distance is smaller than $NL$, then $\mathbf{C}_e$ does not have full rank and the system cannot achieve maximum diversity gain. Second, if the IDFT size is large and the Hamming distance between codewords is small, then the nonzero elements of the corresponding error matrix are located sparsely in the space-frequency domain. This makes the eigenvalues of $\mathbf{D}^H\mathbf{E}^H\mathbf{E}\mathbf{D}$ highly uneven in magnitude, which deviates significantly from the optimal condition that the eigenvalues be identical. Therefore, the performance of coded MIMO OFDM without transform can be highly suboptimal.

IV. APPROACH TO PRACTICAL TRANSFORM DESIGN

We have shown that the optimal transform is a “white” one in the sense that $\mathbf{E}^H\mathbf{E} = d_0^2\mathbf{I}$. When the transform order is large, however, one may not be able to find a transform that satisfies the equality exactly. Hence, we seek an “approximately white” design that gives $\left\|\mathbf{E}^H\mathbf{E} - d_0^2\mathbf{I}\right\| \ll \left\|d_0^2\mathbf{I}\right\|$.

In fact, several transformed transmission methods have been proposed in different contexts to achieve different objectives, for example, the energy-spreading transform (EST) for MIMO transmission [9], [12] and the linear constellation precoding (LCP) for MIMO or multi-carrier CDMA transmission [3]. In our context, cyclic-prefixed block single-carrier transmission can also be treated as a transformed OFDM system. Here, we present a transform design that has relatively low complexity and can help lower the complexity of the receiver.

Fig. 2 illustrates the proposed transmitter structure, where the transform has a two-stage architecture. The first stage involves $N$ orthogonal transforms and the second stage involves an SFI operation. For ease of implementation, the first stage may employ a simple orthogonal transform (such as the Hadamard transform) or one for which fast computation methods exist (such as the DFT). Ideally, the orthogonal transforms should spread the coded symbols over all the subcarriers so that any differential codeword $\mathbf{c} - \hat{\mathbf{c}}$ is not contained in a few subcarriers but is spread over the entire transmission band. Then, the SFI randomizes the distribution of the codeword energy (already spread across the frequencies by the orthogonal transform) in the spatial dimension to further exploit the spatial as well as the multipath diversity. Through this two-stage transform, we attain the approximate whitening effect on the transmitted signal.

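Mathematically, the transmitted signal “super-vector” $\mathbf{x}$ (see Section II for definition) with the proposed SFT is given by

$$\mathbf{x} = \left(\mathbf{W}^H \otimes \mathbf{I}_N\right)\mathbf{P}\left(\mathbf{T} \otimes \mathbf{I}_N\right)\mathbf{X} \triangleq \left(\mathbf{W}^H \otimes \mathbf{I}_N\right)\mathbf{S}\mathbf{X} \tag{21}$$

where $\mathbf{X}$ is the modulated version of $\mathbf{c}$, $\mathbf{W}$ and $\mathbf{T}$ are the DFT and the orthogonal transform matrices, respectively, and $\mathbf{P}$ is the SFI permutation matrix.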
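A rough sketch of the two-stage transform follows, showing how a codeword difference confined to a single symbol ends up spread evenly in magnitude over many space-frequency positions. It assumes a Hadamard first stage with a power-of-two block length, uses a seeded random shuffle as a stand-in for the SFI (the actual interleaver design is given later in the paper), and leaves out the per-antenna IDFT. The names `fwht` and `sft` are illustrative.

```python
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform of a list of length 2**m."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = n ** -0.5
    return [val * scale for val in v]


def sft(streams, perm):
    """Stage 1: one orthogonal transform per stream. Stage 2: permute all N*K samples."""
    n_streams, k = len(streams), len(streams[0])
    spread = [s for stream in streams for s in fwht(stream)]
    mixed = [spread[perm[i]] for i in range(n_streams * k)]
    return [mixed[n * k:(n + 1) * k] for n in range(n_streams)]


N, K = 2, 8
perm = list(range(N * K))
random.Random(1).shuffle(perm)    # hypothetical pseudo-random interleaver

# Codeword difference confined to one symbol of stream 0, with energy 4.
diff = [[0.0] * K for _ in range(N)]
diff[0][3] = 2.0

out = sft(diff, perm)
energy_out = sum(abs(s) ** 2 for row in out for s in row)
nonzero = sum(1 for row in out for s in row if abs(s) > 1e-12)
```

Because both stages are orthogonal, the energy of the difference is preserved, but it now occupies K positions of equal magnitude instead of one, which is the spreading effect the whitening argument relies on.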

V. BLOCK TURBO DFE

For signal detection, we consider block turbo DFE. We divide the discussion into two sections. In this section, we consider how block turbo DFE operates under MIMO OFDM in general. Then in the next section, we consider in greater detail how it works together with SFI. It will become clear that, unlike the conventional turbo equalizer that operates in the time domain, the proposed block turbo DFE operates in the frequency domain.

A. Block Turbo DFE as Iterative Solution to Constrained Least-Square Problem

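Employing earlier notations, the signal propagation behavior of a coded MIMO OFDM system can be described as

$$\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{22}$$

where (with slight abuse of the notation) $\mathbf{x} = \mathbf{x}(\mathbf{c})$ for some codeword $\mathbf{c}$ and

$$\mathbf{H} = \begin{bmatrix} \mathbf{H}_{11} & \mathbf{H}_{12} & \cdots & \mathbf{H}_{1N} \\ \mathbf{H}_{21} & \mathbf{H}_{22} & \cdots & \mathbf{H}_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{H}_{M1} & \mathbf{H}_{M2} & \cdots & \mathbf{H}_{MN} \end{bmatrix} \tag{23}$$

with the entries $\mathbf{H}_{ij}$ being circulant matrices. When $\mathbf{n}$ is white Gaussian and $\mathbf{H}$ is known, the ML detection can be formulated as a constrained least-square (CLS) problem which seeks to minimize the cost function

$$J(\mathbf{c}) = \left\|\mathbf{r} - \mathbf{H}\mathbf{x}(\mathbf{c})\right\|^2 \tag{24}$$

subject to the constraint that $\mathbf{c}$ is a valid codeword.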

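A block turbo DFE estimates the transmitted signal in the following way:

$$\hat{\mathbf{x}}^{k} = \mathbf{F}\mathbf{r} + \mathbf{B}\bar{\mathbf{x}}^{k-1} \tag{25}$$

where $k$ is the iteration count, $\bar{\mathbf{x}}^{k-1}$ is the decision output of the $(k-1)$th iteration (which could be the soft output from the channel decoder), $\hat{\mathbf{x}}^{k}$ is the signal estimate of the $k$th iteration, and $\mathbf{F}$ and $\mathbf{B}$ are the feedforward filter (FFF) matrix and the feedback filter (FBF) matrix, respectively. Taking the gradient descent approach [13], [14], we obtain the DFE coefficients as

$$\mathbf{F} = \mu\mathbf{H}^H, \quad \mathbf{B} = \mathbf{I} - \mu\mathbf{H}^H\mathbf{H} \tag{26}$$

where $\mu = KN/\|\mathbf{H}\|_F^2$ with $\|\mathbf{H}\|_F^2 = \mathrm{tr}\{\mathbf{H}^H\mathbf{H}\}$, i.e., the squared Frobenius norm of $\mathbf{H}$. With this set of DFE coefficients, the equalizer output of the $k$th iteration can also be written as

$$\hat{\mathbf{x}}^{k} = \bar{\mathbf{x}}^{k-1} + \mu\mathbf{H}^H\left(\mathbf{r} - \mathbf{H}\bar{\mathbf{x}}^{k-1}\right). \tag{27}$$

If the decision output of the $(k-1)$th iteration is error-free, i.e., if $\bar{\mathbf{x}}^{k-1} = \mathbf{x}(\mathbf{c})$, then the block DFE cancels the intersymbol interference (ISI) completely.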
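The gradient-descent form of the update can be exercised on a small example. The sketch below assumes a real-valued 3 x 3 circulant toy channel (so the Hermitian transpose reduces to the ordinary transpose), no noise, and a step size equal to the matrix dimension divided by the squared Frobenius norm; all values and helper names are illustrative.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Assumed toy channel: circulant, real-valued, single antenna pair.
H = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]
Ht = transpose(H)
G = matmul(Ht, H)                                  # Gram matrix H^T H
size = len(G)
mu = size / sum(G[i][i] for i in range(size))      # dimension / squared Frobenius norm


def dfe_step(r, x_prev):
    """One unshaped update: previous decision plus mu * H^T * residual."""
    residual = [ri - yi for ri, yi in zip(r, matvec(H, x_prev))]
    correction = matvec(Ht, residual)
    return [xp + mu * c for xp, c in zip(x_prev, correction)]


x = [1.0, -1.0, 1.0]            # transmitted block
r = matvec(H, x)                # noise-free received block
x_hat = dfe_step(r, x)          # feed back error-free decisions

# Average diagonal entry of the feedback matrix I - mu * H^T H.
mean_diag_B = sum(1.0 - mu * G[i][i] for i in range(size)) / size
```

With error-free feedback the residual vanishes and the output reproduces the transmitted block. The chosen step size also makes the feedback matrix have zero average diagonal, so a decision is not fed straight back onto itself on average.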

In the initial iterations, however, $\bar{\mathbf{x}}^{k-1}$ may contain many or large decision errors. Such errors could affect adversely the signal estimate $\hat{\mathbf{x}}^{k}$ in the $k$th iteration, especially if $\|\mathbf{B}\|_F^2$ is large. To alleviate such effects, therefore, we consider the generalized gradient descent approach which improves the convergence property by “conditioning” the iterative updates with a shaping or preconditioning filter as [15]

$$\hat{\mathbf{x}}^{k} = \bar{\mathbf{x}}^{k-1} + \mathbf{C}\mathbf{H}^H\left(\mathbf{r} - \mathbf{H}\bar{\mathbf{x}}^{k-1}\right). \tag{28}$$

In other words, we modify the DFE filters to

$$\mathbf{F} = \mathbf{C}\mathbf{H}^H, \quad \mathbf{B} = \mathbf{I} - \mathbf{C}\mathbf{H}^H\mathbf{H}. \tag{29}$$

According to [15], $\mathbf{C}$ should be Hermitian symmetric and positive semi-definite and commute with $\mathbf{H}^H\mathbf{H}$. An example is given by the quasi-Newton method [16] where

$$\mathbf{C} = \mu\left(\mathbf{H}^H\mathbf{H} + \alpha\mathbf{I}\right)^{-1} \tag{30}$$

with $\alpha$ being some constant.

B. MMSE Shaping Filtering

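We consider using a shaping filter that minimizes the mean-square error. Let $\mathbf{e}^{k-1} = \bar{\mathbf{x}}^{k-1} - \mathbf{x}(\mathbf{c})$. Then, the error in equalizer output is given by

$$\mathbf{w}^{k} \triangleq \hat{\mathbf{x}}^{k} - \mathbf{x}(\mathbf{c}) = \mathbf{C}\mathbf{H}^H\mathbf{n} + \left(\mathbf{I} - \mathbf{C}\mathbf{H}^H\mathbf{H}\right)\mathbf{e}^{k-1}. \tag{31}$$

In minimum mean-square error (MMSE) shaping filtering, we seek to minimize $E\{\|\mathbf{w}^{k}\|^2\}$. In addition, in order to avoid direct error feedback, we should constrain the average of the diagonal elements of $\mathbf{B}$ to zero.

To proceed, let $\alpha = \sigma_n^2/\sigma_e^2$, i.e., the ratio of noise variance to the variance of the decision error $\mathbf{e}^{k-1}$. Note that both $\sigma_e^2$ and $\alpha$ are functions of $k$. But for notational convenience we have omitted explicit indication of this dependence. In the Appendix, we show that the MMSE shaping filter is given by

$$\mathbf{C} = \mu\left(\mathbf{H}^H\mathbf{H} + \alpha\mathbf{I}\right)^{-1} \tag{32}$$

where

$$\mu = \frac{KN}{\mathrm{tr}\left\{\left(\alpha\mathbf{I} + \mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H\mathbf{H}\right\}}. \tag{33}$$

We see that the MMSE shaping filter has the same form as the quasi-Newton shaping filter.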
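For a single-antenna circulant channel the Gram matrix is diagonalized by the DFT, so the shaping filter can be evaluated eigenvalue by eigenvalue. The sketch below does this for a hypothetical set of subcarrier power gains and checks two properties: the zero-average-diagonal constraint on the feedback filter, and the behavior for very large alpha. The gains and the function name `mmse_shaping` are assumptions made for illustration.

```python
def mmse_shaping(gains, alpha):
    """Per-subcarrier shaping filter for a SISO circulant channel.

    gains[k] is the squared magnitude of the channel response on subcarrier k,
    i.e., the k-th eigenvalue of the Gram matrix. Returns (mu, c), where c[k]
    is the corresponding eigenvalue of the shaping filter.
    """
    K = len(gains)
    mu = K / sum(g / (g + alpha) for g in gains)   # trace formula, eigenvalue form
    c = [mu / (g + alpha) for g in gains]
    return mu, c


gains = [4.0, 1.0, 0.25, 0.01]        # hypothetical subcarrier power gains

mu, c = mmse_shaping(gains, alpha=1.0)
b = [1.0 - ck * g for ck, g in zip(c, gains)]   # eigenvalues of the feedback filter
mean_b = sum(b) / len(b)

# Very large alpha (small decision-error variance): compare with the unshaped step size.
mu_big, c_big = mmse_shaping(gains, alpha=1e9)
unshaped = len(gains) / sum(gains)    # K divided by the squared Frobenius norm
```

The normalization by the trace is exactly what forces the feedback filter to have zero average diagonal. As alpha grows, every eigenvalue of the shaping filter tends to the unshaped step size, so the shaped equalizer approaches the unshaped one.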

In the case of single-input single-output systems, the above result is the same as that in [8]. But our result is more general in that it applies to MIMO systems. Note, in addition, that when the various quantities converge with iterations, $\sigma_e^2$ reduces to zero and $\alpha$ approaches infinity. Then, the shaped DFE coefficients also approach those without shaping.

A major drawback of MMSE shaped turbo DFE is, of course, the heavy complexity burden associated with changing the filter coefficients with each iteration. We consider reducing the computational complexity in the next subsection.

C. Employing Fixed Shaping Filter for Reduced Complexity

To reduce the computational complexity, we consider fixing the shaping filter during the iterations. This results in a three-stage algorithm: 1) Initialization: Perform MMSE block linear equalization (i.e., let $\mathbf{B} = \mathbf{0}$), because no decision output is available at this time. 2) Middle stage: Use a fixed shaping filter that can tolerate a large range of decision error power. 3) Final stage: Use the unshaped DFE filters.

Based on the foregoing results, the determination of the shaping filter for the second stage reduces to the choice of a suitable operating value for $\alpha$. For this, we do not have a theoretically optimal formula, but only some rules of thumb. Experience shows that underestimation of $\alpha$ would not cause significant enhancement of total MSE when the true $\alpha$ is large enough. In contrast, if the true $\alpha$ is small, then overestimation of it would cause great increase of the MSE. This phenomenon is heuristically reasonable, because (true) $\alpha$ is defined to be equal to $\sigma_n^2/\sigma_e^2$. Assuming a smaller value for $\alpha$ than its true value is tantamount to assuming a less converged state, which may result in some slowdown in the convergence speed but would not likely cause stability problems. On the other hand, assuming a larger value for $\alpha$ than its true value means being over-optimistic on the convergence status, which would more likely cause performance degradation. Therefore, we choose to use a reasonably small value in the place of $\alpha$. By experiment, we find that a suitable range of its values is 0.5–2, with unity being a good choice.

To decide whether to switch from the second stage to the final stage, we observe the variance of the log-likelihood ratio (LLR) of the decoded codeword. When the variation in the variance over two successive iterations is small, we assume that the equalizer has converged sufficiently and make the switch. Note that the LLR variance indicates the reliability of the decoded bits [17]. When its value over a number of iterations in stage 2 (shaped block turbo DFE) is even smaller than that in stage 1 (block linear equalization), we may safely conclude that the channel condition is very bad and the turbo DFE may not provide any advantage. In this case, we may simply use the linear equalizer output for decoding.
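The stage-switching rules described above could be coded along the following lines. This is only a sketch: the text does not quantify what a "small" variation is or how many iterations to wait before falling back, so the relative tolerance `rel_tol` and the iteration count `patience` are hypothetical parameters, as are the function names.

```python
ALPHA_STAGE2 = 1.0   # fixed shaping constant; the text reports 0.5 to 2 as a suitable range


def switch_to_final(stage2_llr_var, rel_tol=0.05):
    """True when the LLR variance changed by at most rel_tol (relative)
    between the two most recent stage-2 iterations."""
    if len(stage2_llr_var) < 2:
        return False
    prev, curr = stage2_llr_var[-2], stage2_llr_var[-1]
    return abs(curr - prev) <= rel_tol * abs(prev)


def fall_back_to_linear(stage1_llr_var, stage2_llr_var, patience=3):
    """True when the last `patience` stage-2 iterations all show a smaller
    LLR variance than the stage-1 linear equalizer did."""
    recent = stage2_llr_var[-patience:]
    return len(recent) == patience and all(v < stage1_llr_var for v in recent)


# Example histories (made-up numbers).
still_converging = switch_to_final([10.0, 20.0])
converged = switch_to_final([20.0, 20.5])
bad_channel = fall_back_to_linear(5.0, [4.0, 3.0, 4.5])
```

In a receiver loop, `switch_to_final` would be checked after each stage-2 iteration, and `fall_back_to_linear` would select the stage-1 output for decoding when the channel appears too poor for the turbo DFE to help.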

D. Benefit of Whitening Transform to Turbo DFE Performance

Similar to the EST [9], the proposed MIMO OFDM with transform can benefit the noise performance of block turbo DFE by reducing error propagation. This is due to its ability to spread the symbol energy over the whole block. As a result, any symbol decision error is also so spread. This reduces the interference contribution of each symbol decision error to all other symbols, thereby lowering the probability of error propagation and benefiting the convergence of turbo DFE. Note that the benefit applies not only to uncoded systems, but also to coded systems, because in typical channel codes the difference between two nearby codewords (in Hamming distance or Euclidean distance) is concentrated in only a few locations rather than having its energy spread over a large signal block. The mechanism can be compared to how coded MIMO OFDM with transform excels over coded MIMO OFDM without transform as discussed in an earlier section.

VI. DESIGN FOR RECEIVER COMPLEXITY REDUCTION

Thus far, we have described the operating principles of the transmitter and the block turbo DFE-based signal detector. We now present a particular design that enables receiver implementation at a reduced complexity. We first consider how the turbo DFE can operate in the frequency domain. Then, we present a particular design of the SFI suited to the proposed way of receiver operation. Lastly, we discuss the computational complexity.

A. Turbo DFE in Frequency Domain

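By the circulant nature of the channel matrix in (23), we may decompose it as

$$\mathbf{H} = (\mathbf{W} \otimes \mathbf{I})^H \begin{bmatrix} \boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{12} & \cdots & \boldsymbol{\Lambda}_{1N} \\ \boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22} & \cdots & \boldsymbol{\Lambda}_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \boldsymbol{\Lambda}_{M1} & \boldsymbol{\Lambda}_{M2} & \cdots & \boldsymbol{\Lambda}_{MN} \end{bmatrix} (\mathbf{W} \otimes \mathbf{I}) = (\mathbf{W} \otimes \mathbf{I})^H \boldsymbol{\Lambda} (\mathbf{W} \otimes \mathbf{I}) \tag{34}$$

where $\mathbf{W}$ is the DFT matrix and each $\boldsymbol{\Lambda}_{ij}$ is a $K \times K$ diagonal matrix of the frequency response of the channel from transmit antenna $j$ to receive antenna $i$. By permutation, the “super-matrix” $\boldsymbol{\Lambda}$ can be reorganized into

$$\boldsymbol{\Lambda} = \mathbf{Q}^T\left(\bigoplus_{k=1}^{K}\boldsymbol{\Lambda}(k)\right)\mathbf{Q} \tag{35}$$

where $\mathbf{Q}$ is a permutation matrix, $\boldsymbol{\Lambda}(k)$ is the MIMO channel response at subcarrier $k$, and recall that $\bigoplus_{k=1}^{K}\boldsymbol{\Lambda}(k)$ denotes the block diagonal matrix with diagonal entries $\boldsymbol{\Lambda}(1), \boldsymbol{\Lambda}(2), \cdots, \boldsymbol{\Lambda}(K)$.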

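The FFF and FBF can be likewise decomposed as

$$\mathbf{F} = (\mathbf{W} \otimes \mathbf{I})^H\mathbf{Q}^T\left(\bigoplus_{k=1}^{K}\mathbf{F}(k)\right)\mathbf{Q}\,(\mathbf{W} \otimes \mathbf{I}), \tag{36}$$

$$\mathbf{B} = (\mathbf{W} \otimes \mathbf{I})^H\mathbf{Q}^T\left(\bigoplus_{k=1}^{K}\mathbf{B}(k)\right)\mathbf{Q}\,(\mathbf{W} \otimes \mathbf{I}) \tag{37}$$

where for unshaped turbo DFE we have, for subcarrier $k$,

$$\mathbf{F}(k) = \mu\boldsymbol{\Lambda}^H(k), \quad \mathbf{B}(k) = \mathbf{I} - \mu\boldsymbol{\Lambda}^H(k)\boldsymbol{\Lambda}(k). \tag{38}$$

Similarly, for the shaping filter we have

$$\mathbf{C}(k) = \mu\left(\boldsymbol{\Lambda}^H(k)\boldsymbol{\Lambda}(k) + \alpha\mathbf{I}\right)^{-1} \tag{39}$$

for subcarrier $k$. These equations show that both the shaped and the unshaped DFE can be performed in the frequency domain, independently over the subcarriers. The complexity can be lower than performing equalization in the time domain.

Fig. 3. The equalizer-decoder loop.

B. The Equalizer-Decoder Loop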

Thus far, we have omitted some details concerning the interfaces between the DFE and the decoder. To this subject we now turn. Let $\tilde{\mathbf{x}} = (\mathbf{W} \otimes \mathbf{I})\,\mathbf{x}$ be the frequency spectrum of $\mathbf{x}$, and let $\hat{\mathbf{x}}$ and $\bar{\mathbf{x}}$ be the DFE output and the FBF input in the frequency domain, respectively. Since the transmitted signal is spread by the space-frequency transform $\mathbf{S}$, we must apply the inverse transform $\mathbf{S}^{-1}$ to the DFE output before feeding it to the channel decoder. In addition, we also need to apply the transform $\mathbf{S}$ to the decoder output to obtain the FBF input for the next iteration. The structure of the equalizer-decoder loop is illustrated in Fig. 3, where we have assumed the use of a soft-output decoder.

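Mathematically, the decoder input is given by

$$\hat{\mathbf{X}} = \left(\mathbf{T}^H \otimes \mathbf{I}\right)\mathbf{P}^T\hat{\mathbf{x}} \tag{40}$$

and the FBF input is obtained from the decoder output $\bar{\mathbf{X}}$ by

$$\bar{\mathbf{x}} = \mathbf{P}\left(\mathbf{T} \otimes \mathbf{I}\right)\bar{\mathbf{X}} \tag{41}$$

where $\mathbf{P}^T$ corresponds to deinterleaving, $\mathbf{T}^H \otimes \mathbf{I}$ to inverse orthogonal transform, $\mathbf{T} \otimes \mathbf{I}$ to orthogonal transform, and $\mathbf{P}$ to interleaving.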

C. Design of Space-Frequency Interleaving

We note that the complexity of the equalizer-decoder loop can be reduced by a particular design of the SFI method. Specifically, we employ a “separable SFI” in which the permutations in the spatial and the frequency domains are separable. Then, in the receiver, the SFI and the inverse SFI can be replaced by equivalent operations on the FFF input and the DFE coefficients, instead of performing them at the decoder input and output.

The SFI is separable in the spatial and the frequency dimensions if the permutation matrix $\mathbf{P}$ can be decomposed into the product of a spatial permutation matrix $\boldsymbol{\Theta}$ and a frequency permutation matrix $\boldsymbol{\Phi}$, such as $\mathbf{P} = \boldsymbol{\Theta}\boldsymbol{\Phi}$. As illustrated in Fig. 4, in frequency interleaving, signal samples at the same subcarrier are moved to another subcarrier as a group, and in spatial interleaving, signal samples at the same subcarrier are permuted in a pseudo-random manner to different antennas.


Fig. 4. The illustration of separable space-frequency interleaving.

Fig. 5. Simplified receiver with separable SFI.

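Disregard the additive noise. Then, the received signal after DFT and inverse SFI is given by

$$\mathbf{P}^T\mathbf{r} = \mathbf{P}^T\boldsymbol{\Lambda}\mathbf{P}\left(\mathbf{T} \otimes \mathbf{I}\right)\mathbf{X} \tag{42}$$

where $\mathbf{P}^T\boldsymbol{\Lambda}\mathbf{P}$ is the space-frequency interleaved channel frequency response. With $\mathbf{P} = \boldsymbol{\Theta}\boldsymbol{\Phi}$, we have

$$\mathbf{P}^T\boldsymbol{\Lambda}\mathbf{P} = \boldsymbol{\Phi}^T\boldsymbol{\Theta}^T\boldsymbol{\Lambda}\boldsymbol{\Theta}\boldsymbol{\Phi} \triangleq \boldsymbol{\Phi}^T\boldsymbol{\Lambda}^{\dagger}\boldsymbol{\Phi} \triangleq \boldsymbol{\Lambda}^{\ddagger}. \tag{43}$$

Note that $\boldsymbol{\Lambda}^{\dagger}$ and $\boldsymbol{\Lambda}^{\ddagger}$ have a structure similar to that of $\boldsymbol{\Lambda}$, because the two pairs of interleaving and deinterleaving operations amount to mere re-ordering of the spatial and the frequency indexes. Therefore, the frequency-domain turbo DFE can be made to operate on $\boldsymbol{\Lambda}^{\ddagger}$ in exactly the same way as on $\boldsymbol{\Lambda}$ without any modification. In other words, the inverse SFI and SFI functions can be omitted in the equalizer-decoder loop. The receiver can then take the form shown in Fig. 5, where the turbo DFE is as shown in Fig. 3, but the SFT and the inverse SFT now only perform the orthogonal transform and the inverse orthogonal transform, respectively.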

D. Computational Complexity

We now consider the computational complexity of the pro-posed system. We only consider the receiver for it is much more complicated than the transmitter.

To start, we examine the complexity of equalization. Recall that the equalization process is divided into three stages: MMSE block linear equalization, shaped DFE, and basic DFE. Assume that the channel response is known. Assume also that, in each stage, we calculate the filter coefficients first (the setup phase) and then use the results in equalization (the processing phase). We use the number of complex multiplications to measure the complexity. In MMSE block linear equalization, the setup phase needs approximately $(2MN^2 + N^3)K$ computations and the processing phase $M^2NK$ computations. In the shaped DFE stage, the setup phase requires a similar amount of computation for the FFF coefficients and $MN^2K$ computations for the FBF coefficients. In the processing phase, the FFF output only needs to be calculated once per signal block, which costs $M^2NK$ computations. Each iteration then needs $N^3K$ computations for FBF filtering. In the basic DFE stage, the FFF needs no computation for setup, and the setup of FBF takes $MN^2K$ computations. Again, the FFF output only needs to be calculated once per signal block, and it takes $MN^2K$ computations. The complexity per iteration in the processing phase is the same as in the shaped DFE stage.

Next, we examine the complexity associated with the transform. In the receiver, an inverse orthogonal transform and an orthogonal transform are needed for each turbo DFE iteration. Different transforms have different computational complexities. If the Hadamard or fast Hadamard transform (FHT) is employed, then there is no complex multiplication but only some complex additions ($2NK\log_2(K)$ for FHT). If the DFT is used (which applies to MIMO block single-carrier transmission with cyclic prefixing, or CPBSC), then the amount is $2NK\log_2(K)$ computations for IDFT and DFT.
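To get a feel for the magnitudes, the operation counts quoted above can be evaluated for a concrete configuration. The sketch below uses the counts as read from the text, with the MMSE setup count taken to be (2MN^2 + N^3)K, which is an assumed reading of the expression; the configuration of two transmit antennas, two receive antennas, and 1024 subcarriers matches the simulation setup, and the function names are illustrative.

```python
import math


def equalizer_complexity(M, N, K):
    """Approximate complex-multiplication counts for the equalizer stages."""
    return {
        "mmse_setup": (2 * M * N ** 2 + N ** 3) * K,   # assumed reading of the setup count
        "fff_processing": M ** 2 * N * K,              # FFF output, once per block
        "fbf_setup": M * N ** 2 * K,                   # FBF coefficients
        "fbf_per_iteration": N ** 3 * K,               # FBF filtering, every iteration
    }


def fht_additions(N, K):
    """Complex additions for the forward plus inverse fast Hadamard transform
    per iteration; K is assumed to be a power of two."""
    return int(2 * N * K * math.log2(K))


counts = equalizer_complexity(M=2, N=2, K=1024)
fht = fht_additions(N=2, K=1024)
```

For this configuration the per-iteration feedback filtering costs a few thousand multiplications, while the Hadamard-based transform pair adds tens of thousands of additions but no multiplications.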

VII. SIMULATION RESULTS

In this section, we present some simulation results to illustrate the performance of the proposed system and compare it with the performance of the coded MIMO OFDM and the coded MIMO block single-carrier transmission with cyclic prefixing (CPBSC). All simulations are run with the following conditions: 1) the channel code is the rate-1/2 recursive systematic convolutional code (RSC) with generator vectors (23,35); 2) the coded bits are randomly interleaved before being subjected to QPSK modulation; 3) two transmit antennas ($N = 2$) and two receive antennas ($M = 2$) are used; 4) the DFT size ($K$) is 1024; 5) the MIMO channel response and the noise variance are known exactly; 6) $\alpha = 1$ in the second equalization stage (shaped block turbo DFE); and 7) the channel decoder is of the soft-in soft-out type, employing the soft-output Viterbi algorithm (SOVA) [18], and the QPSK symbols are regenerated based on the ML criterion.

A. Simulation Case 1: A Difficult Channel

In this set of simulations, we use the power-delay profile of the Proakis-C channel given by

$$[0.227 \quad 0.460 \quad 0.688 \quad 0.460 \quad 0.227]. \tag{44}$$

MIMO channels are generated based on the correlation matrices

$$\mathbf{R}_l = \mathbf{T}_l = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \quad l = 0, 1, 2, 3, 4. \tag{45}$$

This channel may not be realistic, but is a difficult one to deal with.
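One possible way to draw such a channel is sketched below in plain Python. Several points are assumptions made for illustration rather than details given in this section: the five values in (44) are treated as tap amplitudes (their squares sum to about one), a Kronecker-type correlation model is used with $\mathbf{R}_l$ on the receive side and $\mathbf{T}_l$ on the transmit side, and the helper names (`chol2`, `cgauss`, `draw_channel`) are hypothetical.

```python
import math
import random

PROFILE = [0.227, 0.460, 0.688, 0.460, 0.227]  # Eq. (44); assumed tap amplitudes
RHO = 0.5                                       # off-diagonal entry in Eq. (45)


def chol2(rho):
    """Cholesky factor L of [[1, rho], [rho, 1]], so that L L^T equals it."""
    return [[1.0, 0.0], [rho, math.sqrt(1.0 - rho * rho)]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cgauss(rng):
    """Zero-mean, unit-variance circular complex Gaussian sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def draw_channel(rng, profile=PROFILE, rho=RHO):
    """Return one 2x2 matrix per tap: H_l = a_l * L G_l L^T (assumed Kronecker model)."""
    L = chol2(rho)
    Lt = transpose(L)
    taps = []
    for a in profile:
        G = [[cgauss(rng) for _ in range(2)] for _ in range(2)]  # i.i.d. Rayleigh core
        H = matmul(matmul(L, G), Lt)
        taps.append([[a * h for h in row] for row in H])
    return taps


if __name__ == "__main__":
    rng = random.Random(0)
    taps = draw_channel(rng)
    print(len(taps), "taps, each", len(taps[0]), "x", len(taps[0][0]))
```

Because $\mathbf{R}_l$ and $\mathbf{T}_l$ are equal here, swapping their roles in the sketch would not change the channel statistics.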

Figs. 6 and 7 depict the simulated bit error rate (BER) and block error rate (BLER) performance of the different systems under several different conditions. The results show that the DFT and the FHT variants of the proposed system (both labeled SFI TDFE in the figures) perform close to each other and both exhibit superior performance compared to MIMO OFDM and MIMO CPBSC. At BLER = $10^{-2}$ and with a maximum of 10 equalizer iterations, the proposed system outperforms MIMO CPBSC by more than 2 dB and MIMO OFDM by more than 6 dB.

Fig. 6. Simulation case 1: BER performance of different systems under a difficult channel. [Plot: BER versus $E_b/N_o$, titled "Results of MIMO systems over Proakis-C PDP"; curves: DFT SFI LMMSE, DFT SFI TDFE 10th iteration, FHT SFI LMMSE, FHT SFI TDFE 10th iteration, OFDM LMMSE, OFDM 10th iteration, CP-BSC LMMSE, CP-BSC TDFE 10th iteration.]

Fig. 7. Simulation case 1: BLER performance of different systems under a difficult channel. [Plot: BLER versus $E_b/N_o$, titled "Results of MIMO systems over Proakis-C PDP".]

B. Simulation Case 2: The ETSI Vehicular A Channel

Next, consider the power-delay profile of the ETSI Vehicular A channel. The channel has 6 paths. Let the channel bandwidth be 10 MHz. Then, the path delays are equal to 0, 3.03, 6.93, 10.64, 16.89, and 24.51 sample spacings, respectively. The average power levels are 0, −1, −9, −10, −15, and −20 dB, respectively. The spatial correlation is the same as in the last case. Figs. 8 and 9 depict the simulated BER and BLER performance. Observations very similar to those for the last case can be made from this set of results.
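The delay and power figures quoted above can be reproduced with a short calculation, sketched here in Python. Two inputs are assumptions not stated in this section: the absolute path delays in nanoseconds are the values commonly tabulated for the Vehicular A profile, and the sample spacing of 102.4 ns was chosen only because it reproduces the quoted delays to two decimals. The function names are hypothetical.

```python
# Assumed absolute path delays of the Vehicular A profile, in nanoseconds.
DELAYS_NS = [0, 310, 710, 1090, 1730, 2510]
# Average path powers in dB, as quoted in the text.
POWERS_DB = [0, -1, -9, -10, -15, -20]
# Assumed sample spacing in nanoseconds (inferred from the quoted delays).
SAMPLE_SPACING_NS = 102.4


def delays_in_samples(delays_ns=DELAYS_NS, spacing_ns=SAMPLE_SPACING_NS):
    """Express each path delay as a (generally fractional) number of samples."""
    return [d / spacing_ns for d in delays_ns]


def normalized_powers(powers_db=POWERS_DB):
    """Convert dB powers to linear scale and normalize them to unit total power."""
    lin = [10.0 ** (p / 10.0) for p in powers_db]
    total = sum(lin)
    return [p / total for p in lin]


if __name__ == "__main__":
    for d, p in zip(delays_in_samples(), normalized_powers()):
        print(f"delay = {d:6.2f} samples, relative power = {p:.4f}")
```

Since the delays are not integer multiples of the sample spacing, a simulator also needs some interpolation rule to map each path onto sample-spaced taps; that choice is not specified here and is left out of the sketch.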

Fig. 8. Simulation case 2: BER performance of different systems under an ETSI Vehicular A-based channel condition. [Plot: BER versus $E_b/N_o$, titled "Results of MIMO systems over Vehicular A PDP".]

Fig. 9. Simulation case 2: BLER performance of different systems under an ETSI Vehicular A-based channel condition. [Plot: BLER versus $E_b/N_o$, titled "Results of MIMO systems over Vehicular A PDP".]

VIII. CONCLUSION

We considered channel-coded MIMO OFDM transmission and obtained a condition on its signal for it to attain the maximum diversity and coding gain. Briefly, the condition required certain whiteness in the transmitted coded signal. As this condition might not be realizable when the MIMO OFDM signal block is large, we proposed a suboptimal design that could yield approximate whiteness. The proposed design employed an orthogonal transform and a space-frequency interleaver between the channel coder and the multi-antenna OFDM transmitter. We proposed a corresponding receiving method based on block turbo DFE. The equalizer operated in the frequency domain, thus enjoying a lower complexity than a time-domain equalizer. We addressed various detailed operational issues of the transmitter and the receiver, with the aim of achieving good performance at acceptable computational complexity. Simulation results demonstrated that the proposed transmission technique could outperform the conventional coded MIMO OFDM and the MIMO block single-carrier transmission with cyclic prefixing in difficult channel conditions by several dB.

Many additional topics are yet to be investigated concerning the proposed transmission technique. A few examples are channel estimation, synchronization, and multi-user operations.


APPENDIX

DERIVATION OF THE MMSE SHAPING FILTER

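First, let the singular value decomposition (SVD) of the channel matrix be given by

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}. \tag{46}$$

Since the desired $\mathbf{C}$ commutes with $\mathbf{H}$, we may let $\mathbf{C} = \mathbf{V}^H\boldsymbol{\Lambda}_c\mathbf{V}$. As in typical adaptive filter analysis, assume that $\mathbf{e}_{k-1}$ is white and uncorrelated with $\mathbf{n}$. Then,

$$J \triangleq E\{\|\mathbf{w}_k\|^2\} = \sigma_e^2\left(\alpha\left\|\mathbf{C}\mathbf{H}^H\right\|_F^2 + \left\|\mathbf{C}\mathbf{H}^H\mathbf{H} - \mathbf{I}\right\|_F^2\right) = \sigma_e^2\sum_{k=1}^{KN}\left(\alpha\left|\lambda_c(k)\lambda(k)\right|^2 + \left|\lambda_c(k)|\lambda(k)|^2 - 1\right|^2\right) \tag{47}$$

where $\lambda(k)$ and $\lambda_c(k)$ are the $k$th singular values of $\mathbf{H}$ and $\mathbf{C}$, respectively.

Now, constraining the average of the diagonal elements of $\mathbf{B}$ to zero amounts to requiring

$$\mathrm{tr}\left(\mathbf{C}\mathbf{H}^H\mathbf{H}\right) = KN \tag{48}$$

or

$$\frac{1}{KN}\sum_{k=1}^{KN}\lambda_c(k)|\lambda(k)|^2 = 1. \tag{49}$$

Employing the Lagrange multiplier method, we consider the modified cost function

$$J_m \triangleq J + \mu\left(1 - \frac{1}{KN}\sum_{k=1}^{KN}\lambda_c(k)|\lambda(k)|^2\right) \tag{50}$$

and set $\partial J_m/\partial\lambda_c(k) = 0$, which leads to the solution

$$\lambda_c^*(k) = \frac{\mu}{|\lambda(k)|^2 + \alpha} \tag{51}$$

and

$$\mathbf{C}^* = \mu\left(\mathbf{H}^H\mathbf{H} + \alpha\mathbf{I}\right)^{-1}. \tag{52}$$

Substituting $\mathbf{C}^*$ into the constraint on $\mathbf{B}$, we obtain

$$\mu = \frac{KN}{\mathrm{tr}\left(\left(\mathbf{H}^H\mathbf{H} + \alpha\mathbf{I}\right)^{-1}\mathbf{H}^H\mathbf{H}\right)}. \tag{53}$$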
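The closed form can be sanity-checked numerically in the diagonal (singular-value) domain. The Python sketch below evaluates (51) and (53) for a given set of singular values, confirms that the constraint (49) holds, and compares the cost (47), with the factor $\sigma_e^2$ dropped, against a nearby point that also satisfies (49). The function names and the example singular values are hypothetical, and real nonnegative singular values are assumed.

```python
def shaping_gains(sing_vals, alpha):
    """Return (mu, [lambda_c(k)]) following Eqs. (53) and (51)."""
    g = [abs(s) ** 2 for s in sing_vals]             # |lambda(k)|^2
    n = len(g)
    mu = n / sum(gk / (gk + alpha) for gk in g)      # Eq. (53) in diagonal form
    return mu, [mu / (gk + alpha) for gk in g]       # Eq. (51)


def cost(sing_vals, lam_c, alpha):
    """J / sigma_e^2 from Eq. (47) for given shaping gains lambda_c(k)."""
    return sum(alpha * abs(lc * s) ** 2 + abs(lc * abs(s) ** 2 - 1.0) ** 2
               for s, lc in zip(sing_vals, lam_c))


if __name__ == "__main__":
    sv = [0.3, 1.0, 2.0]      # example singular values (illustrative only)
    alpha = 1.0
    mu, lam = shaping_gains(sv, alpha)

    # Constraint (49): the average of lambda_c(k) |lambda(k)|^2 equals one.
    avg = sum(l * s * s for l, s in zip(lam, sv)) / len(sv)
    assert abs(avg - 1.0) < 1e-12

    # Move along a direction that keeps (49) satisfied; the cost should rise.
    g = [s * s for s in sv]
    eps = 0.1
    pert = [lam[0] + eps * g[1], lam[1] - eps * g[0], lam[2]]
    assert cost(sv, pert, alpha) > cost(sv, lam, alpha)
    print("mu =", mu)
```

The perturbation direction is chosen so that its weighted sum with $|\lambda(k)|^2$ is zero, which keeps the perturbed point on the constraint surface and makes the cost comparison meaningful.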

REFERENCES

[1] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 744–765, Mar. 1998.

[2] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1451–1458, Oct. 1998.

[3] Y. Xin, Z. Wang, and G. B. Giannakis, “Space-time diversity systems based on linear constellation precoding,” IEEE Trans. Wireless Commun., vol. 2, no. 2, pp. 294–309, Mar. 2003.

[4] J. Boutros and E. Viterbo, “Signal space diversity: A power- and bandwidth-efficient diversity technique for the Rayleigh fading channel,” IEEE Trans. Inf. Theory, vol. 44, no. 4, pp. 1453–1467, July 1998.

[5] S. Yiu, R. Schober, and L. Lampe, “Distributed space-time block coding,” IEEE Trans. Commun., vol. 54, no. 7, pp. 1195–1206, July 2006.

[6] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1999.

[7] C. Laot, A. Glavieux, and J. Labat, “Turbo equalization: Adaptive equalization and channel decoding jointly optimized,” IEEE J. Sel. Areas Commun., vol. 19, no. 9, pp. 1744–1752, Sept. 2001.

[8] A. M. Chan and G. W. Wornell, “A class of block-iterative equalizers for intersymbol interference channels: Fixed channel results,” IEEE Trans. Commun., vol. 49, no. 11, pp. 1966–1976, Nov. 2001.

[9] T. Hwang and Y. Li, “Novel iterative equalization based on energy spreading transform,” IEEE Trans. Signal Process., vol. 54, no. 1, pp. 190–203, Jan. 2006.

[10] O. Oyman, R. U. Nabar, H. Bolcskei, and A. J. Paulraj, “Characterizing the statistical properties of mutual information in MIMO channels,” IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2784–2795, Nov. 2003.

[11] A. K. Sadek, W. Su, and K. J. R. Liu, “Diversity analysis for frequency-selective MIMO-OFDM systems with general spatial and temporal correlation model,” IEEE Trans. Commun., vol. 54, no. 5, pp. 878–888, May 2006.

[12] T. Hwang and Y. Li, “Space-time energy spreading transform based MIMO technique with iterative signal detection,” in Proc. IEEE GLOBECOM, vol. 4, Nov.–Dec. 2004, pp. 2470–2474.

[13] W. H. Gerstacker, R. R. Muller, and J. B. Huber, “Iterative equalization with adaptive soft feedback,” IEEE Trans. Commun., vol. 48, no. 9, pp. 1462–1466, Sept. 2000.

[14] R. Freund, G. Golub, and N. Nachtigal, “Iterative solution of linear systems,” Acta Numer., vol. 1, pp. 57–100, 1992.

[15] M. Piana and M. Bertero, “Projected Landweber method and preconditioning,” Inverse Problems, vol. 13, no. 2, pp. 441–463, 1997.

[16] B. Kaltenbacher, “Some Newton-type methods for the regularization of nonlinear ill-posed problems,” Inverse Problems, vol. 13, no. 3, pp. 729–753, 1997.

[17] S.-J. Lee, A. C. Singer, and N. R. Shanbhag, “Linear turbo equalization analysis via BER transfer and EXIT charts,” IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2883–2897, Aug. 2005.

[18] J. Hagenauer and P. Hoeher, “A Viterbi algorithm with soft-decision outputs and its applications,” in Proc. IEEE GLOBECOM, vol. 3, Nov. 1989, pp. 1680–1686.

Kun-Chien Hung received the B.S. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 2001. He is currently pursuing the Ph.D. degree at National Chiao Tung University. His current research interests are in digital signal processing, communication receiver design, and wireless communication.

David W. Lin received the B.S. degree from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., in 1975 and the M.S. and the Ph.D. degrees from the University of Southern California, Los Angeles, CA, U.S.A., in 1979 and 1981, respectively, all in electrical engineering. He was with Bell Laboratories during 1981–1983 and with Bellcore during 1984–1990 and again during 1993–1994. Since 1990, he has been a professor in the Department of Electronics Engineering and the Center for Telecommunications Research, National Chiao Tung University, except for a leave in 1993–1994. He has conducted research in digital adaptive filtering and telephone echo cancellation, digital subscriber line (DSL) and coaxial network transmission, speech and video coding, and wireless communication. His research interests include various topics in communication engineering and signal processing.