EM-Based Iterative Receivers for OFDM and BICM/OFDM Systems in Doubly Selective Channels


Meng-Lin Ku, Member, IEEE, Wen-Chuan Chen, and Chia-Chi Huang

Abstract—In this paper, we resort to the expectation-maximization (EM) algorithm to tackle the inter-carrier interference (ICI) problem, caused by time-variant multipath channels, for both orthogonal frequency division multiplexing (OFDM) systems and bit-interleaved coded modulation (BICM)/OFDM systems. We first analyze the ICI in frequency domain with a reduced set of parameters, and following this analysis, we derive an EM algorithm for maximum likelihood (ML) data detection. An ML-EM receiver for OFDM systems and a TURBO-EM receiver for BICM/OFDM systems are then developed to reduce computational complexity of the EM algorithm and to exploit temporal diversity, the main idea of which is to integrate the proposed EM algorithm with a groupwise ICI cancellation method. Compared with the ML-EM receiver, the TURBO-EM receiver further employs a soft-output Viterbi algorithm (SOVA) decoder to exchange information with a maximum a posteriori (MAP) EM detector through the turbo principle. Computer simulation demonstrates that the two proposed receivers clearly outperform the conventional one-tap equalizer, and the performance of the TURBO-EM receiver is close to the matched-filter bound even at a normalized maximum Doppler frequency (MDF) up to 0.2.

Index Terms—Orthogonal frequency division multiplexing, bit-interleaved coded modulation, inter-carrier interference, expectation-maximization algorithms, turbo receivers.

I. INTRODUCTION

ORTHOGONAL frequency division multiplexing (OFDM) is a promising technique to realize high data rate transmission over multipath fading channels. Due to the use of a guard interval (GI), it allows for a simple one-tap equalizer [1]. In addition, bit-interleaved coded modulation (BICM) combined with OFDM, known as BICM/OFDM, is introduced as a way to offer superior performance by exploiting frequency diversity [2]. Over the past decade, OFDM has found widespread application in several standards such as the 802.16e wireless metropolitan area network (WMAN) [3]. However, in mobile radio environments, multipath channels are usually time-variant. The channel time variation destroys the orthogonality among subcarriers, and thereby yields inter-carrier interference (ICI). The effect of ICI on the bit-error-rate (BER) performance has been intensively studied in [4], [5]. As the maximum Doppler frequency (MDF) increases, the one-tap equalizer is no longer sufficient to overcome this channel distortion. It is shown in [5] that if the MDF is larger than 8% of the subcarrier spacing, the signal-to-ICI-plus-noise ratio (SINR) is less than 20 dB. Hence, in order to obtain reliable reception, there is a need for efficient algorithms to combat the ICI effect in a mobile OFDM receiver.

Manuscript received April 29, 2009; revised October 29, 2009 and May 31, 2010; accepted February 10, 2011. The associate editor coordinating the review of this paper and approving it for publication is F. Takawira.

M. L. Ku is with the Department of Communication Engineering, National Central University, Taoyuan, 320, Taiwan (e-mail: mlku@ce.ncu.edu.tw).

W. C. Chen is with the Institute for Information Industry, Taipei, 106, Taiwan (e-mail: tata9876@gmail.com).

C. C. Huang is with the Department of Electrical Engineering, National Chiao Tung University, Hsinchu, 300, Taiwan (e-mail: huangcc@faculty.nctu.edu.tw).

Digital Object Identifier 10.1109/TWC.2011.030911.090611

A wide variety of schemes for ICI mitigation have been proposed, mainly consisting of ICI self-cancellation, blind equalization, and ICI cancellation-based equalization [5]–[19]. At the expense of reduced bandwidth efficiency, the ICI self-cancellation scheme is simple and effective in providing good BER performance [6], [7]. The scheme, however, is not suitable for existing standards as modification to transmit formats is required. In contrast, the blind equalization scheme is efficient in saving bandwidth but it involves high computation complexity [8]. Among the three ICI mitigation schemes, the ICI cancellation-based equalization scheme is the most common [9]–[19]. Based on the zero-forcing (ZF) or minimum mean-square-error (MMSE) criterion, two optimal frequency-domain equalizers are derived in [9]–[12]. To enhance the performance, successive interference cancellation (SIC) with optimal ordering can be incorporated with the MMSE equalizer [13]. Several works, like [5] and [14]–[16], are targeted toward reducing the complexity of frequency-domain equalizers. By ignoring small ICI terms, a partial MMSE equalizer is proposed in [14] to avoid the inversion of a large-size matrix, while a recursive algorithm is developed in [5] for calculation of equalizer coefficients. Moreover, [15] incorporates a partial MMSE equalizer with SIC, and [16] combines the partial MMSE equalizer with BICM. Both methods benefit greatly from time diversity gains induced by mobility. We also find two decision-feedback (DF) equalizers in [17], [18], which make use of a power series expansion of the time-variant frequency response. Apart from using frequency-domain equalizers, [9] and [19] consider time-frequency-domain equalizers which first achieve ICI shortening, followed by MMSE detection and parallel interference cancellation, respectively, to remove the residual ICI.

For successful implementation of the ICI cancellation-based equalization, it is essential to obtain an accurate estimate of channel variation or the equivalent ICI channel matrix. In general, this can be accomplished through the use of embedded reference signals such as pilot symbols or pilot tones. In [13], an MMSE estimator, which demands frequent pilot symbols inserted among OFDM data symbols, is proposed to estimate the time-variant channel impulse response (CIR). Where complexity is a concern, most studies model the time variation of each channel tap as a polynomial function. By assuming that the CIR varies in a linear fashion within an OFDM symbol, [14] and [17] exploit pilot symbols for parameter estimation, whereas [18] and [20] belong to the category which uses pilot tones. It is concluded that a linear model is valid for capturing channel dynamics with the normalized MDF up to 0.1. When the normalized MDF is larger than 0.1, a 2-D polynomial surface function is suggested in [10] to model the time-varying channel frequency response and to gain better performance. Since most currently developed communication standards, e.g., 802.16e and long-term evolution (LTE), aim at providing mobility with the normalized MDF up to about 0.1, which is regarded as a fairly fast-fading scenario for mobile environments, we also consider using a linear model for approximating channel time variation in this paper.

1536-1276/11$25.00 © 2011 IEEE

The expectation-maximization (EM) algorithm can facilitate solving the maximum-likelihood (ML) estimation problem in an iterative manner which alternates between an E-step, calculating an expected complete log-likelihood (ECLL) function, and an M-step, maximizing the ECLL function with respect to some unknown parameters [21]. Recently, a few EM-based methods have been proposed for channel estimation (CE) and data detection in OFDM systems [12], [22]–[24]. The major difference among these methods lies in whether they formulate the original ML problem into a data sequence detection problem or a channel variable estimation problem. Yet, the wireless channel is assumed to be quasi-static in [22]–[24], i.e., the channel gain remains constant over the duration of one OFDM symbol. Even though the EM-based channel estimation scheme in [12] can be implemented in time-varying fading channels, the proposed EM scheme can only estimate average channel gains.

In this paper, we investigate two EM-based iterative receivers for OFDM and BICM/OFDM systems in doubly selective fading channels. By assuming that the channel varies in a linear fashion, we first analyze the ICI effect in frequency domain and derive a data detection method based on the EM algorithm using the ML criterion. In an effort to reduce complexity, groupwise processing is adopted for the two EM-based receivers. For OFDM systems, we implement an ML-EM receiver which iterates between a groupwise ICI canceller and an EM detector. Based on this receiver structure, a TURBO-EM receiver for BICM/OFDM systems is then proposed to successively improve the performance by applying the turbo principle. Finally, for the initial setting of the two receivers, MMSE-based CE is first performed by using a few pilot tones and it is later improved via the decision feedback methodology. To the best of our knowledge, this is the first work studying an EM approach for joint ICI estimation and data detection in multipath time-varying channels. Our work differs from the previous EM-based approaches [12], [22]–[24] in two aspects. First, the works in [22]–[24] do not take into account the ICI problem, and they can only be applied in quasi-static channels. Second, although the work in [12] can be extended to incorporate the ICI approximation and cancellation for data detection in time-varying channels, the ICI estimation and data detection schemes therein are designed separately. In other words, the channel mean values between two consecutive OFDM symbols are first estimated to construct the ICI matrix through linear interpolation, and it is followed by an MMSE data detector.

[Figure 1: block diagram. Information bits pass through a binary convolutional encoder, a bitwise interleaver ($\Pi$), and a $2^{\gamma}$-ary mapper, which together form the BICM block, followed by an OFDM modulator; the signals $c[k]$, $x[k]$, and $t[n]$ are marked along the chain.]

Fig. 1. BICM/OFDM systems.

The rest of this paper is organized as follows. In Section II, we describe the OFDM and BICM/OFDM systems, followed by the analysis of ICI in frequency domain. According to the frequency domain ICI model, an EM-based data detection method is developed in Section III. In Section IV, an ML-EM receiver and a TURBO-EM receiver are proposed. Afterwards, we describe the initialization procedure of the two receivers and discuss their computation complexity. In Section V, we present our computer simulation results. Finally, some conclusions are drawn in Section VI.

Notation: Superscripts $(\cdot)^T$ and $(\cdot)^{\dagger}$ stand for transpose and Hermitian transpose, respectively. Column vectors and matrices are denoted by boldface lowercase and uppercase letters, respectively. The notation $E[\cdot]$ takes expectation. We use $\mathbf{I}_K$ to denote a $K \times K$ identity matrix and $\mathrm{diag}\{\mathbf{x}\}$ to denote a diagonal matrix with $\mathbf{x}$ on its diagonal. The notations $((\cdot))_N$ and $\odot$ denote the modulo-$N$ operation and the Hadamard product, respectively. The notation $\{\cdot\}$ denotes a set, e.g., a set $\mathbf{x} = \{x_1, \ldots, x_N\}$. Further, we denote the vector norm as $\|\cdot\|$.

II. SYSTEM MODEL

A. Transmitted and Received Signals

Fig. 1 shows a BICM/OFDM system, where information bits are modulated by BICM along with an OFDM modulator [2]. Data symbols are generated by concatenating a binary convolutional encoder with a $2^{\gamma}$-ary mapper through a bit-wise interleaver (denoted as $\Pi$). Throughout this paper, we only consider binary phase shift keying (BPSK) modulation ($\gamma = 1$); therefore, data symbols are one-to-one mapped from coded bits. Subsequently, these data symbols are transmitted over $N_F$ consecutive OFDM symbols. Let $x[k]$ be the data symbol to be transmitted over the $k$th subcarrier of an OFDM symbol. After modulation by an $N$-point inverse discrete Fourier transform (IDFT) and the appending of a GI of length $N_G$, the time domain samples of an OFDM symbol are given by

$$t[n] = \frac{1}{N} \sum_{k=0}^{N-1} x[k]\, e^{j 2\pi k n / N} \tag{1}$$

for $n = -N_G, \ldots, N-1$, where we assume that $x[k]$ is mapped from the coded bit stream $c[k]$, and the GI considered here is a cyclic prefix.
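The modulation step in (1) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a direct (non-FFT) IDFT for readability, and the names `ofdm_modulate`, `symbols`, and `n_guard` are hypothetical, not taken from the paper.

```python
import cmath

def ofdm_modulate(x, n_guard):
    """Return t[n] for n = -n_guard, ..., N-1, following (1) with a cyclic prefix."""
    n_fft = len(x)
    body = [
        sum(x[k] * cmath.exp(2j * cmath.pi * k * n / n_fft) for k in range(n_fft)) / n_fft
        for n in range(n_fft)
    ]
    # Evaluating (1) at negative n repeats the last n_guard samples of the body.
    return body[n_fft - n_guard:] + body

symbols = [1, -1, -1, 1, 1, 1, -1, 1]      # illustrative BPSK data on N = 8 subcarriers
samples = ofdm_modulate(symbols, n_guard=2)
```

The first `n_guard` entries of `samples` equal the last `n_guard` entries, which is the cyclic-prefix property assumed in the text.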

At the receiver, by removing the GI and taking the discrete Fourier transform (DFT), the demodulated signal in frequency domain is given by [20]:

$$y[k] = H[k,k]\, x[k] + \underbrace{\sum_{m=0,\, m \neq k}^{N-1} H[k,m]\, x[m]}_{\text{ICI term}} + z[k] \tag{2}$$

where $H[k,m] = \sum_{l=0}^{L-1} \alpha[k,m,l]\, e^{-j 2\pi m l / N}$ represents the leakage term of ICI from the $m$th subcarrier to the $k$th subcarrier, $k = 0, \ldots, N-1$, $\alpha[k,m,l] = (1/N) \sum_{n=0}^{N-1} h[l,n]\, e^{-j 2\pi n ((k-m))_N / N}$ is the DFT of the time series $h[l,n]$ corresponding to the $l$th channel tap at time index $n$, for $l = 0, \ldots, L-1$ and $n = 0, \ldots, N-1$, and $z[k]$ is the additive white Gaussian noise with zero mean and variance $\sigma_z^2$. Moreover, we assume that the channel tap $h[l,n]$ for different $l$ is an independent and identically distributed (i.i.d.) complex Gaussian random variable with zero mean and variance $\Xi_l$. From (2), we can observe that a demodulated subcarrier is affected by the ICI contributed from all the other subcarriers, and this effect severely degrades the system performance if a conventional one-tap equalizer is employed [5].
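To make the leakage in (2) concrete, the following sketch builds the matrix of $H[k,m]$ values from a given set of tap trajectories $h[l,n]$ and compares a static channel with a slowly varying one. The tap values and the helper names `ici_matrix` and `off_diag_energy` are illustrative assumptions, not values or routines from the paper.

```python
import cmath

def ici_matrix(h, n_fft):
    """h[l][n] is tap l at sample n; returns H with H[k][m] as defined below (2)."""
    n_taps = len(h)

    def alpha(k, m, l):
        d = (k - m) % n_fft
        return sum(h[l][n] * cmath.exp(-2j * cmath.pi * n * d / n_fft)
                   for n in range(n_fft)) / n_fft

    return [[sum(alpha(k, m, l) * cmath.exp(-2j * cmath.pi * m * l / n_fft)
                 for l in range(n_taps))
             for m in range(n_fft)] for k in range(n_fft)]

def off_diag_energy(H):
    """Total energy of the ICI (off-diagonal) entries."""
    size = len(H)
    return sum(abs(H[k][m]) ** 2 for k in range(size) for m in range(size) if k != m)

N = 8
static = [[1.0] * N, [0.5] * N]                               # two taps, no time variation
moving = [[1.0 + 0.02 * n for n in range(N)],                 # two taps drifting linearly
          [0.5 - 0.01 * n for n in range(N)]]
leak_static = off_diag_energy(ici_matrix(static, N))
leak_moving = off_diag_energy(ici_matrix(moving, N))
```

With constant taps the off-diagonal energy vanishes up to rounding, so a one-tap equalizer suffices; any time variation within the symbol makes it nonzero.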

B. Modeling of ICI in Frequency Domain

We adopt a linear function to model the temporal variation of each channel tap over an OFDM symbol, as follows:

$$h[l,n] = a[l,1]\, n + a[l,0] \tag{3}$$

for $l = 0, \ldots, L-1$ and $n = 0, \ldots, N-1$, where $a[l,p]$ is the complex coefficient of the $p$th order for the $l$th tap. Substituting (3) into $\alpha[k,m,l]$ of (2), we can obtain

$$\alpha[k,m,l] = \begin{cases} \dfrac{N-1}{2}\, a[l,1] + a[l,0], & \text{for } k = m \\[4pt] \Phi[k,m]\, a[l,1], & \text{otherwise} \end{cases} \tag{4}$$

where $\Phi[k,m]$ can be derived as

$$\Phi[k,m] = \frac{1}{N} \sum_{n=0}^{N-1} n\, e^{-j 2\pi n (k-m)/N} = -\frac{1}{2} + \frac{j}{2 \tan\left(\pi ((k-m))_N / N\right)} \tag{5}$$

Since $1 \le ((k-m))_N \le N-1$, the value of $\pi ((k-m))_N / N$ ranges from $\pi/N$ to $(N-1)\pi/N$. Applying the Maclaurin series approximation $\tan(x) \approx x$, for $|x| < \pi/2$, and after some straightforward derivation, we can represent $\Phi[k,m]$ as

$$\Phi[k,m] \approx \begin{cases} -\dfrac{1}{2}, & \text{for } ((k-m))_N = \dfrac{N}{2} \\[6pt] -\dfrac{1}{2} + \dfrac{jN}{2\pi ((k-m))_N}, & \text{for } 1 \le ((k-m))_N < \dfrac{N}{2} \\[6pt] -\dfrac{1}{2} + \dfrac{jN}{2\pi \left(((k-m))_N - N\right)}, & \text{for } \dfrac{N}{2} < ((k-m))_N \le N-1 \end{cases} \tag{6}$$
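As a numerical check on (5) and (6), the sketch below evaluates $\Phi$ three ways as a function of $d = ((k-m))_N$: by the defining sum, by the closed form in (5), and by the approximation in (6). The function names are hypothetical. Because $\tan(x) \approx x$ is a small-angle approximation, the sketch only compares (6) against (5) for nearby subcarriers, where the agreement is close; the gap widens as $d$ approaches $N/2$.

```python
import cmath
import math

def phi_direct(d, n_fft):
    """Defining sum in (5) for d = ((k - m))_N, 1 <= d <= N - 1."""
    return sum(n * cmath.exp(-2j * math.pi * n * d / n_fft) for n in range(n_fft)) / n_fft

def phi_closed(d, n_fft):
    """Closed form on the right-hand side of (5)."""
    return complex(-0.5, 0.5 / math.tan(math.pi * d / n_fft))

def phi_approx(d, n_fft):
    """Piecewise approximation (6)."""
    if 2 * d == n_fft:
        return complex(-0.5, 0.0)
    shifted = d if 2 * d < n_fft else d - n_fft
    return complex(-0.5, n_fft / (2 * math.pi * shifted))

N = 64
closed_form_gap = max(abs(phi_direct(d, N) - phi_closed(d, N)) for d in range(1, N))
neighbour_gap = max(abs(phi_approx(d, N) - phi_closed(d, N)) for d in (1, N - 1))
```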

From (6), it follows that $\Phi[k,m]$ is a fixed value, which only depends on $(k-m)$ modulo $N$, and it can be calculated in advance. By using (4) and (6), (2) can be rewritten as

$$y[k] = H[k,k]\, x[k] + \underbrace{\sum_{m=0,\, m \neq k}^{N-1} \Phi[k,m]\, w[m]\, x[m]}_{\text{ICI term}} + z[k] \tag{7}$$

where $w[m] = \sum_{l=0}^{L-1} a[l,1]\, e^{-j 2\pi m l / N}$ defines a new channel variable in frequency domain. It is worth mentioning that in orthogonal frequency division multiple access (OFDMA) systems, each user merely detects a set of nearby subcarriers of interest (e.g., zones or clusters in 802.16e), instead of all the $N$ subcarriers [3]. With the formulation of (7), one can deal only with a small number of channel variables even when the number of channel taps is large. Finally, we can rewrite (7) in matrix notation, leading to a more compact representation:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z} = (\mathbf{M} + \boldsymbol{\Phi}\mathbf{W})\,\mathbf{x} + \mathbf{z} = \mathbf{M}\mathbf{x} + \overset{\frown}{\boldsymbol{\Phi}}\mathbf{w} + \mathbf{z} \tag{8}$$

where $\mathbf{y} = [y[0], \ldots, y[N-1]]^T$, the $(k,m)$th entry of $\mathbf{H}$ is $H[k,m]$, $\mathbf{w} = [w[0], \ldots, w[N-1]]^T$, $\mathbf{x} = [x[0], \ldots, x[N-1]]^T$, $\mathbf{z} = [z[0], \ldots, z[N-1]]^T$, $\mathbf{W} = \mathrm{diag}\{\mathbf{w}\}$, $\mathbf{M} = \mathrm{diag}\{[H[0,0], \ldots, H[N-1,N-1]]^T\}$, the $(k,m)$th entry of $\boldsymbol{\Phi}$ is just $\Phi[k,m]$, the $(k,m)$th entry of $\overset{\frown}{\boldsymbol{\Phi}}$ is given by $\Phi[k,m]\, x[m]$, and the value of $\Phi[k,m]$ for $k = m$ is defined as zero. Moreover, we have $\mathbf{w} = \mathbf{F}\mathbf{s}$, where $\mathbf{s} = [a[0,1], \ldots, a[L-1,1]]^T$ and $\mathbf{F}$ is a DFT matrix of size $N \times L$, with the $(m,l)$th entry given by $e^{-j 2\pi m l / N}$.
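The equivalence between the full model (2) and the reduced form (7)-(8) under the linear tap model (3) can be checked numerically. In the sketch below the coefficients `a0`, `a1` and the data vector `x` are arbitrary illustrative values, and `h_entry`, `phi`, and `twiddle` are hypothetical helper names; the exact sum in (5) is used for $\Phi$, not the approximation (6).

```python
import cmath
import math

N, L = 8, 2
a0 = [1.0 + 0.5j, 0.3 - 0.2j]        # illustrative zeroth-order coefficients a[l, 0]
a1 = [0.02 - 0.01j, -0.01 + 0.03j]   # illustrative first-order coefficients a[l, 1]
x = [1, -1, 1, 1, -1, -1, 1, -1]     # illustrative BPSK data

def twiddle(num):
    return cmath.exp(-2j * math.pi * num / N)

def phi(k, m):
    """Exact Phi[k, m] from (5), with the diagonal defined as zero."""
    if k == m:
        return 0j
    return sum(n * twiddle(n * (k - m)) for n in range(N)) / N

def h_entry(k, m):
    """H[k, m] from (2) with h[l, n] = a1[l] * n + a0[l]."""
    d = (k - m) % N
    return sum(
        (sum((a1[l] * n + a0[l]) * twiddle(n * d) for n in range(N)) / N) * twiddle(m * l)
        for l in range(L)
    )

w = [sum(a1[l] * twiddle(m * l) for l in range(L)) for m in range(N)]
full = [sum(h_entry(k, m) * x[m] for m in range(N)) for k in range(N)]          # H x
reduced = [h_entry(k, k) * x[k] + sum(phi(k, m) * x[m] * w[m] for m in range(N))
           for k in range(N)]                                                   # M x + Phi-frown w
max_gap = max(abs(p - q) for p, q in zip(full, reduced))
```

The two sides agree to rounding error, which illustrates why only the $N$ values $w[m]$ (or the $L$ slopes behind them) need to be tracked to describe the ICI.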

III. PROPOSED EM-BASED DATA DETECTION METHOD

From (8), the optimum ML data detection problem can be formulated as follows:

$$\mathbf{x}_{ML} = \arg\max_{\mathbf{x} \in \{1,-1\}^N} L(\mathbf{y} \mid \mathbf{x}) = \arg\max_{\mathbf{x} \in \{1,-1\}^N} \log \int P(\mathbf{y} \mid \mathbf{w}, \mathbf{x})\, P(\mathbf{w})\, d\mathbf{w} \tag{9}$$

where $L(\cdot)$ is a log-likelihood function, obtained by taking the logarithm of the corresponding probability density function (PDF) $P(\cdot)$. Direct calculation using (9), however, involves multidimensional integration over the hidden variable $\mathbf{w}$. With the ability to tackle missing data models, the EM algorithm is considered a good alternative to solve (9), and the core idea behind this algorithm is to iterate between the E-step and the M-step such that a monotonic increase in $L(\mathbf{y} \mid \mathbf{x})$ is obtained. More details of the algorithm and its application can be found in [21]. The E-step and the M-step associated with the optimization problem of (9) are expressed respectively as

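$$\Omega\left(\mathbf{x} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}\right) = E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[L(\mathbf{y}, \mathbf{w} \mid \mathbf{x})\right] \tag{10}$$

$$\hat{\mathbf{x}}^{(m)} = \arg\max_{\mathbf{x} \in \{1,-1\}^N} \Omega\left(\mathbf{x} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}\right) \tag{11}$$

where $\hat{\mathbf{x}}^{(m)}$ denotes the hard decision of $\mathbf{x}$ at the $m$th EM iteration, and $\Omega(\mathbf{x} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)})$ is known as the ECLL function, to be maximized in the M-step of (11). By using the fact that $L(\mathbf{y}, \mathbf{w} \mid \mathbf{x}) = L(\mathbf{y} \mid \mathbf{w}, \mathbf{x}) + L(\mathbf{w})$ and from (8), we can further simplify (10) as

$$\begin{aligned} \Omega\left(\mathbf{x} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}\right) &= E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[L(\mathbf{y} \mid \mathbf{w}, \mathbf{x})\right] + C \\ &= E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[-\frac{1}{\sigma_z^2} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2\right] + C \\ &= -\frac{1}{\sigma_z^2}\left(\mathbf{y}^{\dagger}\mathbf{y} - \mathbf{y}^{\dagger}\tilde{\mathbf{H}}\mathbf{x} - \mathbf{x}^{\dagger}\tilde{\mathbf{H}}^{\dagger}\mathbf{y} + \mathbf{x}^{\dagger}\tilde{\boldsymbol{\Sigma}}\mathbf{x}\right) + C \end{aligned} \tag{12}$$

where $\tilde{\mathbf{H}}$ and $\tilde{\boldsymbol{\Sigma}}$ denote $E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}[\mathbf{H}]$ and $E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}[\mathbf{H}^{\dagger}\mathbf{H}]$, respectively, and the constant term $C$ in (12) can be dropped for simplicity. Without loss of generality, the channel state information (CSI) $\mathbf{M}$ can be estimated through pilot tones embedded in each OFDM symbol, and we denote the estimate as $\hat{\mathbf{M}}$. By inserting $\mathbf{H} = \hat{\mathbf{M}} + \boldsymbol{\Phi}\mathbf{W}$ into $\tilde{\mathbf{H}}$ and $\tilde{\boldsymbol{\Sigma}}$, it is straightforward to calculate the two terms as

$$\tilde{\mathbf{H}} = E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[\hat{\mathbf{M}} + \boldsymbol{\Phi}\mathbf{W}\right] = \hat{\mathbf{M}} + \boldsymbol{\Phi}\tilde{\mathbf{W}} \tag{13}$$

$$\begin{aligned} \tilde{\boldsymbol{\Sigma}} &= E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[\left(\hat{\mathbf{M}} + \boldsymbol{\Phi}\mathbf{W}\right)^{\dagger}\left(\hat{\mathbf{M}} + \boldsymbol{\Phi}\mathbf{W}\right)\right] \\ &= E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[\hat{\mathbf{M}}^{\dagger}\hat{\mathbf{M}} + \hat{\mathbf{M}}^{\dagger}\boldsymbol{\Phi}\mathbf{W} + \mathbf{W}^{\dagger}\boldsymbol{\Phi}^{\dagger}\hat{\mathbf{M}} + \mathbf{W}^{\dagger}\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}\mathbf{W}\right] \\ &= \hat{\mathbf{M}}^{\dagger}\hat{\mathbf{M}} + \hat{\mathbf{M}}^{\dagger}\boldsymbol{\Phi}\tilde{\mathbf{W}} + \tilde{\mathbf{W}}^{\dagger}\boldsymbol{\Phi}^{\dagger}\hat{\mathbf{M}} + E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[\left(\mathbf{w}\mathbf{w}^{\dagger}\right)^T \odot \left(\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}\right)\right] \\ &= \hat{\mathbf{M}}^{\dagger}\hat{\mathbf{M}} + \hat{\mathbf{M}}^{\dagger}\boldsymbol{\Phi}\tilde{\mathbf{W}} + \tilde{\mathbf{W}}^{\dagger}\boldsymbol{\Phi}^{\dagger}\hat{\mathbf{M}} + \left(E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[\mathbf{w}\mathbf{w}^{\dagger}\right]\right)^T \odot \left(\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}\right) \end{aligned} \tag{14}$$

where $\tilde{\mathbf{W}}$ is defined as $E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}[\mathbf{W}]$. Let $\tilde{\mathbf{W}} \triangleq \mathrm{diag}\{\tilde{\mathbf{w}}\}$, $\tilde{\mathbf{w}} \triangleq E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}[\mathbf{w}]$ and $\tilde{\boldsymbol{\Sigma}}_w \triangleq E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[(\mathbf{w} - \tilde{\mathbf{w}})(\mathbf{w} - \tilde{\mathbf{w}})^{\dagger}\right] = E_{\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)}}\left[\mathbf{w}\mathbf{w}^{\dagger}\right] - \tilde{\mathbf{w}}\tilde{\mathbf{w}}^{\dagger}$. We can rewrite (14) as

$$\tilde{\boldsymbol{\Sigma}} = \hat{\mathbf{M}}^{\dagger}\hat{\mathbf{M}} + \hat{\mathbf{M}}^{\dagger}\boldsymbol{\Phi}\tilde{\mathbf{W}} + \tilde{\mathbf{W}}^{\dagger}\boldsymbol{\Phi}^{\dagger}\hat{\mathbf{M}} + \left(\tilde{\boldsymbol{\Sigma}}_w + \tilde{\mathbf{w}}\tilde{\mathbf{w}}^{\dagger}\right)^T \odot \left(\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}\right) \tag{15}$$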
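The step in (14) that replaces $\mathbf{W}^{\dagger}\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}\mathbf{W}$ by $(\mathbf{w}\mathbf{w}^{\dagger})^T \odot (\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi})$ relies on $\mathbf{W}$ being diagonal. A small numerical check, with arbitrary illustrative values for `w` and `Phi` and hypothetical helper names, is sketched below.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

w = [0.3 + 0.1j, -0.2 + 0.4j, 0.5 - 0.6j]                 # illustrative channel variables
Phi = [[0j, 1 + 2j, -1j],
       [0.5 + 0j, 0j, 2 - 1j],
       [1j, -0.3 + 0j, 0j]]                                # illustrative matrix, zero diagonal
W = [[w[i] if i == j else 0j for j in range(3)] for i in range(3)]

A = matmul(hermitian(Phi), Phi)                            # Phi^H Phi
lhs = matmul(matmul(hermitian(W), A), W)                   # W^H Phi^H Phi W
# Entry (i, j) of (w w^H)^T is w[j] * conj(w[i]); multiply elementwise with A.
rhs = [[w[j] * w[i].conjugate() * A[i][j] for j in range(3)] for i in range(3)]
gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
```

Because the identity is linear in $\mathbf{w}\mathbf{w}^{\dagger}$, the expectation can be moved inside, which is what allows (15) to be written in terms of $\tilde{\boldsymbol{\Sigma}}_w$ and $\tilde{\mathbf{w}}$ only.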

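Also, from (8), it is observed that the conditional PDF $P(\mathbf{w} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)})$ is a Gaussian distribution, with mean and covariance given by [25]

$$\tilde{\mathbf{w}} = \boldsymbol{\mu}_w + \mathbf{C}_{wy}\mathbf{C}_{yy}^{-1}\left(\mathbf{y} - \boldsymbol{\mu}_y\right) \tag{16}$$

$$\tilde{\boldsymbol{\Sigma}}_w = \mathbf{C}_{ww} - \mathbf{C}_{wy}\mathbf{C}_{yy}^{-1}\mathbf{C}_{yw} \tag{17}$$

where the relevant terms are defined and statistics are calculated in the following way. We first apply a first-order autoregressive (AR) channel model to compute $\boldsymbol{\mu}_w$ and $\mathbf{C}_{ww}$. Details are provided in Appendix A, and the two terms can be derived as

$$\boldsymbol{\mu}_w = E[\mathbf{w}] = \mathbf{0} \tag{18}$$

$$\mathbf{C}_{ww} = E\left[(\mathbf{w} - \boldsymbol{\mu}_w)(\mathbf{w} - \boldsymbol{\mu}_w)^{\dagger}\right] = \mathbf{F}\mathbf{C}_{ss}\mathbf{F}^{\dagger} \tag{19}$$

where $\mathbf{C}_{ss}$ is a diagonal matrix with the $l$th diagonal entry equal to $\frac{2(1-\alpha)\Xi_l}{(N-1)^2}$, and $\alpha$ is the channel tap autocorrelation as defined in (A.2). Moreover, we can get

$$\boldsymbol{\mu}_y = E[\mathbf{y}] = \hat{\mathbf{M}}\hat{\mathbf{x}}^{(m-1)} \tag{20}$$

$$\mathbf{C}_{yy} = E\left[(\mathbf{y} - \boldsymbol{\mu}_y)(\mathbf{y} - \boldsymbol{\mu}_y)^{\dagger}\right] = \overset{\frown}{\boldsymbol{\Phi}}{}^{(m-1)}\mathbf{C}_{ww}\overset{\frown}{\boldsymbol{\Phi}}{}^{(m-1)\dagger} + \sigma_z^2\mathbf{I}_N \tag{21}$$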

[Figure 2: flow of the EM-based data detection method. Initialization: calculate $\mathbf{C}_{ww} = \mathbf{F}\mathbf{C}_{ss}\mathbf{F}^{\dagger}$, choose $\hat{\mathbf{M}}$ and $\hat{\mathbf{x}}^{(0)}$, and set $m = 0$. Execution of the EM algorithm: increment $m$; in the E-step, compute the statistics $\tilde{\mathbf{w}}$, $\tilde{\boldsymbol{\Sigma}}_w$, $\tilde{\mathbf{H}}$ and $\tilde{\boldsymbol{\Sigma}}$, and calculate $\Omega(\mathbf{x} \mid \mathbf{y}, \hat{\mathbf{x}}^{(m-1)})$ for all $\mathbf{x} \in \{1,-1\}^N$; in the M-step, set $\hat{\mathbf{x}}^{(m)}$ to the maximizer of $\Omega$; repeat while $\hat{\mathbf{x}}^{(m)} \neq \hat{\mathbf{x}}^{(m-1)}$ and $m \le N_{EM}$.]

Fig. 2. EM-based data detection method.

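$$\mathbf{C}_{wy} = E\left[(\mathbf{w} - \boldsymbol{\mu}_w)(\mathbf{y} - \boldsymbol{\mu}_y)^{\dagger}\right] = \mathbf{C}_{ww}\overset{\frown}{\boldsymbol{\Phi}}{}^{(m-1)\dagger} \tag{22}$$

where $\mathbf{C}_{wy} = \mathbf{C}_{yw}^{\dagger}$, and $\overset{\frown}{\boldsymbol{\Phi}}{}^{(m-1)}$ is obtained by substituting the hard decision $\hat{\mathbf{x}}^{(m-1)}$ into $\overset{\frown}{\boldsymbol{\Phi}}$. Using (13)-(22), we can calculate the ECLL function of (12). The EM algorithm for data detection is then summarized in Fig. 2, and it is repeated until a stopping criterion holds. The stopping criterion is to check whether $\hat{\mathbf{x}}^{(m)} = \hat{\mathbf{x}}^{(m-1)}$ or the iteration number reaches a predefined limit $N_{EM}$.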
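The conditional mean in (16), with the statistics (18)-(22), can be sketched for a toy case. Everything numeric here is an assumption made for illustration: $N = 4$, $L = 2$, the entries of `Css`, the noise variance, and the "true" slopes `s_true`; the closed form (5) is used for $\Phi$, the observation is taken noise-free with $\hat{\mathbf{M}}\hat{\mathbf{x}}^{(m-1)}$ already subtracted, and the small linear solver is a generic Gauss-Jordan routine rather than anything prescribed by the paper.

```python
import cmath
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def herm(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def solve(A, b):
    """Solve A u = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(complex, A[i])) + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

N, L, noise_var = 4, 2, 0.01
F = [[cmath.exp(-2j * math.pi * m * l / N) for l in range(L)] for m in range(N)]
Css = [[0.1 + 0j, 0j], [0j, 0.05 + 0j]]                   # assumed prior on the tap slopes
Cww = matmul(matmul(F, Css), herm(F))                     # (19)

def phi(k, m):
    if k == m:
        return 0j
    return complex(-0.5, 0.5 / math.tan(math.pi * ((k - m) % N) / N))

x_prev = [1, -1, 1, 1]                                    # previous hard decision
A = [[phi(k, m) * x_prev[m] for m in range(N)] for k in range(N)]   # Phi-frown
s_true = [0.2 + 0.1j, -0.1 + 0.05j]
w_true = [sum(F[m][l] * s_true[l] for l in range(L)) for m in range(N)]
r = [sum(A[k][m] * w_true[m] for m in range(N)) for k in range(N)]  # y - M x_prev

Cwy = matmul(Cww, herm(A))                                # (22)
Cyy = matmul(A, Cwy)                                      # (21) before adding noise
for i in range(N):
    Cyy[i][i] += noise_var
u = solve(Cyy, r)
w_tilde = [sum(Cwy[m][k] * u[k] for k in range(N)) for m in range(N)]   # (16), mu_w = 0

fit = math.sqrt(sum(abs(r[k] - sum(A[k][m] * w_tilde[m] for m in range(N))) ** 2
                    for k in range(N)))
baseline = math.sqrt(sum(abs(v) ** 2 for v in r))
```

Here `fit` measures how much of the ICI term is left unexplained by $\tilde{\mathbf{w}}$, and `baseline` is the ICI term's own norm; the estimate explains part of the observation, and the $N \times N$ solve is the cost that motivates the groupwise processing of Section IV.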

IV. IMPLEMENTATION: EM-BASED ITERATIVE RECEIVERS

The data detection method in (10) and (11) involves not only an $N \times N$ matrix inversion in the E-step but also an exhaustive search over $2^N$ lattice points in the M-step, thereby making computation intractable in practice. In this section, we investigate two EM-based iterative receivers for practical implementation.

A. ML-EM Receiver for OFDM Systems

As depicted in Fig. 3, we consider an ML-EM receiver with $N$ subcarriers partitioned into $R$ groups, and each group consists of $G$ subcarriers. Denote the $j$th group of subcarriers as $\mathcal{G}_j = \{jG, \ldots, (j+1)G - 1\}$, for $j = 0, \ldots, R-1$. Next, we define the $j$th data group and observation group as $\mathbf{x}_j = [x[jG], \ldots, x[(j+1)G - 1]]^T$ and $\mathbf{y}_j = [y[jG], \ldots, y[(j+1)G - 1]]^T$, respectively. Further, we use $\mathcal{B}_j$ to denote the set $\{((j-Q))_R, \ldots, ((j+Q))_R\}$.
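The index bookkeeping above can be sketched as follows. The function names are hypothetical, and the values $N = 256$, $G = 4$, $Q = 4$ are borrowed from the example heading of Table I purely for illustration.

```python
def subcarrier_group(j, group_size):
    """Subcarrier indices {jG, ..., (j+1)G - 1} of the j-th group."""
    return list(range(j * group_size, (j + 1) * group_size))

def neighbour_groups(j, q, n_groups):
    """The set B_j = {((j-Q))_R, ..., ((j+Q))_R} with modulo-R wrap-around."""
    return [(j + offset) % n_groups for offset in range(-q, q + 1)]

N, G, Q = 256, 4, 4
R = N // G                                  # number of groups
first_window = neighbour_groups(0, Q, R)    # wraps around to the last groups
third_group = subcarrier_group(2, G)
```

The wrap-around reflects the modulo-$N$ structure of $\Phi[k,m]$: the first group's neighbours include the last groups of the symbol.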

[Figure 3: (a) block diagram of the ML-EM receiver, comprising an OFDM demodulator, an initialization stage (used at iteration 0), an ICI canceller and an EM (ML) detector (used at iterations $\ge 1$), and an optional CE update; (b) group detection for the $k$th data group, in which (1) the statistics of $\mathbf{w}$ are calculated by using $\mathbf{x}_E$ and $\mathbf{y}_E$, and (2) the ECLL function is calculated by using $\mathbf{y}_M$.]

Fig. 3. (a) ML-EM receiver for OFDM systems; (b) an illustration for group detection.

Without loss of generality, we focus on detecting the $k$th data group. Assume that due to the ICI effect, the energy of $\mathbf{x}_k$ is spread over $2Q+1$ observation groups $\mathbf{y}_j$, for $j \in \mathcal{B}_k$, which also contain interfering energy caused by other adjacent data groups $\mathbf{x}_j$, for $j \in \mathcal{B}_k \setminus \{k\}$. As observed in Fig. 3(a), there is an additional iteration loop outside the EM detector, called the ML iteration. Within an ML iteration, the ICI is first reconstructed and subtracted from the observation group, yielding a signal:

$$\bar{\mathbf{y}}_j = \mathbf{y}_j - \sum_{i \in \mathcal{B}_k \setminus \{k\}} \bar{\mathbf{H}}_{j,i}\, \bar{\mathbf{x}}_i \tag{23}$$

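for $j \in \mathcal{B}_k$, where $\bar{\mathbf{x}}_i$ is the tentative decision of $\mathbf{x}_i$, and the $(p,q)$th entry of $\bar{\mathbf{H}}_{j,i}$ is given by the $(jG+p, iG+q)$th entry of $\bar{\mathbf{H}}$, the estimate of $\mathbf{H}$, for $p, q = 0, \ldots, G-1$. Both $\bar{\mathbf{x}}_i$ and $\bar{\mathbf{H}}$ are obtained from the output of the EM detector at the previous ML iteration. After ICI cancellation, the EM detector is executed by applying the EM-based data detection method in Section III. Define $\mathbf{x}_E = \left[\bar{\mathbf{x}}^T_{((k-Q-1))_R}, \ldots, \hat{\mathbf{x}}_k^{(m-1)T}, \ldots, \bar{\mathbf{x}}^T_{((k+Q+1))_R}\right]^T$ and $\mathbf{y}_E = \left[\mathbf{y}^T_{((k-Q-1))_R}, \ldots, \mathbf{y}^T_{((k+Q+1))_R}\right]^T$, where $\hat{\mathbf{x}}_k^{(m-1)}$ is the hard decision of $\mathbf{x}_k$ at the $(m-1)$th EM iteration within the EM detector. Particularly, for $m = 1$, we initialize $\hat{\mathbf{x}}_k^{(0)}$ as $\bar{\mathbf{x}}_k$. In the E-step, at the $m$th EM iteration, we replace $\mathbf{x}$ and $\mathbf{y}$ (in Fig. 2) with $\mathbf{x}_E$ and $\mathbf{y}_E$ to calculate the statistics $\tilde{\mathbf{W}}$, $\tilde{\mathbf{H}}$ and $\tilde{\boldsymbol{\Sigma}}$. The size of these three matrices now becomes $(2Q+3)G \times (2Q+3)G$. After that, the interference-reduced signal $\mathbf{y}_M = \left[\bar{\mathbf{y}}^T_{((k-Q))_R}, \ldots, \bar{\mathbf{y}}^T_{((k+Q))_R}\right]^T$ is taken to compute the ECLL function, for each combination of $\mathbf{x}_k \in \{1,-1\}^G$, as follows:

$$\Omega\left(\mathbf{x}_k \mid \mathbf{y}_M, \mathbf{y}_E, \mathbf{x}_E\right) = -\frac{1}{\sigma_z^2}\left(\mathbf{y}_M^{\dagger}\mathbf{y}_M - \mathbf{y}_M^{\dagger}\tilde{\mathbf{H}}_k\mathbf{x}_k - \mathbf{x}_k^{\dagger}\tilde{\mathbf{H}}_k^{\dagger}\mathbf{y}_M + \mathbf{x}_k^{\dagger}\tilde{\boldsymbol{\Sigma}}_k\mathbf{x}_k\right) \tag{24}$$

where the matrices $\tilde{\mathbf{H}}_k$ and $\tilde{\boldsymbol{\Sigma}}_k$ are of size $(2Q+1)G \times G$ and $G \times G$, with the $(p,q)$th entry given by the $(G+p, (Q+1)G+q)$th entry of $\tilde{\mathbf{H}}$ and the $((Q+1)G+p, (Q+1)G+q)$th entry of $\tilde{\boldsymbol{\Sigma}}$, respectively. Finally, the decision of $\mathbf{x}_k$ is calculated in the M-step according to:

$$\hat{\mathbf{x}}_k^{(m)} = \arg\max_{\mathbf{x}_k \in \{1,-1\}^G} \Omega\left(\mathbf{x}_k \mid \mathbf{y}_M, \mathbf{y}_E, \mathbf{x}_E\right) \tag{25}$$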
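The groupwise M-step (25) is an exhaustive search over the $2^G$ BPSK candidates, scoring each with (24). A minimal sketch follows, with hypothetical names and illustrative numbers: a small $3 \times 2$ matrix stands in for $\tilde{\mathbf{H}}_k$, and $\tilde{\boldsymbol{\Sigma}}_k$ is set to $\tilde{\mathbf{H}}_k^{\dagger}\tilde{\mathbf{H}}_k$, which corresponds to the special case in which the channel variables are known without uncertainty.

```python
from itertools import product

def ecll(x, y, H, S, noise_var):
    """Evaluate (24) for a real-valued (BPSK) candidate x."""
    Hx = [sum(h * v for h, v in zip(row, x)) for row in H]
    y_energy = sum(abs(v) ** 2 for v in y)
    cross = sum(yv.conjugate() * hv for yv, hv in zip(y, Hx))     # y^H H x
    quad = sum(x[p] * S[p][q] * x[q] for p in range(len(x)) for q in range(len(x)))
    # y^H H x + x^H H^H y equals twice the real part of y^H H x.
    return -(y_energy - 2 * cross.real + quad.real) / noise_var

def m_step(y, H, S, noise_var):
    """Exhaustive search (25) over all 2^G candidates."""
    candidates = product((1, -1), repeat=len(H[0]))
    return max(candidates, key=lambda x: ecll(x, y, H, S, noise_var))

H = [[1.0 + 0.2j, 0.1j], [-0.1j, 0.9 - 0.1j], [0.05 + 0j, 0.05j]]  # stand-in for H-tilde_k
S = [[sum(row[p].conjugate() * row[q] for row in H) for q in range(2)] for p in range(2)]
y = [1.02 + 0.08j, -0.93 + 0.01j, 0.04 - 0.06j]   # roughly H applied to (1, -1) plus noise
best = m_step(y, H, S, noise_var=0.1)
```

The search cost grows as $2^G$ per group instead of $2^N$ for the whole symbol, which is the point of the group partition.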

Within the EM detector, the above procedure is conducted to detect $R$ groups simultaneously, i.e., we use parallel processing for group detection. Once the stopping criterion is met, the receiver proceeds to the next ML iteration until a good performance is achieved, and $\bar{\mathbf{x}}_k$ and $\bar{\mathbf{H}}$ are updated. In other words, at the end of the $k$th parallel processing, $\bar{\mathbf{x}}_k$ is replaced by $\hat{\mathbf{x}}_k^{(m)}$, the $(kG+j)$th diagonal entry of $\bar{\mathbf{W}}$ is renewed by the $((Q+1)G+j)$th diagonal entry of $\tilde{\mathbf{W}}$, for $k = 0, \ldots, R-1$ and $j = 0, \ldots, G-1$, and $\bar{\mathbf{H}}$ is calculated as $\hat{\mathbf{M}} + \boldsymbol{\Phi}\bar{\mathbf{W}}$.

The intuition behind the group detection is explained as follows. While computing and maximizing the ECLL function, we can acquire the diversity gains through examining the interference-reduced signals $\mathbf{y}_M$, of which the energy is contributed mainly by the data group $\mathbf{x}_k$. Therefore, it is reasonable to expect that full diversity gain is achievable when the value of $Q$ is sufficiently large and the ICI is perfectly cancelled out. The diversity gain we can achieve also depends upon whether a good estimate of $\bar{\mathbf{W}}$ is provided. Recall that the $2Q+1$ observation groups (in the neighborhood of $\mathbf{y}_k$) are sufficient for estimating the channel variables of the $k$th group, but they also include ICI from adjacent data groups. For example, the $((k-Q))_R$th observation group is interfered by the data groups $\mathbf{x}_j$, for $j = ((k-2Q))_R, \ldots, k$. In order to obtain an accurate estimate of the channel variables, it is necessary to account for the ICI effect from the adjacent data groups. Hence, in the E-step, we take an enlarged cluster of original observation groups $\mathbf{y}_E$, as well as the corresponding data groups $\mathbf{x}_E$, to calculate the channel variables of the $k$th group. Strictly speaking, the size of $\mathbf{y}_E$ and $\mathbf{x}_E$ should be chosen as $4Q+1$, but our experimental trials suggest that the choice of $2Q+3$ is large enough to get a good result. In other words, the two adjacent data groups (just outside the $2Q+1$ observation groups) cause the main power of interference to the $2Q+1$ observation groups, and the ICI contributed from the other adjacent data groups (outside the $2Q+1$ observation groups) is relatively small and has a negligible influence on the accuracy of the channel variable estimates.

B. TURBO-EM Receiver for BICM/OFDM Systems

[Figure 4: block diagram of the TURBO-EM receiver, comprising an OFDM demodulator, an initialization stage (used at iteration 0), an ICI canceller, an EM (MAP) detector, a SOVA decoder, interleavers $\Pi$ and a deinterleaver $\Pi^{-1}$, and an optional CE update; the exchanged quantities are $\boldsymbol{\lambda}_i^{D,ext}$, $\boldsymbol{\lambda}_i^{C,ext}$ and $\boldsymbol{\lambda}_i^{C,post}$.]

Fig. 4. TURBO-EM receiver for BICM/OFDM systems.

Fig. 4 shows the TURBO-EM receiver for BICM/OFDM systems. The receiver implements the turbo iterations by exchanging the extrinsic information between the EM detector (after the ICI canceller) and the soft-output Viterbi algorithm (SOVA) decoder. Within each TURBO iteration, the ICI is first reconstructed and subtracted from the observation group to obtain the interference-reduced signal $\bar{\mathbf{y}}_j$ by using (23), but with the soft decision $\overset{\frown}{\mathbf{x}}_i$ for interference reconstruction. In this way, we can mitigate the error propagation effect, and the soft decision for the BPSK case is given by [26]

$$\overset{\frown}{\mathbf{x}}_i = E[\mathbf{x}_i] = \tanh\left(\frac{\boldsymbol{\lambda}_i^{C,post}}{2}\right) \tag{26}$$

where $\boldsymbol{\lambda}_i^{C,post} = \left[\lambda^{C,post}[iG], \ldots, \lambda^{C,post}[(i+1)G - 1]\right]^T$ is the a posteriori log-likelihood ratio (LLR), associated with $\mathbf{x}_i$, from the SOVA decoder at the previous TURBO iteration, and the LLR of a data symbol $\vartheta$ is defined as the logarithm of the ratio of $P(\vartheta = +1)$ to $P(\vartheta = -1)$. Before applying the SOVA decoder, we execute the EM detector with several EM iterations by using the interference-reduced signal $\bar{\mathbf{y}}_j$ from the ICI canceller. Here, we apply the maximum a posteriori (MAP) EM algorithm to the EM detector, which further takes account of the a priori information to compute the ECLL function as follows:

$$\Omega\left(\mathbf{x}_k \mid \mathbf{y}_M, \mathbf{y}_E, \mathbf{x}_E\right) = -\frac{1}{\sigma_z^2}\left(\mathbf{y}_M^{\dagger}\mathbf{y}_M - \mathbf{y}_M^{\dagger}\tilde{\mathbf{H}}_k\mathbf{x}_k - \mathbf{x}_k^{\dagger}\tilde{\mathbf{H}}_k^{\dagger}\mathbf{y}_M + \mathbf{x}_k^{\dagger}\tilde{\boldsymbol{\Sigma}}_k\mathbf{x}_k\right) + L(\mathbf{x}_k) \tag{27}$$

where $L(\mathbf{x}_k) = \ln P(\mathbf{x}_k)$ is calculated from the interleaved extrinsic information $\boldsymbol{\lambda}_k^{C,ext} = \left[\lambda^{C,ext}[kG], \ldots, \lambda^{C,ext}[(k+1)G - 1]\right]^T$ with respect to $\mathbf{x}_k$, which is generated by the SOVA decoder. Under the assumption of an ideal interleaver, data symbols are independent of each other, and we obtain

$$L(\mathbf{x}_k) = \sum_{j=0}^{G-1} L\left(x[kG+j] = q_j\right) \tag{28}$$

where $q_j$ denotes the value of $x[kG+j]$, and $L(x[kG+j] = q_j)$ is calculated by using (B.4) in Appendix B. Note that for each EM iteration, we need to choose a hard decision $\mathbf{x}_k \in \{1,-1\}^G$ to maximize the ECLL function of (27). The iterative EM procedures can help converge to the local optimal values of $\mathbf{x}_k$ and $\mathbf{w}$. Finally, in order to pass soft information to the SOVA decoder, the extrinsic a posteriori LLR $\boldsymbol{\lambda}_k^{D,ext} = \left[\lambda^{D,ext}[kG], \ldots, \lambda^{D,ext}[(k+1)G - 1]\right]^T$ is generated at the final EM iteration as the output of the EM detector. From [27] and (B.3), we get

$$\begin{aligned} \lambda^{D,ext}[kG+j] &= \ln\frac{P\left(x[kG+j] = +1 \mid \mathbf{y}_M\right)}{P\left(x[kG+j] = -1 \mid \mathbf{y}_M\right)} - \lambda^{C,ext}[kG+j] \\ &\approx \max_{\mathbf{x}_k \in \Omega_j^{+}}\left\{-\frac{1}{\sigma_z^2}\left\|\mathbf{y}_M - \tilde{\mathbf{H}}_k\mathbf{x}_k\right\|^2 + \frac{1}{2}\mathbf{x}_{k \setminus \{j\}}^T\boldsymbol{\lambda}_{k \setminus \{j\}}^{C,ext}\right\} \\ &\quad - \max_{\mathbf{x}_k \in \Omega_j^{-}}\left\{-\frac{1}{\sigma_z^2}\left\|\mathbf{y}_M - \tilde{\mathbf{H}}_k\mathbf{x}_k\right\|^2 + \frac{1}{2}\mathbf{x}_{k \setminus \{j\}}^T\boldsymbol{\lambda}_{k \setminus \{j\}}^{C,ext}\right\} \end{aligned} \tag{29}$$

where $\Omega_j^{+}$ denotes the set for which the $j$th entry of $\mathbf{x}_k$ is "+1"; $\Omega_j^{-}$ is defined similarly, and the vectors $\mathbf{x}_{k \setminus \{j\}}$ and $\boldsymbol{\lambda}_{k \setminus \{j\}}^{C,ext}$ are obtained by omitting the $j$th entry of $\mathbf{x}_k$ and $\boldsymbol{\lambda}_k^{C,ext}$, respectively. The extrinsic LLR $\boldsymbol{\lambda}_k^{D,ext}$ is then converted into soft bits using (26), modeled as the output of an AWGN channel with unit mean and variance $\sigma_C^2$, deinterleaved through $\Pi^{-1}$, and passed to the SOVA decoder. The variance $\sigma_C^2$ is estimated as

$$\sigma_C^2 = \frac{1}{N_I}\sum_{i=1}^{N_I}\left(|\mu[i]| - 1\right)^2 \tag{30}$$

where $\mu[i]$ indicates the soft value of the coded bits ranging between $-1$ and $+1$, and $N_I$ represents the interleaver size. It is mentioned in [28] that the Gaussian assumption is not satisfied at the beginning of TURBO iterations, but it becomes a good approximation as the number of iterations increases. After the SOVA decoder produces the soft information by considering the ML path and its strongest competitor in the trellis diagram, the receiver progresses toward the next TURBO iteration until a preset maximum number of iterations, $N_{TB}$, is reached.

[Figure 5: block diagram of the initialization procedure, comprising an MMSE-based channel estimator that provides the CSI of $\mathbf{M}$ as $\hat{\mathbf{M}}$, a one-tap equalizer that provides decisions for the ML-EM receiver, and a Viterbi decoder that provides decisions for the TURBO-EM receiver.]

Fig. 5. Initialization procedure for ML-EM and TURBO-EM receivers.
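The LLR-to-soft-bit conversion (26) and the variance estimate (30) are simple enough to sketch directly. The function names and the LLR values below are illustrative assumptions only.

```python
import math

def soft_bits(llrs):
    """Soft BPSK decisions from LLRs, following (26)."""
    return [math.tanh(v / 2) for v in llrs]

def soft_bit_variance(mu):
    """Variance estimate (30) from soft values in [-1, +1]."""
    return sum((abs(v) - 1) ** 2 for v in mu) / len(mu)

llrs = [4.0, -0.5, 0.0, -6.0]            # illustrative extrinsic LLRs
mu = soft_bits(llrs)
sigma_c2 = soft_bit_variance(mu)
confident = soft_bit_variance(soft_bits([40.0, -40.0]))
```

An LLR of zero maps to a soft bit of zero, contributing the maximum penalty of one to (30), while very large LLR magnitudes drive the estimated variance toward zero.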

C. Initial Setting and Channel Estimation Update

Fig. 5 depicts the block diagram for initialization of the two receivers. The initial CE is performed through the use of pilot tones and improved via the decided data symbols. Let $\mathbf{X}_P$ be a diagonal matrix whose diagonal elements are obtained from the stacked vector of $J$ pilot tones on subcarriers $\{P_0, \ldots, P_{J-1}\}$ within the OFDM symbol. Applying the MMSE-based CE method, we obtain [29]

$$\hat{\mathbf{M}} = \mathbf{F}\boldsymbol{\Psi}_P^{-1}\mathbf{F}_P^{\dagger}\mathbf{X}_P^{\dagger}\mathbf{y}_P \tag{31}$$

where $\boldsymbol{\Psi}_P = \mathbf{F}_P^{\dagger}\mathbf{X}_P^{\dagger}\mathbf{X}_P\mathbf{F}_P + \left(\sigma_z^2 + \sigma_{ICI}^2\right)\mathbf{I}_L$, and $\mathbf{y}_P$ and $\mathbf{F}_P$ are defined similarly to $\mathbf{y}$ and $\mathbf{F}$, respectively, but here related to subcarriers $\{P_0, \ldots, P_{J-1}\}$ only. By invoking the central limit theorem, the ICI energy $\sigma_{ICI}^2$ … [4]. Subsequently, a one-tap equalizer is used for data detection, and a decision-directed approach is carried out to initialize the two receivers. For the ML-EM receiver, decided data symbols together with the pilot tones are used to generate a new channel estimate $\hat{\mathbf{M}}$ by using (31) and then produce an updated decision symbol $\bar{\mathbf{x}}_i$, while for the TURBO-EM receiver, much more reliable decision symbols are generated by the Viterbi decoder. At the first TURBO iteration of the TURBO-EM receiver, $\boldsymbol{\lambda}_k^{C,ext}$ is set to $\mathbf{0}$, and $\bar{\mathbf{x}}_i$ is used to replace $\overset{\frown}{\mathbf{x}}_i$ in (26). Moreover, we initialize $\bar{\mathbf{H}}$ as $\hat{\mathbf{M}}$, i.e., set $\bar{\mathbf{W}} = \mathbf{O}$, since no information on $\bar{\mathbf{W}}$ is available at the first iteration of the two receivers.

TABLE I
COMPUTATION COMPLEXITY. (EXAMPLE: $G = 4$, $Q = 4$, $\gamma = 1$, $L = 6$, $N = 256$ AND $N_G = 64$)

ML-EM receiver and TURBO-EM receiver

| Block | Number of complex multiplications | Example | Unit |
|---|---|---|---|
| ICI canceller | $(2Q)\,G^2\,(2Q+1)$ | 1152 | /ML (or TURBO) iteration /group |
| EM detector | $5(2Q+3)^3G^3 + G^2\left[36Q^2 + \left(108 + 2^{\gamma G+1}\right)Q + \left(81 + 2^{\gamma G+1}\right)\right] + G\left[\left(2^{\gamma G+2} + 4\right)Q + 3 \times 2^{\gamma G} + 6\right]$ | 447208 | /ML (or TURBO) iteration /EM iteration /group |
| Precalculation of $\mathbf{C}_{ww}$ & $\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}$ | $(2Q+3)^2(L+1)G^2 + (2Q+3)LG$ | 13816 | /group |
| Eq. (29) | $2^{\gamma(G-1)}\left[(2Q+1)G^2 + (2Q+2)G + 1\right]$ | 1480 | /TURBO iteration /group |

Method I and Method II in [19]

| Method | Number of complex multiplications | Example | Unit |
|---|---|---|---|
| Method I | $L^3 + 2L^2N_G + 2LN_G + 2N_G^2 + N^3 + 2N^2 + N$ | 16922328 | /iteration |
| Method II | $N^3 + 2N^2 + N$ | 16908544 | /iteration |

Due to ICI, the initial estimate of $\mathbf{M}$ becomes inaccurate as $f_D$ increases. Hence, Fig. 3(a) and Fig. 4 offer an option for CE update, in which both pilot tones and decision data symbols are used together. Particularly, for the TURBO-EM receiver, decision data symbols are fed back from the SOVA decoder. At the second and subsequent outer loop iterations, the ICI (reconstructed from $N_U$ adjacent subcarriers in a hard or soft manner) to the subcarriers is canceled out in the received signal $\mathbf{y}$, and the MMSE-based CE method is again used to refine the estimate $\hat{\mathbf{M}}$ by setting $\sigma_{ICI}^2 = 0$.

D. Computation Complexity

Now let us look at the number of complex multiplications required for the two proposed receivers. Assume that the inversion of a $K \times K$ matrix needs $K^3$ complex multiplications. In Table I, the first and second rows indicate the complexity of the ICI canceller and the EM detector, respectively. The third row gives the complexity for pre-computing $\mathbf{C}_{ww}$ and $\boldsymbol{\Phi}^{\dagger}\boldsymbol{\Phi}$ in the EM detector. Note that the calculation of (28) in the MAP EM detector does not require any multiplications, and the number of multiplications required to calculate (29) is presented in the fourth row of Table I. As an example, the third column evaluates these expressions for $G = 4$, $Q = 4$, $\gamma = 1$ and $L = 6$. Some complexity reduction can be achieved by applying the SAGE algorithm and the Viterbi algorithm, as proposed in [30] and [26] respectively, to simplify (25) and (29) when the values of $G$ and $\gamma$ are large enough to dominate the overall computation complexity. Moreover, the complexity of the SOVA decoder is, in general, upper-bounded by two times that of the Viterbi decoder. Finally, details on the computation complexity of the MMSE-based CE method can be found in [29].
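As a quick check on Table I, the following Python sketch evaluates the tabulated expressions for the example parameters. The function names and dictionary keys are ours and purely illustrative; only the formulas and the example values are taken from Table I.

```python
def receiver_complexity(G, Q, gamma, L):
    """Complex-multiplication counts of the proposed receivers (Table I).

    G: group size, Q: one-sided ICI span, gamma: bits per symbol,
    L: number of channel taps. Names of the returned keys are illustrative.
    """
    ici_canceller = (2 * Q) * G**2 * (2 * Q + 1)
    em_detector = (
        5 * (2 * Q + 3) ** 3 * G**3
        + G**2 * (36 * Q**2
                  + (108 + 2 ** (gamma * G + 1)) * Q
                  + (81 + 2 ** (gamma * G + 1)))
        + G * ((2 ** (gamma * G + 2) + 4) * Q + 3 * 2 ** (gamma * G) + 6)
    )
    precalculation = (2 * Q + 3) ** 2 * (L + 1) * G**2 + (2 * Q + 3) * L * G
    eq29 = 2 ** (gamma * (G - 1)) * ((2 * Q + 1) * G**2 + (2 * Q + 2) * G + 1)
    return {
        "ici_canceller": ici_canceller,
        "em_detector": em_detector,
        "precalculation": precalculation,
        "eq29": eq29,
    }


def piecewise_linear_complexity(N, N_G, L):
    """Complex-multiplication counts of Method I and Method II (Table I)."""
    method_2 = N**3 + 2 * N**2 + N
    method_1 = L**3 + 2 * L**2 * N_G + 2 * L * N_G + 2 * N_G**2 + method_2
    return {"method_1": method_1, "method_2": method_2}


if __name__ == "__main__":
    print(receiver_complexity(G=4, Q=4, gamma=1, L=6))
    print(piecewise_linear_complexity(N=256, N_G=64, L=6))
```

With $G = 4$, $Q = 4$, $\gamma = 1$, $L = 6$, $N = 256$ and $N_G = 64$, the sketch reproduces the example column of Table I (1152, 447208, 13816 and 1480 for the proposed receivers, and 16922328 and 16908544 for Method I and Method II).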

In addition, Table I lists the complexity of the two piecewise linear methods in [20], whose BER performance is demonstrated later for comparison with the ML-EM receiver. The two methods also require the calculation of an averaged channel impulse response, whose complexity is not included in the complexity evaluation. The number of complex multiplications for the two methods and for the ML-EM receiver is mainly dominated by the terms $N^3 + 2N^2$ and $(2Q+3)^3G^3$, respectively. Since there are $R = N/G$ groups in the ML-EM receiver and the number of EM iterations is in general small, the computation complexity of the proposed ML-EM receiver is approximately a factor $\frac{(2Q+3)^3G^2}{N^2+2N}$ of the computation complexity of the two methods.

V. COMPUTER SIMULATION

A. Simulation Parameters

Our simulation demonstrates the performance of the two EM-based receivers. The simulation parameters are defined according to the 802.16e OFDM standard [3]. The entire bandwidth, 5 MHz, is divided into $N = 256$ subcarriers, among which 192 subcarriers carry data symbols, $J = 8$ subcarriers transmit pilot tones, and the remaining 56 subcarriers are virtual subcarriers. The BPSK modulation scheme is adopted for the pilot tones, and a pilot subcarrier transmits at the same power level as a data subcarrier. Each OFDM data frame is composed of $N_F = 40$ OFDM data symbols, and the length of the GI is set to $N_G = 64$. For the BICM scheme, we employ a rate-1/2 convolutional code with generator polynomial (133,171) represented in octal and a block interleaver with 96 rows and 80 columns. Both a conventional two-path channel and an International Telecommunication Union (ITU) Veh.A channel are used in our simulation with path delays uniformly distributed from 0 to 50 sample periods, where the relative path power profiles are set as 0, 0 (dB) for the two-path channel and 0, −1, −9, −10, −15, −20 (dB) for the ITU Veh.A channel [31]. The fading channel is generated with the Jakes model [32]. The user-defined parameters are chosen as $N_U = 10$ and $N_{EM} = 5$. Some statistical information, such as the power delay profile (PDP), the Doppler frequency, and the noise power, is assumed to be known to the receivers. Throughout the simulation, the parameter Eb/No is defined as the ratio of the average received bit energy to the power spectral density of the noise.

Fig. 6. BER performance of the ML-EM receiver in the ITU Veh.A channel at $f_D = 0.1$. ($N_{ML} = 3$ and $[G, Q] = [4, 4]$) [Plot: BER versus Eb/No (dB). Curves: Initialization; Iter. 3 (w.o. CE update); Iter. 3 (CE update); Iter. 3 (CE update, PDP est.); Iter. 3 (CSI known); Iter. 3 (CSI and data known); Quasi-static channel (CSI known).]

Fig. 7. BER performance of the ML-EM receiver with CE update for various $[G, Q]$. ($N_{ML} = 3$) [Plot: BER versus Eb/No (dB). Curves: two-path channel and Veh.A channel, each with $[G, Q] = [1, 19]$, $[2, 9]$ and $[4, 4]$.]
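The simulation parameters listed in this subsection can be collected and sanity-checked with a short Python sketch. The class and method names are ours. Two assumptions are made and flagged in the comments: the sampling rate is taken equal to the stated 5 MHz bandwidth, and the normalized MDF $f_D$ is read as the maximum Doppler frequency divided by the subcarrier spacing, which is how $f_D$ enters (A.2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Simulation parameters quoted in Section V-A (names are illustrative)."""
    bandwidth_hz: float = 5e6
    n_subcarriers: int = 256      # N
    n_data: int = 192
    n_pilot: int = 8              # J
    n_virtual: int = 56
    n_guard: int = 64             # N_G
    symbols_per_frame: int = 40   # N_F
    interleaver_rows: int = 96
    interleaver_cols: int = 80

    def subcarrier_spacing_hz(self) -> float:
        # Assumption: sampling rate equals the stated bandwidth.
        return self.bandwidth_hz / self.n_subcarriers

    def doppler_hz(self, f_d_normalized: float) -> float:
        # Assumption: f_D is normalized by the subcarrier spacing.
        return f_d_normalized * self.subcarrier_spacing_hz()

    def guard_fraction(self) -> float:
        return self.n_guard / self.n_subcarriers

    def interleaver_size(self) -> int:
        return self.interleaver_rows * self.interleaver_cols

    def data_symbols_per_frame(self) -> int:
        return self.n_data * self.symbols_per_frame


if __name__ == "__main__":
    p = SimParams()
    print("subcarrier budget ok:",
          p.n_data + p.n_pilot + p.n_virtual == p.n_subcarriers)
    print("subcarrier spacing (Hz):", p.subcarrier_spacing_hz())
    print("Doppler at f_D = 0.1 (Hz):", p.doppler_hz(0.1))
    print("interleaver size:", p.interleaver_size())
    print("data symbols per frame:", p.data_symbols_per_frame())
```

Under these assumptions, the interleaver size (96 × 80 = 7680) equals the number of data symbols in one frame (192 × 40), which is consistent with one coded bit per data subcarrier, i.e., $\gamma = 1$ as in the example of Table I.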

As a benchmark, the performance curve with ideal initialization, labeled as CSI and data known, serves as a performance lower bound, and the results obtained with ideal CSI, denoted as CSI known, are provided for reference. We also include the performance curve of the one-tap equalizer in quasi-static channels, under the assumption of ideal CSI. In addition, the two piecewise linear methods, referred to as Method I and Method II in [20], are examined for comparison with the proposed ML-EM receiver. For the two methods, the initial CSI is found directly through IDFT-based channel estimation using pilot tones, and the subsequent estimates are calculated based on the decision-feedback methodology. For fair comparison, we also simulate the BER performance of the two methods when the IDFT-based channel estimation is replaced with the EM algorithm in [23] and the MMSE-based channel estimation in [29], respectively.

Fig. 8. BER versus $f_D$ for the ML-EM receiver and the two piecewise linear methods in the ITU Veh.A channel. (Eb/No $= 30$ dB, $[G, Q] = [4, 4]$ and $N_{ML} = 3$) [Plot: BER versus $f_D$ from 0.05 to 0.2. Curves: ML-EM (CE update); Method I and Method II, each with IDFT-based CE, MMSE-based CE and EM-based CE.]

B. Simulation Results

Fig. 6 shows the BER performance of the ML-EM receiver in the ITU Veh.A channel at $f_D = 0.1$. The parameter $[G, Q]$ is set to $[4, 4]$. It is observed from Fig. 6 that after three iterations, the ML-EM receiver with or without CE update achieves much better performance than the same receiver at the initialization stage. For the ML-EM receiver with CE update, the required Eb/No at BER $= 10^{-3}$ is almost the same as that for a one-tap equalizer in quasi-static channels. However, its BER performance is about 3 dB worse than that based on ideal CSI knowledge, and for the case without CE update, an error floor occurs at BER $= 2 \times 10^{-3}$ in the high Eb/No region. When compared with the lower bound, there is still an Eb/No gap of 4.5 dB at BER $= 10^{-3}$ for the ML-EM receiver with CE update. Clearly, this gap is due to the error propagation effect. It is worth noting that the performance lower bound in Fig. 6 comes very close to the theoretical matched-filter bound found in [5]. Since time-variant channels introduce diversity gains, this performance lower bound is superior to the performance of the one-tap equalizer in quasi-static channels.

In realistic scenarios, the PDP of wireless channels is not known a priori at the receiver. In order to check the usefulness of our proposed schemes, we demonstrate the performance of the ML-EM receiver by assuming that the CSI at the preamble of each OFDM data frame is known and the power gain of the corresponding channel impulse response is used as the PDP. Without loss of generality, the estimated CSI could be very accurate because the number of pilot tones in the preamble is usually large enough. Our simulation result shows that there is no significant performance loss when the PDP is estimated in this way.

Fig. 7 addresses the impact of group size on the BER performance, and the value of $Q$ is selected to keep $(2Q+1)G \approx 39$ for fair comparison. As expected, joint detection with more subcarriers eases the error propagation effect and attains better performance, and the improvement eventually saturates as $G$ increases. To further improve the performance, channel decoding is needed to effectively handle the error propagation, such as the method adopted in the TURBO-EM receiver.

In Fig. 8, we examine the impact of $f_D$ on the BER performance of the ML-EM receiver at Eb/No $= 30$ dB in the ITU Veh.A channel, and its BER performance is compared with that of the two methods in [20]. Four iterations are carried out in these two methods, and additional iterations yield no performance improvement. Although the ML-EM receiver performs worse when the Doppler effect becomes more severe, it can still achieve a BER of $3 \times 10^{-3}$ at $f_D = 0.2$. Both Method I and Method II are inferior to the ML-EM receiver no matter what kind of channel estimation scheme is used. Obviously, due to the requirement of using some guard tones, the two methods with the IDFT-based channel estimation, originally proposed in [20], incur substantial performance loss. Even with the EM algorithm in [23] as the channel estimation scheme, the two methods perform worse than the ML-EM receiver since ICI estimation and data detection are not jointly designed in these two methods.

Fig. 9. BER performance of the TURBO-EM receiver in the ITU Veh.A channel at $f_D = 0.1$. ($N_{TB} = 4$ and $[G, Q] = [4, 4]$) [Plot: BER versus Eb/No (dB). Curves: Initialization; Iter. 1 (CE update); Iter. 4 (CE update); Iter. 4 (CE update and PDP est.); Iter. 4 (CSI and data known); Iter. 4 (CE update) with $[G, Q] = [2, 9]$.]

Fig. 10. BER performance of the TURBO-EM receiver in the ITU Veh.A channel for various $f_D$. ($N_{TB} = 4$) [Plot: BER versus Eb/No (dB). Curves: $f_D = 0.05$ with $[G, Q] = [4, 2]$; $f_D = 0.1$, 0.2, 0.3 and 0.4 with $[G, Q] = [4, 4]$.]

Fig. 11. MSE performance of channel estimation in the ITU Veh.A channel. ($f_D = 0.1$) [Plot: MSE (dB) versus Eb/No (dB). Curves: Iter. 1, Iter. 2, Iter. 3 and lower bound, for the ML-EM receiver and for the TURBO-EM receiver.]

Fig. 9 shows the BER performance of the TURBO-EM receiver in the ITU Veh.A channel at $f_D = 0.1$. With the BICM, the effect of error propagation is effectively suppressed. We see from Fig. 9 that after four iterations, the receiver with CE update comes within 1 dB of the lower bound at BER $= 10^{-5}$, and the receiver exhibits a remarkable improvement as compared with the initialization stage. However, when the CSI is not updated (which is not depicted here), the performance of the receiver deteriorates remarkably, although it is still better than that of the initialization stage. Hence, in order to achieve a good performance, it is necessary to refine the CSI, especially when the number of pilot tones is small and the normalized MDF is large. From Fig. 9, we also observe that the receivers with $G = 2$ and $G = 4$ have nearly identical BER performance. Besides, even when the PDP is unknown and estimated from the preamble, the TURBO-EM receiver still performs very well and there is only slight degradation in BER performance at Eb/No $= 13$ dB.

Fig. 10 depicts the BER performance of the TURBO-EM receiver in the ITU Veh.A channel for various $f_D$. We note that the TURBO-EM receiver shows a robust BER performance against fading up to $f_D = 0.2$, and the performance gap, compared with the case of $f_D = 0.05$, is only about 0.3 dB at BER $= 10^{-5}$. The receiver, however, displays an irreducible error floor when $f_D$ is beyond 0.3, and a more complicated ICI model, e.g., the model suggested in [10], could be used to get better performance.

Fig. 11 shows the normalized mean square error (MSE) performance of CE in the ITU Veh.A channel at $f_D = 0.1$ for the Eb/No operating ranges of the two receivers. We observe that for both receivers, the MSE decreases as the number of iterations increases, and the performance gain is due to the reduction of ICI in the CE update. In fact, the performance improvement in MSE is about 7 dB to 10 dB for the Eb/No operating ranges, and it eventually saturates after three iterations. For calibration purposes, the lower bound on MSE performance under the assumption of no decision error in quasi-static channels is also simulated and shown in this figure. We observe that the lower bound is attainable for the TURBO-EM receiver, while for the ML-EM receiver, there is a significant performance gap, especially at high Eb/No, as compared with the lower bound.

VI. CONCLUSIONS

We have investigated two EM-based iterative receivers for OFDM and BICM/OFDM systems in doubly selective channels. Based on the proposed EM algorithm for data detection, both receivers use groupwise processing with ICI cancellation to reduce computation complexity and to exploit the time diversity inherent in time-variant channels. For OFDM systems, the ML-EM receiver significantly outperforms the conventional one-tap equalizer and the two piecewise linear methods in [20], and the BER performance at $f_D = 0.05$ even approaches the BER performance without Doppler effect. Compared with the matched-filter bound, an Eb/No gap appears because of the error propagation effect. For BICM/OFDM systems, a TURBO-EM receiver, which iterates between the MAP EM detector and the SOVA decoder, is then introduced. This receiver effectively solves the error propagation problem, and it attains a performance close to the lower bound in terms of BER. Simulation results indicate that in order to attain a good performance, the CE update is required when we use low-density pilot tones at high Doppler frequencies. As a final remark, a group size of two to four is large enough to guarantee an acceptable performance under practical channel conditions.

APPENDIX A. CALCULATION OF (18) AND (19)

Let us first define an AR channel model

$$h[l, N-1] = \alpha h[l, 0] + u \tag{A.1}$$

where $\alpha$ is the parameter of the AR model and $u$ represents a complex Gaussian random variable, independent of $h[l, n]$, with zero mean and variance $\sigma_u^2$. From the Jakes model, $\alpha$ is evaluated by

$$\alpha = E\left[h[l, N-1]\,h^*[l, 0]\right] = J_0\left(\frac{2\pi f_D (N-1)}{N}\right) \tag{A.2}$$

where $J_0(\cdot)$ is the zeroth order Bessel function of the first kind and $f_D$ denotes the normalized MDF. Recall from (2) that we have $E[h[l, n]] = 0$ and $E[|h[l, n]|^2] = \Xi_l$. Following the energy conservation rule in (A.1), the variance of $u$ can be calculated as $\sigma_u^2 = (1 - \alpha^2)\Xi_l$. According to (A.1), (A.2) and (3), we find the slope of the $l$th tap over the duration of one OFDM symbol as

$$\begin{aligned} a[l, 1] &= \frac{1}{N-1}\left(h[l, N-1] - h[l, 0]\right) \\ &= \frac{1}{N-1}\left((\alpha - 1)h[l, 0] + u\right) \end{aligned} \tag{A.3}$$

and its mean and variance are calculated by

$$E[a[l, 1]] = 0 \tag{A.4}$$

$$\begin{aligned} E\left[|a[l, 1]|^2\right] &= \frac{(\alpha - 1)^2\Xi_l + (1 - \alpha^2)\Xi_l}{(N-1)^2} \\ &= \frac{2(1 - \alpha)\Xi_l}{(N-1)^2} \end{aligned} \tag{A.5}$$

Besides, it is reasonable to assume that the slopes of channel taps are independent of each other, i.e., $E[a[l, 1]\,a^*[l', 1]] = 0$ if $l \neq l'$, since the $h[l, n]$'s for different $l$'s are independent. Consequently, we get ($\mathbf{w} = \mathbf{F}\mathbf{s}$):

$$\boldsymbol{\mu}_w = \mathbf{0} \tag{A.6}$$

$$\mathbf{C}_{ww} = \mathbf{F}\mathbf{C}_{ss}\mathbf{F}^{\dagger} \tag{A.7}$$

where $\mathbf{C}_{ss} = E[\mathbf{s}\mathbf{s}^{\dagger}]$ is a diagonal matrix with the $l$th diagonal entry given by $\frac{2(1-\alpha)\Xi_l}{(N-1)^2}$.
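To make the quantities in (A.2) and (A.5) concrete, the following Python sketch evaluates the AR parameter $\alpha$, the innovation variance $\sigma_u^2$ and the diagonal entries of $\mathbf{C}_{ss}$. It is only an illustration: the function names are ours, the Bessel function is computed by a truncated power series because the Python standard library provides none, and the tap powers $\Xi_l$ passed in the usage example are arbitrary placeholders rather than values from the paper.

```python
import math


def bessel_j0(x, terms=40):
    """J0(x) from its power series sum_k (-1)^k (x^2/4)^k / (k!)^2.

    Truncated series; adequate for the moderate arguments used here.
    """
    total = 0.0
    term = 1.0  # k = 0 term
    for k in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def ar_parameter(f_d, n):
    """alpha in (A.2): f_d is the normalized MDF, n the FFT size N."""
    return bessel_j0(2.0 * math.pi * f_d * (n - 1) / n)


def innovation_variance(alpha, xi):
    """sigma_u^2 = (1 - alpha^2) * Xi_l."""
    return (1.0 - alpha**2) * xi


def slope_variance(alpha, xi, n):
    """Diagonal entry of C_ss from (A.5): 2 (1 - alpha) Xi_l / (N - 1)^2."""
    return 2.0 * (1.0 - alpha) * xi / (n - 1) ** 2


if __name__ == "__main__":
    n = 256
    tap_powers = [0.5, 0.5]  # placeholder Xi_l values, for illustration only
    for f_d in (0.05, 0.1, 0.2):
        alpha = ar_parameter(f_d, n)
        c_ss_diag = [slope_variance(alpha, xi, n) for xi in tap_powers]
        print(f_d, alpha, c_ss_diag)
```

As $f_D$ grows, $\alpha$ decreases from one and the slope variance increases, which reflects the stronger channel variation within one OFDM symbol.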

APPENDIX B. CALCULATION OF (28)

Using the definition of LLR, we have

$$L(x[kG+j] = +1) = \lambda^{C,ext}[kG+j] - \ln\left(e^0 + e^{\lambda^{C,ext}[kG+j]}\right) \tag{B.1}$$

$$L(x[kG+j] = -1) = -\ln\left(e^0 + e^{\lambda^{C,ext}[kG+j]}\right) \tag{B.2}$$

The calculation of (B.1) and (B.2) can be simplified by using the rule:

$$\ln\sum_j e^{a_j} \approx \max_j a_j \tag{B.3}$$
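The effect of rule (B.3) on (B.1) and (B.2) can be illustrated numerically. The Python sketch below compares the exact expressions with their max-log counterparts; the function names are ours, and the argument `lam` stands for the extrinsic LLR $\lambda^{C,ext}[kG+j]$.

```python
import math


def exact_log_prob(lam, q):
    """Exact L(x = q) from (B.1) for q = +1 and (B.2) for q = -1."""
    # ln(e^0 + e^lam), written in a numerically stable form.
    log_norm = max(0.0, lam) + math.log1p(math.exp(-abs(lam)))
    return (lam if q == +1 else 0.0) - log_norm


def max_log_prob(lam, q):
    """Same quantity with ln(e^0 + e^lam) replaced by max(0, lam), per (B.3)."""
    return (lam if q == +1 else 0.0) - max(0.0, lam)


if __name__ == "__main__":
    for lam in (-4.0, -1.0, 0.0, 1.0, 4.0):
        for q in (+1, -1):
            print(lam, q, exact_log_prob(lam, q), max_log_prob(lam, q))
```

The approximation never underestimates the exact value, and the gap between the two is at most $\ln 2$, which is reached at $\lambda^{C,ext}[kG+j] = 0$; this is the one point where the exact value $-\ln 2$ is worth keeping.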

Applying (B.3) to (B.1) and (B.2), straightforward manipulation yields

$$L(x[kG+j] = q) = \begin{cases} -\ln 2, & \text{if } \lambda^{C,ext}[kG+j] = 0 \\ \min\left(q\lambda^{C,ext}[kG+j], 0\right), & \text{otherwise} \end{cases} \tag{B.4}$$

where $q$ is either $+1$ or $-1$.

REFERENCES

[1] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications. Artech House, 2000.

[2] E. Akay and E. Ayanoglu, “Achieving full frequency and space diversity in wireless systems via BICM, OFDM, STBC and Viterbi decoding,” IEEE Trans. Commun., vol. 54, pp. 2164-2172, Dec. 2006.

[3] IEEE Std. 802.16e-2005 and IEEE 802.16-2004/Cor1-2005, “Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems,” IEEE-SA Standards Board, Tech. Rep., 2006.

[4] Y. Li and L. J. Cimini, Jr., “Bounds on the interchannel interference of OFDM in time-varying impairments,” IEEE Trans. Commun., vol. 49, pp. 401-404, Mar. 2001.

[5] X. Cai and G. Giannakis, “Bounding performance and suppressing inter-carrier interference in wireless mobile OFDM,” IEEE Trans. Commun., vol. 51, pp. 2047-2056, Dec. 2003.

[6] M.-X. Chang, “A novel algorithm of inter-subchannel interference self-cancellation for OFDM systems,” IEEE Trans. Wireless Commun., vol. 6, pp. 2881-2893, Aug. 2007.

[7] A. Seyedi and G. J. Saulnier, “General ICI self-cancellation scheme for OFDM systems,” IEEE Trans. Veh. Technol., vol. 54, pp. 198-210, Jan. 2005.

[8] H.-C. Wu, X. Huang, Y. Wu, and X. Wang, “Theoretical studies and efficient algorithm of semi-blind ICI equalization for OFDM,” IEEE Trans. Wireless Commun., vol. 7, pp. 3791-3798, Oct. 2008.

[9] P. Schniter, “Low-complexity equalization of OFDM in doubly selective channels,” IEEE Trans. Signal Process., vol. 52, pp. 1002-1011, Apr. 2004.

[10] T. Wang, J. G. Proakis, and J. R. Zeidler, “Techniques for suppression of intercarrier interference in OFDM systems,” in Proc. IEEE Wireless Commun. and Networking Conf., Mar. 2005, pp. 39-44.

[11] G. Li, H. Yang, L. Cai, and L. Gui, “A low-complexity equalization technique for OFDM system in time-variant multipath channels,” in Proc. IEEE Vehicular Technology Conf., Oct. 2003, pp. 2466-2470.


[12] J. Gao and H. Liu, “Low-complexity MAP channel estimation for mobile MIMO-OFDM systems,” IEEE Trans. Wireless Commun., vol. 7, pp. 774-780, Mar. 2008.

[13] Y. Choi, P. Voltz, and F. Cassara, “On channel estimation and detection for multicarrier signals in fast and selective Rayleigh fading channels,” IEEE Trans. Commun., vol. 49, pp. 1375-1387, Aug. 2001.

[14] W. G. Jeon, K. H. Chang, and Y. S. Cho, “An equalization technique for orthogonal frequency-division multiplexing systems in time-variant multipath channels,” IEEE Trans. Commun., vol. 47, pp. 27-32, Jan. 1999.

[15] K. Kim and H. Park, “A low complexity ICI cancellation method for high mobility OFDM systems,” in Proc. IEEE Vehicular Technology Conf., May 2006, pp. 2528-2532.

[16] S. Kim and G. Pottie, “Robust OFDM in fast fading channel,” in Proc. IEEE GLOBECOM, Dec. 2003, pp. 1074-1078.

[17] A. Gorokhov and J. P. Linnartz, “Robust OFDM receivers for dispersive time-varying channels: equalization and channel acquisition,” IEEE Trans. Commun., vol. 52, pp. 572-583, Apr. 2004.

[18] S. Tomasin, A. Gorokhov, H. Yang, and J. P. Linnartz, “Iterative interference cancellation and channel estimation for mobile OFDM,” IEEE Trans. Wireless Commun., vol. 4, pp. 238-245, Jan. 2005.

[19] K. Chang, Y. Han, J. Ha, and Y. Kim, “Cancellation of ICI by Doppler effect in OFDM systems,” in Proc. IEEE Vehicular Technology Conf., May 2006, pp. 1411-1415.

[20] Y. Mostofi and D. C. Cox, “ICI mitigation for pilot-aided OFDM mobile systems,” IEEE Trans. Wireless Commun., vol. 4, pp. 765-774, Mar. 2005.

[21] C. P. Robert and G. Casella, Monte Carlo Statistical Methods. Springer-Verlag, 1999.

[22] B. Lu, X. Wang, and Y. Li, “Iterative receivers for space-time block-coded OFDM systems in dispersive fading channels,” IEEE Trans. Wireless Commun., vol. 1, pp. 213-225, Apr. 2002.

[23] T. Y. Al-Naffouri, “An EM-based forward-backward Kalman filter for the estimation of time-variant channels in OFDM,” IEEE Trans. Signal Process., vol. 55, pp. 3924-3930, July 2007.

[24] K. Muraoka, K. Fukawa, H. Suzuki, and S. Suyama, “Channel estimation using differential model of fading fluctuation for EM algorithm applied to OFDM MAP detection,” in Proc. Personal, Indoor and Mobile Radio Commun., Sep. 2007, pp. 1-5.

[25] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, 1993.

[26] F. Peng and W. E. Ryan, “A low-complexity soft demapper for OFDM fading channels with ICI,” in Proc. IEEE Wireless Commun. and Networking Conf., Apr. 2006, pp. 1549-1554.

[27] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, pp. 389-399, Mar. 2003.

[28] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: turbo-codes,” IEEE Trans. Commun., vol. 44, pp. 1261-1271, Oct. 1996.

[29] O. Edfors, M. Sandell, J. J. van de Beek, S. K. Wilson, and P. O. Borjesson, “Analysis of DFT-based channel estimators for OFDM,” Wireless Personal Commun., vol. 12, pp. 55-70, Jan. 2000.

[30] J. A. Fessler and A. O. Hero, “Space-alternating generalized expectation-maximization algorithm,” IEEE Trans. Signal Process., vol. 42, pp. 2664-2677, Oct. 1994.

[31] J. Laiho, A. Wacker, and T. Novosad, Radio Network Planning and Optimisation for UMTS. Wiley, 2002.

[32] W. C. Jakes, Microwave Mobile Communications. Wiley, 1974.

Meng-Lin Ku was born in Taoyuan, Taiwan. He received the B.S., M.S. and Ph.D. degrees in Communication Engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2002, 2003 and 2009, respectively. Between 2009 and 2010, he held post-doctoral positions at National Chiao Tung University and Harvard University. In August 2010, he joined the Department of Communication Engineering, National Central University, as an Assistant Professor. His current research interests are in the area of next-generation mobile and wireless communications, cognitive radios, and optimization for radio access.

Wen-Chuan Chen was born in Penghu, Taiwan, R.O.C. He received the B.S. and M.S. degrees in the Department of Communication Engineering from Yuan Ze University in 2006 and National Chiao Tung University in 2008, respectively. She is currently working in the Institute for Information Industry, Taipei, Taiwan.

Chia-Chi Huang was born in Taiwan, R.O.C. He received the B.S. degree in electrical engineering from National Taiwan University in 1977 and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Berkeley, in 1980 and 1984, respectively.

From 1984 to 1988, he was an RF and Communication System Engineer with the Corporate Research and Development Center, General Electric Co., Schenectady, NY, where he worked on mobile radio communication system design. From 1989 to 1992, he was with the IBM T.J. Watson Research Center, Yorktown Heights, NY, as a Research Staff Member, working on indoor radio communication system design. Since 1992, he has been with National Chiao Tung University, Hsinchu, Taiwan, where he is currently a Professor in the Department of Electrical Engineering.