A Lattice-Reduction-Aided Max-Log List Demapper for Coded MIMO Receivers



Tung-Jung Hsieh and Wern-Ho Sheen, Member, IEEE

Abstract—The max-log list demapper has been widely employed

in implementations of coded multiple-input–multiple-output (MIMO) receivers, where only a candidate list of signal vectors is examined in the likelihood-ratio calculation to reduce complexity. Traditionally, the candidate list is generated in the original-lattice domain, which, unfortunately, results in severe degradation in the performance of the demapper if the channel is ill-conditioned. In this paper, two new lattice-reduction-aided max-log list demappers are proposed, i.e., one for an iterative receiver and the other for a noniterative receiver. With similar complexity, the proposed demappers provide significant gains over existing demappers, particularly for the cases with a small list size and/or under a spatially correlated channel, due to the new algorithms for the generation of the candidate list. In addition, for the iterative receiver, the prior information coming out of the decoder is exploited to lower the complexity of the demapper.

Index Terms—Coded multiple-input–multiple-output (MIMO)

receiver, lattice reduction (LR), max-log list demapper.

I. INTRODUCTION

MULTIPLE-INPUT–MULTIPLE-OUTPUT (MIMO) technology has been widely employed to improve the performance of wireless communications in a rich-scattering fading environment [1], [2]; by using multiple antennas at both the transmitter and the receiver, it is able to provide diversity gain, array gain (power gain), and/or degree-of-freedom gain over the single-input–single-output counterpart. MIMO technology along with channel coding (coded MIMO) has been widely adopted in today's wireless broadband standards, including IEEE 802.16e [3] and 3GPP-LTE [4].

In a coded-MIMO system, the optimal maximum a posteriori (MAP) receiver is often too complex to be implemented in practice [5]–[7]. Instead, a suboptimal receiver that consists of a separate demapper and decoder is regarded as a more practical design and has been widely implemented in real systems. In such a receiver, the a posteriori probability (APP) demapper that calculates the true log-likelihood ratio (LLR) of coded bits has the best performance, but with complexity exponentially growing with the number of MIMO layers and/or the size of signal constellation.

Manuscript received August 8, 2012; accepted July 21, 2013. Date of publication August 15, 2013; date of current version February 12, 2014. The review of this paper was coordinated by Prof. Y. L. Guan.

T.-J. Hsieh is with the Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu 300, Taiwan (e-mail: fishdon.cm94g@nctu.edu.tw).

W.-H. Sheen is with the Department of Communications Engineering, National Chung Cheng University, Chiayi 621, Taiwan (e-mail: whsheen@ccu.edu.tw).

Digital Object Identifier 10.1109/TVT.2013.2278029

One common way to reduce the complexity of the APP demapper is to use the max-log demapper, in which a simplified LLR, called max-LLR, is calculated by including only the signal vector that has the MAP probability in each of the two signal sets associated with code bits 0 and 1, respectively [8], [15]. Another popular way to reduce the complexity is to use the list demapper, where only a list of candidate signal vectors (rather than all) is examined in the LLR calculation [8]–[20]. The method of using a candidate list can be applied to both the APP and max-log demappers [8], [10]. In this paper, we are concerned with the design of the max-log list demapper.

In the literature, max-log list demapping can be done in the original-lattice [8]–[18], [35]–[37] or the lattice-reduced domain [19]–[26]. In the original-lattice domain, the demappers can be roughly classified into the depth-first [8]–[10], [35] and the breadth-first [11]–[18], [36], [37] types of methods; since the symbols of different MIMO layers are independent, the demapper can be implemented in a complexity-friendly way, particularly for the breadth-first type of methods [12]–[14], [36], [37]. Nevertheless, in the methods of the original-lattice domain, an increase in the list size is often necessary for a given performance if the channel is ill-conditioned [29], [30], which significantly increases the complexity of the demappers. To overcome this problem, lattice-reduction-aided (LR-aided) list demappers have been proposed [19]–[26], with [19]–[25] focusing on the noniterative receiver and [26] on the iterative receiver. By exchanging the extrinsic information back and forth between the demapper and the decoder, the iterative receiver is regarded as a practical way to nearly achieve the performance of the optimal MAP receiver [5]–[7].

In [19]–[21], the candidate list of the max-log list demapper is generated after a linear filtering of the received signal in the lattice-reduced domain, whereas in [22], the successive cancelation of multilayer interference was employed instead so as to improve performance. The complexity in [22] was further lowered in [23]–[25]. In [26], a novel LR-aided successive-interference-cancelation-based list demapper was proposed; by exploiting the regularity property of the original-lattice domain constellation, the utilization of prior information was realized in the original-lattice domain.

In this paper, two new LR-aided max-log list demappers are proposed, i.e., one for an iterative receiver and the other for a noniterative receiver. Due to the new methods of producing the candidate list, which is produced in the breadth-first manner after a successive cancelation of multilayer interference in the lattice-reduced domain, the proposed demappers are superior to the existing methods under similar complexity; the gain is particularly prominent for the cases with small list size and/or under a spatially correlated channel. For the iterative receiver, the prior information coming out of the decoder is also exploited to lower the complexity of the demapper.

Fig. 1. Considered spatially multiplexed MIMO system.

0018-9545 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This paper is organized as follows. Section II describes the system and channel models. Section III reviews the concept of LR and the list-based max-log demapper. In Section IV, the new LR-aided demapper is presented, and simulation results and conclusions are given in Section V and Section VI, respectively.

II. SYSTEM AND CHANNEL MODELS

Consider the bit-interleaved coded-MIMO system with $n_t$ transmit and $n_r \ge n_t$ receive antennas in Fig. 1. The source bit sequence $\mathbf{b}$ is fed into the channel encoder to generate the coded bit sequence $\mathbf{c}$. Then, after interleaver $\pi$, coded bit sequence $\mathbf{c}$ is grouped per $m \triangleq \log_2 M$ bits, and each group is mapped to a symbol in constellation $\mathcal{S}$ of size $M$. A total of $n_t$ symbols are collected to form signal vector $\mathbf{s} = [s_1, \ldots, s_{n_t}]^T \in \mathbb{S} \triangleq \mathcal{S}^{n_t}$, where $s_n$ is the $n$th layer symbol to be transmitted from antenna $n$, $\mathcal{S}^{n_t}$ is the $n_t$-fold Cartesian product of $\mathcal{S}$, and $[\cdot]^T$ denotes the transpose of a matrix or vector. The symbols $\{s_n\}$ are assumed to be independent and identically distributed random variables with zero mean and variance $\sigma_s^2$. That is, $E[\mathbf{s}\mathbf{s}^H] = \sigma_s^2 \cdot \mathbf{I}_{n_t}$, where $E[\cdot]$ denotes taking expectation, $\mathbf{I}_{n_t}$ is the $n_t \times n_t$ identity matrix, and $[\cdot]^H$ denotes the Hermitian transpose of a matrix or vector.

For a flat-fading channel, the received signal vector is given by

$$\mathbf{y} \triangleq [y_1, \ldots, y_{n_r}]^T = \mathbf{H}\mathbf{s} + \mathbf{z} \tag{1}$$

where $\mathbf{H}$ is an $n_r \times n_t$ channel matrix, and $\mathbf{z} = [z_1, \ldots, z_{n_r}]^T$ is a complex Gaussian noise vector with zero mean and covariance matrix $E[\mathbf{z}\mathbf{z}^H] = \sigma_z^2 \cdot \mathbf{I}_{n_r}$. The widely used correlated channel model $\mathbf{H} = \mathbf{J}_r^{1/2}\mathbf{F}\mathbf{J}_t^{1/2}$ will be adopted in this paper, where $\mathbf{F}$ consists of zero-mean uncorrelated complex Gaussian coefficients with unit variance, and $\mathbf{J}_t$ and $\mathbf{J}_r$ are the spatial correlation matrices due to transmit and receive antennas, respectively [27]–[29].

The demapper calculates the max-LLR $\varphi^{o}$ based on the sequence of received signal vectors $\mathbf{y}$ and prior information $\varphi^{a}$, if there is any. The extrinsic information $\varphi^{e}$ is obtained by subtracting prior information $\varphi^{a}$ from $\varphi^{o}$, and after deinterleaver $\pi^{-1}$, $\varphi^{e}$ becomes prior information $\psi^{a}$ to the decoder. For the iterative receiver, the outer decoder outputs the estimated source bit sequence $\hat{\mathbf{b}}$ if the maximum number of iterations is reached; otherwise, it outputs $\psi^{o}$, which, after subtracting prior information $\psi^{a}$, gives extrinsic information $\psi^{e}$, and that becomes prior information $\varphi^{a}$ to the demapper after interleaver $\pi$. A more detailed description of the iterative receiver can be found in [5]–[10].

III. PRELIMINARIES

Here, the concept of LR will be reviewed first, followed by the calculation of max-LLR in the lattice-reduced domain.

A. Lattice Reduction

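Let $\mathbf{h}_n \in \mathbb{C}^{n_r}$ denote the $n$th column of channel matrix $\mathbf{H}$, where $\mathbb{C}$ is the set of complex numbers. As in [31]–[34], the lattice spanned by $\mathbf{H}$ is defined as the set of points

$$\mathcal{L} \triangleq \left\{ \sum_{n=1}^{n_t} \mathbf{h}_n z_n,\ z_n \in \mathcal{Z} \right\} = \left\{ \mathbf{H}\mathbf{z},\ \mathbf{z} \in \mathcal{Z}^{n_t} \right\} \tag{2}$$

where $\mathcal{Z}$ is the set of Gaussian integers, and $\mathbf{H}$ is called a basis of lattice $\mathcal{L}$. Note that there are infinitely many bases of $\mathcal{L}$, and two bases $\tilde{\mathbf{H}}$ and $\mathbf{H}$ are said to span the same lattice $\mathcal{L}$ if and only if $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$ for a unimodular matrix $\mathbf{T}$ [31]–[34]. A Gaussian integer matrix $\mathbf{T}$ is said to be unimodular if $|\det(\mathbf{T})| = 1$, where $\det(\mathbf{T})$ denotes the determinant of $\mathbf{T}$.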
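As a small illustration of the unimodularity condition, the following Python sketch checks a $2 \times 2$ Gaussian integer matrix and confirms that its inverse is again a Gaussian integer matrix, which is what allows the change of basis to be undone without leaving the lattice. The helper names (`det2`, `is_unimodular`, `inverse2`) and the example matrix are illustrative assumptions, not taken from the paper.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as nested lists."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


def is_gaussian_integer_matrix(T, tol=1e-9):
    """True if every entry has integer real and imaginary parts."""
    for row in T:
        for x in row:
            z = complex(x)
            if abs(z.real - round(z.real)) > tol or abs(z.imag - round(z.imag)) > tol:
                return False
    return True


def is_unimodular(T):
    """Gaussian integer entries and |det(T)| = 1."""
    return is_gaussian_integer_matrix(T) and abs(abs(det2(T)) - 1.0) < 1e-9


def inverse2(T):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = det2(T)
    return [[T[1][1] / d, -T[0][1] / d],
            [-T[1][0] / d, T[0][0] / d]]


T = [[1, 2 + 1j], [0, 1j]]      # hypothetical example, det(T) = 1j
T_inv = inverse2(T)             # also a Gaussian integer matrix
```

For a matrix such as `[[2, 0], [0, 1]]` the determinant has magnitude 2, so the check fails and the corresponding basis would span only a sublattice.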

Lattice reduction is a procedure to find a reduced basis $\tilde{\mathbf{H}}$ from $\mathbf{H}$ with a set of more orthogonal (shorter) column vectors than those of $\mathbf{H}$. LR has been widely applied to improve the performance of MIMO systems [20]–[22], [28], [31], [32]; with a set of more orthogonal column vectors, the effect of noise enhancement can be reduced in detection/decoding in the lattice-reduced domain. The Lenstra–Lenstra–Lovász (LLL) algorithm (real-valued LLL [33] or complex-valued LLL [34]) is known as one of the effective LR algorithms to find a reduced basis $\tilde{\mathbf{H}}$ in polynomial time [33].
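To make the idea of a shorter, more orthogonal basis concrete, the sketch below reduces a basis with only two columns by alternating a swap (shorter column first) with a size reduction that uses Gaussian-integer rounding. This is a simplified two-column (Gauss-type) reduction chosen for illustration; it is not the LLL or CLLL algorithm of [33], [34], and the function names and the example basis are assumptions.

```python
def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def gauss_round(z):
    """Round a complex number to the nearest Gaussian integer."""
    return complex(round(z.real), round(z.imag))


def reduce_two_columns(h1, h2):
    """Return (reduced columns, T) such that [b1 b2] = [h1 h2] T."""
    b = [list(map(complex, h1)), list(map(complex, h2))]
    T = [[1 + 0j, 0j], [0j, 1 + 0j]]
    while True:
        # Keep the shorter column first (column swap, mirrored in T).
        if inner(b[0], b[0]).real > inner(b[1], b[1]).real:
            b[0], b[1] = b[1], b[0]
            for row in T:
                row[0], row[1] = row[1], row[0]
        # Size-reduce the second column against the first.
        mu = gauss_round(inner(b[0], b[1]) / inner(b[0], b[0]))
        if mu == 0:
            return b, T
        b[1] = [y - mu * x for x, y in zip(b[0], b[1])]
        for row in T:
            row[1] -= mu * row[0]


# Hypothetical ill-conditioned basis: the second column is almost
# a Gaussian-integer multiple of the first.
reduced, T = reduce_two_columns([1, 0], [3 + 2j, 1])
```

In this example the second column is replaced by $\mathbf{h}_2 - (3+2j)\mathbf{h}_1 = [0, 1]^T$, so the reduced basis is orthogonal, and the accumulated $\mathbf{T}$ has determinant 1.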

In terms of reduced basis $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$, the signal model in (1) can be rewritten as

$$\mathbf{y} = \tilde{\mathbf{H}}\tilde{\mathbf{s}} + \mathbf{z} \tag{3}$$

where $\tilde{\mathbf{s}} \triangleq [\tilde{s}_1, \ldots, \tilde{s}_{n_t}]^T = \mathbf{T}^{-1}\mathbf{s}$.¹ Note that in the lattice-reduced domain, the signal constellation is not as regular as in the original-lattice domain, and transformation $\mathbf{T}^{-1}$ introduces dependence between layers in $\tilde{\mathbf{s}}$. Constellation irregularity and dependence between layers in the lattice-reduced domain increase the complexity of the demapper, as will be detailed in Section IV.

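B. Calculation of Max-LLR in Lattice-Reduced Domain

Using reduced basis $\tilde{\mathbf{H}}$, the max-LLR for the $i$th bit of the $n$th layer symbol is calculated by

$$\varphi^{o}_{n,i} = \max_{\tilde{\mathbf{s}} \in \tilde{\mathbb{S}}^{1}_{n,i}} \left\{ -\frac{1}{\sigma_z^2} \left\| \mathbf{y} - \tilde{\mathbf{H}}\tilde{\mathbf{s}} \right\|_F^2 + \log \Pr(\tilde{\mathbf{s}}) \right\} - \max_{\tilde{\mathbf{s}} \in \tilde{\mathbb{S}}^{0}_{n,i}} \left\{ -\frac{1}{\sigma_z^2} \left\| \mathbf{y} - \tilde{\mathbf{H}}\tilde{\mathbf{s}} \right\|_F^2 + \log \Pr(\tilde{\mathbf{s}}) \right\} \tag{4}$$

where $\tilde{\mathbb{S}}^{b}_{n,i}$ is the set of $\tilde{\mathbf{s}}$ with the $i$th bit of the $n$th layer symbol $s_n$ equal to $b$, $b \in \{1, 0\}$, $\Pr(\tilde{\mathbf{s}})$ is the a priori probability of $\tilde{\mathbf{s}}$, and $\|\mathbf{x}\|_F$ denotes the Frobenius norm of $\mathbf{x}$. Using QR factorization on $\tilde{\mathbf{H}}$, we have

¹Throughout this paper, the signal constellation has been shifted and scaled such that the constellation points are contiguous integers in both original-lattice and lattice-reduced domains [22], [31].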

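$$\tilde{\mathbf{H}} = \tilde{\mathbf{Q}}\tilde{\mathbf{R}} \tag{5}$$

where $\tilde{\mathbf{Q}}$ is an $n_r \times n_t$ matrix with orthonormal columns, and $\tilde{\mathbf{R}}$ is an $n_t \times n_t$ upper triangular matrix. Premultiplying $\mathbf{y}$ by $\tilde{\mathbf{Q}}^H$, one has

$$\tilde{\mathbf{v}} \triangleq \tilde{\mathbf{Q}}^H \mathbf{y} = \tilde{\mathbf{R}}\tilde{\mathbf{s}} + \tilde{\mathbf{z}} \tag{6}$$

where $\tilde{\mathbf{z}} \triangleq \tilde{\mathbf{Q}}^H \mathbf{z}$, and $\varphi^{o}_{n,i}$ can be rewritten as

$$\varphi^{o}_{n,i} = \max_{\tilde{\mathbf{s}} \in \tilde{\mathbb{S}}^{1}_{n,i}} \left\{ -\frac{1}{\sigma_z^2} \left\| \tilde{\mathbf{v}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}} \right\|_F^2 + \log \Pr(\tilde{\mathbf{s}}) \right\} - \max_{\tilde{\mathbf{s}} \in \tilde{\mathbb{S}}^{0}_{n,i}} \left\{ -\frac{1}{\sigma_z^2} \left\| \tilde{\mathbf{v}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}} \right\|_F^2 + \log \Pr(\tilde{\mathbf{s}}) \right\}. \tag{7}$$

As is clear in (7), since all signal vectors in $\tilde{\mathbb{S}} \triangleq \tilde{\mathbb{S}}^{1}_{n,i} \cup \tilde{\mathbb{S}}^{0}_{n,i}$ have to be examined, the calculation of max-LLR may still be too complex in practical applications. To reduce the complexity, one practice is to examine only a candidate list $\Psi$ with $|\Psi| < |\tilde{\mathbb{S}}|$ in the calculation [22]–[25], where $|X|$ denotes the cardinality of set $X$. In this way, (7) can be simplified with $\tilde{\mathbb{S}}^{1}_{n,i}$ and $\tilde{\mathbb{S}}^{0}_{n,i}$ replaced by $\Psi^{1}_{n,i}$ and $\Psi^{0}_{n,i}$, where $\Psi^{b}_{n,i} \subset \Psi$ is the set of $\tilde{\mathbf{s}}$ in the list with the $i$th bit of the $n$th layer symbol $s_n$ equal to $b$. Since only a part of the signal vectors (the candidate list $\Psi$) is involved in the calculation of max-LLR in the list demapper, those $\tilde{\mathbf{s}}$'s with small $D(\tilde{\mathbf{s}}) \triangleq \|\tilde{\mathbf{v}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}}\|_F^2 - \sigma_z^2 \log \Pr(\tilde{\mathbf{s}})$ should be selected and put into $\Psi$ for better performance.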

It is worth noting that if all signal vectors, that is, $\mathbb{S}$ (or $\tilde{\mathbb{S}}$), are examined in the calculation of max-LLR, then there is no difference whether the calculation is done in the original-lattice or lattice-reduced domain. If only a subset of signal vectors (a candidate list) is to be examined, on the other hand, it would be beneficial to do the calculation in the lattice-reduced domain because of less noise enhancement.
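The QR step in (5) and (6) can be sketched in a few lines of plain Python. The example below uses modified Gram-Schmidt orthogonalization, which is one of several ways to obtain a QR factorization and is an assumption here (the paper does not prescribe a method); the function names are likewise illustrative. For a square full-rank matrix, the Euclidean distance computed with $\tilde{\mathbf{v}}$ and $\tilde{\mathbf{R}}$ matches the one computed with $\mathbf{y}$ and $\tilde{\mathbf{H}}$.

```python
def qr_gram_schmidt(H):
    """Thin QR of an n_r x n_t complex matrix (nested lists of rows).

    Returns Q as a list of n_t orthonormal columns and R as an
    n_t x n_t upper triangular matrix (nested lists of rows).
    """
    nr, nt = len(H), len(H[0])
    cols = [[complex(H[r][c]) for r in range(nr)] for c in range(nt)]
    Q = []
    R = [[0j] * nt for _ in range(nt)]
    for j in range(nt):
        w = cols[j][:]
        for i in range(j):
            # Projection coefficient onto the i-th orthonormal column.
            R[i][j] = sum(q.conjugate() * x for q, x in zip(Q[i], w))
            w = [x - R[i][j] * q for x, q in zip(w, Q[i])]
        norm = sum(abs(x) ** 2 for x in w) ** 0.5
        R[j][j] = complex(norm)
        Q.append([x / norm for x in w])
    return Q, R


def rotate(Q, y):
    """Compute v = Q^H y, with Q given as a list of columns."""
    return [sum(q.conjugate() * x for q, x in zip(col, y)) for col in Q]
```

Because $\tilde{\mathbf{R}}$ is upper triangular, the squared distance splits into per-layer terms that depend only on the already decided lower layers, which is what the tree searches in Section IV exploit.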

IV. PROPOSED LATTICE-REDUCTION-AIDED MAX-LOG LIST DEMAPPERS

Two new LR-aided max-log list demappers are proposed here, i.e., one for an iterative receiver and the other for a noniterative receiver. After obtaining a reduced basis, a candidate list $\Psi$ is generated in the lattice-reduced domain, followed by the calculation of max-LLR using the candidate list. The main difference between the demappers for the iterative receiver and the noniterative receiver is the generation of the candidate list $\Psi$.

A. Generation of Ψ With Prior Information (Iterative Receiver)

The generation of the candidate list $\Psi$ directly from $\tilde{\mathbb{S}}$ is usually too complex to be practical for the case of large $M^{n_t}$ because of constellation irregularity and dependence between layers in the lattice-reduced domain. Here, for the case with prior information (after the first demapping in the iterative receiver), a two-step algorithm is proposed to reduce the complexity. Initially, a set $\mathcal{T}$ that contains the $K_{\mathcal{T}}$ most probable signal vectors in $\tilde{\mathbb{S}}$ is constructed based on the prior information coming out of the decoder. Then, the candidate list $\Psi$ is generated from $\mathcal{T}$ in a breadth-first search (BFS) manner. Since $\tilde{\mathbf{s}} = \mathbf{T}^{-1}\mathbf{s}$, $\mathcal{T}$ can be generated by searching over $\mathbf{s} \in \mathbb{S}$.

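Let $c_{n,i}$ denote the $i$th bit of the $n$th layer symbol $s_n$ and $\Pr(c_{n,i})$ the a priori probability of $c_{n,i}$ coming out of the decoder.² Under the assumption of ideal interleaving, the probability of signal vector $\mathbf{s}$ can be expressed as

$$\Pr(\mathbf{s}) = \prod_{n=1}^{n_t} \Pr(s_n) = \prod_{n=1}^{n_t} \prod_{i=1}^{m} \Pr(c_{n,i}) \tag{8}$$

where

$$\Pr(c_{n,i}) = \frac{\exp\left(c_{n,i}\,\psi^{o}_{n,i}\right)}{1 + \exp\left(\psi^{o}_{n,i}\right)} \tag{9}$$

and $\psi^{o}_{n,i}$ is the max-LLR of $c_{n,i}$ coming out of the decoder. By disregarding the constant term $\prod_{n=1}^{n_t} \prod_{i=1}^{m} \left(1 + \exp(\psi^{o}_{n,i})\right)^{-1}$ in (8), the metric

$$D_{\mathcal{T}}(\mathbf{s}) = \sum_{n=1}^{n_t} \sum_{i=1}^{m} c_{n,i}\,\psi^{o}_{n,i} \tag{10}$$

is used for the generation of $\mathcal{T}$, as is detailed below.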
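The step from (8) and (9) to the metric in (10) can be written out explicitly. Taking the logarithm of (8) and substituting (9) gives

```latex
\log \Pr(\mathbf{s})
  = \sum_{n=1}^{n_t} \sum_{i=1}^{m}
      \left[ c_{n,i}\,\psi^{o}_{n,i} - \log\left(1 + \exp\left(\psi^{o}_{n,i}\right)\right) \right]
  = D_{\mathcal{T}}(\mathbf{s})
    - \sum_{n=1}^{n_t} \sum_{i=1}^{m} \log\left(1 + \exp\left(\psi^{o}_{n,i}\right)\right).
```

The second sum does not depend on the bit values $c_{n,i}$, so ranking signal vectors by $\Pr(\mathbf{s})$ is the same as ranking them by $D_{\mathcal{T}}(\mathbf{s})$, with a larger metric meaning a more probable vector.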

First, a binary tree of depth $n_t m$ is constructed with the left and right nodes at level $l = (n-1)m + i$ indicating $c_{n,i} = 0$ and $c_{n,i} = 1$, respectively, where $1 \le n \le n_t$, $1 \le i \le m$. The root is indexed as level 0, and the other levels are indexed starting from $n = 1$, $i = 1, \ldots, m$, $n = 2$, $i = 1, \ldots, m$, and so forth. Furthermore, a path from level 0 to level $l$ is denoted by the $l$-tuple $\mathbf{s}^{(l)} \triangleq [c_{1,1}, c_{1,2}, \ldots, c_{n,i}]$ and is associated with the path metric

$$D_{\mathcal{T}}\left(\mathbf{s}^{(l)}\right) = \sum_{j=1}^{n} \sum_{k=1}^{i} c_{j,k} \cdot \psi^{o}_{j,k} \tag{11}$$

where $\mathbf{s}^{(n_t m)}$ is an $n_t m$-tuple that corresponds to signal vector $\mathbf{s}$ (or $\tilde{\mathbf{s}}$). Second, starting from the root node, the tree is searched in a breadth-first manner, where the $K_{\mathcal{T}}$ most probable paths at each level are retained according to (11). If the number of nodes is less than $K_{\mathcal{T}}$, all paths are retained. Finally, at level $n_t m$, the $K_{\mathcal{T}}$ most probable tuples of length $n_t m$ bits are used to construct $\mathcal{T}$ after being mapped to the lattice-reduced domain.
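The breadth-first construction of $\mathcal{T}$ can be sketched as follows. The function takes the prior LLRs as one flat list ordered by tree level and accumulates the path metric over all bits up to the current level; the final mapping of each bit tuple to a symbol vector and then to the lattice-reduced domain via $\mathbf{T}^{-1}$ is left out. The function name and the flat-list interface are assumptions made for this illustration.

```python
def most_probable_bit_tuples(psi, K):
    """Breadth-first search for the K most probable bit tuples.

    psi: prior LLRs, one per coded bit, ordered by tree level.
    Returns up to K (bits, metric) pairs, largest metric first,
    where metric = sum of c * psi over the bits of the tuple.
    """
    paths = [((), 0.0)]                      # root: empty tuple, metric 0
    for llr in psi:
        extended = []
        for bits, metric in paths:
            extended.append((bits + (0,), metric))        # c = 0 adds nothing
            extended.append((bits + (1,), metric + llr))  # c = 1 adds psi
        extended.sort(key=lambda p: p[1], reverse=True)
        paths = extended[:K]                 # keep all if fewer than K
    return paths


# Hypothetical LLRs for three coded bits.
best = most_probable_bit_tuples([2.0, -1.0, 0.5], K=3)
```

Because the increment added at each level does not depend on the earlier bits, pruning to the best $K$ partial paths per level does not discard any prefix of a final top-$K$ tuple (ties aside).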

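To generate the candidate list $\Psi \subset \mathcal{T}$, we first define the following metric $D_\Psi(\tilde{\mathbf{s}}^{(k)})$ for the partial signal vector $\tilde{\mathbf{s}}^{(k)} \triangleq [\tilde{s}_{n_t-k+1}, \tilde{s}_{n_t-k+2}, \ldots, \tilde{s}_{n_t}]^T$, $k = 1, \ldots, n_t$ [see (7)]:

$$D_\Psi\left(\tilde{\mathbf{s}}^{(k)}\right) \triangleq \left\| \tilde{\mathbf{v}}^{(k)} - \tilde{\mathbf{R}}^{(k)}\tilde{\mathbf{s}}^{(k)} \right\|_F^2 - \sigma_z^2 \log \Pr\left(\tilde{\mathbf{s}}^{(k)}\right) = \sum_{n=n_t-k+1}^{n_t} \left| \tilde{v}_n - \tilde{r}_{n,n}\tilde{s}_n - \sum_{j=n+1}^{n_t} \tilde{r}_{n,j}\tilde{s}_j \right|^2 - \sigma_z^2 \log \Pr\left(\tilde{\mathbf{s}}^{(k)}\right) \tag{12}$$

where $\tilde{\mathbf{v}}^{(k)} = [\tilde{v}_{n_t-k+1}, \ldots, \tilde{v}_{n_t}]^T$, $\tilde{r}_{n,j} = [\tilde{\mathbf{R}}]_{n,j}$, and $\tilde{\mathbf{R}}^{(k)}$ is a $k \times k$ matrix consisting of $\tilde{r}_{n,j}$, $n = n_t-k+1, \ldots, n_t$, $j = n_t-k+1, \ldots, n_t$, that is, $\tilde{\mathbf{R}}^{(k)}$ is the lower right square submatrix of $\tilde{\mathbf{R}}$. Using the chain rule, i.e., $\Pr(\tilde{\mathbf{s}}^{(k)}) = \prod_{n=n_t-k+1}^{n_t} \Pr(\tilde{s}_n \mid \tilde{s}_{n_t}, \ldots, \tilde{s}_{n+1})$, (12) becomes

$$\begin{aligned}
D_\Psi\left(\tilde{\mathbf{s}}^{(k)}\right)
&= \sum_{n=n_t-k+1}^{n_t} \left( \left| \tilde{v}_n - \tilde{r}_{n,n}\tilde{s}_n - \sum_{j=n+1}^{n_t} \tilde{r}_{n,j}\tilde{s}_j \right|^2 - \sigma_z^2 \log \Pr(\tilde{s}_n \mid \tilde{s}_{n_t}, \ldots, \tilde{s}_{n+1}) \right) \\
&= \left| \tilde{v}_{n_t-k+1} - \tilde{r}_{n_t-k+1,n_t-k+1}\tilde{s}_{n_t-k+1} - \sum_{j=n_t-k+2}^{n_t} \tilde{r}_{n_t-k+1,j}\tilde{s}_j \right|^2 \\
&\quad - \sigma_z^2 \log \Pr(\tilde{s}_{n_t-k+1} \mid \tilde{s}_{n_t}, \ldots, \tilde{s}_{n_t-k+2}) + D_\Psi\left(\tilde{\mathbf{s}}^{(k-1)}\right)
\end{aligned} \tag{13}$$

where $\Pr(\tilde{s}_{n_t} \mid \tilde{s}_{n_t}, \ldots, \tilde{s}_{n_t+1}) \triangleq \Pr(\tilde{s}_{n_t})$, and $D_\Psi(\tilde{\mathbf{s}}^{(0)}) \triangleq 0$. In addition, for a specific $\boldsymbol{\alpha}^{(k)} \triangleq [\alpha_{n_t-k+1}, \alpha_{n_t-k+2}, \ldots, \alpha_{n_t}]^T$, the term $\Pr(\tilde{s}_{n_t-k+1} \mid \tilde{s}_{n_t}, \ldots, \tilde{s}_{n_t-k+2})$ is calculated as

$$\Pr(\tilde{s}_{n_t-k+1} = \alpha_{n_t-k+1} \mid \tilde{s}_{n_t} = \alpha_{n_t}, \ldots, \tilde{s}_{n_t-k+2} = \alpha_{n_t-k+2}) = \frac{\sum_{\tilde{\mathbf{s}} \in \mathcal{T},\ \tilde{\mathbf{s}}^{(k)} = \boldsymbol{\alpha}^{(k)}} \Pr(\tilde{\mathbf{s}})}{\sum_{\tilde{\mathbf{s}} \in \mathcal{T},\ \tilde{\mathbf{s}}^{(k-1)} = \boldsymbol{\alpha}^{(k-1)}} \Pr(\tilde{\mathbf{s}})} \tag{14}$$

where $k = 2, \ldots, n_t$.

²For notational simplicity, $\Pr(x)$ is used to denote the probability that random …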

In the following, a BFS algorithm is proposed for the generation of the candidate list $\Psi$. First, before carrying out the search, an $n_t$-level tree, i.e., $G_{\mathcal{T}}$, is constructed from all $\tilde{\mathbf{s}} \in \mathcal{T}$, where a node at level $k$ is associated with a partial signal vector $\boldsymbol{\alpha}^{(k)}$. Since the constellation in the lattice-reduced domain is irregular, the number of nodes per level is not the same, and that leads to different search complexity at different levels. Second, the algorithm given in Table I is applied on the tree $G_{\mathcal{T}}$ using the metric in (13). Starting from the root, the best $K_\Psi$ nodes are retained at each level, and if the number of nodes is less than $K_\Psi$, all the nodes are retained. At the end, the retained nodes at level $n_t$ are output as $\Psi$, which will be used for the LLR calculation. In the algorithm, $\mathcal{T}(\boldsymbol{\alpha}^{(k)})$ is given by

$$\mathcal{T}\left(\boldsymbol{\alpha}^{(k)}\right) \triangleq \left\{ \tilde{\mathbf{s}} \mid \tilde{\mathbf{s}} \in \mathcal{T},\ \tilde{\mathbf{s}}^{(k)} = \boldsymbol{\alpha}^{(k)} \right\} \tag{15}$$

which is employed for an easy calculation of $\Pr(\tilde{\mathbf{s}}^{(k)})$. As is shown in (14), the calculation of $\Pr(\tilde{\mathbf{s}}^{(k)})$ is more complicated than that in the original-lattice domain because of layer dependence in the lattice-reduced domain.
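A rough sketch of this search, written from the description above and from (13) to (15) rather than from Table I, is given below. It assumes that $\mathcal{T}$ is supplied as a list of (vector, probability) pairs with positive probabilities, that the marginal $\Pr(\tilde{s}_{n_t})$ at the first level is approximated by the probability mass inside $\mathcal{T}$ normalized by the total mass of $\mathcal{T}$, and that nodes are identified by their partial vectors; all names are hypothetical.

```python
import math


def list_with_prior(T_set, R, v, sigma2, K):
    """Breadth-first list generation over the vectors in T_set.

    T_set: list of (s_tilde, prob), s_tilde a tuple of n_t entries.
    R: n_t x n_t upper triangular matrix (nested lists of rows).
    v: rotated observation of length n_t.
    Returns up to K (s_tilde, metric) pairs, smallest metric first.
    """
    nt = len(v)
    survivors = {(): 0.0}                        # partial vector -> metric (13)
    parent_mass = {(): sum(p for _, p in T_set)}
    for k in range(1, nt + 1):
        n = nt - k                               # 0-based index of the new layer
        mass = {}                                # mass of T(alpha^(k)), cf. (15)
        for s, p in T_set:
            a = tuple(s[n:])
            mass[a] = mass.get(a, 0.0) + p
        scored = []
        for a, m in mass.items():
            parent = a[1:]
            if parent not in survivors:          # parent node was pruned
                continue
            residual = v[n] - sum(R[n][n + j] * a[j] for j in range(k))
            cond = m / parent_mass[parent]       # conditional probability, cf. (14)
            metric = survivors[parent] + abs(residual) ** 2 - sigma2 * math.log(cond)
            scored.append((a, metric))
        scored.sort(key=lambda x: x[1])
        survivors = dict(scored[:K])             # keep the best K nodes
        parent_mass = mass
    return sorted(survivors.items(), key=lambda x: x[1])
```

Grouping the vectors of $\mathcal{T}$ by their trailing entries at every level is what makes the number of nodes per level data dependent, which is consistent with the variable complexity reported for the case with prior information.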

TABLE I

GENERATION OF Ψ WITH PRIOR INFORMATION

B. Generation of Ψ Without Prior Information

For the case of no prior information (noniterative receiver or the first demapping in the iterative receiver), since $\Pr(\tilde{\mathbf{s}})$ is the same for all $\tilde{\mathbf{s}} \in \tilde{\mathbb{S}}$, the max-LLR $\varphi^{o}_{n,i}$ is calculated by

$$\varphi^{o}_{n,i} = \max_{\tilde{\mathbf{s}} \in \Psi^{1}_{n,i}} \left\{ -\frac{1}{\sigma_z^2} \left\| \tilde{\mathbf{v}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}} \right\|_F^2 \right\} - \max_{\tilde{\mathbf{s}} \in \Psi^{0}_{n,i}} \left\{ -\frac{1}{\sigma_z^2} \left\| \tilde{\mathbf{v}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}} \right\|_F^2 \right\}. \tag{16}$$

In addition, due to the lack of prior information, there is no way to use $\mathcal{T}$ to reduce complexity as in the previous section. The generation of $\Psi$ consists of two steps, i.e., candidate signal-vector search followed by legitimacy check. In the step of candidate signal-vector search, we adopt the idea of treating neighboring complex integers around an estimate of a layer symbol as potential legitimate candidates [19]–[23]. In particular, two popular BFSs are employed, i.e., one is based on a predetermined expansion set [19]–[22], and the other is based on on-demand expansion along with distributed sorting [23]. Since the search method in [23] was developed with a real-signal model, for notational consistency, only the method using a predetermined expansion set (denoted as Proposed_PDES) is discussed in detail here; the complexity and performance of the method using on-demand expansion along with distributed sorting (denoted as Proposed_ODEDS) are evaluated in Sections IV-D and V, respectively.

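To begin with, we define a metric for the partial vector $\tilde{\mathbf{t}}^{(k)} \triangleq [\tilde{t}_{n_t-k+1}, \ldots, \tilde{t}_{n_t}]^T$ at level $k$ as in (17), where $e(\tilde{\mathbf{t}}^{(k-1)}) \triangleq \left(\tilde{v}_{n_t-k+1} - \sum_{j=n_t-k+2}^{n_t} \tilde{r}_{n_t-k+1,j}\tilde{t}_j\right) / \tilde{r}_{n_t-k+1,n_t-k+1}$, and $D_\Psi(\tilde{\mathbf{t}}^{(0)}) \triangleq 0$.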

TABLE II

GENERATION OF Ψ WITH NO PRIOR INFORMATION

With the metric in (17), $\Psi$ can be generated in a breadth-first manner, as summarized in Table II. In particular, at level $k$, sets $\mathcal{V}(\tilde{\mathbf{t}}^{(k-1)}) \triangleq \{\tilde{\mathbf{t}}^{(k)} \mid \tilde{t}_{n_t-k+1} = \mathrm{round}(e(\tilde{\mathbf{t}}^{(k-1)})) + \beta,\ \beta \in \mathcal{B}\}$ and $\mathcal{V}^{(k)} \triangleq \bigcup_{\tilde{\mathbf{t}}^{(k-1)}} \mathcal{V}(\tilde{\mathbf{t}}^{(k-1)})$ are formed, where $\mathcal{B}$ is a set of complex integers closest to the origin with size $|\mathcal{B}| = J$ selected to extend the search space [22]. For example, $\mathcal{B} = \{0, \pm 1, \pm j\}$ for $J = 5$. The best $K_\Psi$ $\tilde{\mathbf{t}}^{(k)}$'s in $\mathcal{V}^{(k)}$ are then searched and retained based on the metric in (17). It is worth noting that in [22], the term $|\tilde{r}_{n_t-k+1,n_t-k+1}|^2$ in the calculation of (17) is omitted, leading to large performance degradation, as will be shown in Section V.
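The search with a predetermined expansion set can be sketched as below. This is an illustration written from the description in the text, not a transcription of Table II; the function names and the choice to return all $K_\Psi J$ vectors of the last level unpruned (so that a later legitimacy check can operate on them) are assumptions.

```python
def gauss_round(z):
    """Round a complex number to the nearest Gaussian integer."""
    return complex(round(z.real), round(z.imag))


B5 = [0, 1, -1, 1j, -1j]        # expansion set for J = 5


def kbest_expansion_search(R, v, K, B):
    """Breadth-first search in the lattice-reduced domain.

    R: n_t x n_t upper triangular matrix (nested lists of rows).
    v: rotated observation of length n_t.
    Returns (t_tilde, metric) pairs sorted by metric; the last
    level is returned without pruning.
    """
    nt = len(v)
    paths = [((), 0.0)]                      # partial vector and its metric
    for n in range(nt - 1, -1, -1):          # bottom layer first
        cand = []
        for t, d in paths:
            # Cancel the interference of the already decided layers.
            interf = sum(R[n][j] * t[j - n - 1] for j in range(n + 1, nt))
            e = (v[n] - interf) / R[n][n]
            centre = gauss_round(e)
            for beta in B:
                tn = centre + beta
                # Increment includes the |r_nn|^2 weight of metric (17).
                cand.append(((tn,) + t, d + abs(R[n][n]) ** 2 * abs(e - tn) ** 2))
        cand.sort(key=lambda c: c[1])
        paths = cand if n == 0 else cand[:K]
    return paths
```

Dropping the factor `abs(R[n][n]) ** 2` would make the increments of different layers incomparable, which is the omission the text attributes to [22].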

At level $n_t$, the legitimacy of the $\tilde{\mathbf{t}}^{(n_t)}$'s is checked before they are considered for retention in $\Psi$. This is a very important step for the LR-aided demapper with no prior information since $\tilde{t}_k$ in $\tilde{\mathbf{t}}^{(k)}$ may not be a legitimate layer symbol. In our method, all the searched $K_\Psi J$ $\tilde{\mathbf{t}}^{(n_t)}$'s at level $n_t$ are transformed back to the original-lattice domain for legitimacy check; this way, the size of $\Psi$ can be kept as large as possible, and that improves the performance of the demapper; the improvement is around 0.3 dB in our extensive simulations. The extra complexity incurred by checking $K_\Psi J$ $\tilde{\mathbf{t}}^{(n_t)}$'s is quite marginal, as compared with checking $K_\Psi$ $\tilde{\mathbf{t}}^{(n_t)}$'s as in [22], because only addition and comparison operations are involved in legitimacy checking [20], [26], [32]. In addition, checking legitimacy in the original-lattice domain can take advantage of the regular signal constellation to reduce complexity.³
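A minimal sketch of the legitimacy check in the original-lattice domain follows. It assumes a square $M$-QAM constellation that has been shifted and scaled to the integer grid $\{0, \ldots, \sqrt{M}-1\}$ in both the real and imaginary parts, and it assumes that at most $K_\Psi$ legitimate vectors with the smallest metrics are kept; both choices, as well as the function names, are illustrative and not stated in this form in the paper.

```python
def to_original_domain(T, t_tilde):
    """Compute s = T t_tilde for a unimodular matrix T (nested lists)."""
    return [sum(T[r][c] * t_tilde[c] for c in range(len(t_tilde)))
            for r in range(len(T))]


def is_legitimate(s, M):
    """True if every entry lies on the assumed sqrt(M) x sqrt(M) integer grid."""
    side = round(M ** 0.5)
    for z in map(complex, s):
        if z.real != round(z.real) or z.imag != round(z.imag):
            return False
        if not (0 <= z.real < side and 0 <= z.imag < side):
            return False
    return True


def legitimate_list(T, candidates, M, K):
    """Keep up to K legitimate candidates with the smallest metrics.

    candidates: list of (t_tilde, metric) pairs from the search.
    Returns (s, metric) pairs in the original-lattice domain.
    """
    kept = []
    for t_tilde, metric in sorted(candidates, key=lambda c: c[1]):
        s = to_original_domain(T, t_tilde)
        if is_legitimate(s, M):
            kept.append((tuple(s), metric))
    return kept[:K]
```

With integer entries in $\mathbf{T}$, the back-transformation needs only additions, and the range test needs only comparisons, in line with the remark on the low cost of the check.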

C. Calculation of Max-Log Likelihood Ratio

After the generation of $\Psi$, the list-based max-LLRs are calculated for the cases with and without prior information [using (7) for the former with $\tilde{\mathbb{S}}^{1}_{n,i}$ and $\tilde{\mathbb{S}}^{0}_{n,i}$ replaced by $\Psi^{1}_{n,i}$ and $\Psi^{0}_{n,i}$, respectively, and (16) for the latter]. Note that it may happen that $\Psi^{b}_{n,i} = \emptyset$, $b \in \{1, 0\}$, for some $n$ and $i$. In such a case, the LLR needs to be clipped to a fixed value [8], [15]. In particular, as proposed in [8], it is plausible to set the clipped LLR to $\mathrm{sign}(0.5 - b) \cdot 8$ if $\Psi^{b}_{n,i} = \emptyset$, where $\mathrm{sign}(x)$ denotes the operation of taking the sign of $x$.
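The list-based max-LLR with clipping can be sketched as follows. Each candidate is assumed to carry its bit labels and its metric $D = \|\tilde{\mathbf{v}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}}\|_F^2 - \sigma_z^2 \log \Pr(\tilde{\mathbf{s}})$ (with the probability term dropped when there is no prior information), so that the two maxima in (7) or (16) become two minima over $D$. The data layout and the function name are assumptions made for the illustration.

```python
def max_log_llr(candidates, n_bits, sigma2, clip=8.0):
    """Max-log LLR per bit from a candidate list.

    candidates: list of (bits, D), with bits a tuple of 0/1 labels
    of length n_bits and D the metric of that candidate.
    """
    llrs = []
    for pos in range(n_bits):
        best = {0: None, 1: None}            # smallest D seen for each bit value
        for bits, D in candidates:
            b = bits[pos]
            if best[b] is None or D < best[b]:
                best[b] = D
        if best[1] is None:
            llrs.append(-clip)               # sign(0.5 - 1) * clip
        elif best[0] is None:
            llrs.append(clip)                # sign(0.5 - 0) * clip
        else:
            llrs.append((best[0] - best[1]) / sigma2)
    return llrs


# Hypothetical two-bit example: no candidate has the first bit equal to 1.
llr = max_log_llr([((0, 1), 1.0), ((0, 0), 3.0)], n_bits=2, sigma2=0.5)
```

In the example, the first bit is clipped to $-8$ because $\Psi^{1}$ is empty for it, and the second bit gets $(3.0 - 1.0)/0.5 = 4$.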

D. Complexity Analysis

The number of real multiplications (NRMs) will be used to evaluate the complexity of the proposed demapper because the hardware complexity of a multiplication is much higher than that of other operations, such as addition/subtraction and taking the maximum of operands. Since the complexity of a demapper is different for the cases with and without prior information, the two cases are discussed separately.

For the case of no prior information, the demapping operations are divided into three parts: initialization, metric calculation in the list generation, and max-LLR calculation. Because the detailed calculation is quite tedious, only the final results are summarized here. Table III shows the complexity of the proposed demapper along with those of the other methods considered in this paper. Note that the complexity of doing LR and (sorted) QR factorizations is not included in Table III because they are performed only once at the beginning of a packet. It is worth noting that, in the initialization stage, the demapper proposed in [26], i.e., the fixed-candidates algorithm (FCA) demapper, has additional complexity since it has to do LR-aided zero forcing over the received signal. Furthermore, the method in [36] is the same as that in [12] in terms of NRMs. For easy comparison, the complexity for the case of $n_t = 4$, $n_r = 4$, $M = 16$ [16-quadrature amplitude modulation (QAM)] is given in Table IV. As can be seen, the complexity of all list demappers is much lower than that of the max-log demapper. For the proposed demapper, there is a slight difference in complexity between Proposed_PDES and Proposed_ODEDS. Moreover, with $K_\Psi = 3$ ($K_\Psi = 5$), the complexity of the proposed demapper is similar to that in [22] with $K_\Psi = 8$ ($K_\Psi = 13$) and in [26] with $K_\Psi = 5$ ($K_\Psi = 8$), respectively. As will be shown in Section V, however, the proposed demapper has a much superior performance, particularly with Proposed_PDES.

³In [25], legitimacy is checked in the lattice-reduced domain such that additional complexity is required to determine the constellation boundary.

$$\begin{aligned}
D_\Psi\left(\tilde{\mathbf{t}}^{(k)}\right)
&\triangleq \left\| \tilde{\mathbf{v}}^{(k)} - \tilde{\mathbf{R}}^{(k)}\tilde{\mathbf{t}}^{(k)} \right\|_F^2
 = \sum_{n=n_t-k+1}^{n_t} \left| \tilde{v}_n - \tilde{r}_{n,n}\tilde{t}_n - \sum_{j=n+1}^{n_t} \tilde{r}_{n,j}\tilde{t}_j \right|^2 \\
&= \left| \tilde{r}_{n_t-k+1,n_t-k+1} \right|^2 \cdot \left| \frac{1}{\tilde{r}_{n_t-k+1,n_t-k+1}} \left( \tilde{v}_{n_t-k+1} - \sum_{j=n_t-k+2}^{n_t} \tilde{r}_{n_t-k+1,j}\tilde{t}_j \right) - \tilde{t}_{n_t-k+1} \right|^2 + D_\Psi\left(\tilde{\mathbf{t}}^{(k-1)}\right) \\
&= \left| \tilde{r}_{n_t-k+1,n_t-k+1} \right|^2 \cdot \left| e\left(\tilde{\mathbf{t}}^{(k-1)}\right) - \tilde{t}_{n_t-k+1} \right|^2 + D_\Psi\left(\tilde{\mathbf{t}}^{(k-1)}\right)
\end{aligned} \tag{17}$$

TABLE III

COMPLEXITY OF DIFFERENT DEMAPPERS WITH NO PRIOR INFORMATION

TABLE IV

COMPLEXITY OF DIFFERENT DEMAPPERS WITH NO PRIOR INFORMATION ($n_t = n_r = 4$, $M = 16$)

For the case with prior information, the complexity of the proposed method is variable because the number of possible $\tilde{\mathbf{s}}^{(k)}$'s in each of the search levels is different for different $\mathcal{T}$'s. As a result, the cumulative distribution function (CDF) of the required NRMs is employed for evaluating the complexity. Fig. 2 shows the complexity of the proposed demapper for the case of $n_t = 4$, $n_r = 4$, and $M = 16$. As can be seen, the complexity increases with $K_{\mathcal{T}}$ because, for larger $K_{\mathcal{T}}$, there are more possible $\tilde{\mathbf{s}}^{(k)}$'s per search level, and all of them are visited. As will be shown in the following section, $K_{\mathcal{T}} = 256$ is generally sufficient to achieve desirable performance. Table V shows the complexity comparison of the proposed demapper with $K_{\mathcal{T}} = 256$ and those in [16], [26], and [36] under one iteration (including one list search and LLR calculation). If the receiver performs $n$ iterations, the complexity is $n$ times that listed in Table V. The proposed demapper has lower average NRMs, but its maximum NRMs are higher than those of the others.

Fig. 2. Complexity of the proposed demapper with prior information under different $K_{\mathcal{T}}$'s.

In the given complexity analysis, the complexity of finding $\mathbf{T}$ is not taken into consideration. Finding matrix $\mathbf{T}$ is often treated as part of preprocessing, and its complexity is shared by symbols within the coherence time of the channel [34]. In slowly time-varying channels, the complexity of finding $\mathbf{T}$ is usually negligible since it can be used for a long time [19], [22]. However, if the channel is rapidly time varying, the complexity of finding $\mathbf{T}$ becomes important. This problem was addressed in [34], where an efficient LR algorithm, which is called complex LLL (CLLL), was proposed to achieve a reduction of as much as 50% in complexity relative to the traditional algorithm without sacrificing any performance. Therefore, with CLLL, it is also feasible to use the LR-aided schemes even in rapidly time-varying channels [34]. CLLL is adopted in this paper.

TABLE V

COMPLEXITY OF DIFFERENT DEMAPPERS WITH PRIOR INFORMATION ($n_t = n_r = 4$, $M = 16$)

TABLE VI

SIMULATION PARAMETERS

V. SIMULATION RESULTS

Here, the proposed LR-aided list demappers are simulated and compared with the existing demappers in the literature. Table VI summarizes the system parameters adopted in the simulations. The extended channel model in [31] is employed for all simulations, that is, we use the model $\mathbf{y}_{\mathrm{ext}} \triangleq [\mathbf{y}^T\ \mathbf{0}_{n_t}^T]^T$ and $\mathbf{H}_{\mathrm{ext}} \triangleq [\mathbf{H}^T,\ \sigma_z \mathbf{I}_{n_t}]^T$, where $\mathbf{0}_{n_t}$ is the length-$n_t$ zero-entry column vector. The CLLL [34] is then carried out over $\mathbf{H}_{\mathrm{ext}}$ to obtain reduced basis $\tilde{\mathbf{H}}_{\mathrm{ext}} = \mathbf{H}_{\mathrm{ext}}\mathbf{T}$. Carrying out the LR on the extended channel $\mathbf{H}_{\mathrm{ext}}$ generally results in a better performance [22], [31], [35]. For other non-LR-aided schemes, the sorted QR factorization is applied on $\mathbf{H}_{\mathrm{ext}}$ [12], [17], [37]. For the spatially correlated channels, $\mathbf{J}_t$ and $\mathbf{J}_r$ are set as given in (18), which are the same as those in [29].

Fig. 3. BER comparison of the proposed demapper with that in [19].
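The construction of the extended model is a simple stacking operation, sketched below in plain Python for matrices stored as nested lists of rows. The function name is an assumption; the scaling of the appended identity block by $\sigma_z$ follows the definition of $\mathbf{H}_{\mathrm{ext}}$ given in the text.

```python
def extended_model(H, y, sigma_z):
    """Stack H on top of sigma_z * I and y on top of n_t zeros.

    H: n_r x n_t matrix (nested lists of rows); y: length-n_r vector.
    Returns (H_ext, y_ext) with n_r + n_t rows and entries.
    """
    nt = len(H[0])
    H_ext = [list(map(complex, row)) for row in H]
    for i in range(nt):
        # Row i of the scaled identity block.
        H_ext.append([complex(sigma_z) if j == i else 0j for j in range(nt)])
    y_ext = list(map(complex, y)) + [0j] * nt
    return H_ext, y_ext


# Hypothetical 2 x 2 example.
H_ext, y_ext = extended_model([[1, 1j], [0.5, -1]], [0.3, -0.2j], sigma_z=0.1)
```

The lattice reduction and the QR factorization are then applied to `H_ext` in place of the original channel matrix, with `y_ext` taking the place of the received vector.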

Fig. 3 compares the Proposed_PDES demapper with that in [19] in a noniterative receiver for both the spatially correlated and uncorrelated channels under different list sizes. Simple quaternary phase-shift keying (QPSK) is employed because the method in [19] would be too complex to be practical for higher order modulations. In addition, $J = 5$ is used for Proposed_PDES. As is shown, a gain of more than 2 dB is provided by Proposed_PDES for the cases of $K_\Psi = 3$ and $K_\Psi = 5$ and about 1 dB for the case of $K_\Psi = 10$ at BER $= 10^{-5}$, and by using $K_\Psi = 10$, the proposed demapper performs nearly as well as the max-log demapper.

Figs. 4–6 compare different demappers in the noniterative receiver with 16-QAM and a rate-1/2 convolutional code. In all methods, $J = 9$ is used except for [37], where the values of $J$ are set to 16, 9, 9, and 9 for search levels 1–4, respectively. The parameters ($K_\Psi$ and $J$) of the considered demappers are selected for similar complexity (see Table IV). In addition to Proposed_PDES, Proposed_ODEDS is also simulated, and results are given in Figs. 4 and 5 for comparison purposes. As shown in Fig. 4, for the spatially uncorrelated channels, Proposed_PDES significantly outperforms the existing demappers with $K_\Psi = 3$; a gain of more than 2 dB is observed at BER $= 10^{-5}$. For the case of $K_\Psi = 5$, Proposed_PDES performs similarly to those in [12] and [16] and has a gain of more than 2 dB over the methods in [22] and [26]. As compared with Proposed_PDES, Proposed_ODEDS suffers from a loss of 0.3–0.5 dB in this case. For the spatially correlated channels, as is shown in Fig. 5, Proposed_PDES has a gain of at least 2 dB over other demappers for both cases of $K_\Psi = 3$ and $K_\Psi = 5$. As compared with Proposed_PDES, the performance loss with Proposed_ODEDS becomes larger; losses of 1.8 and 1.5 dB are observed for $K_\Psi = 3$ and $K_\Psi = 5$, respectively. In Fig. 6, Proposed_PDES is compared with the methods in [36] and [37]. As is seen, for the spatially uncorrelated channels, Proposed_PDES outperforms [36] ([37]) by a margin of 5 (1.5) and 6 (3.5) dB for $K_\Psi = 5$ and $K_\Psi = 3$, respectively. For the spatially correlated channels, the gains obtained with the proposed method are more than 10 dB.

Fig. 7 shows the simulation results for the noniterative receiver with the turbo code given in Table VI. Eight iterations are performed in the turbo decoder. As can be seen, the proposed method with $K_\Psi = 3$ ($K_\Psi = 5$) has a gain of 0.6 (0.3) dB over [12] and [16] and about 0.8 (0.9) dB over [22] with $K_\Psi = 8$ ($K_\Psi = 13$). As compared with the results in Fig. 5, the gains obtained with the proposed method become smaller under a powerful channel code. This is something that one might expect because, in this case, the adverse effect of noise enhancement (the effect that the proposed LR-aided demapper intends to remove) can be counteracted by the powerful outer code.

$$\mathbf{J}_t = \mathbf{J}_r = \begin{bmatrix}
1 & 0.01 + 0.7i & -0.47 - 0.08i & 0.19 - 0.26i \\
0.01 - 0.7i & 1 & 0.01 + 0.7i & -0.47 - 0.08i \\
-0.47 + 0.08i & 0.01 - 0.7i & 1 & 0.01 + 0.7i \\
0.19 + 0.26i & -0.47 + 0.08i & 0.01 - 0.7i & 1
\end{bmatrix} \tag{18}$$

Fig. 4. BER comparison of different demappers for spatially uncorrelated channels.

Fig. 5. BER comparison of different demappers for spatially correlated channels.

Fig. 6. BER comparison of the proposed method with that in [36] and [37].

Fig. 7. BER comparison of different demappers for spatially correlated channels with turbo code.

For the iterative receiver, Fig. 8 shows the BER performance of the proposed demapper with different values of $K_{\mathcal{T}}$. Recall that $K_{\mathcal{T}}$ is the size of $\mathcal{T}$, from which the candidate list $\Psi$ is constructed so as to reduce the complexity of the demapper. As is seen, there is almost no performance loss with $K_{\mathcal{T}} \ge 256$. In addition, the iterative receiver indeed provides significant gains over the noniterative receiver; a gain of more than 3 dB is observed with only one iteration, and an additional 1-dB gain is obtained with two iterations at BER $= 10^{-5}$. The gain, however, diminishes with more than two iterations. In the following, $K_{\mathcal{T}} = 256$ is used for the proposed demapper, and two iterations are performed for all the considered methods.

Fig. 9 compares the proposed demapper in the iterative receiver with that in [16], the FCA demapper in [26], and the modified K-best demapper in [36] under the spatially correlated and uncorrelated channels. For the spatially uncorrelated channels, the proposed demapper and that in [16] have a similar performance and are about 0.8 and 1.9 dB better than FCA and [36] at BER $= 10^{-5}$ for $K_\Psi = 5$, respectively. For the case of $K_\Psi = 3$, the proposed demapper is about 0.2 dB better than that in [16] and about 1.4 dB better than FCA and [36]. For the spatially correlated channels, the gains provided by the proposed demapper become more prominent, i.e., for $K_\Psi = 5$, 0.4 dB over [16], 2.1 dB over FCA, and 6.5 dB over [36], and for $K_\Psi = 3$, 1.5 dB over [16], 5 dB over FCA, and more than 6 dB over [36]. Complexity-wise, as compared in Table V, the average NRMs of the proposed demapper are lower than those of [16], FCA, and [36], whereas the maximum NRMs are higher with the proposed demapper. In Fig. 9, for a fair comparison, the number of iterations is set to two for all methods. For some other methods, however, more than two iterations may be needed to have the best performance.

Fig. 8. Effect of $K_{\mathcal{T}}$ on the BER performance.

Fig. 9. BER comparison of the proposed demapper with that in [16], [26], and [36].

Fig. 10. BER performance under spatially correlated channels. In each subfigure, curves from right to left represent the performance of no iteration to the ninth iteration, respectively.

In Fig. 10, the number of iterations is set large enough to investigate the achievable performance of the considered methods in this paper for the spatially correlated channels with K_Ψ = 5. As can be seen, the performance improvement beyond seven iterations is diminishing for all methods. In particular, 12.4, 13.7, 14.2, and 11.5 dB are needed for [16], [26], [36], and the proposed method, respectively, to achieve BER = 10^{-5}. In other words, the proposed method is at least 0.9 dB better than the others. The performance improvement offered by the proposed method is expected to be larger for a smaller K_Ψ, as was already shown in Fig. 9.
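The margins implied by the operating points quoted for Fig. 10 can be checked with a few lines of Python. The sketch below uses only the four required-SNR values stated in the text; the dictionary keys and the helper name `margins_over` are illustrative choices, not notation from the paper.

```python
# SNR (dB) needed to reach BER = 1e-5 after the iterations have converged,
# as quoted in the text for spatially correlated channels with K_Psi = 5.
required_snr_db = {
    "[16]": 12.4,
    "[26] (FCA)": 13.7,
    "[36]": 14.2,
    "proposed": 11.5,
}

def margins_over(reference, table):
    """Extra SNR (dB) each method needs relative to the reference method."""
    ref = table[reference]
    return {name: round(snr - ref, 1)
            for name, snr in table.items() if name != reference}

margins = margins_over("proposed", required_snr_db)
smallest_margin_db = min(margins.values())

# The same margin expressed as a linear power ratio.
linear_ratio = 10 ** (smallest_margin_db / 10)

for name, gain in margins.items():
    print(f"proposed vs {name}: {gain} dB")
print(f"smallest margin: {smallest_margin_db} dB (power ratio {linear_ratio:.2f})")
```

The smallest margin is the one over [16], which is the source of the "at least 0.9 dB" statement; the margins over [26] and [36] are larger.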

VI. CONCLUSION

In this paper, a new LR-aided list demapper for coded MIMO receivers has been proposed, where the candidate list is generated after cancellation of multilayer interference in the lattice-reduced domain. The newly designed metric, the legitimacy check, and the use of prior information in the candidate-list generation give the proposed demapper superior performance over the existing methods for both the iterative and noniterative receivers. The performance improvement is particularly prominent for the cases with a small list size and/or under a spatially correlated channel. A two-step algorithm is also devised to reduce the complexity of the demapper for application in the iterative receiver, where the prior information from the decoder is exploited to improve performance.

REFERENCES

[1] J. Foschini, Jr., “Layered space–time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Tech. J., vol. 1, no. 2, pp. 41–59, Summer 1996.

[2] E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, Nov./Dec. 1999.

[3] IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Broadband Wireless Access Systems, IEEE Std. 802.16e-2009, 2009.

[4] 3GPP Technical Specification Group Radio Access Network, Evolved Universal Terrestrial Radio Access (E-UTRA), Multiplexing and channel coding (release 9), 3GPP TS 36.212, Sep. 2011.

[5] J. Hagenauer, “The turbo principle: Tutorial introduction and state of the art,” in Proc. Int. Symp. Turbo Codes Related Topics, Brest, France, Sep. 1997, pp. 1–11.

[6] M. Tüchler, R. Koetter, and C. Singer, “Turbo equalization: Principles and new results,” IEEE Trans. Commun., vol. 50, no. 5, pp. 754–767, May 2002.

[7] M. Sellathurai and S. Haykin, “Turbo-BLAST for wireless communications: Theory and experiments,” IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2538–2546, Oct. 2002.

[8] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.

[9] H. Vikalo, B. Hassibi, and T. Kailath, “Iterative decoding for MIMO channels via modified sphere decoding,” IEEE Trans. Wireless Commun., vol. 3, no. 6, pp. 2299–2311, Nov. 2004.

[10] S. Bäro, J. Hagenauer, and M. Witzke, “Iterative detection of MIMO transmission using a list sequential (LISS) detector,” in Proc. IEEE Int. Conf. Commun., Anchorage, AK, USA, May 2003, pp. 2653–2657.

[11] K.-W. Wong, C.-Y. Tsui, and S.-K. Cheng, “A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels,” in Proc. IEEE Int. Symp. Circuits Syst., May 2002, pp. III-273–III-276.

[12] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491–503, Mar. 2006.

[13] S. Mondal, M. Eltawil, and N. Salama, “Architectural optimizations for low-power K-best MIMO decoders,” IEEE Trans. Veh. Technol., vol. 58, no. 7, pp. 3145–3153, Sep. 2009.

[14] C.-H. Liao, T.-P. Wang, and D. Chiueh, “A 74.8 mW soft-output detector IC for 8 × 8 spatial-multiplexing MIMO communications,” IEEE J. Solid-State Circuits, vol. 45, no. 2, pp. 411–421, Feb. 2010.

[15] C. Mehlführer, D. Seethaler, G. Matz, and M. Rupp, “An iterative MIMO-HSDPA receiver based on a K-Best-MAP algorithm,” in Proc. IEEE Global Telecommun. Conf., Nov. 2006, pp. 1–5.

[16] L. Ruyet, T. Bertozzi, and B. Özbek, “Breadth first algorithms for APP detectors over MIMO channels,” in Proc. IEEE Int. Conf. Commun., Jun. 2004, pp. 926–930.

[17] L. Milliner, E. Zimmermann, R. Barry, and P. Fettweis, “A framework for fixed complexity breadth-first MIMO detection,” in Proc. IEEE Int. Symp. Spread Spectr. Tech. Appl., Bologna, Italy, Aug. 2008, pp. 129–132.

[18] B. Reid, J. Grant, and P. Kind, “Low complexity list detection for high-rate multiple-antenna channels,” in Proc. IEEE Int. Symp. Inf. Theory, Yokohama, Japan, Jun. 2003, p. 273.

[19] P. Silvola, K. Hooli, and M. Juntti, “Suboptimal soft-output MAP detector with lattice reduction,” IEEE Signal Process. Lett., vol. 13, no. 6, pp. 321–324, Jun. 2006.

[20] L. Milliner and R. Barry, “CTH09-4: A lattice-reduction-aided soft detector for multiple-input multiple-output channels,” in Proc. IEEE Global Telecommun. Conf., San Francisco, CA, USA, Nov. 2006, pp. 1–5.

[21] V. Ponnampalam, D. McNamara, A. Lillie, and M. Sandell, “On generating soft outputs for lattice-reduction-aided MIMO detection,” in Proc. IEEE Int. Conf. Commun., Glasgow, U.K., Jun. 2007, pp. 4144–4149.

[22] X.-F. Qi and K. Holt, “A lattice-reduction-aided soft demapper for high-rate coded MIMO-OFDM systems,” IEEE Signal Process. Lett., vol. 14, no. 5, pp. 305–308, May 2007.

[23] M. Shabany and G. Gulak, “The application of lattice-reduction to the K-best algorithm for near-optimal MIMO detection,” in Proc. IEEE Int. Symp. Circuits Syst., May 2008, pp. 316–319.

[24] M. Shabany, K. Su, and G. Gulak, “A pipelined scalable high-throughput implementation of a near-ML K-best complex lattice decoder,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Mar. 2008, pp. 3173–3176.

[25] S. Roger, A. Gonzalez, V. Almenar, and M. Vidal, “On decreasing the complexity of lattice-reduction-aided K-best MIMO detectors,” in Proc. Eur. Signal Process. Conf., Glasgow, U.K., Aug. 2009, pp. 2411–2415.

[26] W. Zhang and X. Ma, “Low-complexity soft-output decoding with lattice-reduction-aided detectors,” IEEE Trans. Commun., vol. 58, no. 9, pp. 2621–2629, Sep. 2010.

[27] P. Kermoal, L. Schumacher, I. Pedersen, E. Mogensen, and F. Frederiksen, “A stochastic MIMO radio channel model with experimental validation,” IEEE J. Sel. Areas Commun., vol. 20, no. 6, pp. 1211–1226, Aug. 2002.

[28] D. Wübben, V. Kühn, and K.-D. Kammeyer, “On the robustness of lattice-reduction aided detectors in correlated MIMO systems,” in Proc. IEEE Veh. Technol. Conf., Los Angeles, CA, USA, Sep. 2004, vol. 5, pp. 3639–3643.

[29] G. Barbero and S. Thompson, “Performance of the complex sphere decoder in spatially correlated MIMO channels,” IET Commun., vol. 1, no. 1, pp. 122–130, Feb. 2007.

[30] G. Barbero and S. Thompson, “Fixing the complexity of the sphere decoder for MIMO detection,” IEEE Trans. Wireless Commun., vol. 7, no. 6, pp. 2131–2142, Jun. 2008.

[31] D. Wübben, R. Böhnke, V. Kühn, and K. Kammeyer, “Near-maximum-likelihood detection of MIMO systems using MMSE-based lattice reduction,” in Proc. IEEE Int. Conf. Commun., Jun. 2004, pp. 798–802.

[32] X. Ma and W. Zhang, “Performance analysis for MIMO systems with lattice-reduction aided linear equalization,” IEEE Trans. Commun., vol. 56, no. 2, pp. 309–318, Feb. 2008.

[33] K. Lenstra, W. Lenstra, Jr., and L. Lovász, “Factoring polynomials with rational coefficients,” Math. Ann., vol. 261, no. 4, pp. 515–534, Dec. 1982.

[34] H. Gan and H. Mow, “Complex lattice reduction algorithm for low-complexity full-diversity MIMO detection,” IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2701–2710, Jul. 2009.

[35] C. Studer, A. Burg, and H. Bölcskei, “Soft-output sphere decoding: Algorithms and VLSI implementation,” IEEE J. Sel. Areas Commun., vol. 26, no. 2, pp. 290–300, Feb. 2008.

[36] J. Ketonen, M. Juntti, and R. Cavallaro, “Performance–complexity comparison of receivers for a LTE MIMO-OFDM system,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3360–3372, Jun. 2010.

[37] L. Milliner, E. Zimmermann, R. Barry, and G. Fettweis, “A fixed-complexity smart candidate adding algorithm for soft-output MIMO detection,” IEEE J. Sel. Topics Signal Process., vol. 3, no. 6, pp. 1016–1025, Dec. 2009.

Tung-Jung Hsieh was born in Kaohsiung, Taiwan, in 1978. He received the B.S. degree from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2000 and the M.S. degree from National Sun Yat-Sen University, Kaohsiung, Taiwan, in 2004, both in electrical engineering. He is currently working toward the Ph.D. degree in electrical engineering with NCTU.

His research interests include baseband signal processing of communication systems, lattice-reduction-aided techniques for multiple-input–multiple-output systems, and soft-input–soft-output demapping algorithms.

Wern-Ho Sheen (M’91) received the Ph.D. degree in electrical engineering from Georgia Institute of Technology, Atlanta, GA, USA, in 1991.

From 1991 to 1993, he was an Associate Researcher with Chunghwa Telecom Laboratories, Taoyuan, Taiwan. From 1993 to 2001, he was a Professor with the Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan. From 2001 to 2009, he was a Professor with the Department of Communications Engineering, National Chiao Tung University, Hsinchu, Taiwan. From 2009 to 2013, he was a Professor with the Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung, Taiwan. Since 2013, he has been a Professor with the Department of Communications Engineering, National Chung Cheng University. His research interests include communication theory, wireless communication systems, and signal processing for communications.