Detection of multiuser orthogonal space-time block coded signals via ordered successive interference cancellation


Jwo-Yuh Wu, Member, IEEE, Chung-Lien Ho, Member, IEEE, and Ta-Sung Lee, Senior Member, IEEE

Abstract— This paper investigates multiuser orthogonal space-time block coded signal detection within the ordered successive interference cancellation (OSIC) framework. Both the zero-forcing and minimum-mean-square-error ordering criteria are considered. When each user terminal is equipped with no more than four transmit antennas, it is shown that orthogonal transmit redundancy leads to an appealing signal ordering property: in each processing layer the transmitted symbols of an arbitrary user are associated with an identical ordering metric. This guarantees the feasibility of (user based) group-wise symbol recovery through the OSIC mechanism. Analytic bit-error-rate performance is given. Computer simulations and flop count evaluations are also provided for comparing the OSIC based solution with existing multiuser detection schemes reported for the considered system.

Index Terms— Multiuser detection, ordered successive interference cancellation (OSIC), array processing, space-time block codes.

I. INTRODUCTION

MULTIUSER orthogonal space-time block code (MU-OSTBC) systems [9, chap. 11], [12], [13] can provide multiple fading-resistant links, but co-channel user interference then becomes the major impairment dominating system performance. For signal recovery one may in general resort to joint maximum-likelihood detection, but this usually incurs a heavy computational load. For MU-OSTBC systems in particular, the algebraic structure of OSTBC has been exploited to develop various alternative signal detectors; typical proposals are Naguib’s parallel interference cancellation (PIC) approaches [13] and Stamoulis’s user-wise decoupled detection method [17], [12]. The successive interference cancellation (SIC) scheme, on the other hand, was first suggested in [20], and later in [2], [4], and [18], for the trellis coded transmission case. It has also been considered for separating multi-group OSTBC signals, either in a point-to-point environment [8], [22] or in a multiuser setting [10]. The SIC method combined with a signal ordering mechanism, i.e., the so-called ordered SIC (OSIC) scheme [5], [21], is known to yield a performance advantage over the unordered case at the expense of algorithm complexity. Although the OSIC method is a well-recognized solution for space-time signal detection [15, chap. 7], it has not been fully investigated in the MU-OSTBC scenario. In particular, the impact of the OSTBC structural property on the algorithm characteristics, e.g., the optimal ordering strategy and the associated possible low-complexity implementation, remains to be addressed.

This paper studies OSIC based signal detection for MU-OSTBC systems in which each user terminal is equipped with L transmit antennas. The underlying analysis builds on the linear matrix modulation representation [9] of codeword matrices and the assumption L ≤ 4. The signal ordering rule can be either the zero-forcing (ZF) or the minimum-mean-square-error (MMSE) criterion. By exploiting the algebraic structure of OSTBC it is shown that, in each processing layer, the transmitted symbol streams of an arbitrary user are associated with an identical ordering metric. The signal ordering in each stage is therefore arranged on a user-wise basis; this property is observed to no longer hold for L > 4. As a result, whenever multiuser interference is present, only a subclass of orthogonal codes allows for joint recovery of each user’s symbols via the OSIC mechanism. Analytic bit-error-rate (BER) results are provided and are verified through numerical simulations. The established user-wise ordering property offers a potential computational saving based on group-wise data processing; an associated low-complexity algorithm implementation which further exploits the OSTBC structure is derived in [7].

Recently it was reported that the uplink performance of MU-OSTBC systems can be further improved by incorporating the codeword rotation technique [14]. Our discussion below, however, does not take such a codeword mapping into account, in order to focus on the intrinsic properties introduced by OSTBC. It is noted that group-wise detectors are also proposed in [14] for multiuser signal separation. The MMSE based solution in [14, p-325] is basically a parallel interference suppression scheme that does not exploit the algebraic property of OSTBC; the associated refined version with signal ordering in [14, p-326] is, however, essentially the same as the one-step PIC method introduced in [13]. The rest of this letter is organized as follows. Section II describes the system model. Section III presents the main results, and the related performance analysis is given in Section IV. Section V contains the simulation results. Finally, Section VI concludes the letter.

Manuscript received July 28, 2004; revised March 27, 2005 and July 5, 2005; accepted July 13, 2005. The associate editor coordinating the review of this letter and approving it for publication was Dr. Helmut Bölcskei. This work is sponsored by the National Science Council under joint grant NSC 94-2213-E-002-033 and NSC 95-2752-E-002-009, the MoE ATU Program of the Ministry of Education, and the MediaTeK research center at the National Chiao Tung University, Taiwan.

The authors are with the Department of Communication Engineering, National Chiao Tung University, 1001, Ta Hsueh Road, Hsinchu, Taiwan (e-mail: {jywu, u8713812}@cc.nctu.edu.tw, tslee@mail.nctu.edu.tw).

Digital Object Identifier 10.1109/TWC.2006.04514.

II. SYSTEM MODEL

A. System Description and Basic Assumptions

1536-1276/06$20.00 © 2006 IEEE

Fig. 1. The schematic diagram of the transceiver. [Block diagram: Q user transmitters, each with an STBC encoder driven by s(k); a receiver with a user-wise OSIC detector taking y(t) and producing ŝ(k).]

Consider an MU-OSTBC system over flat fading channels as shown in Figure 1. The qth user’s symbol stream $s_q(\cdot)$, $1 \le q \le Q$, is first parsed into consecutive P-dimensional blocks, and the OSTBC encoder then associates each block of symbols $s_{q,p} := s_q(p-1)$, $1 \le p \le P$, with an $L \times K$ space-time codeword matrix¹

$$X_q := \sum_{p=1}^{2P} A_p \tilde{s}_{q,p}, \tag{1}$$

where $\tilde{s}_{q,p} = \mathrm{Re}\{s_{q,p}\}$ for $1 \le p \le P$, $\tilde{s}_{q,p} = \mathrm{Im}\{s_{q,p-P}\}$ for $P+1 \le p \le 2P$, and the matrix $A_p \in \mathbb{C}^{L \times K}$ satisfies² [9]

$$A_i A_i^H = I_L \quad \text{and} \quad A_i A_j^H + A_j A_i^H = 0_L \ \text{ for } i \ne j. \tag{2}$$
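As an illustration of (1) and (2), the following minimal sketch checks both conditions for the Alamouti code (L = K = P = 2). The codeword layout, the ordering of the real coordinates, and all helper names (`matmul`, `herm`, `codeword`, ...) are assumptions made for this example only and are not taken from the paper.

```python
# Alamouti code (L = 2, K = 2, P = 2) in linear matrix modulation form.
# Assumed codeword layout (rows = antennas, columns = symbol periods):
#     X = [[s1, -conj(s2)],
#          [s2,  conj(s1)]]
# with real coordinates ordered as [Re s1, Re s2, Im s1, Im s2].

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def herm(A):
    # Conjugate transpose.
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
A_mats = [
    [[1, 0], [0, 1]],      # multiplies Re{s1}
    [[0, -1], [1, 0]],     # multiplies Re{s2}
    [[1j, 0], [0, -1j]],   # multiplies Im{s1}
    [[0, 1j], [1j, 0]],    # multiplies Im{s2}
]

def codeword(s1, s2):
    """Build X as the sum in (1) from the split real coordinates."""
    coords = [s1.real, s2.real, s1.imag, s2.imag]
    X = Z2
    for A, c in zip(A_mats, coords):
        X = add(X, [[a * c for a in row] for row in A])
    return X

# First condition in (2): A_i A_i^H = I_L.
unit_ok = all(close(matmul(A, herm(A)), I2) for A in A_mats)

# Second condition in (2): A_i A_j^H + A_j A_i^H = 0 for i != j.
anti_ok = all(
    close(add(matmul(A_mats[i], herm(A_mats[j])),
              matmul(A_mats[j], herm(A_mats[i]))), Z2)
    for i in range(4) for j in range(4) if i != j
)

# Consequence: X X^H is a scaled identity (the codeword is orthogonal).
s1, s2 = 1 + 2j, -3 + 0.5j
X = codeword(s1, s2)
gram = matmul(X, herm(X))
energy = abs(s1) ** 2 + abs(s2) ** 2
```

Here `unit_ok` and `anti_ok` report whether the two conditions in (2) hold for the assumed matrices, and `gram` can be compared against `energy` times the identity to see the orthogonality of the codeword.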

The structural properties of the $A_p$’s specified in (2) immediately imply that $X_q$ is orthogonal. Expressing the codeword matrix in the linear matrix modulation form (1) has the advantage of unifying both the codeword representation and the problem formulation, regardless of the rate of the OSTBC and hence the symbol constellations used. We assume that N antenna elements are located at the receiver. Let $y_n(\cdot)$ be the received discrete-time data, sampled at the symbol rate, from the nth receive antenna and define $y(\cdot) := [y_1(\cdot) \cdots y_N(\cdot)]^T \in \mathbb{C}^N$. By collecting $y(\cdot)$ over K successive symbol periods to form $Y := [y(0) \cdots y(K-1)]$, we have the following space-time data model (assuming that the Q users are symbol synchronized)

$$Y = \sum_{q=1}^{Q} H_q X_q + V, \tag{3}$$

where $H_q = [h^{(q)}_{mn}] \in \mathbb{C}^{N \times L}$ is the channel matrix from the qth user’s antenna array to the receiver and $V \in \mathbb{C}^{N \times K}$ is the channel noise matrix. The following assumptions are made in the sequel.

(a) The symbol streams $s_q(k)$, $1 \le q \le Q$, are i.i.d. with zero mean and variance $\sigma_s^2$.

(b) The noise V is spatially and temporally white, each entry having zero mean and variance $\sigma_v^2$.

(c) We assume that L ≤ 4 and hence, according to [19], the symbol block length P ∈ {2, 4}. The proposed scheme is exclusively applicable to this subclass of orthogonal codes.

(d) For 3 ≤ L ≤ 4 with complex-valued constellations, the half-rate codes [19, p-1464] are used.

¹ Since block-by-block transmission is considered, we will not include the block index and consider only the first P symbols for notational simplicity.

² The notations $(\cdot)^T$, $(\cdot)^H$, $I_m$, and $0_m$ respectively denote the transpose, the complex conjugate transpose, the m × m identity matrix, and the m × m zero matrix.

B. Equivalent System Model

In the matrix data model (3), the source symbol blocks are spatially and temporally encoded to form the codeword matrices. To cast the problem into a standard multiuser detection framework, in what follows we present an equivalent linear system model in which all users’ symbol blocks are “restored” as the signal of interest. Specifically, let us split each user’s data block $s_q := [s_{q,1}, \ldots, s_{q,P}]^T$ and each received data vector $y(i)$ into the respective real and imaginary parts to obtain $\tilde{s}_q := [\mathrm{Re}\{s_q^T\}\ \ \mathrm{Im}\{s_q^T\}]^T \in \mathbb{R}^{2P}$, for $1 \le q \le Q$, and $\tilde{y}(i) := [\mathrm{Re}\{y^T(i)\}\ \ \mathrm{Im}\{y^T(i)\}]^T \in \mathbb{R}^{2N}$, for $0 \le i \le K-1$. Associated with the qth user’s channel matrix $H_q$, we define the following matrix

$$\tilde{H}_q := I_K \otimes \begin{bmatrix} \mathrm{Re}\{H_q\} & -\mathrm{Im}\{H_q\} \\ \mathrm{Im}\{H_q\} & \mathrm{Re}\{H_q\} \end{bmatrix} \in \mathbb{R}^{2KN \times 2KL}, \tag{4}$$

where the notation ⊗ stands for the Kronecker product; also, with $a_{l,q}$ denoting the lth column of the matrix $A_q$ and $\tilde{a}_{l,q} := [\mathrm{Re}\{a_{l,q}\}^T\ \ \mathrm{Im}\{a_{l,q}\}^T]^T \in \mathbb{R}^{2L}$, we define

$$\tilde{A} := \begin{bmatrix} \tilde{a}_{1,1} & \cdots & \tilde{a}_{1,2P} \\ \vdots & \ddots & \vdots \\ \tilde{a}_{K,1} & \cdots & \tilde{a}_{K,2P} \end{bmatrix} \in \mathbb{R}^{2KL \times 2P}. \tag{5}$$

Then the matrix data model (3) can be rewritten, after some manipulations, as the following equivalent linear model for signal detection

$$y_c = H_c s_c + v_c, \tag{6}$$

where $y_c := [\tilde{y}^T(0) \cdots \tilde{y}^T(K-1)]^T \in \mathbb{R}^{2KN}$ and $s_c := [\tilde{s}_1^T \cdots \tilde{s}_Q^T]^T \in \mathbb{R}^{2PQ}$ are respectively the split real-valued received data and multiuser symbol block,

$$H_c := [\tilde{H}_1 \tilde{A} \ \cdots \ \tilde{H}_Q \tilde{A}] \in \mathbb{R}^{2KN \times 2PQ} \tag{7}$$

is the effective channel matrix, and $v_c \in \mathbb{R}^{2KN}$ is the corresponding noise component. It is noted that, upon restoration of the transmitted symbols into a vector ($s_c$ in (6)), the structural information of the space-time codeword matrices is incorporated into the equivalent channel matrix $H_c$ (see (7)). It is this built-in structure in $H_c$ that leads to the user-wise signal ordering property, as will be shown in the next section. To highlight the core ideas, we will hereafter focus on the real-valued symbol case with unit-rate codes; for the complex-valued constellation case, essentially the same results can be obtained and these are included in the appendix.

III. MAIN RESULTS

To establish the user-wise ordering property, we shall first consider the ZF ordering criterion, in which the detection order is determined based on the post-detection SNR [21]; the MMSE counterpart can be readily deduced from the ZF results and is provided in the Appendix. At the first stage, the ZF scheme forms the decision statistic of the lth symbol stream as (cf. (6))

$$g_l^T y_c, \tag{8}$$

where the weighting vector $g_l$ nulls the interference from other substreams so that

$$g_l^T H_c = e_l^T, \tag{9}$$

where $e_l$ is the lth unit standard vector in $\mathbb{R}^{PQ}$. The unique solution $g_l^T$, which minimizes noise amplification due to interference nulling and fulfills the constraint (9), is thus the lth row of the pseudo-inverse of $H_c$, namely,

$$g_l^T = e_l^T (H_c^T H_c)^{-1} H_c^T, \quad 1 \le l \le PQ. \tag{10}$$

Since the noise $v_c$ is white and the source symbols are i.i.d. with variance $\sigma_s^2$, the post-detection SNR in the lth decision component [21, p-297] can be computed from (8)-(10) as

$$\rho_l^{(0)} := \frac{\sigma_s^2}{\sigma_v^2 \left\| e_l^T (H_c^T H_c)^{-1} H_c^T \right\|_2^2}. \tag{11}$$

It is straightforward to verify that

$$\left\| e_l^T (H_c^T H_c)^{-1} H_c^T \right\|_2^2 = e_l^T F^{-1} e_l, \tag{12}$$

with

$$F := H_c^T H_c \in \mathbb{R}^{PQ \times PQ}. \tag{13}$$

Equations (11) and (12) imply that, given $\sigma_s^2$ and $\sigma_v^2$, the SNR level depends entirely on the lth diagonal entry of the matrix $F^{-1}$. In particular, a small $[F^{-1}]_{ll}$ implies a large $\rho_l^{(0)}$, and hence better detection accuracy in the lth decision component. The optimal detection order at the first stage, therefore, is given by the index $1 \le l \le PQ$ at which $[F^{-1}]_{ll}$ is minimal. With the adoption of OSTBC, the matrix F defined in (13) exhibits a distinctive structure, as shown in the next lemma (see [6] for a proof). Based on this result, we can further specify the diagonal entries of $F^{-1}$ for determining the optimal index.

For a fixed symbol block length P , denote by O(P ) the set of all P × P real orthogonal designs with constant diagonal entries as specified in [19, p-1458].

Lemma 3.1: Let $F_{p,q} \in \mathbb{R}^{P \times P}$, $1 \le p, q \le Q$, be the (p, q)th P × P block submatrix of the matrix F defined in (13). Then we have $F_{q,q} = \alpha_q I_P$, and $F_{p,q} \in O(P)$ for $p \ne q$.

It is noted that the matrix F defined in (13) can be interpreted as the space-time matched-filtered channel matrix in the multiuser case. In view of this point, the assertion $F_{q,q} = \alpha_q I_P$ in Lemma 3.1 reflects that, through space-time matched filtering, the intra-antenna symbol streams of each user remain decoupled. Also, since the off-diagonal blocks $F_{p,q}$ account for the effective multiuser interference, the assertion $F_{p,q} \in O(P)$ reveals that the interference signatures are orthogonal designs. This attractive fact, which is true only for the considered code subclass with L ≤ 4, lays the foundation for proving the user-wise ordering property. Based on Lemma 3.1, we state in the following theorem the central result: the matrix $F^{-1}$ “inherits” the key features of F.
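The block structure asserted in Lemma 3.1 can be observed numerically. The sketch below assumes the real-valued, unit-rate case with the 2 × 2 real orthogonal design (L = K = P = 2), two users, N = 4 receive antennas, and, as a simplification, real channel gains; the channel values and helper names are arbitrary illustrations, not data from the paper.

```python
# Assumed codeword for each user (real symbols):
#     X_q = [[s1, -s2],
#            [s2,  s1]]
# Stacking the two received columns of Y = H X gives
#     [[h1, h2], [h2, -h1]] @ [s1, s2],
# where h1, h2 are the columns of the N x 2 channel matrix H.

def user_block(H):
    """Equivalent channel block (2N x 2) of one user; H is a list of N rows."""
    h1 = [row[0] for row in H]
    h2 = [row[1] for row in H]
    top = [[a, b] for a, b in zip(h1, h2)]
    bottom = [[b, -a] for a, b in zip(h1, h2)]
    return top + bottom

def hstack(blocks):
    """Concatenate per-user blocks column-wise into H_c."""
    return [sum((blk[r] for blk in blocks), []) for r in range(len(blocks[0]))]

def gram(Hc):
    """F = H_c^T H_c."""
    n = len(Hc[0])
    return [[sum(row[i] * row[j] for row in Hc) for j in range(n)]
            for i in range(n)]

def block(F, p, q, P=2):
    """(p, q)th P x P block of F (zero-based user indices)."""
    return [row[q * P:(q + 1) * P] for row in F[p * P:(p + 1) * P]]

def is_scaled_identity(B, tol=1e-12):
    return (abs(B[0][0] - B[1][1]) < tol
            and abs(B[0][1]) < tol and abs(B[1][0]) < tol)

def is_design2(B, tol=1e-12):
    # 2 x 2 real orthogonal design with constant diagonal: [[a, b], [-b, a]].
    return abs(B[0][0] - B[1][1]) < tol and abs(B[0][1] + B[1][0]) < tol

# Arbitrary illustrative channels (N = 4 receive antennas, L = 2).
H1 = [[0.9, -0.4], [0.2, 1.1], [-0.7, 0.3], [0.5, 0.8]]
H2 = [[-0.3, 0.6], [1.2, 0.1], [0.4, -0.9], [0.7, 0.2]]

F = gram(hstack([user_block(H1), user_block(H2)]))
diag_ok = all(is_scaled_identity(block(F, q, q)) for q in range(2))
offdiag_ok = is_design2(block(F, 0, 1)) and is_design2(block(F, 1, 0))
```

In this toy setting `diag_ok` corresponds to $F_{q,q} = \alpha_q I_P$ and `offdiag_ok` to $F_{p,q} \in O(P)$ for $p \ne q$.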

Theorem 3.1: For a fixed block length P and number of users Q, let $\mathcal{F}_P(Q)$ be the set consisting of all invertible real symmetric PQ × PQ matrices possessing the same block-wise orthogonal structure as the matrix F in Lemma 3.1. Then we have $F^{-1} \in \mathcal{F}_P(Q)$.

Proof: The proof is based on a crucial fact about the orthogonal designs [19]. Specifically, it can be checked by analytic computations that, if $O_1, O_2 \in O(P)$, then so are $O_1 + O_2$ and $O_1 O_2$, that is,

Fact 1: The set O(P) is closed under addition and multiplication. Moreover, for any $O_1 \in O(P)$, we have $O_1 + O_1^T = \gamma I_P$ for some scalar γ.

With Fact 1, the assertion can be shown by induction on Q and the details are relegated to Appendix I.
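Fact 1 can be spot-checked for P = 4 with a short script. The sign pattern in `design4` below is one commonly quoted 4 × 4 real orthogonal design with constant diagonal; whether it coincides entry by entry with the design in [19, p-1458] is an assumption of this sketch, and the numerical entries are arbitrary.

```python
def design4(x):
    """Assumed 4 x 4 real orthogonal design with constant diagonal x[0]."""
    x1, x2, x3, x4 = x
    return [[ x1,  x2,  x3,  x4],
            [-x2,  x1, -x4,  x3],
            [-x3,  x4,  x1, -x2],
            [-x4, -x3,  x2,  x1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def in_O4(M, tol=1e-9):
    """M belongs to the family iff it equals the design built from its first row."""
    D = design4(M[0])
    return all(abs(M[i][j] - D[i][j]) < tol for i in range(4) for j in range(4))

O1 = design4([1.0, 2.0, -1.0, 0.5])
O2 = design4([0.3, -1.0, 2.0, 4.0])

sum_ok = in_O4(add(O1, O2))        # closed under addition
prod_ok = in_O4(matmul(O1, O2))    # closed under multiplication

# O1 + O1^T should equal gamma * I_4; for this pattern gamma = 2 * O1[0][0].
S = add(O1, transpose(O1))
sym_ok = all(abs(S[i][j] - (2.0 * O1[0][0] if i == j else 0.0)) < 1e-9
             for i in range(4) for j in range(4))
```

A single pair of matrices does not prove the fact, of course; the script only illustrates what closure under addition and multiplication means for this family.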

Theorem 3.1 implies that each P × P block diagonal submatrix of $F^{-1}$, say $[F^{-1}]_{q,q}$, is a scalar multiple of $I_P$, i.e.,

$$[F^{-1}]_{q,q} = \beta_q I_P, \quad 1 \le q \le Q. \tag{14}$$

As a result, the PQ diagonal entries of $F^{-1}$ assume only Q distinct levels, one for each user. The optimal index in the first stage is thus

$$\bar{q}^{(0)} = \arg\min_{1 \le q \le Q} \beta_q; \tag{15}$$

the ZF weighting vectors for separating the $\bar{q}^{(0)}$th user’s signal are chosen to be

$$\begin{bmatrix} g^T_{(\bar{q}^{(0)}-1)P+1} \\ \vdots \\ g^T_{(\bar{q}^{(0)}-1)P+P} \end{bmatrix} = \begin{bmatrix} e^T_{(\bar{q}^{(0)}-1)P+1} \\ \vdots \\ e^T_{(\bar{q}^{(0)}-1)P+P} \end{bmatrix} (H_c^T H_c)^{-1} H_c^T. \tag{16}$$

To validate the user-wise ordering property in subsequent stages, the contribution of the detected user’s signal (assumed error free) is first cancelled from the received data (6) to obtain a reduced-dimensional signal model for the next layer of detection. With such a detect-and-cancel process and by following essentially the same arguments as in (8)-(10), it can be successively verified that, at the (i + 1)th processing layer (1 ≤ i ≤ Q − 1), the optimal detection index is the index $1 \le l \le P(Q-i)$ at which $[F_i^{-1}]_{ll}$ is minimal, where $F_i = H_{c,i}^T H_{c,i}$ and $H_{c,i} \in \mathbb{R}^{KN \times P(Q-i)}$ is obtained by deleting i block(s) of P columns (corresponding to the previously detected signals) from $H_c$. By construction of $H_{c,i}$, it is easy to see that $F_i$ is directly obtained from $F$ $(= H_c^T H_c)$ by deleting from it the i associated block(s) of P rows and P columns. This immediately implies that $F_i \in \mathcal{F}_P(Q-i)$, and hence $F_i^{-1} \in \mathcal{F}_P(Q-i)$ by Theorem 3.1: the user-wise ordering rule is thus preserved at every subsequent processing stage. The algorithm flow of the user-wise OSIC detector is outlined in Table I.

TABLE I
ALGORITHM SUMMARY.

Initialization: $H_{c,0} = H_c$; $F_0 = H_c^T H_c$; $y_{c,0} = y_c$.
Recursion: For $0 \le i \le Q-1$,
1. $\mathbf{Q}_i = F_i^{-1}$ for the ZF criterion; $\mathbf{Q}_i = [(2\sigma_s^2/\sigma_v^2) F_i + I_{P(Q-i)}]^{-1}$ for the MMSE criterion.
2. $\beta_q^{(i)}$ = the $((q-1)P+1)$th diagonal entry of $\mathbf{Q}_i$, $1 \le q \le Q-i$.
3. $\bar{q}^{(i)} = \arg\min_{1 \le q \le Q-i} \beta_q^{(i)}$.
4. $G_i = (H_{c,i}^T H_{c,i})^{-1} H_{c,i}^T$ for the ZF criterion; $G_i = [H_{c,i}^T H_{c,i} + (\sigma_v^2/(2\sigma_s^2)) I_{P(Q-i)}]^{-1} H_{c,i}^T$ for the MMSE criterion.
5. $W_i^T = [e_{(\bar{q}^{(i)}-1)P+1} \cdots e_{(\bar{q}^{(i)}-1)P+P}]^T G_i$.
6. $\hat{s}_{c,\bar{q}^{(i)}} = \mathcal{Q}(W_i^T y_{c,i})$, where $\mathcal{Q}(\cdot)$ is the decision slicer.
7. $y_{c,i+1} = y_{c,i} - H_{c,\bar{q}^{(i)}} \hat{s}_{c,\bar{q}^{(i)}}$, where $H_{c,\bar{q}^{(i)}}$ contains the columns of $H_{c,i}$ corresponding to $\hat{s}_{c,\bar{q}^{(i)}}$.
8. $F_{i+1} = H_{c,i+1}^T H_{c,i+1}$, where $H_{c,i+1}$ is obtained from $H_{c,i}$ by deleting the block of P columns associated with the detected user.

Remarks:

i) The major implementation concern of the OSIC algorithm is the computation of the pseudo-inverse $(H_{c,i}^T H_{c,i})^{-1} H_{c,i}^T$ in each layer for determining the optimal detection index and for separating the desired signals [4]. In light of this fact, the advantages of the user-wise ordering characteristic are two-fold. First, since the norms of the P(Q − i) rows of $(H_{c,i}^T H_{c,i})^{-1} H_{c,i}^T$, or equivalently the P(Q − i) diagonal entries of $F_i^{-1}$, assume only Q − i different levels, it suffices to determine only this level set for choosing the detection index; an exhaustive search can thus be avoided. Second, to compute the corresponding weighting matrices, it is plausible to exploit the signal redundancy (due to OSTBC) to obtain further computational savings across the layers. In [7] an efficient recursive algorithm is derived, within a more general multiuser context, that realizes these implementation advantages.

ii) It is noted that the user-wise ordering property benefits uniquely from the distinctive structure of F specified in Lemma 3.1, which is true only when L ≤ 4. For 5 ≤ L ≤ 8 (hence with symbol block length P = 8 [19]), the structure of F gradually deviates as L increases. Indeed, the off-diagonal block $F_{p,q} \in \mathbb{R}^{8 \times 8}$ when L = 5 (or L = 6, respectively) can be shown to consist of four (sixteen) 4 × 4 (2 × 2) orthogonal design sub-blocks. For L = 7, 8, there is no particular embedded structure in $F_{p,q}$. As a result, for 5 ≤ L ≤ 8, the diagonal blocks (of dimension 8 × 8) of $F^{-1}$ will not be proportional to the identity matrix $I_8$: the transmitted symbols from a user will no longer share the same ordering metric, and the user-wise ordering characteristic does not hold. We finally note that the constraint L ≤ 4 does not appear stringent in practice since, to limit the implementation cost and to allow sufficient antenna spacing, it is usually undesirable to place too many antenna elements on the user terminals.
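The user-wise detect-and-cancel recursion of Table I (ZF criterion) can be sketched as follows. This is a toy illustration under stated assumptions, not the recursive implementation of [7]: real BPSK symbols, the 2 × 2 real orthogonal design per user (P = 2), real channel gains, a plain Gauss-Jordan inverse, and arbitrary channel and noise values; all function names are hypothetical.

```python
def user_block(H):
    """Equivalent channel block of one user for X = [[s1, -s2], [s2, s1]]."""
    return [[a, b] for a, b in H] + [[b, -a] for a, b in H]

def hstack(blocks):
    return [sum((blk[r] for blk in blocks), []) for r in range(len(blocks[0]))]

def gram(Hc):
    n = len(Hc[0])
    return [[sum(row[i] * row[j] for row in Hc) for j in range(n)]
            for i in range(n)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def zf_osic(blocks, y, P=2):
    """blocks: dict user -> equivalent channel block; y: received vector."""
    remaining = dict(blocks)
    y = list(y)
    order, decisions = [], {}
    while remaining:
        users = list(remaining)
        Hc = hstack([remaining[u] for u in users])
        Finv = inverse(gram(Hc))
        # One ordering metric per user: a diagonal entry of that user's block.
        betas = [Finv[k * P][k * P] for k in range(len(users))]
        k = min(range(len(users)), key=betas.__getitem__)
        u = users[k]
        # ZF weights: rows k*P .. k*P+P-1 of (Hc^T Hc)^{-1} Hc^T, applied to y.
        Hty = [sum(row[j] * yi for row, yi in zip(Hc, y))
               for j in range(len(Hc[0]))]
        soft = [sum(Finv[k * P + p][j] * Hty[j] for j in range(len(Hty)))
                for p in range(P)]
        hard = [1.0 if v >= 0 else -1.0 for v in soft]   # BPSK slicer
        # Cancel the detected user's contribution and drop its columns.
        blk = remaining.pop(u)
        y = [yi - sum(b * s for b, s in zip(row, hard))
             for yi, row in zip(y, blk)]
        order.append(u)
        decisions[u] = hard
    return order, decisions

# Toy example: Q = 2 users, N = 4 receive antennas, arbitrary values.
H1 = [[0.9, -0.4], [0.2, 1.1], [-0.7, 0.3], [0.5, 0.8]]
H2 = [[-0.3, 0.6], [1.2, 0.1], [0.4, -0.9], [0.7, 0.2]]
blocks = {1: user_block(H1), 2: user_block(H2)}
sent = {1: [1.0, -1.0], 2: [-1.0, -1.0]}
noise = [0.05, -0.05, 0.05, -0.05, -0.05, 0.05, -0.05, 0.05]
y = [sum(sum(b * s for b, s in zip(blocks[u][r], sent[u])) for u in blocks)
     + noise[r] for r in range(8)]
order, detected = zf_osic(blocks, y)
```

Because all P symbols of a user share one ordering metric, the loop needs only one diagonal entry of $F_i^{-1}$ per remaining user and detects a whole block per layer, which is the point of the first remark above.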

IV. PERFORMANCE ANALYSIS

The performance of the SIC/OSIC method is addressed in many works under the error-propagation free assumption, e.g., [11], among others. The general case in which erroneous decisions occur in each layer is recently discussed in [16], and also in [8] for variable-rate OSTBC systems. This section leverages the BER analysis in [8], and resorts to the technique in [3], to derive closed-form approximate BER at high SNR for the considered MU-OSTBC system; in what follows we assume that the symbol constellations are drawn from M -ary PSK constellations.

For a system of Q OSTBC user terminals, each equipped with L transmit antennas, the exact BER formulas for SIC based detection under a fixed SNR can be evaluated via equations (25)-(27) in [8, p-1207]; these results are derived based on the multi-channel receive performance reported in [3], as well as the analysis framework for ZF group-wise detection shown in [20]. For $1 \le i \le M$, let us define the set $\Theta_i := [(2i-3)\pi/M, (2i-1)\pi/M)$; also, with fixed $\sigma_s^2$ and $\sigma_v^2$, denote by $\Pr\{\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\}$ the probability that the received symbol lies in the region $\Theta_i$ through maximal-ratio combining of d receive branches (an explicit formula for $\Pr\{\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\}$ can be found in [8, p-1205]). For the particular Q = 2 case and an equally probable source, the average BER over the two processing stages is computed, after some manipulations, as

$$P^b = \frac{1}{2} \sum_{q=1}^{2} P_q^b, \tag{17}$$

where

$$P_1^b = \frac{1}{\log_2 M} \sum_{i=1}^{M} \Pr\{\theta \in \Theta_i \mid L(N-L), \sigma_s^2, \sigma_v^2\} \tag{18}$$

and

$$P_2^b = \sum_{z_1 \in A_1} \sum_{\hat{z}_1 \in A_1} \frac{1}{M^L \log_2 M} \sum_{i=1}^{M} \Pr\{\theta \in \Theta_i \mid LN, \sigma_s^2, \sigma_v^2 + \rho |z_1 - \hat{z}_1|^2\} \times \Pr\{z_1 \to \hat{z}_1 \mid L(N-L), \sigma_s^2, \sigma_v^2\} \tag{19}$$

are respectively the BER at the first and the second processing layers; in (19), the L-dimensional vectors $z_1$ and $\hat{z}_1$ are respectively the transmitted and the detected symbol vectors in the first stage, ρ is a scaled version of the signal variance $\sigma_s^2$ [8, p-1207], and $A_1$ denotes the L-fold Cartesian product of the constellation set $\{\exp(j2\pi k/M)\}_{0 \le k \le M-1}$. For high SNR, $\sigma_s^2/\sigma_v^2 \gg 1$, it can be shown that [3]

$$\Pr\{\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\} \approx \binom{2d}{d} \left[ \frac{M-1}{M \log_2 M \, \sin^2(\pi/M) \, (4\sigma_s^2/\sigma_v^2)} \right]^{d}. \tag{20}$$

The approximation (20) allows us to derive closed-form BER expressions (at high SNR) based on (18) and (19) for further quantifying the achievable diversity order in each processing stage. Specifically, from (18) and (20), it can be seen that, at high SNR, $P_1^b$ roughly behaves as

$$P_1^b \approx G_1 \left(\sigma_s^2/\sigma_v^2\right)^{-L(N-L)} \ \text{ for some } G_1, \tag{21}$$

confirming that the detected signal in the first stage enjoys a diversity order equal to L(N − L): L due to OSTBC transmit diversity and N − L from the receive diversity. If the initial symbol decision is correct, i.e., $z_1 = \hat{z}_1$, we have

$$\Pr\{z_1 \to \hat{z}_1 \mid L(N-L), \sigma_s^2, \sigma_v^2\} = 1, \tag{22}$$

and $P_2^b$ in (19) reduces to [8]

$$P_2^b = \frac{1}{\log_2 M} \sum_{i=1}^{M} \Pr\{\theta \in \Theta_i \mid LN, \sigma_s^2, \sigma_v^2\}. \tag{23}$$

Equations (23) and (20) in turn imply, at high SNR,

$$P_2^b \approx G_2 \left(\sigma_s^2/\sigma_v^2\right)^{-LN} \ \text{ for some } G_2, \tag{24}$$

that is, an LN-fold diversity gain can be attained in the second stage. However, if an erroneous decision occurs so that $z_1 \ne \hat{z}_1$, it can be deduced from (19)-(20) that, when $\sigma_s^2/\sigma_v^2$ is large and $\rho |z_1 - \hat{z}_1|^2$ is small,

$$P_2^b \approx \tilde{G}_2 \left(\frac{\sigma_s^2}{\sigma_v^2 + \bar{\rho}}\right)^{-LN} \left(\frac{\sigma_s^2}{\sigma_v^2}\right)^{-L(N-L)}, \tag{25}$$

for some $\tilde{G}_2$ and $\bar{\rho}$. When the symbol power $\sigma_s^2$ is fixed and the noise variance $\sigma_v^2 \to 0$, equation (25) becomes

$$P_2^b \approx G_3 \left(\sigma_s^2/\sigma_v^2\right)^{-L(N-L)}, \quad \text{where } G_3 = \tilde{G}_2 \left(\sigma_s^2/\bar{\rho}\right)^{-LN}. \tag{26}$$

As a result, whenever inter-layer error propagation takes place, the diversity order in the second layer is limited to only L(N − L), as in the first stage; by further invoking the results in [8, p-1207] and following the above analysis procedure, it can be shown that, in the general multiuser case and subject to error propagation, the achievable diversity gain per layer is equal to that in the first stage. It is noted that a similar phenomenon has also been proved in [16] for an L-input N-output i.i.d. Rayleigh fading channel: imperfect symbol decisions limit the diversity orders throughout the layers of an SIC receiver to N − L + 1, which is just the attainable gain in the first layer. In our numerical test for the Q = 2 case (see Simulation-A), the BER curve in the second layer is nevertheless seen to be very close to the corresponding error-propagation-free performance. A plausible rationale is that, in the first stage, each user’s signal already enjoys L-fold transmit diversity due to OSTBC: the embedded signal reliability improves detection accuracy at the first layer and significantly reduces the effect of error leakage.

Fig. 2. Theoretical and simulated average BER for different constellations. [Plot: BER versus SNR (dB); analytic and simulated curves for BPSK, QPSK, and 8-PSK.]
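The diversity orders read off from (21), (24), and (26) for the Q = 2 case can be tabulated with a few lines of code. The sketch below only restates those exponents and the resulting high-SNR slope of a BER curve; the function names are hypothetical and the constants $G_1$, $G_2$, $G_3$ are deliberately left out.

```python
def diversity_orders(L, N):
    """Per-layer diversity orders for Q = 2 users, read off (21), (24), (26)."""
    first = L * (N - L)                 # layer 1, cf. (21)
    second_ideal = L * N                # layer 2 after a correct decision, cf. (24)
    second_with_propagation = first     # layer 2 under error propagation, cf. (26)
    return first, second_ideal, second_with_propagation

def ber_drop_in_decades(order, snr_step_db):
    """Decades of BER reduction predicted by P ~ G * SNR**(-order)
    when the SNR increases by snr_step_db decibels."""
    return order * snr_step_db / 10.0

# Setting of Simulation-A: Alamouti code (L = 2) and N = 4 receive antennas.
d1, d2_ideal, d2_prop = diversity_orders(L=2, N=4)
slope_layer1 = ber_drop_in_decades(d1, 10.0)
slope_layer2 = ber_drop_in_decades(d2_ideal, 10.0)
```

For this setting the exponents are 4 for the first layer and 8 for an error-free second layer, so on a log-scale BER plot the asymptotic slopes differ by a factor of two.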

Remark: The above analysis does not take into account signal ordering and thus only provides a BER performance upper bound with respect to the optimally ordered case. However, even though signal ordering can lead to a certain performance gain, it does not enlarge the diversity order in each stage [11], [16].

V. SIMULATION RESULTS

This section uses several numerical examples to illustrate the performance of the proposed scheme. We consider a system of two users, each using Alamouti’s code [1] (and hence L = 2 transmit antennas). The propagation channel between each user terminal and the receiver is quasi-static i.i.d. Rayleigh fading; the channels are perfectly known at the receiver. In the abscissa of all simulation figures, SNR denotes the ratio $\sigma_s^2/\sigma_v^2$.

Fig. 3. Theoretical and simulated BER per layer (8-PSK). [Plot: BER versus SNR (dB); analytic and simulated curves for Layer 1 and Layer 2, plus the analytic Layer 2 curve assuming a perfect previous decision.]

A. Corroboration of the Theoretical Performance

This simulation compares the BER performance predicted in Section IV with the corresponding simulated outcomes (N = 4 antenna elements are placed at the receiver). Figure 2 shows the average BER for three different symbol constellations: BPSK, QPSK, and 8-PSK; the theoretical values (cf. (17)-(19)) are computed based on the exact formula of $\Pr\{\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\}$ given in [8, p-1205] rather than the simplified expression (20). As we can see from the figure, the simulated results closely match the theoretical solutions. For the particular 8-PSK case, Figure 3 explicitly depicts the BER at both processing layers; the theoretical BER in the second stage assuming a perfect previous symbol decision is also included (this indicates the decision result attained with the maximally achievable diversity gain). The figure shows that, in the presence of inter-layer error leakage, the simulated BER in the second stage is almost identical to the corresponding error-free benchmark solution. This suggests that the error-propagation effect could be slight (due to OSTBC), and that the increase in diversity gain remains largely intact.

B. Comparisons with Existing Solutions

This simulation compares the user-wise OSIC method with several existing solutions: Naguib’s approaches [13], Stamoulis’s method [17], and the MMSE parallel interference suppression method [14, p-325]. The system platform is the one considered in [12] with two receive antennas. Figure 4 shows the resulting average BER of each method (8-PSK modulation is used). As we can see, the OSIC method leads to the best performance; Naguib’s two-step approach [13, p-1806] achieves a BER level comparable to that of the MMSE-based OSIC when SNR is low, but it deteriorates as SNR increases. Also, the OSIC method is seen to significantly outperform Stamoulis’s decoupling based detector, which is free from error propagation but for which the diversity gain for each user’s signal branch is fixed to L. This might again confirm that, in the OSIC based detection, the effect of error propagation is slight so that a layer-wise increase in diversity gain is achieved. It is noted that Naguib’s approach, Stamoulis’s method, and the proposed user-wise OSIC detector all exploit the algebraic structure of OSTBC; the respective algorithm complexity measures are listed in Table II (the proposed method is implemented using the recursive scheme in [7]). As one can see, the proposed solution calls for the least computational cost.

Fig. 4. BER performances for various signal detectors (8-PSK). [Plot: BER versus SNR (dB); curves for the Stamoulis method, the Ng and Sousa method, the Naguib one-step method, the Naguib two-step method, ZF-OSIC, and MMSE-OSIC.]

TABLE II
FLOP COUNTS OF THREE COMPARATIVE METHODS (D: CONSTELLATION SIZE).

ZF-OSIC: 5P Q2 3/ 3+PQ3/ 6 5− P Q2 2+3PQ2+11P Q2 /3+Q2/ 2 7− PQ/ 4+Q/2 2− P−1
Naguib’s two-step method: 2 3 3 1 2 2 14_{P Q} / 3₋_PQ ₊(2D+ ₋5)_{P Q} ₊ ( ₂D+1₊_{7/ 2} ) _PQ2₊₄_{P Q}2 _{/ 3}_{+ ⋅} ( _{3 2}D₋_11/2 ) _PQ_{+ −}_{(2 2}D+1₎_Q₊₃_P₋₂
Stamoulis method: 23P Q2 4/12 13− P Q2 3/6 19− P Q2 2/12+PQ2/4+Q2/2−P Q2 /3 8− PQ/3 3 /2 2− Q − P+1

VI. CONCLUSIONS

In this paper we study OSIC based signal detection for MU-OSTBC systems. Subject to multiuser interference, it is proven that joint recovery of each user’s signal, under either the ZF or the MMSE ordering criterion, can be achieved only for a restricted class of orthogonal block codes. The established user-wise ordering (detection) property potentially reduces the computations and decoding delays of OSIC based signal recovery. The average BER is given in closed form and the attainable diversity order is quantified. Numerical study shows that, in the considered scenario, error propagation does not seem to incur a severe loss in diversity gain. The proposed approach compares favorably with existing multiuser detection schemes reported for the MU-OSTBC system, in terms of both numerical performance and algorithm complexity.

APPENDIX I

DETAILED PROOF OF THEOREM 3.1

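For the Q = 1 case, the result is obvious since $F = \alpha I_P$. Assume that the result is true for an arbitrary Q ≥ 1, that is, $F \in \mathcal{F}_P(Q)$ implies $F^{-1} \in \mathcal{F}_P(Q)$ for such a Q. We have to check that $F^{-1} \in \mathcal{F}_P(Q+1)$ whenever $F \in \mathcal{F}_P(Q+1)$. To see this, let us partition an arbitrary $F \in \mathcal{F}_P(Q+1)$ as

$$F = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix},$$

where $A \in \mathbb{R}^{PQ \times PQ}$, $B \in \mathbb{R}^{PQ \times P}$, and $D \in \mathbb{R}^{P \times P}$. We note that, since $F \in \mathcal{F}_P(Q+1)$, we have (a) $A \in \mathcal{F}_P(Q)$ and hence $A^{-1} \in \mathcal{F}_P(Q)$ by assumption, (b) $D = c I_P$ for some scalar c, and (c) if we write $B = [B_1^T \cdots B_Q^T]^T$, where $B_i \in \mathbb{R}^{P \times P}$, then we have $B_i \in O(P)$. Let us similarly write

$$F^{-1} = \begin{bmatrix} \bar{A} & \bar{B} \\ \bar{B}^T & \bar{D} \end{bmatrix},$$

where $\bar{A} \in \mathbb{R}^{PQ \times PQ}$, $\bar{B} \in \mathbb{R}^{PQ \times P}$, and $\bar{D} \in \mathbb{R}^{P \times P}$. To show that $F^{-1} \in \mathcal{F}_P(Q+1)$, it suffices to check that (1) $\bar{A} \in \mathcal{F}_P(Q)$, (2) $\bar{B} = [\bar{B}_1^T \cdots \bar{B}_Q^T]^T$, where $\bar{B}_i \in \mathbb{R}^{P \times P}$, is such that each $\bar{B}_i \in O(P)$, and (3) $\bar{D} = d I_P$ for some scalar d. Properties (1)-(3) can be shown based on the inversion formula for block matrices, that is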

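$$M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \ \Rightarrow \ M^{-1} = \begin{bmatrix} \bar{M}_{11} & \bar{M}_{12} \\ \bar{M}_{21} & \bar{M}_{22} \end{bmatrix}, \tag{27}$$

where

$$\bar{M}_{11} = \left(M_{11} - M_{12} M_{22}^{-1} M_{21}\right)^{-1},$$
$$\bar{M}_{12} = -\left(M_{11} - M_{12} M_{22}^{-1} M_{21}\right)^{-1} M_{12} M_{22}^{-1},$$
$$\bar{M}_{21} = -\left(M_{22} - M_{21} M_{11}^{-1} M_{12}\right)^{-1} M_{21} M_{11}^{-1},$$
$$\bar{M}_{22} = \left(M_{22} - M_{21} M_{11}^{-1} M_{12}\right)^{-1}.$$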

Proof of (1): From (27), we have $\bar{A} = (A - B D^{-1} B^T)^{-1} = (A - c^{-1} B B^T)^{-1}$, where the last equality follows since $D = c I_P$. Since each $B_i \in O(P)$, with direct block matrix multiplication and using Fact 1 it is easy to show that $A - c^{-1} B B^T \in \mathcal{F}_P(Q)$ and hence $\bar{A} \in \mathcal{F}_P(Q)$, by assumption.

Proof of (2): From (27), we have $\bar{B} = -\bar{A} B D^{-1} = -c^{-1} \bar{A} B$. Since $\bar{A} \in \mathcal{F}_P(Q)$ and each $B_i \in O(P)$, direct block matrix multiplication together with Fact 1 shows that each submatrix $\bar{B}_i \in O(P)$.

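Proof of (3): Since $\bar{\mathbf{D}} = \left(\mathbf{D} - \mathbf{B}^T\mathbf{A}^{-1}\mathbf{B}\right)^{-1}$ and $\mathbf{D} = c\mathbf{I}_P$, it suffices to check that $\mathbf{B}^T\mathbf{A}^{-1}\mathbf{B} = c_1\mathbf{I}_P$ for some scalar $c_1$. For $1 \le p, q \le Q$, denote by $\mathbf{U}_{pq}$ the $(p, q)$th $P \times P$ block submatrix of $\mathbf{A}^{-1}$. Since $\mathbf{B} = [\mathbf{B}_1^T \cdots \mathbf{B}_Q^T]^T$, it is easy to verify that

$$\mathbf{B}^T\mathbf{A}^{-1}\mathbf{B} = \sum_{p,q=1}^{Q} \mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q = \sum_{p=1}^{Q} \mathbf{B}_p^T\mathbf{U}_{pp}\mathbf{B}_p + \sum_{\substack{p,q=1 \\ p \neq q}}^{Q} \mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q. \tag{28}$$

Since $\mathbf{A}^{-1} \in \mathcal{F}_P(Q)$, we have by definition $\mathbf{U}_{pp} = \eta_p\mathbf{I}_P$ for some scalar $\eta_p$: the first summation on the right-hand side of the second equality in (28) thus simplifies as $\sum_{p=1}^{Q} \mathbf{B}_p^T\mathbf{U}_{pp}\mathbf{B}_p = \sum_{p=1}^{Q} \eta_p\mathbf{B}_p^T\mathbf{B}_p = \eta\mathbf{I}_P$. On the other hand, direct multiplication and using Fact 1 shows that each $\mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q \in \mathcal{O}(P)$, which, again from Fact 1, implies that $\mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q + \mathbf{B}_q^T\mathbf{U}_{qp}\mathbf{B}_p = \alpha_{p,q}\mathbf{I}_P$. The result shows that $\sum_{p,q=1,\, p \neq q}^{Q} \mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q = \tilde{\eta}\mathbf{I}_P$ and the assertion follows.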
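The closure property established above can be illustrated numerically for $P = 2$. The sketch below rests on an assumption that is not stated in this appendix: it models the members of $\mathcal{O}(2)$ as $2 \times 2$ blocks of the form $[[a, b], [-b, a]]$. The helper names and the numerical values of the hypothetical matrix `F` ($Q = 3$ users) are illustrative only. The diagonal of `Finv` then plays the role of the ZF ordering metric, and each user's two symbols should share one value.

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def rot(a, b):
    """2x2 block [[a, b], [-b, a]]; an assumed stand-in for a member of O(2)."""
    return [[a, b], [-b, a]]

def transpose(X):
    return [list(c) for c in zip(*X)]

def assemble(grid):
    """Stack a Q x Q grid of P x P blocks into one PQ x PQ matrix."""
    return [sum((blk[i] for blk in row), []) for row in grid for i in range(len(row[0]))]

def block(M, p, q, P=2):
    """Extract the (p, q)th P x P block of M."""
    return [row[q * P:(q + 1) * P] for row in M[p * P:(p + 1) * P]]

def is_scaled_identity(X):
    return X[0][0] == X[1][1] and X[0][1] == 0 and X[1][0] == 0

def is_rot(X):
    return X[0][0] == X[1][1] and X[0][1] == -X[1][0]

# Hypothetical F in F_2(3): scaled-identity diagonal blocks, rot(.) off-diagonal blocks,
# symmetric overall, and diagonally dominant so that it is invertible.
Q = 3
alphas = [7, 9, 8]
upper = {(0, 1): rot(1, 2), (0, 2): rot(-1, 1), (1, 2): rot(2, -1)}
grid = [[rot(alphas[p], 0) if p == q
         else upper[(p, q)] if p < q
         else transpose(upper[(q, p)])
         for q in range(Q)] for p in range(Q)]
F = assemble(grid)
Finv = inverse(F)

diag_ok = all(is_scaled_identity(block(Finv, p, p)) for p in range(Q))
off_ok = all(is_rot(block(Finv, p, q)) for p in range(Q) for q in range(Q) if p != q)
print(diag_ok, off_ok)
print([Finv[k][k] for k in range(2 * Q)])  # entries come in equal pairs, one pair per user
```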

APPENDIX II

USER-WISE MMSE ORDERING

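Consider the real constellation case. In the initial stage, the MMSE weight minimizing $E\left\{\left\|\mathbf{s}_c - \mathbf{W}^T\mathbf{y}_c\right\|_2^2\right\}$ is obtained as $\mathbf{W} = \left[\sigma_s^2\mathbf{H}_c\mathbf{H}_c^T + \left(\sigma_v^2/2\right)\mathbf{I}_{PQ}\right]^{-1}$. The $l$th symbol mean-square error, i.e., $E\left\{\left|\mathbf{e}_l^T\left(\mathbf{s}_c - \mathbf{W}^T\mathbf{y}_c\right)\right|^2\right\}$, is then computed as $\mathbf{e}_l^T\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F} + \mathbf{I}_{PQ}\right]^{-1}\mathbf{e}_l$. Since $\mathbf{F} \in \mathcal{F}_P(Q)$, it is easy to see that $\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F} + \mathbf{I}_{PQ} \in \mathcal{F}_P(Q)$ and so is $\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F} + \mathbf{I}_{PQ}\right]^{-1}$ by Theorem 3.1: user-wise MMSE ordering is thus achievable in the first stage. Starting from (6) and with per block detect-and-cancel process, it can be checked that, at the $(i+1)$th iteration ($1 \le i \le Q-1$), the symbol mean-square errors are computed as the diagonal entries of the matrix $\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F}_i + \mathbf{I}_{P(Q-i)}\right]^{-1}$. Since $\mathbf{F}_i \in \mathcal{F}_P(Q-i)$, so is $\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F}_i + \mathbf{I}_{P(Q-i)}\right]^{-1}$ and this guarantees user-wise MMSE ordering at each iteration.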
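The layer-by-layer ordering described above can be sketched as follows. This is an illustration under several assumptions rather than the detector of the paper: $\mathbf{F}_i$ is taken to be the principal submatrix of $\mathbf{F}$ that remains after the rows and columns of the already detected users are removed, the user with the smallest mean-square error is detected first, and the matrix `F`, the value `snr2` (standing for $2\sigma_s^2/\sigma_v^2$), and the function name `mmse_user_order` are hypothetical.

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mmse_user_order(F, P, snr2):
    """Return the user detection order and the MSE diagonal seen at each layer."""
    remaining = list(range(len(F) // P))
    order, history = [], []
    while remaining:
        # rows/columns of F belonging to users that are not yet detected (assumed F_i)
        idx = [u * P + k for u in remaining for k in range(P)]
        n = len(idx)
        G = [[snr2 * F[idx[r]][idx[c]] + (1 if r == c else 0) for c in range(n)]
             for r in range(n)]
        Ginv = inverse(G)
        mse = [Ginv[k][k] for k in range(n)]
        history.append(mse)
        # every user's P symbols share one MSE value, so the first entry represents the user
        best = min(range(len(remaining)), key=lambda j: mse[j * P])
        order.append(remaining.pop(best))
    return order, history

# Hypothetical F for P = 2, Q = 3: scaled-identity diagonal blocks and
# off-diagonal blocks of the assumed form [[a, b], [-b, a]].
F = [[ 7,  0,  1,  2, -1,  1],
     [ 0,  7, -2,  1, -1, -1],
     [ 1, -2,  9,  0,  2, -1],
     [ 2,  1,  0,  9,  1,  2],
     [-1, -1,  2,  1,  8,  0],
     [ 1, -1, -1,  2,  0,  8]]

order, history = mmse_user_order(F, P=2, snr2=Fraction(4))
print(order)
for layer, mse in enumerate(history, start=1):
    print(layer, mse)
```

In every layer the printed mean-square errors should appear in equal pairs, so ranking users instead of individual symbols loses nothing in this example.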

APPENDIX III

COMPLEX-VALUED CONSTELLATION CASE

We only highlight the ZF case (the MMSE case similarly follows). It suffices to check that the associated $\mathbf{F}$ matrix in each case exhibits the structure shown in Lemma 3.1. For $L = 2$ (thus full-rate code and $\mathbf{F} \in \mathbb{R}^{4Q \times 4Q}$), it is shown in [6] that $\mathbf{F} \in \mathcal{F}_P(Q)$: Theorem 3.1 implies that $\mathbf{F}^{-1} \in \mathcal{F}_P(Q)$ and hence the user-wise ordering property holds. For $2 < L \le 4$ (half-rate code, $P = 4$ and $\mathbf{F} \in \mathbb{R}^{8Q \times 8Q}$), it is shown in [6] that, if we denote by $\mathbf{F}_{p,q}$ the $(p, q)$th $8 \times 8$ block submatrix of $\mathbf{F}$, then we have $\mathbf{F}_{p,p} = \alpha_p\mathbf{I}_8$ for some $\alpha_p$, whereas, for $p \neq q$, we have

$$\mathbf{F}_{p,q} = \begin{bmatrix} \mathbf{O}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{O}_2 \end{bmatrix}, \quad \text{where } \mathbf{O}_1 \text{ and } \mathbf{O}_2 \text{ lie in } \mathcal{O}(4). \tag{29}$$

The results can then be easily deduced based on the block diagonal structure of $\mathbf{F}_{p,q}$ in (29). Indeed, with (27) and by repeating the same procedures in Appendix I, the matrix $\mathbf{F}^{-1}$ can be proved to be of an identical form as $\mathbf{F}$, and the assertion thus follows in the initial layer. At the $i$th layer, for either $L = 2$ or $2 < L \le 4$, we first note that each $\mathbf{F}_i$ matrix is of the same structure as $\mathbf{F}$ but is of lower dimensions. It can be shown by construction that $\mathbf{F}_i^{-1}$ also has this structure, and the result follows.

ACKNOWLEDGMENT

The authors thank both reviewers, whose comments improved the technical content of this paper.

REFERENCES

[1] S. Alamouti, “A simple transmit diversity scheme for wireless communications,” IEEE J. Select. Areas in Commun., vol. 16, no. 8, pp. 1451-1458, Oct. 1998.

[2] R. Chen and K. B. Letaief, “High speed wireless data transmission in layered space-time trellis coded MIMO systems,” in Proc. IEEE VTC 2003-Spring.

[3] S. Chennakeshu and J. B. Anderson, “Error rates for Rayleigh fading multichannel reception of MPSK signals,” IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 338-346, Feb./March/April 1995.

[4] Y. Dai, Z. Lei and S. Sun, “Ordered array processing for space-time coded systems,” IEEE Commun. Lett., vol. 8, no. 8, pp. 526-528, Aug. 2004.

[5] G. D. Golden et al., “Detection algorithm and initial laboratory results using V-BLAST space-time communication structure,” Electronic Lett., vol. 35, no. 1, pp. 14-16, Jan. 1999.

[6] C. L. Ho, J. Y. Wu and T. S. Lee, “Block-based symbol detection for high rate space-time coded systems,” in Proc. IEEE VTC 2004-Spring, May 2004.

[7] C. L. Ho, J. Y. Wu, and T. S. Lee, “Group-wise V-BLAST detection of multiuser space-time dual-signaling systems,” IEEE Trans. Wireless Commun., to appear.

[8] I. M. Kim, “Space-time power optimization of variable-rate space-time block codes based on successive interference cancellation,” IEEE Trans. Commun., vol. 52, no. 7, pp. 1204-1213, July 2004.

[9] E. G. Larsson and P. Stoica, Space-Time Block Coding for Wireless Communications. Cambridge University Press, 2003.

[10] W. Li, T. A. Gulliver, and W. Chow, “A new SIC algorithm for STBC coded DS-CDMA systems,” in Proc. IEEE 6th Circuits and Systems Symposium on Emerging Techniques: Frontiers of Mobile and Wireless Communications, pp. 357-360, June 2004.

[11] S. Loyka and F. Gagnon, “Performance analysis of the V-BLAST algorithm: An analytical approach,” IEEE Trans. Wireless Commun., vol. 3, no. 4, pp. 1326-1337, July 2004.

[12] A. F. Naguib, N. Seshadri, and A. R. Calderbank, “Increasing data rate over wireless channels: Space-time coding and signal processing for high data rate wireless communications,” IEEE Signal Processing Mag., vol. 17, no. 3, pp. 76-92, May 2000.

[13] A. F. Naguib, N. Seshadri, and A. R. Calderbank, “Applications of space-time block codes and interference suppression for high capacity and high data rate wireless systems,” in Proc. IEEE 32nd Asilomar Conf. Signals, Systems, and Computers, vol. 2, pp. 1803-1810, Nov. 1998.

[14] B. K. Ng and E. S. Sousa, “On bandwidth-efficient multiuser space-time signal design and detection,” IEEE J. Select. Areas in Commun., vol. 20, no. 2, pp. 320-329, Feb. 2002.

[15] A. J. Paulraj, R. U. Nabar, and D. A. Gore, Introduction to Space-Time Wireless Communications. Cambridge University Press, 2003.

[16] N. Prasad and M. K. Varanasi, “Analysis of decision feedback detection for MIMO Rayleigh-fading channels and optimization of power and rate allocation,” IEEE Trans. Inform. Theory, vol. 50, no. 6, pp. 1009-1025, June 2004.

[17] A. Stamoulis, N. Al-Dhahir, and A. R. Calderbank, “Further results on interference cancellation and space-time block codes,” in Proc. IEEE 35th Asilomar Conf. Signals, Systems, and Computers, vol. 1, pp. 257-261, Nov. 2001.

[18] M. Tao and R. S. Chen, “Generalized layered space-time codes for high data rate wireless communications,” IEEE Trans. Wireless Commun., vol. 3, no. 4, pp. 1067-1075, July 2004.

[19] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 1456-1467, July 1999.

[20] V. Tarokh et al., “Combined array processing and space-time coding,” IEEE Trans. Inform. Theory, vol. 45, no. 4, pp. 1121-1128, May 1999.

[21] P. W. Wolniansky et al., “V-BLAST: An architecture for realizing very high data rates over rich-scattering wireless channels,” in Proc. IEEE ISSSE-98, Italy, pp. 295-300, Sept. 1998.

[22] L. Zhao and V. K. Dubey, “Detection schemes for space-time block code and spatial multiplexing combined systems,” IEEE Trans. Commun.