### Detection of Multiuser Orthogonal Space-Time Block Coded Signals via Ordered Successive Interference Cancellation

*Jwo-Yuh Wu, Member, IEEE, Chung-Lien Ho, Member, IEEE, and Ta-Sung Lee, Senior Member, IEEE*

**Abstract: This paper investigates multiuser orthogonal space-time block coded signal detection within the ordered successive interference cancellation (OSIC) framework. Both the zero-forcing and minimum-mean-square-error ordering criteria are considered. When each user terminal is equipped with no more than four transmit antennas, it is shown that orthogonal transmit redundancy leads to an appealing signal ordering property: in each processing layer the transmitted symbols of an arbitrary user are associated with an identical ordering metric. This guarantees the feasibility of (user based) group-wise symbol recovery through the OSIC mechanism. Analytic bit-error-rate performance is given. Computer simulations and flop count evaluations are also provided for comparing the OSIC based solution with existing multiuser detection schemes reported for the considered system.**

**Index Terms: Multiuser detection, ordered successive interference cancellation (OSIC), array processing, space-time block codes.**

I. INTRODUCTION

MULTIUSER orthogonal space-time block code (MU-OSTBC) systems [9, chap. 11], [12], [13] can produce multiple fading-resistant links, but the co-channel user interference then becomes the major impairment dominating system performance. Toward signal recovery, one may in general resort to joint maximum-likelihood detection, but this usually incurs an intensive computational effort. For MU-OSTBC systems in particular, the algebraic structure of OSTBC has been exploited to develop various alternative signal detectors: typical proposals are Naguib's parallel interference cancellation (PIC) approaches [13] and Stamoulis's user-wise decoupled detection method [17], [12]. The successive interference cancellation (SIC) scheme, on the other hand, was first suggested in [20], and later in [2], [4], and [18], for the trellis coded transmission case. It has also been considered for separating multi-group OSTBC signals, either in a point-to-point environment [8], [22] or in a multiuser setting [10]. The SIC method combined with a certain signal ordering mechanism, i.e., the so-called ordered SIC (OSIC) scheme [5], [21], is known to yield a performance advantage over the unordered case at the expense of algorithm complexity. Although the OSIC method is a well-recognized solution for space-time signal detection [15, chap. 7], its use in the MU-OSTBC scenario has not been fully investigated. In particular, the impacts of the OSTBC structural property on the algorithm characteristics, e.g., the optimal ordering strategy as well as the associated possible low-complexity algorithm implementation, remain important aspects yet to be addressed.

Manuscript received July 28, 2004; revised March 27, 2005 and July 5, 2005; accepted July 13, 2005. The associate editor coordinating the review of this letter and approving it for publication was Dr. Helmut Bölcskei. This work is sponsored by the National Science Council under joint grant NSC 94-2213-E-002-033 and NSC 95-2752-E-002-009, the MoE ATU Program of the Ministry of Education, and the MediaTek research center at the National Chiao Tung University, Taiwan.

The authors are with the Department of Communication Engineering, National Chiao Tung University, 1001, Ta Hsueh Road, Hsinchu, Taiwan (e-mail: {jywu, u8713812}@cc.nctu.edu.tw, tslee@mail.nctu.edu.tw).

Digital Object Identifier 10.1109/TWC.2006.04514.
This paper studies OSIC based signal detection for the MU-OSTBC system, in which each user terminal is equipped with $L$ transmit antennas. The underlying analysis builds on the linear matrix modulation representation [9] of codeword matrices and the assumption $L \le 4$. The signal ordering rule can be either the zero-forcing (ZF) or the minimum-mean-square-error (MMSE) criterion. By exploiting the algebraic structure of OSTBC it is shown that, in each processing layer, the transmitted symbol streams of an arbitrary user are associated with an identical ordering metric. The signal ordering in each stage, therefore, is arranged on a user-wise basis; this property is observed to be no longer true for $L > 4$. As a result, whenever multiuser interference is present, only a subclass of orthogonal codes allows for joint recovery of each user's symbols via the OSIC mechanism. Analytic bit-error-rate (BER) results are provided and are verified through numerical simulations. The established user-wise ordering property offers a potential advantage of computation reduction based on group-wise data processing; an associated low-complexity algorithm implementation which further exploits the OSTBC structure is derived in [7]. Recently it was reported that the uplink performance of MU-OSTBC systems can be further improved by incorporating the codeword rotation technique [14]. Our discussions below, however, will not take such a codeword mapping into account, so as to better focus on the intrinsic properties introduced by OSTBC. It is noted that group-wise detectors are also proposed in [14] for multiuser signal separation. The MMSE based solution in [14, p-325] is basically a parallel interference suppression scheme that does not exploit the algebraic property of OSTBC; the associated refined version with signal ordering in [14, p-326] is, however, virtually similar to the one-step PIC method introduced in [13]. The rest of this letter is organized as follows. Section II describes the system model. Section III presents the main results, and the related performance analysis is given in Section IV. Section V contains the simulation results. Finally, Section VI is the conclusion.

II. SYSTEM MODEL

*A. System Description and Basic Assumptions*

Fig. 1. The schematic diagram of the transceiver. (Blocks: $Q$ user transmitters with STBC encoders, and a receiver with a user-wise OSIC detector.)

1536-1276/06$20.00 © 2006 IEEE

Consider an MU-OSTBC system over flat fading channels as shown in Figure 1. The $q$th user's symbol stream $s_q(\cdot)$, $1 \le q \le Q$, is first parsed into consecutive $P$-dimensional blocks, and the OSTBC encoder then associates each block of symbols $s_{q,p} := s_q(p-1)$, $1 \le p \le P$, with an $L \times K$ space-time codeword matrix$^1$

$$\mathbf{X}_q := \sum_{p=1}^{2P} \mathbf{A}_p \tilde{s}_{q,p}, \qquad (1)$$

where $\tilde{s}_{q,p} = \mathrm{Re}\{s_{q,p}\}$ for $1 \le p \le P$, $\tilde{s}_{q,p} = \mathrm{Im}\{s_{q,p-P}\}$ for $P+1 \le p \le 2P$, and the matrix $\mathbf{A}_p \in \mathbb{C}^{L \times K}$ satisfies$^2$ [9]

$$\mathbf{A}_i\mathbf{A}_i^H = \mathbf{I}_L \quad \text{and} \quad \mathbf{A}_i\mathbf{A}_j^H + \mathbf{A}_j\mathbf{A}_i^H = \mathbf{0}_L \ \text{ for } i \ne j. \qquad (2)$$
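As a concrete instance of (1) and (2), the Alamouti code ($L = K = P = 2$) can be written in linear matrix modulation form. The sketch below is illustrative only: the four modulation matrices, the antenna-by-time orientation of the codeword, and the helper names (`matmul`, `herm`, `satisfies_condition_2`, `codeword`) are assumptions chosen for this example and are not taken from [9]. It checks the two conditions in (2) and the resulting orthogonality of the codeword.

```python
# Hypothetical illustration: Alamouti code in the form X = sum_p A_p * s_tilde_p of (1).

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def herm(A):
    """Complex conjugate transpose."""
    return [[complex(x).conjugate() for x in col] for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]

# A_1, A_2 carry the real parts of (s_1, s_2); A_3, A_4 carry the imaginary parts.
A = [
    [[1, 0], [0, 1]],
    [[0, -1], [1, 0]],
    [[1j, 0], [0, -1j]],
    [[0, 1j], [1j, 0]],
]

def satisfies_condition_2(mats):
    """Check A_i A_i^H = I and A_i A_j^H + A_j A_i^H = 0 for all i != j."""
    for i, Ai in enumerate(mats):
        if not close(matmul(Ai, herm(Ai)), I2):
            return False
        for Aj in mats[i + 1:]:
            if not close(add(matmul(Ai, herm(Aj)), matmul(Aj, herm(Ai))), Z2):
                return False
    return True

def codeword(s1, s2):
    """Build X from the real-valued split [Re s1, Re s2, Im s1, Im s2]."""
    s_tilde = [s1.real, s2.real, s1.imag, s2.imag]
    X = Z2
    for Ap, sp in zip(A, s_tilde):
        X = add(X, [[sp * a for a in row] for row in Ap])
    return X

s1, s2 = 1 + 2j, -3 + 1j
X = codeword(s1, s2)
energy = abs(s1) ** 2 + abs(s2) ** 2
print(satisfies_condition_2(A))
print(close(matmul(X, herm(X)), [[energy, 0], [0, energy]]))
```

With these matrices the codeword takes the familiar form with rows $[s_1, -s_2^*]$ and $[s_2, s_1^*]$, so the second check is simply the statement that $\mathbf{X}_q\mathbf{X}_q^H$ is a scaled identity.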

The structural properties of the $\mathbf{A}_p$'s specified in (2) immediately imply that $\mathbf{X}_q$ is orthogonal. Expressing the codeword matrix in the linear matrix modulation form (1) has the advantage of unifying both the codeword representation and the problem formulation, regardless of the rate of the OSTBC and hence the symbol constellations used. We assume that $N$ antenna elements are located at the receiver. Let $y_n(\cdot)$ be the received discrete-time data, sampled at the symbol rate, from the $n$th receive antenna and define $\mathbf{y}(\cdot) := [y_1(\cdot) \cdots y_N(\cdot)]^T \in \mathbb{C}^N$. By collecting $\mathbf{y}(\cdot)$ over $K$ successive symbol periods to form $\mathbf{Y} := [\mathbf{y}(0) \cdots \mathbf{y}(K-1)]$, we have the following space-time data model (assuming that the $Q$ users are symbol synchronized)

$$\mathbf{Y} = \sum_{q=1}^{Q} \mathbf{H}_q \mathbf{X}_q + \mathbf{V}, \qquad (3)$$

where $\mathbf{H}_q = [h^{(q)}_{mn}] \in \mathbb{C}^{N \times L}$ is the channel matrix from the $q$th user's antenna array to the receiver and $\mathbf{V} \in \mathbb{C}^{N \times K}$ is the channel noise matrix. The following assumptions are made in the sequel.

(a) The symbol streams $s_q(k)$, $1 \le q \le Q$, are i.i.d. with zero mean and variance $\sigma_s^2$.

(b) The noise $\mathbf{V}$ is spatially and temporally white, each entry having zero mean and variance $\sigma_v^2$.

(c) We assume that $L \le 4$ and hence, according to [19], the symbol block length $P \in \{2, 4\}$. The proposed scheme is exclusively applicable to this subclass of orthogonal codes.

(d) For $3 \le L \le 4$ with complex-valued constellations, the half rate codes [19, p-1464] are used.

$^1$ Since block-by-block transmission is considered, we will not include the block index and consider only the first $P$ symbols for notational simplicity.

$^2$ The notations $(\cdot)^T$, $(\cdot)^H$, $\mathbf{I}_m$, and $\mathbf{0}_m$ respectively denote the transpose, the complex conjugate transpose, the $m \times m$ identity matrix, and the $m \times m$ zero matrix.

*B. Equivalent System Model*

In the matrix data model (3), the source symbol blocks are spatially and temporally encoded to form the codeword matrices. To cast the problem into a standard multiuser detection framework, in what follows we present an equivalent linear system model in which all users' symbol blocks are "restored" as the signal of interest. Specifically, let us split each user's data block $\mathbf{s}_q := [s_{q,1}, \ldots, s_{q,P}]^T$ and each received data vector $\mathbf{y}(i)$ into the respective real and imaginary parts to obtain $\tilde{\mathbf{s}}_q := [\mathrm{Re}\{\mathbf{s}_q^T\} \ \mathrm{Im}\{\mathbf{s}_q^T\}]^T \in \mathbb{R}^{2P}$, for $1 \le q \le Q$, and $\tilde{\mathbf{y}}(i) := [\mathrm{Re}\{\mathbf{y}^T(i)\} \ \mathrm{Im}\{\mathbf{y}^T(i)\}]^T \in \mathbb{R}^{2N}$, for $0 \le i \le K-1$. Associated with the $q$th user's channel matrix $\mathbf{H}_q$, we define the following matrix

$$\tilde{\mathbf{H}}_q := \mathbf{I}_K \otimes \begin{bmatrix} \mathrm{Re}\{\mathbf{H}_q\} & -\mathrm{Im}\{\mathbf{H}_q\} \\ \mathrm{Im}\{\mathbf{H}_q\} & \mathrm{Re}\{\mathbf{H}_q\} \end{bmatrix} \in \mathbb{R}^{2KN \times 2KL}, \qquad (4)$$

where the notation $\otimes$ stands for the Kronecker product; also, with $\mathbf{a}_{l,q}$ denoting the $l$th column of the matrix $\mathbf{A}_q$ and $\tilde{\mathbf{a}}_{l,q} := [\mathrm{Re}\{\mathbf{a}_{l,q}\}^T \ \mathrm{Im}\{\mathbf{a}_{l,q}\}^T]^T \in \mathbb{R}^{2L}$, we define

$$\tilde{\mathbf{A}} := \begin{bmatrix} \tilde{\mathbf{a}}_{1,1} & \cdots & \tilde{\mathbf{a}}_{1,2P} \\ \vdots & \ddots & \vdots \\ \tilde{\mathbf{a}}_{K,1} & \cdots & \tilde{\mathbf{a}}_{K,2P} \end{bmatrix} \in \mathbb{R}^{2KL \times 2P}. \qquad (5)$$

Then the matrix data model (3) can be rewritten, after some manipulations, as the following equivalent linear model for signal detection

$$\mathbf{y}_c = \mathbf{H}_c \mathbf{s}_c + \mathbf{v}_c, \qquad (6)$$

where $\mathbf{y}_c := [\tilde{\mathbf{y}}^T(0) \cdots \tilde{\mathbf{y}}^T(K-1)]^T \in \mathbb{R}^{2KN}$ and $\mathbf{s}_c := [\tilde{\mathbf{s}}_1^T \cdots \tilde{\mathbf{s}}_Q^T]^T \in \mathbb{R}^{2PQ}$ are respectively the split real-valued received data and multiuser symbol block,

$$\mathbf{H}_c := [\tilde{\mathbf{H}}_1 \tilde{\mathbf{A}} \ \cdots \ \tilde{\mathbf{H}}_Q \tilde{\mathbf{A}}] \in \mathbb{R}^{2KN \times 2PQ} \qquad (7)$$

is the effective channel matrix, and $\mathbf{v}_c \in \mathbb{R}^{2KN}$ is the corresponding noise component. It is noted that, upon restoration of the transmitted symbols into a vector ($\mathbf{s}_c$ in (6)), the structural information of the space-time codeword matrices is incorporated into the equivalent channel matrix $\mathbf{H}_c$ (see (7)). It is this built-in structure in $\mathbf{H}_c$ that leads to the user-wise signal ordering property, as will be shown in the next section. To highlight the core ideas, we will hereafter focus on the real-valued symbol case with unit-rate codes; for the complex-valued constellation case, essentially the same results can be obtained and these are included in the appendix.
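The $2 \times 2$ block pattern inside (4) is what lets a complex matrix-vector product be carried out in real arithmetic. A minimal sketch, assuming a small hypothetical channel `H` and input vector `x` (all names below are ours, chosen for illustration), checks that multiplying the real-valued split of a vector by the block matrix reproduces the split of the complex product.

```python
# Hypothetical illustration of the real-valued split used in (4).

def real_split_matrix(H):
    """Return [[Re H, -Im H], [Im H, Re H]] for a complex matrix H (list of rows)."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    return top + bottom

def real_split_vector(x):
    """Stack the real parts on top of the imaginary parts."""
    return [v.real for v in x] + [v.imag for v in x]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

# Arbitrary example values: N = 3 receive antennas, L = 2 transmit antennas.
H = [[1 + 2j, 0 - 1j], [0.5 - 1j, 2 + 0j], [0 + 3j, 1 - 1j]]
x = [1 - 1j, 2 + 3j]

lhs = matvec(real_split_matrix(H), real_split_vector(x))  # real arithmetic only
rhs = real_split_vector(matvec(H, x))                     # complex product, then split
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)
```

The Kronecker product with $\mathbf{I}_K$ in (4) simply repeats this block once per symbol period.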

III. MAIN RESULTS

To establish the user-wise ordering property, we shall first consider the ZF ordering criterion, in which the detection order is determined based on the post-detection SNR [21]; the MMSE counterpart can be readily deduced from the ZF results and is provided in the Appendix. At the first stage, the ZF scheme forms the decision statistic of the $l$th symbol stream as (cf. (6))

$$\mathbf{g}_l^T \mathbf{y}_c = \mathbf{g}_l^T \mathbf{H}_c \mathbf{s}_c + \mathbf{g}_l^T \mathbf{v}_c, \qquad (8)$$

where the weighting vector $\mathbf{g}_l$ nulls the interference from other substreams so that

$$\mathbf{g}_l^T \mathbf{H}_c = \mathbf{e}_l^T, \qquad (9)$$

where $\mathbf{e}_l$ is the $l$th unit standard vector in $\mathbb{R}^{PQ}$. The unique solution for $\mathbf{g}_l^T$, which minimizes the noise amplification due to interference nulling and fulfills the constraint (9), is thus the $l$th row of the pseudo-inverse of $\mathbf{H}_c$, namely,

$$\mathbf{g}_l^T = \mathbf{e}_l^T \left(\mathbf{H}_c^T \mathbf{H}_c\right)^{-1} \mathbf{H}_c^T, \qquad 1 \le l \le PQ. \qquad (10)$$

Since the noise $\mathbf{v}_c$ is white and the source symbols are i.i.d. with variance $\sigma_s^2$, the post-detection SNR in the $l$th decision component [21, p-297] can be computed from (8)-(10) as

$$\rho_l^{(0)} := \frac{\sigma_s^2}{\sigma_v^2 \left\| \mathbf{e}_l^T \left(\mathbf{H}_c^T \mathbf{H}_c\right)^{-1} \mathbf{H}_c^T \right\|^2}. \qquad (11)$$

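It is straightforward to verify that

$$\left\| \mathbf{e}_l^T \left(\mathbf{H}_c^T \mathbf{H}_c\right)^{-1} \mathbf{H}_c^T \right\|^2 = \mathbf{e}_l^T \mathbf{F}^{-1} \mathbf{e}_l, \qquad (12)$$

with

$$\mathbf{F} := \mathbf{H}_c^T \mathbf{H}_c \in \mathbb{R}^{PQ \times PQ}. \qquad (13)$$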
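The identity (12) follows in one line from the symmetry of $\mathbf{F}$. The short derivation below is added for completeness and uses only the definitions in (10), (11), and (13).

```latex
\begin{aligned}
\left\| \mathbf{e}_l^T (\mathbf{H}_c^T\mathbf{H}_c)^{-1}\mathbf{H}_c^T \right\|^2
 &= \mathbf{e}_l^T \mathbf{F}^{-1}\mathbf{H}_c^T \mathbf{H}_c \mathbf{F}^{-T} \mathbf{e}_l \\
 &= \mathbf{e}_l^T \mathbf{F}^{-1}\mathbf{F}\,\mathbf{F}^{-1} \mathbf{e}_l
   \qquad (\mathbf{F}^{-T} = \mathbf{F}^{-1} \text{ since } \mathbf{F} = \mathbf{F}^T) \\
 &= \mathbf{e}_l^T \mathbf{F}^{-1} \mathbf{e}_l ,
 \qquad \text{so that} \qquad
 \rho_l^{(0)} = \frac{\sigma_s^2}{\sigma_v^2\,[\mathbf{F}^{-1}]_{ll}} .
\end{aligned}
```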

Equations (11) and (12) imply that, given $\sigma_s^2$ and $\sigma_v^2$, the SNR level depends entirely on the $l$th diagonal entry of the matrix $\mathbf{F}^{-1}$. In particular, a small $[\mathbf{F}^{-1}]_{ll}$ implies a large $\rho_l^{(0)}$, and hence better detection accuracy in the $l$th decision component. The optimal detection order at the first stage, therefore, is given by the index $1 \le l \le PQ$ at which $[\mathbf{F}^{-1}]_{ll}$ is minimal. With the adoption of OSTBC, the matrix $\mathbf{F}$ defined in (13) exhibits a distinctive structure, as shown in the next lemma (see [6] for a proof). Based on this result, we can further specify the diagonal entries of $\mathbf{F}^{-1}$ for determining the optimal index.

For a fixed symbol block length $P$, denote by $\mathcal{O}(P)$ the set of all $P \times P$ real orthogonal designs with constant diagonal entries as specified in [19, p-1458].

*Lemma 3.1:* Let $\mathbf{F}_{p,q} \in \mathbb{R}^{P \times P}$, $1 \le p, q \le Q$, be the $(p, q)$th $P \times P$ block submatrix of the matrix $\mathbf{F}$ defined in (13). Then we have $\mathbf{F}_{q,q} = \alpha_q \mathbf{I}_P$, and $\mathbf{F}_{p,q} \in \mathcal{O}(P)$ for $p \ne q$.

It is noted that the matrix $\mathbf{F}$ defined in (13) can be interpreted as the space-time matched-filtered channel matrix in the multiuser case. In view of this point, the assertion $\mathbf{F}_{q,q} = \alpha_q \mathbf{I}_P$ in Lemma 3.1 reflects that, through space-time matched filtering, the intra-antenna symbol streams of each user remain decoupled. Also, since the off-diagonal blocks $\mathbf{F}_{p,q}$ account for the effective multiuser interference, the assertion $\mathbf{F}_{p,q} \in \mathcal{O}(P)$ reveals that the interference signatures are orthogonal designs. This attractive fact, which is true only for the considered code subclass with $L \le 4$, lays the foundation for proving the user-wise ordering property. Based on Lemma 3.1, we state in the following theorem the central result: the matrix $\mathbf{F}^{-1}$ "inherits" the key features of $\mathbf{F}$.

*Theorem 3.1:* For a fixed block length $P$ and number of users $Q$, let $\mathcal{F}_P(Q)$ be the set consisting of all invertible real symmetric $PQ \times PQ$ matrices possessing the block-wise orthogonal structure of $\mathbf{F}$ shown in Lemma 3.1. Then we have $\mathbf{F}^{-1} \in \mathcal{F}_P(Q)$.

*Proof:* The proof is based on a crucial fact about the orthogonal designs [19]. Specifically, it can be checked by analytic computations that, if $\mathbf{O}_1, \mathbf{O}_2 \in \mathcal{O}(P)$, then so are $\mathbf{O}_1 + \mathbf{O}_2$ and $\mathbf{O}_1\mathbf{O}_2$, that is,

*Fact 1:* The set $\mathcal{O}(P)$ is closed under addition and multiplication. Moreover, for any $\mathbf{O}_1 \in \mathcal{O}(P)$, we have $\mathbf{O}_1 + \mathbf{O}_1^T = \gamma \mathbf{I}_P$ for some scalar $\gamma$.

With Fact 1, the assertion can be shown by induction on $Q$ and the details are relegated to Appendix I.
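To see Theorem 3.1 at work in the simplest nontrivial case, consider $Q = 2$. The worked example below assumes the standard defining property of a real orthogonal design, namely $\mathbf{O}\mathbf{O}^T = \mathbf{O}^T\mathbf{O} = \omega\,\mathbf{I}_P$ with $\omega$ the sum of the squared entries in any row; the symbols $\omega$ and $\Delta$ are introduced here for illustration only.

```latex
\mathbf{F} = \begin{bmatrix} \alpha_1\mathbf{I}_P & \mathbf{O} \\ \mathbf{O}^T & \alpha_2\mathbf{I}_P \end{bmatrix},
\qquad
\mathbf{O}\mathbf{O}^T = \mathbf{O}^T\mathbf{O} = \omega\,\mathbf{I}_P
\quad \Longrightarrow \quad
\mathbf{F}^{-1} = \frac{1}{\Delta}
\begin{bmatrix} \alpha_2\mathbf{I}_P & -\mathbf{O} \\ -\mathbf{O}^T & \alpha_1\mathbf{I}_P \end{bmatrix},
\qquad
\Delta := \alpha_1\alpha_2 - \omega .
```

Multiplying out confirms $\mathbf{F}\mathbf{F}^{-1} = \mathbf{I}_{2P}$. The diagonal blocks of $\mathbf{F}^{-1}$ are $(\alpha_2/\Delta)\mathbf{I}_P$ and $(\alpha_1/\Delta)\mathbf{I}_P$, and the off-diagonal block $-\mathbf{O}/\Delta$ is again an orthogonal design, so $\mathbf{F}^{-1} \in \mathcal{F}_P(2)$. Since $\Delta > 0$ whenever $\mathbf{F} = \mathbf{H}_c^T\mathbf{H}_c$ is invertible, the user with the larger $\alpha_q$ attains the smaller diagonal level of $\mathbf{F}^{-1}$, and hence the larger post-detection SNR in (11).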

Theorem 3.1 implies that each $P \times P$ diagonal block of $\mathbf{F}^{-1}$, say $\mathbf{F}^{-1}_{q,q}$, is a scalar multiple of $\mathbf{I}_P$, i.e.,

$$\mathbf{F}^{-1}_{q,q} = \beta_q \mathbf{I}_P, \qquad 1 \le q \le Q. \qquad (14)$$

As a result, the $PQ$ diagonal entries of $\mathbf{F}^{-1}$ assume only $Q$ distinct levels, one for each user. The optimal user index in the first stage is thus

$$\bar{q}^{(0)} = \arg\min_{1 \le q \le Q} \beta_q; \qquad (15)$$

the ZF weighting vectors for separating the $\bar{q}^{(0)}$th user's signal are chosen to be

$$\begin{bmatrix} \mathbf{g}^T_{(\bar{q}^{(0)}-1)P+1} \\ \vdots \\ \mathbf{g}^T_{(\bar{q}^{(0)}-1)P+P} \end{bmatrix} = \begin{bmatrix} \mathbf{e}^T_{(\bar{q}^{(0)}-1)P+1} \\ \vdots \\ \mathbf{e}^T_{(\bar{q}^{(0)}-1)P+P} \end{bmatrix} \left(\mathbf{H}_c^T \mathbf{H}_c\right)^{-1} \mathbf{H}_c^T. \qquad (16)$$
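The structure asserted in Lemma 3.1 and Theorem 3.1, and the resulting first-stage selection rule, can be checked numerically. The sketch below makes simplifying assumptions that are not in the text: two users ($Q = 2$), the real $2 \times 2$ orthogonal design ($P = L = K = 2$), real-valued Gaussian channels drawn with a fixed seed, and an effective channel whose columns are the vectorized products of each channel with each modulation matrix. All function names are hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Real 2 x 2 orthogonal design: X = s1 * A1 + s2 * A2.
A1 = [[1, 0], [0, 1]]
A2 = [[0, -1], [1, 0]]

def effective_channel(channels):
    """Columns vec(H_q A_p), ordered user by user (real channels, real symbols)."""
    cols = []
    for H in channels:
        for Ap in (A1, A2):
            HA = matmul(H, Ap)  # N x K
            cols.append([HA[n][k] for k in range(2) for n in range(len(H))])
    return transpose(cols)

random.seed(0)
N, Q, P = 3, 2, 2
channels = [[[random.gauss(0, 1) for _ in range(2)] for _ in range(N)]
            for _ in range(Q)]
Hc = effective_channel(channels)
F = matmul(transpose(Hc), Hc)
Finv = inverse(F)

# One ordering metric per user: the first diagonal entry of each diagonal block.
beta = [Finv[q * P][q * P] for q in range(Q)]
first_user = min(range(Q), key=lambda q: beta[q])
print(beta, first_user)
```

In this setting only $Q$ diagonal entries of the inverse need to be inspected, rather than all $PQ$ of them.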
To validate the user-wise ordering property in subsequent stages, the contribution of the detected user's signal (assumed error free) is first cancelled from the received data (6) to obtain a reduced-dimensional signal model for the next layer of detection. With such a detect-and-cancel process, and by following essentially the same arguments as in (8)-(10), it can be successively verified that, at the $(i+1)$th processing layer ($1 \le i \le Q-1$), the optimal detection index is the index $1 \le l \le P(Q-i)$ at which $[\mathbf{F}_i^{-1}]_{ll}$ is minimal, where $\mathbf{F}_i = \mathbf{H}_{c,i}^T \mathbf{H}_{c,i}$ and $\mathbf{H}_{c,i} \in \mathbb{R}^{KN \times P(Q-i)}$ is obtained by deleting $i$ block(s) of $P$ columns (corresponding to the previously detected signals) from $\mathbf{H}_c$. By construction of $\mathbf{H}_{c,i}$, it is easy to see that $\mathbf{F}_i$ is directly obtained from $\mathbf{F}$ ($= \mathbf{H}_c^T \mathbf{H}_c$) by deleting from it the $i$ associated block(s) of $P$ rows and $P$ columns. This immediately implies that $\mathbf{F}_i \in \mathcal{F}_P(Q-i)$, and hence $\mathbf{F}_i^{-1} \in \mathcal{F}_P(Q-i)$ by Theorem 3.1: the user-wise ordering rule at any subsequent processing stage is thus preserved. The algorithm flow of the user-wise OSIC detector is outlined in Table I.

TABLE I
ALGORITHM SUMMARY.

Initialization: $\mathbf{H}_{c,0} = \mathbf{H}_c$; $\mathbf{F}_0 = \mathbf{F} = \mathbf{H}_c^T \mathbf{H}_c$; $\mathbf{y}_{c,0} = \mathbf{y}_c$.

Recursion: for $0 \le i \le Q-1$,

1. $\mathbf{Q}_i = \mathbf{F}_i^{-1}$ for the ZF criterion; $\mathbf{Q}_i = \left[(2\sigma_s^2/\sigma_v^2)\,\mathbf{F}_i + \mathbf{I}_{P(Q-i)}\right]^{-1}$ for the MMSE criterion.
2. $\beta_q^{(i)}$ = the $((q-1)P+1)$th diagonal entry of $\mathbf{Q}_i$.
3. $\bar{q}^{(i)} = \arg\min_{1 \le q \le Q-i} \beta_q^{(i)}$.
4. $\mathbf{G}_i = \left(\mathbf{H}_{c,i}^T \mathbf{H}_{c,i}\right)^{-1} \mathbf{H}_{c,i}^T$ for the ZF criterion; $\mathbf{G}_i = \left[\sigma_s^2\,\mathbf{H}_{c,i}^T \mathbf{H}_{c,i} + (\sigma_v^2/2)\,\mathbf{I}_{P(Q-i)}\right]^{-1} \mathbf{H}_{c,i}^T$ for the MMSE criterion.
5. $\mathbf{W}_i = \left[\mathbf{e}_{(\bar{q}^{(i)}-1)P+1} \ \cdots \ \mathbf{e}_{(\bar{q}^{(i)}-1)P+P}\right]^T \mathbf{G}_i$.
6. $\hat{\mathbf{s}}_{c,\bar{q}^{(i)}} = \mathcal{Q}\left(\mathbf{W}_i \mathbf{y}_{c,i}\right)$, where $\mathcal{Q}(\cdot)$ is the decision slice.
7. $\mathbf{y}_{c,i+1} = \mathbf{y}_{c,i} - \mathbf{H}_{c,\bar{q}^{(i)}} \hat{\mathbf{s}}_{c,\bar{q}^{(i)}}$, where $\mathbf{H}_{c,\bar{q}^{(i)}}$ contains the columns of $\mathbf{H}_c$ corresponding to $\hat{\mathbf{s}}_{c,\bar{q}^{(i)}}$.
8. $\mathbf{F}_{i+1} = \mathbf{H}_{c,i+1}^T \mathbf{H}_{c,i+1}$, where $\mathbf{H}_{c,i+1}$ is obtained by deleting $i+1$ block(s) of $P$ columns from $\mathbf{H}_c$.

**Remarks:**

i) The major implementation concern of the OSIC algorithm is the computation of the pseudo-inverse $\left(\mathbf{H}_{c,i}^T \mathbf{H}_{c,i}\right)^{-1} \mathbf{H}_{c,i}^T$ in each layer for determining the optimal detection index and for separating the desired signals [4]. In light of this fact, the advantages of the user-wise ordering characteristic are two-fold. First, since the norms of the $P(Q-i)$ rows of $\left(\mathbf{H}_{c,i}^T \mathbf{H}_{c,i}\right)^{-1} \mathbf{H}_{c,i}^T$, or equivalently, the $P(Q-i)$ diagonal entries of $\mathbf{F}_i^{-1}$, assume $Q-i$ different levels, it suffices to determine only this level set for choosing the detection index; an exhaustive search can thus be avoided. Second, to compute the corresponding weighting matrices, it is plausible to exploit the signal redundancy (due to OSTBC) to realize further computational savings across the layers. In [7] an efficient recursive algorithm is derived, within a more general multiuser context, for realizing these implementation advantages.

ii) It is noted that the user-wise ordering property benefits uniquely from the distinctive structure of $\mathbf{F}$ specified in Lemma 3.1, which is true only when $L \le 4$. For $5 \le L \le 8$ (hence with symbol block length $P = 8$ [19]), the structure of $\mathbf{F}$ gradually deviates as $L$ increases. Indeed, the off-diagonal block $\mathbf{F}_{p,q} \in \mathbb{R}^{8 \times 8}$ when $L = 5$ (or $L = 6$, respectively) can be shown to consist of four (sixteen) $4 \times 4$ ($2 \times 2$) orthogonal design sub-blocks. For $L = 7, 8$, there is no particular embedded structure in $\mathbf{F}_{p,q}$. As a result, for $5 \le L \le 8$, the diagonal blocks (with dimension $8 \times 8$) of $\mathbf{F}^{-1}$ will not be proportional to the identity matrix $\mathbf{I}_8$: the transmitted symbols from a user will no longer share the same ordering metric, and the user-wise ordering characteristic does not hold. We finally note that the constraint $L \le 4$ does not appear stringent in practice since, to economize the implementation cost and also to allow a sufficient antenna spacing, it is usually undesirable to place too many antenna elements on the user terminals.
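The ZF branch of the user-wise recursion summarized in Table I can be sketched in a few lines. This is a minimal illustration under assumptions not made in the text: real BPSK symbols, the real $2 \times 2$ orthogonal design, real Gaussian channels with a fixed seed, and a plain Gauss-Jordan inverse in each layer instead of the recursive scheme of [7]. Every function name is hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

A1 = [[1, 0], [0, 1]]
A2 = [[0, -1], [1, 0]]

def effective_channel(channels):
    """Columns vec(H_q A_p), ordered user by user (real channels, real symbols)."""
    cols = []
    for H in channels:
        for Ap in (A1, A2):
            HA = matmul(H, Ap)
            cols.append([HA[n][k] for k in range(2) for n in range(len(H))])
    return transpose(cols)

def user_wise_zf_osic(Hc, y, P, Q):
    """Detect Q users' length-P BPSK blocks from y = Hc s + v, one user per layer."""
    remaining = list(range(Q))      # users not yet detected
    cols = transpose(Hc)            # cols[j] is the j-th column of Hc
    y = list(y)
    detected, order = {}, []
    while remaining:
        Ht = [cols[q * P + p] for q in remaining for p in range(P)]
        H = transpose(Ht)
        Finv = inverse(matmul(Ht, H))
        # One metric per remaining user: first diagonal entry of each block.
        betas = [Finv[k * P][k * P] for k in range(len(remaining))]
        k = min(range(len(remaining)), key=lambda j: betas[j])
        G = matmul(Finv, Ht)                    # ZF pseudo-inverse
        W = G[k * P:(k + 1) * P]                # rows for the chosen user
        s_hat = [1.0 if sum(w * v for w, v in zip(row, y)) >= 0 else -1.0
                 for row in W]
        q = remaining.pop(k)
        detected[q] = s_hat
        order.append(q)
        # Cancel the detected user's contribution from the received data.
        for p in range(P):
            y = [v - c * s_hat[p] for v, c in zip(y, cols[q * P + p])]
    return detected, order

random.seed(1)
N, Q, P = 4, 2, 2
channels = [[[random.gauss(0, 1) for _ in range(2)] for _ in range(N)]
            for _ in range(Q)]
Hc = effective_channel(channels)
s = [random.choice((-1.0, 1.0)) for _ in range(P * Q)]
y = [sum(h * x for h, x in zip(row, s)) + random.gauss(0, 0.05) for row in Hc]
detected, order = user_wise_zf_osic(Hc, y, P, Q)
print(order, detected)
```

Each layer inspects one diagonal entry per remaining user and recovers that user's $P$ symbols jointly, which is the group-wise processing the remarks above refer to.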

IV. PERFORMANCE ANALYSIS

The performance of the SIC/OSIC method is addressed in many works under the error-propagation free assumption, e.g., [11], among others. The general case in which erroneous decisions occur in each layer is discussed in [16], and also in [8] for variable-rate OSTBC systems. This section leverages the BER analysis in [8], and resorts to the technique in [3], to derive a closed-form approximate BER at high SNR for the considered MU-OSTBC system; in what follows we assume that the symbols are drawn from $M$-ary PSK constellations.

For a system of $Q$ OSTBC user terminals, each equipped with $L$ transmit antennas, the exact BER formulas for SIC based detection under a fixed SNR can be evaluated via equations (25)-(27) in [8, p-1207]; these results are derived based on the multi-channel receive performance reported in [3], as well as the analysis framework for ZF group-wise detection shown in [20]. For $1 \le i \le M$, let us define the set $\Theta_i := [(2i-3)\pi/M, (2i-1)\pi/M)$; also, with fixed $\sigma_s^2$ and $\sigma_v^2$, denote by $\Pr\left(\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\right)$ the probability that the received symbol lies in the region $\Theta_i$ through maximal-ratio combining of $d$ receive branches (an explicit formula for $\Pr\left(\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\right)$ can be found in [8, p-1205]). For the particular $Q = 2$ case and an equally probable source, the average BER over the two processing stages is computed, after some manipulations, as

$$P_b = \frac{1}{2} \sum_{q=1}^{2} P_b^q, \qquad (17)$$

where

$$P_b^1 = \frac{1}{\log_2 M} \sum_{i=1}^{M} \Pr\left(\theta \in \Theta_i \mid L(N-L), \sigma_s^2, \sigma_v^2\right) \qquad (18)$$

and

$$P_b^2 = \sum_{\mathbf{z}_1 \in \mathcal{A}_1} \sum_{\hat{\mathbf{z}}_1 \in \mathcal{A}_1} \frac{1}{M^L \log_2 M} \sum_{i=1}^{M} \Pr\left(\theta \in \Theta_i \mid LN, \sigma_s^2, \sigma_v^2 + \rho\,|\mathbf{z}_1 - \hat{\mathbf{z}}_1|^2\right) \times \Pr\left(\mathbf{z}_1 \to \hat{\mathbf{z}}_1 \mid L(N-L), \sigma_s^2, \sigma_v^2\right) \qquad (19)$$

are respectively the BER at the first and the second processing layers; in (19), the $L$-dimensional vectors $\mathbf{z}_1$ and $\hat{\mathbf{z}}_1$ are respectively the transmitted and the detected symbol vectors in the first stage, $\rho$ is a scaled version of the signal variance $\sigma_s^2$ [8, p-1207], and $\mathcal{A}_1$ denotes the $L$-fold Cartesian product of the constellation set $\{\exp(j2\pi k/M)\}_{0 \le k \le M-1}$. For high SNR $\sigma_s^2/\sigma_v^2 \gg 1$, it can be shown that [3]

$$\Pr\left(\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\right) \approx \binom{2d}{d} \frac{M-1}{M \left[\log_2 M \, \sin^2(\pi/M) \, 4\sigma_s^2/\sigma_v^2\right]^{d}}. \qquad (20)$$

The approximation (20) allows us to derive closed-form BER expressions (at high SNR) based on (18) and (19) for further quantifying the achievable diversity order in each processing stage. Specifically, from (18) and (20), it can be seen that, at high SNR, $P_b^1$ roughly behaves as

$$P_b^1 \approx G_1 \left(\sigma_s^2/\sigma_v^2\right)^{-L(N-L)} \quad \text{for some } G_1, \qquad (21)$$

confirming that the detected signal in the first stage enjoys a diversity order equal to $L(N-L)$: $L$ due to OSTBC transmit diversity and $N-L$ from the receive diversity. In case the initial symbol decision is correct, i.e., $\mathbf{z}_1 = \hat{\mathbf{z}}_1$, we have

$$\Pr\left(\mathbf{z}_1 \to \hat{\mathbf{z}}_1 \mid L(N-L), \sigma_s^2, \sigma_v^2\right) = 1, \qquad (22)$$

and $P_b^2$ in (19) reduces to [8]

$$P_b^2 = \frac{1}{\log_2 M} \sum_{i=1}^{M} \Pr\left(\theta \in \Theta_i \mid LN, \sigma_s^2, \sigma_v^2\right). \qquad (23)$$

Equations (23) and (20) in turn imply, at high SNR,

$$P_b^2 \approx G_2 \left(\sigma_s^2/\sigma_v^2\right)^{-LN} \quad \text{for some } G_2, \qquad (24)$$

that is, an $LN$-fold diversity gain can be attained in the second stage.

Fig. 2. Theoretical and simulated average BER for different constellations. (Axes: SNR in dB versus BER; curves: analytic and simulated BER for BPSK, QPSK, and 8PSK.)

However, if an erroneous decision occurs so that $\mathbf{z}_1 \ne \hat{\mathbf{z}}_1$, it can be deduced from (19)-(20) that, when $\sigma_s^2/\sigma_v^2$ is large and $\rho\,|\mathbf{z}_1 - \hat{\mathbf{z}}_1|^2$ is small,

$$P_b^2 \approx \tilde{G}_2 \left(\sigma_s^2/(\sigma_v^2 + \bar{\rho})\right)^{-LN} \left(\sigma_s^2/\sigma_v^2\right)^{-L(N-L)}, \qquad (25)$$

for some $\tilde{G}_2$ and $\bar{\rho}$. When the symbol power $\sigma_s^2$ is fixed and the noise variance $\sigma_v^2 \to 0$, equation (25) becomes

$$P_b^2 \approx G_3 \left(\sigma_s^2/\sigma_v^2\right)^{-L(N-L)}, \quad \text{where } G_3 = \tilde{G}_2 \left(\sigma_s^2/\bar{\rho}\right)^{-LN}. \qquad (26)$$

As a result, whenever inter-layer error propagation takes place, the diversity order in the second layer would be limited to only $L(N-L)$ as in the first stage; by further invoking the results in [8, p-1207], and following the above analysis procedures, it can be shown that, in the general multiuser case and subject to error propagation, the achievable diversity gain per layer is equal to that in the first stage. It is noted that a similar phenomenon has also been proved in [16] regarding an $L$-input $N$-output i.i.d. Rayleigh fading channel: imperfect symbol decision limits the diversity orders throughout the layers of an SIC receiver to $N-L+1$, which is just the attainable gain in the first layer. In our numerical test for the $Q = 2$ case (see Simulation-A), the BER curve in the second layer, however, is seen to be very close to the corresponding error-propagation-free performance. A plausible rationale behind this would be that, in the first stage, each user's signal does already enjoy $L$-fold transmit diversity due to OSTBC: the embedded signal reliability improves detection accuracy at the first layer and significantly reduces the effect of error leakage.
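As a rough numerical reading of (21), (24), and (26), take the two-user Alamouti setup used in Section V-A ($L = 2$, $N = 4$). The figures below are asymptotic slopes only; the constants $G_1$, $G_2$, and $G_3$ are left unspecified, as in the text.

```latex
\begin{aligned}
\text{Layer 1:}\quad & L(N-L) = 2\,(4-2) = 4
  && \Rightarrow\ P_b^{1} \propto \left(\sigma_s^2/\sigma_v^2\right)^{-4},\\
\text{Layer 2 (correct first decision):}\quad & LN = 2\cdot 4 = 8
  && \Rightarrow\ P_b^{2} \propto \left(\sigma_s^2/\sigma_v^2\right)^{-8},\\
\text{Layer 2 (error propagation):}\quad & L(N-L) = 4
  && \Rightarrow\ P_b^{2} \propto \left(\sigma_s^2/\sigma_v^2\right)^{-4}.
\end{aligned}
```

Each additional 10 dB of SNR therefore lowers the asymptotic BER by a factor of $10^4$ in the first layer and, absent error propagation, by $10^8$ in the second.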

**Remark:** The above analysis does not take into account signal ordering and thus would only provide a BER performance upper bound with respect to the optimally ordered case. However, even though signal ordering can lead to a certain performance gain, it does not enlarge the diversity order in each stage [11], [16].

V. SIMULATION RESULTS

This section uses several numerical examples to illustrate the performance of the proposed scheme. We consider a system of two users, each one using Alamouti's code [1] (and hence with $L = 2$ transmit antennas). The propagation channel between each user terminal and the receiver is quasi-static i.i.d. Rayleigh fading; the channels are perfectly known at the receiver. In the abscissa of all simulation figures, SNR denotes the ratio $\sigma_s^2/\sigma_v^2$.

Fig. 3. Theoretical and simulated BER per layer (8-PSK). (Axes: SNR in dB versus BER; curves: analytic and simulated BER for Layer 1 and Layer 2, and analytic BER for Layer 2 with perfect previous decision.)

*A. Corroboration of the Theoretical Performance*

This simulation compares the BER performance predicted in Section IV with the corresponding simulated outcomes ($N = 4$ antenna elements are placed at the receiver). Figure 2 shows the average BER for three different symbol constellations: BPSK, QPSK, and 8-PSK; the theoretical values (cf. (17)-(19)) are computed based on the exact formula of $\Pr\left(\theta \in \Theta_i \mid d, \sigma_s^2, \sigma_v^2\right)$ given in [8, p-1205] rather than the simplified expression (20). As we can see from the figure, the simulated results closely match the theoretical solutions. For the particular 8-PSK case, Figure 3 explicitly depicts the BER at both processing layers; the theoretical BER in the second stage assuming perfect previous symbol decision is also included (this indicates the decision result attained with the maximally achievable diversity gain). The figure shows that, in the presence of inter-layer error leakage, the simulated BER in the second stage is almost identical to the corresponding error-free benchmark solution. This suggests that the error-propagation effect could be slight (due to OSTBC), and that the increase in diversity gain remains largely intact.

*B. Comparisons with Existing Solutions*

This simulation compares the user-wise OSIC method with several existing solutions: Naguib's approaches [13], Stamoulis's method [17], and the MMSE parallel interference suppression method [14, p-325]. The system platform is the one considered in [12] with two receive antennas. Figure 4 shows the respective resultant average BER (8-PSK modulation is used). As we can see, the OSIC method leads to the best performance; Naguib's two-step approach [13, p-1806] achieves a BER level comparable to that of the MMSE-based OSIC when SNR is low, but it deteriorates as SNR increases. Also, the OSIC method is seen to significantly outperform Stamoulis's decoupled detector, which is free from error propagation but whose diversity gain for each user's signal branch is fixed to $L$. This might again confirm that, in the OSIC based detection, the effect of error propagation is slight so that a layer-wise increase in diversity gain is achieved. It is noted that Naguib's approach, Stamoulis's method, and the proposed user-wise OSIC detector all exploit the algebraic structure of OSTBC; the respective algorithm complexity measures are listed in Table II (the proposed method is implemented using the recursive scheme in [7]). As one can see, the proposed solution calls for the least computational cost.

TABLE II
FLOP COUNTS OF THREE COMPARATIVE METHODS ($D$: CONSTELLATION SIZE).

| Method | Flop count |
|---|---|
| ZF-OSIC | $5P^2Q^3/3 + PQ^3/6 - 5P^2Q^2 + 3PQ^2 + 11P^2Q/3 + Q^2/2 - 7PQ/4 + Q/2 - 2P - 1$ |
| Naguib's two-step method | $14P^2Q^3/3 - PQ^3 + (2^{D+1} - 5)P^2Q^2 + (2^{D+1} + 7/2)PQ^2 + 4P^2Q/3 + (3 \cdot 2^{D} - 11/2)PQ + (2 - 2^{D+1})Q + 3P - 2$ |
| Stamoulis's method | $23P^2Q^4/12 - 13P^2Q^3/6 - 19P^2Q^2/12 + PQ^2/4 + Q^2/2 - P^2Q/3 - 8PQ/3 - 3Q/2 - 2P + 1$ |

Fig. 4. BER performances for various signal detectors (8-PSK). (Axes: SNR in dB versus BER; curves: Stamoulis method, Ng and Sousa method, Naguib one-step method, Naguib two-step method, ZF-OSIC, and MMSE-OSIC.)

VI. CONCLUSIONS

In this paper we study OSIC based signal detection for the MU-OSTBC system. Subject to multiuser interference, it is proven that joint recovery of each user's signal, under either the ZF or the MMSE ordering criterion, can be achieved only for a restricted class of orthogonal block codes. The established user-wise ordering (detection) property potentially reduces the computations and decoding delays of OSIC based signal recovery. The average BER is given in closed form and the attainable diversity order is quantified. A numerical study shows that, in the considered scenario, error propagation does not seem to incur a severe loss in diversity gain. The proposed approach compares favorably with existing multiuser detection schemes reported for the MU-OSTBC system, in terms of both numerical performance and algorithm complexity.

APPENDIX I

DETAILED PROOF OF THEOREM 3.1

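For the $Q = 1$ case, the result is obvious since $\mathbf{F} = \alpha\mathbf{I}_P$. Assume that the result is true for an arbitrary $Q \geq 1$, that is, $\mathbf{F} \in \mathcal{F}_P(Q)$ implies $\mathbf{F}^{-1} \in \mathcal{F}_P(Q)$ for such a $Q$. We have to check that $\mathbf{F}^{-1} \in \mathcal{F}_P(Q+1)$ whenever $\mathbf{F} \in \mathcal{F}_P(Q+1)$. To see this, let us partition an arbitrary $\mathbf{F} \in \mathcal{F}_P(Q+1)$ as

$$\mathbf{F} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & \mathbf{D} \end{bmatrix},$$

where $\mathbf{A} \in \mathbb{R}^{PQ \times PQ}$, $\mathbf{B} \in \mathbb{R}^{PQ \times P}$, and $\mathbf{D} \in \mathbb{R}^{P \times P}$. We note that, since $\mathbf{F} \in \mathcal{F}_P(Q+1)$, we have (a) $\mathbf{A} \in \mathcal{F}_P(Q)$ and hence $\mathbf{A}^{-1} \in \mathcal{F}_P(Q)$ by assumption, (b) $\mathbf{D} = c\mathbf{I}_P$ for some scalar $c$, and (c) if we write $\mathbf{B} = \left[\mathbf{B}_1^T \cdots \mathbf{B}_Q^T\right]^T$, where $\mathbf{B}_i \in \mathbb{R}^{P \times P}$, then we have $\mathbf{B}_i \in \mathcal{O}(P)$. Let us similarly write

$$\mathbf{F}^{-1} = \begin{bmatrix} \bar{\mathbf{A}} & \bar{\mathbf{B}} \\ \bar{\mathbf{B}}^T & \bar{\mathbf{D}} \end{bmatrix},$$

where $\bar{\mathbf{A}} \in \mathbb{R}^{PQ \times PQ}$, $\bar{\mathbf{B}} \in \mathbb{R}^{PQ \times P}$, and $\bar{\mathbf{D}} \in \mathbb{R}^{P \times P}$. To show that $\mathbf{F}^{-1} \in \mathcal{F}_P(Q+1)$, it suffices to check that (1) $\bar{\mathbf{A}} \in \mathcal{F}_P(Q)$, (2) $\bar{\mathbf{B}} = \left[\bar{\mathbf{B}}_1^T \cdots \bar{\mathbf{B}}_Q^T\right]^T$, where $\bar{\mathbf{B}}_i \in \mathbb{R}^{P \times P}$, is such that each $\bar{\mathbf{B}}_i \in \mathcal{O}(P)$, and (3) $\bar{\mathbf{D}} = d\mathbf{I}_P$ for some scalar $d$. Properties (1)-(3) can be shown based on the inversion formula for block matrices, that is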

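$$\mathbf{M} = \begin{bmatrix} \mathbf{M}_{11} & \mathbf{M}_{12} \\ \mathbf{M}_{21} & \mathbf{M}_{22} \end{bmatrix} \;\Rightarrow\; \mathbf{M}^{-1} = \begin{bmatrix} \bar{\mathbf{M}}_{11} & \bar{\mathbf{M}}_{12} \\ \bar{\mathbf{M}}_{21} & \bar{\mathbf{M}}_{22} \end{bmatrix}, \tag{27}$$

where

$$\begin{aligned}
\bar{\mathbf{M}}_{11} &= \left(\mathbf{M}_{11} - \mathbf{M}_{12}\mathbf{M}_{22}^{-1}\mathbf{M}_{21}\right)^{-1}, \\
\bar{\mathbf{M}}_{12} &= -\left(\mathbf{M}_{11} - \mathbf{M}_{12}\mathbf{M}_{22}^{-1}\mathbf{M}_{21}\right)^{-1}\mathbf{M}_{12}\mathbf{M}_{22}^{-1}, \\
\bar{\mathbf{M}}_{21} &= -\left(\mathbf{M}_{22} - \mathbf{M}_{21}\mathbf{M}_{11}^{-1}\mathbf{M}_{12}\right)^{-1}\mathbf{M}_{21}\mathbf{M}_{11}^{-1}, \\
\bar{\mathbf{M}}_{22} &= \left(\mathbf{M}_{22} - \mathbf{M}_{21}\mathbf{M}_{11}^{-1}\mathbf{M}_{12}\right)^{-1}.
\end{aligned}$$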
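The block inversion formula (27) can be sanity-checked numerically. The NumPy sketch below assembles the four blocks of the inverse from the two Schur complements and compares the result with a direct inverse. The test matrix, its size, and the split point `k` are arbitrary illustrative choices, not values taken from the paper.

```python
import numpy as np

def block_inverse(M, k):
    """Invert M with the 2x2 block formula, splitting after row/column k."""
    M11, M12 = M[:k, :k], M[:k, k:]
    M21, M22 = M[k:, :k], M[k:, k:]
    # Inverses of the two Schur complements give the diagonal blocks.
    Mb11 = np.linalg.inv(M11 - M12 @ np.linalg.inv(M22) @ M21)
    Mb22 = np.linalg.inv(M22 - M21 @ np.linalg.inv(M11) @ M12)
    # Off-diagonal blocks follow from the diagonal ones.
    Mb12 = -Mb11 @ M12 @ np.linalg.inv(M22)
    Mb21 = -Mb22 @ M21 @ np.linalg.inv(M11)
    return np.block([[Mb11, Mb12], [Mb21, Mb22]])

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 6))
# Symmetric positive definite test matrix, so every block inverted above exists.
M = G @ G.T + 6.0 * np.eye(6)

M_inv_block = block_inverse(M, 4)
max_err = np.max(np.abs(M_inv_block - np.linalg.inv(M)))
print("max deviation from np.linalg.inv:", max_err)
```

The deviation should be at the level of floating-point round-off. A positive definite test matrix is used only because it guarantees that both diagonal blocks and both Schur complements are invertible.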

*Proof of (1):* From (27), we have $\bar{\mathbf{A}} = \left(\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{B}^T\right)^{-1} = \left(\mathbf{A} - c^{-1}\mathbf{B}\mathbf{B}^T\right)^{-1}$, where the last equality follows since $\mathbf{D} = c\mathbf{I}_P$. Since each $\mathbf{B}_i \in \mathcal{O}(P)$, with direct block matrix multiplication and using Fact 1 it is easy to show that $\mathbf{A} - c^{-1}\mathbf{B}\mathbf{B}^T \in \mathcal{F}_P(Q)$ and hence $\bar{\mathbf{A}} \in \mathcal{F}_P(Q)$, by assumption.

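*Proof of (2):* From (27), we have $\bar{\mathbf{B}} = -\bar{\mathbf{A}}\mathbf{B}\mathbf{D}^{-1} = -c^{-1}\bar{\mathbf{A}}\mathbf{B}$. Since $\bar{\mathbf{A}} \in \mathcal{F}_P(Q)$ and each $\mathbf{B}_i \in \mathcal{O}(P)$, direct block matrix multiplication together with Fact 1 shows that each submatrix $\bar{\mathbf{B}}_i \in \mathcal{O}(P)$.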

*Proof of (3):* Since $\bar{\mathbf{D}} = \left(\mathbf{D} - \mathbf{B}^T\mathbf{A}^{-1}\mathbf{B}\right)^{-1}$ and $\mathbf{D} = c\mathbf{I}_P$, it suffices to check that $\mathbf{B}^T\mathbf{A}^{-1}\mathbf{B} = c_1\mathbf{I}_P$ for some scalar $c_1$. For $1 \leq p, q \leq Q$, denote by $\mathbf{U}_{pq}$ the $(p, q)$th $P \times P$ block submatrix of $\mathbf{A}^{-1}$. Since $\mathbf{B} = \left[\mathbf{B}_1^T \cdots \mathbf{B}_Q^T\right]^T$, it is easy to verify that

$$\mathbf{B}^T\mathbf{A}^{-1}\mathbf{B} = \sum_{p,q=1}^{Q} \mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q = \sum_{p=1}^{Q} \mathbf{B}_p^T\mathbf{U}_{pp}\mathbf{B}_p + \sum_{\substack{p,q=1 \\ p \neq q}}^{Q} \mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q. \tag{28}$$

Since $\mathbf{A}^{-1} \in \mathcal{F}_P(Q)$, we have by definition $\mathbf{U}_{pp} = \eta_p\mathbf{I}_P$ for some scalar $\eta_p$: the first summation on the right-hand side of the second equality in (28) thus simplifies as $\sum_{p=1}^{Q} \mathbf{B}_p^T\mathbf{U}_{pp}\mathbf{B}_p = \sum_{p=1}^{Q} \eta_p\mathbf{B}_p^T\mathbf{B}_p = \eta\mathbf{I}_P$. On the other hand, direct multiplication and using Fact 1 shows that each $\mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q \in \mathcal{O}(P)$, which, again from Fact 1, implies that $\mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q + \mathbf{B}_q^T\mathbf{U}_{qp}\mathbf{B}_p = \alpha_{p,q}\mathbf{I}_P$. The result shows that $\sum_{p,q=1,\, p \neq q}^{Q} \mathbf{B}_p^T\mathbf{U}_{pq}\mathbf{B}_q = \tilde{\eta}\mathbf{I}_P$ and the assertion follows.
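The closure property established by the induction can be illustrated numerically for $P = 2$. The sketch below rests on two assumptions that are not quoted from the paper: $\mathcal{O}(2)$ is taken to be the set of scaled rotations $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, and $\mathcal{F}_P(Q)$ is taken to be the set of symmetric matrices with scaled-identity diagonal blocks and $\mathcal{O}(P)$ off-diagonal blocks. The helper names `rot_scale`, `random_F`, and `in_FPQ` are hypothetical.

```python
import numpy as np

P, Q = 2, 3   # block size and number of users (illustrative values)

def rot_scale(a, b):
    """Assumed element of O(2): the scaled rotation [[a, b], [-b, a]]."""
    return np.array([[a, b], [-b, a]])

def random_F(rng, diag_load=10.0):
    """Symmetric matrix with scaled-identity diagonal blocks and O(2) off-diagonal blocks."""
    F = np.zeros((P * Q, P * Q))
    for p in range(Q):
        F[p*P:(p+1)*P, p*P:(p+1)*P] = (diag_load + rng.random()) * np.eye(P)
        for q in range(p + 1, Q):
            B = rot_scale(*rng.standard_normal(2))
            F[p*P:(p+1)*P, q*P:(q+1)*P] = B
            F[q*P:(q+1)*P, p*P:(p+1)*P] = B.T   # keeps F symmetric
    return F

def in_FPQ(X, tol=1e-10):
    """Check the assumed F_P(Q) block structure of X, block by block."""
    for p in range(Q):
        for q in range(Q):
            blk = X[p*P:(p+1)*P, q*P:(q+1)*P]
            if p == q:
                target = blk[0, 0] * np.eye(P)             # scaled identity
            else:
                target = rot_scale(blk[0, 0], blk[0, 1])   # scaled rotation
            if not np.allclose(blk, target, atol=tol):
                return False
    return True

rng = np.random.default_rng(1)
F = random_F(rng)
F_inv = np.linalg.inv(F)
print("F has the assumed structure:     ", in_FPQ(F))
print("F^-1 has the assumed structure:  ", in_FPQ(F_inv))
```

Under these assumptions both checks are expected to pass, since the blocks then behave like the real representation of a complex Hermitian matrix, whose inverse is again Hermitian. This is only a spot check for one random instance with $P = 2$; it does not replace the induction argument.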

APPENDIX II

USER-WISE MMSE ORDERING

Consider the real constellation case. In the initial stage, the MMSE weight minimizing $E\left\{\left\|\mathbf{s}_c - \mathbf{W}^T\mathbf{y}_c\right\|_2^2\right\}$ is obtained as $\mathbf{W} = \left[\sigma_s^2\mathbf{H}_c\mathbf{H}_c^T + \left(\sigma_v^2/2\right)\mathbf{I}_{PQ}\right]^{-1}$. The $l$th symbol
mean-square error, i.e., $E\left\{\left|\mathbf{e}_l^T\left(\mathbf{s}_c - \mathbf{W}^T\mathbf{y}_c\right)\right|^2\right\}$, is then computed as $\mathbf{e}_l^T\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F} + \mathbf{I}_{PQ}\right]^{-1}\mathbf{e}_l$. Since $\mathbf{F} \in \mathcal{F}_P(Q)$, it is easy to see that $\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F} + \mathbf{I}_{PQ} \in \mathcal{F}_P(Q)$ and so is $\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F} + \mathbf{I}_{PQ}\right]^{-1}$ by Theorem 3.1: user-wise MMSE ordering is thus achievable in the first stage. Starting from (6) and with the per-block detect-and-cancel process, it can be checked that, at the $(i+1)$th iteration ($1 \leq i \leq Q-1$), the symbol mean-square errors are computed as the diagonal entries of the matrix $\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F}_i + \mathbf{I}_{P(Q-i)}\right]^{-1}$. Since $\mathbf{F}_i \in \mathcal{F}_P(Q-i)$, so is $\left[\left(2\sigma_s^2/\sigma_v^2\right)\mathbf{F}_i + \mathbf{I}_{P(Q-i)}\right]^{-1}$ and this guarantees user-wise MMSE ordering at each iteration.
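The layer-by-layer argument can be traced numerically for $P = 2$. The sketch below makes several assumptions that are not taken from the paper: $\mathcal{O}(2)$ is modelled as scaled rotations $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, the channel `H` is a hypothetical stack of such blocks so that $\mathbf{F} = \mathbf{H}^T\mathbf{H}$ has the required structure, $\mathbf{F}_i$ is taken as the principal submatrix of $\mathbf{F}$ for the users not yet detected (perfect cancellation), and the user with the smallest mean-square error is detected first. The power values are arbitrary.

```python
import numpy as np

P, Q, N = 2, 4, 5                 # block size, users, receive blocks (illustrative)
sigma_s2, sigma_v2 = 1.0, 0.5     # assumed symbol and noise powers

def rot_scale(a, b):
    """Assumed element of O(2): the scaled rotation [[a, b], [-b, a]]."""
    return np.array([[a, b], [-b, a]])

rng = np.random.default_rng(2)
# Hypothetical channel built from O(2) blocks, so that F = H^T H has
# scaled-identity diagonal blocks and O(2) off-diagonal blocks.
H = np.block([[rot_scale(*rng.standard_normal(2)) for _ in range(Q)]
              for _ in range(N)])
F = H.T @ H

def user_mse(F_i):
    """Diagonal of [(2*sigma_s2/sigma_v2) F_i + I]^{-1}, one row per user."""
    E = np.linalg.inv((2 * sigma_s2 / sigma_v2) * F_i + np.eye(F_i.shape[0]))
    return np.diag(E).reshape(-1, P)

remaining = list(range(Q))        # users not yet detected
order, spreads = [], []
while remaining:
    idx = np.concatenate([np.arange(u * P, (u + 1) * P) for u in remaining])
    F_i = F[np.ix_(idx, idx)]     # principal submatrix of the remaining users
    mse = user_mse(F_i)
    # Largest difference between the MSEs of symbols belonging to one user.
    spreads.append((mse.max(axis=1) - mse.min(axis=1)).max())
    best = int(np.argmin(mse[:, 0]))   # user with the smallest MSE goes first
    order.append(remaining.pop(best))

print("detection order:", order)
print("largest within-user MSE spread:", max(spreads))
```

Under these assumptions the within-user spread should stay at round-off level in every layer, which is the numerical counterpart of each user's symbols sharing one ordering metric. The detection order itself depends on the random channel draw.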

APPENDIX III

COMPLEX-VALUED CONSTELLATION CASE

We only highlight the ZF case (the MMSE case similarly follows). It suffices to check that the associated $\mathbf{F}$ matrix in each case exhibits the structure shown in Lemma 3.1. For $L = 2$ (thus full-rate code and $\mathbf{F} \in \mathbb{R}^{4Q \times 4Q}$), it is shown in [6] that $\mathbf{F} \in \mathcal{F}_P(Q)$: Theorem 3.1 implies that $\mathbf{F}^{-1} \in \mathcal{F}_P(Q)$ and hence the user-wise ordering property holds. For $2 < L \leq 4$ (half-rate code, $P = 4$ and $\mathbf{F} \in \mathbb{R}^{8Q \times 8Q}$), it is shown in [6] that, if we denote by $\mathbf{F}_{p,q}$ the $(p, q)$th $8 \times 8$ block submatrix of $\mathbf{F}$, then we have $\mathbf{F}_{p,p} = \alpha_p\mathbf{I}_8$ for some $\alpha_p$, whereas, for $p \neq q$, we have

$$\mathbf{F}_{p,q} = \begin{bmatrix} \mathbf{O}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{O}_2 \end{bmatrix}, \quad \text{where } \mathbf{O}_1 \text{ and } \mathbf{O}_2 \text{ lie in } \mathcal{O}(4). \tag{29}$$

The results can then be easily deduced based on the block diagonal structure of $\mathbf{F}_{p,q}$ in (29). Indeed, with (27) and by repeating the same procedures in Appendix I, the matrix $\mathbf{F}^{-1}$ can be proved to be of an identical form as $\mathbf{F}$, and the assertion thus follows in the initial layer. At the $i$th layer, for either $L = 2$ or $2 < L \leq 4$, we first note that each matrix $\mathbf{F}_i$ has the same structure as $\mathbf{F}$ but is of lower dimension. It can be shown by construction that $\mathbf{F}_i^{-1}$ also has this structure, and the result follows.

ACKNOWLEDGMENT

The authors thank both reviewers, whose comments improved the technical content of this paper.

REFERENCES

[1] S. Alamouti, “A simple transmit diversity scheme for wireless communications,” *IEEE J. Select. Areas in Commun.*, vol. 16, no. 8, pp. 1451-1458, Oct. 1998.

[2] R. Chen and K. B. Letaief, “High speed wireless data transmission in layered space-time trellis coded MIMO systems,” in *Proc. IEEE VTC 2003-Spring*.

[3] S. Chennakeshu and J. B. Anderson, “Error rates for Rayleigh fading multichannel reception of MPSK signals,” *IEEE Trans. Commun.*, vol. 43, no. 2/3/4, pp. 338-346, Feb./March/April 1995.

[4] Y. Dai, Z. Lei, and S. Sun, “Ordered array processing for space-time coded systems,” *IEEE Commun. Lett.*, vol. 8, no. 8, pp. 526-528, Aug. 2004.

[5] G. D. Golden *et al.*, “Detection algorithm and initial laboratory results using V-BLAST space-time communication structure,” *Electronic Lett.*, vol. 35, no. 1, pp. 14-16, Jan. 1999.

[6] C. L. Ho, J. Y. Wu, and T. S. Lee, “Block-based symbol detection for high rate space-time coded systems,” in *Proc. IEEE VTC 2004-Spring*, May 2004.

[7] C. L. Ho, J. Y. Wu, and T. S. Lee, “Group-wise V-BLAST detection of multiuser space-time dual-signaling systems,” *IEEE Trans. Wireless Commun.*, to appear.

[8] I. M. Kim, “Space-time power optimization of variable-rate space-time block codes based on successive interference cancellation,” *IEEE Trans. Commun.*, vol. 52, no. 7, pp. 1204-1213, July 2004.

[9] E. G. Larsson and P. Stoica, *Space-Time Block Coding for Wireless Communications*. Cambridge University Press, 2003.

[10] W. Li, T. A. Gulliver, and W. Chow, “A new SIC algorithm for STBC coded DS-CDMA systems,” in *Proc. IEEE 6th Circuits and Systems Symposium on Emerging Techniques: Frontiers of Mobile and Wireless Communications*, pp. 357-360, June 2004.

[11] S. Loyka and F. Gagnon, “Performance analysis of the V-BLAST algorithm: An analytical approach,” *IEEE Trans. Wireless Commun.*, vol. 3, no. 4, pp. 1326-1337, July 2004.

[12] A. F. Naguib, N. Seshadri, and A. R. Calderbank, “Increasing data rate over wireless channels: Space-time coding and signal processing for high data rate wireless communications,” *IEEE Signal Processing Mag.*, vol. 17, no. 3, pp. 76-92, May 2000.

[13] A. F. Naguib, N. Seshadri, and A. R. Calderbank, “Applications of space-time block codes and interference suppression for high capacity and high data rate wireless systems,” in *Proc. IEEE 32nd Asilomar Conf. Signals, Systems, and Computers*, vol. 2, pp. 1803-1810, Nov. 1998.

[14] B. K. Ng and E. S. Sousa, “On bandwidth-efficient multiuser space-time signal design and detection,” *IEEE J. Select. Areas in Commun.*, vol. 20, no. 2, pp. 320-329, Feb. 2002.

[15] A. J. Paulraj, R. U. Nabar, and D. A. Gore, *Introduction to Space-Time Wireless Communications*. Cambridge University Press, 2003.

[16] N. Prasad and M. K. Varanasi, “Analysis of decision feedback detection for MIMO Rayleigh-fading channels and optimization of power and rate allocation,” *IEEE Trans. Inform. Theory*, vol. 50, no. 6, pp. 1009-1025, June 2004.

[17] A. Stamoulis, N. Al-Dhahir, and A. R. Calderbank, “Further results on interference cancellation and space-time block codes,” in *Proc. IEEE 35th Asilomar Conf. Signals, Systems, and Computers*, vol. 1, pp. 257-261, Nov. 2001.

[18] M. Tao and R. S. Chen, “Generalized layered space-time codes for high data rate wireless communications,” *IEEE Trans. Wireless Commun.*, vol. 3, no. 4, pp. 1067-1075, July 2004.

[19] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” *IEEE Trans. Inform. Theory*, vol. 45, no. 7, pp. 1456-1467, July 1999.

[20] V. Tarokh *et al.*, “Combined array processing and space-time coding,” *IEEE Trans. Inform. Theory*, vol. 45, no. 4, pp. 1121-1128, May 1999.

[21] P. W. Wolniansky *et al.*, “V-BLAST: An architecture for realizing very high data rates over rich-scattering wireless channels,” in *Proc. IEEE ISSSE-98*, Italy, pp. 295-300, Sept. 1999.

[22] L. Zhao and V. K. Dubey, “Detection schemes for space-time block code and spatial multiplexing combined systems,” *IEEE Trans. Commun.*