MIMO Channel Estimation in Correlated Fading Environments

(1)

MIMO Channel Estimation in

Correlated Fading Environments

Yen-Chih Chen and Yu T. Su, Senior Member, IEEE

Abstract—This paper presents two analytic correlated

multiple-input multiple-output (MIMO) block fading channel models and their time-variant extensions that encompass the popular Kronecker model and the more general Weichselberger model as special cases. Both static and time-variant models offer compact representations of spatial- and/or time-correlated chan-nels. When the transmit antenna array is such that the associated MIMO channel has a small angle spread (AS), which occurs quite often in a cellular downlink, our models admit reduced-rank channel representations. They also provide compact channel state information (CSI) descriptions which are needed in feedback systems and in many post channel estimation applications. The latter has the important implication of reduced feedback channel bandwidth requirement and lower post-processing complexity.

Based on one of the proposed channel models we present novel iterative algorithms for estimating static and time-variant MIMO channels. The proposed models make it natural to decompose each iteration of our algorithms into two successive stages that are responsible for estimating the correlation coefficients and the signal direction, respectively. Using popular industry-approved standard channel models, we verify through simulations that our algorithms yield good MSE performance which, in many practi-cal cases, is better than that achievable by a conventional least-square estimator. The mean-least-squared error (MSE) performance of our estimators are analyzed and the resulting predictions are consistent with those estimated by simulations.

Index Terms—Channel estimation, space-time signal

process-ing, spatial correlation.

I. INTRODUCTION

I

NCREASING demand for higher wireless system capacity has catalyzed several ground-breaking transmission tech-niques, among which is the multiple-input/multiple-output (MIMO) technology that has attracted the great part of recent attention. It has been shown that in comparison with con-ventional single antenna systems, significant capacity gains are achievable when multi-element antennas (MEA) are used at both the transmit and receive sides [1]. Spatial multiplex-ing techniques, for example, the BLAST (Bell-labs Layered Space-Time) system, were developed to attain very high spectral efficiencies in rich scattering environments.

Manuscript received December 4, 2008; revised May 22, 2009 and October 15, 2009; accepted November 25, 2009. The associate editor coordinating the review of this paper and approving it for publication was M. Morelli.

This work was supported in part by Taiwan’s National Science Council under contact NSC93-2218-E-009-016, and by Computer and Communication Laboratories, Industrial Technology Research Institute, Chu-Tung, Taiwan. The material was presented in part at the PIMRC2004, Barcelona, Spain, September 2004.

Y.-C. Chen is with Realtek Inc., Hsinchu, Taiwan (e-mail: joechen@realtek.com.tw).

Y. T. Su is with the Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan (e-mail: ytsu@mail.nctu.edu.tw).

Digital Object Identifier 10.1109/TWC.2010.03.081603

Ideal rich-scattering environments decorrelate channels be-tween different pairs of transmit and receive antennas so that maximum number of orthogonal subchannels is available. In practice, however, spatial correlations do exist and should be considered when designing a MIMO receiver for evaluating the corresponding system performance [2]. Spatial correlation depends on physical parameters such as antenna spacing, antenna arrangement, and scatters’ distributions. Antenna cor-relations reduce the number of equivalent orthogonal subchan-nels, decrease spectral efficiency, making it more difficult to detect the transmitted data [1].

Among the many analytic models for spatial-correlated MIMO channels that have been proposed, the Kronecker model [2] is perhaps the most popular one. It assumes sep-arable statistics at transmitter and receiver so that the spatial correlation matrix of the vectorized channel matrix is given by the Kronecker product of those of the transmit and receive antenna arrays. The separable assumption of the Kronecker model, however, has been shown to be inappropriate by some recent experiments; see [3] and the references therein. Hence it has been modified and generalized by Sayeed [4] and, more recently, by Weichselberger et al. [3] who considered joint correlation of both link ends.

The analytic models are often used to evaluate the system capacity/performance [5],[6], to design beamformer [7] or training (pilot) sequences [8], [9] and for link level simulations [10]-[12]. These models decompose the channel matrix into terms that respectively characterize various components of the spatial structure thus contains many more parameters than those used in the original channel matrix. As a coherent MIMO receiver requires an accurate channel estimate to per-form critical operations and provide satisfactory perper-formance, these models are not suitable for channel estimation applica-tion and subsequent post-channel-estimaapplica-tion signal processing. We propose two general analytic correlated MIMO block fading models and their time-variant extensions that encom-pass the above-mentioned models as special cases. One of our models results in separable descriptions of channel cor-relations and mean angle of departure (AoD). Spatial and time covariance (or correlation) functions are described by predetermined nonparametric regressions. Unlike the other analytic correlated MIMO models, the proposed models do not incur additional parameters. On the contrary, when the angle spread (AS) of the antenna array is relatively small, they admit reduced-rank representations and compact channel state information (CSI) representation which have significant implications on complexity and bandwidth reduction in many post-channel estimation operations such as data detection and feedback beamforming.

(2)

Various pilot-assisted MIMO channel estimators have been proposed [8]-[14]. Unfortunately, few estimators are specif-ically designed for correlated MIMO channels and those few exploited only channel’s time and frequency correlation characteristics by approximating the time- and/or frequency-domain responses by an analytic model [13], [14]. Because of the special structures of the proposed models, conventional MIMO channel estimation approaches are not applicable. We present novel pilot-assisted channel estimation schemes based on one of the proposed new general MIMO channel representations which does not require information of second-order channel statistics. This representation enables us to develop efficient algorithms to identify the realistic channel responses. Although a model-based scheme inevitably induces a modelling error [13]-[16], as will be shown in Section VI, our algorithms are capable of describing realistic correlated MIMO channels with negligible modelling errors.

After a brief review of the typical space-time antenna setup and a general received MIMO signal model, we derive two new models for spatial-correlated block-faded narrowband MIMO channels and their relations with some established analytic models in Section II. We then propose single-block based iterative least squares (LS) channel estimators in the following section while the extension that takes the time-correlation into account is given in Section IV. In Section V, we analyze the mean squared error (MSE) of the proposed channel estimation algorithms. Numerical examples using in-dustrial standard approved channel models are given in Section VI to validate the proposed channel models and to demonstrate the effectiveness of our algorithms. Concluding remarks are given in the last section.

II. MODELLING OFSPATIAL-CORRELATEDMIMO CHANNELS

A. System Setup

Consider a cellular MIMO system in which the base station (BS) and a mobile station (MS) are equipped with linear arrays of 𝑀 and 𝑁 antennas, respectively.

Indepen-dent data streams x(𝑡) = [𝑥1(𝑡), 𝑥2(𝑡), 𝑥3(𝑡), ⋅ ⋅ ⋅ , 𝑥𝑀(𝑡)]𝑇

are transmitted from the BS at time 𝑡, where 𝑥𝑚(𝑡)

de-notes the source signal of the 𝑚th transmit antenna and

the superscript 𝑇 denotes vector (matrix) transposition. At

the MS, the received baseband signals are given by y(𝑡) =

[𝑦1(𝑡), 𝑦2(𝑡), 𝑦3(𝑡), ⋅ ⋅ ⋅ , 𝑦𝑁(𝑡)]𝑇, where 𝑦𝑛(𝑡) is the signal

received by the𝑛th receive antenna at time 𝑡. With a sampling

interval of △𝑡 seconds, the corresponding 𝑖th transmit and

receive sample vectors are x𝑖 = x(𝑖△𝑡), and y𝑖 = y(𝑖△𝑡),

respectively.

B. Wireless MIMO Channels

A general MIMO channel between BS and MS antennas is modelled as H(𝑡, 𝜏) = 𝐺 ∑ 𝑙=1 H𝑙(𝑡)𝛿(𝜏 − 𝜏𝑙), (1)

where 𝐺 is the maximum number of paths associated with

any sub-channel between a transmit and receive antenna pair,

𝜏𝑙 is the delay of the𝑙th path, and 𝛿 denotes the Dirac delta

function. The complex channel gain matrix associated with the

𝑙th path is given by H𝑙= [ℎ𝑙𝑖𝑗], for 1 ≤ 𝑖 ≤ 𝑁, 1 ≤ 𝑗 ≤ 𝑀,

where ℎ𝑙

𝑖𝑗 is the complex sub-channel gain between the 𝑗th

transmit and 𝑖th receive antennas. For a narrowband fading

channel, (1) is reduced to a single-tap fading matrix and the received vector waveform is y(𝑡) = H(𝑡)x(𝑡) + n(𝑡), where H(𝑡) is an 𝑁 × 𝑀 complex channel matrix and n(𝑡) a zero

mean additive white Gaussian noise (AWGN) vector with covariance matrix 𝐸{nn𝐻_{} = 𝑁}₀_I_𝑁_{. We first consider the}

block fading case in which the channel gain matrix remains unchanged within a block of𝐵 symbol intervals and eliminate

the time parameter 𝑡 in related expressions. Section IV will

discuss the case which takes the time-correlation among blocks into consideration.

C. Spatial-correlated block fading channels

Let Φ, ΦT and ΦR be the spatial correlation matrices

of vec(H), vec(⋅) being the stacking operator, the transmit and receive antennas, respectively. The separable statistics assumption of the Kronecker model impliesΦ = ΦR⊗ ΦT=

Φ1

2(Φ12)𝐻, where the “square root” matrixΦ12 has a similar

decompositionΦ1 2 = Φ 1 2 T ⊗ Φ 1 2

R and therefore gives

H = Φ12

RH𝑤Φ

1 2 𝑇

T , (2)

whereH𝑤is an𝑁 ×𝑀 channel matrix whose entries are i.i.d.

complex zero-mean, unit-variance Gaussian random variables. Weichselberger et al. [3] considered joint correlation of both link ends and suggested the following analytic model

H = UR ( ˜ Ω ⊙ H𝑤 ) U𝑇 T, (3)

whereUTandURare the eigenbases of the one-sided

corre-lation matrices at the transmit and receive sites, respectively. Operator⊙ denotes the Hadamard product operation [17]. ˜Ω is

the element-wise square root of the coupling matrix in which each entry specifies the mean amount of energy coupled with an eigenvector of the transmitter to that of the receiver.

An 𝑁 × 𝑀 matrix H always admit the singular value

decomposition (SVD), H = UΛV𝑇_{, where}_{U is an 𝑁 × 𝑁}

unitary matrix, V is an 𝑀 × 𝑀 unitary matrix, and the

diagonal matrixΛ is 𝑁 ×𝑀 with non-negative entries. When H is random, its SVD component matrices are random and

depend on the sample (matrix) value ofH. As U and V can

be transformed into two predefined unitary matrices QR and

QT by UP1 = QR andVP2 = QT, with both transforms

P1 andP2 being unitary, we have

H = QRP−11 Λ(P−12 )𝑇Q𝑇T= QRCQ𝑇T (4)

and the only random term on the right hand side isC. For the

Weichselberger model, the predefined matrices are eigenbases of the one-sided correlation matrices while Sayeed’s virtual channel representation uses the DFT bases.

Let Φ12

T 𝑑𝑒𝑓= [𝜙T(𝑖, 𝑗)], where 𝜙T(𝑖, 𝑗) represents the root

spatial correlation between𝑖th and 𝑗th transmit antennas. As

the𝑀 column vectors of Φ12

T lie in a𝐾𝑇(≤ 𝑀) dimensional

subspace, we have

(3)

whereQTis an unitary matrix and the coefficient matrixΛT

can be obtained by the Gram-Schimdt orthonormalization pro-cedure. The above equation implies𝜙T(𝑖, 𝑗) =∑𝐾_𝑘=1𝑇 𝜆𝑗_𝑘q𝑘(𝑖),

whereq𝑘(𝑖) is the 𝑖th element of the 𝑘th basis vector, 𝜆𝑗𝑘 is

the projection of the𝑗th column on q𝑘.

Using a similar decomposition for Φ1/2R leads to Φ1/2 =

(QTΛT)⊗(QRΛR) = (QT⊗ QR) (ΛT⊗ ΛR), where we have

invoked the identity [17],

(A1⊗ B1)(A2⊗ B2) ⋅ ⋅ ⋅ (A𝑘⊗ B𝑘)

= (A1A2⋅ ⋅ ⋅ A𝑘) ⊗ (B1B2⋅ ⋅ ⋅ B𝑘). (6)

¿From the canonical representation, vec(H) = Φ1

2vec(H_𝑤), we obtain vec(H) = (QT⊗ QR) (ΛT⊗ ΛR) vec(H𝑤) 𝑑𝑒𝑓_{= (Q} T⊗ QR) vec(C). (7) The identity

vec(ABD) =(D𝑇 _{⊗ A})_vec_(B) ₍₈₎

implies vec(H) = vec(QRCQ𝑇T

)

, and so H = QRCQ𝑇T,

which is the same as (4).

We summarize the above derivation on the relation between the proposed analytic model with the Kronecker, Sayeed, and Weichselberger models in

Proposition 1: An𝑁 ×𝑀 flat-faded MIMO channel matrix

H, can always be expressed as

H = QRCQ𝑇T (9)

whereC is a complex random coefficient matrix, QRandQT

are predefined unitary matrices. The above model is equivalent

to the Kronecker model if the matrixC satisfies the separable

correlation condition

vec(C) = (ΛT⊗ ΛR) vec(H𝑤) (10)

whereΛT andΛR are coefficient matrices that depend on the

spatial correlations among the transmit and the receive an-tenna arrays, respectively. (9) is related to the Weichselberger model via

UT= QTP𝐻T, UR= QRP𝐻R (11)

P𝐻

RΓRPR= 𝐸{CC𝐻}, PT𝐻ΓTPT= 𝐸{C𝑇C∗} (12)

wherePT,PRare unitary matrices andΓR, ΓThave the same

eigenvalues of the matrices 𝐸{HH𝐻} _and _𝐸{_H𝑇_H∗}_,

respectively. When the predefined matrices areUR andUT,

C has the special form ˜Ω ⊙ H𝑤. Moreover, (9) is equivalent

to the virtual representation of Sayeed if columns ofQR and

QT are DFT basis vectors and entries ofC are independent

complex Gaussian random variables.

[10] suggested and [11] verified through field measurements that the mean direction of arrival (DoA) can be embedded in the channel model by post-multiplying the channel matrixH

by a diagonal matrix which is a function of the DoA. We can derive a similar model by invoking the fact that if W is a

diagonal matrix with unit modulus entries and V is unitary

then both VW and W−1_{V are also unitary, to obtain the}

alternative representation (13).

Corollary 1: An equivalent channel matrix for stationary frequency-flat MIMO channel is given by

H = QRCQ𝑇TW (13)

where Q𝑇_TW = Q𝑇

T and W is a diagonal matrix with unit

modulus entries.

Several remarks and observations on the channel models (9) and (13) are given below.

R1. The Kronecker model requires that C has the special

structure (10) while the Weichselberger and Sayeed models demand that the entries of C be independent (but not identical) Gaussian random variables. In contrast, the proposed model does not impose any constraint on the coefficient matrixC and is valid for arbitrary

block-fadedH.

R2. For practical correlated MIMO channels, which are of particular concern to this paper, the entries of H are

not i.i.d. but correlated random variables. Although H

or the corresponding one-sided correlation matrices for a correlated channel is still likely to be of full rank, it often has a large eigen-spread and thus admits a reduced-rank representation by ignoring the weaker eigenmodes. The rank-reduction is most obvious for typical urban macro-cellular environments in which an MS is surrounded by local scatterers, and waveforms impending the re-ceive antennas are richly scattered, while the BS is not obstructed by the local scatterers [2][12]. Appendix A shows that, if the ASΔ is not too large, the diagonal matrix W

W = diag [𝑤1, 𝑤2, ⋅ ⋅ ⋅ , 𝑤𝑀] , (14)

has entries of the form𝑤𝑖 = exp

[

−𝑗2𝜋(𝑖−1)𝑑_𝜆 sin 𝜙],

𝑑 being the inter-element distance, that bear the mean

AoD information. As will become clear later, the sepa-rability of the channel correlation and angle information characterizations has some useful implications.

R3. Given predefined bases QR,QT, orQT, the statistical

properties of the corresponding coefficient matrix is completely determined by those ofC. Identification of

the unknown channelH is equivalent to the estimation

of C or the pair (C, W), which usually has a lower

(dominant) rank and thus much smaller number of entries than those of H for the link environment of

interest. Thus, using model (9) or (13) reduces the number of parameters to be estimated and enhances the performance. Moreover, as the bases in both (9) and (13) are pre-defined, these two models can be easily extended to time-varying block fading and frequency-selective fading environments.

R4. There are several classes of basis functions to choose from. The Taylor and Weierstrass arguments and the results of [21] suggest the use of polynomial bases. If we use polynomials of degree𝑃 as basis functions in

expanding a spatial correlation function of length𝑃 , the

corresponding basis matrixP𝑃 has entries

(4)

Although the column vectors in (15) form a basis, they are not orthogonal. Furthermore, these vectors have different norms, which might result in numerical instability. By applying the QR decomposition to the correspondingP𝑃 [22], we obtain an orthonormalized

polynomial basis matrix P𝑜, i.e. P𝑃 = P𝑜R𝑜, where

R𝑜 is the corresponding upper triangular matrix. Since

the columns of P0 are arranged in ascending order of

polynomial degree, (so that its 𝑘th column represents

an eigenmode describing higher correlation than that described by the 𝑙th column if 𝑙 > 𝑘), we select the

first𝐾𝑇, 𝐾𝑅or𝐾𝐿 columns to form the basis matrices

Q𝑀,𝐾𝑇 andQ𝑁,𝐾𝑅 of (17) or Q𝐿,𝐾𝐿 of (37).

R5. For a fixed base one needs to determine the modelling orders, 𝐾𝑇 and 𝐾𝑅. Either the Akaike information

criterion (AIC) or the minimum description length (MDL) approach can be used to determine the optimal modelling orders that trade-off the system complexity and performance [23]. Time domain modelling order

𝐾𝐿 discussed in Section IV can also be similarly

de-termined. Depending on the application scenario, these order parameter values can be obtained by an one-shot open loop estimate or should be periodically updated. R6. R2 and R3 indicate that, for a MIMO system with

a small-to-medium AS, the model (13) is more use-ful for post-channel-estimation application, hence our subsequent discourse will focus on this model only. Numerical results reported in Section VI also confirm that this representation leads to significant bandwidth and complexity reduction for systems which have to feedback information about or process a large MIMO channel matrix.

R7. Our simulation experiments indicate that, when the AS

△ becomes large, the dominant rank of C increases

accordingly and there is no dominant spatial angle. The steering matrix W becomes an identity matrix which

gives no AoD information and (13) degenerates to (9). III. SINGLE-BLOCKBASEDCHANNELESTIMATION In this section we consider estimation schemes which are based on a single block of observation without taking into account the (time-)correlation among blocks. We propose two iterative schemes in which an iteration consists of two phases. The first phase is responsible for the estimation of the coefficient matrix,C, while the directional matrix, W in

(13), is estimated in the second phase. Both tentative estimates are updated as one proceeds with each new iteration until the stopping criterion is met. The two schemes differ in the second phase only.

Consider the 𝑀 × 𝐵 matrix X = [x1, x2, ⋅ ⋅ ⋅ , x𝐵] formed

by 𝐵 length-𝑀 input symbol vectors, where 𝐵 ≥ 𝑀.

Assuming H remains static during a 𝐵-block period, we

express the received sample block,Y = [y1, y2, ⋅ ⋅ ⋅ , y𝐵] as

Y = HX + N, (16)

where N = [n1, n2, ⋅ ⋅ ⋅ , n𝐵] is the corresponding noise

matrix whose entries are i.i.d. zero mean complex Gaussian random variables. In estimating H, X is assumed to be

composed of either the pilot vectors or some decision feedback results. Substituting two known unitary matricesQ𝑀,𝐾𝑇 and

Q𝑁,𝐾𝑅with ranks𝐾𝑇(≤ 𝑀) and 𝐾𝑅(≤ 𝑁) for Q𝑇 andQ𝑅

in (13), we want to find the optimal solution {W𝑜𝑝𝑡, C𝑜𝑝𝑡}

to the problem arg min W,C ∥Y − Q𝑁,𝐾𝑅CQ 𝑇 𝑀,𝐾𝑇WX∥ 2 ₍₁₇₎

We express the corresponding optimal (least-squares) channel estimate in terms ofW𝑜𝑝𝑡andC𝑜𝑝𝑡

H𝑜𝑝𝑡 = Q𝑁,𝐾𝑅C𝑜𝑝𝑡Q𝑇𝑀,𝐾𝑇W𝑜𝑝𝑡 (18)

so that (16) can be rewritten as

Y = H𝑜𝑝𝑡X + ΔHX + N𝑑𝑒𝑓= H𝑜𝑝𝑡X + ˜N, (19)

where ˜N represents the sum of the modelling error ΔHX due

to the reduced rank representation and the AWGN vector,N.

To derive an iterative algorithm for obtaining the joint directional and channel solution{W𝑜𝑝𝑡, H𝑜𝑝𝑡}, we first notice

that, at the(𝑖 − 1)th iteration,

Y = ˆH𝑖−1X + Δ ˆH𝑖−1X + ˜N (20)

where Δ ˆH𝑖−1 𝑑𝑒𝑓= H𝑜𝑝𝑡− ˆH𝑖−1 is the residual error at the

end of the(𝑖 − 1)th iteration, and consider the estimation of the channel (coefficients) and AoD in two separate phases.

A. Phase I - Coefficient Estimation

Assume that the directional matrix in this phase is optimum, i.e., W = W𝑜𝑝𝑡 . From (16) and (18), we have

vec(Y) ={((W𝑜𝑝𝑡X)𝑇Q𝑀,𝐾𝑇 ) ⊗ Q𝑁,𝐾𝑅 } vec(C) + vec(N). (21) Substituting the definition Z 𝑑𝑒𝑓= ((W𝑜𝑝𝑡X)𝑇Q𝑀,𝐾𝑇) ⊗

Q𝑁,𝐾𝑅 into (21), we have the LS solution

vec(ˆC) = (Z𝐻_Z)−1_Z𝐻_vec(Y)𝑑𝑒𝑓_{= 𝐹 (W}

𝑜𝑝𝑡). (22)

While the optimal directional matrix estimate is not available, we replace it by the tentative estimation from the previous iter-ation, ˆW𝑖−1. vec(ˆC) is then obtained by computing 𝐹 (ˆW𝑖−1)

instead, and the corresponding tentative estimate is denoted by ˆC𝑖. Initially, we can arbitrarily set ˆW0 to be an identity

matrix.

B. Phase II - Direction Estimation

Similar to Phase I, we begin with the assumption that the coefficient matrix in this estimation phase is optimum. The directional information is to be obtained by estimating a diagonal matrixW with unit modulus entries; see (14). Setting

G𝑑𝑒𝑓= Q𝑁,𝐾𝑅C𝑜𝑝𝑡Q𝑇𝑀,𝐾𝑇 (23)

and invoking (18), we have ˆH𝑖−1 = GˆW𝑖−1. As C𝑜𝑝𝑡 is

unavailable, C𝑜𝑝𝑡 is replaced by the previous estimate ˆC𝑖−1

in computing G during the 𝑖th iteration. In the following,

we propose two algorithms to estimate the phase of the unit modulus diagonal entries ofW.

(5)

1) Algorithm A - Maximum Matching Output: To estimate

ˆ

W𝑖 in diagonal form, we start with the following lemma

whose proof is given in Appendix B.

Lemma 1: For two matrices A and B of size 𝑁 × 𝑀 and

𝑀 × 𝐸 respectively, and an arbitrary vector c of size 𝑀 × 1, the following identity holds.

vec(A ⋅ diag(c) ⋅ B) =[(1𝐸⊗ A) ⊙(B𝑇 ⊗ 1𝑁)]c, (24) where “diag” denotes the diagonal operation used to translate a vector into a diagonal matrix, with its diagonal terms being the elements of the original vector.

Combined with matrix G defined in (23), (19) is rewritten

as

Y = GW𝑜𝑝𝑡X + ˜N. (25)

Letw𝑜𝑝𝑡 be the column vector that consists of the diagonal

elements of W𝑜𝑝𝑡, i.e., w𝑜𝑝𝑡(𝑖) = W𝑜𝑝𝑡(𝑖, 𝑖), for any 1 ≤ 𝑖 ≤ 𝑀. Then, by Lemma 1, we have

vec(Y) =[(1𝐵⊗ G) ⊙(X𝑇⊗ 1𝑁)]w𝑜𝑝𝑡+vec

( ˜

N) (26)

and the LS estimate ofw𝑜𝑝𝑡 is given bywˆ𝐿𝑆= T†⋅ vec(Y),

where[(1𝐵⊗ G) ⊙(X𝑇 ⊗ 1𝑁)]𝑑𝑒𝑓= T.

In order to extract the steering vector w, we intro-ˆ duce v(𝜃) 𝑑𝑒𝑓= [1, 𝑣(𝜃), . . . , 𝑣𝑀−1_(𝜃)]𝑇_{, where} _{𝑣(𝜃) =}

exp[−𝑗2𝜋𝑑 𝜆sin(𝜃)

]

. The AoD information ˆ𝜙 is retrieved by

maximizing the matching output ˆ 𝜙 = arg max −𝜋≤𝜃≤𝜋Re { ℙ ( ˆw𝐿𝑆)𝐻v(𝜃) } , (27)

where ℙ(⋅) is defined by the following phase extraction operator,

ℙ([𝑎0𝑒𝑗𝑏0, 𝑎1𝑒𝑗𝑏1, ⋅ ⋅ ⋅ , 𝑎𝐾𝑒𝑗𝑏𝐾])

𝑑𝑒𝑓_{= [1, 𝑒}_𝑗(𝑏₁_−𝑏₀₎_{, ⋅ ⋅ ⋅ , 𝑒}_𝑗(𝑏_𝐾_−𝑏₀₎_],

{𝑎𝑖} ∈ ℛ𝐾+1+ , {𝑏𝑖} ∈ [0, 2𝜋) (28)

Once ˆ𝜙 is available, it is straightforward to obtain ˆW =

diag(v(ˆ𝜙)). Solving (27) over [0, 2𝜋) can be accomplished

by using the conventional line searching algorithm.

Computingwˆ𝐿𝑆in (27) involves a pseudo-inverse operation

of matrixT, and is thus computational expensive. However,

the matrix inversion is not needed if an orthogonal training sequence set, which is optimal for LS channel estimator [8], is used. This can be seen by noting that

T𝐻_{T = (G}𝐻_{G) ⊙ (X}∗_X𝑇_), ₍₂₉₎

and the right-hand side of (29) is a diagonal matrix with nonnegative real elements if X∗_X𝑇 _{= 𝐵I. In this case, we}

have ℙ ( ˆw𝐿𝑆) = ℙ(T†⋅ vec(Y))= ℙ(T𝐻⋅ vec(Y)) 𝑑𝑒𝑓_{= ℙ}(_˜ˆw 𝐿𝑆 ) , (30)

The AoD information can thus be obtained simply by substi-tuting ˜wˆ𝐿𝑆, which is obtained without matrix inversion, for

ˆ

w𝐿𝑆 in (27).

2) Algorithm B - Root Finding Method: An alternative way

to find the optimal phase is to convert (27) into a root finding problem. Note that the elements of w𝑜𝑝𝑡 are of geometric

progression, i.e., they form a row vector of a Vandermonde matrix. Hence if we define the correlation polynomial

𝑃 (𝑧)𝑑𝑒𝑓= ℙ( ˆw𝐿𝑆)𝐻z − 𝑀, (31)

wherez = [1, 𝑧, . . . , 𝑧𝑀−1_{] and let 𝒵 be the set of its zeros}

in the complex plane, then solving (27) is equivalent to ˆ𝑧 = arg min 𝑧∈𝒵∣(∣𝑧∣ − 1)∣ and ˆ𝜙 = sin −1(−Arg{ˆ𝑧}𝜆 2𝜋𝑑 ) (32) and the directional matrix is reconstructed by ˆW = diag(ˆz),

where ˆz = [1, ˆ𝑧, ⋅ ⋅ ⋅ , ˆ𝑧𝑀−1_{]. Similar to Algorithm A, if}

the orthogonal training matrix is used, ˜wˆ𝐿𝑆 of (30) can be

substituted for wˆ𝐿𝑆 in (31) to simplify the computation.

While the accuracy of Algorithm A relies on the resolution the numerical search algorithm used, this algorithm gives the exact analytic solution once (31) is solved.

C. Convergence and Complexity

Since the object function in (17) is jointly convex with respect to C and W and the proposed algorithms have the

form of a nonlinear Gauss-Seidel algorithm, the convergences of our algorithms are guaranteed [24]. All the simulation examples reported in Section VI converge and achieve the theoretical performance lower bound derived in Section V.

The computation complexity of the proposed algorithm is dominated by the LS operations in Phase I and Phase II. The flop counts of the LS operation in Phase I is O(𝐵𝐾2

𝑇),

𝐾𝑇 ≤ 𝑀 while the conventional LS estimator needs O(𝐵𝑀2)

flops [25]. Phase II’s complexity is of the same order as that of the conventional LS estimator, thanks to the special structure

of T. By using an orthogonal training matrix, the

pseudo-inversion in (22) for Phase I can be simplified to a matrix-vector product, which needs no matrix inversion operation as the modelling matrix is unitary and ˆW has unit modulus terms.

The pseudo-inverse operation of Phase II can also be replaced by the product of T𝐻 _{and vec}_{(Y), and the complexity is}

reduced as well. Moreover, except for static channels, the estimates for bothW and C need to be updated periodically.

Let each 𝐵−symbol interval be called an estimation interval

(EI). Since the mean AoD usually change much slower than the channel coefficients (gains) variation, updating frequencies

for W and C can and should be different, i.e., if the two

estimates are updated every𝑇𝑐

𝑜 and𝑇𝑜𝑤EIs, respectively, then

𝑇𝑤

𝑜 ≫ 𝑇𝑜𝑐 (see Fig. 8 of Section VI). This dual updating

frequency option is unique to our approach and implies that

Phase II may be disabled most of the time while Phase I

needs single iteration per update EI, hence our algorithm may be computational more efficient than the conventional LS approach for many non-static channels.

The major advantage of our channel model and estimator lies not in the computational efficiency of the channel esti-mator but in the compactness of CSI representation which is needed in a feedback system and that of post-channel-estimation operations. As mentioned in R2 and R3, a small

(6)

with high transmit spatial correlation. For any post channel estimation operation associated withH, e.g., taking

pseudo-inverse or eigen-decomposition ofH, the computing load is

reduced as it involves the𝐾𝑅×𝐾𝑇 coefficient matrix and the

estimated AoD instead of the original𝑁 ×𝑀 channel matrix.

IV. CHANNELESTIMATION WITHTIMECORRELATION CONSIDERATION

We now extend our investigation to the case that the time correlation among blocks has to be taken into account. Like our spatial modelling approach, we use a set of orthonormal basis functions to describe a snap shot of a fading channel’s time domain behavior. We assume an equally spaced pilot-block arrangement. The issue of the optimal pilot arrangement that minimizes the MSE or bit error rate (BER) was addressed in [9] and [26].

Assuming the two leading pilot symbol vectors of two consecutive pilot blocks are 𝑇 symbol intervals away, we

express the receive signal block at time𝑛𝑇 as

Y𝑛 = H𝑛X𝑛+ N𝑛 (33)

whereY𝑛= Y(𝑛𝑇 ) and X𝑛 = X(𝑛𝑇 ) are the 𝑁 ×𝐵 receive

matrix at time 𝑛𝑇 and the corresponding 𝑀 × 𝐵 transmit

block, respectively.H𝑛 is the 𝑁 × 𝑀 matrix whose (𝑖, 𝑗)th

entry represents the link gain between the𝑖th transmit and the 𝑗th receive antennas at time 𝑛𝑇 .

We consider the time-variant behavior of a MIMO channel within a fixed observation window of 𝐿 blocks (EIs). The

received sample blocks from 𝑛𝑇 to (𝑛 + 𝐿 − 1)𝑇 can be

cascaded into the matrix

Y𝑛,𝐿𝑑𝑒𝑓= [Y𝑛, Y𝑛+1, . . . , Y𝑛+𝐿−1] . (34)

Using (8), we obtain

vec(Y𝑛,𝐿) =(X𝑇𝑛,𝐿⊗ I𝑁)⋅ vec (H𝑛,𝐿) + vec (N𝑛,𝐿) (35)

where vec(H𝑛,𝐿) 𝑑𝑒𝑓= [vec(H𝑛)𝑇, . . . vec(H𝑛+𝐿−1)𝑇]𝑇,

vec(N𝑛,𝐿)𝑑𝑒𝑓= [vec(N𝑛)𝑇, . . . vec(N𝑛+𝐿−1)𝑇]𝑇, and

X𝑇 𝑛,𝐿𝑑𝑒𝑓= ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ X𝑇 𝑛 0 ⋅ ⋅ ⋅ 0 0 X𝑇 𝑛+1 ... 0 .. . ... ... ... 0 0 ⋅ ⋅ ⋅ X𝑇 𝑛+𝐿−1 ⎤ ⎥ ⎥ ⎥ ⎥ ⎦. Substituting (9) for each H𝑛 and assuming the eigenbases

QTandQRremain invariant during an estimation period, we

obtain

vec(H𝑛,𝐿) = (I𝐿⊗ QT⊗ QR) Γ𝑛,𝐿. (36)

Each component of the vector Γ𝑛,𝐿 =

[

𝛾𝑇

𝑛, 𝛾𝑛+1𝑇 , ⋅ ⋅ ⋅ , 𝛾𝑛+𝐿−1𝑇

]𝑇 _{is itself an} _{(𝑁𝑀) × 1 column}

vector 𝛾𝑛 = [𝛾1𝑛, 𝛾2𝑛, ⋅ ⋅ ⋅ , 𝛾(𝑁𝑀)𝑛]𝑇 that represents the

complex fading coefficients for all 𝑁𝑀 MIMO subchannels

at time𝑛𝑇 and, 𝛾𝑝𝑛,1 ≤ 𝑝 ≤ 𝑁𝑀, are independent.

The stacked vector, 𝛾(𝑝) =

[

𝛾𝑝𝑛, 𝛾𝑝(𝑛+1), ⋅ ⋅ ⋅ , 𝛾𝑝(𝑛+𝐿−1)

]_𝑇

, represents a finite-duration sample of the complex random process associated with the

𝑝th subchannel [11]. Such a process can also be expanded by

a set of smooth functions [14], [15], and thus its estimation can be obtained by using a method similar to that developed in the previous section. Hence, we can first apply the orthogonal transform 𝛾(𝑝) = Q𝐿b𝑝𝑛, where Q𝐿 is an

𝐿 × 𝐿 orthogonal matrix, and b𝑝𝑛 is the transform domain

coefficient vector. Then, the time domain channel correlation can be approximated by using the reduced basis matrix

Q𝐿,𝐾𝐿

𝛾(𝑝) ≈ Q𝐿,𝐾𝐿c𝑝𝑛, Γ𝑛,𝐿≈ (Q𝐿,𝐾𝐿⊗ I𝐿𝑀𝑁) ccoef, (37)

where𝐾𝐿 denotes the time domain modelling order, andc𝑝𝑛

is a𝐾𝐿× 1 coefficient vector.

By using (13), (36) and the approximation (37), we decou-ple the signal part of (35) into the product of two modelling domains - space and time domains

vec( ¯Y𝑛,𝐿) ≈(X𝑇𝑛,𝐿⊗ I𝑁) [Q𝐿,𝐾𝐿⊗ (W𝑇QT) ⊗ QR ] ccoef ≈(X𝑇 𝑛,𝐿⊗ I𝑁) [Q𝐿,𝐾𝐿⊗ (W𝑇QT,𝐾𝑇) ⊗ QR,𝐾𝑅 ] ˜ccoef 𝑑𝑒𝑓₌ (_((W 𝐿X𝑛,𝐿)𝑇Q˜T,𝐾𝑇) ⊗ QR,𝐾𝑅 ) ˜ccoef (38) where W𝐿 𝑑𝑒𝑓= (I𝐿⊗ W), ˜QT,𝐾𝑇 𝑑𝑒𝑓_{= Q} 𝐿,𝐾𝐿⊗ QT,𝐾𝑇 and

QT,𝐾𝑇 and QR,𝐾𝑅 are composed of 𝐾𝑇 and 𝐾𝑅 column

vectors ofQ_TandQR, respectively.W is the steering matrix

defined in (14). Since the mean AoD usually varies slowly with respect to a sub-channel’s coherent time, we assume that

W remains the same during a period of 𝐿 data blocks. Just

as the narrowband case (13), we do not impose the implicit Kronecker structure and Gaussian assumption on˜ccoef.

As (38) can be obtained by replacing X, Y, W, vec(C), Q𝑀,𝐾𝑇, and Q𝑁,𝐾𝑅 in (21) by X𝑛,𝐿, Y𝑛,𝐿, W𝐿, ˜ccoef,

˜

QT,𝐾𝑇, and QR,𝐾𝑅, we conclude that both spatial and time

correlations can be described by similar models. Hence, the two-phase iterative estimation scheme developed in Section III can be extended to estimate the coefficient vector˜ccoef, and the

directional matrix W𝐿 in (38). In the following, we describe

two-phase channel estimation schemes with time correlation consideration.

A. Phase I - Coefficient Estimation

Following an argument similar to that used in Section III, we assume that the directional matrix W𝐿 is optimal in the

coefficient estimation phase and define ˜Z𝑑𝑒𝑓₌ (_(W

𝐿,𝑜𝑝𝑡X𝑛,𝐿)𝑇Q˜T,𝐾𝑇

)

⊗ QR,𝐾𝑅. (39)

The LS estimate of˜ccoef is

ˆ˜ccoef= (˜Z𝐻˜Z)−1˜Z𝐻vec(Y𝑛,𝐿)𝑑𝑒𝑓= ˜𝐹 (W𝐿,𝑜𝑝𝑡), (40)

which is a function of the optimal directional matrixW𝐿,𝑜𝑝𝑡.

At the𝑖th iteration, since the optimal directional matrix is not

available, the tentative estimation, ˆW𝐿,𝑖−1, replacesW𝐿,𝑜𝑝𝑡. B. Phase II - Direction Estimation

Similar to the single-block based case, we propose two AoD estimation algorithms. Again, we assume the optimal coefficient vector is available, i.e., ˜ccoef = ˜ccoef,𝑜𝑝𝑡, when

(7)

Define a new matrix ˜G𝑑𝑒𝑓= QR,𝐾𝑅C˜coef,𝑜𝑝𝑡Q˜𝑇T,𝐾𝑇, where

˜

Ccoef,𝑜𝑝𝑡 is a 𝐾𝑅× 𝐾𝐿𝐾𝑇 matrix derived from ˜ccoef,𝑜𝑝𝑡 by

˜

Ccoef,𝑜𝑝𝑡(𝑖, 𝑗) = ˜ccoef,𝑜𝑝𝑡(𝐾𝑅(𝑗 − 1) + 𝑖), 1 ≤ 𝑖 ≤ 𝐾𝑅, 1 ≤

𝑗 ≤ 𝐾𝐿𝐾𝑇. We rewrite the received matrix in vector form

vec(Y𝑛,𝐿) = vec ( ˜ GW𝐿X𝑛,𝐿 ) + ˜N𝑛,𝐿 =(X𝑇 𝑛,𝐿⊗ ˜G ) vec(I𝐿⊗ W) + ˜N𝑛,𝐿, (41)

where ˜N𝑛,𝐿 represents the sum of the modelling error

asso-ciated with ˜G and the AWGN term N𝑛,𝐿.

1) Algorithm A - Maximum Matching Output: If W is

constrained to be a diagonal matrix, i.e.,W = diag(w), then

I𝐿⊗ W = diag(1𝐿⊗ w) and therefore

vec(Y𝑛,𝐿) = vec ( ˜ G ⋅ diag(1𝐿⊗ w) ⋅ X𝑛,𝐿 ) + ˜N𝑛,𝐿. (42)

¿From Lemma 1, we have

vec(G ⋅ diag(1˜ 𝐿⊗ w)) ⋅ X𝑛,𝐿 ) = ((1𝐿𝐸⊗ ˜G ) ⊙(X𝑇 𝑛,𝐿⊗ 1𝑁))(1𝐿⊗ I𝑀)w 𝑑𝑒𝑓_{= ˜}_Tw. ₍₄₃₎

The LS estimate ofw𝑜𝑝𝑡, like its counterpart in Algorithm A

of the previous subsection, is given bywˆ𝐿𝑆= ˜T†⋅vec(Y𝑛,𝐿).

To improve the estimate and reconstruct a steering

vec-tor w, we analogously define a steering vector v(𝜃)ˆ 𝑑𝑒𝑓=

[

1, 𝑣(𝜃), ⋅ ⋅ ⋅ , 𝑣𝑀−1_(𝜃)]𝑇_{, where}_{𝑣(𝜃) = exp(−𝑗2𝜋}𝑑

𝜆sin(𝜃)).

The AoD information ˆ𝜙 can be retrieved by

ˆ

𝜙 = arg max

−𝜋≤𝜃≤𝜋Re

{

ℙ( ˆw𝐿𝑆)𝐻v(𝜃)}, (44)

whereℙ denotes the phase extraction operator defined by (28). Having obtained ˆ𝜙, we then proceed to compute ˆW𝐿= I𝐿⊗

V(ˆ𝜙), where V(ˆ𝜙) = diag(v(ˆ𝜙)).

The pseudo-inverse operation ˜T† _{is not necessary if the}

orthogonal training matrix is used forX𝑛, i.e.,X𝑛X𝐻𝑛 = 𝐵I

for each𝑛. We then have

ℙ( ˜T†_{⋅ vec (Y} 𝑛,𝐿) ) = ℙ( 𝑑𝑒𝑓 = ˜ˆw𝐿𝑆 ˜ T𝐻_{⋅ vec (Y} 𝑛,𝐿) ), (45)

and wˆ𝐿𝑆 is to be replaced by ˜wˆ𝐿𝑆 defined in the above

equation.

2) Algorithm B - Root-Finding Method: The root-finding

approach for the block fading case can be used as well. It is easy to see that (44) is equivalent to searching for the root of the correlation polynomial𝑃 (𝑧) which is the closest to the

unit circle, i.e., ˆ𝑧 = arg min

𝑧 ∣∣𝑧∣ − 1∣,

subject to 𝑃 (𝑧)𝑑𝑒𝑓= ℙ( ˆw𝐿𝑆)𝐻z − 𝑀 = 0 (46)

and then retrieving the AoD information from ˆ𝑧 = exp[−𝑗2𝜋𝑑

𝜆sin(ˆ𝜙)

]

. The directional matrix is to be recon-structed by ˆW𝐿= I𝐿⊗ diag(ˆz), where ˆz = [1, ˆ𝑧, . . . , ˆ𝑧𝑀−1].

Similarly, using orthogonal training matrices, we replacewˆ𝐿𝑆

by ˜wˆ𝐿𝑆 to avoid pseudo-matrix inversion.

The total complexity per block of the proposed algorithm, like the single-block based case, is still larger than the conven-tional LS estimator when the dual updating frequencies option is not used. However, if the operating scenario allows the use of the latter option, the complexity can be asymptotically reduced to 𝐾2𝑇

𝑀2 of that of the conventional LS method if𝑇_𝑜𝑤≪

𝑇𝑐

𝑜. Furthermore, by using an orthogonal training matrix, we

need no matrix inversion in both Phase I and Phase II and significantly reduce the computing complexity. For slowing time-variant channels, the required time domain modelling order, 𝐾𝐿, is small, the number of channel representation

parameters is reduced from 𝐿𝑀𝑁 to 𝐾𝐿𝐾𝑇𝐾𝑅+ 1. Such

a reduction yields compact CSI representation and benefits many post channel estimation operations involvingH, as was

discussed at the end of last section.

V. PERFORMANCEANALYSIS In analyzing the MSE performance

𝜖𝑑𝑒𝑓= 𝐸{∥H − ˆH∥2 𝐹 } = 𝐸{∥vec(H) − vec( ˆH)∥2 2 } . (47)

of the proposed ˆH, we first make the optimistic assumptions

that the optimal orthogonal training matrix [8] for conventional LS channel estimator is used and the directional matrix estimate ˆW is perfect.

Notations

For notational simplicity and when there is no danger of ambiguity,H and W in this section denote the channel and

directional matrices of (16)/(17) or (35)/ (38) for single-block based or time-correlated based estimators, andX𝑝 represents

X in (21) or X𝑛,𝐿 in (38). Furthermore, QT andQR denote

either the modelling bases Q𝑀,𝐾𝑇 and Q𝑁,𝐾𝑅 in (21), or

˜

QT,𝐾𝑇 andQR,𝐾𝑅 in (38).

Then (47) can be expressed as

𝜖(X𝑝; W)=𝐸 { ∥vec(H) − vec(QRCQˆ 𝑇TW)∥22 } =𝐸{∥vec(H) − ΨΩ𝑧vec(HX𝑝+ N)∥22 } (48) where Ψ 𝑑𝑒𝑓= (W𝑇_Q_T_{) ⊗ Q}_R _and _Ω_𝑧 𝑑𝑒𝑓_{= (ˆZ}𝐻ˆZ)−1ˆZ𝐻_{, ˆ}_Z

being the LS estimate of Z defined in III-A, i.e.,

ˆZ𝑑𝑒𝑓_{= ((WX}

𝑝)𝑇QT) ⊗ QR (49)

AsHX𝑝andN are statistically independent, the MSE can be

separated into two terms which are contributed by modelling error (reduced-rank basis matrices) and AWGN, respectively.

𝜖(X𝑝; W) = 𝐸{∥vec(H) − ΨΩ𝑧vec(HX𝑝)∥22 } + 𝐸{∥ΨΩ𝑧vec(N)∥22 } 𝑑𝑒𝑓_{= 𝜖} ℎ(X𝑝, W) + 𝜖𝑛(X𝑝, W) (50)

By defining the projections PW 𝑑𝑒𝑓=

[ W𝑇_Q_T_(Q𝑇 TW∗X∗𝑝X𝑇𝑝W𝑇QT)−1Q𝑇TW∗X∗𝑝X𝑇𝑝 ] ⊗ QRQ𝑇R and ˜PW 𝑑𝑒𝑓= [W𝑇QT(Q𝑇TQT)−1Q𝑇TW∗ ] ⊗ QRQ𝑇R. we

rewrite the first term on the RHS of (50) as

𝜖ℎ(X𝑝; W) = 𝐸∥(I − PW) vec(H)∥22 = tr((I − ˜P𝐻 W)(I − ˜PW)Rℎ ) = 𝜒 ∑ 𝑘=1 𝜆𝑘∥(I − ˜PW)f𝑘∥22 (51)

(8)

where Rℎ = 𝐸{vec(H)vec(H)𝐻} is the channel

correla-tion matrix and f𝑘 is Rℎ’s eigenvector associated with the

eigenvalue𝜆𝑘, 𝜆1 ≥ 𝜆2 ≥ ⋅ ⋅ ⋅ ≥ 𝜆𝜒; 𝜒 being the degree of

freedom of H. For the single-block based case, 𝜒 = 𝑁𝑀

and it is equal to 𝑁𝑀𝐿 when the estimator considers the

time correlation effect. (51) is valid since the orthogonal training matrix X𝑝 is used. Let 1 < 𝐾 ≤ 𝜒 be the rank

of the dominant signal subspace of the channel covariance matrix. Then Rℎ = ∑𝜒𝑘=1𝜆𝑘f𝑘f𝑘𝐻 ≃

∑_𝐾

𝑘=1𝜆𝑘f𝑘f𝑘𝐻, with

𝜆𝑘 ≪ 1 for 𝐾 < 𝑘 ≤ 𝜒. Since ∥(I − ˜PW)f𝑘∥22 ≤ 1,

we have ∑𝜒_𝑘=𝐾+1𝜆𝑘∥(I − ˜PW)f𝑘∥22 ≤ ∑𝐾𝑘=𝐾+1𝑠 𝜆𝑘 ≪ 1.

Let the compound modelling order 𝐾𝑠 be equal to 𝐾𝑇𝐾𝑅

and 𝐾𝑇𝐾𝑅𝐾𝐿 for the two cases under investigation. If 𝐾𝑠 is chosen to be larger than 𝐾, the rank of Rℎ, i.e.,

𝐾 < 𝐾𝑠 ≤ 𝜒, and the basis matrices QT and QR span

the dominant signal subspace of Rℎ, then the matrix ˜PW

is a projection operator whose range lies mostly in the space spanned by {f𝑘}, 1 ≤ 𝑘 ≤ 𝐾 and we conclude that

∥(I − ˜PW)f𝑘∥22 𝑑𝑒𝑓= ∣ ˜P⊥Wf𝑘∥22 ≪ 1, for 1 ≤ 𝑘 ≤ 𝐾.

Therefore, the modelling error𝜖ℎis negligible in this case. On

the other hand, if the modelling order is not enough to span the signal subspace, there is under-modelling error contributed by those non-negligible terms 𝜆𝑘∥(I − ˜PW)f𝑘∥22 which will

dominate the mean squared error when the AWGN is small (high SNR region).

As for the MSE due to thermal noise–the second term on the RHS of (50), we can show that

𝜖𝑛(X𝑝, W) = 𝐸{∥ΨP𝑧vec(N)∥22 } = tr ( 𝑁0 𝐵 ˜PW ) = 𝑁_𝐵0𝐾𝑠, (52)

where we have invoked the facts that (i) the training signal

X𝑝 and the noise N are independent, (ii) unitary training

matrix is used, and (iii) elements ofN is i.i.d. complex white

Gaussian noise with variance 𝜎2

𝑛 = 𝑁0. (52) implies that

thermal noise induced MSE can be reduced by using a small modelling order. In Section VI (Figs. 3-5), we find that this noise-reduction effect is significant in low SNR environments where thermal noise dominates the MSE performance while the modelling error of (51) dominates in high SNR region.

If ˆW is not perfect and W = ˆW + ΔW, then

ˆZ𝑑𝑒𝑓_{= Z + ΔZ}

= ((WX𝑝)𝑇QT) ⊗ QR+ ((ΔWX𝑝)𝑇QT) ⊗ QR. (53)

The coefficient vector estimation vec(ˆC) can be approximated

up to the first order ofΔZ as [27]

vec(ˆC) ≃ vec(C) − Z†_{ΔZvec(C) + Z}†_vec(N)

+ (Z𝐻_Z)−1_ΔZ𝐻_𝑃⊥

Zvec(N) − Z†ΔZZ†vec(N), (54)

where𝑃⊥

Z = I−Z(Z𝐻Z)−1Z𝐻. The above equation indicates

that, besides the terms that have to do with the noise N,

the coefficient vector estimation error is determined by the projection errorΔZ. Hence, when the projection error ΔW is small (and thusΔZ is small), vec(ˆC) is a good approximation

of vec(C) at high SNR region.

VI. NUMERICALRESULTS ANDDISCUSSION Simulation results reported here use the reference MIMO channel model [28], the IEEE 802.11 TGn channel model [29], and the SCM model [30]. The former two are stochastic models whose spatial correlation matrices are generated by the power azimuth spectrum (PAS) at the BS and MS, respectively. The SCM model generates the channel coefficients according to a set of selected parameters (e.g., AS, AoD, AoA, etc.). It is a popular parametric stochastic model whose spatial cross correlations are functions of the joint distribution of the AoD at the transmit side and the AoA at the receive side. We assume that the environment surrounding MS is rich scattering with negligible spatial correlations. Hence, a full rank basis matrix is used to characterize the spatial correlation at the receive side. Other assumptions and conditions used in our simulation are: (i) the antenna spacings at transmit and receive arrays are both 0.5𝜆, (ii) an orthogonal training matrix is used, (iii) 10 iterations are used for all simulations (although in most cases convergence occurs in less than 3 iterations), and (iv) SNR (𝐸𝑏/𝑁0) is defined as the average

signal to noise power ratio at the input of each receive antenna, (v) orthonormal polynomial basis matrices are used. Both algorithms compute ˆH by substituting the final result of Phase

I–estimated coefficient matrix ˆC–and that of Phase II–ˆW–into

(18).

Solid curves in Fig. 1 are the MSE performance of

Algo-rithm B in Section III for an8×8 MIMO system with Δ = 2∘

and are based on the channel model of [28]. The channel is a block fading channel with an approximated rank of two. Since the BS spatial correlations are high, the corresponding correlation function lies in a low-dimension subspace so that a small 𝐾𝑇 is sufficient to describe the channel. Dotted curves

in Fig. 1 show the system performance when Δ = 15∘_.

It is obvious that as Δ increases, the spatial correlations among the transmit antennas elements decrease and a higher modelling order is necessary to describe rapid-changing spatial waveforms at the transmitter side. However, as can be seen from Figs. 2-5, an optimal𝐾𝑇 exists for any given SNR and

Δ; increasing the modelling order does not necessary improves the channel estimator’s performance. As expected, we find that modelling errors dominate the MSE performance when SNR is high. Such a behavior is consistent with what the performance analysis given in Section V has predicted and is similar to those observed in other model-based approaches [13]-[16].

The MSE performance of Algorithm B of Section IV for a time-correlated fading channel [28] are depicted in Fig. 2 and Fig. 3 using an observation window of 12 EIs and

𝑓𝑑𝑇𝑠 = 0.031772 or 0.015886. Similar to the single-blocked

based case (Fig. 1), the processing dimension (𝐾𝑇) can be

drastically reduced provided that either the spatial or time domain correlation is high enough. Performance degradation occurs when the modelling order is not large enough to capture the channel characteristics. In Fig. 4, we compare the theoretical MSE derived in Section V with the simulated performance and find that the latter is very close to the theoretical bound which assumes a perfect ˆW. When used for

estimating other reference channels, the proposed estimators exhibit similar performance behaviors. Fig. 5 depicts the

(9)

1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Normalized Mean Squared Error

Eb/N0 (dB) KT=2 (AS=2°) KT=3 (AS=2°) KT=4 (AS=2°) KT=5 (AS=15°) KT=6 (AS=15°) KT=7 (AS=15°) Least Squared

Fig. 1. MSE performance of Algorithm B as a function of SNR with different modelling orders; solid curves: AS=2∘_{, dotted curves: AS=15}∘_.

1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) KT=3, KL=4 (fdTs=0.015886) KT=3, KL=5 (fdTs=0.015886) KT=4, KL=5 (fdTs=0.015886) KT=3, KL=4 (fdTs=0.031772) KT=3, KL=5 (fdTs=0.031772) KT=4, KL=5 (fdTs=0.031772) Least Squared

Fig. 2. The effect of the modelling order on Algorithm B’s MSE performance in a channel generated by the model described in [10] with AS=2∘_.

MSE performance in an IEEE 802.11 TGn channel [29] with

𝐿 = 12, Δ = 15∘_{, and}_𝑓

𝑑𝑇𝑠= 0.0022, while Fig. 6 shows the

MSE performance in a 3GPP-SCM channel [30] with𝐿 = 12,

Δ = 15∘_and_𝑓_𝑑_𝑇_𝑠_{= 0.02844. When 𝐾}_𝑇 _{is large enough, the}

time-domain modelling order needed to characterize a slow fading channel like the IEEE 802.11 TGn channel is smaller than that for a fast fading SCM channel. Note that in all cases, the performance becomes independent of the AS when the full modelling order is used (i.e.,𝐾𝑇 = 8) and is equivalent to

that of the conventional LS approach.

The remaining numerical results assume that the algorithms developed in Section III are used and, except for Fig. 8, the same channel model as that used for Fig. 1. Fig. 7 compares the MSE performance of Algorithms A and B developed in Section III whenΔ = 15∘_{. If the maximum matching output}

is obtained by selecting the best one from the outputs using 100 candidate phases uniformly distributed within[−𝜋, 𝜋),

Al-gorithm A and AlAl-gorithm B give almost identical performance.

However, if only 20 candidate phases are used, Algorithm A results in a little performance degradation with respect to that obtained by Algorithm B when SNR is high. Fig. 8 examines the MSE performance when ˆW is updated with different EI

lengths for various channel settings. Smaller performance loss

1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) KT=7, KL=4 KT=8, KL=4 KT=7, KL=5 KT=8, KL=5 KT=7, KL=6 KT=8, KL=6 Least Squared

Fig. 3. The effect of modelling order on Algorithm B’s MSE performance in a channel generated by the model described in [10] with AS=15∘ _and 𝑓𝑑𝑇𝑠=0.031772. 1e-05 0.0001 0.001 0.01 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) KT=5, KL=4 KT=6, KL=4 KT=7, KL=4 KT=8, KL=4 KT=5, KL=5 KT=6, KL=5 KT=7, KL=5 KT=8, KL=5 KT=5, KL=4 (Theory) KT=6, KL=4 (Thoery) KT=7, KL=4 (Thoery) KT=8, KL=4 (Thoery) KT=5, KL=5 (Theory) KT=6, KL=5 (Theory) KT=7, KL=5 (Theory) KT=8, KL=5 (Theory)

Fig. 4. Comparison of theoretical and simulated MSE performance of

Algorithm B in a channel generated by the model described in [10]; AS=15∘

and𝑓𝑑𝑇𝑠=0.031772.

results if the channel is more static or less dynamic (smaller

𝑓𝑑𝑇𝑠). When𝐾𝑇 ≥ 3 for Channel-1 [28] and 𝐾𝑇 ≥ 2 for

Channel-2 [29], the performance loss is negligible for all the update frequencies. Recall that more computation saving is obtained by a larger 𝑇𝑤

𝑜. It is clear that our reduced-order

modelling approach outperform the conventional LS estimator for most 𝐸𝑏

𝑁0 when a proper𝐾𝑇 is used.

To show the advantage of the proposed schemes in post pro-cessing, we demonstrate here a feedback eigen-beamforming scheme. This scheme can adapt to the feedback estimated CSI to optimize the reception performance in a spatial correlated environment [31]. To use the optimal beamforming scheme of [31] which minimizes the mean squared error between the transmit symbol and equalized received sample, we need to substitute QRCQˆ 𝑇TW for H in (7), (16) and (22) of [31].ˆ

The flops of the eigen-decomposition operation needed by the beamforming system decrease from O(𝑀3_{) to O(𝐾}3

𝑇).

Moreover, the total number of feedback floating-point vari-ables in our approach is 𝐾2

𝑇 + 𝐾𝑇 + 1 while that of the

conventional CSI is𝑀2_{+ 𝑀. MSE in Fig. 9 is defined as the}

mean power of the error vector between a transmit data vector and the corresponding received/equalized vector. Simulation

(10)

1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) KT=5, KL=1 KT=6, KL=1 KT=7, KL=1 KT=8, KL=1 KT=5, KL=2 KT=6, KL=2 KT=7, KL=2 KT=8, KL=2 KT=5, KL=3 KT=6, KL=3 KT=7, KL=3 KT=8, KL=3 Least Squared

Fig. 5. The effect of the modelling order on the MSE performance of

Algorithm B in a channel generated by IEEE 802.11 TGn channel model A;

AS=15∘_{, and}_𝑓 𝑑𝑇𝑠=0.0022. 1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) KT=5, KL=3 KT=6, KL=3 KT=7, KL=3 KT=8, KL=3 KT=5, KL=4 KT=6, KL=4 KT=7, KL=4 KT=8, KL=4 KT=5, KL=5 KT=6, KL=5 KT=7, KL=5 KT=8, KL=5 Least Squared

Fig. 6. The effect of the modelling order(𝐾𝑇) on the MSE performance

of Algorithm B in a 3GPP-SCM channel; AS=15∘_and_𝑓

𝑑𝑇𝑠=0.02844.

results shown in Fig. 9 reveal that even for a weakly correlated environment, i.e. AS=15∘_{, the performance degradation is}

negligible for𝐾𝑇 ≥ 4 when_𝑁𝐸𝑏₀ ≥ 12 dB while there is

perfor-mance improvement over the conventional LS approach when

𝐸𝑏

𝑁0 ≤ 12 dB. More performance and complexity-reduction

improvements are achievable for channels with higher spatial correlation, e.g., when AS=4∘_.

VII. CONCLUSION

This paper presents a novel analytic model which spans the spatial and/or time correlation functions over the dominant signal subspace and provides additional directional informa-tion. Iterative algorithms are proposed for estimating spatial-correlated MIMO channels. These estimators are extended to time-varying cases in which the time-correlation has to be taken into account. We simulate the estimators’ performance in various popular industry-approved and standardized channels to validate the accuracy of our model and the usefulness of our channel estimators. Numerical results show that in many instants the proposed algorithms give superior MSE perfor-mance. Our estimators offer tradeoffs between performance and complexity. They are easily extendable for use in

1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) Algorihtm A, Segment=20, KT=4 Algorihtm A, Segment=20, KT=5 Algorihtm A, Segment=20, KT=6 Algorihtm A, Segment=20, KT=7 Algorihtm A, Segment=20, KT=8 Algorihtm A, Segment=100, KT=4 Algorihtm A, Segment=100, KT=5 Algorithm A, Segment=100, KT=6 Algorithm A, Segment=100, KT=7 Algorithm A, Segment=100, KT=8 Algorihtm B, KT=4 Algorithm B, KT=5 Algorithm B, KT=6 Algorithm B, KT=7 Algorithm B, KT=8

Fig. 7. MSE performance comparison of Algorithm A and Algorithm B; AS=15∘_. 1e-06 1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) Channel-1 KT=1, Tw o=1 KT=2, Tw o=1 KT=3, Tw o=1 KT=4, T w o=1 KT=1, Tw o=5 KT=2, Tw o=5 KT=3, T w o=5 KT=4, Tw o=5 KT=1, Tw o=20 KT=2, T w o=20 KT=3, Tw o=20 KT=4, Tw o=20 Least Squared 1e-06 1e-05 0.0001 0.001 0.01 0.1 0 5 10 15 20 25 30 35 40

Eb/N0 (dB) Channel-2 KT=1, Tw o=1 KT=2, Tw o=1 KT=3, Tw o=1 KT=4, T w o=1 KT=1, Tw o=5 KT=2, Tw o=5 KT=3, T w o=5 KT=4, Tw o=5 KT=1, Tw o=20 KT=2, T w o=20 KT=3, Tw o=20 KT=4, Tw o=20 Least Squared

Fig. 8. The effect of the update period on the MSE performance of Algorithm

B. Channel-1 is based on [28] with𝑓𝑑𝑇𝑠=0.015886 while Channel-2 is based

on [29] with𝑓𝑑𝑇𝑠= 0.0022. AS=2∘,𝑇𝑜𝑐= 1; both 𝑇𝑜𝑐and𝑇𝑜𝑤are measured

in EIs.

band MIMO systems and are most effective when the chan-nel’s AS is small, i.e., when the dimension of the dominant subspace is much smaller than full channel correlation rank. Not only do they offer fast and accurate estimates, give MSE performance improvement due to the noise reduction effect but, more importantly, also provide compact and useful CSI that lead to significant feedback channel bandwidth reduction and other potential post processing complexity cutbacks.

APPENDIXA

AOD INFORMATIONEXTRACTION

For smallΔ, the correlation between two transmit antennas

𝑖, 𝑗 can be approximated by [2] 𝐸{ℎ𝑚𝑖ℎ∗𝑚𝑗 } ≈ exp { −𝑗2𝜋_𝜆(𝑖 − 𝑗)𝑑 sin 𝜙 } 𝐽0 ( Δ2𝜋_𝜆(𝑖 − 𝑗)𝑑 cos 𝜙 ) . (A.1)

In addition, correlation between two receive antennas 𝑝, 𝑞

can be approximated by 𝐸{ℎ𝑝𝑖ℎ∗𝑞𝑖

}

≈ 𝐽0(2𝜋_𝜆(𝑝 − 𝑞)𝑑), for

𝑑

𝑅 ≪ 1. By using the W defined in (14), the definition

˜

Δ 𝑑𝑒𝑓= 2𝜋𝑑