How Much Coherent Interval Should be Dedicated to Non-Redundant Diagonal Precoding for Blind Channel Estimation in Single-Carrier Block Transmission?


Jwo-Yuh Wu, Member, IEEE

Abstract—Transmit precoding is a key technique for facilitating blind channel estimation at the receiver, but the impact of precoding on channel capacity has scarcely been addressed in the literature. In this paper we consider single-carrier block transmission with cyclic prefix, in which a recently proposed diagonal-precoding-assisted blind channel estimation scheme based on covariance matching is adopted to acquire the channel information. It is shown that, when perfect channel knowledge is available at the receiver, the optimal noise-resistant precoder proposed in the literature incurs the worst-case capacity penalty. When the coherent interval is finite, channel mismatch occurs due to finite-sample covariance matrix estimation. We therefore aim to determine how much of the coherent interval should be dedicated to precoding in order to trade channel estimation accuracy for maximal capacity. Toward this end, we leverage matrix perturbation theory to derive a closed-form capacity measure that explicitly accounts for the channel uncertainty in the considered blind estimation setup. This capacity metric turns out to be a complicated function of the precoding interval. To facilitate analysis, an approximate formula for the derived capacity measure is further given. This allows us to find a closed-form estimate of the capacity-maximizing precoding time fraction, and also provides insights into the optimal tradeoff between channel estimation accuracy and achievable capacity. Numerical simulations are provided to corroborate the analytical study.

Index Terms—Blind channel estimation; channel capacity; precoding; single-carrier block transmission; cyclic prefix; matrix perturbation analysis; sample covariance matrix; circulant matrix.

I. INTRODUCTION

A. Motivation and Paper Contributions

Blind channel estimation is widely known as a bandwidth-efficient alternative to training-based techniques for acquiring channel information at the receiver [11]. Among the existing blind estimation methods, transmit-precoding-assisted solutions have attracted considerable attention in recent years [11]. Various channel estimation algorithms associated with different precoding schemes have been proposed, either for serial transmission [7], [17], [25], [30] or for block transmission, e.g., [4], [18], [23], [26], [31], [33]. Unlike multi-channel receive-diversity estimation algorithms, e.g., [20], [28], the transmit-precoding-assisted approach is quite robust against channel order mismatch and the locations of the channel zeros [17], [25], [30]. Nevertheless, most precoding-based methods are developed under the assumption that certain second-order statistics of the received signal can be perfectly obtained. The resulting channel estimation performance is therefore dominated by the finite-sample estimation errors in the computed data statistics. Moreover, symbol precoding at the transmitter may have a significant impact on channel capacity [6]. Under the perfect channel knowledge assumption, the capacity attained by various redundant and non-redundant precoding schemes was analyzed in [6]. However, a more in-depth study of the achievable system capacity that explicitly accounts for the channel mismatch effect in precoding-based blind estimation has not yet been reported in the literature.

Manuscript received July 8, 2009; revised November 21, 2009; accepted April 29, 2010. The associate editor coordinating the review of this paper and approving it for publication was C. Tellambura.

This work is sponsored by the National Science Council of Taiwan under grants NSC 97-2221-E-009-101-MY3; by the Ministry of Education of Taiwan under the MoE ATU Program; by the Academia-Research Co-operation Program of MoEA under grant 98-EC-17-A-02-S2-0050; by the Telecommunication Laboratories, Chunghwa Telecom Co., Ltd. under grant TL-99-G107; and by the MediaTek Research Center at National Chiao Tung University, Taiwan.

The author is with the Department of Electrical Engineering, and the Institute of Communications Engineering, National Chiao Tung University, Taiwan (e-mail: jywu@cc.nctu.edu.tw).

Digital Object Identifier 10.1109/TWC.2010.061010.091019

In this paper we consider single-carrier block transmission with cyclic prefix (CP)¹ [10], in which the transmitter implements a non-redundant diagonal precoder and the receiver acquires the channel information through the precoding-assisted blind estimation scheme of [31]. The main purpose of this paper is to investigate the optimal noise-resistant precoder [31] from a capacity perspective, and to further characterize an inherent tradeoff between channel estimation quality and achievable system capacity. Specifically, it is shown that, when the received signal covariance matrix is perfectly known (so that the channel estimate is exact²), the optimal noise-resistant precoder yields the minimal capacity in the high-SNR regime. Hence, if we adopt this precoder to improve channel estimation accuracy, there is a potential loss in the achievable information rate. We then turn to the more realistic block fading environment, in which (1) the channel remains constant over a finite time duration and can change independently across different time frames, and (2) during each coherent interval the blind algorithm [31] is implemented to obtain the channel estimate. Clearly, if all the symbol blocks within a coherent interval are precoded, and the receiver uses all the available data blocks to form a sample covariance matrix, the best channel estimation accuracy is achieved, since the covariance matrix estimation error is kept as small as possible. This advantage, however, comes at the expense of a capacity loss due to precoding. Conversely, if only a fraction of the coherent time is spent on precoding, the precoding-induced capacity loss is reduced, but the quality of the channel estimate could be relatively poor. This naturally motivates the following question: How much of the coherent interval should be dedicated to precoding so as to achieve the optimal tradeoff between channel estimation accuracy and system capacity?

¹ CP-based single-carrier systems have been adopted in next-generation wireless standards, e.g., SC-FDMA in the LTE uplink [19].

² It is well known that all blind estimation schemes can identify the channel only up to a scalar ambiguity, which has to be removed by further inserting some pilot symbols [11]. As in many previous works on the performance analysis of blind algorithms [7], [32], we assume for analytical simplicity that the ambiguity is removed. In this sense, the channel estimate is considered to be exact if the received signal covariance matrix can be obtained without errors.

To pin down this tradeoff, one needs a capacity measure that explicitly reflects the channel mismatch effect in the considered blind estimation setup. By leveraging matrix perturbation analysis and following the technique in [12], we derive such a capacity measure, which turns out to be a complicated function of the precoding interval. To facilitate analysis, we further derive an approximate expression for the obtained capacity metric, which offers three advantages. First, it expresses the aggregate capacity cost incurred by the considered blind channel estimation scheme as the sum of two terms, one due to precoding and the other caused by channel estimation errors (i.e., by finite-sample covariance matrix estimation). Second, based on this decomposition, a simple procedure yields an analytic estimate of the capacity-maximizing fraction of time spent on precoding. Third, the two penalty terms admit informative interpretations that provide further insight into the optimal tradeoff. The capacity results predicted by our analysis are further corroborated via numerical simulations.

B. Paper Organization and Notation List

The rest of this paper is organized as follows. Section II introduces the signal model, briefly reviews the blind channel estimation scheme in [31], and highlights the essentials of the optimal noise-resistant precoder. Section III discusses the capacity performance of the considered precoder, assuming that perfect channel knowledge is available at the receiver. Section IV considers the block fading channel environment and derives the capacity measure that specifies the design tradeoff. Section V addresses the problem of selecting the optimal precoding interval. Section VI provides several numerical experiments that corroborate the analytical results. Finally, Section VII concludes the paper.

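Notation: Let $\mathbb{R}^{m\times n}$ and $\mathbb{C}^{m\times n}$ be the sets of real and complex $m\times n$ matrices. Denote by $(\cdot)^T$, $(\cdot)^*$, and $(\cdot)^H$, respectively, the transpose, complex conjugate, and Hermitian operations. The symbols $\mathbf{I}_m$ and $\mathbf{0}_m$ denote the $m\times m$ identity and zero matrices; $\mathbf{0}_{m\times n}$ is the $m\times n$ zero matrix. The notation $\otimes$ stands for the Kronecker product [14, p. 242], and $\operatorname{vec}(\mathbf{X})$ is the vectorization (column stacking) of the matrix $\mathbf{X}$ [14]. For $\mathbf{X}=[\mathbf{x}_1\ \cdots\ \mathbf{x}_n]\in\mathbb{C}^{m\times n}$ and $\mathbf{Y}=[\mathbf{y}_1\ \cdots\ \mathbf{y}_n]\in\mathbb{C}^{k\times n}$, $\mathbf{X}\,\square\,\mathbf{Y}:=[\mathbf{x}_1\otimes\mathbf{y}_1\ \cdots\ \mathbf{x}_n\otimes\mathbf{y}_n]$ denotes the column-wise Kronecker product [32]. For $\mathbf{x}\in\mathbb{C}^m$, let $\operatorname{diag}\{\mathbf{x}\}$ be the $m\times m$ diagonal matrix with the elements of $\mathbf{x}$ on its main diagonal. The notation $E\{y\}$ stands for the expected value of the random variable $y$. Denote by $\operatorname{Tr}[\mathbf{M}]$ the trace of the square matrix $\mathbf{M}$.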
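The column-wise Kronecker product (also known as the Khatri-Rao product) appears later in (4.8). The following is a minimal NumPy sketch, assuming column-major vec(·); the names `colwise_kron` and `vec` are ours, not the paper's. The last lines check the identity vec(Y diag{d} X^T) = (X □ Y) d, which follows from vec(ABC) = (C^T ⊗ A) vec(B).

```python
import numpy as np


def colwise_kron(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product X □ Y = [x_1 ⊗ y_1, ..., x_n ⊗ y_n].

    X is m×n and Y is k×n; the result is (m*k)×n.
    """
    if X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y must have the same number of columns")
    m, n = X.shape
    k = Y.shape[0]
    # Entry [i, j, c] = X[i, c] * Y[j, c]; flattening (i, j) in row-major order
    # gives row index i*k + j, the same ordering as np.kron(x_c, y_c).
    return (X[:, None, :] * Y[None, :, :]).reshape(m * k, n)


def vec(M: np.ndarray) -> np.ndarray:
    """Column-stacking vectorization vec(M)."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
Y = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
d = rng.standard_normal(4)

Z = colwise_kron(X, Y)                  # shape (6, 4)
lhs = vec(Y @ np.diag(d) @ X.T)         # vec(Y diag{d} X^T)
rhs = Z @ d                             # (X □ Y) d
print(Z.shape, np.allclose(lhs, rhs))   # (6, 4) True
```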

II. DIAGONAL PRECODING BASED BLIND CHANNEL ESTIMATION

A. Signal Model

We consider a precoded single-carrier CP-based system over an $L$th-order frequency-selective fading channel, described as [31]
$$\mathbf{y}_k=\mathbf{G}\mathbf{P}\mathbf{s}_k+\mathbf{v}_k,\quad k\ge 0, \tag{2.1}$$
where $\mathbf{s}_k\in\mathbb{C}^N$ and $\mathbf{y}_k\in\mathbb{C}^N$ are, respectively, the source symbol block and the received signal block (with $N$ denoting the dimension of the source symbol vector), $\mathbf{v}_k\in\mathbb{C}^N$ is the noise vector,
$$\mathbf{P}:=\operatorname{diag}\{[p(0)\ \cdots\ p(N-1)]\}\in\mathbb{R}^{N\times N},\quad p(n)\in\mathbb{R}, \tag{2.2}$$
is a diagonal precoding matrix, and $\mathbf{G}\in\mathbb{C}^{N\times N}$ is the circulant channel matrix whose first column is
$$\mathbf{g}:=[h(0)\ \cdots\ h(L)\ 0\ \cdots\ 0]^T\in\mathbb{C}^N, \tag{2.3}$$
with $h(n)$ being the $n$th channel tap, $0\le n\le L$. The purpose of diagonal precoding is to deliberately induce a transmit power variation that facilitates blind channel estimation at the receiver. The following assumptions are made throughout the paper.

a) The source $\mathbf{s}_k$ is a white vector sequence with zero mean and unit variance.

b) The noise $\mathbf{v}_k$ is white circularly symmetric complex Gaussian with zero mean and covariance $\sigma_v^2\mathbf{I}$, and is independent of the source signal $\mathbf{s}_k$.
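The signal model (2.1) to (2.3) under assumptions a) and b) can be simulated in a few lines. This is a minimal NumPy sketch; assumptions not in the text are that unit-power QPSK serves as the zero-mean, unit-variance white source, and that the helper names `circulant_channel` and `received_blocks` are hypothetical. Because G is circulant, G x equals the N-point circular convolution of g and x, which the last check uses.

```python
import numpy as np


def circulant_channel(h: np.ndarray, N: int) -> np.ndarray:
    """N×N circulant G whose first column is g = [h(0) ... h(L) 0 ... 0]^T, cf. (2.3).

    Column n is g cyclically shifted down by n positions.
    """
    g = np.zeros(N, dtype=complex)
    g[: len(h)] = h
    return np.column_stack([np.roll(g, n) for n in range(N)])


def received_blocks(h, p, num_blocks, noise_var, rng):
    """Simulate y_k = G P s_k + v_k, cf. (2.1); returns Y (one block per column), G, S."""
    N = len(p)
    G = circulant_channel(h, N)
    bits = rng.integers(0, 2, size=(2, N, num_blocks))
    # Unit-power QPSK: zero mean, unit variance, white across n and k (assumption a)).
    S = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)
    # Circularly symmetric complex Gaussian noise, covariance noise_var * I (assumption b)).
    V = np.sqrt(noise_var / 2) * (rng.standard_normal((N, num_blocks))
                                 + 1j * rng.standard_normal((N, num_blocks)))
    Y = (G * p) @ S + V          # G * p scales column n of G by p(n), i.e. G @ diag(p)
    return Y, G, S


rng = np.random.default_rng(1)
N, L = 8, 2
h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2 * (L + 1))
p = np.ones(N)
Y, G, S = received_blocks(h, p, num_blocks=5, noise_var=0.0, rng=rng)

# With P = I and no noise, each y_k is the circular convolution of g and s_k.
g = np.concatenate([h, np.zeros(N - L - 1)])
y0_fft = np.fft.ifft(np.fft.fft(g) * np.fft.fft(S[:, 0]))
print(np.allclose(Y[:, 0], y0_fft))   # True
```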

B. Blind Channel Estimation Algorithm

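The approach in [31] exploits the circulant structure of the channel matrix $\mathbf{G}$, which leads to a covariance-matching channel estimation setup. More specifically, since $\mathbf{G}$ is circulant, it can be expressed in terms of its first column as
$$\mathbf{G}=[\mathbf{g}\ \ \mathbf{J}\mathbf{g}\ \cdots\ \mathbf{J}^{N-2}\mathbf{g}\ \ \mathbf{J}^{N-1}\mathbf{g}], \tag{2.4}$$
where
$$\mathbf{J}:=\begin{bmatrix}\mathbf{0}_{1\times(N-1)} & 1\\ \mathbf{I}_{N-1} & \mathbf{0}_{(N-1)\times 1}\end{bmatrix}\in\mathbb{R}^{N\times N} \tag{2.5}$$
is the cyclic-shift permutation matrix. We assume for the moment that the channel noise is absent, i.e., $\mathbf{v}_k=\mathbf{0}$. With (2.1), (2.2), and (2.4), the covariance matrix of the received signal is easily shown to be
$$\mathbf{R}_y:=E\{\mathbf{y}_k\mathbf{y}_k^H\}=\mathbf{G}\mathbf{P}^2\mathbf{G}^H=\sum_{n=0}^{N-1}p(n)^2\,\mathbf{J}^n\mathbf{g}\mathbf{g}^H(\mathbf{J}^T)^n. \tag{2.6}$$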
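The identity (2.6) can be checked numerically. This minimal sketch assumes a random 3-tap channel, N = 8, and, purely as an example, the two-level precoder (2.16) of Section II-C with δ = 0.9 and m = 0; only the squared gains p(n)^2 are needed. The matrix J of (2.5) is built by cyclically shifting the rows of the identity down by one.

```python
import numpy as np

N, L, delta = 8, 2, 0.9
rng = np.random.default_rng(2)
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
g = np.concatenate([h, np.zeros(N - L - 1)])            # (2.3)

J = np.roll(np.eye(N), 1, axis=0)                        # (2.5): J[0, N-1] = 1, J[i, i-1] = 1
Jpow = [np.linalg.matrix_power(J, n) for n in range(N)]
G = np.column_stack([Jpow[n] @ g for n in range(N)])     # (2.4)

# Squared precoder gains: two-level form (2.16) with m = 0, used only as an example.
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta

R_direct = G @ np.diag(p2) @ G.conj().T                  # G P^2 G^H
R_sum = sum(p2[n] * Jpow[n] @ np.outer(g, g.conj()) @ Jpow[n].T
            for n in range(N))                           # right-hand side of (2.6)
print(np.allclose(R_direct, R_sum))                      # True
```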


We observe that, for a given $\mathbf{R}_y$, (2.6) defines a set of linear equations with the product channel coefficients $h(k)h(l)^*$ (i.e., the entries of $\mathbf{g}\mathbf{g}^H$) as unknowns. To exploit this inherent linear structure for channel estimation, we first rearrange (2.6) into a standard linear equation form. Using the $\operatorname{vec}(\cdot)$ operation, (2.6) can be rearranged into (2.7), displayed at the end of this section.

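To solve for the product unknowns $h(k)h(l)^*$ via (2.7), we first reduce the dimension of the equations by removing the null variables in $\operatorname{vec}(\mathbf{g}\mathbf{g}^H)$. Specifically, it can be shown that (2.7) is equivalent to [31]
$$\tilde{\mathbf{Q}}\operatorname{vec}(\mathbf{h}\mathbf{h}^H)=\operatorname{vec}(\mathbf{R}_y), \tag{2.8}$$
where $\mathbf{h}:=[h(0)\ \cdots\ h(L)]^T$ is the desired channel impulse response vector, and
$$\tilde{\mathbf{Q}}:=\mathbf{Q}\mathbf{J}_1(\mathbf{I}_{L+1}\otimes\mathbf{J}_2) \tag{2.9}$$
with
$$\mathbf{J}_1:=\begin{bmatrix}\mathbf{I}_{N(L+1)}\\ \mathbf{0}_{N(N-L-1)\times N(L+1)}\end{bmatrix}\in\mathbb{R}^{N^2\times N(L+1)},\qquad \mathbf{J}_2:=\begin{bmatrix}\mathbf{I}_{L+1}\\ \mathbf{0}_{(N-L-1)\times(L+1)}\end{bmatrix}\in\mathbb{R}^{N\times(L+1)}.$$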

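Assume that the matrix $\tilde{\mathbf{Q}}$ has full column rank; a sufficient condition in terms of the precoder $p(n)$ is established in [31, Proposition 4.1]. Then the product channel coefficient vector can be computed as
$$\operatorname{vec}(\mathbf{h}\mathbf{h}^H)=(\tilde{\mathbf{Q}}^T\tilde{\mathbf{Q}})^{-1}\tilde{\mathbf{Q}}^T\operatorname{vec}(\mathbf{R}_y). \tag{2.10}$$
Once $\operatorname{vec}(\mathbf{h}\mathbf{h}^H)$ is obtained, we form the rank-one matrix
$$\mathbf{H}:=\mathbf{h}\mathbf{h}^H=\left[h(k)h(l)^*\right]_{0\le k,l\le L}. \tag{2.11}$$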

The channel impulse response vector $\mathbf{h}$ can then be estimated, up to a scalar ambiguity, by computing the dominant eigenvector of the matrix $\mathbf{H}$. We note that a similar matrix outer-product approach has also been adopted in previous works [9], [16], [17], [26], [32]. To obtain the full channel estimate, some pilot symbols should be inserted in the symbol block to remove the scalar ambiguity; see [31, p. 1119] for details.
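The two-step procedure (2.8) to (2.11) can be sketched in NumPy as follows. This is our own illustration, not the implementation of [31]: it assumes the exact noise-free $\mathbf{R}_y$, builds $\tilde{\mathbf{Q}}$ through the identities vec(J^n X (J^T)^n) = (J^n ⊗ J^n) vec(X) (J is real) and J_1(I_{L+1} ⊗ J_2) = J_2 ⊗ J_2, and solves (2.10) by least squares. The scalar ambiguity is removed here by an oracle phase alignment with the true h, a hypothetical stand-in for the pilot-based step of [31]. The helper names are hypothetical.

```python
import numpy as np


def build_Q_tilde(p2: np.ndarray, L: int) -> np.ndarray:
    """Q~ = Q J1 (I_{L+1} ⊗ J2), cf. (2.7) and (2.9), from squared gains p(n)^2."""
    N = len(p2)
    J = np.roll(np.eye(N), 1, axis=0)
    Q = sum(p2[n] * np.kron(np.linalg.matrix_power(J, n), np.linalg.matrix_power(J, n))
            for n in range(N))                                   # Q = sum_n p(n)^2 (J^n ⊗ J^n)
    J2 = np.vstack([np.eye(L + 1), np.zeros((N - L - 1, L + 1))])
    return Q @ np.kron(J2, J2)                                   # J1 (I ⊗ J2) = J2 ⊗ J2


def blind_estimate(R_y: np.ndarray, p2: np.ndarray, L: int) -> np.ndarray:
    """Estimate h, up to a unit-modulus scalar, from R_y via (2.10) and (2.11)."""
    Qt = build_Q_tilde(p2, L).astype(complex)
    vec_R = R_y.reshape(-1, order="F")                           # column-stacking vec(R_y)
    vec_H, *_ = np.linalg.lstsq(Qt, vec_R, rcond=None)           # least-squares solution (2.10)
    H = vec_H.reshape(L + 1, L + 1, order="F")                   # H = h h^H, cf. (2.11)
    H = (H + H.conj().T) / 2                                     # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(H)                         # eigenvalues in ascending order
    return np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]       # scaled dominant eigenvector


N, L, delta = 8, 2, 0.9
rng = np.random.default_rng(3)
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
g = np.concatenate([h, np.zeros(N - L - 1)])
G = np.column_stack([np.roll(g, n) for n in range(N)])
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta                                  # two-level precoder (2.16), m = 0

R_y = G @ np.diag(p2) @ G.conj().T                               # exact, noise-free (2.6)
h_hat = blind_estimate(R_y, p2, L)
h_aligned = h_hat * np.exp(1j * np.angle(np.vdot(h_hat, h)))     # oracle phase fix (hypothetical)
print(np.allclose(h_aligned, h))                                 # True
```

With equal power, p(n) = 1 for all n, the same `build_Q_tilde` returns a rank-deficient matrix, which is why a non-uniform precoder is needed for identifiability.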

C. Optimal Noise-Resistant Precoder

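With noise present, the autocorrelation matrix in (2.6) instead reads
$$\mathbf{R}_y=E\{\mathbf{y}_k\mathbf{y}_k^H\}=\sum_{n=0}^{N-1}p(n)^2\,\mathbf{J}^n\mathbf{g}\mathbf{g}^H(\mathbf{J}^T)^n+\sigma_v^2\mathbf{I}. \tag{2.12}$$
The identification equation (2.8) is accordingly modified as
$$\operatorname{vec}(\mathbf{R}_y)=\tilde{\mathbf{Q}}\operatorname{vec}(\mathbf{h}\mathbf{h}^H)+\sigma_v^2\operatorname{vec}(\mathbf{I}). \tag{2.13}$$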

In (2.13) we can regard the product channel coefficients $\operatorname{vec}(\mathbf{h}\mathbf{h}^H)$ as the signal of interest, so that the range space of $\tilde{\mathbf{Q}}$ defines the signal subspace, and treat the white-noise perturbation $\operatorname{vec}(\mathbf{I})$ as spanning the noise subspace. Note that the matrix $\tilde{\mathbf{Q}}$ depends entirely on the precoder coefficients $p(n)$. To mitigate the effect of noise on the estimated channel, one natural approach is thus to design $p(n)$ so that the signal and noise subspaces are as close to orthogonal as possible. In [31], the precoder design problem is specifically formulated as minimizing the largest correlation index between the noise signature $\operatorname{vec}(\mathbf{I})$ and the columns of $\tilde{\mathbf{Q}}$, subject to the power normalization constraint
$$\sum_{n=0}^{N-1}p(n)^2=N \tag{2.14}$$
and the threshold requirement
$$p(n)^2\ge\delta>0,\quad\forall\,0\le n\le N-1, \tag{2.15}$$
where $\delta$ is the power threshold, which is constrained to be strictly positive to avoid symbol nulling. The optimal noise-resistant precoder admits the following two-level form (see [31, p. 1121]): for a fixed but arbitrary $0\le m\le N-1$,
$$p(m)^2=N(1-\delta)+\delta\quad\text{and}\quad p(n)^2=\delta\ \text{ for } n\ne m. \tag{2.16}$$
The rest of this paper studies the optimal noise-resistant precoder (2.16) from a capacity perspective, and addresses the optimal tradeoff between channel estimation accuracy and achievable capacity when the coherent interval is finite and $\mathbf{R}_y$ is estimated from a finite amount of data.
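The two-level precoder (2.16) is straightforward to construct. A minimal sketch, in which the hypothetical helper `two_level_precoder` returns the squared gains p(n)^2 and the example checks the constraints (2.14) and (2.15). The guard δ ≤ 1 is necessary, since (2.14) forces the average of p(n)^2 to equal one.

```python
import numpy as np


def two_level_precoder(N: int, delta: float, m: int = 0) -> np.ndarray:
    """Squared gains of (2.16): p(m)^2 = N(1 - delta) + delta, p(n)^2 = delta for n != m."""
    if not 0 < delta <= 1:
        raise ValueError("need 0 < delta <= 1")
    p2 = np.full(N, float(delta))
    p2[m] = N * (1 - delta) + delta
    return p2


p2 = two_level_precoder(N=32, delta=0.9, m=0)
p = np.sqrt(p2)                  # one valid (positive) choice of the real coefficients p(n)
print(p2[0], p2.sum())           # approximately 4.1 and 32.0
```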

Remark: With the optimal precoder (2.16), it is shown in [31] that a small threshold $\delta$ results in small noise corruption of the product channel coefficients, thereby improving estimation accuracy. However, an arbitrarily small $\delta$ should be avoided, since it not only renders the symbol decision process quite prone to noise (see [7], [31]), but also incurs a high capacity penalty, as shown next. For $N=32$ and $L=8$ (with the CP length set equal to the channel order), our simulation study in Section VI (see Figures 4 and 5) indicates that $\delta\approx 0.9$ is a good compromise between channel estimation accuracy and achievable capacity.

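$$\underbrace{\begin{bmatrix}
p(0)^2\mathbf{I}_N & p(N-1)^2\mathbf{J}^{N-1} & \cdots & p(2)^2\mathbf{J}^2 & p(1)^2\mathbf{J}\\
p(1)^2\mathbf{J} & p(0)^2\mathbf{I}_N & \cdots & p(3)^2\mathbf{J}^3 & p(2)^2\mathbf{J}^2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
p(N-2)^2\mathbf{J}^{N-2} & p(N-3)^2\mathbf{J}^{N-3} & \cdots & p(0)^2\mathbf{I}_N & p(N-1)^2\mathbf{J}^{N-1}\\
p(N-1)^2\mathbf{J}^{N-1} & p(N-2)^2\mathbf{J}^{N-2} & \cdots & p(1)^2\mathbf{J} & p(0)^2\mathbf{I}_N
\end{bmatrix}}_{:=\mathbf{Q}}\operatorname{vec}(\mathbf{g}\mathbf{g}^H)=\operatorname{vec}(\mathbf{R}_y). \tag{2.7}$$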


III. CAPACITY PERFORMANCE WITH EXACT CHANNEL KNOWLEDGE

This section investigates the impact of the precoder (2.16) on capacity, assuming that the channel is perfectly known to the receiver. In such an idealized case, the ergodic capacity (in bits per block transmission, neglecting the CP overhead) of the system (2.1) is well-known to be [27]

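$$I=E\left\{\log\det\left(\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{P}^2\mathbf{G}^H\right)\right\}. \tag{3.1}$$
Since $\mathbf{G}$ is circulant, we have $\mathbf{G}=\mathbf{F}^H\mathbf{D}\mathbf{F}$, where $\mathbf{F}$ is the (unitary) FFT matrix and $\mathbf{D}$ is a diagonal matrix containing the channel frequency responses. This implies
$$\mathbf{G}\mathbf{P}^2\mathbf{G}^H=\mathbf{F}^H\mathbf{D}\mathbf{F}\mathbf{P}^2\mathbf{F}^H\mathbf{D}^H\mathbf{F}, \tag{3.2}$$
and (3.1) becomes
$$\begin{aligned}I&=E\left\{\log\det\left(\mathbf{I}+\sigma_v^{-2}\mathbf{F}^H\mathbf{D}\mathbf{F}\mathbf{P}^2\mathbf{F}^H\mathbf{D}^H\mathbf{F}\right)\right\}\\&\overset{(a)}{=}E\left\{\log\det\left(\mathbf{I}+\sigma_v^{-2}\mathbf{F}\mathbf{P}^2\mathbf{F}^H\mathbf{D}^H\mathbf{F}\mathbf{F}^H\mathbf{D}\right)\right\}\\&\overset{(b)}{=}E\left\{\log\det\left(\mathbf{I}+\sigma_v^{-2}\mathbf{F}\mathbf{P}^2\mathbf{F}^H\tilde{\mathbf{D}}\right)\right\},\end{aligned}\tag{3.3}$$
where (a) holds because $\det(\mathbf{I}+\mathbf{A}\mathbf{B})=\det(\mathbf{I}+\mathbf{B}\mathbf{A})$ for $\mathbf{A}$ and $\mathbf{B}$ of compatible dimensions, and (b) follows since $\mathbf{F}\mathbf{F}^H=\mathbf{I}$ and by defining $\tilde{\mathbf{D}}:=\mathbf{D}^H\mathbf{D}$, which is a positive definite diagonal matrix. According to Hadamard's inequality [13, p. 477], the term $\log\det(\mathbf{I}+\sigma_v^{-2}\mathbf{F}\mathbf{P}^2\mathbf{F}^H\tilde{\mathbf{D}})$ in the capacity expression (3.3) is maximized if $p(n)$ is chosen so that $\mathbf{F}\mathbf{P}^2\mathbf{F}^H\tilde{\mathbf{D}}$ is diagonal. Since $\tilde{\mathbf{D}}$ is diagonal, the matrix $\mathbf{F}\mathbf{P}^2\mathbf{F}^H\tilde{\mathbf{D}}$ can be diagonal only when $\mathbf{F}\mathbf{P}^2\mathbf{F}^H$ is also diagonal. Since $\mathbf{F}\mathbf{P}^2\mathbf{F}^H$ is circulant [8], the only such diagonal $\mathbf{F}\mathbf{P}^2\mathbf{F}^H$ is a scalar multiple of the identity matrix. As a result, the capacity-maximizing $p(n)$ should be chosen so that
$$\mathbf{F}\mathbf{P}^2\mathbf{F}^H=\alpha\mathbf{I}\quad\text{for some }\alpha>0. \tag{3.4}$$
The unique $p(n)$ (up to sign) that satisfies (3.4) as well as the two constraints (2.14) and (2.15) is
$$p(n)=1,\quad 0\le n\le N-1. \tag{3.5}$$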

Hence, for i.i.d. sources, the equal-power scheme (3.5) is capacity-optimal³. The unequal symbol power induced by diagonal precoding therefore inevitably incurs a capacity loss (i.e., precoding is harmful from the capacity perspective). To characterize the capacity performance of the two-level precoder (2.16), we further note that, in the high-SNR region, (3.3) is well approximated by [22]
$$I\approx E\left\{\log\det\left(\sigma_v^{-2}\mathbf{F}\mathbf{P}^2\mathbf{F}^H\tilde{\mathbf{D}}\right)\right\}=\log\left\{\sigma_v^{-2N}\prod_{n=0}^{N-1}p(n)^2\right\}+E\left\{\log\det\tilde{\mathbf{D}}\right\}. \tag{3.6}$$
From (3.6) we see that, when the SNR is high, the impact of precoding on capacity is entirely characterized by the product term $\prod_{n=0}^{N-1}p(n)^2$. The smaller this product is, the larger the capacity penalty. The optimal noise-resistant precoder (2.16), however, turns out to be the worst-case choice regarding channel capacity, as established in the next theorem (see Appendix A for a proof).

³ If the uniform scheme (3.5) is used, however, the matrix $\tilde{\mathbf{Q}}$ in (2.9) becomes rank deficient and the channel is rendered unidentifiable.

Theorem 3.1: Among all $p(n)$ satisfying the constraints (2.14) and (2.15), the precoder (2.16) minimizes the quantity $\prod_{n=0}^{N-1}p(n)^2$, yielding
$$\min\prod_{n=0}^{N-1}p(n)^2=\delta^{N-1}\left[N-(N-1)\delta\right]. \tag{3.7}$$

Discussions:

1. It is easy to check that the minimal product in (3.7) increases as the threshold $\delta$ increases toward unity (its derivative with respect to $\delta$ is $N(N-1)\delta^{N-2}(1-\delta)\ge 0$); when $\delta=1$, (2.16) reduces to the capacity-optimal equal-power scheme (3.5). This implies that a large $\delta$, though resulting in poor channel estimation accuracy [31, Sec. V], limits the capacity penalty incurred by the precoding scheme (2.16). When channel estimation errors occur, the choice of $\delta$ that best balances channel estimation accuracy against achievable capacity is investigated in the simulation section.

2. The capacity (3.1) obtained under the perfect channel knowledge assumption can be regarded as a benchmark upper bound for the realistic situation in which channel errors occur. In this sense, the optimal noise-resistant precoder (2.16) leads to the worst-case benchmark capacity when the SNR is high. Hence, if one adopts the precoder (2.16) to improve channel estimation accuracy, there could be a substantial loss in achievable system capacity.

3. Acquiring exact channel knowledge via the blind technique described in Section II-B calls for an infinitely long coherent interval, during which the received covariance matrix $\mathbf{R}_y$ in (2.6) can be estimated without errors, e.g., via time averaging. In the finite coherent interval case, we can only obtain a sample covariance matrix from a finite number of data blocks, and the resulting channel estimate is no longer exact. If a large portion of the source symbol blocks within the coherent interval is precoded by (2.16), the finite-sample error in estimating $\mathbf{R}_y$ is relatively small, and a more accurate channel estimate can be obtained. However, this comes at the expense of reduced capacity, since (2.16) is the worst-case choice in terms of benchmark capacity. On the other hand, if only a small fraction of the coherent interval is spent on precoding, the capacity penalty due to precoding is limited, but the quality of the channel estimate could be quite poor. The optimal tradeoff between channel estimation accuracy and achievable system capacity with respect to the choice of precoding interval is elaborated on next.
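The claims of this section can be checked for a single channel realization (the expectation in (3.1) is omitted). The sketch below, with hypothetical helper names, evaluates log det(I + σ_v^{-2} G P² G^H) in bits, its high-SNR form (3.6) using $\tilde{\mathbf{D}}$ = |FFT(g)|² (the eigenvalues of a circulant G are the DFT of its first column), and the minimal product (3.7). The random feasible precoders are drawn as p(n)² = δ + N(1 − δ)w_n with w on the probability simplex, which satisfies (2.14) and (2.15) by construction; this sampling scheme is our assumption.

```python
import numpy as np


def capacity_bits(G, p2, noise_var):
    """log2 det(I + sigma^-2 G P^2 G^H) for one realization, cf. (3.1)."""
    N = G.shape[0]
    M = np.eye(N) + (G * p2) @ G.conj().T / noise_var
    return np.linalg.slogdet(M)[1] / np.log(2)


def high_snr_capacity_bits(g, p2, noise_var):
    """High-SNR approximation (3.6): log2[sigma^-2N prod p(n)^2] + log2 det D~."""
    d_tilde = np.abs(np.fft.fft(g)) ** 2
    return -len(g) * np.log2(noise_var) + np.sum(np.log2(p2)) + np.sum(np.log2(d_tilde))


N, L, delta, noise_var = 32, 8, 0.9, 1e-3
rng = np.random.default_rng(4)
h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2 * (L + 1))
g = np.concatenate([h, np.zeros(N - L - 1)])
G = np.column_stack([np.roll(g, n) for n in range(N)])

p2_equal = np.ones(N)                                # (3.5)
p2_two = np.full(N, delta)
p2_two[0] = N * (1 - delta) + delta                  # (2.16)

c_equal = capacity_bits(G, p2_equal, noise_var)
c_two = capacity_bits(G, p2_two, noise_var)
approx_two = high_snr_capacity_bits(g, p2_two, noise_var)

# Theorem 3.1: log-products of random feasible precoders never fall below (3.7).
log_min = (N - 1) * np.log(delta) + np.log(N - (N - 1) * delta)
W = rng.dirichlet(np.ones(N), size=1000)             # rows lie on the probability simplex
log_prods = np.sum(np.log(delta + N * (1 - delta) * W), axis=1)
print(c_equal > c_two, approx_two < c_two, log_prods.min() >= log_min - 1e-9)  # True True True
```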

IV. CAPACITY MEASURE OVER FINITE COHERENT INTERVALS

In the sequel we focus on the block fading environment, in which the channel remains constant over an interval of $T$ symbol block periods, after which it changes independently to another value that it holds for the next interval of $T$ blocks, and so on. In the training-based counterpart, the coherent interval is typically divided into two phases, one for placing training pilots and the other for carrying data symbols, and the capacity-optimal training period has been addressed in [1], [12], [28] for various scenarios. Motivated by these works, and to study the optimal tradeoff problem in the considered precoding-based blind estimation setup, we assume that, during each coherent interval, only the first $T_p$ source symbol blocks, with $1<T_p\le T$, are precoded by (2.16) to facilitate channel estimation. The signal model within a coherent interval of $T$ blocks can then be described in the two-phase form (4.1)⁴. The $T_p$ received blocks in the precoding phase are used to form the sample covariance matrix
$$\hat{\mathbf{R}}_y=\frac{1}{T_p}\sum_{k=1}^{T_p}\mathbf{y}_k\mathbf{y}_k^H \tag{4.2}$$
for channel estimation via the blind technique described in Section II-B; the remaining $T-T_p$ time slots are used for direct data transmission to boost capacity. To find the $T_p$ that attains the optimal tradeoff between blind channel estimation accuracy and achievable capacity, we need a capacity metric that explicitly accounts for the channel mismatch effect (i.e., the imperfect estimate $\hat{\mathbf{R}}_y$). This is the main focus of this section.
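The finite-sample error of (4.2), which drives the analysis below, can be illustrated with a short Monte Carlo sketch. Assumptions not in the text: QPSK sources, a random 3-tap channel with N = 8, the two-level precoder with δ = 0.9, σ_v² = 0.01, and 20 trials per value of T_p; the helper names are hypothetical. Since the received blocks are independent and (4.2) is unbiased, the mean squared Frobenius error should scale as 1/T_p, so the product T_p × error should be roughly constant.

```python
import numpy as np


def sample_covariance(Y: np.ndarray) -> np.ndarray:
    """R^_y = (1/T_p) sum_k y_k y_k^H, cf. (4.2); columns of Y are y_1, ..., y_{T_p}."""
    return Y @ Y.conj().T / Y.shape[1]


def mean_sq_error(T_p, G, p, noise_var, trials, rng):
    """Monte Carlo estimate of E||R^_y - R_y||_F^2 for T_p precoded blocks."""
    N = G.shape[0]
    B = G * p                                            # G P
    R_y = B @ B.conj().T + noise_var * np.eye(N)         # true covariance, cf. (2.12)
    errs = []
    for _ in range(trials):
        bits = rng.integers(0, 2, size=(2, N, T_p))
        S = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)
        V = np.sqrt(noise_var / 2) * (rng.standard_normal((N, T_p))
                                     + 1j * rng.standard_normal((N, T_p)))
        errs.append(np.linalg.norm(sample_covariance(B @ S + V) - R_y, "fro") ** 2)
    return float(np.mean(errs))


N, L, delta, noise_var = 8, 2, 0.9, 0.01
rng = np.random.default_rng(5)
h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2 * (L + 1))
g = np.concatenate([h, np.zeros(N - L - 1)])
G = np.column_stack([np.roll(g, n) for n in range(N)])
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta
p = np.sqrt(p2)

mse_100 = mean_sq_error(100, G, p, noise_var, trials=20, rng=rng)
mse_10000 = mean_sq_error(10000, G, p, noise_var, trials=20, rng=rng)
print(100 * mse_100, 10000 * mse_10000)   # roughly equal if the error decays as 1/T_p
```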

A. Capacity Measure

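For a given channel estimate $\hat{\mathbf{h}}$, and hence the associated channel matrix $\hat{\mathbf{G}}$, we follow the idea of [1], [12], [28] and rewrite the system (2.1) by treating the estimate $\hat{\mathbf{G}}$ as the known channel matrix and relegating the channel mismatch to the noise component, so that
$$\mathbf{y}_k=\mathbf{G}\mathbf{P}\mathbf{s}_k+\mathbf{v}_k=\hat{\mathbf{G}}\mathbf{P}\mathbf{s}_k+\underbrace{\tilde{\mathbf{G}}\mathbf{P}\mathbf{s}_k+\mathbf{v}_k}_{:=\tilde{\mathbf{v}}_k}, \tag{4.3}$$
where $\tilde{\mathbf{G}}=\mathbf{G}-\hat{\mathbf{G}}$.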

We note that, whereas the channel in (2.1) is unknown, the channel matrix $\hat{\mathbf{G}}$ in (4.3) is known to the receiver. Also, although the additive noise $\mathbf{v}_k$ in (2.1) is white Gaussian, the effective noise $\tilde{\mathbf{v}}_k$ in (4.3), which also depends on the channel estimation error $\tilde{\mathbf{G}}$, is in general neither white nor Gaussian. To characterize the capacity of the channel (4.3), one needs to specify the covariance of the effective noise $\tilde{\mathbf{v}}_k$. For this we first observe that, since $\hat{\mathbf{R}}_y$ in (4.2) is unbiased, the computed outer product $\hat{\mathbf{h}}\hat{\mathbf{h}}^H$, being a linear function of $\hat{\mathbf{R}}_y$ (cf. (2.10)), is also an unbiased estimator of $\mathbf{h}\mathbf{h}^H$. However, the resulting channel estimate $\hat{\mathbf{h}}$, obtained as the dominant eigenvector of $\hat{\mathbf{h}}\hat{\mathbf{h}}^H$, is an unbiased estimator of the true $\mathbf{h}$ only as $T_p\to\infty$ [2], [15], [24]. For finite $T_p$, the statistical properties of $\hat{\mathbf{h}}$ are quite difficult to characterize [24]; exact expressions for the bias $E\{\hat{\mathbf{h}}-\mathbf{h}\}$, and consequently for the covariance of $\tilde{\mathbf{v}}_k$, are intractable. To facilitate analysis in the finite-sample case, we adopt an approach similar to [21] and [32], in which $T_p$ is assumed to be large enough that the unbiasedness condition $E\{\hat{\mathbf{h}}-\mathbf{h}\}=\mathbf{0}$ (or $E\{\tilde{\mathbf{G}}\}=\mathbf{0}$) is deemed to hold from the first-order perturbation perspective⁵. Under this assumption, and using a technique similar to [1], [12], [28], a capacity lower bound for the channel (4.3) is derived in the next lemma (see Appendix B for a proof).

⁴ Without loss of generality, we consider the initial coherent interval to simplify notation.

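Lemma 4.1: Let $\bar{I}(\mathbf{y}_k;\mathbf{s}_k\mid\hat{\mathbf{G}}):=\max I(\mathbf{y}_k;\mathbf{s}_k\mid\hat{\mathbf{G}})$, namely, the maximal mutual information (over the source distributions) between the source and received signals for a fixed channel estimate $\hat{\mathbf{G}}$. Then the following inequality holds for large $T_p$:
$$\bar{I}(\mathbf{y}_k;\mathbf{s}_k\mid\hat{\mathbf{G}})\ge\log\det\left\{\mathbf{I}+\left[E\{\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H\}+\sigma_v^2\mathbf{I}\right]^{-1}\hat{\mathbf{G}}\mathbf{P}^2\hat{\mathbf{G}}^H\right\}. \tag{4.4}$$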

The lower bound in (4.4) is particularly appealing, since it is a function of the channel mismatch $\tilde{\mathbf{G}}=\mathbf{G}-\hat{\mathbf{G}}$ and can therefore serve as a capacity measure when channel estimation errors occur. To address the optimal tradeoff based on the capacity lower bound (4.4), the first task is to find an explicit expression for $E\{\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H\}$ in terms of the precoding interval $T_p$. This is done in the next subsection.

B. Covariance of the Channel Estimation Error: A Perturbation Analysis

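To proceed, we note from (2.4) that the circulant structure of $\tilde{\mathbf{G}}$ again yields
$$\tilde{\mathbf{G}}=[\tilde{\mathbf{g}}\ \ \mathbf{J}\tilde{\mathbf{g}}\ \cdots\ \mathbf{J}^{N-2}\tilde{\mathbf{g}}\ \ \mathbf{J}^{N-1}\tilde{\mathbf{g}}], \tag{4.5}$$
in which $\mathbf{J}$ is defined in (2.5) and $\tilde{\mathbf{g}}=[\tilde{\mathbf{h}}^T\ 0\ \cdots\ 0]^T$ is the zero-padded channel estimation error vector, with $\tilde{\mathbf{h}}=\hat{\mathbf{h}}-\mathbf{h}$. (This sign convention is opposite to that of $\tilde{\mathbf{G}}=\mathbf{G}-\hat{\mathbf{G}}$ in (4.3); the difference is immaterial here, because only second-order statistics of $\tilde{\mathbf{G}}$ enter the analysis.) Since $\tilde{\mathbf{g}}=\mathbf{J}_2\tilde{\mathbf{h}}$, where $\mathbf{J}_2$ is defined below (2.9), it is straightforward to verify from (4.5) that
$$E\{\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H\}=\sum_{n=0}^{N-1}p(n)^2\,\mathbf{J}^n\mathbf{J}_2E\{\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H\}\mathbf{J}_2^T(\mathbf{J}^T)^n. \tag{4.6}$$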
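Because $\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H$ is linear in $\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H$, (4.6) holds realization by realization and hence for any empirical average. A minimal sketch checking this, with random vectors standing in for $\tilde{\mathbf{h}}$ (an illustrative assumption); the helper name `weighted_error_cov` is hypothetical.

```python
import numpy as np


def weighted_error_cov(C_h: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Right-hand side of (4.6): sum_n p(n)^2 J^n J2 C_h J2^T (J^T)^n, with C_h = E{h~ h~^H}."""
    N, Lp1 = len(p2), C_h.shape[0]
    J = np.roll(np.eye(N), 1, axis=0)
    J2 = np.vstack([np.eye(Lp1), np.zeros((N - Lp1, Lp1))])
    M = J2 @ C_h @ J2.T
    out = np.zeros((N, N), dtype=complex)
    for n in range(N):
        Jn = np.linalg.matrix_power(J, n)
        out += p2[n] * Jn @ M @ Jn.T
    return out


N, L, delta, K = 8, 2, 0.9, 50
rng = np.random.default_rng(6)
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta
H_err = rng.standard_normal((K, L + 1)) + 1j * rng.standard_normal((K, L + 1))  # rows: h~^T

lhs = np.zeros((N, N), dtype=complex)
for h_err in H_err:
    g_err = np.concatenate([h_err, np.zeros(N - L - 1)])            # zero-padded g~
    G_err = np.column_stack([np.roll(g_err, n) for n in range(N)])  # (4.5)
    lhs += (G_err * p2) @ G_err.conj().T / K                        # average of G~ P^2 G~^H
C_h = H_err.T @ H_err.conj() / K                                    # average of h~ h~^H
print(np.allclose(lhs, weighted_error_cov(C_h, p2)))                 # True
```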

Equation (4.6) shows that $E\{\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H\}$ is completely determined by $E\{\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H\}$, namely, the covariance of the channel estimation error. Hence it remains to find a closed-form expression for $E\{\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H\}$ in terms of $T_p$. For this, recall from Section II-B that, if the exact covariance matrix $\mathbf{R}_y$ is available, the channel $\mathbf{h}$ is identified via a two-step approach: first compute the rank-one outer-product matrix (2.11), and then perform an eigendecomposition to find the associated dominant eigenvector. When only the sample covariance matrix $\hat{\mathbf{R}}_y$ in (4.2) is available, the channel mismatch $\tilde{\mathbf{h}}$ is entirely caused by the deviation of $\hat{\mathbf{R}}_y$ from the true $\mathbf{R}_y$. Although the outer product $\hat{\mathbf{h}}\hat{\mathbf{h}}^H$ is linear in the entries of $\hat{\mathbf{R}}_y$ (see (2.10)), the eigendecomposition step renders an exact expression for $\hat{\mathbf{h}}$ in terms of $\tilde{\mathbf{R}}_y=\hat{\mathbf{R}}_y-\mathbf{R}_y$ intractable. By assuming $T_p$ to be sufficiently large, so that the deviation $\tilde{\mathbf{R}}_y$ is small, we can nonetheless leverage matrix perturbation techniques [3], [21], [32] to find an approximate yet tractable expression; this in turn yields a formula for $E\{\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H\}$, and hence for $E\{\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H\}$, in terms of $T_p$. More specifically, first-order perturbation analysis gives the following linear relation between $\tilde{\mathbf{h}}$ and $\tilde{\mathbf{R}}_y$ (see Appendix C for a derivation):
$$\tilde{\mathbf{h}}=\mathbf{A}\operatorname{vec}(\tilde{\mathbf{R}}_y), \tag{4.7}$$
where
$$\mathbf{A}:=\frac{1}{\|\mathbf{h}\|^2}\boldsymbol{\Sigma}_h\boldsymbol{\Sigma}_h^H\left(\sum_{i=0}^{L}h(i)\,\mathbf{e}_i^T\otimes\mathbf{I}_{L+1}\right)(\tilde{\mathbf{Q}}^T\tilde{\mathbf{Q}})^{-1}\tilde{\mathbf{Q}}^T,$$
in which $\mathbf{e}_i$ denotes the $i$th column of $\mathbf{I}_{L+1}$ (columns indexed from 0) and $\boldsymbol{\Sigma}_h\in\mathbb{C}^{(L+1)\times L}$ is a matrix whose columns are orthonormal and span the $L$-dimensional subspace orthogonal to $\mathbf{h}$. Based on (4.7), we have the following lemma (see also Appendix C for a proof).

$$\begin{cases}\mathbf{y}_k=\mathbf{G}\mathbf{P}\mathbf{s}_k+\mathbf{v}_k, & 1\le k\le T_p\quad\text{(precoding and channel acquisition phase)}\\ \mathbf{y}_k=\mathbf{G}\mathbf{s}_k+\mathbf{v}_k, & T_p+1\le k\le T\quad\text{(direct data transmission phase)}\end{cases} \tag{4.1}$$

⁵ Through our simulation study, the normalized average bias per channel tap, namely, $\|E\{\hat{\mathbf{h}}-\mathbf{h}\}\|^2/[E\{\|\mathbf{h}\|^2\}(L+1)]$, is below $-75$ dB over a wide SNR range even if $T_p$ is as small as $T_p=100$. Hence, in the finite-sample case, $E\{\hat{\mathbf{h}}-\mathbf{h}\}=\mathbf{0}$ is a plausible assumption, and the lower bound (4.4) can be a valid capacity measure.

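Lemma 4.2: Assume that the source symbols are drawn from a complex constellation with a finite fourth-order cumulant $\kappa_{4s}$. Then
$$E\{\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H\}=\frac{1}{T_p}\mathbf{A}\left\{\kappa_{4s}(\mathbf{G}^*\mathbf{P}^*\,\square\,\mathbf{G}\mathbf{P})(\mathbf{G}^*\mathbf{P}^*\,\square\,\mathbf{G}\mathbf{P})^H+\mathbf{R}_y^*\otimes\mathbf{R}_y\right\}\mathbf{A}^H, \tag{4.8}$$
where the matrix $\mathbf{A}$ is defined in (4.7).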
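The bracketed matrix in (4.8) is the covariance of vec(y_k y_k^H). For noise-free QPSK sources (κ_4s = −1) it can be verified exactly by enumerating all 4^N equiprobable symbol vectors for a small N. A minimal sketch under these assumptions (N = 4, L = 1, two-level precoder with δ = 0.9); the helper name is hypothetical. It uses vec(y y^H) = y* ⊗ y and the column-wise Kronecker product defined in the Notation paragraph.

```python
import itertools

import numpy as np


def colwise_kron(X, Y):
    """X □ Y = [x_1 ⊗ y_1, ..., x_n ⊗ y_n]."""
    return (X[:, None, :] * Y[None, :, :]).reshape(X.shape[0] * Y.shape[0], X.shape[1])


N, L, delta = 4, 1, 0.9
rng = np.random.default_rng(7)
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
g = np.concatenate([h, np.zeros(N - L - 1)])
G = np.column_stack([np.roll(g, n) for n in range(N)])
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta
B = G * np.sqrt(p2)                                    # G P (P is real, so P* = P)
R_y = B @ B.conj().T                                   # noise-free covariance

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
kappa4 = (np.mean(np.abs(qpsk) ** 4) - 2 * np.mean(np.abs(qpsk) ** 2) ** 2
          - np.abs(np.mean(qpsk ** 2)) ** 2)          # fourth-order cumulant: -1 for QPSK

# Exact expectation: enumerate all 4^N equiprobable symbol vectors.
S = np.array(list(itertools.product(qpsk, repeat=N))).T   # N x 4^N
Y = B @ S
Z = colwise_kron(Y.conj(), Y)                          # column k is vec(y y^H) = y* ⊗ y
Zc = Z - Z.mean(axis=1, keepdims=True)
cov_exact = Zc @ Zc.conj().T / Z.shape[1]

KR = colwise_kron(B.conj(), B)                         # G*P* □ GP
cov_formula = kappa4 * KR @ KR.conj().T + np.kron(R_y.conj(), R_y)
print(round(kappa4, 12), np.allclose(cov_exact, cov_formula))   # -1.0 True
```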

Combining (4.6) and (4.8) leads to the crucial relation (4.9), which also defines the matrix $\mathbf{R}_e$. Equation (4.9) specifies the weighted channel error covariance matrix in terms of the precoding interval $T_p$; in particular, as $T_p$ increases, the error covariance decays at the rate $1/T_p$.
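The first-order relation (4.7) can be checked numerically against the exact estimator. A minimal sketch, assuming N = 8, L = 2, the two-level precoder with δ = 0.9, and a small random Hermitian perturbation of size 1e-6 in place of the finite-sample error $\tilde{\mathbf{R}}_y$. It forms $\boldsymbol{\Sigma}_h\boldsymbol{\Sigma}_h^H$ as the projector I − hh^H/‖h‖², and aligns the phase of ĥ so that h^H ĥ is real and positive, which matches the normalization implicit in the first-order eigenvector expansion. Since A contains this projector, only the component of ĥ orthogonal to h is compared with A vec($\tilde{\mathbf{R}}_y$). The helper names are hypothetical.

```python
import numpy as np


def build_Q_tilde(p2, L):
    """Q~ = sum_n p(n)^2 (J^n ⊗ J^n) (J2 ⊗ J2), cf. (2.7) and (2.9)."""
    N = len(p2)
    J = np.roll(np.eye(N), 1, axis=0)
    Q = sum(p2[n] * np.kron(np.linalg.matrix_power(J, n), np.linalg.matrix_power(J, n))
            for n in range(N))
    J2 = np.vstack([np.eye(L + 1), np.zeros((N - L - 1, L + 1))])
    return Q @ np.kron(J2, J2)


def blind_estimate(R_y, p2, L):
    """Scaled dominant eigenvector of the least-squares estimate of h h^H, cf. (2.10), (2.11)."""
    vec_H, *_ = np.linalg.lstsq(build_Q_tilde(p2, L).astype(complex),
                                R_y.reshape(-1, order="F"), rcond=None)
    H = vec_H.reshape(L + 1, L + 1, order="F")
    w, V = np.linalg.eigh((H + H.conj().T) / 2)
    return np.sqrt(max(w[-1], 0.0)) * V[:, -1]


def perturbation_matrix_A(h, p2):
    """A in (4.7), with Sigma_h Sigma_h^H = I - h h^H / ||h||^2."""
    Lp1 = len(h)
    Qt = build_Q_tilde(p2, Lp1 - 1)
    ls = np.linalg.solve(Qt.T @ Qt, Qt.T)                           # (Q~^T Q~)^{-1} Q~^T
    nrm2 = np.vdot(h, h).real
    proj = np.eye(Lp1) - np.outer(h, h.conj()) / nrm2
    E = np.eye(Lp1)
    S = sum(h[i] * np.kron(E[i][None, :], np.eye(Lp1)) for i in range(Lp1))  # sum_i h(i) e_i^T ⊗ I
    return proj @ S @ ls / nrm2


N, L, delta, eps = 8, 2, 0.9, 1e-6
rng = np.random.default_rng(8)
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
g = np.concatenate([h, np.zeros(N - L - 1)])
G = np.column_stack([np.roll(g, n) for n in range(N)])
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta
R_y = G @ np.diag(p2) @ G.conj().T

W = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R_err = eps * (W + W.conj().T) / 2                                  # small Hermitian R~_y
h_hat = blind_estimate(R_y + R_err, p2, L)
h_hat = h_hat * np.exp(-1j * np.angle(np.vdot(h, h_hat)))           # make h^H h_hat real, positive

proj = np.eye(L + 1) - np.outer(h, h.conj()) / np.vdot(h, h).real
first_order = perturbation_matrix_A(h, p2) @ R_err.reshape(-1, order="F")
rel_err = np.linalg.norm(proj @ h_hat - first_order) / np.linalg.norm(first_order)
print(rel_err < 1e-3)                                                # True
```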

C. Capacity Lower Bound

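With (4.4) and (4.9), the ergodic capacity lower bound (per block transmission) during the precoding phase is
$$I_p(T_p)=E\left\{\log\det\left[\mathbf{I}+\left(\mathbf{R}_e/T_p+\sigma_v^2\mathbf{I}\right)^{-1}\hat{\mathbf{G}}\mathbf{P}^2\hat{\mathbf{G}}^H\right]\right\}; \tag{4.10}$$
similarly, the corresponding bound for the direct data transmission phase is obtained from (4.10) by setting $\mathbf{P}=\mathbf{I}$:
$$I_d(T_p)=E\left\{\log\det\left[\mathbf{I}+\left(\mathbf{R}_e/T_p+\sigma_v^2\mathbf{I}\right)^{-1}\hat{\mathbf{G}}\hat{\mathbf{G}}^H\right]\right\}. \tag{4.11}$$
Based on (4.10) and (4.11), the average ergodic capacity lower bound over the entire $T$ symbol block periods is
$$\begin{aligned}I(T_p)&=\frac{T_p}{T}I_p(T_p)+\frac{T-T_p}{T}I_d(T_p)\\&=\frac{T_p}{T}E\left\{\log\det\left[\mathbf{I}+\left(\mathbf{R}_e/T_p+\sigma_v^2\mathbf{I}\right)^{-1}\hat{\mathbf{G}}\mathbf{P}^2\hat{\mathbf{G}}^H\right]\right\}+\frac{T-T_p}{T}E\left\{\log\det\left[\mathbf{I}+\left(\mathbf{R}_e/T_p+\sigma_v^2\mathbf{I}\right)^{-1}\hat{\mathbf{G}}\hat{\mathbf{G}}^H\right]\right\}.\end{aligned}\tag{4.12}$$
The problem of selecting $T_p$ to maximize $I(T_p)$ is addressed in the next section.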
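For a single channel realization, the time-averaged bound (4.12) can be evaluated and maximized by exhaustive search over the integers 1 < T_p ≤ T. A minimal sketch with hypothetical helper names; assumptions not in the text: the expectation is dropped, $\hat{\mathbf{G}}$ is taken as given, and $\mathbf{R}_e$ is supplied by the caller rather than computed from (4.9) (the example uses the placeholder $\mathbf{R}_e$ = 5I purely for illustration). Logarithms are taken to base 2, so the values are in bits per block.

```python
import numpy as np


def log2det_bound(G_hat, p2, R_e, T_p, noise_var):
    """log2 det(I + [R_e/T_p + sigma^2 I]^{-1} G_hat P^2 G_hat^H), cf. (4.10)."""
    N = G_hat.shape[0]
    W = R_e / T_p + noise_var * np.eye(N)            # effective noise covariance
    S = (G_hat * p2) @ G_hat.conj().T                # G_hat P^2 G_hat^H
    return np.linalg.slogdet(np.eye(N) + np.linalg.solve(W, S))[1] / np.log(2)


def capacity_measure(T_p, T, G_hat, p2, R_e, noise_var):
    """(T_p/T) I_p + ((T - T_p)/T) I_d, cf. (4.12), for one realization."""
    I_p = log2det_bound(G_hat, p2, R_e, T_p, noise_var)                # (4.10)
    I_d = log2det_bound(G_hat, np.ones_like(p2), R_e, T_p, noise_var)  # (4.11): P = I
    return (T_p / T) * I_p + ((T - T_p) / T) * I_d


def best_precoding_interval(T, G_hat, p2, R_e, noise_var):
    """Exhaustive search over integer T_p with 1 < T_p <= T."""
    candidates = list(range(2, T + 1))
    values = [capacity_measure(t, T, G_hat, p2, R_e, noise_var) for t in candidates]
    k = int(np.argmax(values))
    return candidates[k], values[k]


N, L, delta, T, noise_var = 8, 2, 0.9, 200, 0.01
rng = np.random.default_rng(9)
h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2 * (L + 1))
g = np.concatenate([h, np.zeros(N - L - 1)])
G_hat = np.column_stack([np.roll(g, n) for n in range(N)])
p2 = np.full(N, delta)
p2[0] = N * (1 - delta) + delta
R_e = 5.0 * np.eye(N)                                 # placeholder, not computed from (4.9)

T_p_best, value = best_precoding_interval(T, G_hat, p2, R_e, noise_var)
print(T_p_best, value)
```

With P = I the two phases coincide and the bound only grows with T_p, so the search returns T_p = T; the tradeoff appears only when precoding costs capacity.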

We note that the capacity measure (4.12) is a function not only of the precoding interval $T_p$ but also of the precoder coefficients $p(n)$ (in particular, of the power threshold $\delta$ if the two-level precoder (2.16) is used). Hence, true capacity maximization should be performed by jointly optimizing over both $T_p$ and $\delta$. However, as can be seen from (4.9), the channel error covariance $\mathbf{R}_e$ involves products of column-wise Kronecker products of the precoding matrix, and is therefore a function of $p(n)^2p(i)p(j)p(k)p(m)$. This shows that the capacity metric (4.12) is highly nonlinear in $\delta$, and joint optimization of $I(T_p)$ over both $T_p$ and $\delta$ appears intractable. To overcome this difficulty, we adopt a reasonable suboptimal approach in the sequel: we determine the best $T_p$ for a fixed $\delta$ (as we will see, even this suboptimal problem turns out to be nontrivial to analyze).

V. SELECTION OF PRECODING INTERVAL

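To facilitate subsequent analysis and discussion, let us define
$$\tau_p=T_p/T,\quad 0<\tau_p\le 1, \tag{5.1}$$
as the relative time fraction of the precoding phase, normalized with respect to the coherent interval $T$. We can thus alternatively express the capacity lower bound (4.12) in terms of $\tau_p$ as
$$\begin{aligned}I(\tau_p)=&\ \tau_pE\left\{\log\det\left[\mathbf{I}+\left(\mathbf{R}_e/(\tau_pT)+\sigma_v^2\mathbf{I}\right)^{-1}\hat{\mathbf{G}}\mathbf{P}^2\hat{\mathbf{G}}^H\right]\right\}\\&+(1-\tau_p)E\left\{\log\det\left[\mathbf{I}+\left(\mathbf{R}_e/(\tau_pT)+\sigma_v^2\mathbf{I}\right)^{-1}\hat{\mathbf{G}}\hat{\mathbf{G}}^H\right]\right\}.\end{aligned}\tag{5.2}$$
For a given $T$, the optimal tradeoff problem can be formulated as maximizing $I(\tau_p)$ in (5.2) with respect to $0<\tau_p\le 1$.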

We note that, since both $T_p$ and $T$ are positive integers, $0<\tau_p\le 1$ is a rational number. To ease analysis, $\tau_p$ is relaxed to be a positive real number; once the best such $\tau_p$ is found, the corresponding (possibly suboptimal) $T_p$ can be taken as $\lfloor\tau_pT\rfloor$. An exact closed-form solution to the considered optimization problem, however, is formidable to derive, since the objective function (5.2) is highly nonlinear in $\tau_p$. Although one could instead resort to a numerical search for a solution, in what follows we aim to characterize the optimal $\tau_p$ analytically. Specifically, we first derive an approximate, but more tractable, expression for the capacity lower bound in (5.2). The result leads to simple and insightful procedures for determining a closed-form estimate of the optimal $\tau_p$.

$$E\{\tilde{\mathbf{G}}\mathbf{P}^2\tilde{\mathbf{G}}^H\}=\frac{1}{T_p}\underbrace{\sum_{n=0}^{N-1}p(n)^2\,\mathbf{J}^n\mathbf{J}_2\mathbf{A}\left\{\kappa_{4s}(\mathbf{G}^*\mathbf{P}^*\,\square\,\mathbf{G}\mathbf{P})(\mathbf{G}^*\mathbf{P}^*\,\square\,\mathbf{G}\mathbf{P})^H+\mathbf{R}_y^*\otimes\mathbf{R}_y\right\}\mathbf{A}^H\mathbf{J}_2^T(\mathbf{J}^T)^n}_{:=\mathbf{R}_e} \tag{4.9}$$

A. Approximate Capacity Lower Bound Expression

The proposed approach is based on the key result shown in the next lemma (see Appendix D for a proof).

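Lemma 5.1: Let $I(\tau_p)$ be defined in (5.2). Assuming that the SNR is high, so that $\sigma_v^2$ is small, and that⁶ $\mathbf{I}/T\ll\sigma_v^2\tau_p\mathbf{R}_e^{-1}$, we have the following approximation:
$$\begin{aligned}I(\tau_p)\approx{}&E\left\{\log\det\left[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H\right]\right\}\\&+\frac{1}{T\sigma_v^4}E\left\{\operatorname{Tr}\left[(\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H)^{-1}\mathbf{R}_e\mathbf{G}\mathbf{G}^H-(\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{P}^2\mathbf{G}^H)^{-1}\mathbf{R}_e\mathbf{G}\mathbf{P}^2\mathbf{G}^H\right]\right\}-f(\tau_p),\end{aligned}\tag{5.3}$$
where
$$f(\tau_p):=\alpha\tau_p+\frac{\beta}{\tau_p} \tag{5.4}$$
with
$$\alpha:=E\left\{\log\det\left[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H\right]\right\}-E\left\{\log\det\left[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{P}^2\mathbf{G}^H\right]\right\} \tag{5.5}$$
and
$$\beta:=\frac{1}{T\sigma_v^4}E\left\{\operatorname{Tr}\left[(\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H)^{-1}\mathbf{R}_e\mathbf{G}\mathbf{G}^H\right]\right\}. \tag{5.6}$$
Several important comments are in order.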

1. Lemma 5.1 is quite appealing in that the dependence of $I(\tau_p)$ on the design parameter $\tau_p$ is completely characterized by $f(\tau_p)$ defined in (5.4). Compared with $I(\tau_p)$ in (5.2), $f(\tau_p)$ is a simple rational function of $\tau_p$; this attractive feature facilitates analytical study of the performance tradeoff, as will be shown later. We also note that, even though the approximation (5.3) is derived under the high-SNR assumption, the analytic estimate based on (5.3) predicts the true optimal solution, as well as the resulting capacity performance, well over a wide SNR range (as will be seen in the simulation section).

2. We observe that $\alpha$ in (5.5) represents the capacity gap between two Gaussian channels characterized by the channel matrices $\mathbf{G}$ and $\mathbf{G}\mathbf{P}$, respectively. Based on the discussions in Section III, we have $\alpha \ge 0$, with equality attained when $\mathbf{P} = \mathbf{I}$. Roughly speaking, we can think of $\alpha$ as the worst-case average capacity penalty induced by precoding with $T_p = T$. In light of this point, and since $0 < \tau_p \le 1$, the first term $\alpha\tau_p$ in $f(\tau_p)$ reflects the proportional reduction in this penalty when symbol precoding is performed over only a $\tau_p$ fraction of the coherent time.

3. On the other hand, the quantity $\beta$ in (5.6) accounts for the channel mismatch effect, and can be treated as the minimal capacity loss incurred by channel estimation errors (also attained when $T_p = T$). The second term $\beta\tau_p^{-1}$ in (5.4), therefore, specifies how much the capacity penalty grows beyond this minimum when only a $\tau_p$ portion of the coherent interval is spent on symbol precoding to aid channel estimation.

4. With the above facts in mind, $\alpha\tau_p$ and $\beta\tau_p^{-1}$ specify the capacity loss due to, respectively, precoding over a $\tau_p$ fraction of the coherent interval and the resulting channel estimation errors. The term $f(\tau_p)$ in (5.4) can therefore be regarded as the aggregate capacity penalty incurred by the precoding-based blind channel estimation scheme [31].

⁶ By $\mathbf{A} \ll \mathbf{B}$ ($\mathbf{A}$ and $\mathbf{B}$ both Hermitian and positive definite) we mean that (i) $\mathbf{B}-\mathbf{A}$ is positive definite, and (ii) if the eigenvalues of $\mathbf{A}$ and $\mathbf{B}$ are arranged in the same (either increasing or decreasing) order, then $\lambda_k(\mathbf{A}) \ll \lambda_k(\mathbf{B})$ (we note that the positive definiteness of $\mathbf{B}-\mathbf{A}$ does guarantee $\lambda_k(\mathbf{A}) < \lambda_k(\mathbf{B})$ [13, p. 471]). Hence, if $\mathbf{A} \ll \mathbf{B}$, we have $\|\mathbf{A}\|_F^2 = \sum_{k=1}^{n}\lambda_k^2(\mathbf{A}) \ll \sum_{k=1}^{n}\lambda_k^2(\mathbf{B}) = \|\mathbf{B}\|_F^2$ and $\|\mathbf{A}\|_2^2 = \max_k\lambda_k^2(\mathbf{A}) \ll \max_k\lambda_k^2(\mathbf{B}) = \|\mathbf{B}\|_2^2$, which are two commonly used conditions for "$\mathbf{A}$ is small compared with $\mathbf{B}$" in the context of matrix perturbation analysis.

With the aid of Lemma 5.1 and the above interpretations of $f(\tau_p)$, there is a simple yet insightful way of finding an analytic estimate of the optimal $\tau_p$, as shown below.
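One way to see where (5.3) comes from is to replace $\hat{\mathbf{G}}$ by $\mathbf{G}$ in (5.2) and expand each log-det term to first order in $\epsilon = 1/(\tau_p T)$, using $\frac{d}{d\epsilon}\log\det[\mathbf{I}+(\sigma_v^2\mathbf{I}+\epsilon\mathbf{R}_e)^{-1}\mathbf{M}]\big|_{\epsilon=0} = -\sigma_v^{-4}\mathrm{Tr}[(\mathbf{I}+\sigma_v^{-2}\mathbf{M})^{-1}\mathbf{R}_e\mathbf{M}]$; collecting terms reproduces (5.3)–(5.6). This is our reading, not necessarily the route taken in Appendix D. The NumPy sketch below checks this expansion numerically, with random Hermitian positive semidefinite matrices as hypothetical stand-ins for $\mathbf{G}\mathbf{G}^H$ and $\mathbf{R}_e$; the residual of the first-order formula should shrink roughly 100-fold when $\epsilon$ shrinks 10-fold.

```python
import numpy as np

def capacity_with_mismatch(M, E, sigma2, eps):
    """log det[I + (eps*E + sigma2*I)^{-1} M], natural log."""
    n = M.shape[0]
    noise_cov = eps * E + sigma2 * np.eye(n)
    return np.linalg.slogdet(np.eye(n) + np.linalg.solve(noise_cov, M))[1]

def first_order(M, E, sigma2, eps):
    """log det[I + M/sigma2] - (eps/sigma2^2) * Tr[(I + M/sigma2)^{-1} E M]."""
    n = M.shape[0]
    A = np.eye(n) + M / sigma2
    base = np.linalg.slogdet(A)[1]
    slope = np.trace(np.linalg.solve(A, E @ M)).real / sigma2**2
    return base - eps * slope

rng = np.random.default_rng(1)
n, sigma2 = 8, 0.1

def random_psd(size):
    """Random Hermitian positive semidefinite matrix X X^H."""
    X = (rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))) / np.sqrt(2)
    return X @ X.conj().T

M = random_psd(n)  # hypothetical stand-in for G G^H
E = random_psd(n)  # hypothetical stand-in for R_e

errors = {}
for eps in (1e-5, 1e-6):  # eps plays the role of 1/(tau_p * T)
    errors[eps] = capacity_with_mismatch(M, E, sigma2, eps) - first_order(M, E, sigma2, eps)
    print(f"eps={eps:.0e}: residual = {errors[eps]:.3e}")

# A first-order expansion leaves an O(eps^2) residual.
print("residual ratio:", errors[1e-5] / errors[1e-6])
```

The small values of $\epsilon$ are chosen so that $\epsilon\mathbf{R}_e$ stays small relative to $\sigma_v^2\mathbf{I}$, mirroring the condition $\mathbf{I}/T \ll \sigma_v^2\tau_p\mathbf{R}_e^{-1}$ in Lemma 5.1.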

B. Optimal Performance Tradeoff: An Analytic Characterization

Based on (5.3) and (5.4), the first-order derivative of $I(\tau_p)$ with respect to $\tau_p$ can be approximated as

$$I'(\tau_p) \approx -f'(\tau_p) = (\beta - \alpha\tau_p^2)\,\tau_p^{-2}. \tag{5.7}$$

With (5.7) it is straightforward to show that the maximum of $I(\tau_p)$ occurs near

$$\bar{\tau}_p := \sqrt{\beta/\alpha}. \tag{5.8}$$
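For completeness, the calculus behind (5.8), assuming $\alpha > 0$ and $\beta > 0$:

$$f'(\tau_p) = \alpha - \frac{\beta}{\tau_p^{2}}, \qquad f''(\tau_p) = \frac{2\beta}{\tau_p^{3}} > 0 \quad \text{for } \tau_p > 0,$$

so $f$ is strictly convex on $(0,\infty)$ and $f'(\tau_p) = 0$ has the unique positive root $\bar{\tau}_p = \sqrt{\beta/\alpha}$. Since $I(\tau_p) \approx \text{const} - f(\tau_p)$ by (5.3), this root is the approximate maximizer of $I(\tau_p)$. In the degenerate case $\alpha = 0$ (i.e., $\mathbf{P} = \mathbf{I}$), $f'(\tau_p) = -\beta/\tau_p^2 < 0$ and $f$ is decreasing on $(0, 1]$.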

When $\tau_p < \sqrt{\beta/\alpha}$, (5.7) indicates that $I'(\tau_p) > 0$, so the capacity lower bound increases with $\tau_p$. As $\tau_p$ is enlarged beyond $\sqrt{\beta/\alpha}$, we have $I'(\tau_p) < 0$, and $I(\tau_p)$ instead becomes a decreasing function of $\tau_p$. Since we are only concerned with $0 < \tau_p \le 1$, the choice of $\tau_p$ that maximizes the capacity lower bound depends on whether $\sqrt{\beta/\alpha}$ exceeds unity.

Case 1: If $\sqrt{\beta/\alpha} \ge 1$, and hence $\alpha \le \beta$, i.e., the worst-case average capacity loss due to precoding is less severe than the minimal capacity penalty caused by channel mismatch, then $I(\tau_p)$ is an increasing function on $0 < \tau_p \le 1$. This implies that we should simply set $\tau_p = 1$, i.e., precode the symbols throughout the entire coherent interval, to maximize $I(\tau_p)$. This is intuitively reasonable since, when the channel mismatch effect is more deleterious, a long precoding interval is needed to reduce the channel estimation errors and, in turn, enlarge the total capacity.

Case 2: If $\sqrt{\beta/\alpha} < 1$, and hence $\alpha > \beta$, meaning that the precoding-induced capacity loss is more harmful, then placing $\tau_p$ at $\sqrt{\beta/\alpha}$ attains the maximal capacity lower bound. This reflects the fact that, when the precoding effect is more detrimental, symbol precoding throughout the entire coherent interval should be avoided; rather, the fraction of the precoding phase should be limited to $\sqrt{\beta/\alpha}$ in order to realize the largest capacity gain.

Based on the above discussions, the proposed closed-form estimate of the capacity-maximizing time fraction for precoding is

$$\tilde{\tau}_p = \min\big\{1, \sqrt{\beta/\alpha}\big\}, \tag{5.9}$$

where $\alpha$ and $\beta$ are defined in (5.5) and (5.6), respectively. We note that both $\alpha$ and $\beta$ involve averages over the true channel realizations, and can thus be computed off-line once the channel statistics (e.g., Gaussian) are known. The estimated optimal precoding interval is then given by $\lfloor T\tilde{\tau}_p \rfloor$. The accuracy of the analytic estimate (5.9) is assessed through numerical simulations in the next section.
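A minimal Python sketch of the selection rule (5.9) and the rounding $T_p = \lfloor T\tilde{\tau}_p \rfloor$, assuming $\alpha$ and $\beta$ have already been computed off-line. The $\alpha = 0$ branch and the guard $T_p \ge 1$ are our own additions, and the numbers in the usage lines are hypothetical; the grid search over $f(\tau_p)$ in (5.4) simply confirms that the closed form lands on the minimizer of the aggregate penalty.

```python
import math

def precoding_fraction(alpha: float, beta: float) -> float:
    """Closed-form estimate (5.9): tau_tilde = min{1, sqrt(beta/alpha)}."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta are expected to be nonnegative")
    if alpha == 0.0:
        # P = I gives alpha = 0; f(tau) = beta/tau is then decreasing, so tau = 1.
        return 1.0
    return min(1.0, math.sqrt(beta / alpha))

def precoded_blocks(alpha: float, beta: float, T: int) -> int:
    """T_p = floor(T * tau_tilde); the max(1, .) guard is not part of the text."""
    return max(1, math.floor(T * precoding_fraction(alpha, beta)))

def aggregate_penalty(tau: float, alpha: float, beta: float) -> float:
    """f(tau) = alpha*tau + beta/tau, as in (5.4)."""
    return alpha * tau + beta / tau

# Hypothetical values, not taken from the simulations: alpha = 4, beta = 1.
alpha, beta, T = 4.0, 1.0, 2000
grid = [k / 1000 for k in range(1, 1001)]  # tau in (0, 1]
best_on_grid = min(grid, key=lambda t: aggregate_penalty(t, alpha, beta))
print(precoding_fraction(alpha, beta), best_on_grid, precoded_blocks(alpha, beta, T))
# prints: 0.5 0.5 1000
```

When $\beta \ge \alpha$ (Case 1), `precoding_fraction` returns 1 and every block in the coherent interval is precoded.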

Remark: Lemma 5.1, together with the analytic estimate (5.9) of the optimal precoding fraction, allows us to investigate the achievable capacity in the asymptotic regime $T \to \infty$. Assume that the optimal precoding fraction $\tilde{\tau}_p$ is adopted. Then, from (5.3), the difference between the idealized capacity $E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\}$ and the capacity lower bound $I(\tilde{\tau}_p)$ as $T \to \infty$ can be immediately obtained as (5.10), shown at the bottom of this page. Also, from (5.9) and the definitions of $\alpha$ and $\beta$ in (5.5) and (5.6), it is easy to see that $\lim_{T\to\infty}\tilde{\tau}_p = 0$. This result, together with (5.10), asserts

$$E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\} - I(\tilde{\tau}_p) \approx 0, \quad \text{as } T \to \infty, \tag{5.11}$$

i.e., the achievable capacity of the considered blind scheme [31] converges to the idealized performance. An intuitive reason for (5.11) is that, when the length of the coherent interval grows without bound ($T \to \infty$), an arbitrarily small precoding fraction $\tilde{\tau}_p$ can provide enough precoded data blocks to obtain a quite accurate channel estimate. As a result, the capacity penalty caused by both precoding and channel mismatch can be kept arbitrarily small as $T \to \infty$. Our simulation results (see Section VI-C) confirm this tendency.
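Making the $T$-dependence in (5.9) explicit shows why $\tilde{\tau}_p \to 0$. From (5.6), $\beta = \beta_1/T$, where (in our notation)

$$\beta_1 := \frac{1}{\sigma_v^4}E\big\{\mathrm{Tr}[(\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H)^{-1}\mathbf{R}_e\mathbf{G}\mathbf{G}^H]\big\},$$

which, like $\alpha$ in (5.5), does not depend on $T$ (assuming, as the definition (4.9) indicates, that $\mathbf{R}_e$ itself does not involve $T$). Hence, for $T > \beta_1/\alpha$,

$$\tilde{\tau}_p = \sqrt{\frac{\beta_1}{\alpha T}}, \qquad \alpha\tilde{\tau}_p = \frac{\beta}{\tilde{\tau}_p} = \sqrt{\frac{\alpha\beta_1}{T}},$$

so both terms of $f(\tilde{\tau}_p)$ decay as $T^{-1/2}$.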

C. On Selection of the Threshold $\delta$

As mentioned in the last paragraph of Section IV, the capacity lower bound (4.12) is a highly nonlinear function of the power threshold $\delta$; the effect of $\delta$ on the capacity performance is thus extremely difficult to characterize. To provide some guidelines on the selection of $\delta$ while keeping the analysis tractable, a plausible approach is to focus on a special case in which the dependence of the lower bound (4.12) on $\tau_p$ (or $T_p$) is suppressed. Specifically, we consider the situation $\tau_p = 1$ (or $T_p = T$), i.e., the entire coherent interval is used for precoding; according to our simulations, this scenario typically occurs when the coherent interval is not large, and thus corresponds to the more realistic mobile environment. Based on (5.3) with $\tau_p = 1$, the gap between the idealized capacity $E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\}$ and the lower bound $I(1)$ can be approximated as (5.12), shown at the bottom of this page (see Appendix E for a derivation). A very rough interpretation of (5.12) is that, when the entire coherent interval⁷ is dedicated to precoding, the channel estimation error can be mitigated, and the performance gap is mainly caused by the capacity loss due to precoding. In this case, the achievable capacity can be enhanced if the negative effect of precoding is reduced. In Section III-A it has been shown that $E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{P}^2\mathbf{G}^H]\} \le E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\}$, with equality attained by the uniform precoding scheme $p(n) = 1$, $0 \le n \le N-1$. Hence, to limit the capacity loss when the two-level precoder (2.16) is used, we should enlarge the power threshold $\delta$ so that the resulting power pattern is close to uniform; however, the selection $\delta = 1$, which results in uniform power allocation, should be precluded since it renders the channel unidentifiable (cf. footnote 3). Hence, small $\delta$ should be avoided in the algorithm implementation. We note that the exact $\delta$ that yields the maximal capacity remains difficult to characterize even in this special case; the best $\delta$ is a highly complicated function of several system parameters, e.g., the symbol block length $N$, the coherent interval $T$, and the SNR. Through simulation, a rule of thumb is $0.6 \le \delta \le 0.9$ ($\delta \ge 0.9$ usually leads to large channel estimation errors and is thus not plausible from the equalization perspective, cf. [31]).
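To make the $\delta$ discussion concrete, the sketch below estimates $\alpha$ in (5.5), which is also the right-hand side of (5.12), by Monte Carlo for several $\delta$. Several assumptions are ours and are not taken from the text: $\mathbf{G}$ is modeled as the $N\times N$ circulant matrix left after CP removal, built from $L+1$ i.i.d. $\mathcal{CN}(0,1)$ taps with $N = 32$ and $L = 8$ as in Section VI; the power profile `two_level_profile` puts $N-(N-1)\delta$ on one entry and $\delta$ on the remaining $N-1$ entries, a hypothetical two-level pattern satisfying $\sum_n p(n)^2 = N$ that is not claimed to match (2.16); and $\sigma_v^2 = 10^{-2.5}$ is an assumed mapping for 25 dB. With common channel draws across $\delta$, the estimated penalty should shrink as $\delta$ moves toward 1 and vanish for $\mathbf{P} = \mathbf{I}$.

```python
import numpy as np

def circulant_channel(h, N):
    """Assumed N x N circulant channel with first column [h_0, ..., h_L, 0, ..., 0]."""
    c = np.zeros(N, dtype=complex)
    c[: len(h)] = h
    idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
    return c[idx]

def two_level_profile(N, delta):
    """Hypothetical two-level power profile p(n)^2 whose entries sum to N."""
    p2 = np.full(N, delta)
    p2[0] = N - (N - 1) * delta
    return p2

def estimate_alpha(p2, sigma2, L=8, trials=500, seed=0):
    """Monte Carlo estimate of alpha in (5.5), in bits per block."""
    N = len(p2)
    rng = np.random.default_rng(seed)  # same seed -> common channel draws
    eye = np.eye(N)
    total = 0.0
    for _ in range(trials):
        # L+1 i.i.d. zero-mean, unit-variance complex Gaussian taps
        h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2)
        G = circulant_channel(h, N)
        GP = G * np.sqrt(p2)[None, :]  # G @ diag(p), i.e., G P
        c_full = np.linalg.slogdet(eye + G @ G.conj().T / sigma2)[1]
        c_prec = np.linalg.slogdet(eye + GP @ GP.conj().T / sigma2)[1]
        total += (c_full - c_prec) / np.log(2)  # nats -> bits
    return total / trials

N, L = 32, 8
sigma2 = 10 ** (-25 / 10)  # assumed noise variance for "25 dB"
alphas = {d: estimate_alpha(two_level_profile(N, d), sigma2, L) for d in (0.3, 0.6, 0.9)}
for d, a in alphas.items():
    print(f"delta={d}: alpha ~ {a:.2f} bits/block, {a / (N + L):.3f} bits/channel use")
```

Dividing by $N + L$ converts the penalty to bits per channel use, the unit used in the figures of Section VI.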

VI. SIMULATION RESULTS

This section provides numerical results corroborating the proposed analytic guidelines for $\tau_p$ selection. In each coherent interval, the channel taps are drawn as i.i.d. zero-mean, unit-variance complex Gaussian random variables. As in [31], the system parameters are set as $N = 32$ and $L = 8$ (the CP length equals the channel order $L$), the symbol constellation is QPSK, and the optimal noise-resistant precoder (2.16) is used for channel estimation. To remove the scalar ambiguity, we perform a least-squares fit between the computed dominant eigenvector and the true channel; such a technique has been adopted in, e.g., [9], [16], [17], for fixing the channel estimate. The capacity lower bounds in all figures are plotted in bits per channel use, that is, $I(\tau_p)/(N+L)$. In Simulations A–C, the threshold of the precoder (2.16) is set to $\delta = 0.9$.

⁷ Since (5.12) is derived on the basis of Lemma 5.1, the coherent interval is implicitly assumed to be not too small, say, $T > 100$, so that the channel estimate is reasonably accurate and the perturbation analysis is valid (cf. the discussions before Lemma 4.1 and footnote 5).

$$E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\} - I(\tilde{\tau}_p) \approx \tilde{\tau}_p\Big[E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\} - E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{P}^2\mathbf{G}^H]\}\Big] \tag{5.10}$$

$$E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\} - I(1) \approx E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\} - E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{P}^2\mathbf{G}^H]\} \tag{5.12}$$

TABLE I
THE ESTIMATED AND OPTIMAL PRECODING INTERVALS (SNR = 25 dB).

$T$ | 200 | 2000 | 8000 | 20000
$\sqrt{\beta/\alpha}$ | 12.23 | 3.54 | 2.84 | 1.22
Estimated $\tau_p$ by (5.9) | 1 | 1 | 1 | 1
Optimal $\tau_p$ by simulation | 1 | 1 | 1 | 1

TABLE II
THE ESTIMATED AND OPTIMAL PRECODING INTERVALS (SNR = 0 dB).

$T$ | 200 | 2000 | 8000 | 20000
$\sqrt{\beta/\alpha}$ | 1.22 | 0.55 | 0.32 | 0.21
Estimated $\tau_p$ by (5.9) | 1 | 0.55 | 0.32 | 0.21
Optimal $\tau_p$ by simulation | 1 | 0.60 | 0.35 | 0.25

Fig. 1. Capacity lower bounds for different coherent intervals (SNR = 25 dB). [Four panels: $T$ = 200, 2000, 8000, 20000; horizontal axis $\tau_p$, vertical axis capacity lower bound.]

Fig. 2. Capacity lower bounds for different coherent intervals (SNR = 0 dB). [Four panels: $T$ = 200, 2000, 8000, 20000; horizontal axis $\tau_p$.]

A. Precoding Interval Selection in the High SNR Regime

We set SNR = 25 dB and consider four coherent intervals: $T$ = 200, 2000, 8000, and 20000. The value of $\sqrt{\beta/\alpha}$ associated with each $T$ is computed and listed in Table I. As the table shows, $\sqrt{\beta/\alpha}$ exceeds unity for all $T$. Hence, based on (5.9), $\tau_p = 1$ is expected to maximize the capacity lower bound. Figure 1 plots the experimental $I(\tau_p)$ (computed from (5.2) by averaging the results of Monte Carlo trials) for the four choices of $T$. All the curves of $I(\tau_p)$ increase with $\tau_p$ and attain their maximum at $\tau_p = 1$, which confirms the analytical study. A rough interpretation of this tendency is that, when the SNR is high, the effective background noise in the system (4.3) is dominated by channel estimation errors, and the achievable capacity can be improved by mitigating channel uncertainty through a long precoding period.

Fig. 3. Achievable capacity lower bounds versus SNR for different coherent intervals. [Horizontal axis: SNR (dB), 0 to 25; vertical axis: achievable capacity (bits/channel use); curves: benchmark, and simulated and predicted results for $T$ = 200, 2000, 8000, 20000.]

B. Precoding Interval Selection in the Low SNR Regime

We repeat the above experiment with SNR = 0 dB. The computed values of $\sqrt{\beta/\alpha}$ for the four values of $T$ are listed in Table II. The results show that $\sqrt{\beta/\alpha} > 1$ when $T = 200$; as $T$ increases, $\sqrt{\beta/\alpha}$ gradually falls below unity. To attain the maximal capacity bound, (5.9) thus implies that we should set $\tau_p = 1$ for $T = 200$, and place $\tau_p$ at $\sqrt{\beta/\alpha}$ for the other three values of $T$. Figure 2 plots the experimental $I(\tau_p)$, from which the true optimal $\tau_p$ for each $T$ is obtained (last row of Table II); the comparison shows good agreement between the estimated solutions via (5.9) and the optimal $\tau_p$. Even though a slight discrepancy remains, as we will see in the next simulation, the difference between the resulting capacity performances turns out to be negligible. A plausible rationale for the capacity tendency seen in Figure 2 is that, when the SNR is low and $T$ is small, the quality of channel estimation is likely to be quite poor and becomes the dominant factor in capacity loss. The entire coherent interval should then be dedicated to precoding to improve channel estimation accuracy, and $\tau_p = 1$ thus maximizes the capacity bound. However, as $T$ gets larger, spending too much of the coherent interval on precoding cannot substantially reduce the channel estimation errors (since the error covariance decays only at the rate $1/T_p = 1/(\tau_p T)$, cf. (4.8)), but rather enlarges the precoding-induced penalty. As a result, $\tau_p$ should be kept below unity so that the maximal capacity bound can be attained.

Fig. 4. Achievable capacity lower bounds versus SNR for precoding thresholds $0.1 \le \delta \le 0.9$. [Horizontal axis: SNR (dB); curves: $\delta$ = 0.1, 0.3, 0.6, 0.9.]

Fig. 5. Achievable capacity lower bounds versus SNR for different precoding thresholds $0.91 \le \delta \le 0.99$. [Horizontal axis: SNR (dB); curves: $\delta$ = 0.91, 0.93, 0.96, 0.99.]

Fig. 6. Capacity lower bounds for $0.1 \le \delta \le 0.9$ with $T_p = T$ (SNR = 25 dB). [Four panels: $T$ = 200, 2000, 8000, 20000; horizontal axis $\delta$.]

C. Achievable System Capacity

For each considered $T$, the peak capacity lower bounds at different SNR levels are further determined based on the simulated $I(\tau_p)$ and on the proposed analytical solution (5.9), respectively. The results are depicted in Figure 3; the idealized performance measure

$$I_0 := \frac{1}{N+L}E\{\log\det[\mathbf{I}+\sigma_v^{-2}\mathbf{G}\mathbf{G}^H]\}, \tag{6.1}$$

where $\mathbf{G}$ is the exact channel matrix, is also included as the benchmark. The figure shows that the analytic solutions via (5.9) accurately predict their experimental counterparts. We also see that, when $T$ is small, there is a large capacity gap between the achievable lower bound and the benchmark (6.1). The reason is that, for small $T$, the channel estimation quality is likely to be poor even when the entire coherent interval ($\tau_p = 1$) is used, as required by the optimal tradeoff. As a result, there tends to be a large capacity loss due to potentially severe channel estimation errors as well as the use of a large precoding time fraction. As $T$ increases, the figure shows that the lower bounds improve. This is because, when $T$ is large, a relatively small $\tau_p$ suffices to yield a good channel estimate and achieve the maximal capacity bound (recall from (5.5) and (5.6) that $\sqrt{\beta/\alpha}$ is inversely proportional to $\sqrt{T}$ and, eventually, $\tilde{\tau}_p = \sqrt{\beta/\alpha} < 1$ as $T$ continues to increase). Hence the degradation due to both channel mismatch and precoding can be limited, resulting in a large average capacity gain. We thus conclude that, when the coherent interval is large, the capacity performance attained by the blind estimation scheme [31] approaches the idealized bound (6.1).

D. Impact of the Precoding Threshold

In the last experiment, we first test the achievable capacity lower bounds when different thresholds $\delta$ of the precoder (2.16) are used. For the coherent interval $T = 2000$, Figures 4 and 5 show the achievable capacity results for the two threshold sets $\{0.1, 0.3, 0.6, 0.9\}$ and $\{0.91, 0.93, 0.96, 0.99\}$. In Figure 4, the best capacity performance is attained with $\delta = 0.9$. This reflects the fact that, although a large $\delta$ results in a less accurate channel estimate (see [31, Sec. V]), it limits the capacity loss due to precoding (cf. Discussion 1 in Section III). However, if $\delta$ increases beyond 0.9, severe channel estimation errors occur; these then dominate the performance and degrade the capacity, as can be seen from Figure 5. We thus conclude that $\delta \approx 0.9$ is the best compromise for $T = 2000$ under the current system setup. Recall that, in the proposed scheme, the threshold $\delta$ is fixed and optimization is conducted only with respect to $\tau_p$. We next investigate the capacity performance by fixing $T_p = T$ (i.e., the whole coherent interval is used for precoding) and varying $\delta$. For SNR = 25 dB, Figure 6 plots the proposed capacity lower bound as a function of $\delta$ for $T$ = 200, 2000, 8000, and 20000. Notably, for $T = 200$, $\delta \approx 0.6$ (rather than $\delta = 0.9$ as used in the previous simulation) attains the maximal capacity. Our further simulation confirms that, for $T = 200$, the optimal precoding fraction is $\tau_p = 1$ for all considered $\delta$. Hence, when the SNR is high and $T$ is small, it is plausible to simply set $T_p = T$ and choose $\delta$ to maximize the capacity. However, even though the proposed capacity lower bound (4.12) is somewhat simplified with $T_p = T$ (or $\tau_p = 1$), it is still a quite involved function of $\delta$, and exact characterization of the optimal $\delta$ in this scenario remains difficult. For SNR = 0 dB, the capacity results corresponding to the two threshold sets $0.1 \le \delta \le 0.9$ and $0.91 \le \delta \le 0.99$ are shown in Figures 7 and 8, respectively. As we can see, for $T = 200$, $\delta = 0.9$ attains the maximal capacity. When $T$ is large ($T$ = 8000 and 20000), the peak capacity obtained by varying $\delta$ is below 0.7 bits/channel use, which is less than the capacity achieved via optimal precoding-fraction design with fixed $\delta = 0.9$ (above 0.7 bits/channel use, from Fig. 2). Hence, in this case, optimization with respect to $\tau_p$ should be explicitly taken into account in order to realize the maximal capacity advantage.

Fig. 7. Capacity lower bounds for $0.1 \le \delta \le 0.9$ with $T_p = T$ (SNR = 0 dB). [Four panels: $T$ = 200, 2000, 8000, 20000; horizontal axis $\delta$.]

Fig. 8. Capacity lower bounds for $0.91 \le \delta \le 0.99$ with $T_p = T$ (SNR = 0 dB). [Four panels: $T$ = 200, 2000, 8000, 20000; horizontal axis $\delta$.]

VII. CONCLUSION

This paper, to the best of our knowledge, is the first contribution in the literature that investigates the capacity performance of wireless communication systems in the blind channel estimation setup. We focus on the CP-based single-carrier system scenario employing the diagonal-precoding assisted blind channel estimation scheme [31]. When the channel is perfectly known, we show that the optimal noise-resistant two-level precoder proposed in [31] tends to incur the largest capacity penalty. When the coherent time is finite and the sample covariance matrix estimate is subject to errors, we address the optimal tradeoff between channel estimation accuracy and achievable capacity through precoding interval selection. By leveraging matrix perturbation techniques, we derived a closed-form capacity metric in the presence of channel mismatch, which is a complicated function of the precoding interval. To simplify the analysis, a tractable approximation of this capacity metric is also given. The established results facilitate an analytic approach for finding an estimate of the capacity-maximizing precoding interval, and also allow for informative interpretations of the optimal tradeoff. Computer simulations show that the proposed analytic solution predicts the experimental results very well. It is also seen that, for small coherent intervals (hence a high-mobility environment), channel estimation error is the dominant factor and a large precoding fraction is needed, irrespective of the SNR.

APPENDIX A

PROOF OF THEOREM 3.1

The optimization problem considered is

$$\mathrm{Minimize}\ \prod_{n=0}^{N-1} p(n)^2 \tag{A.1}$$

subject to the constraints (2.14) and (2.15), or equivalently,

$$\mathrm{Minimize}\ \ln\Big(\prod_{n=0}^{N-1} p(n)^2\Big) = \sum_{n=0}^{N-1}\ln[p(n)^2] \tag{A.2}$$

subject to the constraints (2.14) and (2.15), since $\ln(\cdot)$ is a monotonically increasing function. Let us define $q_n := p(n)^2 - \delta$ for $0 \le n \le N-1$. Then the optimization problem (A.2) becomes

$$\mathrm{Minimize}\ \sum_{n=0}^{N-1}\ln[q_n + \delta] \quad \text{subject to} \quad \sum_{n=0}^{N-1} q_n = N(1-\delta)$$