
The Fading Number of Multiple-Input Multiple-Output Fading Channels With Memory

Stefan M. Moser, Member, IEEE

Abstract—The fading number of a general (not necessarily Gaussian) regular multiple-input multiple-output (MIMO) fading channel with arbitrary temporal and spatial memory is derived. The channel is assumed to be noncoherent, i.e., neither receiver nor transmitter has knowledge of the channel state; they only know the probability law of the fading process. The fading number is the second term in the asymptotic expansion of channel capacity when the signal-to-noise ratio (SNR) tends to infinity. It is related to the border of the high-SNR region with double-logarithmic capacity growth.

It is shown that the fading number can be achieved by an input that is the product of two independent processes: a stationary and circularly symmetric direction- (or unit-) vector process whose distribution is chosen such that the fading number is maximized, and a nonnegative magnitude process that is independent and identically distributed (i.i.d.) and escapes to infinity. Additionally, in the more general context of an arbitrary stationary channel model satisfying some weak conditions on the channel law, it is shown that there exists an optimal input distribution that is stationary apart from some edge effects.

Index Terms—Channel capacity, circular symmetry, escaping to infinity, fading number, flat fading, high signal-to-noise ratio (SNR), memory, multiple-input multiple-output (MIMO), noncoherent detection, stationary input distribution.

I. INTRODUCTION

A. General Background

FUTURE mobile communication systems will have to provide much higher data rates than what is currently available. To be able to design such systems we need to study wireless communication channels and try to understand how their behavior depends on various parameters like the number of antennas at the transmitter and receiver, the available power, feedback, or the implicitly available memory in the channel. An important parameter that is part of this theoretical understanding is the so-called channel capacity. It describes the ultimate physical limit on the maximum rate at which reliable information transmission is still possible. Note that this parameter is theoretical in the sense that we only assume limited power at the transmitter, but ignore other constraints of real systems like a maximum allowed transmission delay or limited computing resources.

Manuscript received April 06, 2007; revised May 02, 2008. Current version published May 20, 2009. This work was supported by the Industrial Technology Research Institute (ITRI), Zhudong, Taiwan, under Contracts G1-95003 and G1-96001 and by the National Science Council (NSC) under Grant NSC 96-2221-E-009-012-MY3. The material in this work was presented in part at the IEEE International Symposium on Information Theory (ISIT), Nice, France, June 2007.

The author is with the Department of Communication Engineering, National Chiao Tung University (NCTU), Hsinchu, Taiwan (e-mail: stefan.moser@ieee.org).

Communicated by K. Kobayashi, Associate Editor for Shannon Theory. Digital Object Identifier 10.1109/TIT.2009.2018180


In the case of wireless communication channels, the channel capacity is limited due to two main sources of transmission errors. First, the receiver introduces thermal noise that can be well modeled by an additive random noise process. Second, because the signals are electromagnetic waves transmitted through air, the received signals suffer from random fluctuations in magnitude and phase. This effect is known as fading and can be described by a multiplicative random noise process.

While the additive noise can be well approximated by an independent and identically distributed (i.i.d.) complex Gaussian process for almost all channels of interest, the detailed properties of the multiplicative noise depend on many parameters, both system-internal and -external, and should therefore be kept as general as possible. Unfortunately, the analysis of the channel capacity in such generality is very difficult, so the model is commonly simplified in certain aspects.

One possible simplification is to assume that the receiver perfectly knows the fading realizations. This assumption is based on the idea that the transmitter will first transmit some known training symbols from which the receiver learns the current state of the multiplicative noise process. The capacity is then computed without taking the estimation scheme into account. It is common to call this the coherent capacity of fading channels. Such an approach will definitely lead to an overly optimistic capacity value because

• even with a large amount of training data the channel knowledge will never be perfect, but only an estimate; and because

• the data rate that is wasted on the training symbols is completely ignored.

In this paper we will not make this simplification, but stick with noncoherent detection where the receiver has no additional knowledge about the channel state. Note that the receiver is free to do anything in its power to gain knowledge about the fading based on the received signals.

Marzetta and Hochwald [2] simplify the noncoherent channel model by assuming that during blocks consisting of several symbol periods the fading remains constant, while the fading coefficients corresponding to different blocks are assumed to be independent. This model is generally known as the block-fading model. Note that it is pessimistic to assume that the blocks are independent of each other, because memory provides additional information about the current fading level, which in general will increase capacity. However, it is more problematic to conjecture that the fading coefficients are perfectly constant during one block. This means that for high enough signal-to-noise ratios (SNR) and for long enough blocks the receiver can get an (almost) perfect estimate of the fading value within a block and use this knowledge to decode the received signal similarly to coherent detection. For larger SNR this seems to be overly optimistic. Indeed, as shown in [2] for single-input single-output (SISO) Gaussian block fading and in [3] for multiple-input multiple-output (MIMO) Gaussian block fading, the capacity of the block-fading channel grows logarithmically in the SNR at high SNR, i.e., the capacity has the same growth rate as the coherent capacity (and, as a matter of fact, as the capacity of an additive noise channel without fading, too).

Fig. 1. An upper bound on the capacity of a Rician-fading channel as a function of the output SNR, SNR_out = (1 + |d|²)·SNR, for different values of the specular component d. The dashed line corresponds to the situation of a Rayleigh-fading channel with a zero line-of-sight component d = 0. The dotted line depicts the capacity of an additive Gaussian noise channel (without fading) of equal output SNR, namely, log(1 + SNR_out).
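As a concrete illustration of the block-fading model just described, the following is a minimal sketch (the function and parameter names are introduced here for illustration, not taken from the paper): the fading is held constant over blocks of a fixed length and drawn i.i.d. across blocks.

```python
import numpy as np

def block_fading(num_blocks, block_len, rng):
    """Sample a SISO Gaussian block-fading path: constant within each
    block of `block_len` symbols, i.i.d. CN(0,1) across blocks."""
    # One CN(0,1) coefficient per block ...
    h_blocks = (rng.standard_normal(num_blocks)
                + 1j * rng.standard_normal(num_blocks)) / np.sqrt(2)
    # ... repeated for the duration of each block.
    return np.repeat(h_blocks, block_len)

rng = np.random.default_rng(0)
h = block_fading(num_blocks=4, block_len=5, rng=rng)
print(h.reshape(4, 5))  # each row is one constant block
```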

In [4], Liang and Veeravalli generalize the SISO Gaussian block-fading model by allowing some temporal correlation between the different fading coefficients within one block. They show that the rank of the block correlation matrix is crucial when determining the high-SNR channel capacity: if we have a rank-deficient correlation matrix, the effect of perfect predictability comes into play again, similar to the situation of Marzetta and Hochwald [2]. This then again leads to a logarithmic growth of capacity. For a full-rank correlation matrix this is not true anymore. In this case, the channel model reduces to a special case of the more general model described next.

The most general models only restrict the random noise processes to be stationary and ergodic, with additional variations in the exact fading law, the number of antennas, and the memory [5]–[13]. In [5], the authors investigate a memoryless SISO Rayleigh-fading channel and derive some bounds. In [6], it is shown that the capacity-achieving input distribution for the memoryless SISO Rayleigh-fading channel is discrete. In [7]–[9], the channel model is then generalized to MIMO and to general non-Gaussian fading distributions (possibly with memory) where the fading process is assumed to be regular, i.e., its differential entropy rate is finite. The complementary situation of nonregular fading processes has been studied in [10]–[13].

It turns out that the capacity at high SNR is very sensitive to the exact assumptions of the channel model, in particular to the regularity assumption. If we assume a regular fading process, then the capacity grows only double-logarithmically in the SNR at high SNR [7, Theorem 4.2], [9, Theorem 6.10]. This means that at high power such a channel becomes extremely power-inefficient in the sense that whenever the capacity shall be increased by only one bit, the SNR needs to be squared or, on a decibel scale, the SNR needs to be doubled! So the high-SNR behavior is dramatically different from that of the optimistic models mentioned above.
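The squaring rule is easy to verify numerically (a minimal sketch, using the double-logarithm as a proxy for capacity): if C(SNR) ≈ log log SNR nats, then adding one bit (log 2 nats) requires exactly squaring the SNR, i.e., doubling it in decibels.

```python
import math

for snr_db in (40.0, 60.0, 80.0):
    snr = 10 ** (snr_db / 10)
    c = math.log(math.log(snr))  # double-log capacity proxy, in nats
    # Solve log(log(snr2)) = c + log(2) for snr2:
    snr2 = math.exp(math.exp(c + math.log(2)))
    print(f"{snr_db:5.1f} dB -> +1 bit needs {10 * math.log10(snr2):6.1f} dB "
          f"(exactly {2 * snr_db:6.1f} dB, i.e., SNR squared)")
```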

For nonregular Gaussian fading, the high-SNR behavior of capacity depends on the specific power spectral density and can be anything between the logarithmic and the double-logarithmic growth [11].

However, it is interesting to observe that at low SNR the difference between the various models is relatively small. Indeed, the capacity of regular fading channels usually shows a very distinct turn at a certain SNR level where the growth rate changes from logarithmic to double-logarithmic. As an example, Fig. 1 shows the capacity of a noncoherent Rician-fading channel with various values of the line-of-sight component. One clearly sees that the capacity curve, while growing logarithmically at lower SNR, suddenly has a sharp bend at a certain threshold where its growth becomes very slow. Moreover, one sees that this threshold depends strongly on the channel law, i.e., on the line-of-sight component.

We conclude that at lower SNR, the exact choice of the channel model has only a small impact on the capacity analysis, i.e., the described simplifications (even the assumption of coherent detection) are useful in that regime. However, at high SNR, many simplifications seem to lose their validity. Based on this observation we immediately ask ourselves whether we can say something about the separation between these two regimes. Particularly, in the situation of a regular fading model, we would like to know more about the threshold between the efficient low- to medium-SNR regime, where the capacity grows logarithmically in the SNR, and the highly inefficient high-SNR regime with a double-logarithmic growth. The dependence of this threshold on some system parameters like the number of antennas, the memory in the channel, or the availability of feedback might give valuable insight into good design criteria for wireless and mobile communication systems.

B. The Fading Number

In an attempt to more precisely quantify the mentioned threshold between the power-efficient and the power-inefficient regime, [7, Sec. IV.C] and [9, Sec. 6.5.2] define the fading number χ as the second term in the high-SNR asymptotic expansion of capacity, i.e., at high SNR, the channel capacity can be expressed as

C(SNR) = log(1 + log(1 + SNR)) + χ + o(1).    (1)

Here, o(1) denotes terms that tend to zero as SNR → ∞.

Based on (1), we define the high-SNR regime to be the region where the o(1) terms in (1) are negligible, i.e., we say that a wireless communication system operates in the inefficient high-SNR regime if its capacity can be well approximated by

C(SNR) ≈ log(1 + log(1 + SNR)) + χ.    (2)

The important point to notice is that, due to the extremely slow growth of log(1 + log(1 + SNR)), the fading number χ is usually the dominant term in the lower range of the high-SNR regime. In other words, log(1 + log(1 + SNR)) is much larger than χ only for extremely large values of the SNR. An illustration of this behavior is given in Fig. 2.

The fading number is therefore strongly connected to the point where the bend of the capacity curve occurs. As an example consider the following situation [13], [14]: assume for the moment that the threshold lies somewhere between 30 and 80 dB (it can be shown that this is a reasonable assumption for many channels that are encountered in practice). In this case, the threshold capacity C_th must lie somewhere in the following interval:

χ + log(1 + log(1 + 10^(30/10))) ≤ C_th ≤ χ + log(1 + log(1 + 10^(80/10)))    (3)

i.e.,

χ + 2.1 nats ≤ C_th ≤ χ + 3 nats.    (4)

Fig. 2. Illustration of the different regimes of a typical regular fading channel. At low SNR, the o(1) terms are dominant; in the lower range of the high-SNR regime, the fading number χ is dominant; and only at very high SNR, the log(1 + log(1 + SNR)) term takes the lead.

Hence, even though we have assumed a wide range from 30 to 80 dB, the corresponding capacity interval is narrow (this is because the log(1 + log(1 + SNR)) term grows extremely slowly). Hence, we get the following rule of thumb.
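The numbers in (4) are quickly reproduced (a minimal sketch; the 30 dB and 80 dB endpoints are the assumed thresholds from the example above):

```python
import math

def high_snr_capacity_offset(snr_db):
    """log(1 + log(1 + SNR)) in nats, i.e., the capacity minus the
    fading number in the high-SNR approximation (2)."""
    snr = 10 ** (snr_db / 10)
    return math.log(1 + math.log(1 + snr))

print(high_snr_capacity_offset(30))  # ~2.07 nats
print(high_snr_capacity_offset(80))  # ~2.97 nats
```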

Conjecture 1 ([13], [14]): A communication system over a noncoherent regular fading channel¹ that operates at rates appreciably above χ + 2 nats is in the high-SNR regime and therefore extremely power-inefficient.

The fading number can therefore be regarded as a quality attribute of the channel: the larger the fading number is, the higher is the maximum rate at which the channel can be used without being extremely power-inefficient. It follows from this observation that a good system design will aim at achieving a large fading number.

The rest of this paper will concentrate on the analysis of the fading number of general MIMO fading channels with memory. So far, explicit expressions for the fading number are known in the special situation of general SISO fading channels with memory² [7, Theorem 4.41], [9, Theorem 6.41]:

χ({H_k}) = log π + E[log |H₀|²] − h({H_k})    (5)

and of general single-input multiple-output (SIMO) fading channels with memory [8, Theorem 1], [9, Theorem 6.44]

(6)

Here, χ(·|·) denotes the memoryless SIMO fading number with partial side information at the receiver [7, Note 4.31], [9, eq. (6.194)]

(7)

¹For more details about the exact assumptions made in this paper we refer to Section III.


The fading number of the multiple-input single-output (MISO) fading channel has only been derived for the memoryless case [7, Theorem 4.27], [9, Theorem 6.27]:

χ(𝐇ᵀ) = sup_{‖x̂‖=1} { log π + E[log |𝐇ᵀx̂|²] − h(𝐇ᵀx̂) }    (8)

This fading number is achievable by inputs that can be expressed as the product of a constant unit vector and a circularly symmetric, scalar, complex random variable of the same law that achieves the memoryless SISO fading number [7]. Hence, the asymptotic capacity of a memoryless MISO fading channel is achieved by beam-forming, where the beam-direction is chosen not to maximize the SNR, but the fading number.

For MISO fading with memory, some bounds of beam-forming type have been found [15]–[17]:

(9)

and

(10)

The MIMO case has been solved recently in the memoryless situation [18]:

(11)

This paper generalizes these special cases to the most general situation of MIMO fading channels with memory and specifies the fading number exactly. We remark that the proofs are based on several new preliminary results that are interesting by themselves. In particular, we prove a theorem which states that the optimal input to a stationary channel may be assumed to be stationary.

C. Outline

The rest of this paper is organized as follows: after a section about notation we will introduce the channel model in detail in Section III. In Section IV, some preliminary results will be given. In particular, we will present there a new theorem which states that, apart from edge effects and some weak conditions on the channel model, a stationary channel model has a capacity-achieving input distribution that is stationary. The corresponding proofs are found in the Appendices.

Section V then presents the main result, i.e., the fading number of a general MIMO fading channel with memory. We will give an outline of the proof there. The details can be found in Appendices D to I.

In Section VI, we will consider some interesting special cases, in particular the fading number of MISO fading with memory, which has been unknown so far. Note that in parallel to this paper a second publication [19] will treat the important special case of Gaussian MIMO fading channels in detail.

We conclude in Section VII.

II. NOTATION

As is by now fairly customary, we usually try to use upper case letters for random quantities and lower case letters for their realizations. This rule becomes awkward when dealing with matrices, because matrices are usually written in upper case even if they are deterministic. To better differentiate between scalars, vectors, and matrices, we have resorted to using different fonts for the different quantities. Upper case letters are used to denote scalar random variables taking value in the reals or in the complex plane. Their realizations are typically written in lower case. Random vectors in a complex Euclidean space are described by bold-face capitals; for their realizations we use bold lower case. Deterministic matrices are denoted by upper case letters of a special font; and random matrices are denoted using another special upper case font.

However, there will be a few exceptions to these rules. Since they are widely used in the literature, we will stick with the customary shape of the entropy H(·) of a discrete random variable and of the mutual information functional I(·;·). Moreover, we have decided to use the capital Q to denote the probability distribution of an input of a channel. In particular, Q_X and Q_𝐗 denote the probability distribution of a random variable X and of a random vector 𝐗, respectively. Given an alphabet, we denote the set of all probability distributions over that alphabet accordingly.

The capacity is denoted by C, the energy per symbol by ℰ, and the signal-to-noise ratio is denoted by SNR.

We use the shorthand H_a^b for the tuple (H_a, H_{a+1}, …, H_b). For more complicated expressions we use a dummy variable to clarify notation. The subscript k is reserved to denote discrete time. Curly brackets are used to distinguish between a random process and its manifestation at time k: {H_k} is a discrete random process over time, while H_k is the random variable of this process at time k.

Hermitian conjugation is denoted by (·)†, and (·)ᵀ stands for the transpose (without conjugation) of a matrix or vector. We use ‖·‖ to denote the Euclidean norm of vectors and the Euclidean operator norm of matrices. That is,

‖𝐯‖ ≜ sqrt( Σ_i |v_i|² )    (12)

‖A‖ ≜ max_{‖x̂‖=1} ‖A x̂‖    (13)

Thus, ‖A‖ is the maximal singular value of the matrix A.

The Frobenius norm of matrices is denoted by ‖·‖_F and is given by the square root of the sum of the squared magnitudes of the elements of the matrix, i.e.,

‖A‖_F ≜ sqrt( tr(A† A) )    (14)

where tr(·) denotes the trace of a matrix. Note that for every matrix A

‖A‖ ≤ ‖A‖_F    (15)

as can be verified by upper-bounding the squared magnitude of each component of A x̂ using the Cauchy–Schwarz inequality.
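The inequality (15) is easy to check numerically (a minimal sketch using random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(5):
    # Random complex matrix.
    a = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
    op_norm = np.linalg.norm(a, 2)        # maximal singular value
    frob_norm = np.linalg.norm(a, 'fro')  # sqrt of sum of |a_ij|^2
    assert op_norm <= frob_norm + 1e-12
    print(f"||A|| = {op_norm:.4f} <= ||A||_F = {frob_norm:.4f}")
```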

We will often split a complex vector 𝐯 up into its magnitude and its direction

𝐯 = ‖𝐯‖ · 𝐯̂    (16)

where we reserve this notation exclusively for unit vectors, i.e., throughout the paper every vector carrying a hat, 𝐯̂ or 𝐕̂, denotes a (deterministic or random, respectively) vector of unit length

‖𝐯̂‖ = ‖𝐕̂‖ = 1.    (17)

To be able to work with such direction vectors we shall need a differential entropy-like quantity for random vectors that take value on the unit sphere in ℂ^m. Note that with respect to a probability distribution over ℂ^m, the surface of the unit sphere has zero measure, so the corresponding differential entropy is undefined. We therefore introduce a new probability space that lives only on the surface of the unit sphere in ℂ^m and denote its measure by λ. If a random vector 𝐕̂ takes value in the unit sphere and has the density p_𝐕̂(·) with respect to λ, then we shall let

h_λ(𝐕̂) ≜ −E[log p_𝐕̂(𝐕̂)]    (18)

if the expectation is defined.

We note that just as ordinary differential entropy is invariant under translation, h_λ(·) is invariant under rotation. That is, if U is a deterministic unitary matrix, then

h_λ(U 𝐕̂) = h_λ(𝐕̂).    (19)

Also note that h_λ(𝐕̂) is maximized if 𝐕̂ is uniformly distributed on the unit sphere, in which case

h_λ(𝐕̂) = log c_m    (20)

where c_m denotes the surface area of the unit sphere in ℂ^m

c_m = 2 π^m / (m − 1)!.    (21)

The definition (18) can be easily extended to conditional entropies: if 𝐖 is some random vector, and if, conditional on 𝐖 = 𝐰, the random vector 𝐕̂ has a density with respect to λ, then we can define

h_λ(𝐕̂ | 𝐖 = 𝐰) ≜ −E[ log p_{𝐕̂|𝐖}(𝐕̂ | 𝐰) | 𝐖 = 𝐰 ]    (22)

and we can define h_λ(𝐕̂ | 𝐖) as the expectation (with respect to 𝐖) of h_λ(𝐕̂ | 𝐖 = 𝐰).
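As a numeric aside (a minimal sketch; c_m = 2π^m/(m−1)! is the standard surface area of the unit sphere in ℂ^m ≅ ℝ^{2m}), the maximal entropy log c_m in (20) is quickly tabulated:

```python
import math

def log_sphere_area_complex(m):
    """log of the surface area of the unit sphere in C^m (= S^{2m-1}
    in R^{2m}): c_m = 2 * pi^m / (m-1)!."""
    return math.log(2.0) + m * math.log(math.pi) - math.lgamma(m)

for m in (1, 2, 4, 8):
    print(m, log_sphere_area_complex(m))  # m=1: log(2*pi), the unit circle
```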

Based on these definitions we have the following lemma.

Lemma 2: Let 𝐕 be a complex random vector taking value in ℂ^m and having differential entropy h(𝐕). Let ‖𝐕‖ denote its norm and 𝐕̂ its direction as in (16). Then

h(𝐕) = h(‖𝐕‖) + h_λ(𝐕̂ | ‖𝐕‖) + (2m − 1) E[log ‖𝐕‖]    (23)

h(𝐕) = h_λ(𝐕̂) + h(‖𝐕‖ | 𝐕̂) + (2m − 1) E[log ‖𝐕‖]    (24)

whenever all the quantities in (23) and (24), respectively, are defined. Here h(‖𝐕‖) is the differential entropy of ‖𝐕‖ when viewed as a real (scalar) random variable.

Proof: This lemma follows from a change of variables. Let 𝐕^(r) denote the real random vector in ℝ^{2m} that consists of the real and imaginary parts of 𝐕 stacked on top of each other. Then we define

R ≜ ‖𝐕‖ and 𝐕̂ ≜ 𝐕 / ‖𝐕‖    (25)

and note that the infinitesimal volume d𝐯^(r) in the 2m-dimensional Euclidean space corresponds to r^{2m−1} dr dA, where dA denotes an infinitesimal area on the unit sphere in ℂ^m. Hence, the joint probability densities can be written as

(26)

(27)

The result now follows from (26) and (27).
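As a sanity check of (23), here is a worked example for m = 1 (under the assumption, stated above, that (23) takes the form h(𝐕) = h(‖𝐕‖) + h_λ(𝐕̂ | ‖𝐕‖) + (2m−1)E[log ‖𝐕‖]), with V ~ 𝒩_ℂ(0, 1):

```latex
% Left side: h(V) = \log(\pi e).
% |V| is Rayleigh with E[|V|^2] = 1, i.e., scale \sigma = 1/\sqrt{2}:
h(\|V\|) = 1 + \log\tfrac{\sigma}{\sqrt{2}} + \tfrac{\gamma}{2}
         = 1 - \log 2 + \tfrac{\gamma}{2}.
% The phase is uniform and independent of the magnitude:
h_\lambda(\hat{V} \mid \|V\|) = \log(2\pi).
% Since |V|^2 ~ Exp(1):  E[\log\|V\|] = \tfrac12 E[\log|V|^2] = -\gamma/2.
% Right side of (23) with 2m-1 = 1:
(1 - \log 2 + \tfrac{\gamma}{2}) + \log(2\pi) - \tfrac{\gamma}{2}
   = 1 + \log\pi = \log(\pi e). \quad\checkmark
```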

We shall write 𝐕 ~ 𝒩_ℂ(𝟎, K) if 𝐕 is a circularly symmetric, zero-mean, complex Gaussian random vector of covariance matrix K. By Θ ~ 𝒰([0, 2π)) we denote a random variable that is uniformly distributed on the interval [0, 2π).

Throughout the paper, {Θ_k} denotes a random process that is i.i.d. according to a uniform distribution over the unit circle

{Θ_k} i.i.d. ~ 𝒰([0, 2π)).    (28)

When it appears in formulas with other random variables or processes, {Θ_k} is always to be understood as being independent of these other processes.

All rates specified in this paper are in nats per channel use, i.e., log(·) denotes the natural logarithm. The abbreviation RHS stands for right-hand side and LHS stands for left-hand side.

III. THE CHANNEL MODEL

We consider a channel with n_T transmit antennas and n_R receive antennas whose time-k output is given by

𝐘_k = ℍ_k 𝐱_k + 𝐙_k.    (29)

Here 𝐱_k ∈ ℂ^{n_T} denotes the time-k channel input vector; the random matrix ℍ_k ∈ ℂ^{n_R×n_T} denotes the time-k fading matrix; and the random vector 𝐙_k ∈ ℂ^{n_R} denotes additive noise. We assume that the fading process {ℍ_k} and the additive noise process {𝐙_k} are independent and of a joint law that does not depend on the channel input {𝐱_k}.

The random vector process {𝐙_k} is assumed to be a spatially and temporally white, zero-mean, circularly symmetric, complex Gaussian random process, i.e., {𝐙_k} is temporally i.i.d. with 𝐙_k ~ 𝒩_ℂ(𝟎, σ² I_{n_R}) for some σ² > 0. Here I_{n_R} denotes the n_R × n_R identity matrix.
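Such spatially and temporally white noise is straightforward to sample (a minimal sketch; σ² and n_R are hypothetical parameters): real and imaginary parts of each component are independent N(0, σ²/2).

```python
import numpy as np

def white_cn_noise(n_samples, n_r, sigma2, rng):
    """Sample Z_k ~ CN(0, sigma2 * I_{n_r}), i.i.d. over time."""
    scale = np.sqrt(sigma2 / 2.0)
    return scale * (rng.standard_normal((n_samples, n_r))
                    + 1j * rng.standard_normal((n_samples, n_r)))

rng = np.random.default_rng(2)
z = white_cn_noise(100_000, n_r=2, sigma2=0.5, rng=rng)
print(np.mean(np.abs(z) ** 2, axis=0))  # ~0.5 per receive antenna
```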


As for the multivariate fading process {ℍ_k}, we shall only assume that it is stationary, ergodic, of finite second moment

E[ ‖ℍ_k‖_F² ] < ∞    (30)

and of finite differential entropy rate

h({ℍ_k}) > −∞    (31)

(the regularity assumption). Hence, the components of ℍ_k are in general correlated and depend on the past. Moreover, note that we do not necessarily assume that {ℍ_k} is Gaussian; we allow any distribution that satisfies the above assumptions, i.e., that is stationary, ergodic, regular, and of finite second moment. The important special case of Gaussian fading is analyzed in more detail in a separate publication [19].
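One simple example of a fading law satisfying all of these assumptions (a sketch only; the paper does not restrict itself to this model) is a first-order Gauss–Markov process: it is stationary, ergodic, of finite second moment, and regular because its one-step innovation has strictly positive variance.

```python
import numpy as np

def gauss_markov_fading(n, n_r, n_t, alpha, rng):
    """H_k = alpha * H_{k-1} + sqrt(1 - alpha^2) * W_k with W_k i.i.d.
    CN(0,1) entries; the stationary marginal of each entry is CN(0,1)."""
    def cn(shape):
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    h = np.empty((n, n_r, n_t), dtype=complex)
    h[0] = cn((n_r, n_t))  # start in the stationary law
    for k in range(1, n):
        h[k] = alpha * h[k - 1] + np.sqrt(1 - alpha ** 2) * cn((n_r, n_t))
    return h

rng = np.random.default_rng(3)
h = gauss_markov_fading(50_000, n_r=2, n_t=2, alpha=0.9, rng=rng)
print(np.mean(np.abs(h) ** 2))  # ~1.0: finite second moment
```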

We would like to briefly comment on these assumptions. The assumption of stationarity reflects our lack of knowledge about the exact dependence of the fading law on time. Obviously, we cannot assume stationarity for all time as the fading law will change drastically if, e.g., we move from an urban to a rural area. However, in a certain setting and for a reasonable time period, stationarity seems a natural choice. Note that the block fading model [2] is not stationary.

Ergodicity reflects our assumption that we are allowing very large block lengths so that the channel “averages out.” For systems with strong delay constraints this assumption will not be justified. Finally, by asking for a fading process that is regular we ensure that the fading process is “fully random” in the (engineering) sense that even if the past is perfectly known, the present values of the fading cannot be predicted error-free.³ This assumption will be appropriate in certain situations and not in others. It seems therefore clear to us that both situations, regular and nonregular fading, should be investigated. We would like to emphasize once more that at high SNR this assumption has a dramatic effect on the capacity behavior [11].

As for the input, we consider two different constraints: a peak-power constraint or an average-power constraint. We use ℰ to denote the maximal allowed instantaneous power in the former case and the allowed average power in the latter case.

For both cases we set

SNR ≜ ℰ / σ².    (32)

The capacity C(SNR) of the channel (29) is given by

C(SNR) = lim_{n→∞} (1/n) sup I(𝐗₁ⁿ; 𝐘₁ⁿ)    (33)

where the supremum is over the set of all probability distributions on 𝐗₁ⁿ that satisfy the constraints, i.e.,

‖𝐗_k‖² ≤ ℰ almost surely, k = 1, …, n    (34)

for a peak-power constraint, or

(1/n) Σ_{k=1}ⁿ E[‖𝐗_k‖²] ≤ ℰ    (35)

for an average-power constraint.

³Note that this is not a strictly mathematical explanation in general, but it is precise in the special case of a spatially independent Gaussian fading process.

From [7, Theorem 4.2], [9, Theorem 6.10] we have

limsup_{SNR→∞} { C(SNR) − log(1 + log(1 + SNR)) } < ∞.    (36)

Note that [7, Theorem 4.2], [9, Theorem 6.10] is stated under the assumption of an average-power constraint only. However, since a peak-power constraint is more stringent than an average-power constraint, (36) also holds in the situation of a peak-power constraint.

The fading number χ is now defined as in [7, Definition 4.6], [9, Definition 6.13] by

χ({ℍ_k}) ≜ limsup_{SNR→∞} { C(SNR) − log(1 + log(1 + SNR)) }.    (37)

Prima facie the fading number depends on whether a peak-power constraint (34) or an average-power constraint (35) is imposed on the input. However, it will turn out that the MIMO fading number with memory is identical for both cases.

Finally, we remark that for an arbitrary constant nonsingular n_R × n_R matrix A and an arbitrary constant nonsingular n_T × n_T matrix B

χ({A ℍ_k B}) = χ({ℍ_k});    (38)

see [7, Lemma 4.7], [9, Lemma 6.14].

IV. PRELIMINARY RESULTS

The proof of the main result relies on some observations that hold in a more general context and are therefore interesting by themselves. Some of these observations are known already and are repeated here without proof for the sake of completeness only, but some are new.

A. Capacity-Achieving Input Distributions and Stationarity

One of the main assumptions about our channel model is that the fading process and the additive noise are stationary. From an intuitive point of view it is clear that a stationary channel model should have a capacity-achieving input distribution that is also stationary. Unfortunately, we are not aware of a rigorous proof of this claim.

In [8, Lemma 5], [9, Lemma B.1] it is proven that, apart from edge effects, the optimal input distribution can be assumed to have equal marginals. Here we will extend this statement and prove that the capacity can be approached up to an ε by a distribution that looks stationary apart from edge effects.

Theorem 3: Consider a channel model with input and output. Assume that the channel is both stationary and unaffected by zero input vectors in the following sense: for every choice of block length and of the numbers of prepended and appended zero vectors, and for every input distribution, we have

(39)

whenever the nonzero input blocks on the LHS and on the RHS have the same distribution.


Now fix some nonnegative integer and some power ℰ. Then for every ε > 0 there corresponds some positive integer and some distribution such that, for a block length sufficiently large, there exists some input satisfying the following conditions.

1) The input nearly achieves capacity in the sense that

(40)

2) Every admissible block of adjacent vectors

(41)

taken from within the sequence

(42)

has the same joint distribution, where this distribution is given as the corresponding marginal distribution.

3) In particular, all vectors in (42) have the same marginal.

4) The marginal distribution gives rise to a second moment

(43)

5) The first few and the last few vectors satisfy the power constraint, possibly strictly:

(44)

Proof: The proof relies on a shift-and-mix argument based on the fact that when using deterministic zeros at the input, the corresponding output yields zero information. The details are given in Appendix A.

Remark 4: Neglecting the edge effects for the moment, Theorem 3 basically says that every block of adjacent vectors has the same distribution independent of the time shift. From this it immediately follows that the distribution of every subset of (not necessarily adjacent) vectors of a block does not change when the vectors are shifted in time (simply marginalize those vectors out that are not members of the subset). Hence, Theorem 3 almost proves that the capacity-achieving input distribution is stationary: the only problem is the edge effects. Note that the block parameter can be chosen freely, but has to remain fixed until the power constraint has been loosened to infinity. That is, to get rid of the edge effects one needs to first let the power tend to infinity before one can let the block parameter grow.

Throughout the paper, we will refer to such a distribution and to such a block of vectors as quasi-stationary.

B. Capacity-Achieving Input Distributions and Circular Symmetry

The next observation concerns circular symmetry. We say that a random vector 𝐕 is circularly symmetric if

𝐕 (d)= e^{iΘ} 𝐕    (45)

where Θ ~ 𝒰([0, 2π)) is independent of 𝐕 and where (d)= stands for “equal in law.” Note that being circularly symmetric is not to be confused with being isotropically distributed, which means that a vector has equal probability to point in every direction. Circular symmetry only concerns the phases of the components of a vector, not the vector’s direction.

In the case of a random process we make the following definition.

Definition 5: A vector random process {𝐕_k} is said to be circularly symmetric if

{𝐕_k} (d)= {e^{iΘ_k} 𝐕_k}    (46)

where the process {Θ_k} is i.i.d. ~ 𝒰([0, 2π)) and independent of {𝐕_k}.

Remark 6: Note some subtleties of this definition: a random process being circularly symmetric does not only mean that for every time k the corresponding random vector is circularly symmetric, but also that from past vectors one cannot gain any knowledge about the present phase, i.e., the phases are i.i.d. On the other hand, knowing the phase of one component of 𝐕_k in general does yield some knowledge about the phases of the other components at the same time k.
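The following sketch illustrates Definition 5 on a hypothetical example process: multiplying by an i.i.d. uniform phase (one common phase per time instant, applied to the whole vector) leaves all magnitudes untouched while removing any exploitable phase structure.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100_000, 2
# Some non-circularly-symmetric vector process (it has a nonzero mean).
v = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)) + (1 + 1j)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)  # i.i.d. phase, one per time k
v_cs = v * np.exp(1j * theta)[:, None]         # same phase for all components

print(np.abs(np.mean(v, axis=0)))     # clearly nonzero mean
print(np.abs(np.mean(v_cs, axis=0)))  # ~0: phase randomization removes it
print(np.allclose(np.abs(v), np.abs(v_cs)))  # True: magnitudes unchanged
```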

The following proposition says that for our channel model an optimal input can be assumed to be circularly symmetric.

Proposition 7: Assume a channel as given in (29). Then the capacity-achieving input process can be assumed to be circularly symmetric, i.e., the input vectors 𝐗_k can be replaced by e^{iΘ_k} 𝐗_k, where the random process {Θ_k} is i.i.d. and independent of every other random quantity.

Proof: A proof is given in Appendix B.

Remark 8: Note that the proof of Proposition 7 relies only on the fact that the additive noise is assumed to be circularly symmetric. Hence, for this proposition to hold, the additive noise process does not need to be Gaussian distributed and may even have memory, as long as it is circularly symmetric.

C. Capacity-Achieving Input Distributions and Escaping to Infinity

Next we give a brief review of the concept of input distributions that escape to infinity: a sequence of input distributions parametrized by the allowed cost (in our case of fading channels the cost is the available power or SNR) is said to escape to infinity if it assigns to every fixed compact set a probability that tends to zero as the allowed cost tends to infinity. In other words, this means that in the limit, when the allowed cost tends to infinity, such a distribution does not use finite-cost symbols.


This notion is important because the asymptotic capacity of many channels of interest can only be achieved by input distributions that escape to infinity. As a matter of fact, one can show that every input distribution that achieves a mutual information of the same asymptotic growth rate as the capacity must escape to infinity. Loosely speaking, for many channels it is not favorable to use finite-cost input symbols when the cost constraint is loosened completely.
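As an illustration (a minimal sketch; the log-uniform magnitude law used here is the one that appears later in the achievability part, with a hypothetical lower endpoint x_min), consider R² log-uniformly distributed between x_min² and the allowed power: the probability of any bounded magnitude set vanishes as the power grows.

```python
import math

def prob_magnitude_below(k, x_min, power):
    """P(R <= k) when log(R^2) is uniform on [log(x_min^2), log(power)]."""
    if k <= x_min:
        return 0.0
    return (math.log(k ** 2) - math.log(x_min ** 2)) / (
        math.log(power) - math.log(x_min ** 2))

for power_db in (20, 60, 100, 140):
    power = 10 ** (power_db / 10)
    print(power_db, prob_magnitude_below(10.0, x_min=1.0, power=power))
# The probability of the compact set {R <= 10} decays to zero as the
# allowed power grows: the distribution escapes to infinity.
```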

In the following, we will only state this result specialized to the situation at hand. For a more general description and for all proofs we refer to [8, Sec. VII.C.3], [9, Sec. 2.6].

Definition 9: Let a family of input distributions be given for the memoryless version of the fading channel (29), i.e., input distributions of the channel

𝐘 = ℍ 𝐗 + 𝐙    (47)

with input 𝐗 ∈ ℂ^{n_T}. Let this family be parametrized by the available average power ℰ such that

E[‖𝐗‖²] ≤ ℰ.    (48)

We say that the input distributions escape to infinity if for every finite K > 0

lim_{ℰ→∞} Pr[‖𝐗‖ ≤ K] = 0.    (49)

We now have the following lemma.

Lemma 10: Let the memoryless MIMO fading channel be given as in (47) and let a family of distributions on the channel input be given that satisfy the power constraint (48). Let I(ℰ) denote the mutual information between input and output of channel (47) when the input is distributed according to the law with parameter ℰ. Assume that the family of input distributions is such that the following condition is satisfied:

(50)

Then the family must escape to infinity.

Proof: A proof can be found in [8, Theorem 8, Remark 9], [9, Corollary 2.8].

D. An Upper Bound on Channel Capacity

Since capacity is by definition a maximization of mutual information, it is inherently difficult to find upper bounds on it. The following upper bound has been derived based on a dual expression of mutual information [7, Sec. V], [9, Sec. 2.3].

Lemma 11: Consider a memoryless channel with input X and output Y. Then for an arbitrary distribution on the input, the mutual information between input and output of the channel is upper-bounded as follows:

(51)

where the parameters appearing in the bound can be chosen freely, but must not depend on the distribution of X.

Proof: A proof can be found in [7, Sec. IV], [9, Sec. 2.4].

E. Generalized Entropy Rates

The main result will be presented in various different forms, all containing some types of “entropy rates.” The original definition of the differential entropy rate of a stochastic process {H_k} is [20, Sec. 12.5]

h({H_k}) ≜ lim_{n→∞} (1/n) h(H₁, …, H_n)    (52)

if the limit exists. Moreover, it can be shown that for stationary processes this limit always exists and is identical to

h({H_k}) = lim_{n→∞} h(H_n | H_{n−1}, …, H₁).    (53)

We will now extend this definition to conditional versions of entropy rates and prove that the limit exists as long as the involved processes are well-behaved (in particular, stationary). We will show this only for one example that, however, is representative for other forms as well.
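For a concrete feel of (53) (a sketch under the Gauss–Markov example from Section III, not part of the paper's development): for a scalar complex Gaussian AR(1) process the entropy rate equals the entropy of the one-step innovation, which is exactly what makes such a process regular.

```python
import math

def entropy_rate_ar1(alpha, stationary_var=1.0):
    """h({H_k}) = lim h(H_n | H_{n-1}, ...) for a complex Gaussian AR(1)
    process with stationary per-sample variance `stationary_var`.
    Conditional on the past, H_n is CN(alpha * H_{n-1}, (1 - |alpha|^2)),
    so the rate is log(pi * e * innovation_variance) nats."""
    innovation_var = (1.0 - abs(alpha) ** 2) * stationary_var
    return math.log(math.pi * math.e * innovation_var)

for alpha in (0.0, 0.5, 0.9, 0.99):
    print(alpha, entropy_rate_ar1(alpha))
# alpha -> 1 drives the rate to -infinity: the process approaches
# perfect predictability and the regularity assumption (31) breaks down.
```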

Lemma 12: Let {ℍ_k} be stationary, ergodic, of finite energy, and regular. Let {𝐗̂_k} be a stationary unit-vector process. Then

1) the first sequence of conditional entropies below is nonincreasing in the number of conditioning terms;

2) the second sequence of conditional entropies below is nonincreasing in the number of conditioning terms;

3) for all n we have

(54)

and

4) the limits exist and are equal:

(55)

(56)

Proof: See Appendix C.

V. THE FADING NUMBER OF MIMO FADING CHANNELS WITH MEMORY

A. Main Result

We are now ready to state the main result.

Theorem 13: Consider a MIMO fading channel with memory (29) where the stationary and ergodic fading process {ℍ_k} takes value in ℂ^{n_R×n_T} and satisfies the conditions (30) and (31). Then, irrespective of whether a peak-power constraint (34) or an average-power constraint (35) is imposed on the input, the fading number is given by

(57)

Here the supremum is over all stochastic unit-vector processes {𝐗̂_k} that are stationary and circularly symmetric.

Moreover, the fading number is achievable by a stationary input that can be expressed as a product of two independent processes

𝐗_k = R_k · 𝐗̂_k    (58)

where {𝐗̂_k} is a stationary and circularly symmetric unit-vector process with the distribution that achieves the maximum in (57), and {R_k} is a scalar nonnegative i.i.d. random process such that

(59)

Note that this input satisfies the peak-power constraint (34) (and therefore also the average-power constraint (35)).

Proof: The proof is long and obscured by many technical details. We will therefore provide here an outline emphasizing the important key steps. For the details we refer to Appendices D to I.

The proof consists of two parts: first, we derive an upper bound on the fading number assuming an average-power constraint (35) on the channel input (see Appendix D). The key ingredients for this part are the four concepts introduced in Sections IV-A to IV-D.

Second, we derive a lower bound on the fading number by assuming one particular input distribution on the channel that satisfies the peak-power constraint (34) (see Appendix G). We then show that the fading number that is achieved by this choice is identical to the upper bound on the fading number derived before. Since a peak-power constraint is more restrictive than the corresponding average-power constraint, the theorem follows.

a) Outline of Upper Bound: To derive the upper bound we consider the average-power constraint (35). Similarly to the proof of the SIMO fading number with memory [8, Sec. VII], [9, Sec. B.5.9], we use the chain rule to write

I(𝐗₁ⁿ; 𝐘₁ⁿ) = Σ_{k=1}ⁿ I(𝐗₁ⁿ; 𝐘_k | 𝐘₁^{k−1})    (60)

and would like to separate each term on the RHS into terms that are memoryless and terms that take care of the memory. It is shown in (169)–(176) that

(61)

(62)

which would nicely do the trick. Unfortunately, (62) is not tight, for two reasons. First, note that in the situation of only one transmit antenna it is possible to get a good estimate of the fading realizations by simply dividing the received vector by the decoded value of the input:

𝐘_k / x_k = 𝐇_k + 𝐙_k / x_k ≈ 𝐇_k for |x_k| large    (63)

as the SNR gets large. This is not possible anymore once we have multiple antennas at the transmitter, as we cannot “divide by a vector.” Instead we divide by the vector’s norm

𝐘_k / ‖𝐱_k‖ = ℍ_k 𝐱̂_k + 𝐙_k / ‖𝐱_k‖ ≈ ℍ_k 𝐱̂_k for ‖𝐱_k‖ large.    (64)

This estimate still depends on the direction of the input vector. Hence, we cannot gain full knowledge of the fading matrix, but only of its projection onto the input direction.

The second reason why (62) is not tight is a past-output term that (similarly to the SIMO situation) we must not discard, because it contains information about the past fading values even if we do not know the corresponding inputs. To see this, note that from the past outputs we can easily get

(65)

(66)

(67)

which is an estimate for the “direction” of the fading. However, note that, similarly to (64) and unlike in the SIMO case, we cannot gain full knowledge about the fading direction because the fading is a matrix-valued process.

So we get the following bound instead:

(68)

Note that we have jumped over many details here; in particular, we need to rely on the observation of Lemma 10 that the capacity-achieving input distribution escapes to infinity in order to be able to discard the noise.

The first term on the RHS of (68) corresponds to memoryless MIMO fading. Hence, we might use the knowledge of the memoryless MIMO fading number (11), or we could use the bounding techniques known from [7, Sec. IV.A], [9, Sec. 2.4] to get an upper bound on this term. Unfortunately, both approaches fail: the former because the memoryless MIMO fading number contains a maximization that will loosen the bound when introduced at this stage; the latter because it turns out to lead to an even less tight bound.

Instead, we split this term up into a magnitude term and a term that takes care of the direction

(69)

and show that

(70)

(where we again need to rely on the fact that the input distribution escapes to infinity).

The first term on the RHS of (69) almost looks like the mutual information between input and output of a memoryless MISO fading channel. We fix the problem that the output is nonnegative by multiplying it by an independent circularly symmetric phase e^{iΘ}. Because we assume that Θ is independent of all other random quantities, particularly of the input, this does not change the mutual information.

The bound (68) then looks as follows:

(71)

This bound still depends on the unknown capacity-achieving input distribution. In order to eliminate this dependence, we need to maximize it over all joint input distributions that satisfy the average-power constraint. Unfortunately, when we only consider one fixed time instant, this maximization will loosen our bound. The reason lies in the third term on the RHS of (71), which can be loosely upper-bounded by zero; this loose upper bound is achieved by an (obviously very bad) degenerate choice of the input.

So it seems that we cannot consider each term of the sum in (60) separately. Fortunately, this is possible once we take Theorem 3 and Proposition 7 into account. They allow us to restrict ourselves to stationary and circularly symmetric input distributions. This excludes the mentioned bad choice and yields the following bound:

(72)

where the supremum is over all stationary and circularly symmetric input processes.

Note that Theorem 3 has also allowed us to get rid of the dependence on the block length, i.e., we can let it tend to infinity. Then we are free to loosen the power constraint (i.e., let ℰ → ∞) and to use the definition of the fading number (37).

We might now be tempted to use our knowledge about memoryless MISO fading. But again this approach fails due to the maximization in the expression of the memoryless MISO fading number (8). Instead, we rely on Lemma 11 to get an upper bound on the first term on the RHS of (72). This bound will look very similar to the memoryless MISO fading number; however, it does not involve a local beam-forming maximization. To make the expressions easier to read, we will use a shorthand to refer to this part of the bound.

Hence, we get the following:

(73)

We see that the upper bound consists of a term that corresponds to the memoryless MISO fading number when the receiver only considers the magnitude of the received vector, a term that takes care of the contribution of the direction of the channel output, and two terms that take care of the contribution of the memory in the channel.

Note that in the whole derivation we rely on the fact that the input magnitude does not take on any value smaller than an arbitrary threshold. However, Lemma 10 only guarantees this in the limit when the power tends to infinity. In order to solve that problem, we need to introduce a corresponding event and condition everything on this event.

For more details we refer to Appendix D.

b) Outline of Lower Bound: To derive a lower bound we choose a specific input distribution, which naturally yields a lower bound to channel capacity and hence to the fading number. Let the input be of the form

𝐗_k = R_k · 𝐗̂_k.    (74)

Here {𝐗̂_k} is a sequence of random unit vectors forming a stochastic process that is stationary and circularly symmetric, but whose exact distribution will be specified later. The stochastic process {R_k} consists of nonnegative random variables that are i.i.d. with

(75)

where we choose

(76)

We assume that the power is sufficiently large for this choice to be meaningful.

Note that this choice of the input satisfies the peak-power constraint (34) and therefore also the average-power constraint (35).
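The following sketch samples such an input (hypothetical construction: the log-uniform magnitude law is assumed here for (75)/(76), with x_min an illustrative lower endpoint, and an i.i.d. isotropic direction process standing in for the unspecified stationary circularly symmetric law):

```python
import numpy as np

def sample_input(n, n_t, x_min, power, rng):
    """X_k = R_k * Xhat_k with log(R_k^2) ~ Unif[log(x_min^2), log(power)]
    i.i.d., and Xhat_k a unit-vector process (here i.i.d. isotropic; any
    stationary circularly symmetric law would do)."""
    log_r2 = rng.uniform(np.log(x_min ** 2), np.log(power), size=n)
    r = np.exp(0.5 * log_r2)                             # R_k <= sqrt(power)
    g = rng.standard_normal((n, n_t)) + 1j * rng.standard_normal((n, n_t))
    xhat = g / np.linalg.norm(g, axis=1, keepdims=True)  # unit directions
    return r[:, None] * xhat

rng = np.random.default_rng(5)
x = sample_input(10_000, n_t=2, x_min=1.0, power=1e6, rng=rng)
print(np.max(np.sum(np.abs(x) ** 2, axis=1)))  # never exceeds the peak power
```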

We then again start with the chain rule and write

(77)

where we would like to treat each term separately. Note that, for the same reason that we were not allowed to discard the past-output term in the derivation of the upper bound, we are not allowed to discard the future outputs on the RHS of (77). After some algebraic changes we get the following lower bound:

(78)

Note that the first term is bounded and that the second term corresponds to a memoryless MIMO fading channel with some side information. To simplify notation, let us denote this side information by

(79)

Contrary to the derivation of the upper bound, which has been based on the memoryless MISO case, we will base the derivation of the lower bound on memoryless SIMO, i.e., we split the second term on the RHS of (78) into the following two parts:

(80)

Now we have the problem that the second mutual information term on the RHS of (80) does not correspond exactly to the SIMO situation, since the input of the channel is real instead of complex. This is fixed by various arithmetic changes which in the end yield the following expression:

(81)

(82)

where we have introduced {Θ_k} to be i.i.d. uniformly distributed on [0, 2π), independent of all other random quantities.

Note that our choice of {R_k} guarantees that the input achieves the fading number of memoryless SIMO fading with side information [7, Proposition 4.30], [9, Proposition 6.30]. Hence, we get from (78) and (82)

(83)

(84)

where we have used the expression of the fading number of memoryless SIMO fading with side information (7).

Note that we have been cheating here, since we have interchanged the order of two limits. To correct this we will need to introduce an additional parameter, get rid of one limit, and use the stationarity of our channel model and our choice of {R_k}. Furthermore, we will have to discard the influence of the noise process in various places, which is possible once we let the power tend to infinity, because the input magnitude then tends to infinity with probability one.

The result now follows by showing that (84) is equivalent to the upper bound. This will follow from some arithmetic changes, from stationarity, and from the fact that we choose the distribution of {𝐗̂_k} to achieve the supremum given in the upper bound.

For more details we refer to Appendix G.

B. Alternative Expressions and an Upper Bound

In the following, we will state several equivalent expressions for the fading number given in (57). Depending on the context one particular form might be more convenient.

We start by defining a constrained memoryless MIMO fading number given a fixed distribution on the input direction:

(85)

This corresponds to the situation where we additionally constrain the transmitter to use a fixed (possibly suboptimal) distribution on the input direction, i.e., the memoryless MIMO fading number is then given as (see (11))

(86)

Note the difference to the memoryless fading number with partial side information at the receiver

(87)

Here we assume that the realization of the input direction is known to the receiver, which changes the problem from memoryless MIMO to memoryless SIMO with side information.

From (85) we next define the following natural extension: the constrained memoryless MIMO fading number with partial side information given a fixed distribution on the input direction is defined as follows:

(88)

Using these definitions we get the following alternative expressions.

Corollary 14: The MIMO fading number with memory can be written in the following five equivalent forms:

(89)

(90)

(91)

(92)

(93)

Here, the row vector denotes the i-th row of the fading matrix, and the angle denotes its phase. Moreover, in (92) we have defined

(94)

(95)

and in (93)

(96)

(97)

(98)

Proof: A proof can be found in Appendix J.

Here the expression (92) is interesting because it expresses the fading number without using the differential entropy-like quantity h_λ(·).

Note that the various forms of entropy rates used in (89)–(98) are all well-defined because the underlying processes are stationary. This has been proven in Lemma 12 for one particular case that is representative for all other cases.

Since the evaluation of (57) is in general rather difficult, we will next state two upper bounds to the MIMO fading number that are usually easier to compute.

Corollary 15: The fading number of a MIMO fading channel with memory as defined in Theorem 13 can be upper-bounded as follows:

(99)

(100)

where the infimum is over all nonsingular complex n_R × n_R matrices and all nonsingular complex n_T × n_T matrices.

Proof: Note that, because conditioning reduces entropy and by (20) and (21),

(101)

where the latter upper bound is achieved with equality only if the direction is uniformly distributed on the unit sphere, i.e., isotropically distributed.

We now get from Theorem 13 and from (38)

(102)

(103)

(104)

(105)

(106)

(107)

Here, (102) follows from (38); the subsequent inequality (103) from (57) and (101); (104) holds due to the linearity of expectation; then in (105) we upper-bound the expectation by the supremum over all possible values (this proves (99)); (106) is due to conditioning that reduces entropy; and the final equality (107) holds because of the conditional independence of the quantities involved.

VI. SOME SPECIAL CASES

In this section, we will now specialize the general result to some important special situations. While some of them have been known already, the case of MISO fading with memory has not been solved before.

A. Memoryless Fading

We start with the situation where the fading process has no temporal memory, i.e., {ℍ_k} is i.i.d. over time k. In this situation, we will usually drop the time index and write ℍ.

The expression for the fading number of a memoryless MIMO fading channel (11) can be derived from (57) as follows: first, note that only the first and the last terms in (57) are influenced by memory. However, once we assume that there is no memory in the fading process, the past can only influence the present values via some memory in the input process. Now note that the fourth term is conditioned on the input of the past and of the present; hence, the past has no influence on this term either. Finally, note that the first term can be upper-bounded by dropping the conditioning, and note further that this upper bound can actually be achieved if the input is chosen to be i.i.d. Hence, an optimum choice of the input direction process will be memoryless, and (11) follows.

All other memoryless situations follow directly from this. In the case of a SIMO fading channel, there is only one possible choice for a circularly symmetric unit random variable, which is therefore implicitly the optimum one:

(108)

For the MISO case note that, independently of the distributions of the input direction and of the random phase, the relevant output distribution is unchanged. Hence

(109)

and the fading number becomes

(110)

(111)

(112)


which can be achieved by a distribution of the input direction that with probability one takes on the value that achieves the fading number in (112) (beam-forming).

Finally, the SISO case is a combination of the arguments of the SIMO and MISO case, i.e.,

(113)

which yields

(114)

B. MISO Fading With Memory

Next we are going to study the special case of MISO fading with memory for which the fading number has been unknown so far. If we specialize Theorem 13 to the situation of only one antenna at the receiver we get the following corollary.

Corollary 16: Consider a MISO fading channel with memory where the stationary and ergodic fading process {𝐇_k} takes value in ℂ^{n_T} and satisfies the conditions (30) and (31). Then, irrespective of whether a peak-power constraint (34) or an average-power constraint (35) is imposed on the input, the fading number is given by

(115)

where the optimization variables denote vectors of unit length, and where the maximization is over all stochastic unit-vector processes that are stationary.

Moreover, the fading number is achievable by an input that can be expressed as a product of three independent processes

(116)

Here, the first factor is a stationary unit-vector process with the distribution that maximizes (115); the second is a scalar nonnegative i.i.d. random process satisfying (59); and the third is an i.i.d. phase process {e^{iΘ_k}} as defined in Definition 5.

Proof: This follows directly from Theorem 13 by the observation that, independently of the distributions involved, in the MISO case the two relevant output quantities have identical distribution, and therefore

(117)

Note that the remaining terms do not depend on the phase of the output.

Remark 17: We would like to point out that in the case of a MISO fading process without memory the optimal input uses beam-forming with a deterministic direction that maximizes the fading number (see (8)). Once the fading process has memory, this is not the case anymore. However, it is straightforward to derive the upper bound (9) and the lower bound (10), which are of beam-forming type [16]: the upper bound follows by upper-bounding the expectation over the input direction by the supremum over all directions. For the lower bound, we choose the following stationary and circularly symmetric distribution on the input direction:

(118)

where the direction is the deterministic one that achieves the maximum in (10).

C. SIMO and SISO Fading With Memory

In the case of only one antenna at the transmitter, the input vector is reduced to a scalar random variable and the input direction therefore to a phase e^{iΘ}. Hence, the expression (57) is simplified considerably by the fact that there is only one choice of a circularly symmetric distribution for the input direction:

(119)

i.e., the supremum disappears.

The fading number of a SIMO fading channel with memory then follows from (57) in a straightforward way from the fact that

(120)

The expression (6) can be derived from the alternative form (91). For the fading number of a SISO fading channel with memory (5) we use

(121)

VII. CONCLUSION

We have derived the fading number of a general MIMO fading channel with memory where the distribution of the fading process is not restricted to be Gaussian, but may have any stationary, ergodic, and regular distribution of finite energy. In particular, we allow both temporal and spatial memory. The channel is assumed to be noncoherent, i.e., neither receiver nor transmitter knows the realization of the fading process; they only know its probability distribution.

We have shown that the MIMO fading number is achievable by an input that can be written as a product of two independent processes: an i.i.d. nonnegative “magnitude” process and a stationary and circularly symmetric “direction” process. The former has the standard logarithmically uniform distribution (59) that has been used in earlier publications about the fading number [7]–[9]. It escapes to infinity as guaranteed by Lemma 10. The “direction” process depends on the particular law of the fading process, i.e., it needs to be chosen such as to maximize the fading number.


Note that the fading number is not given in completely closed form, but as an expression that still contains a maximization. This is to be expected due to the generality of the result. Recall that the stationary and ergodic matrix-valued fading process is only constrained⁴ to be regular (see (31)). It can contain various types of temporal and spatial memory and have various different distributions. In particular, we do not restrict it to be Gaussian. We believe that it will hardly be possible to further simplify (57) without making more detailed assumptions about the fading process.

Unfortunately, the evaluation of (57) is difficult even if we specify the fading process in more detail. There are several reasons for this. First, the fading number is not determined by the fading process directly, but by the projection of the fading process onto the input “direction” process, using an optimal choice of that process. Note that the direction process not only determines this projection, but simultaneously also conveys information in itself.⁵ Hence, we should choose it such as to find a good projection and also to maximize the amount of information it can convey. The optimal choice of the input therefore is a tradeoff between these two, in general contradicting, objectives.

Second, even though straightforward in theory, the evaluation of the entropy with respect to the surface of the unit sphere in ℂ^m might be cumbersome.

In spite of these difficulties the fading number offers very interesting insights about the general behavior of capacity in the high-SNR regime. We refer to the discussion about the fading number in Section I-B, and also, as an example, to the special situation of Gaussian fading processes with memory where (57) allows general statements about the dependence of the high-SNR capacity on the memory and the number of transmit and receive antennas. For more details about this important special case of Gaussian fading we refer to a separate publication [19].

The proof of the main result is strongly based on a new theorem showing that the capacity-achieving input distribution of a stationary channel model can (almost) be assumed to be stationary (see Theorem 3). Even though this result is very intuitive, we are not aware of any proof in the literature. We believe this preliminary result to be of importance in many other situations as well.

We also have derived the MISO fading number with memory and the already known SIMO and SISO fading numbers as special cases from this general result. In the case of MISO fading with memory, it is interesting to note that in contrast to the memoryless situation the fading number is in general not achieved by beam-forming.

APPENDIX A
PROOF OF THEOREM 3

The proof follows the same lines as the proofs of [8, Lemma 5] and [9, Lemma B.1]. It is based on a shift-and-mix argument.

⁴Note that the restriction of having finite energy is obvious, as otherwise the capacity is unbounded.

⁵This is in stark contrast to the special case of memoryless MISO fading, where the direction vector x̂ is only used for beam-forming without conveying information.

Fix some arbitrary and an integer .

Recalling that

(122) where the supremum is over all joint distributions on

under which ,

we conclude that there must exist some integer

and some joint distribution such that if then

(123) and

(124)

Let be a random matrix whose

distri-bution consists of independent blocks that are distributed according to . The distribution of can then be written as the product of distributions

(125) Let us next compute the marginal distribution of for a cer-tain block of length . This marginal dis-tribution depends on the particular choice of the starting point of the block, however, note that different choices of will re-sult in at most different marginal distributions. This follows from the definition of in (125). Let be the probability law on that is a mixture of these different block-marginals of , i.e., for every Borel set

(126)
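In LaTeX form, (126) is presumably the uniform mixture of the shifted block-marginals; the number of shifts $T$ and the marginal laws $Q^{(t)}$ below are illustrative placeholders rather than the source's notation:

    % hedged reconstruction with illustrative symbols
    Q'(\mathcal{B}) \triangleq \frac{1}{T} \sum_{t=1}^{T} Q^{(t)}(\mathcal{B})
    \qquad \text{for every Borel set } \mathcal{B}.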

Note that in this situation the mixture can alternatively be written as

(127)

where the notation denotes the set of all corresponding submatrices that are created by taking only a contiguous range of columns of each matrix.

Note further that from our definition it follows that every subblock of the given block length has the same distribution, irrespective of its starting point. This distribution can be computed from the mixture law as a marginal distribution.

In the theorem we have assumed that the total blocklength is given and sufficiently large; in particular, we will assume that it is large enough for the construction below. We shall next describe the required input distribution as follows: let

(128)

and let the sequence of random input vectors be defined by

(129)

so that, as the three cases of (130) indicate, the sequence consists of lead-in zero vectors, then the i.i.d. blocks, and then trailing zero vectors:

(130)

where the zero symbol denotes the all-zero vector and where

the blocks are i.i.d.

(131)

If we choose this sequence as the input to our channel, then it follows from the fact that zeros have no effect and from (124) that

(132)

Again, since the lead-in and trailing zeros are of no consequence and since shifting does not change mutual information, the same mutual information results if we shift the sequence in time (provided that the shift is bounded and the blocklength is large enough so that we do not lose any nonzero input vector)

(133)

Consequently, if we define a new input distribution as the mixture of the time shifts, i.e.,

(134)

where the random time shift

(135)

is independent of everything else, then by the concavity of mutual information in the input distribution we obtain that

(136)

(137) which exceeds the desired value for sufficiently large blocklength.

Except at the edges, the above mixture guarantees that every block of consecutive vectors has the same distribution

(138)

for every admissible starting point and every Borel set,

i.e., the input is (apart from the edges) quasi-stationary. Note further that by (123) we have, for every block,

(139) The power in the edges can be smaller than elsewhere because of the mixture with deterministic zero vectors.
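To make the shift-and-mix construction concrete, the following small numerical sketch (an illustration only: the block law, the parameters, and the power check are simplifying assumptions, not the paper's exact construction) concatenates i.i.d. blocks, applies a uniformly random time shift with zero padding as in (133)-(135), and verifies empirically that away from the edges the power is the same at every time index, as quasi-stationarity (138) requires:

import numpy as np

rng = np.random.default_rng(0)

def sample_block(kappa):
    # Placeholder block law Q: i.i.d. complex Gaussian entries, except that
    # the first entry carries double power, so the block has internal
    # structure. (The paper's Q is an arbitrary optimized distribution.)
    b = (rng.standard_normal(kappa) + 1j * rng.standard_normal(kappa)) / np.sqrt(2)
    b[0] *= np.sqrt(2)
    return b

def shift_and_mix_input(n_blocks, kappa):
    # Concatenate i.i.d. blocks, then prepend a uniformly random number
    # of zero symbols (padding the tail accordingly), mimicking the
    # mixture over time shifts in (134)-(135).
    x = np.concatenate([sample_block(kappa) for _ in range(n_blocks)])
    shift = rng.integers(0, kappa)
    return np.concatenate([np.zeros(shift), x, np.zeros(kappa - shift)])

# Empirical check of quasi-stationarity (138): away from the edges, the
# average power is the same at every time index.
kappa, n_blocks, trials = 4, 50, 2000
power = np.zeros((n_blocks - 1) * kappa)
for _ in range(trials):
    seq = shift_and_mix_input(n_blocks, kappa)
    power += np.abs(seq[kappa : n_blocks * kappa]) ** 2
print(np.round(power[:12] / trials, 2))  # approximately 1.25 everywhere

Without the random shift, the average power would show a periodic pattern with the block period; the mixture over shifts is exactly what removes this periodicity, except at the edges.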

APPENDIX B

PROOF OF PROPOSITION 7

Assume that certain random phases are i.i.d. and uniformly distributed, independent of every other random quantity. Then

(140) (141) (142) (143) (144) (145) (146) (147)

Here (140) follows because the added random phase is independent of every other random quantity; (142) follows by circular symmetry; in (143) we define the new input; and (146) follows since conditioning reduces entropy.

Hence, a circularly symmetric input achieves a mutual information that is at least as large as the original mutual information.
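As a small illustration of this symmetrization step (a sketch under the assumption that, as in Definition 5, circular symmetry means invariance of the law under multiplication by a uniform phase), the following snippet turns an arbitrary complex input into a circularly symmetric one without changing its magnitude law:

import numpy as np

rng = np.random.default_rng(1)

def circularly_symmetrize(x):
    # Multiply each input vector by an independent phase that is uniform
    # on [0, 2*pi): the magnitude law is untouched, but the resulting law
    # is invariant under rotations, i.e., circularly symmetric.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=x.shape[0])
    return x * np.exp(1j * theta)[:, None]

# Example: a decidedly non-symmetric input (all-real vectors).
x = rng.standard_normal((100_000, 2)).astype(complex)
x_sym = circularly_symmetrize(x)

assert np.allclose(np.abs(x), np.abs(x_sym))      # magnitudes unchanged
phases = np.angle(x_sym[:, 0])                    # now uniform on (-pi, pi]
print(np.round(np.histogram(phases, bins=4, range=(-np.pi, np.pi))[0] / 1e5, 2))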

APPENDIX C

PROOF OF LEMMA 12

We start with the proof of the second statement, which follows directly from the fact that conditioning cannot increase differential entropy

(148)

(149) where the last equality follows from stationarity.
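In LaTeX form, the second statement together with the stationarity step presumably reads as follows, where the process notation $\{\mathbb{H}_k\}$ and the window length $n$ are assumptions made here for illustration:

    h\bigl(\mathbb{H}_k \bigm| \mathbb{H}_{k-1}, \ldots, \mathbb{H}_{k-n}\bigr)
    \le h\bigl(\mathbb{H}_k \bigm| \mathbb{H}_{k-1}, \ldots, \mathbb{H}_{k-n+1}\bigr)
    = h\bigl(\mathbb{H}_n \bigm| \mathbb{H}_{n-1}, \ldots, \mathbb{H}_1\bigr).

Here the inequality is "conditioning cannot increase differential entropy," and the equality shifts all time indices by the same amount, which is allowed by stationarity.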

We next use the second statement to prove the third

(150)

(151)

(152) (153) where (150) follows from the chain rule; (151) from the fact that, conditional on the given variables, the term is independent of all other random variables in the expression; and (152) follows from the second statement.

Next, we prove the first statement

(154)

(155) (156) (157) where (154) follows from the chain rule; where (155) follows from the second statement and from the fact that, given the conditioning variables, the random vector is independent of the rest; and where (156) follows from the third statement.

Finally, to prove the fourth statement we note that, for an arbitrary window length,

(158) (159) (160) (161) (162)

Here, in (158) and (159) we use the chain rule; (160) follows from the fact that, conditional on the relevant past, the terms are independent of all other random variables in the expression; and (161) follows from the second statement. Hence

(163) Letting the window length tend to infinity we then get

(164) which, combined with the third statement, proves the fourth statement.
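For orientation, the limiting step is of the standard form that identifies the differential entropy rate of a stationary process with its conditional entropy given the infinite past; in LaTeX, with illustrative symbols:

    \lim_{n \to \infty} \frac{1}{n}\, h\bigl(\mathbb{H}_1, \ldots, \mathbb{H}_n\bigr)
    = \lim_{n \to \infty} h\bigl(\mathbb{H}_n \bigm| \mathbb{H}_{n-1}, \ldots, \mathbb{H}_1\bigr)
    = h\bigl(\mathbb{H}_0 \bigm| \mathbb{H}_{-1}, \mathbb{H}_{-2}, \ldots\bigr).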

APPENDIX D

DERIVATION OF AN UPPER BOUND FOR THEOREM 13

Fix some power, and let an arbitrary constant be given. Let a positive integer be given whose existence is guaranteed in Theorem 3. Fix a nonnegative integer, and let the block-input distribution be the one whose existence is guaranteed in Theorem 3. Let the blocklength and the input satisfy (40)–(44) so that

(165) (166)

For the time indices at the edges we use the crude bound

(167) (168)

where the last term denotes the capacity of the memoryless MIMO fading channel as given in (47), with an available average power of at most the value guaranteed in (43) and (44). The first inequality can be derived as follows:

(169) (170)


(171) (172) (173) (174) (175) (176) (177) (178)

Here, (169) follows from the chain rule; (170) follows because we prohibit feedback; (171) from the inclusion of the additional random vectors in the mutual information term; (172) follows because, conditional on the past fading and the present input, the past inputs and outputs are independent of the present output; (173) follows from the chain rule; in (174) we use the chain rule and note the identity stated there; the following two steps (175)–(176) are analogous to (171)–(172); (177) follows once more from the inclusion of additional random vectors in the mutual information and from stationarity; and the final inequality (178) follows from the nonnegativity of mutual information.
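For reference, the chain-rule pattern used repeatedly in (169)–(178) has the generic LaTeX form (symbols illustrative):

    I\bigl(\mathbf{X}_1^n; \mathbf{Y}_1^n\bigr)
    = \sum_{k=1}^{n} I\bigl(\mathbf{X}_1^n; \mathbf{Y}_k \bigm| \mathbf{Y}_1^{k-1}\bigr),

into which additional random quantities can be inserted (which can only increase each term) and from which nonnegative terms can be dropped.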

Note that (168) is uniformly bounded in the time index. So we conclude that

(179)

(180)

This allows us to focus on the remaining time indices, for which Theorem 3 guarantees that every block has the same distribution.

We now continue by further upper-bounding these terms:

(181) (182) (183) (184) (185) (186) (187)

Here the first two steps (181)–(182) are identical to (169)–(170); (183) follows from the inclusion of the additional random matrices and the additional random vectors in the mutual information term; (184) follows because, conditional on the fading, the past outputs and the past inputs are independent of the present output; then (185) follows from the chain rule; (186) from the independence of the past inputs and the present output when conditioning on the present input; and in (187) we use the following lemma.

Lemma 18: Let the input be quasi-stationary as defined in Theorem 3 and Remark 4. Then

(188) where the correction term depends neither on the time index nor on the input, and tends to zero as the blocklength tends to infinity.

Proof: See Appendix E.

We continue as follows:

(189) (190) (191) (192) (193) (194)

Here (189) follows from the inclusion of the random vector in the mutual information term; (190) from the chain rule; (191) follows because the additive noise is independent of the fading; in (192) we introduce a new auxiliary quantity; (193) follows from dividing each term by the magnitude of the input vectors and from the conditional independence noted there; and in the final inequality (194) we drop some arguments in the negative mutual information term.


Note that (194) only depends on a block of inputs which, according to Theorem 3, has the distribution guaranteed there. Hence, using stationarity and combining (194) with (180), we find

(195)

(196)

(197)

(198) Here, in (197) we shift all time indices by the same amount, which is possible due to the stationarity of the fading and due to the fact that the distribution of every block is given in Theorem 3.

To ease readability and to avoid carrying an extra index offset throughout the rest of the proof, we introduce here a slight misuse of notation: for purely notational convenience we will assume from now on that the input itself is quasi-stationary. Note that this is a notational choice and therefore does not contradict the edge effects of Theorem 3! Hence, we rewrite (198) as follows:

(199)

Before we continue bounding (199), note that Lemma 10 guarantees that the input magnitude must escape to infinity as defined in Definition 9, i.e., for an arbitrary threshold

(200)

So fix a threshold and define an indicator random variable as follows:

(201)

Let

(202)

and note that by the union bound

(203) (204) (205)

i.e.,

(206)

Hence, from Lemma 10 it then follows that

(207)

Finally, in order to simplify notation, we will use a shorthand for the indicator variables just defined.

We start with the second term on the RHS of (199) and derive the following lower bound:

(208) (209) (210) (211) (212) (213) (214) (215) (216)


where

(217) $H_b(p) \triangleq -p \log p - (1-p) \log(1-p)$

denotes the binary entropy function. Here the first two equalities (208) and (209) follow from the chain rule; in (210) we drop a nonnegative mutual information term; (211) and (212) follow from the definition of mutual information and the fact that entropy is nonnegative (note that since the indicator is a binary random variable, we deal with discrete entropies and not differential entropies at this point); in (213) we then drop some conditioning, which increases entropy; and in (216) we again drop some nonnegative mutual information terms.

We now proceed as follows:

(218) (219) (220) (221) (222) (223) (224) where we have used the following lemma.

Lemma 19: Let the event be that the magnitude condition holds for all time indices of the block. Then

(225) (226)

where the two correction terms are independent of the input and depend on the threshold in such a way that

(227) (228)

Proof: See Appendix F.

We continue by splitting the input into magnitude and direction

(229) (230) (231)

Hence, we get

(232) (233) (234)

where the inequality (232) follows from dropping some arguments of the mutual information. Note that even though the direction vector is of unit length by definition, it still might depend on the magnitude, because we are not allowed to assume that magnitude and direction are independent. At this stage of the proof, we only know that the block follows the distribution defined in Theorem 3, which does depend on the available power. We lower-bound the first term in (234) by

(235)

(236)

(237)

and upper-bound the second term in (234) by

(238) (239) (240) (241) (242) (243) (244) (245) (246)

Here (238) follows from adding additional arguments to the mutual information; (239) holds because, conditional on the given quantities, the term is deterministic; in (240) we use the definition of mutual information; (241) follows from the fact that conditioning reduces entropy; in (242) we drop a term because, given the conditioning variables, it is independent of the rest. In (243), we rely on the fact that of all random vectors with given marginals, differential entropy is maximized by the one whose components are independent [20]; on the fact that of all complex random variables of a given second moment, differential entropy is maximized by the circularly symmetric Gaussian distribution; and on the concavity (in the variance) of the differential entropy of a circularly symmetric complex Gaussian. The inequality (244) follows from Cauchy–Schwarz; and (245) from the fact that the direction vector is a unit vector. Note that

(247)

because one of our assumptions implies, by (15), the required bound, and because the other assumption implies, by a conditional version of [7, Lemma 6.6], [9, Lemma A.14], that

(248)

Plugging (246) and (237) into (234), (224), and then into (199) we get

(249)

Before we continue to bound the first term on the RHS of (249), we define another indicator random variable as follows:

(250)

and let

(251) From Lemma 10 it then follows that

(252)

Similarly to (208)–(216), we get

(253) (254) (255) (256) (257) (258)

Here, in (253) we add the indicator to the argument of the mutual information; (254) and (255) follow from the chain rule and the definition of mutual information (note again that the indicator is a binary random variable, i.e., we need entropy instead of differential entropy); in (256) we lower-bound an entropy term by zero; and in the last inequality (258) we upper-bound and interpret the second mutual information term as the mutual information between input and output of a memoryless MIMO fading channel (see (47)) under the constraint that the maximum available average power is limited accordingly. Hence, we can upper-bound this term by the memoryless capacity.

We remark that we do not know an analytic expression for this memoryless capacity. However, we know that it is finite (bounded) and independent of the blocklength, so that from (252) we have

(259)

Using (231) we continue with the mutual information term in (258) as follows:

(260) (261) (262) (263) (264) (265)

Here, (260) follows from adding an additional random vector to the argument of the mutual information; (261) from subtracting the known vector and from the independence between the noise and all other random quantities; then in (262) we split the input into magnitude and direction vector (see (231)); (263) follows from the chain rule again; in (264) we use the chain rule and introduce a random phase that is independent of all the other random quantities and that is uniformly distributed on the complex unit circle; and the last equality (265) follows from the independence of this phase from all other random quantities.
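The split used in (231) and again in (262) is the polar decomposition of the input vector; the following minimal sketch (names and shapes are illustrative assumptions) shows the decomposition and its defining properties:

import numpy as np

rng = np.random.default_rng(2)

def polar_split(x):
    # Split a complex vector into magnitude and unit direction,
    # i.e., x = r * x_hat with r = ||x|| and x_hat = x / ||x||.
    r = np.linalg.norm(x)
    return r, x / r

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
r, x_hat = polar_split(x)
assert np.isclose(np.linalg.norm(x_hat), 1.0)  # direction is a unit vector
assert np.allclose(r * x_hat, x)               # decomposition is lossless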

Next we apply Lemma 11 to the first term in (265), choosing its quantities accordingly. Note that we need to condition everything on the event

(266)

where the two parameters can be chosen freely, but must not depend on the power.

Note that from a conditional version of Lemma 2 it follows that

(267) (268)

where we have used that the random phase is independent of all other random quantities and uniformly distributed on the unit circle. Taking the expectation conditional on the event and noting that, by the law of total expectation,

(269)

we then get

(270) (271) (272) (273)

where (271) follows from the definition of the conditioning event; where (272) follows from the scaling property of differential entropy with a real argument; and where (273) follows because, given the conditioning variables, the remaining term is independent of the rest.
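The scaling property invoked in (272) is the standard behavior of differential entropy under scaling by a real constant $a \neq 0$ (and, for completeness, by a complex constant $c \neq 0$):

    h(aX) = h(X) + \log |a|, \qquad h(cZ) = h(Z) + 2 \log |c|.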

We assume the power to be large enough that the threshold condition holds. Then we define

(274)

such that

(275) (276) (277) (278)

Note that in (276) we use our knowledge about the event on which we condition.

Finally, we bound

(279) (280) (281)


where we use Cauchy–Schwarz in (279), then in (280) split the expectation of a product into a product of expectations because the two factors are independent, and finally in (281) use the fact that the input needs to satisfy the average-power constraint (35) to get the following bound:

(282) (283) (284) Plugging (281), (278), and (273) into (266) yields

(285) Next we continue with the second term in (265)

(286)

(287)

(288)

Here, (287) follows because, given the conditioning quantities, the term does not depend on the dropped variable; and (288) follows because conditioning cannot increase entropy.

Hence, using (288) and (285) in (265) we get

(289)

(290)

(291)

(292) Here, (290) follows from a conditional version of Lemma 2, similar to (267)–(273), which allows us to combine the second and the last term in (289); in (291) we arithmetically rearrange the terms; and (292) follows from the following bound:

(293) (294) (295) where the last line should be read as the definition of the newly introduced quantity. Notice that

(296)

as can be argued as follows: the lower bound follows from [7, Lemma 6.7f)], [9, Lemma A.15f)] together with the stated conditions; the upper bound can be verified using the concavity of the logarithm function and Jensen’s inequality.
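The Jensen step invoked for the upper bound in (296) is the standard concavity inequality for the logarithm: for a nonnegative random variable $U$ with finite mean,

    \mathsf{E}[\log U] \le \log \mathsf{E}[U].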

Hence, plugging (292) and (258) into (249), we get the following bound on capacity:


(297)

(298)

This bound still depends on the distribution that is guaranteed to exist by Theorem 3, but whose exact form is not known. In order to get around this problem, we will now further upper-bound this expression by maximizing it over all possible choices of distributions that are quasi-stationary and circularly symmetric6

(299)

6Note that by Proposition 7 we know that Q is circularly symmetric.

(300) where (300) follows from the definition of mutual information. As mentioned, the supremum is over all distributions of blocks of unit vectors

(301)

that are quasi-stationary in the sense of Theorem 3 and circularly symmetric as guaranteed by Proposition 7. That is, for every admissible integer, every block of consecutive vectors has the same distribution, and the vectors are circularly symmetric according to Definition 5.

Note that from (299) the alternative expression (90) can be derived.

We next use this upper bound on capacity to get an upper bound on the fading number (37)

(302)

Fig. 1. An upper bound on the capacity of a Rician-fading channel as a function of the output SNR $(1+|d|^2)\,\mathrm{SNR}$ for different values of the specular component $d$.

Fig. 2. Illustration of the different regimes of a typical regular fading channel: at low SNR, the $o(1)$ terms are dominant; in the lower range of the high-SNR regime, the fading number $\chi$ is dominant; and only at very high SNR does the $\log(1+\log(1+\mathrm{SNR}))$ term dominate.
