@2_{R}_{R}_{R}
n()
@2
=0= 12n02
n
i=0
Cni(2ix + 2(n 0 i)y 0 n)2
= 1_{2}_{n02} n
i=0
Ci
n((2x 0 2y)i + 2ny 0 n)2
= (2x 0 2y)2 n
i=0
Ci
ni2+ (2ny 0 n)2
n
i=0
1 + 2(2x 0 2y)(2ny 0 n) n
i=0
Ci
ni:

From the fact that the derivatives ofH(Z) with respect to " are
uni-formly bounded on[0; 1=2] (see [6], also implied by Theorem 1.1 of
[4] and the computation ofH"(Z)j_{"=0}), we draw the conclusion that
the second coefficient ofH(Z) is equal to

H00_{(Z)j}
"=1=2= 04 _{}100 01
10+ 01
2
:
ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewer for pointing out the Faa Di Bruno formula, which greatly simplified the proof of Lemma 2.3.

REFERENCES

[1] D. Blackwell, “The entropy of functions of finite-state Markov chains,”
*in Trans. 1st Prague Conf. Information Theory, Statistical Decision*
*Functions, Random Processes, Prague, Czechoslovakia, 1957, pp.*
13–20.

[2] G. Constantine and T. Savits, “A multivariate Faa Di Bruno formula
*with applications,” Trans. Amer. Math. Soc., vol. 348, no. 2, pp.*
503–520, Feb. 1996.

[3] R. Gharavi and V. Anantharam, “An upper bound for the largest
Lyapunov exponent of a Markovian product of nonnegative matrices,”
*Theor. Comp. Sci., vol. 332, no. 1–3, pp. 543–557, Feb. 2005.*
[4] G. Han and B. Marcus, “Analyticity of entropy rate of hidden Markov

*chains,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5251–5266, Dec.*
2006.

[5] T. Holliday, A. Goldsmith, and P. Glynn, “Capacity of finite state
*channels based on Lyapunov exponents of random matrices,” IEEE Trans.*
*Inf. Theory, vol. 52, no. 8, pp. 3509–3532, Aug. 2006.*

[6] P. Jacquet, G. Seroussi, and W. Szpankowski, “On the entropy of a
*hidden Markov process,” in Proc. Data Compression Conf., Snowbird,*
UT, Mar. 2004, pp. 362–371.

*[7] J. Kemeny and J. Snell, Finite Markov Chains.* Princeton, N.J.: Van
Nostrand, 1960.

[8] R. Leipnik and T. Reid, “Multivariable Faa Di Bruno formulas,” in
*Electronic Proc. 9th Annu. Int. Conf. Technology in Collegiate Mathematics* [Online]. Available: http://archives.math.utk.edu/ICTCM/EP-9.html#C23

*[9] D. Lind and B. Marcus, An Introduction to Symbolic Dynamics and*
*Coding.* Cambridge, U.K.: Cambridge Univ. Press, 1995.

[10] B. Marcus, K. Petersen, and S. Williams, “Transmission rates and
*factors of Markov chains,” Contemp. Math., vol. 26, pp. 279–294, 1984.*
[11] E. Ordentlich and T. Weissman, “On the optimality of symbol by

*symbol filtering and denoising,” IEEE Trans. Inf. Theory, vol. 52, no.*
1, pp. 19–40, Jan. 2006.

[12] E. Ordentlich and T. Weissman, “New bounds on the entropy rate of
*hidden Markov process,” in Proc. Information Theory Workshop, San*
Antonio, TX, Oct. 2004, pp. 117–122.

[13] E. Ordentlich and T. Weissman, Personal Communication.

[14] Y. Peres, “Analytic dependence of Lyapunov exponents on transition
*probabilities,” in Lyapunov’s Exponents, Proceedings of a Workshop*
*(Lecture Notes in Mathematics).* Berlin, Germany: Springer-Verlag,
1990, vol. 1486.

[15] Y. Peres, “Domains of analytic continuation for the top Lyapunov
*exponent,” Ann. Inst. H. Poincaré Probab. Statist., vol. 28, no. 1, pp.*
131–148, 1992.

[16] O. Zuk, I. Kanter, and E. Domany, “Asymptotics of the entropy rate
*for a hidden Markov process,” J. Statist. Phys., vol. 121, no. 3–4, pp.*
343–360, 2005.

[17] O. Zuk, E. Domany, I. Kanter, and M. Aizenman, “From finite-system
*entropy to entropy rate for a hidden Markov process,” IEEE Signal*
*Process. Lett., vol. 13, no. 9, pp. 517–520, Sep. 2006.*

**The Fading Number of Memoryless Multiple-Input Multiple-Output Fading Channels**
*Stefan M. Moser, Member, IEEE*

**Abstract—In this correspondence, we derive the fading number of multiple-input multiple-output (MIMO) flat-fading channels of general (not necessarily Gaussian) regular law without temporal memory. The channel is assumed to be noncoherent, i.e., neither receiver nor transmitter has knowledge of the channel state; they only know the probability law of the fading process. The fading number is the second term, after the double-logarithmic term, of the high signal-to-noise ratio (SNR) expansion of channel capacity. Hence, the asymptotic channel capacity of memoryless MIMO fading channels is derived exactly. The result is then specialized to the known cases of single-input–multiple-output (SIMO), multiple-input single-output (MISO), and single-input–single-output (SISO) fading channels, as well as to the situation of Gaussian fading.**

**Index Terms—Channel capacity, fading number, Gaussian fading, general flat fading, high signal-to-noise ratio (SNR), multiple antenna, multiple-input multiple-output (MIMO), noncoherent.**

I. INTRODUCTION

It has been recently shown in [1], [2] that, whenever the matrix-valued fading process is of finite differential entropy rate (a so-called *regular* process), the capacity of noncoherent multiple-input multiple-output (MIMO) fading channels typically grows only double-logarithmically in the signal-to-noise ratio (SNR).

This is in stark contrast to both the *coherent* fading channel, where the receiver has perfect knowledge of the channel state, and the noncoherent fading channel with *nonregular* channel law, i.e., where the differential entropy rate of the fading process is not finite. In the former case the capacity grows logarithmically in the SNR with a factor in front of the logarithm that is related to the number of receive and transmit antennas [3].

Manuscript received June 1, 2006; revised March 12, 2007. This work was supported by the Industrial Technology Research Institute (ITRI), Zhudong, Taiwan, under Contract G1-95003.

The author is with the Department of Communication Engineering, National Chiao Tung University (NCTU), Hsinchu, Taiwan (e-mail: stefan.moser@ieee.org).

Communicated by K. Kobayashi, Associate Editor for Shannon Theory.

Digital Object Identifier 10.1109/TIT.2007.899512

In the latter case, the asymptotic growth rate of the capacity depends highly on the specific details of the fading process. In the case of Gaussian fading, nonregularity means that the present fading realization can be predicted precisely from the past realizations. However, in every noncoherent system the past realizations are not known *a priori*, but need to be estimated either from known past channel inputs and outputs or by means of special training signals. Depending on the spectral distribution of the fading process, the dependence of such estimates on the available power can vary greatly, which gives rise to a wide variety of possible high-SNR capacity behaviors: it is shown in [4], [5], and [6] that, depending on the spectrum of the nonregular Gaussian fading process, the capacity can exhibit very slow double-logarithmic growth, fast logarithmic growth, or even exotic behavior where it grows proportionally to a fractional power of $\log \mathrm{SNR}$.

Similarly, Liang and Veeravalli show in [7] that the capacity of a Gaussian block-fading channel depends critically on the assumptions one makes about the time-correlation of the fading process: if the correlation matrix is rank deficient, the capacity grows logarithmically in the SNR, otherwise double-logarithmically.

In this correspondence we will only consider noncoherent channels with regular fading processes, i.e., the capacity at high SNR grows double-logarithmically. To quantify the rates at which this poor power efficiency begins, [1], [2] introduce the *fading number* as the second term in the high-SNR asymptotic expansion of channel capacity. Hence, the capacity can be written as

$$C(\mathrm{SNR}) = \log\bigl(1 + \log(1 + \mathrm{SNR})\bigr) + \chi + o(1) \tag{1}$$

where $o(1)$ tends to zero as the SNR tends to infinity, and where $\chi$ is a constant, called the *fading number*, that does not depend on the SNR.
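To get a feeling for how slowly the double-logarithmic term in (1) grows, the expansion can be evaluated numerically with the $o(1)$ term dropped. The following is only an illustrative sketch: the function name `high_snr_capacity` and the placeholder value `chi = 0.0` are assumptions made here and do not correspond to any particular fading law.

```python
import math


def high_snr_capacity(snr: float, chi: float) -> float:
    """Evaluate the expansion (1) in nats per channel use, ignoring the o(1) term."""
    return math.log(1.0 + math.log(1.0 + snr)) + chi


# chi = 0.0 is a placeholder fading number chosen only for illustration.
for snr_db in (20, 40, 80):
    snr = 10.0 ** (snr_db / 10.0)
    print(snr_db, "dB ->", round(high_snr_capacity(snr, chi=0.0), 3), "nats")
```

Under this approximation, squaring the SNR (doubling it in decibels) adds less than $\log 2$ nats, which is the poor power efficiency referred to above.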

Explicit expressions of the fading number are known for a number of fading models. For channels with memory, the fading number of single-input–single-output (SISO) fading channels is derived in [1], [2] and the single-input–multiple-output (SIMO) case is derived in [8] and [2].

For memoryless fading channels, the fading number is known in the situation of only one antenna at transmitter and receiver (SISO)

$$\chi(H) = \log \pi + \mathsf{E}\bigl[\log |H|^2\bigr] - h(H) \tag{2}$$

in the situation of a SIMO fading channel¹

$$\chi(\mathbf{H}) = h_\lambda\bigl(\hat{\mathbf{H}} e^{\mathrm{i}\Theta}\bigr) + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbf{H}\|^2\bigr] - \log 2 - h(\mathbf{H}) \tag{3}$$

(both are special cases of the corresponding situation with memory), and also in the case of a multiple-input single-output (MISO) fading channel [1], [2]

$$\chi(\mathbf{H}^{\mathsf T}) = \sup_{\hat{\mathbf{x}}} \Bigl\{ \log \pi + \mathsf{E}\bigl[\log |\mathbf{H}^{\mathsf T} \hat{\mathbf{x}}|^2\bigr] - h(\mathbf{H}^{\mathsf T} \hat{\mathbf{x}}) \Bigr\}. \tag{4}$$

The most general situation of multiple antennas at both transmitter and receiver, however, has been solved so far only in the special case of a particular rotational symmetry of the fading process: if every rotation of the input vector of the channel can be “undone” by a corresponding rotation of the output vector, and vice versa, then the fading number has been shown in [1], [2] to be

$$\chi(\mathbb{H}) = \log \frac{\pi^{n_{\mathrm R}}}{\Gamma(n_{\mathrm R})} + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbb{H} \hat{\mathbf{e}}\|^2\bigr] - h(\mathbb{H} \hat{\mathbf{e}}) \tag{5}$$

where $\hat{\mathbf{e}} \in \mathbb{C}^{n_{\mathrm T}}$ is an arbitrary constant vector of unit length, and where $n_{\mathrm R}$ denotes the number of receive antennas. Such fading channels are called rotation-commutative in the generalized sense (for a detailed definition see Section V).

¹For a precise definition of the notation used in this correspondence, we refer to Section II.

In this correspondence, we will extend these results and derive the fading number of general memoryless MIMO fading channels.

The remainder of this correspondence is structured as follows. Before we proceed in Section III to introduce the channel model in detail, the following section will clarify our notation. We will then present the main result, i.e., the fading number of the general memoryless MIMO fading channel, in Section IV. The corresponding proof is found in Section VII.

In Section V, the known fading numbers of SISO, SIMO, MISO, and rotation-commutative MIMO fading channels are derived once more as special cases of the new general result from Section IV. In Section VI, we investigate the situation of Gaussian fading processes. We will conclude in Section VIII.

II. NOTATION

We try to use uppercase letters for random quantities and lowercase letters for their realizations. This rule, however, is broken when dealing with matrices and some constants. To better differentiate between scalars, vectors, and matrices we have resorted to using different fonts for the different quantities. Uppercase letters such as $X$ are used to denote scalar random variables taking value in the reals $\mathbb{R}$ or in the complex plane $\mathbb{C}$. Their realizations are typically written in lowercase, e.g., $x$. For random vectors we use bold face capitals, e.g., $\mathbf{X}$, and bold lowercase for their realizations, e.g., $\mathbf{x}$. Deterministic matrices are denoted by uppercase letters but of a special font, e.g., $\mathsf{H}$; and random matrices are denoted using another special uppercase font, e.g., $\mathbb{H}$. The capacity is denoted by $C$, the energy per symbol by $\mathcal{E}$, and the signal-to-noise ratio is denoted by SNR.

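We use the shorthand $\mathbf{H}_a^b$ for $(\mathbf{H}_a, \mathbf{H}_{a+1}, \ldots, \mathbf{H}_b)$. For more complicated expressions, such as $\bigl(\mathbf{H}_a^{\mathsf T} \hat{\mathbf{x}}_a, \mathbf{H}_{a+1}^{\mathsf T} \hat{\mathbf{x}}_{a+1}, \ldots, \mathbf{H}_b^{\mathsf T} \hat{\mathbf{x}}_b\bigr)$, we use the dummy variable $\ell$ to clarify notation: $\bigl\{\mathbf{H}_\ell^{\mathsf T} \hat{\mathbf{x}}_\ell\bigr\}_{\ell=a}^{b}$. Hermitian conjugation is denoted by $(\cdot)^\dagger$, and $(\cdot)^{\mathsf T}$ stands for the transpose (without conjugation) of a matrix or vector. The trace of a matrix is denoted by $\operatorname{tr}(\cdot)$.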

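We use $\|\cdot\|$ to denote the Euclidean norm of vectors or the Euclidean operator norm of matrices. That is

$$\|\mathbf{x}\| \triangleq \sqrt{\sum_{t=1}^{m} \bigl|x^{(t)}\bigr|^2}, \qquad \mathbf{x} \in \mathbb{C}^m \tag{6}$$

$$\|\mathsf{A}\| \triangleq \max_{\|\hat{\mathbf{w}}\| = 1} \|\mathsf{A} \hat{\mathbf{w}}\|. \tag{7}$$

Thus, $\|\mathsf{A}\|$ is the maximal singular value of the matrix $\mathsf{A}$.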

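The Frobenius norm of matrices is denoted by $\|\cdot\|_{\mathrm F}$ and is given by the square root of the sum of the squared magnitudes of the elements of the matrix, i.e.,

$$\|\mathsf{A}\|_{\mathrm F} \triangleq \sqrt{\operatorname{tr}\bigl(\mathsf{A}^\dagger \mathsf{A}\bigr)}. \tag{8}$$

Note that for every matrix $\mathsf{A}$

$$\|\mathsf{A}\| \le \|\mathsf{A}\|_{\mathrm F} \tag{9}$$

as can be verified by upper bounding the squared magnitude of each of the components of $\mathsf{A}\hat{\mathbf{w}}$ using the Cauchy–Schwarz inequality.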
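The relation (9) between the two norms is easy to check numerically. The sketch below uses only the Python standard library; the helper names and the power-iteration approach to the operator norm are illustrative choices made here, not part of the original text.

```python
import math
import random


def matvec(a, w):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in a]


def euclid_norm(v):
    """Euclidean norm (6) of a complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))


def frobenius_norm(a):
    """Frobenius norm (8): root of the sum of squared magnitudes of all entries."""
    return math.sqrt(sum(abs(c) ** 2 for row in a for c in row))


def operator_norm(a, iters=500, seed=1):
    """Operator norm (7), approximated by power iteration on A^dagger A."""
    n_rows, n_cols = len(a), len(a[0])
    a_h = [[a[i][j].conjugate() for i in range(n_rows)] for j in range(n_cols)]
    rng = random.Random(seed)
    w = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_cols)]
    for _ in range(iters):
        w = matvec(a_h, matvec(a, w))
        norm = euclid_norm(w)
        if norm == 0.0:
            return 0.0
        w = [c / norm for c in w]
    return euclid_norm(matvec(a, w))


A = [[3 + 0j, 0j], [0j, 4j]]
print(operator_norm(A), frobenius_norm(A))
```

For a rank-one matrix the two norms coincide, so (9) holds with equality in that case.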

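We will often split a complex vector $\mathbf{v} \in \mathbb{C}^m$ up into its magnitude $\|\mathbf{v}\|$ and its direction

$$\hat{\mathbf{v}} \triangleq \frac{\mathbf{v}}{\|\mathbf{v}\|} \tag{10}$$

where we reserve this notation exclusively for unit vectors, i.e., throughout the correspondence every vector carrying a hat, $\hat{\mathbf{v}}$ or $\hat{\mathbf{V}}$, denotes a (deterministic or random, respectively) vector of unit length

$$\|\hat{\mathbf{v}}\| = \|\hat{\mathbf{V}}\| = 1. \tag{11}$$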

To be able to work with such direction vectors we shall need a differential entropy-like quantity for random vectors that take value on the unit sphere in $\mathbb{C}^m$: let $\lambda$ denote the area measure on the unit sphere in $\mathbb{C}^m$. If a random vector $\hat{\mathbf{V}}$ takes value in the unit sphere and has the density $p_\lambda^{\hat{\mathbf{V}}}(\hat{\mathbf{v}})$ with respect to $\lambda$, then we shall let

$$h_\lambda(\hat{\mathbf{V}}) \triangleq -\mathsf{E}\bigl[\log p_\lambda^{\hat{\mathbf{V}}}(\hat{\mathbf{V}})\bigr] \tag{12}$$

if the expectation is defined.

We note that just as ordinary differential entropy is invariant under translation, so is $h_\lambda(\hat{\mathbf{V}})$ invariant under rotation. That is, if $\mathsf{U}$ is a deterministic unitary matrix, then

$$h_\lambda(\mathsf{U} \hat{\mathbf{V}}) = h_\lambda(\hat{\mathbf{V}}). \tag{13}$$

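Also note that $h_\lambda(\hat{\mathbf{V}})$ is maximized if $\hat{\mathbf{V}}$ is uniformly distributed on the unit sphere, in which case

$$h_\lambda(\hat{\mathbf{V}}) = \log c_m \tag{14}$$

where $c_m$ denotes the surface area of the unit sphere in $\mathbb{C}^m$

$$c_m = \frac{2 \pi^m}{\Gamma(m)}. \tag{15}$$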
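As a quick sanity check of (15), the surface area $c_m$ can be evaluated for small $m$: for $m = 1$ it should equal the circumference $2\pi$ of the unit circle, and for $m = 2$ the area $2\pi^2$ of the unit sphere in $\mathbb{R}^4$. The function names below are illustrative only.

```python
import math


def sphere_area(m: int) -> float:
    """Surface area c_m of the unit sphere in C^m according to (15)."""
    return 2.0 * math.pi ** m / math.gamma(m)


def max_direction_entropy(m: int) -> float:
    """Maximal value log(c_m) of h_lambda in nats, see (14)."""
    return math.log(sphere_area(m))


for m in (1, 2, 3):
    print(m, sphere_area(m), max_direction_entropy(m))
```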

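The definition (12) can be easily extended to conditional entropies: if $\mathbf{W}$ is some random vector, and if conditional on $\mathbf{W} = \mathbf{w}$ the random vector $\hat{\mathbf{V}}$ has density $p_\lambda^{\hat{\mathbf{V}}|\mathbf{W}}(\hat{\mathbf{v}}|\mathbf{w})$, then we can define

$$h_\lambda(\hat{\mathbf{V}} \mid \mathbf{W} = \mathbf{w}) \triangleq -\mathsf{E}\Bigl[\log p_\lambda^{\hat{\mathbf{V}}|\mathbf{W}}(\hat{\mathbf{V}}|\mathbf{W}) \Bigm| \mathbf{W} = \mathbf{w}\Bigr] \tag{16}$$

and we can define $h_\lambda(\hat{\mathbf{V}} \mid \mathbf{W})$ as the expectation (with respect to $\mathbf{W}$) of $h_\lambda(\hat{\mathbf{V}} \mid \mathbf{W} = \mathbf{w})$.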

Based on these definitions, we have the following lemma.

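*Lemma 1:* Let $\mathbf{V}$ be a complex random vector taking value in $\mathbb{C}^m$ and having differential entropy $h(\mathbf{V})$. Let $\|\mathbf{V}\|$ denote its norm and $\hat{\mathbf{V}}$ denote its direction as in (10). Then

$$h(\mathbf{V}) = h(\|\mathbf{V}\|) + h_\lambda(\hat{\mathbf{V}} \mid \|\mathbf{V}\|) + (2m - 1)\, \mathsf{E}[\log \|\mathbf{V}\|] \tag{17}$$

$$\phantom{h(\mathbf{V})} = h_\lambda(\hat{\mathbf{V}}) + h(\|\mathbf{V}\| \mid \hat{\mathbf{V}}) + (2m - 1)\, \mathsf{E}[\log \|\mathbf{V}\|] \tag{18}$$

whenever all the quantities in (17) and (18), respectively, are defined. Here $h(\|\mathbf{V}\|)$ is the differential entropy of $\|\mathbf{V}\|$ when viewed as a real (scalar) random variable.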

*Proof: Omitted.*

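We shall write $\mathbf{X} \sim \mathcal{N}_{\mathbb{C}}(\boldsymbol{\mu}, \mathsf{K})$ if $\mathbf{X} - \boldsymbol{\mu}$ is a circularly symmetric, zero-mean, Gaussian random vector of covariance matrix $\mathsf{E}\bigl[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^\dagger\bigr] = \mathsf{K}$. By $X \sim \mathcal{U}([a, b])$ we denote a random variable that is uniformly distributed on the interval $[a, b]$. The probability distribution of a random variable $X$ or random vector $\mathbf{X}$ is denoted by $Q_X$ or $Q_{\mathbf{X}}$, respectively.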

Throughout the correspondence $e^{\mathrm{i}\Theta}$ denotes a complex random variable that is uniformly distributed over the unit circle

$$e^{\mathrm{i}\Theta} \sim \text{Uniform on } \{z \in \mathbb{C} : |z| = 1\}. \tag{19}$$

When it appears in formulas with other random variables, $e^{\mathrm{i}\Theta}$ is always assumed to be independent of these other variables.

All rates specified in this correspondence are in nats per channel use, i.e., $\log(\cdot)$ denotes the natural logarithmic function.

III. THE CHANNEL MODEL

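We consider a channel with $n_{\mathrm T}$ transmit antennas and $n_{\mathrm R}$ receive antennas whose time-$k$ output $\mathbf{Y}_k \in \mathbb{C}^{n_{\mathrm R}}$ is given by

$$\mathbf{Y}_k = \mathbb{H}_k \mathbf{x}_k + \mathbf{Z}_k. \tag{20}$$

Here $\mathbf{x}_k \in \mathbb{C}^{n_{\mathrm T}}$ denotes the time-$k$ input vector; the random matrix $\mathbb{H}_k \in \mathbb{C}^{n_{\mathrm R} \times n_{\mathrm T}}$ denotes the time-$k$ fading matrix; and the random vector $\mathbf{Z}_k \in \mathbb{C}^{n_{\mathrm R}}$ denotes the time-$k$ additive noise vector.

We assume that the random vectors $\{\mathbf{Z}_k\}$ are spatially and temporally white, zero-mean, circularly symmetric, complex Gaussian random vectors, i.e., $\{\mathbf{Z}_k\} \sim \text{IID } \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \sigma^2 \mathsf{I})$ for some $\sigma^2 > 0$. Here $\mathsf{I}$ denotes the identity matrix.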

As for the matrix-valued fading process $\{\mathbb{H}_k\}$ we will not specify a particular distribution, but shall only assume that it is stationary, ergodic, of a finite-energy fading gain, i.e.,

$$\mathsf{E}\bigl[\|\mathbb{H}_k\|_{\mathrm F}^2\bigr] < \infty \tag{21}$$

and *regular*, i.e., its differential entropy rate is finite

$$h(\{\mathbb{H}_k\}) \triangleq \lim_{n \uparrow \infty} \frac{1}{n} h(\mathbb{H}_1, \ldots, \mathbb{H}_n) > -\infty. \tag{22}$$

Furthermore, we will restrict ourselves to the memoryless case, i.e., we assume that $\{\mathbb{H}_k\}$ is independent and identically distributed (IID) with respect to time $k$. Since there is no memory in the channel, an IID input process $\{\mathbf{X}_k\}$ will be sufficient to achieve capacity and we will therefore drop the time index $k$ hereafter, i.e., (20) simplifies to

$$\mathbf{Y} = \mathbb{H} \mathbf{x} + \mathbf{Z}. \tag{23}$$

Note that while we assume that there is no temporal memory in the channel, we do not restrict the spatial memory, i.e., the different fading components $H^{(i,j)}$ of the fading matrix $\mathbb{H}$ may be dependent.

We assume that the fading $\mathbb{H}$ and the additive noise $\mathbf{Z}$ are independent and of a joint law that does not depend on the channel input $\mathbf{x}$.

As for the input, we consider two different constraints: a peak-power constraint and an average-power constraint. We use $\mathcal{E}$ to denote the maximal allowed instantaneous power in the former case, and to denote the allowed average power in the latter case. For both cases we set

$$\mathrm{SNR} \triangleq \frac{\mathcal{E}}{\sigma^2}. \tag{24}$$

The capacity $C(\mathrm{SNR})$ of the channel (23) is given by

$$C(\mathrm{SNR}) = \sup_{Q_{\mathbf{X}}} I(\mathbf{X}; \mathbf{Y}) \tag{25}$$

where the supremum is over the set of all probability distributions on $\mathbf{X}$ satisfying the constraints, i.e.,

$$\|\mathbf{X}\|^2 \le \mathcal{E}, \quad \text{almost surely} \tag{26}$$

for a peak-power constraint, or

$$\mathsf{E}\bigl[\|\mathbf{X}\|^2\bigr] \le \mathcal{E} \tag{27}$$

for an average-power constraint.

Specializing [1, Theorem 4.2], [2, Theorem 6.10] to memoryless MIMO fading, we have

$$\varlimsup_{\mathrm{SNR} \uparrow \infty} \bigl\{ C(\mathrm{SNR}) - \log \log \mathrm{SNR} \bigr\} < \infty. \tag{28}$$

Note that [1, Theorem 4.2], [2, Theorem 6.10] is stated under the assumption of an average-power constraint only. However, since a peak-power constraint is more stringent than an average-power constraint, (28) also holds in the situation of a peak-power constraint.

The *fading number* $\chi$ is now defined as in [1, Definition 4.6], [2, Definition 6.13] by

$$\chi(\mathbb{H}) \triangleq \varlimsup_{\mathrm{SNR} \uparrow \infty} \bigl\{ C(\mathrm{SNR}) - \log \log \mathrm{SNR} \bigr\}. \tag{29}$$

*Prima facie* the fading number depends on whether a peak-power constraint (26) or an average-power constraint (27) is imposed on the input. However, it will turn out that the memoryless MIMO fading number is identical for both cases.

IV. MAIN RESULT
*A. Preliminaries*

Before we can state our main result, we need to introduce three concepts: the first concerns probability distributions that escape to infinity, the second a technique of upper bounding mutual information, and the third concerns circular symmetry.

*1) Escaping to Infinity: We start with a discussion about the concept*
of capacity-achieving input distributions that escape to infinity.

A sequence of input distributions parameterized by the allowed cost (in our case of fading channels the cost is the available power or SNR) is said to escape to infinity if it assigns to every fixed compact set a probability that tends to zero as the allowed cost tends to infinity. In other words this means that in the limit—when the allowed cost tends to infinity—such a distribution does not use finite-cost symbols.

This notion is important because the asymptotic capacity of many channels of interest can only be achieved by input distributions that escape to infinity. As a matter of fact one can show that every input distribution that only achieves a mutual information of identical asymptotic *growth rate* as the capacity must escape to infinity. Loosely speaking, for many channels it is not favorable to use finite-cost input symbols whenever the cost constraint is loosened completely.

In the following we will only state this result specialized to the situation at hand. For a more general description and for all proofs we refer to [8, Sec. VII.C.3], [2, Sec. 2.6].

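*Definition 2:* Let $\{Q_{\mathbf{X}, \mathcal{E}}\}_{\mathcal{E} \ge 0}$ be a family of input distributions for the memoryless fading channel (23), where this family is parameterized by the available average power $\mathcal{E}$ such that

$$\mathsf{E}_{Q_{\mathbf{X}, \mathcal{E}}}\bigl[\|\mathbf{X}\|^2\bigr] \le \mathcal{E}, \qquad \mathcal{E} \ge 0. \tag{30}$$

We say that the input distributions $\{Q_{\mathbf{X}, \mathcal{E}}\}_{\mathcal{E} \ge 0}$ *escape to infinity* if for every $\mathcal{E}_0 > 0$

$$\lim_{\mathcal{E} \uparrow \infty} Q_{\mathbf{X}, \mathcal{E}}\bigl(\|\mathbf{X}\|^2 \le \mathcal{E}_0\bigr) = 0. \tag{31}$$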

We now have the following lemma.

*Lemma 3:* Let the memoryless MIMO fading channel be given as in (23) and let $\{Q_{\mathbf{X}, \mathcal{E}}\}_{\mathcal{E} \ge 0}$ be a family of distributions on the channel input that satisfy the power constraint (30). Let $I(Q_{\mathbf{X}, \mathcal{E}})$ denote the mutual information between input and output of channel (23) when the input is distributed according to the law $Q_{\mathbf{X}, \mathcal{E}}$. Assume that the family of input distributions $\{Q_{\mathbf{X}, \mathcal{E}}\}_{\mathcal{E} \ge 0}$ is such that the following condition is satisfied:

$$\lim_{\mathcal{E} \uparrow \infty} \frac{I(Q_{\mathbf{X}, \mathcal{E}})}{\log \log \mathcal{E}} = 1. \tag{32}$$

Then $\{Q_{\mathbf{X}, \mathcal{E}}\}_{\mathcal{E} \ge 0}$ must escape to infinity.

*Proof:* A proof can be found in [8, Theorem 8, Remark 9], [2, Corollary 2.8].

*2) An Upper Bound on Channel Capacity:* In [1] and [2] a new approach of deriving upper bounds to channel capacity has been introduced. Since capacity is by definition a maximization of mutual information, it is inherently difficult to find upper bounds to it. The proposed technique is based on a dual expression of mutual information that leads to an expression of capacity as a minimization instead of a maximization. This way it becomes much easier to find upper bounds.

Again, here we only state the upper bound in a form needed in the derivation of Theorem 7. For a more general form, for more mathematical details, and for all proofs we refer to [1, Sec. IV], [2, Sec. 2.4].

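*Lemma 4:* Consider a memoryless channel with input $\mathbf{s} \in \mathbb{C}^{n_{\mathrm T}}$ and output $T \in \mathbb{C}$. Then for an arbitrary distribution on the input $\mathbf{S}$ the mutual information between input and output of the channel is upper bounded as follows:

$$I(\mathbf{S}; T) \le -h(T \mid \mathbf{S}) + \log \pi + \alpha \log \beta + \log \Gamma\Bigl(\alpha, \frac{\nu}{\beta}\Bigr) + (1 - \alpha)\, \mathsf{E}\bigl[\log(|T|^2 + \nu)\bigr] + \frac{1}{\beta} \mathsf{E}\bigl[|T|^2\bigr] + \frac{\nu}{\beta} \tag{33}$$

where $\alpha, \beta > 0$ and $\nu \ge 0$ are parameters that can be chosen freely, but must not depend on the distribution of $\mathbf{S}$.

*Proof:* A proof can be found in [1, Sec. IV], [2, Sec. 2.4].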
*3) Capacity-Achieving Input Distributions and Circular Symmetry:* The final preliminary remark concerns circular symmetry. We say that a random vector $\mathbf{W}$ is *circularly symmetric* if

$$\mathbf{W} \overset{\mathscr{L}}{=} \mathbf{W} \cdot e^{\mathrm{i}\Theta} \tag{34}$$

where $\Theta \sim \mathcal{U}([0, 2\pi])$ is independent of $\mathbf{W}$ and where $\overset{\mathscr{L}}{=}$ stands for “equal in law”. Note that this is not to be confused with *isotropically distributed*, which means that a vector has equal probability to point in every direction. Circular symmetry only concerns the phase of the components of a vector, not the vector’s direction.

The following lemma says that for our channel model an optimal input can be assumed to be circularly symmetric.

*Lemma 5:* Assume a channel as given in (23). Then the capacity-achieving input distribution can be assumed to be circularly symmetric, i.e., the input vector $\mathbf{X}$ can be replaced by $\mathbf{X} e^{\mathrm{i}\Theta}$, where $\Theta \sim \mathcal{U}([0, 2\pi])$ is independent of every other random quantity.

*Proof:* A proof is given in Appendix A.

*Remark 6:* Note that the proof of Lemma 5 relies only on the fact that the additive noise is assumed to be circularly symmetric.

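*B. Fading Number of General Memoryless MIMO Fading Channels*

We are now ready for the main result, i.e., the fading number of a memoryless MIMO fading channel.

*Theorem 7:* Consider a memoryless MIMO fading channel (23) where the random fading matrix $\mathbb{H}$ takes value in $\mathbb{C}^{n_{\mathrm R} \times n_{\mathrm T}}$ and satisfies

$$h(\mathbb{H}) > -\infty \tag{35}$$

and

$$\mathsf{E}\bigl[\|\mathbb{H}\|_{\mathrm F}^2\bigr] < \infty. \tag{36}$$

Then, irrespective of whether a peak-power constraint (26) or an average-power constraint (27) is imposed on the input, the fading number $\chi(\mathbb{H})$ is given by

$$\chi(\mathbb{H}) = \sup_{Q_{\hat{\mathbf{X}}}} \biggl\{ h_\lambda\biggl(\frac{\mathbb{H} \hat{\mathbf{X}}}{\|\mathbb{H} \hat{\mathbf{X}}\|}\biggr) + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbb{H} \hat{\mathbf{X}}\|^2\bigr] - \log 2 - h(\mathbb{H} \hat{\mathbf{X}} \mid \hat{\mathbf{X}}) \biggr\}. \tag{37}$$

Here $\hat{\mathbf{X}}$ denotes a random vector of unit length and $Q_{\hat{\mathbf{X}}}$ denotes its probability law, i.e., the supremum is taken over all distributions of the random unit vector $\hat{\mathbf{X}}$. Note that the expectation in the second term is understood jointly over $\mathbb{H}$ and $\hat{\mathbf{X}}$.

Moreover, this fading number is achievable by a random vector $\mathbf{X} = \hat{\mathbf{X}} \cdot R$ where $\hat{\mathbf{X}}$ is distributed according to the distribution that achieves the fading number in (37) and where $R$ is a nonnegative random variable independent of $\hat{\mathbf{X}}$ such that

$$\log R^2 \sim \mathcal{U}\bigl([\log \log \mathcal{E}, \log \mathcal{E}]\bigr). \tag{38}$$

*Proof:* A proof is given in Section VII.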
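The magnitude distribution (38) provides a concrete instance of a family that escapes to infinity in the sense of Definition 2: since $\log R^2$ is uniform on $[\log\log\mathcal{E}, \log\mathcal{E}]$, the probability of $\{R^2 \le \mathcal{E}_0\}$ has a closed form and vanishes as soon as $\log \mathcal{E} > \mathcal{E}_0$. The sketch below illustrates this; the function names are chosen here for illustration, and $\mathcal{E} > e$ is assumed so that $\log\log\mathcal{E}$ is positive.

```python
import math
import random


def sample_r_squared(energy: float, rng: random.Random) -> float:
    """Draw R^2 with log R^2 uniform on [log log E, log E], as in (38)."""
    lo, hi = math.log(math.log(energy)), math.log(energy)
    return math.exp(rng.uniform(lo, hi))


def prob_below(energy: float, e0: float) -> float:
    """Closed-form probability that R^2 <= e0 under (38)."""
    lo, hi = math.log(math.log(energy)), math.log(energy)
    p = (math.log(e0) - lo) / (hi - lo)
    return min(1.0, max(0.0, p))


rng = random.Random(0)
for energy in (1e2, 1e4, 1e6):
    draws = [sample_r_squared(energy, rng) for _ in range(5)]
    print(energy, prob_below(energy, e0=10.0), [round(x, 1) for x in draws])
```

Every draw satisfies $\log \mathcal{E} \le R^2 \le \mathcal{E}$, so the peak-power constraint (26) is met as well.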

Note that, even if it might not be obvious at first sight, it is not hard to show that the distribution $Q_{\hat{\mathbf{X}}}$ that achieves the supremum in (37) is circularly symmetric. This is in agreement with Lemma 5.
The evaluation of (37) can be rather awkward, mainly due to the first term, i.e., the differential entropy with respect to the surface area measure $\lambda$. We will therefore derive next an upper bound to the fading number that is easier to evaluate.

To that end, first note that for an arbitrary constant nonsingular $n_{\mathrm R} \times n_{\mathrm R}$ matrix $\mathsf{A}$ and an arbitrary constant nonsingular $n_{\mathrm T} \times n_{\mathrm T}$ matrix $\mathsf{B}$

$$\chi(\mathsf{A} \mathbb{H} \mathsf{B}) = \chi(\mathbb{H}); \tag{39}$$

see [1, Lemma 4.7], [2, Lemma 6.14]. Second, note that for an arbitrary random unit vector $\hat{\mathbf{Y}} \in \mathbb{C}^{n_{\mathrm R}}$

$$h_\lambda(\hat{\mathbf{Y}}) \le \log c_{n_{\mathrm R}} = \log \frac{2 \pi^{n_{\mathrm R}}}{\Gamma(n_{\mathrm R})} \tag{40}$$

where $c_{n_{\mathrm R}}$ denotes the surface area of the unit sphere in $\mathbb{C}^{n_{\mathrm R}}$ as defined in (15) and where the upper bound is achieved with equality only if $\hat{\mathbf{Y}}$ is uniformly distributed on the sphere, i.e., $\hat{\mathbf{Y}}$ is isotropically distributed. Using these two observations we get the following upper bound on the fading number.

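*Corollary 8:* The fading number of a memoryless MIMO fading channel as given in Theorem 7 can be upper bounded as follows:

$$\chi(\mathbb{H}) \le n_{\mathrm R} \log \pi - \log \Gamma(n_{\mathrm R}) + \inf_{\mathsf{A}, \mathsf{B}} \sup_{\hat{\mathbf{x}}} \Bigl\{ n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathsf{A} \mathbb{H} \mathsf{B} \hat{\mathbf{x}}\|^2\bigr] - h(\mathsf{A} \mathbb{H} \mathsf{B} \hat{\mathbf{x}}) \Bigr\} \tag{41}$$

where the infimum is over all nonsingular $n_{\mathrm R} \times n_{\mathrm R}$ complex matrices $\mathsf{A}$ and nonsingular $n_{\mathrm T} \times n_{\mathrm T}$ complex matrices $\mathsf{B}$.

*Proof:* Using the two observations (39) and (40), we immediately get from Theorem 7

$$\chi(\mathbb{H}) \le \inf_{\mathsf{A}, \mathsf{B}} \sup_{Q_{\hat{\mathbf{X}}}} \mathsf{E}_{\hat{\mathbf{X}}}\Bigl[ n_{\mathrm R} \log \pi - \log \Gamma(n_{\mathrm R}) + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathsf{A} \mathbb{H} \mathsf{B} \hat{\mathbf{X}}\|^2 \bigm| \hat{\mathbf{X}} = \hat{\mathbf{x}}\bigr] - h(\mathsf{A} \mathbb{H} \mathsf{B} \hat{\mathbf{X}} \mid \hat{\mathbf{X}} = \hat{\mathbf{x}}) \Bigr]. \tag{42}$$

The result now follows by noting that (41) can always be achieved by choosing $Q_{\hat{\mathbf{X}}}$ in (42) to be the distribution which with probability 1 takes on the value $\hat{\mathbf{x}}$ that achieves the maximum in (41).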

This upper bound is possibly tighter than the upper bound given in [1, Lemma 4.14], [2, Lemma 6.16] because of the additional infimum over $\mathsf{B}$.

V. SOME KNOWN SPECIAL CASES

In this section we will briefly show how some already known results of various fading numbers can be derived as special cases from this new more general result.

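We start with the situation of a fading matrix that is rotation-commutative in the generalized sense, i.e., the fading matrix $\mathbb{H}$ is such that for every constant unitary $n_{\mathrm T} \times n_{\mathrm T}$ matrix $\mathsf{V}_{\mathrm t}$ there exists an $n_{\mathrm R} \times n_{\mathrm R}$ constant unitary matrix $\mathsf{V}_{\mathrm r}$ such that

$$\mathsf{V}_{\mathrm r} \mathbb{H} \overset{\mathscr{L}}{=} \mathbb{H} \mathsf{V}_{\mathrm t} \tag{43}$$

where $\overset{\mathscr{L}}{=}$ stands for “has the same law”; and for every constant unitary $n_{\mathrm R} \times n_{\mathrm R}$ matrix $\mathsf{V}_{\mathrm r}$ there exists a constant unitary $n_{\mathrm T} \times n_{\mathrm T}$ matrix $\mathsf{V}_{\mathrm t}$ such that (43) holds [1, Definition 4.37], [2, Definition 6.37].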

The property of rotation-commutativity for random matrices is a generalization of the isotropic distribution of random vectors, i.e., we have the following lemma.

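*Lemma 9:* Let $\mathbb{H}$ be rotation-commutative in the generalized sense. Then the following two statements hold.

• If $\hat{\mathbf{X}} \in \mathbb{C}^{n_{\mathrm T}}$ is an isotropically distributed random vector that is independent of $\mathbb{H}$, then $\mathbb{H} \hat{\mathbf{X}} \in \mathbb{C}^{n_{\mathrm R}}$ is isotropically distributed.

• If $\hat{\mathbf{e}}, \hat{\mathbf{e}}' \in \mathbb{C}^{n_{\mathrm T}}$ are two constant unit vectors, then

$$\|\mathbb{H} \hat{\mathbf{e}}\| \overset{\mathscr{L}}{=} \|\mathbb{H} \hat{\mathbf{e}}'\|, \qquad \|\hat{\mathbf{e}}\| = \|\hat{\mathbf{e}}'\| = 1 \tag{44}$$

$$h(\mathbb{H} \hat{\mathbf{e}}) = h(\mathbb{H} \hat{\mathbf{e}}'), \qquad \|\hat{\mathbf{e}}\| = \|\hat{\mathbf{e}}'\| = 1. \tag{45}$$

*Proof:* For a proof see, e.g., [1, Lemma 4.38], [2, Lemma 6.38].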

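From Lemma 9 it immediately follows that in the situation of rotation-commutative fading the only term in the expression of the fading number (37) that depends on $Q_{\hat{\mathbf{X}}}$ is $h_\lambda\bigl(\mathbb{H} \hat{\mathbf{X}} / \|\mathbb{H} \hat{\mathbf{X}}\|\bigr)$. This entropy is maximized if $\mathbb{H} \hat{\mathbf{X}} / \|\mathbb{H} \hat{\mathbf{X}}\|$ is uniformly distributed on the surface of the $n_{\mathrm R}$-dimensional complex unit sphere, which can be achieved according to Lemma 9 by the choice of an isotropic distribution for $Q_{\hat{\mathbf{X}}}$. Then according to (14) and (15)

$$h_\lambda\biggl(\frac{\mathbb{H} \hat{\mathbf{X}}}{\|\mathbb{H} \hat{\mathbf{X}}\|}\biggr) = \log \frac{2 \pi^{n_{\mathrm R}}}{\Gamma(n_{\mathrm R})}. \tag{46}$$

For reference, the general expression (37) of Theorem 7 reads

$$\chi(\mathbb{H}) = \sup_{Q_{\hat{\mathbf{X}}}} \biggl\{ h_\lambda\biggl(\frac{\mathbb{H} \hat{\mathbf{X}}}{\|\mathbb{H} \hat{\mathbf{X}}\|}\biggr) + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbb{H} \hat{\mathbf{X}}\|^2\bigr] - \log 2 - h(\mathbb{H} \hat{\mathbf{X}} \mid \hat{\mathbf{X}}) \biggr\}. \tag{37}$$

The expression of the fading number (37) then reduces to (5)

$$\chi(\mathbb{H}) = \log \frac{2 \pi^{n_{\mathrm R}}}{\Gamma(n_{\mathrm R})} - \log 2 + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbb{H} \hat{\mathbf{e}}\|^2\bigr] - h(\mathbb{H} \hat{\mathbf{e}}) \tag{47}$$

where $\hat{\mathbf{e}}$ is an arbitrary constant unit vector in $\mathbb{C}^{n_{\mathrm T}}$.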

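In case of a SIMO fading channel, the direction vector $\hat{\mathbf{X}}$ reduces to a phase term $e^{\mathrm{i}\Phi}$. From Lemma 5 we know that an optimal choice of $e^{\mathrm{i}\Phi}$ is circularly symmetric, such that (37) becomes

$$\chi(\mathbf{H}) = h_\lambda\bigl(\hat{\mathbf{H}} e^{\mathrm{i}\Theta}\bigr) + n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbf{H}\|^2\bigr] - \log 2 - h(\mathbf{H}). \tag{48}$$

Before we continue with the MISO case, we would like to remark that the only term in (37) that depends on the distribution of the phase of each component of $\hat{\mathbf{X}}$ is $h_\lambda\bigl(\mathbb{H} \hat{\mathbf{X}} / \|\mathbb{H} \hat{\mathbf{X}}\|\bigr)$.

Since we know from Lemma 5 that $\hat{\mathbf{X}}$ is circularly symmetric, we can therefore equivalently write

$$h_\lambda\biggl(\frac{\mathbb{H} \hat{\mathbf{X}}}{\|\mathbb{H} \hat{\mathbf{X}}\|}\biggr) = h_\lambda\biggl(\frac{\mathbb{H} \hat{\mathbf{X}}}{\|\mathbb{H} \hat{\mathbf{X}}\|} e^{\mathrm{i}\Theta}\biggr). \tag{49}$$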

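Turning to the MISO case, now note that the distribution of

$$\frac{\mathbf{H}^{\mathsf T} \hat{\mathbf{X}}}{|\mathbf{H}^{\mathsf T} \hat{\mathbf{X}}|} e^{\mathrm{i}\Theta}$$

is identical to the distribution of $e^{\mathrm{i}\Theta}$, independently of the distribution of $\mathbf{H}$ and $\hat{\mathbf{X}}$. Hence,

$$h_\lambda\biggl(\frac{\mathbf{H}^{\mathsf T} \hat{\mathbf{X}}}{|\mathbf{H}^{\mathsf T} \hat{\mathbf{X}}|} e^{\mathrm{i}\Theta}\biggr) = h_\lambda\bigl(e^{\mathrm{i}\Theta}\bigr) = \log 2\pi. \tag{50}$$

The fading number (37) then becomes

$$\chi(\mathbf{H}^{\mathsf T}) = \sup_{Q_{\hat{\mathbf{X}}}} \Bigl\{ \log 2\pi + \mathsf{E}\bigl[\log |\mathbf{H}^{\mathsf T} \hat{\mathbf{X}}|^2\bigr] - \log 2 - h(\mathbf{H}^{\mathsf T} \hat{\mathbf{X}} \mid \hat{\mathbf{X}}) \Bigr\} \tag{51}$$

$$= \sup_{Q_{\hat{\mathbf{X}}}} \mathsf{E}_{\hat{\mathbf{X}}}\Bigl[ \log \pi + \mathsf{E}\bigl[\log |\mathbf{H}^{\mathsf T} \hat{\mathbf{x}}|^2 \bigm| \hat{\mathbf{X}} = \hat{\mathbf{x}}\bigr] - h(\mathbf{H}^{\mathsf T} \hat{\mathbf{x}} \mid \hat{\mathbf{X}} = \hat{\mathbf{x}}) \Bigr] \tag{52}$$

$$\le \sup_{\hat{\mathbf{x}}} \Bigl\{ \log \pi + \mathsf{E}\bigl[\log |\mathbf{H}^{\mathsf T} \hat{\mathbf{x}}|^2\bigr] - h(\mathbf{H}^{\mathsf T} \hat{\mathbf{x}}) \Bigr\} \tag{53}$$

which can be achieved for a distribution $Q_{\hat{\mathbf{X}}}$ that with probability 1 takes on the value $\hat{\mathbf{x}}$ that achieves the fading number in (53).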

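Finally, the SISO case is a combination of the arguments of the SIMO and MISO case, i.e., using

$$h_\lambda\bigl(e^{\mathrm{i}\Theta}\bigr) = \log 2\pi \tag{54}$$

we get

$$\chi(H) = \log 2\pi + \mathsf{E}\bigl[\log |H|^2\bigr] - \log 2 - h(H) \tag{55}$$

$$= \log \pi + \mathsf{E}\bigl[\log |H|^2\bigr] - h(H). \tag{56}$$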
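As an illustration of (56), consider the special case $H \sim \mathcal{N}_{\mathbb{C}}(0, 1)$ (Rayleigh fading), which is an assumption made here for the example. Then $|H|^2$ is exponentially distributed with unit mean, so $\mathsf{E}[\log |H|^2] = -\gamma$ with $\gamma$ denoting Euler's constant, and $h(H) = \log \pi e$, giving $\chi(H) = -1 - \gamma$. The sketch below estimates the expectation by Monte Carlo simulation; the function name and sample size are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def rayleigh_fading_number_mc(samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of (56) for H ~ N_C(0, 1)."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    acc = 0.0
    for _ in range(samples):
        h = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        acc += math.log(abs(h) ** 2)
    mean_log = acc / samples                    # estimates E[log |H|^2]
    diff_entropy = math.log(math.pi * math.e)   # h(H) for H ~ N_C(0, 1)
    return math.log(math.pi) + mean_log - diff_entropy


print(rayleigh_fading_number_mc(), -1.0 - EULER_GAMMA)
```

The estimate agrees with the closed-form value up to Monte Carlo error, and with (60) evaluated at $m = 1$ and $d = 0$.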

VI. GAUSSIAN FADING

The evaluation of the fading number is rather difficult even for the usually simpler situation of Gaussian fading processes. However, we are able to give the exact value for some important special cases, and we will give bounds on some others.

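Throughout this section we assume that the fading matrix $\mathbb{H}$ can be written as

$$\mathbb{H} = \mathsf{D} + \tilde{\mathbb{H}} \tag{57}$$

where all components of $\tilde{\mathbb{H}}$ are independent of each other and zero-mean, unit-variance Gaussian distributed, and where $\mathsf{D}$ denotes a constant line-of-sight matrix.

Note that for any constant unitary $n_{\mathrm R} \times n_{\mathrm R}$ matrix $\mathsf{U}$ and any constant unitary $n_{\mathrm T} \times n_{\mathrm T}$ matrix $\mathsf{V}$ the law of $\mathsf{U} \tilde{\mathbb{H}} \mathsf{V}$ is identical to the law of $\tilde{\mathbb{H}}$. Therefore, without loss of generality, we may restrict ourselves to matrices $\mathsf{D}$ that are “diagonal,” i.e., for $n_{\mathrm R} \le n_{\mathrm T}$

$$\mathsf{D} = \bigl( \tilde{\mathsf{D}} \;\; \mathsf{0}_{n_{\mathrm R} \times (n_{\mathrm T} - n_{\mathrm R})} \bigr) \tag{58}$$

or, for $n_{\mathrm R} > n_{\mathrm T}$

$$\mathsf{D} = \begin{pmatrix} \tilde{\mathsf{D}} \\ \mathsf{0}_{(n_{\mathrm R} - n_{\mathrm T}) \times n_{\mathrm T}} \end{pmatrix} \tag{59}$$

where $\tilde{\mathsf{D}}$ is a $\min\{n_{\mathrm R}, n_{\mathrm T}\} \times \min\{n_{\mathrm R}, n_{\mathrm T}\}$ diagonal matrix with the singular values of $\mathsf{D}$ on the diagonal.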

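*A. Scalar Line-of-Sight Matrix*

We start with a scalar line-of-sight matrix, i.e., we assume $\tilde{\mathsf{D}} = d\, \mathsf{I}$ where $\mathsf{I}$ denotes the identity matrix.

Under these assumptions the fading number has been known already for $n_{\mathrm R} = n_{\mathrm T} = m$, in which case the fading matrix is rotation commutative [1], [2]:

$$\chi(\mathbb{H}) = m\, g_m(|d|^2) - m - \log \Gamma(m). \tag{60}$$

Here $g_m(\cdot)$ is a continuous, monotonically increasing, concave function defined as

$$g_m(\xi) \triangleq \begin{cases} \log \xi - \operatorname{Ei}(-\xi) + \displaystyle\sum_{j=1}^{m-1} (-1)^j \biggl[ e^{-\xi} (j-1)! - \frac{(m-1)!}{j\,(m-1-j)!} \biggr] \Bigl(\frac{1}{\xi}\Bigr)^{j}, & \xi > 0 \\[2ex] \psi(m), & \xi = 0 \end{cases} \tag{61}$$

for $m \in \mathbb{N}$, where $\operatorname{Ei}(\cdot)$ denotes the exponential integral function defined as

$$\operatorname{Ei}(-x) \triangleq -\int_x^\infty \frac{e^{-t}}{t}\, \mathrm{d}t, \qquad x > 0 \tag{62}$$

and $\psi(\cdot)$ is Euler’s psi function given by

$$\psi(m) \triangleq -\gamma + \sum_{j=1}^{m-1} \frac{1}{j} \tag{63}$$

with $\gamma \approx 0.577$ denoting Euler’s constant. This function $g_m(\cdot)$ is the expected value of the logarithm of a noncentral chi-square random variable, i.e., for some Gaussian random variables $\{U_j\}_{j=1}^m \sim \text{IID } \mathcal{N}_{\mathbb{C}}(0, 1)$ and for some complex constants $\{\mu_j\}_{j=1}^m$ we have

$$\mathsf{E}\Biggl[\log \sum_{j=1}^{m} |U_j + \mu_j|^2\Biggr] = g_m(s^2) \tag{64}$$

where

$$s^2 \triangleq \sum_{j=1}^{m} |\mu_j|^2 \tag{65}$$

(see [9], [1, Lemma 10.1], [2, Lemma A.6] for more details and a proof). We would like to emphasize that $g_m(\xi)$ is continuous for all $\xi \ge 0$, i.e., in particular

$$\lim_{\xi \downarrow 0} \Biggl\{ \log \xi - \operatorname{Ei}(-\xi) + \sum_{j=1}^{m-1} (-1)^j \biggl[ e^{-\xi} (j-1)! - \frac{(m-1)!}{j\,(m-1-j)!} \biggr] \Bigl(\frac{1}{\xi}\Bigr)^{j} \Biggr\} = \psi(m) \tag{66}$$

for all $m \in \mathbb{N}$. Moreover, for all $m \in \mathbb{N}$ and $\xi \ge 0$

$$\log \xi - \operatorname{Ei}(-\xi) \le g_m(\xi) \le \log(m + \xi). \tag{67}$$

A derivation of (67) can be found in Appendix B.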
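The function $g_m(\cdot)$ and the relations (60), (64), (66), and (67) can be explored numerically. The following sketch is not a production implementation: it uses only the Python standard library, evaluates $\operatorname{Ei}(-x)$ through the power series $\gamma + \log x + \sum_{k \ge 1} (-x)^k / (k \cdot k!)$, which is adequate only for moderate arguments (roughly $x \le 10$), and all function names are chosen here for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant, the gamma in (63)


def ei_neg(x: float, terms: int = 200) -> float:
    """Ei(-x) for x > 0 via its power series; intended for moderate x only."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= -x / k       # term now equals (-x)^k / k!
        total += term / k
    return EULER_GAMMA + math.log(x) + total


def psi(m: int) -> float:
    """Euler's psi function at a positive integer, see (63)."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def g(m: int, xi: float) -> float:
    """The function g_m(xi) as defined in (61)."""
    if xi == 0.0:
        return psi(m)
    total = math.log(xi) - ei_neg(xi)
    for j in range(1, m):
        bracket = (math.exp(-xi) * math.factorial(j - 1)
                   - math.factorial(m - 1) / (j * math.factorial(m - 1 - j)))
        total += (-1) ** j * bracket * (1.0 / xi) ** j
    return total


def fading_number_scalar_los(m: int, d: complex) -> float:
    """Fading number (60) for n_R = n_T = m and line-of-sight matrix d * I."""
    return m * g(m, abs(d) ** 2) - m - math.lgamma(m)


def monte_carlo_g(mu, samples: int = 100_000, seed: int = 0) -> float:
    """Estimate the left-hand side of (64) by simulation."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)
    acc = 0.0
    for _ in range(samples):
        s = 0.0
        for mu_j in mu:
            u = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            s += abs(u + mu_j) ** 2
        acc += math.log(s)
    return acc / samples


print(g(2, 1.0), monte_carlo_g([1.0, 0.0]))   # both sides of (64) for m = 2, s^2 = 1
print(fading_number_scalar_los(1, 0.0))       # equals -1 - gamma
```

For small $\xi$ and large $m$ the individual terms of the sum in (61) grow like $\xi^{-j}$ and nearly cancel, so a more careful evaluation would be needed in that regime.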

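We now consider the case where $n_{\mathrm R} \le n_{\mathrm T}$:

*Corollary 10:* Assume $n_{\mathrm R} \le n_{\mathrm T}$ and a Gaussian fading matrix as given in (57). Let the line-of-sight matrix $\mathsf{D}$ be given as

$$\mathsf{D} = d\, \bigl( \mathsf{I}_{n_{\mathrm R}} \;\; \mathsf{0}_{n_{\mathrm R} \times (n_{\mathrm T} - n_{\mathrm R})} \bigr). \tag{68}$$

Then

$$\chi(\mathbb{H}) = n_{\mathrm R}\, g_{n_{\mathrm R}}(|d|^2) - n_{\mathrm R} - \log \Gamma(n_{\mathrm R}) \tag{69}$$

where $g_m(\cdot)$ is defined in (61).

*Proof:* We write for the unit vector $\hat{\mathbf{X}}$

$$\hat{\mathbf{X}} = \begin{pmatrix} \boldsymbol{\Xi} \\ \boldsymbol{\Xi}' \end{pmatrix} \tag{70}$$

where $\boldsymbol{\Xi} \in \mathbb{C}^{n_{\mathrm R}}$ and $\boldsymbol{\Xi}' \in \mathbb{C}^{n_{\mathrm T} - n_{\mathrm R}}$. Then

$$\mathbb{H} \hat{\mathbf{X}} = \mathsf{D} \hat{\mathbf{X}} + \tilde{\mathbb{H}} \hat{\mathbf{X}} = d\, \boldsymbol{\Xi} + \tilde{\mathbf{H}} \tag{71}$$

where $\tilde{\mathbf{H}} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathsf{I}_{n_{\mathrm R}})$. Hence

$$h(\mathbb{H} \hat{\mathbf{X}} \mid \hat{\mathbf{X}}) = h(\tilde{\mathbf{H}}) = n_{\mathrm R} \log \pi e \tag{72}$$

$$n_{\mathrm R}\, \mathsf{E}\bigl[\log \|\mathbb{H} \hat{\mathbf{X}}\|^2\bigr] = n_{\mathrm R}\, \mathsf{E}\bigl[g_{n_{\mathrm R}}(|d|^2 \|\boldsymbol{\Xi}\|^2)\bigr] \le n_{\mathrm R}\, g_{n_{\mathrm R}}(|d|^2) \tag{73}$$

$$h_\lambda\biggl(\frac{\mathbb{H} \hat{\mathbf{X}}}{\|\mathbb{H} \hat{\mathbf{X}}\|}\biggr) \le \log \frac{2 \pi^{n_{\mathrm R}}}{\Gamma(n_{\mathrm R})}. \tag{74}$$

Here, the equality in (73) follows from the fact that $\|d\, \boldsymbol{\Xi} + \tilde{\mathbf{H}}\|^2$ is noncentral chi-square distributed and from (64); the inequality in (73) follows from the monotonicity of $g_m(\cdot)$ and is tight if $\|\boldsymbol{\Xi}\| = 1$, i.e., $\boldsymbol{\Xi}' = \mathbf{0}$; and the inequality in (74) follows from (14) and (15) and is tight if $\boldsymbol{\Xi}$ is uniformly distributed on the unit sphere in $\mathbb{C}^{n_{\mathrm R}}$ so that $\mathbb{H} \hat{\mathbf{X}}$ is isotropically distributed. The result now follows from Theorem 7.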

The case $n_R > n_T$ is more difficult since then (74) is in general not tight. We will only state an upper bound.

*Proposition 11:* Assume $n_R > n_T$ and a Gaussian fading matrix as given in (57). Let the line-of-sight matrix be given as

$$\mathsf{D} = d \begin{pmatrix} \mathsf{I}_{n_T} \\ \mathsf{0}_{(n_R - n_T) \times n_T} \end{pmatrix}. \qquad (75)$$

Then

$$\chi(\mathbb{H}) \le n_T \log\left(1 + \frac{|d|^2}{n_T}\right) + n_R \log n_R - n_R - \log \Gamma(n_R). \qquad (76)$$

*Proof:* This result is a special case of Proposition 13 and has been published before in [1, Eq. (128)], [2, Eq. (6.224)].

*B. General Line-of-Sight Matrix*

Next we assume Gaussian fading as defined in (57) with a general line-of-sight matrix $\mathsf{D}$ having singular values $d_1, \ldots, d_{\min\{n_R, n_T\}}$. Hence, $\tilde{\mathsf{D}}$, defined in (58) and (59), is given as

$$\tilde{\mathsf{D}} = \operatorname{diag}\left(d_1, \ldots, d_{\min\{n_R, n_T\}}\right) \qquad (77)$$

where $|d_1| \ge |d_2| \ge \cdots \ge |d_{\min\{n_R, n_T\}}| > 0$.

We again start with the case $n_R \le n_T$.

*Corollary 12:* Assume $n_R \le n_T$ and a Gaussian fading matrix as given in (57). Let the line-of-sight matrix $\mathsf{D}$ have singular values $d_1, \ldots, d_{n_R}$, where $|d_1| \ge |d_2| \ge \cdots \ge |d_{n_R}| > 0$. Then

$$\chi(\mathbb{H}) \le n_R\, g_{n_R}(\|\mathsf{D}\|^2) - n_R - \log \Gamma(n_R) \qquad (78)$$

where $g_m(\cdot)$ is given in (61) and where $\|\mathsf{D}\|^2 = |d_1|^2$.

*Proof:* A proof is given in Appendix C.

The situation $n_R > n_T$ is again more complicated. We include this case in a new upper bound based on (41) which holds independently of the particular relation between $n_R$ and $n_T$.

*Proposition 13:* Assume a Gaussian fading matrix as given in (57) and let the line-of-sight matrix $\mathsf{D}$ be general with singular values $d_1, \ldots, d_{\min\{n_R, n_T\}}$, where $|d_1| \ge |d_2| \ge \cdots \ge |d_{\min\{n_R, n_T\}}| > 0$. Then the fading number is upper bounded as follows:

$$\chi(\mathbb{H}) \le \min\{n_R, n_T\} \log \frac{\delta^2}{\min\{n_R, n_T\}} + n_R \log n_R - n_R - \log \Gamma(n_R) \qquad (79)$$

where

$$\delta^2 \triangleq \left( |d_1|^2 \cdots |d_{\min\{n_R, n_T\}}|^2 \right)^{1/\min\{n_R, n_T\}} \cdot \left( 1 + \frac{1}{|d_1|^2} + \cdots + \frac{1}{|d_{\min\{n_R, n_T\}}|^2} \right). \qquad (80)$$

*Proof:* A proof is given in Appendix D.
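To make the bound (79) concrete, the following minimal sketch evaluates $\delta^2$ from (80) and the resulting upper bound in nats. The function names are illustrative only; the final lines check that for equal singular values and $n_R > n_T$ the bound reduces to the expression (76) of Proposition 11.

```python
import math


def delta_squared(singular_values):
    """delta^2 from (80): geometric mean of |d_i|^2 times (1 + sum 1/|d_i|^2)."""
    mags = [abs(d) ** 2 for d in singular_values]
    geo_mean = math.exp(sum(math.log(v) for v in mags) / len(mags))
    return geo_mean * (1.0 + sum(1.0 / v for v in mags))


def fading_number_upper_bound(singular_values, n_r, n_t):
    """Right-hand side of (79), in nats."""
    k = min(n_r, n_t)
    if len(singular_values) != k:
        raise ValueError("expected min(n_r, n_t) nonzero singular values")
    d2 = delta_squared(singular_values)
    # math.lgamma(n_r) is log Gamma(n_r)
    return k * math.log(d2 / k) + n_r * math.log(n_r) - n_r - math.lgamma(n_r)


if __name__ == "__main__":
    n_r, n_t, d = 4, 2, 3.0
    general = fading_number_upper_bound([d] * n_t, n_r, n_t)
    # Special case (76): all singular values equal to d.
    special = (n_t * math.log(1.0 + abs(d) ** 2 / n_t)
               + n_r * math.log(n_r) - n_r - math.lgamma(n_r))
    print(general, special)
```

With all $|d_i| = |d|$, (80) gives $\delta^2 = |d|^2 + \min\{n_R, n_T\}$, which is why the two printed values coincide.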

VII. PROOF OF THE MAIN RESULT

The proof of Theorem 7 consists of two parts. First, we derive an upper bound to the fading number assuming an average-power constraint (27) on the input. The key ingredients here are the preliminary results from Section IV-A.

In a second part we then show that this upper bound can actually be achieved by an input that satisfies the peak-power constraint (26). Since a peak-power constraint is more restrictive than the corresponding average-power constraint, the theorem follows.

Because the proof is rather technical, we will give a short overview to clarify the main ideas.

The upper bound relies strongly on Lemma 3 which says that the input can be assumed to take on large values only, i.e., at high SNR the additive noise will become negligible so that we can bound

$$I(X; Y) \le I(X; \mathbb{H}X). \qquad (81)$$

This term is then split into a term that only considers the magnitude of $\mathbb{H}X$ and a term that takes into account the direction:

$$I(X; \mathbb{H}X) = I(X; \|\mathbb{H}X\|) + I\!\left(X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|\right). \qquad (82)$$

For the first term—which is related to MISO fading—we then use the bounding technique of Lemma 4.

Because Lemma 3 only holds in the limit when $\mathcal{E}$ tends to infinity, we introduce an event $\|X\|^2 > \mathcal{E}_0$ for some fixed $\mathcal{E}_0 \ge 0$ and condition everything on this event.

To derive a lower bound on capacity we choose a specific input distribution of the form

$$X = R \cdot \hat X \qquad (83)$$

where the distribution of $R$ is such that it achieves the fading number of a SIMO fading channel and where the distribution of $\hat X$ is independent of $R$ and will only be specified at the very end of the derivation (it will be chosen to maximize the fading number). We then split the mutual information into two terms:

$$I(X; Y) = I(R; Y \mid \hat X) + I(\hat X; Y). \qquad (84)$$

The first term (almost) corresponds to a SIMO fading channel with side-information for which the fading number is known. The second term is treated separately.

*A. Derivation of an Upper Bound*

In the following we will use the notation $R \triangleq \|X\|$ to denote the magnitude of the input vector $X$, i.e., we have $X = R \cdot \hat X$. Note that in this section we are not allowed to assume that $R$ is independent of $\hat X$.

From Lemma 3, we know that the capacity-achieving input distribution must escape to infinity. Hence, we fix an arbitrary finite $\mathcal{E}_0 \ge 0$ and define an indicator random variable as follows:

$$E \triangleq \begin{cases} 1, & \text{if } \|X\|^2 \ge \mathcal{E}_0 \\ 0, & \text{otherwise.} \end{cases} \qquad (85)$$

Let

$$p \triangleq \Pr[E = 1] = \Pr\left[\|X\|^2 \ge \mathcal{E}_0\right] \qquad (86)$$

where we know from Lemma 3 that

$$\lim_{\mathcal{E} \uparrow \infty} p = 1. \qquad (87)$$

We now bound as follows:

$$\begin{aligned}
I(X; Y) &\le I(X, E; Y) && (88) \\
&= I(E; Y) + I(X; Y \mid E) && (89) \\
&= H(E) - H(E \mid Y) + I(X; Y \mid E) && (90) \\
&\le H(E) + I(X; Y \mid E) && (91) \\
&= H_b(p) + p\, I(X; Y \mid E = 1) + (1 - p)\, I(X; Y \mid E = 0) && (92) \\
&\le H_b(p) + I(X; Y \mid E = 1) + (1 - p)\, C(\mathcal{E}_0) && (93)
\end{aligned}$$

where

$$H_b(\xi) \triangleq -\xi \log \xi - (1 - \xi) \log(1 - \xi) \qquad (94)$$

is the binary entropy function. Here, (88) follows from adding an additional random variable to mutual information; the subsequent two equalities follow from the chain rule and from the definition of mutual information (notice that we use entropy and not differential entropy because $E$ is a binary random variable); in the subsequent inequality we rely on the nonnegativity of entropy; (92) writes out the conditional mutual information over the two values of $E$; and the last inequality follows from bounding $p \le 1$ and from upper bounding the mutual information term by the capacity $C$ for the available power which, conditional on $E = 0$, is $\mathcal{E}_0$.
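The residual term $H_b(p) + (1 - p)\,C(\mathcal{E}_0)$ in (93) is what the conditioning on $E$ costs. A small sketch shows how quickly it vanishes as $p$ approaches 1; the value used for $C(\mathcal{E}_0)$ is a made-up placeholder, since the text only uses that it is finite.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in nats, as in (94)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def residual(p, capacity_e0):
    """H_b(p) + (1 - p) * C(E_0), the extra term in (93)."""
    return binary_entropy(p) + (1.0 - p) * capacity_e0


if __name__ == "__main__":
    capacity_e0 = 5.0  # hypothetical finite value of C(E_0), in nats
    for p in (0.9, 0.99, 0.999, 0.9999):
        print(p, residual(p, capacity_e0))
```

Because $C(\mathcal{E}_0)$ does not depend on the available power, the printed values shrink to zero no matter which finite placeholder is chosen.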

We remark that even though $C(\mathcal{E}_0)$ is unknown, we know that it is finite and independent of $\mathcal{E}$ so that from (87) we have

$$\lim_{\mathcal{E} \uparrow \infty} \left\{ H_b(p) + (1 - p)\, C(\mathcal{E}_0) \right\} = 0. \qquad (95)$$

We continue with the second term of (93) as follows:

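$$\begin{aligned}
I(X; Y \mid E = 1) &= I(X; \mathbb{H}X + Z \mid E = 1) && (96) \\
&\le I(X; \mathbb{H}X + Z, Z \mid E = 1) && (97) \\
&= I(X; \mathbb{H}X, Z \mid E = 1) && (98) \\
&= I(X; \mathbb{H}X \mid E = 1) + I(X; Z \mid \mathbb{H}X, E = 1) && (99) \\
&= I(X; \mathbb{H}X \mid E = 1) && (100) \\
&= I\!\left(X; \|\mathbb{H}X\|, \frac{\mathbb{H}X}{\|\mathbb{H}X\|} \,\middle|\, E = 1\right) && (101) \\
&= I\!\left(X; \|\mathbb{H}X\|, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, E = 1\right) && (102) \\
&= I(X; \|\mathbb{H}X\| \mid E = 1) + I\!\left(X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|, E = 1\right) && (103) \\
&\le I(X; \|\mathbb{H}X\|, e^{i\Theta} \mid E = 1) + I\!\left(X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|, E = 1\right) && (104) \\
&= I(X; \|\mathbb{H}X\| e^{i\Theta} \mid E = 1) + I\!\left(X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|, E = 1\right). && (105)
\end{aligned}$$

Here, (97) follows from adding an additional random vector $Z$ to the argument of the mutual information; the subsequent equality from subtracting the known vector $Z$ from $Y$; the subsequent two equalities follow from the chain rule and the independence between the noise and all other random quantities; then we split $\mathbb{H}X$ into magnitude and direction vector and use the chain rule again; (104) follows from adding a random variable to mutual information: we introduce $e^{i\Theta}$ that is independent of all the other random quantities and that is uniformly distributed on the complex unit circle; and the last equality holds because from $\|\mathbb{H}X\| e^{i\Theta}$ we can easily get back $\|\mathbb{H}X\|$ and $e^{i\Theta}$.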

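We next apply Lemma 4 to the first term in (105), i.e., we choose $S = X$ and $T = \|\mathbb{H}X\| e^{i\Theta}$. Note that we need to condition everything on the event $E = 1$. We get

$$\begin{aligned}
I(X; \|\mathbb{H}X\| e^{i\Theta} \mid E = 1) \le{}& -h(\|\mathbb{H}X\| e^{i\Theta} \mid X, E = 1) + \log \pi + \alpha \log \beta + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) \\
& + (1 - \alpha)\, \mathsf{E}\left[\log\left(\|\mathbb{H}X\|^2 + \nu\right) \,\middle|\, E = 1\right] + \frac{1}{\beta}\, \mathsf{E}\left[\|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] + \frac{\nu}{\beta}
\end{aligned} \qquad (106)$$

where $\alpha, \beta > 0$, and $\nu \ge 0$ can be chosen freely, but must not depend on $X$.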

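Notice that from a conditional version of Lemma 1 with $m = 1$ it follows that

$$\begin{aligned}
h(\|\mathbb{H}X\| e^{i\Theta} \mid X = x, E = 1) ={}& h_\lambda(e^{i\Theta} \mid X = x, E = 1) + h(\|\mathbb{H}X\| \mid e^{i\Theta}, X = x, E = 1) \\
& + \mathsf{E}[\log \|\mathbb{H}X\| \mid X = x, E = 1] && (107) \\
={}& \log 2\pi + h(\|\mathbb{H}X\| \mid X = x, E = 1) + \mathsf{E}[\log \|\mathbb{H}X\| \mid X = x, E = 1] && (108)
\end{aligned}$$

where we have used that $e^{i\Theta}$ is independent of all other random quantities and uniformly distributed on the unit circle. Taking the expectation over $X$ conditional on $E = 1$ we then obtain

$$\begin{aligned}
h(\|\mathbb{H}X\| e^{i\Theta} \mid X, E = 1) ={}& \log 2\pi + h(\|\mathbb{H}X\| \mid X, E = 1) + \mathsf{E}[\log \|\mathbb{H}X\| \mid E = 1] && (109) \\
={}& \log 2\pi + h(\|\mathbb{H}\hat X\| \cdot R \mid \hat X, R, E = 1) + \mathsf{E}\left[\log\left(\|\mathbb{H}\hat X\| \cdot R\right) \,\middle|\, E = 1\right] && (110) \\
={}& \log 2\pi + h(\|\mathbb{H}\hat X\| \mid \hat X, R, E = 1) + \mathsf{E}[\log R \mid E = 1] \\
& + \mathsf{E}[\log \|\mathbb{H}\hat X\| \mid E = 1] + \mathsf{E}[\log R \mid E = 1] && (111) \\
={}& \log 2\pi + h(\|\mathbb{H}\hat X\| \mid \hat X, E = 1) + 2\, \mathsf{E}[\log R \mid E = 1] + \mathsf{E}[\log \|\mathbb{H}\hat X\| \mid E = 1] && (112)
\end{aligned}$$

where the second equality follows from the definition of $R \triangleq \|X\|$; where the third equality follows from the scaling property of entropy with a *real* argument; and where the last equality follows because given $\hat X$, $\|\mathbb{H}\hat X\|$ is independent of $R$.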

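Next we assume $0 < \alpha < 1$ such that $1 - \alpha > 0$. Then we define

$$\epsilon_\nu \triangleq \sup_{\|x\|^2 \ge \mathcal{E}_0} \left\{ \mathsf{E}\left[\log\left(\|\mathbb{H}x\|^2 + \nu\right)\right] - \mathsf{E}\left[\log \|\mathbb{H}x\|^2\right] \right\} \qquad (113)$$

such that

$$\begin{aligned}
(1 - \alpha)\, \mathsf{E}\left[\log\left(\|\mathbb{H}X\|^2 + \nu\right) \,\middle|\, E = 1\right] ={}& (1 - \alpha)\, \mathsf{E}\left[\log\left(\|\mathbb{H}X\|^2 + \nu\right) - \log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] \\
& + (1 - \alpha)\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] && (114) \\
\le{}& (1 - \alpha) \sup_{\|x\|^2 \ge \mathcal{E}_0} \left\{ \mathsf{E}\left[\log\left(\|\mathbb{H}x\|^2 + \nu\right)\right] - \mathsf{E}\left[\log \|\mathbb{H}x\|^2\right] \right\} \\
& + (1 - \alpha)\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] && (115) \\
={}& (1 - \alpha)\, \epsilon_\nu + (1 - \alpha)\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] && (116) \\
\le{}& \epsilon_\nu + (1 - \alpha)\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right]. && (117)
\end{aligned}$$

Note that in the first inequality we have made use of the fact that $E = 1$, i.e., that $\|X\|^2 \ge \mathcal{E}_0$. Finally, we bound

$$\begin{aligned}
\frac{1}{\beta}\, \mathsf{E}\left[\|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] &= \frac{1}{\beta}\, \mathsf{E}\left[\|\mathbb{H}\hat X\|^2 \cdot R^2 \,\middle|\, E = 1\right] && (118) \\
&\le \frac{1}{\beta}\, \mathsf{E}\left[ \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \cdot R^2 \,\middle|\, E = 1 \right] && (119) \\
&= \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \cdot \mathsf{E}\left[R^2 \,\middle|\, E = 1\right] && (120) \\
&\le \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \cdot \frac{\mathcal{E}}{p} && (121)
\end{aligned}$$

where we have used the fact that $R$ needs to satisfy the average-power constraint (27) to get the following bound:

$$\begin{aligned}
\mathcal{E} &\ge \mathsf{E}\left[R^2\right] && (122) \\
&= p\, \mathsf{E}\left[R^2 \,\middle|\, E = 1\right] + (1 - p)\, \mathsf{E}\left[R^2 \,\middle|\, E = 0\right] && (123) \\
&\ge p\, \mathsf{E}\left[R^2 \,\middle|\, E = 1\right]. && (124)
\end{aligned}$$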

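Plugging (112), (117), and (121) into (106) we obtain

$$\begin{aligned}
I(X; \|\mathbb{H}X\| e^{i\Theta} \mid E = 1) \le{}& -\log 2\pi - h(\|\mathbb{H}\hat X\| \mid \hat X, E = 1) - 2\, \mathsf{E}[\log R \mid E = 1] \\
& - \mathsf{E}[\log \|\mathbb{H}\hat X\| \mid E = 1] + \log \pi + \alpha \log \beta + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) \\
& + (1 - \alpha)\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] + \epsilon_\nu \\
& + \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \frac{\nu}{\beta}.
\end{aligned} \qquad (125)$$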

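Next, we continue with the second term in (105):

$$\begin{aligned}
I\!\left(X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|, E = 1\right) ={}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|, E = 1\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}\hat X\| \cdot R, \hat X, R, E = 1\right) && (126) \\
={}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}X\|, E = 1\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}\hat X\|, \hat X, R, E = 1\right) && (127) \\
\le{}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, E = 1\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}\hat X\|, \hat X, E = 1\right). && (128)
\end{aligned}$$

Here, the last inequality follows because conditioning cannot increase entropy and because given $\hat X$ and $\|\mathbb{H}\hat X\|$, the term $\mathbb{H}\hat X / \|\mathbb{H}\hat X\|$ does not depend on $R$.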

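Hence, using (128), (125), and (105) in (93), we get

$$\begin{aligned}
I(X; Y) \le{}& H_b(p) + (1 - p)\, C(\mathcal{E}_0) - \log 2\pi - h(\|\mathbb{H}\hat X\| \mid \hat X, E = 1) \\
& - 2\, \mathsf{E}[\log R \mid E = 1] - \mathsf{E}[\log \|\mathbb{H}\hat X\| \mid E = 1] + \log \pi + \alpha \log \beta \\
& + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) + (1 - \alpha)\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] \\
& + \epsilon_\nu + \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \frac{\nu}{\beta} + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, E = 1\right) \\
& - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \|\mathbb{H}\hat X\|, \hat X, E = 1\right) && (129) \\
={}& H_b(p) + (1 - p)\, C(\mathcal{E}_0) - \log 2\pi - h(\mathbb{H}\hat X \mid \hat X, E = 1) \\
& + (2 n_R - 1)\, \mathsf{E}[\log \|\mathbb{H}\hat X\| \mid E = 1] - 2\, \mathsf{E}[\log R \mid E = 1] \\
& - \mathsf{E}[\log \|\mathbb{H}\hat X\| \mid E = 1] + \log \pi + \alpha \log \beta + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) \\
& + 2\, \mathsf{E}[\log \|\mathbb{H}X\| \mid E = 1] - \alpha\, \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] + \epsilon_\nu \\
& + \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \frac{\nu}{\beta} + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, E = 1\right) && (130) \\
={}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, E = 1\right) - h(\mathbb{H}\hat X \mid \hat X, E = 1) \\
& + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2 \,\middle|\, E = 1\right] - \log 2 + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) \\
& + \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \frac{\nu}{\beta} + \epsilon_\nu + H_b(p) + (1 - p)\, C(\mathcal{E}_0) \\
& + \alpha \left( \log \beta - \mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] \right) && (131) \\
\le{}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, E = 1\right) - h(\mathbb{H}\hat X \mid \hat X, E = 1) \\
& + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2 \,\middle|\, E = 1\right] - \log 2 + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) \\
& + \frac{1}{\beta} \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \frac{\nu}{\beta} + \epsilon_\nu + H_b(p) + (1 - p)\, C(\mathcal{E}_0) \\
& + \alpha \left( \log \beta - \log \mathcal{E}_0 - \xi \right). && (132)
\end{aligned}$$

Here, (130) follows again from a conditional version of Lemma 1 similar to (107)–(112) which allows us to combine the fourth and the last term in (129); in the subsequent equality we arithmetically rearrange the terms; and the final inequality follows from the following bound:

$$\begin{aligned}
\mathsf{E}\left[\log \|\mathbb{H}X\|^2 \,\middle|\, E = 1\right] &\ge \inf_{\|x\|^2 \ge \mathcal{E}_0} \mathsf{E}\left[\log \|\mathbb{H}x\|^2\right] && (133) \\
&= \log \mathcal{E}_0 + \inf_{\hat x} \mathsf{E}\left[\log \|\mathbb{H}\hat x\|^2\right] && (134) \\
&\triangleq \log \mathcal{E}_0 + \xi && (135)
\end{aligned}$$

where the last line should be taken as a definition for $\xi$. Notice that

$$-\infty < \xi < \infty \qquad (136)$$

as can be argued as follows: the lower bound on $\xi$ follows from [1, Lemma 6.7f)], [2, Lemma A.15f)] because $h(\mathbb{H}) > -\infty$ and $\mathsf{E}\left[\|\mathbb{H}\|_F^2\right] < \infty$. The upper bound on $\xi$ can be verified using the concavity of the logarithm function and Jensen's inequality.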

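Note that (132) does not depend on the distribution of $R$ anymore, but only on $\hat X$. Hence, we can get an upper bound on capacity by taking the supremum over all possible distributions $Q_{\hat X}$. This then gives us the following upper bound on the fading number:

$$\begin{aligned}
\chi(\mathbb{H}) ={}& \lim_{\mathcal{E} \uparrow \infty} \left\{ C(\mathcal{E}) - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \right\} && (137) \\
={}& \lim_{\mathcal{E} \uparrow \infty} \left\{ \sup_{Q_X} I(X; Y) - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \right\} && (138) \\
\le{}& \lim_{\mathcal{E} \uparrow \infty} \sup_{Q_{\hat X}} \Bigg\{ h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h(\mathbb{H}\hat X \mid \hat X) - \log 2 + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] \\
& + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) + \frac{1}{\beta}\left( \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \nu \right) + \epsilon_\nu + H_b(p) \\
& + (1 - p)\, C(\mathcal{E}_0) + \alpha\left(\log \beta - \log \mathcal{E}_0 - \xi\right) - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \Bigg\} && (139) \\
={}& \lim_{\mathcal{E} \uparrow \infty} \Bigg\{ \sup_{Q_{\hat X}} \left\{ h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h(\mathbb{H}\hat X \mid \hat X) - \log 2 + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] \right\} \\
& + \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) + \frac{1}{\beta}\left( \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \nu \right) + \epsilon_\nu + H_b(p) \\
& + (1 - p)\, C(\mathcal{E}_0) + \alpha\left(\log \beta - \log \mathcal{E}_0 - \xi\right) - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \Bigg\} && (140) \\
={}& \sup_{Q_{\hat X}} \left\{ h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h(\mathbb{H}\hat X \mid \hat X) - \log 2 + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] \right\} \\
& + \lim_{\mathcal{E} \uparrow \infty} \Bigg\{ \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) - \log \frac{1}{\alpha} + \frac{1}{\beta}\left( \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \nu \right) + \epsilon_\nu + H_b(p) \\
& + (1 - p)\, C(\mathcal{E}_0) + \alpha\left(\log \beta - \log \mathcal{E}_0 - \xi\right) + \log \frac{1}{\alpha} - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \Bigg\} && (141) \\
={}& \sup_{Q_{\hat X}} \left\{ h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h(\mathbb{H}\hat X \mid \hat X) + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] \right\} \\
& - \log 2 + \log\left(1 - e^{-\nu}\right) + \nu + \epsilon_\nu - \log \nu. && (142)
\end{aligned}$$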
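Here, the first two equalities follow from the definition of the fading number (29); the subsequent inequality from (132); (140) follows because the parameters $\alpha$, $\beta$, and $\nu$ must not depend on the input distribution $Q_{\hat X}$ (however, note that we are allowed to let them depend on $\mathcal{E}$); the subsequent equality follows since the first four terms do not depend on $\mathcal{E}$; and in the last equality we have used (95) and made the following choices on the free parameters $\alpha$ and $\beta$:

$$\alpha(\mathcal{E}) \triangleq \frac{\nu}{\log \mathcal{E} + \log \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right]} \qquad (143)$$

$$\beta(\mathcal{E}) \triangleq \frac{1}{\alpha(\mathcal{E})}\, e^{\nu / \alpha(\mathcal{E})} \qquad (144)$$

for some constant $\nu \ge 0$. For this choice, note that

$$\lim_{\mathcal{E} \uparrow \infty} \left\{ \log \Gamma\!\left(\alpha, \frac{\nu}{\beta}\right) - \log \frac{1}{\alpha} \right\} = \log\left(1 - e^{-\nu}\right) \qquad (145)$$

$$\lim_{\mathcal{E} \uparrow \infty} \alpha\left(\log \beta - \log \mathcal{E}_0 - \xi\right) = \nu \qquad (146)$$

$$\lim_{\mathcal{E} \uparrow \infty} \frac{1}{\beta}\left( \sup_{\hat x} \mathsf{E}\left[\|\mathbb{H}\hat x\|^2\right] \frac{\mathcal{E}}{p} + \nu \right) = 0 \qquad (147)$$

$$\lim_{\mathcal{E} \uparrow \infty} \left\{ \log \frac{1}{\alpha} - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \right\} = -\log \nu. \qquad (148)$$

(Compare with [1, App. VII], [2, Sec. B.5.9].)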

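To finish the derivation of the upper bound, we let $\nu$ go to zero. Note that $\epsilon_\nu \to 0$ as $\nu \downarrow 0$ as can be seen from (113). Note further that

$$\lim_{\nu \downarrow 0} \left\{ \log\left(1 - e^{-\nu}\right) - \log \nu \right\} = 0. \qquad (149)$$

Therefore, we get

$$\chi(\mathbb{H}) \le \sup_{Q_{\hat X}} \left\{ h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h(\mathbb{H}\hat X \mid \hat X) + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] - \log 2 \right\}. \qquad (150)$$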
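The limits (146) to (149) used in the last steps of the upper bound can be sanity-checked numerically. The sketch below works with $\log \mathcal{E}$ instead of $\mathcal{E}$ to avoid overflow and takes $\alpha = \nu / (\log \mathcal{E} + \log \sup_{\hat x} \mathsf{E}[\|\mathbb{H}\hat x\|^2])$ and $\beta = e^{\nu/\alpha}/\alpha$ as in (143) and (144). All numerical constants (the supremum, $\mathcal{E}_0$, $\xi$, $p$, $\sigma^2$) are hypothetical placeholders, and (145) is left out because it would need an incomplete gamma function.

```python
import math


def alpha(log_power, nu, log_sup):
    """alpha(E) = nu / (log E + log sup E[||H x||^2])."""
    return nu / (log_power + log_sup)


def log_beta(log_power, nu, log_sup):
    """log beta(E) = log(1/alpha) + nu/alpha."""
    a = alpha(log_power, nu, log_sup)
    return -math.log(a) + nu / a


def limit_146(log_power, nu, log_sup, log_e0, xi):
    a = alpha(log_power, nu, log_sup)
    return a * (log_beta(log_power, nu, log_sup) - log_e0 - xi)


def limit_147(log_power, nu, log_sup, p):
    a = alpha(log_power, nu, log_sup)
    # exp(nu/alpha) = sup * E, so (1/beta) * sup * E / p simplifies to alpha / p.
    return a / p + nu * a * math.exp(-(log_power + log_sup))


def limit_148(log_power, nu, log_sup, log_sigma2):
    a = alpha(log_power, nu, log_sup)
    x = log_power - log_sigma2               # log(E / sigma^2), assumed > 0
    log_one_plus_snr = x + math.log1p(math.exp(-x))
    return -math.log(a) - math.log1p(log_one_plus_snr)


def limit_149(nu):
    return math.log(-math.expm1(-nu)) - math.log(nu)


if __name__ == "__main__":
    nu, log_sup = 0.5, math.log(3.0)         # hypothetical constants
    for log_power in (1e2, 1e4, 1e8):
        print(log_power,
              limit_146(log_power, nu, log_sup, 0.0, -1.0),   # tends to nu
              limit_147(log_power, nu, log_sup, 0.99),        # tends to 0
              limit_148(log_power, nu, log_sup, 0.0))         # tends to -log(nu)
    print(limit_149(1e-6))                                    # tends to 0
```

The convergence in (146) is slow, of order $\log\log\mathcal{E} / \log\mathcal{E}$, which is why very large values of $\log \mathcal{E}$ are used in the demonstration.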

*B. Derivation of a Lower Bound*

To derive a lower bound on capacity (or the fading number, respectively) we choose a specific input distribution. Let $X$ be of the form

$$X = R \cdot \hat X. \qquad (151)$$

Here $\hat X \in \mathbb{C}^{n_T}$ is assumed to be a random unit vector that is circularly symmetric, but whose exact distribution will be specified later. The random variable $R \in \mathbb{R}_0^+$ is chosen to be independent of $\hat X$ and such that

$$\log R^2 \sim \mathcal{U}\left(\left[\log x_{\min}^2, \log \mathcal{E}\right]\right) \qquad (152)$$

where we choose $x_{\min}^2$ as

$$x_{\min}^2 \triangleq \log \mathcal{E}. \qquad (153)$$

Note that this choice of $R$ satisfies the peak-power constraint (26) and therefore also the average-power constraint (27).
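The magnitude distribution (152) with the choice (153) is easy to sample, which makes the peak-power property visible. The following is a minimal sketch with illustrative function names; it assumes $\mathcal{E} > 1$ so that $\log x_{\min}^2 = \log\log\mathcal{E}$ is well defined.

```python
import math
import random


def sample_magnitude(power, rng):
    """Draw R with log R^2 uniform on [log x_min^2, log power],
    where x_min^2 = log(power) as in (153)."""
    if power <= 1.0:
        raise ValueError("power must exceed 1 so that log(power) > 0")
    log_x_min_sq = math.log(math.log(power))
    log_r_sq = rng.uniform(log_x_min_sq, math.log(power))
    return math.exp(0.5 * log_r_sq)


if __name__ == "__main__":
    rng = random.Random(0)
    power = 1e6
    samples = [sample_magnitude(power, rng) for _ in range(10)]
    print(min(samples) ** 2, math.log(power), max(samples) ** 2, power)
```

Every sample satisfies $x_{\min}^2 \le R^2 \le \mathcal{E}$, so the input never exceeds the peak power while its smallest magnitude still grows (slowly) with $\mathcal{E}$.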

Using such an input to our MIMO fading channel we get the following lower bound to channel capacity:

$$\begin{aligned}
C(\mathcal{E}) \ge{}& I(X; Y) && (154) \\
={}& I(R, \hat X; Y) && (155) \\
={}& I(\hat X; Y) + I(R; Y \mid \hat X) && (156) \\
={}& I(\hat X; Y) + I(R; Y e^{i\Theta} \mid \hat X) - I(R; Y e^{i\Theta} \mid \hat X) + I(R; Y \mid \hat X) && (157) \\
={}& I(\hat X; Y) + I(R, e^{i\Theta}; Y e^{i\Theta} \mid \hat X) - I(e^{i\Theta}; Y e^{i\Theta} \mid \hat X, R) \\
& - I(R; Y e^{i\Theta} \mid \hat X) + I(R; Y \mid \hat X). && (158)
\end{aligned}$$

Here we have introduced a new random variable $\Theta \sim \mathcal{U}([0, 2\pi])$ which is assumed to be independent of every other random quantity.

The last two terms can be rearranged as follows:

$$\begin{aligned}
-I(R; Y e^{i\Theta} \mid \hat X) + I(R; Y \mid \hat X) ={}& -h(Y e^{i\Theta} \mid \hat X) + h(Y e^{i\Theta} \mid \hat X, R) + h(Y \mid \hat X) - h(Y \mid \hat X, R) && (159) \\
={}& -h(Y e^{i\Theta} \mid \hat X) + h(Y e^{i\Theta} \mid \hat X, R) + h(Y e^{i\Theta} \mid \hat X, e^{i\Theta}) \\
& - h(Y e^{i\Theta} \mid \hat X, R, e^{i\Theta}) && (160) \\
={}& -I(e^{i\Theta}; Y e^{i\Theta} \mid \hat X) + I(e^{i\Theta}; Y e^{i\Theta} \mid \hat X, R). && (161)
\end{aligned}$$

Here the second equality follows because $e^{i\Theta}$ is independent of everything else so that we can add it to the conditioning part of the entropy without changing its value, and because differential entropy remains unchanged if its argument is multiplied by a constant complex number of magnitude 1.

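Combining this with (158), we obtain

$$\begin{aligned}
C(\mathcal{E}) &\ge I(\hat X; Y) + I(R, e^{i\Theta}; Y e^{i\Theta} \mid \hat X) - I(e^{i\Theta}; Y e^{i\Theta} \mid \hat X) && (162) \\
&= I(\hat X; Y) + I(R e^{i\Theta}; Y e^{i\Theta} \mid \hat X) - I(e^{i\Theta}; Y e^{i\Theta} \mid \hat X) && (163)
\end{aligned}$$

where the last equality follows because from $R e^{i\Theta}$ the random variables $R$ and $e^{i\Theta}$ can be recovered.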

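We continue with bounding the first term in (163):

$$\begin{aligned}
I(\hat X; Y) &= I(\hat X; Y, Z) - \underbrace{I(\hat X; Z \mid Y)}_{\le\, \epsilon(x_{\min})} && (164) \\
&\ge I(\hat X; Y, Z) - \epsilon(x_{\min}) && (165) \\
&= I(\hat X; \mathbb{H}\hat X R) - \epsilon(x_{\min}) && (166) \\
&= I\!\left(\hat X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}, \|\mathbb{H}\hat X\| \cdot R\right) - \epsilon(x_{\min}) && (167) \\
&= I\!\left(\hat X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) + I\!\left(\hat X; \|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - \epsilon(x_{\min}). && (168)
\end{aligned}$$

Here the first equality follows from the chain rule; in the subsequent inequality we lower bound the second term by $-\epsilon(x_{\min})$ which is defined in Appendix E and is shown there to be independent of the input distribution $Q_X$ and to tend to zero as $x_{\min} \uparrow \infty$; in the subsequent equality we use $Z$ in order to extract $\mathbb{H}\hat X R$ from $Y$ and then drop $(Y, Z)$ since given $\mathbb{H}\hat X R$ it is independent of the other random variables; and the last equality follows again from the chain rule.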

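Similarly, we bound the third term in (163):

$$\begin{aligned}
I(e^{i\Theta}; Y e^{i\Theta} \mid \hat X) \le{}& I(e^{i\Theta}; Y e^{i\Theta}, Z e^{i\Theta} \mid \hat X) && (169) \\
={}& I(e^{i\Theta}; \mathbb{H}X e^{i\Theta}, Z e^{i\Theta} \mid \hat X) && (170) \\
={}& I(e^{i\Theta}; \mathbb{H}X e^{i\Theta} \mid \hat X) + I(e^{i\Theta}; Z e^{i\Theta} \mid \mathbb{H}X e^{i\Theta}, \hat X) && (171) \\
={}& I(e^{i\Theta}; \mathbb{H}X e^{i\Theta} \mid \hat X) && (172) \\
={}& I\!\left(e^{i\Theta}; \|\mathbb{H}\hat X\| \cdot R, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) && (173) \\
={}& I\!\left(e^{i\Theta}; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) + I\!\left(e^{i\Theta}; \|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta}, \hat X\right). && (174)
\end{aligned}$$

Hence, plugging these results into (163), we get

$$\begin{aligned}
C(\mathcal{E}) \ge{}& I(R e^{i\Theta}; Y e^{i\Theta} \mid \hat X) + I\!\left(\hat X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) + I\!\left(\hat X; \|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) \\
& - I\!\left(e^{i\Theta}; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) - I\!\left(e^{i\Theta}; \|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta}, \hat X\right) - \epsilon(x_{\min}).
\end{aligned} \qquad (175)$$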

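We next bound the third and fifth mutual information terms in (175):

$$\begin{aligned}
& I\!\left(\hat X; \|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - I\!\left(e^{i\Theta}; \|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X e^{i\Theta}}{\|\mathbb{H}\hat X\|}, \hat X\right) \\
&\quad = h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}, \hat X\right) \\
&\qquad - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta}, \hat X\right) + h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta}, \hat X, e^{i\Theta}\right) && (176) \\
&\quad = h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}, \hat X\right) \\
&\qquad - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta}, \hat X\right) + h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}, \hat X\right) && (177) \\
&\quad = h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta}, \hat X\right) && (178) \\
&\quad \ge h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X e^{i\Theta}}{\|\mathbb{H}\hat X\|}\right) && (179) \\
&\quad = h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h\!\left(\|\mathbb{H}\hat X\| \cdot R \,\middle|\, \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) && (180) \\
&\quad = 0. && (181)
\end{aligned}$$

Here, the inequality follows because conditioning reduces entropy; and the second-to-last equality holds because we have assumed $\hat X$ to be circularly symmetric, i.e., $\hat X$ "destroys" the random phase shift of $e^{i\Theta}$.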

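Therefore, we are left with the following bound:

$$C(\mathcal{E}) \ge I(R e^{i\Theta}; Y e^{i\Theta} \mid \hat X) + I\!\left(\hat X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - I\!\left(e^{i\Theta}; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) - \epsilon(x_{\min}). \qquad (182)$$

Now, we rewrite the second and third term as follows:

$$\begin{aligned}
& I\!\left(\hat X; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - I\!\left(e^{i\Theta}; \frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) \\
&\quad = h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \hat X\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X, e^{i\Theta}\right) && (183) \\
&\quad = h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \hat X\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} \,\middle|\, \hat X\right) && (184) \\
&\quad = h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) && (185)
\end{aligned}$$

where the second equality follows from (13) with a choice $\mathsf{U} = e^{-i\Theta}\, \mathsf{I}_{n_R}$ and from the fact that $e^{i\Theta}$ is independent of all other random quantities.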

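This leaves us with

$$C(\mathcal{E}) \ge I(R e^{i\Theta}; Y e^{i\Theta} \mid \hat X) + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) - \epsilon(x_{\min}). \qquad (186)$$

Next, we let the power grow to infinity, $\mathcal{E} \to \infty$, and use the definition of the fading number (29). Since $R e^{i\Theta}$ is circularly symmetric with a magnitude distributed according to (152), we know from [1, Eq. (108) and Theorem 4.8], [2, Eq. (6.194) and Theorem 6.15], that $R e^{i\Theta}$ achieves the fading number of a memoryless SIMO fading channel with partial side-information. In our situation we have

$$\begin{aligned}
I(R e^{i\Theta}; Y e^{i\Theta} \mid \hat X) &= I(R e^{i\Theta}; \mathbb{H}\hat X R e^{i\Theta} + Z \mid \hat X) && (187) \\
&= I(R e^{i\Theta}; \mathbb{H}\hat X R e^{i\Theta} + Z, \hat X) && (188)
\end{aligned}$$

where $\hat X$ serves as partial receiver side-information (that is independent of the SIMO input $R e^{i\Theta}$). Note that a random vector $A$ is said to contain only *partial* side-information about $B$ if $h(B \mid A) > -\infty$, i.e., in our case we need

$$h(\mathbb{H}\hat X \mid \hat X) > -\infty \qquad (189)$$

which is satisfied since we assume that $h(\mathbb{H}) > -\infty$ and $\mathsf{E}\left[\|\mathbb{H}\|_F^2\right] < \infty$ (see [1, Lemma 6.6], [2, Lemma A.14]).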

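Hence

$$\begin{aligned}
\chi(\mathbb{H}) \ge{}& \lim_{\mathcal{E} \uparrow \infty} \Bigg\{ I(R e^{i\Theta}; \mathbb{H}\hat X R e^{i\Theta} + Z \mid \hat X) + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) \\
& - \epsilon(x_{\min}) - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) \Bigg\} && (190) \\
={}& \lim_{\mathcal{E} \uparrow \infty} \left\{ I(R e^{i\Theta}; \mathbb{H}\hat X R e^{i\Theta} + Z \mid \hat X) - \log\left(1 + \log\left(1 + \frac{\mathcal{E}}{\sigma^2}\right)\right) - \epsilon(x_{\min}) \right\} \\
& + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) && (191) \\
={}& \chi(\mathbb{H}\hat X \mid \hat X) + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) && (192) \\
={}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] - \log 2 - h(\mathbb{H}\hat X \mid \hat X) \\
& + h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) - h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|} e^{i\Theta} \,\middle|\, \hat X\right) && (193) \\
={}& h_\lambda\!\left(\frac{\mathbb{H}\hat X}{\|\mathbb{H}\hat X\|}\right) + n_R\, \mathsf{E}\left[\log \|\mathbb{H}\hat X\|^2\right] - \log 2 - h(\mathbb{H}\hat X \mid \hat X). && (194)
\end{aligned}$$

Here in (192), we have used the fact that our choice (153) guarantees that $\epsilon(x_{\min})$ tends to zero as $\mathcal{E} \to \infty$ (see Appendix E) and that we achieve the SIMO fading number for a channel with input $R e^{i\Theta}$ and output $\mathbb{H}\hat x R e^{i\Theta} + Z$; the subsequent equality follows from the fading number of a memoryless SIMO fading channel where the receiver has access to some partial side-information [1, Eq. (108)], [2, Eq. (6.194)]:

$$\chi(H \mid S) = h_\lambda(\hat H e^{i\Theta} \mid S) + n_R\, \mathsf{E}\left[\log \|H\|^2\right] - \log 2 - h(H \mid S). \qquad (195)$$

The result now follows by choosing the distribution $Q_{\hat X}$ such as to maximize the lower bound (194) to the fading number.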

VIII. CONCLUSION

We have derived the fading number of a MIMO fading channel of general fading law including spatial, but without temporal memory. Since the fading number is the second term after the double-logarithmic term of the high-SNR expansion of channel capacity, this means that we have precisely specified the behavior of the channel capacity asymptotically when the power grows to infinity. The result shows that the asymptotic capacity can be achieved by an input that consists of the product of two independent random quantities: a circularly symmetric random unit vector (the *direction*) and a nonnegative (i.e., real) random variable (the *magnitude*). The distribution of the random direction is chosen such as to maximize the fading number and therefore depends on the particular law of the fading process. The nonnegative random variable is such that (38) is satisfied. This is the well-known choice that also achieves the fading number in the SISO and SIMO case and is also used in the MISO case where it is multiplied by a constant beam-direction $\hat x$. All these special cases follow nicely from this new result.

We have then derived some new results for the important special situation of Gaussian fading. For the case of a scalar line-of-sight matrix (68) assuming at least as many transmit as receive antennas, $n_R \le n_T$, we have been able to state the fading number precisely:

$$\chi = n_R\, g_{n_R}(|d|^2) - n_R - \log \Gamma(n_R) \qquad (196)$$

where $g_m(\cdot)$ denotes the expected value of the logarithm of a noncentral chi-square random variable (see (61)). We see that the asymptotic capacity only depends on the number of receive antennas and is growing proportionally to $n_R \log |d|^2$.

For a general line-of-sight matrix, we have shown an upper bound that grows like $\min\{n_R, n_T\} \log \delta^2$ where $\delta^2$ is a certain kind of average of all singular values of the line-of-sight matrix (see (79) and (80)).

We would like to emphasize that even though all results on the fading number are asymptotic results for the theoretical situation of infinite power, they are still of relevance for finite SNR values: it has been shown that the approximation

$$C(\mathrm{SNR}) \approx \log(1 + \log(1 + \mathrm{SNR})) + \chi \qquad (197)$$

holds already for moderate values of the SNR. Actually, pulling ourselves by our bootstraps, let us consider for the moment that (197) starts to be valid for an SNR somewhere in the range of 30 to 80 dB. In this case $\log(1 + \log(1 + \mathrm{SNR}))$ will have a value between 2 and 3 nats. Hence, once the capacity is appreciably above $\chi + 2$ nats, the approximation (197) is likely to be valid [10], [11].
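The claim about the size of the double-logarithmic term is quick to verify. The sketch below evaluates $\log(1 + \log(1 + \mathrm{SNR}))$ in nats for SNR values given in decibels; the function names and the fading number passed to the approximation are illustrative placeholders.

```python
import math


def double_log_term(snr_db):
    """log(1 + log(1 + SNR)) in nats, with the SNR given in dB."""
    snr = 10.0 ** (snr_db / 10.0)
    return math.log1p(math.log1p(snr))


def high_snr_capacity_approx(snr_db, fading_number):
    """Approximation (197): double-logarithmic term plus the fading number."""
    return double_log_term(snr_db) + fading_number


if __name__ == "__main__":
    for snr_db in (30, 40, 50, 60, 70, 80):
        print(snr_db, round(double_log_term(snr_db), 3))
    # Hypothetical fading number of 1.0 nat, for illustration only.
    print(high_snr_capacity_approx(30, 1.0))
```

Over the whole range from 30 to 80 dB the double-logarithmic term changes by less than one nat, which is why the fading number dominates the comparison between channels in this regime.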

Therefore, the fading number can be seen as an indicator of the maximal rate at which power efficient communication is possible on the channel. For a further discussion about the practical relevance of the fading number we refer to [10] and [12].

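APPENDIX A
PROOF OF LEMMA 5

Assume that $\Theta \sim \mathcal{U}([0, 2\pi])$, independent of every other random quantity. Then

$$\begin{aligned}
I(X; Y) &= I(X; Y \mid e^{i\Theta}) && (198) \\
&= I(X e^{i\Theta}; Y e^{i\Theta} \mid e^{i\Theta}) && (199) \\
&= I(X e^{i\Theta}; \mathbb{H}X e^{i\Theta} + Z \mid e^{i\Theta}) && (200) \\
&= I(\tilde X; \mathbb{H}\tilde X + Z \mid e^{i\Theta}) && (201) \\
&= h(\mathbb{H}\tilde X + Z \mid e^{i\Theta}) - h(\mathbb{H}\tilde X + Z \mid \tilde X, e^{i\Theta}) && (202) \\
&= h(\mathbb{H}\tilde X + Z \mid e^{i\Theta}) - h(\mathbb{H}\tilde X + Z \mid \tilde X) && (203) \\
&\le h(\mathbb{H}\tilde X + Z) - h(\mathbb{H}\tilde X + Z \mid \tilde X) && (204) \\
&= I(\tilde X; \mathbb{H}\tilde X + Z). && (205)
\end{aligned}$$

Here the first equality follows because $\Theta$ is independent of every other random quantity; the third equality follows because $Z$ is circularly symmetric; in the subsequent equality we substitute $\tilde X = X e^{i\Theta}$; and the inequality follows since conditioning reduces entropy.

Hence, a circularly symmetric input achieves a mutual information that is at least as big as the original mutual information.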

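APPENDIX B
DERIVATION OF BOUNDS (67)

In this appendix we will derive the bounds (67) on $g_m(\cdot)$. We start with the upper bound which follows directly from (64) and (65) and from Jensen's inequality:

$$\begin{aligned}
g_m(s^2) &= \mathsf{E}\left[\log \sum_{j=1}^m |U_j + \mu_j|^2\right] && (206) \\
&\le \log \mathsf{E}\left[\sum_{j=1}^m |U_j + \mu_j|^2\right] && (207) \\
&= \log \sum_{j=1}^m \left(1 + |\mu_j|^2\right) && (208) \\
&= \log(m + s^2). && (209)
\end{aligned}$$

For the lower bound we also start with (64) and choose $\mu_1 = s$ and $\mu_2 = \cdots = \mu_m = 0$. Then we get

$$\begin{aligned}
g_m(s^2) &= \mathsf{E}\left[\log \sum_{j=1}^m |U_j + \mu_j|^2\right] && (210) \\
&\ge \mathsf{E}\left[\log |U_1 + \mu_1|^2\right] && (211) \\
&= g_1(s^2) && (212) \\
&= \log s^2 - \mathrm{Ei}(-s^2). && (213)
\end{aligned}$$

Here, (211) follows from dropping some nonnegative terms in the sum; and in the subsequent two equalities we use the definition of $g_1(\cdot)$.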

APPENDIX C
PROOF OF COROLLARY 12
We choose a constantn_{T}2 n_{T}matrix as follows:

diag 1_{d}

1; . . . ; 1dn ; 1d1; . . . ; 1d1 (214) and then we note that for a unit vector^x = (^x(1); . . . ; ^x(n ))

^x = ^x + ~ ^x =
^x(1)
..
.
^x(n )
+ ~ ^x + ~H (215)
where ~H N 0; 2(^x)n with
2_{(^x)} j^x(1)j2
jd1j2 + 1 1 1 + j^x
(n )_{j}2
jdn j2 + j^x
(n +1)_{j}2
jd1j2 + 1 1 1 + j^x
(n )_{j}2
jd1j2
(216)

and where 2 n withkk 1. Therefore

h( X j ^^ X = ^x) = nRlog e2(^x) (217)
log k ^xk2 _{= log }2_{(^x) + g}

n kk 2

2_{(^x)} (218)
(where the last equality follows from (64)) and hence

nR log k Xk^ 2 0 h( X j ^^ X) = nR gn j ^X

(1)_{j}2_{+ 1 1 1 + j ^}_{X}(n )_{j}2
2_{( ^}_{X)}

0 nRlog e: (219)

The upper bound on the fading number now follows from (39); from
Theorem 7 by upper bounding theh-term bylog cn ; and from the
additional observations thatgm(1) is a monotonically increasing
func-tion, that
j ^X(1)_{j}2_{+ 1 1 1 + j ^}_{X}(n )_{j}2 _{ 1} _{(220)}
and that
2_{( ^}_{X) = j}X^(1)j2
jd1j2 + 1 1 1 + j
^
X(n )_{j}2
jdn j2
+ jX^(n +1)j2
jd1j2 + 1 1 1 + j
^
X(n )_{j}2
jd1j2 (221)
jX_{jd}^(1)j2
1j2 + 1 1 1 + j
^
X(n )_{j}2
jd1j2 (222)
= 1_{jd}
1j2 j ^X
(1)_{j}2_{+ 1 1 1 + j ^}_{X}(n )_{j}2 _{(223)}
= 1_{jd}
1j2 =
1
k k2 (224)

where the inequality follows sincejd_{1}j jd_{2}j 1 1 1 jd_{n} j.

APPENDIX D
PROOF OF PROPOSITION 13

This upper bound is based on the upper bound given in Corollary 8 for a choice of = n . IfnR> nTwe choose for

diag a_{d}

1; . . . ; adn ; b; . . . ; b (225) with

b _{n}2

T (226)

for as given in (80), and with a such that det = 1, i.e.,

a (d11 1 1 1 1 dn ) 1 b : (227)

For such a choice we note that

^x = a ^x_{0} + N 0; jaj_{jd} 2
1j2 ; . . . ; N 0; jaj
2
jdn j2 ;
N 0; b2 _{; . . . ; N 0; b}2 _{(228)}
so that
k ^xk2 _{= }2 _{b}2 _{+ (n}
R0 nT)b2 (229)
= nR
2
nT : (230)

Hence, using Jensen’s inequality and the fact thatdet = 1 we get nR log k ^xk2 0 h( ^x)

nRlog k ^xk2 0 log det 0 h( ^x) (231) = nRlog nR

2 nT

n =n

0 nRlog e: (232) Plugging this into the upper bound (41) of Corollary 8, we yield

nRlog 0 log 0(nR) + nRlog nR
+ nTlog
2
nT 0 nRlog e (233)
= nTlog
2
nT + nRlog nR0 log 0(nR) 0 nR:
(234)
Ifn_{R} n_{T}we choose for
= diag a_{d}
1; . . . ; adn (235)

witha such that det = 1, i.e.,

a (d11 1 1 1 1 dn ) : (236)

For such a choice we note that

^x = a ^x(1)_{; . . . ; ^x}(n )
+ N 0; jaj_{jd} 2

1j2 ; . . . ; N 0; jaj 2

so that
k ^xk2 _{= jaj}2 _{j^x}(1)_{j}2_{+ 1 1 1 + j^x}(n )_{j}2
+ jaj_{jd} 2
1j2 + 1 1 1 + jaj
2
jdn j2 (238)
2 (239)

where we have boundedj^x(1)j2+ 1 1 1 + j^x(n )j2 1. Hence, using Jensen’s inequality and the fact thatdet = 1 we get

nR log k ^xk2 0 h( ^x)

nRlog k ^xk2 0 log det 0 h( ^x) (240)

nRlog 20 nRlog e: (241)

Plugging this into the upper bound (41) of Corollary 8, we yield

nRlog 0 log 0(nR) + nRlog 20 nRlog e (242) = nRlog

2

nR + nRlog nR0 log 0(nR) 0 nR: (243) The result now follows by combining (234) and (243).

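APPENDIX E
ADDITIONAL DERIVATION FOR THE PROOF OF THE LOWER BOUND

In the derivation of the lower bound to the fading number we need to find the following upper bound

$$I(\hat X; Z \mid Y) \le \epsilon(x_{\min}) \qquad (244)$$

and to show that $\epsilon(x_{\min})$ does not depend on the input distribution $Q_X$ and tends to zero as $x_{\min}$ tends to infinity.

Such a bound can be found as follows:

$$\begin{aligned}
I(\hat X; Z \mid Y) &= h(Z \mid Y) - h(Z \mid Y, \hat X) && (245) \\
&\le h(Z) - h(Z \mid Y, \hat X, R) && (246) \\
&= h(Z) - h(Z \mid \mathbb{H}\hat X R + Z, \hat X, R) && (247) \\
&\le h(Z) - \inf_{\hat x} \inf_{r \ge x_{\min}} h(Z \mid \mathbb{H}\hat x r + Z) && (248) \\
&= h(Z) - \inf_{\hat x} h(Z \mid \mathbb{H}\hat x x_{\min} + Z) && (249) \\
&= \sup_{\hat x} I(Z; \mathbb{H}\hat x x_{\min} + Z) && (250) \\
&= \sup_{\hat x} I\!\left(\frac{Z}{x_{\min}}; \mathbb{H}\hat x + \frac{Z}{x_{\min}}\right) && (251) \\
&= \sup_{\hat x} \left\{ h\!\left(\mathbb{H}\hat x + \frac{Z}{x_{\min}}\right) - h(\mathbb{H}\hat x) \right\} && (252) \\
&\triangleq \epsilon(x_{\min}) && (253)
\end{aligned}$$

where we have used the fact that we have chosen $R$ such that $R \ge x_{\min}$. Note that (252) does not depend on the input $X$ anymore. The convergence

$$\lim_{x_{\min} \uparrow \infty} \epsilon(x_{\min}) = 0 \qquad (254)$$

follows from [1, Lemma 6.11], [2, Lemma A.19].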

ACKNOWLEDGMENT

The author would like to thank Amos Lapidoth for long, fruitful discussions and Tobias Koch for giving the right hint to fix a serious bug in an earlier version of the proof. Moreover, the author thanks the anonymous reviewer for his detailed and very helpful comments.

REFERENCES

[1] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat fading channels,” *IEEE Trans. Inf. Theory*, vol. 49, pp. 2426–2467, Oct. 2003.

[2] S. M. Moser, “Duality-based bounds on channel capacity,” Ph.D. dissertation, Swiss Fed. Inst. Technol., Zurich, Switzerland, 2004.

[3] E. İ. Telatar, “Capacity of multi-antenna Gaussian channels,” *Europ. Trans. Telecommun.*, vol. 10, no. 6, pp. 585–595, Nov.–Dec. 1999.

[4] A. Lapidoth, “On the high SNR capacity of stationary Gaussian fading channels,” in *Proc. 41st Allerton Conf. Commun., Contr. Comput.*, Monticello, IL, Oct. 1–3, 2003, pp. 410–419.

[5] T. Koch, “On the asymptotic capacity of multiple-input single-output fading channels with memory,” Master’s thesis, Signal and Inf. Proc. Lab., ETH Zurich, Zurich, Switzerland, 2004.

[6] A. Lapidoth, “On the asymptotic capacity of stationary Gaussian fading channels,” *IEEE Trans. Inf. Theory*, vol. 51, pp. 437–446, Feb. 2005.

[7] Y. Liang and V. V. Veeravalli, “Capacity of noncoherent time-selective Rayleigh-fading channels,” *IEEE Trans. Inf. Theory*, vol. 50, pp. 3095–3110, Dec. 2004.

[8] A. Lapidoth and S. M. Moser, “The fading number of single-input multiple-output fading channels with memory,” *IEEE Trans. Inf. Theory*, vol. 52, pp. 437–453, Feb. 2006.

[9] A. Lapidoth and S. M. Moser, “The expected logarithm of a noncentral chi-square random variable,” [Online]. Available: http://moser.cm.nctu.edu.tw/explog.html

[10] T. Koch and A. Lapidoth, “The fading number and degrees of freedom in non-coherent MIMO fading channels: a peace pipe,” in *Proc. IEEE Int. Symp. Information Theory*, Adelaide, Australia, Sep. 2005, pp. 661–665.

[11] A. Lapidoth, “The Fading Number and Degrees of Freedom: A Peace Pipe,” presented at the Shushan Purim, Israel, Mar. 27, 2005.

[12] T. Koch and A. Lapidoth, “Degrees of freedom in non-coherent stationary MIMO fading channels,” in *Proc. Winter School Coding Inf. Theory*, Bratislava, Slovakia, Feb. 20–25, 2005, pp. 91–97.