We start with a formal definition of stationarity.
Definition 2. A discrete-time stochastic process $\{V_k\}$ is called stationary if for every positive integer $n \in \mathbb{N}$, for every choice of integers $k_1, \ldots, k_n \in \mathbb{Z}$, and for every $\kappa \in \mathbb{Z}$ we have
$$\Pr[V_{k_1+\kappa} = v_1, \ldots, V_{k_n+\kappa} = v_n] = \Pr[V_{k_1} = v_1, \ldots, V_{k_n} = v_n], \quad \forall\, v_1, \ldots, v_n, \tag{1.12}$$
i.e., the probability distribution of any finite subset of random variables from the process does not change when it is shifted in time.
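To make the shift condition (1.12) concrete, the following minimal sketch checks it exactly for a two-state Markov chain. The transition matrix `P`, the distribution `PI`, and all function names are hypothetical choices made for illustration. The chain is indexed by nonnegative times only, so it checks nonnegative shifts rather than the two-sided condition of Definition 2.

```python
from itertools import product

# Hypothetical two-state chain used only for illustration.
P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities P[a][b] = Pr[next = b | now = a]
PI = [2 / 3, 1 / 3]            # stationary distribution of P (PI times P equals PI)

def mat_mul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(a, n):
    """n-th power of a square matrix, n >= 0."""
    result = [[1.0 if i == j else 0.0 for j in range(len(a))] for i in range(len(a))]
    for _ in range(n):
        result = mat_mul(result, a)
    return result

def joint_prob(init, times, values):
    """Pr[V_{k1} = v1, ..., V_{kn} = vn] for increasing times 0 <= k1 < ... < kn."""
    marginal = mat_mul([init], mat_pow(P, times[0]))[0]
    prob = marginal[values[0]]
    for i in range(len(times) - 1):
        step = mat_pow(P, times[i + 1] - times[i])
        prob *= step[values[i]][values[i + 1]]
    return prob

def max_shift_gap(init, times, shift):
    """Largest violation of (1.12) over all value tuples for the given shift."""
    shifted = [t + shift for t in times]
    return max(abs(joint_prob(init, shifted, v) - joint_prob(init, times, v))
               for v in product(range(2), repeat=len(times)))

print(max_shift_gap(PI, [0, 2, 5], 7))          # stationary start: rounding error only
print(max_shift_gap([1.0, 0.0], [0, 2, 5], 7))  # deterministic start: clearly nonzero
```

Started from `PI`, every joint probability depends only on the gaps between the time indices, which is what (1.12) demands. Started from a deterministic state, the marginal distribution drifts over time and the condition fails.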
In this report we will derive some properties of stationary processes and stationary channel models. Even though these results were derived with the main intention of learning more about MIMO fading channels, most of them are very general and in no way restricted to fading channels.
More precisely, we are going to derive the following results: In Section 3.1 we will show that, under weak conditions on the channel, a stationary channel has a capacity-achieving input distribution that is stationary. This seems very intuitive; however, we are not aware of any rigorous proof in the literature. There we will also introduce the concept of quasi-stationarity: a finite sequence of random variables is said to be quasi-stationary if any subset of random variables from the sequence has a probability distribution that does not change when it is shifted in time (within the range of the finite sequence). Hence, a quasi-stationary block of random variables is basically a “stationary” process of finite duration.
In Section 3.2 we investigate the entropy rates of stationary processes. It is a fundamental result of (discrete) entropy that an “average” entropy, called entropy rate, exists for stationary processes. Just as the entropy describes the uncertainty of a random variable, the entropy rate describes the uncertainty in a random process.
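As an illustration of the “average” entropy for a finite-alphabet process, the sketch below computes the per-symbol block entropy $H(V_1, \ldots, V_n)/n$ of a stationary two-state Markov chain by brute-force enumeration. It compares this with the conditional entropy $H(V_2 \mid V_1)$, which is the entropy rate of a stationary Markov chain. The chain parameters `P` and `PI` and the function names are hypothetical. Values are in nats.

```python
import math
from itertools import product

# Hypothetical two-state chain used only for illustration.
P = [[0.9, 0.1], [0.2, 0.8]]   # transition probabilities
PI = [2 / 3, 1 / 3]            # stationary distribution of P

def block_entropy(n):
    """Joint entropy H(V_1, ..., V_n) in nats, enumerating all 2**n paths."""
    total = 0.0
    for path in product(range(2), repeat=n):
        p = PI[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        total -= p * math.log(p)
    return total

def conditional_entropy():
    """H(V_2 | V_1) in nats for the chain started in its stationary distribution."""
    return -sum(PI[a] * P[a][b] * math.log(P[a][b])
                for a in range(2) for b in range(2))

for n in (1, 2, 4, 8, 12):
    print(n, block_entropy(n) / n)   # decreases towards the entropy rate
print("entropy rate", conditional_entropy())
```

For this chain the per-symbol block entropy decreases monotonically in `n`, and each additional symbol contributes exactly $H(V_2 \mid V_1)$ to the joint entropy.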
We will generalize the definition of entropy rate for finite-alphabet processes to differential entropy rates of continuous-alphabet processes and to even more complicated forms of conditional differential entropy rates. We will show that due to the stationarity assumption all limits still exist and these definitions make sense.
Finally, in Section 3.3 we investigate one particular mutual information term that intuitively should tend to zero, because from a practical point of view the memory of any process is expected to fade away as time tends to infinity. However, it turns out that this is rather difficult to prove. Again, the key to a mathematically correct derivation is stationarity (or, actually, quasi-stationarity).
The derivations and proofs for these statements can be found in Chapter 4, and we conclude in Chapter 5.
In the following chapter we give some more details about our notation and about the channel model used for fading channels.
Chapter 2
Definitions and Notation
2.1 Notation
As is by now fairly customary, we usually try to use upper-case letters for random quantities and lower-case letters for their realizations. This rule becomes awkward when dealing with matrices because matrices are usually written in upper case even if they are deterministic. To better differentiate between scalars, vectors, and matrices we have resorted to using different fonts for the different quantities. Upper-case letters such as $X$ are used to denote scalar random variables taking value in the reals $\mathbb{R}$ or in the complex plane $\mathbb{C}$. Their realizations are typically written in lower case, e.g., $x$. Random vectors in the $m$-dimensional complex Euclidean space $\mathbb{C}^m$ are described by bold-face capitals, e.g., $\mathbf{X}$; for their realizations we use bold lower case, e.g., $\mathbf{x}$. Deterministic matrices are denoted by upper-case letters but of a special font, e.g., $\mathsf{H}$; and random matrices are denoted using another special upper-case font, e.g., $\mathbb{H}$.
However, there will be a few exceptions to these rules. Since they are widely used in the literature, we will stick with the customary symbols $H(\cdot)$ for the entropy of a discrete random variable and $I(\cdot\,;\cdot)$ for the mutual information functional. Moreover, we have decided to use the capital $Q$ to denote the probability distribution of an input of a channel. In particular, $Q_X$ and $Q_{\mathbf{X}}$ denote the probability distribution of a random variable $X$ and a random vector $\mathbf{X}$, respectively. Given an alphabet $\mathcal{A}$ we denote the set of all probability distributions over $\mathcal{A}$ by $\mathcal{P}(\mathcal{A})$.
The capacity is denoted by C, the energy per symbol by E, and the signal-to-noise ratio is denoted by snr.
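We use the shorthand $H_a^b$ for $(H_a, H_{a+1}, \ldots, H_b)$. For more complicated expressions, such as
$$\big(H_a^{\mathsf{T}}\hat{\mathbf{x}}_a,\; H_{a+1}^{\mathsf{T}}\hat{\mathbf{x}}_{a+1},\; \ldots,\; H_b^{\mathsf{T}}\hat{\mathbf{x}}_b\big)$$
we use the dummy variable $\ell$ to clarify notation: $\{H_\ell^{\mathsf{T}}\hat{\mathbf{x}}_\ell\}_{\ell=a}^{b}$.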
The subscript k is reserved to denote discrete time. Curly brackets are used to distinguish between a random process and its manifestation at time k: {Xk} is a discrete random process over time, while Xk is the random variable of this process at time k.
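Hermitian conjugation is denoted by $(\cdot)^{\dagger}$, and $(\cdot)^{\mathsf{T}}$ stands for the transpose (without conjugation) of a matrix or vector. We use $\|\cdot\|$ to denote the Euclidean norm of vectors or the Euclidean operator norm of matrices. That is,
$$\|\mathbf{x}\| \triangleq \sqrt{\sum_{t=1}^{m} \big|x^{(t)}\big|^2}, \quad \mathbf{x} \in \mathbb{C}^m \tag{2.1}$$
$$\|\mathsf{A}\| \triangleq \max_{\|\hat{\mathbf{w}}\| = 1} \|\mathsf{A}\hat{\mathbf{w}}\|. \tag{2.2}$$
Thus, $\|\mathsf{A}\|$ is the maximal singular value of the matrix $\mathsf{A}$.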
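The Frobenius norm of matrices is denoted by $\|\cdot\|_{\mathrm{F}}$ and is given by the square root of the sum of the squared magnitudes of the elements of the matrix, i.e.,
$$\|\mathsf{A}\|_{\mathrm{F}} \triangleq \sqrt{\operatorname{tr}\big(\mathsf{A}^{\dagger}\mathsf{A}\big)} \tag{2.3}$$
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. Note that for every matrix $\mathsf{A}$
$$\|\mathsf{A}\| \le \|\mathsf{A}\|_{\mathrm{F}} \tag{2.4}$$
as can be verified by upper-bounding the squared magnitude of each of the components of $\mathsf{A}\hat{\mathbf{w}}$ using the Cauchy-Schwarz inequality.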
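The norms in (2.1) to (2.3) and the inequality (2.4) can be checked numerically. The sketch below is a pure-Python illustration with hypothetical function names. It approximates the operator norm by power iteration on $\mathsf{A}^{\dagger}\mathsf{A}$, which assumes that the fixed start vector is not orthogonal to the dominant right singular vector.

```python
import math

def euclid_norm(x):
    """Euclidean norm (2.1) of a complex vector given as a list."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def mat_vec(a, w):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in a]

def herm(a):
    """Hermitian conjugate (conjugate transpose) of a matrix."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def operator_norm(a, iters=500):
    """Approximate (2.2) by power iteration on A^dagger A (generic start assumed)."""
    a_h = herm(a)
    w = [complex(1.0, 0.1 * (j + 1)) for j in range(len(a[0]))]
    for _ in range(iters):
        w = mat_vec(a_h, mat_vec(a, w))
        n = euclid_norm(w)
        if n == 0.0:          # A maps the start vector to zero
            return 0.0
        w = [c / n for c in w]
    return euclid_norm(mat_vec(a, w))

def frobenius_norm(a):
    """Frobenius norm (2.3): root of the sum of squared magnitudes of all entries."""
    return math.sqrt(sum(abs(c) ** 2 for row in a for c in row))

A = [[3, 0], [0, 4j]]            # singular values 3 and 4
print(operator_norm(A), frobenius_norm(A))
```

For the diagonal example the operator norm is the larger singular value and the Frobenius norm is the root of the sum of both squared singular values, so (2.4) holds with strict inequality.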
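We will often split a complex vector $\mathbf{v} \in \mathbb{C}^m$ up into its magnitude $\|\mathbf{v}\|$ and its direction
$$\hat{\mathbf{v}} \triangleq \frac{\mathbf{v}}{\|\mathbf{v}\|} \tag{2.5}$$
where we reserve this notation exclusively for unit vectors, i.e., throughout this report every vector carrying a hat, $\hat{\mathbf{v}}$ or $\hat{\mathbf{V}}$, denotes a (deterministic or random, respectively) vector of unit length
$$\|\hat{\mathbf{v}}\| = \|\hat{\mathbf{V}}\| = 1. \tag{2.6}$$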
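A minimal sketch of the split in (2.5), assuming a nonzero vector stored as a Python list of complex numbers. The function name `split` is hypothetical.

```python
import math

def split(v):
    """Return (magnitude, direction) of a nonzero complex vector, as in (2.5)."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    if norm == 0.0:
        raise ValueError("the direction of the zero vector is undefined")
    return norm, [c / norm for c in v]

v = [3 + 4j, 12j]
norm, v_hat = split(v)
print(norm)                                          # magnitude of v
print(math.sqrt(sum(abs(c) ** 2 for c in v_hat)))    # unit length, cf. (2.6)
```

Multiplying the direction by the magnitude recovers the original vector, so no information is lost by the split.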
To be able to work with such direction vectors we shall need a differential entropy-like quantity for random vectors that take value on the unit sphere in $\mathbb{C}^m$. Note that with respect to a probability distribution over $\mathbb{C}^m$, the surface of the unit sphere in $\mathbb{C}^m$ has zero measure such that the corresponding differential entropy is undefined.
We therefore introduce a new probability space that only lives on the surface of the unit sphere in $\mathbb{C}^m$ and denote its measure by $\lambda$. If a random vector $\hat{\mathbf{V}}$ takes value in the unit sphere and has the density $p^{\lambda}_{\hat{\mathbf{V}}}(\hat{\mathbf{v}})$ with respect to $\lambda$, then we shall let
$$h_{\lambda}(\hat{\mathbf{V}}) \triangleq -\mathsf{E}\Big[\log p^{\lambda}_{\hat{\mathbf{V}}}(\hat{\mathbf{V}})\Big] \tag{2.7}$$
if the expectation is defined.
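We note that just as ordinary differential entropy is invariant under translation, so is $h_{\lambda}(\hat{\mathbf{V}})$ invariant under rotation. That is, if $\mathsf{U}$ is a deterministic unitary matrix, then
$$h_{\lambda}(\mathsf{U}\hat{\mathbf{V}}) = h_{\lambda}(\hat{\mathbf{V}}). \tag{2.8}$$
Also note that $h_{\lambda}(\hat{\mathbf{V}})$ is maximized if $\hat{\mathbf{V}}$ is uniformly distributed on the unit sphere, in which case
$$h_{\lambda}(\hat{\mathbf{V}}) = \log c_m \tag{2.9}$$
where $c_m$ denotes the surface area of the unit sphere in $\mathbb{C}^m$:
$$c_m = \frac{2\pi^m}{\Gamma(m)}. \tag{2.10}$$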
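The constant in (2.10) is easy to evaluate. The following sketch uses hypothetical function names and computes $c_m$ and the maximal value $\log c_m$ from (2.9) in nats. For $m = 1$ the unit sphere in $\mathbb{C}$ is the unit circle, so $c_1$ should equal its circumference $2\pi$.

```python
import math

def sphere_area(m):
    """Surface area c_m = 2 pi^m / Gamma(m) of the unit sphere in C^m, see (2.10)."""
    return 2.0 * math.pi ** m / math.gamma(m)

def max_direction_entropy(m):
    """log c_m in nats: the value (2.9) attained by the uniform distribution."""
    return math.log(sphere_area(m))

for m in (1, 2, 3):
    print(m, sphere_area(m), max_direction_entropy(m))
```

The values for $m = 1$ and $m = 2$ are $2\pi$ and $2\pi^2$, the familiar circumference of the unit circle and the surface area of the unit sphere in $\mathbb{R}^4$.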
The definition (2.7) can be easily extended to conditional entropies: if $\mathbf{W}$ is some random vector, and if conditional on $\mathbf{W} = \mathbf{w}$ the random vector $\hat{\mathbf{V}}$ has density $p^{\lambda}_{\hat{\mathbf{V}}}$
Based on these definitions we have the following lemma.
Lemma 3. Let $\mathbf{V}$ be a complex random vector taking value in $\mathbb{C}^m$ and having differential entropy $h(\mathbf{V})$. Let $\|\mathbf{V}\|$ denote its norm and $\hat{\mathbf{V}}$ denote its direction as in (2.5). Then whenever all the quantities in (2.12) and (2.13), respectively, are defined. Here $h(\|\mathbf{V}\|)$ is the differential entropy of $\|\mathbf{V}\|$ when viewed as a real (scalar) random variable.
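As a worked illustration of how norm and direction entropies combine, consider a special case that is an assumption of this sketch and is not made in the text: $\mathbf{V}$ is a zero-mean circularly symmetric complex Gaussian vector with identity covariance, so that its density is $\pi^{-m} e^{-\|\mathbf{v}\|^2}$ and $h(\mathbf{V}) = m \log(\pi e)$. Writing $R = \|\mathbf{V}\|$, the direction $\hat{\mathbf{V}}$ is then uniform on the unit sphere and independent of $R$. Using the polar volume element $r^{2m-1}\,\mathrm{d}r\,\mathrm{d}\hat{\mathbf{w}}$ together with (2.9) and (2.10), one obtains:

```latex
\begin{align*}
p_R(r) &= c_m \, r^{2m-1} \, \pi^{-m} e^{-r^2}
        = \frac{2\, r^{2m-1} e^{-r^2}}{\Gamma(m)}, \qquad r \ge 0, \\
h(R) &= -\mathsf{E}[\log p_R(R)]
      = -\log 2 + \log \Gamma(m) - (2m-1)\,\mathsf{E}[\log R] + \mathsf{E}[R^2], \\
\mathsf{E}[R^2] &= m, \qquad
h_\lambda(\hat{\mathbf{V}}) = \log c_m = \log 2 + m \log \pi - \log \Gamma(m), \\
h_\lambda(\hat{\mathbf{V}}) + h(R) + (2m-1)\,\mathsf{E}[\log R]
     &= m \log \pi + m = m \log (\pi e) = h(\mathbf{V}).
\end{align*}
```

In this example the entropy of the vector is the sum of the direction entropy, the norm entropy, and the correction term $(2m-1)\,\mathsf{E}[\log \|\mathbf{V}\|]$ that stems from the Jacobian of the polar change of variables.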
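Proof. This follows from a change of variables. Let $\mathbf{W}$ denote the real random vector in $\mathbb{R}^{2m}$ that consists of the real and imaginary parts of $\mathbf{V}$ stacked on top of each other. Then we define
$$R \triangleq \|\mathbf{W}\| \quad \text{and} \quad \hat{\mathbf{W}} \triangleq \frac{\mathbf{W}}{\|\mathbf{W}\|} \tag{2.14}$$
and note that the infinitesimal volume $\mathrm{d}\mathbf{w}$ in the $2m$-dimensional Euclidean space corresponds to $r^{2m-1}\,\mathrm{d}r\,\mathrm{d}\hat{\mathbf{w}}$, where $\mathrm{d}\hat{\mathbf{w}}$ denotes an infinitesimal area on the unit sphere in $\mathbb{R}^{2m}$. Hence, the joint probability densities can be written as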
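$p_R$ $\|\mathbf{v}\|$ complex Gaussian random vector of covariance matrix $\mathsf{E}\big[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^{\dagger}\big] = \mathsf{K}$.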
By X ∼ U ([a, b]) we denote a random variable that is uniformly distributed on the interval [a, b].
All rates specified in this report are in nats per channel use, i.e., log(·) denotes the natural logarithmic function. The abbreviation RHS stands for right-hand side and LHS stands for left-hand side.