Stationarity - Capacity Analysis of Joint Estimation and Detection in Various Multi-Antenna and Multi-User Systems

We start with a formal definition of stationarity.

Definition 2. A discrete-time stochastic process {V_k} is called stationary if for every positive integer n ∈ N, for every choice of integers k_1, …, k_n ∈ Z, and for every κ ∈ Z we have

$$\Pr[V_{k_1+\kappa}=v_1,\ldots,V_{k_n+\kappa}=v_n]=\Pr[V_{k_1}=v_1,\ldots,V_{k_n}=v_n],\qquad \forall\, v_1,\ldots,v_n, \tag{1.12}$$

i.e., the joint distribution of any finite subset of random variables from the process does not change when the subset is shifted in time.
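As a small, hypothetical illustration of Definition 2 (not an example from the report), consider V_k = s[(k + Θ) mod 4] with s = (1, 0, −1, 0), a sampled cosine with a random phase Θ. The sketch below computes the joint pmfs in (1.12) exactly by enumerating Θ and compares them across shifts κ. It only examines finitely many index tuples and shifts, so it can refute stationarity but cannot prove it; the pattern, horizon, and shift range are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

PATTERN = (1, 0, -1, 0)  # one period of a sampled cosine (illustrative choice)


def joint_pmf(times, phases):
    """Exact joint pmf of (V_t for t in times), where V_k = PATTERN[(k + theta) % 4]
    and theta is drawn uniformly from the list `phases`."""
    pmf = {}
    for theta in phases:
        key = tuple(PATTERN[(t + theta) % 4] for t in times)
        pmf[key] = pmf.get(key, 0) + Fraction(1, len(phases))
    return pmf


def looks_stationary(phases, max_n=3, horizon=6, shifts=range(-8, 9)):
    """Check (1.12) for all index tuples (k_1, ..., k_n) in [0, horizon)^n with
    n <= max_n and all shifts kappa in `shifts`. A finite check, not a proof."""
    for n in range(1, max_n + 1):
        for ks in product(range(horizon), repeat=n):
            base = joint_pmf(ks, phases)
            for kappa in shifts:
                if joint_pmf([k + kappa for k in ks], phases) != base:
                    return False
    return True


print(looks_stationary([0, 1, 2, 3]))  # uniform random phase
print(looks_stationary([0]))           # fixed phase
```

With Θ uniform on {0, 1, 2, 3}, every shift merely relabels the phase, so the check passes; with a fixed phase, V_0 = 1 but V_1 = 0, so already the single-variable marginals differ.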

In this report we derive some properties of stationary processes and stationary channel models. Although these results were derived mainly to learn more about MIMO fading channels, most of them are very general and are in no way restricted to fading channels.

More precisely, we derive the following results. In Section 3.1 we show that, under weak conditions on the channel, a stationary channel has a capacity-achieving input distribution that is stationary. This seems very intuitive; however, we are not aware of any rigorous proof in the literature. There we also introduce the concept of quasi-stationarity: a finite sequence of random variables is said to be quasi-stationary if the probability distribution of any subset of random variables from the sequence does not change when the subset is shifted in time (within the range of the finite sequence). Hence, a quasi-stationary block of random variables is essentially a “stationary” process of finite duration.
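For a small finite block with a known joint pmf, the quasi-stationarity condition can be checked mechanically. The brute-force sketch below assumes the pmf is given as a Python dict mapping outcome tuples to exact `Fraction` probabilities; the two-state Markov chain with transition matrix `P` is a hypothetical example, not a channel from the report. The check compares the marginal of every index subset with each of its shifts that stays inside the block.

```python
from fractions import Fraction as F
from itertools import combinations, product


def marginal(pmf, idx):
    """Marginal pmf of the coordinates in idx; zero-probability entries are dropped."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return {k: v for k, v in out.items() if v != 0}


def is_quasi_stationary(pmf, n):
    """True if every subset of (X_0, ..., X_{n-1}) keeps its joint distribution
    under every time shift that stays inside the block."""
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            base = marginal(pmf, idx)
            for s in range(1, n - idx[-1]):  # keeps idx[-1] + s <= n - 1
                if marginal(pmf, tuple(i + s for i in idx)) != base:
                    return False
    return True


# Hypothetical two-state Markov chain (illustration only).
P = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]


def markov_block(init, n=3):
    """Joint pmf of (X_0, ..., X_{n-1}) for the chain started with X_0 ~ init."""
    pmf = {}
    for x in product(range(2), repeat=n):
        p = init[x[0]]
        for a, b in zip(x, x[1:]):
            p *= P[a][b]
        pmf[x] = p
    return pmf


stationary_start = markov_block([F(2, 3), F(1, 3)])  # (2/3, 1/3) solves pi = pi P
fixed_start = markov_block([F(1), F(0)])
print(is_quasi_stationary(stationary_start, 3), is_quasi_stationary(fixed_start, 3))
```

Started from its stationary distribution, the three-sample block is quasi-stationary; started deterministically in state 0 it is not, because X_0 and X_1 already have different marginals.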

In Section 3.2 we investigate the entropy rates of stationary processes. A fundamental result for discrete (finite-alphabet) processes is that an “average” entropy, called the entropy rate, exists for every stationary process. Just as the entropy describes the uncertainty of a random variable, the entropy rate describes the uncertainty per time step of a random process.
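To give the entropy rate a concrete feel, the sketch below uses a hypothetical stationary two-state Markov chain (not an example from the report) and evaluates H(X_1, …, X_n)/n by brute-force enumeration, together with the conditional entropy H(X_n | X_1, …, X_{n−1}) = H(X_1, …, X_n) − H(X_1, …, X_{n−1}). For a chain started in its stationary distribution, both should approach the closed-form rate H(X_2 | X_1). Entropies are in nats, matching the report's use of natural logarithms.

```python
import math
from itertools import product

# Hypothetical stationary two-state Markov chain (illustration only).
P = [[0.9, 0.1], [0.2, 0.8]]
pi = [2 / 3, 1 / 3]  # stationary distribution: solves pi = pi P


def block_entropy(n):
    """H(X_1, ..., X_n) in nats, by enumerating all 2**n sequences (n >= 1)."""
    h = 0.0
    for x in product(range(2), repeat=n):
        p = pi[x[0]]
        for a, b in zip(x, x[1:]):
            p *= P[a][b]
        h -= p * math.log(p)
    return h


# Entropy rate of a stationary Markov chain: H(X_2 | X_1).
rate = -sum(pi[i] * P[i][j] * math.log(P[i][j]) for i in range(2) for j in range(2))

for n in (1, 2, 4, 8, 12):
    per_symbol = block_entropy(n) / n  # nonincreasing in n for a stationary process
    conditional = block_entropy(n) - block_entropy(n - 1) if n > 1 else per_symbol
    print(n, round(per_symbol, 4), round(conditional, 4), round(rate, 4))
```

Here H(X_1, …, X_n)/n decreases toward the rate from above, while the conditional entropy equals the rate for every n ≥ 2 because of the Markov property.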

We will generalize the definition of the entropy rate from finite-alphabet processes to differential entropy rates of continuous-alphabet processes, and to even more complicated forms of conditional differential entropy rates. We will show that, owing to the stationarity assumption, all the limits involved still exist, so these definitions are meaningful.
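As a continuous-alphabet example (an illustrative assumption, not a model from the report), take the stationary AR(1) process X_k = a X_{k−1} + W_k with real |a| < 1 and i.i.d. circularly symmetric complex Gaussian innovations W_k of variance σ². Its block differential entropy is log det(πe K_n), where K_n is the Toeplitz covariance matrix, and its differential entropy rate should equal the innovation entropy log(πe σ²). The sketch evaluates h(X_1, …, X_n)/n numerically with NumPy.

```python
import numpy as np


def ar1_block_entropy(n, a=0.8, sigma2=1.0):
    """h(X_1, ..., X_n) in nats for the stationary complex Gaussian AR(1) process
    X_k = a X_{k-1} + W_k, W_k i.i.d. circularly symmetric Gaussian, variance sigma2."""
    var_x = sigma2 / (1 - a ** 2)  # stationary variance of X_k
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    K = var_x * a ** lags  # Toeplitz covariance E[X_i X_j^*]
    sign, logdet = np.linalg.slogdet(np.pi * np.e * K)
    return logdet  # h = log det(pi e K) for a circularly symmetric Gaussian vector


a, sigma2 = 0.8, 1.0
rate = np.log(np.pi * np.e * sigma2)  # innovation entropy h(W_k)
for n in (1, 2, 10, 100):
    print(n, ar1_block_entropy(n, a, sigma2) / n, rate)
```

For this process h(X_1, …, X_n) = log(πe σ_X²) + (n − 1) log(πe σ²) with σ_X² = σ²/(1 − a²), so the per-sample entropy approaches the rate like 1/n.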

Finally, in Section 3.3 we investigate one particular mutual-information term that intuitively should tend to zero: from a practical point of view, the memory of any process should fade away as time tends to infinity. However, this turns out to be rather difficult to prove. Again, the key to a mathematically correct derivation is stationarity (or, more precisely, quasi-stationarity).

The derivations and proofs for these statements can be found in Chapter 4, and we conclude in Chapter 5.

In the next chapter we give more details about our notation and about the channel model used for fading channels.

Chapter 2

Definitions and Notation

2.1 Notation

As is by now fairly customary, we try to use upper-case letters for random quantities and lower-case letters for their realizations. This rule becomes awkward for matrices, because matrices are usually written in upper case even if they are deterministic. To better distinguish between scalars, vectors, and matrices, we use different fonts for the different quantities. Upper-case letters such as X denote scalar random variables taking values in the reals R or in the complex plane C. Their realizations are typically written in lower case, e.g., x. Random vectors in the m-dimensional complex Euclidean space C^m are denoted by boldface capitals, e.g., X; for their realizations we use bold lower-case letters, e.g., x. Deterministic matrices are denoted by upper-case letters in a special font, e.g., H, and random matrices by another special upper-case font, e.g., H.

However, there are a few exceptions to these rules. Since they are widely used in the literature, we keep the customary notation H(·) for the entropy of a discrete random variable and I(·;·) for the mutual information functional. Moreover, we use the capital Q to denote the probability distribution of a channel input. In particular, $Q_X$ and $Q_{\mathbf{X}}$ denote the probability distribution of a random variable $X$ and of a random vector $\mathbf{X}$, respectively. Given an alphabet A, we denote the set of all probability distributions over A by P(A).

The capacity is denoted by C, the energy per symbol by E, and the signal-to-noise ratio by snr.

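We use the shorthand $H_a^b$ for $(H_a, H_{a+1}, \ldots, H_b)$. For more complicated expressions, such as

$$\bigl(H_a^{\mathsf T}\hat{\mathbf{x}}_a,\; H_{a+1}^{\mathsf T}\hat{\mathbf{x}}_{a+1},\; \ldots,\; H_b^{\mathsf T}\hat{\mathbf{x}}_b\bigr),$$

we use the dummy variable ℓ to clarify notation: $\{H_\ell^{\mathsf T}\hat{\mathbf{x}}_\ell\}_{\ell=a}^{b}$.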
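In array code this shorthand maps onto slicing with one pitfall: $H_a^b$ includes both endpoints, whereas a Python slice excludes the upper one. The sketch below uses hypothetical NumPy arrays `H` (a stack of m × m matrices indexed by time) and `x_hat` (a stack of unit vectors) purely to illustrate the notation; (·)^T is the plain transpose, without conjugation.

```python
import numpy as np

# Hypothetical data: n_times matrices H[k] (m x m) and unit vectors x_hat[k].
rng = np.random.default_rng(0)
n_times, m = 10, 3
H = rng.standard_normal((n_times, m, m)) + 1j * rng.standard_normal((n_times, m, m))
x = rng.standard_normal((n_times, m)) + 1j * rng.standard_normal((n_times, m))
x_hat = x / np.linalg.norm(x, axis=1, keepdims=True)  # each row has unit norm


def window(seq, a, b):
    """H_a^b = (H_a, H_{a+1}, ..., H_b): both endpoints included."""
    return seq[a:b + 1]


def transposed_products(H, x_hat, a, b):
    """{H_l^T x_hat_l}_{l=a}^{b}, using the plain transpose (no conjugation)."""
    return np.stack([H[l].T @ x_hat[l] for l in range(a, b + 1)])


print(window(H, 2, 5).shape, transposed_products(H, x_hat, 2, 5).shape)
```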

The subscript k is reserved to denote discrete time. Curly brackets distinguish between a random process and its value at time k: {X_k} is a discrete-time random process, while X_k is the random variable of this process at time k.

Hermitian conjugation is denoted by (·)^†, and (·)^T stands for the transpose (without conjugation) of a matrix or vector. We use ‖·‖ to denote the Euclidean norm of vectors or the Euclidean operator norm of matrices. That is,

$$\|\mathbf{x}\| \triangleq \sqrt{\sum_{t=1}^{m} \bigl|x^{(t)}\bigr|^2}, \qquad \mathbf{x} \in \mathbb{C}^m \tag{2.1}$$

$$\|A\| \triangleq \max_{\|\hat{\mathbf{w}}\| = 1} \|A\hat{\mathbf{w}}\|. \tag{2.2}$$

Thus, ‖A‖ is the maximal singular value of the matrix A.
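The two norms can be cross-checked numerically. In the sketch below (a hypothetical random complex matrix `A`, NumPy assumed available), (2.1) is evaluated directly, the maximum in (2.2) is attained by the top right singular vector from the SVD, and random unit vectors never exceed that value, consistent with ‖A‖ being the maximal singular value.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))


def euclid_norm(x):
    """Euclidean norm (2.1): square root of the sum of squared magnitudes."""
    return np.sqrt(np.sum(np.abs(x) ** 2))


# Operator norm (2.2) via the SVD A = U diag(s) V^dagger, s sorted in descending order.
_, s, Vh = np.linalg.svd(A)
w_star = Vh[0].conj()  # first column of V: the maximizing unit vector
op_norm = euclid_norm(A @ w_star)

# Random unit vectors (columns of W) never do better than w_star.
W = rng.standard_normal((3, 2000)) + 1j * rng.standard_normal((3, 2000))
W = W / np.linalg.norm(W, axis=0)
gains = np.linalg.norm(A @ W, axis=0)
print(op_norm, s[0], gains.max())
```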

The Frobenius norm of matrices is denoted by ‖·‖_F and is given by the square root of the sum of the squared magnitudes of the elements of the matrix, i.e.,

$$\|A\|_F \triangleq \sqrt{\operatorname{tr}\bigl(A^\dagger A\bigr)} \tag{2.3}$$

where tr(·) denotes the trace of a matrix. Note that for every matrix A

$$\|A\| \le \|A\|_F \tag{2.4}$$

as can be verified by upper-bounding the squared magnitude of each of the components of $A\hat{\mathbf{w}}$ using the Cauchy-Schwarz inequality.
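The Cauchy-Schwarz step behind (2.4) can be written out row by row. Let $\mathbf{a}_i^{\mathsf T}$ denote the $i$-th row of $A$ and let $\hat{\mathbf{w}}$ be any unit vector:

```latex
\begin{align*}
\|A\hat{\mathbf{w}}\|^2
  &= \sum_i \bigl|\mathbf{a}_i^{\mathsf T}\hat{\mathbf{w}}\bigr|^2
   \le \sum_i \|\mathbf{a}_i\|^2\,\|\hat{\mathbf{w}}\|^2 \\
  &= \sum_i \sum_j |a_{ij}|^2
   = \operatorname{tr}\bigl(A^\dagger A\bigr) = \|A\|_F^2 .
\end{align*}
```

Maximizing the left-hand side over all unit vectors $\hat{\mathbf{w}}$ gives $\|A\|^2 \le \|A\|_F^2$, which is (2.4).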

We will often split a complex vector v ∈ C^m into its magnitude ‖v‖ and its direction

$$\hat{\mathbf{v}} \triangleq \frac{\mathbf{v}}{\|\mathbf{v}\|} \tag{2.5}$$

where we reserve this notation exclusively for unit vectors, i.e., throughout this report every vector carrying a hat, $\hat{\mathbf{v}}$ or $\hat{\mathbf{V}}$, denotes a (deterministic or random, respectively) vector of unit length

$$\|\hat{\mathbf{v}}\| = \|\hat{\mathbf{V}}\| = 1. \tag{2.6}$$

To work with such direction vectors we need a differential-entropy-like quantity for random vectors that take values on the unit sphere in C^m. Note that the unit sphere in C^m has zero Lebesgue measure, so a random vector taking values on it has no density on C^m, and its ordinary differential entropy is undefined.

We therefore introduce the surface-area measure on the unit sphere in C^m and denote it by λ. If a random vector $\hat{\mathbf{V}}$ takes values on the unit sphere and has density $p^{\lambda}_{\hat{\mathbf{V}}}(\hat{\mathbf{v}})$ with respect to λ, then we let

$$h_\lambda(\hat{\mathbf{V}}) \triangleq -\mathsf{E}\Bigl[\log p^{\lambda}_{\hat{\mathbf{V}}}(\hat{\mathbf{V}})\Bigr] \tag{2.7}$$

if the expectation is defined.

We note that just as ordinary differential entropy is invariant under translation, so is $h_\lambda(\hat{\mathbf{V}})$ invariant under rotation. That is, if U is a deterministic unitary matrix, then

$$h_\lambda(U\hat{\mathbf{V}}) = h_\lambda(\hat{\mathbf{V}}). \tag{2.8}$$

Also note that $h_\lambda(\hat{\mathbf{V}})$ is maximized if $\hat{\mathbf{V}}$ is uniformly distributed on the unit sphere, in which case

$$h_\lambda(\hat{\mathbf{V}}) = \log c_m \tag{2.9}$$

where $c_m$ denotes the surface area of the unit sphere in $\mathbb{C}^m$:

$$c_m = \frac{2\pi^m}{\Gamma(m)}. \tag{2.10}$$
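Formula (2.10) can be sanity-checked against the volume of the unit ball. Since C^m is R^{2m}, the surface area of the unit sphere is the derivative of the ball volume v·r^{2m} at r = 1, i.e., 2m times the volume of the unit ball. The sketch below compares c_m with a Monte Carlo estimate of that quantity (sample size and seed are arbitrary) and also prints log c_m, the value of h_λ for a uniformly distributed direction.

```python
import math
import random


def sphere_area(m):
    """c_m = 2 pi^m / Gamma(m): surface area of the unit sphere in C^m, i.e. in R^(2m)."""
    return 2 * math.pi ** m / math.gamma(m)


def ball_volume_mc(m, samples=100_000, seed=0):
    """Monte Carlo estimate of the volume of the unit ball in R^(2m)."""
    rng = random.Random(seed)
    d = 2 * m
    hits = sum(
        1 for _ in range(samples)
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0
    )
    return 2 ** d * hits / samples  # the cube [-1, 1]^d has volume 2^d


for m in (1, 2, 3):
    # Area = d/dr [vol(1) * r^(2m)] at r = 1 = 2m * vol(1).
    print(m, sphere_area(m), 2 * m * ball_volume_mc(m), math.log(sphere_area(m)))
```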

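The definition (2.7) can easily be extended to conditional entropies: if W is some random vector, and if, conditional on W = w, the random vector $\hat{\mathbf{V}}$ has density $p^{\lambda}_{\hat{\mathbf{V}}|\mathbf{W}}(\cdot\,|\,\mathbf{w})$ with respect to λ, then we let

$$h_\lambda(\hat{\mathbf{V}}\,|\,\mathbf{W}) \triangleq -\mathsf{E}\Bigl[\log p^{\lambda}_{\hat{\mathbf{V}}|\mathbf{W}}(\hat{\mathbf{V}}\,|\,\mathbf{W})\Bigr] \tag{2.11}$$

if the expectation is defined.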

Based on these definitions we have the following lemma.

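Lemma 3. Let V be a complex random vector taking values in C^m and having differential entropy h(V). Let ‖V‖ denote its norm and $\hat{\mathbf{V}}$ its direction as in (2.5). Then

$$h(\mathbf{V}) = h(\|\mathbf{V}\|) + h_\lambda\bigl(\hat{\mathbf{V}}\,\big|\,\|\mathbf{V}\|\bigr) + (2m-1)\,\mathsf{E}\bigl[\log\|\mathbf{V}\|\bigr] \tag{2.12}$$

$$h(\mathbf{V}) = h(\|\mathbf{V}\|^2) + h_\lambda\bigl(\hat{\mathbf{V}}\,\big|\,\|\mathbf{V}\|\bigr) + (m-1)\,\mathsf{E}\bigl[\log\|\mathbf{V}\|^2\bigr] - \log 2 \tag{2.13}$$

whenever all the quantities in (2.12) and (2.13), respectively, are defined. Here h(‖V‖) is the differential entropy of ‖V‖ when viewed as a real (scalar) random variable.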

Proof. This follows from a change of variables. Let W denote the real random vector in R^{2m} that consists of the real and imaginary parts of V stacked on top of each other. Then we define

$$R \triangleq \|\mathbf{W}\| \quad\text{and}\quad \hat{\mathbf{W}} \triangleq \frac{\mathbf{W}}{\|\mathbf{W}\|} \tag{2.14}$$

and note that the infinitesimal volume $d\mathbf{w}$ in the 2m-dimensional Euclidean space corresponds to $dr \cdot r^{2m-1}\,d\hat{\mathbf{w}}$, where $d\hat{\mathbf{w}}$ denotes an infinitesimal area on the unit sphere in $\mathbb{R}^{2m}$. Hence, the joint probability densities can be written as

…

… complex Gaussian random vector of covariance matrix $\mathsf{E}\bigl[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^\dagger\bigr] = K$.
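The change of variables dw = dr · r^{2m−1} dŵ yields the polar identity h(V) = h(‖V‖) + h_λ(V̂ | ‖V‖) + (2m − 1) E[log ‖V‖]. As a closed-form check, assume V is circularly symmetric complex Gaussian with identity covariance I_m (an illustrative choice). Then h(V) = m log(πe); the direction is uniform and independent of the norm, so h_λ(V̂ | ‖V‖) = log c_m; ‖V‖² is Gamma(m, 1) distributed, so E[log ‖V‖] = ψ(m)/2; and ‖V‖ is a chi variable with 2m degrees of freedom scaled by 1/√2. The sketch evaluates both sides from these standard facts, computing the digamma function ψ for integer m as ψ(m) = −γ + Σ_{k<m} 1/k.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m: -gamma + sum_{k=1}^{m-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, m))


def check_polar_entropy(m):
    """Both sides of h(V) = h(|V|) + h_lambda(V_hat | |V|) + (2m-1) E[log |V|]
    for V circularly symmetric complex Gaussian with covariance I_m (nats)."""
    psi = digamma_int(m)
    lhs = m * math.log(math.pi * math.e)  # h(V) = log det(pi e I_m)
    # Chi distribution with k = 2m: lgamma(k/2) + (k - log 2 - (k-1) psi(k/2)) / 2.
    h_chi = math.lgamma(m) + 0.5 * (2 * m - math.log(2) - (2 * m - 1) * psi)
    h_norm = h_chi - 0.5 * math.log(2)  # scaling by 1/sqrt(2) subtracts log sqrt(2)
    h_dir = math.log(2 * math.pi ** m / math.gamma(m))  # uniform direction: log c_m
    e_log_norm = psi / 2  # E[log |V|] = E[log |V|^2] / 2 = psi(m) / 2
    rhs = h_norm + h_dir + (2 * m - 1) * e_log_norm
    return lhs, rhs


for m in range(1, 5):
    print(m, *check_polar_entropy(m))
```

All digamma terms cancel, which is why both sides agree to rounding error for every m.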

By X ∼ U ([a, b]) we denote a random variable that is uniformly distributed on the interval [a, b].

All rates specified in this report are in nats per channel use, i.e., log(·) denotes the natural logarithmic function. The abbreviation RHS stands for right-hand side and LHS stands for left-hand side.

From the document Capacity Analysis of Joint Estimation and Detection in Various Multi-Antenna and Multi-User Systems (pp. 11-15)