Examples - Synthesis and Decentralized Detection of Self-Similar Traffic

To discuss the accuracy of the upper and lower bounds of the minimum mutual information function, we first consider the simple case in which both random variables X and Y are binary, each taking values in {0, 1}. In this case, given the correlation coefficient ρ and the marginal means a = E[X] and b = E[Y], one can determine the joint distribution of X and Y, i.e.,

P_X,Y(0, 0) = (1 − a)(1 − b) + r
P_X,Y(0, 1) = (1 − a)b − r
P_X,Y(1, 0) = a(1 − b) − r
P_X,Y(1, 1) = ab + r

where r = E[XY] − E[X]E[Y] = ρ[a(1 − a)b(1 − b)]^{1/2}. The mutual information I(X; Y) can be written as

I(X; Y) = H_b(b) − a H_b(b + r/a) − (1 − a) H_b(1 − b + r/(1 − a)),

where H_b(b) = −b log(b) − (1 − b) log(1 − b) is the binary entropy function. Thus, in the binary case, I_min(S_ρ) = I(X; Y). We then take uniform marginal distributions as an example, i.e., a = 1/2 and b = 1/2, and obtain D_min(T_ρ, P_X, P_Y) = ρ tanh⁻¹(ρ) + (1/2) log(1 − ρ²) = I(X; Y).

Therefore, the lower bound used in the proof coincides with the minimum mutual information function. Notably, the simple binary case has already been examined in [28].
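The binary case can be checked numerically. The following is a minimal sketch, assuming natural logarithms (as the tanh⁻¹ expression suggests); the function names are illustrative and do not come from the text. It builds the joint distribution from a, b and ρ, computes I(X; Y) directly from the joint probabilities, and compares the result with the closed-form expression for a = b = 1/2.

```python
import math

def joint_pmf(a, b, rho):
    """Joint pmf of binary (X, Y) with means a, b and correlation coefficient rho."""
    # r is the covariance E[XY] - E[X]E[Y]
    r = rho * math.sqrt(a * (1 - a) * b * (1 - b))
    return {
        (0, 0): (1 - a) * (1 - b) + r,
        (0, 1): (1 - a) * b - r,
        (1, 0): a * (1 - b) - r,
        (1, 1): a * b + r,
    }

def mutual_information(pmf):
    """I(X;Y) in nats, computed directly from the joint pmf."""
    px = {x: pmf[(x, 0)] + pmf[(x, 1)] for x in (0, 1)}
    py = {y: pmf[(0, y)] + pmf[(1, y)] for y in (0, 1)}
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in pmf.items() if p > 0)

def closed_form_uniform(rho):
    """rho * atanh(rho) + 0.5 * log(1 - rho^2), the a = b = 1/2 expression."""
    return rho * math.atanh(rho) + 0.5 * math.log(1 - rho ** 2)

rho = 0.3
direct = mutual_information(joint_pmf(0.5, 0.5, rho))
print(direct, closed_form_uniform(rho))
```

The two printed values should agree up to floating-point error for any |ρ| < 1.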

A good example that meets the boundedness assumption of our theorem is the Morgenstern distribution [31], which has the density

p(x, y) = 1 + α(2x − 1)(2y − 1),

where 0 ≤ x, y ≤ 1, and its correlation coefficient equals C_x,y = α/3. Its asymptotic mutual information with respect to the correlation coefficient can be obtained easily as:

I(X; Y) = α²/18 + α⁴/300 + o(α⁵) = ρ²/2 + o(ρ²).
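The expansion can be compared against a direct numerical integration. The sketch below assumes the standard Morgenstern form p(x, y) = 1 + α(2x − 1)(2y − 1) on the unit square (which integrates to one and has uniform marginals, so that I(X; Y) reduces to the integral of p log p) and natural logarithms. The midpoint rule, the grid size and the function names are illustrative choices, not taken from the text.

```python
import math

def morgenstern_mi(alpha, n=400):
    """Midpoint-rule estimate of I(X;Y) in nats for p(x,y) = 1 + alpha(2x-1)(2y-1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = 2 * (i + 0.5) * h - 1          # 2x - 1 at the cell midpoint
        for j in range(n):
            v = 2 * (j + 0.5) * h - 1      # 2y - 1 at the cell midpoint
            p = 1 + alpha * u * v
            total += p * math.log(p)       # marginals are uniform, so I = E[log p]
    return total * h * h

def series(alpha):
    """Two-term expansion alpha^2/18 + alpha^4/300."""
    return alpha ** 2 / 18 + alpha ** 4 / 300

alpha = 0.5
numeric = morgenstern_mi(alpha)
rho = alpha / 3
print(numeric, series(alpha), rho ** 2 / 2)
```

For moderate α the numerical value and the two-term series should differ only by the neglected higher-order terms, and the leading term equals ρ²/2 with ρ = α/3.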

An example that can be used to show that I_min(S_ρ) is indeed a lower bound to the mutual information of P_X,Y ∈ S_ρ is the bivariate density p(x, y) = p_{Y|X}(y|x) p_X(x), where p_X(x) = (1/(2a)) · 1[|x| ≤ a] and p_{Y|X}(y|x) = (1/(2b)) · 1[|y − αx| ≤ b], which together define a uniform diagonal strip. The asymptotic mutual information of the uniform diagonal strip can be derived easily from [31] as |ρ|/2 − |ρ|³/4 + o(|ρ|⁴). This indicates that in some situations, I(X; Y) > I_min(S_ρ).

The validity of the theorem statement can be extended to the (unbounded) case in which P_X and P_Y are both Gaussian. In this case, the minimum value of mutual information can be achieved by a jointly Gaussian distributed Q_X,Y. One can derive that for Gaussian P_X and P_Y, I_min(S_ρ) = −(1/2) log(1 − ρ²). The lower bound, however, is given by:

D_min(T_ρ, P_X, P_Y) = −1/2 + (1/4 + ρ²)^{1/2} + (1/2) log[(−1/2 + (1/4 + ρ²)^{1/2}) / ρ²],

and is smaller than the simple quadratic approximation I_min(S_ρ) ≈ ρ²/2. In addition, the “upper bound” ρ²/2 + ρ⁴ used in the proof may become smaller than I_min(S_ρ) at large |ρ|. Since we only use the upper bound under |ρ| ≪ 1, we would not expect it to be useful outside the concerned range.
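These relations can be illustrated numerically. The sketch below assumes natural logarithms and assumes that the lower bound reads D_min = −1/2 + (1/4 + ρ²)^{1/2} + (1/2) log[(−1/2 + (1/4 + ρ²)^{1/2}) / ρ²]; the function names and the sample values of ρ are illustrative only.

```python
import math

def i_min_gaussian(rho):
    """Minimum mutual information for Gaussian marginals: -0.5 * log(1 - rho^2)."""
    return -0.5 * math.log(1 - rho ** 2)

def d_min(rho):
    """Lower bound D_min(T_rho, P_X, P_Y) as assumed in the lead-in (rho != 0)."""
    s = math.sqrt(0.25 + rho ** 2)
    return -0.5 + s + 0.5 * math.log((s - 0.5) / rho ** 2)

def quadratic(rho):
    """Quadratic approximation rho^2 / 2."""
    return rho ** 2 / 2

def upper(rho):
    """The "upper bound" rho^2 / 2 + rho^4 used for small |rho|."""
    return rho ** 2 / 2 + rho ** 4

for rho in (0.1, 0.5, 0.9, 0.99):
    print(rho, d_min(rho), quadratic(rho), i_min_gaussian(rho), upper(rho))
```

At the sampled points the lower bound stays below ρ²/2, which in turn stays below I_min(S_ρ), while ρ²/2 + ρ⁴ exceeds I_min(S_ρ) at ρ = 0.5 but falls below it at ρ = 0.99.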

[Figure: I_min(S_ρ), D_min(T_ρ, P_X, P_Y), ρ²/2 and I(J_X,Y) plotted against ρ from −1 to 1; vertical axis from 0 to 1.6.]

Figure 3.1: The bounds and minimum mutual information for Gaussian distributed P_X and P_Y.

Chapter 4 Bayesian Decentralized Detection for Exponential Distributions

A decentralized detection system consists of n sensors, sometimes geographically dispersed, and a remote fusion center. Each sensor observes a phenomenon (often modeled as a random variable X_i), summarizes it into a single bit u_i, and then transmits u_i to the fusion center without cooperating with the other sensors. Based on the received {u_i}_{i=1}^n, the fusion center determines whether {X_i}_{i=1}^n are drawn from the null distribution P(·|H₀) or the alternative distribution P(·|H₁).

Decentralized detection, despite its simple setting and more than two decades of extensive study, still has many unsolved issues at the fundamental level. One of them concerns the globally optimal strategy for the design of the sensors and the fusion center. The difficulty has several sources. First, only necessary conditions for the optimal strategy are known; hence, one has to search over all solutions to the equations of the necessary conditions in order to determine the global optimum. Moreover, these equations are coupled and nonlinear, and solving them has been shown to be hard [32]. So little is known about the globally optimal strategy that there are almost no analytical results for systems with more than two sensors.

The asymptotic results, however, are more encouraging: the system with identical sensors has the same error exponents as the optimal system [37], and the ratio of the error probabilities of these two systems has been shown to be bounded from above and from below [36]. Yet exact analytical results for systems with a finite number n > 2 of sensors are still absent, although such results would give more insight into the global optimum than the asymptotic results do.

In this chapter, we analyze the decentralized classification problem for exponential sources for n > 2, and validate an intuition that the optimal system is the system with identical sensors. To our knowledge, there is no similar analytical result for the global optimum for the system with more than two sensors.

4.1 Preliminaries

Definition 4.1. If X is a random variable with an exponential distribution, then the probability that X is greater than some number x is given by

1 − F_X(x) = Pr(X > x) = e^{−αx}

for x ≥ 0, where α is a positive parameter, and F_X(x) is the cumulative distribution function (CDF) of X.

It follows that the probability density function (PDF) of an exponential distribution has the form

f_X(x) = αe^{−αx}, for x ≥ 0.

In this chapter, we consider the following binary hypothesis testing problem for exponential distributions:

H₁ : f_X(x_i) = βe^{−βx_i}

versus variable X_i at the i-th sensor. We assume that {X_i}_{i=1}^n form a set of independent and identically distributed (i.i.d.) random variables. The prior probabilities of H₁ and H₀ are denoted as r₁ and r₀, or simply r and 1 − r, respectively. For a fixed fusion rule, it is known that the optimal local decision rule for each sensor is the local likelihood ratio test (LLRT), i.e., probability and the false alarm probability for the i-th sensor, where

P_D(λ_i) = Prob(u_i = 1|H₁) = 1

PF, i.e.,

where we abuse the notation and let P_F(i) and P_D(P_F(i)) represent the false alarm probability and the detection probability of the i-th sensor, respectively. The graph consisting of all (P_F, P_D) pairs is referred to as the receiver operating characteristic (ROC) curve, which is identical for all sensors since the statistics of their observations are the same.

The sensors transmit their decisions {u_i}_{i=1}^n to the fusion center, which makes the final decision u₀, where u₀ = ℓ when the fusion center favors H_ℓ. Once the fusion rule is fixed, we can evaluate the system detection probability Q_D(λ₁, …, λ_n) = Prob(u₀ = 1|H₁), the system false alarm probability Q_F(λ₁, …, λ_n) = Prob(u₀ = 1|H₀), and the probability of error P_e^(n)(λ₁, …, λ_n) = r(1 − Q_D(λ₁, …, λ_n)) + (1 − r)Q_F(λ₁, …, λ_n) as functions of the local thresholds λ₁, …, λ_n.

It is known from classical detection theory that the fusion center should make the overall decision u₀ based on the likelihood ratio test of the received u₁, u₂, …, u_n. Therefore, the error probability can be expressed as

P_e⁽ⁿ⁾(λ1, · · · , λn) = X

The above formula, however, is in general not differentiable, and could give us little insight into the optimal choice of LLRT thresholds (λ₁, · · · , λ_n).
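To make the role of the fusion-center likelihood ratio test concrete, the sketch below evaluates the error probability of the optimal fusion rule by enumerating all 2ⁿ decision vectors. It does not reproduce the formula above; it assumes the standard Bayes-error expression, the sum over u of min{r Pr(u|H₁), (1 − r) Pr(u|H₀)}, and conditionally independent sensor decisions, which holds here because the observations are i.i.d. The function name and the numerical operating points are hypothetical.

```python
from itertools import product

def bayes_error(pf, pd, r):
    """Error probability of the optimal fusion rule for given local operating points.

    pf[i], pd[i]: false alarm and detection probabilities of sensor i
    r: prior probability of H1
    """
    n = len(pf)
    pe = 0.0
    for u in product((0, 1), repeat=n):
        p1 = p0 = 1.0
        for ui, f, d in zip(u, pf, pd):
            p1 *= d if ui else 1 - d   # Pr(u | H1)
            p0 *= f if ui else 1 - f   # Pr(u | H0)
        # the fusion center picks the hypothesis with the larger weighted likelihood,
        # so the smaller of the two terms is the error contribution of this u
        pe += min(r * p1, (1 - r) * p0)
    return pe

one_sensor = bayes_error([0.1], [0.8], 0.5)
three_sensors = bayes_error([0.1] * 3, [0.8] * 3, 0.5)
print(one_sensor, three_sensors)
```

The min operation inside the sum is what makes such an expression non-differentiable in the local operating points.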

For identical sensor system design, it is known that the optimal fusion rule should be a k-out-of-n rule,

u₀ = 1, if u₁ + ··· + u_n ≥ k,
u₀ = 0, if u₁ + ··· + u_n < k,

where k is some positive integer less than or equal to n. However, to our knowledge, the validity of the converse statement, i.e., that for any k-out-of-n fusion rule the optimal strategy is to apply identical local decision rules at all sensors, is still unknown.
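For identical sensors, the k-out-of-n rule leads to binomial tail sums for Q_D and Q_F, and the error probability follows from P_e = r(1 − Q_D) + (1 − r)Q_F. The sketch below is a minimal illustration under the assumption of conditionally independent local decisions; the function names and the operating point (P_F, P_D) = (0.1, 0.8) are hypothetical.

```python
from math import comb

def k_out_of_n_tail(p, n, k):
    """Pr(u_1 + ... + u_n >= k) when each u_i = 1 independently with probability p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def error_probability(pf, pd, n, k, r):
    """P_e = r(1 - Q_D) + (1 - r) Q_F for identical sensors and a k-out-of-n rule."""
    q_d = k_out_of_n_tail(pd, n, k)   # system detection probability
    q_f = k_out_of_n_tail(pf, n, k)   # system false alarm probability
    return r * (1 - q_d) + (1 - r) * q_f

n, r = 3, 0.5
errors = {k: error_probability(0.1, 0.8, n, k, r) for k in range(1, n + 1)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

Scanning k from 1 to n in this way selects the best k-out-of-n rule for a given local operating point; it says nothing about whether non-identical local rules could do better, which is the open question stated above.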

Now, let us define a function A(λ) and prove a lemma that is useful in the subsequent sections. Define A(λ) as

A(λ) = log [Pr(u = 1|H₁) / Pr(u = 1|H₀)] − log [Pr(u = 0|H₁) / Pr(u = 0|H₀)].

Then we have the following result.

Lemma 4.1. A(λ) is a positive and monotonically increasing function of λ.

Proof. First, A(λ) is positive because for the ROC curve,

P_D(λ) / P_F(λ) > (1 − P_D(λ)) / (1 − P_F(λ)).

Taking the derivative of A(λ) with respect to λ, we obtain

A′(λ) = (−P_F′(λ))

In terms of A(λ) defined above, it can be shown that the likelihood ratio test of u₁, u₂, …, u_n

at the fusion center is equivalent to Xn
