To assess the accuracy of the upper and lower bounds on the minimum mutual information function, we first consider the simple case in which X and Y are both binary random variables taking values in {0, 1}. In this case, given the correlation coefficient ρ and the marginal means a = E[X] and b = E[Y], one can determine the joint distribution of X and Y, i.e.,
$$P_{X,Y}(0,0) = (1-a)(1-b) + r, \qquad P_{X,Y}(0,1) = (1-a)b - r,$$
$$P_{X,Y}(1,0) = a(1-b) - r, \qquad P_{X,Y}(1,1) = ab + r,$$
where $r = E[XY] - E[X]E[Y] = \rho\,[a(1-a)b(1-b)]^{1/2}$. The mutual information $I(X;Y)$ can be written as
$$I(X;Y) = H_b(b) - a\,H_b\!\left(b + \frac{r}{a}\right) - (1-a)\,H_b\!\left(1 - b + \frac{r}{1-a}\right),$$
where $H_b(b) = -b\log(b) - (1-b)\log(1-b)$ is the binary entropy function. Thus, in the binary case, $I_{\min}(S_\rho) = I(X;Y)$. Taking uniform marginal distributions as an example, i.e., $a = \frac{1}{2}$ and $b = \frac{1}{2}$, we obtain $D_{\min}(T_\rho, P_X, P_Y) = \rho \tanh^{-1}(\rho) + \frac{1}{2}\log(1-\rho^2) = I(X;Y)$.
Therefore, the lower bound used in the proof coincides with the minimum mutual information function. Notably, the simple binary case has already been examined in [28].
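The identity for uniform marginals can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the binary mutual information from the conditional-entropy expression and compares it with the closed form ρ tanh⁻¹(ρ) + ½ log(1 − ρ²). The function names are illustrative assumptions, and natural logarithms are assumed throughout.

```python
import math


def binary_entropy(p):
    """Binary entropy H_b(p) in nats, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def binary_mutual_information(a, b, rho):
    """I(X;Y) for binary X, Y with means a, b and correlation coefficient rho."""
    r = rho * math.sqrt(a * (1.0 - a) * b * (1.0 - b))
    joint = [(1 - a) * (1 - b) + r, (1 - a) * b - r, a * (1 - b) - r, a * b + r]
    if min(joint) < 0.0:
        raise ValueError("rho is not achievable for these marginals")
    # I(X;Y) = H(Y) - H(Y|X), using P(Y=1|X=1) = b + r/a and
    # P(Y=0|X=0) = 1 - b + r/(1-a).
    return (binary_entropy(b)
            - a * binary_entropy(b + r / a)
            - (1.0 - a) * binary_entropy(1.0 - b + r / (1.0 - a)))


def uniform_closed_form(rho):
    """Closed form for a = b = 1/2: rho*atanh(rho) + 0.5*log(1 - rho^2)."""
    return rho * math.atanh(rho) + 0.5 * math.log(1.0 - rho ** 2)


if __name__ == "__main__":
    for rho in (-0.8, -0.3, 0.1, 0.5, 0.9):
        print(rho, binary_mutual_information(0.5, 0.5, rho), uniform_closed_form(rho))
```

The two printed columns should agree up to floating-point rounding for every admissible ρ.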
A good example that meets the boundedness assumption of our theorem is the Morgenstern distribution [31], which has the density
$$p(x, y) = 1 + \alpha(2x - 1)(2y - 1),$$
where $0 \le x, y \le 1$, and its correlation coefficient equals $C_{x,y} = \alpha/3$. Its asymptotic mutual information in terms of the correlation coefficient is readily obtained as
$$I(X;Y) = \frac{\alpha^2}{18} + \frac{\alpha^4}{300} + o(\alpha^5) = \frac{\rho^2}{2} + o(\rho^2).$$
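The expansion can be sanity-checked by direct numerical integration. The sketch below assumes the standard Morgenstern form 1 + α(2x − 1)(2y − 1) on the unit square (uniform marginals, so the mutual information reduces to the integral of p log p) and uses a simple midpoint rule. The function names and the grid size are illustrative choices, not taken from the source.

```python
import math


def fgm_density(x, y, alpha):
    """Assumed Morgenstern density on [0,1]^2; requires |alpha| < 1 here."""
    return 1.0 + alpha * (2.0 * x - 1.0) * (2.0 * y - 1.0)


def midpoint_integral(f, n=200):
    """Midpoint-rule approximation of the integral of f over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += f(x, y)
    return total * h * h


def fgm_mutual_information(alpha, n=200):
    """I(X;Y) in nats; the marginals are uniform, so I = integral of p*log(p)."""
    def integrand(x, y):
        p = fgm_density(x, y, alpha)
        return p * math.log(p)
    return midpoint_integral(integrand, n)


def fgm_correlation(alpha, n=200):
    """Correlation coefficient; uniform marginals have mean 1/2, variance 1/12."""
    exy = midpoint_integral(lambda x, y: x * y * fgm_density(x, y, alpha), n)
    return (exy - 0.25) / (1.0 / 12.0)


def fgm_series(alpha):
    """Truncated expansion alpha^2/18 + alpha^4/300."""
    return alpha ** 2 / 18.0 + alpha ** 4 / 300.0


if __name__ == "__main__":
    alpha = 0.3
    print(fgm_correlation(alpha), alpha / 3.0)
    print(fgm_mutual_information(alpha), fgm_series(alpha))
```

For small α, the numerical integral and the truncated series should agree closely, and the computed correlation should be near α/3.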
An example showing that $I_{\min}(S_\rho)$ is indeed a lower bound on the mutual information of $P_{X,Y} \in S_\rho$ is the bivariate density $p(x, y) = p_{Y|X}(y|x)\,p_X(x)$, where $p_X(x) = \frac{1}{2a}\,\mathbf{1}[|x| \le a]$ and $p_{Y|X}(y|x) = \frac{1}{2b}\,\mathbf{1}[|y - \alpha x| \le b]$, which together define a uniform diagonal strip. The asymptotic mutual information of the uniform diagonal strip can be derived easily from [31] as $\frac{|\rho|}{2} - \frac{|\rho|^4}{3} + o(|\rho|^4)$. This indicates that in some situations, $I(X;Y) > I_{\min}(S_\rho)$.
The validity of the theorem statement can be extended to the (unbounded) case in which $P_X$ and $P_Y$ are both Gaussian. In this case, the minimum value of the mutual information is achieved by a jointly Gaussian $Q_{X,Y}$. One can derive that for Gaussian $P_X$ and $P_Y$, $I_{\min}(S_\rho) = -\frac{1}{2}\log(1-\rho^2)$. The lower bound, however, is given by
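$$D_{\min}(T_\rho, P_X, P_Y) = -\frac{1}{2} + \left(\frac{1}{4} + \rho^2\right)^{1/2} + \frac{1}{2}\log\!\left(\frac{-\frac{1}{2} + \left(\frac{1}{4} + \rho^2\right)^{1/2}}{\rho^2}\right),$$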
and is smaller than the simple hyperbolic approximation $I_{\min}(S_\rho) \approx \rho^2/2$. In addition, the "upper bound" used in the proof, $\rho^2/2 + \rho^4$, may become smaller than $I_{\min}(S_\rho)$ at large $|\rho|$. Since we only use the upper bound for $|\rho| \ll 1$, we would not expect it to be useful outside that range.
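The ordering of these quantities in the Gaussian case can be tabulated directly. The sketch below is only an illustration with assumed function names: it evaluates $I_{\min}(S_\rho)$, the lower bound $D_{\min}$, the approximation ρ²/2, and the proof's upper bound ρ²/2 + ρ⁴ at a few values of ρ, using natural logarithms.

```python
import math


def i_min_gaussian(rho):
    """Minimum mutual information for Gaussian marginals: -0.5*log(1 - rho^2)."""
    return -0.5 * math.log(1.0 - rho ** 2)


def d_min_lower_bound(rho):
    """Lower bound D_min(T_rho, P_X, P_Y) for Gaussian marginals."""
    if rho == 0.0:
        return 0.0  # limiting value as rho -> 0
    s = math.sqrt(0.25 + rho ** 2) - 0.5
    return s + 0.5 * math.log(s / rho ** 2)


def quadratic_approx(rho):
    """Approximation rho^2 / 2."""
    return rho ** 2 / 2.0


def proof_upper_bound(rho):
    """Upper bound used in the proof: rho^2/2 + rho^4 (intended for small |rho|)."""
    return rho ** 2 / 2.0 + rho ** 4


if __name__ == "__main__":
    print("rho    D_min     rho^2/2   I_min     rho^2/2+rho^4")
    for rho in (0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
        print(f"{rho:<6} {d_min_lower_bound(rho):.6f}  {quadratic_approx(rho):.6f}  "
              f"{i_min_gaussian(rho):.6f}  {proof_upper_bound(rho):.6f}")
```

At the sampled points the lower bound stays below ρ²/2, while the proof's upper bound drops below $I_{\min}(S_\rho)$ once |ρ| is close to 1.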
[Plot: curves $I_{\min}(S_\rho)$, $D_{\min}(T_\rho, P_X, P_Y)$, $\rho^2/2$, and $I(J_{X,Y})$; horizontal axis from −1 to 1, vertical axis from 0 to 1.6.]
Figure 3.1: The bounds and minimum mutual information for Gaussian distributed PX and PY.
Chapter 4
Bayesian Decentralized Detection for Exponential Distributions
A decentralized detection system consists of n sensors, sometimes geographically dispersed, and a remote fusion center. Each sensor observes a phenomenon (often modeled as a random variable $X_i$), summarizes it into a single bit $u_i$, and then transmits $u_i$ to the fusion center without cooperating with the other sensors. Based on the received $\{u_i\}_{i=1}^n$, the fusion center determines whether $\{X_i\}_{i=1}^n$ are drawn from the null distribution $P(\cdot|H_0)$ or the alternative distribution $P(\cdot|H_1)$.
Decentralized detection, despite its simple setting and more than two decades of extensive study, still has many unsolved issues at the fundamental level. One of them concerns the globally optimal strategy for designing the sensors and the fusion center. The difficulty has several sources. First, only necessary conditions for the optimal strategy are known; hence, one has to search all solutions of the necessary-condition equations in order to determine the global optimum. Moreover, these equations are coupled and nonlinear, and solving them is known to be hard [32]. So little is known about the globally optimal strategy that there are almost no analytical results for systems with more than two sensors.
The asymptotic results, however, are more encouraging: the system with identical sensors has the same error exponents as the optimal system [37], and the ratio of the error probabilities of these two systems is bounded from above and from below [36]. Yet exact analytical results for finite n > 2 are still absent, although such results would give more insight into the global optimum than the asymptotic ones.
In this chapter, we analyze the decentralized classification problem for exponential sources for n > 2, and validate the intuition that the optimal system is the one with identical sensors. To our knowledge, there is no similar analytical result on the global optimum for systems with more than two sensors.
4.1 Preliminaries
Definition 4.1. If X is a random variable with an exponential distribution, then the probability that X is greater than some number x is given by
$$1 - F_X(x) = \Pr(X > x) = e^{-\alpha x}$$
for $x \ge 0$, where $\alpha$ is a positive parameter and $F_X(x)$ is the cumulative distribution function (CDF) of X.
It follows that the probability density function (PDF) of an exponential distribution has the form
$$f_X(x) = \alpha e^{-\alpha x}, \quad \text{for } x \ge 0.$$
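As a quick illustration of the step from the tail probability to the density, the sketch below differentiates the CDF numerically and compares the result with the closed-form PDF. The function names, the rate value, and the step size are illustrative assumptions.

```python
import math


def exp_survival(x, alpha):
    """Pr(X > x) = exp(-alpha * x) for x >= 0, and 1 for x < 0."""
    return math.exp(-alpha * x) if x >= 0.0 else 1.0


def exp_pdf(x, alpha):
    """Density alpha * exp(-alpha * x) for x >= 0, and 0 otherwise."""
    return alpha * math.exp(-alpha * x) if x >= 0.0 else 0.0


def numerical_pdf(x, alpha, h=1e-5):
    """Central-difference derivative of the CDF F_X(x) = 1 - Pr(X > x)."""
    return (exp_survival(x - h, alpha) - exp_survival(x + h, alpha)) / (2.0 * h)


if __name__ == "__main__":
    alpha = 2.0  # illustrative rate parameter
    for x in (0.1, 0.7, 1.5):
        print(x, exp_pdf(x, alpha), numerical_pdf(x, alpha))
```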
In this chapter, we consider the following binary hypothesis testing problem for exponential distributions:
$$H_1 : f_X(x_i) = \beta e^{-\beta x_i}$$
versus variable $X_i$ at the i-th sensor. We assume that $\{X_i\}_{i=1}^n$ form a set of independent and identically distributed (i.i.d.) random variables. The prior probabilities of $H_1$ and $H_0$ are denoted by $r_1$ and $r_0$, or simply $r$ and $1 - r$, respectively. For a fixed fusion rule, it is known that the optimal local decision rule for each sensor is the local likelihood ratio test (LLRT), i.e., probability and the false alarm probability for the i-th sensor, where
PD(λi) = Prob(ui = 1|H1) = 1
PF, i.e.,
where we abuse notation and let $P_F^{(i)}$ and $P_D(P_F^{(i)})$ denote the false alarm probability and the detection probability of the i-th sensor, respectively. The graph consisting of all $(P_F, P_D)$ pairs is referred to as the receiver operating characteristic (ROC) curve, which is identical for all sensors since their observations have the same statistics.
The sensors transmit their decisions $\{u_i\}_{i=1}^n$ to the fusion center, which makes the final decision $u_0$; $u_0$ equals $\ell$ when the fusion center favors $H_\ell$. Once the fusion rule is fixed, we can evaluate the system detection probability $Q_D(\lambda_1, \ldots, \lambda_n) = \mathrm{Prob}(u_0 = 1|H_1)$, the system false alarm probability $Q_F(\lambda_1, \ldots, \lambda_n) = \mathrm{Prob}(u_0 = 1|H_0)$, and the probability of error $P_e^{(n)}(\lambda_1, \ldots, \lambda_n) = r\,(1 - Q_D(\lambda_1, \ldots, \lambda_n)) + (1 - r)\,Q_F(\lambda_1, \ldots, \lambda_n)$ as functions of the local thresholds $\lambda_1, \ldots, \lambda_n$.
It is known from classical detection theory that the fusion center should make the overall decision $u_0$ based on the likelihood ratio test of the received $u_1, u_2, \ldots, u_n$. Therefore, the error probability can be expressed as
Pe(n)(λ1, · · · , λn) = X
The above formula, however, is in general not differentiable, and gives little insight into the optimal choice of the LLRT thresholds $(\lambda_1, \ldots, \lambda_n)$.
For the design of a system with identical sensors, it is known that the optimal fusion rule should be a k-out-of-n rule,
$$u_0 = \begin{cases} 1, & \text{if } u_1 + \cdots + u_n \ge k, \\ 0, & \text{if } u_1 + \cdots + u_n < k, \end{cases}$$
where k is some positive integer less than or equal to n. However, to our knowledge, the validity of the converse statement, i.e., that for any k-out-of-n fusion rule the optimal strategy is to apply identical local decision rules at all sensors, is still unknown.
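For identical sensors, the k-out-of-n rule makes the system-level probabilities binomial tails of the local ones. The sketch below is an illustration only: it assumes every sensor shares the same local detection probability `p_d` and false alarm probability `p_f`, and that the decisions are conditionally independent given the hypothesis (consistent with the i.i.d. observations above). All function and parameter names are hypothetical.

```python
from math import comb


def k_out_of_n_decision(bits, k):
    """Fusion rule: decide u0 = 1 when at least k local decisions equal 1."""
    return 1 if sum(bits) >= k else 0


def tail_at_least_k(p, n, k):
    """Pr(at least k of n independent bits equal 1) when each is 1 w.p. p."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def error_probability(p_d, p_f, n, k, r):
    """P_e = r * (1 - Q_D) + (1 - r) * Q_F for identical sensors.

    p_d, p_f: local detection and false alarm probabilities (assumed given),
    r: prior probability of H1.
    """
    q_d = tail_at_least_k(p_d, n, k)  # system detection probability
    q_f = tail_at_least_k(p_f, n, k)  # system false alarm probability
    return r * (1.0 - q_d) + (1.0 - r) * q_f


if __name__ == "__main__":
    n, r = 5, 0.5
    for k in range(1, n + 1):
        print(k, error_probability(0.8, 0.2, n, k, r))
```

Scanning k for fixed local operating points, as in the usage loop, is one simple way to see how the choice of k trades system misses against system false alarms.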
We now define a function $A(\lambda)$ and prove a lemma that is useful in the subsequent sections. Let
$$A(\lambda) = \log\frac{\Pr(u = 1|H_1)}{\Pr(u = 1|H_0)} - \log\frac{\Pr(u = 0|H_1)}{\Pr(u = 0|H_0)}.$$
Then we have the following result.
Lemma 4.1. A(λ) is a positive and monotonically increasing function of λ.
Proof. First, $A(\lambda)$ is positive because, for the ROC curve,
$$\frac{P_D(\lambda)}{P_F(\lambda)} > \frac{1 - P_D(\lambda)}{1 - P_F(\lambda)}.$$
Taking the derivative of $A(\lambda)$ with respect to $\lambda$, we obtain
$A'(\lambda) = (-P_F'(\lambda))$
In terms of $A(\lambda)$ defined above, it can be shown that the likelihood ratio test of $u_1, u_2, \ldots, u_n$
at the fusion center is equivalent to Xn