4 Hierarchical Models and Mixture Distributions
Example 4.1 (Binomial-Poisson hierarchy) Perhaps the most classic hierarchical model is the following. An insect lays a large number of eggs, each surviving with probability p. On the average, how many eggs will survive?
The large number of eggs laid is a random variable, often taken to be Poisson(λ). Furthermore, if we assume that each egg’s survival is independent, then we have Bernoulli trials. Therefore, if we let X = number of survivors and Y = number of eggs laid, we have
X|Y ∼ binomial(Y, p), Y ∼ Poisson(λ), a hierarchical model.
The advantage of the hierarchy is that a complicated process may be modeled by a sequence of relatively simple models placed in a hierarchy.
Example 4.2 (Continuation of Example 4.1) The random variable X has the distribution given by
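\[
\begin{aligned}
P(X = x) &= \sum_{y=0}^{\infty} P(X = x, Y = y) = \sum_{y=0}^{\infty} P(X = x \mid Y = y)\,P(Y = y) \\
&= \sum_{y=x}^{\infty} \left[\binom{y}{x} p^x (1-p)^{y-x}\right] \left[\frac{e^{-\lambda}\lambda^y}{y!}\right] \quad \text{(conditional probability is 0 if } y < x\text{)} \\
&= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{y=x}^{\infty} \frac{((1-p)\lambda)^{y-x}}{(y-x)!} \\
&= \frac{(\lambda p)^x e^{-\lambda}}{x!}\, e^{(1-p)\lambda} \\
&= \frac{(\lambda p)^x}{x!}\, e^{-\lambda p},
\end{aligned}
\]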
so X ∼ Poisson(λp). Thus, any marginal inference on X is with respect to a Poisson(λp) distribution, with Y playing no part at all. Introducing Y in the hierarchy was mainly to aid our understanding of the model. On the average, EX = λp eggs will survive.
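The marginal calculation in Example 4.2 can be checked numerically. The following is a minimal sketch that is not part of the original notes: it truncates the sum over y at a hypothetical cutoff `y_max` and compares the result with the Poisson(λp) pmf. The function names and the values λ = 6, p = 0.3 are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ binomial(n, p); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def marginal_pmf(x, lam, p, y_max=150):
    """Approximate P(X = x) by summing P(X = x | Y = y) P(Y = y) for y = x..y_max.

    y_max is an assumed truncation point; the neglected Poisson tail is
    negligible when y_max is far above lam.
    """
    return sum(
        binomial_pmf(x, y, p) * poisson_pmf(y, lam) for y in range(x, y_max + 1)
    )


lam, p = 6.0, 0.3  # illustrative values only
for x in range(5):
    # The two columns should agree up to floating-point error.
    print(x, marginal_pmf(x, lam, p), poisson_pmf(x, lam * p))
```

The truncated sum starts at y = x because the conditional probability is 0 when y < x, mirroring the change of summation limits in the derivation.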
Sometimes, calculations can be greatly simplified by using the following theorem.
Theorem 4.1 If X and Y are any two random variables, then EX = E(E(X|Y )), provided that the expectations exist.
Proof: Let f(x, y) denote the joint pdf of X and Y. By definition, we have
\[
EX = \int\!\!\int x f(x, y)\,dx\,dy = \int \left[\int x f(x \mid y)\,dx\right] f_Y(y)\,dy = \int E(X \mid y)\, f_Y(y)\,dy = E(E(X \mid Y)).
\]
Replace the integrals by sums to prove the discrete case.
Using Theorem 4.1, we have
EX = E(E(X|Y )) = E(pY ) = pλ for Example 4.2.
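Theorem 4.1 can also be illustrated by simulating the two stages of the hierarchy. The sketch below is an assumption-laden illustration rather than anything from the original notes: it draws Y with Knuth's multiplication method for Poisson variates, draws X given Y as a sum of Bernoulli trials, and compares the sample mean of X with p times the sample mean of Y. The function names, the seed, and the values λ = 6, p = 0.3 are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_survivors(lam, p, rng):
    """Draw (X, Y): Y ~ Poisson(lam) eggs, then X | Y ~ binomial(Y, p) survivors."""
    y = sample_poisson(lam, rng)
    x = sum(rng.random() < p for _ in range(y))  # Y independent Bernoulli(p) trials
    return x, y


def simulate_means(lam, p, n_draws, seed=0):
    """Return the sample means of X and Y over n_draws simulated insects."""
    rng = random.Random(seed)
    total_x = total_y = 0
    for _ in range(n_draws):
        x, y = sample_survivors(lam, p, rng)
        total_x += x
        total_y += y
    return total_x / n_draws, total_y / n_draws


mean_x, mean_y = simulate_means(lam=6.0, p=0.3, n_draws=20000)
# Sample mean of X, sample version of E(pY), and the exact value p * lam.
print(mean_x, 0.3 * mean_y, 0.3 * 6.0)
```

The three printed numbers should be close to one another, with the first two differing from pλ only by Monte Carlo error.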
Definition 4.1 A random variable X is said to have a mixture distribution if the distribution of X depends on a quantity that also has a distribution.
Thus, in Example 4.1 the Poisson(λp) distribution is a mixture distribution since it is the result of combining a binomial(Y, p) with Y ∼ Poisson(λ).
Theorem 4.2 (Conditional variance identity) For any two random variables X and Y , VarX = E(Var(X|Y )) + Var(E(X|Y )),
provided that the expectations exist.
Proof: By definition, we have
\[
\begin{aligned}
\mathrm{Var}\,X &= E([X - EX]^2) = E([X - E(X \mid Y) + E(X \mid Y) - EX]^2) \\
&= E([X - E(X \mid Y)]^2) + E([E(X \mid Y) - EX]^2) + 2E([X - E(X \mid Y)][E(X \mid Y) - EX]).
\end{aligned}
\]
The last term in this expression is equal to 0, however, which can easily be seen by iterating the expectation:
\[
E([X - E(X \mid Y)][E(X \mid Y) - EX]) = E\big(E\{[X - E(X \mid Y)][E(X \mid Y) - EX] \mid Y\}\big).
\]
In the conditional distribution X|Y, X is the random variable. Conditional on Y, E(X|Y) and EX are constants. Thus,
\[
E\{[X - E(X \mid Y)][E(X \mid Y) - EX] \mid Y\} = (E(X \mid Y) - E(X \mid Y))(E(X \mid Y) - EX) = 0.
\]
Since
\[
E([X - E(X \mid Y)]^2) = E\big(E\{[X - E(X \mid Y)]^2 \mid Y\}\big) = E(\mathrm{Var}(X \mid Y))
\]
and
\[
E([E(X \mid Y) - EX]^2) = \mathrm{Var}(E(X \mid Y)),
\]
Theorem 4.2 is proved.
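The conditional variance identity can be verified exactly on a small discrete example. The sketch below is only an illustration: the joint pmf is made up for this purpose, and the helper names `expect`, `cond_moment`, `cond_mean`, and `cond_var` are hypothetical. Exact rational arithmetic is used so that both sides of the identity can be compared without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint pmf: joint[(x, y)] = P(X = x, Y = y). The values are made up.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(3, 8),
}


def expect(g):
    """E g(X, Y) under the joint pmf."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())


def cond_moment(y0, power):
    """E(X**power | Y = y0)."""
    p_y = sum(pr for (x, y), pr in joint.items() if y == y0)
    return sum(x**power * pr for (x, y), pr in joint.items() if y == y0) / p_y


def cond_mean(y0):
    """E(X | Y = y0)."""
    return cond_moment(y0, 1)


def cond_var(y0):
    """Var(X | Y = y0)."""
    return cond_moment(y0, 2) - cond_moment(y0, 1) ** 2


ex = expect(lambda x, y: x)
var_x = expect(lambda x, y: (x - ex) ** 2)                     # Var X
e_cond_var = expect(lambda x, y: cond_var(y))                  # E(Var(X|Y))
var_cond_mean = expect(lambda x, y: (cond_mean(y) - ex) ** 2)  # Var(E(X|Y))

print(var_x, e_cond_var, var_cond_mean)
```

For this particular pmf the pieces are E(Var(X|Y)) = 2/3 and Var(E(X|Y)) = 1/12, which add up to Var X = 3/4.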
Example 4.3 (Beta-binomial hierarchy) One generalization of the binomial distribution is to allow the success probability to vary according to a distribution. A standard model for this situation is
X|P ∼ binomial(n, P), P ∼ beta(α, β).
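The mean of X is then
\[
EX = E[E(X \mid P)] = E[nP] = \frac{n\alpha}{\alpha + \beta}.
\]
Since P ∼ beta(α, β),
\[
\mathrm{Var}(E(X \mid P)) = \mathrm{Var}(nP) = n^2\,\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.
\]
Also, since X|P is binomial(n, P), Var(X|P) = nP(1 − P). We then have
\[
\begin{aligned}
E[\mathrm{Var}(X \mid P)] = nE[P(1 - P)] &= n\,\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 p(1 - p)\,p^{\alpha - 1}(1 - p)^{\beta - 1}\,dp \\
&= n\,\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + 1)\Gamma(\beta + 1)}{\Gamma(\alpha + \beta + 2)} = \frac{n\alpha\beta}{(\alpha + \beta)(\alpha + \beta + 1)}.
\end{aligned}
\]
Adding together the two pieces, we get
\[
\mathrm{Var}\,X = \frac{n\alpha\beta(\alpha + \beta + n)}{(\alpha + \beta)^2(\alpha + \beta + 1)}.
\]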
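The mean and variance formulas of Example 4.3 can be cross-checked against moments computed directly from the marginal pmf of X. The sketch below assumes the standard beta-binomial pmf, P(X = x) = C(n, x) B(x + α, n − x + β) / B(α, β), which is not derived in these notes; the function names and the values n = 10, α = 2, β = 3 are illustrative assumptions.

```python
import math


def log_beta(a, b):
    """log B(a, b), the log of the beta function, via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    """P(X = x) when X | P ~ binomial(n, P) and P ~ beta(a, b) (assumed standard form)."""
    return math.comb(n, x) * math.exp(log_beta(x + a, n - x + b) - log_beta(a, b))


def moments_from_pmf(n, a, b):
    """Mean and variance of X computed by summing over the marginal pmf."""
    pmf = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]
    mean = sum(x * q for x, q in enumerate(pmf))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
    return mean, var


def moments_from_formulas(n, a, b):
    """Mean and variance of X from the two-piece decomposition in Example 4.3."""
    mean = n * a / (a + b)
    e_cond_var = n * a * b / ((a + b) * (a + b + 1))               # E[Var(X|P)]
    var_cond_mean = n**2 * a * b / ((a + b) ** 2 * (a + b + 1))    # Var(E(X|P))
    return mean, e_cond_var + var_cond_mean


n, a, b = 10, 2.0, 3.0  # illustrative values only
print(moments_from_pmf(n, a, b))
print(moments_from_formulas(n, a, b))
```

With these values the formulas give EX = 4 and Var X = 2 + 4 = 6, and the pmf-based moments should match up to floating-point error.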
5 Covariance and Correlation
In earlier sections, we have discussed the absence or presence of a relationship between two random variables, that is, independence or nonindependence. But if there is a relationship, it may be strong or weak. In this section, we discuss two numerical measures of the strength of a relationship between two random variables: the covariance and the correlation.
Throughout this section, we will use the notation $EX = \mu_X$, $EY = \mu_Y$, $\mathrm{Var}\,X = \sigma_X^2$, and $\mathrm{Var}\,Y = \sigma_Y^2$.
Definition 5.1 The covariance of X and Y is the number defined by
\[
\mathrm{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y)).
\]
Definition 5.2 The correlation of X and Y is the number defined by
\[
\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.
\]
The value $\rho_{XY}$ is also called the correlation coefficient.
Theorem 5.1 For any random variables X and Y,
\[
\mathrm{Cov}(X, Y) = EXY - \mu_X \mu_Y.
\]
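Definitions 5.1 and 5.2 and the shortcut formula of Theorem 5.1 can be tried out on a small discrete example. The sketch below is an illustration only: the joint pmf is made up and the helper name `expect` is hypothetical. The covariance is computed both from the definition and from the shortcut formula so that the two can be compared.

```python
import math
from fractions import Fraction as F

# Hypothetical joint pmf: joint[(x, y)] = P(X = x, Y = y). The values are made up.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(3, 8),
}


def expect(g):
    """E g(X, Y) under the joint pmf."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Covariance from Definition 5.1 and from the shortcut formula of Theorem 5.1.
cov_def = expect(lambda x, y: (x - mu_x) * (y - mu_y))
cov_short = expect(lambda x, y: x * y) - mu_x * mu_y

var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)

# Correlation from Definition 5.2; the square root forces floating point here.
rho = float(cov_def) / math.sqrt(float(var_x * var_y))

print(cov_def, cov_short, rho)
```

For this pmf both covariance computations give 1/8, and the correlation is 1/3 up to floating-point error.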