

4 Hierarchical Models and Mixture Distributions

Example 4.1 (Binomial-Poisson hierarchy) Perhaps the most classic hierarchical model is the following. An insect lays a large number of eggs, each surviving with probability p. On the average, how many eggs will survive?

The large number of eggs laid is a random variable, often taken to be Poisson(λ). Furthermore, if we assume that each egg’s survival is independent, then we have Bernoulli trials. Therefore, if we let X = number of survivors and Y = number of eggs laid, we have

X|Y ∼ binomial(Y, p), Y ∼ Poisson(λ), a hierarchical model.

The advantage of the hierarchy is that a complicated process may be modeled by a sequence of relatively simple models placed in a hierarchy.

Example 4.2 (Continuation of Example 4.1) The random variable X has the distribution given by

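$$
\begin{aligned}
P(X = x) &= \sum_{y=0}^{\infty} P(X = x, Y = y) = \sum_{y=0}^{\infty} P(X = x \mid Y = y)\,P(Y = y) \\
&= \sum_{y=x}^{\infty} \left[\binom{y}{x} p^x (1-p)^{y-x}\right] \left[\frac{e^{-\lambda}\lambda^y}{y!}\right] \quad \text{(conditional probability is 0 if } y < x\text{)} \\
&= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{y=x}^{\infty} \frac{((1-p)\lambda)^{y-x}}{(y-x)!} \\
&= \frac{(\lambda p)^x e^{-\lambda}}{x!}\, e^{(1-p)\lambda} \\
&= \frac{(\lambda p)^x}{x!}\, e^{-\lambda p},
\end{aligned}
$$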

so X ∼ Poisson(λp). Thus, any marginal inference on X is with respect to a Poisson(λp) distribution, with Y playing no part at all. Introducing Y in the hierarchy was mainly to aid our understanding of the model. On the average,

EX = λp eggs will survive.
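The marginalization above can be checked numerically. The sketch below is only an illustration: the values λ = 4 and p = 0.3 and the truncation point `y_max` are assumptions chosen for the check, not part of the example. It sums P(X = x | Y = y)P(Y = y) over y and compares the result with the Poisson(λp) pmf.

```python
from math import comb, exp, factorial


def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate)."""
    return exp(-rate) * rate**k / factorial(k)


def marginal_pmf(x, lam, p, y_max=100):
    """P(X = x) as the sum over y >= x of P(X = x | Y = y) P(Y = y).

    The infinite sum is truncated at y_max (an assumption; the tail is
    negligible for the small rate used below).
    """
    total = 0.0
    for y in range(x, y_max + 1):
        binomial_term = comb(y, x) * p**x * (1 - p) ** (y - x)
        total += binomial_term * poisson_pmf(y, lam)
    return total


lam, p = 4.0, 0.3  # illustrative values, not taken from the text

# Largest discrepancy between the summed marginal and the Poisson(lam * p) pmf.
max_gap = max(
    abs(marginal_pmf(x, lam, p) - poisson_pmf(x, lam * p)) for x in range(15)
)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, which is consistent with X ∼ Poisson(λp).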

Sometimes, calculations can be greatly simplified by using the following theorem.


Theorem 4.1 If X and Y are any two random variables, then EX = E(E(X|Y )), provided that the expectations exist.

Proof: Let f(x, y) denote the joint pdf of X and Y. By definition, we have

$$
EX = \int\!\!\int x f(x, y)\,dx\,dy = \int \left[\int x f(x \mid y)\,dx\right] f_Y(y)\,dy = \int E(X \mid y) f_Y(y)\,dy = E(E(X \mid Y)).
$$

Replace integrals by sums to prove the discrete case.

Using Theorem 4.1, we have

EX = E(E(X|Y )) = E(pY ) = pλ for Example 4.2.
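A simulation gives a rough sanity check of EX = pλ. The sketch below is an illustration under stated assumptions: λ = 4, p = 0.3, the seed, and the number of draws are arbitrary choices, and the helper names `sample_poisson` and `sample_survivors` are hypothetical. The Poisson draw uses Knuth's multiplication method, which is adequate for small λ.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_survivors(lam, p, rng):
    """Draw X from the hierarchy Y ~ Poisson(lam), X | Y ~ binomial(Y, p)."""
    eggs = sample_poisson(lam, rng)
    # Each egg survives independently with probability p.
    return sum(rng.random() < p for _ in range(eggs))


rng = random.Random(0)  # fixed seed so the run is reproducible
lam, p, n_draws = 4.0, 0.3, 50_000  # illustrative values, not from the text

draws = [sample_survivors(lam, p, rng) for _ in range(n_draws)]
sample_mean = sum(draws) / n_draws
print(sample_mean, lam * p)
```

The sample mean should land close to λp = 1.2, up to Monte Carlo error.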

Definition 4.1 A random variable X is said to have a mixture distribution if the distribution of X depends on a quantity that also has a distribution.

Thus, in Example 4.1 the Poisson(λp) distribution is a mixture distribution since it is the result of combining a binomial(Y, p) with Y ∼ Poisson(λ).

Theorem 4.2 (Conditional variance identity) For any two random variables X and Y , VarX = E(Var(X|Y )) + Var(E(X|Y )),

provided that the expectations exist.

Proof: By definition, we have

VarX = E([X − EX]²) = E([X − E(X|Y ) + E(X|Y ) − EX]²)

= E([X − E(X|Y )]²) + E([E(X|Y ) − EX]²) + 2E([X − E(X|Y )][E(X|Y ) − EX]).

The last term in this expression is equal to 0, however, which can easily be seen by iterating the expectation:

E([X − E(X|Y )][E(X|Y ) − EX]) = E(E{[X − E(X|Y )][E(X|Y ) − EX]|Y })


In the conditional distribution X|Y, X is the random variable. Conditional on Y, E(X|Y) and EX are constants. Thus,

E{[X − E(X|Y )][E(X|Y ) − EX]|Y } = (E(X|Y ) − E(X|Y ))(E(X|Y ) − EX) = 0.

Since

E([X − E(X|Y )]²) = E(E{[X − E(X|Y )]²|Y }) = E(Var(X|Y ))

and

E([E(X|Y ) − EX]²) = Var(E(X|Y )),

Theorem 4.2 is proved.
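The identity can be verified exactly on a small discrete example. In the sketch below the joint pmf is hypothetical (the numbers are made up for illustration), and exact rational arithmetic is used so that both sides can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}


def mean(pmf):
    """Expected value of a pmf given as {value: probability}."""
    return sum(v * pr for v, pr in pmf.items())


def variance(pmf):
    """Variance of a pmf given as {value: probability}."""
    m = mean(pmf)
    return sum((v - m) ** 2 * pr for v, pr in pmf.items())


# Marginal pmfs of X and Y.
marg_x, marg_y = {}, {}
for (x, y), pr in joint.items():
    marg_x[x] = marg_x.get(x, 0) + pr
    marg_y[y] = marg_y.get(y, 0) + pr

# Conditional pmf of X given Y = y, then E(X | Y = y) and Var(X | Y = y).
cond = {
    y: {x: pr / marg_y[y] for (x, yy), pr in joint.items() if yy == y}
    for y in marg_y
}
cond_mean = {y: mean(cond[y]) for y in marg_y}
cond_var = {y: variance(cond[y]) for y in marg_y}

ex = mean(marg_x)
e_of_var = sum(cond_var[y] * marg_y[y] for y in marg_y)             # E(Var(X|Y))
var_of_mean = sum((cond_mean[y] - ex) ** 2 * marg_y[y] for y in marg_y)  # Var(E(X|Y))

print(variance(marg_x), e_of_var + var_of_mean)
```

Both printed values should be the same fraction. The same dictionaries also confirm Theorem 4.1, since the weighted average of `cond_mean` equals `ex`.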

Example 4.3 (Beta-binomial hierarchy) One generalization of the binomial distribution is to allow the success probability to vary according to a distribution. A standard model for this situation is

X|P ∼ binomial(n, P ), P ∼ beta(α, β).

The mean of X is then

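$$
EX = E[E(X \mid P)] = E[nP] = \frac{n\alpha}{\alpha + \beta}.
$$

Since P ∼ beta(α, β),

$$
\mathrm{Var}(E(X \mid P)) = \mathrm{Var}(nP) = \frac{n^2 \alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.
$$

Also, since X|P is binomial(n, P ), Var(X|P ) = nP (1 − P ). We then have

$$
\begin{aligned}
E[\mathrm{Var}(X \mid P)] = nE[P(1 - P)] &= \frac{n\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 p(1 - p)\,p^{\alpha - 1}(1 - p)^{\beta - 1}\,dp \\
&= \frac{n\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + 1)\Gamma(\beta + 1)}{\Gamma(\alpha + \beta + 2)} = \frac{n\alpha\beta}{(\alpha + \beta)(\alpha + \beta + 1)}.
\end{aligned}
$$

Adding together the two pieces, we get

$$
\mathrm{Var}X = \frac{n\alpha\beta(\alpha + \beta + n)}{(\alpha + \beta)^2(\alpha + \beta + 1)}.
$$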
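As a numerical cross-check of the mean and variance formulas, the sketch below computes the moments directly from the marginal pmf of X. It relies on the standard closed form obtained by integrating the binomial(n, p) pmf against the beta(α, β) density, P(X = x) = C(n, x) B(α + x, β + n − x) / B(α, β), which is not derived in these notes. The values n = 10, α = 2, β = 3 are illustrative assumptions.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    """Marginal P(X = x) when X | P ~ binomial(n, P) and P ~ beta(a, b)."""
    return comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))


n, a, b = 10, 2.0, 3.0  # illustrative values, not taken from the text

pmf = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

# Closed forms from Example 4.3.
mean_formula = n * a / (a + b)
var_formula = n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))

print(mean, mean_formula)
print(var, var_formula)
```

For these values the closed forms give a mean of 4 and a variance of 6, compared with the binomial(10, 0.4) variance of 2.4. The extra spread comes from the Var(E(X|P )) term.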


5 Covariance and Correlation

In earlier sections, we have discussed the absence or presence of a relationship between two random variables, that is, independence or nonindependence. But if there is a relationship, it may be strong or weak. In this section, we discuss two numerical measures of the strength of a relationship between two random variables: the covariance and the correlation.

Throughout this section, we will use the notation EX = µ_X, EY = µ_Y, VarX = σ²_X, and VarY = σ²_Y.

Definition 5.1 The covariance of X and Y is the number defined by Cov(X, Y ) = E((X − µ_X)(Y − µ_Y)).

Definition 5.2 The correlation of X and Y is the number defined by

ρ_XY = Cov(X, Y ) / (σ_X σ_Y).

The value ρ_XY is also called the correlation coefficient.

Theorem 5.1 For any random variables X and Y ,

Cov(X, Y ) = EXY − µ_X µ_Y.
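A small discrete example shows Definition 5.1 and the shortcut of Theorem 5.1 giving the same number. The joint pmf below is hypothetical (made-up probabilities for two 0/1 variables), and the helper `expect` is an illustrative name. Exact fractions are used for the covariance, and the correlation of Definition 5.2 is computed in floating point.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(2, 5)}


def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Definition 5.1: Cov(X, Y) = E((X - mu_X)(Y - mu_Y)).
cov_def = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Theorem 5.1: Cov(X, Y) = EXY - mu_X mu_Y.
cov_short = expect(lambda x, y: x * y) - mu_x * mu_y

var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)

# Definition 5.2: rho_XY = Cov(X, Y) / (sigma_X sigma_Y).
rho = float(cov_def) / sqrt(float(var_x * var_y))

print(cov_def, cov_short, rho)
```

Here µ_X = 3/5, µ_Y = 1/2 and EXY = 2/5, so both routes give Cov(X, Y ) = 1/10.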
