Multivariate Distribution


The random vector X = (X₁, . . . , X_n) has a sample space that is a subset of Rⁿ. If X is a discrete random vector, then the joint pmf of X is the function defined by f(x) = f(x₁, . . . , x_n) = P(X₁ = x₁, . . . , X_n = x_n) for each (x₁, . . . , x_n) ∈ Rⁿ. Then for any A ⊂ Rⁿ,

$$P(X \in A) = \sum_{x \in A} f(x).$$

If X is a continuous random vector, the joint pdf of X is a function f (x1, . . . , xn) that satisfies

$$P(X \in A) = \int \cdots \int_A f(x)\,dx = \int \cdots \int_A f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$

Let g(x) = g(x₁, . . . , x_n) be a real-valued function defined on the sample space of X. Then g(X) is a random variable and the expected value of g(X) is

$$E\,g(X) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

and

$$E\,g(X) = \sum_{x \in \mathbb{R}^n} g(x) f(x)$$

in the continuous and discrete cases, respectively.
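The discrete formulas for P(X ∈ A) and Eg(X) can be sketched in a few lines of Python. This is only an illustration: the joint pmf (two independent fair dice) and the helper names `prob` and `expect` are assumptions made for the example, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf of X = (X1, X2): two independent fair dice.
pmf = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

def prob(pmf, event):
    """P(X in A): sum f(x) over the support points x that lie in A."""
    return sum(p for x, p in pmf.items() if event(x))

def expect(pmf, g):
    """E g(X): sum g(x) f(x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())

# A = {x : x1 + x2 = 7} and g(x) = x1 * x2
p_sum_is_7 = prob(pmf, lambda x: x[0] + x[1] == 7)
mean_product = expect(pmf, lambda x: x[0] * x[1])
print(p_sum_is_7, mean_product)
```

Exact fractions are used so that the sums can be compared with hand calculations without rounding error.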

The marginal distribution of (X₁, . . . , X_k), the first k coordinates of (X₁, . . . , X_n), is given by the pdf or pmf

$$f(x_1, \ldots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_{k+1} \cdots dx_n$$

or

$$f(x_1, \ldots, x_k) = \sum_{(x_{k+1}, \ldots, x_n) \in \mathbb{R}^{n-k}} f(x_1, \ldots, x_n)$$

for every (x₁, . . . , x_k) ∈ R^k.

If f(x₁, . . . , x_k) > 0, the conditional pdf or pmf of (X_{k+1}, . . . , X_n) given X₁ = x₁, . . . , X_k = x_k is the function of (x_{k+1}, . . . , x_n) defined by

$$f(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k) = \frac{f(x_1, \ldots, x_n)}{f(x_1, \ldots, x_k)}.$$
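For a discrete vector, the marginal and conditional pmfs can be computed directly from a table of the joint pmf. The sketch below assumes a made-up joint pmf on {0, 1}³; the function names `marginal` and `conditional` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint pmf of (X1, X2, X3) on {0,1}^3 (values sum to 1).
joint = {
    (0, 0, 0): Fraction(1, 8),  (0, 0, 1): Fraction(1, 8),
    (0, 1, 0): Fraction(1, 16), (0, 1, 1): Fraction(3, 16),
    (1, 0, 0): Fraction(1, 4),  (1, 0, 1): Fraction(0),
    (1, 1, 0): Fraction(1, 8),  (1, 1, 1): Fraction(1, 8),
}

def marginal(joint, k):
    """Marginal pmf of the first k coordinates: sum out x_{k+1}, ..., x_n."""
    out = defaultdict(Fraction)
    for x, p in joint.items():
        out[x[:k]] += p
    return dict(out)

def conditional(joint, k, given):
    """Conditional pmf of (X_{k+1}, ..., X_n) given the first k coordinates."""
    given = tuple(given)
    denom = marginal(joint, k).get(given, Fraction(0))
    if denom == 0:
        raise ValueError("conditional pmf is undefined where the marginal is 0")
    return {x[k:]: p / denom for x, p in joint.items() if x[:k] == given}

marg_x1 = marginal(joint, 1)            # pmf of X1
cond = conditional(joint, 1, (0,))      # pmf of (X2, X3) given X1 = 0
print(marg_x1, cond)
```

The conditional pmf is the joint pmf restricted to the given values and rescaled by the marginal, so its values sum to 1.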


Example 4.6.1 (Multivariate pdfs) Let n = 4 and

$$f(x_1, x_2, x_3, x_4) = \begin{cases} \frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) & 0 < x_i < 1,\ i = 1, 2, 3, 4 \\ 0 & \text{otherwise.} \end{cases}$$

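The joint pdf can be used to compute probabilities such as

$$P\left(X_1 < \tfrac{1}{2},\ X_2 < \tfrac{3}{4},\ X_4 > \tfrac{1}{2}\right) = \int_{1/2}^{1} \int_{0}^{1} \int_{0}^{3/4} \int_{0}^{1/2} \tfrac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) dx_1\,dx_2\,dx_3\,dx_4 = \frac{171}{1024}.$$

The marginal pdf of (X₁, X₂) is

$$f(x_1, x_2) = \int_{0}^{1} \int_{0}^{1} \tfrac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) dx_3\,dx_4 = \tfrac{3}{4}\left(x_1^2 + x_2^2\right) + \tfrac{1}{2}$$

for 0 < x₁ < 1 and 0 < x₂ < 1.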
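The box probability in Example 4.6.1 can be checked with exact arithmetic. The sketch below relies on the fact that, over a box inside the unit cube, the integral of each x_i² term factors into a one-dimensional integral times the side lengths of the other coordinates. The function name `box_probability` is a hypothetical helper, not something defined in the notes.

```python
from fractions import Fraction as F

def box_probability(bounds):
    """Integrate f(x) = (3/4) * (x1^2 + ... + x4^2) over the box
    [a1, b1] x ... x [a4, b4], assumed to lie inside the unit cube."""
    total = F(0)
    for i, (a, b) in enumerate(bounds):
        term = (b**3 - a**3) / 3          # integral of x_i^2 over [a_i, b_i]
        for j, (c, d) in enumerate(bounds):
            if j != i:
                term *= d - c             # other coordinates contribute their lengths
        total += term
    return F(3, 4) * total

# Event {X1 < 1/2, X2 < 3/4, X4 > 1/2}; X3 is unrestricted on (0, 1).
bounds = [(F(0), F(1, 2)), (F(0), F(3, 4)), (F(0), F(1)), (F(1, 2), F(1))]
p = box_probability(bounds)
print(p)  # Fraction(171, 1024)

# Sanity check: the pdf integrates to 1 over the whole unit cube.
total_mass = box_probability([(F(0), F(1))] * 4)
```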

Definition 4.6.2 Let n and m be positive integers and let p₁, . . . , p_n be numbers satisfying 0 ≤ p_i ≤ 1, i = 1, . . . , n, and $\sum_{i=1}^{n} p_i = 1$. Then the random vector (X₁, . . . , X_n) has a multinomial distribution with m trials and cell probabilities p₁, . . . , p_n if the joint pmf of (X₁, . . . , X_n) is

$$f(x_1, \ldots, x_n) = \frac{m!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n} = m! \prod_{i=1}^{n} \frac{p_i^{x_i}}{x_i!}$$

on the set of (x₁, . . . , x_n) such that each x_i is a nonnegative integer and $\sum_{i=1}^{n} x_i = m$.

Example 4.6.3 (Multivariate pmf) Consider tossing a six-sided die 10 times. Suppose the die is unbalanced so that the probability of observing an i is i/21. Now consider the vector (X₁, . . . , X₆), where X_i counts the number of times i comes up in the 10 tosses.

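Then (X₁, . . . , X₆) has a multinomial distribution with m = 10 and cell probabilities p₁ = 1/21, . . . , p₆ = 6/21. For example, the probability of the vector (0, 0, 1, 2, 3, 4) is

$$f(0, 0, 1, 2, 3, 4) = \frac{10!}{0!\,0!\,1!\,2!\,3!\,4!} \left(\tfrac{1}{21}\right)^{0} \left(\tfrac{2}{21}\right)^{0} \left(\tfrac{3}{21}\right)^{1} \left(\tfrac{4}{21}\right)^{2} \left(\tfrac{5}{21}\right)^{3} \left(\tfrac{6}{21}\right)^{4} = 0.0059.$$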
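The multinomial probability in this example is easy to reproduce numerically. The following is a minimal sketch using only the Python standard library; `multinomial_pmf` is a hypothetical helper name chosen for the illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial pmf m!/(x1!...xn!) * p1^x1 ... pn^xn with m = sum(counts)."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)             # each division is exact
    return coef * prod(p**x for p, x in zip(probs, counts))

# Unbalanced die: P(face i) = i/21, tossed m = 10 times.
probs = [i / 21 for i in range(1, 7)]
value = multinomial_pmf((0, 0, 1, 2, 3, 4), probs)
print(round(value, 4))
```

The multinomial coefficient here is 10!/(1!·2!·3!·4!) = 12600, and the result agrees with the value 0.0059 quoted to four decimal places.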

The factor m!/(x₁! · · · x_n!) is called a multinomial coefficient. It is the number of ways that m objects can be divided into n groups with x₁ in the first group, x₂ in the second group, . . ., and x_n in the nth group.


Theorem 4.6.4 (Multinomial Theorem)

Let m and n be positive integers. Let A be the set of vectors x = (x₁, . . . , x_n) such that each x_i is a nonnegative integer and $\sum_{i=1}^{n} x_i = m$. Then, for any real numbers p₁, . . . , p_n,

$$(p_1 + \cdots + p_n)^m = \sum_{x \in A} \frac{m!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n}.$$
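The Multinomial Theorem can be spot-checked by enumerating the set A explicitly. The sketch below uses exact rational arithmetic and arbitrarily chosen values of p (including a negative one, since the theorem holds for any real numbers); the helper names `compositions` and `multinomial_sum` are assumptions introduced for the example.

```python
from fractions import Fraction
from math import factorial

def compositions(m, n):
    """Yield every tuple of n nonnegative integers that sums to m (the set A)."""
    if n == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, n - 1):
            yield (first,) + rest

def multinomial_sum(p, m):
    """Right-hand side of the Multinomial Theorem."""
    total = Fraction(0)
    for x in compositions(m, len(p)):
        coef = factorial(m)
        term = Fraction(1)
        for xi, pi in zip(x, p):
            coef //= factorial(xi)
            term *= pi ** xi
        total += coef * term
    return total

p = [Fraction(1, 2), Fraction(-1, 3), Fraction(2)]   # arbitrary test values
m = 4
lhs = sum(p) ** m
rhs = multinomial_sum(p, m)
print(lhs == rhs)
```

Taking the p_i to be cell probabilities that sum to 1 shows that the multinomial pmf sums to 1 over A.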

Definition 4.6.5 Let X1, . . . , Xn be random vectors with joint pdf or pmf f (x1, . . . , xn).

Let f_{X_i}(x_i) denote the marginal pdf or pmf of X_i. Then X₁, . . . , X_n are called mutually independent random vectors if, for every (x₁, . . . , x_n),

$$f(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n) = \prod_{i=1}^{n} f_{X_i}(x_i).$$

If the X_i’s are all one-dimensional, then X₁, . . . , X_n are called mutually independent random variables.
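For discrete random variables with a finite support, the definition can be checked mechanically: compute each marginal pmf and compare the joint pmf with the product of marginals at every point. The sketch below assumes one-dimensional components and uses a hypothetical helper `is_mutually_independent`; the second example is the standard one of three variables that are pairwise independent but not mutually independent.

```python
from fractions import Fraction
from itertools import product

def coordinate_marginal(joint, i):
    """Marginal pmf of the i-th coordinate."""
    out = {}
    for x, p in joint.items():
        out[x[i]] = out.get(x[i], Fraction(0)) + p
    return out

def is_mutually_independent(joint):
    """True if f(x1,...,xn) equals the product of marginals at every point."""
    n = len(next(iter(joint)))
    margs = [coordinate_marginal(joint, i) for i in range(n)]
    for x in product(*(m.keys() for m in margs)):
        expected = Fraction(1)
        for xi, m in zip(x, margs):
            expected *= m[xi]
        if joint.get(x, Fraction(0)) != expected:
            return False
    return True

# Independent: joint pmf built as a product of two Bernoulli marginals.
p1 = {0: Fraction(2, 3), 1: Fraction(1, 3)}
p2 = {0: Fraction(3, 4), 1: Fraction(1, 4)}
indep = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}

# Not mutually independent: X1, X2 fair coins and X3 = X1 XOR X2.
xor = {(a, b, a ^ b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

print(is_mutually_independent(indep), is_mutually_independent(xor))
```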

Mutually independent random variables have many nice properties. The proofs of the following theorems are analogous to the proofs of their counterparts in Sections 4.2 and 4.3.

Theorem 4.6.6 (Generalization of Theorem 4.2.10)

Let X₁, . . . , X_n be mutually independent random variables. Let g₁, . . . , g_n be real-valued functions such that g_i(x_i) is a function only of x_i, i = 1, . . . , n. Then

E(g₁(X₁) · · · g_n(X_n)) = (Eg₁(X₁)) · · · (Eg_n(X_n)).
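The product rule for expectations can be verified exactly on a small discrete example. In the sketch below the three marginal pmfs and the functions g₁, g₂, g₃ are made up for illustration; independence is imposed by constructing the joint pmf as the product of the marginals.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical marginal pmfs of three mutually independent random variables.
marginals = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {1: Fraction(1, 3), 2: Fraction(2, 3)},
    {-1: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)},
]
# Hypothetical functions g_i, each depending only on its own argument.
gs = [lambda x: x + 1, lambda x: x * x, lambda x: 2 * x - 1]

def expectation_of_product(marginals, gs):
    """E(g1(X1)...gn(Xn)) computed from the joint pmf (product of marginals)."""
    total = Fraction(0)
    for x in product(*(m.keys() for m in marginals)):
        weight = prod(m[xi] for m, xi in zip(marginals, x))
        total += weight * prod(g(xi) for g, xi in zip(gs, x))
    return total

def product_of_expectations(marginals, gs):
    """(E g1(X1)) ... (E gn(Xn)) computed one marginal at a time."""
    return prod(sum(g(x) * p for x, p in m.items()) for m, g in zip(marginals, gs))

lhs = expectation_of_product(marginals, gs)
rhs = product_of_expectations(marginals, gs)
print(lhs, rhs)
```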

Theorem 4.6.7 (Generalization of Theorem 4.2.12)

Let X₁, . . . , X_n be mutually independent random variables with mgfs M_X₁(t), . . . , M_X_n(t).

Let Z = X₁+ · · · + X_n. Then the mgf of Z is

M_Z(t) = M_X₁(t) · · · M_X_n(t).

In particular, if X₁, . . . , X_n all have the same distribution with mgf M_X(t), then M_Z(t) = (M_X(t))ⁿ.


Example 4.6.8 (Mgf of a sum of gamma variables)

Suppose X₁, . . . , X_n are mutually independent random variables, and the distribution of X_i is gamma(α_i, β). Thus, if Z = X₁+ . . . + X_n, the mgf of Z is

$$M_Z(t) = M_{X_1}(t) \cdots M_{X_n}(t) = (1 - \beta t)^{-\alpha_1} \cdots (1 - \beta t)^{-\alpha_n} = (1 - \beta t)^{-(\alpha_1 + \cdots + \alpha_n)}.$$

This is the mgf of a gamma(α₁ + · · · + α_n, β) distribution. Thus, the sum of independent gamma random variables that have a common scale parameter β also has a gamma distribution.
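A rough simulation can illustrate this result. The sketch below draws independent gamma variables with Python's `random.gammavariate(alpha, beta)`, whose second argument is the scale parameter, and compares the empirical mgf of the sum with (1 − βt)^−(α₁+···+α_n) at a single point t < 1/β. The shape values, the seed, the sample size, and the choice t = 0.1 are arbitrary assumptions; agreement is only approximate because it is a Monte Carlo estimate.

```python
import math
import random

alphas = [0.5, 1.5, 2.0]   # assumed shape parameters
beta = 2.0                 # common scale parameter
t = 0.1                    # must satisfy t < 1/beta for the mgf to exist
n_draws = 100_000

rng = random.Random(0)     # fixed seed so the run is reproducible
z = [sum(rng.gammavariate(a, beta) for a in alphas) for _ in range(n_draws)]

# Empirical mgf of Z versus the mgf of a gamma(sum(alphas), beta) distribution.
emp_mgf = sum(math.exp(t * zi) for zi in z) / n_draws
exact_mgf = (1.0 - beta * t) ** (-sum(alphas))

# The sample mean should be near the gamma mean sum(alphas) * beta.
sample_mean = sum(z) / n_draws
print(emp_mgf, exact_mgf, sample_mean)
```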

Example

Let X₁, . . . , X_n be mutually independent random variables with X_i ∼ N(µ_i, σ_i²). Let a₁, . . . , a_n and b₁, . . . , b_n be fixed constants. Then

$$Z = \sum_{i=1}^{n} (a_i X_i + b_i) \sim N\left(\sum_{i=1}^{n} (a_i \mu_i + b_i),\ \sum_{i=1}^{n} a_i^2 \sigma_i^2\right).$$
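The parameters of the resulting normal distribution are simple sums, as the following sketch shows. The function name `linear_combination_params` and the numerical inputs are hypothetical and serve only as an example.

```python
def linear_combination_params(mus, sigma2s, a, b):
    """Mean and variance of Z = sum(a_i * X_i + b_i) for independent
    X_i ~ N(mu_i, sigma_i^2)."""
    mean = sum(ai * mu + bi for ai, mu, bi in zip(a, mus, b))
    var = sum(ai**2 * s2 for ai, s2 in zip(a, sigma2s))
    return mean, var

# Assumed example values: three independent normal variables.
mus = [1.0, -2.0, 0.5]
sigma2s = [4.0, 1.0, 9.0]
a = [2.0, -1.0, 0.5]
b = [0.0, 3.0, 1.0]

mean, var = linear_combination_params(mus, sigma2s, a, b)
print(mean, var)  # 8.25 19.25
```

The constants b_i shift the mean but do not affect the variance, and the signs of the a_i do not matter for the variance because they enter squared.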

Theorem 4.6.11 (Generalization of Lemma 4.2.7)

Let X₁, . . . , X_n be random vectors. Then X₁, . . . , X_n are mutually independent random vectors if and only if there exist functions g_i(x_i), i = 1, . . . , n, such that the joint pdf or pmf of (X₁, . . . , X_n) can be written as

f (x₁, . . . , x_n) = g₁(x₁) · · · g_n(x_n).

Theorem 4.6.12 (Generalization of Theorem 4.3.5)

Let X₁, . . . , X_n be mutually independent random vectors. Let g_i(x_i) be a function only of x_i, i = 1, . . . , n. Then the random vectors U_i = g_i(X_i), i = 1, . . . , n, are mutually independent.

Let (X₁, . . . , X_n) be a random vector with pdf f_X(x₁, . . . , x_n). Let A = {x : f_X(x) > 0}.

Consider a new random vector (U₁, . . . , U_n), defined by U₁ = g₁(X₁, . . . , X_n), . . ., U_n = g_n(X₁, . . . , X_n). Suppose that A₀, A₁, . . . , A_k form a partition of A with these properties.

The set A₀, which may be empty, satisfies P((X₁, . . . , X_n) ∈ A₀) = 0. The transformation (U₁, . . . , U_n) = (g₁(X), . . . , g_n(X)) is a one-to-one transformation from A_i onto B for each i = 1, 2, . . . , k. Then for each i, the inverse functions from B to A_i can be found. Denote the ith inverse by x₁ = h_1i(u₁, . . . , u_n), . . . , x_n = h_ni(u₁, . . . , u_n). Let J_i denote the Jacobian computed from the ith inverse. That is,

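$$J_i = \begin{vmatrix}
\dfrac{\partial h_{1i}(u)}{\partial u_1} & \dfrac{\partial h_{1i}(u)}{\partial u_2} & \cdots & \dfrac{\partial h_{1i}(u)}{\partial u_n} \\
\dfrac{\partial h_{2i}(u)}{\partial u_1} & \dfrac{\partial h_{2i}(u)}{\partial u_2} & \cdots & \dfrac{\partial h_{2i}(u)}{\partial u_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial h_{ni}(u)}{\partial u_1} & \dfrac{\partial h_{ni}(u)}{\partial u_2} & \cdots & \dfrac{\partial h_{ni}(u)}{\partial u_n}
\end{vmatrix},$$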

the determinant of an n×n matrix. Assuming that these Jacobians do not vanish identically on B, we have the following representation of the joint pdf, f_U(u₁, . . . , u_n), for u ∈ B:

$$f_U(u_1, \ldots, u_n) = \sum_{i=1}^{k} f_X\big(h_{1i}(u_1, \ldots, u_n), \ldots, h_{ni}(u_1, \ldots, u_n)\big)\,|J_i|.$$
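A one-dimensional instance (n = 1, k = 2) shows how the sum over the partition works. The example is an assumption chosen for illustration: X ∼ N(0, 1) and U = X², with A₀ = {0}, A₁ = (−∞, 0), A₂ = (0, ∞), and B = (0, ∞). The inverses are h₁(u) = −√u and h₂(u) = √u, with Jacobians −1/(2√u) and 1/(2√u). The sketch checks numerically that the formula reproduces the chi-squared density with one degree of freedom.

```python
import math

def normal_pdf(x):
    """Standard normal pdf f_X."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Each branch is (inverse h_i, Jacobian J_i); for n = 1 the Jacobian is dh_i/du.
branches = [
    (lambda u: -math.sqrt(u), lambda u: -0.5 / math.sqrt(u)),  # inverse onto A1
    (lambda u: math.sqrt(u), lambda u: 0.5 / math.sqrt(u)),    # inverse onto A2
]

def transformed_pdf(u):
    """f_U(u) = sum over branches of f_X(h_i(u)) * |J_i|, for u in B = (0, inf)."""
    if u <= 0:
        return 0.0
    return sum(normal_pdf(h(u)) * abs(jac(u)) for h, jac in branches)

def chi2_1_pdf(u):
    """Chi-squared pdf with 1 degree of freedom, for u > 0."""
    return math.exp(-0.5 * u) / math.sqrt(2.0 * math.pi * u)

for u in (0.1, 1.0, 2.5):
    print(u, transformed_pdf(u), chi2_1_pdf(u))
```

Both branches contribute the same amount here because the normal pdf is symmetric about 0; for a non-symmetric f_X the two terms of the sum would differ.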