3.4 Exponential Families

(1)

3.4 Exponential Families

A family of pdfs or pmfs is called an exponential family if it can be expressed as

f (x|θ) = h(x)c(θ) exp¡X^k

i=1

w_i(θ)t_i(x)¢

. (1)

Here h(x) ≥ 0 and t1(x), . . . , tk(x) are real-valued functions of the observation x (they cannot depend on θ), and c(θ) ≥ 0 and w₁(θ), . . . , w_k(θ) are real-valued functions of the possibly vector-valued parameter θ (they cannot depend on x). Many common families introduced in the previous section are exponential families. They include the continuous families—normal, gamma, and beta, and the discrete families—binomial, Poisson, and negative binomial.

Example 3.4.1 (Binomial exponential family)

Let n be a positive integer and consider the binomial(n, p) family with 0 < p < 1. Then the pmf for this family, for x = 0, 1, . . . , n and 0 < p < 1, is

f (x|p) = µn

x

¶

p^x(1 − p)^n−x

= µn

x

¶

(1 − p)ⁿ¡ p 1 − p

¢_x

= µn

x

¶

(1 − p)ⁿexp¡

log¡ p 1 − p

¢x¢ .

Define

h(x) =







¡_n

x

¢ x = 0, 1, . . . , n

0 otherwise, c(p) = (1 − p)ⁿ, 0 < p < 1, w₁(p) = log( p

1 − p), 0 < p < 1, and

t₁(x) = x.

Then we have

f (x|p) = h(x)c(p) exp{w₁(p)t₁(x)}.

1

(2)

Example 3.4.4 (Normal exponential family)

Let f (x|µ, σ²) be the N(µ, σ²) family of pdfs, where θ = (µ, σ²), −∞ < µ < ∞, σ > 0.

Then

f (x|µ, σ²) = 1

√2πσ exp¡

− (x − µ)² 2σ²

¢

= 1

√2πσ exp¡

− µ² 2σ²

¢exp¡

− x²

2σ² + µx σ²

¢.

Define

h(x) = 1 for all x;

c(θ) = 1

√2πσ exp¡

− µ² 2σ²

¢, −∞ < µ < ∞, σ > 0;

w1(θ) = 1

σ², σ > 0; w2(θ) = µ

σ², σ > 0;

t1(x) = −x²/2; and t2(x) = x.

Then

f (x|µ, σ²) = h(x)c(µ, σ) exp[w₁(µ, σ)t₁(x) + w₂(µ, σ)t₂(x)].

In general, the set of x values for which f (x|θ > 0 cannot depend on θ in an exponential family. For example, the set of pdfs given by

f (x|θ) = θ⁻¹exp(1 − x/θ), 0 < θ < x < ∞, is not an exponential family. We have

f (x|θ) = θ⁻¹exp(1 − x/θ)I_[θ,∞)(x).

The indicator function can not be incorporated into any of the functions of (1) since it is not a function of x alone, not a function of θ alone, and cannot be expressed as an exponential.

3.5 Location and Scale Families

Theorem 3.5.1

Let f (x) be any pdf and let µ and σ > 0 be any given constants. Then the function g(x|µ, σ) = 1

σf (x − µ σ ) is a pdf.

2

(3)

Proof: Since f (x) ≥ 0 for all values of x. So, ¹_σf (^x−µ_σ ) ≥ 0 for all values of x, µ and σ.

Next, Z _∞

−∞

1

σf (x − µ σ )dx =

Z _∞

−∞

f (y)dy = 1.

¤

Definition 3.5.2 Let f (x) be any pdf. Then the family of pdfs f (x − µ), indexed by the parameter µ, −∞ < µ < ∞, is called the location family with standard pdf f (x) and µ is called the location parameter for the family.

The location parameter µ simply shifts the pdf f (x) so that the shape of the graph is unchanged but the point on the graph that was above x = 0 for f (x) is above x = µ for f (x − µ).

Example 3.5.3 (Exponential location family)

Let f (x) = e^−x, x ≥ 0, and f (x) = 0, x < 0. To form a location family we replace x with x − µ to obtain

f (x|µ) =







e^−(x−µ) x − µ ≥ 0 0 x − µ < 0.

That is,

f (x|µ) =







e^−(x−µ) x ≥ µ 0 x < µ.

In this type of model, where µ denotes a bound on the range of X, µ is sometimes called a threshold parameter.

Definition 3.5.4

Let f (x) be any pdf. Then for any σ > 0, the family of pdfs (1/σ)f (x/σ), indexed by the parameter σ, is called the scale family with standard pdf f (x) and σ is called the scale parameter of the family.

The effect of introducing the scale parameter σ is either to stretch (σ > 1) or to contract (σ < 1) the graph of f (x) while still maintaining the same basic shape of the graph.

3

(4)

Definition 3.5.5

Let f (x) be any pdf. Then for any µ, −∞ < µ < ∞, and any σ > 0, the family of pdfs (1/σ)f ((x − µ)/σ), indexed by the parameter (µ, σ), is called the location-scale family with standard pdf f (x); µ is called the location parameter and σ is called the scale parameter.

The effect of introducing both the location and scale parameters is to stretch (σ > 1) or contract (σ < 1) the graph with the scale parameter and then shift the graph so that the point that was above 0 is now above µ. The normal and double exponential families are examples of location-scale families.

Theorem 3.5.6

Let f (·) be any pdf. Let µ be any real number, and let σ be any positive real number. Then X is a random variable with pdf (1/σ)f ((x − µ)/σ) if and only if there exists a random variable Z with pdf f (z) and X = σZ + µ.

Theorem 3.5.7

Let Z be a random variable with pdf f (z). Suppose EZ and VarZ exist. If X is a random variable with pdf (1/σ)f ((x − µ)/σ), then

EX = σEZ + µ and VarX = σ²VarZ.

In particular, if EZ = 0 and VarZ = 1, then EX = µ and VarX = σ².

4