2 Expected values


However, if F_X is constant on some interval, then F_X⁻¹ is not well defined by (2). The problem is avoided by defining F_X⁻¹(y) for 0 < y < 1 by

F_X⁻¹(y) = inf{x : F_X(x) ≥ y}. (3)

At the endpoints of the range of y, F_X⁻¹(1) = ∞ if F_X(x) < 1 for all x and, for any F_X, F_X⁻¹(0) = −∞.
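To make definition (3) concrete, here is a minimal Python sketch for a discrete cdf, which is flat between its jumps. The function name `generalized_inverse` and the three-point distribution are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def generalized_inverse(support, probs, y):
    """Return inf{x : F(x) >= y} for the discrete cdf F built from `probs`.

    `support` must be sorted in increasing order and `probs` holds the
    matching point masses (hypothetical inputs for illustration).
    """
    if not 0 < y < 1:
        raise ValueError("y must lie strictly between 0 and 1")
    cdf = list(accumulate(probs))  # F evaluated at each support point
    # First index whose cdf value is >= y; the tolerance guards rounding.
    idx = bisect_left(cdf, y - 1e-12)
    return support[min(idx, len(support) - 1)]


# Assumed example: P(X=0) = 0.25, P(X=1) = 0.5, P(X=3) = 0.25.
support = [0, 1, 3]
probs = [0.25, 0.5, 0.25]

for y in (0.25, 0.5, 0.75, 0.9):
    print(y, generalized_inverse(support, probs, y))
```

In this example F equals 0.75 on the whole interval [1, 3), and the infimum in (3) picks the left endpoint 1 for y = 0.75.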

Theorem 1.4 (Probability integral transformation) Let X have continuous cdf F_X(x) and define the random variable Y as Y = F_X(X). Then Y is uniformly distributed on (0, 1), that is, P (Y ≤ y) = y, 0 < y < 1.

Proof: For Y = F_X(X) we have, for 0 < y < 1,

$$
\begin{aligned}
P(Y \le y) &= P(F_X(X) \le y) \\
&= P\left(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)\right) \\
&= P\left(X \le F_X^{-1}(y)\right) \\
&= F_X\left(F_X^{-1}(y)\right) = y.
\end{aligned}
$$

At the endpoints we have P (Y ≤ y) = 1 for y ≥ 1 and P (Y ≤ y) = 0 for y ≤ 0, showing that Y has a uniform distribution.
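Theorem 1.4 can be checked by simulation. The sketch below is only an illustration under assumed settings: X is taken to be exponential with mean 2, and the sample size, seed, and grid of y values are arbitrary choices not taken from the notes.

```python
import math
import random


def exponential_cdf(x, scale):
    """cdf of the exponential distribution with mean `scale`."""
    return 1.0 - math.exp(-x / scale) if x >= 0 else 0.0


def transformed_cdf_estimates(n_samples=100_000, scale=2.0, seed=0):
    """Estimate P(Y <= y) for Y = F_X(X) at a few values of y."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, which is 1 / mean.
    ys = [exponential_cdf(rng.expovariate(1.0 / scale), scale)
          for _ in range(n_samples)]
    grid = [0.1, 0.25, 0.5, 0.75, 0.9]
    return {y: sum(v <= y for v in ys) / n_samples for y in grid}


estimates = transformed_cdf_estimates()
for y, p_hat in estimates.items():
    print(f"y = {y:4.2f}   estimated P(Y <= y) = {p_hat:.4f}")
```

If the theorem holds, each estimate should be close to the corresponding y, up to Monte Carlo error.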

The reasoning behind the equality

P (F_X⁻¹[F_X(X)] ≤ F_X⁻¹(y)) = P (X ≤ F_X⁻¹(y))

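is somewhat subtle and deserves additional attention. If F_X is strictly increasing, then it is true that F_X⁻¹(F_X(x)) = x. However, if F_X is flat, it may be that F_X⁻¹(F_X(x)) ≠ x. Suppose F_X is flat on an interval [x₁, x₂], where x₁ is the smallest point at which F_X takes this constant value, and let x ∈ [x₁, x₂]. Then F_X⁻¹(F_X(x)) = x₁. The probability equality still holds, since P(X ≤ x) = P(X ≤ x₁) for any x ∈ [x₁, x₂]: the flat cdf denotes a region of 0 probability, P(x₁ < X ≤ x) = F_X(x) − F_X(x₁) = 0.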
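A small Python sketch of the flat case may help. It assumes a hypothetical X that is uniform on [0, 1] ∪ [2, 3], so its cdf is continuous but flat on [1, 2]; the functions `cdf` and `cdf_inverse` are written by hand for this one example.

```python
def cdf(x):
    """Continuous cdf that is flat (equal to 1/2) on the interval [1, 2]."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x / 2
    if x <= 2:
        return 0.5
    if x <= 3:
        return (x - 1) / 2
    return 1.0


def cdf_inverse(y):
    """inf{x : cdf(x) >= y} for 0 < y < 1, worked out for this cdf."""
    if not 0 < y < 1:
        raise ValueError("y must lie strictly between 0 and 1")
    return 2 * y if y <= 0.5 else 2 * y + 1


x = 1.5                                # a point inside the flat region
print(cdf_inverse(cdf(x)))             # left endpoint of the flat region
print(cdf(x) == cdf(1.0))              # P(X <= 1.5) equals P(X <= 1)
print(cdf_inverse(cdf(0.4)), cdf_inverse(cdf(2.5)))  # recovered off the flat part
```

Inside the flat region the inverse returns the left endpoint rather than x itself, yet the two probabilities agree, which is all the proof needs.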

2 Expected values

Definition 2.1 The expected value or mean of a random variable g(X), denoted by Eg(X), is

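$$
E\,g(X) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ is continuous} \\[2ex]
\displaystyle\sum_{x \in \mathcal{X}} g(x) f_X(x) = \sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if } X \text{ is discrete,}
\end{cases}
$$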

provided that the integral or sum exists. If E|g(X)| = ∞, we say that Eg(X) does not exist.
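For the discrete branch of Definition 2.1, the sum can be evaluated directly. The following sketch assumes a fair six-sided die as the distribution of X (an example not taken from the notes) and uses exact fractions; `expectation` is a hypothetical helper name.

```python
from fractions import Fraction


def expectation(g, pmf):
    """E g(X) for a discrete X given as a dict {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


# Assumed example: a fair die, P(X = x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(lambda x: x, die)         # E X
mean_sq = expectation(lambda x: x * x, die)  # E X^2, i.e. g(x) = x^2
print(mean, mean_sq)
```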


Example 2.1 (Exponential mean) Suppose X has an exponential (λ) distribution, that is, it has pdf given by

$$
f_X(x) = \frac{1}{\lambda} e^{-x/\lambda}, \qquad 0 \le x < \infty, \quad \lambda > 0.
$$

Then EX is given by

$$
EX = \int_0^{\infty} x \, \frac{1}{\lambda} e^{-x/\lambda}\,dx = \lambda.
$$
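The integral in Example 2.1 can be evaluated by integration by parts. The step below is one way to fill in the computation, taking u = x and dv = (1/λ)e^{−x/λ} dx, so that v = −e^{−x/λ}:

$$
\begin{aligned}
EX &= \int_0^{\infty} x \, \frac{1}{\lambda} e^{-x/\lambda}\,dx \\
&= \left[ -x e^{-x/\lambda} \right]_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\,dx \\
&= 0 + \left[ -\lambda e^{-x/\lambda} \right]_0^{\infty} = \lambda.
\end{aligned}
$$

The boundary term vanishes because x e^{−x/λ} → 0 as x → ∞ for λ > 0.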

Example 2.2 (Binomial mean) If X has a binomial distribution, its pmf is given by

$$
P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n,
$$

where n is a positive integer, 0 ≤ p ≤ 1, and for every fixed pair n and p the pmf sums to 1.

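The expected value is

$$
\begin{aligned}
EX &= \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=1}^{n} x \binom{n}{x} p^x (1-p)^{n-x} \\
&= \sum_{x=1}^{n} n \binom{n-1}{x-1} p^x (1-p)^{n-x} && \left( x \binom{n}{x} = n \binom{n-1}{x-1} \right) \\
&= \sum_{y=0}^{n-1} n \binom{n-1}{y} p^{y+1} (1-p)^{n-(y+1)} && (\text{substitute } y = x - 1) \\
&= np \sum_{y=0}^{n-1} \binom{n-1}{y} p^{y} (1-p)^{n-1-y} \\
&= np,
\end{aligned}
$$

since the last sum adds a binomial(n − 1, p) pmf over all of its values and therefore equals 1.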
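The result EX = np and the identity used in the derivation can be verified exactly for particular values. This is a minimal sketch; the values n = 10, p = 3/10 and the helper name `binomial_mean` are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_mean(n, p):
    """Sum x * P(X = x) over x = 0, ..., n for a binomial(n, p) pmf."""
    return sum(x * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))


n, p = 10, Fraction(3, 10)  # assumed example values
print(binomial_mean(n, p), n * p)

# The identity x * C(n, x) = n * C(n-1, x-1) used in the second step.
identity_holds = all(x * comb(n, x) == n * comb(n - 1, x - 1)
                     for x in range(1, n + 1))
print(identity_holds)
```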

Example 2.3 (Cauchy mean) A classic example of a random variable whose expected value does not exist is a Cauchy random variable, that is, one with pdf

$$
f_X(x) = \frac{1}{\pi} \, \frac{1}{1 + x^2}, \qquad -\infty < x < \infty.
$$

It is straightforward to check that ∫_{−∞}^{∞} f_X(x) dx = 1, but E|X| = ∞. Write

$$
E|X| = \int_{-\infty}^{\infty} \frac{|x|}{\pi} \, \frac{1}{1 + x^2}\,dx = \frac{2}{\pi} \int_0^{\infty} \frac{x}{1 + x^2}\,dx.
$$

For any positive number M,

$$
\int_0^{M} \frac{x}{1 + x^2}\,dx = \left. \frac{1}{2} \log(1 + x^2) \right|_0^{M} = \frac{1}{2} \log(1 + M^2).
$$

Thus,

$$
E|X| = \frac{1}{\pi} \lim_{M \to \infty} \log(1 + M^2) = \infty
$$

and EX does not exist.
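The divergence is slow, which a short numerical sketch can illustrate. The code below evaluates the truncated integral (2/π)∫₀^M x/(1 + x²) dx in closed form and cross-checks it with a midpoint rule; the function names and the step count are assumptions made for this illustration.

```python
import math


def truncated_abs_mean(M):
    """(2/pi) * integral from 0 to M of x/(1+x^2) dx = log(1 + M^2) / pi."""
    return math.log1p(M * M) / math.pi


def midpoint_check(M, steps=200_000):
    """Midpoint-rule approximation of the same truncated integral."""
    h = M / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x / (1.0 + x * x)
    return 2.0 / math.pi * total * h


for M in (1e1, 1e3, 1e6, 1e9):
    print(f"M = {M:.0e}   truncated E|X| = {truncated_abs_mean(M):.3f}")
```

The truncated values keep growing with M, only logarithmically, so no finite limit is reached.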


Theorem 2.1 Let X be a random variable and let a, b, and c be constants. Then for any functions g₁(x) and g₂(x) whose expectations exist,

a. E(ag₁(X) + bg₂(X) + c) = aEg₁(X) + bEg₂(X) + c.

b. If g₁(x) ≥ 0 for all x, then Eg₁(X) ≥ 0.

c. If g₁(x) ≥ g₂(x) for all x, then Eg₁(X) ≥ Eg₂(X).

d. If a ≤ g₁(x) ≤ b for all x, then a ≤ Eg₁(X) ≤ b.
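Parts a and d of Theorem 2.1 can be checked on a small discrete example. Everything in the sketch below (the pmf, the constants, and the functions g1 and g2) is hypothetical and chosen only so that the arithmetic is exact.

```python
from fractions import Fraction


def expectation(g, pmf):
    """E g(X) for a discrete X given as a dict {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


# Hypothetical pmf, constants, and functions for illustration.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
a, b, c = 2, -3, 5


def g1(x):
    return x * x


def g2(x):
    return x + 1


# Part a: linearity of expectation.
lhs = expectation(lambda x: a * g1(x) + b * g2(x) + c, pmf)
rhs = a * expectation(g1, pmf) + b * expectation(g2, pmf) + c
print(lhs, rhs)

# Part d: bounds on g1 over the support carry over to E g1(X).
lower = min(g1(x) for x in pmf)
upper = max(g1(x) for x in pmf)
mean_g1 = expectation(g1, pmf)
print(lower, mean_g1, upper)
```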

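Example 2.4 (Minimizing distance) Find the value of b which minimizes the distance E(X − b)². Adding and subtracting EX gives

$$
\begin{aligned}
E(X - b)^2 &= E(X - EX + EX - b)^2 \\
&= E(X - EX)^2 + (EX - b)^2 + 2E\big((X - EX)(EX - b)\big) \\
&= E(X - EX)^2 + (EX - b)^2,
\end{aligned}
$$

where the cross term vanishes because EX − b is a constant and E(X − EX) = 0. The first term does not depend on b, and the second is nonnegative and equals 0 at b = EX. Hence E(X − b)² is minimized by choosing b = EX.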
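The decomposition in Example 2.4 can be confirmed numerically. This sketch assumes a hypothetical three-point pmf and searches a small grid of candidate values of b; it illustrates the result and does not replace the argument.

```python
from fractions import Fraction


def expectation(g, pmf):
    """E g(X) for a discrete X given as a dict {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


# Hypothetical pmf for illustration; its mean works out to 1.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
mean = expectation(lambda x: x, pmf)
variance = expectation(lambda x: (x - mean) ** 2, pmf)


def mean_squared_distance(b):
    """E(X - b)^2 for the pmf above."""
    return expectation(lambda x: (x - b) ** 2, pmf)


# Candidate values of b on a grid from -2 to 4 in steps of 1/4.
candidates = [Fraction(k, 4) for k in range(-8, 17)]
best = min(candidates, key=mean_squared_distance)
print(mean, best, mean_squared_distance(best))
```

On this grid the minimizer coincides with EX, and the minimum value is E(X − EX)².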

When evaluating expectations of nonlinear functions of X, we can proceed in one of two ways.

From the definition of Eg(X), we could directly calculate

$$
E\,g(X) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.
$$

But we could also find the pdf f_Y(y) of Y = g(X) and we would have

$$
E\,g(X) = EY = \int_{-\infty}^{\infty} y f_Y(y)\,dy.
$$
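As a worked illustration of the two routes, assume X is uniform on (0, 1) and g(x) = x² (an example not taken from the notes). Directly,

$$
E\,g(X) = \int_0^1 x^2 \cdot 1\,dx = \frac{1}{3}.
$$

Alternatively, Y = X² has cdf P(Y ≤ y) = P(X ≤ √y) = √y for 0 < y < 1, so f_Y(y) = 1/(2√y) on (0, 1), and

$$
EY = \int_0^1 y \, \frac{1}{2\sqrt{y}}\,dy = \frac{1}{2} \int_0^1 y^{1/2}\,dy = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}.
$$

Both calculations give the same value, as they must.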

3 Moments and moment generating functions

Definition 3.1 For each integer n, the nth moment of X (or F_X(x)), µ′_n, is

µ′_n = EXⁿ.

The nth central moment of X, µ_n, is

µ_n = E(X − µ)ⁿ, where µ = µ′₁ = EX.

Theorem 3.1 The variance of a random variable X is its second central moment, VarX = E(X − EX)². The positive square root of VarX is the standard deviation of X.
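The definitions of moments, central moments, and variance translate directly into a short computation for a discrete distribution. The pmf and the helper names `moment` and `central_moment` below are hypothetical; the last check uses the standard identity VarX = EX² − (EX)², which follows from expanding the square.

```python
import math
from fractions import Fraction


def moment(pmf, n):
    """nth moment E X^n of a discrete X given as {value: probability}."""
    return sum(x**n * p for x, p in pmf.items())


def central_moment(pmf, n):
    """nth central moment E (X - mu)^n, with mu = E X."""
    mu = moment(pmf, 1)
    return sum((x - mu) ** n * p for x, p in pmf.items())


# Hypothetical pmf for illustration.
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 2), 2: Fraction(1, 4)}

mu = moment(pmf, 1)
variance = central_moment(pmf, 2)   # second central moment
std_dev = math.sqrt(variance)       # positive square root of the variance
print(mu, variance, std_dev)
print(variance == moment(pmf, 2) - mu**2)
```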
