However, if $F_X$ is constant on some interval, then $F_X^{-1}$ is not well defined by (2). The problem is avoided by defining $F_X^{-1}(y)$ for $0 < y < 1$ by
\[ F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}. \tag{3} \]
At the endpoints of the range of $y$, $F_X^{-1}(1) = \infty$ if $F_X(x) < 1$ for all $x$ and, for any $F_X$, $F_X^{-1}(0) = -\infty$.
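As an illustration of definition (3), the following minimal sketch evaluates the generalized inverse for a step cdf, which is constant between its jump points. The function name `generalized_inverse` and the three-point distribution are hypothetical choices made only for this example.

```python
from bisect import bisect_left


def generalized_inverse(support, cdf_values, y):
    """Return inf{x : F(x) >= y} for a step cdf F with jumps at `support`.

    `support` is sorted increasingly and cdf_values[i] = F(support[i]),
    so the last entry of `cdf_values` must equal 1.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    # bisect_left returns the first index i with cdf_values[i] >= y.
    i = bisect_left(cdf_values, y)
    return support[i]


# Hypothetical example: P(X=0) = 0.25, P(X=1) = 0.5, P(X=2) = 0.25.
support = [0, 1, 2]
cdf_values = [0.25, 0.75, 1.0]

for y in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(y, generalized_inverse(support, cdf_values, y))
```

Here the cdf equals 0.25 on all of $[0, 1)$, yet the infimum in (3) picks out the single value 0 for $y = 0.25$, so the inverse is well defined even on the flat stretch.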
Theorem 1.4 (Probability integral transformation) Let $X$ have continuous cdf $F_X(x)$ and define the random variable $Y$ as $Y = F_X(X)$. Then $Y$ is uniformly distributed on $(0, 1)$, that is, $P(Y \le y) = y$, $0 < y < 1$.
Proof: For $Y = F_X(X)$ we have, for $0 < y < 1$,
\begin{align*}
P(Y \le y) &= P(F_X(X) \le y) \\
&= P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) \\
&= P(X \le F_X^{-1}(y)) \\
&= F_X(F_X^{-1}(y)) = y.
\end{align*}
At the endpoints we have $P(Y \le y) = 1$ for $y \ge 1$ and $P(Y \le y) = 0$ for $y \le 0$, showing that $Y$ has a uniform distribution.
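The reasoning behind the equality
\[ P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y)) \]
is somewhat subtle and deserves additional attention. If $F_X$ is strictly increasing, then it is true that $F_X^{-1}(F_X(x)) = x$. However, if $F_X$ is flat on an interval $[x_1, x_2]$, where $x_1$ is the smallest point at which $F_X$ takes that value, it may be that $F_X^{-1}(F_X(x)) \ne x$: for any $x \in [x_1, x_2]$, the infimum in the definition of $F_X^{-1}$ gives $F_X^{-1}(F_X(x)) = x_1$. The equality of probabilities still holds, since $P(X \le x) = P(X \le x_1)$ for any $x \in [x_1, x_2]$. A flat stretch of the cdf marks a region of probability 0: $P(x_1 < X \le x) = F_X(x) - F_X(x_1) = 0$.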
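The probability integral transformation can be checked by simulation. The sketch below is only an illustration under assumed choices: it takes $X$ to be exponential with mean `lam`, whose cdf is $1 - e^{-x/\lambda}$, and compares the empirical frequency of $\{Y \le y\}$ with $y$. The function names, the seed, and the sample size are hypothetical.

```python
import math
import random


def exponential_cdf(x, lam):
    """cdf of the exponential distribution with mean lam."""
    return 1.0 - math.exp(-x / lam) if x >= 0 else 0.0


def empirical_cdf_of_transform(lam, n_draws, ys, seed=0):
    """Estimate P(F_X(X) <= y) for each y in `ys` from simulated draws of X."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, i.e. 1 / mean.
    transformed = [
        exponential_cdf(rng.expovariate(1.0 / lam), lam) for _ in range(n_draws)
    ]
    return {y: sum(t <= y for t in transformed) / n_draws for y in ys}


result = empirical_cdf_of_transform(lam=2.0, n_draws=100_000, ys=(0.1, 0.25, 0.5, 0.9))
for y, freq in result.items():
    print(f"P(Y <= {y}) is approximately {freq:.3f}")
```

If the theorem applies, each estimated frequency should be close to the corresponding $y$, up to simulation error.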
2 Expected values
Definition 2.1 The expected value or mean of a random variable $g(X)$, denoted by $Eg(X)$, is
\[
Eg(X) =
\begin{cases}
\int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if $X$ is continuous} \\
\sum_{x \in \mathcal{X}} g(x) f_X(x) = \sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if $X$ is discrete,}
\end{cases}
\]
provided that the integral or sum exists. If $E|g(X)| = \infty$, we say that $Eg(X)$ does not exist.
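Example 2.1 (Exponential mean) Suppose $X$ has an exponential($\lambda$) distribution, that is, it has pdf given by
\[ f_X(x) = \frac{1}{\lambda} e^{-x/\lambda}, \quad 0 \le x < \infty, \quad \lambda > 0. \]
Then $EX$ is given by
\[ EX = \int_0^{\infty} x \frac{1}{\lambda} e^{-x/\lambda}\,dx = \lambda. \]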
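The value $EX = \lambda$ can be sanity-checked numerically. The following sketch approximates the integral with a plain midpoint rule on a truncated range; the helper names and the truncation point $40\lambda$ are assumptions made for this illustration, not part of the example.

```python
import math


def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def exponential_mean(lam):
    """Numerically integrate x * f_X(x) for the exponential pdf with mean lam."""
    def integrand(x):
        return x * math.exp(-x / lam) / lam

    # Assumption: the tail beyond 40 * lam is negligible (it is of order e^{-40}).
    return midpoint_integral(integrand, 0.0, 40.0 * lam)


for lam in (0.5, 1.0, 3.0):
    print(lam, round(exponential_mean(lam), 6))
```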
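Example 2.2 (Binomial mean) If $X$ has a binomial distribution, its pmf is given by
\[ P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n, \]
where $n$ is a positive integer, $0 \le p \le 1$, and for every fixed pair $n$ and $p$ the pmf sums to 1. The mean is
\begin{align*}
EX &= \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} = \sum_{x=1}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} \\
&= \sum_{x=1}^{n} n \binom{n-1}{x-1} p^x (1 - p)^{n - x} && \left( x \binom{n}{x} = n \binom{n-1}{x-1} \right) \\
&= \sum_{y=0}^{n-1} n \binom{n-1}{y} p^{y+1} (1 - p)^{n - (y+1)} && (\text{substitute } y = x - 1) \\
&= np \sum_{y=0}^{n-1} \binom{n-1}{y} p^y (1 - p)^{n - 1 - y} \\
&= np,
\end{align*}
where the last step holds because the remaining sum is the binomial($n - 1$, $p$) pmf summed over its whole range, which equals 1.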
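The derivation can be checked by summing the series directly. This is a small sketch with hypothetical function names; the values $n = 10$ and $p = 0.3$ are arbitrary choices for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean(n, p):
    """EX computed directly from the definition as a finite sum."""
    return sum(x * binomial_pmf(x, n, p) for x in range(n + 1))


n, p = 10, 0.3
# The pmf sums to 1, and the direct sum agrees with n * p up to rounding.
print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))
print(binomial_mean(n, p), n * p)

# The identity x * C(n, x) = n * C(n-1, x-1) used in the derivation.
identity_holds = all(x * comb(n, x) == n * comb(n - 1, x - 1) for x in range(1, n + 1))
print(identity_holds)
```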
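Example 2.3 (Cauchy mean) A classic example of a random variable whose expected value does not exist is a Cauchy random variable, that is, one with pdf
\[ f_X(x) = \frac{1}{\pi} \frac{1}{1 + x^2}, \quad -\infty < x < \infty. \]
It is straightforward to check that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, but $E|X| = \infty$. Write
\[ E|X| = \int_{-\infty}^{\infty} \frac{|x|}{\pi} \frac{1}{1 + x^2}\,dx = \frac{2}{\pi} \int_0^{\infty} \frac{x}{1 + x^2}\,dx. \]
For any positive number $M$,
\[ \int_0^{M} \frac{x}{1 + x^2}\,dx = \frac{1}{2} \log(1 + x^2) \Big|_0^M = \frac{1}{2} \log(1 + M^2). \]
Thus,
\[ E|X| = \frac{1}{\pi} \lim_{M \to \infty} \log(1 + M^2) = \infty \]
and $EX$ does not exist.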
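To see the divergence concretely, the sketch below evaluates the truncated quantity $\frac{2}{\pi}\int_0^M \frac{x}{1+x^2}\,dx = \frac{1}{\pi}\log(1+M^2)$ for increasing $M$ and cross-checks the closed form against a midpoint-rule approximation at one value of $M$. The function names and the chosen values of $M$ are hypothetical.

```python
import math


def truncated_abs_mean_closed_form(M):
    """(1 / pi) * log(1 + M^2), the integral of |x| f_X(x) over [-M, M]."""
    return math.log1p(M * M) / math.pi


def truncated_abs_mean_numeric(M, n=200_000):
    """Midpoint-rule approximation of (2 / pi) * integral_0^M x / (1 + x^2) dx."""
    h = M / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x / (1.0 + x * x)
    return 2.0 / math.pi * h * total


# The truncated integral keeps growing, roughly like (2 / pi) * log(M).
for M in (10.0, 1e3, 1e6, 1e12):
    print(M, truncated_abs_mean_closed_form(M))

print(truncated_abs_mean_numeric(10.0), truncated_abs_mean_closed_form(10.0))
```

The growth is only logarithmic, which is why a simulation-based average of Cauchy draws can look deceptively stable for a while before jumping.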
Theorem 2.1 Let $X$ be a random variable and let $a$, $b$, and $c$ be constants. Then for any functions $g_1(x)$ and $g_2(x)$ whose expectations exist,
a. $E(a g_1(X) + b g_2(X) + c) = a\,Eg_1(X) + b\,Eg_2(X) + c$.
b. If $g_1(x) \ge 0$ for all $x$, then $Eg_1(X) \ge 0$.
c. If $g_1(x) \ge g_2(x)$ for all $x$, then $Eg_1(X) \ge Eg_2(X)$.
d. If $a \le g_1(x) \le b$ for all $x$, then $a \le Eg_1(X) \le b$.
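Example 2.4 (Minimizing distance) Find the value of $b$ which minimizes the distance $E(X - b)^2$.
\begin{align*}
E(X - b)^2 &= E(X - EX + EX - b)^2 \\
&= E(X - EX)^2 + (EX - b)^2 + 2E((X - EX)(EX - b)) \\
&= E(X - EX)^2 + (EX - b)^2.
\end{align*}
The cross term vanishes because $EX - b$ is a constant and $E(X - EX) = 0$. The first term does not depend on $b$ and the second is nonnegative, so $E(X - b)^2$ is minimized by choosing $b = EX$.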
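The decomposition $E(X - b)^2 = E(X - EX)^2 + (EX - b)^2$ can be verified numerically. The sketch below uses a small discrete distribution that is purely hypothetical, checks the identity at a few values of $b$, and searches a grid for the minimizer.

```python
# Hypothetical discrete distribution used only for illustration.
values = [0, 1, 2, 5]
probs = [0.1, 0.4, 0.3, 0.2]


def expectation(g):
    """E g(X) for the discrete distribution above."""
    return sum(g(x) * p for x, p in zip(values, probs))


mean = expectation(lambda x: x)
variance = expectation(lambda x: (x - mean) ** 2)


def mean_squared_distance(b):
    """E(X - b)^2 as a function of the constant b."""
    return expectation(lambda x: (x - b) ** 2)


# Largest gap between E(X - b)^2 and Var X + (EX - b)^2 over a few values of b.
max_gap = max(
    abs(mean_squared_distance(b) - (variance + (mean - b) ** 2))
    for b in (-1.0, 0.0, 1.5, 2.0, 4.0)
)

# Grid search over b in [0, 5] with step 0.01.
grid = [i / 100 for i in range(501)]
best_b = min(grid, key=mean_squared_distance)
print(mean, best_b, max_gap)
```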
When evaluating expectations of nonlinear functions of $X$, we can proceed in one of two ways. From the definition of $Eg(X)$, we could directly calculate
\[ Eg(X) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx. \]
But we could also find the pdf $f_Y(y)$ of $Y = g(X)$ and we would have
\[ Eg(X) = EY = \int_{-\infty}^{\infty} y f_Y(y)\,dy. \]
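The two routes can be compared on a concrete case. As an assumed illustration, take $X$ uniform on $(0, 1)$ and $g(x) = x^2$; then $Y = X^2$ has cdf $P(X \le \sqrt{y}) = \sqrt{y}$ and pdf $f_Y(y) = 1/(2\sqrt{y})$ on $(0, 1)$, and both integrals equal $1/3$. The sketch below evaluates each with a midpoint rule; the helper name is hypothetical.

```python
import math


def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Route 1: integrate g(x) * f_X(x), with f_X(x) = 1 on (0, 1).
direct = midpoint_integral(lambda x: x**2 * 1.0, 0.0, 1.0)

# Route 2: integrate y * f_Y(y), with f_Y(y) = 1 / (2 * sqrt(y)) on (0, 1).
# The midpoint rule never evaluates at y = 0, where f_Y is unbounded.
via_y = midpoint_integral(lambda y: y / (2.0 * math.sqrt(y)), 0.0, 1.0)

print(direct, via_y)
```

In this case the direct route is simpler, since it avoids deriving $f_Y$ at all.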
3 Moments and moment generating functions
Definition 3.1 For each integer $n$, the $n$th moment of $X$ (or $F_X(x)$), $\mu'_n$, is
\[ \mu'_n = EX^n. \]
The $n$th central moment of $X$, $\mu_n$, is
\[ \mu_n = E(X - \mu)^n, \]
where $\mu = \mu'_1 = EX$.
Theorem 3.1 The variance of a random variable $X$ is its second central moment, $\mathrm{Var}\,X = E(X - EX)^2$. The positive square root of $\mathrm{Var}\,X$ is the standard deviation of $X$.
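As a small worked illustration of these definitions, the sketch below computes moments and central moments exactly for a fair six-sided die. The die and the function names are hypothetical choices for this example. It also checks the standard identity $\mathrm{Var}\,X = EX^2 - (EX)^2$, which follows from expanding the square.

```python
import math
from fractions import Fraction

# Hypothetical example: X is the outcome of a fair six-sided die.
values = range(1, 7)
prob = Fraction(1, 6)


def raw_moment(n):
    """The nth moment E X^n, computed exactly with rational arithmetic."""
    return sum(Fraction(x) ** n * prob for x in values)


def central_moment(n):
    """The nth central moment E (X - mu)^n, where mu = E X."""
    mu = raw_moment(1)
    return sum((x - mu) ** n * prob for x in values)


mean = raw_moment(1)
variance = central_moment(2)
standard_deviation = math.sqrt(variance)

print(mean, variance, standard_deviation)
print(variance == raw_moment(2) - mean**2)
```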