Transformations and Expectations
1 Distributions of Functions of a Random Variable
If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $g(X)$, is also a random variable. Since $Y = g(X)$ is a function of $X$, we can describe the probabilistic behavior of $Y$ in terms of that of $X$. That is, for any set $A$,
$$P(Y \in A) = P(g(X) \in A),$$
showing that the distribution of $Y$ depends on the functions $F_X$ and $g$.
Formally, if we write $y = g(x)$, the function $g(x)$ defines a mapping from the original sample space of $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, the sample space of the random variable $Y$. That is,
$$g(x) : \mathcal{X} \longrightarrow \mathcal{Y}.$$
Conveniently, we can write
$$\mathcal{X} = \{x : f_X(x) > 0\} \quad\text{and}\quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}. \tag{1}$$
The pdf of $X$ is positive only on the set $\mathcal{X}$ and is 0 elsewhere. Such a set is called the support set, or support, of a distribution. We associate with $g$ an inverse mapping, denoted by $g^{-1}$, which is a mapping from subsets of $\mathcal{Y}$ to subsets of $\mathcal{X}$, defined by
$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}.$$
It is possible for $A$ to be a point set, say $A = \{y\}$. Then
$$g^{-1}(\{y\}) = \{x \in \mathcal{X} : g(x) = y\}.$$
In this case, we often write $g^{-1}(y)$ instead of $g^{-1}(\{y\})$.
The probability distribution of $Y$ can be defined as follows. For any set $A \subset \mathcal{Y}$,
$$P(Y \in A) = P(g(X) \in A) = P(\{x \in \mathcal{X} : g(x) \in A\}) = P(X \in g^{-1}(A)).$$
It is straightforward to show that this probability function satisfies the Kolmogorov Axioms.
If $X$ is a discrete random variable, then $\mathcal{X}$ is countable. The sample space for $Y = g(X)$ is $\mathcal{Y} = \{y : y = g(x),\ x \in \mathcal{X}\}$, which is also a countable set. Thus, $Y$ is also a discrete random variable. The pmf of $Y$ is
$$f_Y(y) = P(Y = y) = \sum_{x \in g^{-1}(y)} P(X = x) = \sum_{x \in g^{-1}(y)} f_X(x), \quad y \in \mathcal{Y},$$
and $f_Y(y) = 0$ for $y \notin \mathcal{Y}$. In this case, finding the pmf of $Y$ involves simply identifying $g^{-1}(y)$ for each $y \in \mathcal{Y}$ and summing the appropriate probabilities.
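To make the discrete recipe concrete, here is a minimal computational sketch (the die-roll distribution and the choice $g(x) = (x-3)^2$ are illustrative, not from the text): for each $x$, the probability $f_X(x)$ is accumulated into the bucket for $y = g(x)$, which is exactly the sum over $g^{-1}(y)$.

```python
import numpy as np

# Sketch: pmf of Y = g(X) for discrete X, by summing f_X over each
# preimage g^{-1}(y).  The distribution (fair die) and g are illustrative.
x_vals = np.arange(1, 7)            # sample space of X
f_X = np.full(6, 1 / 6)             # uniform pmf on {1,...,6}
g = lambda x: (x - 3) ** 2

f_Y = {}
for x, p in zip(x_vals, f_X):
    y = g(x)
    f_Y[y] = f_Y.get(y, 0.0) + p    # accumulate P(X = x) over g^{-1}(y)

print(f_Y)  # {4: 1/3, 1: 1/3, 0: 1/6, 9: 1/6}
```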
Example 1.1 (Binomial transformation) A discrete random variable $X$ has a binomial distribution if its pmf is of the form
$$f_X(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n,$$
where $n$ is a positive integer and $0 \le p \le 1$. Consider the random variable $Y = g(X)$, where $g(x) = n - x$. Thus, $g^{-1}(y)$ is the single point $x = n - y$, and
$$f_Y(y) = \sum_{x \in g^{-1}(y)} f_X(x) = f_X(n-y) = \binom{n}{n-y} p^{n-y} (1-p)^{n-(n-y)} = \binom{n}{y} (1-p)^y p^{n-y}.$$
Thus, we see that $Y$ also has a binomial distribution, but with parameters $n$ and $1 - p$.
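As a quick numerical sanity check of this identity (a sketch; the values of $n$ and $p$ below are arbitrary), one can compare $f_X(n-y)$ with the binomial$(n, 1-p)$ pmf directly:

```python
from scipy.stats import binom

# Check: if X ~ binomial(n, p) and Y = n - X, then Y ~ binomial(n, 1 - p).
n, p = 10, 0.3                       # illustrative parameter values
for y in range(n + 1):
    lhs = binom.pmf(n - y, n, p)     # f_Y(y) = f_X(n - y)
    rhs = binom.pmf(y, n, 1 - p)     # binomial(n, 1 - p) pmf at y
    assert abs(lhs - rhs) < 1e-12
```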
If $X$ and $Y$ are continuous random variables, the cdf of $Y = g(X)$ is
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(\{x \in \mathcal{X} : g(x) \le y\}) = \int_{\{x \in \mathcal{X} :\, g(x) \le y\}} f_X(x)\,dx.$$
Sometimes there may be difficulty in identifying $\{x \in \mathcal{X} : g(x) \le y\}$ and carrying out the integration of $f_X(x)$ over this region.
Example 1.2 (Uniform transformation) Suppose $X$ has a uniform distribution on the interval $(0, 2\pi)$, that is,
$$f_X(x) = \begin{cases} 1/(2\pi) & 0 < x < 2\pi \\ 0 & \text{otherwise.} \end{cases}$$
Consider $Y = \sin^2(X)$. Then
$$P(Y \le y) = P(X \le x_1) + P(x_2 \le X \le x_3) + P(X \ge x_4) = 2P(X \le x_1) + 2P(x_2 \le X \le \pi),$$
where $x_1 < x_2 < x_3 < x_4$ are the four solutions to $\sin^2(x) = y$ in $(0, 2\pi)$; in particular, $x_1$ and $x_2$ are the two solutions to
$$\sin^2(x) = y, \quad 0 < x < \pi,$$
and the second equality follows from the symmetry of $\sin^2(x)$ about $x = \pi$.
Thus, even though this example dealt with a seemingly simple situation, the cdf of $Y$ was not simple.
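For the record, carrying the calculation through with $x_1 = \arcsin\sqrt{y}$ and $x_2 = \pi - \arcsin\sqrt{y}$ gives the closed form $F_Y(y) = (2/\pi)\arcsin\sqrt{y}$, which a short Monte Carlo experiment can confirm (a sketch; the seed and sample size are arbitrary):

```python
import numpy as np

# Monte Carlo check of Example 1.2: with x1 = arcsin(sqrt(y)) and
# x2 = pi - x1, the cdf reduces to F_Y(y) = (2/pi) * arcsin(sqrt(y)).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=1_000_000)
y_samples = np.sin(x) ** 2

for y in (0.1, 0.5, 0.9):
    empirical = np.mean(y_samples <= y)
    analytic = (2 / np.pi) * np.arcsin(np.sqrt(y))
    print(y, empirical, analytic)    # the two columns should agree closely
```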
It is easiest to deal with functions $g(x)$ that are monotone, that is, those that satisfy either $u > v \Rightarrow g(u) > g(v)$ (increasing) or $u < v \Rightarrow g(u) > g(v)$ (decreasing).
If $g$ is monotone, then $g^{-1}$ is single-valued; that is, $g^{-1}(y) = x$ if and only if $y = g(x)$. If $g$ is increasing, this implies that
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : x \le g^{-1}(y)\}.$$
If $g$ is decreasing, this implies that
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : x \ge g^{-1}(y)\}.$$
If $g(x)$ is increasing, we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} :\, x \le g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X(g^{-1}(y)).$$
If $g(x)$ is decreasing, we have
$$F_Y(y) = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X(g^{-1}(y)).$$
The continuity of $X$ is used to obtain the second equality. We summarize these results in the following theorem.
Theorem 1.1 Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let $\mathcal{X}$ and $\mathcal{Y}$ be defined as in (1).
a. If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
b. If $g$ is a decreasing function on $\mathcal{X}$ and $X$ is a continuous random variable, then $F_Y(y) = 1 - F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
Example 1.3 (Uniform-exponential relationship I) Suppose $X \sim f_X(x) = 1$ if $0 < x < 1$ and 0 otherwise, the uniform(0,1) distribution. It is straightforward to check that $F_X(x) = x$, $0 < x < 1$.
We now make the transformation $Y = g(X) = -\log(X)$. Since
$$\frac{d}{dx}\, g(x) = -\frac{1}{x} < 0 \quad \text{for } 0 < x < 1,$$
$g(x)$ is a decreasing function. Therefore, for $y > 0$,
$$F_Y(y) = 1 - F_X(g^{-1}(y)) = 1 - F_X(e^{-y}) = 1 - e^{-y}.$$
Of course, $F_Y(y) = 0$ for $y \le 0$.
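This relationship is the basis of the inverse transform method of random number generation; a brief simulation check (seed and sample size arbitrary) confirms that $-\log(X)$ behaves as an exponential(1) variable:

```python
import numpy as np

# Check of Example 1.3: if X ~ uniform(0,1), then Y = -log(X) has cdf
# F_Y(y) = 1 - e^{-y}, i.e., Y is exponential with mean 1.
rng = np.random.default_rng(1)
y = -np.log(rng.uniform(size=500_000))

for t in (0.5, 1.0, 2.0):
    print(t, np.mean(y <= t), 1 - np.exp(-t))  # empirical vs. 1 - e^{-t}
```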
If the pdf of $Y$ is continuous, it can be obtained by differentiating the cdf.
Theorem 1.2 Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Let $\mathcal{X}$ and $\mathcal{Y}$ be defined by (1). Suppose that $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \left| \frac{d}{dy}\, g^{-1}(y) \right| & y \in \mathcal{Y} \\ 0 & \text{otherwise.} \end{cases}$$
Proof: From Theorem 1.1 we have, by the chain rule,
$$f_Y(y) = \frac{d}{dy}\, F_Y(y) = \begin{cases} f_X(g^{-1}(y))\, \frac{d}{dy}\, g^{-1}(y) & \text{if } g \text{ is increasing} \\ -f_X(g^{-1}(y))\, \frac{d}{dy}\, g^{-1}(y) & \text{if } g \text{ is decreasing,} \end{cases}$$
which is the stated formula, since $\frac{d}{dy}\, g^{-1}(y) < 0$ when $g$ is decreasing.
Example 1.4 (Inverted gamma pdf) Let $f_X(x)$ be the gamma pdf
$$f_X(x) = \frac{1}{(n-1)!\,\beta^n}\, x^{n-1} e^{-x/\beta}, \quad 0 < x < \infty,$$
where $\beta$ is a positive constant and $n$ is a positive integer. If we let $y = g(x) = 1/x$, then $g^{-1}(y) = 1/y$ and $\frac{d}{dy}\, g^{-1}(y) = -1/y^2$. Applying the above theorem, for $0 < y < \infty$, we get
$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy}\, g^{-1}(y) \right| = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n-1} e^{-1/(\beta y)}\, \frac{1}{y^2} = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n+1} e^{-1/(\beta y)},$$
a special case of a pdf known as the inverted gamma pdf.
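A simulation check of this example (a sketch; the values of $n$ and $\beta$ below are illustrative): since $Y = 1/X$ with $g$ decreasing, $P(Y \le y) = P(X \ge 1/y)$, which the gamma survival function gives directly:

```python
import numpy as np
from scipy.stats import gamma

# Check of Example 1.4: Y = 1/X with X ~ gamma(shape n, scale beta)
# should satisfy P(Y <= y) = P(X >= 1/y) = 1 - F_X(1/y).
n, beta = 4, 2.0                     # illustrative parameter values
rng = np.random.default_rng(2)
y = 1.0 / gamma.rvs(a=n, scale=beta, size=500_000, random_state=rng)

for t in (0.1, 0.2, 0.5):
    print(t, np.mean(y <= t), gamma.sf(1 / t, a=n, scale=beta))
```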
Theorem 1.3 Let $X$ have pdf $f_X(x)$, let $Y = g(X)$, and define the sample space $\mathcal{X}$ as in (1). Suppose there exists a partition $A_0, A_1, \ldots, A_k$ of $\mathcal{X}$ such that $P(X \in A_0) = 0$ and $f_X(x)$ is continuous on each $A_i$. Further, suppose there exist functions $g_1(x), \ldots, g_k(x)$, defined on $A_1, \ldots, A_k$, respectively, satisfying
i. $g(x) = g_i(x)$ for $x \in A_i$,
ii. $g_i(x)$ is monotone on $A_i$,
iii. the set $\mathcal{Y} = \{y : y = g_i(x) \text{ for some } x \in A_i\}$ is the same for each $i = 1, \ldots, k$, and
iv. $g_i^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$, for each $i = 1, \ldots, k$.
Then
$$f_Y(y) = \begin{cases} \sum_{i=1}^{k} f_X(g_i^{-1}(y)) \left| \frac{d}{dy}\, g_i^{-1}(y) \right| & y \in \mathcal{Y} \\ 0 & \text{otherwise.} \end{cases}$$
Example 1.5 (Normal-chi-squared relationship) Let $X$ have the standard normal distribution,
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$
Consider $Y = X^2$. The function $g(x) = x^2$ is monotone on $(-\infty, 0)$ and on $(0, \infty)$, and the set $\mathcal{Y} = (0, \infty)$. Applying Theorem 1.3, we take
$$A_0 = \{0\};$$
$$A_1 = (-\infty, 0), \quad g_1(x) = x^2, \quad g_1^{-1}(y) = -\sqrt{y};$$
$$A_2 = (0, \infty), \quad g_2(x) = x^2, \quad g_2^{-1}(y) = \sqrt{y}.$$
The pdf of $Y$ is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}}\, e^{-(-\sqrt{y})^2/2} \left| -\frac{1}{2\sqrt{y}} \right| + \frac{1}{\sqrt{2\pi}}\, e^{-(\sqrt{y})^2/2} \left| \frac{1}{2\sqrt{y}} \right| = \frac{1}{\sqrt{2\pi}}\, \frac{1}{\sqrt{y}}\, e^{-y/2}, \quad 0 < y < \infty.$$
So $Y$ is a chi-squared random variable with 1 degree of freedom.
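A quick empirical confirmation (a sketch; seed and sample size arbitrary) compares the distribution of $X^2$ for standard normal draws against the $\chi^2_1$ cdf:

```python
import numpy as np
from scipy.stats import chi2

# Check of Example 1.5: if X ~ N(0,1), then Y = X^2 should match the
# chi-squared distribution with 1 degree of freedom.
rng = np.random.default_rng(3)
y = rng.standard_normal(500_000) ** 2

for t in (0.5, 1.0, 3.84):
    print(t, np.mean(y <= t), chi2.cdf(t, df=1))  # empirical vs. chi2(1) cdf
```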
Let $F_X^{-1}$ denote the inverse of the cdf $F_X$. If $F_X$ is strictly increasing, then $F_X^{-1}$ is well defined by
$$F_X^{-1}(y) = x \iff F_X(x) = y. \tag{2}$$
However, if $F_X$ is constant on some interval, then $F_X^{-1}$ is not well defined by (2). The problem is avoided by defining $F_X^{-1}(y)$, for $0 < y < 1$, by
$$F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}. \tag{3}$$
At the endpoints of the range of $y$, $F_X^{-1}(1) = \infty$ if $F_X(x) < 1$ for all $x$ and, for any $F_X$, $F_X^{-1}(0) = -\infty$.
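Definition (3) is easy to realize computationally. A minimal sketch for a cdf tabulated on a finite grid (the grid and cdf values below are illustrative; note the flat stretch on $[1, 2]$):

```python
import numpy as np

def quantile(y, xs, Fs):
    """Return inf{x : F_X(x) >= y} for a cdf tabulated as Fs over grid xs."""
    if y <= 0:
        return -np.inf                         # F_X^{-1}(0) = -infinity
    idx = np.searchsorted(Fs, y, side="left")  # first index with F >= y
    return np.inf if idx == len(xs) else xs[idx]

xs = np.array([0.0, 1.0, 2.0, 3.0])
Fs = np.array([0.2, 0.5, 0.5, 1.0])   # flat on [1, 2]: F constant at 0.5
print(quantile(0.5, xs, Fs))          # 1.0, the left end of the flat stretch
```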
Theorem 1.4 (Probability integral transformation) Let $X$ have continuous cdf $F_X(x)$ and define the random variable $Y$ as $Y = F_X(X)$. Then $Y$ is uniformly distributed on $(0, 1)$; that is, $P(Y \le y) = y$, $0 < y < 1$.
Proof: For $Y = F_X(X)$ we have, for $0 < y < 1$,
$$P(Y \le y) = P(F_X(X) \le y) = P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y.$$
At the endpoints we have $P(Y \le y) = 1$ for $y \ge 1$ and $P(Y \le y) = 0$ for $y \le 0$, showing that $Y$ has a uniform distribution.
The reasoning behind the equality
$$P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y))$$
is somewhat subtle and deserves additional attention. If $F_X$ is strictly increasing, then it is true that $F_X^{-1}(F_X(x)) = x$. However, if $F_X$ is flat on some interval $[x_1, x_2]$, it may be that $F_X^{-1}(F_X(x)) \ne x$; by definition (3), $F_X^{-1}(F_X(x)) = x_1$ for any $x \in [x_1, x_2]$. The equality of probabilities nevertheless holds, since $P(X \le x) = P(X \le x_1)$ for any $x \in [x_1, x_2]$: the flat cdf denotes a region of zero probability, $P(x_1 < X \le x) = F_X(x) - F_X(x_1) = 0$.
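A numerical illustration of Theorem 1.4 (a sketch; the exponential(1) source distribution, seed, and sample size are arbitrary choices): applying a continuous cdf to its own samples should produce uniform(0,1) values, which a Kolmogorov-Smirnov test can confirm:

```python
import numpy as np
from scipy.stats import expon, kstest

# Probability integral transformation: Y = F_X(X) is uniform(0,1)
# when F_X is continuous.  Exponential(1) is an illustrative choice.
rng = np.random.default_rng(4)
x = expon.rvs(size=100_000, random_state=rng)
u = expon.cdf(x)                      # Y = F_X(X)

print(kstest(u, "uniform"))           # large p-value: consistent with U(0,1)
```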
2 Expected Values
Definition 2.1 The expected value or mean of a random variable $g(X)$, denoted by $E\,g(X)$, is
$$E\,g(X) = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ is continuous} \\ \sum_{x \in \mathcal{X}} g(x) f_X(x) = \sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if } X \text{ is discrete,} \end{cases}$$
provided that the integral or sum exists. If $E|g(X)| = \infty$, we say that $E\,g(X)$ does not exist.
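Both branches of the definition are straightforward to evaluate numerically. A minimal sketch (the distributions and the choice $g(x) = x^2$ are illustrative):

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: x ** 2

# Continuous case: E g(X) for X ~ uniform(0,1); f_X(x) = 1 on (0,1),
# so E X^2 = integral of x^2 dx over (0,1) = 1/3.
val, _ = quad(lambda x: g(x) * 1.0, 0, 1)
print(val)                            # ~0.3333

# Discrete case: E g(X) for a fair die; f_X(x) = 1/6 on {1,...,6},
# so E X^2 = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6.
xs = np.arange(1, 7)
print(np.sum(g(xs) / 6))              # ~15.1667
```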