Transformations and Expectations
1 Distributions of Functions of a Random Variable
If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $g(X)$, is also a random variable. Since $Y = g(X)$ is a function of $X$, we can describe the probabilistic behavior of $Y$ in terms of that of $X$. That is, for any set $A$,
$$P(Y \in A) = P(g(X) \in A),$$
showing that the distribution of $Y$ depends on the functions $F_X$ and $g$.
Formally, if we write $y = g(x)$, the function $g(x)$ defines a mapping from the original sample space of $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, the sample space of the random variable $Y$. That is,
$$g(x) : \mathcal{X} \longrightarrow \mathcal{Y}.$$
Conveniently, we can write
$$\mathcal{X} = \{x : f_X(x) > 0\} \quad\text{and}\quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}. \tag{1}$$
The pdf of $X$ is positive only on the set $\mathcal{X}$ and is 0 elsewhere. Such a set is called the support set, or support, of a distribution. We associate with $g$ an inverse mapping, denoted by $g^{-1}$, which is a mapping from subsets of $\mathcal{Y}$ to subsets of $\mathcal{X}$, defined by
$$g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}.$$
It is possible for $A$ to be a point set, say $A = \{y\}$. Then
$$g^{-1}(\{y\}) = \{x \in \mathcal{X} : g(x) = y\}.$$
In this case, we often write $g^{-1}(y)$ instead of $g^{-1}(\{y\})$.
The probability distribution of $Y$ can be defined as follows. For any set $A \subset \mathcal{Y}$,
$$P(Y \in A) = P(g(X) \in A) = P(\{x \in \mathcal{X} : g(x) \in A\}) = P(X \in g^{-1}(A)).$$
It is straightforward to show that this probability function satisfies the Kolmogorov Axioms.
If $X$ is a discrete random variable, then $\mathcal{X}$ is countable. The sample space for $Y = g(X)$ is $\mathcal{Y} = \{y : y = g(x),\ x \in \mathcal{X}\}$, which is also a countable set. Thus, $Y$ is also a discrete random variable. The pmf of $Y$ is
$$f_Y(y) = P(Y = y) = \sum_{x \in g^{-1}(y)} P(X = x) = \sum_{x \in g^{-1}(y)} f_X(x), \quad y \in \mathcal{Y},$$
and $f_Y(y) = 0$ for $y \notin \mathcal{Y}$. In this case, finding the pmf of $Y$ involves simply identifying $g^{-1}(y)$ for each $y \in \mathcal{Y}$ and summing the appropriate probabilities.
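To make the discrete recipe concrete, here is a minimal computational sketch (the die-roll distribution and the choice $g(x) = (x-3)^2$ are illustrative, not from the text): for each $x$, the probability $f_X(x)$ is accumulated into the bucket for $y = g(x)$, which is exactly the sum over $g^{-1}(y)$.

```python
import numpy as np

# Sketch: pmf of Y = g(X) for discrete X, by summing f_X over each
# preimage g^{-1}(y).  The distribution (fair die) and g are illustrative.
x_vals = np.arange(1, 7)            # sample space of X
f_X = np.full(6, 1 / 6)             # uniform pmf on {1,...,6}
g = lambda x: (x - 3) ** 2

f_Y = {}
for x, p in zip(x_vals, f_X):
    y = g(x)
    f_Y[y] = f_Y.get(y, 0.0) + p    # accumulate P(X = x) over g^{-1}(y)

print(f_Y)  # {4: 1/3, 1: 1/3, 0: 1/6, 9: 1/6}
```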
Example 1.1 (Binomial transformation) A discrete random variable $X$ has a binomial distribution if its pmf is of the form
$$f_X(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n,$$
where $n$ is a positive integer and $0 \le p \le 1$. Consider the random variable $Y = g(X)$, where $g(x) = n - x$. Thus, $g^{-1}(y)$ is the single point $x = n - y$, and
$$f_Y(y) = \sum_{x \in g^{-1}(y)} f_X(x) = f_X(n-y) = \binom{n}{n-y} p^{n-y} (1-p)^{n-(n-y)} = \binom{n}{y} (1-p)^y p^{n-y}.$$
Thus, we see that $Y$ also has a binomial distribution, but with parameters $n$ and $1 - p$.
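As a quick numerical sanity check of this identity (a sketch; the values of $n$ and $p$ below are arbitrary), one can compare $f_X(n-y)$ with the binomial$(n, 1-p)$ pmf directly:

```python
from scipy.stats import binom

# Check: if X ~ binomial(n, p) and Y = n - X, then Y ~ binomial(n, 1 - p).
n, p = 10, 0.3                       # illustrative parameter values
for y in range(n + 1):
    lhs = binom.pmf(n - y, n, p)     # f_Y(y) = f_X(n - y)
    rhs = binom.pmf(y, n, 1 - p)     # binomial(n, 1 - p) pmf at y
    assert abs(lhs - rhs) < 1e-12
```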
If $X$ and $Y$ are continuous random variables, the cdf of $Y = g(X)$ is
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(\{x \in \mathcal{X} : g(x) \le y\}) = \int_{\{x \in \mathcal{X} :\, g(x) \le y\}} f_X(x)\,dx.$$
Sometimes there may be difficulty in identifying $\{x \in \mathcal{X} : g(x) \le y\}$ and carrying out the integration of $f_X(x)$ over this region.
Example 1.2 (Uniform transformation) Suppose $X$ has a uniform distribution on the interval $(0, 2\pi)$, that is,
$$f_X(x) = \begin{cases} 1/(2\pi) & 0 < x < 2\pi \\ 0 & \text{otherwise.} \end{cases}$$
Consider $Y = \sin^2(X)$. Then
$$P(Y \le y) = P(X \le x_1) + P(x_2 \le X \le x_3) + P(X \ge x_4) = 2P(X \le x_1) + 2P(x_2 \le X \le \pi),$$
where $x_1 < x_2 < x_3 < x_4$ are the four solutions to $\sin^2(x) = y$ in $(0, 2\pi)$; in particular, $x_1$ and $x_2$ are the two solutions to
$$\sin^2(x) = y, \quad 0 < x < \pi,$$
and the second equality follows from the symmetry of $\sin^2(x)$ about $x = \pi$.
Thus, even though this example dealt with a seemingly simple situation, the cdf of $Y$ was not simple.
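For the record, carrying the calculation through with $x_1 = \arcsin\sqrt{y}$ and $x_2 = \pi - \arcsin\sqrt{y}$ gives the closed form $F_Y(y) = (2/\pi)\arcsin\sqrt{y}$, which a short Monte Carlo experiment can confirm (a sketch; the seed and sample size are arbitrary):

```python
import numpy as np

# Monte Carlo check of Example 1.2: with x1 = arcsin(sqrt(y)) and
# x2 = pi - x1, the cdf reduces to F_Y(y) = (2/pi) * arcsin(sqrt(y)).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=1_000_000)
y_samples = np.sin(x) ** 2

for y in (0.1, 0.5, 0.9):
    empirical = np.mean(y_samples <= y)
    analytic = (2 / np.pi) * np.arcsin(np.sqrt(y))
    print(y, empirical, analytic)    # the two columns should agree closely
```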
It is easiest to deal with functions $g(x)$ that are monotone, that is, those that satisfy either $u > v \Rightarrow g(u) > g(v)$ (increasing) or $u < v \Rightarrow g(u) > g(v)$ (decreasing).
If $g$ is monotone, then $g^{-1}$ is single-valued; that is, $g^{-1}(y) = x$ if and only if $y = g(x)$. If $g$ is increasing, this implies that
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : x \le g^{-1}(y)\}.$$
If $g$ is decreasing, this implies that
$$\{x \in \mathcal{X} : g(x) \le y\} = \{x \in \mathcal{X} : x \ge g^{-1}(y)\}.$$
If $g(x)$ is increasing, we can write
$$F_Y(y) = \int_{\{x \in \mathcal{X} :\, x \le g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X(g^{-1}(y)).$$
If $g(x)$ is decreasing, we have
$$F_Y(y) = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X(g^{-1}(y)).$$
The continuity of $X$ is used to obtain the second equality. We summarize these results in the following theorem.
Theorem 1.1 Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let $\mathcal{X}$ and $\mathcal{Y}$ be defined as in (1).
a. If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
b. If $g$ is a decreasing function on $\mathcal{X}$ and $X$ is a continuous random variable, then $F_Y(y) = 1 - F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
Example 1.3 (Uniform-exponential relationship I) Suppose $X \sim f_X(x) = 1$ if $0 < x < 1$ and 0 otherwise, the uniform(0,1) distribution. It is straightforward to check that $F_X(x) = x$, $0 < x < 1$.
We now make the transformation $Y = g(X) = -\log(X)$. Since
$$\frac{d}{dx}\, g(x) = -\frac{1}{x} < 0 \quad \text{for } 0 < x < 1,$$
$g(x)$ is a decreasing function. Therefore, for $y > 0$,
$$F_Y(y) = 1 - F_X(g^{-1}(y)) = 1 - F_X(e^{-y}) = 1 - e^{-y}.$$
Of course, $F_Y(y) = 0$ for $y \le 0$.
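This relationship is the basis of the inverse transform method of random number generation; a brief simulation check (seed and sample size arbitrary) confirms that $-\log(X)$ behaves as an exponential(1) variable:

```python
import numpy as np

# Check of Example 1.3: if X ~ uniform(0,1), then Y = -log(X) has cdf
# F_Y(y) = 1 - e^{-y}, i.e., Y is exponential with mean 1.
rng = np.random.default_rng(1)
y = -np.log(rng.uniform(size=500_000))

for t in (0.5, 1.0, 2.0):
    print(t, np.mean(y <= t), 1 - np.exp(-t))  # empirical vs. 1 - e^{-t}
```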
If the pdf of $Y$ is continuous, it can be obtained by differentiating the cdf.
Theorem 1.2 Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Let $\mathcal{X}$ and $\mathcal{Y}$ be defined by (1). Suppose that $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \left| \frac{d}{dy}\, g^{-1}(y) \right| & y \in \mathcal{Y} \\ 0 & \text{otherwise.} \end{cases}$$
Proof: From Theorem 1.1 we have, by the chain rule,
$$f_Y(y) = \frac{d}{dy}\, F_Y(y) = \begin{cases} f_X(g^{-1}(y))\, \frac{d}{dy}\, g^{-1}(y) & \text{if } g \text{ is increasing} \\ -f_X(g^{-1}(y))\, \frac{d}{dy}\, g^{-1}(y) & \text{if } g \text{ is decreasing,} \end{cases}$$
which is the stated formula, since $\frac{d}{dy}\, g^{-1}(y) < 0$ when $g$ is decreasing.
Example 1.4 (Inverted gamma pdf) Let $f_X(x)$ be the gamma pdf
$$f_X(x) = \frac{1}{(n-1)!\,\beta^n}\, x^{n-1} e^{-x/\beta}, \quad 0 < x < \infty,$$
where $\beta$ is a positive constant and $n$ is a positive integer. If we let $y = g(x) = 1/x$, then $g^{-1}(y) = 1/y$ and $\frac{d}{dy}\, g^{-1}(y) = -1/y^2$. Applying the above theorem, for $0 < y < \infty$, we get
$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy}\, g^{-1}(y) \right| = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n-1} e^{-1/(\beta y)}\, \frac{1}{y^2} = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n+1} e^{-1/(\beta y)},$$
a special case of a pdf known as the inverted gamma pdf.
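A simulation check of this example (a sketch; the values of $n$ and $\beta$ below are illustrative): since $Y = 1/X$ with $g$ decreasing, $P(Y \le y) = P(X \ge 1/y)$, which the gamma survival function gives directly:

```python
import numpy as np
from scipy.stats import gamma

# Check of Example 1.4: Y = 1/X with X ~ gamma(shape n, scale beta)
# should satisfy P(Y <= y) = P(X >= 1/y) = 1 - F_X(1/y).
n, beta = 4, 2.0                     # illustrative parameter values
rng = np.random.default_rng(2)
y = 1.0 / gamma.rvs(a=n, scale=beta, size=500_000, random_state=rng)

for t in (0.1, 0.2, 0.5):
    print(t, np.mean(y <= t), gamma.sf(1 / t, a=n, scale=beta))
```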
Theorem 1.3 Let $X$ have pdf $f_X(x)$, let $Y = g(X)$, and define the sample space $\mathcal{X}$ as in (1). Suppose there exists a partition $A_0, A_1, \ldots, A_k$ of $\mathcal{X}$ such that $P(X \in A_0) = 0$ and $f_X(x)$ is continuous on each $A_i$. Further, suppose there exist functions $g_1(x), \ldots, g_k(x)$, defined on $A_1, \ldots, A_k$, respectively, satisfying
i. $g(x) = g_i(x)$ for $x \in A_i$,
ii. $g_i(x)$ is monotone on $A_i$,
iii. the set $\mathcal{Y} = \{y : y = g_i(x) \text{ for some } x \in A_i\}$ is the same for each $i = 1, \ldots, k$, and
iv. $g_i^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$, for each $i = 1, \ldots, k$.
Then
$$f_Y(y) = \begin{cases} \sum_{i=1}^{k} f_X(g_i^{-1}(y)) \left| \frac{d}{dy}\, g_i^{-1}(y) \right| & y \in \mathcal{Y} \\ 0 & \text{otherwise.} \end{cases}$$
Example 1.5 (Normal-chi-squared relationship) Let $X$ have the standard normal distribution,
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$
Consider $Y = X^2$. The function $g(x) = x^2$ is monotone on $(-\infty, 0)$ and on $(0, \infty)$, and the set $\mathcal{Y} = (0, \infty)$. Applying Theorem 1.3, we take
$$A_0 = \{0\};$$
$$A_1 = (-\infty, 0), \quad g_1(x) = x^2, \quad g_1^{-1}(y) = -\sqrt{y};$$
$$A_2 = (0, \infty), \quad g_2(x) = x^2, \quad g_2^{-1}(y) = \sqrt{y}.$$
The pdf of $Y$ is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}}\, e^{-(-\sqrt{y})^2/2} \left| -\frac{1}{2\sqrt{y}} \right| + \frac{1}{\sqrt{2\pi}}\, e^{-(\sqrt{y})^2/2} \left| \frac{1}{2\sqrt{y}} \right| = \frac{1}{\sqrt{2\pi}}\, \frac{1}{\sqrt{y}}\, e^{-y/2}, \quad 0 < y < \infty.$$
So $Y$ is a chi-squared random variable with 1 degree of freedom.
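A quick empirical confirmation (a sketch; seed and sample size arbitrary) compares the distribution of $X^2$ for standard normal draws against the $\chi^2_1$ cdf:

```python
import numpy as np
from scipy.stats import chi2

# Check of Example 1.5: if X ~ N(0,1), then Y = X^2 should match the
# chi-squared distribution with 1 degree of freedom.
rng = np.random.default_rng(3)
y = rng.standard_normal(500_000) ** 2

for t in (0.5, 1.0, 3.84):
    print(t, np.mean(y <= t), chi2.cdf(t, df=1))  # empirical vs. chi2(1) cdf
```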
Let $F_X^{-1}$ denote the inverse of the cdf $F_X$. If $F_X$ is strictly increasing, then $F_X^{-1}$ is well defined by
$$F_X^{-1}(y) = x \iff F_X(x) = y. \tag{2}$$
However, if $F_X$ is constant on some interval, then $F_X^{-1}$ is not well defined by (2). The problem is avoided by defining $F_X^{-1}(y)$, for $0 < y < 1$, by
$$F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}. \tag{3}$$
At the endpoints of the range of $y$, $F_X^{-1}(1) = \infty$ if $F_X(x) < 1$ for all $x$ and, for any $F_X$, $F_X^{-1}(0) = -\infty$.
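Definition (3) is easy to realize computationally. A minimal sketch for a cdf tabulated on a finite grid (the grid and cdf values below are illustrative; note the flat stretch on $[1, 2]$):

```python
import numpy as np

def quantile(y, xs, Fs):
    """Return inf{x : F_X(x) >= y} for a cdf tabulated as Fs over grid xs."""
    if y <= 0:
        return -np.inf                         # F_X^{-1}(0) = -infinity
    idx = np.searchsorted(Fs, y, side="left")  # first index with F >= y
    return np.inf if idx == len(xs) else xs[idx]

xs = np.array([0.0, 1.0, 2.0, 3.0])
Fs = np.array([0.2, 0.5, 0.5, 1.0])   # flat on [1, 2]: F constant at 0.5
print(quantile(0.5, xs, Fs))          # 1.0, the left end of the flat stretch
```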
Theorem 1.4 (Probability integral transformation) Let $X$ have continuous cdf $F_X(x)$ and define the random variable $Y$ as $Y = F_X(X)$. Then $Y$ is uniformly distributed on $(0, 1)$; that is, $P(Y \le y) = y$, $0 < y < 1$.
Proof: For $Y = F_X(X)$ we have, for $0 < y < 1$,
$$P(Y \le y) = P(F_X(X) \le y) = P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y.$$
At the endpoints we have $P(Y \le y) = 1$ for $y \ge 1$ and $P(Y \le y) = 0$ for $y \le 0$, showing that $Y$ has a uniform distribution.
The reasoning behind the equality
$$P(F_X^{-1}[F_X(X)] \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y))$$
is somewhat subtle and deserves additional attention. If $F_X$ is strictly increasing, then it is true that $F_X^{-1}(F_X(x)) = x$. However, if $F_X$ is flat on some interval $[x_1, x_2]$, it may be that $F_X^{-1}(F_X(x)) \ne x$; by definition (3), $F_X^{-1}(F_X(x)) = x_1$ for any $x \in [x_1, x_2]$. The equality of probabilities nevertheless holds, since $P(X \le x) = P(X \le x_1)$ for any $x \in [x_1, x_2]$: the flat cdf denotes a region of zero probability, $P(x_1 < X \le x) = F_X(x) - F_X(x_1) = 0$.
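A numerical illustration of Theorem 1.4 (a sketch; the exponential(1) source distribution, seed, and sample size are arbitrary choices): applying a continuous cdf to its own samples should produce uniform(0,1) values, which a Kolmogorov-Smirnov test can confirm:

```python
import numpy as np
from scipy.stats import expon, kstest

# Probability integral transformation: Y = F_X(X) is uniform(0,1)
# when F_X is continuous.  Exponential(1) is an illustrative choice.
rng = np.random.default_rng(4)
x = expon.rvs(size=100_000, random_state=rng)
u = expon.cdf(x)                      # Y = F_X(X)

print(kstest(u, "uniform"))           # large p-value: consistent with U(0,1)
```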
2 Expected Values
Definition 2.1 The expected value or mean of a random variable $g(X)$, denoted by $E\,g(X)$, is
$$E\,g(X) = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ is continuous} \\ \sum_{x \in \mathcal{X}} g(x) f_X(x) = \sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if } X \text{ is discrete,} \end{cases}$$
provided that the integral or sum exists. If $E|g(X)| = \infty$, we say that $E\,g(X)$ does not exist.
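Both branches of the definition are straightforward to evaluate numerically. A minimal sketch (the distributions and the choice $g(x) = x^2$ are illustrative):

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: x ** 2

# Continuous case: E g(X) for X ~ uniform(0,1); f_X(x) = 1 on (0,1),
# so E X^2 = integral of x^2 dx over (0,1) = 1/3.
val, _ = quad(lambda x: g(x) * 1.0, 0, 1)
print(val)                            # ~0.3333

# Discrete case: E g(X) for a fair die; f_X(x) = 1/6 on {1,...,6},
# so E X^2 = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6.
xs = np.arange(1, 7)
print(np.sum(g(xs) / 6))              # ~15.1667
```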