Lecture 2 : Basics of Probability Theory


When an experiment is performed, the realization of the experiment is an outcome in the sample space. If the experiment is performed a number of times, different outcomes may occur each time or some outcomes may repeat. This “frequency of occurrence” of an outcome can be thought of as a probability. More probable outcomes occur more frequently. If the outcomes of an experiment can be described probabilistically, we are on our way to analyzing the experiment statistically.

1 Axiomatic Foundations

Definition 1.1 A collection of subsets of S is called a sigma algebra (or Borel field), denoted by B, if it satisfies the following three properties:

a. ∅ ∈ B (the empty set is an element of B).

b. If A ∈ B, then A^c∈ B (B is closed under complementation).

c. If A₁, A₂, . . . ∈ B, then $\bigcup_{i=1}^{\infty} A_i \in B$ (B is closed under countable unions).

Example 1.1 (Sigma algebra-I) If S is finite or countable, then these technicalities really do not arise, for we define for a given sample space S,

B = {all subsets of S, including S itself}.

If S has n elements, there are 2ⁿ sets in B. For example, if S = {1, 2, 3}, then B is the following collection of 2³= 8 sets: {1}, {1, 2}, {1, 2, 3}, {2}, {1, 3}, ∅, {3}, {2, 3}.
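As a quick check of Example 1.1, the sketch below enumerates every subset of S = {1, 2, 3} and tests the three properties of Definition 1.1. The helper names `power_set` and `is_sigma_algebra` are illustrative and not part of the lecture. For a finite collection, it is enough to test closure under pairwise unions.

```python
from itertools import chain, combinations


def power_set(sample_space):
    """Return every subset of sample_space as a frozenset."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def is_sigma_algebra(collection, sample_space):
    """Check properties (a)-(c) of Definition 1.1 on a finite collection."""
    space = frozenset(sample_space)
    has_empty = frozenset() in collection
    closed_complement = all(space - a in collection for a in collection)
    # In a finite collection, closure under pairwise unions implies
    # closure under countable unions.
    closed_union = all(a | b in collection for a in collection for b in collection)
    return has_empty and closed_complement and closed_union


S = {1, 2, 3}
B = power_set(S)
print(len(B))                  # 8
print(is_sigma_algebra(B, S))  # True
```

The collection {∅, {1}} fails the same check, because the complement {2, 3} is missing.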

Example 1.2 (Sigma algebra-II) Let S = (−∞, ∞), the real line. Then B is chosen to contain all sets of the form

[a, b], (a, b], (a, b), [a, b)

for all real numbers a and b. Also, from the properties of B, it follows that B contains all sets that can be formed by taking (possibly countably infinite) unions and intersections of sets of the above varieties.

Definition 1.2 Given a sample space S and an associated sigma algebra B, a probability function is a function P with domain B that satisfies

1. P (A) ≥ 0 for all A ∈ B.

2. P (S) = 1.

3. If A₁, A₂, . . . ∈ B are pairwise disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.

The three properties given in the above definition are usually referred to as the Axioms of Probability or the Kolmogorov Axioms. Any function P that satisfies the Axioms of Probability is called a probability function.

The following gives a common method of defining a legitimate probability function.

Theorem 1.1 Let S = {s₁, . . . , s_n} be a finite set. Let B be any sigma algebra of subsets of S. Let p₁, . . . , p_n be nonnegative numbers that sum to 1. For any A ∈ B, define P (A) by

\[ P(A) = \sum_{\{i : s_i \in A\}} p_i. \]

(The sum over an empty set is defined to be 0.) Then P is a probability function on B. This remains true if S = {s₁, s₂, . . .} is a countable set.

Proof: We will give the proof for finite S. For any A ∈ B, $P(A) = \sum_{\{i : s_i \in A\}} p_i \ge 0$, because every p_i ≥ 0. Thus, Axiom 1 is true. Now,

\[ P(S) = \sum_{\{i : s_i \in S\}} p_i = \sum_{i=1}^{n} p_i = 1. \]

Thus, Axiom 2 is true. Let A₁, . . . , A_k denote pairwise disjoint events. (B contains only a finite number of sets, so we need consider only finite disjoint unions.) Then,

\[ P\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{\{j : s_j \in \cup_{i=1}^{k} A_i\}} p_j = \sum_{i=1}^{k} \sum_{\{j : s_j \in A_i\}} p_j = \sum_{i=1}^{k} P(A_i). \]

The first and third equalities are true by the definition of P (A). The disjointedness of the A_i’s ensures that the second equality is true, because the same p_j’s appear exactly once on each side of the equality. Thus, Axiom 3 is true.
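The construction in Theorem 1.1 can be sketched in a few lines. The outcome labels and weights below are made up for illustration, and `make_probability` is a hypothetical helper name. The check of Axiom 3 tests additivity over pairs of disjoint events, which extends to any finite disjoint union by induction.

```python
from fractions import Fraction
from itertools import chain, combinations


def make_probability(weights):
    """Build P(A) = sum of p_i over outcomes s_i in A, as in Theorem 1.1.

    weights maps each outcome s_i to p_i (nonnegative, summing to 1).
    """
    assert all(p >= 0 for p in weights.values())
    assert sum(weights.values()) == 1

    def P(event):
        # The sum over an empty event is 0.
        return sum((weights[s] for s in event), Fraction(0))

    return P


# Illustrative weights, not taken from the lecture.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
P = make_probability(weights)
S = frozenset(weights)

# Take B to be the collection of all subsets of S.
events = [
    frozenset(c)
    for c in chain.from_iterable(
        combinations(sorted(S), k) for k in range(len(S) + 1)
    )
]

axiom1 = all(P(A) >= 0 for A in events)
axiom2 = P(S) == 1
axiom3 = all(
    P(A | B) == P(A) + P(B) for A in events for B in events if not (A & B)
)
print(axiom1, axiom2, axiom3)  # True True True
```

Exact rational arithmetic (`Fraction`) avoids floating-point rounding in the equality checks.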

Example 1.3 (Defining probabilities-II) The game of darts is played by throwing a dart at a board and receiving a score corresponding to the number assigned to the region in which the dart lands.

For a novice player, it seems reasonable to assume that the probability of the dart hitting a particular region is proportional to the area of the region. Thus, a bigger region has a higher probability of being hit.

The dart board has radius r and the distance between rings is r/5. If we make the assumption that the board is always hit, then we have

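\[ P(\text{scoring } i \text{ points}) = \frac{\text{Area of region } i}{\text{Area of dart board}}. \]

For example,

\[ P(\text{scoring 1 point}) = \frac{\pi r^2 - \pi (4r/5)^2}{\pi r^2} = 1 - \left(\frac{4}{5}\right)^2. \]

It is easy to derive the general formula, and we find that

\[ P(\text{scoring } i \text{ points}) = \frac{(6 - i)^2 - (5 - i)^2}{5^2}, \quad i = 1, \ldots, 5, \]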

independent of π and r. The sum of the areas of the disjoint regions equals the area of the dart board.

Thus, the probabilities that have been assigned to the five outcomes sum to 1, and, by Theorem 1.1, this is a probability function.
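The dart board probabilities can be tabulated directly from the general formula. This is a small sketch; the function name `dart_probability` is illustrative.

```python
from fractions import Fraction


def dart_probability(i):
    """P(scoring i points) = ((6 - i)^2 - (5 - i)^2) / 5^2 for i = 1, ..., 5."""
    return Fraction((6 - i) ** 2 - (5 - i) ** 2, 5 ** 2)


probs = {i: dart_probability(i) for i in range(1, 6)}
for i, p in probs.items():
    print(i, p)

total = sum(probs.values())
print(total)  # 1
```

The five values are 9/25, 7/25, 5/25, 3/25 and 1/25, so the outermost ring (1 point) is the most likely region under the area assumption.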

2 The Calculus of Probabilities

Theorem 2.1 If P is a probability function and A is any set in B, then

a. P (∅) = 0, where ∅ is the empty set.

b. P (A) ≤ 1.

c. P (A^c) = 1 − P (A).

Theorem 2.2 If P is a probability function and A and B are any sets in B, then

a. P (B ∩ A^c) = P (B) − P (A ∩ B).

b. P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

c. If A ⊂ B, then P (A) ≤ P (B).


Formula (b) of Theorem 2.2 gives a useful inequality for the probability of an intersection. Since P (A ∪ B) ≤ 1, we have

P (A ∩ B) ≥ P (A) + P (B) − 1.

This inequality is a special case of what is known as Bonferroni’s inequality.
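The identities of Theorem 2.2 and the Bonferroni bound can be checked numerically on a small example. The sketch below assumes a fair six-sided die with equally likely outcomes; the events A and B are chosen only for illustration.

```python
from fractions import Fraction

# Assumed example: one roll of a fair die, all outcomes equally likely.
S = frozenset(range(1, 7))


def P(event):
    return Fraction(len(event), len(S))


A = frozenset({1, 2, 3, 4})
B = frozenset({2, 3, 4, 5})
A_c = S - A  # complement of A

# Theorem 2.2(a): P(B ∩ A^c) = P(B) - P(A ∩ B)
part_a = P(B & A_c) == P(B) - P(A & B)
# Theorem 2.2(b): P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
part_b = P(A | B) == P(A) + P(B) - P(A & B)
# Bonferroni: P(A ∩ B) >= P(A) + P(B) - 1
bonferroni = P(A & B) >= P(A) + P(B) - 1

print(part_a, part_b, bonferroni)   # True True True
print(P(A & B), P(A) + P(B) - 1)    # 1/2 1/3
```

Here the bound 1/3 is strictly below the true value 1/2. The bound is attained exactly when A ∪ B = S.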

Theorem 2.3 If P is a probability function, then

a. $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i)$ for any partition C₁, C₂, . . .;

b. $P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$ for any sets A₁, A₂, . . . (Boole’s Inequality).

Proof: Since C₁, C₂, . . . form a partition, we have that C_i ∩ C_j = ∅ for all i ≠ j, and $S = \bigcup_{i=1}^{\infty} C_i$. Hence,

\[ A = A \cap S = A \cap \left(\bigcup_{i=1}^{\infty} C_i\right) = \bigcup_{i=1}^{\infty} (A \cap C_i), \]

where the last equality follows from the Distributive Law. We therefore have

\[ P(A) = P\left(\bigcup_{i=1}^{\infty} (A \cap C_i)\right). \]

Now, since the C_i are disjoint, the sets A ∩ C_i are also disjoint, and from the properties of a probability function we have

\[ P\left(\bigcup_{i=1}^{\infty} (A \cap C_i)\right) = \sum_{i=1}^{\infty} P(A \cap C_i), \]

establishing (a).

To establish (b) we first construct a disjoint collection A^∗₁, A^∗₂, . . ., with the property that $\bigcup_{i=1}^{\infty} A_i^* = \bigcup_{i=1}^{\infty} A_i$. We define A^∗_i by

\[ A_1^* = A_1, \qquad A_i^* = A_i \setminus \left(\bigcup_{j=1}^{i-1} A_j\right), \quad i = 2, 3, \ldots, \]

where the notation A \ B denotes the part of A that does not intersect with B. In more familiar symbols, A \ B = A ∩ B^c. It should be easy to see that $\bigcup_{i=1}^{\infty} A_i^* = \bigcup_{i=1}^{\infty} A_i$, and we therefore have

\[ P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\bigcup_{i=1}^{\infty} A_i^*\right) = \sum_{i=1}^{\infty} P(A_i^*), \]

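where the last equality follows since the A^∗_i are disjoint. To see this, we write

\[
\begin{aligned}
A_i^* \cap A_k^* &= \left\{ A_i \setminus \left(\bigcup_{j=1}^{i-1} A_j\right) \right\} \cap \left\{ A_k \setminus \left(\bigcup_{j=1}^{k-1} A_j\right) \right\} && \text{(definition of } A_i^* \text{)} \\
&= \left\{ A_i \cap \left(\bigcup_{j=1}^{i-1} A_j\right)^c \right\} \cap \left\{ A_k \cap \left(\bigcup_{j=1}^{k-1} A_j\right)^c \right\} && \text{(definition of } \setminus \text{)} \\
&= \left\{ A_i \cap \bigcap_{j=1}^{i-1} A_j^c \right\} \cap \left\{ A_k \cap \bigcap_{j=1}^{k-1} A_j^c \right\} && \text{(De Morgan's Laws)}
\end{aligned}
\]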

Now if i > k, the first intersection above will be contained in the set A^c_k, which will have an empty intersection with A_k. If k > i, the argument is similar. Further, by construction A^∗_i ⊂ A_i, so P (A^∗_i) ≤ P (A_i) and we have

\[ \sum_{i=1}^{\infty} P(A_i^*) \le \sum_{i=1}^{\infty} P(A_i), \]

establishing (b). □
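The disjoint construction used in the proof of (b) is easy to carry out on a finite example. The sketch below assumes ten equally likely outcomes and three overlapping events chosen for illustration; `disjointify` is a hypothetical helper name.

```python
from fractions import Fraction


def disjointify(sets):
    """Return A*_1, A*_2, ... where A*_i is A_i minus the union of A_1..A_{i-1}."""
    seen = set()
    result = []
    for a in sets:
        result.append(set(a) - seen)
        seen |= set(a)
    return result


# Assumed example: ten equally likely outcomes.
S = set(range(1, 11))


def P(event):
    return Fraction(len(event), len(S))


A = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]
A_star = disjointify(A)
print(A_star)  # [{1, 2, 3}, {4, 5}, {6}]

union = set().union(*A)
lhs = P(union)                         # P of the union of the A_i
sum_star = sum(P(a) for a in A_star)   # sum of P(A*_i)
rhs = sum(P(a) for a in A)             # sum of P(A_i)
print(lhs, sum_star, rhs)  # 3/5 3/5 9/10
```

The A*_i have the same union as the A_i, their probabilities add up to the probability of the union, and that total does not exceed the sum of the P(A_i), which is Boole's Inequality.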

There is a similarity between Boole’s Inequality and Bonferroni’s Inequality. If we apply Boole’s Inequality to A^c, we have

\[ P\left(\bigcup_{i=1}^{n} A_i^c\right) \le \sum_{i=1}^{n} P(A_i^c), \]

and using the facts that ∪A^c_i = (∩A_i)^c and P (A^c_i) = 1 − P (A_i), we obtain

\[ 1 - P\left(\bigcap_{i=1}^{n} A_i\right) \le n - \sum_{i=1}^{n} P(A_i). \]

This becomes, on rearranging terms,

\[ P\left(\bigcap_{i=1}^{n} A_i\right) \ge \sum_{i=1}^{n} P(A_i) - (n - 1), \]

which is a more general version of the Bonferroni Inequality.
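To see what the general Bonferroni Inequality gives in practice, consider an illustrative setup that is not from the lecture: 100 equally likely outcomes and n = 10 events, each of which excludes a single distinct outcome, so that P(A_i) = 0.99 for every i.

```python
from fractions import Fraction

# Assumed example: 100 equally likely outcomes.
S = frozenset(range(100))


def P(event):
    return Fraction(len(event), len(S))


n = 10
# Event A_i contains every outcome except outcome i, so P(A_i) = 99/100.
events = [S - {i} for i in range(n)]

intersection = frozenset.intersection(*events)
bound = sum(P(a) for a in events) - (n - 1)

print(P(intersection))  # 9/10
print(bound)            # 9/10
```

In this example the bound is attained, because the complements A_i^c are disjoint. When the complements overlap, the intersection has probability strictly larger than the bound.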

3 Counting

Methods of counting are often used in order to construct probability assignments on finite sample spaces, although they can be used to answer other questions also. The following theorem is sometimes known as the Fundamental Theorem of Counting.

Theorem 3.1 If a job consists of k separate tasks, the i^th of which can be done in n_i ways, i = 1, . . . , k, then the entire job can be done in n₁× n₂× · · · × n_k ways.

Example 3.1 For a number of years the New York state lottery operated according to the following scheme. From the numbers 1,2, . . ., 44, a person may pick any six for her ticket. The winning number is then decided by randomly selecting six numbers from the forty-four. So the first number can be chosen in 44 ways, and the second number in 43 ways, making a total of 44 × 43 = 1892 ways of choosing the first two numbers. However, if a person is allowed to choose the same number twice, then the first two numbers can be chosen in 44 × 44 = 1936 ways.
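The two counts in Example 3.1 can be confirmed by brute-force enumeration. This sketch uses the standard library's `itertools.permutations` for ordered pairs of distinct numbers and `itertools.product` for ordered pairs where a repeat is allowed.

```python
from itertools import permutations, product

numbers = range(1, 45)  # the lottery numbers 1, 2, ..., 44

# First two numbers, no number chosen twice: 44 * 43 ordered pairs.
without_replacement = sum(1 for _ in permutations(numbers, 2))

# First two numbers, same number allowed twice: 44 * 44 ordered pairs.
with_replacement = sum(1 for _ in product(numbers, repeat=2))

print(without_replacement)  # 1892
print(with_replacement)     # 1936
```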


The above example makes a distinction between counting with replacement and counting without replacement. The second crucial element in counting is whether or not the ordering of the tasks is important. Taking all of these considerations into account, we can construct a 2 × 2 table of possibilities.

Number of possible arrangements of size r from n objects

             Without replacement     With replacement
Ordered      $\frac{n!}{(n-r)!}$     $n^r$
Unordered    $\binom{n}{r}$          $\binom{n+r-1}{r}$

Let us consider counting all of the possible lottery tickets under each of these four cases.

Ordered, without replacement From the Fundamental Theorem of Counting, there are

\[ 44 \times 43 \times 42 \times 41 \times 40 \times 39 = \frac{44!}{38!} = 5{,}082{,}517{,}440 \]

possible tickets.

Ordered, with replacement Since each number can now be selected in 44 ways, there are

\[ 44 \times 44 \times 44 \times 44 \times 44 \times 44 = 44^6 = 7{,}256{,}313{,}856 \]

possible tickets.

Unordered, without replacement From the Fundamental Theorem, six numbers can be arranged in 6! ways, so the total number of unordered tickets is

\[ \frac{44 \times 43 \times 42 \times 41 \times 40 \times 39}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{44!}{6!\,38!} = 7{,}059{,}052. \]

Unordered, with replacement In this case, the total number of unordered tickets is

\[ \frac{44 \times 45 \times 46 \times 47 \times 48 \times 49}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{49!}{6!\,43!} = 13{,}983{,}816. \]
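The four counts can be reproduced with `math.perm` and `math.comb` (available in Python 3.8 and later). The sketch below also cross-checks the four formulas against brute-force enumeration for a small case, n = 5 and r = 3, chosen only to keep the enumeration fast. The function name `arrangement_counts` is illustrative.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)
from math import comb, perm


def arrangement_counts(n, r):
    """Number of arrangements of size r from n objects, for each of the four cases."""
    return {
        ("ordered", "without replacement"): perm(n, r),       # n! / (n - r)!
        ("ordered", "with replacement"): n ** r,
        ("unordered", "without replacement"): comb(n, r),
        ("unordered", "with replacement"): comb(n + r - 1, r),
    }


lottery = arrangement_counts(44, 6)
for case, count in lottery.items():
    print(case, f"{count:,}")

# Brute-force cross-check on a small case.
n, r = 5, 3
items = range(n)
brute_force = {
    ("ordered", "without replacement"): sum(1 for _ in permutations(items, r)),
    ("ordered", "with replacement"): sum(1 for _ in product(items, repeat=r)),
    ("unordered", "without replacement"): sum(1 for _ in combinations(items, r)),
    ("unordered", "with replacement"): sum(
        1 for _ in combinations_with_replacement(items, r)
    ),
}
print(brute_force == arrangement_counts(n, r))  # True
```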

4 Enumerating Outcomes

The counting techniques of the previous section are useful when the sample space S is a finite set and all the outcomes in S are equally likely. Then probabilities of events can be calculated by counting the number of outcomes in the event. To see this, suppose that S = {s₁, . . . , s_N} is a finite sample space. Saying that all the outcomes are equally likely means that P ({s_i}) = 1/N for every outcome s_i. Then, we have, for any event A,

\[ P(A) = \sum_{s_i \in A} P(\{s_i\}) = \sum_{s_i \in A} \frac{1}{N} = \frac{\text{\# of elements in } A}{\text{\# of elements in } S}. \]
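As a small illustration of the counting formula, consider rolling two fair dice. This example is not from the lecture; it assumes all 36 ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Assumed example: ordered outcomes of two fair dice, N = 36.
S = list(product(range(1, 7), repeat=2))


def P(event):
    """P(A) = (# of elements in A) / (# of elements in S)."""
    return Fraction(len(event), len(S))


sum_is_seven = [s for s in S if sum(s) == 7]
print(len(sum_is_seven), len(S))  # 6 36
print(P(sum_is_seven))            # 1/6
```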

Example 4.1 Consider choosing a five-card poker hand from a standard deck of 52 playing cards.

What is the probability of having four aces? If we specify that four of the cards are aces, then there are 48 different ways of specifying the fifth card. Thus,

\[ P(\text{four aces}) = \frac{48}{\binom{52}{5}} = \frac{48}{2{,}598{,}960}. \]

The probability of having four of a kind is

\[ P(\text{four of a kind}) = \frac{13 \times 48}{\binom{52}{5}} = \frac{624}{2{,}598{,}960}. \]

The probability of having exactly one pair is

\[ P(\text{exactly one pair}) = \frac{13 \binom{4}{2} \binom{12}{3} 4^3}{\binom{52}{5}} = \frac{1{,}098{,}240}{2{,}598{,}960}. \]
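The three poker probabilities can be recomputed with `math.comb`. In the one-pair count, the factors are: 13 choices for the rank of the pair, C(4, 2) choices of suits for the pair, C(12, 3) choices for the ranks of the other three cards, and 4³ choices for their suits.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)

four_aces_count = 48            # four aces fixed, 48 choices for the fifth card
four_of_a_kind_count = 13 * 48  # 13 ranks for the four matching cards
one_pair_count = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3

four_aces = Fraction(four_aces_count, total_hands)
four_of_a_kind = Fraction(four_of_a_kind_count, total_hands)
one_pair = Fraction(one_pair_count, total_hands)

print(total_hands)           # 2598960
print(four_of_a_kind_count)  # 624
print(one_pair_count)        # 1098240
print(float(four_aces), float(four_of_a_kind), float(one_pair))
```

Roughly 42% of five-card hands contain exactly one pair, while four of a kind occurs in about 0.024% of hands.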