Lecture 2 : Basics of Probability Theory


Academic year: 2022


When an experiment is performed, the realization of the experiment is an outcome in the sample space. If the experiment is performed a number of times, different outcomes may occur each time or some outcomes may repeat. This “frequency of occurrence” of an outcome can be thought of as a probability. More probable outcomes occur more frequently. If the outcomes of an experiment can be described probabilistically, we are on our way to analyzing the experiment statistically.

1 Axiomatic Foundations

Definition 1.1 A collection of subsets of S is called a sigma algebra (or Borel field), denoted by B, if it satisfies the following three properties:

a. ∅ ∈ B (the empty set is an element of B).

b. If $A \in B$, then $A^c \in B$ (B is closed under complementation).

c. If $A_1, A_2, \ldots \in B$, then $\bigcup_{i=1}^{\infty} A_i \in B$ (B is closed under countable unions).

Example 1.1 (Sigma algebra-I) If S is finite or countable, then these technicalities really do not arise, for we define for a given sample space S,

B = {all subsets of S, including S itself}.

If S has n elements, there are $2^n$ sets in B. For example, if S = {1, 2, 3}, then B is the following collection of $2^3 = 8$ sets: {1}, {1, 2}, {1, 2, 3}, {2}, {1, 3}, ∅, {3}, {2, 3}.
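The finite case can be checked by brute force. The sketch below is illustrative only: the helper names `power_set` and `is_sigma_algebra` are not from the lecture, and the union check relies on the fact that for a finite collection, closure under pairwise unions already gives closure under countable unions.

```python
from itertools import chain, combinations


def power_set(sample_space):
    """Return every subset of sample_space as a frozenset."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}


def is_sigma_algebra(collection, sample_space):
    """Check properties (a)-(c) of Definition 1.1 for a finite collection."""
    space = frozenset(sample_space)
    has_empty_set = frozenset() in collection
    closed_under_complement = all(space - a in collection for a in collection)
    # Finite collection: pairwise unions suffice for countable unions.
    closed_under_union = all(
        a | b in collection for a in collection for b in collection
    )
    return has_empty_set and closed_under_complement and closed_under_union


S = {1, 2, 3}
B = power_set(S)
print(len(B))                  # number of sets in B
print(is_sigma_algebra(B, S))  # whether B satisfies Definition 1.1
```

A collection such as {∅, {1}} fails the check, because the complement {2, 3} is missing.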

Example 1.2 (Sigma algebra-II) Let S = (−∞, ∞), the real line. Then B is chosen to contain all sets of the form

[a, b], (a, b], (a, b), [a, b)

for all real numbers a and b. Also, from the properties of B, it follows that B contains all sets that can be formed by taking (possibly countably infinite) unions and intersections of sets of the above varieties.

Definition 1.2 Given a sample space S and an associated sigma algebra B, a probability function is a function P with domain B that satisfies

1. P (A) ≥ 0 for all A ∈ B.

2. P (S) = 1.

3. If $A_1, A_2, \ldots \in B$ are pairwise disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.

The three properties given in the above definition are usually referred to as the Axioms of Probability or the Kolmogorov Axioms. Any function P that satisfies the Axioms of Probability is called a probability function.

The following gives a common method of defining a legitimate probability function.

Theorem 1.1 Let $S = \{s_1, \ldots, s_n\}$ be a finite set. Let B be any sigma algebra of subsets of S. Let $p_1, \ldots, p_n$ be nonnegative numbers that sum to 1. For any $A \in B$, define P(A) by

$$P(A) = \sum_{\{i : s_i \in A\}} p_i.$$

(The sum over an empty set is defined to be 0.) Then P is a probability function on B. This remains true if S = {s1, s2, . . .} is a countable set.
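The construction in Theorem 1.1 can be sketched in a few lines. This is a minimal illustration, not part of the lecture: the function name `make_probability`, the outcome labels, and the point masses are all assumptions, and exact `Fraction` arithmetic is used so that the "sums to 1" check is not affected by rounding.

```python
from fractions import Fraction


def make_probability(point_masses):
    """Build P(A) as the sum of p_i over the outcomes s_i in A."""
    if any(p < 0 for p in point_masses.values()):
        raise ValueError("point masses must be nonnegative")
    if sum(point_masses.values()) != 1:
        raise ValueError("point masses must sum to 1")

    def P(event):
        # The sum over an empty event is defined to be 0.
        return sum((point_masses[s] for s in event), Fraction(0))

    return P


# Hypothetical three-point sample space with exact point masses.
masses = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
P = make_probability(masses)

print(P(set()))                    # P of the empty set
print(P({"a", "b", "c"}))          # P of the whole sample space
print(P({"a", "c"}), P({"a"}) + P({"c"}))  # additivity on disjoint events
```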

Proof: We will give the proof for finite S. For any $A \in B$, $P(A) = \sum_{\{i : s_i \in A\}} p_i \ge 0$, because every $p_i \ge 0$. Thus, Axiom 1 is true. Now,

$$P(S) = \sum_{\{i : s_i \in S\}} p_i = \sum_{i=1}^{n} p_i = 1.$$

Thus, Axiom 2 is true. Let A1, . . . , Ak denote pairwise disjoint events. (B contains only a finite number of sets, so we need consider only finite disjoint unions.) Then,

$$P\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{\{j : s_j \in \bigcup_{i=1}^{k} A_i\}} p_j = \sum_{i=1}^{k} \sum_{\{j : s_j \in A_i\}} p_j = \sum_{i=1}^{k} P(A_i).$$

The first and third equalities are true by the definition of P(A). The disjointness of the $A_i$'s ensures that the second equality is true, because the same $p_j$'s appear exactly once on each side of the equality. Thus, Axiom 3 is true.

Example 1.3 (Defining probabilities-II) The game of darts is played by throwing a dart at a board and receiving a score corresponding to the number assigned to the region in which the dart lands.

For a novice player, it seems reasonable to assume that the probability of the dart hitting a particular region is proportional to the area of the region. Thus, a bigger region has a higher probability of being hit.

The dart board has radius r and the distance between rings is r/5. If we make the assumption that the board is always hit, then we have

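$$P(\text{scoring } i \text{ points}) = \frac{\text{Area of region } i}{\text{Area of dart board}}.$$

For example,

$$P(\text{scoring 1 point}) = \frac{\pi r^2 - \pi (4r/5)^2}{\pi r^2} = 1 - \left(\frac{4}{5}\right)^2.$$

It is easy to derive the general formula, and we find that

$$P(\text{scoring } i \text{ points}) = \frac{(6 - i)^2 - (5 - i)^2}{5^2}, \quad i = 1, \ldots, 5,$$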

independent of π and r. The sum of the areas of the disjoint regions equals the area of the dart board.

Thus, the probabilities that have been assigned to the five outcomes sum to 1, and, by Theorem 1.1, this is a probability function.
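The dart-board probabilities are easy to tabulate exactly. The sketch below assumes the five-ring board of the example; the function name `dart_score_probability` and its `rings` parameter are illustrative and not part of the lecture.

```python
from fractions import Fraction


def dart_score_probability(i, rings=5):
    """P(scoring i points) = ((6 - i)^2 - (5 - i)^2) / 5^2 when rings = 5."""
    outer = rings + 1 - i  # outer radius of region i, in units of r / rings
    inner = rings - i      # inner radius of region i, in the same units
    return Fraction(outer**2 - inner**2, rings**2)


probabilities = {i: dart_score_probability(i) for i in range(1, 6)}
for score, probability in probabilities.items():
    print(score, probability)
print(sum(probabilities.values()))  # total probability over the five scores
```

The factors of π and r cancel in the ratio of areas, which is why only the integers 1 through 5 appear in the code.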

2 The Calculus of Probabilities

Theorem 2.1 If P is a probability function and A is any set in B, then

a. $P(\emptyset) = 0$, where ∅ is the empty set.

b. $P(A) \le 1$.

c. $P(A^c) = 1 - P(A)$.

Theorem 2.2 If P is a probability function and A and B are any sets in the sigma algebra $\mathcal{B}$, then

a. $P(B \cap A^c) = P(B) - P(A \cap B)$.

b. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

c. If $A \subset B$, then $P(A) \le P(B)$.

Formula (b) of Theorem 2.2 gives a useful inequality for the probability of an intersection. Since $P(A \cup B) \le 1$, rearranging (b) gives

$$P(A \cap B) \ge P(A) + P(B) - 1.$$

This inequality is a special case of what is known as Bonferroni’s inequality.
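The identities of Theorem 2.2 and the resulting lower bound on $P(A \cap B)$ can be checked on a small example. The sketch assumes one roll of a fair die with equally likely outcomes; the sample space and the events A and B are illustrative choices, not taken from the lecture.

```python
from fractions import Fraction

# Illustrative assumption: one roll of a fair die, all outcomes equally likely.
S = frozenset(range(1, 7))


def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & S), len(S))


A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5, 6})

# Theorem 2.2(a): B - A is the set B intersected with the complement of A.
difference_rule = P(B - A) == P(B) - P(A & B)
# Theorem 2.2(b): inclusion-exclusion for two events.
union_rule = P(A | B) == P(A) + P(B) - P(A & B)
# Bonferroni's inequality for two events.
bonferroni = P(A & B) >= P(A) + P(B) - 1

print(difference_rule, union_rule, bonferroni)
```

In this example the union of A and B is all of S, so the Bonferroni bound is attained with equality.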

Theorem 2.3 If P is a probability function, then

a. $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i)$ for any partition $C_1, C_2, \ldots$;

b. $P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$ for any sets $A_1, A_2, \ldots$ (Boole's Inequality).

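Proof: Since $C_1, C_2, \ldots$ form a partition, we have that $C_i \cap C_j = \emptyset$ for all $i \ne j$, and $S = \bigcup_{i=1}^{\infty} C_i$. Hence,

$$A = A \cap S = A \cap \left(\bigcup_{i=1}^{\infty} C_i\right) = \bigcup_{i=1}^{\infty} (A \cap C_i),$$

where the last equality follows from the Distributive Law. We therefore have

$$P(A) = P\left(\bigcup_{i=1}^{\infty} (A \cap C_i)\right).$$

Now, since the $C_i$ are disjoint, the sets $A \cap C_i$ are also disjoint, and from the properties of a probability function we have

$$P\left(\bigcup_{i=1}^{\infty} (A \cap C_i)\right) = \sum_{i=1}^{\infty} P(A \cap C_i),$$

establishing (a).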

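To establish (b) we first construct a disjoint collection $A_1^*, A_2^*, \ldots$ with the property that $\bigcup_{i=1}^{\infty} A_i^* = \bigcup_{i=1}^{\infty} A_i$. We define $A_i^*$ by

$$A_1^* = A_1, \qquad A_i^* = A_i \setminus \left(\bigcup_{j=1}^{i-1} A_j\right), \quad i = 2, 3, \ldots,$$

where the notation $A \setminus B$ denotes the part of A that does not intersect with B. In more familiar symbols, $A \setminus B = A \cap B^c$. It should be easy to see that $\bigcup_{i=1}^{\infty} A_i^* = \bigcup_{i=1}^{\infty} A_i$, and we therefore have

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\bigcup_{i=1}^{\infty} A_i^*\right) = \sum_{i=1}^{\infty} P(A_i^*),$$

where the last equality follows since the $A_i^*$ are disjoint. To see this, we write

$$
\begin{aligned}
A_i^* \cap A_k^* &= \left\{A_i \setminus \left(\bigcup_{j=1}^{i-1} A_j\right)\right\} \cap \left\{A_k \setminus \left(\bigcup_{j=1}^{k-1} A_j\right)\right\} && \text{(definition of } A_i^*\text{)} \\
&= \left\{A_i \cap \left(\bigcup_{j=1}^{i-1} A_j\right)^c\right\} \cap \left\{A_k \cap \left(\bigcup_{j=1}^{k-1} A_j\right)^c\right\} && \text{(definition of } \setminus\text{)} \\
&= \left\{A_i \cap \bigcap_{j=1}^{i-1} A_j^c\right\} \cap \left\{A_k \cap \bigcap_{j=1}^{k-1} A_j^c\right\} && \text{(DeMorgan's Laws)}
\end{aligned}
$$

Now if $i > k$, the first intersection above will be contained in the set $A_k^c$, which will have an empty intersection with $A_k$. If $k > i$, the argument is similar. Further, by construction $A_i^* \subset A_i$, so $P(A_i^*) \le P(A_i)$ and we have

$$\sum_{i=1}^{\infty} P(A_i^*) \le \sum_{i=1}^{\infty} P(A_i),$$

establishing (b). □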
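The construction used to prove (b), in which each set is stripped of everything already covered by the earlier sets, can be sketched for a finite list of events. The function name `disjointify` and the example events are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations


def disjointify(events):
    """Replace each event by the part not covered by any earlier event."""
    seen = set()
    result = []
    for event in events:
        result.append(set(event) - seen)  # A_i minus the union of A_1..A_{i-1}
        seen |= set(event)
    return result


# Hypothetical overlapping events.
events = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5}]
starred = disjointify(events)

same_union = set().union(*starred) == set().union(*events)
pairwise_disjoint = all(not (a & b) for a, b in combinations(starred, 2))
each_is_subset = all(s <= e for s, e in zip(starred, events))

print(starred)
print(same_union, pairwise_disjoint, each_is_subset)
```

The last set in the example becomes empty, since both of its elements are already covered; this is harmless because the empty set has probability 0.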

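There is a similarity between Boole's Inequality and Bonferroni's Inequality. If we apply Boole's Inequality to the complements $A_i^c$, we have

$$P\left(\bigcup_{i=1}^{n} A_i^c\right) \le \sum_{i=1}^{n} P(A_i^c),$$

and using the facts that $\bigcup A_i^c = \left(\bigcap A_i\right)^c$ and $P(A_i^c) = 1 - P(A_i)$, we obtain

$$1 - P\left(\bigcap_{i=1}^{n} A_i\right) \le n - \sum_{i=1}^{n} P(A_i).$$

This becomes, on rearranging terms,

$$P\left(\bigcap_{i=1}^{n} A_i\right) \ge \sum_{i=1}^{n} P(A_i) - (n - 1),$$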

which is a more general version of the Bonferroni Inequality.
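Both inequalities can be checked numerically on a small example. The sketch assumes ten equally likely outcomes and three overlapping events; these choices are illustrative and not from the lecture.

```python
from fractions import Fraction
from functools import reduce

# Illustrative assumption: ten equally likely outcomes.
S = frozenset(range(1, 11))


def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))


events = [
    frozenset(range(1, 9)),   # {1, ..., 8}
    frozenset(range(2, 10)),  # {2, ..., 9}
    frozenset(range(3, 11)),  # {3, ..., 10}
]
n = len(events)

union = reduce(frozenset.union, events)
intersection = reduce(frozenset.intersection, events)
sum_of_probabilities = sum(P(a) for a in events)

# Boole: P(union) <= sum of P(A_i).
boole_holds = P(union) <= sum_of_probabilities
# Bonferroni: P(intersection) >= sum of P(A_i) - (n - 1).
bonferroni_bound = sum_of_probabilities - (n - 1)
bonferroni_holds = P(intersection) >= bonferroni_bound

print(P(union), sum_of_probabilities, boole_holds)
print(P(intersection), bonferroni_bound, bonferroni_holds)
```

The Bonferroni bound is informative only when the individual probabilities are large; if their sum is below n − 1, the bound is negative and says nothing.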

3 Counting

Methods of counting are often used in order to construct probability assignments on finite sample spaces, although they can be used to answer other questions also. The following theorem is sometimes known as the Fundamental Theorem of Counting.

Theorem 3.1 If a job consists of k separate tasks, the i-th of which can be done in $n_i$ ways, $i = 1, \ldots, k$, then the entire job can be done in $n_1 \times n_2 \times \cdots \times n_k$ ways.

Example 3.1 For a number of years the New York state lottery operated according to the following scheme. From the numbers 1,2, . . ., 44, a person may pick any six for her ticket. The winning number is then decided by randomly selecting six numbers from the forty-four. So the first number can be chosen in 44 ways, and the second number in 43 ways, making a total of 44 × 43 = 1892 ways of choosing the first two numbers. However, if a person is allowed to choose the same number twice, then the first two numbers can be chosen in 44 × 44 = 1936 ways.

The above example makes a distinction between counting with replacement and counting without replacement. The second crucial element in counting is whether or not the ordering of the tasks is important. Taking all of these considerations into account, we can construct a 2 × 2 table of possibilities.

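Number of possible arrangements of size r from n objects

|           | Without replacement | With replacement     |
|-----------|---------------------|----------------------|
| Ordered   | $\frac{n!}{(n-r)!}$ | $n^r$                |
| Unordered | $\binom{n}{r}$      | $\binom{n+r-1}{r}$   |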

Let us consider counting all of the possible lottery tickets under each of these four cases.

Ordered, without replacement. From the Fundamental Theorem of Counting, there are

$$44 \times 43 \times 42 \times 41 \times 40 \times 39 = \frac{44!}{38!} = 5{,}082{,}517{,}440$$

possible tickets.

Ordered, with replacement. Since each number can now be selected in 44 ways, there are

$$44 \times 44 \times 44 \times 44 \times 44 \times 44 = 44^6 = 7{,}256{,}313{,}856$$

possible tickets.

Unordered, without replacement. From the Fundamental Theorem, six numbers can be arranged in 6! ways, so the total number of unordered tickets is

$$\frac{44 \times 43 \times 42 \times 41 \times 40 \times 39}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{44!}{6!\,38!} = 7{,}059{,}052.$$

Unordered, with replacement. In this case, the total number of unordered tickets is

$$\frac{44 \times 45 \times 46 \times 47 \times 48 \times 49}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{49!}{6!\,43!} = 13{,}983{,}816.$$
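The four lottery counts can be reproduced with the standard library. The wrapper `count_arrangements` below is an illustrative helper, not part of the lecture; it relies on `math.perm` and `math.comb`, which require Python 3.8 or later.

```python
import math


def count_arrangements(n, r, ordered, replacement):
    """Number of arrangements of size r from n objects (the 2 x 2 table)."""
    if ordered and replacement:
        return n**r
    if ordered:
        return math.perm(n, r)          # n! / (n - r)!
    if replacement:
        return math.comb(n + r - 1, r)  # unordered, with replacement
    return math.comb(n, r)              # unordered, without replacement


for ordered in (True, False):
    for replacement in (False, True):
        count = count_arrangements(44, 6, ordered, replacement)
        print(f"ordered={ordered}, replacement={replacement}: {count:,}")
```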

4 Enumerating outcomes

The counting techniques of the previous section are useful when the sample space S is a finite set and all the outcomes in S are equally likely. Then probabilities of events can be calculated by simply counting the number of outcomes in the event. To see this, suppose that $S = \{s_1, \ldots, s_N\}$ is a finite sample space. Saying that all the outcomes are equally likely means that $P(\{s_i\}) = 1/N$ for every outcome $s_i$. Then, we have, for any event A,

$$P(A) = \sum_{s_i \in A} P(\{s_i\}) = \sum_{s_i \in A} \frac{1}{N} = \frac{\#\text{ of elements in } A}{\#\text{ of elements in } S}.$$
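As a small illustration of counting outcomes, consider two fair dice. This example is not from the lecture; the sample space, the helper `P`, and the event "the sum is 7" are assumptions chosen to keep the enumeration short.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: two fair dice, all 36 ordered pairs equally likely.
S = list(product(range(1, 7), repeat=2))


def P(event):
    """P(A) = (# of elements in A) / (# of elements in S)."""
    favorable = [outcome for outcome in S if event(outcome)]
    return Fraction(len(favorable), len(S))


p_sum_seven = P(lambda roll: sum(roll) == 7)
p_doubles = P(lambda roll: roll[0] == roll[1])

print(len(S), p_sum_seven, p_doubles)
```

Here an event is represented as a predicate on outcomes, so the probability is found by filtering the sample space and counting.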

Example 4.1 Consider choosing a five-card poker hand from a standard deck of 52 playing cards.

What is the probability of having four aces? If we specify that four of the cards are aces, then there are 48 different ways of specifying the fifth card. Thus,

$$P(\text{four aces}) = \frac{48}{\binom{52}{5}} = \frac{48}{2{,}598{,}960}.$$

The probability of having four of a kind is

$$P(\text{four of a kind}) = \frac{13 \times 48}{\binom{52}{5}} = \frac{624}{2{,}598{,}960}.$$

The probability of having exactly one pair is

$$P(\text{exactly one pair}) = \frac{13 \binom{4}{2} \binom{12}{3} 4^3}{\binom{52}{5}} = \frac{1{,}098{,}240}{2{,}598{,}960}.$$
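The poker counts above can be reproduced directly. The sketch below only restates the formulas of the example with `math.comb`; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)  # all five-card hands, order ignored

four_aces = 48             # the four aces plus any one of the other 48 cards
four_of_a_kind = 13 * 48   # choose the rank, then the fifth card
# Choose the pair's rank, its two suits, three other ranks, and one suit each.
one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3

print(total_hands)
print(four_aces, float(Fraction(four_aces, total_hands)))
print(four_of_a_kind, float(Fraction(four_of_a_kind, total_hands)))
print(one_pair, float(Fraction(one_pair, total_hands)))
```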
