
Balls and bins

From the document Introduction to Algorithms, Third Edition (pp. 154-160)


5.4.2 Balls and bins

Consider a process in which we randomly toss identical balls into $b$ bins, numbered $1, 2, \ldots, b$. The tosses are independent, and on each toss the ball is equally likely to end up in any bin. The probability that a tossed ball lands in any given bin is $1/b$.

Thus, the ball-tossing process is a sequence of Bernoulli trials (see Appendix C.4) with a probability $1/b$ of success, where success means that the ball falls in the given bin. This model is particularly useful for analyzing hashing (see Chapter 11), and we can answer a variety of interesting questions about the ball-tossing process.

(Problem C-1 asks additional questions about balls and bins.)

How many balls fall in a given bin? The number of balls that fall in a given bin follows the binomial distribution $b(k; n, 1/b)$. If we toss $n$ balls, equation (C.37) tells us that the expected number of balls that fall in the given bin is $n/b$.
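A minimal sketch, assuming Python 3.8+ (for `math.comb`), that computes this expectation exactly from the binomial probabilities $b(k; n, 1/b)$ instead of quoting (C.37). The helper names `binomial_pmf` and `expected_balls_in_bin` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import comb

# Hypothetical helper names, for illustration only.
def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """b(k; n, p): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def expected_balls_in_bin(n: int, b: int) -> Fraction:
    """Exact expected number of balls in one fixed bin after n tosses into b bins."""
    p = Fraction(1, b)                      # chance that a single toss hits the fixed bin
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(expected_balls_in_bin(12, 4))         # 3, i.e. n/b
```

Exact rational arithmetic avoids any floating-point doubt about whether the sum really equals $n/b$.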

How many balls must we toss, on the average, until a given bin contains a ball?

The number of tosses until the given bin receives a ball follows the geometric distribution with probability $1/b$ and, by equation (C.32), the expected number of tosses until success is $1/(1/b) = b$.
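To see the geometric-distribution claim empirically, the rough simulation below (assuming a seeded `random.Random` as the source of randomness; the function names are hypothetical) tosses balls until bin 1 receives one and averages the number of tosses over many trials. The average should land near $b$.

```python
import random

# Hypothetical helper names, for illustration only.
def tosses_until_hit(b: int, target: int, rng: random.Random) -> int:
    """Toss balls uniformly into bins 1..b; return the toss on which `target` first gets a ball."""
    tosses = 0
    while True:
        tosses += 1
        if rng.randint(1, b) == target:   # each bin equally likely, tosses independent
            return tosses

def average_tosses(b: int, trials: int, seed: int = 0) -> float:
    """Empirical mean of the geometric waiting time, using a seeded generator."""
    rng = random.Random(seed)
    return sum(tosses_until_hit(b, 1, rng) for _ in range(trials)) / trials

avg = average_tosses(b=10, trials=50_000)
print(avg)   # should be close to b = 10
```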

How many balls must we toss until every bin contains at least one ball? Let us call a toss in which a ball falls into an empty bin a “hit.” We want to know the expected number n of tosses required to get b hits.

Using the hits, we can partition the $n$ tosses into stages. The $i$th stage consists of the tosses after the $(i-1)$st hit until the $i$th hit. The first stage consists of the first toss, since we are guaranteed to have a hit when all bins are empty. For each toss during the $i$th stage, $i-1$ bins contain balls and $b-i+1$ bins are empty. Thus, for each toss in the $i$th stage, the probability of obtaining a hit is $(b-i+1)/b$.

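Let $n_i$ denote the number of tosses in the $i$th stage. Thus, the number of tosses required to get $b$ hits is $n = \sum_{i=1}^{b} n_i$. Each random variable $n_i$ has a geometric distribution with probability of success $(b-i+1)/b$ and thus, by equation (C.32), we have

$$E[n_i] = \frac{b}{b-i+1}.$$

By linearity of expectation, we have

$$
\begin{aligned}
E[n] &= E\left[\sum_{i=1}^{b} n_i\right] \\
&= \sum_{i=1}^{b} E[n_i] \\
&= \sum_{i=1}^{b} \frac{b}{b-i+1} \\
&= b \sum_{i=1}^{b} \frac{1}{i} \\
&= b(\ln b + O(1)),
\end{aligned}
$$

where the last step uses the harmonic-series bound $\sum_{i=1}^{b} 1/i = \ln b + O(1)$.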
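The stage-by-stage sum can be checked with exact arithmetic. This sketch (hypothetical helper names) adds the per-stage expectations $b/(b-i+1)$ for $i = 1, \ldots, b$ using `fractions.Fraction` and compares the total with $b$ times the $b$th harmonic number $\sum_{i=1}^{b} 1/i$.

```python
from fractions import Fraction

# Hypothetical helper names, for illustration only.
def expected_tosses_to_fill(b: int) -> Fraction:
    """Sum of the per-stage expectations b / (b - i + 1) for stages i = 1..b."""
    return sum(Fraction(b, b - i + 1) for i in range(1, b + 1))

def harmonic(b: int) -> Fraction:
    """The b-th harmonic number 1 + 1/2 + ... + 1/b."""
    return sum(Fraction(1, i) for i in range(1, b + 1))

for b in (1, 2, 3, 10):
    assert expected_tosses_to_fill(b) == b * harmonic(b)

print(expected_tosses_to_fill(3))   # 11/2
```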

It therefore takes approximately $b \ln b$ tosses before we can expect that every bin has a ball. This problem is also known as the coupon collector’s problem, which says that a person trying to collect each of $b$ different coupons expects to acquire approximately $b \ln b$ randomly obtained coupons in order to succeed.
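A quick Monte Carlo sketch of the coupon collector’s problem under the same uniform-toss assumption (function names hypothetical): it repeatedly tosses balls until all $b$ bins are occupied and compares the empirical mean with $b \sum_{i=1}^{b} 1/i$ and with $b \ln b$. For $b = 100$ the two reference values differ by roughly $0.58b$, which is the $O(b)$ term hidden in "approximately $b \ln b$".

```python
import math
import random

# Hypothetical helper names, for illustration only.
def tosses_to_fill(b: int, rng: random.Random) -> int:
    """Toss balls uniformly into b bins until every bin holds at least one ball."""
    occupied = set()
    tosses = 0
    while len(occupied) < b:
        occupied.add(rng.randrange(b))   # bins labeled 0..b-1 here for convenience
        tosses += 1
    return tosses

def simulate(b: int, trials: int, seed: int = 1) -> float:
    """Average number of tosses over `trials` seeded runs."""
    rng = random.Random(seed)
    return sum(tosses_to_fill(b, rng) for _ in range(trials)) / trials

b = 100
sim = simulate(b, trials=2_000)
exact = b * sum(1 / i for i in range(1, b + 1))   # b times the b-th harmonic number
print(f"simulated {sim:.1f}, b*H_b {exact:.1f}, b ln b {b * math.log(b):.1f}")
```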

5.4.3 Streaks

Suppose you flip a fair coin $n$ times. What is the longest streak of consecutive heads that you expect to see? The answer is $\Theta(\lg n)$, as the following analysis shows.

We first prove that the expected length of the longest streak of heads is $O(\lg n)$.

The probability that each coin flip is a head is $1/2$. Let $A_{ik}$ be the event that a streak of heads of length at least $k$ begins with the $i$th coin flip or, more precisely, the event that the $k$ consecutive coin flips $i, i+1, \ldots, i+k-1$ yield only heads, where $1 \le k \le n$ and $1 \le i \le n-k+1$. Since coin flips are mutually independent, for any given event $A_{ik}$, the probability that all $k$ flips are heads is

$$\Pr\{A_{ik}\} = 1/2^k. \tag{5.8}$$
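Equation (5.8) can be checked by brute force for small $n$. The sketch below (hypothetical function name) enumerates all $2^n$ equally likely flip sequences and counts those in which flips $i$ through $i+k-1$ are all heads.

```python
from fractions import Fraction
from itertools import product

# Hypothetical function name, for illustration only.
def prob_streak_starts_at(n: int, i: int, k: int) -> Fraction:
    """Exact Pr{A_ik}: flips i..i+k-1 (1-indexed) are all heads, over all 2^n sequences."""
    hits = 0
    for flips in product("HT", repeat=n):             # every sequence is equally likely
        if all(f == "H" for f in flips[i - 1:i - 1 + k]):
            hits += 1
    return Fraction(hits, 2**n)

print(prob_streak_starts_at(n=8, i=3, k=4))           # 1/16
```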

For $k = 2\lceil \lg n \rceil$,

$$\Pr\{A_{i,2\lceil \lg n \rceil}\} = 1/2^{2\lceil \lg n \rceil} \le 1/2^{2\lg n} = 1/n^2,$$

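and thus the probability that a streak of heads of length at least $2\lceil \lg n \rceil$ begins in position $i$ is quite small. There are at most $n - 2\lceil \lg n \rceil + 1$ positions where such a streak can begin. The probability that a streak of heads of length at least $2\lceil \lg n \rceil$ begins anywhere is therefore

$$
\begin{aligned}
\Pr\left\{\bigcup_{i=1}^{n-2\lceil \lg n \rceil+1} A_{i,2\lceil \lg n \rceil}\right\}
&\le \sum_{i=1}^{n-2\lceil \lg n \rceil+1} 1/n^2 \\
&< \sum_{i=1}^{n} 1/n^2 \\
&= 1/n,
\end{aligned}
\tag{5.9}
$$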

since by Boole’s inequality (C.19), the probability of a union of events is at most the sum of the probabilities of the individual events. (Note that Boole’s inequality holds even for events such as these that are not independent.)
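Inequality (5.9) is easy to evaluate numerically. This sketch (hypothetical names) computes the Boole bound, the number of starting positions times $2^{-2\lceil \lg n \rceil}$, exactly and compares it with $1/n$; `ceil_lg` uses `int.bit_length` so that $\lceil \lg n \rceil$ is computed without floating-point rounding.

```python
from fractions import Fraction

# Hypothetical helper names, for illustration only.
def ceil_lg(n: int) -> int:
    """Exact ceiling of log2(n) for n >= 1, computed with integer arithmetic."""
    return (n - 1).bit_length()

def union_bound_long_streak(n: int) -> Fraction:
    """Boole bound on Pr{a streak of >= 2*ceil(lg n) heads starts somewhere}, n >= 2."""
    k = 2 * ceil_lg(n)
    positions = max(0, n - k + 1)          # possible starting flips i = 1..n-k+1
    return positions * Fraction(1, 2**k)   # each start has probability 1/2^k by (5.8)

for n in (2, 10, 1000, 10**6):
    print(n, float(union_bound_long_streak(n)), 1 / n)
```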

We now use inequality (5.9) to bound the length of the longest streak. For $j = 0, 1, 2, \ldots, n$, let $L_j$ be the event that the longest streak of heads has length exactly $j$, and let $L$ be the length of the longest streak. By the definition of expected value, we have

$$E[L] = \sum_{j=0}^{n} j \Pr\{L_j\}. \tag{5.10}$$

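We could try to evaluate this sum using upper bounds on each $\Pr\{L_j\}$ similar to those computed in inequality (5.9). Unfortunately, this method would yield weak bounds. We can use some intuition gained by the above analysis to obtain a good bound, however. Informally, we observe that for no individual term in the summation in equation (5.10) are both the factors $j$ and $\Pr\{L_j\}$ large. Why? When $j \ge 2\lceil \lg n \rceil$, then $\Pr\{L_j\}$ is very small, and when $j < 2\lceil \lg n \rceil$, then $j$ is fairly small. More formally, we note that the events $L_j$ for $j = 0, 1, \ldots, n$ are disjoint, and so the probability that a streak of heads of length at least $2\lceil \lg n \rceil$ begins anywhere is $\sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\}$. By inequality (5.9), we have $\sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} < 1/n$. Also, noting that $\sum_{j=0}^{n} \Pr\{L_j\} = 1$, we have that $\sum_{j=0}^{2\lceil \lg n \rceil - 1} \Pr\{L_j\} \le 1$. Thus, we obtain

$$
\begin{aligned}
E[L] &= \sum_{j=0}^{n} j \Pr\{L_j\} \\
&= \sum_{j=0}^{2\lceil \lg n \rceil - 1} j \Pr\{L_j\} + \sum_{j=2\lceil \lg n \rceil}^{n} j \Pr\{L_j\} \\
&< \sum_{j=0}^{2\lceil \lg n \rceil - 1} (2\lceil \lg n \rceil) \Pr\{L_j\} + \sum_{j=2\lceil \lg n \rceil}^{n} n \Pr\{L_j\} \\
&= 2\lceil \lg n \rceil \sum_{j=0}^{2\lceil \lg n \rceil - 1} \Pr\{L_j\} + n \sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} \\
&< 2\lceil \lg n \rceil \cdot 1 + n \cdot (1/n) \\
&= O(\lg n).
\end{aligned}
$$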
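For small $n$ the sum (5.10) can be evaluated exactly and compared with the threshold $2\lceil \lg n \rceil$. The sketch below is one way to do it and is not taken from the text: a dynamic program over the length of the current run of heads counts the sequences whose longest streak is at most $m$, and differences of those counts give $\Pr\{L_j\}$. The function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical helper names; the run-length DP is an assumption of this sketch.
def count_longest_at_most(n: int, m: int) -> int:
    """Number of length-n H/T sequences whose longest run of heads is at most m."""
    ways = [1] + [0] * m            # ways[r]: prefixes ending in exactly r consecutive heads
    for _ in range(n):
        new = [0] * (m + 1)
        new[0] = sum(ways)          # appending a tail resets the run to 0
        for r in range(m):          # appending a head extends the run; runs past m are dropped
            new[r + 1] = ways[r]
        ways = new
    return sum(ways)

def longest_streak_distribution(n: int) -> list[Fraction]:
    """Pr{L_j} for j = 0..n: the longest run of heads has length exactly j."""
    at_most = [count_longest_at_most(n, j) for j in range(n + 1)]
    exactly = [at_most[0]] + [at_most[j] - at_most[j - 1] for j in range(1, n + 1)]
    return [Fraction(c, 2**n) for c in exactly]

def expected_longest_streak(n: int) -> Fraction:
    """E[L] = sum_j j * Pr{L_j}, i.e. equation (5.10)."""
    return sum(j * p for j, p in enumerate(longest_streak_distribution(n)))

def ceil_lg(n: int) -> int:
    """Exact ceiling of log2(n) for n >= 1."""
    return (n - 1).bit_length()

for n in (4, 16, 64, 128):
    print(n, float(expected_longest_streak(n)), 2 * ceil_lg(n))
```

The DP runs in time polynomial in $n$, so it only serves as a small-$n$ cross-check of the asymptotic argument.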

The probability that a streak of heads exceeds $r\lceil \lg n \rceil$ flips diminishes quickly with $r$. For $r \ge 1$, the probability that a streak of at least $r\lceil \lg n \rceil$ heads starts in position $i$ is

$$\Pr\{A_{i,r\lceil \lg n \rceil}\} = 1/2^{r\lceil \lg n \rceil} \le 1/n^r.$$

Thus, the probability is at most $n/n^r = 1/n^{r-1}$ that the longest streak is at least $r\lceil \lg n \rceil$, or equivalently, the probability is at least $1 - 1/n^{r-1}$ that the longest streak has length less than $r\lceil \lg n \rceil$.

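As an example, for $n = 1000$ coin flips we have $\lceil \lg 1000 \rceil = 10$ (since $2^9 = 512 < 1000 \le 1024 = 2^{10}$), so the probability of having a streak of at least $2\lceil \lg n \rceil = 20$ heads is at most $1/n = 1/1000$. The chance of having a streak longer than $3\lceil \lg n \rceil = 30$ heads is at most $1/n^2 = 1/1{,}000{,}000$.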
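The $n = 1000$ example can be checked against exact tail probabilities. This sketch uses a run-length counting idea (an assumption of this example, not a method from the text): it counts the sequences in which no run of heads reaches $k$ and subtracts that fraction from 1. The function name is hypothetical, and the exact values should fall below the bounds $1/1000$ and $1/1{,}000{,}000$.

```python
from fractions import Fraction

# Hypothetical function name; the counting DP is an assumption of this sketch.
def prob_longest_at_least(n: int, k: int) -> Fraction:
    """Exact Pr{longest run of heads in n fair flips is >= k}, for k >= 1."""
    ways = [1] + [0] * (k - 1)        # ways[r]: prefixes ending in r heads, no run reaching k
    for _ in range(n):
        new = [0] * k
        new[0] = sum(ways)            # a tail resets the current run
        for r in range(k - 1):        # a head extends a run that stays below k
            new[r + 1] = ways[r]
        ways = new
    return 1 - Fraction(sum(ways), 2**n)

n = 1000
p20 = prob_longest_at_least(n, 20)
p30 = prob_longest_at_least(n, 30)
print(float(p20), float(p30))
```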

We now prove a complementary lower bound: the expected length of the longest streak of heads in $n$ coin flips is $\Omega(\lg n)$. To prove this bound, we look for streaks of length $s$ by partitioning the $n$ flips into approximately $n/s$ groups of $s$ flips each. If we choose $s = \lfloor (\lg n)/2 \rfloor$, we can show that it is likely that at least one of these groups comes up all heads, and hence it is likely that the longest streak has length at least $s = \Omega(\lg n)$. We then show that the longest streak has expected length $\Omega(\lg n)$.

We partition the $n$ coin flips into at least $\lfloor n/\lfloor (\lg n)/2 \rfloor \rfloor$ groups of $\lfloor (\lg n)/2 \rfloor$ consecutive flips, and we bound the probability that no group comes up all heads.
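Because the groups use disjoint flips, the probability that no group comes up all heads is exactly $(1 - 2^{-s})^{\lfloor n/s \rfloor}$ with $s = \lfloor (\lg n)/2 \rfloor$. A small sketch (hypothetical function name, restricted to $n \ge 4$ so that $s \ge 1$) that evaluates this quantity for a few large $n$ and compares it with $1/n$:

```python
import math

# Hypothetical function name, for illustration only.
def prob_no_group_all_heads(n: int) -> float:
    """(1 - 2^-s)^groups with s = floor((lg n)/2) and groups = floor(n/s), for n >= 4.

    log1p keeps the logarithm accurate when 2^-s is tiny; extremely small
    results may still underflow to 0.0.
    """
    s = (n.bit_length() - 1) // 2       # floor(floor(lg n) / 2) == floor((lg n) / 2)
    groups = n // s
    return math.exp(groups * math.log1p(-(2.0 ** -s)))

for p in (16, 20, 30):
    n = 2**p
    print(n, prob_no_group_all_heads(n), 1 / n)
```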

By equation (5.8), the probability that the group starting in position i comes up all heads is

$$\Pr\{A_{i,\lfloor (\lg n)/2 \rfloor}\} = 1/2^{\lfloor (\lg n)/2 \rfloor} \ge 1/\sqrt{n}.$$

The probability that a streak of heads of length at least $\lfloor (\lg n)/2 \rfloor$ does not begin in position $i$ is therefore at most $1 - 1/\sqrt{n}$. Since the $\lfloor n/\lfloor (\lg n)/2 \rfloor \rfloor$ groups are formed from mutually exclusive, independent coin flips, the probability that every one of these groups fails to be a streak of length $\lfloor (\lg n)/2 \rfloor$ is at most

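$$
\begin{aligned}
\left(1 - 1/\sqrt{n}\right)^{\lfloor n/\lfloor (\lg n)/2 \rfloor \rfloor}
&\le \left(1 - 1/\sqrt{n}\right)^{n/\lfloor (\lg n)/2 \rfloor - 1} \\
&\le \left(1 - 1/\sqrt{n}\right)^{2n/\lg n - 1} \\
&\le e^{-(2n/\lg n - 1)/\sqrt{n}} \\
&= O(e^{-\lg n}) \\
&= O(1/n).
\end{aligned}
$$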

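For this argument, we used inequality (3.12), $1 + x \le e^x$, and the fact, which you might want to verify, that $(2n/\lg n - 1)/\sqrt{n} \ge \lg n$ for sufficiently large $n$. The final step also uses $e^{-\lg n} = n^{-\lg e} < 1/n$, which holds because $\lg e > 1$.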
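The fact you might want to verify can at least be explored numerically. This sketch (hypothetical names) tests $(2n/\lg n - 1)/\sqrt{n} \ge \lg n$ in floating point. The inequality holds for very small $n$, fails over a middle range, and then holds from some point on, consistent with the "sufficiently large $n$" qualifier; the code reports the last failure below $2^{14}$ and checks a range above it. A numerical scan is evidence, not a proof.

```python
import math

# Hypothetical names, for illustration only.
def fact_holds(n: int) -> bool:
    """Does (2n/lg n - 1)/sqrt(n) >= lg n hold for this n (floating point)?"""
    lg = math.log2(n)
    return (2 * n / lg - 1) / math.sqrt(n) >= lg

# Holds for tiny n, fails in a middle range, then holds from some point on.
last_failure = max(n for n in range(2, 2**14) if not fact_holds(n))
print(last_failure)
print(all(fact_holds(n) for n in range(last_failure + 1, 2**18)))
```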

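Thus, the probability that the longest streak is at least $\lfloor (\lg n)/2 \rfloor$ is

$$\sum_{j=\lfloor (\lg n)/2 \rfloor}^{n} \Pr\{L_j\} \ge 1 - O(1/n). \tag{5.11}$$

We can now calculate a lower bound on the expected length of the longest streak, beginning with equation (5.10) and proceeding in a manner similar to our analysis of the upper bound:

$$
\begin{aligned}
E[L] &= \sum_{j=0}^{n} j \Pr\{L_j\} \\
&= \sum_{j=0}^{\lfloor (\lg n)/2 \rfloor - 1} j \Pr\{L_j\} + \sum_{j=\lfloor (\lg n)/2 \rfloor}^{n} j \Pr\{L_j\} \\
&\ge \sum_{j=0}^{\lfloor (\lg n)/2 \rfloor - 1} 0 \cdot \Pr\{L_j\} + \sum_{j=\lfloor (\lg n)/2 \rfloor}^{n} \lfloor (\lg n)/2 \rfloor \Pr\{L_j\} \\
&= \lfloor (\lg n)/2 \rfloor \sum_{j=\lfloor (\lg n)/2 \rfloor}^{n} \Pr\{L_j\} \\
&\ge \lfloor (\lg n)/2 \rfloor \, (1 - O(1/n)) \\
&= \Omega(\lg n).
\end{aligned}
$$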

As with the birthday paradox, we can obtain a simpler but approximate analysis using indicator random variables. We let $X_{ik} = I\{A_{ik}\}$ be the indicator random variable associated with a streak of heads of length at least $k$ beginning with the $i$th coin flip. To count the total number of such streaks, we define

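$$X = \sum_{i=1}^{n-k+1} X_{ik}.$$

Taking expectations and using linearity of expectation, we have

$$
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-k+1} X_{ik}\right] \\
&= \sum_{i=1}^{n-k+1} E[X_{ik}] \\
&= \sum_{i=1}^{n-k+1} \Pr\{A_{ik}\} \\
&= \sum_{i=1}^{n-k+1} 1/2^k \\
&= \frac{n-k+1}{2^k}.
\end{aligned}
$$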
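The expected number of streaks can be confirmed by exhaustive enumeration for small $n$. Each of the $n-k+1$ indicators $X_{ik}$ has expectation $1/2^k$ by (5.8), so the average count over all $2^n$ sequences should equal $(n-k+1)/2^k$. This sketch (hypothetical function name) computes that average directly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical function name, for illustration only.
def expected_streak_count(n: int, k: int) -> Fraction:
    """Average of X = sum_i X_ik over all 2^n sequences (X_ik = 1 if flips i..i+k-1 are heads)."""
    total = 0
    for flips in product((0, 1), repeat=n):                    # 1 stands for heads
        total += sum(all(flips[i:i + k]) for i in range(n - k + 1))
    return Fraction(total, 2**n)

print(expected_streak_count(10, 3))   # 1, since (10 - 3 + 1) / 2^3 = 1
```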

By plugging in various values for $k$, we can calculate the expected number of streaks of length $k$. If this number is large (much greater than 1), then we expect many streaks of length $k$ to occur and the probability that one occurs is high. If this number is small (much less than 1), then we expect few streaks of length $k$ to occur and the probability that one occurs is low. If $k = c \lg n$, for some positive constant $c$, we obtain

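$$
\begin{aligned}
E[X] &= \frac{n - c \lg n + 1}{2^{c \lg n}} \\
&= \frac{n - c \lg n + 1}{n^c} \\
&= \frac{1}{n^{c-1}} - \frac{(c \lg n - 1)/n}{n^{c-1}} \\
&= \Theta(1/n^{c-1}).
\end{aligned}
$$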
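To see the $\Theta(1/n^{c-1})$ behavior with concrete numbers, this sketch (hypothetical names) evaluates $(n - k + 1)/2^k$ at $k = c \lg n$ for $n = 2^{20}$, where $c \lg n$ is an integer for $c \in \{1/2, 1, 2\}$.

```python
from fractions import Fraction

# Hypothetical function name, for illustration only.
def expected_streaks(n: int, k: int) -> Fraction:
    """E[X] = (n - k + 1) / 2^k, the expected number of length-k head streaks."""
    return Fraction(n - k + 1, 2**k)

n = 2**20                              # lg n = 20, so c * lg n is an integer for these c
for c in (Fraction(1, 2), Fraction(1), Fraction(2)):
    k = int(c * 20)
    print(c, float(expected_streaks(n, k)))
```

For $c = 1/2$ the expectation is about $\sqrt{n}$, for $c = 1$ it is just under 1, and for $c = 2$ it is about $1/n$.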

If $c$ is large, the expected number of streaks of length $c \lg n$ is small, and we conclude that they are unlikely to occur. On the other hand, if $c = 1/2$, then we obtain $E[X] = \Theta(1/n^{1/2-1}) = \Theta(n^{1/2})$, and we expect that there are a large number of streaks of length $(1/2)\lg n$. Therefore, one streak of such a length is likely to occur. From these rough estimates alone, we can conclude that the expected length of the longest streak is $\Theta(\lg n)$.
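Finally, a rough simulation (seeded `random.Random`, hypothetical function names) of the longest streak of heads for a few values of $n = 2^p$. The averages should grow roughly like $\lg n = p$ and sit between $\lfloor (\lg n)/2 \rfloor$ and $2\lceil \lg n \rceil$; a handful of trials is only an illustration, not a measurement.

```python
import random

# Hypothetical helper names, for illustration only.
def longest_heads_streak(flips: list[bool]) -> int:
    """Length of the longest run of True values (heads) in the sequence."""
    best = run = 0
    for heads in flips:
        run = run + 1 if heads else 0
        best = max(best, run)
    return best

def average_longest(n: int, trials: int, seed: int = 2) -> float:
    """Average longest streak over `trials` runs of n fair flips."""
    rng = random.Random(seed)
    return sum(longest_heads_streak([rng.random() < 0.5 for _ in range(n)])
               for _ in range(trials)) / trials

results = {p: average_longest(2**p, trials=50) for p in (8, 12, 14)}
for p, avg in results.items():
    print(f"n = 2^{p}: average longest streak {avg:.2f}, lg n = {p}")
```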
