Vol. 2, No. 2, February 2002 International ed., pp. 279-287
A note on coupon collecting problem
Kuang-Chao Chang Fu Jen Catholic University
Abstract
In the classical coupon collecting problem (CCP), the collector collects one coupon at a time sequentially until a “complete set” consisting of one of each type is obtained. One of the main issues in CCP is the computation of the expected total number of coupons needed for the collector to obtain a complete set. We use a straightforward method, which is easy to understand, to compute the expected total number of coupons. We also propose a truncated form of CCP as the remedial consequence of a real example in the 1960’s.
Keywords: Coupon collecting problem, Poisson process, truncated coupon collecting problem, multiple inverse sampling.
□ Received November 2001, revised January 2002.
□ Kuang-Chao Chang is Associate Professor, Department of Statistics and Information Science, Fu Jen Catholic University, Taipei, Taiwan, ROC; e-mail: stat1016@mails.fju.edu.tw.
□ This research is supported by the National Science Council in Taiwan Grant NSC-90-2118-M-030-003.
© 2002 Susan Rivers’ Cultural Institute, Hsinchu, Taiwan, Republic of China.
1. Introduction
The coupon collecting problem (CCP), which can be found in many probability theory books (e.g., Parzen 1960, Feller 1968, Karr 1993, Durrett 1996, Ross 2000, etc.), is described as follows. There are L different types of coupons and a collector is seeking to collect one of each (say, to win some prize). The collector collects one coupon at a time continuously until at least one of each type is obtained. It is usually assumed that each type of coupon has a fixed probability of being collected and each time the collection of a coupon is independent from those previously obtained.
Research papers on CCP are abundant, e.g., Baum and Billingsley (1965), Rosen (1970), Sen (1979), Dawkins (1991), Hobbs and Read (1995), and Read (1998). One of the main issues in CCP is the computation of the expected total number of coupons needed for the collector to obtain a complete set consisting of one of each type of coupon, and this expected total number of coupons will be called the expected coupon number (ECN) in this paper. Let N denote the random variable of the total number of coupons needed for the collector to obtain a complete set and let $P_h > 0$ be the fixed probability of being selected for a type h coupon, h = 1,…, L, such that $\sum_{h=1}^{L} P_h = 1$.
An exact formula for the computation of ECN is given in Ross (2000) as

$$E(N) = \int_0^{\infty}\Big[1 - \prod_{h=1}^{L}\big(1 - e^{-P_h t}\big)\Big]\,dt, \tag{1}$$

which is derived by the method of Poisson process. Integrating out the above improper integral, we obtain

$$E(N) = \sum_{i=1}^{L}\frac{1}{P_i} - \sum_{1\le h_1<h_2\le L}\frac{1}{P_{h_1}+P_{h_2}} + \cdots + (-1)^{L}\sum_{1\le h_1<\cdots<h_{L-1}\le L}\frac{1}{P_{h_1}+\cdots+P_{h_{L-1}}} + (-1)^{L+1}\frac{1}{\sum_{i=1}^{L}P_i}, \tag{2}$$

which is exact in the sense of computation. However, the implementation of (2) in a computer program is nontrivial and was treated carefully in Liu and Chang (2000).
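To make formula (2) concrete, the following is a minimal Python sketch that evaluates the alternating sum directly by enumerating all non-empty subsets of coupon types. The function name `ecn_inclusion_exclusion` is hypothetical, and this brute-force enumeration is not the implementation treated in Liu and Chang (2000); it visits $2^L - 1$ subsets, so it is only practical for small L.

```python
from itertools import combinations


def ecn_inclusion_exclusion(probs):
    """Evaluate formula (2): alternating sum of 1 / (sum of P_h over a subset).

    probs: sequence of positive selection probabilities summing to 1.
    Brute-force sketch; the cost grows like 2**L.
    """
    L = len(probs)
    total = 0.0
    for k in range(1, L + 1):
        # Subsets of size k carry the sign (-1)**(k + 1).
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(probs, k):
            total += sign / sum(subset)
    return total
```

For three equally likely types, `ecn_inclusion_exclusion([1/3, 1/3, 1/3])` should return approximately 5.5, which agrees with the familiar value $3(1 + 1/2 + 1/3)$.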
The subtle and abstract Poisson process approach used in deriving (1) is difficult to understand for students and users of probability without strong mathematical backgrounds. Therefore, for teaching and education we use an alternative method to compute E(N), in Section 2 of this article, based on conditional expectation that is straightforward (though laborious) and less difficult. This alternative approach can also be used to compute the variance of N. In Section 3 we give a real example of CCP and consider a truncated form of CCP. In Section 4 we conclude this article by introducing an extended version of CCP referred to as multiple inverse sampling.
The contents of this article may be used as supplementary teaching material for teachers of probability, statistics, mathematics, and other related fields.
2. The alternative method for computing ECN
We first compute the ECN for the simple case that L = 2. Let U be the random variable defined by

$$U = \begin{cases} 1, & \text{if the first coupon collected by the collector is of type 1,} \\ 2, & \text{otherwise,} \end{cases}$$

and let Yh ~ Geometric(Ph ), h = 1,2. Then,

$$\begin{aligned}
E(N) &= E(N \mid U = 1)\Pr(U = 1) + E(N \mid U = 2)\Pr(U = 2) \\
&= [1 + E(Y_2)]\,P_1 + [1 + E(Y_1)]\,P_2 \\
&= [1 + (1/P_2)]\,P_1 + [1 + (1/P_1)]\,P_2 \\
&= 1 + \frac{P_1}{P_2} + \frac{P_2}{P_1}.
\end{aligned} \tag{3}$$

On the other hand, when L = 2, formula (1) gives

$$E(N) = \int_0^{\infty}\big[1 - (1 - e^{-P_1 t})(1 - e^{-P_2 t})\big]\,dt = \frac{1}{P_1} + \frac{1}{P_2} - 1. \tag{4}$$

The equivalence between (3) and (4) can be easily verified as follows:

$$1 + \frac{P_1}{P_2} + \frac{P_2}{P_1} = \frac{P_1P_2 + P_1^2 + P_2^2}{P_1P_2} = \frac{(P_1 + P_2)^2 - P_1P_2}{P_1P_2} = \frac{1 - P_1P_2}{P_1P_2} = \frac{P_1 + P_2}{P_1P_2} - 1 = \frac{1}{P_1} + \frac{1}{P_2} - 1.$$

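As a quick numerical check of this equivalence, take the illustrative values $P_1 = 1/4$ and $P_2 = 3/4$ (chosen here only as an example; they do not come from any data in this article). Formula (3) and formula (4) then give

$$1 + \frac{1/4}{3/4} + \frac{3/4}{1/4} = 1 + \frac{1}{3} + 3 = \frac{13}{3}, \qquad \frac{1}{1/4} + \frac{1}{3/4} - 1 = 4 + \frac{4}{3} - 1 = \frac{13}{3}.$$
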
Next, we consider the general case that L > 2. Let S be the set of the first L coupons collected by the collector and let $n_h$ be the random number of type h coupons in S, h = 1,…, L. Define the following $2^L - 1$ mutually exclusive and exhaustive events:

$$A_0 = \{n_h = 1,\ \forall h = 1,\ldots,L\};$$

$$A_1(h_1) = \{n_{h_1} = 0 \text{ and } n_h \ge 1,\ \forall h \ne h_1\}, \quad 1 \le h_1 \le L;$$

$$A_2(h_1,h_2) = \{n_{h_1} = n_{h_2} = 0 \text{ and } n_h \ge 1,\ \forall h \ne h_1, h_2\}, \quad 1 \le h_1 < h_2 \le L;$$

and, for l = 3,…, L – 1,

$$A_l(h_1,\ldots,h_l) = \{n_{h_1} = \cdots = n_{h_l} = 0 \text{ and } n_h \ge 1,\ \forall h \ne h_1,\ldots,h_l\}, \quad 1 \le h_1 < h_2 < \cdots < h_l \le L.$$

Then, we compute the ECN by using the following decomposition:

$$\begin{aligned}
E(N) ={}& E(N \mid A_0)\Pr(A_0) + \sum_{h_1=1}^{L} E[N \mid A_1(h_1)]\Pr[A_1(h_1)] \\
&+ \sum_{1\le h_1<h_2\le L} E[N \mid A_2(h_1,h_2)]\Pr[A_2(h_1,h_2)] + \cdots \\
&+ \sum_{1\le h_1<\cdots<h_{L-1}\le L} E[N \mid A_{L-1}(h_1,\ldots,h_{L-1})]\Pr[A_{L-1}(h_1,\ldots,h_{L-1})].
\end{aligned} \tag{5}$$

It is clear that

$$E(N \mid A_0) = L, \qquad \Pr(A_0) = (L!)\prod_{h=1}^{L} P_h,$$

and $E[N \mid A_1(h_1)] = L + (1/P_{h_1})$, $h_1 = 1,\ldots,L$.
To compute $E[N \mid A_2(h_1,h_2)]$, we solve the equation

$$E[N \mid A_2(h_1,h_2)] = \{E[N \mid A_2(h_1,h_2)] + 1\}(1 - P_{h_1} - P_{h_2}) + \sum_{i=1}^{2}\sum_{\substack{j=1 \\ j\ne i}}^{2}\{E[N \mid A_1(h_i)] + 1\}P_{h_j}$$

and obtain

$$E[N \mid A_2(h_1,h_2)] = L + \frac{1}{P_{h_1}+P_{h_2}}\left(1 + \frac{P_{h_1}}{P_{h_2}} + \frac{P_{h_2}}{P_{h_1}}\right), \quad 1 \le h_1 < h_2 \le L.$$

In general, for l = 3,…, L – 1, $E[N \mid A_l(h_1,\ldots,h_l)]$ can be computed by solving the recursive equation

$$\begin{aligned}
E[N \mid A_l(h_1,\ldots,h_l)] ={}& \{E[N \mid A_l(h_1,\ldots,h_l)] + 1\}\Big(1 - \sum_{i=1}^{l}P_{h_i}\Big) + \{E[N \mid A_{l-1}(h_2,\ldots,h_l)] + 1\}P_{h_1} \\
&+ \{E[N \mid A_{l-1}(h_1,h_3,\ldots,h_l)] + 1\}P_{h_2} + \cdots + \{E[N \mid A_{l-1}(h_1,\ldots,h_{l-1})] + 1\}P_{h_l}.
\end{aligned} \tag{6}$$

As for the probabilities $\Pr[A_l(h_1,\ldots,h_l)]$, l = 1,…, L – 1, they can be calculated as follows:

$$\begin{aligned}
\Pr[A_l(h_1,\ldots,h_l)] ={}& \Big(1 - \sum_{i=1}^{l}P_{h_i}\Big)^{L} - \sum_{\substack{j=1 \\ j\ne h_1,\ldots,h_l}}^{L}\Big(1 - \sum_{i=1}^{l}P_{h_i} - P_j\Big)^{L} \\
&+ \sum_{\substack{1\le j_1<j_2\le L \\ j_1,j_2\ne h_1,\ldots,h_l}}\Big(1 - \sum_{i=1}^{l}P_{h_i} - P_{j_1} - P_{j_2}\Big)^{L} + \cdots + (-1)^{L-l-1}\sum_{\substack{h=1 \\ h\ne h_1,\ldots,h_l}}^{L}P_h^{L}.
\end{aligned} \tag{7}$$

Using (5), (6), and (7), we obtain, when L = 3,

$$E(N) = 3 + \sum_{h_1=1}^{3}\frac{1}{P_{h_1}}\Big[(1 - P_{h_1})^3 - \sum_{\substack{h=1 \\ h\ne h_1}}^{3}P_h^3\Big] + \sum_{1\le h_1<h_2\le 3}\frac{1}{P_{h_1}+P_{h_2}}\left(1 + \frac{P_{h_1}}{P_{h_2}} + \frac{P_{h_2}}{P_{h_1}}\right)(1 - P_{h_1} - P_{h_2})^3,$$

and

$$\begin{aligned}
E(N) ={}& 4 + \sum_{h_1=1}^{4}\frac{1}{P_{h_1}}\Big[(1 - P_{h_1})^4 - \sum_{\substack{h=1 \\ h\ne h_1}}^{4}(1 - P_{h_1} - P_h)^4 + \sum_{\substack{h=1 \\ h\ne h_1}}^{4}P_h^4\Big] \\
&+ \sum_{1\le h_1<h_2\le 4}\frac{1}{P_{h_1}+P_{h_2}}\left(1 + \frac{P_{h_1}}{P_{h_2}} + \frac{P_{h_2}}{P_{h_1}}\right)\Big[(1 - P_{h_1} - P_{h_2})^4 - \sum_{\substack{h=1 \\ h\ne h_1,h_2}}^{4}P_h^4\Big] \\
&+ \sum_{1\le h_1<h_2<h_3\le 4}\frac{1}{\sum_{i=1}^{3}P_{h_i}}\Big[1 + \frac{P_{h_1}}{P_{h_2}+P_{h_3}}\Big(1 + \frac{P_{h_2}}{P_{h_3}} + \frac{P_{h_3}}{P_{h_2}}\Big) + \frac{P_{h_2}}{P_{h_1}+P_{h_3}}\Big(1 + \frac{P_{h_1}}{P_{h_3}} + \frac{P_{h_3}}{P_{h_1}}\Big) \\
&\qquad + \frac{P_{h_3}}{P_{h_1}+P_{h_2}}\Big(1 + \frac{P_{h_1}}{P_{h_2}} + \frac{P_{h_2}}{P_{h_1}}\Big)\Big]\Big(1 - \sum_{i=1}^{3}P_{h_i}\Big)^4
\end{aligned}$$

when L = 4. If L > 4, the formulas for E(N) are tedious so we will not display them.
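Although the closed forms become tedious for L > 4, the decomposition (5) with the recursion (6) and the probabilities (7) is easy to evaluate numerically. The following Python sketch is one possible way to do so; the function names are hypothetical, the recursion (6) is solved for the unknown conditional expectation before coding, and the term of (7) in which no type is left over is skipped because it equals zero.

```python
from functools import lru_cache
from itertools import combinations


def ecn_conditional(probs):
    """Evaluate E(N) through decomposition (5), recursion (6) and formula (7).

    probs: sequence of positive selection probabilities summing to 1.
    Sketch only; the work grows exponentially in L = len(probs).
    """
    L = len(probs)
    types = range(L)

    @lru_cache(maxsize=None)
    def cond_exp(missing):
        """E[N | A_l(missing)], where `missing` is a sorted tuple of type indices."""
        if not missing:
            return float(L)  # E(N | A_0) = L
        s = sum(probs[h] for h in missing)
        # Solving (6) for X = E[N | A_l] gives X * s = 1 + sum_h P_h * E[N | A_{l-1}].
        acc = 1.0
        for h in missing:
            rest = tuple(x for x in missing if x != h)
            acc += probs[h] * cond_exp(rest)
        return acc / s

    def prob_event(missing):
        """Pr[A_l(missing)] by the inclusion-exclusion formula (7)."""
        others = [h for h in types if h not in missing]
        base = 1.0 - sum(probs[h] for h in missing)
        total = 0.0
        for k in range(len(others)):  # k = number of further types forced out
            sign = -1.0 if k % 2 else 1.0
            for extra in combinations(others, k):
                total += sign * (base - sum(probs[j] for j in extra)) ** L
        return total

    # Decomposition (5): sum over all sets of 0, 1, ..., L - 1 missing types.
    return sum(
        cond_exp(missing) * prob_event(missing)
        for l in range(L)
        for missing in combinations(types, l)
    )
```

With four equally likely types, `ecn_conditional([0.25] * 4)` should return approximately 25/3, the same value that formula (2) gives.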
The variance of N can be computed by

$$\mathrm{Var}(N) = E(N^2) - [E(N)]^2,$$

where $E(N^2)$ will be decomposed in the same way as (5). We leave the derivation of
3. The truncated coupon collecting problem
3.1 A real example of CCP
During the mid-1960s, when the author of this article was an elementary school student, there was a very popular snack among children and youngsters in Taiwan: the Princess Snow White Chewing Gum. Inside each package were the gum and a card with a number on its back. On the front of each card was a particular figure from the well-known ancient Chinese novel “The Romance of the Three Kingdoms”.
There were altogether 100 different figures, so the cards were numbered on the back with the integers from 1 to 100. A card collector who obtained a complete set of cards consisting of all 100 different figures would win a big prize from the chewing gum company. However, most of the card collectors eventually gave up in disappointment because some of the cards, such as #6, #15, #28, #42, #56, and #98, were very rare. Among those rare cards, #98 seemed never to appear (and was therefore termed the “dead card”). Of course, the author was a typical card collector (and there were many other collectors in his class in elementary school) and he used to show off, with the rare card #56 in his hand, in front of his classmates. The above chewing gum story is a real example of CCP in which L = 100 and the probability $P_{98}$ is extremely small.
3.2 The truncated coupon collecting problem
In the above example, most collectors were not able to obtain the complete set due to the rare cards and the dead card. One possible outcome of this case was that the chewing gum company might gradually lose its market because some of the collectors would finally become impatient and discontinue their collection. As a remedy for this problem, the chewing gum company may consider modifying the marketing strategy: in addition to the big prize given to the collectors of the “complete set”, the company will offer a second prize for any collector who has collected M cards, where M > L, without a complete set. Therefore, the impatient collectors with enough cards in their collection will at least win the second prize as long as their cards meet the required number. Consequently, after winning the second prize, the impatient collectors will play the game again and start collecting coupons (or cards) anew. We will call this modified version of CCP the truncated coupon collecting problem (TCCP) with truncation point at M. In the following, we compute the ECN in TCCP, assuming that all collectors are impatient (i.e., all the collectors will trade their coupons for the second prize as soon as they have collected M coupons and the collection does not make a complete set). In other words, we compute the expected total number of coupons needed for the collector to obtain either a complete set or an incomplete set of M coupons. Note that a collector can win the big prize, under the assumption that all collectors are impatient, only if a complete set is obtained before or exactly when the collector has collected M coupons.
We first introduce the following theorem in Chang (2001) without proof.
Theorem 1 Let Z have a negative binomial distribution with p.m.f.

$$f(z) = \binom{z-1}{m-1}p^{m}q^{z-m}, \quad z = m, m+1, \ldots,$$

and let $\tilde{Z}$ be the random variable defined by $\tilde{Z} = \min\{Z, M\}$, where M is a positive integer and M ≥ m. Then, the expected value of $\tilde{Z}$, denoted by $\tilde{E}(m, M, p)$, is

$$\tilde{E}(m, M, p) = \begin{cases} \dfrac{m}{p} - \dfrac{1}{p}\displaystyle\sum_{i=1}^{m}\sum_{j=0}^{i-1}\binom{M}{j}p^{j}q^{M-j}, & \text{if } M > m, \\[2ex] m, & \text{if } M = m. \end{cases}$$

For the special case that m = 1, Theorem 1 gives

$$\tilde{E}(1, M, p) = \frac{1}{p}(1 - q^{M}).$$

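Since Theorem 1 is quoted without proof, a numerical cross-check may be helpful. The Python sketch below compares the closed form with a direct summation of $E[\min\{Z, M\}]$ over the p.m.f.; it assumes $q = 1 - p$, and both function names are hypothetical.

```python
from math import comb


def truncated_nb_mean(m, M, p):
    """Closed form of Theorem 1 for E[min(Z, M)], assuming q = 1 - p."""
    if M < m:
        raise ValueError("Theorem 1 requires M >= m")
    if M == m:
        return float(m)
    q = 1.0 - p
    double_sum = sum(
        comb(M, j) * p**j * q ** (M - j)
        for i in range(1, m + 1)
        for j in range(i)  # j = 0, ..., i - 1
    )
    return m / p - double_sum / p


def truncated_nb_mean_direct(m, M, p):
    """Same expectation computed directly from the negative binomial p.m.f."""
    q = 1.0 - p
    pmf = {z: comb(z - 1, m - 1) * p**m * q ** (z - m) for z in range(m, M)}
    head = sum(z * f for z, f in pmf.items())  # contribution of Z < M
    tail = 1.0 - sum(pmf.values())             # Pr(Z >= M)
    return head + M * tail
```

The two functions should agree up to rounding error, for instance at `(m, M, p) = (2, 6, 0.4)`; for m = 1 both reduce to $(1 - q^M)/p$.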
Let $N_t$ denote the random variable of the total number of coupons needed for the collector to stop collecting coupons for either prize in TCCP. Using the technique in Section 2 again, we have for the simple case L = 2

$$\begin{aligned}
E(N_t) &= E(N_t \mid U = 1)\Pr(U = 1) + E(N_t \mid U = 2)\Pr(U = 2) \\
&= [1 + \tilde{E}(1, M-1, P_2)]\,P_1 + [1 + \tilde{E}(1, M-1, P_1)]\,P_2 \\
&= \Big[1 + \frac{1}{P_2}\big(1 - P_1^{M-1}\big)\Big]P_1 + \Big[1 + \frac{1}{P_1}\big(1 - P_2^{M-1}\big)\Big]P_2 \\
&= 1 + \frac{P_1}{P_2}\big(1 - P_1^{M-1}\big) + \frac{P_2}{P_1}\big(1 - P_2^{M-1}\big).
\end{aligned}$$

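For L = 2 the result above can be checked against the tail-sum identity $E[\min\{N, M\}] = \sum_{k=0}^{M-1}\Pr(N > k)$, where $\Pr(N > k) = P_1^k + P_2^k$ for $k \ge 1$ because the set is still incomplete exactly when the first k coupons are all of the same type. The Python sketch below implements both expressions; the function names are hypothetical and the check is an illustration rather than part of the original derivation.

```python
def tccp_ecn_two_types(p1, M):
    """E(N_t) for L = 2 from the closed form derived in the text."""
    p2 = 1.0 - p1
    return (
        1.0
        + (p1 / p2) * (1.0 - p1 ** (M - 1))
        + (p2 / p1) * (1.0 - p2 ** (M - 1))
    )


def tccp_ecn_two_types_direct(p1, M):
    """E[min(N, M)] via the tail sum: 1 + sum_{k=1}^{M-1} (p1**k + p2**k)."""
    p2 = 1.0 - p1
    return 1.0 + sum(p1**k + p2**k for k in range(1, M))
```

With `p1 = 0.5` and `M = 3`, both functions should give 2.5; as M grows, the value approaches the untruncated ECN of formula (3).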
If L = 3, we use (5) with N replaced by $N_t$:

$$E(N_t) = E(N_t \mid A_0)\Pr(A_0) + \sum_{h_1=1}^{3}E[N_t \mid A_1(h_1)]\Pr[A_1(h_1)] + \sum_{1\le h_1<h_2\le 3}E[N_t \mid A_2(h_1,h_2)]\Pr[A_2(h_1,h_2)],$$

where

$$E(N_t \mid A_0) = 3;$$

$$\Pr(A_0) = (3!)\prod_{h=1}^{3}P_h;$$

$$E[N_t \mid A_1(h_1)] = 3 + \tilde{E}(1, M-3, P_{h_1}) = 3 + \frac{1}{P_{h_1}}\big[1 - (1 - P_{h_1})^{M-3}\big], \quad h_1 = 1,\ldots,3;$$

$$\Pr[A_1(h_1)] = (1 - P_{h_1})^3 - \sum_{\substack{h=1 \\ h\ne h_1}}^{3}P_h^3, \quad h_1 = 1,\ldots,3;$$

and $\Pr[A_2(h_1,h_2)] = (1 - P_{h_1} - P_{h_2})^3$, $1 \le h_1 < h_2 \le 3$. To compute $E[N_t \mid A_2(h_1,h_2)]$, we solve the recursive equation

$$E[N_t \mid A_2(h_1,h_2)] = \{E[N_t \mid A_2(h_1,h_2)] + 1\}(1 - P_{h_1} - P_{h_2}) + \sum_{i=1}^{2}\sum_{\substack{j=1 \\ j\ne i}}^{2}\{E[N_t \mid A_1(h_i)] + 1\}P_{h_j}$$

and obtain

$$E[N_t \mid A_2(h_1,h_2)] = 3 + \frac{1}{P_{h_1}+P_{h_2}}\left(1 + \frac{P_{h_1}}{P_{h_2}}\big[1 - (1 - P_{h_2})^{M-4}\big] + \frac{P_{h_2}}{P_{h_1}}\big[1 - (1 - P_{h_1})^{M-4}\big]\right), \quad 1 \le h_1 < h_2 \le 3.$$

The derivation for L > 3 is similar, but the formula for $E(N_t)$ is tedious so we do not display it here to save space. Also, the computation of $\mathrm{Var}(N_t)$ is complicated and will not be discussed in this article.
4. Conclusion
We conclude this article by introducing a sampling scheme referred to as multiple inverse sampling proposed by Chang, Liu, and Han (1998). Multiple inverse sampling (MIS) is a sequential sampling procedure in a stratified population such that random samples are taken continuously until a specified minimum number of observations are obtained in each stratum. In Chang, Liu, and Han (1998), the MIS procedure was used to solve the empty post-strata problem in small sample post-stratification. If the specified minimum number of observations is 1 for each stratum, the MIS procedure reduces to the coupon collecting problem. Therefore, the MIS procedure may be considered as an extended version of CCP such that a “complete set” may consist of more than L coupons. The alternative method given in Section 2 can also be used to find the expected sample size in the MIS procedure.
On the other hand, since the MIS procedure does not have control over the total sample size (especially when some of the stratum weights are small), Chang, Han, and Hawkins (1999) modified the MIS by truncating the sampling procedure when the total sample size reaches a specified maximum number.
Acknowledgements
The author would like to thank Professor Chien-Pai Han and Professor Paul C.
Chiou for their careful reading and helpful comments.
References
Baum, L. E. and Billingsley, P. (1965). Asymptotic distributions for the coupon collector’s problem, Annals of Mathematical Statistics, 36, 1835-1839.
Chang, K. C. (2001). An inductive proof for a closed form formula in truncated inverse sampling, Journal of Propagations in Probability and Statistics, 2, 1, 117-122.
Chang, K. C., Han, C. P., and Hawkins, D. L. (1999). Truncated multiple inverse sampling in post-stratification, Journal of Statistical Planning and Inference, 76, 215-234.
Chang, K. C., Liu, J. F., and Han, C. P. (1998). Multiple inverse sampling in post-stratification, Journal of Statistical Planning and Inference, 69, 209-227.
Dawkins, B. (1991). Siobhan’s problem: the coupon collector revisited, The American Statistician, 45, 76-82.
Durrett, R. (1996). Probability: Theory and Examples, 2nd ed., Duxbury Press.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., John Wiley.
Hobbs, D. and Read, K. (1995). A classroom approach to the collector’s problem, Teaching Mathematics and Its Applications, 14(1), 6-14.
Karr, A. F. (1993). Probability, Springer-Verlag.
Liu, J. F. and Chang, K. C. (2000). A note on multiple inverse sampling, Journal of Statistical Planning and Inference, 87, 347-352.
Parzen, E. (1960). Modern Probability Theory and Its Applications, John Wiley.
Read, K. L. Q. (1998). A lognormal approximation for the collector’s problem, The American Statistician, 52, 175-180.
Rosen, B. (1970). On the coupon collector’s waiting time, Annals of Mathematical Statistics, 41, 1952-1969.
Ross, S. M. (2000). Introduction to Probability Models, 7th ed., Academic Press.
martingale approach, Annals of Statistics, 7, 372-380.