E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds.
ADAPTIVE MONTE CARLO METHODS FOR RARE EVENT SIMULATIONS
Ming-hua Hsieh
Department of Management Information Systems National Chengchi University
Wenshan, Taipei 11623, TAIWAN
ABSTRACT
We review two types of adaptive Monte Carlo methods for rare event simulations. These methods are based on im-portance sampling. The first approach selects imim-portance sampling distributions by minimizing the variance of im-portance sampling estimator. The second approach selects importance sampling distributions by minimizing the cross entropy to the optimal importance sampling distribution. We also review the basic concepts of importance sampling in the rare event simulation context. To make the basic concepts concrete, we introduce these ideas via the study of rare events of M/M/1 queues.
1 INTRODUCTION
Rare events, although are seldom happened as its name suggests, are important when they do happen in many ap-plication areas. For example, the buffer overflow event is rare in a high quality telecommunication network, but is sig-nificant when it happens; A system break down event is rare in a fault-tolerant computing system, but has a consequen-tial effect when it happens. Therefore, accurate estimation of the probabilities of such rare events is important. How-ever, if the probabilities of rare events are really small, estimation of these probabilities is often computationally intractable when studied using conventional Monte Carlo simulation. Therefore, powerful efficiency improvement techniques (see, e.g. (Glynn (1994)) and (Bratley, Fox, and Schrage (1987))) are needed. The most suitable technique for rare event simulations is importance sampling (see, e.g. Hammersley and Handscomb (1965), and we will review the basic concepts in Section 3.) When applied appropri-ately, importance sampling can improve the efficiency many orders of magnitude. Unfortunately, it is not a simple task to apply importance sampling appropriately on rare event simulations. The main difficulty lies in the selection of an effective importance sampling distribution. The selection
usually requires simulationists have a good understanding of the structure of the system being simulated.
To overcome this difficulty, researchers have developed adaptive procedures for selecting effective importance sam-pling distributions. This paper’s main purpose is to review these procedures. We reviewed two adaptive importance sampling methods in this paper. The first method selects effective importance sampling distributions by minimizing the variance of importance sampling estimator. The second method selects effective importance sampling distributions by minimizing the cross entropy to the optimal importance sampling distribution (see Section 3.2). For applications of these methods in selecting effective importance sampling distributions for queueing models and financial derivative models, see the citations of Section 4.
The rest of the paper is organized as follows. In Section 2, we describe a simple M/M/1 queue model and the rare event of interest. In Section 3, we review the basic con-cepts of importance sampling via the study of rare events of M/M/1 queues. Several criteria of evaluating the goodness of importance sampling estimators are given. In section 4, we reviewed two types of adaptive importance sampling methods for rare event simulations. The first method se-lects importance sampling distributions by minimizing the variance of importance sampling estimator. The second one selects importance sampling distributions by minimiz-ing the cross entropy to the optimal importance samplminimiz-ing distribution. Finally, the paper is summarized in Section 5. This section remarks some properties of these two types of methods.
2 A SIMPLE RARE EVENT SIMULATION PROBLEM
Let us consider a stable M/M/1 queue with arrival rateλ = 1, service rateµ > 1, and the buffer limit of the system K > 1, see Figure 1. Let X(t) be the number of customers in the system at timet. Then, X = (X(t) : t ≥ 0) is a continuous time Markov chain with state space S = {0, 1, 2, . . . K}.
Suppose that X(0) = 1 and we are interested in using importance sampling to estimate
γK = P (X(T ) = K), (1)
where
T = inf{t > 0 : X(t) = 0 or X(t) = K}.
It is well known thatX is an irreducible, positive recurrent Markov chain and T is a stopping time, also known as Markov time (see p. 255 & p. 318 of Karlin and Taylor (1975)). 0 µ 1 λ -µ 2 · · · λ -µ K-1 λ- K
Figure 1: Transition Diagram of the M/M/1 Queue Example If K is large, then γK is a small number. Thus, estimatingγK in (1) is a rare event simulation problem.
For this simple problem, the analytical solution forγK is known. In particular,
γK = µµ − 1K− 1. (2)
Of course with a known analytical result there is no need to simulate. We just use this problem as a vehicle for the basic ideas to follow. Throughout this paper, the concepts of importance sampling and adaptive importance sampling methods will be introduced via this simple problem. 3 THE BASIC CONCEPTS OF
IMPORTANCE SAMPLING
3.1 Importance Sampling
Since X is an irreducible, positive recurrent chain and T is a stopping time, it is well known that P ({ω : T (ω) < ∞}) = 1. Thus, without loss of generality, we can choose = {ω : T (ω) < ∞} being the sample space. Let f (·) denote the (original) density function ofX given X(0) = 1 and let
A = {ω : X(T (ω)) = K}. Then, we can representγK as
Z
I (ω ∈ A)f (ω)dω = Ef[I (A)],
whereI (·) is the indicator function.
To estimate γK via simulation, the direct approach would be to generate n independent sample paths,
ω1, . . . , ωn, from the probability density functionf (·) and
form the estimator ˆγK = 1n
n
X
k=1
I (ωk∈ A).
By the central limit theorem, √
n( ˆγK− γK) ⇒pγK(1 − γK)N(0, 1),
asn → ∞, where N(0, 1) denotes a normal random variable with mean 0 and variance 1. Thus, to construct a 95% confidence interval for γK with relative half-length of 1%, we need sample sizen ≈ 1/(0.012)×1.962×(1−γK)/γK. Therefore, if γK = 10−9, then n ≈ 3.84 × 1013. This demonstrates the main problem of rare event simulations.
Let g(·) be a density function such that ω ∈ A and f (ω) > 0 implies g(ω) > 0. Then we have another representation for γK: γK = Z I (ω ∈ A)f (ω) g(ω)g(ω)dω = Z I (ω ∈ A)Lf,g(ω)g(ω)dω = Eg[I (A)Lf,g], (3) whereLf,g(ω) = f (ω)/g(ω) is called the likelihood ratio andEg(·) denotes the expectation under g(·).
Identity (3) suggests an alternative estimation scheme: generaten samples, ω1, . . . , ωn, fromg(·). By (3),
ˆγKg =n1 n
X
k=1
Lf,g(ωk)I (ωk ∈ A) (4)
is an unbiased estimate of γK. This alternative estimation scheme is called importance sampling. To apply impor-tance sampling to more general stochastic systems including discrete-time Markov chains (DTMC’s), continuous-time Markov chains (CTMC’s), and generalized semi-Markov processes (a mathematical formalization of discrete-event simulations), consult Glynn and Iglehart (1989).
Before proceeding to next subsection, let us derive the explicit formula of the likelihood ratio for our M/M/1 queue example. A typical sample path ω of X is
((X0, h0), (X1, h1), . . . , (XN−1, hN−1), XN)
whereN is the number of jumps before the stochastic process X hits 0 or K, X0, X1, · · · , XN is the sequence of states
of the embedded discrete time Markov chain, andhnis the holding time in stateXnforn = 0, 1, . . . , N − 1. Thus, for the probability density functionf (ω) of the sample path ω of the process X for which Pf(·, ·) denotes the transition
probabilities of the embedded discrete time Markov chain, we have f (ω) = N−1Y n=0 (1 + µ)e−(1+µ)hnPf(Xn, Xn+1). (5)
Now, let us consider a generalized M/M/1 queue, whose arrival rates and service rates vary. In particular, its arrival rate is λk and service rate is µk when it is at state k, k = 1, 2, · · · , K − 1; see Figure 2. For such a generalized M/M/1 queue, the densityg(ω) is equal to
N−1Y n=0
(λXn+ µXn)e−(λXn+µXn)hnPg(Xn, Xn+1), (6)
wherePg(·, ·) is the transition probabilities of the embedded DTMC of this generalized M/M/1 queue.
0 µ1 1 λ1 -µ2 2 · · · λK−2 -µK−1 K-1 λK−1- K
Figure 2: Transition Diagram of the Generalized M/M/1 Queue Example
Thus, if we choose such a generalized M/M/1 to do importance sampling, then the likelihood ratio of ω
Lf,g(ω) = QN−1 n=0(1 + µ)e−(1+µ)hnPf(Xn, Xn+1) QN−1 n=0(λXn+ µXn)e−(λXn+µXn)hnPg(Xn, Xn+1) . (7)
See Glynn and Iglehart (1989) for explicit formulas of likelihood ratios for a variety of more general stochastic processes.
3.2 The Optimal Importance Sampling Distribution
Is importance sampling always better than direct simulation? This answer depends on the choice of importance sampling distribution g. An ideal g is a distribution which has the property Varg[I (A)Lf,g] Varf[I (A)], where Varg(·) and Varf(·) denote the variance under distributions g(·) and f (·), respectively. And the bestg is a distribution which has the property Varg[I (A)Lf,g] = 0.
Definition 1 A distributiong is called the optimal im-portance sampling distribution of distributionf if
Varg[I (A)Lf,g] = 0, whereA is the rare event of interest.
Does suchg exist? If we define the following probability density function on the sample path ω,
g∗(ω) = f (ω)I (ω ∈ A)γ
K , (8)
then we have
Eg∗[(I (A)Lf,g∗)2] = Ef[I (A)Lf,g∗]
= Z I (ω ∈ A)gf (ω)∗(ω)f (ω)dω = γ2 K. Thus, Varg∗[I (A)Lf,g∗] =
Eg∗[(I (A)Lf,g∗)2] − E2g∗[I (A)Lf,g∗] = 0.
Therefore,g∗is the optimal importance sampling distribu-tion.
In most practical problems, the optimal importance sampling distribution is not achievable, since it contains the quantityγK, which is unknown. However, it is computable in our simple problem. We will demonstrate how to compute the optimal importance sampling distribution in this simple M/M/1 queue example.
Example 1 [The Optimal Importance Sampling Distribution] From (8), it is easy to see thatLf,g∗(ω) = γK (a constant) for ω ∈ A. Now, consider two sample paths ω1,ω2∈ A
ω1= ((1, 1), (2, 1), . . . ,
(k, 1), (k + 1, 1), · · · , (K − 1, 1), K)
and
ω2= ((1, 1), . . . , (k, 1), (k + 1, 1),
(k, 1), (k + 1, 1) · · · , (K − 1, 1), K)
where 2≤ k ≤ K − 2. If we choose a generalized M/M/1 queue to do importance sampling, then by (7), we have
Lf,g(ω1) = e−(K−1)(1+µ) QK−1 n=1 λne−(λn+µn) and Lf,g(ω2) = µe−(K+1)(1+µ) λke−(λk+µk)µk+1e−(λk+µk)QK−1n=1 λne−(λn+µn).
We can substantially simplify our expressions forLf,g(ω2)
andLf,g(ω2) by setting
λk+ µk= 1 + µ, 1 ≤ k ≤ K − 1;
and this yields
Lf,g(ω1) = 1 QK−1 n=1 λn Lf,g(ω2) = µ λkµk+1QK−1n=1 λn. It is easy to seeLf,g(ω1) = Lf,g(ω2) if µk+1= µ/λk.
Therefore, the recursion µk= λµ
k−1, λk = 1 + µ − µk, 2 ≤ k ≤ K − 1
is necessary for a generalized M/M/1 queue being the optimal importance sampling distribution. With suitably chosen boundary condition, above recursion does define the optimal importance sampling distribution:
µ1= 0, λ1= 1 + µ
µk =λµ
k−1, λk = 1 + µ − µk, 2 ≤ k ≤ K − 1. (9)
Note that µ1= 0 is necessary for the generalized M/M/1
queue to serve as the optimal importance sampling dis-tribution. Since, otherwise there exists ω 6∈ A such that P (ω) > 0.
3.3 Asymptotically Optimal Importance Sampling Distributions
In the queueing and random walks literature, there is a notion called asymptotically optimal for measuring the ef-fectiveness of an importance sampling distribution in the rare event simulation context. See, e.g. Siegmund (1976), Lehtonen and Nyrhinen (1992), and Heidelberger (1995). Definition 2 LetAK = {ω : X(T (ω)) = K}. If
lim
K→∞
logEg[I (AK)L2f,g]
logγK = 2, (10)
we callg an asymptotically optimal importance sampling distribution. Note that Eg[I (AK)L2f,g] = Var[I (AK)Lf,g] + (Eg[I (AK)Lf,g])2 = Var[I (AK)Lf,g] + γK2 ≥ γ2 K.
In view of the preceding development, we obtain the fol-lowing inequality,
logEg[I (AK)L2f,g] ≥ 2 log γK, for all valid distribution g. Letting K → ∞ yields
lim inf
K→∞
logEg[I (AK)L2f,g] logγK ≥ 2.
Thus, asymptotically optimal importance sampling distri-butions are optimal on the logarithmic scale.
Example 2 [Asymptotical Optimality]
Let us consider a M/M/1 queue with with arrival rateµ and service rate 1, i.e., by switching the service rate and arrival rate of the original M/M/1 queue. If we use such a M/M/1 queue to do importance sampling, then
Lf,g(ω) = µK−11 ,
for allω ∈ A, i.e., Lf,g(ω) is constant for all ω ∈ A. Now,
Eg[I (AK)L2f,g] = 1 µ2(K−1)Eg[I (AK)] = 1 µK−1 µ − 1 µK− 1 γK = µµ − 1K− 1. Thus, lim K→∞ logEg[I (AK)L2f,g] logγK = 2.
Such a change-of-measure is, therefore, an asymptotically optimal importance sampling distribution. This type of sim-ple but effective change-of-measures exist in more general setting, see Heidelberger (1995) for a complete survey.
It is interesting to note that if the boundary condition of (9) is set to µ1= 1 and λ1 = µ, the resulting M/M/1
queue is this asymptotically optimal one. 3.4 Bounded Relative Error
Definition 3 LetAK = {ω : X(T (ω)) = K}. If lim sup K→∞ p Varg[I (AK)Lf,g] γK < ∞ (11)
we call importance sampling distribution g has bounded relative error property.
Importance sampling distributions with bounded rela-tive error property are desirable, since they require only a finite number of samples to construct a confidence interval with a given precision, no matter how small the γK is.
Example 3 [Bounded Relative Error]
The importance sampling distribution of Example 2 has bounded relative error property, since
p Varg[I (AK)Lf,g] γK = s uK−1− 1 µK−1(µ − 1) < 1 √ µ − 1. 4 ADAPTIVE IMPORTANCE SAMPLING METHODS
To use importance sampling in rare event simulations, it is good to have an importance sampling distribution with asymptotical optimality or bounded relative error property. However, to have such an importance sampling distribution, it is usually required to have a good understanding of the large deviation behavior of the rare event of interest; this is the major obstacle to wide-spread application of importance sampling in rare event simulations. (Large deviations is a body of asymptotic theory which may be used to obtain the rare event asymptotics that we are interested in; cf. Bucklew (1990) and Dembo and Zeitouni (1993) for general background.) Thus, it is nice to have more automatical ways of finding good importance sampling distributions, regardless of what rare event problem we are interested. In this section, we review two types of adaptive importance sampling methods, which are to serve this need.
The effectiveness of the importance sampling estimator depends on the choice of importance sampling distribution g. To make the selection simpler, we usually only consider a family of importance sampling distributions parameterized by θ ∈ 2 ⊆ <d; e.g., the family of exponential twisting distributions.
4.1 Approach via Minimizing Estimator’s Variance
The most direct measure of effectiveness of an estimator is its variance. From Section 3, we know the variance of an im-portance sampling estimator is 1/m times Varg[I (A)Lf,g] if the estimator is computed by m independent copies of I (A)Lf,gsampling from the importance sampling distribu-tion g. Let fθ denote the family of distributions. Then, selecting the best importance sampling distribution fromfθ can be formulated as
min
θ Varfθ[I (A)Lθ], (12)
whereLθ(ω) = f (ω)/fθ(ω). But
Varfθ[I (A)Lθ] = Efθ[I (A)L2θ] − (Efθ[I (A)Lθ])2 = Efθ[I (A)L2θ] − γK2.
Thus, the variance-minimization problem (12) is is easily seen to be equivalent to
min
θ Efθ[I (A)L
2
θ]. (13)
How does one compute an (approximate) minimizer of (13)? Since (13) is a stochastic optimization problem, traditional stochastic approximation algorithms, Robbins-Monro algorithm (Robbins and Robbins-Monro (1951)) and Kiefer-Wofowitz algorithm (Kiefer and Wolfowitz (1952)) comes naturally for use.
R-M algorithm basically is the following recursion θn+1= 52(θn−n + 1a ∇h(θc n)), (14)
where52is the projection operator onto2, h(·) is the objec-tive function (Efθ[I (A)L2θ] in our example) and c∇h(θn) is an estimate of∇h at θn. There exist several different approaches for obtaining the gradient estimation c∇h(θn): infinitesimal perturbation analysis (Glasserman (1991)), likelihood ratio methods (Glynn (1986), Glynn (1990)), Conditional Monte Carlo (Fu and Hu (1997)), and the “push-out” approach (Rubinstein (1992)).
K-W algorithm also uses recursion (14). The difference between these two algorithms is on the method of estimating ∇h(·). K-W algorithm use finite differences to estimate ∇h(·).
Using importance sampling for accelerating simulation by finding an approximate minimizer of (13) has been applied in various applications, especially in queueing and reliabil-ity models; see, e.g. Al-Qaq, Devetsikiotis, and Townsend (1995), Devetsikiotis and Townsend (1993a, 1993b), Ru-binstein (1997, 1999), and RuRu-binstein and Melamed (1998). Similar idea has also been applied to speeding up the simulation for pricing financial derivative, such as (out-of-the-money) Asian options. See Su and Fu (2000) and Vazquez-Abad and Dufresne (1998).
It is sometimes advantageous to rewrite (13) as min
θ Ef[I (A)Lθ]. (15)
See Su and Fu (2000) for a successful example of using (15).
4.2 Approach via Minimizing Cross Entropy
4.2.1 Cross Entropy
Given a probability density functionf , cross entropy defines a measurement of “distance tof ”. Let g be a probability density function defined on the same sample space such that f (ω) > 0 implies g(ω) > 0. Then the cross entropy, also known as relative entropy or Kullback Leibler distance,
of the probability density function g with respect to the probability density functionf is
D(f, g) = Z log f (ω) g(ω) f (ω)dω = Ef[log(Lf,g)]. (16)
For more details on this definition, see Kapur and Kesavan (1992). However, beware that this definition of cross entropy is not universal. For example, Jelinek (1997) defines cross entropy as
H (f, g) = − Z
log(g(ω))f (ω)dω.
There are some important properties ofD(·): 1. D(·) is non-symmetric; i.e., D(f, g) 6= D(g, f ) 2. D(f, g) ≥ 0
3. D(f, f ) = 0
SinceD(·) measures the distance between distributions, it is reasonable to expectD(·) can be used to select impor-tance sampling distributions. In particular, we want to find a distribution g such that g is the minimizer of
min g Z log g∗(ω) g(ω) g∗(ω)dω.
Of course, g∗ solves this problem. However,g∗ is usually unattainable. So, the candidate distributions is again a family of importance sampling distributions parameterized byθ ∈ 2 ⊆ <d. Thus, the problem to be solved becomes
min θ Z log g∗(ω) fθ(ω) g∗(ω)dω = Eg∗[log Lg∗,fθ], (17)
whereLg∗,fθ(ω) = g∗(ω)/fθ(ω) is the likelihood ratio.
But D(g∗, fθ) = Z log g∗(ω) fθ(ω) g∗(ω)dω = Z g∗(ω) log g∗(ω)dω − Z g∗(ω) log fθ(ω)dω = Z g∗(ω) log g∗(ω)dω + H(g∗, fθ).
Since the first term is independent of θ, the minimizer of H(g∗, fθ) is also a minimizer of (17).
We obtain the following alternative formulation of the functionH (g∗, fθ): H (g∗, fθ) = − Z log(fθ(ω))g∗(ω)dω = − Z
log(fθ(ω))f (ω)I (ω ∈ A)/γKdω = − 1
γK
Z
log(fθ(ω))I (ω ∈ A)f (ω)dω = − 1
γKEf[I (A) log(fθ)].
Thus, the maximizer of max
θ Ef[I (A) log(fθ)] (18)
is also the minimizer of (17). 4.2.2 Algorithm
Based on (18), it is straightforward to derive an iterative procedure for computing an approximate minimizer of (17). The key idea is to express (17) as
max
θ Efθ0[I (A)Lθ0log(fθ)], (19)
for θ0∈ 2, where Lθ0(ω) = f (ω)/fθ0(ω) for ω ∈ A.
Combine (18) and (19), the iterative procedure is now clear:
1. Select an initial guess θ0 of (18); setn = 0
2. Compute an approximate minimizer of (19) with θ0= θn
3. θn+1← approximate minimizer computed in Step 2;n ← n + 1
4. (convergence test) Ifkθn− θn−1k < ( is a small positive number), stop; otherwise, goto Step 2 This adaptive approach for minimizing cross entropy has been adopted in Lieber, Rubinstein and Elmakis (1997), Rubinstein (1997, 1999), and de Boer, Nicola and Rubinstein (2000).
5 CONCLUDING REMARKS
We have reviewed the basic concepts of importance sam-pling and selection criteria of good importance samsam-pling distributions in rare event simulation context. Also, we have reviewed two adaptive importance sampling methods in the literature. In this section, we will emphasize some properties of these two methods.
Both methods adaptively look for parameters which let an importance sampling distribution optimal in their settings.
But cross entropy method has an advantage on computing an approximate solution on each iteration for certain stochastic models. For example, the optimal transition probabilities of DTMC can be computed analytically because of the logarithm of likelihood ratio; see Section 3.1 of de Boer, Nicola and Rubinstein (2000) for details.
The key optimization problem (18) in cross entropy method is equivalent to min θ Ef[I (A) log(Lθ)]. Since arg max θ Ef[I (A) log(fθ)] = arg min
θ Ef[−I (A) log(fθ)]
= arg min
θ Ef[I (A) log(f ) − I (A) log(fθ)]
= arg min
θ Ef[I (A) log(Lθ)].
In other words, under original density f , minimizing es-timator’s variance is equivalent to minimize the expected likelihood ratio conditioned on the rare event happens; and minimizing cross entropy is equivalent to minimize the expected logarithm of likelihood ratio conditioned on the rare event happens. Therefore, minimizing cross entropy in some sense is close to, but definitely is different from minimizing estimator’s variance.
In terms of estimator’s variance, cross entropy method does not seek for the optimal solution. Although intuitively, the optimizers of both methods are close to each other. It would be beneficial to know how close they are.
REFERENCES
Al-Qaq, W. A., M. Devetsikiotis, and J. K. Townsend. 1995. Stochastic gradient optimization of importance sam-pling for the efficient simulation of digital communica-tion systems. IEEE Transaccommunica-tions on Communicacommunica-tions, 43:2975–2985.
Bratley, P., B. Fox, and L. Schrage. 1987. A Guide to Simulation. Springer-Verlag, New York, second edition.
Bucklew, J. 1990. Large Deviation Techniques in Decision, Simulation, and Estimation. John Wiley and Sons, Inc., New York.
de Boer, P.T., V. F. Nicola and R. Y. Rubinstein, 2000. Adaptive importance sampling simulation of queueing networks. Proceedings of the 2000 Winter Simulation Conference, J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, eds., 646–655. Institute of Electrical and Electronics Engineers.
Dembo, A. and O. Zeitouni. 1993. Large Deviation Tech-niques and Applications. Jones and Bartlett, Boston. Devetsikiotis, M. and J. K. Townsend. 1993a. An
al-gorithmic approach to the optimization of importance sampling parameters in digital communication system simulation. IEEE Transactions on Communications 41:1464–1473.
Devetsikiotis, M. and J. K. Townsend. 1993b. Statistical optimization of dynamic importance sampling parame-ters for efficient simulation of communication networks. IEEE/ACM Transactions on Networking 1:293–305. Fu, M. C., and J. Q. Hu. 1997. Conditional Monte
Carlo: Gradient estimation and optimization appli-cations. Kluwer Academic Publisher, Boston.
Glasserman, P. 1991. Gradient estimation via perturbation analysis. Boston: Kluwer.
Glynn, P. W. 1986. Stochastic approximation for Monte Carlo optimization. In Proceedings of the 1986 Winter Simulation Conference, 356–365.
Glynn, P. W. 1990. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM 33:75–84.
Glynn, P. W. 1994. Efficiency improvement techniques. Annals of Oper. Res., 53:175–197.
Glynn, P. W. and D. L. Iglehart. 1989. Importance sampling for stochastic simulations. Manage. Sci., 35:1367– 1392.
Hammersley, J. M. and D. C. Handscomb. 1965. Monte Carlo Methods. Methuen & Co. Ltd., London. Heidelberger, P. 1995. Fast simulation of rare events in
queueing and reliability models. ACM Trans. on Modeling and Computer Simulation, 5:43–85. Lehtonen, T. and H. Nyrhinen. 1992. Simulating
level-crossing probabilities by importance sampling. Adv. in Appl. Probab., 24(4):858–874.
Lieber, D., R.Y. Rubinstein and D. Elmakis. 1997. Quick estimation of rare events in stochastic networks. IEEE Trans. Rel., 46(2):254–265.
Jelinek, F. 1997. Statistical methods for Speech Recognition, Massachusetts Institute of Technology Press.
Kapur, J.N. and H. K.Kesavan. 1992. Entropy Optimization Principles with Applications. Academic Press. Karlin, S. and H. M. Taylor. 1975. A First Course in
Stochastic Processes. Academic Press, New York-London, second edition.
Kiefer, J., and J. Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. Annals of Mathematical Statistics, 23:462–466.
Robbins, H., and S. Monro. 1951. A stochastic approx-imation method. Annals of Mathematical Statistics, 22:400–407.
Rubinstein, R. Y. 1992. Sensitivity analysis of discrete event systems by the “push-out” method. Annals of Operations Research 39:229–237.
Rubinstein, R. Y. 1997. Optimization of computer simu-lation models with rare events. European Journal of Operations Research 99:89–112.
Rubinstein, R. Y. 1999. Rare event simulation via crossen-tropy and importance sampling. In Second International Workshop on Rare Event Simulation, RESIM 99, 1–17. Rubinstein, R. Y. and B. Melamed. 1998. Modern
Simula-tion and Modeling. Wiley.
Siegmund, D. 1976. Importance sampling in the Monte Carlo study of sequential tests. Ann. Statist., 4(4):673– 684.
Su, Y., and M. Fu. 2000. Importance sampling in derivative securities pricing. Proceedings of the 2000 Winter Simulation Conference, J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, eds., 587–596. Institute of Electrical and Electronics Engineers.
Vazquez-Abad, F., and D. Dufresne. 1998. Accelerated simulation for pricing Asian options. Proceedings of the 1998 Winter Simulation Conference, D. J. Medeiros, E. F. Watson, J. S. Carson, and M. S. Manivannan, eds., 1493–1500. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.
AUTHOR BIOGRAPHY
MING-HUA HSIEH is an assistant Professor of the De-partment of Management Information Systems at National Chengchi University, Taiwan. From 1997 to 1999, he was a software design engineer at Hewlett Packard company, Cali-fornia. He is a member of INFORMS. His research interests include simulation methodology and financial engineering. His e-mail address is <[email protected]>.