Adaptive Monte Carlo Methods for Rare Event Simulations

(1)

E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds.

ADAPTIVE MONTE CARLO METHODS FOR RARE EVENT SIMULATIONS

Ming-hua Hsieh

Department of Management Information Systems National Chengchi University

Wenshan, Taipei 11623, TAIWAN

ABSTRACT

We review two types of adaptive Monte Carlo methods for rare event simulations. These methods are based on im-portance sampling. The first approach selects imim-portance sampling distributions by minimizing the variance of im-portance sampling estimator. The second approach selects importance sampling distributions by minimizing the cross entropy to the optimal importance sampling distribution. We also review the basic concepts of importance sampling in the rare event simulation context. To make the basic concepts concrete, we introduce these ideas via the study of rare events of M/M/1 queues.

1 INTRODUCTION

Rare events, although are seldom happened as its name suggests, are important when they do happen in many ap-plication areas. For example, the buffer overflow event is rare in a high quality telecommunication network, but is sig-nificant when it happens; A system break down event is rare in a fault-tolerant computing system, but has a consequen-tial effect when it happens. Therefore, accurate estimation of the probabilities of such rare events is important. How-ever, if the probabilities of rare events are really small, estimation of these probabilities is often computationally intractable when studied using conventional Monte Carlo simulation. Therefore, powerful efficiency improvement techniques (see, e.g. (Glynn (1994)) and (Bratley, Fox, and Schrage (1987))) are needed. The most suitable technique for rare event simulations is importance sampling (see, e.g. Hammersley and Handscomb (1965), and we will review the basic concepts in Section 3.) When applied appropri-ately, importance sampling can improve the efficiency many orders of magnitude. Unfortunately, it is not a simple task to apply importance sampling appropriately on rare event simulations. The main difficulty lies in the selection of an effective importance sampling distribution. The selection

usually requires simulationists have a good understanding of the structure of the system being simulated.

To overcome this difficulty, researchers have developed adaptive procedures for selecting effective importance sam-pling distributions. This paper’s main purpose is to review these procedures. We reviewed two adaptive importance sampling methods in this paper. The first method selects effective importance sampling distributions by minimizing the variance of importance sampling estimator. The second method selects effective importance sampling distributions by minimizing the cross entropy to the optimal importance sampling distribution (see Section 3.2). For applications of these methods in selecting effective importance sampling distributions for queueing models and financial derivative models, see the citations of Section 4.

The rest of the paper is organized as follows. In Section 2, we describe a simple M/M/1 queue model and the rare event of interest. In Section 3, we review the basic con-cepts of importance sampling via the study of rare events of M/M/1 queues. Several criteria of evaluating the goodness of importance sampling estimators are given. In section 4, we reviewed two types of adaptive importance sampling methods for rare event simulations. The first method se-lects importance sampling distributions by minimizing the variance of importance sampling estimator. The second one selects importance sampling distributions by minimiz-ing the cross entropy to the optimal importance samplminimiz-ing distribution. Finally, the paper is summarized in Section 5. This section remarks some properties of these two types of methods.

2 A SIMPLE RARE EVENT SIMULATION PROBLEM

Let us consider a stable M/M/1 queue with arrival rateλ = 1, service rateµ > 1, and the buffer limit of the system K > 1, see Figure 1. Let X(t) be the number of customers in the system at timet. Then, X = (X(t) : t ≥ 0) is a continuous time Markov chain with state space S = {0, 1, 2, . . . K}.

(2)

Suppose that X(0) = 1 and we are interested in using importance sampling to estimate

γK = P (X(T ) = K), (1)

where

T = inf{t > 0 : X(t) = 0 or X(t) = K}.

It is well known thatX is an irreducible, positive recurrent Markov chain and T is a stopping time, also known as Markov time (see p. 255 & p. 318 of Karlin and Taylor (1975)). 0 µ  1 λ -µ  2 · · · λ -µ  K-1 λ- K

Figure 1: Transition Diagram of the M/M/1 Queue Example If K is large, then γ_K is a small number. Thus, estimatingγ_K in (1) is a rare event simulation problem.

For this simple problem, the analytical solution forγ_K is known. In particular,

γK = _µµ − 1_K_{− 1}. (2)

Of course with a known analytical result there is no need to simulate. We just use this problem as a vehicle for the basic ideas to follow. Throughout this paper, the concepts of importance sampling and adaptive importance sampling methods will be introduced via this simple problem. 3 THE BASIC CONCEPTS OF

IMPORTANCE SAMPLING

3.1 Importance Sampling

Since X is an irreducible, positive recurrent chain and T is a stopping time, it is well known that P ({ω : T (ω) < ∞}) = 1. Thus, without loss of generality, we can choose  = {ω : T (ω) < ∞} being the sample space. Let f (·) denote the (original) density function ofX given X(0) = 1 and let

A = {ω : X(T (ω)) = K}. Then, we can representγ_K as

Z

I (ω ∈ A)f (ω)dω = Ef[I (A)],

whereI (·) is the indicator function.

To estimate γ_K via simulation, the direct approach would be to generate n independent sample paths,

ω1, . . . , ωn, from the probability density functionf (·) and

form the estimator ˆγK = 1_n

n

X

k=1

I (ωk∈ A).

By the central limit theorem, √

n( ˆγK− γK) ⇒pγK(1 − γK)N(0, 1),

asn → ∞, where N(0, 1) denotes a normal random variable with mean 0 and variance 1. Thus, to construct a 95% confidence interval for γ_K with relative half-length of 1%, we need sample sizen ≈ 1/(0.012)×1.962×(1−γ_K)/γ_K. Therefore, if γ_K = 10−9, then n ≈ 3.84 × 1013. This demonstrates the main problem of rare event simulations.

Let g(·) be a density function such that ω ∈ A and f (ω) > 0 implies g(ω) > 0. Then we have another representation for γ_K: γK = Z I (ω ∈ A)f (ω) g(ω)g(ω)dω = Z I (ω ∈ A)Lf,g(ω)g(ω)dω = Eg[I (A)Lf,g], (3) whereL_f,g(ω) = f (ω)/g(ω) is called the likelihood ratio andE_g(·) denotes the expectation under g(·).

Identity (3) suggests an alternative estimation scheme: generaten samples, ω1, . . . , ωn, fromg(·). By (3),

ˆγKg =_n1 n

X

k=1

Lf,g(ωk)I (ωk ∈ A) (4)

is an unbiased estimate of γ_K. This alternative estimation scheme is called importance sampling. To apply impor-tance sampling to more general stochastic systems including discrete-time Markov chains (DTMC’s), continuous-time Markov chains (CTMC’s), and generalized semi-Markov processes (a mathematical formalization of discrete-event simulations), consult Glynn and Iglehart (1989).

Before proceeding to next subsection, let us derive the explicit formula of the likelihood ratio for our M/M/1 queue example. A typical sample path ω of X is

((X0, h0), (X1, h1), . . . , (XN−1, hN−1), XN)

whereN is the number of jumps before the stochastic process X hits 0 or K, X0, X1, · · · , XN is the sequence of states

of the embedded discrete time Markov chain, andh_nis the holding time in stateX_nforn = 0, 1, . . . , N − 1. Thus, for the probability density functionf (ω) of the sample path ω of the process X for which P_f(·, ·) denotes the transition

(3)

probabilities of the embedded discrete time Markov chain, we have f (ω) = N−1_Y n=0 (1 + µ)e−(1+µ)hn_P_f_(X_n_{, X}_n+1_). ₍₅₎

Now, let us consider a generalized M/M/1 queue, whose arrival rates and service rates vary. In particular, its arrival rate is λ_k and service rate is µ_k when it is at state k, k = 1, 2, · · · , K − 1; see Figure 2. For such a generalized M/M/1 queue, the densityg(ω) is equal to

N−1_Y n=0

(λXn+ µXn)e−(λXn+µXn)hnPg(Xn, Xn+1), (6)

whereP_g(·, ·) is the transition probabilities of the embedded DTMC of this generalized M/M/1 queue.

0 µ1 1 λ1 -µ2 2 · · · λK−2 -µK−1 K-1 λK−1- K

Figure 2: Transition Diagram of the Generalized M/M/1 Queue Example

Thus, if we choose such a generalized M/M/1 to do importance sampling, then the likelihood ratio of ω

Lf,g(ω) = Q_N−1 n=0(1 + µ)e−(1+µ)hnPf(Xn, Xn+1) Q_N−1 n=0(λXn+ µXn)e−(λXn+µXn)hnPg(Xn, Xn+1) . (7)

See Glynn and Iglehart (1989) for explicit formulas of likelihood ratios for a variety of more general stochastic processes.

3.2 The Optimal Importance Sampling Distribution

Is importance sampling always better than direct simulation? This answer depends on the choice of importance sampling distribution g. An ideal g is a distribution which has the property Var_g[I (A)L_f,g] Var_f[I (A)], where Var_g(·) and Var_f(·) denote the variance under distributions g(·) and f (·), respectively. And the bestg is a distribution which has the property Var_g[I (A)L_f,g] = 0.

Definition 1 A distributiong is called the optimal im-portance sampling distribution of distributionf if

Var_g[I (A)L_f,g] = 0, whereA is the rare event of interest.

Does suchg exist? If we define the following probability density function on the sample path ω,

g∗(ω) = f (ω)I (ω ∈ A)_γ

K , (8)

then we have

Eg∗[(I (A)L_f,g∗)2] = E_f[I (A)L_f,g∗]

= Z I (ω ∈ A)_gf (ω)_∗_(ω)f (ω)dω = γ2 K. Thus, Var_g∗[I (A)L_f,g∗] =

Eg∗[(I (A)L_f,g∗)2] − E2_g∗[I (A)Lf,g∗] = 0.

Therefore,g∗is the optimal importance sampling distribu-tion.

In most practical problems, the optimal importance sampling distribution is not achievable, since it contains the quantityγ_K, which is unknown. However, it is computable in our simple problem. We will demonstrate how to compute the optimal importance sampling distribution in this simple M/M/1 queue example.

Example 1 [The Optimal Importance Sampling Distribution] From (8), it is easy to see thatL_f,g∗(ω) = γ_K (a constant) for ω ∈ A. Now, consider two sample paths ω1,ω2∈ A

ω1= ((1, 1), (2, 1), . . . ,

(k, 1), (k + 1, 1), · · · , (K − 1, 1), K)

and

ω2= ((1, 1), . . . , (k, 1), (k + 1, 1),

(k, 1), (k + 1, 1) · · · , (K − 1, 1), K)

where 2≤ k ≤ K − 2. If we choose a generalized M/M/1 queue to do importance sampling, then by (7), we have

Lf,g(ω1) = e−(K−1)(1+µ) Q_K−1 n=1 λne−(λn+µn) and Lf,g(ω2) = µe−(K+1)(1+µ) λke−(λk+µk)µ_k+1e−(λk+µk)QK−1_n=1 λ_ne−(λn+µn).

(4)

We can substantially simplify our expressions forL_f,g(ω2)

andL_f,g(ω2) by setting

λk+ µk= 1 + µ, 1 ≤ k ≤ K − 1;

and this yields

Lf,g(ω1) = 1 Q_K−1 n=1 λn Lf,g(ω2) = µ λkµk+1QK−1_n=1 λn. It is easy to seeL_f,g(ω1) = Lf,g(ω2) if µk+1= µ/λk.

Therefore, the recursion µk= _λµ

k−1, λk = 1 + µ − µk, 2 ≤ k ≤ K − 1

is necessary for a generalized M/M/1 queue being the optimal importance sampling distribution. With suitably chosen boundary condition, above recursion does define the optimal importance sampling distribution:

µ1= 0, λ1= 1 + µ

µk =_λµ

k−1, λk = 1 + µ − µk, 2 ≤ k ≤ K − 1. (9)

Note that µ1= 0 is necessary for the generalized M/M/1

queue to serve as the optimal importance sampling dis-tribution. Since, otherwise there exists ω 6∈ A such that P (ω) > 0.

3.3 Asymptotically Optimal Importance Sampling Distributions

In the queueing and random walks literature, there is a notion called asymptotically optimal for measuring the ef-fectiveness of an importance sampling distribution in the rare event simulation context. See, e.g. Siegmund (1976), Lehtonen and Nyrhinen (1992), and Heidelberger (1995). Definition 2 LetA_K = {ω : X(T (ω)) = K}. If

lim

K→∞

logE_g[I (A_K)L2_f,g]

logγ_K = 2, (10)

we callg an asymptotically optimal importance sampling distribution. Note that Eg[I (AK)L2f,g] = Var[I (AK)Lf,g] + (Eg[I (AK)Lf,g])2 = Var[I (AK)Lf,g] + γK2 ≥ γ2 K.

In view of the preceding development, we obtain the fol-lowing inequality,

logE_g[I (A_K)L2_f,g] ≥ 2 log γ_K, for all valid distribution g. Letting K → ∞ yields

lim inf

K→∞

logE_g[I (A_K)L2_f,g] logγ_K ≥ 2.

Thus, asymptotically optimal importance sampling distri-butions are optimal on the logarithmic scale.

Example 2 [Asymptotical Optimality]

Let us consider a M/M/1 queue with with arrival rateµ and service rate 1, i.e., by switching the service rate and arrival rate of the original M/M/1 queue. If we use such a M/M/1 queue to do importance sampling, then

Lf,g(ω) = _µ_K−11 ,

for allω ∈ A, i.e., L_f,g(ω) is constant for all ω ∈ A. Now,

Eg[I (AK)L2f,g] = 1 µ2(K−1)Eg[I (AK)] = 1 µK−1 µ − 1 µK_{− 1} γK = _µµ − 1_K_{− 1}. Thus, lim K→∞ logE_g[I (A_K)L2_f,g] logγ_K = 2.

Such a change-of-measure is, therefore, an asymptotically optimal importance sampling distribution. This type of sim-ple but effective change-of-measures exist in more general setting, see Heidelberger (1995) for a complete survey.

It is interesting to note that if the boundary condition of (9) is set to µ1= 1 and λ1 = µ, the resulting M/M/1

queue is this asymptotically optimal one. 3.4 Bounded Relative Error

Definition 3 LetA_K = {ω : X(T (ω)) = K}. If lim sup K→∞ p Var_g[I (A_K)L_f,g] γK < ∞ (11)

we call importance sampling distribution g has bounded relative error property.

Importance sampling distributions with bounded rela-tive error property are desirable, since they require only a finite number of samples to construct a confidence interval with a given precision, no matter how small the γ_K is.

(5)

Example 3 [Bounded Relative Error]

The importance sampling distribution of Example 2 has bounded relative error property, since

p Var_g[I (A_K)L_f,g] γK = s uK−1_{− 1} µK−1_{(µ − 1)} < 1 √ µ − 1. 4 ADAPTIVE IMPORTANCE SAMPLING METHODS

To use importance sampling in rare event simulations, it is good to have an importance sampling distribution with asymptotical optimality or bounded relative error property. However, to have such an importance sampling distribution, it is usually required to have a good understanding of the large deviation behavior of the rare event of interest; this is the major obstacle to wide-spread application of importance sampling in rare event simulations. (Large deviations is a body of asymptotic theory which may be used to obtain the rare event asymptotics that we are interested in; cf. Bucklew (1990) and Dembo and Zeitouni (1993) for general background.) Thus, it is nice to have more automatical ways of finding good importance sampling distributions, regardless of what rare event problem we are interested. In this section, we review two types of adaptive importance sampling methods, which are to serve this need.

The effectiveness of the importance sampling estimator depends on the choice of importance sampling distribution g. To make the selection simpler, we usually only consider a family of importance sampling distributions parameterized by θ ∈ 2 ⊆ <d; e.g., the family of exponential twisting distributions.

4.1 Approach via Minimizing Estimator’s Variance

The most direct measure of effectiveness of an estimator is its variance. From Section 3, we know the variance of an im-portance sampling estimator is 1/m times Var_g[I (A)L_f,g] if the estimator is computed by m independent copies of I (A)Lf,gsampling from the importance sampling distribu-tion g. Let f_θ denote the family of distributions. Then, selecting the best importance sampling distribution fromf_θ can be formulated as

min

θ Varfθ[I (A)Lθ], (12)

whereL_θ(ω) = f (ω)/f_θ(ω). But

Var_f_θ[I (A)L_θ] = E_f_θ[I (A)L2_θ] − (E_f_θ[I (A)L_θ])2 = Efθ[I (A)L2θ] − γK2.

Thus, the variance-minimization problem (12) is is easily seen to be equivalent to

min

θ Efθ[I (A)L

2

θ]. (13)

How does one compute an (approximate) minimizer of (13)? Since (13) is a stochastic optimization problem, traditional stochastic approximation algorithms, Robbins-Monro algorithm (Robbins and Robbins-Monro (1951)) and Kiefer-Wofowitz algorithm (Kiefer and Wolfowitz (1952)) comes naturally for use.

R-M algorithm basically is the following recursion θn+1= 52(θn−_{n + 1}a ∇h(θc n)), (14)

where5₂is the projection operator onto2, h(·) is the objec-tive function (E_f_θ[I (A)L2_θ] in our example) and c∇h(θ_n) is an estimate of∇h at θ_n. There exist several different approaches for obtaining the gradient estimation c∇h(θ_n): infinitesimal perturbation analysis (Glasserman (1991)), likelihood ratio methods (Glynn (1986), Glynn (1990)), Conditional Monte Carlo (Fu and Hu (1997)), and the “push-out” approach (Rubinstein (1992)).

K-W algorithm also uses recursion (14). The difference between these two algorithms is on the method of estimating ∇h(·). K-W algorithm use finite differences to estimate ∇h(·).

Using importance sampling for accelerating simulation by finding an approximate minimizer of (13) has been applied in various applications, especially in queueing and reliabil-ity models; see, e.g. Al-Qaq, Devetsikiotis, and Townsend (1995), Devetsikiotis and Townsend (1993a, 1993b), Ru-binstein (1997, 1999), and RuRu-binstein and Melamed (1998). Similar idea has also been applied to speeding up the simulation for pricing financial derivative, such as (out-of-the-money) Asian options. See Su and Fu (2000) and Vazquez-Abad and Dufresne (1998).

It is sometimes advantageous to rewrite (13) as min

θ Ef[I (A)Lθ]. (15)

See Su and Fu (2000) for a successful example of using (15).

4.2 Approach via Minimizing Cross Entropy

4.2.1 Cross Entropy

Given a probability density functionf , cross entropy defines a measurement of “distance tof ”. Let g be a probability density function defined on the same sample space such that f (ω) > 0 implies g(ω) > 0. Then the cross entropy, also known as relative entropy or Kullback Leibler distance,

(6)

of the probability density function g with respect to the probability density functionf is

D(f, g) = Z log _{f (ω)} g(ω) f (ω)dω = Ef[log(Lf,g)]. (16)

For more details on this definition, see Kapur and Kesavan (1992). However, beware that this definition of cross entropy is not universal. For example, Jelinek (1997) defines cross entropy as

H (f, g) = − Z

log(g(ω))f (ω)dω.

There are some important properties ofD(·): 1. D(·) is non-symmetric; i.e., D(f, g) 6= D(g, f ) 2. D(f, g) ≥ 0

3. D(f, f ) = 0

SinceD(·) measures the distance between distributions, it is reasonable to expectD(·) can be used to select impor-tance sampling distributions. In particular, we want to find a distribution g such that g is the minimizer of

min g Z log _g_∗_(ω) g(ω) g∗(ω)dω.

Of course, g∗ solves this problem. However,g∗ is usually unattainable. So, the candidate distributions is again a family of importance sampling distributions parameterized byθ ∈ 2 ⊆ <d. Thus, the problem to be solved becomes

min θ Z log g∗_(ω) fθ(ω) g∗(ω)dω = Eg∗[log L_g∗_,f_θ], (17)

whereL_g∗_,f_θ(ω) = g∗(ω)/f_θ(ω) is the likelihood ratio.

But D(g∗, fθ) = Z log _g_∗_(ω) fθ(ω) g∗(ω)dω = Z g∗(ω) log g∗(ω)dω − Z g∗(ω) log fθ(ω)dω = Z g∗_{(ω) log g}∗_{(ω)dω + H(g}∗_{, f}_θ_).

Since the first term is independent of θ, the minimizer of H(g∗_{, f}_θ_{) is also a minimizer of (17).}

We obtain the following alternative formulation of the functionH (g∗, f_θ): H (g∗, fθ) = − Z log(f_θ(ω))g∗(ω)dω = − Z

log(f_θ(ω))f (ω)I (ω ∈ A)/γ_Kdω = − 1

γK

Z

log(f_θ(ω))I (ω ∈ A)f (ω)dω = − 1

γKEf[I (A) log(fθ)].

Thus, the maximizer of max

θ Ef[I (A) log(fθ)] (18)

is also the minimizer of (17). 4.2.2 Algorithm

Based on (18), it is straightforward to derive an iterative procedure for computing an approximate minimizer of (17). The key idea is to express (17) as

max

θ Efθ0[I (A)Lθ0log(fθ)], (19)

for θ0∈ 2, where L_θ0(ω) = f (ω)/f_θ0(ω) for ω ∈ A.

Combine (18) and (19), the iterative procedure is now clear:

1. Select an initial guess θ0 of (18); setn = 0

2. Compute an approximate minimizer of (19) with θ0_{= θ}_n

3. θ_n+1← approximate minimizer computed in Step 2;n ← n + 1

4. (convergence test) Ifkθ_n− θ_n−1k < ( is a small positive number), stop; otherwise, goto Step 2 This adaptive approach for minimizing cross entropy has been adopted in Lieber, Rubinstein and Elmakis (1997), Rubinstein (1997, 1999), and de Boer, Nicola and Rubinstein (2000).

5 CONCLUDING REMARKS

We have reviewed the basic concepts of importance sam-pling and selection criteria of good importance samsam-pling distributions in rare event simulation context. Also, we have reviewed two adaptive importance sampling methods in the literature. In this section, we will emphasize some properties of these two methods.

Both methods adaptively look for parameters which let an importance sampling distribution optimal in their settings.

(7)

But cross entropy method has an advantage on computing an approximate solution on each iteration for certain stochastic models. For example, the optimal transition probabilities of DTMC can be computed analytically because of the logarithm of likelihood ratio; see Section 3.1 of de Boer, Nicola and Rubinstein (2000) for details.

The key optimization problem (18) in cross entropy method is equivalent to min θ Ef[I (A) log(Lθ)]. Since arg max θ Ef[I (A) log(fθ)] = arg min

θ Ef[−I (A) log(fθ)]

= arg min

θ Ef[I (A) log(f ) − I (A) log(fθ)]

= arg min

θ Ef[I (A) log(Lθ)].

In other words, under original density f , minimizing es-timator’s variance is equivalent to minimize the expected likelihood ratio conditioned on the rare event happens; and minimizing cross entropy is equivalent to minimize the expected logarithm of likelihood ratio conditioned on the rare event happens. Therefore, minimizing cross entropy in some sense is close to, but definitely is different from minimizing estimator’s variance.

In terms of estimator’s variance, cross entropy method does not seek for the optimal solution. Although intuitively, the optimizers of both methods are close to each other. It would be beneficial to know how close they are.

REFERENCES

Al-Qaq, W. A., M. Devetsikiotis, and J. K. Townsend. 1995. Stochastic gradient optimization of importance sam-pling for the efficient simulation of digital communica-tion systems. IEEE Transaccommunica-tions on Communicacommunica-tions, 43:2975–2985.

Bratley, P., B. Fox, and L. Schrage. 1987. A Guide to Simulation. Springer-Verlag, New York, second edition.

Bucklew, J. 1990. Large Deviation Techniques in Decision, Simulation, and Estimation. John Wiley and Sons, Inc., New York.

de Boer, P.T., V. F. Nicola and R. Y. Rubinstein, 2000. Adaptive importance sampling simulation of queueing networks. Proceedings of the 2000 Winter Simulation Conference, J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, eds., 646–655. Institute of Electrical and Electronics Engineers.

Dembo, A. and O. Zeitouni. 1993. Large Deviation Tech-niques and Applications. Jones and Bartlett, Boston. Devetsikiotis, M. and J. K. Townsend. 1993a. An

al-gorithmic approach to the optimization of importance sampling parameters in digital communication system simulation. IEEE Transactions on Communications 41:1464–1473.

Devetsikiotis, M. and J. K. Townsend. 1993b. Statistical optimization of dynamic importance sampling parame-ters for efficient simulation of communication networks. IEEE/ACM Transactions on Networking 1:293–305. Fu, M. C., and J. Q. Hu. 1997. Conditional Monte

Carlo: Gradient estimation and optimization appli-cations. Kluwer Academic Publisher, Boston.

Glasserman, P. 1991. Gradient estimation via perturbation analysis. Boston: Kluwer.

Glynn, P. W. 1986. Stochastic approximation for Monte Carlo optimization. In Proceedings of the 1986 Winter Simulation Conference, 356–365.

Glynn, P. W. 1990. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM 33:75–84.

Glynn, P. W. 1994. Efficiency improvement techniques. Annals of Oper. Res., 53:175–197.

Glynn, P. W. and D. L. Iglehart. 1989. Importance sampling for stochastic simulations. Manage. Sci., 35:1367– 1392.

Hammersley, J. M. and D. C. Handscomb. 1965. Monte Carlo Methods. Methuen & Co. Ltd., London. Heidelberger, P. 1995. Fast simulation of rare events in

queueing and reliability models. ACM Trans. on Modeling and Computer Simulation, 5:43–85. Lehtonen, T. and H. Nyrhinen. 1992. Simulating

level-crossing probabilities by importance sampling. Adv. in Appl. Probab., 24(4):858–874.

Lieber, D., R.Y. Rubinstein and D. Elmakis. 1997. Quick estimation of rare events in stochastic networks. IEEE Trans. Rel., 46(2):254–265.

Jelinek, F. 1997. Statistical methods for Speech Recognition, Massachusetts Institute of Technology Press.

Kapur, J.N. and H. K.Kesavan. 1992. Entropy Optimization Principles with Applications. Academic Press. Karlin, S. and H. M. Taylor. 1975. A First Course in

Stochastic Processes. Academic Press, New York-London, second edition.

Kiefer, J., and J. Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. Annals of Mathematical Statistics, 23:462–466.

Robbins, H., and S. Monro. 1951. A stochastic approx-imation method. Annals of Mathematical Statistics, 22:400–407.

Rubinstein, R. Y. 1992. Sensitivity analysis of discrete event systems by the “push-out” method. Annals of Operations Research 39:229–237.

(8)

Rubinstein, R. Y. 1997. Optimization of computer simu-lation models with rare events. European Journal of Operations Research 99:89–112.

Rubinstein, R. Y. 1999. Rare event simulation via crossen-tropy and importance sampling. In Second International Workshop on Rare Event Simulation, RESIM 99, 1–17. Rubinstein, R. Y. and B. Melamed. 1998. Modern

Simula-tion and Modeling. Wiley.

Siegmund, D. 1976. Importance sampling in the Monte Carlo study of sequential tests. Ann. Statist., 4(4):673– 684.

Su, Y., and M. Fu. 2000. Importance sampling in derivative securities pricing. Proceedings of the 2000 Winter Simulation Conference, J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, eds., 587–596. Institute of Electrical and Electronics Engineers.

Vazquez-Abad, F., and D. Dufresne. 1998. Accelerated simulation for pricing Asian options. Proceedings of the 1998 Winter Simulation Conference, D. J. Medeiros, E. F. Watson, J. S. Carson, and M. S. Manivannan, eds., 1493–1500. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.

AUTHOR BIOGRAPHY

MING-HUA HSIEH is an assistant Professor of the De-partment of Management Information Systems at National Chengchi University, Taiwan. From 1997 to 1999, he was a software design engineer at Hewlett Packard company, Cali-fornia. He is a member of INFORMS. His research interests include simulation methodology and financial engineering. His e-mail address is <[email protected]>.