
Improving BOPM with Extrapolation


(1)

Extrapolation

• It is a method to speed up numerical convergence.

• Say f(n) converges to an unknown limit f at the rate of 1/n:

f(n) = f + c/n + o(1/n). (72)

• Assume c is an unknown constant independent of n.

– Convergence is basically monotonic and smooth.

(2)

Extrapolation (concluded)

• From two approximations f (n1) and f (n2) and by ignoring the smaller terms,

f(n1) = f + c/n1,  f(n2) = f + c/n2.

• A better approximation to the desired f is

f = [ n1 f(n1) − n2 f(n2) ]/(n1 − n2). (73)

• This estimate should converge faster than 1/n.

• The Richardson extrapolation uses n2 = 2n1, as in the sketch below.
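Eq. (73) is a one-line computation. Below is a minimal Python sketch (the function name is ours); the comment shows the Richardson choice n2 = 2n1.

def extrapolate(f_n1, f_n2, n1, n2):
    """Eq. (73): cancel the c/n term between two approximations."""
    return (n1 * f_n1 - n2 * f_n2) / (n1 - n2)

# Richardson extrapolation corresponds to n2 = 2 * n1, e.g.,
# extrapolate(f(100), f(200), 100, 200).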

(3)

Improving BOPM with Extrapolation

• Consider standard European options.

• Denote the option value under BOPM using n time periods by f (n).

• It is known that BOPM converges at the rate of 1/n, consistent with Eq. (72) on p. 637.

• But the plots on p. 255 (redrawn on the next page) demonstrate that convergence to the true option value oscillates with n.

• Extrapolation is inapplicable at this stage.

(4)

[Two plots of call value against the number of periods n (n up to 35 and n up to 60): convergence to the true option value oscillates with n.]

(5)

Improving BOPM with Extrapolation (concluded)

• Take the at-the-money option in the left plot on p. 640.

• The sequence with odd n turns out to be monotonic and smooth (see the left plot on p. 642).a

• Apply extrapolation (73) on p. 638 with n2 = n1 + 2, where n1 is odd.

• Result is shown in the right plot on p. 642.

• The convergence rate is amazing.

• See Exercise 9.3.8 of the text (p. 111) for ideas in the general case.

aThis can be proved; see Chang and Palmer (2007).
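As an illustration, here is a minimal Python sketch of the whole procedure, assuming a CRR-parameterized binomial tree and made-up option parameters (the text's BOPM setup may differ in details):

import math

def bopm_euro_call(S, X, r, sigma, T, n):
    """European call value under an n-period CRR binomial tree."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    R = math.exp(r * dt)                      # gross one-period return
    p = (R - d) / (u - d)                     # risk-neutral up probability
    # Terminal payoffs, then backward induction.
    values = [max(S * u**j * d**(n - j) - X, 0.0) for j in range(n + 1)]
    for _ in range(n):
        values = [(p * values[j + 1] + (1 - p) * values[j]) / R
                  for j in range(len(values) - 1)]
    return values[0]

# Extrapolate along the odd-n sequence with n2 = n1 + 2 (Eq. (73)).
S, X, r, sigma, T = 100.0, 100.0, 0.05, 0.3, 1.0    # made-up parameters
n1, n2 = 99, 101
f1 = bopm_euro_call(S, X, r, sigma, T, n1)
f2 = bopm_euro_call(S, X, r, sigma, T, n2)
print((n1 * f1 - n2 * f2) / (n1 - n2))              # extrapolated value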

(6)

[Two plots of call value against n, for n up to 35. Left: the subsequence of odd n, which is monotonic and smooth. Right: the extrapolated values, on a much finer scale.]

(7)

Numerical Methods

(8)

All science is dominated by the idea of approximation.

— Bertrand Russell

(9)

Finite-Difference Methods

• Place a grid of points on the space over which the desired function takes value.

• Then approximate the function value at each of these points (p. 646).

• Solve the equation numerically by introducing difference equations in place of derivatives.

(10)

[Figure: a grid of points placed over the space, here time (0 to 0.25) against stock price (80 to 115).]

(11)

Example: Poisson’s Equation

• It is ∂2θ/∂x2 + ∂2θ/∂y2 = −ρ(x, y).

• Replace second derivatives with finite differences through central difference.

• Introduce evenly spaced grid points with distance of ∆x along the x axis and ∆y along the y axis.

• The finite difference form is

−ρ(xi, yj) = [ θ(xi+1, yj) − 2θ(xi, yj) + θ(xi−1, yj) ]/(∆x)²
+ [ θ(xi, yj+1) − 2θ(xi, yj) + θ(xi, yj−1) ]/(∆y)².

(12)

Example: Poisson’s Equation (concluded)

• In the above, ∆x ≡ xi − xi−1 and ∆y ≡ yj − yj−1 for i, j = 1, 2, . . . .

• When the grid points are evenly spaced in both axes so that ∆x = ∆y = h, the difference equation becomes

−h²ρ(xi, yj) = θ(xi+1, yj) + θ(xi−1, yj) + θ(xi, yj+1) + θ(xi, yj−1) − 4θ(xi, yj).

• Given boundary values, we can solve for the values θ(xi, yj) at the grid points within the square [ ±L, ±L ].

• From now on, θi,j will denote the finite-difference approximation to the exact θ(xi, yj).
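To make the procedure concrete, here is a minimal Python sketch that solves the difference equation by Gauss-Seidel iteration, one standard solver among many (the function and parameter names are ours):

def solve_poisson(rho, boundary, L=1.0, n=20, sweeps=500):
    """Gauss-Seidel iteration for
    -h^2 rho(xi, yj) = theta(i+1,j) + theta(i-1,j)
                     + theta(i,j+1) + theta(i,j-1) - 4 theta(i,j)
    on [-L, L] x [-L, L] with Dirichlet boundary values."""
    h = 2.0 * L / (n - 1)
    xs = [-L + i * h for i in range(n)]
    # Initialize every grid point from `boundary`; only the boundary
    # values matter, the interior ones serve as the initial guess.
    theta = [[boundary(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                theta[i][j] = (theta[i + 1][j] + theta[i - 1][j]
                               + theta[i][j + 1] + theta[i][j - 1]
                               + h * h * rho(xs[i], xs[j])) / 4.0
    return theta

# rho = 0 reduces to Laplace's equation; the boundary x*y is harmonic,
# so the grid should converge to theta(x, y) = x*y.
grid = solve_poisson(lambda x, y: 0.0, lambda x, y: x * y)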

(13)

Explicit Methods

• Consider the diffusion equation D(∂2θ/∂x2) − (∂θ/∂t) = 0.

• Use evenly spaced grid points (xi, tj) with distances

∆x and ∆t, where ∆x ≡ xi+1 − xi and ∆t ≡ tj+1 − tj.

• Employ central difference for the second derivative and forward difference for the time derivative to obtain

∂θ(x, t)/∂t |t=tj = [ θ(x, tj+1) − θ(x, tj) ]/∆t + · · · , (74)

∂²θ(x, t)/∂x² |x=xi = [ θ(xi+1, t) − 2θ(xi, t) + θ(xi−1, t) ]/(∆x)² + · · · . (75)

(14)

Explicit Methods (continued)

• Next, assemble Eqs. (74) and (75) into a single equation at (xi, tj).

• But we need to decide how to evaluate x in the first equation and t in the second.

• Since central difference around xi is used in Eq. (75), we might as well use xi for x in Eq. (74).

• Two choices are possible for t in Eq. (75).

• The first choice uses t = tj to yield the following finite-difference equation,

(θi,j+1 − θi,j)/∆t = D [ θi+1,j − 2θi,j + θi−1,j ]/(∆x)². (76)

(15)

Explicit Methods (continued)

• The stencil of grid points involves four values, θi,j+1, θi,j, θi+1,j, and θi−1,j.

• Rearrange Eq. (76) on p. 650 as

θi,j+1 = [ D∆t/(∆x)² ] θi+1,j + [ 1 − 2D∆t/(∆x)² ] θi,j + [ D∆t/(∆x)² ] θi−1,j.

• We can calculate θi,j+1 from θi,j, θi+1,j, θi−1,j, at the previous time tj (see exhibit (a) on next page).
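A minimal Python sketch of this update (names and parameters are ours); the boundary values are held fixed:

def explicit_step(theta, D, dt, dx):
    """Advance theta (values at x_0..x_N at time t_j) one step to t_{j+1}
    with the explicit update; theta[0] and theta[-1] stay fixed."""
    lam = D * dt / dx**2
    new = theta[:]
    for i in range(1, len(theta) - 1):
        new[i] = (lam * theta[i + 1]
                  + (1.0 - 2.0 * lam) * theta[i]
                  + lam * theta[i - 1])
    return new

# March a spike forward in time; dt is within the stability bound
# dt <= dx^2/(2D) discussed later.
D, dx = 1.0, 0.1
dt = 0.4 * dx**2 / (2.0 * D)
theta = [0.0] * 21
theta[10] = 1.0
for _ in range(100):
    theta = explicit_step(theta, D, dt, dx)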

(16)

Stencils

[Figure: two stencils on the (x, t) grid. (a) The explicit stencil links θi,j+1 to θi−1,j, θi,j, and θi+1,j. (b) The implicit stencil links θi,j to θi−1,j+1, θi,j+1, and θi+1,j+1.]

(17)

Explicit Methods (concluded)

• Starting from the initial conditions at t0, that is, θi,0 = θ(xi, t0), i = 1, 2, . . . , we calculate

θi,1, i = 1, 2, . . . .

• And then

θi,2, i = 1, 2, . . . .

• And so on.

(18)

Stability

• The explicit method is numerically unstable unless ∆t ≤ (∆x)²/(2D).

– A numerical method is unstable if the solution is highly sensitive to changes in initial conditions.

• The stability condition may lead to high running times and memory requirements.

• For instance, halving ∆x would imply quadrupling (∆t)⁻¹ to maintain stability; with twice as many grid points, each advanced through four times as many time steps, the running time becomes eight times as much.

(19)

Explicit Method and Trinomial Tree

• Recall that

θi,j+1 = [ D∆t/(∆x)² ] θi+1,j + [ 1 − 2D∆t/(∆x)² ] θi,j + [ D∆t/(∆x)² ] θi−1,j.

• When the stability condition is satisfied, the three coefficients for θi+1,j, θi,j, and θi−1,j all lie between zero and one and sum to one.

• They can be interpreted as probabilities.

• So the finite-difference equation becomes identical to backward induction on trinomial trees!

• The freedom in choosing ∆x corresponds to similar freedom in the construction of trinomial trees.
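A tiny sketch making the probabilistic reading concrete (names ours): under the stability condition the three coefficients behave exactly like branch probabilities.

def trinomial_probs(D, dt, dx):
    """Coefficients of theta[i+1,j], theta[i,j], theta[i-1,j] in the
    explicit update, read as trinomial branch probabilities."""
    assert dt <= dx**2 / (2.0 * D)            # stability condition
    lam = D * dt / dx**2
    pu, pm, pd = lam, 1.0 - 2.0 * lam, lam    # each lies in [0, 1]
    assert abs(pu + pm + pd - 1.0) < 1e-12    # and they sum to one
    return pu, pm, pd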

(20)

Implicit Methods

• Suppose we use t = tj+1 in Eq. (75) on p. 649 instead.

• The finite-difference equation becomes

(θi,j+1 − θi,j)/∆t = D [ θi+1,j+1 − 2θi,j+1 + θi−1,j+1 ]/(∆x)². (77)

• The stencil involves θi,j, θi,j+1, θi+1,j+1, and θi−1,j+1.

• This method is implicit:

– The value of any one of the three quantities at tj+1 cannot be calculated unless the other two are known.

– See exhibit (b) on p. 652.

(21)

Implicit Methods (continued)

• Equation (77) can be rearranged as

θi−1,j+1 − (2 + γ) θi,j+1 + θi+1,j+1 = −γθi,j,

where γ ≡ (∆x)²/(D∆t).

• This method is unconditionally stable.

• Suppose the boundary conditions are given at x = x0 and x = xN +1.

• After θi,j has been calculated for i = 1, 2, . . . , N , the values of θi,j+1 at time tj+1 can be computed as the solution to the following tridiagonal linear system,

(22)

Implicit Methods (continued)

⎡ a  1                ⎤ ⎡ θ1,j+1   ⎤   ⎡ −γθ1,j − θ0,j+1     ⎤
⎢ 1  a  1             ⎥ ⎢ θ2,j+1   ⎥   ⎢ −γθ2,j              ⎥
⎢    1  a  1          ⎥ ⎢ θ3,j+1   ⎥ = ⎢ −γθ3,j              ⎥
⎢       ⋱  ⋱  ⋱       ⎥ ⎢    ⋮     ⎥   ⎢    ⋮                ⎥
⎢          1  a  1    ⎥ ⎢ θN−1,j+1 ⎥   ⎢ −γθN−1,j            ⎥
⎣             1  a    ⎦ ⎣ θN,j+1   ⎦   ⎣ −γθN,j − θN+1,j+1   ⎦

where a ≡ −2 − γ.

(23)

Implicit Methods (concluded)

• Tridiagonal systems can be solved in O(N ) time and O(N ) space.

• The matrix above is nonsingular when γ ≥ 0.

– A square matrix is nonsingular if its inverse exists.
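One such O(N)-time, O(N)-space method is the Thomas algorithm, Gaussian elimination specialized to tridiagonal matrices. A Python sketch, together with one implicit time step built on it (names ours):

def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system in O(N) time (Thomas algorithm).
    a: subdiagonal (a[0] unused), b: diagonal,
    c: superdiagonal (c[-1] unused), d: right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_step(theta, theta0_next, thetaN1_next, gamma):
    """One implicit update (Eq. (77)): solve for theta[1..N] at t_{j+1}
    from theta[1..N] at t_j and the boundary values at t_{j+1}."""
    N = len(theta)
    a = [1.0] * N
    b = [-2.0 - gamma] * N                    # the diagonal a = -2 - gamma
    c = [1.0] * N
    d = [-gamma * t for t in theta]
    d[0] -= theta0_next
    d[-1] -= thetaN1_next
    return solve_tridiagonal(a, b, c, d)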

(24)

Crank-Nicolson Method

• Take the average of explicit method (76) on p. 650 and implicit method (77) on p. 656:

(θi,j+1 − θi,j)/∆t
= (1/2) { D [ θi+1,j − 2θi,j + θi−1,j ]/(∆x)² + D [ θi+1,j+1 − 2θi,j+1 + θi−1,j+1 ]/(∆x)² }.

• After rearrangement,

γθi,j+1 − [ θi+1,j+1 − 2θi,j+1 + θi−1,j+1 ]/2 = γθi,j + [ θi+1,j − 2θi,j + θi−1,j ]/2.

• This is an unconditionally stable implicit method with excellent rates of convergence.
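The Crank-Nicolson system is tridiagonal as well, so the same O(N) machinery applies. A sketch of one step, reusing solve_tridiagonal from the implicit-method sketch above (names ours):

def crank_nicolson_step(theta, theta0, thetaN1, theta0_next, thetaN1_next,
                        gamma):
    """One Crank-Nicolson update for theta[1..N]; theta0/thetaN1 are the
    boundary values at t_j, theta0_next/thetaN1_next those at t_{j+1}."""
    N = len(theta)
    ext = [theta0] + list(theta) + [thetaN1]  # theta[0..N+1] at t_j
    a = [-0.5] * N                            # subdiagonal
    b = [gamma + 1.0] * N                     # diagonal
    c = [-0.5] * N                            # superdiagonal
    d = [gamma * ext[i] + (ext[i + 1] - 2.0 * ext[i] + ext[i - 1]) / 2.0
         for i in range(1, N + 1)]
    d[0] += 0.5 * theta0_next                 # move boundary terms right
    d[-1] += 0.5 * thetaN1_next
    return solve_tridiagonal(a, b, c, d)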

(25)

Stencil

[Figure: the Crank-Nicolson stencil, linking the three grid points xi−1, xi, xi+1 at time tj to the same three points at time tj+1.]

(26)

Numerically Solving the Black-Scholes PDE

• See text.

(27)

Monte Carlo Simulationa

• Monte Carlo simulation is a sampling scheme.

• In many important applications within finance and without, Monte Carlo is one of the few feasible tools.

• When the time evolution of a stochastic process is not easy to describe analytically, Monte Carlo may very well be the only strategy that succeeds consistently.

aA top 10 algorithm according to Dongarra and Sullivan (2000).

(28)

The Big Idea

• Assume X1, X2, . . . , Xn have a joint distribution.

• θ ≡ E[ g(X1, X2, . . . , Xn) ] for some function g is desired.

• We generate

( x1(i), x2(i), . . . , xn(i) ),  1 ≤ i ≤ N,

independently with the same joint distribution as (X1, X2, . . . , Xn).

• Set

Yi ≡ g( x1(i), x2(i), . . . , xn(i) ).

(29)

The Big Idea (concluded)

• Y1, Y2, . . . , YN are independent and identically distributed random variables.

• Each Yi has the same distribution as Y ≡ g(X1, X2, . . . , Xn).

• Since the average of these N random variables, Ȳ, satisfies E[ Ȳ ] = θ, it can be used to estimate θ.

• The strong law of large numbers says that this procedure converges almost surely.

• The number of replications (or independent trials), N , is called the sample size.
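A minimal generic sketch of the procedure (names ours), returning the sample mean and its standard error:

import math, random

def monte_carlo(g, sampler, N=100_000):
    """Estimate theta = E[g(X1, ..., Xn)] from N independent
    replications; `sampler` draws one tuple with the joint distribution."""
    ys = [g(*sampler()) for _ in range(N)]
    mean = sum(ys) / N
    var = sum((y - mean)**2 for y in ys) / (N - 1)
    return mean, math.sqrt(var / N)           # estimate, standard error

# Example: theta = E[max(X1 + X2, 0)] with X1, X2 independent N(0, 1).
est, se = monte_carlo(lambda x1, x2: max(x1 + x2, 0.0),
                      lambda: (random.gauss(0, 1), random.gauss(0, 1)))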

(30)

Accuracy

• The Monte Carlo estimate and true value may differ owing to two reasons:

1. Sampling variation.

2. The discreteness of the sample paths.a

• The first can be controlled by the number of replications.

• The second can be controlled by the number of observations along the sample path.

aThis may not be an issue if the derivative only requires discrete sampling along the time dimension.

(31)

Accuracy and Number of Replications

• The statistical error of the sample mean Ȳ of the random variable Y decreases as 1/√N.

– Because Var[ Ȳ ] = Var[ Y ]/N.

• In fact, this convergence rate is asymptotically optimal by the Berry-Esseen theorem.

• So the variance of the estimator Ȳ can be reduced by a factor of 1/N by doing N times as much work.

• This is amazing because the same order of convergence holds independently of the dimension n.

(32)

Accuracy and Number of Replications (concluded)

• In contrast, classic numerical integration schemes have an error bound of O(N−c/n) for some constant c > 0.

– n is the dimension.

• The required number of evaluations thus grows exponentially in n to achieve a given level of accuracy.

– The curse of dimensionality.

• The Monte Carlo method, for example, is more efficient than alternative procedures for securities depending on more than one asset, i.e., multivariate derivatives.

(33)

Monte Carlo Option Pricing

• For the pricing of European options on a dividend-paying stock, we may proceed as follows.

• Stock prices S1, S2, S3, . . . at times ∆t, 2∆t, 3∆t, . . . can be generated via

Si+1 = Si e^{(µ−σ²/2)∆t + σ√∆t ξ},  ξ ∼ N(0, 1), (78)

when dS/S = µ dt + σ dW.

(34)

Monte Carlo Option Pricing (continued)

• If we discretize dS/S = µ dt + σ dW, we will obtain

Si+1 = Si + Si µ ∆t + Si σ√∆t ξ.

• But this is locally normally distributed, not lognormally, hence biased.a

• In practice, this is not expected to be a major problem as long as ∆t is sufficiently small.

aContributed by Mr. Tai, Hui-Chin (R97723028) on April 22, 2009.

(35)

Monte Carlo Option Pricing (concluded)

• Non-dividend-paying stock prices in a risk-neutral economy can be generated by setting µ = r and ∆t = T.

1: C := 0;
2: for i = 1, 2, 3, . . . , m do
3:   P := S × e^{(r−σ²/2)T + σ√T ξ};
4:   C := C + max(P − X, 0);
5: end for
6: return Ce^{−rT}/m;

• Pricing Asian options is also easy (see text).
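The pseudocode translates directly into Python; a sketch with made-up parameters:

import math, random

def mc_euro_call(S, X, r, sigma, T, m=100_000):
    """Monte Carlo European call (the algorithm above)."""
    C = 0.0
    for _ in range(m):
        xi = random.gauss(0.0, 1.0)
        P = S * math.exp((r - sigma**2 / 2) * T
                         + sigma * math.sqrt(T) * xi)
        C += max(P - X, 0.0)
    return C * math.exp(-r * T) / m

price = mc_euro_call(100.0, 100.0, 0.05, 0.3, 1.0)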

(36)

How about American Options?

• Standard Monte Carlo simulation is inappropriate for American options because of early exercise (why?).

• It is difficult to determine the early-exercise point based on one single path.

• But Monte Carlo simulation can be modified to price American options with small biases (p. 719ff).a

aLongstaff and Schwartz (2001).

(37)

Delta and Common Random Numbers

• In estimating delta, it is natural to start with the finite-difference estimate

e^{−rτ} ( E[ P(S + ε) ] − E[ P(S − ε) ] )/(2ε).

– P(x) is the terminal payoff of the derivative security when the underlying asset's initial price equals x.

• Use simulation to estimate E[ P(S + ε) ] first.

• Use another simulation to estimate E[ P(S − ε) ].

• Finally, apply the formula to approximate the delta.

(38)

Delta and Common Random Numbers (concluded)

• This method is not recommended because of its high variance.

• A much better approach is to use common random numbers to lower the variance:

e^{−rτ} E[ ( P(S + ε) − P(S − ε) )/(2ε) ].

• Here, the same random numbers are used for P(S + ε) and P(S − ε).

• This holds for gamma and cross gammas (for multivariate derivatives).
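A sketch of the common-random-numbers delta estimate for a European call (function and parameter names ours):

import math, random

def mc_delta_crn(S, X, r, sigma, T, eps=1.0, m=100_000):
    """Central-difference delta with common random numbers:
    the same xi drives both P(S + eps) and P(S - eps)."""
    def payoff(s0, xi):
        sT = s0 * math.exp((r - sigma**2 / 2) * T
                           + sigma * math.sqrt(T) * xi)
        return max(sT - X, 0.0)
    total = 0.0
    for _ in range(m):
        xi = random.gauss(0.0, 1.0)           # shared by both estimates
        total += payoff(S + eps, xi) - payoff(S - eps, xi)
    return math.exp(-r * T) * total / (2.0 * eps * m)

delta = mc_delta_crn(100.0, 100.0, 0.05, 0.3, 1.0)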

(39)

Gamma

• The finite-difference formula for gamma is

e^{−rτ} E[ ( P(S + ε) − 2P(S) + P(S − ε) )/ε² ].

• For a correlation option with multiple underlying assets, the finite-difference formula for the cross gamma ∂²P(S1, S2, . . . )/(∂S1 ∂S2) is

e^{−rτ} E[ ( P(S1 + ε1, S2 + ε2) − P(S1 − ε1, S2 + ε2) − P(S1 + ε1, S2 − ε2) + P(S1 − ε1, S2 − ε2) )/(4ε1ε2) ].

(40)

Gamma (continued)

• Choosing an ε of the right magnitude can be challenging.

– If ε is too large, inaccurate Greeks result.

– If ε is too small, unstable Greeks result.

• This phenomenon is sometimes called the curse of differentiation.

(41)

Gamma (concluded)

• In general, suppose

(∂^i/∂θ^i) e^{−rτ} E[ P(S) ] = e^{−rτ} E[ ∂^i P(S)/∂θ^i ]

holds for all i > 0, where θ is a parameter of interest.

• Then formulas for the Greeks become integrals.

• As a result, we avoid ε, finite differences, and resimulation.

• This is indeed possible for a broad class of payoff functions.a

aTeng (R91723054) (2004) and Lyuu and Teng (R91723054) (2010).
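As one concrete instance (a sketch, not necessarily the cited papers' method): for a European call, differentiating the discounted payoff inside the expectation gives delta = e^{−rT} E[ 1{S_T > X} S_T/S ], which needs no ε and no resimulation.

import math, random

def pathwise_delta(S, X, r, sigma, T, m=100_000):
    """Delta as an integral: e^{-rT} E[ 1{S_T > X} * S_T / S ]."""
    total = 0.0
    for _ in range(m):
        xi = random.gauss(0.0, 1.0)
        sT = S * math.exp((r - sigma**2 / 2) * T
                          + sigma * math.sqrt(T) * xi)
        if sT > X:
            total += sT / S
    return math.exp(-r * T) * total / m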

(42)

Biases in Pricing Continuously Monitored Options with Monte Carlo

• We are asked to price a continuously monitored up-and-out call with barrier H.

• The Monte Carlo method samples the stock price at n discrete time points t1, t2, . . . , tn.

• A sample path S(t0), S(t1), . . . , S(tn) is produced.

– Here, t0 = 0 is the current time, and tn = T is the expiration time of the option.

(43)

Biases in Pricing Continuously Monitored Options with Monte Carlo (continued)

• If all of the sampled prices are below the barrier, this sample path pays max(S(tn) − X, 0).

• Repeating these steps and averaging the payoffs yield a Monte Carlo estimate.

(44)

1: C := 0;
2: for i = 1, 2, 3, . . . , m do
3:   P := S; hit := 0;
4:   for j = 1, 2, 3, . . . , n do
5:     P := P × e^{(r−σ²/2)(T/n) + σ√(T/n) ξ};
6:     if P ≥ H then
7:       hit := 1;
8:       break;
9:     end if
10:   end for
11:   if hit = 0 then
12:     C := C + max(P − X, 0);
13:   end if
14: end for

(45)

Biases in Pricing Continuously Monitored Options with Monte Carlo (continued)

• This estimate is biased.

– Suppose none of the sampled prices on a sample path equals or exceeds the barrier H.

– It remains possible for the continuous sample path that passes through them to hit the barrier between sampled time points (see plot on next page).

(46)

[Figure: a continuous sample path that crosses the barrier H between two sampled points even though every sampled price lies below H.]

(47)

Biases in Pricing Continuously Monitored Options with Monte Carlo (concluded)

• The bias can certainly be lowered by increasing the number of observations along the sample path.

• However, even daily sampling may not suffice.

• The computational cost also rises as a result.

(48)

Brownian Bridge Approach to Pricing Barrier Options

• We desire an unbiased estimate efficiently.

• So the above-mentioned payoff should be multiplied by the probability p that a continuous sample path does not hit the barrier conditional on the sampled prices.

• This methodology is called the Brownian bridge approach.

• Formally, we have

p ≡ Prob[ S(t) < H, 0 ≤ t ≤ T | S(t0), S(t1), . . . , S(tn) ].

(49)

Brownian Bridge Approach to Pricing Barrier Options (continued)

• As a barrier is hit over a time interval if and only if the maximum stock price over that period is at least H,

p = Prob[ max_{0 ≤ t ≤ T} S(t) < H | S(t0), S(t1), . . . , S(tn) ].

• Luckily, the conditional distribution of the maximum over a time interval given the beginning and ending stock prices is known.

(50)

Brownian Bridge Approach to Pricing Barrier Options (continued)

Lemma 19 Assume S follows dS/S = µ dt + σ dW and define

ζ(x) ≡ exp[ −2 ln(x/S(t)) ln(x/S(t + ∆t))/(σ²∆t) ].

(1) If H > max(S(t), S(t + ∆t)), then

Prob[ max_{t ≤ u ≤ t+∆t} S(u) < H | S(t), S(t + ∆t) ] = 1 − ζ(H).

(2) If h < min(S(t), S(t + ∆t)), then

Prob[ min_{t ≤ u ≤ t+∆t} S(u) > h | S(t), S(t + ∆t) ] = 1 − ζ(h).

(51)

Brownian Bridge Approach to Pricing Barrier Options (continued)

• Lemma 19 gives the probability that the barrier is not hit in a time interval, given the starting and ending stock prices.

• For our up-and-out call, choose n = 1.

• As a result,

p = 1 − exp[ −2 ln(H/S(0)) ln(H/S(T))/(σ²T) ]  if H > max(S(0), S(T)),

p = 0  otherwise.

(52)

Brownian Bridge Approach to Pricing Barrier Options (continued)

1: C := 0;
2: for i = 1, 2, 3, . . . , m do
3:   P := S × e^{(r−q−σ²/2)T + σ√T ξ( )};
4:   if (S < H and P < H) or (S > H and P > H) then
5:     C := C + max(P − X, 0) × { 1 − exp[ −2 ln(H/S) × ln(H/P)/(σ²T) ] };
6:   end if
7: end for
8: return Ce^{−rT}/m;
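In Python, specializing the pseudocode above to an up-and-out call whose spot starts below the barrier (names and parameters ours):

import math, random

def mc_up_and_out_call(S, X, H, r, q, sigma, T, m=100_000):
    """Brownian bridge approach with n = 1: weight each payoff by the
    probability p that the continuous path stays below H."""
    assert S < H                              # spot starts below barrier
    C = 0.0
    for _ in range(m):
        xi = random.gauss(0.0, 1.0)
        P = S * math.exp((r - q - sigma**2 / 2) * T
                         + sigma * math.sqrt(T) * xi)
        if P < H:
            p = 1.0 - math.exp(-2.0 * math.log(H / S)
                               * math.log(H / P) / (sigma**2 * T))
            C += max(P - X, 0.0) * p
    return C * math.exp(-r * T) / m

price = mc_up_and_out_call(95.0, 100.0, 110.0, 0.05, 0.0, 0.3, 1.0)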

(53)

Brownian Bridge Approach to Pricing Barrier Options (concluded)

• The idea can be generalized.

• For example, we can handle more complex barrier options.

• Consider an up-and-out call with barrier Hi for the time interval (ti, ti+1 ], 0 ≤ i < n.

• This option thus contains n barriers.

• It is a simple matter of multiplying together the probabilities for the n time intervals to obtain the desired probability adjustment term.

(54)

Variance Reduction

• The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output.

• If this variance can be lowered without changing the expected value, fewer replications are needed.

• Methods that improve efficiency in this manner are called variance-reduction techniques.

• Such techniques become practical when the added costs are outweighed by the reduction in sampling.

(55)

Variance Reduction: Antithetic Variates

• We are interested in estimating E[ g(X1, X2, . . . , Xn) ], where X1, X2, . . . , Xn are independent.

• Let Y1 and Y2 be random variables with the same distribution as g(X1, X2, . . . , Xn).

• Then

Var[ (Y1 + Y2)/2 ] = Var[ Y1 ]/2 + Cov[ Y1, Y2 ]/2.

– Var[ Y1 ]/2 is the variance of the Monte Carlo method with two (independent) replications.

• The variance Var[ (Y1 + Y2)/2 ] is smaller than Var[ Y1 ]/2 when Y1 and Y2 are negatively correlated.

(56)

Variance Reduction: Antithetic Variates (continued)

• For each simulated sample path X, a second one is obtained by reusing the random numbers on which the first path is based.

• This yields a second sample path Y .

• Two estimates are then obtained: One based on X and the other on Y .

• If N independent sample paths are generated, the antithetic-variates estimator averages over 2N estimates.

(57)

Variance Reduction: Antithetic Variates (continued)

• Consider the process dX = at dt + bt √dt ξ.

• Let g be a function of n samples X1, X2, . . . , Xn on the sample path.

• We are interested in E[ g(X1, X2, . . . , Xn) ].

• Suppose one simulation run has realizations

ξ1, ξ2, . . . , ξn for the normally distributed fluctuation term ξ.

• This generates samples x1, x2, . . . , xn.

• The estimate is then g(x), where x ≡ (x1, x2, . . . , xn).

(58)

Variance Reduction: Antithetic Variates (concluded)

• The antithetic-variates method does not sample n more numbers from ξ for the second estimate g(x′).

• Instead, generate the sample path x′ ≡ (x′1, x′2, . . . , x′n) from −ξ1, −ξ2, . . . , −ξn.

• Compute g(x′).

• Output (g(x) + g(x′))/2.

• Repeat the above steps as many times as the desired accuracy requires.
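A sketch for a European call, where a "path" is a single draw ξ and its antithetic partner is −ξ (names and parameters ours):

import math, random

def mc_antithetic_call(S, X, r, sigma, T, N=50_000):
    """European call with antithetic variates: each xi is paired with
    -xi and the two payoff estimates are averaged."""
    def payoff(xi):
        sT = S * math.exp((r - sigma**2 / 2) * T
                          + sigma * math.sqrt(T) * xi)
        return max(sT - X, 0.0)
    total = 0.0
    for _ in range(N):
        xi = random.gauss(0.0, 1.0)
        total += (payoff(xi) + payoff(-xi)) / 2.0
    return math.exp(-r * T) * total / N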

(59)

Variance Reduction: Conditioning

• We are interested in estimating E[ X ].

• Suppose there is a random variable Z such that E[ X | Z = z ] can be efficiently and precisely computed.

• E[ X ] = E[ E[ X | Z ] ] by the law of iterated conditional expectations.

• Hence the random variable E[ X | Z ] is also an unbiased estimator of E[ X ].

(60)

Variance Reduction: Conditioning (concluded)

• As Var[ E[ X | Z ] ] ≤ Var[ X ], the estimator E[ X | Z ] has a smaller variance than observing X directly.

• First obtain a random observation z on Z.

• Then calculate E[ X | Z = z ] as our estimate.

– There is no need to resort to simulation in computing E[ X | Z = z ].

• The procedure can be repeated a few times to reduce the variance.
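A toy sketch (names ours): to estimate E[ X ] for X = 1{Z1 + Z2 > c} with Z1, Z2 independent N(0, 1), observe Z1 and output the exactly computable E[ X | Z1 ] = 1 − Φ(c − Z1).

import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mc_conditioning(c, N=100_000):
    """Average E[X | Z1] over draws of Z1 instead of sampling X itself;
    the estimator is unbiased with a smaller variance."""
    total = 0.0
    for _ in range(N):
        z1 = random.gauss(0.0, 1.0)
        total += 1.0 - norm_cdf(c - z1)
    return total / N

# The exact answer is 1 - Phi(c / sqrt(2)) since Z1 + Z2 ~ N(0, 2).
print(mc_conditioning(1.0), 1.0 - norm_cdf(1.0 / math.sqrt(2.0)))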

(61)

Control Variates

• Use the analytic solution of a similar yet simpler problem to improve the solution.

• Suppose we want to estimate E[ X ] and there exists a random variable Y with a known mean µ ≡ E[ Y ].

• Then W ≡ X + β(Y − µ) can serve as a "controlled" estimator of E[ X ] for any constant β.

– However β is chosen, W remains an unbiased estimator of E[ X ], as

E[ W ] = E[ X ] + βE[ Y − µ ] = E[ X ].

(62)

Control Variates (continued)

• Note that

Var[ W ] = Var[ X ] + β² Var[ Y ] + 2β Cov[ X, Y ]. (79)

• Hence W is less variable than X if and only if

β² Var[ Y ] + 2β Cov[ X, Y ] < 0. (80)

(63)

Control Variates (concluded)

• The success of the scheme clearly depends on both β and the choice of Y .

– For example, arithmetic average-rate options can be priced by choosing Y to be the otherwise identical geometric average-rate option’s price and β = −1.

• This approach is much more effective than the antithetic-variates method.
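A sketch of this very scheme (names ours): the discretely monitored geometric average-rate call has a closed form because the logarithm of the geometric average is normal, and it serves as the control with β = −1.

import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def geometric_asian_call(S, X, r, sigma, T, n):
    """Closed form for the geometric average-rate call monitored at
    times kT/n, k = 1..n: ln G ~ N(mu, v)."""
    mu = math.log(S) + (r - sigma**2 / 2) * T * (n + 1) / (2 * n)
    v = sigma**2 * T * (n + 1) * (2 * n + 1) / (6 * n**2)
    d1 = (mu + v - math.log(X)) / math.sqrt(v)
    d2 = d1 - math.sqrt(v)
    return math.exp(-r * T) * (math.exp(mu + v / 2) * norm_cdf(d1)
                               - X * norm_cdf(d2))

def arithmetic_asian_call_cv(S, X, r, sigma, T, n=12, m=20_000):
    """Arithmetic average-rate call with the geometric one as the
    control variate and beta = -1: W = X + beta * (Y - mu)."""
    mu_Y = geometric_asian_call(S, X, r, sigma, T, n)
    dt, disc = T / n, math.exp(-r * T)
    total = 0.0
    for _ in range(m):
        s, s_sum, log_sum = S, 0.0, 0.0
        for _ in range(n):
            s *= math.exp((r - sigma**2 / 2) * dt
                          + sigma * math.sqrt(dt) * random.gauss(0, 1))
            s_sum += s
            log_sum += math.log(s)
        X_i = disc * max(s_sum / n - X, 0.0)              # arithmetic
        Y_i = disc * max(math.exp(log_sum / n) - X, 0.0)  # geometric
        total += X_i - (Y_i - mu_Y)
    return total / m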

(64)

Choice of Y

• In general, the choice of Y is ad hoc, and experiments must be performed to confirm the wisdom of the choice.

• Try to match calls with calls and puts with puts.a

• On many occasions, Y is a discretized version of the derivative that gives µ.

– Discretely monitored geometric average-rate option vs. the continuously monitored geometric average-rate option given by formulas (30) on p. 346.

• For some choices, the discrepancy can be significant, such as the lookback option.b

aContributed by Ms. Teng, Huei-Wen (R91723054) on May 25, 2004.

(65)

Optimal Choice of β

• Equation (79) on p. 698 is minimized when β = −Cov[ X, Y ]/Var[ Y ], which was called beta in the book.

• For this specific β,

Var[ W ] = Var[ X ] − Cov[ X, Y ]²/Var[ Y ] = (1 − ρ²X,Y) Var[ X ],

where ρX,Y is the correlation between X and Y.

• The stronger X and Y are correlated, the greater the reduction in variance.

(66)

Optimal Choice of β (continued)

• For example, if this correlation is nearly perfect (±1), we could control X almost exactly.

• Typically, neither Var[ Y ] nor Cov[ X, Y ] is known.

• Therefore, we cannot obtain the maximum reduction in variance.

• We can guess these values and hope that the resulting W does indeed have a smaller variance than X.

• A second possibility is to use the simulated data to estimate these quantities.

– How to do it efficiently in terms of time and space?
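One single-pass possibility (a sketch, names ours): accumulate the running sums of x, y, xy, and y², which yield the sample estimates of Cov[ X, Y ] and Var[ Y ] in O(1) space. (Welford-style updates would be more robust numerically.)

def estimate_beta(pairs):
    """Estimate beta = -Cov[X, Y]/Var[Y] in one pass over (x, y) pairs."""
    n = sx = sy = sxy = syy = 0.0
    for x, y in pairs:
        n += 1.0
        sx += x
        sy += y
        sxy += x * y
        syy += y * y
    cov = (sxy - sx * sy / n) / (n - 1.0)
    var = (syy - sy * sy / n) / (n - 1.0)
    return -cov / var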

(67)

Optimal Choice of β (continued)

• Observe that −β has the same sign as the correlation between X and Y .

• Hence, if X and Y are positively correlated (so β < 0), X is adjusted downward whenever Y > µ and upward otherwise.

• The opposite is true when X and Y are negatively correlated, in which case β > 0.

(68)

Optimal Choice of β (concluded)

• Suppose a suboptimal β + ε is used instead.

• The variance increases by only ε²Var[ Y ].a

aHan and Lai (2010).

(69)

A Pitfall

• A potential pitfall is to sample X and Y independently.

• In this case, Cov[ X, Y ] = 0.

• Equation (79) on p. 698 becomes

Var[ W ] = Var[ X ] + β² Var[ Y ].

• So whatever Y is, the variance is increased!

(70)

Problems with the Monte Carlo Method

• The error bound is only probabilistic.

• The probabilistic error bound of 1/√N does not benefit from regularity of the integrand function.

• The requirement that the points be independent random samples is wasteful because of clustering.

• In reality, pseudorandom numbers generated by completely deterministic means are used.

• Monte Carlo simulation exhibits a great sensitivity to the seed of the pseudorandom-number generator.
