### Delta and Common Random Numbers

*• In estimating delta, it is natural to start with the*
finite-difference estimate

e^{−rτ} (E[ P(S + ε) ] − E[ P(S − ε) ]) / (2ε).

**– P(x) is the terminal payoff of the derivative security**
*when the underlying asset's initial price equals x.*

*• Use simulation to estimate E[ P(S + ε) ] first.*

*• Use another simulation to estimate E[ P(S − ε) ].*

*• Finally, apply the formula to approximate the delta.*

### Delta and Common Random Numbers (concluded)

*• This method is not recommended because of its high*
variance.

*• A much better approach is to use common random*
numbers to lower the variance:

e^{−rτ} E[ (P(S + ε) − P(S − ε)) / (2ε) ].

*• Here, the same random numbers are used for P(S + ε)*
*and P(S − ε).*

*• This holds for gamma and cross gammas (for*
multivariate derivatives).
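As a sketch of the common-random-numbers estimator under Black–Scholes dynamics (the payoff, parameter values, and bump size ε below are illustrative assumptions, not from the text):

```python
import numpy as np

# Finite-difference delta of a vanilla call with common random numbers:
# the SAME normal draws drive both the up-bumped and down-bumped paths.
# All parameter values here are illustrative.
def delta_crn(S, X, r, sigma, tau, eps=0.5, n_paths=100_000, seed=42):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)               # common random numbers
    growth = np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * z)
    payoff_up = np.maximum((S + eps) * growth - X, 0.0)
    payoff_dn = np.maximum((S - eps) * growth - X, 0.0)
    return np.exp(-r * tau) * np.mean((payoff_up - payoff_dn) / (2 * eps))
```

With S = X = 100, r = 0.05, σ = 0.2, τ = 1, the estimate should land close to the Black–Scholes delta N(d₁).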

### Problems with the Bump-and-Revalue Method

*• Consider the binary option with payoff*

1, if S(T) > X;
0, otherwise.

*• Then*

P(S + ε) − P(S − ε) =
1, if the terminal price of the path started at S + ε exceeds X while that of the path started at S − ε does not;
0, otherwise.

*• So the finite-difference estimate per run for the*
*(undiscounted) delta is 0 or O(1/ε).*
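A quick numerical illustration of this pathology (parameter values are illustrative assumptions): with common random numbers, every run's undiscounted estimate is exactly 0 or 1/(2ε).

```python
import numpy as np

# Bump-and-revalue delta of the binary option 1{S(T) > X}: the per-run
# (undiscounted) finite-difference estimate is either 0 or 1/(2*eps).
# Parameter values are illustrative.
def binary_delta_runs(S, X, r, sigma, tau, eps, n_paths, seed=7):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    growth = np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * z)
    up = ((S + eps) * growth > X).astype(float)    # payoff from S + eps
    dn = ((S - eps) * growth > X).astype(float)    # payoff from S - eps
    return (up - dn) / (2 * eps)                   # per-run delta estimates

runs = binary_delta_runs(100.0, 100.0, 0.05, 0.2, 1.0, 0.1, 10_000)
```

Most runs return 0; the rare runs that straddle the strike return the huge value 1/(2ε), which is why the estimator's variance blows up as ε shrinks.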

### Problems with the Bump-and-Revalue Method (concluded)

*• The price of the binary option equals*

e^{−rτ} N(x − σ√τ).

*• Its delta is*

N′(x − σ√τ)/(Sσ√τ).

### Gamma

*• The finite-difference formula for gamma is*

e^{−rτ} E[ (P(S + ε) − 2 × P(S) + P(S − ε)) / ε^{2} ].

*• For a correlation option with multiple underlying assets,*
the finite-difference formula for the cross gamma
∂^{2}P(S_{1}, S_{2}, . . . )/(∂S_{1}∂S_{2}) is:

e^{−rτ} E[ (P(S_{1} + ε_{1}, S_{2} + ε_{2}) − P(S_{1} − ε_{1}, S_{2} + ε_{2})
− P(S_{1} + ε_{1}, S_{2} − ε_{2}) + P(S_{1} − ε_{1}, S_{2} − ε_{2})) / (4ε_{1}ε_{2}) ].

### Gamma (continued)

*• Choosing an ε of the right magnitude can be*
challenging.

**– If ε is too large, inaccurate Greeks result.**

**– If ε is too small, unstable Greeks result.**

*• This phenomenon is sometimes called the curse of*
diﬀerentiation.^{a}

aA¨ıt-Sahalia & Lo (1998); Bondarenko (2003).

### Gamma (continued)

*• In general, suppose*

(∂^{i}/∂θ^{i}) e^{−rτ} E[ P(S) ] = e^{−rτ} E[ ∂^{i}P(S)/∂θ^{i} ]

*holds for all i > 0, where θ is a parameter of interest.*

**– A common requirement is Lipschitz continuity.**^{a}

*• Then formulas for the Greeks become integrals.*

*• As a result, we avoid ε, finite differences, and*
resimulation.

aBroadie & Glasserman (1996).
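For instance, under Black–Scholes dynamics the call's delta can be obtained by differentiating the payoff inside the expectation: since ∂S(T)/∂S(0) = S(T)/S(0) path by path, delta = e^{−rτ} E[ 1_{{S(T)>X}} S(T)/S(0) ]. A sketch with illustrative parameter values:

```python
import numpy as np

# Pathwise delta of a vanilla call: differentiate the payoff inside the
# expectation instead of bumping and resimulating. Under geometric
# Brownian motion, dS(T)/dS(0) = S(T)/S(0). Parameters are illustrative.
def pathwise_call_delta(S0, X, r, sigma, tau, n_paths=200_000, seed=1):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * z)
    return np.exp(-r * tau) * np.mean((ST > X) * ST / S0)
```

No ε appears anywhere, so there is no bias–variance trade-off from a bump size.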

### Gamma (continued)

*• This is indeed possible for a broad class of payoﬀ*
functions.^{a}

**– Roughly speaking, any payoﬀ function that is equal**
to a sum of products of diﬀerentiable functions and
indicator functions with the right kind of support.

**– For example, the payoff of a call is**

max(S(T) − X, 0) = (S(T) − X) I_{{ S(T)−X ≥ 0 }}.
**– The results are too technical to cover here (see next**

page).

aTeng (R91723054) (2004); Lyuu & Teng (R91723054) (2011).

### Gamma (continued)

*• Suppose h(θ, x) ∈ H with pdf f(x) for x and g_{j}(θ, x) ∈ G*
*for j ∈ B, a finite set of natural numbers.*

*• Then*

(∂/∂θ) ∫ h(θ, x) ∏_{j∈B} 1_{{ g_{j}(θ,x)>0 }}(x) f(x) dx

= ∫ h_{θ}(θ, x) ∏_{j∈B} 1_{{ g_{j}(θ,x)>0 }}(x) f(x) dx

+ ∑_{l∈B} [ h(θ, x) J_{l}(θ, x) ∏_{j∈B∖l} 1_{{ g_{j}(θ,x)>0 }}(x) f(x) ]_{x=χ_{l}(θ)},

where

J_{l}(θ, x) = sign( ∂g_{l}(θ, x)/∂x_{k} ) × (∂g_{l}(θ, x)/∂θ) / (∂g_{l}(θ, x)/∂x) for l ∈ B.

### Gamma (concluded)

*• Similar results have been derived for Lévy processes.*^{a}

*• Formulas have also recently been obtained for credit*
derivatives.^{b}

*• In queueing networks, this is called inﬁnitesimal*
perturbation analysis (IPA).^{c}

aLyuu, Teng (R91723054), & S. Wang (2013).

bLyuu, Teng (R91723054), & Tzeng (2014).

cCao (1985); Ho & Cao (1985).

### Biases in Pricing Continuously Monitored Options with Monte Carlo

*• We are asked to price a continuously monitored*
*up-and-out call with barrier H.*

*• The Monte Carlo method samples the stock price at n*
*discrete time points t_{1}, t_{2}, . . . , t_{n}.*

*• A sample path*

S(t_{0}), S(t_{1}), . . . , S(t_{n})

is produced.

**– Here, t_{0} = 0 is the current time, and t_{n} = T is the**
maturity.

### Biases in Pricing Continuously Monitored Options with Monte Carlo (continued)

*• If all of the sampled prices are below the barrier, this*
*sample path pays max(S(t_{n}) − X, 0).*

*• Repeating these steps and averaging the payoﬀs yield a*
Monte Carlo estimate.

1: *C := 0;*

2: **for i = 1, 2, 3, . . . , N do**

3: *P := S; hit := 0;*

4: **for j = 1, 2, 3, . . . , n do**

5: *P := P × e*^{(r−σ^{2}/2)(T /n)+σ√(T /n) ξ};

6: **if P ≥ H then**

7: hit := 1;

8: break;

9: **end if**

10: **end for**

11: **if hit = 0 then**

12: *C := C + max(P − X, 0);*

13: **end if**

14: **end for**

### Biases in Pricing Continuously Monitored Options with Monte Carlo (continued)

*• This estimate is biased.*^{a}

**– Suppose none of the sampled prices on a sample path**
*equals or exceeds the barrier H.*

**– It remains possible for the continuous sample path**
that passes through them to hit the barrier between
sampled time points (see plot on next page).

aShevchenko (2003).

(Plot: a continuous sample path crossing the barrier H between sampled time points.)

### Biases in Pricing Continuously Monitored Options with Monte Carlo (concluded)

*• The bias can certainly be lowered by increasing the*
number of observations along the sample path.

*• However, even daily sampling may not suﬃce.*

*• The computational cost also rises as a result.*

### Brownian Bridge Approach to Pricing Barrier Options

*• We desire an unbiased estimate which can be calculated*
eﬃciently.

*• The above-mentioned payoﬀ should be multiplied by the*
*probability p that a continuous sample path does not*
hit the barrier conditional on the sampled prices.

*• This methodology is called the Brownian bridge*
approach.

*• Formally, we have*

p =^{Δ} Prob[ S(t) < H, 0 ≤ t ≤ T | S(t_{0}), S(t_{1}), . . . , S(t_{n}) ].

### Brownian Bridge Approach to Pricing Barrier Options (continued)

*• As a barrier is hit over a time interval if and only if the*
*maximum stock price over that period is at least H,*

p = Prob[ max_{0≤t≤T} S(t) < H | S(t_{0}), S(t_{1}), . . . , S(t_{n}) ].

*• Luckily, the conditional distribution of the maximum*
over a time interval given the beginning and ending
stock prices is known.

### Brownian Bridge Approach to Pricing Barrier Options (continued)

**Lemma 23** *Assume S follows dS/S = μ dt + σ dW and define*

ζ(x) =^{Δ} exp[ −2 ln(x/S(t)) ln(x/S(t + Δt)) / (σ^{2}Δt) ].

*(1) If H > max(S(t), S(t + Δt)), then*

Prob[ max_{t≤u≤t+Δt} S(u) < H | S(t), S(t + Δt) ] = 1 − ζ(H).

*(2) If h < min(S(t), S(t + Δt)), then*

Prob[ min_{t≤u≤t+Δt} S(u) > h | S(t), S(t + Δt) ] = 1 − ζ(h).

### Brownian Bridge Approach to Pricing Barrier Options (continued)

*• Lemma 23 gives the probability that the barrier is not*
hit in a time interval, given the starting and ending
stock prices.

*• For our up-and-out call,*^{a} *choose n = 1.*

*• As a result,*

p = 1 − exp[ −2 ln(H/S(0)) ln(H/S(T )) / (σ^{2}T ) ], if H > max(S(0), S(T )),
p = 0, otherwise.

aSo S(0) < H.

### Brownian Bridge Approach to Pricing Barrier Options (continued)

The following algorithm works for up-and-out and down-and-out calls.

1: *C := 0;*

2: **for i = 1, 2, 3, . . . , N do**

3: *P := S × e*^{(r−q−σ^{2}/2) T +σ√T ξ( )};

4: **if (S < H and P < H) or (S > H and P > H) then**

5: *C := C + max(P − X, 0) × (1 − exp[ −2 ln(H/S) × ln(H/P ) / (σ^{2}T ) ]);*

6: **end if**

7: **end for**

8: *return Ce*^{−rT}*/N ;*
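A sketch of this algorithm in Python for the up-and-out case (so S < H); the parameter values are illustrative:

```python
import numpy as np

# Brownian bridge pricer for a continuously monitored up-and-out call,
# sampling only the terminal price (n = 1). Each payoff is weighted by
# the probability that the continuous bridge between the endpoints
# stays below the barrier H. Parameters are illustrative.
def up_and_out_call(S, X, H, r, q, sigma, T, n_paths=100_000, seed=3):
    assert S < H                      # otherwise already knocked out
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n_paths)
    P = S * np.exp((r - q - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * xi)
    payoff = np.maximum(P - X, 0.0)
    p = np.zeros(n_paths)             # survival probability per path
    alive = P < H
    p[alive] = 1.0 - np.exp(-2.0 * np.log(H / S) * np.log(H / P[alive])
                            / (sigma**2 * T))
    return np.exp(-r * T) * np.mean(payoff * p)

price = up_and_out_call(95.0, 100.0, 110.0, 0.05, 0.0, 0.25, 1.0)
```

Unlike the discretely sampled pricer, this estimate carries no discretization bias from unmonitored barrier crossings.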

### Brownian Bridge Approach to Pricing Barrier Options (concluded)

*• The idea can be generalized.*

*• For example, we can handle more complex barrier*
options.

*• Consider an up-and-out call with barrier H_{i} for the*
*time interval (t_{i}, t_{i+1} ], 0 ≤ i < n.*

*• This option thus contains n barriers.*

*• Multiply the probabilities for the n time intervals to*
obtain the desired probability adjustment term.

### Variance Reduction

*• The statistical eﬃciency of Monte Carlo simulation can*
be measured by the variance of its output.

*• If this variance can be lowered without changing the*
expected value, fewer replications are needed.

*• Methods that improve eﬃciency in this manner are*
called variance-reduction techniques.

*• Such techniques become practical when the added costs*
are outweighed by the reduction in sampling.

### Variance Reduction: Antithetic Variates

*• We are interested in estimating E[ g(X*_{1}*, X*_{2}*, . . . , X** _{n}*) ].

*• Let Y*_{1} *and Y*_{2} be random variables with the same
*distribution as g(X*_{1}*, X*_{2}*, . . . , X** _{n}*).

*• Then*

Var[ (Y_{1} + Y_{2})/2 ] = Var[ Y_{1} ]/2 + Cov[ Y_{1}, Y_{2} ]/2.

**– Var[ Y**_{1} *]/2 is the variance of the Monte Carlo*
method with two independent replications.

*• The variance Var[ (Y*_{1} *+ Y*_{2}*)/2 ] is smaller than*

*Var[ Y*_{1} *]/2 when Y*_{1} *and Y*_{2} are negatively correlated.

### Variance Reduction: Antithetic Variates (continued)

*• For each simulated sample path X, a second one is*
*obtained by reusing the random numbers on which the*
ﬁrst path is based.

*• This yields a second sample path Y .*

*• Two estimates are then obtained: One based on X and*
*the other on Y .*

*• If N independent sample paths are generated, the*
*antithetic-variates estimator averages over 2N*

estimates.

### Variance Reduction: Antithetic Variates (continued)

*• Consider the process dX = a_{t} dt + b_{t}√dt ξ.*

*• Let g be a function of the n samples X_{1}, X_{2}, . . . , X_{n} on*
the sample path.

*• We are interested in E[ g(X_{1}, X_{2}, . . . , X_{n}) ].*

*• Suppose one simulation run has realizations*
*ξ_{1}, ξ_{2}, . . . , ξ_{n} for the normally distributed fluctuation*
*term ξ.*

*• This generates samples x_{1}, x_{2}, . . . , x_{n}.*

*• The estimate is then g(x), where x =^{Δ} (x_{1}, x_{2}, . . . , x_{n}).*

### Variance Reduction: Antithetic Variates (concluded)

*• The antithetic-variates method does not sample n more*
*numbers from ξ for the second estimate g(x′).*

*• Instead, generate the second sample path*
*x′ =^{Δ} (x′_{1}, x′_{2}, . . . , x′_{n}) from −ξ_{1}, −ξ_{2}, . . . , −ξ_{n}.*

*• Compute g(x′).*

*• Output (g(x) + g(x′))/2.*

*• Repeat the above steps as many times as required by*
accuracy.
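A sketch comparing the variance of antithetic pairs with independent pairs for a discounted call payoff (single time step; parameter values are illustrative):

```python
import numpy as np

# Antithetic variates: pair each draw z with -z and average the two
# payoff estimates. For a payoff monotone in z, the pair members are
# negatively correlated, so the pair average has reduced variance.
# Parameters are illustrative.
def payoff(z, S0=100.0, X=100.0, r=0.05, sigma=0.2, T=1.0):
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - X, 0.0)

rng = np.random.default_rng(0)
z1 = rng.standard_normal(100_000)
z2 = rng.standard_normal(100_000)
anti_pairs = 0.5 * (payoff(z1) + payoff(-z1))    # antithetic pairs
indep_pairs = 0.5 * (payoff(z1) + payoff(z2))    # independent pairs
```

Both pair averages are unbiased; the antithetic ones should show a clearly smaller sample variance.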

### Variance Reduction: Conditioning

*• We are interested in estimating E[ X ].*

*• Suppose there is a random variable Z such that*
*E[ X | Z = z ] can be efficiently and precisely computed.*

*• E[ X ] = E[ E[ X | Z ] ] by the law of iterated conditional*
expectations.

*• Hence the random variable E[ X | Z ] is also an unbiased*
*estimator of E[ X ].*

### Variance Reduction: Conditioning (concluded)

*• As*

*Var[ E[ X* *| Z ] ] ≤ Var[ X ],*

*E[ X* *| Z ] has a smaller variance than observing X*
directly.

*• First obtain a random observation z on Z.*

*• Then calculate E[ X | Z = z ] as our estimate.*

**– There is no need to resort to simulation in computing**
*E[ X* *| Z = z ].*

*• The procedure can be repeated a few times to reduce*
the variance of the estimate.
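As a toy illustration (an assumed setup, not from the text): estimate E[ max(Z₁ + Z₂, 0) ] for independent standard normals by conditioning on Z = Z₁, since E[ max(z + Z₂, 0) ] = zΦ(z) + φ(z) in closed form.

```python
import math
import numpy as np

# Conditioning: replace each direct observation X = max(Z1 + Z2, 0)
# by the closed-form conditional mean E[X | Z1] = Z1*Phi(Z1) + phi(Z1).
# Both estimators are unbiased; the conditioned one is less variable.
def cond_mean(z):
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return z * Phi + phi                  # E[max(z + Z2, 0)]

rng = np.random.default_rng(0)
z1 = rng.standard_normal(50_000)
z2 = rng.standard_normal(50_000)
direct = np.maximum(z1 + z2, 0.0)         # observe X directly
conditioned = cond_mean(z1)               # observe E[X | Z1] instead
```

No simulation of Z₂ is needed once Z₁ is drawn, exactly as the slide prescribes.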

### Control Variates

*• Use the analytic solution of a similar yet simpler*
problem to improve the solution.

*• Suppose we want to estimate E[ X ] and there exists a*
*random variable Y with a known mean μ =^{Δ} E[ Y ].*

*• Then W =^{Δ} X + β(Y − μ) can serve as a "controlled"*
*estimator of E[ X ] for any constant β.*

**– However β is chosen, W remains an unbiased**
*estimator of E[ X ] as*

E[ W ] = E[ X ] + βE[ Y − μ ] = E[ X ].

### Control Variates (continued)

*• Note that*

Var[ W ] = Var[ X ] + β^{2} Var[ Y ] + 2β Cov[ X, Y ]. (110)

*• Hence W is less variable than X if and only if*

β^{2} Var[ Y ] + 2β Cov[ X, Y ] < 0. (111)

### Control Variates (concluded)

*• The success of the scheme clearly depends on both β*
*and the choice of Y .*

**– For example, arithmetic average-rate options can be**
*priced by choosing Y to be the otherwise identical*
*geometric average-rate option’s price and β =* *−1.*

*• This approach is much more eﬀective than the*
antithetic-variates method.
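A sketch with a deliberately simple control (an assumption for brevity: the discounted terminal stock price, whose mean e^{−rT}E[ S(T) ] = S₀ is known under the risk-neutral measure, stands in for the geometric average-rate option above); β is estimated from the simulated data:

```python
import numpy as np

# Control variates: estimate a call price E[X] with the control
# Y = e^{-rT} S(T), whose mean under the risk-neutral measure is S0.
# beta is estimated as -Cov[X,Y]/Var[Y]. Parameters are illustrative.
def call_with_control(S0, X, r, sigma, T, n_paths=100_000, seed=5):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    Xs = np.exp(-r * T) * np.maximum(ST - X, 0.0)   # plain estimates
    Ys = np.exp(-r * T) * ST                        # control, mean S0
    beta = -np.cov(Xs, Ys)[0, 1] / Ys.var()
    Ws = Xs + beta * (Ys - S0)                      # controlled estimates
    return Ws.mean(), Ws.var(), Xs.var()

price, var_w, var_x = call_with_control(100.0, 100.0, 0.05, 0.2, 1.0)
```

The call payoff and the terminal price are strongly positively correlated, so the variance reduction is substantial.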

### Choice of *Y*

*• In general, the choice of Y is ad hoc,*^{a} and experiments
must be performed to conﬁrm the wisdom of the choice.

*• Try to match calls with calls and puts with puts.*^{b}

*• On many occasions, Y is a discretized version of the*
*derivative that gives μ.*

**– Discretely monitored geometric average-rate option**
vs. the continuously monitored geometric

average-rate option given by formulas (50) on p. 401.

aBut see Dai (B82506025, R86526008, D8852600), Chiu (R94922072),

& Lyuu (2015).

### Optimal Choice of *β*

*• For some choices, the discrepancy can be signiﬁcant,*
such as the lookback option.^{a}

*• Equation (110) on p. 826 is minimized when*
*β =* *−Cov[ X, Y ]/Var[ Y ].*

**– It is called beta in the book.**

*• For this specific β,*

Var[ W ] = Var[ X ] − Cov[ X, Y ]^{2}/Var[ Y ] = (1 − ρ^{2}_{X,Y}) Var[ X ],

*where ρ_{X,Y} is the correlation between X and Y .*

aContributed by Mr. Tsai, Hwai (R92723049) on May 12, 2004.

### Optimal Choice of *β (continued)*

*• Note that the variance can never be increased with the*
optimal choice.

*• Furthermore, the stronger X and Y are correlated, the*
greater the reduction in variance.

*• For example, if this correlation is nearly perfect (±1),*
*we could control X almost exactly.*

### Optimal Choice of *β (continued)*

*• Typically, neither Var[ Y ] nor Cov[ X, Y ] is known.*

*• Therefore, we cannot obtain the maximum reduction in*
variance.

*• We can guess these values and hope that the resulting*
*W does indeed have a smaller variance than X.*

*• A second possibility is to use the simulated data to*
estimate these quantities.

**– How to do it eﬃciently in terms of time and space?**

### Optimal Choice of *β (concluded)*

*• Observe that −β has the same sign as the correlation*
*between X and Y .*

*• Hence, if X and Y are positively correlated, β < 0,*
*then X is adjusted downward whenever Y > μ and*
upward otherwise.

*• The opposite is true when X and Y are negatively*
*correlated, in which case β > 0.*

*• Suppose a suboptimal β + ε is used instead.*

*• The variance increases by only ε^{2}Var[ Y ].*^{a}

### A Pitfall

*• A potential pitfall is to sample X and Y independently.*

*• In this case, Cov[ X, Y ] = 0.*

*• Equation (110) on p. 826 becomes*

*Var[ W ] = Var[ X ] + β*^{2} *Var[ Y ].*

*• So whatever Y is, the variance is increased!*

*• Lesson: X and Y must be correlated.*

### Problems with the Monte Carlo Method

*• The error bound is only probabilistic.*

*• The probabilistic error bound of O(1/√N ) does not*
benefit from regularity of the integrand function.

*• The requirement that the points be independent random*
samples is wasteful because of clustering.

*• In reality, pseudorandom numbers generated by*
completely deterministic means are used.

*• Monte Carlo simulation exhibits a great sensitivity to*
the seed of the pseudorandom-number generator.

*Matrix Computation*

To set up a philosophy against physics is rash;

philosophers who have done so have always ended in disaster.

— Bertrand Russell

### Definitions and Basic Results

*• Let A =^{Δ} [ a_{ij} ]_{1≤i≤m,1≤j≤n}, or simply A ∈ R^{m×n},*
*denote an m × n matrix.*

*• It can also be represented as [ a_{1}, a_{2}, . . . , a_{n} ], where*
*a_{i} ∈ R^{m} are vectors.*

**– Vectors are column vectors unless stated otherwise.**

*• A is a square matrix when m = n.*

*• The rank of a matrix is the largest number of linearly*
independent columns.

### Definitions and Basic Results (continued)

*• A square matrix A is said to be symmetric if A*^{T} *= A.*

*• A real n × n matrix A =^{Δ} [ a_{ij} ] is diagonally dominant*
*if | a_{ii} | > Σ_{j≠i} | a_{ij} | for 1 ≤ i ≤ n.*

**– Such matrices are nonsingular.**

*• The identity matrix is the square matrix*
*I* *= diag[ 1, 1, . . . , 1 ].*^{Δ}

### Definitions and Basic Results (concluded)

*• A matrix has full column rank if its columns are linearly*
independent.

*• A real symmetric matrix A is positive definite if*

x^{T}Ax = Σ_{i,j} a_{ij}x_{i}x_{j} > 0

*for any nonzero vector x.*

*• A matrix A is positive deﬁnite if and only if there exists*
*a matrix W such that A = W*^{T}*W and W has full*

column rank.

### Cholesky Decomposition

*• Positive deﬁnite matrices can be factored as*
*A = LL*^{T}*,*

called the Cholesky decomposition.

**– Above, L is a lower triangular matrix.**

### Generation of Multivariate Distribution

**• Let x =^{Δ} [ x_{1}, x_{2}, . . . , x_{n} ]^{T} be a vector random variable**
*with a positive definite covariance matrix C.*

**• As usual, assume E[ x ] = 0.**

**• This covariance structure can be matched by P y.**

**– C = P P^{T} is the Cholesky decomposition of C.**^{a}

**– y =^{Δ} [ y_{1}, y_{2}, . . . , y_{n} ]^{T} is a vector random variable**
with a covariance matrix equal to the identity matrix.

aWhat if *C is not positive deﬁnite? See Lai (R93942114) & Lyuu*
(2007).

### Generation of Multivariate Normal Distribution

*• Suppose we want to generate the multivariate normal*
*distribution with a covariance matrix C = P P*^{T}.

**– First, generate independent standard normal**
*random variables y_{1}, y_{2}, . . . , y_{n}.*

**– Then P [ y_{1}, y_{2}, . . . , y_{n} ]^{T} has the desired**
distribution.

**– These steps can then be repeated.**
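A sketch with numpy (the covariance matrix is an illustrative choice):

```python
import numpy as np

# Generate a multivariate normal with covariance C from independent
# standard normals y via the Cholesky factor P, where C = P P^T.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 2.0, 0.3],
              [0.2, 0.3, 1.5]])
P = np.linalg.cholesky(C)               # lower triangular factor
rng = np.random.default_rng(0)
y = rng.standard_normal((3, 100_000))   # independent N(0,1) components
x = P @ y                               # columns now have covariance C
```

The sample covariance of the columns of x should be close to C.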

### Multivariate Derivatives Pricing

*• Generating the multivariate normal distribution is*
essential for the Monte Carlo pricing of multivariate
derivatives (pp. 748ﬀ).

*• For example, the rainbow option on k assets has payoff*

max(max(S_{1}, S_{2}, . . . , S_{k}) − X, 0)

at maturity.

*• The closed-form formula is a multi-dimensional integral.*^{a}

aJohnson (1987); Chen (D95723006) & Lyuu (2009).

### Multivariate Derivatives Pricing (concluded)

*• Suppose dS_{j}/S_{j} = r dt + σ_{j} dW_{j}, 1 ≤ j ≤ k, where C is*
*the correlation matrix for dW_{1}, dW_{2}, . . . , dW_{k}.*

*• Let C = P P*^{T}.

*• Let ξ consist of k independent random variables from*
*N(0, 1).*

*• Let ξ′ = P ξ.*

*• Similar to Eq. (109) on p. 791,*

S_{i+1} = S_{i} e^{(r−σ_{j}^{2}/2) Δt+σ_{j}√Δt ξ′_{j}}, 1 ≤ j ≤ k.
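Putting the pieces together for a two-asset rainbow call (all parameter values are illustrative):

```python
import numpy as np

# Monte Carlo price of a two-asset rainbow call max(max(S1, S2) - X, 0):
# correlate the normal drivers with the Cholesky factor of the
# correlation matrix, then evolve each asset one step to maturity.
def rainbow_call(S0, sigma, rho, X, r, T, n_paths=100_000, seed=11):
    P = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    rng = np.random.default_rng(seed)
    xi = P @ rng.standard_normal((2, n_paths))     # correlated N(0,1)
    S0 = np.asarray(S0, dtype=float)[:, None]
    sigma = np.asarray(sigma, dtype=float)[:, None]
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * xi)
    payoff = np.maximum(ST.max(axis=0) - X, 0.0)
    return np.exp(-r * T) * payoff.mean()

low_corr = rainbow_call([100.0, 100.0], [0.2, 0.2], 0.0, 100.0, 0.05, 1.0)
high_corr = rainbow_call([100.0, 100.0], [0.2, 0.2], 0.9, 100.0, 0.05, 1.0)
```

The max-option is worth more when the assets are less correlated, since the maximum of two weakly related prices is more dispersed.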

### Least-Squares Problems

*• The least-squares (LS) problem is concerned with*

min_{x∈R^{n}} ‖ Ax − b ‖,

*where A ∈ R^{m×n}, b ∈ R^{m}, and m ≥ n.*

*• The LS problem is called regression analysis in statistics*
and is equivalent to minimizing the mean-square error.

*• It is often written as*

Ax = b.

### Polynomial Regression

*• In polynomial regression, x_{0} + x_{1}x + · · · + x_{n}x^{n} is used*
to fit the data

{ (a_{1}, b_{1}), (a_{2}, b_{2}), . . . , (a_{m}, b_{m}) }.

*• This leads to the LS problem,*

[ 1  a_{1}  a_{1}^{2}  · · ·  a_{1}^{n} ] [ x_{0} ]   [ b_{1} ]
[ 1  a_{2}  a_{2}^{2}  · · ·  a_{2}^{n} ] [ x_{1} ]   [ b_{2} ]
[ ⋮    ⋮     ⋮      ⋱     ⋮   ] [ ⋮   ] = [ ⋮   ]
[ 1  a_{m}  a_{m}^{2}  · · ·  a_{m}^{n} ] [ x_{n} ]   [ b_{m} ]

*• Consult p. 273 of the textbook for solutions.*
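A sketch of solving this system with numpy (illustrative data; a quadratic fit, n = 2):

```python
import numpy as np

# Solve the polynomial least-squares system above: build the Vandermonde
# matrix with columns 1, a, a^2 and solve min ||Ax - b||. The data lie
# exactly on a parabola, so the fit recovers its coefficients.
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = 1.0 + 2.0 * a - 0.5 * a**2
A = np.vander(a, 3, increasing=True)        # columns 1, a, a^2
x, *_ = np.linalg.lstsq(A, b, rcond=None)
```

With noisy data the same call returns the mean-square-error-minimizing coefficients instead of an exact interpolant.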

### American Option Pricing by Simulation

*• The continuation value of an American option is the*
conditional expectation of the payoﬀ from keeping the
option alive now.

*• The option holder must compare the immediate exercise*
value and the continuation value.

*• In standard Monte Carlo simulation, each path is*
treated independently of other paths.

*• But the decision to exercise the option cannot be*
reached by looking at one path alone.

### The Least-Squares Monte Carlo Approach

*• The continuation value can be estimated from the*
cross-sectional information in the simulation by using
least squares.^{a}

*• The result is a function (of the state) for estimating the*
continuation values.

*• Use the function to estimate the continuation value for*
each path to determine its cash ﬂow.

*• This is called the least-squares Monte Carlo (LSM)*
approach.

aLongstaff & Schwartz (2001).

### The Least-Squares Monte Carlo Approach (concluded)

*• The LSM is provably convergent.*^{a}

*• The LSM can be easily parallelized.*^{b}

**– Partition the paths into subproblems and perform**
LSM on each of them independently.

**– The speedup is close to linear (i.e., proportional to**
the number of cores).

*• Surprisingly, accuracy is not aﬀected.*

aCl´ement, Lamberton, & Protter (2002); Stentoft (2004).

bHuang (B96902079, R00922018) (2013); Chen (B97902046, R01922005) (2014); Chen (B97902046, R01922005), Huang (B96902079, R00922018) & Lyuu (2015).

### A Numerical Example

*• Consider a 3-year American put on a*
non-dividend-paying stock.

*• The put is exercisable at years 0, 1, 2, and 3.*

*• The strike price X = 105.*

*• The annualized riskless rate is r = 5%.*

*• The current stock price is 101.*

**– The annual discount factor hence equals 0.951229.**

*• We use only 8 price paths to illustrate the algorithm.*

### A Numerical Example (continued)

Stock price paths

Path Year 0 Year 1 Year 2 Year 3

1 **101** **97.6424** **92.5815** 107.5178

2 **101** **101.2103** 105.1763 **102.4524**
3 **101** 105.7802 **103.6010** 124.5115

4 **101** **96.4411** **98.7120** 108.3600

5 **101** 124.2345 **101.0564 104.5315**

6 **101** **95.8375** **93.7270** **99.3788**

7 **101** 108.9554 **102.4177 100.9225**
8 **101** **104.1475** 113.2516 115.0994

(Plot: the eight stock price paths over years 0–3.)

### A Numerical Example (continued)

*• We use the basis functions 1, x, x*^{2}.
**– Other basis functions are possible.**^{a}

*• The plot next page shows the ﬁnal estimated optimal*
exercise strategy given by LSM.

*• We now proceed to tackle our problem.*

*• The idea is to calculate the cash ﬂow along each path,*
*using information from all paths.*

aLaguerre polynomials, Hermite polynomials, Legendre polynomials, Chebyshev polynomials, Gegenbauer polynomials, and Jacobi polynomials.

(Plot: the estimated optimal exercise strategy along the eight paths.)

### A Numerical Example (continued)

Cash flows at year 3

Path Year 0 Year 1 Year 2 Year 3

1 — — — 0

2 — — — 2.5476

3 — — — 0

4 — — — 0

5 — — — 0.4685

6 — — — 5.6212

7 — — — 4.0775

8 — — — 0

### A Numerical Example (continued)

*• The cash ﬂows at year 3 are the exercise value if the put*
is in the money.

*• Only 4 paths are in the money: 2, 5, 6, 7.*

*• Some of the cash ﬂows may not occur if the put is*
exercised earlier, which we will ﬁnd out step by step.

*• Incidentally, the European counterpart has a value of*

0.951229^{3} × (2.5476 + 0.4685 + 5.6212 + 4.0775)/8 = 1.3680.

### A Numerical Example (continued)

*• We move on to year 2.*

*• For each state that is in the money at year 2, we must*
decide whether to exercise it.

*• There are 6 paths for which the put is in the money: 1,*
3, 4, 5, 6, 7 (p. 851).

*• Only in-the-money paths will be used in the regression*
because they are where early exercise is relevant.

**– If there were none, we would move on to year 1.**

### A Numerical Example (continued)

*• Let x denote the stock prices at year 2 for those 6 paths.*

*• Let y denote the corresponding discounted future cash*
ﬂows (at year 3) if the put is not exercised at year 2.

### A Numerical Example (continued)

Regression at year 2

Path *x* *y*

1 *92.5815* 0 *× 0.951229*

2 — —

3 *103.6010* 0 *× 0.951229*
4 *98.7120* 0 *× 0.951229*
5 *101.0564* *0.4685* *× 0.951229*
6 *93.7270* *5.6212* *× 0.951229*
7 *102.4177* *4.0775* *× 0.951229*

8 — —

### A Numerical Example (continued)

*• We regress y on 1, x, and x*^{2}.

*• The result is*

*f (x) = 22.08* *− 0.313114 × x + 0.00106918 × x*^{2}*.*

*• f(x) estimates the continuation value conditional on the*
stock price at year 2.

*• We next compare the immediate exercise value and the*
continuation value.^{a}

aThe *f(102.4177) entry on the next page was corrected by Mr. Du,*
Yung-Szu (B79503054, R83503086) on May 25, 2017.

### A Numerical Example (continued)

Optimal early exercise decision at year 2
Path Exercise Continuation
1 12.4185 *f (92.5815) = 2.2558*

2 — —

3 1.3990 *f (103.6010) = 1.1168*
4 6.2880 *f (98.7120) = 1.5901*
5 3.9436 *f (101.0564) = 1.3568*
6 11.2730 *f (93.7270) = 2.1253*
7 2.5823 *f (102.4177) = 1.2266*

8 — —

### A Numerical Example (continued)

*• Amazingly, the put should be exercised in all 6 paths: 1,*
3, 4, 5, 6, 7.

*• Now, any positive cash ﬂow at year 3 should be set to*
zero or overridden for these paths as the put is exercised
before year 3 (p. 851).

**– They are paths 5, 6, 7.**

*• The cash ﬂows on p. 855 become the ones on next slide.*

### A Numerical Example (continued)

Cash flows at years 2 & 3

Path Year 0 Year 1 Year 2 Year 3

1 — — 12.4185 0

2 — — 0 2.5476

3 — — 1.3990 0

4 — — 6.2880 0

5 — — 3.9436 0

6 — — 11.2730 0

7 — — 2.5823 0

8 — — 0 0

### A Numerical Example (continued)

*• We move on to year 1.*

*• For each state that is in the money at year 1, we must*
decide whether to exercise it.

*• There are 5 paths for which the put is in the money: 1,*
2, 4, 6, 8 (p. 851).

*• Only in-the-money paths will be used in the regression*
because they are where early exercise is relevant.

**– If there were none, we would move on to year 0.**

### A Numerical Example (continued)

*• Let x denote the stock prices at year 1 for those 5 paths.*

*• Let y denote the corresponding discounted future cash*
ﬂows if the put is not exercised at year 1.

*• From p. 863, we have the following table.*

### A Numerical Example (continued)

Regression at year 1

Path *x* *y*

1 97.6424 *12.4185 × 0.951229*

2 101.2103 *2.5476 × 0.951229*^{2}

3 — —

4 96.4411 *6.2880 × 0.951229*

5 — —

6 95.8375 *11.2730 × 0.951229*

7 — —

8 104.1475 0

### A Numerical Example (continued)

*• We regress y on 1, x, and x*^{2}.

*• The result is*

*f (x) =* *−420.964 + 9.78113 × x − 0.0551567 × x*^{2}*.*

*• f(x) estimates the continuation value conditional on the*
stock price at year 1.

*• We next compare the immediate exercise value and the*
continuation value.

### A Numerical Example (continued)

Optimal early exercise decision at year 1

Path Exercise Continuation

1 7.3576 *f (97.6424) = 8.2230*

2 3.7897 *f (101.2103) = 3.9882*

3 — —

4 8.5589 *f (96.4411) = 9.3329*

5 — —

6 9.1625 *f (95.8375) = 9.8304*

7 — —

8 0.8525 *f (104.1475) < 0*

### A Numerical Example (continued)

*• The put should be exercised for 1 path only: 8.*

**– Note that f(104.1475) < 0.**

*• Now, any positive future cash ﬂow should be set to zero*
or overridden for this path.

**– But there is none.**

*• The cash ﬂows on p. 863 become the ones on next slide.*

*• They also conﬁrm the plot on p. 854.*

### A Numerical Example (continued)

Cash flows at years 1, 2, & 3

Path Year 0 Year 1 Year 2 Year 3

1 — 0 12.4185 0

2 — 0 0 2.5476

3 — 0 1.3990 0

4 — 0 6.2880 0

5 — 0 3.9436 0

6 — 0 11.2730 0

7 — 0 2.5823 0

8 — 0.8525 0 0

### A Numerical Example (continued)

*• We move on to year 0.*

*• The continuation value is, from p. 870,*

(12.4185 × 0.951229^{2} + 2.5476 × 0.951229^{3}
+ 1.3990 × 0.951229^{2} + 6.2880 × 0.951229^{2}
+ 3.9436 × 0.951229^{2} + 11.2730 × 0.951229^{2}
+ 2.5823 × 0.951229^{2} + 0.8525 × 0.951229)/8
= 4.66263.

### A Numerical Example (concluded)

*• As this is larger than the immediate exercise value of*
105 *− 101 = 4,*

the put should not be exercised at year 0.

*• Hence the put’s value is estimated to be 4.66263.*

*• Compare this with the European put’s value of 1.3680*
(p. 856).
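The whole example can be replicated in a few lines (a sketch of the LSM steps walked through above, using the eight paths and the basis 1, x, x²):

```python
import numpy as np

# Least-squares Monte Carlo on the eight illustrative paths above:
# strike 105, annual discount factor 0.951229, regression on 1, x, x^2
# over in-the-money paths only, working backward from year 3.
paths = np.array([
    [101.0,  97.6424,  92.5815, 107.5178],
    [101.0, 101.2103, 105.1763, 102.4524],
    [101.0, 105.7802, 103.6010, 124.5115],
    [101.0,  96.4411,  98.7120, 108.3600],
    [101.0, 124.2345, 101.0564, 104.5315],
    [101.0,  95.8375,  93.7270,  99.3788],
    [101.0, 108.9554, 102.4177, 100.9225],
    [101.0, 104.1475, 113.2516, 115.0994],
])
X, d = 105.0, 0.951229
last = paths.shape[1] - 1                     # year 3

cash = np.zeros_like(paths)                   # cash-flow matrix
cash[:, last] = np.maximum(X - paths[:, last], 0.0)
for t in range(last - 1, 0, -1):              # years 2, then 1
    itm = X - paths[:, t] > 0.0               # in-the-money paths only
    # each path's future cash flow, discounted back to year t
    disc = sum(cash[:, s] * d ** (s - t) for s in range(t + 1, last + 1))
    coef = np.polyfit(paths[itm, t], disc[itm], 2)
    cont = np.polyval(coef, paths[:, t])      # continuation value estimate
    ex = itm & (X - paths[:, t] > cont)
    cash[ex, t] = X - paths[ex, t]            # exercise: record cash flow
    cash[ex, t + 1:] = 0.0                    # and override later ones
cont0 = sum(cash[:, s] * d ** s for s in range(1, last + 1)).mean()
price = max(cont0, X - paths[0, 0])           # compare with exercise now
```

This reproduces the put value estimated above.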

*Time Series Analysis*

The historian is a prophet in reverse.

— Friedrich von Schlegel (1772–1829)

### GARCH Option Pricing^{a}

*• Options can be priced when the underlying asset’s*
return follows a GARCH process.

*• Let S*_{t}*denote the asset price at date t.*

*• Let h*^{2}_{t}*be the conditional variance of the return over*
*the period [ t, t + 1 ] given the information at date t.*

**– “One day” is merely a convenient term for any**
*elapsed time Δt.*

aARCH (autoregressive conditional heteroskedastic) is due to Engle (1982), co-winner of the 2003 Nobel Prize in Economic Sciences. GARCH (generalized ARCH) is due to Bollerslev (1986) and Taylor (1986). A Bloomberg quant said to me on Feb 29, 2008, that GARCH is seldom used in trading.

### GARCH Option Pricing (continued)

*• Adopt the following risk-neutral process for the price*
dynamics:^{a}

ln(S_{t+1}/S_{t}) = r − h^{2}_{t}/2 + h_{t}ε_{t+1}, (112)

where

h^{2}_{t+1} = β_{0} + β_{1}h^{2}_{t} + β_{2}h^{2}_{t}(ε_{t+1} − c)^{2}, (113)

ε_{t+1} ∼ N(0, 1) given information at date t,

r = daily riskless return,

c ≥ 0.

### GARCH Option Pricing (continued)

*• The ﬁve unknown parameters of the model are c, h*0*, β*_{0},
*β*_{1}*, and β*_{2}.

*• It is postulated that β*0*, β*_{1}*, β*_{2} *≥ 0 to make the*
conditional variance positive.

*• There are other inequalities to satisfy (see text).*

*• The above process is called the nonlinear asymmetric*
GARCH (or NGARCH) model.

### GARCH Option Pricing (continued)

*• It captures the volatility clustering in asset returns ﬁrst*
noted by Mandelbrot (1963).^{a}

**– When c = 0, a large ε_{t+1} results in a large h_{t+1},**
*which in turn tends to yield a large h_{t+2}, and so on.*

*• It also captures the negative correlation between the*
asset return and changes in its (conditional) volatility.^{b}

**– For c > 0, a positive ε_{t+1} (good news) tends to**
*decrease h_{t+1}, whereas a negative ε_{t+1} (bad news)*
tends to do the opposite.

a“*. . . large changes tend to be followed by large changes—of either*
sign—and small changes tend to be followed by small changes *. . . ”*

b

### GARCH Option Pricing (concluded)

*• With y_{t} =^{Δ} ln S_{t} denoting the logarithmic price, the*
model becomes

y_{t+1} = y_{t} + r − h^{2}_{t}/2 + h_{t}ε_{t+1}. (114)

*• The pair (y_{t}, h^{2}_{t}) completely describes the current state.*

*• The conditional mean and variance of y_{t+1} are clearly*

E[ y_{t+1} | y_{t}, h^{2}_{t} ] = y_{t} + r − h^{2}_{t}/2, (115)

Var[ y_{t+1} | y_{t}, h^{2}_{t} ] = h^{2}_{t}. (116)

### GARCH Model: Inferences

*• Suppose the parameters c, h_{0}, β_{0}, β_{1}, and β_{2} are given.*

*• Then we can recover h_{1}, h_{2}, . . . , h_{n} and ε_{1}, ε_{2}, . . . , ε_{n}*
from the prices

S_{0}, S_{1}, . . . , S_{n}

under the GARCH model (112) on p. 876.

*• This property is useful in statistical inferences.*
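A sketch of this recovery (all parameter values are illustrative): simulate prices forward under (112)–(113) with known parameters, then invert the recursion using the prices alone.

```python
import numpy as np

# Under the NGARCH model (112)-(113), the h_t and eps_t series can be
# recovered exactly from prices when the parameters are known: invert
# (112) for eps_{t+1}, then update h_{t+1} via (113).
def recover(S, h0, r, b0, b1, b2, c):
    h, hs, eps = h0, [h0], []
    for t in range(len(S) - 1):
        e = (np.log(S[t + 1] / S[t]) - r + h**2 / 2) / h       # invert (112)
        h = np.sqrt(b0 + b1 * h**2 + b2 * h**2 * (e - c) ** 2)  # (113)
        eps.append(e)
        hs.append(h)
    return np.array(hs), np.array(eps)

r, b0, b1, b2, c, h0 = 0.0002, 1e-6, 0.9, 0.04, 0.5, 0.01
rng = np.random.default_rng(9)
S, h, true_h, true_eps = [100.0], h0, [h0], []
for _ in range(50):                    # simulate 50 daily prices forward
    e = rng.standard_normal()
    S.append(S[-1] * np.exp(r - h**2 / 2 + h * e))
    h = np.sqrt(b0 + b1 * h**2 + b2 * h**2 * (e - c) ** 2)
    true_h.append(h)
    true_eps.append(e)
hs, eps = recover(S, h0, r, b0, b1, b2, c)
```

The recovered series match the simulated ones, which is what makes likelihood-based inference from price data workable.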

### The Ritchken-Trevor (RT) Algorithm^{a}

*• The GARCH model is a continuous-state model.*

*• To approximate it, we turn to trees with discrete states.*

*• Path dependence in GARCH makes the tree for asset*
prices explode exponentially (why?).

*• We need to mitigate this combinatorial explosion.*

aRitchken & Trevor (1999).