# Problems with the Bump-and-Revalue Method


### Delta and Common Random Numbers

• In estimating delta, it is natural to start with the finite-difference estimate

e^{−rτ} ( E[ P(S + ε) ] − E[ P(S − ε) ] )/(2ε).

– P(x) is the terminal payoff of the derivative security when the underlying asset's initial price equals x.

• Use simulation to estimate E[ P(S + ε) ] first.

• Use another simulation to estimate E[ P(S − ε) ].

• Finally, apply the formula to approximate the delta.

### Delta and Common Random Numbers (concluded)

• This method is not recommended because of its high variance.

• A much better approach is to use common random numbers to lower the variance:

e^{−rτ} E[ ( P(S + ε) − P(S − ε) )/(2ε) ].

• Here, the same random numbers are used for P(S + ε) and P(S − ε).

• This holds for gamma and cross gammas (for multivariate derivatives).
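To make the variance contrast concrete, here is a minimal Python sketch of the common-random-numbers estimator for the delta of a vanilla call (the Black-Scholes setting and all parameter values are my own illustration, not from the text):

```python
import math
import numpy as np

def crn_delta(S, X, r, sigma, T, eps, n_paths, seed=0):
    """Finite-difference delta of a European call with common random numbers.

    The same normal draws z drive both the bumped-up and the bumped-down
    paths, so the two payoffs are highly correlated and their difference
    has low variance.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)        # shared draws: the CRN trick
    growth = np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoff_up = np.maximum((S + eps) * growth - X, 0.0)
    payoff_dn = np.maximum((S - eps) * growth - X, 0.0)
    return math.exp(-r * T) * (payoff_up - payoff_dn).mean() / (2.0 * eps)

def bs_delta(S, X, r, sigma, T):
    """Black-Scholes delta N(d1), for comparison."""
    d1 = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
```

With 100,000 paths the estimate lands close to the analytic delta; computing the two expectations from independent draws would be far noisier at the same cost.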


### Problems with the Bump-and-Revalue Method

• Consider the binary option with payoff

1, if S(T) > X; 0, otherwise.

• Then

P(S + ε) − P(S − ε) = 1, if the terminal price from the bumped-up path exceeds X while that from the bumped-down path does not; 0, otherwise.

• So the finite-difference estimate per run for the (undiscounted) delta is 0 or O(1/ε).

### Problems with the Bump-and-Revalue Method (concluded)

• The price of the binary option equals e^{−rτ} N(x − σ√τ).

• Its delta is e^{−rτ} N′(x − σ√τ)/(Sσ√τ).

### Gamma

• The finite-difference formula for gamma is

e^{−rτ} E[ ( P(S + ε) − 2 × P(S) + P(S − ε) )/ε² ].

• For a correlation option with multiple underlying assets, the finite-difference formula for the cross gamma ∂²P(S1, S2, . . . )/(∂S1 ∂S2) is:

e^{−rτ} E[ ( P(S1 + ε1, S2 + ε2) − P(S1 − ε1, S2 + ε2) − P(S1 + ε1, S2 − ε2) + P(S1 − ε1, S2 − ε2) )/(4ε1ε2) ].

### Gamma (continued)

• Choosing an ε of the right magnitude can be challenging.

– If ε is too large, inaccurate Greeks result.

– If ε is too small, unstable Greeks result.

• This phenomenon is sometimes called the curse of differentiation.a

a Aït-Sahalia & Lo (1998); Bondarenko (2003).

### Gamma (continued)

• In general, suppose

∂^i e^{−rτ} E[ P(S) ]/∂θ^i = e^{−rτ} E[ ∂^i P(S)/∂θ^i ]

holds for all i > 0, where θ is a parameter of interest.

– A common requirement is Lipschitz continuity.

• Then formulas for the Greeks become integrals.

• As a result, we avoid ε, finite differences, and resimulation.

### Gamma (continued)

• This is indeed possible for a broad class of payoff functions.a

– Roughly speaking, any payoff function that is equal to a sum of products of differentiable functions and indicator functions with the right kind of support.

– For example, the payoff of a call is

max(S(T) − X, 0) = (S(T) − X) I{ S(T) − X ≥ 0 }.

– The results are too technical to cover here (see next page).

a Teng (R91723054) (2004); Lyuu & Teng (R91723054) (2011).

### Gamma (continued)

• Suppose h(θ, x) ∈ H with pdf f(x) for x, and g_j(θ, x) ∈ G for j ∈ B, a finite set of natural numbers.

• Then

∂/∂θ ∫ h(θ, x) ∏_{j∈B} 1_{{ g_j(θ,x) > 0 }}(x) f(x) dx

= ∫ h_θ(θ, x) ∏_{j∈B} 1_{{ g_j(θ,x) > 0 }}(x) f(x) dx

+ Σ_{l∈B} [ h(θ, x) J_l(θ, x) ∏_{j∈B∖{l}} 1_{{ g_j(θ,x) > 0 }}(x) f(x) ]_{x=χ_l(θ)},

where

J_l(θ, x) = sign( ∂g_l(θ, x)/∂x ) × ( ∂g_l(θ, x)/∂θ )/( ∂g_l(θ, x)/∂x ) for l ∈ B.

### Gamma (concluded)

• Similar results have been derived for Lévy processes.a

• Formulas have also recently been obtained for credit derivatives.b

• In queueing networks, this is called infinitesimal perturbation analysis (IPA).c

a Lyuu, Teng (R91723054), & S. Wang (2013).

b Lyuu, Teng (R91723054), & Tzeng (2014).

c Cao (1985); Ho & Cao (1985).

### Biases in Pricing Continuously Monitored Options with Monte Carlo

• We are asked to price a continuously monitored up-and-out call with barrier H.

• The Monte Carlo method samples the stock price at n discrete time points t1, t2, . . . , tn.

• A sample path

S(t0), S(t1), . . . , S(tn) is produced.

– Here, t0 = 0 is the current time, and tn = T is the maturity.

### Biases in Pricing Continuously Monitored Options with Monte Carlo (continued)

• If all of the sampled prices are below the barrier, this sample path pays max(S(tn) − X, 0).

• Repeating these steps and averaging the payoﬀs yield a Monte Carlo estimate.


1: C := 0;

2: for i = 1, 2, 3, . . . , N do

3: P := S; hit := 0;

4: for j = 1, 2, 3, . . . , n do

5: P := P × e^{(r − σ²/2)(T/n) + σ√(T/n) ξ};

6: if P ≥ H then

7: hit := 1;

8: break;

9: end if

10: end for

11: if hit = 0 then

12: C := C + max(P − X, 0);

13: end if

14: end for

15: return C e^{−rT}/N;
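A direct Python transcription of the pseudocode above (function and variable names are mine) makes the discrete monitoring explicit; the barrier is checked only at the n sampled dates:

```python
import math
import numpy as np

def up_and_out_call_discrete(S, X, H, r, sigma, T, n, N, seed=0):
    """Monte Carlo price of an up-and-out call with a discretely checked barrier.

    Paths that cross H between the n sampled dates are missed, which is
    exactly the source of the bias discussed in the text.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    total = 0.0
    for _ in range(N):
        P, hit = S, False
        for _ in range(n):
            xi = rng.standard_normal()          # xi ~ N(0, 1)
            P *= math.exp((r - 0.5 * sigma**2) * dt
                          + sigma * math.sqrt(dt) * xi)
            if P >= H:                          # barrier check at a sample date
                hit = True
                break
        if not hit:
            total += max(P - X, 0.0)
    return math.exp(-r * T) * total / N
```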


### Biases in Pricing Continuously Monitored Options with Monte Carlo (continued)

• This estimate is biased.a

– Suppose none of the sampled prices on a sample path equals or exceeds the barrier H.

– It remains possible for the continuous sample path that passes through them to hit the barrier between sampled time points (see plot on next page).

a Shevchenko (2003).

[Figure: a sample path whose sampled prices all lie below the barrier H, yet whose continuous interpolation crosses H between two sampled dates.]

### Biases in Pricing Continuously Monitored Options with Monte Carlo (concluded)

• The bias can certainly be lowered by increasing the number of observations along the sample path.

• However, even daily sampling may not suﬃce.

• The computational cost also rises as a result.


### Brownian Bridge Approach to Pricing Barrier Options

• We desire an unbiased estimate which can be calculated eﬃciently.

• The above-mentioned payoﬀ should be multiplied by the probability p that a continuous sample path does not hit the barrier conditional on the sampled prices.

• This methodology is called the Brownian bridge approach.

• Formally, we have

p = Prob[ S(t) < H, 0 ≤ t ≤ T | S(t0), S(t1), . . . , S(tn) ].

### Brownian Bridge Approach to Pricing Barrier Options (continued)

• As a barrier is hit over a time interval if and only if the maximum stock price over that period is at least H,

p = Prob[ max_{0≤t≤T} S(t) < H | S(t0), S(t1), . . . , S(tn) ].

• Luckily, the conditional distribution of the maximum over a time interval given the beginning and ending stock prices is known.


### Brownian Bridge Approach to Pricing Barrier Options (continued)

Lemma 23 Assume S follows dS/S = μ dt + σ dW, and define

ζ(x) = exp[ −2 ln(x/S(t)) ln(x/S(t + Δt))/(σ²Δt) ].

(1) If H > max(S(t), S(t + Δt)), then

Prob[ max_{t≤u≤t+Δt} S(u) < H | S(t), S(t + Δt) ] = 1 − ζ(H).

(2) If h < min(S(t), S(t + Δt)), then

Prob[ min_{t≤u≤t+Δt} S(u) > h | S(t), S(t + Δt) ] = 1 − ζ(h).

### Brownian Bridge Approach to Pricing Barrier Options (continued)

• Lemma 23 gives the probability that the barrier is not hit in a time interval, given the starting and ending stock prices.

• For our up-and-out call,a choose n = 1.

• As a result,

p = 1 − exp[ −2 ln(H/S(0)) ln(H/S(T))/(σ²T) ], if H > max(S(0), S(T)); 0, otherwise.

a So S(0) < H.

### Brownian Bridge Approach to Pricing Barrier Options (continued)

The following algorithm works for up-and-out and down-and-out calls.

1: C := 0;

2: for i = 1, 2, 3, . . . , N do

3: P := S × e^{(r − q − σ²/2)T + σ√T ξ};

4: if (S < H and P < H) or (S > H and P > H) then

5: C := C + max(P − X, 0) × (1 − exp[ −2 ln(H/S) × ln(H/P)/(σ²T) ]);

6: end if

7: end for

8: return C e^{−rT}/N;
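The algorithm can be sketched in Python as follows (for the up-and-out case only; names are mine). The helper implements Lemma 23 with a single time step:

```python
import math
import numpy as np

def survival_prob(S0, ST, H, sigma, T):
    """Probability that the continuous path stays below barrier H over [0, T],
    given its two endpoints (Lemma 23 with one time step)."""
    if H <= max(S0, ST):
        return 0.0
    return 1.0 - math.exp(-2.0 * math.log(H / S0) * math.log(H / ST)
                          / (sigma**2 * T))

def up_and_out_call_bridge(S, X, H, r, sigma, T, N, q=0.0, seed=0):
    """Unbiased Brownian-bridge Monte Carlo price of an up-and-out call;
    only the terminal price is sampled (n = 1)."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(N)
    P = S * np.exp((r - q - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * xi)
    C = sum(max(p - X, 0.0) * survival_prob(S, p, H, sigma, T) for p in P)
    return math.exp(-r * T) * C / N
```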


### Brownian Bridge Approach to Pricing Barrier Options (concluded)

• The idea can be generalized.

• For example, we can handle more complex barrier options.

• Consider an up-and-out call with barrier Hi for the time interval (ti, ti+1 ], 0 ≤ i < n.

• This option thus contains n barriers.

• Multiply the probabilities for the n time intervals to obtain the desired probability adjustment term.


### Variance Reduction

• The statistical eﬃciency of Monte Carlo simulation can be measured by the variance of its output.

• If this variance can be lowered without changing the expected value, fewer replications are needed.

• Methods that improve eﬃciency in this manner are called variance-reduction techniques.

• Such techniques become practical when the added costs are outweighed by the reduction in sampling.


### Variance Reduction: Antithetic Variates

• We are interested in estimating E[ g(X1, X2, . . . , Xn) ].

• Let Y1 and Y2 be random variables with the same distribution as g(X1, X2, . . . , Xn).

• Then

Var[ (Y1 + Y2)/2 ] = Var[ Y1 ]/2 + Cov[ Y1, Y2 ]/2.

– Var[ Y1 ]/2 is the variance of the Monte Carlo method with two independent replications.

• The variance Var[ (Y1 + Y2)/2 ] is smaller than

Var[ Y1 ]/2 when Y1 and Y2 are negatively correlated.


### Variance Reduction: Antithetic Variates (continued)

• For each simulated sample path X, a second one is obtained by reusing the random numbers on which the ﬁrst path is based.

• This yields a second sample path Y .

• Two estimates are then obtained: One based on X and the other on Y .

• If N independent sample paths are generated, the antithetic-variates estimator averages over 2N

estimates.


### Variance Reduction: Antithetic Variates (continued)

• Consider the process dX = a_t dt + b_t √dt ξ.

• Let g be a function of the n samples X1, X2, . . . , Xn on the sample path.

• We are interested in E[ g(X1, X2, . . . , Xn) ].

• Suppose one simulation run has realizations ξ1, ξ2, . . . , ξn for the normally distributed fluctuation term ξ.

• This generates samples x1, x2, . . . , xn.

• The estimate is then g(x), where x = (x1, x2, . . . , xn).

### Variance Reduction: Antithetic Variates (concluded)

• The antithetic-variates method does not sample n more numbers from ξ for the second estimate g(x′).

• Instead, generate the second sample path x′ = (x′1, x′2, . . . , x′n) from −ξ1, −ξ2, . . . , −ξn.

• Compute g(x′).

• Output (g(x) + g(x′))/2.

• Repeat the above steps for as many times as required by accuracy.
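The steps above can be sketched in Python with a toy integrand (estimating E[ max(Z, 0) ] for Z ~ N(0, 1), whose true value is 1/√(2π); the example is mine, not the text's):

```python
import numpy as np

def antithetic_estimate(n_pairs, seed=0):
    """Antithetic-variates estimate of E[max(Z, 0)], Z ~ N(0, 1).

    Each draw xi is reused with its sign flipped, and the two
    estimates are averaged pairwise."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n_pairs)
    g = np.maximum(xi, 0.0)          # estimate from the original draws
    g_anti = np.maximum(-xi, 0.0)    # estimate from the antithetic draws
    pair_avg = 0.5 * (g + g_anti)
    return pair_avg.mean(), pair_avg.var()

def plain_estimate(n, seed=1):
    """Plain Monte Carlo estimate of the same quantity, for comparison."""
    rng = np.random.default_rng(seed)
    g = np.maximum(rng.standard_normal(n), 0.0)
    return g.mean(), g.var()
```

Because max(z, 0) is monotone in z, the paired estimates are negatively correlated and the per-pair variance drops well below the independent-sampling variance.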


### Variance Reduction: Conditioning

• We are interested in estimating E[ X ].

• Suppose there is a random variable Z such that

E[ X | Z = z ] can be eﬃciently and precisely computed.

• E[ X ] = E[ E[ X | Z ] ] by the law of iterated conditional expectations.

• Hence the random variable E[ X | Z ] is also an unbiased estimator of E[ X ].


### Variance Reduction: Conditioning (concluded)

• As

Var[ E[ X | Z ] ] ≤ Var[ X ],

E[ X | Z ] has a smaller variance than observing X directly.

• First obtain a random observation z on Z.

• Then calculate E[ X | Z = z ] as our estimate.

– There is no need to resort to simulation in computing E[ X | Z = z ].

• The procedure can be repeated a few times to reduce the variance.
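As a toy sketch of the conditioning idea (the example is mine): to estimate Prob[ Z1 + Z2 > 2 ] for independent standard normals, condition on Z1 and use the closed-form E[ X | Z1 ] = 1 − Φ(2 − Z1) instead of the raw 0/1 indicator:

```python
import math
import numpy as np

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def estimate_tail_prob(n, seed=0):
    """Estimate Prob[Z1 + Z2 > 2] two ways: raw indicators vs. the
    conditional estimator E[X | Z1] = 1 - Phi(2 - Z1)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    raw = (z1 + z2 > 2.0).astype(float)                # 0/1 samples of X
    cond = np.array([1.0 - phi(2.0 - z) for z in z1])  # samples of E[X | Z1]
    return raw.mean(), raw.var(), cond.mean(), cond.var()
```

Both estimators are unbiased for the same probability (about 0.0786), but the conditioned samples are smooth numbers in (0, 1) with visibly smaller variance.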


### Control Variates

• Use the analytic solution of a similar yet simpler problem to improve the solution.

• Suppose we want to estimate E[ X ] and there exists a random variable Y with a known mean μ = E[ Y ].

• Then W = X + β(Y − μ) can serve as a “controlled” estimator of E[ X ] for any constant β.

– However β is chosen, W remains an unbiased estimator of E[ X ] as

E[ W ] = E[ X ] + β E[ Y − μ ] = E[ X ].

### Control Variates (continued)

• Note that

Var[ W ] = Var[ X ] + β² Var[ Y ] + 2β Cov[ X, Y ]. (110)

• Hence W is less variable than X if and only if

β² Var[ Y ] + 2β Cov[ X, Y ] < 0. (111)

### Control Variates (concluded)

• The success of the scheme clearly depends on both β and the choice of Y .

– For example, arithmetic average-rate options can be priced by choosing Y to be the otherwise identical geometric average-rate option’s price and β = −1.

• This approach is much more eﬀective than the antithetic-variates method.


### Choice of Y

• In general, the choice of Y is ad hoc,a and experiments must be performed to confirm the wisdom of the choice.

• Try to match calls with calls and puts with puts.

• On many occasions, Y is a discretized version of the derivative that gives μ.

– Discretely monitored geometric average-rate option vs. the continuously monitored geometric average-rate option given by formulas (50) on p. 401.

a But see Dai (B82506025, R86526008, D8852600), Chiu (R94922072), & Lyuu (2015).

### Optimal Choice of β

• For some choices, the discrepancy can be significant, such as for the lookback option.a

• Equation (110) on p. 826 is minimized when β = −Cov[ X, Y ]/Var[ Y ].

– It is called beta in the book.

• For this specific β,

Var[ W ] = Var[ X ] − Cov[ X, Y ]²/Var[ Y ] = (1 − ρ²_{X,Y}) Var[ X ],

where ρ_{X,Y} is the correlation between X and Y.

a Contributed by Mr. Tsai, Hwai (R92723049) on May 12, 2004.

### Optimal Choice of β (continued)

• Note that the variance can never be increased with the optimal choice.

• Furthermore, the stronger X and Y are correlated, the greater the reduction in variance.

• For example, if this correlation is nearly perfect (±1), we could control X almost exactly.


### Optimal Choice of β (continued)

• Typically, neither Var[ Y ] nor Cov[ X, Y ] is known.

• Therefore, we cannot obtain the maximum reduction in variance.

• We can guess these values and hope that the resulting W does indeed have a smaller variance than X.

• A second possibility is to use the simulated data to estimate these quantities.

– How to do it eﬃciently in terms of time and space?
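One simple answer is to estimate β from the same simulated data, as in this toy Python sketch (estimating E[ e^U ] for U ~ Uniform(0, 1) with control Y = U and known μ = 1/2; the integrand is my illustration, not the text's option example):

```python
import numpy as np

def control_variate_estimate(n, seed=0):
    """Controlled estimate of E[exp(U)], U ~ Uniform(0,1); true value e - 1.

    beta is the sample version of -Cov[X, Y]/Var[Y], computed from the
    same replications used for the estimate."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    x = np.exp(u)                  # the quantity of interest
    y = u                          # control variate with known mean 0.5
    beta = -np.cov(x, y)[0, 1] / np.var(y, ddof=1)
    w = x + beta * (y - 0.5)       # controlled estimator
    return x.mean(), x.var(), w.mean(), w.var()
```

Since e^U and U are almost perfectly correlated, the controlled estimator's variance collapses to a small fraction of the plain one's. (Estimating β from the same data introduces a small bias, which vanishes as the sample grows.)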


### Optimal Choice of β (concluded)

• Observe that −β has the same sign as the correlation between X and Y.

• Hence, if X and Y are positively correlated, then β < 0, and X is adjusted downward whenever Y > μ and upward otherwise.

• The opposite is true when X and Y are negatively correlated, in which case β > 0.

• Suppose a suboptimal β + ε is used instead.

• The variance increases by only ε² Var[ Y ].

### A Pitfall

• A potential pitfall is to sample X and Y independently.

• In this case, Cov[ X, Y ] = 0.

• Equation (110) on p. 826 becomes

Var[ W ] = Var[ X ] + β2 Var[ Y ].

• So whatever Y is, the variance is increased!

• Lesson: X and Y must be correlated.


### Problems with the Monte Carlo Method

• The error bound is only probabilistic.

• The probabilistic error bound of O(1/√N) does not benefit from the regularity of the integrand function.

• The requirement that the points be independent random samples is wasteful because of clustering.

• In reality, pseudorandom numbers generated by completely deterministic means are used.

• Monte Carlo simulation exhibits a great sensitivity to the seed of the pseudorandom-number generator.

## Matrix Computation


To set up a philosophy against physics is rash;

philosophers who have done so have always ended in disaster.

— Bertrand Russell


### Definitions and Basic Results

• Let A = [ a_ij ]_{1≤i≤m, 1≤j≤n}, or simply A ∈ R^{m×n}, denote an m × n matrix.

• It can also be represented as [ a1, a2, . . . , an ], where ai ∈ R^m are vectors.

– Vectors are column vectors unless stated otherwise.

• A is a square matrix when m = n.

• The rank of a matrix is the largest number of linearly independent columns.


### Definitions and Basic Results (continued)

• A square matrix A is said to be symmetric if AT = A.

• A real n × n matrix A = [ a_ij ]_{i,j} is diagonally dominant if | a_ii | > Σ_{j≠i} | a_ij | for 1 ≤ i ≤ n.

– Such matrices are nonsingular.

• The identity matrix is the square matrix I = diag[ 1, 1, . . . , 1 ].

### Definitions and Basic Results (concluded)

• A matrix has full column rank if its columns are linearly independent.

• A real symmetric matrix A is positive definite if x^T A x = Σ_{i,j} a_ij x_i x_j > 0 for any nonzero vector x.

• A matrix A is positive definite if and only if there exists a matrix W such that A = W^T W and W has full column rank.

### Cholesky Decomposition

• Positive deﬁnite matrices can be factored as A = LLT,

called the Cholesky decomposition.

– Above, L is a lower triangular matrix.


### Generation of Multivariate Distribution

• Let x = [ x1, x2, . . . , xn ]^T be a vector random variable with a positive definite covariance matrix C.

• As usual, assume E[ x ] = 0.

• This covariance structure can be matched by P y.

– C = P P^T is the Cholesky decomposition of C.a

– y = [ y1, y2, . . . , yn ]^T is a vector random variable with a covariance matrix equal to the identity matrix.

a What if C is not positive definite? See Lai (R93942114) & Lyuu (2007).

### Generation of Multivariate Normal Distribution

• Suppose we want to generate the multivariate normal distribution with a covariance matrix C = P PT.

– First, generate independent standard normal random variables y1, y2, . . . , yn.

– Then

P [ y1, y2, . . . , yn ]T has the desired distribution.

– These steps can then be repeated.
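The recipe can be sketched with NumPy (the 2 × 2 covariance matrix is an arbitrary illustration):

```python
import numpy as np

# Factor C = P P^T, then map i.i.d. standard normals y through P.
C = np.array([[1.0, 0.6],
              [0.6, 2.0]])
P = np.linalg.cholesky(C)               # lower triangular, C = P @ P.T

rng = np.random.default_rng(0)
y = rng.standard_normal((200_000, 2))   # independent N(0, 1) components
x = y @ P.T                             # rows of x have covariance C

emp_C = np.cov(x, rowvar=False)         # empirical covariance approaches C
```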


### Multivariate Derivatives Pricing

• Generating the multivariate normal distribution is essential for the Monte Carlo pricing of multivariate derivatives (pp. 748ﬀ).

• For example, the rainbow option on k assets has payoﬀ max(max(S1, S2, . . . , Sk) − X, 0)

at maturity.

• The closed-form formula is a multi-dimensional integral.a

a Johnson (1987); Chen (D95723006) & Lyuu (2009).

### Multivariate Derivatives Pricing (concluded)

• Suppose dS_j/S_j = r dt + σ_j dW_j, 1 ≤ j ≤ k, where C is the correlation matrix for dW1, dW2, . . . , dWk.

• Let C = P P^T.

• Let ξ consist of k independent random variables from N(0, 1).

• Let ξ′ = P ξ.

• Similar to Eq. (109) on p. 791,

S_j(t + Δt) = S_j(t) e^{(r − σ_j²/2) Δt + σ_j √Δt ξ′_j}, 1 ≤ j ≤ k.

### Least-Squares Problems

• The least-squares (LS) problem is concerned with

min_{x∈R^n} ‖ Ax − b ‖, where A ∈ R^{m×n}, b ∈ R^m, and m ≥ n.

• The LS problem is called regression analysis in statistics and is equivalent to minimizing the mean-square error.

• Often written as

Ax = b.


### Polynomial Regression

• In polynomial regression, x0 + x1 x + · · · + xn x^n is used to fit the data { (a1, b1), (a2, b2), . . . , (am, bm) }.

• This leads to the LS problem

⎡ 1  a1  a1²  · · ·  a1^n ⎤ ⎡ x0 ⎤   ⎡ b1 ⎤
⎢ 1  a2  a2²  · · ·  a2^n ⎥ ⎢ x1 ⎥   ⎢ b2 ⎥
⎢ ...         . ..    ... ⎥ ⎢ ... ⎥ = ⎢ ... ⎥
⎣ 1  am  am²  · · ·  am^n ⎦ ⎣ xn ⎦   ⎣ bm ⎦

• Consult p. 273 of the textbook for solutions.
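For reference, the Vandermonde system can be set up and solved with NumPy's least-squares routine (the data below are a synthetic exact quadratic, so the fitted coefficients are recovered exactly):

```python
import numpy as np

a = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = 2.0 - 3.0 * a + 0.5 * a**2           # exact quadratic data

A = np.vander(a, N=3, increasing=True)   # columns: 1, a, a^2
coef, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||A coef - b||
```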


### American Option Pricing by Simulation

• The continuation value of an American option is the conditional expectation of the payoﬀ from keeping the option alive now.

• The option holder must compare the immediate exercise value and the continuation value.

• In standard Monte Carlo simulation, each path is treated independently of other paths.

• But the decision to exercise the option cannot be reached by looking at one path alone.


### The Least-Squares Monte Carlo Approach

• The continuation value can be estimated from the cross-sectional information in the simulation by using least squares.a

• The result is a function (of the state) for estimating the continuation values.

• Use the function to estimate the continuation value for each path to determine its cash ﬂow.

• This is called the least-squares Monte Carlo (LSM) approach.


### The Least-Squares Monte Carlo Approach (concluded)

• The LSM is provably convergent.a

• The LSM can be easily parallelized.b

– Partition the paths into subproblems and perform LSM on each of them independently.

– The speedup is close to linear (i.e., proportional to the number of cores).

• Surprisingly, accuracy is not aﬀected.

a Clément, Lamberton, & Protter (2002); Stentoft (2004).

b Huang (B96902079, R00922018) (2013); Chen (B97902046, R01922005) (2014); Chen (B97902046, R01922005), Huang (B96902079, R00922018) & Lyuu (2015).

### A Numerical Example

• Consider a 3-year American put on a non-dividend-paying stock.

• The put is exercisable at years 0, 1, 2, and 3.

• The strike price X = 105.

• The annualized riskless rate is r = 5%.

• The current stock price is 101.

– The annual discount factor hence equals 0.951229.

• We use only 8 price paths to illustrate the algorithm.


### A Numerical Example (continued)

Stock price paths

| Path | Year 0 | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| 1 | 101 | 97.6424 | 92.5815 | 107.5178 |
| 2 | 101 | 101.2103 | 105.1763 | 102.4524 |
| 3 | 101 | 105.7802 | 103.6010 | 124.5115 |
| 4 | 101 | 96.4411 | 98.7120 | 108.3600 |
| 5 | 101 | 124.2345 | 101.0564 | 104.5315 |
| 6 | 101 | 95.8375 | 93.7270 | 99.3788 |
| 7 | 101 | 108.9554 | 102.4177 | 100.9225 |
| 8 | 101 | 104.1475 | 113.2516 | 115.0994 |

[Figure: the eight simulated stock price paths plotted over years 0–3.]

### A Numerical Example (continued)

• We use the basis functions 1, x, x².

– Other basis functions are possible.a

• The plot on the next page shows the final estimated optimal exercise strategy given by the LSM.

• We now proceed to tackle our problem.

• The idea is to calculate the cash flow along each path, using information from all paths.

a Laguerre polynomials, Hermite polynomials, Legendre polynomials, Chebyshev polynomials, Gegenbauer polynomials, and Jacobi polynomials.

[Figure: the final estimated optimal exercise strategy given by the LSM.]

### A Numerical Example (continued)

Cash flows at year 3

| Path | Year 0 | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| 1 | — | — | — | 0 |
| 2 | — | — | — | 2.5476 |
| 3 | — | — | — | 0 |
| 4 | — | — | — | 0 |
| 5 | — | — | — | 0.4685 |
| 6 | — | — | — | 5.6212 |
| 7 | — | — | — | 4.0775 |
| 8 | — | — | — | 0 |

### A Numerical Example (continued)

• The cash ﬂows at year 3 are the exercise value if the put is in the money.

• Only 4 paths are in the money: 2, 5, 6, 7.

• Some of the cash ﬂows may not occur if the put is exercised earlier, which we will ﬁnd out step by step.

• Incidentally, the European counterpart has a value of

0.951229³ × (2.5476 + 0.4685 + 5.6212 + 4.0775)/8 = 1.3680.

### A Numerical Example (continued)

• We move on to year 2.

• For each state that is in the money at year 2, we must decide whether to exercise it.

• There are 6 paths for which the put is in the money: 1, 3, 4, 5, 6, 7 (p. 851).

• Only in-the-money paths will be used in the regression because they are where early exercise is relevant.

– If there were none, we would move on to year 1.


### A Numerical Example (continued)

• Let x denote the stock prices at year 2 for those 6 paths.

• Let y denote the corresponding discounted future cash ﬂows (at year 3) if the put is not exercised at year 2.


### A Numerical Example (continued)

Regression at year 2

| Path | x | y |
|---|---|---|
| 1 | 92.5815 | 0 × 0.951229 |
| 2 | — | — |
| 3 | 103.6010 | 0 × 0.951229 |
| 4 | 98.7120 | 0 × 0.951229 |
| 5 | 101.0564 | 0.4685 × 0.951229 |
| 6 | 93.7270 | 5.6212 × 0.951229 |
| 7 | 102.4177 | 4.0775 × 0.951229 |
| 8 | — | — |

### A Numerical Example (continued)

• We regress y on 1, x, and x².

• The result is

f(x) = 22.08 − 0.313114x + 0.00106918x².

• f(x) estimates the continuation value conditional on the stock price at year 2.

• We next compare the immediate exercise value and the continuation value.a

a The f(102.4177) entry on the next page was corrected by Mr. Du, Yung-Szu (B79503054, R83503086) on May 25, 2017.

### A Numerical Example (continued)

Optimal early exercise decision at year 2

| Path | Exercise | Continuation |
|---|---|---|
| 1 | 12.4185 | f(92.5815) = 2.2558 |
| 2 | — | — |
| 3 | 1.3990 | f(103.6010) = 1.1168 |
| 4 | 6.2880 | f(98.7120) = 1.5901 |
| 5 | 3.9436 | f(101.0564) = 1.3568 |
| 6 | 11.2730 | f(93.7270) = 2.1253 |
| 7 | 2.5823 | f(102.4177) = 1.2266 |
| 8 | — | — |

### A Numerical Example (continued)

• Amazingly, the put should be exercised in all 6 paths: 1, 3, 4, 5, 6, 7.

• Now, any positive cash ﬂow at year 3 should be set to zero or overridden for these paths as the put is exercised before year 3 (p. 851).

– They are paths 5, 6, 7.

• The cash flows on p. 855 become the ones on the next slide.


### A Numerical Example (continued)

Cash flows at years 2 & 3

| Path | Year 0 | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| 1 | — | — | 12.4185 | 0 |
| 2 | — | — | 0 | 2.5476 |
| 3 | — | — | 1.3990 | 0 |
| 4 | — | — | 6.2880 | 0 |
| 5 | — | — | 3.9436 | 0 |
| 6 | — | — | 11.2730 | 0 |
| 7 | — | — | 2.5823 | 0 |
| 8 | — | — | 0 | 0 |

### A Numerical Example (continued)

• We move on to year 1.

• For each state that is in the money at year 1, we must decide whether to exercise it.

• There are 5 paths for which the put is in the money: 1, 2, 4, 6, 8 (p. 851).

• Only in-the-money paths will be used in the regression because they are where early exercise is relevant.

– If there were none, we would move on to year 0.


### A Numerical Example (continued)

• Let x denote the stock prices at year 1 for those 5 paths.

• Let y denote the corresponding discounted future cash ﬂows if the put is not exercised at year 1.

• From p. 863, we have the following table.


### A Numerical Example (continued)

Regression at year 1

| Path | x | y |
|---|---|---|
| 1 | 97.6424 | 12.4185 × 0.951229 |
| 2 | 101.2103 | 2.5476 × 0.951229² |
| 3 | — | — |
| 4 | 96.4411 | 6.2880 × 0.951229 |
| 5 | — | — |
| 6 | 95.8375 | 11.2730 × 0.951229 |
| 7 | — | — |
| 8 | 104.1475 | 0 |

### A Numerical Example (continued)

• We regress y on 1, x, and x².

• The result is

f(x) = −420.964 + 9.78113x − 0.0551567x².

• f(x) estimates the continuation value conditional on the stock price at year 1.

• We next compare the immediate exercise value and the continuation value.


### A Numerical Example (continued)

Optimal early exercise decision at year 1

| Path | Exercise | Continuation |
|---|---|---|
| 1 | 7.3576 | f(97.6424) = 8.2230 |
| 2 | 3.7897 | f(101.2103) = 3.9882 |
| 3 | — | — |
| 4 | 8.5589 | f(96.4411) = 9.3329 |
| 5 | — | — |
| 6 | 9.1625 | f(95.8375) = 9.8304 |
| 7 | — | — |
| 8 | 0.8525 | f(104.1475) = −0.55 |

### A Numerical Example (continued)

• The put should be exercised for 1 path only: 8.

– Note that f(104.1475) < 0.

• Now, any positive future cash ﬂow should be set to zero or overridden for this path.

– But there is none.

• The cash flows on p. 863 become the ones on the next slide.

• They also conﬁrm the plot on p. 854.


### A Numerical Example (continued)

Cash flows at years 1, 2, & 3

| Path | Year 0 | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| 1 | — | 0 | 12.4185 | 0 |
| 2 | — | 0 | 0 | 2.5476 |
| 3 | — | 0 | 1.3990 | 0 |
| 4 | — | 0 | 6.2880 | 0 |
| 5 | — | 0 | 3.9436 | 0 |
| 6 | — | 0 | 11.2730 | 0 |
| 7 | — | 0 | 2.5823 | 0 |
| 8 | — | 0.8525 | 0 | 0 |

### A Numerical Example (continued)

• We move on to year 0.

• The continuation value is, from p. 870,

(12.4185 × 0.951229² + 2.5476 × 0.951229³ + 1.3990 × 0.951229² + 6.2880 × 0.951229² + 3.9436 × 0.951229² + 11.2730 × 0.951229² + 2.5823 × 0.951229² + 0.8525 × 0.951229)/8 = 4.66263.

### A Numerical Example (concluded)

• As this is larger than the immediate exercise value of 105 − 101 = 4,

the put should not be exercised at year 0.

• Hence the put’s value is estimated to be 4.66263.

• Compare this with the European put’s value of 1.3680 (p. 856).
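The worked example can be checked mechanically. The sketch below re-runs the LSM on the eight tabulated paths with basis 1, x, x², regressing only on in-the-money paths, and reproduces the estimate 4.66263:

```python
import numpy as np

# The 8 tabulated stock price paths (years 0-3) and the example's data.
paths = np.array([
    [101, 97.6424, 92.5815, 107.5178],
    [101, 101.2103, 105.1763, 102.4524],
    [101, 105.7802, 103.6010, 124.5115],
    [101, 96.4411, 98.7120, 108.3600],
    [101, 124.2345, 101.0564, 104.5315],
    [101, 95.8375, 93.7270, 99.3788],
    [101, 108.9554, 102.4177, 100.9225],
    [101, 104.1475, 113.2516, 115.0994],
])
X, d = 105.0, 0.951229                    # strike and annual discount factor

cash = np.maximum(X - paths[:, 3], 0.0)   # exercise values at year 3
t_cf = np.full(8, 3)                      # year in which each path pays

for t in (2, 1):                          # backward induction over years 2, 1
    itm = X - paths[:, t] > 0.0           # regress on in-the-money paths only
    x = paths[itm, t]
    y = cash[itm] * d ** (t_cf[itm] - t)  # discounted future cash flows
    coef = np.polyfit(x, y, 2)            # regression on 1, x, x^2
    exercise = (X - x) > np.polyval(coef, x)
    idx = np.where(itm)[0][exercise]
    cash[idx] = X - paths[idx, t]         # exercise: override later cash flows
    t_cf[idx] = t

price = np.mean(cash * d ** t_cf)         # 4.66263 > 4, so no exercise at year 0
```

The exercise times come out as in the text: paths 1, 3, 4, 5, 6, 7 exercise at year 2, path 8 at year 1, and path 2 waits until year 3.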


## Time Series Analysis


The historian is a prophet in reverse.

— Friedrich von Schlegel (1772–1829)


### GARCH Option Pricing


• Options can be priced when the underlying asset’s return follows a GARCH process.

• Let St denote the asset price at date t.

• Let h_t² be the conditional variance of the return over the period [ t, t + 1 ] given the information at date t.

– “One day” is merely a convenient term for any elapsed time Δt.

a ARCH (autoregressive conditional heteroskedastic) is due to Engle (1982), co-winner of the 2003 Nobel Prize in Economic Sciences. GARCH (generalized ARCH) is due to Bollerslev (1986) and Taylor (1986). A Bloomberg quant said to me on Feb 29, 2008, that GARCH is seldom used in trading.

### GARCH Option Pricing (continued)

• Adopt the following risk-neutral process for the price dynamics:

ln(S_{t+1}/S_t) = r − h_t²/2 + h_t ε_{t+1}, (112)

where

h_{t+1}² = β0 + β1 h_t² + β2 h_t² (ε_{t+1} − c)², (113)

ε_{t+1} ∼ N(0, 1) given information at date t,

r = daily riskless return,

c ≥ 0.

### GARCH Option Pricing (continued)

• The ﬁve unknown parameters of the model are c, h0, β0, β1, and β2.

• It is postulated that β0, β1, β2 ≥ 0 to make the conditional variance positive.

• There are other inequalities to satisfy (see text).

• The above process is called the nonlinear asymmetric GARCH (or NGARCH) model.


### GARCH Option Pricing (continued)

• It captures the volatility clustering in asset returns first noted by Mandelbrot (1963).a

– When c = 0, a large ε_{t+1} results in a large h_{t+1}, which in turn tends to yield a large h_{t+2}, and so on.

• It also captures the negative correlation between the asset return and changes in its (conditional) volatility.

– For c > 0, a positive ε_{t+1} (good news) tends to decrease h_{t+1}, whereas a negative ε_{t+1} (bad news) tends to do the opposite.

a “. . . large changes tend to be followed by large changes—of either sign—and small changes tend to be followed by small changes . . . ”

### GARCH Option Pricing (concluded)

• With y_t = ln S_t denoting the logarithmic price, the model becomes

y_{t+1} = y_t + r − h_t²/2 + h_t ε_{t+1}. (114)

• The pair (y_t, h_t²) completely describes the current state.

• The conditional mean and variance of y_{t+1} are clearly

E[ y_{t+1} | y_t, h_t² ] = y_t + r − h_t²/2, (115)

Var[ y_{t+1} | y_t, h_t² ] = h_t². (116)

### GARCH Model: Inferences

• Suppose the parameters c, h0, β0, β1, and β2 are given.

• Then we can recover h1, h2, . . . , hn and ε1, ε2, . . . , εn from the prices S0, S1, . . . , Sn under the GARCH model (112) on p. 876.

• This property is useful in statistical inferences.
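The inversion is mechanical, as the sketch below shows (parameter values are arbitrary illustrations): simulate the NGARCH recursion forward, then recover every h_t² and ε_t from the prices alone:

```python
import math
import numpy as np

r, c = 0.0002, 0.5
beta0, beta1, beta2 = 1e-6, 0.8, 0.1
h2_0, S0 = 0.01 ** 2, 100.0

# Forward simulation of (112)-(113).
rng = np.random.default_rng(7)
S, h2, eps_true = [S0], [h2_0], []
for t in range(50):
    e = rng.standard_normal()
    eps_true.append(e)
    S.append(S[t] * math.exp(r - h2[t] / 2.0 + math.sqrt(h2[t]) * e))
    h2.append(beta0 + beta1 * h2[t] + beta2 * h2[t] * (e - c) ** 2)

# Inference: invert (112) for eps_t, then step (113) for the next variance.
h2_rec, eps_rec = [h2_0], []
for t in range(50):
    e = (math.log(S[t + 1] / S[t]) - r + h2_rec[t] / 2.0) / math.sqrt(h2_rec[t])
    eps_rec.append(e)
    h2_rec.append(beta0 + beta1 * h2_rec[t] + beta2 * h2_rec[t] * (e - c) ** 2)
```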


### The Ritchken-Trevor (RT) Algorithm


• The GARCH model is a continuous-state model.

• To approximate it, we turn to trees with discrete states.

• Path dependence in GARCH makes the tree for asset prices explode exponentially (why?).

• We need to mitigate this combinatorial explosion.

a Ritchken & Trevor (1999).
