
Adaptive and high-order methods for valuing American options

Christina C. Christara

Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada; email: ccc@cs.toronto.edu

Duy Minh Dang

Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada; email: dmdang@cs.toronto.edu

Space–time adaptive and high-order methods for valuing American options using a partial differential equation (PDE) approach are developed in this paper. The linear complementarity problem that arises due to the free boundary is handled using a penalty method. Both finite difference and finite element methods are considered for the space discretization of the PDE, while classical finite differences, such as Crank–Nicolson, are used for the time discretization. The high-order discretization in space is based on an optimal finite element collocation method, the main computational requirements of which are the solution of one tridiagonal linear system at each timestep, while the resulting errors at the grid points and midpoints of the space partition are fourth order. To control the space error, we use adaptive grid-point distribution based on an error equidistribution principle.

A timestep size selector is used to further increase the efficiency of the methods.

Numerical examples show that our methods converge fast and provide highly accurate option prices, Greeks and early exercise boundaries.

1 INTRODUCTION

The pricing of an American option is a difficult task, mainly due to the early exercise feature of the option (Tavella and Randall (2000) and Wilmott et al (1995)). Typically, at any time, there is a specific value of the asset price that divides the asset domain into the early exercise region, where the option should be exercised immediately, and the continuation region, where the option should be held. Hence, the early exercise feature leads to an additional constraint stipulating that the value of an American option must be greater than or equal to its payoff. This constraint requires special treatment, which makes an explicit closed-form solution unattainable in most cases. Consequently, numerical methods must be used.


Although several approaches, such as Monte Carlo simulation (Fu et al (2001) and Longstaff and Schwartz (2001)), lattice (tree) methods (Hull (2008) and Jiang and Dai (2004)) or integral equations (Barone-Adesi and Whaley (1987); Carr et al (1992); and MacMillan (1986)), can be used for pricing an American option, for problems in low dimensions, ie, fewer than five dimensions, the partial differential equation (PDE) approach is very popular due to its efficiency and global character. In addition, accurate hedging parameters such as delta and gamma, which are essential for risk-managing financial derivatives, are generally much easier to compute using a PDE approach than using other methods. Using a PDE approach, the American option pricing problem can be formulated as a time-dependent linear complementarity problem (LCP), with the inequalities involving the Black–Scholes PDE and some additional constraints (Wilmott et al (1993)). Recently, several approaches for handling the LCP have been developed. In particular, various penalty methods were discussed in Forsyth and Vetzal (2002), Nielsen et al (2002) and Zvan et al (1998a). In this paper we adopt the penalty method of Forsyth and Vetzal (2002) to solve the LCP. According to this approach, a penalty term is introduced into the discretized equations in order to enforce the early exercise constraint. Although this method was originally built upon a finite volume discretization of the space dimension, its idea can be extended to other discretization techniques, such as finite differences and finite elements.

The popularity of finite differences in option pricing is mainly due to the fact that they are intuitive and easy to implement. Finite elements can also be used as an alternative, and offer several advantages over finite differences. Firstly, the solution is a piecewise polynomial approximation over the entire domain, while finite differences supply an approximate solution only at distinct points in the domain, so interpolation may become necessary. Secondly, there are several finite element methods, such as spline collocation, that supply hedging parameters such as delta and gamma as a by-product and, in addition, allow other hedging parameters to be computed in a slightly easier manner than with finite differences. In particular, certain spline collocation¹ methods have been shown to be effective on uniform and nonuniform grids for the solution of boundary value problems (Christara and Ng (2006a,b)) and parabolic initial value problems (Christara et al (2009)).

Using spline collocation in its standard formulation gives only second-order (and therefore suboptimal) accuracy. In the context of parabolic PDEs, this suboptimal spline collocation method requires the solution of one tridiagonal linear system at each timestep. In general, high-order methods in space usually require larger discretization stencils and, hence, the systems to be solved at each timestep are not tridiagonal. In Christara et al (2009), several optimal and efficient methods based on quadratic spline collocation (QSC) are developed for one-dimensional linear parabolic PDEs. These methods give fourth-order (optimal) convergence at the knots and midpoints, with the main computational requirement being the solution of only one tridiagonal linear system at each timestep. Extensions of such efficient high-order spline collocation methods to option pricing, especially to pricing American options, have not yet been discussed in the literature. This shortcoming motivates our work.

¹ Note that we use the term “spline” in this paper to refer to maximum-smoothness piecewise polynomials.

Adaptive methods aim at dynamically adjusting the location of the grid points in order to control the error in the approximate solution. Although adaptive techniques have been extensively developed for numerical solutions of parabolic PDEs (see, for example, Christara and Ng (2006a); Eriksson and Johnson (1991, 1995); and Wang et al (2004)), they are not so frequent in the option pricing literature. Examples of algorithms for adaptivity in space and time can be found in Achdou and Pironneau (2005), Lötstedt et al (2007) and Persson and Sydow (2007). In Achdou and Pironneau (2005), a space-adaptive mesh refinement based on a posteriori estimates of the finite element discretization errors of the Black–Scholes equation computed using a Hilbert sum is proposed. Persson and Sydow (2007) propose a space–time-adaptive finite difference technique for pricing multiasset European options. The adaptivity in space is based on first solving the problem on a coarse grid with large timesteps for an estimation of the errors, and then resolving the problem with more grid points redistributed in such a way that the estimated local error is below a certain level. In Lötstedt et al (2007), an error equation is derived for the global error in the solution, and the grid and timestep sizes are chosen such that a tolerance on the final global error is satisfied by the solution. A popular technique for mesh generation and adaptation is based on De Boor’s equidistribution principle (De Boor (1974)). The underlying idea of this principle is to relocate the nodes to equidistribute the error in some chosen norm (or seminorm) among the subintervals of the partition. Although adaptive techniques based on an equidistribution principle are widely used in the numerical solution of PDEs, these techniques have not, to the best of our knowledge, been successfully extended to option pricing in general, and American option pricing in particular. This deficiency further motivates our work.

In this paper we develop highly accurate and efficient numerical methods for pricing American options on a single asset. Although we focus primarily on the one-dimensional case, some of the results in this paper can be naturally extended to two or more dimensions. The high-order methods in the spatial dimension are built upon the efficient and high-order QSC methods of Christara et al (2009). Second-order finite difference discretization of the spatial variable is also considered. Adaptive techniques based on the equidistribution principle of De Boor (1974) are introduced in the space dimension. A timestep size selector (see Forsyth and Vetzal (2002)) is


used to further improve the performance of the methods. Numerical results show that our methods provide highly accurate option prices and Greeks, and capture well the moving behavior of the free boundary. In this paper we do not include grid size estimators and changes of the grid size from timestep to timestep, such as those in Wang et al (2004). We plan to incorporate the grid size estimators presented in Christara and Ng (2006a) into American option pricing problems in future work.

The remainder of the paper is organized as follows. Section 2 presents a PDE formulation of the pricing problem for an American option. We restrict our attention to American put options. In Section 3 we describe discretization methods, with strong emphasis on the efficient and high-order QSC methods, and discuss the selection of an appropriate form for the discrete penalty term. A penalty iteration for the discretized American put option is discussed in Section 4. Section 5 introduces an adaptive mesh algorithm for American option pricing and a simple but effective timestep size selector.

Numerical results that demonstrate the efficiency and accuracy of the methods are presented in Section 6. Section 7 concludes the paper.

2 FORMULATION

The Black–Scholes model for American put options takes the form of a free boundary problem (Tavella and Randall (2000) and Wilmott et al (1995)). The disadvantage of the free boundary formulation is that it involves an explicit mention of the unknown free boundary.

To avoid this, we write the American put option valuation problem in an LCP form, and the optimal free boundary can then be determined a posteriori. More specifically, denoting the value at time t of the underlying asset by S, the expiry time of the option by T and the backward time variable by $\tau = T - t$, the early exercise constraint leads to the following LCP for the value $V(S, \tau)$ of an American put option (see Wilmott et al (1993)):

$$
\left.
\begin{aligned}
&\frac{\partial V}{\partial \tau} - \mathcal{L}V = 0, \quad V - V^* > 0\\
&\qquad\text{or}\\
&\frac{\partial V}{\partial \tau} - \mathcal{L}V > 0, \quad V - V^* = 0
\end{aligned}
\right\}
\tag{2.1}
$$

subject to the payoff

$$
V^*(S) = V(S, 0) = \max(E - S, 0) \tag{2.2}
$$

and the boundary conditions

$$
\left.
\begin{aligned}
&V(0, \tau) = E\\
&V(S, \tau) \to 0 \ \text{ as } S \to S_\infty
\end{aligned}
\right\}
\tag{2.3}
$$


where

$$
\mathcal{L}V \equiv \tfrac12 \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V \tag{2.4}
$$

Here, $S_\infty$ is the right boundary of the semitruncated spatial domain, E is the strike, r is the positive constant risk-free interest rate and $\sigma$ is the constant asset volatility.

Following Forsyth and Vetzal (2002), we replace the LCP (2.1) by a nonlinear PDE obtained by adding a penalty term to the right-hand side of the Black–Scholes equation. More specifically, with a penalty parameter $\lambda$, $\lambda \to \infty$, we consider the nonlinear PDE for an American put option:

$$
\frac{\partial V}{\partial \tau} - \mathcal{L}V = \lambda \max(V^* - V, 0), \quad S \in \Omega \equiv (0, S_\infty),\ \tau \in [0, T] \tag{2.5}
$$

subject to the initial and boundary conditions (2.2) and (2.3). The penalty term $\lambda \max(V^* - V, 0)$ effectively ensures that the solution satisfies $V - V^* \ge -\varepsilon$ for some $0 < \varepsilon \ll 1$. Essentially, in the region where $V > V^*$, the PDE (2.5) resembles the Black–Scholes equation. On the other hand, when $-\varepsilon \le V - V^* < 0$, the Black–Scholes inequality is satisfied, ensuring that the early exercise rule is not violated.
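The effect of the penalty parameter can be seen in a one-node sketch (our own illustration, not taken from the paper): if the spatial terms are dropped and the constraint is active, an implicit step of $\partial V/\partial\tau = \lambda\max(V^* - V, 0)$ has a closed form, and the violation $V^* - V$ shrinks like $1/\lambda$.

```python
# Toy illustration (not the paper's full scheme): one implicit step of
# dV/dtau = lam * max(Vstar - V, 0) at a single node where the
# constraint binds (v_old < v_star).  Solving
#   v_new = v_old + dtau * lam * (v_star - v_new)
# gives the closed form below, so v_star - v_new = (v_star - v_old) / (1 + dtau*lam).
def penalized_step(v_old, v_star, dtau, lam):
    return (v_old + dtau * lam * v_star) / (1.0 + dtau * lam)

v_old, v_star, dtau = 0.7, 1.0, 0.01
for lam in (1e2, 1e4, 1e6):
    v_new = penalized_step(v_old, v_star, dtau, lam)
    gap = v_star - v_new          # O(1/lam): the constraint violation
    print(lam, gap)
```

This is the mechanism behind choosing $\lambda$ of the order $1/\text{tol}$: the constraint violation is then of the order tol.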

3 DISCRETIZATION

We now discuss the discretization of (2.5) and the selection of appropriate forms for the discrete penalty term. For the rest of the paper, we adopt the following notation. Let

$$
\Delta^j \equiv \{ S_0^j \equiv 0 < S_1^j < S_2^j < \dots < S_{n-1}^j < S_n^j \equiv S_\infty \}
$$

be a partition of $\bar\Omega \equiv \Omega \cup \partial\Omega$ at time $\tau_j$, with (not necessarily uniform) spatial step sizes $h_i^j = S_i^j - S_{i-1}^j$, $i = 1, 2, \dots, n$. In general, a superscript j applied to an operator or a function of $\tau$ and/or S denotes evaluation of the operator or function at time $\tau_j$. Denote by $\Delta\tau_j = \tau_j - \tau_{j-1}$, $j = 1, 2, \dots$, the jth timestep size, with $\tau_0 = 0$. Let

$$
\bar V^j(S) \equiv \bar V(S, \tau_j)
$$

be the approximation to the true solution $V(S, \tau_j)$. Furthermore, let

$$
\bar{\mathbf V}^j = [\bar V_1^j, \bar V_2^j, \dots, \bar V_{n-1}^j]^{\mathrm T}
$$

be the vector of values $\bar V_i^j \equiv \bar V(S_i^j, \tau_j)$, $i = 1, \dots, n-1$. Similarly, denote by

$$
\mathbf V^{*,j} = [V_1^{*,j}, V_2^{*,j}, \dots, V_{n-1}^{*,j}]^{\mathrm T}
$$

the vector of the payoff values $V_i^{*,j} = V^*(S_i^j)$, $i = 1, \dots, n-1$.


To proceed from time $\tau_{j-1}$ to time $\tau_j$, we apply the standard $\theta$-timestepping discretization scheme to (2.5):

$$
(I - \theta\,\Delta\tau_j \mathcal{L}^j)\,\bar V^j(S) = (I + (1-\theta)\,\Delta\tau_j \mathcal{L}^{j-1})\,\bar V^{j-1}(S) + \mathcal{P}^j(\bar V^j(S)), \quad S \in \Omega \tag{3.1}
$$

where $0 \le \theta \le 1$, and incorporate the boundary conditions (2.3) by setting

$$
\bar V^j(0) = E, \qquad \bar V^j(S_\infty) = 0 \tag{3.2}
$$

Here, $I$ and $\mathcal{P}^j$ denote the identity and penalty operators, respectively, where $\mathcal{P}^j$ is defined by $\mathcal{P}^j(\bar V^j(S)) = \lambda \max(V^*(S) - \bar V^j(S), 0)$. The above timestepping technique, together with the boundary conditions, can be viewed as equivalent to solving a nonlinear boundary value problem at each timestep. In (3.1), the values $\theta = \tfrac12$ and $\theta = 1$ give rise to the standard Crank–Nicolson method and the fully implicit method, respectively. It is known that the Crank–Nicolson method is second-order accurate but prone to producing spurious oscillations, while the implicit method is first-order accurate but maintains strong stability properties (see, for example, Pooley et al (2003) and Zvan et al (1998b)). To maintain the accuracy of the Crank–Nicolson method, as well as smoothness of the solution, we use the Rannacher smoothing technique (Rannacher (1984)), which applies fully implicit timestepping in the first few (usually two) timesteps.
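The timestepping scheme (3.1) with Rannacher startup can be sketched as follows (our own illustration; the matrix `L` stands for any spatial discretization of $\mathcal{L}$, all names and parameter values are ours, and the penalty term is omitted for clarity):

```python
import numpy as np

def theta_for_step(j, n_rannacher=2):
    # Rannacher smoothing: fully implicit (theta = 1) for the first few
    # steps, Crank-Nicolson (theta = 1/2) afterwards.
    return 1.0 if j < n_rannacher else 0.5

def theta_step(V_old, L, dtau, theta):
    # One step of (I - theta*dtau*L) V_new = (I + (1-theta)*dtau*L) V_old.
    n = L.shape[0]
    I = np.eye(n)
    return np.linalg.solve(I - theta * dtau * L,
                           (I + (1 - theta) * dtau * L) @ V_old)

# usage: with the pure-decay operator L = -r*I the exact solution is
# V(tau) = exp(-r*tau) * V(0), which the scheme reproduces closely
r, dtau = 0.05, 0.01
L = -r * np.eye(3)
V = np.array([1.0, 2.0, 3.0])
for j in range(4):
    V = theta_step(V, L, dtau, theta_for_step(j))
```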

3.1 Finite differences

Applying the standard centered finite difference discretization for the space variable in (3.1) gives rise to an $(n-1)\times(n-1)$ algebraic system of the form

$$
(I + \theta\,\Delta\tau_j M^j + \bar P^j)\,\bar{\mathbf V}^j = (I - (1-\theta)\,\Delta\tau_j M^{j-1})\,\bar{\mathbf V}^{j-1} + \bar P^j \mathbf V^{*,j} + \bar{\mathbf g}^j \tag{3.3}
$$

where $M^j$ is a tridiagonal matrix that arises from discretizing $\mathcal{L}^j$ by finite differences on $\Delta^j$, I is the identity matrix, $\bar P^j$ is a diagonal penalty matrix and $\bar{\mathbf g}^j$ is a vector containing certain values arising from the boundary conditions. The explicit formula for $M^j$ is

$$
\begin{aligned}
M^j &\equiv \operatorname{trid}\{(M^j)_{i,i-1},\ (M^j)_{i,i},\ (M^j)_{i,i+1}\}\\
&= \operatorname{trid}\{-\tfrac12\sigma^2 (S_i^j)^2 \beta_{1i}^j - r S_i^j \alpha_{1i}^j,\ \ -\tfrac12\sigma^2 (S_i^j)^2 \beta_{2i}^j - r S_i^j \alpha_{2i}^j + r,\ \ -\tfrac12\sigma^2 (S_i^j)^2 \beta_{3i}^j - r S_i^j \alpha_{3i}^j\}
\end{aligned}
\tag{3.4}
$$


where

$$
\alpha_{1i}^j = \frac{-h_{i+1}^j}{h_i^j (h_i^j + h_{i+1}^j)}, \qquad
\alpha_{2i}^j = \frac{h_{i+1}^j - h_i^j}{h_i^j h_{i+1}^j}, \qquad
\alpha_{3i}^j = \frac{h_i^j}{h_{i+1}^j (h_i^j + h_{i+1}^j)}
$$

$$
\beta_{1i}^j = \frac{2}{h_i^j (h_i^j + h_{i+1}^j)}, \qquad
\beta_{2i}^j = \frac{-2}{h_i^j h_{i+1}^j}, \qquad
\beta_{3i}^j = \frac{2}{h_{i+1}^j (h_i^j + h_{i+1}^j)}
$$

and where $\operatorname{trid}\{\cdot, \cdot, \cdot\}$ denotes a tridiagonal matrix with the subdiagonal, main diagonal and superdiagonal elements listed in the brackets, and with the first and last rows modified to take the boundary conditions into account. The penalty matrix $\bar P^j$ is defined by

$$
(\bar P^j)_{i,l} \equiv
\begin{cases}
\lambda & \text{if } \bar V_i^j < V_i^{*,j} \text{ and } i = l\\[2pt]
0 & \text{otherwise}
\end{cases}
\tag{3.5}
$$
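The assembly of $M^j$ from (3.4) can be sketched as follows (our own illustration, with dense rather than tridiagonal storage for clarity; the boundary-row modification is reduced to dropping the entries that multiply the known boundary values, which would instead contribute to $\bar{\mathbf g}^j$):

```python
import numpy as np

def build_M(S, sigma, r):
    # Tridiagonal matrix M of (3.4) at the interior nodes S_1..S_{n-1}
    # of the (possibly nonuniform) grid S_0 < S_1 < ... < S_n.
    h = np.diff(S)                          # h[i-1] = h_i = S_i - S_{i-1}
    n = len(S) - 1
    M = np.zeros((n - 1, n - 1))
    for i in range(1, n):
        hi, hip1, Si = h[i - 1], h[i], S[i]
        a1 = -hip1 / (hi * (hi + hip1))
        a2 = (hip1 - hi) / (hi * hip1)
        a3 = hi / (hip1 * (hi + hip1))
        b1 = 2.0 / (hi * (hi + hip1))
        b2 = -2.0 / (hi * hip1)
        b3 = 2.0 / (hip1 * (hi + hip1))
        row = i - 1
        if row > 0:                          # M_{i,i-1}
            M[row, row - 1] = -0.5 * sigma**2 * Si**2 * b1 - r * Si * a1
        M[row, row] = -0.5 * sigma**2 * Si**2 * b2 - r * Si * a2 + r
        if row < n - 2:                      # M_{i,i+1}
            M[row, row + 1] = -0.5 * sigma**2 * Si**2 * b3 - r * Si * a3
    return M
```

A quick structural check, used again in Lemma 4.1 below: on a grid satisfying the step-size condition (4.3), the off-diagonal entries are nonpositive, and every full interior row sums to exactly r, since the $\alpha$ and $\beta$ coefficients each sum to zero.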

3.2 Finite element collocation methods

For high-order discretization in space, we apply collocation based on quadratic splines. We remind the reader that the space of quadratic splines with respect to a partition $\Delta^j$ with n subintervals has dimension n + 2, and thus we need n + 2 conditions. Two of the n + 2 conditions are obtained from the boundary conditions (3.2), and the rest come from collocation conditions, as explained later in this section. Let

$$
\bar V^j(S) = \sum_i c_i^j \phi_i^j(S)
$$

be the spline approximation to $V(S, \tau_j)$, expressed in terms of appropriate quadratic spline basis functions $\phi_i^j(S)$ and coefficients, or degrees of freedom, $c_i^j$. Let $\mathcal{D}^j \equiv \{D_i^j\}_{i=1}^n$ be the set of collocation points on the partition $\Delta^j$. It is important to emphasize that the choice of collocation points may affect the order of convergence of the resulting methods (Christara and Ng (2006b)), especially on a nonuniform grid. In the case of a uniform partition, ie, $\Delta^j \equiv \{S_i^j = ih,\ i = 0, \dots, n,\ h = S_\infty/n\}$, and quadratic splines, it is natural to have $\mathcal{D}^j \equiv \{D_i^j = \tfrac12(S_{i-1}^j + S_i^j),\ i = 1, \dots, n\}$. That is, for a uniform partition, the set of collocation points for a QSC method is chosen to be the set of the midpoints of the partition. For a nonuniform partition, the set of collocation points is defined slightly differently. In Section 5 we describe in detail how the set of collocation points for a QSC method can be constructed on an adaptive grid. For convenience, let $D_0^j \equiv S_0^j = 0$ and $D_{n+1}^j \equiv S_n^j = S_\infty$. Also, let $\bar V_I^j(S)$ be the quadratic spline interpolant of $V^j(S)$ satisfying

$$
\bar V_I^j(S) = V^j(S), \quad S = 0,\ S \in \mathcal{D}^j,\ S = S_\infty \tag{3.6}
$$

For the convenience of the reader, we first briefly review spline collocation methods for linear parabolic PDEs. It is known that applying the standard spline collocation discretization to linear parabolic PDEs results in suboptimal approximations, ie, the order of convergence of the spline collocation approximation is less than that of the interpolant in the same polynomial space. To obtain optimal (fourth-order) QSC methods for linear parabolic PDEs, appropriate perturbations of the differential operator $\mathcal{L}$ and of the boundary operator, similar to those used to obtain optimal spline collocation methods for boundary value problems (see, for example, Houstis et al (1988)), are developed in Christara et al (2009). An optimal fourth-order spline collocation method can be obtained either via deferred correction (a two-step method), using the perturbation operator on the right-hand side of the collocation equations of the correction step and requiring the solution of two tridiagonal linear systems per timestep, or via extrapolation (a one-step method), using the perturbation operator on the left-hand side of the collocation equations and requiring the solution of an almost pentadiagonal linear system at each timestep.²

Several optimal (fourth-order) and efficient QSC methods for solving one-dimensional linear parabolic PDEs with general boundary conditions have recently been introduced and studied in Christara et al (2009). These methods can be viewed as a combination of the two steps of the deferred correction method, treating the perturbation term for $\mathcal{L}$ explicitly, thereby maintaining the fourth-order accuracy while requiring the solution of only one tridiagonal linear system per timestep. The QSC discretization for the space variable considered in this paper can be viewed as an extension of the efficient and optimal method named QSC–CN for linear one-dimensional parabolic PDEs in Christara et al (2009) to the context of one-dimensional nonlinear PDEs of the same form as (2.5). More specifically, $\bar V^j(S)$ is computed by

$$
\left.
\begin{aligned}
(I - \theta\,\Delta\tau_j \mathcal{L}^j)\,\bar V^j(S) &= (I + (1-\theta)\,\Delta\tau_j \mathcal{L}^{j-1} + \theta\,\Delta\tau_j P_{\mathcal{L}}^{j-1})\,\bar V^{j-1}(S) + \mathcal{P}^j(\bar V^j(S)), \quad S \in \mathcal{D}^j\\
\bar V^j(0) &= E, \qquad \bar V^j(S_\infty) = 0
\end{aligned}
\right\}
\tag{3.7}
$$

with $\bar V^0(S) = \bar V_I^0(S)$, where $P_{\mathcal{L}}^j$ is an appropriate perturbation of $\mathcal{L}^j$. The definition of $P_{\mathcal{L}}^j$ on a general grid can be found in Christara and Ng (2006b) and is omitted here for brevity. The reader is referred to Christara et al (2009) for detailed discussions of the relevant methods.

As discussed in Christara et al (2009), for one-dimensional linear parabolic PDEs, the perturbation terms $P_{\mathcal{L}}^{j-1}(\bar V^{j-1}(S))$ corresponding to the first and last collocation points $D_1^j$ and $D_n^j$ are responsible for potential instability. Experiments show that a similar conclusion holds for the PDE (2.5). Among several remedies proposed in Christara et al (2009), we find that the QSC–CN0 method, which completely omits the perturbation terms at the first and last collocation points, is simple and works well for pricing American options. Hence, we adopt this choice for $P_{\mathcal{L}}^j$ in (3.7).

² By “almost pentadiagonal”, we mean that all rows of the matrix, except the first two and the last two, follow a pentadiagonal pattern.

Let $\mathbf c^j \equiv \{c_i^j\}_{i=0}^{n+1}$ and $\mathbf c^{*,j} \equiv \{c_i^{*,j}\}_{i=0}^{n+1}$ be the vector of the unknown degrees of freedom for the quadratic spline approximation and the vector of the degrees of freedom for the spline interpolant of the payoff on the partition $\Delta^j$, respectively. Method (3.7) gives rise to an $(n+2)\times(n+2)$ algebraic system of the form

$$
(Q_0^j + \theta\,\Delta\tau_j Q^j + P^j)\,\mathbf c^j
= (Q_0^{j-1} - (1-\theta)\,\Delta\tau_j Q^{j-1} + \theta\,\Delta\tau_j Q_P^{j-1})\,\mathbf c^{j-1} + P^j \mathbf c^{*,j} + \mathbf g^j
\tag{3.8}
$$

where $Q_0^j$ is the quadratic spline interpolation matrix for the partition $\Delta^j$, where $Q^j$ arises from discretizing $\mathcal{L}^j$ using QSC, and where the matrix $Q_P^{j-1}$ arises from $P_{\mathcal{L}}^{j-1}$. We refer the reader to Christara et al (2009) and Christara and Ng (2006b) for the explicit definitions of these matrices. It is important to note that $Q_0^j$ is a tridiagonal matrix, as opposed to the identity matrix in the finite difference case. The penalty matrix $P^j$ is also tridiagonal, as opposed to diagonal in the finite difference case, and is defined by

$$
(P^j)_{i,l} \equiv
\begin{cases}
\lambda\,(Q_0^j)_{i,l} & \text{if } \bar V^j(S) < V^{*,j}(S),\ S = D_{i-1}^j \in \mathcal{D}^j \cup \partial\Omega\\[2pt]
0 & \text{otherwise}
\end{cases}
\tag{3.9}
$$

or, equivalently, $P^j = \bar{\bar P}^j Q_0^j$, with

$$
(\bar{\bar P}^j)_{i,l} \equiv
\begin{cases}
\lambda & \text{if } \bar V^j(S) < V^{*,j}(S),\ S = D_{i-1}^j \in \mathcal{D}^j \cup \partial\Omega \text{ and } i = l\\[2pt]
0 & \text{otherwise}
\end{cases}
\tag{3.10}
$$

4 PENALTY ITERATION

In Forsyth and Vetzal (2002), a penalty iteration algorithm for American put options in the context of finite volume discretization methods is presented. The penalty iteration algorithms for (3.3) and (3.8) are essentially the same as in Forsyth and Vetzal (2002) and, hence, for brevity, we only present the penalty algorithm for the QSC methods.

Let k be the index of the nonlinear penalty iteration. Let $\mathbf c^{j,(k)}$ be the kth estimate of $\mathbf c^j$ and let $P^{j,(k)}$ be the kth penalty matrix constructed at the jth timestep. The vector of initial guesses $\mathbf c^{j,(0)}$ is usually chosen to be $\mathbf c^{j-1}$, which is the vector of the degrees of freedom of the quadratic spline approximation at the previous timestep. A QSC penalty iteration is presented in Algorithm 4.0.1.

4.0.1 Algorithm: QSC penalty iteration for American options

(1) initialize $\mathbf c^{j,(0)}$;
(2) construct $P^{j,(0)}$ using (3.9);
(3) for $k = 0, \dots$, until convergence do
(4) solve (3.8) for $\mathbf c^{j,(k+1)}$;
(5) construct $P^{j,(k+1)}$ using (3.9);
(6) if $\displaystyle \Bigl[\max_i \frac{|\bar V^{j,(k+1)}(S) - \bar V^{j,(k)}(S)|}{\max(1, |\bar V^{j,(k+1)}(S)|)} \ \text{for } S = D_i^j \in \mathcal{D}^j \Bigr] < \text{tol}$ or $[P^{j,(k)} = P^{j,(k+1)}]$ then
(7) break;
(8) end if
(9) end for
(10) $\mathbf c^j = \mathbf c^{j,(k+1)}$.

In general, for points at which $\bar V^j(S) < V^{*,j}(S)$, where $S \in \Delta^j$ for finite difference methods or $S \in \mathcal{D}^j$ for QSC methods, if we want the LCP (2.1) to be computed with a relative precision tol, we should have $\lambda \approx 1/\text{tol}$. So $\lambda$ is well defined, and cannot be arbitrarily large. It is worth noting that, in practice, a small number (one or two) of penalty iterations usually suffices to obtain convergence. Note that, in the case of finite difference methods, the initial guess $\bar{\mathbf V}^{j,(0)}$ for $j > 1$ is chosen based on linear extrapolation of the numerical solution from the two previous timesteps. That is,

$$
\bar{\mathbf V}^{j,(0)} = \frac{\Delta\tau_j + \Delta\tau_{j-1}}{\Delta\tau_{j-1}}\,\bar{\mathbf V}^{j-1} - \frac{\Delta\tau_j}{\Delta\tau_{j-1}}\,\bar{\mathbf V}^{j-2}
$$

Extensive experiments have shown that this choice of initial guess is more efficient than the standard choice of the numerical solution at the previous timestep (Dang (2007)). For $j = 1$, we set $\bar{\mathbf V}^{1,(0)} = \bar{\mathbf V}^0$.
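For concreteness, the finite difference analogue of the penalty iteration, ie, the variant summarized by (3.3) and (3.5), can be sketched as follows. This is our own illustrative implementation on a uniform grid with arbitrarily chosen parameter values, not the paper's adaptive, high-order code; the QSC variant requires the spline matrices, which are not reproduced here.

```python
import numpy as np

def american_put_fd(E=100.0, r=0.05, sigma=0.3, T=0.5,
                    S_max=300.0, n=150, m=50, lam=1e6, tol=1e-6):
    # Penalty-iteration pricer for an American put: theta-timestepping
    # with Rannacher startup, centered differences on a uniform grid, and
    # the diagonal penalty matrix of (3.5).
    S = np.linspace(0.0, S_max, n + 1)
    h = S[1] - S[0]
    payoff = np.maximum(E - S, 0.0)
    dtau = T / m

    Si = S[1:-1]                                              # interior nodes
    lo = -0.5 * sigma**2 * Si**2 / h**2 + 0.5 * r * Si / h    # M_{i,i-1}
    di = sigma**2 * Si**2 / h**2 + r                          # M_{i,i}
    up = -0.5 * sigma**2 * Si**2 / h**2 - 0.5 * r * Si / h    # M_{i,i+1}
    M = np.diag(di) + np.diag(lo[1:], -1) + np.diag(up[:-1], 1)
    I = np.eye(n - 1)
    g = np.zeros(n - 1)
    g[0] = -dtau * lo[0] * E              # boundary condition V(0, tau) = E

    V = payoff.copy()
    for j in range(1, m + 1):
        theta = 1.0 if j <= 2 else 0.5    # Rannacher smoothing
        rhs = (I - (1 - theta) * dtau * M) @ V[1:-1] + g
        Vk = V[1:-1].copy()               # initial guess: previous timestep
        for _ in range(100):              # penalty iteration
            P = lam * (Vk < payoff[1:-1])             # diagonal of (3.5)
            A = I + theta * dtau * M + np.diag(P)
            Vk1 = np.linalg.solve(A, rhs + P * payoff[1:-1])
            P_new = lam * (Vk1 < payoff[1:-1])
            change = np.max(np.abs(Vk1 - Vk) / np.maximum(1.0, np.abs(Vk1)))
            Vk = Vk1
            if change < tol or np.array_equal(P, P_new):
                break
        V[1:-1] = Vk
        V[0], V[-1] = E, 0.0
    return S, V
```

In this sketch the computed values stay above the payoff up to a violation of the order $1/\lambda$, consistent with the discussion above.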

4.1 Solution of the linear complementarity problem

We now investigate the discrete solution of the LCP (2.1). We first consider the finite difference case. In this case, at each timestep, the solution of (3.3) is required. We define

$$
[F\bar{\mathbf V}^j]_i \equiv [(I + \theta\,\Delta\tau_j M^j)\,\bar{\mathbf V}^j - (I - (1-\theta)\,\Delta\tau_j M^{j-1})\,\bar{\mathbf V}^{j-1} - \bar{\mathbf g}^j]_i \tag{4.1}
$$

where $[\cdot]_i$ denotes the ith component of a vector. In order to obtain a finite difference approximate solution of (2.1) with an arbitrary level of precision, we need to show that the solution of (3.3) satisfies $\bar V_i^j - V_i^* \to 0$ as $\lambda \to \infty$ for grid points where $[F\bar{\mathbf V}^j]_i > 0$. For finite difference methods, similarly to finite volume methods in Forsyth and Vetzal (2002), this follows if we can show that the term

$$
[\bar P^j(\mathbf V^{*,j} - \bar{\mathbf V}^j)]_i \tag{4.2}
$$

is bounded independently of $\lambda$. It is also desirable that the bound be independent of the timestep and the spatial mesh spacing, so that $\lambda$ can be chosen without regard to the grid and the timestep sizes.

We follow the lines of Theorem 4.1 in Forsyth and Vetzal (2002), which essentially gives sufficient conditions that allow us to bound (4.2). For the finite difference methods, these sufficient conditions are:

(i) that the matrix $M^j$ arising from discretizing the differential operator $\mathcal{L}^j$ is an M-matrix, ie, a matrix with nonpositive off-diagonals, and nonsingular with a nonnegative inverse; and

(ii) that $1 - (1-\theta)\,\Delta\tau_j\,(-(M^j)_{i,i-1} - (M^j)_{i,i+1} + r) \ge 0$, where $(M^j)_{i,i-1}$ and $(M^j)_{i,i+1}$ are as given in (3.4).

Note that condition (ii) arises since we require that $(I - (1-\theta)\,\Delta\tau_j M^{j-1})\,\bar{\mathbf V}^{j-1}$ is bounded (see Appendix A of Forsyth and Vetzal (2002) for a similar proof in the context of finite volume methods). When a fully implicit method is used, condition (ii) is trivially satisfied, but when Crank–Nicolson timestepping is used, this condition essentially requires the boundedness of $\Delta\tau_j / (\min_i h_i^j)^2$. In our experiments, this boundedness condition is not always satisfied. However, we observe that, as long as the Crank–Nicolson method is preceded by a finite number of fully implicit steps (using Rannacher smoothing (Rannacher (1984))), (4.2) is bounded independently of $\lambda$, and $\lambda$ can be chosen without regard to the timestep and mesh spacing. Similar observations were also reported in Forsyth and Vetzal (2002), where, in fact, an open question is posed as to whether condition (ii) can be removed if the Crank–Nicolson timestepping is preceded by a finite number of fully implicit steps.

We now consider condition (i). For the finite difference methods, it is more convenient to study the properties of the matrix $M^j$ using the following sufficient conditions for the M-matrix structure: strict diagonal dominance with positive diagonal and nonpositive off-diagonal entries (Hackbusch (1993)). Note that if the matrix $M^j$ satisfies these sufficient conditions, then so does the matrix $I + \theta\,\Delta\tau_j M^j$, taking into account that $\theta$ and $\Delta\tau_j$ are positive.

Lemma 4.1 Assume that the partition $\Delta^j \equiv \{S_i^j\}_{i=0}^n$ satisfies the conditions

$$
h_i^j \le \frac{\sigma^2 S_{i-1}^j}{r}, \quad i = 3, 4, \dots, n \tag{4.3}
$$

on the spatial step sizes, where $h_i^j = S_i^j - S_{i-1}^j$, $i = 3, 4, \dots, n$. Then, the matrix $M^j$ defined in (3.4) is a strictly diagonally dominant matrix with positive diagonal and nonpositive off-diagonal entries.

Proof The explicit formula for the superdiagonal entries of $M^j$ is

$$
-\frac{\sigma^2 (S_i^j)^2}{h_{i+1}^j (h_i^j + h_{i+1}^j)} - \frac{r S_i^j h_i^j}{h_{i+1}^j (h_i^j + h_{i+1}^j)}, \quad i = 1, 2, \dots, n-2
$$

and hence the superdiagonal elements are always nonpositive. The subdiagonal entries of $M^j$ are

$$
-\frac{\sigma^2 (S_i^j)^2}{h_i^j (h_i^j + h_{i+1}^j)} + \frac{r S_i^j h_{i+1}^j}{h_i^j (h_i^j + h_{i+1}^j)}, \quad i = 2, 3, \dots, n-1
$$

Under the given condition (4.3) on the spatial step length, the subdiagonal entries are nonpositive. Thus, under (4.3), all rows have nonpositive off-diagonals. Also, all but the first and last rows of $M^j$ have row sums equal to the positive interest rate $r > 0$; thus, these rows are strictly diagonally dominant, with positive diagonal elements. Taking into account that $S_1^j = h_1^j$, the first row has elements

$$
M_{1,1}^j = -\tfrac12\sigma^2 (S_1^j)^2 \beta_{21}^j - r S_1^j \alpha_{21}^j + r = (\sigma^2 + r)\,\frac{h_1^j}{h_2^j}, \qquad
M_{1,2}^j = -\tfrac12\sigma^2 (S_1^j)^2 \beta_{31}^j - r S_1^j \alpha_{31}^j = -(\sigma^2 + r)\,\frac{(h_1^j)^2}{h_2^j (h_1^j + h_2^j)}
$$

from which we obtain $M_{1,1}^j > 0$, $M_{1,2}^j < 0$ and $|M_{1,1}^j| > |M_{1,2}^j|$. Similarly, for the last row, we have $M_{n-1,n-2}^j < 0$ under condition (4.3), and the row sum is greater than r. Thus, we also have $M_{n-1,n-1}^j > 0$ and $|M_{n-1,n-1}^j| > |M_{n-1,n-2}^j|$. This concludes the proof. ∎

For QSC methods, we have not been able to obtain a rigorous proof of the boundedness of the term

$$
[P^j(\mathbf c^{*,j} - \mathbf c^j)]_i \tag{4.4}
$$

However, as a numerical test, we monitor the quantity

$$
\max_{i,j} \frac{\max[0,\ V^{*,j}(S) - \bar V^j(S)]}{\max(1,\ V^{*,j}(S))}, \quad S = D_i^j \in \mathcal{D}^j \tag{4.5}
$$

which is a measure of the maximum relative error associated with enforcing the American constraint using the penalty method. This quantity will be small if the quantity (4.4) is bounded and if $\lambda$ is sufficiently large. During experiments we noted that, as long as the Rannacher smoothing technique (Rannacher (1984)) is used, the a posteriori error quantity (4.5) is indeed of the order of tol. In Section 6 we report selected statistics of this measure for all the experiments that we run.
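The a posteriori measure (4.5) is straightforward to monitor; a per-timestep version can be sketched as follows (our own helper, with names of our choosing):

```python
import numpy as np

def penalty_error_measure(V_bar, V_star):
    # Relative violation of the American constraint at the grid or
    # collocation points, as in (4.5); the maximum of this quantity over
    # timesteps should be of the order tol = 1/lambda.
    violation = np.maximum(0.0, V_star - V_bar)
    return float(np.max(violation / np.maximum(1.0, V_star)))
```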


4.2 Convergence of the penalty iteration

The convergence study of the penalty iteration in Theorem 6.1 of Forsyth and Vetzal (2002) essentially consists of the following three results:

(i) The nonlinear penalty iteration converges to the unique solution of the discretized equation for any initial iterate.

(ii) The iterates converge monotonically.

(iii) The iteration has finite termination.

For finite difference methods, the proof of convergence of the penalty iteration is based on two conditions: firstly, that the matrix $I + \theta\,\Delta\tau_j M^j + \bar P^{j,(k)}$ is nonsingular and, secondly, that the inverse of the matrix $I + \theta\,\Delta\tau_j M^j + \bar P^{j,(k)}$ is nonnegative, where $\bar P^{j,(k)}$ is the kth penalty matrix constructed at the jth timestep for finite difference methods using (3.5).

Under the sufficient condition (4.3) for the matrix $M^j$ to be an M-matrix, both the first and the second conditions are satisfied, since $I + \theta\,\Delta\tau_j M^j + \bar P^{j,(k)}$ is then a diagonally dominant M-matrix.
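The second condition is easy to probe numerically for small systems (our own helper; for an M-matrix the inverse is elementwise nonnegative, which is what the convergence argument uses):

```python
import numpy as np

def has_nonneg_inverse(A, tol=1e-12):
    # Check that A is nonsingular with an elementwise nonnegative inverse,
    # as holds for (diagonally dominant) M-matrices.
    return bool(np.all(np.linalg.inv(A) >= -tol))

# usage: a tridiagonal M-matrix passes, while a matrix whose inverse has
# mixed signs does not
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])
```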

For QSC methods, since the unknowns are the degrees of freedom, we consider an equivalent transformed discretized problem with the unknowns being values instead of degrees of freedom. To this end, instead of (3.8), we consider the transformed problem

$$
(I + \theta\,\Delta\tau_j Q^j (Q_0^j)^{-1} + \bar{\bar P}^j)\,\bar{\bar{\mathbf V}}{}^j
= (I - (1-\theta)\,\Delta\tau_j Q^{j-1}(Q_0^{j-1})^{-1} + \theta\,\Delta\tau_j Q_P^{j-1}(Q_0^{j-1})^{-1})\,\bar{\bar{\mathbf V}}{}^{j-1} + \bar{\bar P}^j \bar{\bar{\mathbf V}}{}^{*,j} + \bar{\bar{\mathbf g}}^j
\tag{4.6}
$$

taking into account (3.9) and (3.10). In (4.6), $\bar{\bar{\mathbf V}}{}^j$ and $\bar{\bar{\mathbf V}}{}^{*,j}$ are vectors of option values and payoff values, respectively, on $\mathcal{D}^j \cup \partial\Omega$, and $\bar{\bar{\mathbf g}}^j = \mathbf g^j (Q_0^j)^{-1}$.

Similarly to the convergence proof in the finite difference case, two conditions must be satisfied at the kth iteration of the jth timestep. Firstly, the matrix

$$
I + \theta\,\Delta\tau_j Q^j (Q_0^j)^{-1} + \bar{\bar P}^{j,(k)}
$$

should be nonsingular, ie, Equation (4.6) should have a unique solution; secondly, the inverse of $I + \theta\,\Delta\tau_j Q^j (Q_0^j)^{-1} + \bar{\bar P}^{j,(k)}$ should be nonnegative, where $\bar{\bar P}^{j,(k)}$ is the kth penalty matrix constructed at the jth timestep for QSC methods using (3.10).

Consider the matrix $Q^j (Q_0^j)^{-1}$. Unfortunately, since, in general, this is a dense matrix with alternating signs in its off-diagonal entries, it cannot be an M-matrix, and hence we cannot use the same technique that was employed in the convergence proof for the finite difference methods. Rather, we study numerically whether the matrix $I + \theta\,\Delta\tau_j Q^j (Q_0^j)^{-1} + \bar{\bar P}^{j,(k)}$ satisfies the two conditions. Our numerical results show that, at each timestep and for all grid sizes considered, $I + \theta\,\Delta\tau_j Q^j (Q_0^j)^{-1} + \bar{\bar P}^{j,(k)}$ is nonsingular and the entries of its inverse are nonnegative within a tolerance of size $\text{tol} \approx 1/\lambda$. It is also worth noting that the inverse of $Q^j (Q_0^j)^{-1}$ can be proved to be positive.

As shown by the numerical results, the fact that $Q^j (Q_0^j)^{-1}$ does not satisfy the sufficient condition of being an M-matrix does not seem to have ill effects on the fast convergence of the penalty methods applied to QSC methods. Related observations were reported in Forsyth and Vetzal (2002), where a conjecture was made that the penalty iteration converges rapidly under much weaker conditions than the sufficient condition that the discretized differential operator is an M-matrix.

5 ADAPTIVE MESH METHODS

To construct the adaptive grids, we use monitor functions and respective grading functions, and the error equidistribution principle (Carey and Dinh (1985) and De Boor (1974)). According to the error equidistribution principle, the partition points are distributed in such a way that the error in some chosen norm (or seminorm) is equidistributed among the subintervals of the partition. Depending on the norm chosen, a different monitor function and a respective grading function arise. Generally, a grading function has the form

$$
\gamma(S, \tau) = \frac{\int_0^S \tilde V \,\mathrm{d}S}{\int_0^{S_\infty} \tilde V \,\mathrm{d}S}
$$

for some appropriate monitor function $\tilde V(S, \tau)$. In this formula, the value $\gamma(S, \tau)$ at S represents the portion of the approximate error at time $\tau$ from the left endpoint of the spatial domain up to the point S. To approximate the value of a grading function, the integrals involved in the formula are approximated using appropriate quadrature rules. Usually, the monitor functions involve high-order derivatives of V, which, in a practical situation, are not known. Therefore, approximate values are used to obtain the respective approximate values of the grading functions.

According to De Boor (1974), for a discretization method with error proportional to $h^p V^{(q)}$, where $h$ is a spatial step size and $V^{(q)}$ is the $q$th derivative of $V$ with respect to $S$, a good grading function is:
$$\phi(S, \tau) = \frac{\int_0^S |V^{(q)}|^{1/p} \, dS}{\int_0^{S_\infty} |V^{(q)}|^{1/p} \, dS}$$

For different spatial discretization methods we may obtain different grading functions. We first consider the monitor function for the finite difference method. Ignoring higher-order terms, the truncation errors of the finite difference approximations to the first and second spatial derivatives can be bounded in terms of $\max_i (h_i^j)^2$ and $(h_{i+1}^j - h_i^j) + \max_i (h_i^j)^2$, respectively, which means that the finite difference method is formally first order. However, through numerical experiments, we found that $h_{i+1}^j - h_i^j$ is small enough that the error is dominated by $\max_i (h_i^j)^2$. Then the (spatial) discretization error of $\mathcal{L}V$ in the finite difference method is considered second order with respect to the step sizes, and involves the third derivative $V^{(3)}$, resulting in the grading function:
$$\phi_f(S, \tau) = \frac{\int_0^S |V^{(3)}|^{2/4} \, dS}{\int_0^{S_\infty} |V^{(3)}|^{2/4} \, dS}$$

For the QSC methods, we take $q = 3$ (as the error formula for the interpolant suggests) and $p = 3$ (the global order; see Christara and Ng (2006b)), resulting in the grading function:
$$\phi_q(S, \tau) = \frac{\int_0^S |V^{(3)}|^{2/6} \, dS}{\int_0^{S_\infty} |V^{(3)}|^{2/6} \, dS}$$

Given a grading function $\phi(S, \tau_j)$ for a fixed time $\tau_j$ and a fixed number of subintervals $n$, the adaptive algorithm computes points $S_i^j$, $i = 0, \dots, n$, with $\phi(S_0^j, \tau_j) \equiv \phi(0, \tau_j) = 0$ and $\phi(S_n^j, \tau_j) \equiv \phi(S_\infty, \tau_j) = 1$, such that $\phi(S_i^j, \tau_j) - \phi(S_{i-1}^j, \tau_j) \approx 1/n$, $i = 1, \dots, n$, or, equivalently, $\phi(S_i^j, \tau_j) \approx i/n$. To do this, we apply an iterative scheme based on Newton's method:
$$S_i^{j,(k+1)} = S_i^{j,(k)} - \frac{\phi(S_i^{j,(k)}) - i/n}{\phi'(S_i^{j,(k)})} \approx S_i^{j,(k)} - \frac{Q\Bigl(\int_0^{S_i^{j,(k)}} \tilde V^{j,(k)} \, dS\Bigr) - (i/n)\, Q\Bigl(\int_0^{S_\infty} \tilde V^{j,(k)} \, dS\Bigr)}{\tilde V_i^{j,(k)}} \qquad (5.1)$$
where $\tilde V_i^{j,(k)}$ denotes the approximate value of the monitor function evaluated at $S_i^{j,(k)}$, and where $Q(\cdot)$ is a quadrature rule approximation to an integral. Several quadrature rules may be used but, in our experiments, we used the trapezoidal rule for the finite difference method and the midpoint rule for the QSC method.

The fact that the midpoints are points of high accuracy and exhibit no discontinuities for the QSC method explains the decision to use the midpoint rule. For the finite difference method, since the grid-point values are computed, the trapezoidal rule is a natural choice. Furthermore, we found that the variation between these quadrature rules has a negligible effect on the final results.

In our experiments, we applied only one iteration of (5.1). That is, at most one redistribution of the spatial points takes place in a timestep, and thus the placement of the spatial points evolves as the timesteps proceed. Experiments show that this choice works well for the American option pricing problem and is attractive due to its efficiency.
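One redistribution step of (5.1) can be sketched as follows, assuming monitor values are already available at the current points. All names are illustrative, and a sort-and-clip guard is added because a single Newton step can overshoot for strongly nonuniform monitors:

```python
import numpy as np

def redistribute_once(S, Vt):
    """One Newton step (5.1) toward equidistributing the monitor Vt over S."""
    n = len(S) - 1
    seg = 0.5 * (Vt[1:] + Vt[:-1]) * np.diff(S)     # trapezoidal Q(.) per subinterval
    cum = np.concatenate(([0.0], np.cumsum(seg)))   # Q(int_0^{S_i} Vt dS)
    total = cum[-1]                                 # Q(int_0^{S_inf} Vt dS)
    Vt_safe = np.maximum(Vt, 1e-12)                 # guard against a vanishing monitor
    S_new = S.astype(float).copy()
    for i in range(1, n):                           # boundary points stay fixed
        S_new[i] = S[i] - (cum[i] - (i / n) * total) / Vt_safe[i]
    # a single Newton step can overshoot; restore ordering and the domain
    return np.clip(np.sort(S_new), S[0], S[-1])

S = np.linspace(0.0, 200.0, 21)
Vt = 1.0 + 5.0 * np.exp(-((S - 100.0) / 25.0) ** 2)   # monitor peaked near S = 100
S1 = redistribute_once(S, Vt)
```

After one step, points drift toward the region where the monitor is large, which is the behavior the adaptive algorithm relies on.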


To decide whether a redistribution of the spatial points is needed at a certain timestep, we use the criterion:
$$r_{\text{drift}} \equiv \frac{\max_i \{r_i^j\}}{r^j} \le \alpha, \quad \text{where } r_i^j = \int_{S_{i-1}^j}^{S_i^j} \tilde V \, dS, \ i = 1, \dots, n, \quad r^j = \frac{\int_0^{S_\infty} \tilde V \, dS}{n} \qquad (5.2)$$
The ratio $r_{\text{drift}}$ gives an indication of how well distributed the partition is. If this ratio is too large, the maximum error estimate over all subintervals is considerably larger than the average estimate, and the current partition is therefore not well distributed. That is, for a partition to be well distributed, the maximum value of $r_i^j$ must be at most roughly $\alpha$ times as large as the average value $r^j$. Typical choices for $\alpha$ are small numbers, such as $\alpha = 2$ (Wang et al (2004)). We have used $\alpha = 5$ in all our experiments, and the results show that this criterion works well for American option pricing.
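The test (5.2) is straightforward to implement; a sketch, again with the integrals approximated by the trapezoidal rule and all names illustrative:

```python
import numpy as np

def needs_remesh(S, Vt, alpha=5.0):
    """Return True when r_drift = max_i r_i / r_bar exceeds alpha."""
    r = 0.5 * (Vt[1:] + Vt[:-1]) * np.diff(S)   # r_i ~ integral of Vt over subinterval i
    r_bar = r.sum() / len(r)                    # average over the n subintervals
    return bool(r.max() / r_bar > alpha)

S = np.linspace(0.0, 200.0, 21)
flat = np.ones_like(S)                                    # already equidistributed
peaked = 1.0 + 100.0 * np.exp(-((S - 100.0) / 5.0) ** 2)  # mass concentrated at S = 100
```

A flat monitor leaves the uniform grid well distributed, while a sharply peaked one triggers a redistribution.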

Next we discuss in more detail how our adaptive mesh techniques work. For the purpose of our discussion, denote by $\bar V^j_{\Delta^l}$ the approximate solution on the partition $\Delta^l$ at timestep $j$. A generic algorithm for timestepping from time $\tau_{j-1}$ to time $\tau_j$ using an adaptive mesh technique is summarized below. Note that we always start with a uniform grid, pretending that we do not know how the solution behaves. Subsequent partitions are fully determined by the adaptive technique.

5.0.1 An adaptive algorithm for timestepping from $\tau_{j-1}$ to $\tau_j$

(1) let $\Delta^j = \Delta^{j-1}$;
(2) compute $\bar V^j_{\Delta^j}$ by solving either (3.3) or (3.8) with a penalty iteration;
(3) if (5.2) is satisfied, then
(4) proceed to line (14);
(5) else
(6) apply (5.1) (one iteration) to obtain a new $\Delta^j$
(for QSC, some adjustment to $\Delta^j$ is made, as described in Remark 5.1);
(7) if $j < \beta$, then
(8) interpolate $\bar V^{j-1}_{\Delta^{j-1}}$ to obtain $\bar V^{j-1}_{\Delta^j}$;
(9) compute $\bar V^j_{\Delta^j}$ by solving either (3.3) or (3.8) with a penalty iteration;
(10) else
(11) interpolate $\bar V^j_{\Delta^{j-1}}$ to obtain $\bar V^j_{\Delta^j}$;
(12) end if
(13) end if
(14) proceed to timestep $j + 1$.

We now briefly describe the algorithm. In lines (1) and (2) we apply a timestepping method, usually Crank–Nicolson, with the exception of the Rannacher smoothing technique for the first few timesteps, using the same spatial points for time $\tau_j$ as for $\tau_{j-1}$. This computes approximate values $\bar V^j_{\Delta^j} \equiv \bar V^j_{\Delta^{j-1}}$. We calculate all needed quantities and check the criterion (5.2). If the points are well distributed, we proceed to the next timestep (lines (3) and (4)). If not, the new locations of the spatial points $\Delta^j$ are computed using (5.1) (lines (5) and (6)); see Remark 5.1 about an adjustment to $\Delta^j$ for QSC. Next we need to calculate values of the approximation at the new spatial points at time $\tau_j$, ie, $\bar V^j_{\Delta^j}$. There are two ways to do this. The first is to interpolate $\bar V^{j-1}_{\Delta^{j-1}}$ from the old partition $\Delta^{j-1}$ to the new partition $\Delta^j$ to obtain $\bar V^{j-1}_{\Delta^j}$, and then apply the same timestepping procedure that was previously used at line (2) to compute values of the approximation at the new partition points, ie, $\bar V^j_{\Delta^j}$. The second way is simply to interpolate $\bar V^j_{\Delta^{j-1}}$ from $\Delta^{j-1}$ to $\Delta^j$ to obtain $\bar V^j_{\Delta^j}$. The first technique is used in the first few ($\beta$) timesteps (lines (7)–(9)), while the second is used in all subsequent timesteps in which a remeshing is invoked (lines (10) and (11)). Note that using the first technique for all timesteps is only desirable for functions that evolve quickly with time. For functions that evolve slowly with time, such as the value function of an American option, this approach is unnecessarily inefficient. During the experiments, we observed that:

(i) remeshings are always required for the first few timesteps, due to the discontinuity of the initial data; and

(ii) using the first technique for these initial timesteps is crucial for the accuracy of the numerical methods.

In the experiments, $\beta$ is equal to 4. We elaborate on several fine points of Algorithm 5.0.1 in the form of remarks.

Remark 5.1 We give some details about how the new partition $\Delta^j$ computed at line (6) of the algorithm is adjusted for QSC. As mentioned earlier, the space of quadratic splines with respect to a partition with $n$ subintervals has dimension $n + 2$. For a uniform grid, the natural choice of collocation points is the set of midpoints together with the two boundary points. However, for a nonuniform partition, it is not obvious how these points can be chosen so that the optimal convergence of the resulting methods is preserved. We follow the technique used in Christara and Ng (2006a) to construct a nonuniform grid and a set of collocation points for QSC via a certain mapping function.

Denote by $\Delta \equiv \{S_i = ih, \ i = 0, \dots, n, \ h = S_\infty/n\}$ the uniform partition of $\bar\Omega$ with $n$ subintervals. Assume that we are timestepping from $\tau_{j-1}$ to $\tau_j$ and that (5.1) has been applied at line (6) of Algorithm 5.0.1 to give a nonuniform partition $\Delta^j = \{S_i^j\}_{i=0}^n$. For adaptive QSC methods, we define a mapping $\xi^j : \bar\Omega \to \bar\Omega$, with $\xi^j$ a bijective, strictly increasing function, such that:
$$\xi^j(0) = 0, \qquad \xi^j(S_\infty) = S_\infty, \qquad \xi^j\bigl(\tfrac{1}{2}(S_{i-1} + S_i)\bigr) = \tfrac{1}{2}(S_{i-1}^j + S_i^j), \quad i = 1, \dots, n \qquad (5.3)$$
We then redefine $\Delta^j$ and define $\mathcal{D}^j$ by:
$$\Delta^j \equiv \{S_i^j = \xi^j(S_i), \ i = 0, 1, \dots, n\}, \qquad \mathcal{D}^j \equiv \{D_i^j = \xi^j(\tfrac{1}{2}(S_{i-1} + S_i)), \ i = 1, \dots, n\} \qquad (5.4)$$
The above adjustment of $\Delta^j$ was used in Christara and Ng (2006a) in the context of boundary value problems, and improved the accuracy of the results. We therefore adopt it here as well.

The mapping $\xi^j$ is generated using the algorithm for monotone piecewise cubic interpolation of Fritsch and Carlson (1990).

It is important to point out that the quadratic spline basis functions are defined with respect to the adjusted $\Delta^j$. With this discussion, for the adaptive mesh QSC methods, line (6) of Algorithm 5.0.1 can be broken down into the following substeps:

(6.1) apply (5.1) (one iteration) to obtain $\Delta^j \equiv \{S_i^j\}_{i=0}^n$;
(6.2) construct the mapping $\xi^j : \bar\Omega \to \bar\Omega$ using (5.3);
(6.3) adjust $\Delta^j$ and define $\mathcal{D}^j$ using (5.4).
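These substeps can be sketched as follows, with scipy's `PchipInterpolator` standing in for the Fritsch–Carlson monotone interpolation algorithm and a hypothetical adapted partition:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def qsc_adjust(S_uniform, S_adapted):
    """Build xi^j from (5.3) by monotone cubic interpolation and return the
    adjusted partition and collocation points of (5.4)."""
    S_inf = S_uniform[-1]
    mid_u = 0.5 * (S_uniform[:-1] + S_uniform[1:])     # uniform midpoints
    mid_a = 0.5 * (S_adapted[:-1] + S_adapted[1:])     # adapted midpoints
    x = np.concatenate(([0.0], mid_u, [S_inf]))        # xi maps endpoints to themselves
    y = np.concatenate(([0.0], mid_a, [S_inf]))        # and uniform mids to adapted mids
    xi = PchipInterpolator(x, y)                       # monotone, hence bijective here
    return xi(S_uniform), xi(mid_u)                    # Delta^j and D^j of (5.4)

S_u = np.linspace(0.0, 200.0, 11)
S_a = np.array([0.0, 60.0, 85.0, 92.0, 96.0, 100.0, 104.0, 108.0, 115.0, 140.0, 200.0])
Delta, D = qsc_adjust(S_u, S_a)
```

By construction, the collocation points `D` coincide with the adapted midpoints, while the partition `Delta` is the image of the uniform grid under the monotone mapping.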

Remark 5.2 In Algorithm 5.0.1, interpolation is needed at lines (8) and (11). For adaptive mesh finite difference methods, interpolation at these steps takes place on the option values, while, for adaptive mesh QSC methods, interpolation takes place on other quantities, as will be explained later. We now explain in detail how interpolation at these steps of the algorithm is done.

First, consider adaptive mesh finite difference methods. When $j = 1$, ie, when timestepping from $\tau_0 = 0$ to $\tau_1$, we can take advantage of the initial condition, and hence no interpolation is needed at line (8). When $j > 1$ and a remeshing is required, the interpolation needs to be tailored properly to ensure that certain properties of the problem related to the free boundary are not violated. For an American put, at each timestep the free boundary point separates the spatial domain into the stopping region, where the option value is equal to the payoff, and the continuation region, where the option value is greater than the payoff. Since, for adaptive finite difference methods, interpolation takes place on the option values, the interpolated option values must have the following properties:

(i) they must be equal (within some tolerance) to the payoff in the stopping region;

(ii) they must be larger than the payoff, and decreasing, in the continuation region.

Several possible choices of interpolant include cubic spline interpolation and piecewise cubic Hermite interpolation. It is known that piecewise cubic Hermite interpolation oscillates less, but is also less accurate, than cubic spline interpolation. We chose cubic spline interpolation to obtain higher accuracy. With cubic spline interpolation, we observed that (ii) is always satisfied; however, (i) is not always met.

To resolve this problem, we adjust the interpolated values to the left of the free boundary point so that they are equal to the payoff. It should be noted that the free boundary point at each timestep is approximated using the partition points and option values available, namely $\{S_i^{j-1}\}_{i=0}^n$ and $\bar V^{j-1}_{\Delta^{j-1}}$ for line (8), or $\{S_i^{j-1}\}_{i=0}^n$ and $\bar V^{j}_{\Delta^{j-1}}$ for line (11).
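This clamping can be sketched as follows for an American put; the option values and the simple last-point-on-the-payoff estimate of the free boundary are illustrative, not output of the actual solver:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interp_with_clamp(S_old, V_old, S_new, E, tol=1e-8):
    """Cubic spline interpolation to S_new, with values left of the estimated
    free boundary reset to the payoff so that property (i) holds."""
    payoff_old = np.maximum(E - S_old, 0.0)
    # estimate the free boundary as the last old point lying on the payoff
    in_stop = np.where(np.abs(V_old - payoff_old) <= tol)[0]
    Sf = S_old[in_stop[-1]] if in_stop.size else S_old[0]
    V_new = CubicSpline(S_old, V_old)(S_new)
    payoff_new = np.maximum(E - S_new, 0.0)
    V_new[S_new <= Sf] = payoff_new[S_new <= Sf]   # clamp the stopping region
    return np.maximum(V_new, payoff_new)           # enforce V >= payoff everywhere

E = 100.0
S_old = np.linspace(0.0, 200.0, 41)
# toy values: equal to the payoff for S <= 80, smooth and above it elsewhere
V_old = np.maximum(E - S_old, 0.0) + np.where(
    S_old > 80.0, 5.0 * np.exp(-((S_old - 100.0) / 30.0) ** 2), 0.0)
S_new = np.linspace(0.0, 200.0, 57)
V_new = interp_with_clamp(S_old, V_old, S_new, E)
```

The clamp removes any spline overshoot in the stopping region while leaving the continuation region untouched.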

For adaptive mesh QSC methods, it is important to point out that interpolation at lines (8) and (11) takes place neither on the option values nor on the degrees of freedom corresponding to $\bar V^{j-1}_{\Delta^{j-1}}$ and $\bar V^{j}_{\Delta^{j-1}}$, but on the values of:
$$\bigl(I + (1-\theta) h_j L^{j-1} + h_j P L^{j-1}\bigr) \bar V^{j-1}_{\Delta^{j-1}} \quad \text{or} \quad \bigl(I + (1-\theta) h_j L^{j} + h_j P L^{j}\bigr) \bar V^{j}_{\Delta^{j-1}}$$
We observe that standard cubic spline interpolation worked well in this case, and no specific tailoring was needed.

Remark 5.3 In addition to the Rannacher smoothing technique (Rannacher (1984)), we also adopt another smoothing technique, suggested in Pooley et al (2003): at each timestep, for finite difference methods, we choose to position a grid point at the strike $E$ (the initial kink point). We extend this technique to QSC methods by positioning a collocation point at the strike $E$. A combination of Rannacher smoothing and this technique helps to preclude large oscillations in the estimates of the hedging parameters (Forsyth and Vetzal (2002) and Pooley et al (2003)). In addition, when we are interested in the option value and its hedging parameters at $S = E$, this technique has the benefit of avoiding interpolation in the case of a finite difference method, and of producing more precise option values and hedging parameters when a QSC method is used, since collocation points are points of high accuracy.


However, in the case of adaptive methods, the locations of the grid points or collocation points are computed dynamically by the adaptive technique, and thus the strike $E$ may fall between grid points or collocation points. In these cases we need to adjust the grid so that the strike $E$ falls at a grid point (finite difference) or a collocation point (QSC). To apply this adjustment, we use the observation that the option values behave linearly in the area toward the left boundary of the domain, ie, in the stopping region, and therefore few points are needed in that region. Thus, we propose to move one grid point from this area so that it lines up with the strike price (if a finite difference method is used), or so that the strike becomes a collocation point (if a QSC method is used). More specifically, for finite difference methods, $\Delta^j$ contains $E$ as a grid point, while, for QSC methods, $E$ is one of the midpoints of $\Delta^j$ before the adjustment (5.4); then, under the mapping (5.3), the set of collocation points $\mathcal{D}^j$ contains $E$.
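The grid adjustment for the finite difference case can be sketched as follows; which interior point near the left boundary is sacrificed is an illustrative choice:

```python
import numpy as np

def snap_to_strike(S, E):
    """Make E a grid point by trading away one point near the left boundary,
    where the option values are nearly linear and few points are needed."""
    S = np.asarray(S, dtype=float)
    if np.any(np.isclose(S, E)):
        return S                       # E already on the grid
    drop = 1                           # sacrifice an interior point near the left boundary
    S = np.delete(S, drop)
    return np.sort(np.append(S, E))

S = np.linspace(0.0, 200.0, 21)        # strike E = 95 falls between grid points
S2 = snap_to_strike(S, 95.0)
```

The number of grid points is unchanged, and the strike is now a grid point.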

It is important to note that, although it would also be desirable to have a partition with a grid point (for finite difference methods) or a collocation point (for QSC methods) at the free boundary, it is impossible to construct a partition with that property, since we do not know beforehand the exact location of the free boundary at each timestep. This issue arises with any method, adaptive or not. However, adaptive methods can partly resolve it: as shown by the numerical results, the adaptive technique concentrates many points around the free boundary. As a result, the approximation of the free boundary at each timestep is highly accurate. Note that the free boundary locations approximated by adaptive mesh QSC methods are based on the set of collocation points, rather than on the set of grid points as in the case of adaptive mesh finite difference methods.

Remark 5.4 We now discuss whether the $M$-matrix step size restriction (4.3) for finite difference methods could interfere with the step sizes chosen by the adaptive technique. We emphasize that condition (4.3) does not impose any restriction on $h_2^j$ and $h_1^j$, since the first row of $M^j$ does not have a subdiagonal element and the superdiagonal elements are (unconditionally) nonpositive. For the remaining step sizes, it is possible that the new partition $\Delta^j$ constructed using (5.1) may not satisfy condition (4.3). If we wish to enforce condition (4.3), we can do so by monitoring whether (4.3) holds while computing the points $S_i^j$, $i = 1, \dots, n-1$, using (5.1), and adjusting the points accordingly if needed. For example, if a point $S_i^j$ computed by (5.1) is such that $h_i^j$ violates condition (4.3), that is, if $h_i^j > \sigma^2 S_{i-1}^j / r$, we can set $h_i^j = \sigma^2 S_{i-1}^j / r$ and $S_i^j = S_{i-1}^j + h_i^j$. This means that we may need to introduce some extra spatial points.

During the experiments, we monitored carefully whether condition (4.3) holds, and noticed that, for all the cases we ran, the points generated by the adaptive finite difference procedure never violated this condition. Hence, we never needed to introduce more spatial points.
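A sketch of such a safeguard, under the assumption that condition (4.3) amounts to the bound $h_i \le \sigma^2 S_{i-1} / r$ at interior points (the first steps being unrestricted); all names are illustrative:

```python
import numpy as np

def enforce_step_bound(S, sigma, r):
    """Accept the proposed points S, splitting any step that exceeds the
    (assumed) M-matrix bound h <= sigma^2 * S_left / r by inserting extra points."""
    out = [S[0]]
    for s in S[1:]:
        # split the step until it satisfies the bound at its left endpoint;
        # steps starting at S = 0 are exempt, matching the unrestricted first steps
        while out[-1] > 0 and s - out[-1] > sigma ** 2 * out[-1] / r:
            out.append(out[-1] + sigma ** 2 * out[-1] / r)
        out.append(s)
    return np.array(out)

S = np.array([0.0, 1.0, 50.0, 120.0, 200.0])   # proposed partition with a large early step
S_ok = enforce_step_bound(S, sigma=0.8, r=0.05)
```

All proposed points are retained, and extra points appear only where a proposed step was too large relative to the bound.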
