
$\ell_1$-Penalized Likelihood Smoothing of Volatility Processes allowing for Abrupt Changes

David Neto$^{(a)}$, Sylvain Sardy$^{(b)}$ and Paul Tseng$^{(c)}$

April 2009

Abstract

We consider the problem of estimating the volatility of a financial asset from a time series record of length $T$. We believe the underlying volatility process is smooth, possibly stationary, and with potential abrupt changes due to market news. By drawing parallels between time series and regression models, in particular between stochastic volatility models and Markov random field smoothers, we propose a semiparametric estimator of volatility. Our Bayesian posterior mode estimate is the solution to an $\ell_1$-penalized likelihood optimization that we solve with an interior point algorithm that is efficient, since its complexity is bounded by $O(T^{3/2})$. We apply our volatility estimator to real financial data, diagnose the model, and perform back-testing to investigate the forecasting power of the method by comparison to (I)GARCH.

Keywords: $\ell_1$ penalty, Markov random field, stochastic volatility model, smoothing, wavelet, extreme value theory, forecasting, GARCH.

JEL classification: C14, C22, C61.

(a) Department of Econometrics, University of Geneva, CH-1211 Geneva 4, Switzerland. E-mail: david.neto@unige.ch.

(b) Department of Mathematics, University of Geneva, CH-1211 Geneva 4, Switzerland. E-mail: Sylvain.Sardy@unige.ch.

(c) Department of Mathematics, University of Washington, Seattle, WA. E-mail: tseng@math.washington.edu.


1 Introduction

Suppose we observe values $y_t$ of a financial asset (stock returns, interest rates or exchange rates) at regularly spaced times $t = 1, \ldots, T$. Our goal is to estimate an important intrinsic characteristic of the financial asset of interest, the evolution of the conditional variance of the stochastic return process, so as to assess the past, present and future risk of the asset. To that aim, many stochastic models have been proposed. Standard among them is to assume a data generating process for $y_t$. Most models assume $y_t = \sigma_t \epsilon_t$, where $\sigma_t$ is a measure of volatility and $\epsilon_t$ is standard Gaussian white noise. Without additional assumptions, the maximum likelihood estimate is $\hat\sigma_t^{\mathrm{MLE}} = |y_t|$, which is practically useless due to its nonsmoothness (i.e., high variability).

A temporal structure for $\sigma_t$ is assumed to regularize the maximum likelihood estimation, so as to obtain a smoother estimate of $\sigma_t$ while capturing the stylized features (Rydberg 2000) observed in financial econometrics, like heavy tails of the marginal distributions, volatility clustering, or evolution with possible abrupt changes (e.g., peaks, discontinuities). The popular GARCH-type models (Engle 1982, Bollerslev 1986) are parametric and enjoy good estimation properties. Stochastic volatility models (Taylor 1986, Taylor 1994) are powerful semiparametric alternatives which fit the stylized features more flexibly. In particular, the log-normal stochastic volatility model seems to capture the leptokurticity of the marginal distributions of financial data better than the standard GARCH model (Shephard 1996). Stochastic volatility models are the empirical discrete-time analogs of continuous-time models in finance theory and, in particular, option pricing (Hull and White 1987). They can also be viewed as an Euler discretization of a diffusion. The general discrete model of Andersen (1994) includes GARCH and stochastic volatility models as particular cases. It defines the discrete polynomial stochastic autoregressive volatility process as:

$$y_t = \sigma_t \epsilon_t, \qquad (1)$$
$$\sigma_t^q = \varphi(h_t), \qquad (2)$$
$$h_t = \omega + \phi h_{t-1} + (\gamma + \psi h_{t-1})\eta_t, \qquad (3)$$

where $q \in \{1, 2\}$, $\varphi(\cdot)$ is a positive, continuous and strictly monotone function, the autoregressive process $h_t$ has positive coefficients $\psi, \phi, \gamma$ such that $\psi + \phi > 0$, and the errors $\eta_t$ and $\epsilon_t$ are i.i.d. and mutually independent with mean-variance pairs $(1, \sigma_\eta^2)$ and $(0, 1)$, respectively. This process includes, for instance, the GARCH process (corresponding to $q = 2$, $\varphi(h_t) = h_t$, $\eta_t = \epsilon_{t-1}^2$ and $\gamma = 0$) and the stochastic volatility model (SVM), corresponding to $\varphi(h_t) = \exp(h_t)$, $q = 1$ and $\psi = 0$. Often $\eta_t$ and $\epsilon_t$ are assumed to be Gaussian.
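To make the data generating process concrete, the following minimal sketch simulates the SVM special case of (1)–(3) ($q = 1$, $\varphi(\cdot) = \exp(\cdot)$, $\psi = 0$) with Gaussian innovations; the parameter values are illustrative and not taken from the paper.

```python
import numpy as np

# Sketch of the SVM special case of (1)-(3): q = 1, varphi = exp, psi = 0,
# so h_t = omega + phi*h_{t-1} + gamma*eta_t. Parameter values are illustrative.
rng = np.random.default_rng(0)
T, omega, phi, gamma = 1000, 0.0, 0.95, 0.2

h = np.zeros(T)
for t in range(1, T):
    h[t] = omega + phi * h[t - 1] + gamma * rng.standard_normal()  # AR(1) log-volatility

sigma = np.exp(h)                    # sigma_t = varphi(h_t), equation (2) with q = 1
y = sigma * rng.standard_normal(T)   # y_t = sigma_t * eps_t, equation (1)
```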


One challenge for these models has been how to infer the parameters $(\psi, \phi, \gamma, \omega)$ and how to estimate the hidden conditional volatility $\sigma_t$ within the sample (smoothing) and out of sample (forecasting). While parametric GARCH models have been relatively easy to fit owing to the small number of parameters, this is not the case for semiparametric SVMs, which represent the returns as a nonlinear hidden Markov chain. Several methods have been proposed to estimate the latter data generating process, many of which are based on calculating or estimating the conditional expectations

$$(\hat\psi, \hat\phi, \hat\gamma, \hat\omega) = \mathrm{E}\big((\psi, \phi, \gamma, \omega) \mid y; \sigma_\eta^2\big) \quad \text{and} \quad \hat h_t = \mathrm{E}\big(h_t \mid y; \sigma_\eta^2\big), \quad t = 1, \ldots, T, \qquad (4)$$

where $y = (y_1, \ldots, y_T)$ are the returns up to time $T$. For a linear and Gaussian data generating process, the Kalman filter calculates a closed form expression for the posterior means (4). The quasi-maximum likelihood approach of Harvey, Ruiz and Shephard (1994) applies Kalman filtering to the logarithm of the squared observations, $\log y_t^2 = 2h_t + \zeta_t$, where $\zeta_t = \log \epsilon_t^2$. However, the data generating process is non-Gaussian, and the poor approximation of $\zeta_t$ by a Gaussian distribution leads to poor estimation of $\sigma_t$, particularly when the variance $\sigma_\eta^2$ of the log-volatility innovations $\eta_t$ in (3) is small (Jacquier, Polson and Rossi 1994). Another possible approach is to generate by Monte Carlo an ergodic chain from the exact posterior distribution given the observations (Jacquier et al. 1994) and to calculate the componentwise averages to estimate the conditional expectations (4). However, this may require generating long sample paths before reaching convergence. Estimating the variance $\sigma_\eta^2$ of the log-volatility innovations also remains a difficult issue.

Based on a GARCH model, the estimated persistence of the conditional volatilities of many financial time series is seemingly high, which also makes the inference difficult. For example, the second moment of an Integrated GARCH (Engle and Bollerslev 1986) does not exist even if the stationarity conditions are met. One also wonders whether the volatility process is really stationary, and in fact it is now believed that such strong persistence is due to model misspecification. For instance, Diebold (1986) argued that the persistence may be due to the instability of the unconditional variance. Lastrapes (1989) observed that the persistence of exchange-rate volatility depends on U.S. monetary policy regimes. This statement is confirmed by Lamoureux and Lastrapes (1990), who obtained a substantially lower persistence measure (i.e., the sum of the GARCH parameters, $\psi + \phi$ in equation (3)) when dummy variables, corresponding to periods over which the volatility seems to be stationary, are included in the GARCH model. Cai (1994) and Hamilton and Susmel (1994) used Markov switching ARCH models to take into account the regime effect in


the volatility, and in both cases the persistence is captured by the stochastic process which describes the state of the system, rather than by the GARCH parameters. Another widely used approach (see for instance Chan and Maheu (2002) and Chernov, Gallant, Ghysels and Tauchen (1999)) adds a jump process to capture large discrete changes. However, the empirical evidence suggests that jump models capture additional extremes in the marginal distribution of the returns rather than the persistence. Starica (2003) shares the concerns stated early on by Diebold (1986) about persistence and misspecification, and discusses the danger of assuming global stationarity. Starica and Granger (2005) proposed a nonstationary approach to estimating volatility by considering the log-absolute returns as a locally stationary process as defined in Dahlhaus (1997). Their methodology considers the underlying process as a slowly and smoothly varying process with sharp changes.

The aforementioned results motivate the features we aim to reproduce with our proposed model and estimator, namely that volatility evolves smoothly except for occasional abrupt changes whose transient effect prevents the market from returning quickly to its pre-shock level. However, while Starica and Granger's approach is based on successive stationarity tests over intervals of varying width to identify the times of regime switching, our $\ell_1$-based estimator finds the desired segmentation via the selection of a single parameter. Moreover, the solution of our $\ell_1$-penalized likelihood optimization problem is calculated with a fast algorithm. This paper is organized as follows. In Section 2.1 we define our volatility estimator and discuss its link to wavelet and Markov random field smoothing. In Section 2.2 we use the connection to smoothers to present a selection rule for the smoothing parameter which guarantees smoothness of the volatility estimate. In Section 2.3 we show that the volatility estimate can be efficiently computed using an interior-point algorithm. Section 2.4 illustrates the performance of the estimator on a simulated time series. In Section 3 we apply our estimator to the DOW JONES and NASDAQ, analyze the results, and evaluate the forecasting performance. Section 4 draws some conclusions and points to future extensions.

2 Smoothing the volatility

2.1 $\ell_1$-penalized likelihood estimator (fSVM)

We consider a discrete-time SVM with Laplace innovations $\eta_t$, whose density $f_\eta(x) = \frac{\lambda}{2}\exp(-\lambda|x|)$ has heavier tails than the Gaussian density, to better reflect sudden changes of regime due to important market news. More precisely,


this model corresponds to (1)–(3) with $\psi = 0$, $q = 1$, $\gamma = 1$ and $\omega = \mu(1 - \phi)$. Hence the variance of the log-volatility innovations is $\sigma_\eta^2 = 2/\lambda^2$, and this SVM is related to the variance gamma (VG) models (Madan and Seneta 1990) for modeling the non-Gaussian nature of stock market returns and future index prices. Indeed the VG model can be expressed by equation (1), where $\sigma_t$ follows a Gamma process, and in its continuous version writes as $S(t) = S(0)\exp(L(t))$, where $S(t)$ are the stock prices and $L(t)$ is a Laplace motion (Kotz, Kozubowski and Podgórski 2001). The Laplace motion is defined as a Brownian motion, denoted $B(\tau)$, evaluated at a random time distributed as a Gamma process, denoted $\gamma(t)$; this Brownian motion is said to be subordinated to the process $\gamma(t)$, so that $L(t) \overset{d}{=} B(\gamma(t))$. Another appeal of the Laplace motion is that it can be written as a compound Poisson process with independent and random jumps; in this sense it is a pure jump process, able to capture abrupt changes. Compared with the VG model, our model includes an extra parameter to capture the persistence in volatility.

Using Bayes' theorem with (1) for the noise model and (2)–(3) for the prior assumption on the volatility process, we derive the posterior distribution of the volatility given the returns $y$. More precisely, by considering the negative log-posterior distribution, we define the maximum a posteriori estimate as the solution to

$$\min_{\phi, \mu, h} \; \sum_{t=1}^{T} \Big[ \log \varphi(h_t) - \log f\big(y_t/\varphi(h_t)\big) \Big] + \lambda \sum_{t=2}^{T} \big| h_t - (\mu + \phi(h_{t-1} - \mu)) \big|, \qquad (5)$$

where $f$ denotes the density function of the error term $\epsilon_t$, $(h_t)_{t=1}^T$ are the $\varphi^{-1}$-transformed volatilities, $\mu$ is an average volatility measure, and $\phi$ is the persistence parameter. The function $\varphi(\cdot)$ is a strictly monotone function that maps the estimand $h_t$ to the volatility $\sigma_t = \varphi(h_t)$. For instance, the exponential function used by the original stochastic volatility model conveniently maps $\mathbb{R}$ into $\mathbb{R}_+$ for a positive volatility. Our estimator has the advantage that the positivity constraint is already enforced by the first logarithmic term in (5), which acts as a barrier against negativity. Hence we can consider a broader class of links, for instance the power transform $h_t = (\sigma_t^\delta - 1)/\delta = \varphi^{-1}(\sigma_t)$ (Box and Cox 1982), which includes as special cases the exponential link (take $\delta \to 0$) and the linear link (take $\delta = 1$).

The first sum in (5) is the negative log-likelihood for (1)–(2), and the second sum stems from the autoregressive process prior (3) with Laplace $\eta$-innovations. The estimator is akin to a Tikhonov regularization (Tikhonov 1963) of the erratic maximum likelihood estimate, but using an $\ell_1$-based penalty and a sufficiently large penalty parameter $\lambda > 0$ to enforce smoothness.
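For concreteness, here is a minimal sketch (ours, not the authors' Matlab code) of the objective in (5) for Gaussian $\epsilon$-innovations and the exponential link, for which the $t$-th likelihood term reduces to $h_t + \frac{1}{2}y_t^2 e^{-2h_t}$ up to a constant:

```python
import numpy as np

def fsvm_objective(h, mu, phi, y, lam):
    """Objective (5) for Gaussian eps-innovations and the link exp(.):
    the t-th likelihood term is h_t + 0.5*y_t^2*exp(-2*h_t), up to a constant,
    plus the lambda-weighted l1 penalty on the AR(1)-prior residuals."""
    nll = np.sum(h + 0.5 * y**2 * np.exp(-2.0 * h))
    penalty = lam * np.sum(np.abs(h[1:] - (mu + phi * (h[:-1] - mu))))
    return nll + penalty
```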


The $\ell_1$-penalized likelihood formulation of the estimator (5) draws a connection to two nonparametric function estimation techniques. One is Markov random field smoothing (Geman and Geman 1984, Besag 1986, Sardy and Tseng 2004), which solves the similar problem

$$\min_h \; -l(h; y) + \lambda \sum_{t=2}^{T} |h_t - h_{t-1}|,$$

where $-l(\cdot\,;\cdot)$ is the negative log-likelihood of a noisy signal/image and $h = (h_1, \ldots, h_T)$. Another is wavelet smoothing (Donoho and Johnstone 1994). For instance, soft-Waveshrink, for a wavelet matrix $\Phi$, solves

$$\min_{h = \Phi\alpha} \; \frac{1}{2}\|y - h\|_2^2 + \lambda\|\alpha\|_1,$$

where $\alpha$ are the wavelet coefficients (Donoho, Johnstone, Hoch and Stern 1992).

The main appeal of soft-Waveshrink is that it has near minimax properties for a class of loss functions and smoothness classes (Donoho, Johnstone, Kerkyacharian and Picard 1995) for a simple selection of $\lambda$ that only depends on $T$, the so-called universal rule (Donoho and Johnstone 1994). We exploit this connection to propose in Section 2.2 a selection rule for $\lambda$ that leads to a smooth estimation of the volatility process.

How does our smoothing approach differ from existing approaches? We aim at estimating a smooth evolution of volatility in time with possible abrupt changes of regime, which seems more realistic for finance applications, while existing approaches aim at estimating the true coefficients of some assumed erratic volatility data generating process. To that aim we propose a selection of $\lambda$ (hence of $\sigma_\eta^2$) borrowing ideas from wavelet smoothing. Our estimator also differs in the way it is computed, since we solve a convex optimization problem (5) (see Section 2.3) instead of solving an integration problem (4) approximately by sampling an ergodic Markov chain.

2.2 Selection of the smoothing parameter λ

The $\ell_1$-regularization parameter $\lambda \ge 0$ in (5) controls the smoothing: when $\lambda = 0$, the solution is the wiggly maximum likelihood estimate $\hat\sigma_t^{\mathrm{MLE}} = |y_t|$ for $t = 1, \ldots, T$, while when $\lambda$ tends to infinity the estimate $h_t$ tends to either a linear function for $\phi = 1$ or a function that is asymptotically constant for $\phi < 1$. We see that the selection of $\lambda$ (or equivalently of $\sigma_\eta^2$ in (3)) is crucial. In Gaussian wavelet smoothing, the universal penalty $\lambda_T^{\mathrm{wave}} = \sqrt{2\log T}$ is a simple but surprisingly efficient choice, as it endows the $\ell_1$-penalized likelihood wavelet estimator with near minimax properties for a class of loss functions and smoothness measures


(Donoho and Johnstone 1994, Donoho et al. 1995). Borrowing from wavelet smoothing, we derive a universal penalty $\lambda_T^{\mathrm{fSVM}}$ for the maximum a posteriori estimator (5) such that the estimated volatility is a smooth fit to the data.

With $\lambda$ selected by the universal rule, our stochastic volatility model (1)–(3) can be seen as a functional data generating process, in the sense that it leads to an estimated volatility process which is a smooth, rather than erratic, function of time fitting the volatility of the financial asset. Hence our estimator derives from a functional stochastic volatility model (fSVM).

In wavelet smoothing, the universal parameter is chosen such that, when the underlying signal is piecewise constant, the estimate is also piecewise constant with probability tending to one as the sample size tends to infinity. Likewise here, we set the universal parameter so that, when the true volatility is piecewise constant (i.e., persistence with $\phi = 1$ on each interval) over $K_T$ successive times, the volatility estimate is also piecewise constant with probability tending to one.

In Appendix A we derive the universal parameter $\lambda_T^{\mathrm{fSVM}} = \sqrt{K_T \log(n_T \log n_T)}$ with $n_T = T/K_T$, for standard Gaussian $\epsilon$-innovations, $\phi = 1$, $K_T \sim \log T$ and the link $\varphi(\cdot) = \exp(\cdot)$. Deriving the universal parameter for $\phi = 1$ is valid since we expect strong persistence in practice, and this avoids the derivation of a $\phi$-dependent universal parameter.
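A minimal sketch of this selection rule follows; the paper does not specify how $K_T \sim \log T$ is rounded to an integer, so the rounding below is our assumption.

```python
import numpy as np

def universal_penalty(T):
    """Universal penalty lambda_T^fSVM = sqrt(K_T * log(n_T * log(n_T)))
    with K_T ~ log(T) and n_T = T / K_T. Rounding K_T to the nearest
    integer (at least 2) is our assumption, not fixed by the paper."""
    K = max(2, int(round(np.log(T))))
    n = T / K
    return np.sqrt(K * np.log(n * np.log(n)))
```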

2.3 Optimization issues

We study here how to solve (5) to obtain the proposed estimate. To solve (5) in $(\phi, \mu, h)$, we use a decomposition approach that alternately solves in $(\mu, h)$ with $\phi$ held fixed, and in $\phi$ with $(\mu, h)$ held fixed. This alternating minimization approach, though not guaranteed to converge to a global minimum, works well in practice. How is each subproblem solved? For fixed $(\mu, h)$, the objective function of (5) is convex piecewise-linear in $\phi$, and the minimum can be found by, e.g., sorting the breakpoints (see the sketch after equation (6) below). In what follows, we focus on solving in $(\mu, h)$ with $\phi$ held fixed. The resulting subproblem can be written compactly as

$$\min_{h, \mu} \; \sum_{t=1}^{T} g_t(h_t) + \pi\big(B_\phi h + \mu(\phi - 1)\mathbf{1}\big), \qquad (6)$$

where $\mathbf{1}$ denotes the $T$-vector of ones, $g_t(h_t) = \log \varphi(h_t) - \log f(y_t/\varphi(h_t))$, $(B_\phi h)_t = h_{t+1} - \phi h_t$ for $t = 1, \ldots, T-1$, and $\pi(\cdot) = \lambda\|\cdot\|_1$. We will focus on the link $\varphi(\cdot) = \exp(\cdot)$, and it can be seen that $g_t(h_t) = h_t + \frac{1}{2}y_t^2\exp(-2h_t) + \text{constant}$ is strictly convex for Gaussian $\epsilon$-innovations. Hence (6) is a convex optimization problem.
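As for the $\phi$-step of the alternating scheme above: for fixed $(\mu, h)$, the problem $\min_\phi \sum_{t=2}^{T} |h_t - \mu - \phi(h_{t-1} - \mu)|$ is a one-dimensional least absolute error fit, whose minimizer is a weighted median of the breakpoint ratios. The sketch below is our formulation of the breakpoint-sorting step the paper alludes to (the function name is ours):

```python
import numpy as np

def phi_step(h, mu):
    """Minimize sum_{t>=2} |h_t - mu - phi*(h_{t-1} - mu)| over phi.

    The objective is convex piecewise-linear in phi, so a minimizer sits at
    a breakpoint b_t/a_t; sorting the breakpoints and taking the weighted
    median (weights |a_t|) finds it."""
    a = h[:-1] - mu
    b = h[1:] - mu
    keep = a != 0                          # terms with a_t = 0 are constant in phi
    r, w = b[keep] / a[keep], np.abs(a[keep])
    order = np.argsort(r)                  # sort the breakpoints r_t
    r, w = r[order], w[order]
    cw = np.cumsum(w)
    k = np.searchsorted(cw, 0.5 * cw[-1])  # first breakpoint holding half the weight
    return r[k]
```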


The case $\phi = 1$ can be solved efficiently using the IDM algorithm. Specifically, Theorem 3 of Sardy and Tseng (2004) applies with $g_t^*(u_t) = \frac{1}{2}(1 - u_t)\big(\log\frac{1 - u_t}{y_t^2} - 1\big)$ for Gaussian $\epsilon$-innovations, so that a dual coordinate descent algorithm can be employed on the dual problem in $u = (u_1, \ldots, u_T)$. We consider below the more challenging and interesting case $\phi \ne 1$, for which the IDM algorithm is impractical. As we show below, in this case (6) can be efficiently solved by a primal-dual interior-point algorithm.

2.3.1 Dual formulation

Using $\varphi(\cdot) = \exp(\cdot)$, we derive in Appendix B the dual of the primal subproblem (6). It has the general form

$$\min_x \; \sum_{t=1}^{Q} q_t(x_t) \quad \text{s.t.} \quad Ax = b, \; x \ge 0, \qquad (7)$$

where $x$ is the dual vector, the matrix $A$ has $Q = 3T - 2$ columns, and $q_t(\cdot)$ is a function assumed to be convex and twice differentiable on $(0, \infty)$ with $\lim_{\xi \to 0} q_t(\xi) = q_t(0)$, with $q_t'(\cdot)$ concave, and which satisfies

$$(\xi + \delta)\big( q_t'(\xi + \delta) - q_t'(\xi) - q_t''(\xi)\delta \big) \ge -\kappa q_t''(\xi)\delta^2 \quad \text{whenever} \quad \frac{|\delta|}{\xi} \le \rho, \qquad (8)$$

for some $\kappa > 0$ and $0 < \rho < 1$. In particular, Appendix C shows that (8) is satisfied with $\kappa = \frac{1}{2(1-\rho)}$ for Gaussian noise and the exponential link. Specifically, we can take $q_t(x_t) = x_t\log(x_t) + c_t x_t$ with $c_t = -(1 + \log(y_t^2))$, $t = 1, \ldots, T$.

2.3.2 Log-barrier problem

The log-barrier problem, parameterized by $\varepsilon > 0$, is

$$\min \; \sum_{t=1}^{Q} q_t(x_t) - \varepsilon\log(x_t) \quad \text{s.t.} \quad Ax = b, \; x > 0,$$

with Karush-Kuhn-Tucker conditions

$$Ax = b, \quad x > 0, \quad q'(x) - \varepsilon X^{-1}\mathbf{1} - A^\top u = 0,$$

where $q'(x) = (q_t'(x_t))_{t=1}^{Q}$ and $X = \mathrm{diag}(x_1, \ldots, x_Q)$. This can be rewritten as

$$Ax = b, \quad x > 0, \quad s = q'(x) - A^\top u, \quad Xs = \varepsilon\mathbf{1}. \qquad (9)$$


The exact solution of (9) traces the central path as $\varepsilon$ ranges over $(0, \infty)$. The primal-dual interior-point algorithm solves the equations approximately using a damped Newton method and decreases $\varepsilon$ after each iteration. Specifically, $(x, u)$ is an approximate solution of (9) if it belongs, together with $\varepsilon$, to the following so-called "wide neighborhood" of the central path:

$$\mathcal{N}(\tau) = \left\{ (x, u, \varepsilon) \;\middle|\; Ax = b, \; x > 0, \; s = q'(x) - A^\top u, \; \min_t x_t s_t \ge \tau\varepsilon, \; \varepsilon = \frac{x^\top s}{Q} \right\},$$

with $0 < \tau < 1$; see Wright (1997) and references therein.

2.3.3 Primal-dual interior-point algorithm

The algorithm begins with any $(x, u, \varepsilon) \in \mathcal{N}(\tau)$. Then it solves the Newton equations

$$X d_s + S d_x = \delta\varepsilon\mathbf{1} - Xs, \qquad (10)$$
$$A d_x = 0, \qquad (11)$$
$$q''(x) d_x - A^\top d_u = d_s, \qquad (12)$$

for $(d_x, d_s, d_u)$, where $s = q'(x) - A^\top u$ and $0 < \delta < 1$. Let

$$x[\alpha] = x + \alpha d_x, \quad u[\alpha] = u + \alpha d_u, \quad s[\alpha] = q'(x[\alpha]) - A^\top u[\alpha] \quad \forall \alpha > 0.$$

Let $\nu$ and $\bar\alpha$ be given by (26) and (27) in Appendix D. Then, beginning with $\alpha = 1$, it checks whether

$$(x[\alpha], u[\alpha]) \in \mathcal{N}(\tau), \qquad \varepsilon[\alpha] = \frac{x[\alpha]^\top s[\alpha]}{Q} \le (1 - \bar\alpha\nu)\varepsilon, \qquad (13)$$

and if not, it decreases $\alpha$ by some factor $0 < \varrho < 1$ and repeats, until (13) is satisfied. Then we update

$$(x^{\mathrm{new}}, u^{\mathrm{new}}, \varepsilon^{\mathrm{new}}) \longleftarrow (x[\alpha], u[\alpha], \varepsilon[\alpha]),$$

and re-iterate until $\varepsilon \le \varepsilon_{\mathrm{final}}$. In our implementation we use $\tau = 10^{-4}$, $\delta = 0.5$, $\varrho = 0.7$, $\rho = 0.99$, and we initialize with $h = \alpha\mathbf{1} - c$ and $w = \alpha\lambda\mathbf{1}$, where $c = (c_1, \ldots, c_T)$ (which uniquely determine $u$ and $x$), $s = q'(x) - A^\top u$, and $\varepsilon = \frac{x^\top s}{Q}$, with $0 < \alpha < 1$ chosen so that $(x, u, \varepsilon) \in \mathcal{N}(\tau)$.

Our code is available from the authors and is fast, since its complexity is bounded by $O(Q^{3/2})$, as we now show.


2.3.4 Iteration complexity

Appendix D shows that (13) is satisfied when $\alpha = \bar\alpha$ given by (27) in Appendix D. Thus $\varepsilon$ decreases by a factor of at most $1 - \bar\alpha\nu$ after each iteration so that, after $k$ iterations, $\varepsilon \le (1 - \bar\alpha\nu)^k \varepsilon_{\mathrm{init}}$. Thus $\varepsilon \le \varepsilon_{\mathrm{final}}$ whenever $k \ge \log\big(\frac{\varepsilon_{\mathrm{init}}}{\varepsilon_{\mathrm{final}}}\big)\frac{1}{-\log(1 - \bar\alpha\nu)}$. Since $\log(1 - \bar\alpha\nu) \le -\bar\alpha\nu$ and, by (27), $\frac{1}{\bar\alpha} = O(\kappa Q^{3/2} + Q)$, this shows that the number of iterations until termination is at most

$$\log\left(\frac{\varepsilon_{\mathrm{init}}}{\varepsilon_{\mathrm{final}}}\right)\frac{1}{\bar\alpha\nu} = O\left( (\kappa Q^{3/2} + Q)\log\left(\frac{\varepsilon_{\mathrm{init}}}{\varepsilon_{\mathrm{final}}}\right) \right),$$

where $Q = 3T - 2$ and $T$ is the length of the time series. While there have been previous studies of path-following algorithms for entropic optimization of the form (7) and (8) (Potra and Ye 1993, Tseng 1992), these algorithms use the so-called "narrow neighborhood", which is not practically efficient. To our knowledge, this is the first study of a path-following algorithm for entropic optimization that uses the wide neighborhood and is practically and theoretically efficient. Specifically, when (7) is a linear or a convex quadratic program, i.e., $\kappa = 0$ in (8), the above complexity result is the best known for an algorithm using the wide neighborhood; see Wright (1997, Theorem 5.11).

2.4 Simulated time series

We simulate data from a smooth volatility function with periods of abrupt changes of regime and volatility peaks, as one may expect in the financial markets. To that aim we take the sum of two classical functions in wavelet smoothing, the blocks and bumps functions (Donoho and Johnstone 1994), rescaled so that the volatility ranges over $\sigma_t \in [0.1, 10]$. The log of the volatility function $\sigma_t$ for $t = 1, \ldots, 5000$ is the curve plotted in Figure 1 (c), where the dots are the maximum likelihood estimates $\log\hat\sigma_t^{\mathrm{MLE}} = \log|y_t|$. Figure 1 (a) shows the simulated returns $y_t$, and (b) shows the empirical autocorrelation function (acf) of the absolute returns, which reflects phenomena observed on real financial time series, such as volatility clustering, nonstationarity or long memory. Figure 1 (e) shows the fSVM penalized likelihood estimate of the log-volatilities, the solution to (5) with $\varphi(\cdot) = \exp(\cdot)$ using the universal penalty $\lambda_T^{\mathrm{fSVM}}$ derived in Section 2.2, while (d) shows the acf of the fitted absolute residuals $|y_t/\hat\sigma_t|$ and (f) shows the quantile-quantile plot of the fitted residuals. We see with these simulated data that the estimated volatility captures the important features of the true volatility, and that the fitted residual process matches well the i.i.d. standard Gaussian distribution.
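A sketch of this simulation design, assuming the standard blocks/bumps parameters of Donoho and Johnstone (1994) (as in WaveLab) and a linear rescaling of their sum onto $[0.1, 10]$; the paper does not spell out the exact rescaling used:

```python
import numpy as np

# Standard Donoho-Johnstone test-function parameters (as used in WaveLab);
# we assume these match the rescaled blocks/bumps mixture used in the paper.
POS   = np.array([0.10, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81])
BLK_H = np.array([4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 2.1, -4.2])
BMP_H = np.array([4, 5, 3, 4, 5, 4.2, 2.1, 4.3, 3.1, 5.1, 4.2])
BMP_W = np.array([0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005])

def blocks(r):  # piecewise-constant test function
    return sum(h * (1 + np.sign(r - p)) / 2 for p, h in zip(POS, BLK_H))

def bumps(r):   # peaked test function
    return sum(h / (1 + np.abs((r - p) / w))**4 for p, h, w in zip(POS, BMP_H, BMP_W))

T = 5000
r = np.arange(T) / T
f = blocks(r) + bumps(r)
sigma = 0.1 + 9.9 * (f - f.min()) / (f.max() - f.min())  # rescale to [0.1, 10]
rng = np.random.default_rng(0)
y = sigma * rng.standard_normal(T)                       # y_t = sigma_t * eps_t
```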


[Figure 1 here.]

Figure 1: Volatility simulation with $T = 5000$. Top: simulated returns $y_t$ (dots), and empirical autocorrelation function (acf) of $|y_t|$; Middle: true log-volatilities $\log\sigma_t$ (line) with log-absolute-returns $\log|y_t|$ (dots), and acf of residuals $|y_t/\hat\sigma_t|$; Bottom: log-volatilities (line) estimated with the universal penalty $\lambda_T^{\mathrm{fSVM}}$, with $\log|y_t|$ (dots), and Gaussian qq-plot of residuals.


3 Applications

We illustrate our methodology on the DOW JONES and NASDAQ stock index returns. Data are the $T = 4846$ daily closing prices $P_t$ from 29 December 1989 until 23 March 2009. Stock returns $y_t$ are computed as $100\log(P_t/P_{t-1})$. Figures 2 and 3 display the log-returns for both indices in their top-left graphs. In Section 3.1 we first consider the entire sequence of returns, apply our volatility estimator to it and analyze the result. Note that the estimation for such time series took less than 2 minutes running Matlab code on a standard computer. In Section 3.2 we evaluate and compare the forecasting performance of our method to that of GARCH and IGARCH for short and long horizons.

3.1 Volatility smoothing

The volatility clustering feature is reflected by the strong autocorrelation of the absolute returns in Figures 2 and 3 (b). The results of our volatility estimation are presented in graphs (d) on the log scale, and graphs (e)-(f) are diagnostic plots on the residuals. As expected, the autocorrelation in the absolute rescaled residuals $|y_t/\hat\sigma_t|$ has been removed (see graphs (e)), and normality holds quite well, except in the extreme negative tail (see graphs (f)). This asymmetry is certainly due to the fact that our methodology does not yet model the leverage effect on the volatility (Nelson 1991, Glosten, Jagannathan and Runkle 1993), i.e., volatility responds more strongly to negative shocks (bad news) than to positive shocks. To capture this asymmetry, model (1)-(3) can be modified as in Omori, Chib, Shephard and Nakajima (2007).

The persistence parameter estimates are $\hat\phi = 1.0003$ for the DOW JONES and $\hat\phi = 1.0001$ for the NASDAQ. These high values of $\phi$ lead us to conclude that the volatility may be nonstationary. However, in the out-of-sample forecast exercise presented below, the estimates $\hat\phi$ are less than 1.00 and quite stable over the period from November 2001 until just before the recent subprime crisis, where the estimate of $\phi$ rises dramatically. Therefore, the fact that the full-sample estimate of $\phi$ reaches above unity is probably due to the instability related to the recent crashes (see Figure 4). To derive the variance-covariance matrix of the vector of parameters, and in particular of $\phi$, we observe that, for given $\lambda$ and $(h_t)_{t=1}^{T}$, the second term of the objective function (5) is a least absolute error regression problem of $(h_t)_{t=2}^{T}$ on $(h_t)_{t=1}^{T-1}$ with slope $\phi$ and intercept $\tilde\mu = \mu(1 - \phi)$. Bassett and Koenker (1978) provide asymptotic theory for the least absolute error fit. Hence the asymptotic covariance is given by $V((\hat{\tilde\mu}, \hat\phi)^\top) = \frac{1}{\lambda^2}Q^{-1}$ with $\lambda = \lambda_T^{\mathrm{fSVM}}$ of Section 2.2, where $Q = X^\top X$ with $X = (\mathbf{1}, h_{-1})$ and $h_{-1} = (h_1, \ldots, h_{T-1})^\top$, under the standard assumption that $\lim_{T\to\infty}\frac{1}{T-1}X^\top X$ is positive definite. Here the vector $h_{-1}$ is not observable, but it is estimated by our procedure, so we compute the matrix $Q$ from the estimated log-volatilities to obtain an approximate covariance matrix for the two estimated parameters of the model. For the time series considered, the estimated standard deviations of $\hat\phi$ are 0.0055 for the DOW JONES and 0.0045 for the NASDAQ.
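In code, this approximate covariance is a one-liner; a sketch (the function name is ours), taking the estimated log-volatilities and the universal penalty as inputs:

```python
import numpy as np

def phi_stderr(h_hat, lam):
    """Standard errors of (mu_tilde, phi) from the Bassett-Koenker asymptotic
    covariance (1/lam^2) * (X'X)^{-1}, with X = (1, h_{-1}) built from the
    estimated log-volatilities."""
    X = np.column_stack([np.ones(len(h_hat) - 1), h_hat[:-1]])
    cov = np.linalg.inv(X.T @ X) / lam**2
    return np.sqrt(np.diag(cov))
```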


[Figure 2 here.]

Figure 2: DOW JONES: (a) returns $y_t$ from 29 December 1989 until 23 March 2009; (b) empirical autocorrelation function (acf) of $|y_t|$; (c) acf of $y_t$; (d) log-absolute-returns $\log|y_t|$ and estimated volatilities $\log\hat\sigma_t$; (e) acf of standardized absolute residuals $|y_t/\hat\sigma_t|$; (f) Gaussian qq-plot of standardized residuals.


[Figure 3 here.]

Figure 3: NASDAQ: (a) returns $y_t$ from 29 December 1989 until 23 March 2009; (b) empirical autocorrelation function (acf) of $|y_t|$; (c) acf of $y_t$; (d) log-absolute-returns $\log|y_t|$ and estimated volatilities $\log\hat\sigma_t$; (e) acf of standardized absolute residuals $|y_t/\hat\sigma_t|$; (f) Gaussian qq-plot of standardized residuals.



Figure 4 compares the evolution of the estimated log-volatilities for the DOW JONES and the NASDAQ. As expected, the NASDAQ is more volatile than the DOW JONES, except during the recent crashes of Autumn 2008, which affected all economic sectors. Several important market turbulence periods are identified and correspond to the main abrupt changes estimated by fSVM.

[Figure 4 here.]

Figure 4: DOW JONES and NASDAQ estimated log-volatilities from 29 December 1989 until 23 March 2009, with the main turbulence periods annotated (Autumn 1997 Asian crisis, 1998 Russian crisis, dot-com bubble, summer 2007 subprime crisis, Autumn 2008 crashes).


3.2 Volatility forecasting

To evaluate the forecasting performance of the various models, we calibrate on time series running from day one until day $t = 3000 + (k-1)H/2$, $k = 1, 2, \ldots$, and forecast the volatility out-of-sample from day $t+1$ until day $t+H$, for a short horizon $H = 20$ business days (one month) and a long horizon $H = 120$ days (six months). The calibration up to time $t$ provides parameter estimates for forecasting the volatility beyond time $t$ using the following formulas for the different models, with the notation $\sigma_{t+j}^2 = \mathrm{E}(\varphi(h_{t+j}) \mid y_1, \ldots, y_t)$:

for fSVM:
$$\sigma_{t+j}^{2,\mathrm{fSVM}} = \exp\big(2(\mu + \phi^j(h_t - \mu))\big), \quad j = 1, \ldots, H,$$

for GARCH:
$$\sigma_{t+j}^{2,\mathrm{GARCH}} = (\psi + \phi)^j\sigma_t^2 + \omega\sum_{k=0}^{j-1}(\psi + \phi)^k, \quad j = 1, \ldots, H,$$

and for IGARCH:
$$\sigma_{t+j}^{2,\mathrm{IGARCH}} = \sigma_t^2 + j\omega, \quad j = 1, \ldots, H.$$
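A sketch of the three forecast recursions, assuming the calibrated parameter estimates are given as inputs (the function names are ours):

```python
import numpy as np

def forecast_fsvm(h_t, mu, phi, H):
    """fSVM volatility forecasts sigma^2_{t+j}, j = 1..H."""
    j = np.arange(1, H + 1)
    return np.exp(2 * (mu + phi**j * (h_t - mu)))

def forecast_garch(sig2_t, omega, psi, phi, H):
    """GARCH(1,1) variance forecasts, with the geometric sum in closed
    form (assumes psi + phi != 1; otherwise use the IGARCH formula)."""
    j = np.arange(1, H + 1)
    a = psi + phi
    return a**j * sig2_t + omega * (1 - a**j) / (1 - a)

def forecast_igarch(sig2_t, omega, H):
    """IGARCH variance forecasts (unit persistence)."""
    return sig2_t + omega * np.arange(1, H + 1)
```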

For each model, we then calculate the median absolute error (MAE) forecasting measure between the forecasted and realized volatilities,

$$\mathrm{MAE}(H) = \mathrm{median}\left\{ \left| \sum_{j=1}^{H}\hat\sigma_{t+j}^{2,*} - \sum_{j=1}^{H} y_{t+j}^2 \right|, \; t = 3001 + (k-1)H/2, \; k = 1, 2, \ldots \right\},$$

where "$*$" stands for fSVM, GARCH or IGARCH. We report in Table 1 the relative MAE with respect to that of fSVM (i.e., $\mathrm{MAE}^{\mathrm{(I)GARCH}}/\mathrm{MAE}^{\mathrm{fSVM}}$), so that a ratio larger than 1 means better forecasting for fSVM. The results show that fSVM outperforms GARCH for forecasting, especially at long horizons.

Consequently, our methodology appears better suited to the purposes of option pricing and portfolio management.

4 Conclusion

This paper proposes an original and complete new way to estimate the volatility of returns. Our approach combines the dynamics of the well-known SV models with a Markov random field smoother in order to estimate the hidden Markov chain. The proposed estimator is based on Bayesian posterior mode estimation.


Table 1: Volatility forecast comparison based on relative median absolute errors (MAE) for short ($H = 20$) and long ($H = 120$) horizons.

                             DOW JONES           NASDAQ
Horizon                   H = 20   H = 120   H = 20   H = 120
MAE GARCH / MAE fSVM        1.1      1.9       0.8      1.8
MAE IGARCH / MAE fSVM       1.2      3.0       0.8      2.7

Its performance is illustrated through simulations and empirical applications based on the Dow Jones and Nasdaq. Our volatility forecast outperforms GARCH's forecast, especially at long horizons, and therefore our methodology looks more suitable for option pricing models or portfolio management, which require long-term forecasts. As a major topic for future research, the multivariate extension is probably the most promising, especially given our selection of the smoothing parameter, the speed of our algorithm, and the good forecasting performance. Another natural extension would consist in capturing the leverage effect, or asymmetry in the marginal distribution of the returns.

APPENDIX

A Universal penalty

Assuming for simplicity that $T$ is a multiple of $K$, let $B_K$ be the matrix operating finite differences skipping every other $K$, i.e., $B_K h = 0$ iff $h$ is piecewise constant taking identical values at $K$ successive points:

$$h_{K(i-1)+1} = \ldots = h_{K(i-1)+K}, \quad i = 1, \ldots, n_T = T/K. \qquad (14)$$

The Karush-Kuhn-Tucker first-order optimality conditions for (5) with $\phi = 1$ and skipping every other $K$ differences in the penalty are

$$1 - y_t^2\exp(-2h_t) + (B_K^\top w)_t = 0, \quad t = 1, \ldots, T, \qquad (15)$$
$$\|w\|_\infty \le \lambda, \qquad (16)$$

where $\|w\|_\infty = \max(|w_1|, \ldots, |w_T|)$. The solution $w$ to (15)-(16) subject to $K$ successive identical values has entries

$$w_{K(i-1)+k} = k - K\frac{\sum_{j=1}^{k} y_{K(i-1)+j}^2}{\sum_{j=1}^{K} y_{K(i-1)+j}^2}, \quad i = 1, \ldots, n_T, \; k = 1, \ldots, K, \qquad (17)$$


provided $\lambda$ is large enough. The smallest possible $\lambda$ allowing a solution is $\lambda_y = \|w\|_\infty$ for $w$ given in (17). The goal of the universal penalty $\lambda_T$ is to control the extremal behavior of $\lambda_y$, so that $P(\|w\|_\infty \le \lambda_T) \to 1$ as $T \to \infty$ when the true underlying volatility is a constant $\sigma_0$, i.e., $y_t$ i.i.d. $N(0, \sigma_0^2)$. The vector $w$ can be broken into $n_T = T/K$ independent blocks $w^i = (w_{i1}, \ldots, w_{i(K-1)})$, each of which converges to a Brownian bridge process. To see this, consider the first block, for which

$$P(\|w^1\|_\infty \le \lambda) = P\left( \max_{k=1,\ldots,K-1}\left| k - K\frac{\sum_{j=1}^{k} Z_j^2}{\sum_{j=1}^{K} Z_j^2} \right| \le \lambda \right),$$

where $Z_j = y_j/\sigma_0$ is i.i.d. $N(0, 1)$. Note that $Z_j^2 + Z_{j+1}^2 \overset{d}{=} E_l$, with $j = 2(l-1)+1$ and $l = 1, \ldots, K/2$, where the $E_l$ are i.i.d. Exp(1/2), so

$$\frac{\sum_{j=1}^{k} Z_j^2}{\sum_{j=1}^{K} Z_j^2} \overset{d}{=} \frac{\sum_{l=1}^{k/2} E_l}{\sum_{l=1}^{K/2} E_l} \overset{d}{=} U_l,$$

where $0 \le U_1 \le \ldots \le U_{K/2} \le 1$ are distributed as Uniform(0,1) order statistics from a sample of size $K/2$ (Shorack and Wellner 1986, p. 496). Moreover, the uniform quantile process converges to a Brownian bridge process $W(r)$, so

$$P(\|w^1\|_\infty \le \lambda) = P\left( \sup_{r \in [0,1]} |W(r)| \le \lambda/\sqrt{2K} \right) = 1 - 2\sum_{k=1}^{\infty}(-1)^{k+1}\exp\big(-2k^2(\lambda/\sqrt{2K})^2\big) \ge 1 - 2\exp\big(-2(\lambda/\sqrt{2K})^2\big).$$

Consequently,

$$P(\|w\|_\infty \le \lambda_T) \ge \Big(1 - 2\exp\big(-2(\lambda_T/\sqrt{2K})^2\big)\Big)^{n_T} = \exp(-2/\log n_T)$$

with $\lambda_T^{\mathrm{fSVM}} = \sqrt{2K}\sqrt{\tfrac{1}{2}\log(n_T\log n_T)}$, which tends to one at a slow rate as $T$ grows. We choose $K = K_T \sim \log T$ for the block size to grow slowly with $T$.

B Derivation of the dual problem

Letting $\theta = (h, \mu)$, the primal problem (6) has the form

$$\min_\theta \; g(\theta) + \pi(B\theta),$$

where $g(\theta) = \sum_{t=1}^{T} g_t(h_t) + g_0(\mu)$, $g_0(\mu) = 0$, and $B = [B_\phi \;\; (\phi - 1)\mathbf{1}]$. Thus the Fenchel dual (Rockafellar 1970) has the form

$$\min_w \; g^*(B^\top w) + \pi^*(-w),$$

with $g^*(\eta) = \sum_{t=1}^{T} g_t^*(\eta_t) + g_0^*(\eta_0)$. Here $^*$ denotes the convex conjugate, i.e., $g^*(\eta) = \sup_\theta \theta^\top\eta - g(\theta)$. Straightforward calculation yields $g_t^*(u_t) = \frac{1}{2}(1 - u_t)\big(\log\frac{1 - u_t}{y_t^2} - 1\big)$ when $g_t(h_t) = h_t + \frac{1}{2}y_t^2\exp(-2h_t)$ (Gaussian noise), and $g_t^*(u_t) = (1 - u_t)\big(\log\frac{1 - u_t}{y_t} - 1\big)$ when $g_t(h_t) = h_t + y_t\exp(-h_t)$ (Laplace noise), for $t = 1, \ldots, T$. Moreover, $g_0^*(\eta_0) = 0$ if $\eta_0 = 0$ and otherwise equals $\infty$. Similarly, $\pi^*(-w) = 0$ if $\|w\|_\infty \le \lambda$ and otherwise equals $\infty$. This yields the dual problem

$$\min \; \sum_{t=1}^{T} g_t^*(u_t) \quad \text{s.t.} \quad u = B_\phi^\top w, \; \mathbf{1}^\top w = 0, \; -\lambda\mathbf{1} \le w \le \lambda\mathbf{1}.$$

Then letting $z = \mathbf{1} - u$, $w_1 = \lambda\mathbf{1} - w$, $w_2 = \lambda\mathbf{1} + w$, and $c_t = -(1 + \log(y_t^2))$ (Gaussian) or $c_t = -(1 + \log(y_t))$ (Laplace), and upon eliminating $w$, the above dual problem is equivalent to

$$\min_{w_1, w_2, z} \; \sum_{t=1}^{T} z_t\log(z_t) + c_t z_t \quad \text{s.t.} \quad B_\phi^\top w_2 + z = \mathbf{1} + \lambda B_\phi^\top\mathbf{1}, \; \mathbf{1}^\top w_2 = \lambda(T-1), \; w_1 + w_2 = 2\lambda\mathbf{1}, \; z \ge 0, \; w_1 \ge 0, \; w_2 \ge 0,$$

where $x = (z, w_1, w_2)$ are the dual variables. This has the form (7) with

$$A = \begin{bmatrix} I & 0 & B_\phi^\top \\ 0 & 0 & \mathbf{1}^\top \\ 0 & I & I \end{bmatrix}, \quad b = \begin{bmatrix} \mathbf{1} + \lambda B_\phi^\top\mathbf{1} \\ \lambda(T-1) \\ 2\lambda\mathbf{1} \end{bmatrix}, \quad q_t(x_t) = \begin{cases} z_t\log(z_t) + c_t z_t & \text{if } t \le T; \\ 0 & \text{else.} \end{cases}$$

C Checking the condition (8)

It is easily seen that condition (8) with $\kappa = \frac{1}{2(1-\rho)}$ is satisfied by $q_t(x_t) = 0$ for any $0 < \rho < 1$. Below we show that it is also satisfied by $q_t(x_t) = x_t\log(x_t) + c_t x_t$. We have $q_t'(x_t) = \log(x_t) + 1 + c_t$ and $q_t''(x_t) = 1/x_t$. Since $|\delta|/\xi \le \rho < 1$, we have the series expansion

$$\log(\xi + \delta) - \log\xi - \frac{\delta}{\xi} = \sum_{k=2}^{\infty}\frac{(-1)^{k+1}}{k}\frac{\delta^k}{\xi^k}.$$

When $\delta < 0$, we have $0 < \xi + \delta \le \xi$. Moreover, all terms in the series are negative and the series can be bounded from below by $-\frac{\delta^2}{\xi^2}\sum_{k=2}^{\infty}\frac{\rho^{k-2}}{2} = -\frac{1}{2(1-\rho)}\frac{\delta^2}{\xi^2}$. Thus

$$(\xi + \delta)\left( \log(\xi + \delta) - \log\xi - \frac{\delta}{\xi} \right) \ge -\frac{\xi + \delta}{2(1-\rho)}\frac{\delta^2}{\xi^2} \ge -\frac{\delta^2}{2(1-\rho)\xi},$$

implying (8) with $\kappa = \frac{1}{2(1-\rho)}$. When $\delta \ge 0$, we have $0 < \xi + \delta \le \xi(1 + \rho) \le \frac{\xi}{1-\rho}$. Moreover, the series alternates in sign and can be written as $-\frac{\delta^2}{2\xi^2} + \sum_{k=3,5,\ldots}\big(\frac{1}{k} - \frac{\delta}{(k+1)\xi}\big)\frac{\delta^k}{\xi^k} \ge -\frac{\delta^2}{2\xi^2}$, each grouped term being nonnegative since $\delta/\xi \le \rho < 1$. Multiplying by $\xi + \delta \le \frac{\xi}{1-\rho}$, we again obtain (8) with $\kappa = \frac{1}{2(1-\rho)}$.

D Iteration complexity analysis

Fix any $(x, u, \varepsilon) \in \mathcal{N}(\tau)$. Let $(d_x, d_s, d_u)$ be the solution of (10)–(12), where $s = q'(x) - A^\top u$ and $0 < \delta < 1$. We show that (13) is satisfied when $\alpha = \bar\alpha$, where $\bar\alpha$ and $\nu$ are given by (27) and (26).

As in the proof of Wright (1997, Lemma 5.10), we first bound $\|D_x d_s\|$, where $D_x = \mathrm{diag}(d_x)$. Letting $D = X^{-1/2}S^{1/2}$, we rewrite (10) as

$$D^{-1}d_x + D d_s = (XS)^{-1/2}r,$$

where for simplicity we denote $r = \delta\varepsilon\mathbf{1} - Xs$. Left multiplying (12) by $d_x^\top$ and using $q_t''(x_t) \ge 0$ (since $q_t(\cdot)$ is convex) and (11) yields $0 \le d_x^\top q''(x)d_x = d_x^\top d_s$. Then, by Wright (1997, Lemma 5.3), we have

$$\begin{aligned}
\|D_x d_s\| &= \|(D^{-1}D_x)(D d_s)\| \\
&\le 2^{-3/2}\|D^{-1}d_x + D d_s\|^2 \\
&= 2^{-3/2}\|(XS)^{-1/2}r\|^2 \\
&= 2^{-3/2}\|(XS)^{-1/2}\delta\varepsilon\mathbf{1} - (XS)^{1/2}\mathbf{1}\|^2 \\
&= 2^{-3/2}\left( \sum_t \frac{\delta^2\varepsilon^2}{x_t s_t} - 2\delta\varepsilon Q + x^\top s \right) \\
&\le 2^{-3/2}\left( \frac{Q\delta^2\varepsilon^2}{\tau\varepsilon} - 2\delta\varepsilon Q + \varepsilon Q \right) = \vartheta Q\varepsilon, \qquad (18)
\end{aligned}$$

where we let $\vartheta = 2^{-3/2}\big( \frac{\delta^2}{\tau} - 2\delta + 1 \big)$.

Let $\bar d_x = X^{-1}d_x$. We next bound $\|\bar d_x\|$. Left multiplying (10) by $\bar d_x^\top$ and using $d_x^\top d_s = d_x^\top q''(x)d_x$ gives

$$d_x^\top q''(x)d_x + \bar d_x^\top S X \bar d_x = \bar d_x^\top r.$$

Since $(x, u, \varepsilon) \in \mathcal{N}(\tau)$, so that $x_t s_t \ge \tau\varepsilon$ for all $t$, this and the Cauchy-Schwarz inequality yield $\tau\varepsilon\|\bar d_x\|^2 \le \|\bar d_x\|\|r\|$, so that

$$\|\bar d_x\| \le \frac{\|r\|}{\tau\varepsilon}. \qquad (19)$$

Since $z = \frac{Xs}{\varepsilon}$ satisfies $z \ge 0$ and $\mathbf{1}^\top z = Q$, we have

$$\|r\| \le \max_{z \ge 0}\big\{ \varepsilon\|\delta\mathbf{1} - z\| \;\big|\; \mathbf{1}^\top z = Q \big\} = \varepsilon\sqrt{(Q - \delta)^2 + (Q - 1)\delta^2} \le \varepsilon Q, \qquad (20)$$

where the equality is due to the maximum of a convex function being attained at an extreme point, which in this case is a $Q$ multiple of a unit coordinate vector.

Fix any $\alpha > 0$ with $\alpha\|\bar d_x\|_\infty \le \rho$. By (19) and (20), this occurs whenever

$$\alpha \le \frac{\rho\tau}{Q}. \qquad (21)$$

We have $Ax[\alpha] = Ax + \alpha A d_x = b$ and

$$\begin{aligned}
X[\alpha]s[\alpha] &= X[\alpha]\big( q'(x + \alpha d_x) - A^\top(u + \alpha d_u) \big) \\
&= X[\alpha]\big( q'(x + \alpha d_x) - q'(x) - \alpha q''(x)d_x \big) + X[\alpha]\big( s + \alpha q''(x)d_x - \alpha A^\top d_u \big) \\
&= X[\alpha]\big( q'(x + \alpha d_x) - q'(x) - \alpha q''(x)d_x \big) + Xs + \alpha S d_x + \alpha X d_s + \alpha^2 D_x d_s \\
&= X[\alpha]\big( q'(x + \alpha d_x) - q'(x) - \alpha q''(x)d_x \big) + (1 - \alpha)Xs + \alpha\delta\varepsilon\mathbf{1} + \alpha^2 D_x d_s \\
&\ge -\kappa\alpha^2 q''(x)D_x d_x + (1 - \alpha)\tau\varepsilon\mathbf{1} + \alpha\delta\varepsilon\mathbf{1} + \alpha^2 D_x d_s \\
&\ge -\kappa\alpha^2\|q''(x)D_x d_x\|_1\mathbf{1} + \big((1 - \alpha)\tau + \alpha\delta\big)\varepsilon\mathbf{1} - \alpha^2\|D_x d_s\|\mathbf{1} \\
&= -\kappa\alpha^2 d_x^\top d_s\mathbf{1} + \big((1 - \alpha)\tau + \alpha\delta\big)\varepsilon\mathbf{1} - \alpha^2\|D_x d_s\|\mathbf{1} \\
&\ge -\kappa\alpha^2\vartheta Q^{3/2}\varepsilon\mathbf{1} + \big((1 - \alpha)\tau + \alpha\delta\big)\varepsilon\mathbf{1} - \alpha^2\vartheta Q\varepsilon\mathbf{1}, \qquad (22)
\end{aligned}$$

where the third equality uses (12); the fourth equality uses (10); the first inequality uses $x[\alpha] > 0$, (8), and $x_t s_t \ge \tau\varepsilon$ for all $t$ (since $(x, u, \varepsilon) \in \mathcal{N}(\tau)$); and the last inequality uses $d_x^\top d_s \le \|D_x d_s\|_1 \le \sqrt{Q}\|D_x d_s\|$ and (18). Similarly, we have

$$\begin{aligned}
x[\alpha]^\top s[\alpha] &= x[\alpha]^\top\big( q'(x + \alpha d_x) - q'(x) - \alpha q''(x)d_x \big) + (1 - \alpha)x^\top s + \alpha\delta\varepsilon Q + \alpha^2 d_x^\top d_s \\
&\le (1 - \alpha)\varepsilon Q + \alpha\delta\varepsilon Q + \alpha^2\vartheta Q^{3/2}\varepsilon, \qquad (23)
\end{aligned}$$

where the inequality uses the concavity of $q_t'(\cdot)$ as well as $d_x^\top d_s \le \sqrt{Q}\|D_x d_s\|$ and (18). Comparing the two bounds (22) and (23), we see that $X[\alpha]s[\alpha] \ge \frac{\tau}{Q}x[\alpha]^\top s[\alpha]\mathbf{1}$ whenever

$$-\kappa\alpha^2\vartheta Q^{3/2} + \big((1 - \alpha)\tau + \alpha\delta\big) - \alpha^2\vartheta Q \ge \tau\big(1 - \alpha + \alpha\delta + \alpha^2\vartheta Q^{1/2}\big)$$


or, equivalently,

$$\alpha \le \frac{\delta(1 - \tau)}{\vartheta(\kappa Q^{3/2} + Q + \tau Q^{1/2})}. \qquad (24)$$

Moreover, by (23), we have

$$\varepsilon[\alpha] = \frac{x[\alpha]^\top s[\alpha]}{Q} \le \big(1 - \alpha(1 - \delta - \alpha\vartheta Q^{1/2})\big)\varepsilon \le (1 - \alpha\nu)\varepsilon$$

whenever

$$\alpha \le \frac{\rho(1 - \delta)}{\vartheta Q^{1/2}}, \qquad (25)$$

where

$$\nu = (1 - \delta)(1 - \rho). \qquad (26)$$

The minimum of the three bounds (21), (24), and (25) is

$$\bar\alpha = \min\left\{ \frac{\rho\tau}{Q}, \; \frac{\delta(1 - \tau)}{\vartheta(\kappa Q^{3/2} + Q + \tau Q^{1/2})}, \; \frac{\rho(1 - \delta)}{\vartheta Q^{1/2}} \right\}. \qquad (27)$$

References

Andersen, T. G. (1994), “Stochastic autoregressive volatility: A framework for volatility modeling,” Mathematical Finance, 4, 75–102.

Bassett, G. Jr., and Koenker, R. (1978), "Asymptotic theory of least absolute error regression," Journal of the American Statistical Association, 73, 618–622.

Besag, J. (1986), "On the statistical analysis of dirty pictures (with discussion)," Journal of the Royal Statistical Society, Series B, 48, 192–236.

Bollerslev, T. (1986), "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, 31, 307–327.

Box, G. E. P., and Cox, D. R. (1982), "An analysis of transformations revisited, rebutted," Journal of the American Statistical Association, 77, 209–210.

Cai, J. (1994), "A Markov model of switching-regime ARCH," Journal of Business and Economic Statistics, 12, 309–316.

Chan, H. W., and Maheu, J. M. (2002), "Conditional jump dynamics in stock market returns," Journal of Business and Economic Statistics, 20, 377–389.


Chernov, M., Gallant, A. R., Ghysels, E., and Tauchen, G. (1999), "A new class of stochastic volatility models with jumps: Theory and estimation," Technical Report, CIRANO, Montréal.

Dahlhaus, R. (1997), "Fitting time series models to nonstationary processes," Annals of Statistics, 25, 1–37.

Diebold, F. X. (1986), "Modeling the persistence of conditional variances: A comment," Econometric Reviews, 5, 51–56.

Donoho, D. L., and Johnstone, I. M. (1994), "Ideal spatial adaptation via wavelet shrinkage," Biometrika, 81, 425–455.

Donoho, D. L., Johnstone, I. M., Hoch, J. C., and Stern, A. S. (1992), "Maximum entropy and the nearly black object (with discussion)," Journal of the Royal Statistical Society, Series B, 54, 41–81.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., and Picard, D. (1995), "Wavelet shrinkage: Asymptopia? (with discussion)," Journal of the Royal Statistical Society, Series B, 57, 301–369.

Engle, R. F. (1982), "Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation," Econometrica, 50, 987–1007.

Engle, R. F., and Bollerslev, T. (1986), "Modelling the persistence of conditional variance," Econometric Reviews, 5, 1–50.

Geman, S., and Geman, D. (1984), "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Glosten, L. R., Jagannathan, R., and Runkle, D. (1993), "On the relation between the expected value and the volatility of the nominal excess return on stocks," Journal of Finance, 48, 1779–1801.

Hamilton, J. D., and Susmel, R. (1994), "Autoregressive conditional heteroskedasticity and changes in regime," Journal of Econometrics, 64, 307–333.

Harvey, A. C., Ruiz, E., and Shephard, N. (1994), "Multivariate stochastic variance models," Review of Economic Studies, 61, 247–264.


Hull, J., and White, A. (1987), "The pricing of options on assets with stochastic volatilities," Journal of Finance, 42, 281–300.

Jacquier, E., Polson, N. G., and Rossi, P. E. (1994), "Bayesian analysis of stochastic volatility models (with discussion)," Journal of Business and Economic Statistics, 12, 371–389.

Kotz, S., Kozubowski, T. J., and Podgórski, K. (2001), The Laplace Distribution and Generalizations: a Revisit with Applications to Communications, Economics, Engineering, and Finance, Birkhäuser.

Lamoureux, C. G., and Lastrapes, W. D. (1990), "Persistence in variance, structural change, and the GARCH model," Journal of Business and Economic Statistics, 8, 225–234.

Lastrapes, W. D. (1989), "Exchange rate volatility and U.S. monetary policy: An ARCH application," Journal of Money, Credit, and Banking, 21, 66–77.

Madan, D. B., and Seneta, E. (1990), "The variance gamma (V.G.) model for share market returns," The Journal of Business, 63, 511–524.

Nelson, D. B. (1991), "Conditional heteroskedasticity in asset returns: A new approach," Econometrica, 59, 347–370.

Omori, Y., Chib, S., Shephard, N., and Nakajima, J. (2007), "Stochastic volatility with leverage: fast and efficient likelihood inference," Journal of Econometrics, 140, 425–449.

Potra, F., and Ye, Y. (1993), "A quadratically convergent polynomial interior-point algorithm for solving entropy optimization problems," SIAM Journal on Optimization, 3, 843–860.

Rockafellar, R. (1970), Convex Analysis, Princeton: Princeton University Press.

Rydberg, T. H. (2000), "Realistic statistical modelling of financial data," International Statistical Review, 68(3), 233–258.

Sardy, S., and Tseng, P. (2004), "On the statistical analysis of smoothing by maximizing dirty Markov random field posterior distributions," Journal of the American Statistical Association, 99, 191–204.

Shephard, N. (1996), "Statistical aspects of ARCH and stochastic volatility," in Time Series Models in Econometrics, Finance and Other Fields, eds. D. R. Cox, D. V. Hinkley and O. E. Barndorff-Nielsen, London: Chapman and Hall, pp. 1–67.


Shorack, G. R., and Wellner, J. (1986), Empirical Processes with Applications to Statistics, Wiley.

Starica, C. (2003), "Is GARCH(1,1) as good a model as the Nobel prize accolades would imply?," Working paper.

Starica, C., and Granger, C. (2005), "Nonstationarities in stock returns," The Review of Economics and Statistics, 87, 503–522.

Taylor, S. (1986), Modelling Financial Time Series, John Wiley.

Taylor, S. J. (1994), "Modelling stochastic volatility: a review and comparative study," Mathematical Finance, 4, 183–204.

Tikhonov, A. N. (1963), "Solution of incorrectly formulated problems and the regularization method," Soviet Math. Dokl., 4, 1035–1038.

Tseng, P. (1992), "Global linear convergence of a path-following algorithm for some monotone variational inequality problems," Journal of Optimization Theory and Applications, 75, 265–279.

Wright, S. (1997), Primal-Dual Interior-Point Methods, Philadelphia: SIAM.
