2 Estimating the continuous time stochastic volatility models
2.2 The model and likelihood inference about diffusions
The stochastic volatility models have been presented in the introduction. Generally, equation (2.1) corresponds to the commonly used geometric Brownian motion or the CKLS model, with the volatility coefficient replaced by the square root of a positive random variable $V_t$. Since the literature, for example Engle and Patton (2001), suggests that volatilities are mean-reverting, equation (2.2) is usually specified with a drift term like $\mu_V(V_t, t) = -\kappa (V_t - \xi)$, in which $\kappa$ represents the speed of mean reversion and $\xi$ is the long-term equilibrium level of volatility. For further discussion, the following conditions are assumed.
Assumption 1. $\mu_X(X_t, V_t, t)$, $\sigma_X(X_t, t)$, $\mu_V(V_t, t)$ and $\sigma_V(V_t, t)$ are functions that satisfy regularity conditions such that there exists a unique strong solution $(X_t, V_t)$ of the system (2.1) and (2.2).
Assumption 2. The process $V_t$ is stationary and ergodic with distribution $\pi$.
With these conditions, the stationary distribution of $V_0$ and the transition probability density function of $(X_t, V_t)$ given $(X_{t-1}, V_{t-1})$ exist. The likelihood function can then be expressed in a proper form.
2.2.1 Likelihood function
Data for this problem consist of discrete-time observations $X_0, X_1, \dots, X_T$ only, while the process $V_t$ remains completely unobservable. With the Markov property of the diffusion process, the likelihood function can be expressed as
$$L = \int \prod_{t=1}^{T} p(X_t, V_t \mid X_{t-1}, V_{t-1}) \, \pi(V_0) \, dV_0 \cdots dV_T .$$
The transition density functions are quite complicated, so the integral cannot be decomposed into factors concerning pivotal quantities such as $X_t - X_{t-1}$. This means that, when $V_t$ is not observable, $X_t$ depends on $(X_0, \dots, X_{t-1})$ rather than only on $X_{t-1}$.
Explicit forms of transition probabilities have been identified only for some specific univariate processes, for example the Ornstein-Uhlenbeck process and the Cox-Ingersoll-Ross process. However, it is almost infeasible to find an explicit form of the transition densities for the whole system (2.1) and (2.2), even when $W_{1,t}$ and $W_{2,t}$ are not correlated.
A practical way to compute the likelihood function and find the MLEs is through numerical methods. When equation (2.1) is a geometric Brownian motion for prices and $W_{1,t}$ and $W_{2,t}$ are assumed to be independent, each increment $X_t - X_{t-1}$ is normally distributed conditionally on the volatility path, and the likelihood function can be obtained by simulating a large number of volatility paths (Sørensen, 2003).
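The simulation idea in the preceding paragraph can be sketched in Python. This is only an illustration under stated assumptions, not the procedure of the cited paper: $X_t$ is taken to be the log price, the volatility diffusion coefficient is assumed to be of square-root form $\sigma_V \sqrt{V_t}$, $V_0$ is fixed rather than drawn from $\pi$, and all function and parameter names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def simulate_integrated_variance(v0, kappa, xi, sigma_v, n_periods, n_sub, rng):
    """Euler path of dV = -kappa (V - xi) dt + sigma_v sqrt(V) dW2 (assumed form).

    Returns the integrated variance over each unit-length observation period,
    approximated by a left-point Riemann sum on n_sub subintervals.
    """
    dt = 1.0 / n_sub
    v = v0
    integrated = []
    for _ in range(n_periods):
        total = 0.0
        for _ in range(n_sub):
            total += v * dt
            shock = rng.gauss(0.0, math.sqrt(dt))
            v += -kappa * (v - xi) * dt + sigma_v * math.sqrt(v) * shock
            v = max(v, 1e-12)  # keep the discretised variance positive
        integrated.append(total)
    return integrated

def simulated_likelihood(log_prices, mu, v0, kappa, xi, sigma_v,
                         n_paths=500, n_sub=20, seed=0):
    """Average, over simulated volatility paths, of the conditional Gaussian likelihood."""
    rng = random.Random(seed)
    returns = [b - a for a, b in zip(log_prices[:-1], log_prices[1:])]
    total = 0.0
    for _ in range(n_paths):
        iv = simulate_integrated_variance(v0, kappa, xi, sigma_v,
                                          len(returns), n_sub, rng)
        path_lik = 1.0
        for r, s2 in zip(returns, iv):
            # Given the volatility path, the log return is N(mu - s2/2, s2).
            path_lik *= normal_pdf(r, mu - 0.5 * s2, s2)
        total += path_lik
    return total / n_paths
```

For long series the product of densities can underflow, so a practical version would likely accumulate log densities per path and average them with a log-sum-exp step.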
2.2.2 Simulated maximum likelihood and MCMC methods for inferences about diffusions
When observations of all processes are available, Pedersen (1995) shows that the approximate likelihood under the Euler expansion converges in probability to the true likelihood function as the subdivision length between observations approaches 0.
That is, to obtain a good approximation of the likelihood function, data augmentation is necessary, and different paths connecting two consecutive observations should be simulated.
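Consider the model consisting of equation (2.1) with $V_t$ held constant and absorbed into the coefficients. The discretized version with subdivision length $\Delta = 1/n$ would be
$$X_{t+\Delta} = X_t + \mu_X(X_t, t)\,\Delta + \sigma_X(X_t, t)\,\Delta W_t , \qquad (2.4)$$
where $\Delta W_t$ is a normally distributed random variable with mean 0 and variance $\Delta$. Let the observed data be $\tilde{X} = (X_0, X_1, \dots, X_T)$, and let $X_t^{*(n)}$ denote the $n-1$ augmented data points lying between $X_{t-1}$ and $X_t$.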
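Under the Euler scheme each one-step transition density is Gaussian and is written in terms of $\varphi(\cdot)$, the density function of the standard normal distribution. The approximate likelihood of the observed data is then an expectation of these Gaussian densities, in which the expectation is taken over the augmented data $(X_1^{*(n)}, \dots, X_T^{*(n)})$.
Generally, numerical procedures such as importance sampling must be used to evaluate this expectation: given a number of simulated sets of paths, the likelihood function may be approximated by the average of the importance-weighted integrand over those paths.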
The original suggestion of Pedersen is quite simple. The required augmented data can be generated directly with the Euler expansion (2.4), that is, the importance sampler is the Euler transition density itself, simulated forward from $X_{t-1}$ without reference to $X_t$.
Clearly, a major drawback of this method is that it tends to produce large jumps between the last augmented point and the next observed data point.
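A minimal Python sketch of this forward-simulation sampler for a single transition over a unit interval is given here. It assumes time-homogeneous coefficients passed as the hypothetical callables `drift` and `diffusion`; the function names and default settings are illustrative choices, not taken from the cited papers. The simulated path ignores the observation until the final Euler step, which is where the large jump arises.

```python
import math
import random

def euler_density(x_next, x, drift, diffusion, dt):
    """Gaussian one-step Euler transition density p(x_next | x)."""
    mean = x + drift(x) * dt
    var = diffusion(x) ** 2 * dt
    return math.exp(-0.5 * (x_next - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def pedersen_transition_density(x_prev, x_obs, drift, diffusion,
                                n_sub=8, n_paths=20000, seed=0):
    """Estimate p(x_obs | x_prev) over a unit interval by forward Euler simulation."""
    rng = random.Random(seed)
    dt = 1.0 / n_sub
    total = 0.0
    for _ in range(n_paths):
        x = x_prev
        # Simulate the n_sub - 1 augmented points forward, ignoring x_obs.
        for _ in range(n_sub - 1):
            x += drift(x) * dt + diffusion(x) * rng.gauss(0.0, math.sqrt(dt))
        # Only the last Euler step takes the observation into account.
        total += euler_density(x_obs, x, drift, diffusion, dt)
    return total / n_paths
```

As a check, the estimate can be compared with a case where the exact transition density is known, such as an Ornstein-Uhlenbeck process with `drift = lambda x: -0.5 * x` and `diffusion = lambda x: 1.0`; the two should agree up to the Euler discretization bias and Monte Carlo error.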
Based on the Brownian bridge, Pedersen's approach can be modified so that the simulated paths are drawn toward the next observation.
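One bridge-type sampler of this kind is the modified Brownian bridge discussed in Durham and Gallant. The Python sketch below follows that construction as an assumption; it is not necessarily the exact scheme intended in this section, and the function names and defaults are hypothetical. Each augmented point is drawn with a drift that pulls it toward the observation and a variance that shrinks as the observation time approaches, and the draw is reweighted by the ratio of the Euler density to the proposal density.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def euler_density(x_next, x, drift, diffusion, dt):
    """Gaussian one-step Euler transition density p(x_next | x)."""
    return normal_pdf(x_next, x + drift(x) * dt, diffusion(x) ** 2 * dt)

def bridge_transition_density(x_prev, x_obs, drift, diffusion,
                              n_sub=8, n_paths=2000, seed=0):
    """Importance-sampling estimate of p(x_obs | x_prev) over a unit interval."""
    rng = random.Random(seed)
    dt = 1.0 / n_sub
    total = 0.0
    for _ in range(n_paths):
        x = x_prev
        weight = 1.0
        for k in range(n_sub - 1):
            remaining = 1.0 - k * dt  # time left until the observation
            # Proposal: drift toward x_obs, variance shrinking near the end point.
            q_mean = x + (x_obs - x) / remaining * dt
            q_var = diffusion(x) ** 2 * dt * (remaining - dt) / remaining
            x_new = rng.gauss(q_mean, math.sqrt(q_var))
            # Importance weight: Euler target density over proposal density.
            weight *= euler_density(x_new, x, drift, diffusion, dt)
            weight /= normal_pdf(x_new, q_mean, q_var)
            x = x_new
        # Final Euler step onto the observed point.
        weight *= euler_density(x_obs, x, drift, diffusion, dt)
        total += weight
    return total / n_paths
```

Because every path is steered toward `x_obs`, the weights vary much less than in forward simulation, so far fewer paths are typically needed for a comparable Monte Carlo error.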
Elerian et al. (2001) proposed an alternative importance sampler for the problem. The advantage of the approach is that it draws the whole path in one shot and eliminates huge jumps.
The augmented data between $X_{t-1}$ and $X_t$ can be sampled from a multivariate normal distribution $N(\mu^*, \Sigma^*)$, where $\mu^*$ and $\Sigma^*$ are chosen to match the conditional density of the augmented data, for instance through its mode and the curvature at the mode.
More bias correction and variance reduction methods and a summary discussion may be found in Durham and Gallant (2001). Most of the methods mentioned may be applied to the stochastic volatility models, especially when the two driving Brownian motions are uncorrelated.
2.2.3 Asymptotic equivalence of stochastic volatility models and GARCH models
Since Engle (1982) and Bollerslev (1986), GARCH models have been widely used for modeling financial time series with stochastic volatilities. Nelson (1990) first investigated the convergence of GARCH processes to bivariate diffusions as the length of the time intervals between observations goes to zero. Diffusion limits have since been found for a variety of GARCH-type processes, for example by Duan (1997) and Fornari and Mele (2004). The relation between the two categories of models is intricate, especially when both are essentially one-dimensional processes.
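To make the notion of a GARCH process constructed at a basic frequency concrete, the following Python sketch simulates a GARCH(1,1) recursion on a grid of step $h$. The parameter scaling is an assumption made for illustration: it is chosen so that the conditional mean of the variance increment is $(\omega - \theta\sigma^2)h$ and its conditional variance is $\alpha^2 \sigma^4 h$, which is the kind of scaling under which a diffusion limit can arise. The function and parameter names are hypothetical.

```python
import math
import random

def simulate_scaled_garch(omega, theta, alpha, sigma2_0, horizon, n_steps, rng):
    """Simulate a GARCH(1,1) process built on n_steps intervals of length h.

    Returns the cumulated return and the final conditional variance.
    """
    h = horizon / n_steps
    alpha_h = alpha * math.sqrt(h / 2.0)   # assumed scaling of the ARCH coefficient
    beta_h = 1.0 - alpha_h - theta * h     # assumed scaling of the GARCH coefficient
    x, sigma2 = 0.0, sigma2_0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # Normally distributed innovation with conditional variance sigma2 * h.
        x += math.sqrt(sigma2 * h) * z
        # The next variance is fully determined by the current innovation.
        sigma2 = omega * h + sigma2 * (beta_h + alpha_h * z * z)
    return x, sigma2
```

Under this scaling the conditional mean of the variance satisfies $E[\sigma^2_{k+1} \mid \sigma^2_k] = \omega h + (1 - \theta h)\,\sigma^2_k$, so the simulated variance reverts to $\omega/\theta$ at a rate governed by $\theta$.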
But even though the GARCH processes converge in distribution to their diffusion limits, it does not follow trivially that inferences through the two processes are equivalent. A major distinction between the two types of models is the observability of the volatility process. Thus, if by a suitable arrangement a GARCH model retains a tractable likelihood while its volatility process becomes unobservable, it may approximate its continuous counterpart well. In fact, recent research has shown that the equivalence of the two types of models depends on the sampling frequency and the basic frequency of construction of the processes.
As set in the previous section, let $\tilde{X}^{(n)}$ and $\tilde{Y}^{(n)}$ denote the two processes observed at the basic frequency of construction. With the notation $D(X, Y)$ for the $L_1$ distance between the joint density functions of two processes $X$ and $Y$, Wang (2002) showed that $D(\tilde{X}^{(n)}, \tilde{Y}^{(n)})$ does not converge to 0 as $n \to \infty$. In other words, the likelihood processes have different asymptotic distributions and consequently the two types of models are not asymptotically equivalent.
However, when the frequency of observation becomes much lower than that of construction, the result is quite different. Specifically, let the observations be taken once every $l$ basic intervals, so that $l/n$ is the period between observations and $N^*$ is the largest integer not larger than $nT/l$. Brown, Wang and Zhao (2003) illustrated the asymptotic equivalence of the MGARCH model and its diffusion limit with this dataset as $n \to \infty$ and $l / n^{1/2} \to \infty$.
These seemingly contradictory results in fact sketch the relation between the stochastic volatility model and its GARCH counterpart in some detail. Even though the GARCH process converges to the stochastic volatility model, the GARCH process is still composed of normally distributed innovations and volatilities that are determined by past observations. Augmentation of data deprives the GARCH process of these properties, so that it may look as if it were generated by a stochastic volatility model.
In other words, the implications are very similar to those of Lo (1988) and Pedersen (1995) for univariate processes or multivariate processes that are completely observable. In short, even though GARCH models provide good approximations to stochastic volatility models, the likelihood function of a stochastic volatility model cannot be obtained through the corresponding GARCH model at the frequency of observation, 1/T. However, with GARCH processes constructed at higher frequencies, the approximate likelihood function can be calculated by simulating all missing values.