Chapter 3. A Quantitative Comparison of the Lee-Carter Model under Different Types
3.3. Conclusions
國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
47
3.3 Conclusions
Recently, many researchers have examined mortality rates and explored different models. Some studies demonstrate that mortality rate improvements also exhibit jump properties. We therefore attempt to incorporate five heavy-tailed distributions—t, JD, VG, NIG, and GHST—into the Lee-Carter model. Using mortality data from six countries, we apply the BIC and KS, AD, and CvM tests and find consistent support for the non-Gaussian residuals of the Lee-Carter model. Specifically, when we calibrate the parameters of the Lee-Carter model, the JD-JD model6 is the best one for French mortality data, the NIG-NIG model is best for the Netherlands, the VG-t model offers the best goodness of fit for Swedish mortality data, the t-t model is best for the U.S. mortality data, and the NIG-t model is the best one for the mortality data from Finland and Switzerland. For forecasting mortality rates, we find that the normal distribution provides weak mortality projection performance, whereas t and its skew extension provide good mortality projections. Therefore, for applications of the Lee-Carter model, the heavy-tailed distributions appear to be the most appropriate choices for modeling long-term mortality data.
6 The terminology “X-Y model” refers to the error terms in Equations (3-1) and (3-2), respectively.
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
48
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
49
Chapter 4
Mortality Modeling with
Non-Gaussian Innovations and Applications to the Valuation of Longevity Swaps
4.1 Stochastic Mortality Models with Cox Error Structures
In this section, we first review the RH model, in which the mortality index and cohort effect follow ARIMA models with normal innovations. However, according to the mortality data, the residuals exhibit non-Gaussian distribution. Consequently, we assume that the number of deaths follows a Cox process and that the death rates adhere to the RH model in which the residuals, the mortality indices, and the cohort effects follow three non-Gaussian distributions: JD, VG, and NIG. We also develop an iterative process for calibrating the corresponding parameters of the Cox process with leptokurtic intensity.
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
50
Renshaw and Haberman’s (2006) Model
We analyze changes in mortality as a function of both age x and time t. For mortality forecasting, the cohort-based extension to the LC model proposed by Renshaw and Haberman (2006) is as follows:
x t, x x t x t x x t,ln m k
e
, (4-1)where
m
x t, is the death rate for age x in calendar year t, defined as running from time to timet 1
; describes the average pattern of mortality over an age xgroup;
k
t explains the time trend of the general mortality level; represents x age-specific patterns of mortality change, indicating the sensitivity of the logarithm of the force of mortality at age x to variations ink
t;
t x is a cohort effect;
xcontrols age-specific cohort contributions to the mortality projection; and
e
x t,represents the error term, which is normally distributed with mean 0 and variance
e2. This structure is designed to capture age–period–cohort effects.To forecast future mortality dynamics, the mortality index
k
t follows a one-dimensional random walk with drift (Lee and Carter, 1992), as follows:1
t t t
k k
, (4-2)where
is a drift term and
t is a sequence of independent and identically zero-mean Gaussian random variables. Let the year of birth be equal to c t x . Following the model setup of Renshaw and Haberman (2006) and Cairns et al. (2010), we model the cohort factor as an ARIMA(1,1,0) process that is independent of ck
t:
1
c c zc
, (4-3) t
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
51
where zc is a sequence of independent and identically standard normal random variables.
Normality Test for the RH Model
According to table 1 in Dowd et al.’s (2010) article, the residuals of the Renshaw and Haberman (2006) model exhibit leptokurticity. In this subsection, we therefore apply the JB statistic (Jarque and Bera, 1980) to test empirically the normality of the three mortality data sets from England and Wales, France, and Italy for subjects aged 60–89 years during the period 1900–1983. The mortality data came from the Human Mortality Database (HMD) website.7 Table 4-1 contains the results of the JB test for the residuals of the RH model, the first difference of the three countries’ mortality indices, and the corresponding cohort effects from 1900 to 1983. The JB statistic rejects the assumption of normality for the residuals of the RH model and the cohort effects. Therefore, we use the heavy-tailed distributions—JD, VG and NIG—to model the non-Gaussian nature of the error terms of the RH model.
Table 4-1. The Jarque-Bera Test
England and Wales France Italy
Residuals of the RH Model
373.952 1489.534 15531.672 [< 0.001] [< 0.001] [< 0.001]
First Difference in Mortality Indices
1.175 0.638 2.532
[0.477] [0.500] [0.182]
the Residuals of 343.557 43.933 295.214
Cohort Effects [< 0.001] [< 0.001] [< 0.001]
Note: The p-values of the Jarque-Bera test are in brackets.
7 See http://www.mortality.org/.
‧
heavy-tailed distributions: JD, VG and NIG. In the subsequent subsection, we take,
e
x t as an example to describe the properties of these heavy-tailed distributions;analogous results are obtained for
t andz
t x . We refer to these distributions in Chapter 2.A Cox Process with Leptokurtic Intensity
We assume that
D
x t, , or the number of deaths at age x during year t, adheres to a Cox process, also known as a doubly stochastic Poisson process. That is,
where
e
x t, is assumed to be an age- and period-homogeneous heavy-tailed distribution that captures leptokurticity. Letd
x t, be the corresponding number of deaths actually observed. Conditional one
x t, y
, the number of deathsD
x t, becomes a Poisson distribution with intensityE
xtexp
x x tk
x t x . As ay
result, the log-likelihood function based on the Cox regression model is defined as
, , ,
,
‧
Thus, to find the maximum likelihood estimates of the parameters of the Renshaw and Haberman (2006) model, we can maximize Equation (4-5) with respect to
x,
x,k
t,
x,
t x , and the error term distribution parameters. The closed-form solution of the log-likelihood function in Equation (4-5) is derived as follows:
,
Appendix C. Note that when
e
x t, is ignored while modeling the number of deaths (i.e.,M
ex t,
1 ), the log-likelihood function defined in Equation (4-7) is precisely 1 the same as that proposed by Wilmoth (1993), Brouhns et al. (2002), and Cairns et al.(2009).
Similar to the two-step procedures of Lee and Carter (1992), Brouhns et al.
(2002), and Renshaw and Haberman (2006), we first calibrate the parameters
x,
x,k
t,
x, and
t x- with an updating scheme. Then, we estimate
,
,
,
, and the distribution parameters of the residuals of the mortality indices and cohort effects. In the first step, there are six sets of parameters, namely, the
x,
x,k
t,
x, and
t x- parameters, as well as thee
x t, distribution parameters. Following Brouhns et al. (2002) and Renshaw and Haberman (2006), we use the following updating scheme: Let nx be the total number of ages. Starting with
x 1/ n
x and‧
LC model, we calibrate the correspondinge
x t, distribution parameters by maximizing the sample log-likelihood function:
Then, with the
e
x t, distribution parameters estimated from Equation (4-8), we employ an iterating method to estimate the corresponding parameters of the RH model according to the elementary Newton method (Goodman, 1979; Brouhns et al., 2002; Renshaw and Haberman, 2006).Following the estimating procedure of Renshaw and Haberman (2006), the parameters are estimated by iteration. In each iteration step, we update a single set of parameters; the other parameters are fixed at their current estimates using the following updating scheme: Consequently, the updating scheme is as follows:
‧
We repeat the updating cycle (Equations (4-8)–(4-14)) and stop when the log-likelihood function in Equation (4-7) converges.8 Model identification can be conveniently achieved with parameter constraints: t 0
t
After obtaining the mortality indices and cohort effects, we can calculate the parameters of Equations (4-2) and (4-3) by maximizing the log-likelihood function, as follows:
In this section, we investigate the goodness-of-fit distributions for the number of deaths, the first differences of the mortality indices, and the cohort effects. Using the
8 The criterion used to stop the iterative fitting procedure is a very small relative change in the log-likelihood function. We adopt 10−7 as the default value.
9 Similar to Brouhns et al. (2002), after updating the kt parameters, we impose a centering constraint
t 0 estimates for kt by the same number. Following the analogical procedure, the constraints of x and
t x are also achieved.
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
56
mortality data from 1900–1983, we fit the residuals of the RH model to four distributions: normal, JD, VG, and NIG. We then fit the mortality indices and cohort effects from the best-fitting model according to the Bayesian information criterion (BIC) to the same four distributions. Finally, we project the subsequent 25-year mortality rates (1984–2008).
In-Sample Goodness of Fit
Using mortality data from 1900 to 1983, we first investigate the goodness-of-fit distributions of the number of deaths for England and Wales, France, and Italy. Table 4-2 presents the LLF, Akaike information criterion (AIC), and BIC statistics10 for the number of deaths at which the residuals of the RH model adhere to the normal, JD, VG, and NIG models. All three criteria indicate that the normal distribution is the worst fitting model for the number of deaths. They also indicate that the VG model is consistently the best model for the number of deaths in the three mortality data sets.
Therefore, we use the mortality indices and cohort effects obtained from the VG model to investigate the pattern of the error terms of the time and cohort effects.
Table 4-2. Goodness-of-fit Measures for the Number of Deaths
Model England and Wales France ItalyLLF AIC BIC LLF AIC BIC LLF AIC BIC
Normal -32727 33011 33839 -30356 30640 31468 -45130 45414 46242 JD -32557 32844 33681 -30237 30524 31361 -43719 44006 44843 VG
-32554 32840 33674 -30084 30370 31204 -42900 43186 44020
NIG -32556 32842 33676 -30236 30522 31356 -44314 44600 45434The test results for the first difference in mortality indices are in Table 4-3, which contains the LLF, AIC, and BIC statistics for the normal, JD, VG, and NIG distributions. The Gaussian model is the worst according to the LLF criterion, which
10 AIC LLF NP and BIC LLF0.5NPlog
NS , where NP is the effective number of parameters being estimated and NS is the number of observations.‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
57
also indicates that the best fit for the three mortality data sets derives from the JD model. The best in-sample goodness of fit for mortality indices changes for the normal distribution of all mortality data, because the BIC introduces a penalty term for the effective number of parameters.
Table 4-3. Goodness-of-fit Tests for the First Difference in Mortality Indices
Model England and Wales France ItalyLLF AIC BIC LLF AIC BIC LLF AIC BIC
Normal -164.87 166.87 169.29 -174.07 176.07 178.49 -179.71 181.71 184.13 JD
-161.89 166.89 172.93 -173.71 178.71 184.75 -178.59 183.59 189.64
VG -162.67166.67 171.50 -173.85 177.85 182.69 -178.67 182.67 187.51
NIG -163.49 167.49 172.33 -173.84 177.84 182.68 -178.68 182.68 187.52In Table 4-4 we present the LLF, AIC, and BIC statistics for the normal, JD, VG, and NIG distributions for cohort effects. The LLF, AIC, and BIC statistics consistently indicate that the best fit for Italy derives from the JD model, but for England and Wales, it derives from the NIG model. For France, the best model for cohort effects is the JD model according to LLF, but in terms of the AIC and BIC, the best is the VG model. All three criteria indicate that the normal distribution is the worst fitting model for the cohort effects. Consequently, with mortality data from three countries over the period 1900–1983, in-sample model selection criteria indicate a preference for modeling the RH model with non-Gaussian innovations.
Table 4-4. Goodness-of-fit Tests for the Residuals of Cohort Effects
Model England and Wales France ItalyLLF AIC BIC LLF AIC BIC LLF AIC BIC
Normal -96.11 99.11 103.18 -107.63 110.63 114.69 -125.75 128.75 132.81 JD -84.27 90.27 98.4 -97.57 103.57 111.69 -105.31 111.31 119.44 VG -83.83 88.83 95.6 -97.79 102.79 109.57 -115.01 120.01 126.79 NIG
-83.48 88.48 95.25 -97.86 102.86 109.64 -111.45 116.45 123.22
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
58
Mortality Projection
To assess out-of-sample performance, we apply the parameters estimated from the time period 1900–1983 to obtain 25-year mortality projections, calculating the mean absolute percentage error (MAPE) as follows:
, (4-16)
where
A
i is the logarithm of the historical mortality rate;F
i is the natural logarithm of the forecast mortality rate; and n is the number of observations.By applying the calibrated parameters of the RH model to the VG innovations (the best model according to BIC), we reveal the impact of the different distributions on the mortality projection for MAPE from 1984 to 2008 (Table 4-5). A lower value indicates better predictive power for the distribution. For comparison, we also provide the mortality projection of the original RH model with four forecasting distributions—normal, JD, VG and NIG (the original RH-Normal model corresponds to the M2 model of Cairns et al., 2009). The VG-NIG model11 is the best mortality projection for the mortality data of England and Wales. The VG-VG model provides the best one for the mortality data from France and Italy. As a result, in terms of the MAPE criterion, the RH model with non-Gaussian innovations provides better mortality projection than that obtained from the original RH model with normal innovations.
11 A VG-NIG model corresponds to a VG error term in the RH model and to NIG distributions for the time and cohort effects.
1
1 n i i
i i
A F MAPE n A
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
59
Table 4-5. MAPE of Logarithm of Mortality Projection in 1984-2008 (Unit: %)
Model England and Wales France Italy
Original RH-Normal 4.9344 5.1999 7.2268
Original RH-JD 4.9343 5.1915 7.2822
Original RH-VG 4.9393 5.1000 7.9326
Original RH-NIG 4.9319 5.1891 7.2983
VG-Normal 4.8072 5.1719 7.0612 VG-JD 4.8064 5.1619 7.1135 VG-VG 4.807 5.0595 6.9687
VG-NIG
4.8063
5.1598 7.1293Note: Original RH-Normal is the same as M2 of Cairns et al. (2009). The X-Y model corresponds to an X error term in the RH model and to Y distributions for the time and cohort effects.
4.3 Application: The Valuation of Longevity Swaps
In this section, we first price a longevity swap. Using the mortality data of England and Wales from 1900 to 2008, we then re-fit the RH model to attain the fair swap premium of the longevity swap for both the original RH model (M2) and the best projection model. Finally, we provide the VaR and CTE of the longevity swaps.
Pricing Longevity Swaps
The traditional method of transferring longevity risk in a pension plan or an annuity book is to sell the liability through an insurance or reinsurance contract, known as pension buyouts. These tactics have attracted increasing attention since 2006, especially in the United Kingdom. However, such transactions involve the transfer of all risks, including longevity and investment risk. To transfer longevity risk only to capital markets, Blake and Burrows (2001) first advocate the use of longevity bonds, whose coupon payments depend on the proportion of the population surviving to particular ages. Bauer (2006) and Barbarin (2008) also apply the Heath-Jarrow-Morton methodology (see Heath et al., 1992) to price longevity bonds.
The EIB/BNP longevity bond was the first securitization instrument designed to
‧ 國
立 政 治 大 學
‧
N a tio na
l C h engchi U ni ve rs it y
60
transfer longevity risk but ultimately was withdrawn. The lack of success in issuing longevity bonds led to new securitization instruments, such as longevity swaps,12 which were pioneered in capital markets by J.P. Morgan and Canada Life in July 2008.
As Blake et al. (2012) show, 16 publicly announced longevity swaps were executed between 2007 and 2012 in the United Kingdom. In this context, the valuation of longevity swaps represents an important research topic for developing capital market solutions for longevity risk.
Longevity swaps have been widely explored in prior literature (Dawson, 2002;
Lin and Cox, 2005; Dowd et al., 2006; Dawson et al., 2010; Biffis et al., 2011; Wang and Yang, 2012). Dowd et al. (2006) introduce the mechanism for transferring longevity risk; this instrument involves exchanging actual pension payments for a series of pre-agreed fixed payments. On each payment date, the fixed-rate payer (e.g., pension plan) receives from the hedge supplier a random mortality-dependent payment and, in return, makes a fixed payment to the hedge supplier. Dowd et al.
(2006) demonstrate that the hedge is almost perfect when the reference index is based on the survivor experience of the insurer’s annuity book. If the expected reference indices and insurers’ own survivor experiences are highly correlated, the longevity swap can still hedge the insurer against a considerable amount of the aggregate longevity risk it faces. In this article, following the vanilla longevity swap structure analyzed by Dowd et al. (2006) and Dawson et al. (2010), we discuss a T-year bespoke longevity swap linked to a benchmark cohort of a given initial age for the England and Wales mortality data.13
For a given time horizon T, we consider a filtered probability space
12 For the recent development of longevity-linked securities, see Blake et al. (2012) and reference therein.
13 To bear no basis risk, the variable payments in bespoke longevity swaps are designed to match precisely the mortality experience of each individual hedger.
‧
enlarged filtration ℋ࣡ where ℋ is the filtration related to risk factors and ࣡ is the complement, such that is a stopping time on . Conditional on the path followed by the mortality rates, the t-year survival probability that a 65-year-old person in calendar year 2008+t reaches age 65+t is of the form:
xp
0 65 ,2008
( ) T e t s s
S t P t
m ds . (4-17) We assume that the mortality rates are constant within certain age and time windows but may vary from one window to the next. Specifically, given any integer age and calendar year , we presume thatTo transfer longevity risk, on each of the payment dates t, the fixed-rate payer pays the notional principal multiplied by a prespecified fixed proportion (1) ( )H t to the floating-rate payer and receives the notional principal multiplied by S t( ), where H t( ) is anticipated by using the best estimate of the underlying mortality model, and is the swap premium that would be set so that the initial value of the swap is zero for each party.
The distribution function of S t( ) under the real-world (physical) probability measure P is
‧
Wang (2000) proposes a distortion operator to change the probability measure from the real-world probability measure P to an equivalent martingale measure Q, with the following transformation:14 normal distribution function. Therefore, as shown by Denuit et al. (2007), the expectation value of S t( ) under the equivalent martingale measure Q is defined as
( )
01
1 ( )
01
1
1
( )
Q t t
E S t
F y dy
F y
dy
. (4-22) Let be the total annuities issued to an initial population that consists of persons aged 65 years who also are alive in 2008. Under the equivalent martingale measure Q, the fair value of a pay-fixed longevity swap at issue year 2009, denoted byLS
0, can be calculated aswhere is the risk-free rate. We also consider the term structure of the interest rate in our valuation framework. Let denote the price of a zero-coupon bond issued at time t that pays $1 at time T, . With the assumption that mortality rates and financial risk are independent, the fair value of a pay-fixed longevity swap takes the form:
14 The Wang transform represents only one possible choice among several incomplete market pricing methods. For example, Biffis et al. (2010) provide the equivalent changes of measures that preserve the structure of the LC model and the tractability of the doubly stochastic setup. The specification of both a real-world and an equivalent martingale measure raises the issue of whether the doubly stochastic setting applies under the two measures. For more details, please refer to the Proposition 3.2 in Biffis et al. (2010).
‧
The fair swap premium , which is set when the initial value of the swap equals zero, is given by
the Monte Carlo algorithm to compute the expected value of the t-year survival probability under the equivalent martingale measure Q in Appendix D.
Numerical Analysis
To simulate the mortality rates, we first re-fit the RH model with four distributions—normal, JD, VG, and NIG—to the mortality data of England and Wales from 1900 to 2008 in Table 4-6. Similar to the results based on the 1900–1983 period, the best model for England and Wales is still the VG model. Consequently, we use the best prediction models presented in Table 4-5 to simulate mortality rates, which is the VG-NIG model for England and Wales.
Table 4-6. Goodness-of-fit Measures for the Number of Deaths in 1900-2008
Model England and Wales
LLF AIC BIC
Normal -41760 42094 43111
JD -41616 41953 42980
VG
-41524 41860 42883
NIG -41526 41862 42886
In this section, we provide a numerical example of the longevity swaps based on a cohort of 65-year-old persons in calendar year 2008. The initial term structure is obtained from the U.S. Department of the Treasury.15 We also assume that M = 1.
15 See http://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/TextView.aspx?
‧
Figure 4-1 depicts the swap premium curve by varying the level of the risk-adjusted parameter . The fair swap premium is higher for a longer duration swap, because long-duration contracts are usually more expensive for covering longevity risk. The lower implies higher survival probabilities, so the fair swap premiums should be bigger for lower . In addition, the fair swap premiums of the RH model (the M2 model of Cairns et al., 2009) are higher than those of the best prediction model, which means that the fixed-rate payer (longevity risk hedger) can pay lower swap premium,
Figure 4-1 depicts the swap premium curve by varying the level of the risk-adjusted parameter . The fair swap premium is higher for a longer duration swap, because long-duration contracts are usually more expensive for covering longevity risk. The lower implies higher survival probabilities, so the fair swap premiums should be bigger for lower . In addition, the fair swap premiums of the RH model (the M2 model of Cairns et al., 2009) are higher than those of the best prediction model, which means that the fixed-rate payer (longevity risk hedger) can pay lower swap premium,