of modeling idiosyncratic covariance by comparing POET and Lee-Carter model.
The challenge in estimating the covariance matrix of mortality rates is that the number of parameters becomes too large when the age span N is large. POET exploits the low-rank common component and assumes that $\Sigma_u$ is conditionally sparse: a common factor structure with a few factors accounts for a large portion of the correlation between mortality rates as N grows, while the correlation matrix of the idiosyncratic errors is sparse, i.e. most entries are zero, with only a few exceptions. The non-zero entries are then estimated by thresholding the sample covariance of the residuals $\varepsilon$. Let $\tau_{ij}$ be the entry-dependent threshold. Consider
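$$\hat\Sigma_u^{POET} = (\hat\sigma_{ij}^{POET})_{N \times N}, \qquad \hat\sigma_{ij}^{POET} = \begin{cases} \hat\sigma_{ii}, & i = j, \\ s_{ij}(\hat\sigma_{ij}; \tau_{ij}), & i \neq j, \end{cases}$$
where $\hat\sigma_{ij}$ is the $(i,j)$ entry of the sample covariance matrix of the residuals and B is estimated by SVD. The shrinkage function $s_{ij}(\cdot\,; \tau_{ij})$ governs the estimated off-diagonal covariance through the threshold parameter $\tau_{ij}$, which can be either a constant or a varying parameter.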
We adopt the adaptive thresholding of Fan et al. (2013). The threshold $\tau_{ij}$ takes the form
$$\tau_{ij} = C\, w_T \sqrt{\hat\theta_{ij}},$$
where $C > 0$ is a constant chosen to ensure the positive definiteness of the estimated covariance matrix, $\hat\theta_{ij}$ is an estimate of the variability of entry $\hat\sigma_{ij}$ of the sample covariance matrix, and the optimal weight $w_T = 1/\sqrt{N} + \sqrt{\log(N)/T}$ is chosen following Fan et al. (2013).
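The adaptive thresholding step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this study: it assumes the threshold form of Fan et al. (2013), $\tau_{ij} = C\, w_T \sqrt{\hat\theta_{ij}}$ with $\hat\theta_{ij} = \frac{1}{T}\sum_t (\hat u_{it}\hat u_{jt} - \hat\sigma_{ij})^2$, and the soft shrinkage rule. The function name `poet_idiosyncratic_cov` and the default value of `C` are hypothetical.

```python
import numpy as np

def poet_idiosyncratic_cov(resid, C=0.5):
    """Adaptive soft-thresholded covariance of idiosyncratic residuals.

    resid : (N, T) array, rows are ages and columns are calendar years.
    C     : tuning constant (hypothetical default; in practice it must be
            chosen so that the result is positive definite).
    """
    N, T = resid.shape
    u = resid - resid.mean(axis=1, keepdims=True)   # demean each age series
    sigma = u @ u.T / T                             # sample covariance, (N, N)

    # theta_ij = (1/T) * sum_t (u_it * u_jt - sigma_ij)^2
    prod = u[:, None, :] * u[None, :, :]            # (N, N, T)
    theta = ((prod - sigma[:, :, None]) ** 2).mean(axis=2)

    w_T = 1.0 / np.sqrt(N) + np.sqrt(np.log(N) / T)
    tau = C * w_T * np.sqrt(theta)                  # entry-dependent threshold

    # Soft thresholding: shrink each entry toward zero by tau_ij.
    shrunk = np.sign(sigma) * np.maximum(np.abs(sigma) - tau, 0.0)
    np.fill_diagonal(shrunk, np.diag(sigma))        # variances are not shrunk
    return shrunk
```

A larger `C` sets more off-diagonal entries to zero. The function does not check positive definiteness, so that should be verified separately for the chosen `C`.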
2.7 Benchmark Example: USA
So far we have introduced three special cases of the approximate factor model and how to estimate them. In this section we demonstrate how to estimate a single-population model with each of them. In the process, we also show that idiosyncratic heteroskedasticity and correlation are present in mortality rates. We use the USA male population as an example and estimate a mortality model for ages 0–100 over the observation period 1933 to 2010. The death counts and exposure data are obtained from the Human Mortality Database.
Let $D_{xt}$ and $E_{xt}$ be the death count and exposure of individuals aged x at time t. We estimate the one-factor model using the log of the crude death rate, that is,
$$m_{xt} = \log\frac{D_{xt}}{E_{xt}}. \tag{19}$$
Figure 1 shows the factor and loading estimates for the United States over the period 1933–2010. We estimate the factor and loading with SVD for all three models to keep the comparison consistent.
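As a rough illustration of the estimation step, the sketch below computes log crude death rates and extracts a single factor by SVD. It is a minimal sketch under stated assumptions: the function name `one_factor_svd` is hypothetical, the loading is normalized to unit length (other normalizations are possible), and zero death counts are not handled.

```python
import numpy as np

def one_factor_svd(deaths, exposure):
    """One-factor fit of log crude death rates by SVD.

    deaths, exposure : (N, T) arrays (ages by calendar years), strictly positive.
    Returns the age means a (N, 1), loading b (N, 1), factor k (1, T),
    and the idiosyncratic residuals (N, T).
    """
    m = np.log(deaths / exposure)          # log crude death rate
    a = m.mean(axis=1, keepdims=True)      # age-specific average over time
    Z = m - a                              # demeaned log mortality rates

    # The leading singular triple gives the best rank-one approximation of Z.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    b = U[:, :1]                           # unit-norm loading (assumed normalization)
    k = s[0] * Vt[:1, :]                   # factor scaled by the singular value

    resid = Z - b @ k                      # idiosyncratic component
    return a, b, k, resid
```

The residual matrix returned here is the input to the variance and correlation diagnostics discussed in the rest of this section.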
Figure 2 shows the variance of the idiosyncratic mortality rates, in which cross-sectional heteroskedasticity is easily observed. The dots in Figure 2 are the estimates of idiosyncratic variance, and the solid line is the estimate of idiosyncratic variance in the Lee-Carter model. A clear pattern emerges: the idiosyncratic mortality rates are particularly volatile at the infant ages, with two further spikes at ages 20–40 and 50–70. It is evident that homoskedasticity in the cross-sectional dimension is not supported empirically.
We can also look at the idiosyncratic mortality rates from another angle. Recall that the approximate factor model extracts the factor K from the T × T cross product of the demeaned mortality rates Z through the approximation
$$\frac{1}{N} Z^\top Z \longrightarrow K^\top K + D, \tag{20}$$
where D is the asymptotic idiosyncratic covariance matrix. This implies we can decompose the mortality rate variance into the sum of factor variance and idiosyncratic variance:
$$\mathrm{var}(m_t) = \mathrm{var}(k_t) + \mathrm{var}(d_t), \tag{21}$$
where $d_t = D_{tt} = \bar\sigma_t^2$, the t-th diagonal entry of D, is the time-t idiosyncratic variance. This suggests that the time variation in mortality rates cannot be explained by the factor alone. It is possible that the idiosyncratic variance of mortality rates is persistent, which leads to time-series heteroskedasticity in mortality rates. If so, this helps us predict the variation of mortality rates in the future. Figure 3 shows the idiosyncratic variance in the time-series dimension. The dots are the idiosyncratic variance estimates for each calendar year, and the solid line is the homoskedastic estimate. The pattern suggests that the idiosyncratic variance is somewhat nonlinear, which is consistent with Gao and Hu (2009). The estimated variance is low for most years, but the idiosyncratic variance seems to fall slowly and rise quickly. This behavior suggests that the idiosyncratic mortality rates are not homoskedastic in the time-series dimension either.
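The variance estimates plotted as dots and solid lines in Figures 2 and 3 can be thought of as averages of squared residuals along one dimension or both. The sketch below is an illustration only: the function name `idiosyncratic_variances` is hypothetical, and it assumes the residuals have mean zero and that no degrees-of-freedom correction is applied.

```python
import numpy as np

def idiosyncratic_variances(resid):
    """Idiosyncratic variance by age, by year, and pooled.

    resid : (N, T) array of idiosyncratic residuals (ages by calendar years),
            assumed to have mean zero.
    """
    sq = resid ** 2
    var_by_age = sq.mean(axis=1)    # one value per age: cross-sectional pattern
    var_by_year = sq.mean(axis=0)   # one value per year: time-series pattern
    var_pooled = sq.mean()          # single homoskedastic estimate
    return var_by_age, var_by_year, var_pooled
```

Plotting `var_by_age` or `var_by_year` against `var_pooled` gives the kind of comparison described above: a flat line for the homoskedastic estimate and dots that depart from it where heteroskedasticity is present.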
To demonstrate the correlation remaining in the one-factor model, we visualize the sample correlation matrix of the idiosyncratic mortality rates in Figure 4. The gray (red) areas indicate stronger positive (negative) correlations, while the light-colored areas indicate weak correlation. The positive correlations mostly cluster among adjacent ages, while the negative correlations are scattered between the younger and older ages. The correlogram shows that the idiosyncratic mortality rates remain highly correlated even after the common component is removed. By our count, about half of the area has a correlation coefficient greater than 0.5 in absolute value. Clearly, the empirical evidence does not favor the assumption of uncorrelated idiosyncratic errors.
The dependence among idiosyncratic errors can be captured by the POET estimator. Figure 5 provides the POET estimate of the idiosyncratic correlation. Covariances above the threshold value are preserved and then scaled by the variance level to provide a more robust result.7 The range of correlation coefficients in Figure 5 is asymmetric because the area of negative covariance concentrates on ages 0–25 and ages 80–100, which have lower variance as seen in Figure 2.
The demonstration above makes clear that heterogeneity and correlation are evident even within a single population. The next question is whether these effects actually make a difference in fitting and forecasting. We discuss the fitting performance of the approximate factor model in the next section.
3 Fitting performance for single population
We investigate the finite-sample performance of approximate factor models in a large-scale comparison. The estimation of approximate factor models relies on asymptotic convergence to the true covariance matrix, so one would naturally question their finite-sample performance in a real-world application. We use data from the Human Mortality Database (HMD) to compare APCA, HFA, and POET, with the Lee-Carter model as a benchmark. We collect the deaths and exposures of all 46 male populations in HMD. We choose to use the full sample, even though each population has a different sample period. We list the sample period of every population used in this study in Table 1.8
7 Here we use the “soft” threshold in Fan et al. (2013), as it has been demonstrated to be a more robust choice of threshold.
8 We try to include as many years as possible to evaluate the robustness of the approximate factor models.
We use two goodness-of-fit measures to examine the fitting performance: the log-likelihood value and the Bayesian information criterion (BIC). The log-likelihood value measures how well the fitted distribution describes the data, taking the correlation between variables into account. We use the log-likelihood function of the multivariate Gaussian distribution to evaluate the goodness of fit.9 The log-likelihood function for the multivariate Gaussian is
$$L(X; \mu, \Sigma) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log(|\Sigma|) - \frac{1}{2}\mathrm{tr}\left((X - \mu)(X - \mu)^\top \Sigma^{-1}\right), \tag{22}$$
where X is the sample, and µ and Σ are the estimated mean and covariance matrix, respectively.
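The log-likelihood in Equation (22) can be evaluated directly from an estimated mean and covariance matrix. The sketch below is an illustration under stated assumptions: the function name `gaussian_loglik` is hypothetical, X is arranged as an N × T matrix with one column per year, and µ is an N × 1 vector. It uses a log-determinant routine and a linear solve instead of an explicit inverse for numerical stability.

```python
import numpy as np

def gaussian_loglik(X, mu, Sigma):
    """Multivariate Gaussian log-likelihood of an (N, T) sample.

    X     : (N, T) data matrix, each column is one N-dimensional observation.
    mu    : (N, 1) mean vector.
    Sigma : (N, N) covariance matrix, must be positive definite.
    """
    N, T = X.shape
    R = X - mu                                    # centered data
    sign, logdet = np.linalg.slogdet(Sigma)       # stable log|Sigma|
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # tr(R R' Sigma^{-1}) equals the sum over entries of R * (Sigma^{-1} R).
    quad = np.sum(R * np.linalg.solve(Sigma, R))
    return (-0.5 * N * T * np.log(2.0 * np.pi)
            - 0.5 * T * logdet
            - 0.5 * quad)
```

Passing a diagonal Σ with equal entries corresponds to the homoskedastic, uncorrelated case, while a full Σ also rewards or penalizes the off-diagonal structure.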
We also consider the Bayesian information criterion (BIC), defined as $-2L(M) + p\log(NT)$, where p is the number of free parameters. BIC penalizes the excessive use of parameters. Given the same data, a smaller BIC indicates a more favorable model. The criterion is commonly used in other papers as a model selection statistic, for example in Li and Hardy (2011).
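The BIC definition above translates into a one-line function. The sketch below is illustrative: the function name `bic` and the numbers in the usage note are hypothetical and are not results from this study.

```python
import numpy as np

def bic(loglik, n_params, N, T):
    """Bayesian information criterion: -2 * loglik + p * log(N * T).

    loglik   : maximized log-likelihood value of the model.
    n_params : number of free parameters p.
    N, T     : number of ages and number of calendar years in the sample.
    """
    return -2.0 * loglik + n_params * np.log(N * T)
```

For instance, with made-up values, `bic(-1200.0, 300, 101, 78)` and `bic(-1500.0, 200, 101, 78)` can be compared directly; the model with the smaller value is preferred, so a richer model wins only if its gain in log-likelihood exceeds the penalty on its extra parameters.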
Table 1 reports the log-likelihood values for all 4 models across 45 populations.10,11 We estimate the number of static factors in every population with the maximum of the two information criteria of Bai and Ng (2002). The criteria suggest that only one factor is pervasive in all 45 populations.12
APCA, HFA, and POET lead by a wide margin in fitting performance in terms of log-likelihood. Our baseline is the Lee-Carter model with homoskedastic and uncorrelated idiosyncratic errors. This suggests that idiosyncratic heteroskedasticity should not be neglected in the first place. POET dominates the horse race. Comparing the log-likelihood values, it is obvious that the idiosyncratic covariance has a large impact on goodness of fit. The additional fitting power over the baseline Lee-Carter model is contributed entirely by the full idiosyncratic covariance $\hat\Sigma_u^{POET}$, since both models have identical common factor and loading estimates, as noted in the previous section. The relative performance of APCA and HFA is mixed, but mostly favors APCA.
9 Note that the estimation of the approximate factor model is nonparametric and therefore does not assume a probability distribution for the variables. We merely use the log-likelihood function as a measure of goodness of fit to compare the models, because (a) it accounts for the idiosyncratic covariance, and (b) most of the models we discuss in the paper assume a multivariate Gaussian distribution, so at least we are not favoring our models. We have considered other choices such as the Poisson distribution (see Brouhns et al. (2002) and Renshaw and Haberman (2006) for example). However, writing down the joint probability distribution function for a correlated Poisson distribution is a non-trivial problem. See Johnson et al. (1997) for details.
10 We exclude Belgium because the mortality data are missing for the entire World War I period.
11 We considered the Renshaw and Haberman (2006) model with cohort extension but failed to achieve convergence when estimating the model in many populations. The likelihood function seems to be very flat when there are many ages. The models were estimated with the ilc package in R.
APCA is designed to capture age heteroskedasticity, while HFA is designed to capture time heteroskedasticity. A population may exhibit either or both. Our result suggests that cross-sectional idiosyncratic heteroskedasticity plays a more important role than time-series heteroskedasticity.
Table 2 reports the BIC for all models. BIC reflects goodness of fit while accounting for parsimony in parameters. The Lee-Carter model is the most parsimonious, with only 1 parameter for the idiosyncratic covariance, since it assumes all age-specific mortality rates have equal variance. APCA assumes heteroskedasticity and hence has N additional parameters. HFA assumes asymptotic time heteroskedasticity and hence has T additional parameters when estimating the factor and loading. Finally, POET is the least parsimonious, since all off-diagonal entries of the idiosyncratic covariance can take nonzero values. BIC suggests that APCA has the best overall performance across the 45 populations. Lee-Carter is the least favorable model overall, as we expected, because the mortality rates clearly exhibit heteroskedasticity, as shown in Section 2.
The BIC results for POET show that, despite the heavy penalty, adding correlation to the model is still worth the cost in many cases. If parsimony is the primary concern, APCA is perhaps better suited. Our results also suggest that the benefit of accounting for heteroskedasticity and/or correlation in the idiosyncratic errors outweighs the cost.
4 Multiple populations
The approximate factor model is well suited to modeling the mortality rates of multiple populations: because we are working with an even larger number of mortality rates, the additional data actually increase the estimation accuracy rather than create a problem. The estimation process also does not change with the number of populations or mortality rates included in the model. On top of that, the approximate factor model is equipped to deal with correlation and heterogeneity among multiple populations. We discuss the advantages of the approximate factor model below.
In the multi-population environment, the interrelation between populations is as important as the interrelation of age-specific mortality rates. In fact, the distinctive feature of a multi-population model is how it links the populations together. For example, the gravity model (see Jarner and Kryger (2011) and Cairns et al. (2011)) assumes a primary/secondary relationship between two populations: the mortality rate of the secondary population fluctuates around that of the primary population. Yang and Wang (2013) link mortality rates across different populations through the cointegration relationship of the common factor. Li and Lee (2005) propose a multi-population mortality model, also known as the augmented common factor model, with one global factor and one individual factor for each population. The global factor affects same-age mortality rates in all populations equally, and the individual factors are independent of each other. The global factor works as a cohesive bonding agent across independent populations. Li and Hardy (2011) compare four multi-population mortality models in terms of fitting performance and forecasting reasonableness. They conclude that the augmented common factor model is the most preferable.
Compared to the existing multi-population mortality models, our model does not superimpose any kind of structure on the populations. Rather, we treat the mortality rates from different populations equally, as random variables. The mortality rates from different populations are governed by the same (set of) factors, just with different loadings. This simple setting has several advantages. The first is that multi-population modeling is as easy as single-population modeling. The estimation procedure is exactly the same for any number of populations, which gives virtually unlimited expandability without additional effort. The second advantage is the transparent structure. The covariance structure among populations is straightforward and analytical, which is a considerable advantage for risk management purposes. The third advantage comes from the versatility of the factor model. It is possible to include any other variables we might be interested in, for example cause-specific mortality rates or socio-economic variables. The factor model allows us to highlight their effect on mortality improvement through joint estimation. Lastly, because our model is also a factor model, it can be woven into any existing form of factor model without conflict.
We now introduce the basic structure of the multi-population model. Let $m^{(i)}_{x,t}$ be the log mortality rate of age x at time t of population i, with x = 1, . . . , N, t = 1, . . . , T and i = 1, . . . , I. Let $M^{(i)}$ be the log mortality matrix of the i-th population. The size of $M^{(i)}$ is N × T. We are interested in
$$M = \begin{pmatrix} M^{(1)} \\ M^{(2)} \\ \vdots \\ M^{(I)} \end{pmatrix}.$$
Since we are interested in the factor model for multiple populations, we can extend the earlier notation to
$$m^{(i)}_{x,t} = a^{(i)}_x + b^{(i)}_x k^{(i)}_t + \varepsilon^{(i)}_{x,t}, \tag{24}$$
where the superscript (i) indicates the i-th population. The factor model can also be written in matrix form as in Equation (3), i.e.
$$Z = BK + \varepsilon, \tag{25}$$
where Z is M demeaned by µ, the collection of $a^{(i)}_x$, i.e., the average log mortality rates across populations, B is the factor loading matrix, K is the factor of the log mortality rates across populations, and ε is the idiosyncratic error matrix. The loading B takes the form
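$$B = \begin{pmatrix} b^{(1)}_{1,1} & \cdots & b^{(1)}_{1,J} \\ \vdots & & \vdots \\ b^{(I)}_{N,1} & \cdots & b^{(I)}_{N,J} \end{pmatrix},$$
where J is the number of factors. The factor K is a J × T matrix, as in the single-population case.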
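The point that estimation is unchanged in the multi-population case can be illustrated by stacking the population matrices and applying the same SVD step. The sketch below is a minimal illustration, not the estimation code of this study: the function name `multi_population_factors` is hypothetical, populations are stacked vertically, and the loadings are normalized to be orthonormal.

```python
import numpy as np

def multi_population_factors(log_rates, J=1):
    """Joint factor extraction for several populations by SVD.

    log_rates : list of (N_i, T) arrays of log mortality rates, one per
                population, all covering the same T calendar years.
    J         : number of common factors to keep.
    """
    M = np.vstack(log_rates)                 # stack populations: (sum N_i, T)
    mu = M.mean(axis=1, keepdims=True)       # average log rate of each series
    Z = M - mu                               # demeaned log mortality rates

    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    B = U[:, :J]                             # loadings, one row per series
    K = s[:J, None] * Vt[:J, :]              # common factors, (J, T)
    resid = Z - B @ K                        # idiosyncratic errors

    # Split the loading matrix back into one block per population.
    sizes = [m.shape[0] for m in log_rates]
    B_blocks = np.split(B, np.cumsum(sizes)[:-1], axis=0)
    return mu, B, K, resid, B_blocks
```

Adding a population only adds rows to the stacked matrix, and the factor K keeps its J × T shape.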
In essence, the model assumes that some global factors govern the movement of mortality rates across populations, with each mortality rate affected to a different degree. The loading B measures each mortality rate's sensitivity to the global factors. In many applications, it is natural to assert that populations share a common component. For example, Jarner and Kryger (2011) assume that a smaller population follows the movement of a larger population. The augmented common factor model likewise assumes that a common component exists across different populations.
The joint covariance under the approximate factor model has exactly the same structure as in the single-population case. As an example, consider two populations, $M^{(1)}$ and $M^{(2)}$, which include arbitrary numbers $N_1$ and $N_2$ of mortality rates, respectively. Let $\Sigma^{approx}$ be the joint covariance under the approximate factor model; then
$$\Sigma^{approx} = \Sigma^{approx}_c + \Sigma^{approx}_u,$$
where V is the idiosyncratic covariance.
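To make the two-population case concrete, the joint covariance can be written in block form. This is a sketch under assumed notation: $B_1$ ($N_1 \times J$) and $B_2$ ($N_2 \times J$) denote the loading blocks of the two populations, $\Sigma_K$ the covariance of the common factors, and $V_{11}$, $V_{12}$, $V_{22}$ the blocks of the idiosyncratic covariance V. These block symbols are introduced here for illustration and may differ from the notation of the original display.

```latex
\Sigma^{approx}
= \underbrace{\begin{pmatrix} B_1 \\ B_2 \end{pmatrix}
  \Sigma_K
  \begin{pmatrix} B_1^\top & B_2^\top \end{pmatrix}}_{\Sigma^{approx}_c}
+ \underbrace{\begin{pmatrix} V_{11} & V_{12} \\ V_{12}^\top & V_{22} \end{pmatrix}}_{\Sigma^{approx}_u}
= \begin{pmatrix}
    B_1 \Sigma_K B_1^\top + V_{11} & B_1 \Sigma_K B_2^\top + V_{12} \\
    B_2 \Sigma_K B_1^\top + V_{12}^\top & B_2 \Sigma_K B_2^\top + V_{22}
  \end{pmatrix}
```

Under this notation, the covariance between the two populations is the off-diagonal block $B_1 \Sigma_K B_2^\top + V_{12}$: a part driven by the shared factors plus any remaining idiosyncratic cross-population covariance.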
5 Multi-population performance
We evaluate the approximate factor model in the multi-population setting by comparing in-sample fitting and out-of-sample forecasting performance. Multi-population mortality modeling is perhaps best examined by forecasting performance, which is often considered the “ultimate test” of a model. We compare the approximate factor models to various models, including both multi-population models and single-population models such as Plat (2009) and O’Hare and Li (2012). We fit the models to the following five male populations: USA, UK, France, Spain, and Italy. Mortality data are available for all five populations simultaneously from 1933 to 2009. For the benchmark model, we discuss several choices in the next subsection.