A Study of Mortality Models, Life Settlements, and Optimal Insurance (死亡率模型、保單貼現與最適保險之研究) - 政大學術集成


National Chengchi University (國立政治大學), Department of Risk Management and Insurance
Doctoral Dissertation
Advisor: Dr. Chenghsien Tsai (蔡政憲)

Three Essays on Life Insurance
(死亡率模型、保單貼現與最適保險之研究)

Graduate student: 宮可倫
December 2016 (中華民國一〇五年十二月)

Contents

1 Introduction
2 Multi-population Mortality Modeling: When the Data is Too Much and Not Enough
  1 Introduction
  2 The approximate factor model
    2.1 The basic setup
    2.2 Motivation for approximate factor model
    2.3 The benchmark case for approximate factor model
    2.4 Asymptotic PCA
    2.5 Heteroskedasticity Factor Analysis
    2.6 Principal Orthogonal complEment Thresholding estimator
    2.7 Benchmark Example: USA
  3 Fitting performance for single population
  4 Multiple populations
  5 Multi-population performance
    5.1 Other models
    5.2 The difference to augmented common factor model
    5.3 Goodness-of-fit
    5.4 Forecast comparison
  6 Conclusion
3 Explaining the Yield Spread of Life Settlements
  1 Introduction
  2 Life settlement samples
  3 Calculating expected returns and risk premiums
  4 Possible sources of the spreads and variable specifications
  5 Empirical results
  6 Conclusion
4 Optimal Consumption and Investment Problem Incorporating the Life Insurance Decision: Continuous Time Case
  1 Introduction
  2 Market framework
    2.1 Financial assets
    2.2 Life insurance
    2.3 Labor income
    2.4 Household's decision
  3 Solution of optimal strategies
  4 Numerical example
    4.1 Wealth and consumption
    4.2 Optimal investment strategies
    4.3 The effect of loading and optimal insurance demand
  5 Conclusion

Chapter 1. Introduction

In this thesis, I present three essays on life insurance. The first essay introduces the asymptotic factor model for multi-population mortality modeling. The second essay is an empirical exploration of the yield determinants of life settlements. The third essay studies the optimal demand for life insurance in a life-cycle model.

In the first essay, I introduce a unifying mortality modeling approach that can handle any number of populations. The literature draws a clear distinction between the methodology for single-population models and that for multi-population models. Granted, mortality models are designed to capture one or several empirical properties of the mortality rate curve, or to capitalize on a relation between two populations, but catering to a specific need also makes them hard to generalize. With growing complexity in product design and risk management, model flexibility becomes an advantage. Data availability also supports this direction: researchers and practitioners now have more and finer mortality data, thanks to advances in and awareness of data science. Taking advantage of this abundance of data, I explore the possibility of applying advanced factor models to mortality modeling. The factor model provides a coherent framework that unifies the modeling of multiple populations. A basic example is the renowned Lee-Carter model, which can be seen as a one-factor model applied to a single population. This chapter is based on joint work with Richard MacMinn, Weiyu Kuo, and my advisor Chenghsien Tsai.

One of the challenges in estimating a factor model lies in the increasing number of populations. When we have too many populations and not enough observations, large covariation among population mortality rates can break the model, in the sense that the estimated factor does not converge to the true factor.
In fact, the factor model assumes that the number of observations is greater than the number of mortality rates, which is often not satisfied in practical applications. The asymptotic factor model capitalizes on the abundance of data to ensure convergence. Chamberlain and Rothschild (1983) provided the fundamental proof. The basic idea is to rotate the data in estimation so that the factor estimator does not violate the assumption. The rotated estimator is shown to converge to the true factor, provided that the covariation among mortality rates is small. I introduce heterogeneous models, which can be heterogeneous in either the cross-sectional or the time-series dimension. Along the way, I demonstrate that the Lee-Carter model estimated with Singular Value Decomposition is a special case of the asymptotic factor model with homogeneous variance and zero correlation. Finally, I consider a model with cross-sectional correlation. The empirical application shows that the asymptotic factor models are indeed more accurate in fitting and forecasting.

In the second essay, I investigate the determinants of the yield spread of life settlements. A life settlement creates a secondary market for a life insurance policy: the policy is sold to a third party instead of being surrendered to the insurance company. The selling price is usually greater than the cash value. Life settlements provide a way to extract wealth from assets that were considered illiquid, and are therefore often promoted as a tool to finance retirement for underprepared retirees. To investors, life settlements appear to be a rare asset class that is not correlated with financial risk, and thus provide a cushion in times of financial turmoil. This chapter is based on joint work with Ming-Hua Hsieh, Jing-Lung Peng, Jennifer L. Wang, and my advisor Chenghsien Tsai.

The life settlement is a relatively new product in the insurance market. It only came into existence in the 2000s.
The behavior of life settlements remains relatively unknown to researchers, as there is little data to analyze. I obtain a unique dataset of life settlement transactions from a private life settlement provider. The data include information on the policy itself, such as policy year and carrier, as well as information on the insured, such as age, gender, and estimated life expectancy. I consider determinants of mortality risk, such as the age and gender of the insured, because mortality is the main source of risk embedded in the policy. I also consider determinants from the empirical asset pricing literature, because market conditions and other variables, such as the credit rating of the insurer, should affect the cost of purchase. I estimate the expected rate spread of a life settlement under various assumptions.

I calculate the rate spread under both certain and uncertain death times, with and without mortality improvements. The average return of a life settlement is about 13% if we assume the death time is certain. Considering an uncertain death time increases the average return to 27%. I find that the expected spreads are determined by the factors affecting the surrender tendencies of the underlying policies and by the proprietary information on the insured's health condition.

In the third essay, I explore life insurance demand in a continuous-time life-cycle model. Life insurance demand has been an important topic in insurance economics since Yaari (1965). I consider a household that has a stochastic labor income and faces a financial market with a stochastic interest rate. The financial market setup follows Munk and Sorensen (2010). The household has a breadwinner with a random lifetime and a certain retirement date, in line with Huang, Milevsky and Wang (2008). The breadwinner earns a stochastic labor income that supports the whole family. Because the breadwinner's lifetime is random, the household may lose the labor income, which provides the incentive to purchase life insurance. Life insurance protects the labor income against mortality risk: if the breadwinner dies, the insurance benefit goes to the remaining family members and continues to provide income for the household. This chapter is based on joint work with Shang-Yin Yang.

I solve for the optimal consumption, investment, and insurance strategies of the household. The solution comes from solving two PDEs iteratively, where the Bellman equation is decomposed into two parts. In the first part, the breadwinner survives the next infinitesimal period, and the stochastic income, insurance decision, and retirement continue to influence future optimality. In the second part, the breadwinner dies, and the rest of the family optimizes consumption and investment with the death benefit and financial income afterward. The solution suggests that when insurance is fair, the optimal coverage equals the human capital.

I explore the effect of the loading on the insurance premium on the household's consumption and investment, as well as on the insurance decision. In the real world, a life insurance policy is sold with an additional loading that covers the insurer's profit and expenses. Consequently, partial coverage of the human capital becomes the optimal choice for the household when insurance is not fair. The unhedged background risk shifts the optimal strategy in a different direction. In other words, the insurance decision is not independent of the financial decision, and therefore affects how people invest and consume. I show that by introducing the loading into the model, it is possible to produce a hump-shaped consumption profile. I also quantify the influence of the loading on the household's financial and insurance decisions. My results imply that the insurer's pricing decision, surprisingly, has the power to affect the financial market and investors' financial decisions.

Chapter 2. Multi-population Mortality Modeling: When the Data is Too Much and Not Enough

1 Introduction

We introduce a factor model for mortality rates that incorporates conditional time-series heteroskedasticity, conditional cross-sectional heteroskedasticity, and conditional correlation, and that allows for an infinite number of mortality rates.[1][2]

Factor models have a long history in mortality rate modeling. More than two decades ago, Bell and Monsell (1991) and Lee and Carter (1992) applied the factor model to mortality rates with great success: various studies have shown that the one-factor model, also known as the Lee-Carter model, is empirically robust across countries, genders, and age spans.[3] Since Lee and Carter (1992), many researchers have improved the fitting and forecasting of the Lee-Carter model (see Booth and Tickle (2008)).

One of the challenges in mortality modeling lies in capturing the complicated relations among a large number of mortality rates and populations. Scholars and practitioners seek a comprehensive mortality model for pricing and managing mortality risk in life insurance and pensions, where the entire lifespan is under consideration. On top of that, multi-population modeling is concerned with the relations among double or even triple the number of mortality rates. The factor model is a suitable candidate in this situation: it has the flexibility to model a large number of mortality rates and a proven track record in single-population mortality modeling.

The large number of variables poses a challenge not only to modeling but also to estimation. An important assumption in factor model estimation is that the sample size $T$ (i.e., the length of the observed mortality series) should be much greater than the fixed number of variables $N$ (i.e., the number of mortality rates); otherwise the model is ill-posed, in the sense that the estimated factor will not converge to the true factor (Bai, 2003). This issue is less explored in the mortality modeling literature: although research based on the Lee-Carter model is extensive, the assumption often seems to be ignored. If it is not satisfied, the mortality trend estimate is spurious and many previous results are compromised.

When the assumption is violated, the factor scores, which in our case are the mortality trend in the Lee-Carter model, cannot be consistently estimated. In the original article of Lee and Carter (1992) and in many subsequent papers, the application was limited to mortality rates in 5-year age groups, which meets the assumption but greatly limits the scope of the model. From a practical point of view, the limitation comes from the fact that the observed mortality series are too short. We have no more than 100 years of observations for most countries, and some developing countries may have fewer than 50 years of data. Moreover, some early observations are less representative because of socio-economic changes in the population, and even so we may still have to use them because of limited data availability. Combined with the multi-population application, we need to estimate a large model without much data at hand.

[1] This chapter is based on joint work with Richard MacMinn, Weiyu Kuo, and my advisor Chenghsien Tsai.
[2] By conditional heteroskedasticity we mean that the idiosyncratic errors have unequal variances. We sometimes write heteroskedasticity in place of conditional heteroskedasticity for simplicity.
[3] See, e.g., Table 1 in Girosi and King (2007), Booth and Tickle (2008), and the references therein.
We introduce a general methodology for multi-population mortality rate modeling. In particular, the methodology complements the Lee-Carter model and its variants in both theory and estimation. Chamberlain and Rothschild (1983) first proposed a general factor model that relaxes two important assumptions of the classical factor model: (1) the error terms are iid over both the time-series and cross-sectional dimensions, and (2) the number of variables $N$ is fixed while $T$ goes to infinity. They coined the term approximate factor model for the general framework that allows large $N$, weakly correlated errors, and heteroskedasticity. Coincidentally, the Lee-Carter model remains viable for consistently estimating the mortality trend factor under the approximate factor model: the principal component analysis (PCA) used in the original Lee-Carter model and its variants to extract factors coincides with the asymptotic estimate of the homoskedastic factor model in the $N > T$ situation.[4]

Early mortality models also assume that mortality rates are iid Gaussian in both the time-series and cross-sectional dimensions, but the heterogeneity in mortality rates is evident. Brouhns et al. (2002) were among the first to propose an unconditionally heteroskedastic model for mortality. They assume that the number of deaths is independent Poisson for each age $x$ and time $t$. Empirically, we also observe unconditional heteroskedasticity within single-population mortality rates, for example in Figure 1 of Hyndman and Ullah (2007). In the multi-population application, heterogeneity among populations is very likely to induce cross-sectional heteroskedasticity. In the time dimension, Gao and Hu (2009) found that the factors in mortality rates are heteroskedastic over time, and that incorporating conditional heteroskedasticity in the time dimension improves the forecast of mortality rates.

When heteroskedasticity and correlation are present, however, the SVD estimator no longer provides an efficient estimate. We apply the work of Connor and Korajczyk (1988) and Jones (2001) to deal with conditional heteroskedasticity in the cross-sectional and time-series dimensions, respectively. The central idea of estimating a factor model when $N$ tends to infinity is that the outer product of the sample data, a $T \times T$ matrix, converges to the population $T \times T$ covariance matrix, which allows us to extract the factor from the outer product matrix. Depending on the assumption of cross-sectional or time heteroskedasticity, the $T \times T$ asymptotic covariance is either a scalar matrix or a diagonal matrix. Connor and Korajczyk (1986) and Connor and Korajczyk (1988) propose the Asymptotic PCA (APCA), which extracts the factor from the outer product of the sample data under the assumption of cross-sectional heteroskedasticity. Jones (2001) provides an iterative method to estimate the factor under time heteroskedasticity.

We also discuss the role of dependence in the mortality rate model. The dependence of mortality rates describes how mortality rates of different ages move with respect to each other, which is important in the multi-population mortality literature. Li and Hardy (2011) perform a series of comparisons of multi-population mortality models, including the independent model, the cointegration model, the joint time-varying index model, and the augmented common factor model of Li and Lee (2005). They conclude that Li and Lee (2005) is the preferred model in terms of goodness-of-fit and forecasting, using Canadian and US female mortality data. Lin et al. (2013), Yang and Wang (2013), and Chen et al. (2014) show that the dependence between mortality rates of different populations greatly influences the pricing of longevity products. Lin et al. (2013) capture the mortality rate correlation with a multivariate jump-diffusion model. Yang and Wang (2013) propose a cointegration model between the mortality improvements of two populations, and Chen et al. (2014) apply the factor copula to the pricing of the Kortis longevity bond. The shared challenge in estimating the correlation matrix is that the number of parameters to be estimated is large and increases quadratically with $N$, as also discussed in Chen et al. (2014).

We introduce an estimator under the approximate factor model framework, the Principal Orthogonal complEment Thresholding (POET) estimator proposed by Fan et al. (2013), to estimate the correlation in large-$N$ mortality models. POET approximates the true covariance matrix with a common component plus an idiosyncratic covariance matrix. The common component reflects the factor structure in the mortality rates and the sensitivity to factor shocks, which we argue some of the prominent multi-population models do not possess. We approximate the idiosyncratic covariance with threshold estimation, which greatly reduces the effective number of parameters. Our empirical evaluation suggests that the correlation greatly improves the model fit in terms of BIC and log likelihood.

Our paper contributes to the existing literature in several ways. First, we introduce a generalized factor model that allows us to deal with a large number of mortality rates. Traditional factor analysis cannot handle the large-$N$ case. One has to adjust the data to meet the $N < T$ assumption, e.g., by using grouped-age mortality rates, and information may be lost in the process. Second, the approximate factor model estimation technique is in fact backward compatible with the existing estimation technique, and therefore the approximate factor model can be viewed as formalizing the statistical theory behind the Lee-Carter model. Third, we evaluate the approximate factor model through a comprehensive empirical comparison. The comparison is meant to evaluate the finite-sample performance of the approximate factor model in mortality rate modeling in a diversified environment. We compare the goodness-of-fit for 45 populations from all available countries in the Human Mortality Database. Our conclusion is that heteroskedasticity and correlation greatly enhance the fit in terms of BIC and log likelihood. Fourth, we consider the multi-population case, where we have 5 populations and a total of 505 variables with 78 observations. We show that heterogeneity leads to heteroskedasticity and that the approximate factor model captures this heteroskedasticity, as well as the correlation. Furthermore, we show by example that the augmented common factor model does not adequately capture the individual population's correlation with the mortality trend, potentially because of the attempt to circumvent the $N < T$ requirement. This result supports our first point, as in this case valuable information is indeed lost because of the limitation of classical factor analysis. We show that the goodness-of-fit and forecasting of the augmented common factor model can be greatly improved by simply re-estimating the common factor under the approximate factor model framework.

The paper is organized as follows. Section 2 describes the approximate factor model and its estimation. At the end of Section 2 we provide an example using data from the USA population. Section 3 compares the fitting performance of the approximate factor model using 45 populations from the Human Mortality Database. Section 4 describes the approximate factor model under the multiple-population setting. Section 5 evaluates the fitting and forecasting performance with a sample of 5 populations. Section 6 concludes.

[4] Chamberlain and Rothschild (1983) and Connor and Korajczyk (1986) show that the spectral decomposition of the $T \times T$ outer product of the data gives consistent estimates of the factor, which is equivalent to the Singular Value Decomposition (SVD) used in Lee and Carter (1992) and other extensions.

2 The approximate factor model

First we introduce the notation and a simple factor model for mortality. Then we discuss the approximate factor models under consideration. In particular, we focus on how they extend the traditional (strict) factor model for mortality, such as Lee and Carter (1992), and how they affect the covariance of log mortality rates.

2.1 The basic setup

Let $m_{xt}$ be the log mortality rate of age $x$ at time $t$, where $x = 1, \ldots, N$ and $t = 1, \ldots, T$. Let $M$ be the $N \times T$ matrix of log mortality rates. We can write down a one-factor model of log mortality, also known as the Lee-Carter model:

$$ m_{xt} = a_x + b_x k_t + \varepsilon_{xt}, \tag{1} $$

where $a_x$ is the average log mortality rate of age $x$, $k_t$ is the factor that describes the mortality trend at time $t$, $b_x$ describes the sensitivity of age-$x$ mortality to $k_t$ and is called the factor loading, and $\varepsilon_{xt}$ is the idiosyncratic error of age-$x$ mortality.
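As a rough illustration of Equation (1), the sketch below simulates a small log-mortality matrix from a one-factor model and recovers $a_x$, $b_x$ and $k_t$ with a singular value decomposition, in the spirit of Lee and Carter (1992). The simulated parameter values, the random seed, and the function name `fit_lee_carter` are assumptions made for this example; the normalization $\sum_x b_x = 1$, $\sum_t k_t = 0$ is the customary Lee-Carter convention rather than something imposed in this section.

```python
import numpy as np

def fit_lee_carter(M):
    """Fit m_xt = a_x + b_x k_t + eps_xt by SVD (rows = ages, columns = years)."""
    a = M.mean(axis=1)                      # a_x: average log rate of each age
    Z = M - a[:, None]                      # centred matrix, each row sums to zero
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scale = U[:, 0].sum()                   # rescale so that sum_x b_x = 1
    b = U[:, 0] / scale
    k = s[0] * Vt[0] * scale                # outer(b, k) equals the rank-one SVD term
    return a, b, k

# Hypothetical data: 20 ages, 40 years, parameter values chosen for illustration.
rng = np.random.default_rng(0)
N, T = 20, 40
a_true = np.linspace(-6.0, -2.0, N)
b_true = np.full(N, 1.0 / N)
k_true = np.cumsum(rng.normal(-1.0, 0.5, T))
k_true -= k_true.mean()
M = a_true[:, None] + np.outer(b_true, k_true) + rng.normal(0.0, 0.02, (N, T))

a_hat, b_hat, k_hat = fit_lee_carter(M)
corr = np.corrcoef(k_hat, k_true)[0, 1]     # agreement between estimated and true trend
```

Because $b$ and $k$ are rescaled in opposite directions, the product $b_x k_t$ does not depend on the sign that the SVD routine happens to return.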
We can rewrite the model in vector form

$$ M_t = A_x + B_x K_t + \varepsilon_t, \tag{2} $$

where $M_t = (m_{1t}, \ldots, m_{Nt})^\top$, $A_x$ is the average log mortality vector, $B_x$ and $K_t$ are the matrix counterparts of $b_x$ and $k_t$ respectively, and $\varepsilon_t$ is the error vector. The factor $K_t = (k_{1t}, k_{2t}, \ldots, k_{mt})^\top$ is assumed to be an $m$-dimensional stochastic process, and the loading matrix $B_x = [b_{x1}, \ldots, b_{xm}]$ is constant over time for each age $x$. The error term $\varepsilon_{xt}$ is white noise with mean zero and covariance $\Sigma_u$, and is assumed to be uncorrelated with the factor $K_t$, that is, $\mathrm{cov}(k_{it}, \varepsilon_{xs}) = 0$ for all $x, i, t, s$. We sometimes write the model in compact form as

$$ M = \mu + \tilde{B}\tilde{K} + \varepsilon, \tag{3} $$

where $\mu = A_x \otimes e_T$ and $e_T = (1, \ldots, 1)$ is a vector of ones of size $1 \times T$. The factor matrix $\tilde{K}$ is just the collection of the $K_t$, that is, $\tilde{K} = [K_1, \cdots, K_T]$. Likewise, $\varepsilon = [\varepsilon_1, \cdots, \varepsilon_T]$.

Throughout the paper, we assume the following: (1) $\{\varepsilon_t\}_{0 \le t \le T}$ is stationary and ergodic with mean zero and covariance $\Sigma_u$; (2) the eigenvalues of $\Sigma_u$ are bounded; (3) $\{K_t\}_{0 \le t \le T}$ and $\{\varepsilon_t\}_{0 \le t \le T}$ are independent stochastic processes; (4) $G = \mathrm{plim}_{N \to \infty} \frac{1}{N} \tilde{B}^\top \tilde{B}$ exists and has full rank. This set of assumptions is in line with Connor and Korajczyk (1986) and Jones (2001), who focus on the case of growing $N$ and fixed $T$, and is therefore stronger than that of Bai (2003) and Fan et al. (2013), where both $N$ and $T$ grow to infinity.

There are two reasons for assuming that $T$ is fixed. First, we are limited to mortality rates at yearly frequency, so $T$ grows very slowly and asymptotic properties in $T$ are unlikely to be useful. The $T \to \infty$ assumption only makes sense if mortality rate data were available at monthly or even shorter frequency. Second, if we assume $T$ can go to infinity, then $\{K_t\}_{t \ge 0}$ must be stationary to maintain convergence, yet we know empirically that at least one factor (the mortality trend factor) exhibits non-stationary behavior. We can circumvent this problem by differencing the mortality rates.
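The differencing step can be made concrete. Taking first differences of Equation (1) over time gives $m_{xt} - m_{x,t-1} = b_x (k_t - k_{t-1}) + (\varepsilon_{xt} - \varepsilon_{x,t-1})$, so the age effect $a_x$ drops out and a random-walk trend turns into a stationary increment. The sketch below checks this identity on simulated data; the dimensions, the drift, and all parameter values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 5, 60                                  # hypothetical: 5 ages, 60 years
a = np.linspace(-6.0, -3.0, N)                # age effects a_x
b = np.full(N, 1.0 / N)                       # factor loadings b_x
k = np.cumsum(rng.normal(-1.0, 0.5, T))       # random walk with drift: non-stationary
eps = rng.normal(0.0, 0.02, (N, T))           # idiosyncratic errors

M = a[:, None] + np.outer(b, k) + eps         # log mortality, Equation (1)

dM = np.diff(M, axis=1)                       # m_xt - m_x,t-1, size N x (T - 1)
rhs = np.outer(b, np.diff(k)) + np.diff(eps, axis=1)
identity_holds = np.allclose(dM, rhs)         # a_x cancels in the difference
```

One caveat of this sketch: the differenced errors are no longer white noise, since consecutive differences share one error term.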
Differencing may even benefit forecasting performance: Mitchell et al. (2013) show that modeling the change in log mortality rates outperforms various more sophisticated models.

The time-$t$ mortality curve $M_t$ has mean $A_x = (a_1, a_2, \ldots, a_N)^\top$ and covariance

$$ \Sigma = \Sigma_c + \Sigma_u = \tilde{B}\,\mathrm{cov}(K_t)\,\tilde{B}^\top + \mathrm{cov}(\varepsilon_t). \tag{4} $$

Equation (4) shows that two sources contribute to the covariance of mortality rates. The stochastic common factor $k_t$ contributes the first part, $\Sigma_c$. In this model, the mortality trend is captured by these random variables, which can be viewed as the systematic factors that explain the contemporaneous movement of mortality rates. The second part, $\Sigma_u$, is the covariance of the idiosyncratic errors, which represents variation specific to each variable, or can be seen as coming from the higher-order components.

2.2 Motivation for approximate factor model

Many applications of mortality rate models cover a rather large range of ages, even across multiple populations, e.g., retirement planning and the pricing of longevity products. A factor model like the Lee-Carter model can in theory easily deal with hundreds or even thousands of mortality rates, but we need many observations to estimate it.

The aggregate mortality data available today mostly exist at annual frequency and go back to the beginning of the 20th century. The observable period is considerably shorter in developing countries and at the firm level. When we apply a (strict) factor model to mortality modeling, the estimation process assumes that the number of variables $N$ is fixed and the number of observations $T$ tends to infinity, to ensure consistent estimates of the factor and the factor loading. When the observation period is shorter than the age span, the assumption becomes restrictive. On the other hand, today we have more mortality data than at any other time in history. We used to have mortality data at the country level, but now we have even more data at the firm level, waiting to be analyzed. With the advances in and awareness of data collection, the possibilities are endless. We might need a mortality model for all insurance contracts at the firm level in the near future.

When the observations cannot support the estimation, a quick fix is to reduce the number of variables to comply with the $N < T$ assumption. One can cover the same age span by combining the mortality rates in groups.[5] When we average out the mortality rates, valuable information is lost.
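To see what grouping does to the data, the sketch below collapses single-age death counts and exposures for ages 60 to 79 into four 5-year groups. The counts and exposures are made-up numbers, and computing a grouped rate as total deaths over total exposure is an assumption of this example. The number of series falls from 20 to 4, but ages inside a group can no longer be told apart.

```python
import numpy as np

ages = np.arange(60, 80)                                   # single ages 60..79
exposure = np.full(ages.size, 1000.0)                      # hypothetical person-years
deaths = np.round(exposure * 0.01 * 1.1 ** (ages - 60))    # hypothetical death counts

single_rates = deaths / exposure                           # 20 age-specific rates

# Collapse into the 5-year groups 60-64, 65-69, 70-74, 75-79.
grouped_deaths = deaths.reshape(4, 5).sum(axis=1)
grouped_exposure = exposure.reshape(4, 5).sum(axis=1)
grouped_rates = grouped_deaths / grouped_exposure          # 4 grouped rates

# Spread of the single-age rates that each grouped rate averages over.
by_group = single_rates.reshape(4, 5)
within_group_range = by_group.max(axis=1) - by_group.min(axis=1)
```

A non-zero `within_group_range` is the within-group variation that a grouped rate discards.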
One would then have to interpolate the mortality rate curve, which causes additional error that can be avoided with the approximate factor model. The slow accumulation of observations, together with the need to include more mortality rates, is what motivates the approximate factor model.

The approximate factor model also relaxes other assumptions of classical factor analysis, including homoskedasticity and zero correlation in the idiosyncratic errors. Mortality rates are known to be heteroskedastic in both the cross-sectional and time-series dimensions. Gao and Hu (2009) and Lin et al. (2013) both found heteroskedasticity in the time dimension helpful in forecasting mortality and pricing longevity products. To extend their work, we consider time heteroskedasticity in the idiosyncratic error. A change in population structure, if not explained by the factor, may cause the average idiosyncratic variance to vary over time.

Moreover, many papers focus on the dependence between multiple populations, for example Lin et al. (2013), Yang and Wang (2013), and Chen et al. (2014). Lin et al. (2013) consider common and idiosyncratic jump components and aim at the mortality indices. Yang and Wang (2013) link the mortality rates by modeling the factor with cointegration, and Chen et al. (2014) build on the factor copula. The approximate factor model can complement these works in two ways. First, it can take the place of the classical factor model in the small-$T$ environment, which allows us to estimate the factor model adequately. Second, it allows us to estimate the dependence in the idiosyncratic errors, which to our knowledge has not yet been discussed in the mortality literature.

[5] For example, one can cover ages 60 to 79 using four 5-year grouped mortality rates for ages 60-64, 65-69, 70-74, and 75-79, assuming the death count and exposure are known for each age.

2.3 The benchmark case for approximate factor model

We introduce the basic approximate factor model. The simplest setup is under the assumptions of what Bai (2003) calls asymptotic orthogonality,

$$ \frac{1}{N} \sum_{x=1}^{N} \varepsilon_{xt} \varepsilon_{xs} \to 0, \quad t \neq s, \tag{5} $$

and asymptotic homoskedasticity,

$$ \frac{1}{N} \sum_{x=1}^{N} \varepsilon_{xt}^2 \to \sigma^2, \quad \text{for all } t. \tag{6} $$

Using the conditions above, Connor and Korajczyk (1986) provide a consistent estimator of the factor, namely the asymptotic PCA (APCA). The APCA is simply the principal component of the $T \times T$ outer product of the data. They show that it approximates the true factors with any precision as $N$ grows large.

To estimate the model we need to obtain the factor loading $B$, the factor $K$, and the idiosyncratic covariance $\Sigma_u$ of the mortality rates. Let $Z = M - \mu$. Consider the $T \times T$ cross-product matrix $\hat{\Omega}_T$ of $Z$, i.e.,

$$ \hat{\Omega}_T = \frac{1}{N} Z^\top Z. \tag{7} $$
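A minimal numerical sketch of Equation (7) follows. It simulates a one-factor panel with many more ages than years, forms the $T \times T$ matrix $\hat{\Omega}_T = Z^\top Z / N$, and takes the eigenvector with the largest eigenvalue as the factor estimate. The panel size, the homoskedastic Gaussian errors, and the use of row means to estimate $\mu$ are assumptions of the example, not choices documented in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 2000, 30                                # hypothetical: N much larger than T
b = rng.normal(1.0, 0.3, N)                    # loadings
k = np.cumsum(rng.normal(-0.5, 1.0, T))        # common mortality trend
a = rng.normal(-4.0, 1.0, N)                   # age-specific levels
M = a[:, None] + np.outer(b, k) + rng.normal(0.0, 0.5, (N, T))

Z = M - M.mean(axis=1, keepdims=True)          # Z = M - mu, mu estimated by row means
omega = Z.T @ Z / N                            # T x T cross-product matrix, Equation (7)

eigval, eigvec = np.linalg.eigh(omega)         # eigh returns eigenvalues in ascending order
k_hat = eigvec[:, -1]                          # eigenvector of the largest eigenvalue

# The factor is identified only up to sign and scale, so compare by correlation.
corr = abs(np.corrcoef(k_hat, k - k.mean())[0, 1])
```

The eigen-decomposition acts on a $30 \times 30$ matrix regardless of how large $N$ is, which is what makes the large-$N$, small-$T$ case tractable in this sketch.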

Substituting in Equation (3), we have

$$\hat{\Omega}_T = \frac{1}{N}\tilde{K}^{\top}\tilde{B}^{\top}\tilde{B}\tilde{K} + \frac{1}{N}\left(\varepsilon^{\top}\tilde{B}\tilde{K} + \tilde{K}^{\top}\tilde{B}^{\top}\varepsilon\right) + \frac{1}{N}\varepsilon^{\top}\varepsilon. \tag{8}$$

It can be shown that $\hat{\Omega}_T$ converges to $\tilde{K}^{\top} G \tilde{K} + \bar{\sigma}^{2} I_{T \times T}$ in probability as $N$ goes to infinity, where $I_{T \times T}$ is an identity matrix of size $T \times T$ (the same size as $\hat{\Omega}_T$). It is easy to see that the interaction term $(1/N)(\varepsilon^{\top}\tilde{B}\tilde{K} + \tilde{K}^{\top}\tilde{B}^{\top}\varepsilon)$ has a probability limit of zero when the factor and the idiosyncratic error are uncorrelated. The first term converges to $\tilde{K}^{\top} G \tilde{K}$ assuming that the factors are pervasive.⁶ We can redefine $K = G^{1/2}\tilde{K}$ to simplify the notation, so that the first term now converges to $K^{\top}K$. Note that this does not affect the factor estimate, because the factor and the factor loading are indeterminate up to a rotation. That is, we can write $B = \tilde{B}G^{-1/2}$ so that $BK = \tilde{B}\tilde{K}$. Finally, the cross product $\frac{1}{N}\varepsilon^{\top}\varepsilon$ converges to $\sigma^{2} I_{T \times T}$ in probability, where $\sigma^{2} = \frac{1}{N}\sum_{n=1}^{N}\sigma_n^{2} < \infty$ is the idiosyncratic variance of the cross-sectional variables.

We extract the factor $K$ from $\hat{\Omega}_T$ by finding its eigenvectors and find the loading $B$ by the least squares method. Here, the issue of indeterminacy arises when we attempt to estimate the factor and the factor loading. Because both of them are unobservable, they are not uniquely determined, as $BK = BXX^{-1}K$ for an arbitrary invertible matrix $X$. We need some constraints to uniquely determine (up to a sign change) the factor loading and the factor. The choice of constraint does not affect the fitting performance; rather, it concerns the interpretation of the factor. To uniquely determine $K$ we impose the usual identification constraints: $KK^{\top}/T$ is an identity matrix and $B^{\top}B$ is a diagonal matrix. We estimate $K$ by $\hat{K}$, the first column of the eigenvector matrix of $\hat{\Omega}_T$. With the constraint, the factor loading $B$ is estimated by $\hat{B}$,

$$\hat{B} = T^{-1} Z \hat{K}^{\top}. \tag{9}$$
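The estimator in Equations (7) and (9) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in this study: the function name `apca`, the synthetic one-factor data, and the choice to scale the eigenvectors by $\sqrt{T}$ (so that $KK^{\top}/T = I$ holds exactly) are assumptions made for the example.

```python
import numpy as np

def apca(Z, J=1):
    """Benchmark APCA sketch. Z is an N x T matrix of demeaned log mortality rates."""
    N, T = Z.shape
    omega = Z.T @ Z / N                        # T x T cross product, Eq. (7)
    eigval, eigvec = np.linalg.eigh(omega)     # eigh returns ascending eigenvalues
    top = np.argsort(eigval)[::-1][:J]         # indices of the J largest eigenvalues
    K = np.sqrt(T) * eigvec[:, top].T          # J x T, scaled so that K K^T / T = I
    B = Z @ K.T / T                            # N x J loadings, Eq. (9)
    return B, K

# Synthetic illustration only (not mortality data): one factor plus noise.
rng = np.random.default_rng(0)
N, T = 101, 40
b_true = rng.normal(size=(N, 1))
k_true = np.cumsum(rng.normal(size=(1, T)), axis=1)
M = b_true @ k_true + 0.1 * rng.normal(size=(N, T))
Z = M - M.mean(axis=1, keepdims=True)          # Z = M - mu, demeaned by age

B_hat, K_hat = apca(Z, J=1)
```

With this scaling the identification constraints hold by construction, and the fitted common component `B_hat @ K_hat` equals the rank-one truncation of the singular value decomposition of `Z`.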
⁶ See Assumption 6 in Connor and Korajczyk (1986).

An interesting observation is that the Lee-Carter model can be seen as a special case of the approximate factor model. If we assume asymptotic homoscedasticity and orthogonality, the factor estimate from the spectral decomposition coincides (up to a rotation) with the SVD estimate in Lee and Carter (1992). The intuitive explanation is that the approximate factor model aims to estimate the factor consistently when $N$ is larger than $T$, and the way to do this is to work with the transposed outer product $\Omega_T$ so that we can use the larger $N$ dimension to estimate the true factor. This estimation process matches the mechanics of SVD; therefore the factor estimates are the same. We show in the following that they are equivalent. SVD decomposes $Z$ into two orthogonal matrices $U$ and $V$ and a diagonal matrix $S$, that is,

$$Z = U S V^{\top}. \tag{10}$$

Let $S_1 = \max(S_{ii})$ for $i = 1, \ldots, N$. The first factor $K_t$ is estimated by $S_1$ times the first column of $V$, and $b_x$ is the first column of $U$ divided by $S_1$, and so on. Once $B_x$ and $K_t$ are obtained, we can estimate the idiosyncratic covariance $\Sigma_u$ from $\varepsilon$. The estimated covariance matrix $\Sigma_u^{LC}$ in the Lee-Carter model is

$$\Sigma_u^{LC} = \sigma_{LC}^{2} I_{N \times N}, \tag{11}$$

where $\sigma_{LC}^{2} = \sum_{x,t} \varepsilon_{x,t}^{2} / (NT - 1)$. We normalize $b_x$ and $k_t$ with the usual identification constraints, $\sum_x b_x = 1$ and $\sum_t k_t = 0$. On the other hand, the factors in the approximate factor model are the eigenvectors of $\hat{\Omega}_T$. The eigenvectors and eigenvalues of $\hat{\Omega}_T$ can be obtained from

$$Z^{\top} Z = V S^{\top} S V^{\top} = V \Lambda V^{\top}. \tag{12}$$

Thus $Z^{\top} Z V = V \Lambda$, so $V$ is the eigenvector matrix of $Z^{\top}Z$ and $\Lambda$ is a diagonal matrix with the eigenvalues on the diagonal.

2.4 Asymptotic PCA

We now turn to a model with cross-sectional heteroskedasticity. Connor and Korajczyk (1988) proposed an algorithm to deal with heteroskedasticity in the age dimension. We run a GLS regression on $Z = M - \mu$, using the residual variance of each age-specific mortality rate, $\hat{\sigma}_n^{2}$, $n = 1, \ldots, N$, as weights. This is done in a two-stage estimation. First, we estimate the variance of the idiosyncratic error of each age, $\hat{\sigma}_1^{2}, \ldots, \hat{\sigma}_N^{2}$. We run PCA on $Z^{\top}Z$ to obtain the factor estimate from the principal components, then regress $M - \mu$ on the factor estimate to obtain the idiosyncratic error for age $n$ and its sample variance $\hat{\sigma}_n^{2}$. Let $\hat{V} = \operatorname{diag}(\hat{\sigma}_1^{2}, \ldots, \hat{\sigma}_N^{2})$. In the second stage, we standardize the log mortality rate $M$ with mean $\mu$ and standard deviation $\hat{V}^{1/2}$, and use the principal component of the normalized $Z^{*} = \hat{V}^{-1/2}(M - \mu)$ to re-estimate $K$. The factor loading estimate $\hat{B}$ is obtained by regressing $Z^{*}$ on $\hat{K}$. The cross product $\hat{\Omega}_T = (1/N) Z^{*\top} Z^{*}$ then satisfies

$$\hat{\Omega}_T \longrightarrow K^{\top} K + \sigma^{2} I_{T \times T}. \tag{13}$$
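The two-stage procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `principal_factors` and `apca_two_stage` are hypothetical, the residual variance is computed as a plain mean of squared residuals, and the data are synthetic with age-specific noise levels.

```python
import numpy as np

def principal_factors(Z, J):
    """Top-J factors from the T x T cross product of an N x T matrix Z."""
    N, T = Z.shape
    eigval, eigvec = np.linalg.eigh(Z.T @ Z / N)
    top = np.argsort(eigval)[::-1][:J]
    return np.sqrt(T) * eigvec[:, top].T       # J x T, normalized so K K^T / T = I

def apca_two_stage(Z, J=1):
    """Two-stage APCA sketch for cross-sectional (age) heteroskedasticity."""
    N, T = Z.shape
    # Stage 1: preliminary factors and age-specific residual variances.
    K0 = principal_factors(Z, J)
    B0 = Z @ K0.T / T                          # least squares of Z on K0
    resid = Z - B0 @ K0
    sigma2 = (resid ** 2).mean(axis=1)         # sigma_hat_n^2 for each age n
    # Stage 2: rescale each age by its residual s.d. and re-estimate.
    Z_star = Z / np.sqrt(sigma2)[:, None]      # V^{-1/2} (M - mu)
    K = principal_factors(Z_star, J)
    B_star = Z_star @ K.T / T                  # loadings of the scaled data
    return B_star, K, sigma2

# Synthetic illustration only: one factor, noise s.d. differs by age.
rng = np.random.default_rng(1)
N, T = 80, 50
k_true = np.cumsum(rng.normal(size=T))
b_true = 1.0 + 0.3 * rng.normal(size=N)
sd_age = rng.uniform(0.05, 1.0, size=N)
M = np.outer(b_true, k_true) + sd_age[:, None] * rng.normal(size=(N, T))
Z = M - M.mean(axis=1, keepdims=True)

B_star, K_hat, sigma2_hat = apca_two_stage(Z, J=1)
```

Loadings on the original scale can be recovered by multiplying each row of `B_star` by the corresponding `np.sqrt(sigma2_hat)`.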

The estimated covariance $\Sigma^{APCA}$ is

$$\Sigma^{APCA} = \hat{B}\hat{B}^{\top} + \hat{V}. \tag{14}$$

The following summarizes the APCA algorithm:

1. Compute $\hat{\Omega}_T = \frac{1}{N}(M - \mu)^{\top}(M - \mu)$.
2. Obtain the first $J$ eigenvectors.
3. For each age-specific mortality rate $n = 1, \ldots, N$, regress $M - \mu$ on the first $J$ eigenvectors and compute the residual variance $\hat{\sigma}_n^{2}$.
4. Scale $Z = M - \mu$ as $\hat{V}^{-1/2}Z$, where $\hat{V} = \operatorname{diag}(\hat{\sigma}_1^{2}, \ldots, \hat{\sigma}_N^{2})$.
5. Compute the new cross product from $\hat{V}^{-1/2}Z$.
6. Repeat steps 1 to 3 to obtain the first $J$ eigenvectors as the factor estimate $\hat{K}$, and the regression coefficients as the loading.

We note that although it is probably unreasonable to let the maximum age go to infinity within a single population, it is reasonable to let the number of age-specific rates $N$ grow without bound in other settings, for example in the multi-population analysis. The finite sample performance of APCA holds up well, even in the single population setting. In fact, APCA outperforms the other models in almost all male populations in the Human Mortality Database, suggesting the importance of cross-sectional heteroskedasticity. We discuss the empirical performance in a later section.

2.5 Heteroskedasticity Factor Analysis

Time-varying volatility in the idiosyncratic error motivates the heteroskedasticity factor analysis (HFA) model. Jones (2001) incorporates time series heteroskedasticity into the approximate factor model. Specifically, HFA permits the average cross-sectional idiosyncratic variance $\bar{\sigma}_t^{2}$ to vary over time, while Connor and Korajczyk (1986) assume that $\bar{\sigma}_t^{2}$ is constant over time, that is, $\bar{\sigma}_t^{2} = \bar{\sigma}^{2}$. In other words, the cross product $\hat{\Omega}_T$ under HFA converges to

$$\hat{\Omega}_T \longrightarrow K^{\top} K + D, \tag{15}$$

where $D = \operatorname{diag}(\bar{\sigma}_1^{2}, \ldots, \bar{\sigma}_T^{2})$.
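A small numerical check illustrates why the difference between the limits in Equations (13) and (15) matters. Adding a multiple of the identity to $K^{\top}K$ leaves its eigenvectors unchanged, whereas adding a non-constant diagonal $D$ generally does not. The sketch below uses a single arbitrary factor path and arbitrary variances; all names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 30
k = rng.normal(size=T)                     # one factor path, illustrative only
signal = np.outer(k, k)                    # K^T K for J = 1

def leading_eigvec(A):
    """Eigenvector of a symmetric matrix associated with its largest eigenvalue."""
    w, v = np.linalg.eigh(A)
    return v[:, np.argmax(w)]

def alignment(u, v):
    """Absolute cosine of the angle between two vectors (1 means same direction)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Homoskedastic limit: K^T K + sigma^2 I
v_homo = leading_eigvec(signal + 0.5 * np.eye(T))

# Time-heteroskedastic limit: K^T K + D with a non-constant diagonal D
d = rng.uniform(0.1, 20.0, size=T)
v_het = leading_eigvec(signal + np.diag(d))

align_homo = alignment(v_homo, k)          # equals 1 up to rounding error
align_het = alignment(v_het, k)            # falls below 1
```

In the homoskedastic case the leading eigenvector recovers the direction of the factor exactly; with a time-varying diagonal it is tilted away from it, which is the reason a plain eigendecomposition no longer suffices.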

The fluctuation in $\bar{\sigma}_t^{2}$ suggests that the average idiosyncratic variances are correlated across ages and may be governed by a common component. If one considers nonstationarity appropriate for an aggregate mortality rate index (as in Lin et al. (2013) and Gao and Hu (2009)), a similar rationale supports the view that such nonstationarity may reside in the idiosyncratic error, since the idiosyncratic error is a part of the aggregate mortality index. The average idiosyncratic variance may vary over time because of fluctuations in socio-economic variables. For example, medical advances or diseases specific to certain ages or demographic factors may not be captured by the common mortality trend. These advances or diseases affect people of different ages and may spread to other ages, creating shocks in the mortality rate. In the case of multiple populations, short-term changes in the relation among populations can also lead to heteroskedasticity.

Unlike APCA, the spectral decomposition or SVD cannot be used to estimate HFA factors, since the asymptotic covariance changes over time. HFA estimates the factor with an iterative algorithm, originally proposed by Jöreskog (1967) to find the maximum likelihood estimate in classical factor analysis when the cross-sectional idiosyncratic error is heteroskedastic. While it can no longer find the MLE for the approximate factor model, the consistency of the algorithm still holds. Therefore it can be used to estimate the factor.

We conclude this subsection with a step-by-step description of the iterative algorithm for HFA:

1. Compute $\hat{\Omega}_T$.
2. Set $\hat{D}_0$ as the initial guess of the diagonal residual covariance matrix $D$. Note that $D$ is a $T \times T$ matrix.
3. Obtain the first $J$ eigenvectors of $\hat{D}_0^{-1/2}\hat{\Omega}_T\hat{D}_0^{-1/2}$.
4. Let $G$ be the eigenvector matrix, where the $j$th column is the eigenvector associated with the $j$th eigenvalue. Let $\Lambda$ be a diagonal matrix of the eigenvalues sorted in descending order. Compute the factor estimate by $\hat{K}^{\top} = \hat{D}_0^{1/2} G (\Lambda - I)^{1/2}$.
5. Update the estimate of $D$ by $\hat{D} = \operatorname{diag}(\hat{\Omega}_T - \hat{K}^{\top}\hat{K})$.
6. Iterate steps 3 to 5 until the convergence criterion is achieved.
7. (Optional) To compare with the APCA estimate of the factor, $\hat{K}$ is orthonormalized to $\hat{K}^{N}$. The first column of $\hat{K}^{N}$, $\hat{K}^{N}_{\cdot 1}$, is the normalized first eigenvector. Every other eigenvector is regressed on $\hat{K}^{N}_{\cdot 1}$ without intercept, and the $j$th column is taken from the residual of that regression. This step yields the factor estimate $\hat{K}^{N}$ such that $\hat{K}^{N\top}\hat{K}^{N} = I_{J \times J}$.
8. Obtain the loading by regressing $M$ on the factor estimate $\hat{K}$.

The estimated covariance of the HFA model, $\Sigma^{HFA}$, is

$$\Sigma^{HFA} = \hat{B}\hat{B}^{\top} + \hat{W}, \tag{16}$$

where $\hat{W} = \operatorname{diag}(\hat{\sigma}_1^{2}, \ldots, \hat{\sigma}_N^{2})$ is obtained from the sample residual variance of $\varepsilon$.

2.6 Principal Orthogonal complEment Thresholding estimator

An important theoretical aspect of the approximate factor model is the (weak) correlation among the idiosyncratic errors. The idiosyncratic error represents the higher order variation in the mortality curve that has not been captured by the common factors. So far, APCA and HFA have given us ways to deal with two important special cases of the approximate factor model: cross-sectional heteroskedasticity and time series heteroskedasticity. Now we turn to the correlations among the idiosyncratic errors. We introduce the POET estimator, proposed by Fan et al. (2013), to complete the set of estimation methods for the approximate factor model.

Most mortality models ignore the correlation between idiosyncratic errors. This is possibly because it is costly to estimate a large number of parameters. However, ignoring the correlation may lead to severe consequences. A low-rank factor model (e.g. a 1-factor model) may not be sufficient for longevity risk management, because both dependence and higher moments are primary drivers of risk. Zhu and Bauer (2014) suggested that the idiosyncratic error's contribution to risk might be overlooked when we consider only single factor models. Specifically, they showed that the higher order components of the mortality rate have a non-negligible impact on the hedging portfolio when an insurer constructs the hedge.
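For reference, the iterative HFA algorithm of Section 2.5 (steps 1 to 6 and step 8) can be sketched as below. This is a minimal illustration under several assumptions that are not taken from the text: the function name `hfa`, the starting value $\hat{D}_0 = 0.5\,\operatorname{diag}(\hat{\Omega}_T)$, the absolute convergence tolerance, and the small positive floor on the diagonal are all choices made for the example. The factor update uses the Jöreskog-type form $\hat{K}^{\top} = \hat{D}^{1/2} G (\Lambda - I)^{1/2}$, and the optional orthonormalization in step 7 is omitted.

```python
import numpy as np

def hfa(Z, J=1, max_iter=500, tol=1e-8):
    """Iterative HFA sketch. Z is an N x T matrix of demeaned log mortality rates."""
    N, T = Z.shape
    omega = Z.T @ Z / N                                   # step 1: T x T cross product
    d = 0.5 * np.diag(omega).copy()                       # step 2: initial guess (assumed)
    for _ in range(max_iter):
        s = 1.0 / np.sqrt(d)
        scaled = s[:, None] * omega * s[None, :]          # D^{-1/2} Omega D^{-1/2}
        eigval, eigvec = np.linalg.eigh(scaled)           # step 3
        top = np.argsort(eigval)[::-1][:J]
        lam = np.clip(eigval[top] - 1.0, 0.0, None)       # diagonal of Lambda - I
        # step 4: K^T = D^{1/2} G (Lambda - I)^{1/2}, stored as a T x J array
        Kt = np.sqrt(d)[:, None] * eigvec[:, top] * np.sqrt(lam)[None, :]
        # step 5: keep only the diagonal of Omega - K^T K, floored to stay positive
        d_new = np.clip(np.diag(omega) - (Kt ** 2).sum(axis=1), 1e-10, None)
        converged = np.max(np.abs(d_new - d)) < tol       # step 6
        d = d_new
        if converged:
            break
    K = Kt.T                                              # J x T factor estimate
    B = Z @ K.T @ np.linalg.inv(K @ K.T)                  # step 8: least squares of Z on K
    return B, K, d

# Synthetic illustration only: one factor, noise s.d. differs by calendar year.
rng = np.random.default_rng(2)
N, T = 200, 40
k_true = np.cumsum(rng.normal(size=T))
b_true = 1.0 + 0.3 * rng.normal(size=N)
sd_t = rng.uniform(0.2, 1.0, size=T)
M = np.outer(b_true, k_true) + rng.normal(size=(N, T)) * sd_t[None, :]
Z = M - M.mean(axis=1, keepdims=True)

B_hat, K_hat, d_hat = hfa(Z, J=1)
```

The returned vector `d_hat` plays the role of the diagonal of $\hat{D}$, one average idiosyncratic variance per calendar year.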
The Principal Orthogonal Complement Thresholding (POET) estimator provides a way to capture the higher-order effect in the model without adding extra factors. POET estimates the covariance matrix $\Sigma^{POET}$ by complementing the low-rank principal component with thresholding on the covariance of the idiosyncratic error $\Sigma_u$. The low-rank principal component can be estimated with the spectral decomposition or SVD. We choose to use SVD to estimate the principal components in the subsequent analysis, so that the factors and loadings of the Lee-Carter and POET models are exactly the same. By doing so we can examine the contribution of modeling the idiosyncratic covariance by comparing the POET and Lee-Carter models.

The challenge of estimating the covariance matrix between the mortality rates is that there are too many parameters when the age span $N$ is large. POET takes advantage of the low-rank common component and assumes that $\Sigma_u$ is conditionally sparse, that is, a common factor structure with a few factors accounts for a large portion of the correlation between mortality rates as $N$ grows. The correlation matrix of the idiosyncratic error is sparse, i.e. most entries are zero, with only a few exceptions. The non-zero entries of the correlation matrix are then estimated via thresholding on $\varepsilon$. Let $\tau_{ij}$ be the entry-dependent threshold. Consider

$$\Sigma_u^{POET} = (\hat{\sigma}_{ij}^{POET})_{N \times N}, \qquad \hat{\sigma}_{ij}^{POET} = \begin{cases} \hat{\sigma}_{ii}, & \text{if } i = j, \\ s_{ij}(\hat{\sigma}_{ij}; \tau_{ij}), & \text{if } i \neq j. \end{cases} \tag{17}$$

Then the estimated covariance of the mortality rate, $\Sigma^{POET}$, is equal to $\hat{B}\hat{B}^{\top} + \Sigma_u^{POET}$, in which $\hat{B}$ is estimated by SVD. The shrinkage function $s_{ij}(\cdot\,; \tau_{ij})$ governs the estimated off-diagonal covariance via the threshold parameter $\tau_{ij}$, which can be either a constant or a varying parameter. We adopt the adaptive thresholding as in Fan et al. (2013). The threshold $\tau_{ij}$ takes the form

$$\tau_{ij} = C \sqrt{\hat{\theta}_{ij}}\, w_T, \qquad \hat{\theta}_{ij} = \frac{1}{T}\sum_{t=1}^{T} \left(\varepsilon_{it}\varepsilon_{jt} - \hat{\sigma}_{ij}\right)^{2}, \tag{18}$$

where $C > 0$ is a constant chosen to ensure the positive definiteness of $\Sigma$, $\hat{\theta}_{ij}$ estimates the sampling variability of the entry $\hat{\sigma}_{ij}$ of the sample covariance matrix, and the optimal weight $w_T = 1/\sqrt{N} + \sqrt{\log(N)/T}$ is chosen following Fan et al. (2013).

2.7 Benchmark Example: USA

By now we have introduced three special cases of the approximate factor model and how to estimate them. In this section we demonstrate how to estimate a single population model with them. In the process, we also show that idiosyncratic variance and correlation are present in the mortality rates. We use the USA male population as an example.
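Before turning to the data, the thresholding step in Equations (17) and (18) of Section 2.6 can be sketched as follows. This is a minimal illustration, not the code used in this study: the function name `poet_idiosyncratic_cov` and the value `C=0.5` are assumptions, the residuals are taken to have mean zero, and soft thresholding is used as one common choice for the shrinkage function $s_{ij}$.

```python
import numpy as np

def poet_idiosyncratic_cov(resid, C=0.5):
    """Adaptive soft thresholding of an idiosyncratic covariance matrix.

    resid is an N x T matrix of (mean-zero) residuals; returns an N x N matrix.
    """
    N, T = resid.shape
    S = resid @ resid.T / T                                # sample covariance sigma_hat_ij
    prod = resid[:, None, :] * resid[None, :, :]           # N x N x T array of eps_it * eps_jt
    theta = ((prod - S[:, :, None]) ** 2).mean(axis=2)     # theta_hat_ij in Eq. (18)
    w_T = 1.0 / np.sqrt(N) + np.sqrt(np.log(N) / T)
    tau = C * np.sqrt(theta) * w_T                         # entry-dependent threshold
    out = np.sign(S) * np.maximum(np.abs(S) - tau, 0.0)    # soft thresholding (assumed s_ij)
    np.fill_diagonal(out, np.diag(S))                      # diagonal entries are kept as is
    return out

# Synthetic illustration only: independent residuals except for one correlated pair.
rng = np.random.default_rng(3)
N, T = 20, 200
eps = rng.normal(size=(N, T))
eps[1] = 0.9 * eps[0] + np.sqrt(1 - 0.81) * rng.normal(size=T)

Sigma_u = poet_idiosyncratic_cov(eps, C=0.5)
```

In this example most off-diagonal entries are set to zero while the covariance between the two correlated series survives, shrunk toward zero by its threshold.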
We estimate a mortality model for ages 0 to 100 with an observation period ranging from 1933 to 2010. The death count and exposure data are obtained from the Human Mortality Database. Let $D_{xt}$ and $E_{xt}$ be the death count and exposure of individuals of age $x$ at time $t$. We estimate the 1-factor model using the log of the crude death rate, that is,

$$m_{xt} = \log \frac{D_{xt}}{E_{xt}}. \tag{19}$$

Figure 1 shows the factor and loading estimates for the United States during the period 1933–2010. We estimate the factor and loading with SVD for all three models to maintain a consistent comparison. Figure 2 shows the variance of the idiosyncratic mortality rates, in which cross-sectional heteroskedasticity is easily observed. The dots in Figure 2 are the estimates of the idiosyncratic variance, and the solid line is the estimate of the idiosyncratic variance in the Lee-Carter model. A clear pattern emerges. The idiosyncratic mortality rates are particularly volatile at the infant ages, with two further spikes between ages 20-40 and 50-70. It is evident that homoscedasticity in the cross-sectional dimension is not supported empirically.

We can also look at the idiosyncratic mortality rates from another angle. Recall that the approximate factor model extracts the factor $K$ from the $T \times T$ cross product of the demeaned mortality rates $Z$ through the approximation

$$\frac{1}{N} Z^{\top} Z \longrightarrow K^{\top} K + D, \tag{20}$$

where $D$ is the asymptotic idiosyncratic covariance matrix. This implies that we can decompose the mortality rate variance into the sum of the factor variance and the idiosyncratic variance:

$$\operatorname{var}(m_t) = \operatorname{var}(k_t) + \operatorname{var}(d_t), \tag{21}$$

where $d_t = D_{tt} = \bar{\sigma}_t^{2}$, the $t$th diagonal entry of $D$, is the time-$t$ idiosyncratic variance. This suggests that the time variation in the mortality rate cannot be explained by the factor alone. It is possible that the idiosyncratic variance of mortality rates is persistent, and this leads to time series heteroskedasticity in the mortality rate. This helps us in predicting the variation of mortality rates in the future. Figure 3 shows the idiosyncratic variance in the time series dimension. The dots are the idiosyncratic variance estimates for each calendar year, and the solid line is the homoscedastic estimate.
The pattern suggests that the idiosyncratic variance is somewhat nonlinear, which is consistent with Gao and Hu (2009). The estimated variance is low for most of the years, but rises rapidly in certain periods spread out over the 30s, 60s, 90s and after 2005. Also, the idiosyncratic variance seems to fall slowly but rise quickly. This behavior suggests that the idiosyncratic mortality rates are not homoscedastic in the time series dimension either.

To demonstrate the remaining correlation in the one-factor model, we visualize the sample correlation matrix of the idiosyncratic mortality rates in Figure 4. The gray (red) parts are stronger positive (negative) correlations, while the light-colored parts are weakly correlated. The positive correlations mostly cluster in adjacent ages, while the negative correlations are scattered between the younger and older ages. The correlogram shows that the idiosyncratic mortality rates are highly correlated even after the common component is removed. We count about half of the area as having a correlation coefficient greater than 0.5 in absolute terms. Clearly, the empirical evidence does not favor the assumption of uncorrelated idiosyncratic error.

The dependence of idiosyncratic errors can be captured by the POET estimator. Figure 5 provides the POET estimate of the idiosyncratic correlation. The covariance above the threshold value is preserved and then scaled based on the variance level to provide a more robust result.⁷ The range of correlation coefficients in Figure 5 is asymmetric because the area of negative covariance concentrates on ages 0-25 and ages 80-100, which have lower variance as seen in Figure 2.

From the demonstration above, it is clear that heterogeneity and correlation are evident even within a single population. The next question is whether these effects actually make a difference in fitting and forecasting. We discuss the fitting performance of the approximate factor model in the next section.

3 Fitting performance for single population

We investigate the finite sample performance of approximate factor models in a large-scale comparison.
The estimation of approximate factor models relies on asymptotic convergence to the true covariance matrix. Naturally, one would question the finite sample performance in a real-world application. We use the data from the Human Mortality Database (HMD) to compare APCA, HFA, and POET, with the Lee-Carter model as a benchmark. We collect the death and exposure data of all 46 male populations in HMD. We choose to use the full sample, even though each population has a different sample period. We list the sample period of every population used in this study in Table 1.⁸

⁷ Here we use the "soft" threshold in Fan et al. (2013), as it has been demonstrated to be a more robust choice of threshold.

⁸ We try to include as many years as possible to evaluate the robustness of the approximate factor models.

We use two goodness-of-fit measures to examine the fitting performance: the log-likelihood value and the Bayesian information criterion (BIC). The log-likelihood value is a measure of goodness of fit for the distribution. It reflects the goodness of fit with the correlation between variables in mind. We use the log-likelihood function of the multivariate Gaussian distribution to evaluate the goodness of fit.⁹ The log-likelihood function for the multivariate Gaussian is

$$L(X; \mu, \Sigma) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log(|\Sigma|) - \frac{1}{2}\operatorname{tr}\left((X - \mu)(X - \mu)^{\top}\Sigma^{-1}\right), \tag{22}$$

where $X$ is the sample, and $\mu$ and $\Sigma$ are the estimated mean and covariance matrix, respectively. We also consider the Bayesian Information Criterion (BIC), which is defined as $-2L(M) + p\log(NT)$, where $p$ is the number of free parameters. BIC penalizes the excess use of parameters. Given the same data, a smaller BIC indicates a more favorable model. The criterion is commonly used in other papers as a model selection statistic, for example in Li and Hardy (2011).

Table 1 reports the log-likelihood value for all 4 models across 45 populations.¹⁰ ¹¹ We estimated the number of static factors in every population with the maximum of the two information criteria of Bai and Ng (2002). The criteria suggest that only one factor is pervasive in all 45 populations.¹²

APCA, HFA and POET lead by a wide margin on fitting performance in terms of log-likelihood, relative to our baseline model, the Lee-Carter model with homoskedastic and uncorrelated idiosyncratic error. This suggests that idiosyncratic heteroskedasticity should not be neglected in the first place. POET dominates the horse race. Comparing the log-likelihood values, it is obvious that the idiosyncratic covariance has a huge impact on the goodness-of-fit.
The additional fitting power over the baseline Lee-Carter model is contributed entirely by the full idiosyncratic covariance $\Sigma_u^{POET}$, since both models have identical common factor and loading estimates, as noted in the previous section. The performance of APCA and HFA is mixed, but mostly favors APCA.

⁹ Note that the estimation of the approximate factor model is nonparametric and therefore does not assume a probability distribution for the variables. We merely use the log-likelihood function as a measure of goodness-of-fit to compare the models, because (a) it accounts for the idiosyncratic covariance; (b) most of the models we discuss in the paper assume a multivariate Gaussian distribution, so at least we are not favoring our models. We have considered other choices such as Poisson (see Brouhns et al. (2002) and Renshaw and Haberman (2006) for example). However, writing down the joint probability distribution function for a correlated Poisson distribution is a non-trivial problem. See Johnson et al. (1997) for details.

¹⁰ We exclude Belgium because the mortality data are missing for the entire World War I period.

¹¹ We considered the Renshaw and Haberman (2006) model with cohort extension but failed to achieve convergence when estimating the model in many populations. The likelihood function seems to be very flat when there are many ages. The model was estimated with the ilc package in R.

¹² This concurs with the performance and popularity of the Lee-Carter model. Additional analysis suggests that the one-factor model explains over 70% of the variation in about 70% of the populations in our sample.
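The two goodness-of-fit measures, the Gaussian log-likelihood in Equation (22) and the BIC, can be computed as in the sketch below. It is a minimal illustration: the function names `gaussian_loglik` and `bic` are hypothetical, `X` is assumed to be an $N \times T$ array with one row per age, and the parameter count `p` must be supplied by the user for each model.

```python
import numpy as np

def gaussian_loglik(X, mu, Sigma):
    """Multivariate Gaussian log-likelihood of Eq. (22).

    X is N x T (one column per year), mu has length N, Sigma is N x N.
    """
    N, T = X.shape
    R = X - mu[:, None]                                   # X - mu, column by column
    sign, logdet = np.linalg.slogdet(Sigma)               # stable log-determinant
    quad = np.trace(np.linalg.solve(Sigma, R @ R.T))      # tr(Sigma^{-1} (X-mu)(X-mu)^T)
    return -0.5 * N * T * np.log(2 * np.pi) - 0.5 * T * logdet - 0.5 * quad

def bic(loglik, p, N, T):
    """BIC = -2 L + p log(N T); smaller values indicate a more favorable model."""
    return -2.0 * loglik + p * np.log(N * T)

# Synthetic illustration only.
rng = np.random.default_rng(4)
N, T = 5, 30
X = rng.normal(size=(N, T))
mu = X.mean(axis=1)
Sigma = np.cov(X, bias=True)                              # N x N sample covariance

ll = gaussian_loglik(X, mu, Sigma)
bic_value = bic(ll, p=N * (N + 1) // 2, N=N, T=T)
```

The trace is evaluated with a linear solve instead of an explicit inverse, which gives the same value because the trace is invariant to cyclic permutation.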

APCA is designed to capture age heteroskedasticity, while HFA is designed to capture time heteroskedasticity; a population may exhibit either or both. Our result suggests that cross-sectional idiosyncratic heteroskedasticity plays a more important role than time-series heteroskedasticity.

Table 2 reports the BIC for all models. BIC reflects the goodness-of-fit with consideration for parsimony in parameters. The Lee-Carter model is the most parsimonious model, with only 1 parameter for the idiosyncratic covariance, since it assumes that all the age-specific mortality rates have equal variance. APCA assumes heteroskedasticity, hence it has $N$ additional parameters. HFA assumes asymptotic time heteroskedasticity, hence it has $T$ additional parameters when estimating the factor and loading. Finally, POET is the least parsimonious, since all the off-diagonal entries of the idiosyncratic covariance can take non-zero values. BIC suggests that APCA has the best overall performance across the 45 populations. Lee-Carter is the least favorable model overall, as we expected, because the mortality rate clearly exhibits heteroskedasticity, as shown in Section 2. The BIC result for POET shows that, despite being heavily penalized, adding correlation to the model is still worth the cost in many cases. If parsimony is the primary concern, APCA is perhaps better suited to the situation. Our result also suggests that the benefit of accounting for heteroskedasticity and/or correlations in the idiosyncratic error outweighs the cost.

4 Multiple populations

The approximate factor model is well suited to modeling the mortality rates of multiple populations: because we are after an even larger number of mortality rates, these data actually increase the estimation accuracy rather than create a problem. The estimation process also does not change with the number of populations or mortality rates included in the model.
On top of that, the approximate factor model is equipped to deal with the correlation and heterogeneity among multiple populations. We discuss the advantages of the approximate factor model below.

In the multi-population environment, the interrelation between populations is as important as the interrelation of age-specific mortality rates. In fact, the distinctive feature of multi-population models is how they link the populations together. For example, the gravity model (Jarner and Kryger, 2011; Cairns et al., 2011) assumes a primary/secondary relationship between two populations. The mortality rate fluctuation of the secondary population revolves around the primary population. Yang and Wang (2013) link mortality rates across different populations by the cointegration relationship of the common factor. Li and Lee (2005) propose a multi-population mortality model, also known as the augmented common factor model, with one global factor and one individual factor for each population. The global factor affects the same-age mortality rate in all populations in an equal manner, and the individual factors are independent of each other. The global factor works as a cohesive bonding agent across independent populations. Li and Hardy (2011) compare four multi-population mortality models in terms of fitting performance and forecasting reasonableness. They conclude that the augmented common factor model is the most preferable model.

Compared to the existing multi-population mortality rate models, our model does not superimpose any kind of structure on the populations. Rather, we treat the mortality rates from different populations equally as random variables. The mortality rates from different populations are governed by the same (set of) factors, just with different loadings. This simple setting has some advantages. The first advantage is that multi-population modeling is as easy as single population modeling. The estimation procedure is exactly the same for any number of populations, which results in virtually unlimited expandability without any additional effort. The second advantage is the transparent structure. The covariance structure among populations is straightforward and analytical, which is a huge advantage for risk management purposes. The third advantage comes from the versatility of the factor model. It is possible to include any other variable we might be interested in, for example, cause-specific mortality rates or socio-economic variables. The factor model allows us to highlight their effect on mortality improvement through the joint estimation. Lastly, because our model is also a factor model, it can be woven into any existing form of factor model with zero conflict.

We now introduce a basic structure of the multi-population model. Let $m_{x,t}^{(i)}$ be the log mortality rate of age $x$ at time $t$ of population $i$, $x = 1, \ldots, N$, $t = 1, \ldots, T$ and $i = 1, \ldots, I$. Let $M^{(i)}$ be the log mortality matrix of the $i$th population. The size of $M^{(i)}$ is $N \times T$. We are interested in

$$M = \begin{pmatrix} M^{(1)} \\ M^{(2)} \\ \vdots \\ M^{(I)} \end{pmatrix} = \begin{pmatrix} m_{11}^{(1)} & \cdots & m_{1T}^{(1)} \\ \vdots & \ddots & \vdots \\ m_{N1}^{(1)} & \cdots & m_{NT}^{(1)} \\ m_{11}^{(2)} & \cdots & m_{1T}^{(2)} \\ \vdots & \ddots & \vdots \\ m_{N1}^{(2)} & \cdots & m_{NT}^{(2)} \\ \vdots & & \vdots \\ m_{N1}^{(I)} & \cdots & m_{NT}^{(I)} \end{pmatrix}. \tag{23}$$

Since we are interested in the factor model for multiple populations, we can extend the old notation to

$$m_{x,t}^{(i)} = a_x^{(i)} + b_x^{(i)} k_t + \varepsilon_{x,t}^{(i)}, \tag{24}$$

where the superscript $(i)$ indicates the $i$th population. The factor model can also be written in matrix form as in Equation (3), i.e.

$$Z = BK + \varepsilon, \tag{25}$$

where $Z = M - \mu$, $\mu$ is a collection of $a_x^{(i)}$, i.e., the average of the log mortality rates across populations, $B$ is the factor loading, $K$ is the factor of the log mortality rates across populations, and $\varepsilon$ is the idiosyncratic error matrix. The loading $B$ takes the form

$$B = \begin{pmatrix} b_{11}^{(1)} & \cdots & b_{1J}^{(1)} \\ \vdots & \ddots & \vdots \\ b_{N1}^{(1)} & \cdots & b_{NJ}^{(1)} \\ b_{11}^{(2)} & \cdots & b_{1J}^{(2)} \\ \vdots & \ddots & \vdots \\ b_{N1}^{(2)} & \cdots & b_{NJ}^{(2)} \\ \vdots & & \vdots \\ b_{11}^{(I)} & \cdots & b_{1J}^{(I)} \\ \vdots & \ddots & \vdots \\ b_{N1}^{(I)} & \cdots & b_{NJ}^{(I)} \end{pmatrix}, \tag{26}$$

where $J$ is the number of factors. The factor $K$ is a $J \times T$ matrix, as in the single population case.
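The stacking in Equation (23) and the common-factor structure in Equations (24) to (26) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `fit_stacked` is hypothetical, the factors are estimated with the benchmark eigendecomposition of the $T \times T$ cross product, and the two synthetic "populations" share one factor by construction.

```python
import numpy as np

def fit_stacked(pop_matrices, J=1):
    """Stack per-population N_i x T log-mortality matrices and fit common factors."""
    M = np.vstack(pop_matrices)                       # stacked matrix, Eq. (23)
    mu = M.mean(axis=1, keepdims=True)                # row means, the a_x^{(i)} terms
    Z = M - mu
    n_rows, T = Z.shape
    eigval, eigvec = np.linalg.eigh(Z.T @ Z / n_rows)
    top = np.argsort(eigval)[::-1][:J]
    K = np.sqrt(T) * eigvec[:, top].T                 # J x T factors shared by all populations
    B = Z @ K.T / T                                   # stacked loadings, Eq. (26)
    sizes = [m.shape[0] for m in pop_matrices]
    blocks = np.split(B, np.cumsum(sizes)[:-1], axis=0)   # one loading block per population
    return mu, blocks, K

# Synthetic illustration only: two populations driven by the same factor.
rng = np.random.default_rng(5)
T = 40
k_true = np.cumsum(rng.normal(size=T))
M1 = np.outer(1.0 + 0.2 * rng.normal(size=30), k_true) + 0.3 * rng.normal(size=(30, T))
M2 = np.outer(0.5 + 0.2 * rng.normal(size=20), k_true) + 0.3 * rng.normal(size=(20, T))

mu_hat, B_blocks, K_hat = fit_stacked([M1, M2], J=1)
```

The estimation code is the same as in the single population case; only the number of rows changes, and the populations need not contribute the same number of age-specific rates.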

In essence, the model assumes that some global factor governs the movement of mortality rates across populations, with each mortality rate affected to a different degree. The loading $B$ estimates each mortality rate's sensitivity to the global factor. In many applications, it is natural to assert that populations share a common component. For example, Jarner and Kryger (2011) assume that one smaller population follows the movement of another, larger population. The augmented common factor model assumes that a common component exists across different populations as well.

The joint covariance under the approximate factor model has exactly the same structure as in the single population case. As an example, consider two populations, $M^{(1)}$ and $M^{(2)}$, which include an arbitrary number $N_1$ and $N_2$ of mortality rates, respectively. Let $\Sigma^{approx}$ be the joint covariance under the approximate factor model; then

$$\Sigma^{approx} = \Sigma_c^{approx} + \Sigma_u^{approx} = BB^{\top} + V = \begin{pmatrix} B^{(1)}B^{(1)\top} & B^{(1)}B^{(2)\top} \\ B^{(2)}B^{(1)\top} & B^{(2)}B^{(2)\top} \end{pmatrix} + \begin{pmatrix} \hat{\sigma}_{1,1}^{2} & \cdots & \hat{\sigma}_{1,N_1+N_2}^{2} \\ \vdots & \ddots & \vdots \\ \hat{\sigma}_{N_1+N_2,1}^{2} & \cdots & \hat{\sigma}_{N_1+N_2,N_1+N_2}^{2} \end{pmatrix},$$

where $V$ is the idiosyncratic covariance.

5 Multi-population performance

We evaluate the approximate factor model in the multi-population setting by comparing the in-sample fitting and out-of-sample forecasting performance. The application of multi-population mortality modeling is perhaps best examined by the forecasting performance, as it is often considered the "ultimate test" of a model. We compare the approximate factor models to various models, including both multi-population models and single population models such as Plat (2009) and O'Hare and Li (2012). We use the following five male populations to fit the model: USA, UK, France, Spain, and Italy. The mortality data are available for all five populations simultaneously from 1933 to 2009. For the benchmark model, we discuss several choices in the next subsection.

5.1 Other models

To incorporate correlation between populations, the augmented common factor model proposed by Li and Lee (2005) augments the single population Lee-Carter model with a global component. A discussion of the augmented common factor model and other multi-population models can be found in Li and Hardy (2011). The augmented common factor model for $I$ populations takes the form

$$m_{xt}^{(i)} = a_x^{(i)} + b_x^{g} k_t^{g} + b_x^{(i)} k_t^{(i)} + \varepsilon_{xt}^{(i)}, \qquad i = 1, \ldots, I, \tag{27}$$

where the global loading $b_x^{g}$ and the global factor $k_t^{g}$ are obtained from a weighted average population $M^{g} = \sum_{i=1}^{I} E_{xt}^{(i)} M_{xt}^{(i)} / \sum_{i=1}^{I} E_{xt}^{(i)}$. The individual components $b_x^{(i)}$ and $k_t^{(i)}$ are estimated with the Lee-Carter model, using the residual terms from the global model, $m_{xt}^{(i)} - a_x^{(i)} - b_x^{g} k_t^{g}$, as input. The idiosyncratic error has mean zero and variance $\sigma_i^{2}$ for population $i$. The global factor controls the co-movement of mortality rates across different populations; therefore it is crucial to the mortality dependence structure. In matrix form, the augmented common factor model can be written as

$$M_t^{(i)} = A^{(i)} + B^{g} K_t^{g} + B^{(i)} K_t^{(i)} + \varepsilon_t^{(i)}, \tag{28}$$

where $M_t^{(i)} = (m_{1t}^{(i)}, \ldots, m_{Nt}^{(i)})^{\top}$. Or, equivalently,

$$M = A + B^{G} K^{G} + B^{idv} K^{idv} + \varepsilon, \tag{29}$$

where $B^{G} = e_I \otimes B^{g}$ is the global loading stacked vertically $I$ times, $K^{G} = (K_1^{g}, \cdots, K_T^{g})$ is the matrix of the global factor, $B^{(i)}$ is the loading for the $i$th population, and $K^{i} = (K_1^{i}, \cdots, K_T^{i})$ is the matrix of the individual factor of the $i$th population.

We also consider the M9 and M10 models proposed in Plat (2009) and O'Hare and Li (2012), respectively. Mortality models, especially single population models, have been repeatedly investigated and improved over the last decade.
The Lee-Carter model can be seen as an application of factor analysis, which has been highly successful in both academia and practice. However, many specialized mortality models have been developed since the Lee-Carter model, and they have advanced the modeling of mortality. We therefore add the M9 and M10 models to the comparison because of their empirical performance. In the M9 model, the log mortality rate takes the

following form:

$$
m_{xt} = a_x + k^1_t + k^2_t (x - \bar{x}) + k^3_t (\bar{x} - x)^+ + \gamma_{t-x} + \varepsilon_{xt}, \tag{30}
$$

where $k^i_t$, $i = 1, \ldots, 3$, are the factors. The loadings, all expressed as functions of age $x$, are associated with particular features of the mortality curve. For example, $(x - \bar{x})$ aims to capture the increase of mortality with age, and $(\bar{x} - x)^+$ captures effects specific to the lower ages. The M10 model modifies M9 by adding a non-linear term to make it suitable for a wider range of ages. The M10 model takes the form

$$
m_{xt} = a_x + k^1_t + k^2_t (x - \bar{x}) + k^3_t (\bar{x} - x)^+ + [(\bar{x} - x)^+]^2 + \gamma_{t-x} + \varepsilon_{xt}, \tag{31}
$$

where the additional $[(\bar{x} - x)^+]^2$ term captures the non-linear lower-age effect.

5.2 The difference to augmented common factor model

How are our models different from the augmented common factor model? The primary difference lies in how the impact of the global factor on individual mortality rates is handled, which determines the structure of the covariance matrix. This is best explained by an example.

A factor model should allow the correlations of mortality rates across populations to be free parameters so that they can be estimated. This is important because dependence is a main element in risk management and pricing. However, this is not the case in the augmented common factor model, in which the mortality rates are set to be perfectly correlated.

The joint covariance of the augmented common factor model illustrates this point. The augmented common factor model is a 2-factor model with one global factor and one individual factor for each country. We can therefore decompose the covariance into the sum of three parts: the global component, the individual component, and the idiosyncratic component. Recall that $B^{(i)}$ and $K^{(i)}$ are the loading and the factor for an individual population in the augmented common factor model. We can write the individual component $\Sigma^{acm}_{idv}$ for the $j$th population as $\Sigma^{acm}_{idv}(j) = B^{(j)} \operatorname{cov}(K^{(j)}) B^{(j)\top}$. Similarly, independence and homoscedasticity allow us to write the idiosyncratic covariance as

$\Sigma^{acm}_u(j) = \sigma_j^2 I_{N \times N}$. Then the joint covariance is

$$
\begin{aligned}
\Sigma^{acm} &= \Sigma^{acm}_g + \Sigma^{acm}_i + \Sigma^{acm}_u \\
&= \begin{pmatrix} B^g \operatorname{cov}(K^g) B^{g\top} & B^g \operatorname{cov}(K^g) B^{g\top} \\ B^g \operatorname{cov}(K^g) B^{g\top} & B^g \operatorname{cov}(K^g) B^{g\top} \end{pmatrix}
 + \begin{pmatrix} \Sigma^{acm}_{idv}(1) & 0 \\ 0 & \Sigma^{acm}_{idv}(2) \end{pmatrix}
 + \begin{pmatrix} \Sigma^{acm}_u(1) & 0 \\ 0 & \Sigma^{acm}_u(2) \end{pmatrix},
\end{aligned}
$$

where $\Sigma^{acm}_g$ is the component of the covariance contributed by the global factor, $\Sigma^{acm}_i$ is the component contributed by the individual factors, and $\Sigma^{acm}_u$ is the idiosyncratic covariance. Here we can see that only $\Sigma^{acm}_g$ has non-zero off-diagonal blocks. Since the global factor affects every population equally, it is the only link between the dynamics of the populations. In other words, the model restricts the mortality rates of different populations to be always perfectly correlated.

Although Li and Lee (2005) give no explanation for this specific choice, we suspect that the model and estimation method were designed to comply with the $N < T$ assumption. The key element that leads to this property is that every population has the same loading on the global factor. Li and Lee (2005) choose to extract the global factor and loading from the weighted average of mortality rates, which has the same size as the mortality rate data of an individual population.

5.3 Goodness-of-fit

We use the same log-likelihood and BIC as in the in-sample fitting comparison. Our comparison includes the approximate factor models, the augmented common factor model, the Lee-Carter model, the M9 model, and the M10 model. The multi-population models were fitted to all five populations simultaneously. The single-population models were first fitted to each of the five populations individually and then combined to obtain the maximum goodness-of-fit. We assume that the joint distribution of the mortality rates of all populations is multivariate Gaussian with mean $\mu$ and covariance $\Sigma$.
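Under the Gaussian assumption above, the log-likelihood of a fitted model depends on the data only through $\mu$ and the joint covariance $\Sigma$. The sketch below shows one way such a log-likelihood could be evaluated, using a Cholesky factorization written in plain Python. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the toy data, and the two candidate covariances are all hypothetical.

```python
import math


def cholesky(a):
    """Return the lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def gaussian_loglik(samples, mean, cov):
    """Sum of multivariate Gaussian log-densities over independent samples."""
    n = len(mean)
    low = cholesky(cov)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    total = 0.0
    for x in samples:
        # Solve L z = x - mean by forward substitution; the quadratic form is z.z
        z = []
        for i in range(n):
            r = x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i))
            z.append(r / low[i][i])
        quad = sum(v * v for v in z)
        total += -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
    return total


# Hypothetical data: two mortality rates (one per population), three periods.
samples = [[0.10, 0.08], [-0.05, -0.02], [0.02, 0.04]]
mean = [0.0, 0.0]
cov_independent = [[0.01, 0.0], [0.0, 0.01]]    # no cross-population covariance
cov_joint = [[0.01, 0.008], [0.008, 0.01]]      # allows cross-population covariance

print(gaussian_loglik(samples, mean, cov_independent))
print(gaussian_loglik(samples, mean, cov_joint))
```

The same function can be applied to any of the joint covariances discussed in this section; only the `cov` argument changes.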
For approximate factor models, it is fairly straightforward to find the joint covariance as $\Sigma^{approx} = \hat{B}\hat{B}^\top + \hat{V}$. For single-population models, the mortality rates of different populations are assumed to be independent; hence the joint covariance is a block-diagonal matrix. In our case, the joint covariance is simply the direct sum of the 5 individual

covariances, i.e.,

$$
\Sigma^{indep} = \bigoplus_{i=1}^{5} \Sigma^{indep}(i) = \begin{pmatrix} \Sigma^{indep}(1) & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \Sigma^{indep}(5) \end{pmatrix}. \tag{32}
$$

Finally, the joint covariance of the augmented common factor model is

$$
\begin{aligned}
\Sigma^{acm} &= \Sigma^{acm}_g + \Sigma^{acm}_i + \Sigma^{acm}_u \\
&= B^G \operatorname{cov}(K^G) B^{G\top} + \bigoplus_{j=1}^{5} B^{(j)} \operatorname{cov}(K^{(j)}) B^{(j)\top} + \bigoplus_{j=1}^{5} \sigma_j^2 I_{N \times N}.
\end{aligned} \tag{33}
$$

Table 3 reports the log-likelihood, BIC, and explanation ratio (ER) for all models. Here we consider only the 1-factor version of the approximate factor models, because the information criteria in Bai and Ng (2002) suggest that there is only one pervasive factor for the 5 populations. Even with one factor, the approximate factor models clearly fit better than the single-population models. The ER, which measures point-estimate accuracy, shows that the single-population models are more accurate in point estimates. This is unsurprising: because single-population models are fitted specifically to each population, they have the smallest deviation from the sample. When they are compared with the multi-population models, however, we can see that the (lack of) dependence greatly compromises the ability to fit the data in terms of log-likelihood value and BIC.

In Table 3, we see that the augmented common factor model performs most favorably against the 1-factor approximate factor models, probably because it is a 2-factor model. Table 4 reports the goodness-of-fit results of the 2-factor approximate factor models, which outperform the augmented common factor model in pure goodness-of-fit. We also see that the HFA model is the best performer in this 5-population comparison. This is interesting because HFA is not even the runner-up in our single-population comparison. In Table 1, the consensus across the individual populations is that the POET model is the best model and APCA comes second.
One explanation is that the multi-population setting naturally leads to heteroskedastic idiosyncratic errors in both the cross-sectional and the time dimension. Consider an example with two populations of the same size, $M^{(1)}$ and $M^{(2)}$, generated by two unknown factor models with i.i.d., homoscedastic idiosyncratic errors. Let the idiosyncratic variances be $\sigma_1^2$ and $\sigma_2^2$. Suppose one intends to estimate the joint factor model from the $T \times T$ cross-product $\Sigma_T$ of $[M^{(1)}; M^{(2)}]$. The

$\Sigma_T$ will be of the form

$$
\Sigma_T = F^\top F + \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}, \tag{34}
$$

which implies heteroskedasticity in the time-series dimension. This suggests that time heteroskedasticity is crucial to multi-population mortality models. In the cross-sectional dimension, the joint idiosyncratic covariance is a direct sum of the individual idiosyncratic covariances, which will be heteroskedastic unless all mortality rates from every population have the same variance. Such heteroskedasticity can arise when the populations have distinct structures due to varying socioeconomic conditions. We conjecture that the impact would be less visible if the populations were chosen from the same country. After all, the timing and size of mortality improvements can be very different across countries, and we cannot capture these features without modeling the idiosyncratic heteroskedasticity.

The appeal of modeling the idiosyncratic covariance can be demonstrated with the multi-population approximate factor models. As previously noted, the common component of the POET model is estimated with PCA; the fitted values of the LC and POET models are therefore identical, and the two models have exactly the same ER. In Table 3 we can see that the log-likelihood value of the POET model is almost twice that of the LC model once the full idiosyncratic covariance estimate is incorporated. Even with the APCA and HFA models, the increase in both log-likelihood and BIC is substantial. As we increase the number of factors, the contribution of the idiosyncratic covariance to goodness-of-fit becomes smaller, but it is still large in the case of the 2-factor models. Compared with the augmented common factor model, the 2-factor models perform better in both log-likelihood value and BIC, except for the LC model.
The fact that the log-likelihood value of the LC model is very close to, but not greater than, that of the augmented common factor model may suggest that the estimation of the idiosyncratic covariance is more important than the interrelation of the different populations. Overall, our results suggest that APCA, HFA, and POET are suitable not only for single-population modeling but for multi-population modeling as well.

5.4 Forecast comparison

In this section we carry out a simple but rigorous statistical test of forecast accuracy for the competing models. After all, the model that fits best does not necessarily forecast best. Diebold and Mariano (1995) developed a test of equal predictive accuracy, that is, a test of whether