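The Proposed Approach - Report on Attending an International Academic Conference - Applications of Mortality Compression and Mortality Models to Longevity Risk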


2. The Proposed Approach

Since the variance (or volatility) of an estimator is inversely proportional to the sample size, parameter estimates for smaller populations usually fluctuate more. Increasing the sample size is a natural way to reduce this fluctuation, and previous work assumes that a larger reference population is available. In this study, however, we suppose that there is a target (small) population and many other populations available. The goal is to find populations whose mortality experience is similar to that of the target population, and to combine these populations in order to obtain more reliable estimates.

For example, suppose we want to model Taiwan’s mortality rates using the mortality data from the HMD⁴ (The Human Mortality Database). One of the key steps is to define the notion of “similar mortality experience”, and a possible choice is to check whether two populations have similar mortality improvements at all ages (a more general condition than identical mortality rates at all ages).

In other words, we first compute the mortality improvement at all ages, or $r_x(t) = q_x(t+1)/q_x(t)$, where $q_x(t)$ is the age-specific mortality rate of age $x$ at time $t$. Next, we shall check if two populations share similar behavior in $r_x(t)$ for all ages, and we adopt the cluster analysis to select countries that have similar mortality experience as Taiwan. Finally, we combine the data of these countries, together with Taiwan data, and apply the principal component analysis to model mortality rates.
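The first step can be sketched in a few lines of NumPy. This is only an illustration, assuming the improvement is the ratio $q_x(t+1)/q_x(t)$ and that the rates are stored as an array with one row per age and one column per year; the function name `mortality_improvement` and the toy numbers are hypothetical and not taken from the HMD. Some authors define improvement as one minus this ratio; the difference between two populations changes only in sign under that convention, so Euclidean distances between differences are unaffected.

```python
import numpy as np

def mortality_improvement(q):
    """Return r[x, t] = q[x, t+1] / q[x, t] for an (ages x years) array q."""
    q = np.asarray(q, dtype=float)
    if q.ndim != 2 or q.shape[1] < 2:
        raise ValueError("q must be 2-D with at least two years (columns)")
    if np.any(q <= 0):
        raise ValueError("mortality rates must be strictly positive")
    # Divide each year's rates by the previous year's rates, age by age.
    return q[:, 1:] / q[:, :-1]

# Hypothetical toy rates: 2 ages (rows) x 3 years (columns).
q_toy = np.array([[0.010, 0.009, 0.0081],
                  [0.020, 0.019, 0.0190]])
r_toy = mortality_improvement(q_toy)
print(r_toy)
```

With T years of rates the result has T-1 columns, one per pair of consecutive years.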

In the following discussions, we shall briefly describe the processes of cluster analysis and principal component analysis used in this study.

 Choosing Countries using Cluster Analysis

The cluster analysis (CA) classifies a set of observations/variables into one, two, or more mutually exclusive groups, such that the observations/variables within a group share common properties (i.e., homogeneity). It is easier to model and predict a homogeneous group than a heterogeneous one. Heuristically speaking, if two countries have similar mortality improvement at all ages, the differences of $r_x(t)$ at all ages shall be very close. In other words, applying the CA to the differences of $r_x(t)$, there shall be only one cluster. If there are two or more clusters, this indicates that not all ages have similar mortality improvement rates and that the two populations do not have similar mortality experience.

The distance measure plays an important role in deciding the number of clusters in the CA. Among the possible choices of distance measure, we choose the Euclidean distance because of its popularity. In the beginning, each individual (observation or variable) is treated as a cluster (or group), and individuals are then merged into clusters according to their distances until certain criteria are met. Basically, the final grouping balances two criteria: minimizing the average distance within clusters and maximizing the distance between clusters. Usually, we can apply the AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion, or Schwarz’s Criterion) to decide the optimal number of clusters. The AIC and BIC are designed to avoid over-parameterization: they weigh the fitting errors against the number of parameters being estimated, given n, the number of observations. The smaller the AIC/BIC is, the better the model fit is.
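For reference, the textbook likelihood-based forms of the two criteria are given below. The source's own expressions are not reproduced in this section, so these are the standard definitions and not necessarily the exact formulas used in the study; the symbols $k$ (number of parameters being estimated) and $\hat{L}$ (maximized likelihood) are notation introduced here, while $n$ is the number of observations.

```latex
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad
\mathrm{BIC} = -2\ln\hat{L} + k\ln n
```

Because $\ln n > 2$ whenever $n \ge 8$, the BIC penalizes each extra parameter more heavily than the AIC and therefore tends to favor fewer clusters.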

Note that there are several ways to define the distance within a cluster and the distance between clusters. The average distance (or Average Linkage) and Ward’s minimum variance criterion are the most popular choices. The average distance within a cluster measures the average distance of all individuals to the cluster center, and the distance between clusters is the Euclidean distance between the two cluster centers. Ward’s criterion is defined similarly, except that the distance is replaced by the variance. In this study, we choose Ward’s minimum variance criterion.
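The agglomerative procedure with Ward's criterion can be sketched with SciPy's hierarchical clustering routines. This is a minimal illustration under stated assumptions, not the software or settings used in the study: each age is treated as one observation (row) described by the differences of $r_x(t)$ between two populations over the years, the data are synthetic, and the helper name `ward_clusters` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_clusters(diff, n_clusters):
    """Cluster the rows of diff with Ward linkage on Euclidean distances."""
    # Each row starts as its own cluster; rows are merged step by step.
    Z = linkage(diff, method="ward", metric="euclidean")
    # Cut the merge tree so that at most n_clusters groups remain.
    return fcluster(Z, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
# Synthetic differences between two populations: 6 ages x 5 years.
# Ages 0-2 differ by about 0, ages 3-5 differ by about 0.5.
diff = np.vstack([
    rng.normal(0.0, 0.01, size=(3, 5)),
    rng.normal(0.5, 0.01, size=(3, 5)),
])
labels = ward_clusters(diff, n_clusters=2)
print(labels)
```

In this toy case the first three ages fall into one cluster and the last three into another, which is the situation where the two populations would not be regarded as having similar mortality experience at all ages.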

We will use the mortality data (age-specific mortality improvement by single age, 1970-2000) from Taiwan and the U.S. to demonstrate how the optimal number of clusters is chosen. The variables used are the mortality improvements of each age (ages 0-99) in Taiwan and the U.S., and Table 2-1 shows the AIC and BIC for all possible numbers of clusters. In principle, the smaller the AIC and BIC are, the better the choice is. For example, the optimal number of clusters is 6 judging by the AIC, and 2 by the BIC. Because the AIC and BIC often lead to different choices, we sometimes also calculate the ratio of the distance for the current number of clusters to that for one fewer cluster (Ratio of distance measure in Table 2-1). A larger value of this ratio indicates a larger distance between clusters, which is preferred. In this study, we will use the AIC as the primary criterion, and fewer clusters are preferred. Thus, 3 clusters seems a feasible choice, which suggests that the mortality experiences of Taiwan and the U.S. are not very similar.

Table 2-1. Optimal Number of Clusters for Mortality Similarity between Taiwan & U.S.

 Using the Principal Component Analysis for Mortality Estimation

After deciding which countries share similar mortality experience with the target population, we will use all the data from these countries to model mortality rates. Since there are many variables, we need to apply data reduction techniques before modeling the mortality. Similar to Yang et al. (2010), we choose the principal component analysis (PCA) since it is easy to use. Singular value decomposition (SVD), used in estimating the parameters of the LC model, is another possible choice for data reduction. Since the SVD and PCA usually give very similar results, we will consider only the PCA in this study. The functional PCA approach by Hyndman and Ullah (2005) is also a possible choice.

The PCA is a popular method for analyzing multivariate data, and it can be used for data reduction and interpretation. As the dimension (i.e., number of variables) of the data increases, summarizing the data becomes more difficult. The PCA finds uncorrelated components (as few as possible) that still preserve the properties of the data. Note that the extracted components are linear combinations of the original variables. The PCA is derived from the eigen-decomposition of the covariance or correlation matrix (Johnson and Wichern, 2002).
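The eigen-decomposition view of the PCA can be illustrated with NumPy. The sketch below assumes the covariance (not the correlation) matrix and uses synthetic data; the function name `pca_eig` and all numbers are hypothetical. It also cross-checks the eigenvalues against the singular values of the centered data, which is one way to see why the SVD and the PCA give closely related results.

```python
import numpy as np

def pca_eig(X):
    """PCA of X (observations x variables) via the covariance eigen-decomposition.

    Returns eigenvalues (descending), eigenvectors (as columns), and scores.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # components: linear combinations
    return eigvals, eigvecs, scores

rng = np.random.default_rng(1)
# Synthetic data: 30 "years" x 4 "ages" driven by one common trend plus noise.
trend = np.linspace(0.0, -1.0, 30)[:, None]
loadings = np.array([[1.0, 0.8, 0.6, 0.4]])
X = trend @ loadings + rng.normal(0.0, 0.01, size=(30, 4))

eigvals, eigvecs, scores = pca_eig(X)
explained = eigvals / eigvals.sum()          # share of variance per component

# Cross-check: singular values of the centered data give the same eigenvalues.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(explained)
```

In this toy example the first component carries almost all of the variance, so a single component would be enough to summarize the four variables.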

As in most mortality studies, the logarithms of mortality rates, instead of the mortality rates themselves, will be used. We shall apply the PCA to the logarithms of age-specific mortality rates from countries whose mortality experience is similar to that of the target population. Also, the number of PCs chosen can provide possible interpretations of the mortality rates, and the number is usually 1, 2, or 3 (Bell, 1997). For example, the LC model can be treated as a 1-PC model in which the logarithms of the central death rates of all age groups decrease linearly. The model proposed by Yang et al. (2010) is an example of a 2-PC model, in which the decreasing trends of the logarithms of central death rates differ between two time periods. The model by Heligman and Pollard (1980) can be treated as a 3-PC model, where human life is separated into 3 different periods: infancy and childhood, younger adulthood, and adulthood.
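To make the 1-PC reading of the LC model concrete, the sketch below fits log rates as an age pattern plus a single age-loading times a time index, using the leading term of the SVD. It is an illustration on synthetic data, not the estimation procedure of the study; the names `fit_one_pc`, `a`, `b`, and `k` are introduced here, and the normalization that the loadings sum to one is the usual LC convention.

```python
import numpy as np

def fit_one_pc(log_m):
    """Rank-1 fit of an (ages x years) array: log_m[x, t] ~ a[x] + b[x] * k[t]."""
    log_m = np.asarray(log_m, dtype=float)
    a = log_m.mean(axis=1)                   # average log rate at each age
    U, s, Vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
    b = U[:, 0]                              # age loadings of the first component
    k = s[0] * Vt[0]                         # time index of the first component
    scale = b.sum()                          # assumes the loadings do not sum to 0
    return a, b / scale, k * scale           # normalize so that sum(b) = 1

# Synthetic log rates that follow a 1-PC structure exactly, with a linear k.
ages, years = 5, 20
a_true = np.linspace(-6.0, -2.0, ages)
b_true = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # sums to 1
k_true = np.linspace(9.5, -9.5, years)               # mean zero, linear decline
log_m = a_true[:, None] + np.outer(b_true, k_true)

a, b, k = fit_one_pc(log_m)
fitted = a[:, None] + np.outer(b, k)
print(np.abs(fitted - log_m).max())
```

A 2-PC or 3-PC model would keep the second and third singular components as well, at the cost of more parameters.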
