
3 Data and Methods

3.2. Empirical methods


3.2.1. Survival analysis

The promptness with which a child dies before reaching one year of age, as influenced by the IBI and a vector of covariates, could in principle be measured with a logistic regression or another probabilistic approach. The issue is that we would then only be measuring the likelihood of the event occurring conditional on the control variables. A variable such as infant mortality does not only record that the event (death) happens, but also when it happens. Timing matters because there is a 12-month window within which a death is classified as an 'infant death' rather than any other category. Another crucial factor is that child mortality accelerates dramatically the closer it is to the birth date: large shares of the deaths classified as infant mortality are concentrated in neonatal mortality (within 28 days postpartum), and even more so in the very vulnerable group of early neonatal mortality (within the first week after childbirth). The survival function of these children is therefore non-linear and highly concentrated in the very first periods, behavior that is best captured by a survival analysis approach.

When the outcome variable of interest is the time until a specific event takes place, the statistical analysis can draw on the non-parametric and parametric estimations contained in so-called survival analysis. The dependent variable now has two important components: time, i.e. the time elapsed from the beginning of the study until the event occurs or the study ends; and the event, i.e. the indication that the individual studied experienced the outcome of interest (Kleinbaum & Klein, 2010).

In survival analysis these components are usually called the "survival time", for the time to event, and the "failure", for the occurrence of the event; the terms derive from the heavy early influence of biostatistics on these methods, which typically contemplated survival time until death, disease or other negative life outcomes (Kleinbaum & Klein, 2010).

Regarding one of the components of the analysis, the event, there are different classifications. For instance, there are single events, i.e. those that account for the duration until one event for each studied unit; these events are usually assumed to be absorbing, i.e. they can only happen once. In contrast, there is also the case of multiple events, which can be: (1) of multiple types, that is, different and absorbing events; and (2) recurrent events, when the same event is studied on repeated occasions (Skrondal & Rabe-Hesketh, 2004).


Additionally, following Hosmer, Lemeshow & May (2008), when analyzing the other important component, the 'survival time', it is key to understand the issue of "censoring". As a time measurement, the survival time has to properly define and count the units of time elapsed from a beginning to an ending point. In the process, the observation of time might become incomplete, problems known as censoring and truncation. The authors describe that an observation can be 'right-' or 'left-censored': the former occurs when the observation time finishes before the event of interest has occurred; the latter when, on the contrary, the event of interest has already happened when the observation begins. Regarding truncation, observations are incomplete because of the design of the study's selection process. It is possible to encounter left truncation, also known as delayed entry, when the time to observe an individual is deliberately delayed; as well as right truncation, or length-biased sampling, when the entire studied population has experienced the event of interest and was selected for the analysis precisely for that reason, well before the study starts. The estimations contained in survival analysis can be divided into two main methodologies:

a) Non-parametric estimations

Due to the particular issues described above, i.e. censoring and truncation, standard descriptive statistics will not properly estimate the parameters. Thus, Hosmer, Lemeshow & May (2008) suggest finding the cumulative distribution that can generate statistics in line with the parameters of interest. This measure is found in the survival function, denoted 𝑆(𝑡) = 𝑃𝑟(𝑇 > 𝑡), which expresses the probability that an observation's survival time (T) exceeds a specific point in time t. This is the most common measure, as the majority of studies are interested in subjects not failing (e.g. living) rather than experiencing the failure (e.g. death), although a focus on failure is also possible by using the hazard function.

Generally, the Kaplan-Meier survival curves are used to estimate the survival probability, since it considers all the available information from the observations, censored or uncensored. Its functional form is given by Equation 1:

\hat{S}(t_{(f-1)}) = \prod_{i=1}^{f-1} \widehat{\Pr}\left(T > t_{(i)} \mid T \geq t_{(i)}\right) \qquad (1)

This definition indicates that the estimator derives from the product of the sequence of conditional probabilities of surviving past each failure time (𝑡(𝑓)), thus aiding the observation of the shape of the survival function. The Kaplan-Meier estimator allows each observation to contribute information while it remains under the status of "surviving"; those that experienced the event or are right-censored provide information for the at-risk group, and the former then add to the number of observations that failed.
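The product-limit calculation in Equation 1 can be sketched directly. The following minimal example uses plain numpy on illustrative toy data (not the thesis data), walking through the at-risk counts n_i and deaths d_i at each ordered failure time:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimator: S(t) is the running product of (1 - d_i / n_i)
    over ordered failure times t_(i) <= t."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    uniq = np.unique(times[events])              # ordered failure times t_(i)
    surv = []
    s = 1.0
    for t in uniq:
        n_i = np.sum(times >= t)                 # number at risk just before t
        d_i = np.sum((times == t) & events)      # deaths observed at t
        s *= 1.0 - d_i / n_i                     # conditional survival past t
        surv.append((float(t), s))
    return surv

# toy data: 1 = death observed, 0 = right-censored
t = [2, 3, 3, 5, 8, 8, 9, 12]
e = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(t, e)
```

Censored observations never trigger a drop in the curve but still count in the risk sets, which is exactly how they "contribute information while surviving".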

Once the survival probability has been observed, it is key to determine whether the Kaplan-Meier survival curves are proportional or statistically equivalent across relevant groups derived from the set of covariates, particularly those depicting effects believed to be related to the survival of the study units. Specifically, it is important to measure differences among groups in order to assess the validity of including those variables in the final model. In this sense, common statistical tests, such as two-sample hypothesis tests or the rank-sum test, will not yield proper estimations when dealing with censored observations.

Hence, it is suggested to use the log-rank test, which builds on the (Kaplan-Meier) survival curves to provide evidence on differences at the population level. The log-rank test runs a sample-wide χ2 test to compare the curves, comparing per category the cell counts of observed and expected events over all failure times. Ultimately, the log-rank test serves to test the null hypothesis that there is no difference between two or more survival curves (Kleinbaum & Klein, 2010).
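A bare-bones version of the two-group observed-versus-expected computation behind the log-rank test can be sketched as follows; the function name and toy data are illustrative only:

```python
import numpy as np

def logrank_statistic(times, events, group):
    """Two-group log-rank chi-square: accumulate (observed - expected) events
    in group 1 over all failure times, then normalize by the variance."""
    times, events, group = map(np.asarray, (times, events, group))
    fail = events.astype(bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(times[fail]):
        at_risk = times >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((times == t) & fail).sum()
        d1 = ((times == t) & fail & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n                          # O1 - E1 at time t
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var                               # ~ chi-square, 1 df under H0

# two identical groups: observed equals expected at every failure time
stat_same = logrank_statistic([1, 2, 3, 1, 2, 3], [1, 1, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1])
# group 1 fails much earlier than group 0
stat_diff = logrank_statistic([1, 1, 2, 5, 6, 7], [1] * 6, [1, 1, 1, 0, 0, 0])
```

With identical curves the statistic is zero, and it grows as the curves diverge, matching the null hypothesis stated above.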

In addition to the KM survival curves, there are alternative estimators, the main one being the Nelson-Aalen estimator, which instead of estimating 𝑆(𝑡) as defined in Equation 1 is based on 𝐻(𝑡), also known as the cumulative hazard function. Its graphical representation mirrors that of the survival function in the opposite direction: as the cumulative hazard 𝐻(𝑡) increases, the survival function 𝑆(𝑡) decreases, the two being linked through 𝑆(𝑡) ≈ exp(−𝐻(𝑡)). The estimator is given by Equation 2:

\tilde{H}(t) = \sum_{t_{(i)} \leq t} \frac{d_i}{n_i} \qquad (2)

Where 𝐻(𝑡) is the cumulative hazard, given the observed deaths 𝑑𝑖 and those at risk of dying 𝑛𝑖 at each failure time. When the groups at risk are large relative to the number of events, the survival functions derived from the two estimators will not show great practical differences.
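Equation 2 is a simple running sum and can be computed by hand; a short numpy sketch on toy data (illustrative, not the thesis data):

```python
import numpy as np

def nelson_aalen(times, events):
    """Cumulative hazard H(t): sum of d_i / n_i over failure times t_(i) <= t (Equation 2)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    h = 0.0
    out = []
    for t in np.unique(times[events]):
        n_i = np.sum(times >= t)                 # at risk just before t
        d_i = np.sum((times == t) & events)      # deaths at t
        h += d_i / n_i                           # increment of the cumulative hazard
        out.append((float(t), h))
    return out

# same style of toy data as a KM calculation would use
t = [2, 3, 3, 5, 8, 8, 9, 12]
e = [1, 1, 0, 1, 0, 1, 0, 0]
na = nelson_aalen(t, e)
```

Plotting exp(−H(t)) against the Kaplan-Meier curve for the same data shows the near-agreement mentioned above when risk sets are large.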

b) Semi-parametric estimation: The Cox proportional hazard model

The non-parametric methods described above cover a variety of techniques within so-called univariate analysis. Nonetheless, survival analysis can also exploit the time-to-event framework with a set of variables that are thought to affect the occurrence of the event. Indeed, variables with strong theoretical or empirical associations should not be omitted and should be treated as potential confounders. Thus, more complex analysis is only feasible through a multivariate analysis, namely a regression model under the time-to-event framework.

One empirical method that serves this purpose is the Cox proportional hazards model, proposed by Cox (1972) as an expansion of the Kaplan-Meier line of work that incorporates regression parameters, that is, explanatory variables whose coefficients act on a baseline function of time. Following Kleinbaum & Klein (2010), the Cox proportional hazards model can be defined as:

h(t, X) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i x_i} \qquad (3)

Where the first term on the right of Equation 3, ℎ0(𝑡), represents the baseline hazard, which depends on time (𝑡) but not on the vector of explanatory variables X. The second term, the exponential expression exp(∑ 𝛽𝑖𝑥𝑖), addresses the covariates but not time, since the X's are assumed to be time-independent. This assumption is the proportional hazards assumption, which ultimately proposes that a change in an individual's explanatory factors induces a proportional change in his or her hazard rate (hence the multiplicative relation).

Among the strengths of the Cox proportional hazards model are: (1) it offers a robust estimation, i.e. its results will be consistent with those of the correct parametric model, which is in turn hard to establish; and (2) the Cox PH model relies on the hazard ratio (HR) to measure the risk of exposure to the event, as described by Hosmer, Lemeshow & May (2008). A hazard ratio equal to 1 indicates no effect; a hazard ratio under 1 expresses a reduction in hazard; and a HR greater than 1 means an increase in the hazard of exposure to the event. Additionally, the Cox PH model allows hazards to be compared among groups, functioning as a "relative-risk" ratio when interpreting binary variables, for example.

Finally, to validate the use of the Cox PH model, the proportional hazards assumption has to be assessed; in other words, the goodness of fit is assessed during post-estimation through hypothesis testing for the relevant predictors. The most widely used tool, the Schoenfeld residuals, provides a statistical test, both for individual covariates and globally, of whether or not they are related to the survival time. Testing the null hypothesis of proportional hazards, the χ2 statistic will indicate the rejection or non-rejection of that null hypothesis.

c) Extended Cox proportional hazard model for time-dependent variables

When the test rejects the null hypothesis that a predictor's effect is independent of the survival time, that predictor is concluded to be time-dependent. This invalidates the proportional hazards assumption of the Cox PH model. When encountering this situation, the Cox model can be extended to include interaction terms between the time-dependent variable and a specific function of time.

Similar to the Cox PH model definition in Equation 3, the extended model also includes a baseline hazard, denoted ℎ0(𝑡), multiplied by an exponential term. Nevertheless, according to Kleinbaum & Klein (2010), the exponential function of the extended model now includes both the familiar time-independent covariates Xi and the time-dependent covariates Xj(t), as seen on the right-hand side of Equation 4, while all predictors at time t are denoted by X(t) on the left-hand side of the equation:

h(t, X(t)) = h_0(t)\, e^{\left[\sum_{i=1}^{p_1} \beta_i x_i \,+\, \sum_{j=1}^{p_2} \delta_j x_j(t)\right]} \qquad (4)

The authors also point out the vital assumption of the extended Cox model: the effect of a time-dependent variable Xj(t) on the survival probability at time t depends on the value of that predictor at the same time t. Thus, the model provides only one coefficient for each time-dependent variable, depicted by δj in Equation 4. The statistical significance of the interaction term between a time-dependent covariate and the function of time indicates a violation of the proportional hazards assumption for that specific covariate.

d) Parametric estimation: The Weibull model

The main characteristic of parametric survival models is that they assume a known distribution for the survival time. Parametric models are preferred when an assumption on the distribution is feasible and estimated parameters can fully specify the survival and hazard functions.

Among them, the Weibull model is the most popular, as it is deemed more flexible while the hazard function remains simple, only rescaling t to a fixed power (Kleinbaum & Klein, 2010). In addition, the Weibull model is the only parametric model able to hold both the accelerated failure time (AFT) assumption and the proportional hazards (PH) assumption, making it suitable for modelling data with hazard rates that either increase or decrease over time. The hazard function under the Weibull model is given by the form:

h(t) = \lambda\, p\, t^{\,p-1} \qquad (5)

Where λ is reparametrized in terms of the predictor variables, and the newly added parameter p, also called the shape parameter, indicates the shape of the hazard function. For instance, when p > 1 the hazard function increases over time; when p = 1 the hazard function is constant and the model reduces to the exponential model; and when p < 1 the hazard function decreases as time goes on. It is because of this added parameter that the Weibull model is considered highly flexible, as it adapts to the behavior of the hazard over time rather than assuming it from the beginning.
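The three regimes of the shape parameter can be verified numerically; a minimal numpy sketch of Equation 5:

```python
import numpy as np

def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lambda * p * t**(p - 1)  (Equation 5)."""
    return lam * p * t ** (p - 1)

t = np.linspace(0.1, 5.0, 50)
increasing = weibull_hazard(t, 1.0, 2.0)    # p > 1: hazard rises with time
constant = weibull_hazard(t, 1.0, 1.0)      # p = 1: exponential model, flat hazard
decreasing = weibull_hazard(t, 1.0, 0.5)    # p < 1: hazard falls over time,
                                            # the pattern expected for infant mortality
```

The p < 1 case mirrors the concentration of deaths in the earliest periods discussed in Section 3.2.1.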

3.2.2. Empirical model

Using the pooled data from the 1998-2011 Nicaraguan DHS, the two main components of the dependent variable, event and survival time, are defined as follows: the event is a binary variable for infant mortality (under-one-year-old deaths); and the survival time is the age at death for children who died under one year of age and, for surviving children, their age up until the cutoff of the event.3 The event, infant mortality, is, according to the definitions in Section 3.2.1., a single and absorbing type of event.

Regarding censoring, the survival model for Nicaraguan children subject to the interbirth intervals involves right-censoring. This applies to all children who did experience the event, i.e. they passed away, but were not counted because they were older than one year. On the other hand, the nature of the data collected by the DHS also allows for a discussion of unobserved left-censoring: mothers could have had non-live outcomes in their reproductive history, such as terminations of pregnancy and miscarriages, which are not recorded in the DHS data. Therefore, there are children who experienced the event before being born and thus never became live births; for that same reason they are not observations in the DHS, and this information is lost completely.
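The construction of the event and survival-time variables described above can be sketched in pandas; the column names below are hypothetical placeholders, not the actual DHS recode variables:

```python
import numpy as np
import pandas as pd

# Hypothetical child records: age_at_death in months (NaN if alive at interview),
# age_months = child's age at the interview. Names are illustrative only.
kids = pd.DataFrame({
    "age_at_death": [2.0, np.nan, 14.0, 0.5, np.nan],
    "age_months":   [2.0, 30.0,  14.0, 0.5, 6.0],
})

# Event: death strictly before the first birthday (the 12-month cutoff).
kids["event"] = (kids["age_at_death"] < 12).astype(int)

# Survival time: age at death for infant deaths; otherwise follow-up is
# right-censored at 12 months (or at the interview age, if younger).
kids["stime"] = np.where(kids["event"] == 1,
                         kids["age_at_death"],
                         np.minimum(kids["age_months"], 12.0))
```

Note how the child who died at 14 months is treated as censored at 12 months, exactly the right-censoring scenario described above.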

Starting with the non-parametric estimations, the Kaplan-Meier survival curves will be estimated by groups of IBI categories and placed in the same graph; this will aid in graphically recognizing the survival differences among the IBI categories, ranging from very short and short to recommended birth intervals. It is expected to find a lower survival function and curves for the very short interval, and a higher survival probability for those with an IBI of recommended length. Moving towards a proper definition of the semi-parametric model, log-rank tests will be applied to all proposed explanatory variables to assess the differences among groups and provide evidence to support the inclusion of those variables in the final model. Finally, regarding the non-parametric methods, the consistency of the behavior found in the survival curves will be assessed through the calculation of the hazard functions using the Nelson-Aalen estimator and its graphical representations.

3 The rationale behind this is to simulate a study that individually followed each child for a period of one year from their birth date. Thus, the variable "survival time" stops counting after the first birthday for those who survived, as it simulates the study 'ending'.

Regarding the proposed semi-parametric empirical strategy, the Cox proportional hazards model will be used as depicted in Equation 3. Adjusted to the covariates selected for this study, the first model will have the form:

h(stime, X) = h_0(stime)\, e^{\sum_{i=1}^{12} \beta_i x_i} \qquad (6)

Where 𝑠𝑡𝑖𝑚𝑒 represents the survival time and 𝑋 is a vector of explanatory variables at the child level only, which includes: 𝐼𝐵𝐼 (interbirth interval); 𝑚𝑜𝑟𝑡𝑝𝑐 (mortality of preceding child); 𝑔𝑒𝑛𝑑𝑒𝑟 (gender of index child); 𝑏𝑖𝑟𝑡ℎ𝑜𝑟𝑑 (birth order); and 𝑠𝑡𝑏𝑖𝑟𝑡ℎ (singleton/multiple birth). A second model will add to the vector of explanatory variables the controls at the mother and household levels: 𝑚𝑜𝑡ℎ𝑒𝑟𝑒𝑑𝑢𝑐 (mother's educational level); 𝑚𝑖𝑠𝑐𝑎𝑟𝑟 (has had a miscarriage); 𝑊𝐼 (wealth index); 𝑡𝑜𝑡𝑎𝑑𝑢𝑙𝑡𝑠 (total number of adults living in the household); and ℎ𝑒𝑎𝑙𝑡ℎ𝑠𝑒𝑟 (remoteness of health services). Additionally, a third model will include the variables 𝑚𝑜𝑡ℎ𝑒𝑟𝑎𝑔𝑒 (age of mother at birth) and 𝑎𝑟𝑒𝑎 (area of residence). After the estimation of the three models, the Schoenfeld residuals will be used to globally test each estimation and thus determine the proportionality of hazards.

If some predictors are found to have non-proportional hazard functions, i.e. to be time-dependent, the initial Cox model will be modified into the extended Cox PH model of the form:

h(stime, X(stime)) = h_0(stime)\, e^{\left[\sum_{i=1}^{12} \beta_i x_i \,+\, \sum_{j=1}^{12} \delta_j x_j(stime)\right]} \qquad (7)

Where 𝑠𝑡𝑖𝑚𝑒 is the survival time, 𝑋𝑖 is the same vector of explanatory variables depicted in Equation 6 for all three models, and 𝑋𝑗 comprises the variables for which the null hypothesis of proportional hazards was rejected, i.e. those found to be time-dependent, entered as interaction terms with the function of time. Finally, the specification of the parametric model using the Weibull distribution will be given by Equation 5, where λ will be reparametrized in terms of the vector of covariates at the child level for model 1 and at the mother and household levels for model 2, and t will be, as in the models above, the survival time. Additionally, the Weibull model will be estimated for by-group definitions of the variables described in Table 2, with the purpose of finding evidence on which causal mechanism is more impactful.

3.2.3. Expected results

In line with the objectives of this study presented in Section 1.2., the methods described in this section aim to provide a basis to corroborate or refute the following premises:

• There is a deleterious effect of spacing childbirths less than 18 months apart in the Nicaraguan case, evidenced by a higher hazard rate for children born in this interval.

• Spacing childbirths 18-35 months or more than 36 months apart (the WHO recommended interval) is beneficial for child survival, reducing the hazard rate for children born in these intervals.

• There is evidence of the two causal mechanisms found in the literature, maternal depletion syndrome and sibling competition: variables such as death of the preceding child, birth order and singleton/multiple birth are associated with a higher hazard of infant death.

• The model can provide information on the most impactful of those mechanisms: there is a clearer effect of larger magnitude for one of the mechanisms.
