
National Taiwan University
College of Science, Department of Mathematics
Master Thesis

Bivariate Distribution of Claiming Time and Medicare Reimbursement Based on Incomplete Data

(Chinese title: Bivariate Distribution Function of Claiming Time and Medical Cost)

Tzu-Jung Huang (黃姿蓉)

Advisor: Chin-Tsang Chiang, Ph.D. (江金倉)

July, 2008

Oral Examination Committee Certification

Table of Contents

Oral Examination Committee Certification

Table of Contents

List of Tables

List of Figures

Acknowledgements

Abstract (in Chinese)

Abstract

1 Introduction

2 Estimation and Inference Procedures without a Terminal Event

2.1 Review of Huang-Louis Estimator

2.2 IPW Estimator

2.3 Imputation Estimator

2.4 Inference Procedures

3 Estimation and Inference Procedures with Terminal Events

4 Numerical Studies

4.1 Simulation Setting of $(X^o, Y^o)$

4.2 Scenario I - Without a terminal event

4.3 Scenario II - With terminal events

5 Application to Colorectal Cancer Data

6 Discussion

Bibliography

List of Tables

4.1 The averages and the standard deviations (SD) of 500 estimates $\hat F_{HL}(t,u)$ and the averages of 500 standard errors (SE) at the selected points with the sample sizes (n) of 200 and 400, and the censoring rate of 30%

4.2 The averages and the standard deviations (SD) of 500 estimates $\hat F_{IPW}(t,u)$ and the averages of 500 standard errors (SE) at the selected points with the sample sizes (n) of 200 and 400, and the censoring rate of 30%

4.3 The averages and standard deviations (SD) of 500 estimates $\hat F_{IM}(t,u)$ at the selected points with the sample sizes (n) of 200 and 400, and the censoring rate of 30%

4.4 The averages and standard deviations (SD) of 500 estimates $\hat F_{HL}(t,u)$ and the averages of 500 standard errors (SE) at the selected points with the sample sizes (n) of 200 and 400, and the censoring rate of 50%

4.5 The averages and standard deviations (SD) of 500 estimates $\hat F_{IPW}(t,u)$ and the averages of 500 standard errors (SE) at the selected points with the sample sizes (n) of 200 and 400, and the censoring rate of 50%

4.6 The averages and standard deviations (SD) of 500 estimates $\hat F_{IM}(t,u)$ at the selected points with the sample sizes (n) of 200 and 400, and the censoring rate of 50%

4.7 The empirical coverage probabilities of $\hat F_{HL}(t,u)$ at the selected points with the sample sizes (n) of 200 and 400, and the censoring rates (c.r.) of 30% and 50%

4.8 The empirical coverage probabilities of $\hat F_{IPW}(t,u)$ at the selected points with the sample sizes (n) of 200 and 400, and the censoring rates (c.r.) of 30% and 50%

4.9 The averages and standard deviations (SD) of 500 estimates $\hat F_{IPW}(t,u)$ and the averages of 500 standard errors (SE) at the selected points with the sample sizes (n) of 200 and 400, and the mixture rate of 30%

4.10 The averages and the standard deviations (SD) of 500 estimates $\hat F_{IPW}(t,u)$ and the averages of 500 standard errors (SE) at the selected points with the sample sizes (n) of 200 and 400, and the mixture rate of 50%

4.11 The empirical coverage probabilities of $\hat F_{IPW}(t,u)$ at the selected points with two sample sizes (n) of 200 and 400, and the mixture rates of censoring and death (m.r.) of 30% and 50%

5.1 The characteristics of the colorectal cancer data and subsample

5.2 The estimates of $P(D > X^o)$ and $E(Y^o)$ under different age layers and disease stages

5.3 The estimates of $P(D > X^o)$ and $E(Y^o)$ under different age-disease stage groups

List of Figures

5.1 The joint distributions of claiming time and reimbursement for different age layers and disease stages

5.2 The joint distributions of claiming time and reimbursement for different age layers and disease stages

5.3 The estimates of $P(X^o \le t \mid D > X^o)$ and $P(Y^o \le u \mid D > X^o)$ for patients with age layers of 61-70 (solid line), 71-80 (dashed line) and > 80 (dotted line)

5.4 The estimates of $P(X^o \le t \mid D > X^o)$ and $P(Y^o \le u \mid D > X^o)$ for patients with disease stages of 0 (solid line), 1 (dashed line), 2 (dotted line), 3 (dotted-dash line) and 4 (long-dashed line)

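Acknowledgements

I thank Professor Chin-Tsang Chiang for his guidance and inspiration, which took me from knowing nothing about research to having a broad understanding of topics in biostatistics. His enthusiasm and persistence in research are deeply moving. So many people deserve thanks for the completion of this thesis that I cannot name them one by one. Many words of encouragement appeared between different lines, and even more heartwarming images crowd together in memory. I have forgotten exactly what drive led me to where I am today, just as every drop of water contributes to the coming downpour, yet after each heavy rain we can never make out the face of any single drop. All I can say is that wherever I am in the future, I will not forget the joy, pride, and responsibility of being a statistician:

"The joy and job of statisticians are to listen carefully to the data, not to tell the data how to behave."

Finally, I dedicate this thesis to my dearest father and mother, who have always supported me without complaint or regret.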

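Abstract (in Chinese)

Existing research on bivariate distribution functions concentrates on data structures with two survival times and on the associated statistical inference. In contrast to most current work, this thesis is concerned with the bivariate distribution function of a survival time (claiming time) and a mark variable (the medical reimbursement generated by the claim). Based on the observed right-censored data, we use the inverse probability weighting method and the imputation method to propose more extensible estimators. We also use the inverse probability weighting method to propose an estimator of the bivariate distribution function for the situation in which terminal events occur. Furthermore, we establish the large-sample properties of these estimators and, using their Gaussian process approximations together with estimators of the variance-covariance functions, construct the corresponding confidence intervals. We carry out a series of simulations to examine the finite-sample properties of the estimators and confidence intervals, and we apply the proposed methods to the colorectal cancer data in the SEER-Medicare database.

Keywords: bivariate distribution function; mark variable; medical cost; censoring; terminal events; inverse probability weighting; imputation; Gaussian process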

Abstract

Although numerous studies have addressed bivariate failure times, little attention has been paid to the relation between a failure time (claiming time) and a mark variable (medicare reimbursement). From the viewpoint of brokers and the management of an insurance company, it is more attractive to capture the dynamic pattern, so our research focuses on the joint distribution of the claiming time and the corresponding medicare reimbursement. Based on censored survival data, we propose two estimation procedures: the inverse probability weighting (IPW) method and the imputation method. Furthermore, it is meaningful to accommodate terminal events occurring prior to the realization of the failure time, and the IPW method resolves this difficulty. Moreover, the limiting Gaussian processes of the estimated distributions and the estimators of the variance-covariance functions are developed, which enable us to construct approximate confidence regions. To investigate the finite-sample properties of the proposed estimators and the performance of the inference procedures, a class of simulations is conducted. An application to the colorectal cancer data retrieved from the Surveillance, Epidemiology, and End Results (SEER) Medicare database is also presented. Finally, we provide a brief discussion and further research topics of interest.

KEY WORDS: bivariate distribution, mark variable, medical cost, censoring, terminal events, inverse probability weighting (IPW), imputation, U-statistics, Gaussian processes

Chapter 1 Introduction

Bivariate failure time data are often encountered in biomedical contexts, for example, times to blindness in both eyes of diabetic retinopathy patients, or gap times to successive stages in the progression of acquired immunodeficiency syndrome.

Estimation methods for parallel bivariate survival times can be found in Campbell (1981), Tsai, et al. (1986), Burke (1988), Dabrowska (1988), and Akritas, et al. (2003), among others. Statistical analyses for bivariate serial gap times include the works of Wang, et al. (1998) and Lin, et al. (1999). Apart from bivariate failure time data, the strong emphasis placed on health care cost control and fiscal accountability in medicine has given rise to a new survival-type data structure, called cost data. Cost data consist of the claiming time (time to the occurrence of medicare reimbursement) and the mark variable (medical cost or reimbursement), which is possibly correlated with the claiming time and is not observed until the corresponding claiming time is reached. Rather than being censored, the mark is missing under loss to follow-up or drop-out. Thus, it is inappropriate to use the existing statistical methods for bivariate failure time data.

Even though medical cost management has received immense attention, researchers have highlighted only two primary aspects of the topic over the past decades. One is the evaluation of mean cost within the period of interest based on complete or censored data. The other is the cost-effectiveness (CE) ratio, the ratio of the mean difference in costs to the mean difference in effectiveness. The CE ratio is a practical measure of the trade-off between budget constraints and patient benefits. In contrast, from the viewpoint of brokers and the management of an insurance company, it is more crucial and appealing to explore the joint distribution of the claiming time $X^o$ and the medicare reimbursement $Y^o$. Few attempts have been made in this direction except for Huang and Louis (1998) and Hudgens, et al. (2007). Based on the censored cost data $\{X_i, \delta_i, Y_i\}_{i=1}^n$ with $X_i = X_i^o \wedge C_i$, $Y_i = \delta_i Y_i^o$, $\delta_i = I(X_i^o \le C_i)$ being the censoring status, and $C_i$ being the censoring time, Huang and Louis obtained an estimator for the joint distribution $F(t,u) = P(X^o \le t, Y^o \le u)$ by estimating the cumulative mark-specific hazard function $\Lambda(t,u)$. For this data setting, we propose an inverse probability weighting (IPW) estimator and an imputation estimator using the induced binary responses $B_i(t,u) = I(X_i^o \le t, Y_i^o \le u)$, $i = 1, \dots, n$.
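To make the data structure concrete, the following is a minimal sketch of how the observed triple $(X_i, \delta_i, Y_i)$ and the induced response $B_i(t,u)$ arise from latent values. The function names `observe` and `induced_response` are hypothetical and are not part of the thesis; the sketch only illustrates that $B_i(t,u)$ is known when $X_i > t$ (it equals 0) or when $X_i \le t$ with $\delta_i = 1$, and is unavailable otherwise.

```python
import numpy as np

def observe(x_latent, y_latent, c):
    """Return (X, delta, Y) with X = min(X^o, C), delta = I(X^o <= C), Y = delta * Y^o."""
    x_latent = np.asarray(x_latent, dtype=float)
    y_latent = np.asarray(y_latent, dtype=float)
    c = np.asarray(c, dtype=float)
    delta = (x_latent <= c).astype(int)
    x_obs = np.minimum(x_latent, c)
    y_obs = delta * y_latent          # the mark is only seen when the claim is observed
    return x_obs, delta, y_obs

def induced_response(x_obs, delta, y_obs, t, u):
    """B_i(t, u) where it is determined by the observed data, np.nan otherwise."""
    x_obs = np.asarray(x_obs, dtype=float)
    delta = np.asarray(delta, dtype=int)
    y_obs = np.asarray(y_obs, dtype=float)
    b = np.full(x_obs.shape, np.nan)
    b[x_obs > t] = 0.0                # X^o >= X > t, hence B = 0
    known = (x_obs <= t) & (delta == 1)
    b[known] = (y_obs[known] <= u).astype(float)
    return b                          # nan marks subjects censored at or before t
```

For example, with latent claiming times (1, 2, 3), costs (10, 20, 30) and censoring times (2, 1, 5), the second subject is censored at time 1, so its response at t = 1.5 is unavailable.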

In a colorectal cancer cohort study, follow-up might be terminated before the study endpoint is reached. Terminal events, such as death, preclude any further medical costs and are clearly related to $(X^o, Y^o)$. For instance, it is natural to anticipate that patients in poor health, who are prone to death, might incur larger medical costs. When death occurs before the first claim, the medicare reimbursement of the deceased is zero rather than missing. Instead of only considering the joint distribution of claiming times and reimbursements, it is more meaningful to study the joint distribution for those who have a medicare reimbursement prior to death, say, $F^*(t,u) = P(X^o \le t, Y^o \le u \mid D > X^o)$, where $D$ denotes the terminal time. In this study, our research focuses on finding an appropriate estimation method for $F^*(t,u)$ based on the transformed cost data $\{X_i, \delta_{X_i^o}, \delta_{C_i}, Y_i\}_{i=1}^n$, where $X_i = X_i^o \wedge C_i \wedge D_i$, $Y_i = Y_i^o \delta_{X_i^o}$, $\delta_{X_i^o} = I(X_i = X_i^o)$, and $\delta_{C_i} = I(X_i = C_i)$. In addition, we provide estimators for the probability of a cost occurring before death, $P(D > X^o)$, and for the mean cost of patients, which can be derived as $E[Y^o] = \{\int u\, d_u F^*(t,u)\} P(D > X^o)$.

We divide the rest of this thesis into the following parts. In Chapter 2, we briefly review the estimation method of Huang and Louis and propose the IPW and imputation estimators for $F(t,u)$. The asymptotic properties and inference procedures are also developed in that chapter. In Chapter 3, we further deal with the occurrence of terminal events in estimation. In Chapter 4, a class of simulations is conducted to investigate the finite-sample properties of the estimators and the performance of our proposed procedures. We present an application of our methods to the colorectal cancer data retrieved from the Surveillance, Epidemiology, and End Results (SEER) Medicare database in Chapter 5. Finally, a concise discussion and a further research topic are provided in Chapter 6.

Chapter 2 Estimation and Inference Procedures without a Terminal Event

2.1 Review of Huang-Louis Estimator

Based on the data $\{(X_i, \delta_i), Y_i\}_{i=1}^n$, Huang and Louis (1998) proposed a nonparametric estimator $\hat F_{HL}(t,u)$ for $F(t,u)$. Before introducing their estimation method, we present some notation. Let $S_{X^o}(t) = 1 - F(t,\infty)$, $S_X(t) = P(X > t)$, $F_{XY}(t,u) = P(X \le t, Y \le u, \delta = 1)$, $S_C(t) = P(C > t)$, and let

$$\Lambda(t,u) = \int_0^t S_{X^o}^{-1}(s)\, d_s F(s,u)$$

be the cumulative cost-specific hazard function, where $d_s F(s,u)$ denotes Lebesgue-Stieltjes integration over $s$ for fixed $u$. It was derived that $F(t,u)$ can be expressed as

$$F(t,u) = \int_0^t S_{X^o}(s)\, d_s\Lambda(s,u) = \int_0^t \prod_{[0,s)} \{1 - d_v\Lambda_{X^o}(v)\}\, d_s\Lambda(s,u), \tag{2.1.1}$$

where $\prod$ denotes the product integral and $\Lambda_{X^o}(v) = \Lambda(v,\infty)$.

Under the assumption of random censorship (A1: $C$ is independent of $(X^o, Y^o)$), one has

$$S_X(t) = S_{X^o}(t) S_C(t) \quad \text{and} \quad d_t F_{XY}(t,u) = S_C(t)\{d_t F(t,u)\}. \tag{2.1.2}$$

Substituting the empirical estimators $\hat F_{XY}(t,u) = n^{-1}\sum_{i=1}^n \delta_i B_i(t,u)$ and $\hat S_X(t) = n^{-1}\sum_{i=1}^n I(X_i > t)$ for $F_{XY}(t,u)$ and $S_X(t)$, an estimator for $\Lambda(t,u)$ was proposed as

$$\hat\Lambda(t,u) = \int_0^t \frac{d_s \hat F_{XY}(s,u)}{\hat S_X(s)}. \tag{2.1.3}$$

It is straightforward to obtain an estimator from (2.1.1) and (2.1.3) as

$$\hat F_{HL}(t,u) = \int_0^t \prod_{[0,s)} \{1 - d_v\hat\Lambda_{X^o}(v)\}\, d_s\hat\Lambda(s,u), \tag{2.1.4}$$

with $\hat\Lambda_{X^o}(v) = \hat\Lambda(v,\infty)$. In the following theorem, we state the uniform consistency and the asymptotic Gaussian process of $\hat F_{HL}(t,u)$. Two further assumptions are made throughout the thesis for the main results: (A2) the distributions $S_{X^o}(t)$ and $S_C(t)$ have no common jump points, and (A3) $S_C(t)$ and $F(t,u)$ are absolutely continuous on $\Omega = \{(t,u) : 0 < t \le \tau, u > 0\}$ with $\tau = \sup\{t : S_X(t) \ge \epsilon > 0\}$ for some $\epsilon > 0$.

Theorem 2.1.1. Suppose that assumptions (A1)-(A3) are satisfied. Then,

$$\sup_{(t,u)\in\Omega} |\hat F_{HL}(t,u) - F(t,u)| \xrightarrow{p} 0, \tag{2.1.5}$$

and $n^{1/2}(\hat F_{HL}(t,u) - F(t,u))$ converges weakly to a mean-zero Gaussian process with variance-covariance function $\Gamma_1(w_1,w_2) = \mathrm{Cov}(Z_1(w_1), Z_1(w_2))$ for $w_j = (t_j, u_j) \in \Omega$, $j = 1, 2$, where

$$Z_1(t,u) = \int_0^t F(s,u)\, d_s\varphi(s,\infty) + \int_0^t S_{X^o}(s)\, d_s\varphi(s,u) - F(t,u)\varphi(t,u)$$

and

$$\varphi(t,u) = S_X^{-1}(X)\,\delta B(t,u) - \int_0^t S_X^{-2}(s) I(X \ge s)\, d_s F_{XY}(s,u).$$
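As an illustration of (2.1.4), the sketch below evaluates a Huang-Louis type estimate at a single point $(t,u)$. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `huang_louis` is hypothetical, the observed times are assumed to have no ties, and the risk set at an observed time $s$ is taken as the number of subjects with $X_j \ge s$ (the usual at-risk convention) rather than the strict inequality used in the definition of $\hat S_X$, so that the increment at the largest observation is well defined.

```python
import numpy as np

def huang_louis(x_obs, delta, y_obs, t, u):
    """Sum over uncensored X_k <= t with Y_k <= u of S_hat(X_k-) / (number at risk)."""
    x = np.asarray(x_obs, dtype=float)
    d = np.asarray(delta, dtype=int)
    y = np.asarray(y_obs, dtype=float)
    order = np.argsort(x, kind="stable")
    x, d, y = x[order], d[order], y[order]
    n = len(x)
    at_risk = n - np.arange(n)        # subjects with X_j >= x[k] (no ties assumed)
    surv_before = 1.0                 # product-limit estimate of S_{X^o} just before x[k]
    total = 0.0
    for k in range(n):
        if x[k] > t:
            break
        if d[k] == 1:
            if y[k] <= u:
                # increment of the mark-specific cumulative hazard, weighted by survival
                total += surv_before / at_risk[k]
            # update the product integral with the hazard increment of X^o
            surv_before *= 1.0 - 1.0 / at_risk[k]
    return total
```

Without censoring, each uncensored observation receives weight 1/n, so the sketch reduces to the empirical joint distribution function, which is a convenient sanity check.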

2.2 IPW Estimator

The IPW technique has been widely adopted in estimation to deal with the biased sample caused by censoring. The main idea is to use the subjects with available $B_i(t,u)$'s and to weight each observation by the inverse of the selection probability $\pi_t = P(V_t = 1 \mid X^o)$, where $V_t = V_{1t} + V_{2t}$ with $V_{1t} = I(X > t)$ and $V_{2t} = I(X \le t, \delta = 1)$. Under random censorship (A1), the selection probability $\pi_t$ is derived to be $V_{1t} S_C(t) + V_{2t} S_C(X^o)$. By the property

$$E[V_t \pi_t^{-1}(B(t,u) - F(t,u))] = 0 \tag{2.2.1}$$

and substituting a consistent estimator $\hat S_C(t)$ for $S_C(t)$, an IPW estimator for $F(t,u)$ can be constructed as

$$\hat F_{IPW}(t,u) = \frac{\sum_{i=1}^n \delta_i \hat S_C^{-1}(X_i^o) B_i(t,u)}{\sum_{i=1}^n V_{it}\hat\pi_{it}^{-1}}, \tag{2.2.2}$$

where $\hat\pi_{it} = V_{i1t}\hat S_C(t) + V_{i2t}\hat S_C(X_i^o)$. Naturally, the Kaplan-Meier estimator is used to estimate $S_C(t)$. The asymptotic Gaussian process of $n^{1/2}(\hat F_{IPW}(t,u) - F(t,u))$ is established below.
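The estimator in (2.2.2) can be sketched in a few lines. This is a hedged illustration rather than the thesis code: the helper names `km_censoring_survival` and `ipw_estimate` are hypothetical, the Kaplan-Meier estimate of $S_C$ is computed by treating censored observations ($\delta_i = 0$) as the events, and the observed times are assumed to have no ties.

```python
import numpy as np

def km_censoring_survival(x_obs, delta):
    """Return s -> Kaplan-Meier estimate of S_C(s) = P(C > s); delta == 0 marks a censoring."""
    x = np.asarray(x_obs, dtype=float)
    d = np.asarray(delta, dtype=int)
    order = np.argsort(x, kind="stable")
    x, d = x[order], d[order]
    at_risk = len(x) - np.arange(len(x))
    factors = np.where(d == 0, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)

    def s_c(s):
        idx = np.searchsorted(x, s, side="right")   # number of observations <= s
        return 1.0 if idx == 0 else float(surv[idx - 1])

    return s_c

def ipw_estimate(x_obs, delta, y_obs, t, u):
    """IPW estimate of F(t, u): weighted count of {X <= t, Y <= u, delta = 1} over total weight."""
    x = np.asarray(x_obs, dtype=float)
    d = np.asarray(delta, dtype=int)
    y = np.asarray(y_obs, dtype=float)
    s_c = km_censoring_survival(x, d)
    v2 = (x <= t) & (d == 1)                        # V_{2t} = 1: claim observed by time t
    n_v1 = int(np.sum(x > t))                       # V_{1t} = 1: still under observation at t
    w2 = np.array([1.0 / s_c(xi) for xi in x[v2]])
    numerator = float(np.sum(w2 * (y[v2] <= u)))
    denominator = float(np.sum(w2)) + (n_v1 / s_c(t) if n_v1 > 0 else 0.0)
    return numerator / denominator if denominator > 0 else float("nan")
```

When no observation is censored, every weight equals one and the denominator equals n, so the estimate coincides with the empirical joint distribution function.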

Theorem 2.2.1. Suppose that assumptions (A1)-(A3) hold. Then, $n^{1/2}(\hat F_{IPW}(t,u) - F(t,u))$ converges weakly to a Gaussian process with mean zero and variance-covariance function $\Gamma_2(w_1,w_2) = \mathrm{Cov}(Z_{i2}(w_1), Z_{i2}(w_2))$, where $Z_{i2}(t,u) = E[\psi_{ij}(t,u) + \psi_{ji}(t,u) \mid ((X_i,\delta_i), Y_i)]$ with

$$M_{C_i}(t) = I(X_i \le t)(1 - \delta_i) + \int_0^t I(X_i \ge v)\, d(\ln S_C(v))$$

and

$$\psi_{ij}(t,u) = \left\{ \frac{V_{i1t}}{S_C(t)}\left(1 + \frac{1}{n}\int_0^t \frac{dM_{C_j}(s)}{S_X(s)}\right) + \frac{V_{i2t}}{S_C(X_i^o)}\left(1 + \frac{1}{n}\int_0^{X_i^o} \frac{dM_{C_j}(s)}{S_X(s)}\right) \right\}(B_i(t,u) - F(t,u)).$$

Proof. From (2.2.2), one has

$$n^{1/2}(\hat F_{IPW}(t,u) - F(t,u)) = \frac{n^{-1/2}\sum_{i=1}^n V_{it}\hat\pi_{it}^{-1}(B_i(t,u) - F(t,u))}{n^{-1}\sum_{i=1}^n V_{it}\hat\pi_{it}^{-1}}. \tag{2.2.3}$$

By the boundedness of the $V_{it}$'s, the uniform convergence of $\hat S_C(t)$, and the Euclidean class of $\{V_{it}\pi_{it}^{-1} : 0 < t \le \tau\}$ (cf. Akritas (1994), Pakes and Pollard (1989), and Pollard (1990)), it follows that $n^{-1}\sum_{i=1}^n V_{it}\hat\pi_{it}^{-1}$ converges uniformly to one. Thus,

$$n^{1/2}(\hat F_{IPW}(t,u) - F(t,u)) = n^{-1/2}\sum_{i=1}^n V_{it}\hat\pi_{it}^{-1}(B_i(t,u) - F(t,u)) + r_{1n}(t,u), \tag{2.2.4}$$

where $\sup_{(t,u)\in\Omega}|r_{1n}(t,u)| = o_p(1)$. The first-order Taylor expansion of the dominating term in (2.2.4) with respect to $\hat S_C(t)$ around $S_C(t)$ yields

$$n^{-1/2}\sum_{i=1}^n V_{it}\hat\pi_{it}^{-1}(B_i(t,u) - F(t,u)) = n^{-1/2}\sum_{i=1}^n V_{it}\pi_{it}^{-1}(B_i(t,u) - F(t,u)) - n^{-1/2}\sum_{i=1}^n \left\{ \frac{V_{i1t}}{S_C(t)}\left(\frac{\hat S_C(t)}{S_C(t)} - 1\right) + \frac{V_{i2t}}{S_C(X_i^o)}\left(\frac{\hat S_C(X_i^o)}{S_C(X_i^o)} - 1\right) \right\}(B_i(t,u) - F(t,u)) + r_{2n}(t), \tag{2.2.5}$$

where $\sup_{t\in(0,\tau]}|r_{2n}(t)| = o_p(n^{-1/2})$. Since $\{\hat S_C(t)/S_C(t) - 1\}$ can be uniformly approximated by $\{-n^{-1}\sum_{i=1}^n \int_0^t dM_{C_i}(u)/S_X(u)\}$ (cf. Fleming and Harrington (1991)), it follows from (2.2.4)-(2.2.5) that

$$n^{1/2}(\hat F_{IPW}(t,u) - F(t,u)) = \{n^{1/2}(n-1)\}^{-1}\sum_{i \ne j}\psi_{ij}(t,u) + r_{3n}(t,u), \tag{2.2.6}$$

where $\sup_{(t,u)\in\Omega}|r_{3n}(t,u)| = o_p(n^{-1/2})$. By the Euclidean class of $\{\psi_{ij}(t,u) : (t,u) \in \Omega\}$, the decomposition of a U-statistic into the sum of degenerate U-statistics, and Corollary 4 of Sherman (1994), $n^{1/2}(\hat F_{IPW}(t,u) - F(t,u))$ can be uniformly approximated by a sum of independent and identically distributed random quantities, $n^{-1/2}\sum_{i=1}^n Z_{i2}(t,u)$. By the functional central limit theorem (Pollard (1990)) and the uniform convergence of $n^{-1}\sum_{i=1}^n \{V_{it}\hat\pi_{it}^{-1}\}$ to one, the proof of Theorem 2.2.1 is completed.

2.3 Imputation Estimator

Applying the method of Buckley and James (1979), we propose an alternative estimator based on the considered data setting. In this estimation method, the unavailable statuses $B_i(t,u)$ are replaced by the corresponding expectations $E[B_i(t,u) \mid X_i, \delta_i = 0]$. Let $\tilde B_i(t,u) = I(V_{it} = 1)B_i(t,u) + I(V_{it} = 0)E[B_i(t,u) \mid X_i, \delta_i = 0]$. A direct calculation ensures that

$$E[\tilde B_i(t,u) \mid X_i, \delta_i] = \sum_{k=0}^{1} E[\tilde B_i(t,u) \mid X_i, \delta_i] I(V_{it} = k) = \sum_{k=0}^{1} E[B_i(t,u) \mid X_i, \delta_i] I(V_{it} = k) = E[B_i(t,u) \mid X_i, \delta_i], \tag{2.3.1}$$

which implies that $E[\tilde B_i(t,u)] = E[B_i(t,u)] = F(t,u)$. Under assumption (A1), we further derive that

$$E[B_i(t,u) \mid X_i = x, \delta_i = 0] = S_{X^o}^{-1}(x)\{F(t,u) - F(x,u)\}. \tag{2.3.2}$$

Following from (2.3.1)-(2.3.2), an estimation procedure for $F(t,u)$ is proposed. The imputation estimator $\hat F_{IM}(t,u)$ is obtained by solving

$$n^{-1}\sum_{i=1}^n (\tilde B_i(t,u) - F(t,u)) = n^{-1}\sum_{i=1}^n \left\{ (B_i(t,u) - F(t,u)) I(V_{it} = 1) + \frac{(1 - \hat S_{X^o}(X_i))F(t,u) - F(X_i,u)}{\hat S_{X^o}(X_i)} I(V_{it} = 0) \right\} = 0, \tag{2.3.3}$$

where $\hat S_{X^o}(t)$ is the Kaplan-Meier estimator of $S_{X^o}(t)$.

Note that $F(X_i,u)$ in (2.3.3) is unknown when $V_{it} = 0$. Generally, a consistent estimator $\hat F(X_i,u)$ is substituted for $F(X_i,u)$. Let $A_t$ be the collection of ordered censoring times $C_{(1)} < C_{(2)} < \cdots < C_{(k_t)} \le t$ with size $k_t$. The joint distribution $F(C_{(1)},u)$ is first estimated by

$$\hat F_{IM}(C_{(1)},u) = n^{-1}\sum_{i=1}^n B_i(C_{(1)},u). \tag{2.3.4}$$

Subsequently, the estimator for $F(C_{(j)},u)$, $j = 2, \dots, k_t$, can be obtained as

$$\hat F_{IM}(C_{(j)},u) = \frac{\sum_{i=1}^n B_i(C_{(j)},u) I(V_{iC_{(j)}} = 1) - \sum_{l=1}^{j-1}\hat S_{X^o}^{-1}(C_{(l)})\hat F_{IM}(C_{(l)},u)}{\sum_{i=1}^n I(V_{iC_{(j)}} = 1) - \sum_{l=1}^{j-1}\hat S_{X^o}^{-1}(C_{(l)})(1 - \hat S_{X^o}(C_{(l)}))}. \tag{2.3.5}$$

An explicit expression for the estimator of $F(t,u)$ is then derived to be

$$\hat F_{IM}(t,u) = \frac{\sum_{i=1}^n B_i(t,u) I(V_{it} = 1) - \sum_{i=1}^n \hat S_{X^o}^{-1}(X_i)\hat F_{IM}(X_i,u) I(V_{it} = 0)}{\sum_{i=1}^n I(V_{it} = 1) - \sum_{i=1}^n \hat S_{X^o}^{-1}(X_i)(1 - \hat S_{X^o}(X_i)) I(V_{it} = 0)}. \tag{2.3.6}$$

Remark: Since the number of censoring times prior to $t$ is random and might be considerably large, it becomes cumbersome to develop an inference procedure for $F(t,u)$ based on $\hat F_{IM}(t,u)$. Currently, there is no existing statistical methodology to overcome this obstacle.
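The step from the estimating equation (2.3.3) to the closed form (2.3.6) is a short rearrangement, spelled out below as a worked derivation. It is offered as a reading aid under one assumption: that the two sums in the numerator and in the denominator of (2.3.6) are joined by minus signs, which is what solving (2.3.3) for $F(t,u)$ gives. The shorthand $\hat S_i = \hat S_{X^o}(X_i)$ is introduced here only for brevity.

```latex
\begin{align*}
0 &= \sum_{i=1}^n \bigl(B_i(t,u) - F(t,u)\bigr) I(V_{it}=1)
   + \sum_{i=1}^n \frac{(1-\hat S_i)\,F(t,u) - \hat F(X_i,u)}{\hat S_i}\, I(V_{it}=0) \\[4pt]
\Longrightarrow\quad
F(t,u)\Bigl[\sum_{i=1}^n I(V_{it}=1) - \sum_{i=1}^n \hat S_i^{-1}(1-\hat S_i)\, I(V_{it}=0)\Bigr]
  &= \sum_{i=1}^n B_i(t,u)\, I(V_{it}=1) - \sum_{i=1}^n \hat S_i^{-1}\hat F(X_i,u)\, I(V_{it}=0) \\[4pt]
\Longrightarrow\quad
\hat F_{IM}(t,u)
  &= \frac{\sum_{i=1}^n B_i(t,u)\, I(V_{it}=1) - \sum_{i=1}^n \hat S_i^{-1}\hat F(X_i,u)\, I(V_{it}=0)}
          {\sum_{i=1}^n I(V_{it}=1) - \sum_{i=1}^n \hat S_i^{-1}(1-\hat S_i)\, I(V_{it}=0)} .
\end{align*}
```

As a plausibility check under (A1), the population versions of the numerator and denominator, divided by $n$, are $\int_0^t S_C(s)\,d_sF(s,u) + \int_0^t F(c,u)\,dS_C(c) = S_C(t)F(t,u)$ and $S_C(t)$ respectively, so their ratio is $F(t,u)$.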

2.4 Inference Procedures

In this subsection, pointwise confidence intervals and simultaneous confidence bands for $F(t,u)$ are constructed based on the asymptotic Gaussian processes of $\hat F_{HL}(t,u)$ and $\hat F_{IPW}(t,u)$. As shown in Theorem 2.1.1, the process $n^{1/2}(\hat F_{HL}(t,u) - F(t,u))$ is uniformly asymptotically equivalent to $n^{-1/2}\sum_{i=1}^n Z_{i1}(t,u)$. The variance-covariance function of $\hat F_{HL}(t,u)$ is suggested to be estimated by $\hat\Gamma_1(w_1,w_2) = n^{-1}\sum_{i=1}^n \hat Z_{i1}(w_1)\hat Z_{i1}(w_2)$, where

$$\hat Z_{i1}(t,u) = \int_0^t \hat F_{HL}(s,u)\, d_s\hat\varphi_i(s,\infty) + \int_0^t \hat S_{X^o}(s)\, d_s\hat\varphi_i(s,u) - \hat F_{HL}(t,u)\hat\varphi_i(t,u) \tag{2.4.1}$$

with $\hat\varphi_i(t,u) = \hat S_X^{-1}(X_i)\delta_i B_i(t,u) - \int_0^t \hat S_X^{-2}(s) I(X_i \ge s)\, d_s\hat F_{XY}(s,u)$. Similarly, the process $n^{1/2}(\hat F_{IPW}(t,u) - F(t,u))$ is asymptotically equivalent to $n^{-1/2}\sum_{i=1}^n Z_{i2}(t,u)$. Let $\hat M_{C_j}(t) = I(X_j \le t)(1 - \delta_j) + \int_0^t I(X_j \ge v)\, d(\ln\hat S_C(v))$. An estimator for the variance-covariance function $\Gamma_2(w_1,w_2)$ is proposed to be

$$\hat\Gamma_2(w_1,w_2) = n^{-1}\sum_{i=1}^n \hat Z_{i2}(w_1)\hat Z_{i2}(w_2), \tag{2.4.2}$$

where $\hat Z_{i2}(t,u) = (n-1)^{-1}\sum_{j \ne i}(\hat\psi_{ij}(t,u) + \hat\psi_{ji}(t,u))$ and

$$\hat\psi_{ij}(t,u) = \left( \frac{1}{n}\frac{V_{i1t}}{\hat S_C(t)}\int_0^t \frac{d\hat M_{C_j}(s)}{\hat S_X(s)} + \frac{1}{n}\frac{V_{i2t}}{\hat S_C(X_i^o)}\int_0^{X_i^o} \frac{d\hat M_{C_j}(s)}{\hat S_X(s)} \right)(B_i(t,u) - \hat F_{IPW}(t,u)).$$

The uniform consistency of $\hat\Gamma_1(w_1,w_2)$ and $\hat\Gamma_2(w_1,w_2)$ is established in the following theorem.

Theorem 2.4.1. Suppose that assumptions (A1)-(A3) hold. Then,

$$\sup_{w_1,w_2\in\Omega} |\hat\Gamma_l(w_1,w_2) - \Gamma_l(w_1,w_2)| \xrightarrow{p} 0, \quad \text{as } n \to \infty,\ l = 1, 2. \tag{2.4.3}$$

Proof. By the uniform convergence of empirical estimators (Pollard (1990)), we can show that $\hat F_{XY}(t,u)$ and $\hat S_X(t)$ converge to $F_{XY}(t,u)$ and $S_X(t)$ uniformly in $(t,u)$. Together with (2.4.1), Theorem 2.1.1, and the boundedness of the $\delta_i B_i(t,u)$'s, it is ensured that

$$\sup_{(t,u)\in\Omega} |\hat\varphi_i(t,u) - \varphi_i(t,u)| \xrightarrow{p} 0 \quad \text{as } n \to \infty,\ i = 1, \dots, n. \tag{2.4.4}$$

One further derives, using integration by parts, that

$$\sup_{(t,u)\in\Omega} |\hat Z_{i1}(t,u) - Z_{i1}(t,u)| \xrightarrow{p} 0 \quad \text{as } n \to \infty,\ i = 1, \dots, n. \tag{2.4.5}$$

From (2.4.5), Theorem 2.2.1, and the inequality

$$|\hat\Gamma_1(w_1,w_2) - \Gamma_1(w_1,w_2)| \le \left|\hat\Gamma_1(w_1,w_2) - n^{-1}\sum_{i=1}^n Z_{i1}(w_1)Z_{i1}(w_2)\right| + \left|n^{-1}\sum_{i=1}^n Z_{i1}(w_1)Z_{i1}(w_2) - \Gamma_1(w_1,w_2)\right|, \tag{2.4.6}$$

the uniform consistency of $\hat\Gamma_1(w_1,w_2)$ is obtained.

For the uniform consistency of $\hat\Gamma_2(w_1,w_2)$, it is implied from (2.4.2) and Theorem 2.2.1 that

$$\hat\Gamma_2(w_1,w_2) = n^{-1}\sum_{i=1}^n \hat Z_{i2}(w_1)\hat Z_{i2}(w_2) = \{n(n-1)^2\}^{-1}\sum_{i,j,k}(\hat\psi_{ij}(w_1) + \hat\psi_{ji}(w_1))(\hat\psi_{ik}(w_2) + \hat\psi_{ki}(w_2)). \tag{2.4.7}$$

The expression of $\hat\Gamma_2(w_1,w_2)$ in (2.4.7) and the uniform convergence of $\hat S_X(t)$, $\hat S_C(t)$, and $\hat F_{IPW}(t,u)$ entail that $\hat\Gamma_2(w_1,w_2)$ is uniformly approximated by

$$\{n(n-1)(n-2)\}^{-1}\sum_{i \ne j \ne k}(\psi_{ij}(w_1) + \psi_{ji}(w_1))(\psi_{ik}(w_2) + \psi_{ki}(w_2)). \tag{2.4.8}$$

Since $\{\psi_{ij}(t,u) : (t,u) \in \Omega\}$ is Euclidean, the term in (2.4.8) converges to $E[(\psi_{ij}(w_1) + \psi_{ji}(w_1))(\psi_{ik}(w_2) + \psi_{ki}(w_2))] = E[Z_{i2}(w_1)Z_{i2}(w_2)]$ uniformly in $(w_1,w_2)$.

The limiting Gaussian processes in Theorems 2.1.1 and 2.2.1, together with the estimated variance-covariance functions, enable us to construct an approximate $(1-\alpha)$ pointwise confidence interval for $F(t,u)$ via either

$$\hat F_{HL}(w) \pm z_{1-\alpha/2}\,\hat\Gamma_1^{1/2}(w,w) \quad \text{or} \quad \hat F_{IPW}(w) \pm z_{1-\alpha/2}\,\hat\Gamma_2^{1/2}(w,w), \tag{2.4.9}$$

where $z_{1-\alpha}$ is the $100(1-\alpha)$th percentile of a univariate standard normal distribution and $w = (t,u)$. To draw inference on the pattern of $F(t,u)$, a simultaneous confidence band can also be established based on the quantities

$$\hat K_l = \sup_{w\in\Upsilon}\left| \frac{n^{-1/2}\sum_{i=1}^n M_i\hat Z_{il}(w)}{\hat\Gamma_l^{1/2}(w,w)} \right|, \quad l = 1, 2, \tag{2.4.10}$$

where $\Upsilon$ is a region of interest within $\Omega$, and $\{M_i : i = 1, \dots, n\}$ are independent realizations of a standard normal variable that are independent of $\{(X_i,\delta_i), Y_i\}_{i=1}^n$. An approximate $(1-\alpha)$ simultaneous confidence band for $F(t,u)$ is then constructed via either

$$\hat F_{HL}(w) \pm n^{-1/2}Q_{1-\alpha}(\hat K_1)\hat\Gamma_1^{1/2}(w,w) \quad \text{or} \quad \hat F_{IPW}(w) \pm n^{-1/2}Q_{1-\alpha}(\hat K_2)\hat\Gamma_2^{1/2}(w,w), \tag{2.4.11}$$

where $Q_{1-\alpha}(\hat K_l)$ is the $100(1-\alpha)$th percentile of realizations of (2.4.10) computed from a large number of generations of $\{M_i : i = 1, \dots, n\}$, $l = 1, 2$.
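The simultaneous band of this section is built by multiplier resampling: the data-dependent terms are held fixed and only the standard normal multipliers $M_i$ are redrawn. The sketch below illustrates the computation on a finite grid of points $w$. It is a sketch under assumptions, not the thesis code: the function name `simultaneous_band` is hypothetical, the matrix `z_hat` of estimated terms $\hat Z_{il}(w)$ is assumed to be supplied by the user, and the supremum over the region of interest is replaced by a maximum over the grid.

```python
import numpy as np

def simultaneous_band(f_hat, z_hat, alpha=0.05, n_draws=1000, seed=0):
    """Multiplier-based band.

    f_hat : (m,) estimates of F at m grid points.
    z_hat : (n, m) estimated influence terms, one row per subject.
    Returns (lower, upper, q) where q is the estimated (1 - alpha) quantile of K.
    """
    f_hat = np.asarray(f_hat, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    n = z_hat.shape[0]
    gamma_diag = np.mean(z_hat ** 2, axis=0)          # Gamma_hat(w, w) = n^{-1} sum_i Z_i(w)^2
    sd = np.sqrt(gamma_diag)                          # assumed strictly positive on the grid
    rng = np.random.default_rng(seed)
    multipliers = rng.standard_normal((n_draws, n))   # independent N(0, 1) draws M_i
    process = multipliers @ z_hat / np.sqrt(n)        # n^{-1/2} sum_i M_i Z_i(w), per draw
    k_draws = np.max(np.abs(process) / sd, axis=1)    # sup over the grid, per draw
    q = float(np.quantile(k_draws, 1.0 - alpha))
    half_width = q * sd / np.sqrt(n)
    return f_hat - half_width, f_hat + half_width, q
```

With a single grid point the standardized process is exactly standard normal given the data, so the quantile returned for alpha = 0.05 should be close to 1.96; with more grid points it is larger, which is what makes the band simultaneous.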

Chapter 3 Estimation and Inference Procedures with Terminal Events

The target in this chapter is $F^*(t,u) = P(X^o \le t, Y^o \le u \mid D > X^o)$. By adapting the IPW method of Chapter 2 to the presence of terminal events, we use the subjects with available $B_i(t,u)$'s and weight each subject by the inverse of $\pi_i = P(\delta_{X_i^o} = 1 \mid X_i^o)$. Under the assumption of random censorship (AA1: $C$ is independent of $(X^o, Y^o, D)$) and assumption (AA2: the distribution functions of $X^o$, $C$ and $D$ do not have jump points in common), $\pi_i$ is derived to be $S_C(X_i^o)$. Using

$$E[\delta_{X^o} S_C^{-1}(X^o)(B(t,u) - F^*(t,u))] = 0 \tag{3.1.1}$$

and replacing $S_C(t)$ by a consistent estimator $\hat S_C(t)$, $F^*(t,u)$ is proposed to be estimated by

$$\hat F^*(t,u) = \frac{\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o) B_i(t,u)}{\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o)}. \tag{3.1.2}$$

Generally, it is reasonable to estimate $S_C(t)$ by the Kaplan-Meier estimator. Based on (3.1.2), the probability $P(D > X^o)$ and the mean cost $E[Y^o]$ are naturally estimated by

$$\hat P(D > X^o) = n^{-1}\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o) \quad \text{and} \quad \hat E[Y^o] = n^{-1}\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o) Y_i^o. \tag{3.1.3}$$
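The three estimators in (3.1.2) and (3.1.3) share the same weights, which the following sketch makes explicit. It is an illustration under assumptions rather than the thesis code: the function names are hypothetical, observed times are assumed to have no ties, and the Kaplan-Meier estimate of $S_C$ treats censoring ($\delta_{C_i} = 1$) as the event while claims and deaths are treated as censoring it.

```python
import numpy as np

def km_censoring_survival(x_obs, is_censored):
    """Return s -> Kaplan-Meier estimate of S_C(s); is_censored == 1 marks a censoring time."""
    x = np.asarray(x_obs, dtype=float)
    e = np.asarray(is_censored, dtype=int)
    order = np.argsort(x, kind="stable")
    x, e = x[order], e[order]
    at_risk = len(x) - np.arange(len(x))
    surv = np.cumprod(np.where(e == 1, 1.0 - 1.0 / at_risk, 1.0))

    def s_c(s):
        idx = np.searchsorted(x, s, side="right")   # number of observations <= s
        return 1.0 if idx == 0 else float(surv[idx - 1])

    return s_c

def terminal_event_estimates(x_obs, delta_x, delta_c, y_obs, t, u):
    """Return (F*_hat(t, u), P_hat(D > X^o), E_hat[Y^o]) from inverse-probability weights."""
    x = np.asarray(x_obs, dtype=float)
    dx = np.asarray(delta_x, dtype=int)
    y = np.asarray(y_obs, dtype=float)
    n = len(x)
    s_c = km_censoring_survival(x, delta_c)
    weights = np.zeros(n)
    for i in np.flatnonzero(dx == 1):               # only subjects with an observed claim
        weights[i] = 1.0 / s_c(x[i])
    f_star = float(np.sum(weights * ((x <= t) & (y <= u))) / np.sum(weights))
    p_claim = float(np.sum(weights) / n)
    mean_cost = float(np.sum(weights * y) / n)
    return f_star, p_claim, mean_cost
```

With no censoring and no deaths, all weights equal one, so the first output is the empirical joint distribution function, the second is 1, and the third is the sample mean of the costs.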

To establish the asymptotic properties of $n^{1/2}(\hat F^*(t,u) - F^*(t,u))$, the condition (AA3: $S_C(t)$, $S_D(t)$, and $F^*(t,u)$ are absolutely continuous on $(0,\tau]$ and $\Omega = \{(t,u) : 0 < t \le \tau, u > 0\}$ with $\tau = \sup\{t : S_X(t) \ge \epsilon > 0\}$ for some $\epsilon > 0$) is further imposed in the following theorem.

Theorem 3.1. Suppose that assumptions (AA1)-(AA3) are satisfied. Then $n^{1/2}(\hat F^*(t,u) - F^*(t,u))$ converges weakly to a mean-zero Gaussian process with variance-covariance function $\Gamma^*(w_1,w_2) = \mathrm{Cov}(Z_i^*(w_1), Z_i^*(w_2))$ for $w_1, w_2 \in \Omega$, where $Z_i^*(t,u) = E[\psi_{ij}^*(t,u) + \psi_{ji}^*(t,u) \mid ((X_i, \delta_{X_i^o}, \delta_{C_i}), Y_i)]$, $M_{C_i}(t) = \delta_{C_i} I(X_i \le t) + \int_0^t I(X_i \ge v)\, d(\ln S_C(v))$, and

$$\psi_{ij}^*(t,u) = \frac{\delta_{X_i^o}}{S_C(X_i^o) P(D > X^o)}\left(1 + \frac{1}{n}\int_0^{X_i^o} S_X^{-1}(s)\, dM_{C_j}(s)\right)(B_i(t,u) - F^*(t,u)).$$

Proof. From (3.1.2), $n^{1/2}(\hat F^*(t,u) - F^*(t,u))$ can be re-expressed as

$$\frac{n^{-1/2}\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o)(B_i(t,u) - F^*(t,u))}{n^{-1}\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o)}. \tag{3.1.4}$$

By the boundedness of the $\delta_{X_i^o}$'s and the uniform convergence of $\hat S_C(t)$, the denominator in (3.1.4) is shown to be asymptotically equivalent to $P(D > X^o)$. Thus,

$$n^{1/2}(\hat F^*(t,u) - F^*(t,u)) = \frac{n^{-1/2}\sum_{i=1}^n \delta_{X_i^o}\hat S_C^{-1}(X_i^o)(B_i(t,u) - F^*(t,u))}{P(D > X^o)} + r_{1n}(t,u), \tag{3.1.5}$$

where $\sup_{(t,u)\in\Omega}|r_{1n}(t,u)| = o_p(1)$. Applying a Taylor expansion to the numerator in (3.1.5) with respect to $\hat S_C(t)$ around $S_C(t)$, it can be written as

$$n^{-1/2}\sum_{i=1}^n \frac{\delta_{X_i^o}}{\hat S_C(X_i^o)}(B_i(t,u) - F^*(t,u)) = n^{-1/2}\sum_{i=1}^n \frac{\delta_{X_i^o}}{S_C(X_i^o)}(B_i(t,u) - F^*(t,u)) - n^{-1/2}\sum_{i=1}^n \left\{ \frac{\delta_{X_i^o}}{S_C(X_i^o)}\left(\frac{\hat S_C(X_i^o)}{S_C(X_i^o)} - 1\right)(B_i(t,u) - F^*(t,u)) \right\} + r_{2n}(t), \tag{3.1.6}$$

where $\sup_{t\in(0,\tau]}|r_{2n}(t)| = o_p(n^{-1/2})$. Together with (3.1.5) and the argument used in the proof of Theorem 2.2.1, we obtain that

$$n^{1/2}(\hat F^*(t,u) - F^*(t,u)) = \{n^{1/2}(n-1)\}^{-1}\sum_{i \ne j}\psi_{ij}^*(t,u) + r_n(t), \tag{3.1.7}$$

where $\sup_{t\in(0,\tau]}|r_n(t)| = o_p(n^{-1/2})$. The asymptotic Gaussian process of $\hat F^*(t,u)$ is thus established.

Let $\hat M_{C_i}(t) = \delta_{C_i} I(X_i \le t) + \int_0^t I(X_i \ge v)\, d(\ln\hat S_C(v))$, $i = 1, \dots, n$, and $\hat S_X(t) = n^{-1}\sum_{i=1}^n I(X_i > t)$. The variance-covariance function $\Gamma^*(w_1,w_2)$ is suggested to be estimated by $\hat\Gamma^*(w_1,w_2) = n^{-1}\sum_{i=1}^n \hat Z_i^*(w_1)\hat Z_i^*(w_2)$ with $\hat Z_i^*(t,u) = (n-1)^{-1}\sum_{j \ne i}(\hat\psi_{ij}^*(t,u) + \hat\psi_{ji}^*(t,u))$ and

$$\hat\psi_{ij}^*(t,u) = \delta_{X_i^o}\{\hat S_C(X_i^o)\hat P(D > X^o)\}^{-1}\frac{1}{n}\int_0^{X_i^o}\frac{d\hat M_{C_j}(s)}{\hat S_X(s)}(B_i(t,u) - \hat F^*(t,u)). \tag{3.1.8}$$

The uniform consistency of $\hat\Gamma^*(w_1,w_2)$ is given in the following theorem.

Theorem 3.2. Suppose that assumptions (AA1)-(AA3) hold. Then,

$$\sup_{w_1,w_2\in\Omega} |\hat\Gamma^*(w_1,w_2) - \Gamma^*(w_1,w_2)| \xrightarrow{p} 0, \quad \text{as } n \to \infty. \tag{3.1.9}$$

Proof. By the techniques used in the proof of Theorem 3.1, the uniform convergence of $\hat P(D > X^o)$ is obtained. Paralleling the argument for the uniform consistency of $\hat\Gamma_2(w_1,w_2)$ in Theorem 2.4.1, the uniform consistency of $\hat\Gamma^*(w_1,w_2)$ follows.

As in the procedure described above, an approximate $(1-\alpha)$ pointwise confidence interval and simultaneous confidence band are constructed, respectively, via

$$\hat F^*(w) \pm z_{1-\alpha/2}\,\hat\Gamma^{*1/2}(w,w) \quad \text{and} \quad \hat F^*(w) \pm n^{-1/2}Q_{1-\alpha}(\hat K^*)\hat\Gamma^{*1/2}(w,w), \tag{3.1.10}$$

where

$$\hat K^* = \sup_{w\in\Upsilon^*}\left| \frac{n^{-1/2}\sum_{i=1}^n M_i\hat Z_i^*(w)}{\hat\Gamma^{*1/2}(w,w)} \right|,$$

$\Upsilon^*$ is a region of interest within $\Omega$, and $\{M_i : i = 1, \dots, n\}$ are independent realizations of a standard normal variable that are independent of $\{(X_i, \delta_{X_i^o}, \delta_{C_i}), Y_i\}_{i=1}^n$.

Chapter 4 Numerical Studies

In this chapter, we conduct two simulation scenarios to investigate the finite-sample properties of the proposed estimators and the performance of the inference procedures. One is for the case of censored data and the other accommodates the presence of terminal events. In the simulation process, data are repeatedly generated 500 times with sample sizes of 200 and 400, and censoring rates of 30% and 50%. The estimators are evaluated at the selected grid points $(t,u)$, where $u$ takes the values 0.25, 0.5, 0.75, and 1, and $t$ takes the values 0.2231, 0.5108, 0.9163, and 1.6094, at which the cumulative probabilities of $X^o$ are 0.2, 0.4, 0.6, and 0.8.

4.1 Simulation Setting of $(X^o, Y^o)$

The random vector $(X^o, Y^o)$ is generated from Frank's bivariate family (Genest (1987)), in which

$$F^{(\alpha)}_{X^oY^o}(t,u) = \begin{cases} \log_\alpha\{1 + (\alpha^{F_{X^o}(t)} - 1)(\alpha^{F_{Y^o}(u)} - 1)/(\alpha - 1)\}, & \alpha \ne 1, \\ F_{X^o}(t)F_{Y^o}(u), & \alpha = 1. \end{cases}$$

Simulations are implemented with $\alpha = \exp(-10)$, which implies a positive association between the claiming time and the medical cost.
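The simulation setting can be sketched as follows. The copula formula is the one displayed above; everything else is an assumption made for illustration and labeled as such in the code. In particular, the grid values t = 0.2231, 0.5108, 0.9163, 1.6094 equal $-\log(1-p)$ for p = 0.2, 0.4, 0.6, 0.8, which is consistent with a unit exponential marginal for $X^o$, while the marginal of $Y^o$ is not stated in this section and is taken to be Uniform(0, 1) here. The function names are hypothetical, and pairs are drawn by inverting the conditional distribution of the copula, a standard sampling device that the thesis does not necessarily use.

```python
import math
import numpy as np

def frank_cdf(a, b, alpha):
    """Frank family on uniform margins: log_alpha{1 + (alpha^a - 1)(alpha^b - 1)/(alpha - 1)}."""
    if alpha == 1.0:
        return a * b
    inner = 1.0 + (alpha ** a - 1.0) * (alpha ** b - 1.0) / (alpha - 1.0)
    return math.log(inner) / math.log(alpha)

def frank_conditional_inverse(a, w, alpha):
    """Solve dC(a, b)/da = w for b, valid for alpha != 1 (conditional-inversion sampling)."""
    inner = 1.0 + w * (alpha - 1.0) / (w + (1.0 - w) * alpha ** a)
    return math.log(inner) / math.log(alpha)

def simulate_latent(n, alpha, rng):
    """Draw n pairs (X^o, Y^o) whose dependence follows the Frank family."""
    a = rng.uniform(size=n)
    w = rng.uniform(size=n)
    b = np.array([frank_conditional_inverse(ai, wi, alpha) for ai, wi in zip(a, w)])
    x_latent = -np.log1p(-a)   # ASSUMPTION: X^o ~ Exp(1), inferred from the grid of t values
    y_latent = b               # ASSUMPTION: Y^o ~ Uniform(0, 1), not stated in this section
    return x_latent, y_latent

ALPHA = math.exp(-10.0)        # the value of alpha used in the simulations
```

Under these assumptions, `frank_cdf(0.2, 0.25, ALPHA)` gives the joint probability at the grid point (t, u) = (0.2231, 0.25).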

4.2 Scenario I - Without a terminal event

In this section, we examine the finite-sample properties of $\hat F_{HL}(t,u)$, $\hat F_{IPW}(t,u)$, and $\hat F_{IM}(t,u)$, and evaluate the inference procedures based on $\hat F_{HL}(t,u)$ and $\hat F_{IPW}(t,u)$.

The censoring time $C$ is independently generated from an exponential distribution with different scale parameters 0.5 and 1 for the expected censoring rates of 30% and 50%.

Tables 4.1-4.6 exhibit the averages and standard deviations of 500 estimates, and the averages of 500 standard errors, based on (2.1.4), (2.2.2), and (2.3.6) at the selected points. We find that the averages of the 500 estimates $\hat F_{IPW}(t,u)$ are closer to $F(t,u)$ than those of $\hat F_{HL}(t,u)$, especially for the higher censoring rate.

Furthermore, $\hat F_{IM}(t,u)$ is found to have larger biases at the points (1.6094, 0.75) and (1.6094, 1.0) when the sample size is small and the censoring rate is high. The biases of these estimators are negligible when the sample size is large enough. The standard deviations of the three estimates are almost the same. As expected, the standard deviations decrease with increasing sample size and decreasing censoring rate. These tables show that the averages of the 500 standard errors of $\hat F_{HL}(t,u)$ and $\hat F_{IPW}(t,u)$ are very close to the standard deviations of their 500 estimates. Note that the averages of the 500 standard errors of $\hat F_{HL}(t,u)$ diverge from the standard deviations of the estimates when the value of time is large, while those of $\hat F_{IPW}(t,u)$ seem

