廣義聯立方程模式與廣義路徑分析：存活或事件史資料(I)

(1)

行政院國家科學委員會專題研究計畫成果報告

廣義聯立方程模式與廣義路徑分析：存活或事件史資料(I) 計畫類別：個別型計畫計畫編號： NSC91-2118-M-002-004- 執行期間： 91 年 08 月 01 日至 92 年 07 月 31 日執行單位：國立臺灣大學公共衛生學院流行病學研究所計畫主持人：胡賦強報告類型：精簡報告處理方式：本計畫可公開查詢中華民國 93 年 2 月 18 日

(2)

Draft

Is the Maximum Partial Likelihood Estimator for

Cox’s Proportional Hazards Model

Also a General Least Squares Estimator?

Fu-Chang Hu

,

M.S., Sc.D.1

Assistant Professor

Division of Biostatistics

Graduate Institute of Epidemiology

College of Public Health

National Taiwan University

Taipei, Taiwan 100

R.O.C.

Chyong-Mei Chen

,

M.S.

Doctoral Candidate

Division of Biostatistics

Graduate Institute of Epidemiology

College of Public Health

National Taiwan University

Taipei, Taiwan 100

R.O.C.

June 4, 2003

1Correspondence: Fu-Chang Hu, Division of Biostatistics, Graduate Institute of Epidemiology,

College of Public Health, National Taiwan University, 1 Jen-Ai Road, Section 1, Room 1541, Taipei, Taiwan 100, R.O.C., Phone: +886 2 / 2394-2050 or +886 2 / 2312-3456 × 8349, Fax: +886 2 /

2351-1955, and Email: [email protected]. The first author has also served as Adjunct Assistant Research Fellow in the Department of Psychiatry, National Taiwan University Hospital, since August 1998.

(3)

SUMMARY

The equivalences in estimation between the maximum likelihood approach (e.g., the usual maximum likelihood estimator) and the least squares approach (e.g., the ordinary, weighted, generalized, and iterative reweighted least squares estimators) have been estab-lished for many well-known classes of statistical regression models such as linear regression model, logistic regression model, and generalized linear models (GLMs). However, no such connection has been discovered yet for the maximum partial likelihood estimator (MPLE) of the regression coefficients in Cox’s proportional hazards model (Cox 1972, 1975). In this study, by choosing an appropriate ”moment condition” of generalized method of moments (GMM) estimation, we find that with the ”asymmetric orthogonal expected information ap-proach” of adaptive estimation, the optimal martingale estimating function obtained from the minimization of the corresponding GMM quadratic form for a consistent estimator of the regression coefficients reduces to the partial score function of the Cox’s proportional hazards model, which implies that the well-behaved MPLE is also a general least squares estimator. This finding is not only very interesting in its own rights, but it provides us with an oppor-tunity to develop GLMs-type regression models locally for stochastic processes and to apply some powerful GMM-related estimating techniques such as the instrumental variables method to deal with several known statistical modeling problems including measurement error and simultaneous-equations bias in analysis of survival or time-to-event data.

Key words

:

Partial score function; MPLE; Moment conditions; Generalized method of moments; GMM; Estimating functions; Martingales; Nuisance parameters; Adaptive estimation.

(4)

1 INTRODUCTION

1.1 Motivation

The equivalences in estimation between the maximum likelihood approach, e.g., the usual maximum likelihood estimator (MLE), and the least squares approach, e.g., the or-dinary, weighted, generalized, and iterative reweighted least squares (OLS, WLS, GLS, and IRLS) estimators, have been established for many well-known classes of statistical regres-sion models such as linear regresregres-sion model, logistic regresregres-sion model, and generalized linear models (GLMs) (Nelder and Wedderburn 1972, Wedderburn 1974, McCullagh and Nelder 1989). Godambe and Kale (1991) gave a nice discussion on this issue from the viewpoint of estimating functions. However, to the best of our knowledge, no such connection has been discovered yet for the maximum partial likelihood estimator (MPLE) of the regression coefficients in Cox’s proportional hazards model (Cox 1972, 1975).

1.2 Objectives

In this study, by choosing an appropriate ”moment condition” ofgeneralized method of moments (GMM) estimation (see, e.g., Greene 2000, pp. 474-488), we find that with an aid fromadaptive estimationto deal with the nuisance parameter, the optimalmartingale estimat-ing function (m.e.f.) obtained from the minimization of the corresponding GMM quadratic form for a consistent estimator of the regression coefficients reduces to the usual partial score function of the Cox’s proportional hazards model, which implies that the well-behaved MPLE is also a general least squares estimator. In the derivations, we will show (1) how the

(6)

opti-mal m.e.f. for estimating the regression coefficients can be obtained from a GMM quadratic form and (2) how the baseline hazard function as the nuisance parameter can be purposely eliminated to yield an adaptive estimation of the regression coefficients. Finally, we discuss some interesting properties and extensions associated with this class of m.e.f.’s.

2 REVIEW

2.1 Cox’s Proportional Hazards Model

The semi-parametric Cox’s proportional hazards model for modeling the hazard func-tion λ(t | zi(t)) by a set of covariate values zi(t) is

λ(t | zi(t)) = λ0(t)eβ

0

zi(t) (1)

whereλ0(t)is an unspecified baseline hazard function,βis a vector ofpunknown regression

coefficients, and exp{β0zi(t)} > 0 is a risk multiplier (Cox 1972). It is by far the most

popular regression model in survival analysis. Under the assumption of independent censoring (conditioning on the covariates zi(t)), the MPLE of β can be obtained by maximizing the

partial likelihood function for β (Cox 1975), in which the infinite dimensional nuisance

parameter λ0(t) is nicely eliminated due to the unique structure of the proportional hazards

model. The asymptotic properties of the MPLEβb was studied by Tsiatis (1981) (using the

standard classical approach) and by Andersen and Gill (1982) and Gill (1984) (using the modern counting process martingale approach) respectively. And, as shown by Begun, et. al. (1983), one important property of the MPLEβb is that without knowing λ0(t), it achieves the

(7)

risk multipliers was investigated by Prentice and Self (1983). There were many theoretical and methodological research works on Cox’s proportional hazards model in the past thirty years or so. However, as far as we know, no one has specifically answered the question about whether the MPLE βb is equivalent to any least squares-type estimators.

2.2 Theory of Estimating Functions

On the other hand, estimating function is a flexible and robust way to estimating the parameters of interest in many general regression settings. Godambe (1960) might be the first one to formally study the properties of estimating functions. The theory of estimating functions for discrete-time stochastic processes was discussed by Godambe (1985) and extended to the continuous-time cases by Thavanewswaran and Thompson (1986). And, Godambe and Heyde (1987) and Heyde (1997) gave nice reviews of the important results in this area. Greenwood and Wefelmeyer (1991) discussed optimal m.e.f.’s for partially specified counting process models based on various optimality criteria. In particular, Chang and Hsiung (1990) studied the finite sample optimality of MPLE βb for Cox’s proportional hazards model within the

framework of estimating functions, which was quite different from the usual partial likelihood approach and thus gave us a good insight into the problem.

3 MAIN RESULTS

Suppose that there are n independent subjects indexed byiin a biomedical study. Let Ti and Ci be the event time and the right censoring time respectively for subject i. Due

(8)

∆i = I{Ti ≤ Ci}. And, zi is a (p × 1) vector containing the observed values of pcovariates

Zi, which can be time-dependent in general cases, for each subject i. As usual, we assume

independent censoring conditioning on the covariates, i.e., (Ti ⊥ Ci) | zi, for all subjects.

Next, we define a counting process Ni(t) = I{Xi ≤ t, ∆i = 1}, an status indicator

κi(t) = I{Xi ≥ t}, and a right continuousfiltration (or ”history”){F (t) : t ≥ 0}, where

F (t) = σ{Ni(u), κi(u+), Zi(u) : 0 ≤ u ≤ t andi = 1, 2, · · · , n}

is the smallest σ-algebra (or contains the information) generated by Ni(u), κi(u+), and

Zi(u)inside the bracket. Then, given a correctly specified Cox’s proportional hazards model

λ(t | zi) = λ0(t) exp{β

0

zi}, the intensity process for an increment of Ni(t) is

E{dNi(t) | F (t−)} = κi(t)λ0(t)eβ 0 zi_dt. Thus, dMi(t) = dNi(t) − κi(t)λ0(t)eβ 0 zi_dt

is an increment of the martingale Mi(t) associated with the above filtration {F (t), t ≥ 0}.

Finally, we shall denote a bounded predictable processwith respective to the same filtration

{F (t), t ≥ 0}by Hi(t; β).

3.1 Generalized Method of Moments Estimation

A natural choice of the moment condition for obtaining a GMM estimatorβe of the

regression coefficients β in Cox’s proportional hazards model (Eq. (1)) is to usedMi(t)(for

i = 1, 2, · · · , n), which are usually calledmartingale residualsafter being properly estimated

(9)

empirical information from which we can possibly get the right estimates of the unknown parameters over n subjects. Any other moment conditions (e.g., the conditions from higher

moments), if available, can also be incorporated into the framework of the GMM estimation by stacking them together.

SinceMi(t) is an F (t)-martingale, we have

V ar{dNi(t) | F (t−)} = V ar{dMi(t) | F (t−)} = κi(t)λ0(t)eβ

0

zi_dt

which is denoted as d < M >i (t), so that locally at timet, dNi(t) | F (t−)behaves just like

aPoissonrandom variable (McCullagh and Nelder 1989, Sec. 6.4, pp. 209-214, Fleming and Harrington 1991, p. 87, Andersen, et. al. 1993, pp. 54-55, 223, and 482). Then, following the standard procedure for obtaining a GMM estimatorβe (based on theL2-norm), we construct

the following quadratic form from the chosen moment condition

QT(β) = n X i=1 Z T 0 [dMi(u)] 2 κi(u)λ0(u)eβ 0 zidu (2)

where T is a stopping time.

3.2 The Estimating Functions

However, direct minimization of the above quadratic formQT(β) would usually yield

an inconsistent estimator of the unknown parameter β because the conditional variance of dNi(t) given F (t−), i.e., V ar(dNi(t) | F (t−)), contains β (see, e.g., Greene 2000, p. 477).

Thus, to derive an unbiased estimating function (e.f.) from the GMM quadratic form for a consistent estimation of β, we suggest making the following remedy. As discussed by

(10)

(1991, Sec. 11.3, esp., Eq. (3.2), pp. 269-271) in similar settings, we should avoid taking the derivative ofκi(u)λ0(u) exp{β

0

zi}duin the denominator with respective to βto preserve

∂QT(β)/∂β to have a zero mean (at the true value of β). Hence, the remedy is to treat the

β inside the weight function h

κi(u)λ0(u) exp{β

0 zi}du

i−1

as if it were known in taking the derivative of the quadratic form QT(β) with respective to β.

Then, although there exists another unknown parameterλ0(t), we obtain the following

(p×1) e.f.UT(β)from the GMM quadratic formQT(β)(Eq. (2)) for estimating the unknown

parameter of interest β UT(β) ≡ n X i=1 Z T 0 ∂ [dMi(u)] ∂β h κi(u)λ0(u)eβ 0 zi_du i−1 dMi(u) = n X i=1 Z T 0 d ˙Mi(u) h κi(u)λ0(u)eβ 0 zi_du i−1 dMi(u) = − n X i=1 Z T 0 ∂ ∂β h κi(u)λ0(u)eβ 0 zi_du i h κi(u)λ0(u)eβ 0 zi_du i−1 dMi(u) = − n X i=1 Z T 0 zidMi(u) (3)

which, conditioning on the filtration F (t−)locally (at any given time t ≤ T), has exactly the

same form as the optimal e.f. (i.e., the quasi-score function) of the GLM for the conditional mean of a Poisson-like response,E{dNi(t) | F (t−)}, with the canonical link function (log)

(McCullagh and Nelder 1989, Eq. (9.5), p. 327) and as an optimal m.e.f. too (Desmond 1991, p. 143, Greenwood and Wefelmeyer 1991, p. 153). In this e.f.,

Hi(t; β) ≡ zi

is a (p × 1) bounded predictable process with respective to the same filtration {F (t), t ≥ 0}.

(11)

It can be seen now that the ”orthogonality” between the derivatives of the intensity process κi(u)λ0(u) exp{β

0

zi}du with respective to β and the martingale residuals dMi(u)

is the key to obtaining a good estimate of β. This fact may imply that orthogonality is

also a more fundamental property than minimization(or maximization) required in statistical estimation in the dependent case.

3.3 The Nuisance Parameter Problem

However, the unspecified infinite dimensional nuisance parameter λ0(t) sitting inside

the intensity process κi(u)λ0(u) exp{β

0

zi}du of dMi(u) in the e.f. UT(β) (Eq. (3)) causes

a major difficulty in the estimation ofβ. Even though a consistent estimator for λ0(t), eλ0(t)

say, may be available, the asymptotic properties of the resulting GMM estimatorβe would

generally depend oneλ0(t). Nonetheless, following the idea of Williams (1982) and Moore

(1986) developed in a different setup, we find that one plausible way to get around this nuisance parameter problem is to search for a suitable transformation on zi in the crude e.f.

UT(β) to derive an e.f. U?T(β) which satisfies the following key condition for an adaptive

estimation of β E ∂U? T(β) ∂λ0(t) = 0. (4)

The rationale behind this approach is that if the expectation of the cross derivative of the e.f.

U?T(β) for β with respective to λ0(t) is zero, then the inverse of the joint expected

quasi-information matrix for (β, λ0(t)) has a zero in its right-top corner so that the asymptotic

distribution of βe?, which is the root of the derived e.f. U?

T(β), would not depend on eλ0(t)

(12)

To this end, we replace the zi in the crude e.f. UT(β) (Eq. (3)) with the following

(p × 1) boundedF (t)-predictable process

H?i(t; β) ≡ zi − ¯z(t) = zi − Pn j=1κj(t)eβ 0 zj_z j Pn j=1κj(t)eβ 0 zj (5)

where ¯z(t) is an empirical weighted mean of zj’s at time t ≤ T. Certainly, H?i(t; β) itself

should not contain the nuisance parameter λ0(t). Then, the derived e.f. for β is

U?T(β) = n X i=1 Z T 0 H?i(u; β)dMi(u) = n X i=1 Z T 0 (zi−¯z(u)) dMi(u). (6)

In taking derivatives of the above e.f.’s forβwith respective to the infinite dimensional

parameterλ0(t), we develop a three-step approach to avoid the potential technical complexity:

1. At first, we assume aconstant baseline hazard function λ0 (for allt ∈ [0, T ]).

2. Next, we consider apiecewise constantbaseline hazard functionλ0(k)(for1 ≤ k ≤ K),

where K is the total number of small time intervals within [0, T ].

3. Finally, we extend the result to ageneralbaseline hazard functionλ0(t) (fort ∈ [0, T ])

by lettingK → ∞.

Now, assuming a constant baseline hazard function λ0(t) = λ0 (for all t ∈ [0, T ]), we

first check E ∂UT(β) ∂λ0 = E " ∂Pn_i=1R₀T zidMi(u) ∂λ0 # = E " ∂Pn_i=1R₀T zidNi(u) ∂λ0 # − E " ∂Pn_i=1R₀Tziκi(u)λ0eβ 0 zi_du ∂λ0 # = −E " _n X i=1 Z T 0 ziκi(u)eβ 0 zi_du # 6= 0

(13)

unless it is properly centered. Thus, even in the simplest case, the above-mentioned key condition does nothold for the crude e.f. UT(β).

In contrast, assuming a constant baseline hazard functionλ0(t) = λ0 (for allt ∈ [0, T ])

again, we have E ∂U? T(β) ∂λ0 = E "

∂Pn_i=1R₀T (zi −z(u)) dM¯ i(u)

∂λ0 # = −E   n X i=1 Z T 0  zi − Pn j=1κj(u)eβ 0 zj_z j Pn j=1κj(u)eβ 0 zj   κi(u)eβ 0 zi_du   = E Z T 0   n X i=1   Pn j=1κj(u)eβ 0 zj_z j Pn j=1κj(u)eβ 0 zj   κi(u)eβ 0 zi ₋ n X i=1 ziκi(u)eβ 0 zi   du = E Z T 0     Pn j=1κj(u)eβ 0 zj_z j Pn j=1κj(u)eβ 0 zj   n X i=1 κi(u)eβ 0 zi ₋ n X i=1 ziκi(u)eβ 0 zi   du = E Z T 0 " _n X j=1 κj(u)eβ 0 zj_z j − n X i=1 ziκi(u)eβ 0 zi # du = 0.

By the same token, considering a piecewise constant baseline hazard function λ0(t) = λ0(k)

(for 1 ≤ k ≤ K), we obtain the same result for the derived e.f. U?

T(β). Finally, to extend

this result to any general baseline hazard functions λ0(t) (for t ∈ [0, T ]), we let K → ∞.

Notice that as just shown in the case of Cox’s proportional hazards model, the nice algebraic structure which makes the key condition to be true for the preferred e.f. U?

T(β) is n X i=1 Z T 0 (zi−¯z(u)) κi(u)eβ 0 zi_{du =} Z T 0 "_Xn i=1 (zi−¯z(u)) κi(u)eβ 0 zi # du = 0 (7)

which is exactly the same crucial equality required for the martingale representation of the partial score function ofβin the partial likelihood approach (see, e.g., Fleming and Harrington

(14)

1991, p. 150). And, it can be seen clearly thatcenteringthe covariate values by their empirical weighted means at timet(Eq. (5)) provides us with a solution to the failure of the key condition

for adaptive estimation in the crude e.f. UT(β).

In fact, according to this equality (Eq. (7)), the proposed e.f. U?T(β) (Eq. (6)) reduces

to a rather simple form

U?_T(β) = n X i=1 Z T 0 (zi−¯z(u)) dMi(u) = n X i=1 Z T 0 (zi−¯z(u)) dNi(u) − n X i=1 Z T 0

(zi−¯z(u)) κi(u)λ0(u)eβ

0 zi_du = n X i=1 Z T 0 (zi−¯z(u)) dNi(u) − Z T 0 λ0(u) " _n X i=1 (zi−¯z(u)) κi(u)eβ 0 zi # du = n X i=1 Z T 0 (zi−¯z(u)) dNi(u) (8)

which doesnotcontain the nuisance parameterλ0(t)(see, e.g., Fleming and Harrington 1991,

Eq. (3.24), p. 149). It becomes too simple to see that it is actually a least squares-type normal equation. In this case, the equality (Eq. (7)) helps us remove the infinite dimensional nuisance parameter λ0(t) from the derived e.f. U?T(β) completely so that the estimation of β from it

does not depend on λ0(t) at all.

Finally, we must point out that the above e.f. U?

T(β) (esp., Eq. (8)) is identical to

the partial score function derived from the partial likelihood function (based on multinomial

probabilities at each event timet) for obtaining the MPLE of βin Cox’s proportional hazards

model (see, e.g., Fleming and Harrington 1991, Eqs. (2.4), (2.5), (3.24), and (3.25), pp. 12-13 and 149-150, Andersen, et. al. 1993, pp. 105 and 483-487). Hence, the root βe? of the

proposed e.f. U?

(15)

3.4 Properties

To examine the unbiasedness properties of the crude e.f. UT(β) (Eq. (3)) and the

derived e.f. U?

T(β) (Eqs. (6) and (8)), we first compute

E{UT(β)} = E " _n X i=1 Z T 0 zidMi(u) # = n X i=1 E Z T 0 ziE (dMi(u) | F (u−)) = n X i=1 E Z T 0 zi h

E (dNi(u) | F (u−)) − κi(u)λ0(u)eβ

0

zi_du

i

= 0

since zi is bounded and predictable given the filtration F (u−). By the same token, we have

E{U?T(β)} = E " _n X i=1 Z T 0 (zi−¯z(u)) dMi(u) # = 0.

Equivalently, according to the Theorem 1.5.1 of Fleming and Harrington (1991, pp. 46-47 and 49), both e.f.’s UT(β) and U?T(β) are m.e.f.’s.

For any vector v, let the outer product vv0 be denoted as v⊗2. Then, assuming that

the cumulative baseline hazard function Λ0(t), where Λ0(t) =

Rt

0 λ0(u)du, is continuous (if

not, then an alternative assumption is needed), we obtain the (p × p) predictable covariation

process of the m.e.f. U?T(β), which is the compensator of [U ? T(β)] ⊗2, that < U?(β) >T = n X i=1 < U?_i(β) >T + X i6=j < U?_i(β), U?_j(β) >T = n X i=1 Z T 0 [zi−z(u)]¯ ⊗2 κi(u)λ0(u)eβ 0 zi_du (9)

(16)

= 0, for all i 6= j (Fleming and Harrington 1991, pp. 42, 81-82, 86-87, and 150, Andersen,

et. al. 1993, pp. 69 and 74).

On the other hand, since the bounded F (t)-predictable process H?

i(t; β) is a function

of β, the (p × p) derivative processof the m.e.f. U?T(β) with respective to β (based on Eq.

(6)), denoted as_U˙?

T(β), is a sum of two terms

˙ U?T(β) ≡ ∂ ∂β "_Xn i=1 Z T 0 H?i(u; β)dMi(u) # = n X i=1 Z T 0 ˙ H?_i(u; β)dMi(u) + n X i=1 Z T 0 H?_i(u; β)d ˙Mi(u) (10)

where the ”dot” put on the tops ofH?

i(u; β)anddMi(u)respectively indicates the derivatives

with respective toβ. SinceH˙?

i(u; β)is also a boundedF (t)-predictable process, the first term

on the right-hand side of Eq. (10) is a martingale (Fleming and Harrington 1991, Theorem 1.5.1, pp. 46-47 and 49). Thus, the second term on the right-hand side of Eq. (10)

n X i=1 Z T 0 H?i(u; β)d ˙Mi(u) = − n X i=1 Z T 0 (zi−¯z(u)) z 0 iκi(u)λ0(u)eβ 0 zi_du (11)

is the compensator of the derivative process_U˙?

T(β). However, as shown below, Eqs. (11)

and (9) are closely related:

− " _n X i=1 Z T 0 H?_i(u; β)d ˙Mi(u) #0 − < U?(β) >T = n X i=1 Z T 0 zi(zi−¯z(u)) 0 κi(u)λ0(u)eβ 0 zi_{du −} n X i=1 Z T 0 (zi−¯z(u)) (zi−z(u))¯ 0 κi(u)λ0(u)eβ 0 zi_du = Z T 0 ¯ z(u) " _n X i=1 (zi−¯z(u)) 0 κi(u)eβ 0 zi # λ0(u)du = 0

(17)

due to the equality of Eq. (7). Hence, a version of the so-called quasi-information equality

holds for the derived m.e.f. U?_T(β) in the estimation of β alone (see, e.g., Andersen, et. al.

1993, pp. 102-103).

Alternatively, by computing the (p × p)derivative processU˙?T(β) of Eq. (10) directly

from Eq. (8), we have

˙ U?_T(β) ≡ ∂ ∂β " _n X i=1 Z T 0 H?_i(t; β)dNi(u) # = − n X i=1 Z T 0 ∂ ∂β   Pn j=1κj(u)eβ 0 zj_z j Pn j=1κj(u)eβ 0 zj   dNi(u) = − n X i=1 Z T 0   Pn j=1κj(u)eβ 0 zj_[z j] ⊗2 Pn j=1κj(u)eβ 0 zj   −   Pn j=1κj(u)eβ 0 zj_z j Pn j=1κj(u)eβ 0 zj   ⊗2 dNi(u) = − n X i=1 Z T 0   Pn j=1κj(u)eβ 0 zj_[z j] ⊗2 Pn j=1κj(u)eβ 0 zj   − [¯z(u)]⊗2 dNi(u). (12) Since Pn j=1κj(u)eβ 0 zj_[z j−¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj = Pn j=1κj(u)eβ 0 zj_[z j] ⊗2 −zj¯z 0

(u) − ¯z(u)z0j+ [¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj and Pn j=1κj(u)eβ 0 zj_z j¯z 0 (u) Pn j=1κj(u)eβ 0 zj = nPn j=1κj(u)eβ 0 zj_z j o ¯ z0(u) Pn j=1κj(u)eβ 0 zj = [¯z(u)]⊗2

we obtain from Eq. (12) that

˙ U?_T(β) = − n X i=1 Z T 0   Pn j=1κj(u)eβ 0 zj_[z j−¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   dNi(u) (13)

where the thing inside the big parentheses is an empirical weighted mean of the covariances of zj’s at time u. Again, according to the fact that the first term on the right-hand side of

Eq. (10) is a martingale, the compensator of the derivative process_U˙?

(18)

is simply − n X i=1 Z T 0   Pn j=1κj(u)eβ 0 zj_[z j−¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   κi(u)λ0(u)eβ 0 zi_du. (14)

Then, as shown below, Eqs. (14) and (9) are also closely related:

n X i=1 Z T 0   Pn j=1κj(u)eβ 0 zj_[z j−¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   κi(u)λ0(u)eβ 0 zi_du − n X i=1 Z T 0 [zi−z(u)]¯ ⊗2 κi(u)λ0(u)eβ 0 zi_du = Z T 0   n X i=1   Pn j=1κj(u)eβ 0 zj_[z j −¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   κi(u)eβ 0 zi − n X i=1 [zi−¯z(u)] ⊗2 κi(u)eβ 0 zi # λ0(u)du = Z T 0     Pn j=1κj(u)eβ 0 zj_[z j −¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   n X i=1 κi(u)eβ 0 zi − n X i=1 [zi−¯z(u)] ⊗2 κi(u)eβ 0 zi # λ0(u)du = Z T 0 " _n X j=1 κj(u)eβ 0 zj_[z j −¯z(u)] ⊗2 − n X i=1 [zi−¯z(u)] ⊗2 κi(u)eβ 0 zi # λ0(u)du = 0

which implies that −E[ ˙U?T(β)] = E

[U?T(β)]

⊗2 . And, analogous to Eq. (7), we get n X i=1 Z T 0  [zi−¯z(u)] ⊗2 − Pn j=1κj(u)eβ 0 zj_[z j −¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   κi(u)eβ 0 zi_{du = 0.} (15)

Hence, an equivalent version of the quasi-information equality also holds for the derived m.e.f. U?

T(β) in the estimation of β alone (see, e.g., Fleming and Harrington 1991, Eq.

(19)

Finally, Eq. (15) may suggest that n X i=1 Z T 0  [zi−¯z(u)] ⊗2 − Pn j=1κj(u)eβ 0 zj_[z j −¯z(u)] ⊗2 Pn j=1κj(u)eβ 0 zj   dNi(u) = 0 (16)

be used as the basis for constructing an information equality-type model specification test, as long as the thing inside the big parentheses is bounded and F (t)-predictable, since Eq. (16),

like Eq. (8), does not involve the nuisance parameter λ0(t) at all.

3.5 Some Extensions

In this study, we use only one moment conditiondMi(t)for estimating the pregression

coefficients β in Cox’s proportional hazards model (Eq. (1)). However, when there is more

information available to be used as moment conditions in some cases, the GMM approach actually provides us with a natural way to combine the information aboutβfrom all available

moment conditions by constructing a properly stacked quadratic form (see, e.g., Greene 2000, Secs. 11.5-11.6, pp. 474-496). Thus, the most efficient estimate ofβ can be obtained in such

a systematic way. And, by doing so, the resulting m.e.f. for β would still consist of exact p

equations although there are more than one moment condition used in the estimation of β.

Moreover, in view of the algebraic structure in the equality of Eq. (7), we may take more general forms of the (p × 1) boundedF (t)-predictable process thanH?i(t; β) ≡ zi−¯z(t)

for estimating β in Cox’s proportional hazards model or other similar regression models for

hazard functions. For example, lettingh1(t), h2(t), · · · , hn(t) be any (p × 1) bounded F (t)

-predictable processes and g1(·), g2(·), · · · , gn(·) be known deterministic functions, we can

choose H??_i (t; β) ≡ gi(e β0zi(t)_)h i(t) eβ0zi(t) − Pn j=1κj(t)gj(eβ 0 zj(t)_)h j(t) Pn j=1κj(t)eβ 0 zj(t)

(20)

which does not contain the nuisance parameter λ0(t) and holds the same kind of equality as

Eq. (7) by direct calculations. When gi(exp{β

0

zi(t)}) = exp{β

0

zi} and hi(t) = zi for all

subjects, it reduces to H? i(t; β).

Finally, as proposed by Prentice and Self (1983), we can also extend the above-derived results for Cox’s proportional hazards model to the following more general class of regression models for hazard functions

λ(t | zi(t)) = λ0(t)r(β

0 zi(t))

in which the risk multiplier r(·) > 0 is a twice differentiable known function of the linear

combination β0zi(t). In this setting, the log partial likelihood function to be maximized for

obtaining the MPLE of β is LT(β) = n X i=1 Z T 0 ( log h r(β0zi(u)) i − log " _n X j=1 κj(u)r(β 0 zj(u)) #) dNi(u)

and thus the partial score function for estimating β becomes ∂LT(β) ∂β = n X i=1 Z T 0 ˙r(β0zi(u))zi(u) r(β0zi(u)) − Pn j=1κj(u) ˙r(β 0 zj(u))zj(u) Pn j=1κj(u)r(β 0 zj(u)) ! dNi(u)

which can also be derived as an m.e.f. for obtaining an adaptive GMM estimator of β as

long as the thing inside the big parentheses is bounded and F (t)-predictable.

4 DISCUSSION

4.1 Summary

In brief, to estimate the unknown regression coefficientsβ in Cox’s proportional

(21)

mo-ment conditiondMi(t)and the corresponding quadratic form QT(β)(Eq. (2)) from the GMM

estimation together as a useful device to constructing a class of m.e.f.’s for estimatingβ

con-sistently. In particular, by ignoring purposely the β inside the weight function of QT(β) as

if it were known in the minimization of QT(β), we maintain the unbiasedness of the crude

m.e.f. UT(β) (Eq. (3)).

Next, we derive the proposed m.e.f.U?

T(β)(Eq. (6)), which satisfies the key condition

for adaptive estimation that E [∂U?T(β)/∂λ0(t)] = 0 (Eq. (4)), to get rid of the impact of

the unknown nuisance parameterλ0(t)on the asymptotic distribution of its rootβe ?

, although we need to plug a consistent estimator of λ0(t) into it directly or iteratively in the estimation

of β. In fact, it happens in the case of Cox’s proportional hazards model that by doing

so, we get rid of the unknown nuisance parameter λ0(t) completely from the derived m.e.f.

U?

T(β) (Eq. (6)) due to the equality of Eq. (7), and thus the estimation of β using the

derived m.e.f. U?T(β) in its reduced form (Eq. (8)) does not depend on λ0(t) at all. Notice

that under regularity conditions, this asymmetric orthogonal expected information approach

to dealing with nuisance parameters is valid for the unbiased joint e.f.’s for the parameters of interest and the nuisance parameters with either a symmetric or non-symmetric expected joint derivative matrix (see Appendix) so that it is more general than the usual orthogonal parameter approach, which requests orthogonality (i.e., zero covariances) between the two sets of unbiased e.f.’s for the parameters of interest and the nuisance parameters respectively (Chang and Hsiung 1990, 1991, Small and McLeish 1994, Sec. 5.2, pp. 109-110, Pagan and Ullah 1999, Sec. 5.4, pp. 217-225), for adaptive estimation.

Since the partial score function for estimatingβ in Cox’s proportional hazards model

can be represented as an adaptive GMM m.e.f. U?

(22)

has a GMM interpretation, and thus it can be considered as a general least squares estimator of β. Then, with this result in mind, we may rewrite the Cox’s proportional hazards model

in the form of a nonlinear regression model: For i = 1, 2, · · · , n,

E[dNi | F (t−)] = λ0(t)eβ

0

zi_dt

log (E[dNi | F (t−)]) = log(λ0(t))dt + β

0 zi or equivalently dNi | F (t−) = λ0(t)eβ 0 zi_{dt + ς} i(t)dt Ni = Z Xi 0 λ0(t)eβ 0 zi_{dt +} Z Xi 0 ςi(t)dt = Z Xi 0 λ0(t)eβ 0 zi_{dt +} i(Xi)

where Xi = Ti∧ Ci = min (Ti, Ci) and 0 ≤ t ≤ Xi.

4.2 Future Work

Given the moment condition dMi(t) for estimating the regression coefficients β in

Cox’s proportional hazards model (Eq. (1)), we have learned from this study the important roles of zi and¯z(t)in the optimal m.e.f. U?T(β)(Eqs. (6) and (8)) respectively: (1) zi comes

from the derivatives ∂ [dMi(t)]/∂β, and thus it is expected to be orthogonal todMi(t) given

the unbiasedness of the m.e.f.; and (2) centering zi by ¯z(t) helps us get rid of the nuisance

parameterλ0(t). Since the statistical methodologies adopted in this study including the GMM

estimation method, theory of e.f.’s, adaptive estimation, and martingales have not yet exerted their full powers in our case, the finding of this study is not only very interesting in its own rights, but it provides us with an opportunity to develop GLMs-type regression models locally

(23)

(at each time t) for more general stochastic processes and to apply some powerful

GMM-related estimating techniques such as the instrumental variables (IV) method for nonlinear equations to deal with several known statistical modeling problems in analysis of survival or time-to-event data.

In particular, Bowden and Turkington (1984, esp., Sec. 1.2, pp. 10-16), Greene (2000, Sec. 9.5 and Subsecs. 10.3.2, 11.5.5, 13.7.3, and 16.5.2, pp. 370-387, 430-438, 483-488, 550-552, and 680-690), and Newey (1990) among others gave nice reviews and discussions on the IV estimation method and its applications in econometric models. In the linear equation settings, the IV method has proved its power in solving the estimation problems occurring when some of the covariates in the equation arecorrelatedwith equation’s error. Thus, it can be used to deal with four kinds of statistical modeling problems: (1) the error-in-variable (or measurement error) problem, (2) the self-selection problem, (3) the simultaneous-equations bias, and (4) the time series problem (Bowden and Turkington 1984, pp. 3-10). Then, based on the result of this study, we are specifically interested in developing IV estimators for the regression coefficients β in Cox’s proportional hazards model to deal with the measurement

error problem and simultaneous-equations bias respectively. It is our understanding that when these problems arise in Cox’s proportional hazards model, some of the covariates zi would

notbe orthogonal todMi(t). And, for this purpose, our IV estimation method for GLMs (Hu,

et. al. 2002) may be applied to Cox’s proportional hazards model locally (at each timet).

5 ACKNOWLEDGEMENTS

(24)

author’s advice. And, this research was financially supported by the National Science Council of the Republic of China (NSC 91-2118-M-002-004).

6 APPENDIX

We shall give a sketch of the rationale behind the proposed asymmetric orthogonal expected information approach to dealing with the nuisance parameter λ0(t) for an adaptive

estimation of the interested parametersβ. Yet, this general approach can possibly be applied to

many different settings. Without loss of generality, we assume thatλ0(t)can be parameterized

by a finite number of parameters for simplicity. Let the estimating functions for β and λ0(t)

be denoted as U?

β(β, λ0(t)) and Uλ0(β, λ0(t)) respectively, of which both are functions of

(β, λ0(t)). And, given a random sample of size n, we assume that both U?_β(β, λ0(t)) and

Uλ0(β, λ0(t)) are unbiased, the solutions(eβ

?

, eλ0(t)) to these two estimating equations exist,

and they are consistent estimates of (β, λ0(t)).

Thekey conditionrequired for adaptively estimating β (see Eq. (4) in the text) is that

at the true values of β, E h I?_βλ₀ i = − E _∂U? β(β, λ0(t)) ∂λ0(t) = 0.

Then, symbolically, the expected values of the minus mixed derivatives of the joint estimating functions U?

β(β, λ0(t)) and Uλ0(β, λ0(t)) with respect to the parameters(β, λ0(t))are

E     I? ββ I?βλ0 Iλ0β Iλ0λ0     =     i? ββ 0 iλ0β iλ0λ0    

(25)

partitioned matrix, it is straightforward to show that     i? ββ 0 iλ0β iλ0λ0     −1 =     (i?_ββ)−1 0 iλ0β _(i λ0λ0) −1     where iλ0β _{6= (i} λ0β)

−1 and it is in a complicated form (see, e.g., Khuri 1993, p. 34).

Now, expanding the joint estimating functions(U?

β(β, λ0(t)), Uλ0(β, λ0(t)))about the

true values (β•, λ•

0(t)) of the parameters (β, λ0(t)) by themean value theorem yields

    U? β(eβ ? , eλ0(t)) Uλ0(eβ ? , eλ0(t))    =     U? β(β•, λ•0(t)) Uλ0(β • , λ• 0(t))    −     I? ββ I?βλ0 Iλ0β Iλ0λ0     β∗_,λ∗ 0(t)     e β? − β• eλ0(t) − λ•0(t)    .

Then, under the regularity conditions,

n1/2     e β? − β• eλ0(t) − λ•0(t)     =     I? ββ n I? βλ0 n I_λ0β n I_λ0λ0 n     −1 β∗_,λ∗ 0(t) n−1/2     U? β(β•, λ•0(t)) Uλ0(β • , λ• 0(t))     ∼ =     i? ββ n 0 i_λ0β n i_λ0λ0 n     −1 n−1/2     U? β(β•, λ•0(t)) Uλ0(β • , λ• 0(t))     =     _i? ββ n −1 0 n iλ0β _i λ0λ0 n −1    n−1/2     U?_β(β•, λ•0(t)) Uλ0(β • , λ•0(t))    .

Thus, by picking up the first p elements in the vector on the left-hand side, n1/2(eβ? − β•) ∼= _i? ββ n −1 n−1/2U?_β(β•, λ•0(t)) + 0· n −1/2 Uλ0(β • , λ•0(t)) = _i? ββ n −1 n−1/2U?_β(β•, λ•₀(t)).

According to the multivariate central limit theorem,

n−1/2U?_β(β•, λ•0(t)) D −−−→ N 0, lim n→∞ _I? ββ(β•, λ•0(t)) n .

(26)

Therefore, n1/2(eβ?− β•) −−−→D N 0, lim n→∞ _I? ββ(β•, λ•0(t)) n −1!

which is exactly the same asymptotic distribution ofβe? as if λ0(t) were known.

7 REFERENCES

Andersen, P. K., Borgan, Ø., Gill, R. D. & Keiding, N. (1993).Statistical Models Based on Counting Processes. New York, NY: Springer-Verlag.

Andersen, P. K. & Gill, R. D. (1982). Cox’s regression model for counting process: A large sample study. Ann. Statist. 10, 1100-20.

Begun, J. M., Hall, W. J., Huang, W. & Wellner, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist.11, 432-53.

Bowden, R. J. & Turkington, D. A. (1984). Instrumental Variables. Cambridge: Cambridge University Press.

Chang, I. -S. & Hsiung, C. A. (1990). Finite sample optimality of maximum partial likelihood estimation in Cox’s model for counting process. J. Statist. Plan. Inf. 25, 35-42.

Chang, I. -S. & Hsiung, C. A. (1991). Applications of estimating function theory to repli-cates of generalized proportional hazards models. In: V. P. Godambe (Ed), Estimating Functions, Oxford: Clarendon Press, pp. 23-33.

Cox, D. R. (1972). Regression models and life tables (with discussion).J. Roy. Statist. Soc., Ser. B 34, 187-220.

(27)

Cox, D. R. (1975). Partial likelihood.Biometrika 62, 269-76.

Desmond, A. F. (1991). Quasi-likelihood, stochastic processes, and optimal estimating func-tions. In: V. P. Godambe (Ed), Estimating Functions, Oxford: Clarendon Press, pp. 133-46.

Fleming, T. R. & Harrington, D. D. (1991).Counting Processes and Survival Analysis. New York, NY: John Wiley & Sons.

Gill, R. D. (1984). Understanding Cox’s regression model: A martingale approach.J. Amer. Statist. Assoc. 79, 441-7.

Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation.

Ann. Math. Statist.31, 1208-12.

Godambe, V. P. (1985). The foundations of finite sample estimation in stochastic processes.

Biometrika72, 419-28.

Godambe, V. P. & Heyde, C. C. (1987). Quasi-likelihood and optimal estimation. Intern. Statist. Rev. 55, 231-44.

Godambe, V. P. & Kale, B. K. (1991). Estimating functions: An overview. In: V. P. Godambe (Ed), Estimating Functions, Oxford: Clarendon Press, pp. 3-20.

Greene, W. H. (2000).Econometric Analysis, 4th ed. Upper Saddle River, NJ: Prentice-Hall. Greenwood, P. E. & Wefelmeyer, W. (1991). On optimal estimating functions for partially

specified counting process models. In: V. P. Godambe (Ed), Estimating Functions, Oxford: Clarendon Press, pp. 147-60.

(28)

Heyde, C. C. (1997).Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation. New York, NY: Springer-Verlag.

Hu, F. -C., Lai, S. -H., Tsai, T. -L. & Shau, W. -Y. (2002). The method of instrumental vari-ables for generalized linear models. Technical report. Division of Biostatistics, Graduate Institute of Epidemiology, College of Public Health, National Taiwan University, Taipei, Taiwan, R.O.C..

Khuri, A. I. (1993).Advanced Calculus with Applications in Statistics. New York, NY: John Wiley & Sons.

McCullagh, P. (1991). Quasi-likelihood and estimating functions. In: D. V. Hinkley, N. Reid & E. J. Snell (Eds), Statistical Theory and Modelling: In Honour of Sir David Cox, FRS, London: Chapman and Hall, pp. 265-86.

McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chap-man & Hall.

Moore, D. F. (1986). Asymptotic properties of moment estimators for overdispersed counts and proportions. Biometrika73, 583-8.

Nelder, J. A. & Wedderburn, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc., Ser. A 135, 370-84.

Newey, W. K. (1990). Efficient instrumental variable estimation of nonlinear models. Econo-metrica58, 809-38.

(29)

Pagan, A. & Ullah, A. (1999). Nonparametric Econometrics. New York, NY: Cambridge University Press.

Prentice, R. L. & Self, S. (1983). Asymptotic distribution theory for Cox-type regression models with general relative risk form. Ann. Statist.11, 804-13.

Small, C. G. & McLeish, D. L. (1994).Hilbert Space Methods in Probability and Statistical Inference. New York, NY: John Wiley & Sons.

Thavanewswaran, A. & Thompson, M. E. (1986). Optimal estimation for semimartingales.

J. Appl. Probab. 23, 409-17.

Tsiatis, A. A. (1981). A large sample study of Cox’s regression model. Ann. Statist. 9, 93-108.

Wedderburn, R. W. M. (1974). Quasilikelihood functions, generalized linear models and the Gauss-Newton method.Biometrika 61, 439-47.

Williams, D. A. (1982). Extra-binomial variation in logistic linear models.Appl. Statist.31, 144-8.

廣義聯立方程模式與廣義路徑分析：存活或事件史資料(I)

行政院國家科學委員會專題研究計畫 成果報告

Draft

Is the Maximum Partial Likelihood Estimator for

Cox’s Proportional Hazards Model

Also a General Least Squares Estimator?

Fu-Chang Hu

,

Assistant Professor

Division of Biostatistics

Graduate Institute of Epidemiology

College of Public Health

National Taiwan University

Taipei, Taiwan 100

R.O.C.

[email protected]

Chyong-Mei Chen

,

Doctoral Candidate

Division of Biostatistics

Graduate Institute of Epidemiology

College of Public Health

National Taiwan University

Taipei, Taiwan 100

R.O.C.

[email protected]

June 4, 2003

SUMMARY

Key words

:

Contents

1

INTRODUCTION

1.1

Motivation

1.2

Objectives

2

REVIEW

2.1

Cox’s Proportional Hazards Model

2.2

Theory of Estimating Functions

3

MAIN RESULTS

3.1

Generalized Method of Moments Estimation

3.2

The Estimating Functions

3.3

The Nuisance Parameter Problem

3.4

Properties

3.5

Some Extensions

4

DISCUSSION

4.1

Summary

4.2

Future Work

5

ACKNOWLEDGEMENTS

6

APPENDIX

7

REFERENCES

行政院國家科學委員會專題研究計畫成果報告