www.elsevier.com/locate/jspi
Incomplete covariates data in generalized linear models
Yi-Hau Chen*, Hung Chen
Graduate Institute of Epidemiology and Department of Mathematics, National Taiwan University, Taipei 10018, Taiwan, ROC
Received 15 December 1997; accepted 9 November 1998
Abstract
We consider regression analysis in generalized linear models when part of the covariates are incomplete. The incomplete covariates could be due to measurement error, or missing for some study subjects. We assume there exists a validation sample in which the data are complete and which is a simple random subsample of the whole sample. Based on the idea of the projection-solution method in Heyde (1997, Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation. Springer, New York), a class of estimating functions is proposed to estimate the regression coefficients from the whole data. This method does not require a correctly specified parametric model for the incomplete covariates to yield a consistent estimate, and it avoids the 'curse of dimensionality' encountered in existing semiparametric methods. Simulation results show that the finite-sample performance and efficiency of the proposed estimates are satisfactory. The approach is also computationally convenient and hence applicable in routine data analysis. © 1999 Elsevier Science B.V. All rights reserved.
AMS classification: 62P10; 62P20
Keywords: Estimating function; Generalized linear models; Incomplete data; Measurement error; Missing covariates; Validation sample
1. Introduction
Consider the estimation problem in the context of generalized linear models (GLMs) with 'incomplete' covariate data. Here the covariates could be incomplete due to measurement error, or missing for some study subjects either by design or by happenstance. To facilitate the discussion, we first describe the problem setting and introduce the notation. According to Nelder and Wedderburn (1972), the log likelihood of the response y_i (i = 1, …, n), conditional on the q-dimensional covariate vector w_i,
∗Corresponding author.
2. The proposed method
Without loss of generality, let V = {1, …, m} and NV = {m+1, …, n} be the index sets for the validation and nonvalidation samples, respectively. First, consider the following example, which motivates the proposed method.
Example (Linear model). Consider the linear model y = βx + ε, with ε ~ N(0, σ²). The covariate x is observed only in the validation sample; instead, the surrogate z of x is always observed. Suppose that x = γz + u, where u ~ N(0, τ²) is independent of ε.
Recall that the complete-data score function for β is ∑_i x_i e_i(β) with e_i(β) ≡ (y_i − βx_i), and the maximum likelihood estimate for β can be obtained by using the EM algorithm (Dempster et al., 1977), which provides the solution to the estimating equation

∑_{i∈V} x_i(y_i − βx_i) + ∑_{i∈NV} E_{β,η}{x_i e_i(β) | y_i, z_i} = 0,

where η = {γ, σ², τ²}. Another two candidates for unbiased estimating functions for β are

∑_{i∈V} x_i(y_i − βx_i) + ∑_{i∈NV} E_η(x_i | z_i){y_i − βE_η(x_i | z_i)}
  = ∑_{i∈V} x_i(y_i − βx_i) + ∑_{i∈NV} γz_i(y_i − βγz_i)   (1)
and

∑_{i∈V} x_i(y_i − βx_i) + ∑_{i∈NV} E_η(x_i | z_i){y_i − βE_{β,η}(x_i | y_i, z_i)}.   (2)

By the result that

E_{β,η}(x | y, z) = [βτ²/(σ² + β²τ²)]y + [σ²γ/(σ² + β²τ²)]z,

Eq. (2) can be simplified to

∑_{i∈V} x_i(y_i − βx_i) + [σ²/(σ² + β²τ²)] ∑_{i∈NV} γz_i(y_i − βγz_i).

Recall that var_{β,η}(y | z) = σ² + β²τ². It is then expected that Eq. (2) is preferred to Eq. (1), since Eq. (2) takes into account the variance structure across the validation and nonvalidation samples. We give a formal justification of this point in the next section.
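The preference for Eq. (2) over Eq. (1) concerns efficiency only; both are unbiased at the true β, which is easy to check by Monte Carlo. The following sketch (our own illustrative code; the parameter values and variable names are assumptions, not the paper's) evaluates n⁻¹ times each estimating function at the true β on a large simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (not the paper's) parameter values for the example:
# y = beta*x + eps, eps ~ N(0, sigma2); x = gamma*z + u, u ~ N(0, tau2).
beta, gamma, sigma2, tau2 = 1.0, 0.8, 1.0, 0.5

n, m = 200_000, 40_000                          # total and validation sizes
z = rng.normal(0.0, 1.0, n)
x = gamma * z + rng.normal(0.0, np.sqrt(tau2), n)
y = beta * x + rng.normal(0.0, np.sqrt(sigma2), n)
V, NV = np.arange(m), np.arange(m, n)           # simple random split

def ee1(b):
    """Estimating function (1), scaled by 1/n: plug E(x|z) = gamma*z in."""
    return (np.sum(x[V] * (y[V] - b * x[V]))
            + np.sum(gamma * z[NV] * (y[NV] - b * gamma * z[NV]))) / n

def ee2(b):
    """Estimating function (2), scaled by 1/n: conditioning on y as well
    rescales the nonvalidation part by sigma2 / (sigma2 + b**2 * tau2)."""
    c = sigma2 / (sigma2 + b**2 * tau2)
    return (np.sum(x[V] * (y[V] - b * x[V]))
            + c * np.sum(gamma * z[NV] * (y[NV] - b * gamma * z[NV]))) / n

# Both are (asymptotically) unbiased at the true beta.
print(abs(ee1(beta)) < 0.02, abs(ee2(beta)) < 0.02)
```

Both quantities concentrate near zero at the true β; the two functions differ only in how the nonvalidation contribution is weighted.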
Our proposal is motivated by Eq. (2). Further, following the idea of the projection-solution method (Heyde, 1997), we replace the conditional expectations in Eq. (2) by least-squares predictors, which relaxes the assumptions about the model relating z to x.
For general GLMs with the canonical link, let

S_V(β) = ∑_{i∈V} w_i(y_i − μ_i),   (3)
which is the score function based on the validation data. It would serve as an unbiased estimating function if the validation sample is representative.
In the nonvalidation sample, for each β, we replace the unobserved quantities w and μ in Eq. (3) by their least-squares predictors ŵ ≡ α̂^T h and μ̂ ≡ g^T γ̂, respectively, where h = h(z) is a chosen set of basis functions of z, g = g(y, z) ≡ (y, h^T)^T, and α̂ and γ̂ are the corresponding coefficients, estimated from the validation sample.
Then, define the estimating function based on the nonvalidation data to be

S_NV(β) = ∑_{i∈NV} ŵ_i(y_i − μ̂_i) = ∑_{i∈NV} α̂^T h_i(y_i − g_i^T γ̂).
Observe that under some regularity conditions S_NV(β) is asymptotically unbiased, i.e., n^{-1}S_NV(β*) = o_p(1) as n → ∞, where β* is the true parameter. This follows from the law of large numbers and the identities

E{h(y − μ*)} = 0 and E{h(μ* − g^T γ*)} = 0,

where μ* is μ evaluated at β*, and γ* ≡ {E(gg^T)}^{-1}E(gμ*) is the limiting value of γ̂ evaluated at β*.
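Spelling out why these two identities yield the asymptotic unbiasedness: with α̂ and γ̂ replaced by their limits, each nonvalidation summand has mean

```latex
\[
\alpha^{*T} E\{h\,(y - g^{T}\gamma^{*})\}
  = \alpha^{*T}\bigl[\,E\{h(y-\mu^{*})\} + E\{h(\mu^{*}-g^{T}\gamma^{*})\}\,\bigr] = 0 ,
\]
```

the first term vanishing because E(y | w, z) = μ* at β*, and the second by the least-squares orthogonality of the residual μ* − g^T γ* to g (of which h is a subvector).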
Combining S_V(β) and S_NV(β), define the estimating function S(β) ≡ S_V(β) + S_NV(β); the proposed estimator β̂ is defined to be the solution to S(β) = 0.
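To make the construction concrete, here is a minimal sketch of the estimator for a scalar linear model with identity link (our own illustrative setup, loosely mirroring the simulation design of Section 4.1). With the identity link, γ̂(β) is linear in β, so S(β) = 0 has a closed-form root:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (ours): y = beta*x + eps, x observed only on the
# validation sample (the first m subjects), surrogate z always observed;
# x depends on z nonlinearly, which the basis h need not capture correctly.
beta_true, n, m = 1.0, 50_000, 10_000
z = rng.uniform(-2.0, 2.0, n)
x = 0.5 * z**2 + rng.normal(0.0, 0.7, n)
y = beta_true * x + rng.normal(0.0, 1.0, n)

def proposed_estimator(h_cols):
    """Solve S(beta) = S_V(beta) + S_NV(beta) = 0 for the scalar linear
    model.  With the identity link, gamma-hat(beta) = beta*c is linear in
    beta, so the estimating equation has a closed-form root."""
    H, Hbar = h_cols[:m], h_cols[m:]
    G = np.column_stack([y[:m], H])                    # g = (y, h^T)^T
    Gbar = np.column_stack([y[m:], Hbar])
    alpha = np.linalg.lstsq(H, x[:m], rcond=None)[0]   # w-hat = alpha^T h
    c = np.linalg.lstsq(G, x[:m], rcond=None)[0]       # gamma-hat = beta*c
    w_hat = Hbar @ alpha                               # predicted x, nonvalidation
    num = x[:m] @ y[:m] + w_hat @ y[m:]
    den = x[:m] @ x[:m] + w_hat @ (Gbar @ c)
    return num / den

h1 = np.column_stack([np.ones(n), z])            # 'wrong' basis (1, z)
h2 = np.column_stack([np.ones(n), z, z**2])      # 'correct' basis (1, z, z^2)
print(proposed_estimator(h1), proposed_estimator(h2))   # both near 1
```

Note that the basis enters only through the predictors ŵ and μ̂, so a misspecified basis affects efficiency but not consistency.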
To ensure the existence of a consistent solution to S(β) = 0, the derivative of n^{-1}S(β) evaluated at the true parameter is required to be asymptotically negative definite. This can be achieved if the coefficient α̂ is obtained by a weighted least-squares method, with {d_i, i ∈ V} as the weights. Assume that λ ≡ lim_n m/n ∈ (0, 1] and that the usual regularity conditions hold; then for β ∈ B, a neighborhood of β*, n^{-1}(∂/∂β)S(β) → −F(β) in probability as n → ∞, where

F(β) ≡ λE(dww^T) + (1 − λ)E(dwh^T){E(dhh^T)}^{-1}E(dhw^T).
The derivation is relegated to Appendix A.
Remark 1. Note that the unbiasedness of S(β) does not rely on the choice of the basis h; hence the proposed estimating procedure is robust to this choice, which relieves practitioners of the burden of specifying a correct parametric model relating z to x. Furthermore, the basis h is of finite dimension, which avoids the curse of dimensionality. A good choice of h will, however, better approximate w and μ, and may therefore improve the efficiency.
Remark 2. The estimating function S(β) is, however, not the projection of a quasi-score estimating function. Consequently, as commented in Heyde (1997), the optimality properties associated with the quasi-score estimating function are not preserved for S(β).
3. Asymptotic theory
We derive the asymptotic theory for β̂ under the regularity conditions usually assumed for maximum likelihood estimation in GLMs (see, for example, Fahrmeir and Kaufmann, 1985), together with additional assumptions that ensure the n^{1/2}-consistency of the least-squares estimates α̂ and γ̂ at β*. Detailed assumptions and a sketch of the proof are given in Appendix B.
Theorem. Suppose that the regularity conditions specified in Appendix B hold and that
(i) the validation sample is a simple random subsample of the total sample, and the validation fraction m/n → λ ∈ (0, 1] as n → ∞;
(ii) E(gg^T) and E(d*hh^T) exist and are positive definite, where d* is d evaluated at β*.
Let α* and γ* be, respectively, the limits of α̂ and γ̂ at β* as n → ∞. Then β̂ is consistent, and n^{1/2}(β̂ − β*) is asymptotically normal with mean zero and covariance matrix given by

Σ = F*^{-1}{λΣ_V + (1 − λ)Σ_NV + [(1 − λ)²/λ]Γ + (1 − λ)(Σ_C + Σ_C^T)}F*^{-1},   (4)

where

F* = λE(d*ww^T) + (1 − λ)E(d*wh^T){E(d*hh^T)}^{-1}E(d*hw^T),
Σ_V = E(d*ww^T),   Σ_NV = α*^T E{h(y − g^T γ*)²h^T}α*,
Γ = α*^T E{h(μ* − g^T γ*)²h^T}α*,
Σ_C = γ*_y E(d*wh^T)α*,   and γ*_y is the component of γ* corresponding to y.
The asymptotic variance (4) can be consistently estimated by replacing the population quantities in Eq. (4) with their empirical counterparts in the validation sample.
From the proof of the theorem (Appendix B) we can see that the component Σ_NV of Σ is the variance of S_NV(β*) if γ* were known, and Γ arises from the variation due to the estimation of γ* from the validation sample. These two components become the dominant terms of Σ as the validation fraction λ decreases, because they are multiplied by the factors (1 − λ) and (1 − λ)²/λ, respectively. Since γ̂ is a function of y_i, i ∈ V, S_V(β*) and S_NV(β*) are correlated, and the component Σ_C is the covariance between them.
Example (continued). Let β*, γ*, σ*², and τ*² be the true values of the corresponding parameters. In view of Eq. (4), the asymptotic variance of β̂ is

Σ = σ*²/F* + [σ*²/(σ*² + β*²τ*²)] · [(1 − λ)/λ] · β*²τ*²γ*²E(z²)/F*²,

where F* = λE(x²) + (1 − λ)γ*²E(z²) = λτ*² + γ*²E(z²). On the other hand, based on the estimating function in Eq. (1), with γ estimated from the validation sample, Gourieroux and Monfort (1981) proposed an estimator β̃ for β, which can also be viewed as the estimator obtained in the same way as β̂ but with g redefined as g ≡ h.
The asymptotic variance Σ̃ of β̃ can thus be derived from Eq. (4) by replacing g with h = z and setting Σ_C ≡ 0. Accordingly, we have

Σ̃ = σ*²/F* + [(1 − λ)/λ] · β*²τ*²γ*²E(z²)/F*².

Comparing the two expressions, Σ ⩽ Σ̃, with equality holding only when β* = 0 or τ*² = 0.
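Numerically, the two variance expressions at L-hand differ only through the factor σ*²/(σ*² + β*²τ*²) ⩽ 1 on the second term, which a quick plug-in check makes visible; the parameter values below are arbitrary illustrations, not the paper's:

```python
# Plug-in evaluation of the two asymptotic-variance formulas for beta-hat
# and beta-tilde in the linear-model example (illustrative values only).
def avars(beta, gamma, sigma2, tau2, lam, Ez2=1.0):
    k = gamma**2 * Ez2
    F = lam * tau2 + k                  # F* = lam*tau*^2 + gamma*^2 E(z^2)
    extra = (1.0 - lam) / lam * beta**2 * tau2 * k / F**2
    S_hat = sigma2 / F + sigma2 / (sigma2 + beta**2 * tau2) * extra
    S_tilde = sigma2 / F + extra
    return S_hat, S_tilde

for b in (0.0, 0.5, 2.0):
    s_hat, s_tilde = avars(b, 0.8, 1.0, 0.5, 0.2)
    print(b, s_hat <= s_tilde)          # the proposed estimator never loses
```

At β* = 0 the two variances coincide, as the equality condition indicates.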
4. Simulation studies
In this section we use simulation studies to address the following issues of interest: (1) the finite-sample performance of β̂; (2) the effect of the choice of the basis h on the efficiency of β̂; (3) the loss of efficiency of β̂ relative to the maximum likelihood estimator β̂_ML when the model for x given z is correctly specified. The studies reported in Section 4.1 address issues (1) and (2); Section 4.2 investigates the third issue under a simple scenario. The results presented are based on 500 replications.
4.1. Small sample properties and the choice of the basis
A linear regression model and a logistic regression model are considered in the present simulation study. In both cases the covariate z is generated from the uniform distribution U(−2, 2), and x is generated through x = 0.5z² + e, where e ~ N(0, σ²) is independent of z. In the linear-model case the outcome y is generated according to y = η + ε, where η = β*_0 + β*_1 z + β*_2 x and ε ~ N(0, 1) is independent of (z, x). In the logistic-model case y is binary with Pr(y = 1 | z, x) = exp(η)/{1 + exp(η)} and η = β*_0 + β*_1 z + β*_2 x. In both cases (β*_0, β*_1, β*_2) = (−2, 1, 1). The validation sample is randomly selected from the total of n = 300 observations with probability λ = 0.2, and for the remaining observations x is missing.
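The logistic case of this design can be fitted by solving S(β) = 0 numerically. The sketch below is our own illustration (with a much larger n than the paper's 300, purely so that a plain Newton iteration with a forward-difference Jacobian is stable); at each step the d-weighted least-squares coefficient α̂ and the regression γ̂(β) of μ(β) on g are recomputed from the validation sample:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative replication of the logistic design (sizes are ours).
n, lam = 20_000, 0.2
z = rng.uniform(-2.0, 2.0, n)
x = 0.5 * z**2 + rng.normal(0.0, 1.0, n)
W = np.column_stack([np.ones(n), z, x])           # w = (1, z, x)^T
beta_true = np.array([-2.0, 1.0, 1.0])
prob = 1.0 / (1.0 + np.exp(-(W @ beta_true)))
y = (rng.uniform(size=n) < prob).astype(float)

val = rng.uniform(size=n) < lam                   # MCAR validation indicator
H = np.column_stack([np.ones(n), z, z**2])        # basis h = (1, z, z^2)^T
G = np.column_stack([y, H])                       # g = (y, h^T)^T

def S(beta):
    mu = 1.0 / (1.0 + np.exp(-(W @ beta)))
    d = (mu * (1.0 - mu))[val][:, None]           # GLM weights on validation
    Hv, Wv, Gv = H[val], W[val], G[val]
    # alpha-hat: d-weighted least squares of w on h (validation only)
    alpha = np.linalg.solve(Hv.T @ (Hv * d), Hv.T @ (Wv * d))
    # gamma-hat: least squares of mu(beta) on g (validation only)
    gamma = np.linalg.lstsq(Gv, mu[val], rcond=None)[0]
    w_hat = H[~val] @ alpha                       # predictors, nonvalidation
    mu_hat = G[~val] @ gamma
    return Wv.T @ (y[val] - mu[val]) + w_hat.T @ (y[~val] - mu_hat)

# Newton iteration with a forward-difference Jacobian.
beta, h_step = np.zeros(3), 1e-5
for _ in range(25):
    s = S(beta)
    J = np.column_stack([(S(beta + h_step * np.eye(3)[k]) - s) / h_step
                         for k in range(3)])
    beta = beta - np.linalg.solve(J, s)

print(np.round(beta, 2))                          # close to (-2, 1, 1)
```

With h = (1, z)^T instead, the same routine remains consistent, in line with Remark 1.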
For two choices of the basis h, the 'wrong' choice h^(1) = (1, z)^T and the 'correct' choice h^(2) = (1, z, z²)^T, Table 1 presents the simulation results, including the estimated asymptotic relative efficiency (ARE) of β̂ versus the complete-case-only estimate β̂_V, assessed via Var̂(β̂_V)/Var̂(β̂), where Var̂(·) denotes the average of the estimated asymptotic variances. For comparison, we also present the results for β̃, obtained as β̂ but with g redefined as g ≡ h.
Several conclusions can be made from this empirical study:
1. Even for the 'wrong' choice h = (1, z)^T, the bias of β̂ is negligible, and the variance estimate is adequate.
2. In most cases the 'correct' choice h = (1, z, z²)^T yields a more efficient β̂ than the 'wrong' choice h = (1, z)^T does. The improvement is slight, however, when σ² = 1, that is, when x and z are weakly correlated.
3. In most cases, as expected, β̂ achieves higher efficiency than β̃ does.
Table 1
Results for β̃ and β̂: bias, variance (Var), estimated variance (Var̂), and asymptotic relative efficiency (ARE) relative to the complete-case-only estimate β̂_V; η = β*_0 + β*_1 z + β*_2 x; x = 0.5z² + e; e ~ N(0, σ²); n = 300; λ = 0.2; h^(1) = (1, z)^T; h^(2) = (1, z, z²)^T

σ²    h      bias(×10³)     Var(×10³)      Var̂(×10³)     ARE
             β̃      β̂      β̃      β̂      β̃      β̂      β̃     β̂

(a) Linear model
Estimates of β_1 (β*_1 = 1)
1     h^(1)  1.99   0.72   18.76  9.72   18.25  8.55   0.73  1.56
1     h^(2)  1.46   1.27   14.02  8.27   12.68  7.51   1.05  1.78
0.5   h^(1)  0.79   0.04   13.38  8.54   13.39  7.70   1.00  1.74
0.5   h^(2)  0.33   0.23   8.36   6.29   7.58   5.85   1.76  2.28
Estimates of β_2 (β*_2 = 1)
1     h^(1)  1.61   1.61   12.61  12.61  13.28  13.28  1.00  1.00
1     h^(2)  5.55   4.79   21.54  14.61  22.24  14.39  0.60  0.92
0.5   h^(1)  2.97   2.97   20.12  20.12  21.06  21.06  1.00  1.00
0.5   h^(2)  7.14   6.08   20.18  16.23  19.81  15.69  1.09  1.34

(b) Logistic model
Estimates of β_1 (β*_1 = 1)
1     h^(1)  39.04  48.77  64.56  67.95  70.75  64.59  2.23  2.44
1     h^(2)  23.56  27.60  45.44  48.40  49.18  47.27  3.19  3.32
0.5   h^(1)  44.34  52.66  55.24  56.85  60.49  55.96  2.65  2.86
0.5   h^(2)  26.04  28.05  33.04  34.33  33.21  33.79  4.84  4.75
Estimates of β_2 (β*_2 = 1)
1     h^(1)  44.41  66.52  17.76  19.81  17.13  17.76  1.13  1.09
1     h^(2)  25.33  38.49  13.09  13.73  13.36  13.12  1.44  1.47
0.5   h^(1)  67.79  82.05  25.21  26.90  24.06  24.58  1.12  1.10
0.5   h^(2)  35.57  41.06  11.77  11.83  12.64  12.56  2.13  2.14
4.2. Comparison of efficiency
To gauge the efficiency of the proposed method, we now consider a simple configuration for the measurement error problem. A linear model y = βx + ε is considered, where x ~ N(0, σ²) and ε ~ N(0, ω²) is independent of x. The covariate x is subject to measurement error; hence z = x + u is observed for all n = 100 subjects, where u ~ N(0, τ²) is independent of x and ε. Only in the validation sample is x ascertained. The maximum likelihood estimate β̂_ML of β is based on the likelihood

L = ∏_{i∈V} P_N(y_i; βx_i, ω²)P_N(x_i; ρz_i, η²) × ∏_{i∈NV} P_N(y_i; βρz_i, β²η² + ω²),

where ρ = σ²/(σ² + τ²), η² = σ²τ²/(σ² + τ²), and P_N(·; u, v) denotes the normal density function with mean u and variance v. The parameters (β, ω², ρ, η²) are estimated simultaneously. The true values of β, ω², and σ² are set to 1, 1, and 4, respectively. We choose h = z. For various values of λ and τ², Table 2 presents the estimated asymptotic relative efficiencies (ARE) of β̂, β̃, and β̂_V with respect to β̂_ML.
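The likelihood L can be coded directly and handed to a generic numerical optimizer; the sketch below (function and variable names are ours) evaluates log L:

```python
import math

def loglik(params, data_V, data_NV):
    """Log of the likelihood L above (names and data layout are ours).
    params = (beta, omega2, rho, eta2); data_V holds (y, x, z) triples,
    data_NV holds (y, z) pairs."""
    beta, omega2, rho, eta2 = params

    def lognorm(v, mean, var):              # log N(v; mean, var) density
        return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

    ll = 0.0
    for y, x, z in data_V:                  # P_N(y; beta*x, omega2) P_N(x; rho*z, eta2)
        ll += lognorm(y, beta * x, omega2) + lognorm(x, rho * z, eta2)
    for y, z in data_NV:                    # P_N(y; beta*rho*z, beta^2*eta2 + omega2)
        ll += lognorm(y, beta * rho * z, beta**2 * eta2 + omega2)
    return ll
```

Maximizing over (β, ω², ρ, η²) with any quasi-Newton routine gives β̂_ML; σ² and τ² can then be recovered from ρ and η² if desired.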
Table 2
Asymptotic relative efficiency of β̂, β̃, and the complete-case-only estimate β̂_V with respect to the maximum likelihood estimate β̂_ML: y = βx + ε; z = x + u; ε ~ N(0, 1); u ~ N(0, τ²); x ~ N(0, 4); β* = 1; h = z; g = (y, z)^T

τ²      λ = 0.2                 λ = 0.5
        β̂     β̃     β̂_V       β̂     β̃     β̂_V
10      0.48  0.18  0.69      0.78  0.45  0.82
2       0.76  0.34  0.61      0.87  0.59  0.77
1       0.91  0.59  0.52      0.92  0.73  0.71
0.5     1.00  0.81  0.42      0.96  0.86  0.65
0.1     1.00  1.00  0.24      1.00  1.00  0.54
Here we make some remarks on Table 2:
1. β̂ is highly efficient except when τ² is fairly large and λ is small. On the other hand, β̃ is satisfactory only when the measurement error variance τ² is very small relative to ω².
2. Discarding the nonvalidation data causes a substantial loss of efficiency when both λ and τ² are small. Nevertheless, when τ² is very large, that is, when z is a bad surrogate for x, β̂_V is more efficient than both β̂ and β̃; in such a case, extracting information from the nonvalidation sample may be fairly difficult.
5. An illustrative example
The data for this illustration are the Husbands and Wives Data from Hand et al. (1994). A random sample of 195 married women and their husbands was selected, and data on their heights and ages were recorded. The focus is the relationship between the husband's and the wife's heights, adjusting for the wife's age. A linear regression model is fitted in which the outcome variable is the wife's height (WHeight) and the covariates are the husband's height (HHeight) and the wife's age (WAge). There were 27 women with missing values in their ages. Since the data set also includes the husband's age (HAge), which is completely observed, we used it as a surrogate for the wife's age and chose h = (1, HHeight, HAge)^T. As an empirical check of the surrogate condition, namely that, controlling for WAge, HAge does not appear in the true model relating HHeight to WHeight, a linear regression model that further included HAge as a covariate was fitted to the validation data. The p-value of the significance test for HAge was 0.29; hence the surrogate condition seems reasonable. The results are presented in Table 3.
Although the validation sample in this study is fairly large, comprising 86% of the whole sample, incorporating the nonvalidation sample still gains about 10% in estimation efficiency for the coefficient of HHeight, and about 5% for the coefficient of WAge, compared with the analysis using the validation data only.
Table 3
Parameter estimates and standard errors (SE) for the Husbands and Wives Data

Covariate        Coefficient (SE)
                 Proposed method      Complete-case-only
HHeight (mm)     0.305 (0.062)        0.260 (0.069)
WAge (year)      −0.829 (0.386)       −0.894 (0.404)
Constant         1106.573 (111.750)   1188.998 (123.662)
6. Final discussion
In the analysis of incomplete covariate data, the association between x and z must be taken into account so that the information contained in the nonvalidation sample can be retrieved. The method proposed in this paper uses a parametric approach to obtain a 'working' association between x and z, from which an estimating function for β can be constructed. The unbiasedness of the proposed estimating function does not depend on the 'true' association between x and z; hence the proposed method is robust to the specification of the model for x given z. To enhance the efficiency of the proposed estimator, the data analyst may choose a better model relating z to x by means of the usual model-selection strategies.
The proposed estimating procedure can be extended to the case of a nonnatural link in the following way. Let w̃ = wθ′(η), where θ denotes the natural parameter, η the linear predictor, and θ′(η) ≡ dθ/dη. Note that in the nonnatural-link case the score function for the validation sample is of the form

S_V(β) = ∑_{i∈V} w̃_i(y_i − μ_i).

For a chosen basis h, redefine α̂ as before but with w replaced by w̃. Then the estimating function S(β) = S_V(β) + S_NV(β) is asymptotically unbiased at β = β*, and the theorem in Section 3 still holds with w replaced by w̃.
A limitation of the proposed approach is that the validation sample must be a simple random subsample of the whole sample. In the context of the missing-data problem, the proposed estimator is valid when the missing-data mechanism is missing completely at random (MCAR), as described in Rubin (1976). Extending the proposed method to more general missing-data mechanisms may require that the 'selection process' that identifies subjects as members of the validation sample be known up to an additional set of unknown parameters.
Acknowledgements
The authors are grateful to the referee for helpful comments.
Appendix A. Derivation of F(β)

Let W = (w_1, …, w_m)^T, H = (h_1, …, h_m)^T, H̄ = (h_{m+1}, …, h_n)^T, D = diag(d_1, …, d_m), G = (g_1, …, g_m)^T, Ḡ = (g_{m+1}, …, g_n)^T, and U = (μ_1, …, μ_m)^T. Write −(∂/∂β)S(β) = nF̂_n(β) − nR̂_n(β), where

F̂_n(β) = n^{-1}{W^T + α̂^T H̄^T Ḡ(G^T G)^{-1}G^T}(∂U/∂β)
        = n^{-1}{W^T DW + W^T DH(H^T DH)^{-1}H̄^T Ḡ(G^T G)^{-1}G^T DW}

and

R̂_n(β) = n^{-1}(∂α̂^T/∂β)H̄^T(ȳ − Ḡγ̂),

with ȳ = (y_{m+1}, …, y_n)^T. For β ∈ B, a neighborhood of β*, under some regularity conditions,

F̂_n(β) →_p λE(dww^T) + (1 − λ)E(dwh^T){E(dhh^T)}^{-1}E(hg^T){E(gg^T)}^{-1}E(dgw^T)
         = λE(dww^T) + (1 − λ)E(dwh^T){E(dhh^T)}^{-1}E(dhw^T),

since E(hg^T){E(gg^T)}^{-1} = (0, I_r), where I_r is the identity matrix of size r = dim(h) and 0 denotes the r-vector of 0's corresponding to the y-component of g. Also, n^{-1}H̄^T(ȳ − Ḡγ̂) →_p (1 − λ){E(hy) − E(hμ)} = 0. It then follows that, for β ∈ B,

F(β) ≡ −lim_n n^{-1}(∂/∂β)S(β) = λE(dww^T) + (1 − λ)E(dwh^T){E(dhh^T)}^{-1}E(dhw^T).
Appendix B. Consistency and asymptotic normality of β̂

First we state the regularity conditions needed in the proof of the theorem.
(R1) β ∈ Θ, a convex compact subset of R^q, and the true parameter β* lies in the interior of Θ;
(R2) (y_i, z_i, x_i), i = 1, …, n, are independently and identically distributed;
(R3) μ is twice differentiable in β for each w;
(R4) the matrix F* ≡ F(β*) exists and is positive definite;
(R5) E(sup_{β∈B}‖dww^T‖), E(sup_{β∈B}‖dwh^T‖), E(sup_{β∈B}‖(dhh^T)^{-1}‖), and E(sup_{β∈B}‖(∂α̂^T/∂β)h(y − μ)‖) are all finite for some neighborhood B of β*. Here ‖M‖ = (∑_{i,j} m_{ij}²)^{1/2} for M = (m_{ij}).

I. (Consistency) Following Foutz (1977), it suffices to check the following conditions to ensure the consistency of β̂: (a) the elements of (∂/∂β)S(β) exist and are continuous in β; (b) the matrix n^{-1}(∂/∂β)S(β) evaluated at β* is negative definite with probability going to 1 as n → ∞; (c) n^{-1}(∂/∂β)S(β) converges to −F(β) in probability uniformly for β ∈ B; (d) n^{-1}S(β*) = o_p(1) as n → ∞.

Condition (a) follows from (R3), and (b) holds by (R4) and the result in Appendix A. According to (R5), we can apply the strong law of large numbers for Banach-space-valued random variables to obtain the uniform convergence of n^{-1}(∂/∂β)S(β) (Fahrmeir and Kaufmann, 1985); thus (c) holds. Finally, statement (d) follows by the arguments in Section 2.
II. (Normality) Let m̄ ≡ n − m be the size of the nonvalidation sample. Then

n^{-1/2}S_NV(β*) = n^{-1/2} α̂*^T ∑_{i∈NV} h_i(y_i − g_i^T γ̂*)
 = (m̄/n)^{1/2} m̄^{-1/2} ∑_{i∈NV} α*^T h_i(y_i − g_i^T γ*)
   − (m/n)^{-1/2}(m̄/n) α*^T {∑_{i∈NV} h_i g_i^T / m̄}{m^{1/2}(γ̂* − γ*)} + o_p(1)
 ≡ (1 − λ)^{1/2}A − λ^{-1/2}(1 − λ) α*^T(B · C) + o_p(1),

where α̂* and γ̂* denote α̂ and γ̂ evaluated at β*, and A = m̄^{-1/2}∑_{i∈NV} a_i with a_i = α*^T h_i(y_i − g_i^T γ*), i ∈ NV. Recall that E{h(y − g^T γ*)} = 0, so that the a_i's are independent and identically distributed random vectors with mean zero and variance Σ_NV = α*^T E{h(y − g^T γ*)²h^T}α*. Hence A is asymptotically normal with mean zero and variance Σ_NV. Also, B ≡ ∑_{i∈NV} h_i g_i^T / m̄ →_p E(hg^T) by assumption (ii), and C ≡ m^{1/2}(γ̂* − γ*) is asymptotically normal with mean zero and variance Λ = R^{-1}E{g(μ* − g^T γ*)²g^T}R^{-1}, with R = E(gg^T). It is easy to see that {a_i, i ∈ NV} and B · C are asymptotically uncorrelated.
It is known that under the regularity conditions n^{-1/2}S_V(β) is asymptotically normal at β = β* with mean zero and variance λΣ_V, where Σ_V = E(d*ww^T). Note that n^{-1/2}S_V(β*) and n^{-1/2}S_NV(β*) are correlated through the term C. The covariance between them is given by (1 − λ)Σ_C, where

Σ_C = E{w(y − μ*)(g^T γ* − μ*)g^T}{E(gg^T)}^{-1}E(gh^T)α*
    = E{w(y − μ*)(g^T γ* − μ*)h^T}α*
    = γ*_y E(d*wh^T)α*;

the last equality holds from the result that g^T γ*, the population least-squares regression of μ* on g = (y, h^T)^T, can be written as g^T γ* = γ*_y y + (1 − γ*_y)h^T δ*, where h^T δ* is the population least-squares regression of μ* on h with δ* = {E(hh^T)}^{-1}E(hμ*), and

γ*_y = E(μ* − h^T δ*)² / {E(y − μ*)² + E(μ* − h^T δ*)²}.
As a result, as n → ∞, n^{-1/2}S(β*) is asymptotically normal with mean zero and variance

λΣ_V + (1 − λ)Σ_NV + [(1 − λ)²/λ]Γ + (1 − λ)(Σ_C + Σ_C^T),

where Γ = α*^T E(hg^T)Λ E(gh^T)α* = α*^T E{h(μ* − g^T γ*)²h^T}α*. Recall that n^{-1}(∂/∂β)S(β) →_p −F* at β = β*. By the regularity conditions and a Taylor expansion we thus conclude the result of the theorem.
References
Carroll, R.J., Wand, M.P., 1991. Semiparametric estimation in logistic measurement error models. J. Roy. Statist. Soc. Ser. B 53, 573–587.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39, 1–38.
Fahrmeir, L., Kaufmann, H., 1985. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist. 13, 342–368.
Foutz, R., 1977. On the unique consistent solution to the likelihood equations. J. Amer. Statist. Assoc. 72, 147–148.
Gourieroux, C., Monfort, A., 1981. On the problem of missing data in linear models. Rev. Econom. Stud. 48, 579–586.
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J., Ostrowski, E., 1994. A Handbook of Small Data Sets. Chapman & Hall, London.
Heyde, C.C., 1997. Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation. Springer, New York.
Nelder, J.A., Wedderburn, R.W.M., 1972. Generalized linear models. J. Roy. Statist. Soc. Ser. A 135, 370–384.
Pepe, M.S., Fleming, T.R., 1991. A non-parametric method for dealing with mismeasured covariate data. J. Amer. Statist. Assoc. 86, 108–113.
Robins, J.M., Hsieh, F., Newey, W., 1995. Semiparametric efficient estimation of a conditional density with missing or mismeasured covariates. J. Roy. Statist. Soc. Ser. B 57, 409–424.
Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592.