• 沒有找到結果。

異質變異數矩陣之穩健估計

N/A
N/A
Protected

Academic year: 2021

Share "異質變異數矩陣之穩健估計"

Copied!
47
0
0

加載中.... (立即查看全文)

全文

(1)

行政院國家科學委員會專題研究計畫 成果報告

異質變異數矩陣之穩健估計

研究成果報告(精簡版)

計 畫 類 別 : 個別型 計 畫 編 號 : NSC 98-2118-M-004-006- 執 行 期 間 : 98 年 08 月 01 日至 99 年 08 月 31 日 執 行 單 位 : 國立政治大學統計學系 計 畫 主 持 人 : 鄭宗記 計畫參與人員: 此計畫無其他參與人員 報 告 附 件 : 出席國際會議研究心得報告及發表論文 處 理 方 式 : 本計畫可公開查詢

中 華 民 國 99 年 09 月 26 日

(2)

On Robust Estimation of the Heteroscedasticity

Covariance Matrix

Tsung-Chi Cheng

1

Introduction

In a normal regression model, the assumption of homogenous variance is not always appropriate. When the disturbance process in a regression model entails heteroscedas-ticity, the standard inference procedure becomes invalid because of the inappropriate estimation of the standard error. A conventional way to overcome this problem in statistical and econometric literature is to specify the model with an assumed error structure and apply maximum likelihood estimation, generalized squares, or other ap-proaches, such as residual maximum likelihood (Verbyla 1993). In addition, Bianco, Boente, and di Rienzo (2000) and Hallina and Mizera (2001) consider the robust estimator when outliers exit in a heteroscedastic regression model. However, these approaches require a specific function of the error variance. There is usually little or no guidance regarding the form of heteroscedasticity though.

Robust estimations and diagnostics for linear regression models with the assump-tion of constant errors have been widely discussed (see Atkinson (1985); Rousseeuw and Leroy (1987); Atkinson and Riani (2000)). Swamping (i.e. when inliers appear as outlying) and masking (i.e. when outliers appear as inlying) effects due to multiple outliers can be avoided by robust diagnostics. Both outliers and heteroscedasticity in the data also can lead to the inflation of the estimate of scale and deteriorate both the swamping and masking effects. For a successful analysis with regard to out-liers and leverage points, a robust estimation is required, preferably one with a high

Department of Statistics, National Chengchi University, 64 ZhihNan Road, Section 2, Taipei

(3)

breakdown point. The (finite) sample breakdown point of an estimator is the smallest proportion of observations that, when altered, can cause the value of the estimator to become arbitrarily large or small. Therefore, one of the desirable properties for a robust estimator is a high breakdown point that can handle multiple outliers.

The approach proposed in this article employs the weighted least absolute de-viation (WLAD) estimator suggested by Hubert and Rousseeuw (1997) and Giloni, Simonoff, and Sengupta (2006) to deal simultaneously with outliers and heteroscedas-ticity in the linear regression model without specifying the variance function. The difficulty is differentiating those observations that inflate the variation and belong to outlying points from those attributable to the (natural) heteroscedastic structure of the data. A jigsaw plot that uses the simulated envelopes of Atkinson (1985) for the absolute standardized residuals can represent both characteristics for each case in the dataset. Furthermore, plugging the resulting residuals into the estimation of the heteroscedasticity consistent covariance matrix (HCCM) yields a robust quasi-t test for the estimated coefficients.

2

Weighted least absolute deviation estimator

Consider the linear model

y = Xβ + ², (1)

where y is an n×1 vector of the response variable, β = (β0, · · · , βp) denotes (p+1)×1

regression coefficients, X = (xT

1, · · · , xTn) is an n × (p + 1) design matrix, and ² is

the n × 1 vector of random errors. The random vector ² = {²1, · · · , ²n} is assumed

to be independent and follows MN (0, Ω), where Ω = diag(ω1, · · · , ωn). Given the

form of the variance function, ωi, the maximum likelihood estimation (MLE) for

model (1) has been discussed by Harvey (1976) and Aitkin (1987) and the residual maximum likelihood (REML) estimation is presented by Patterson and Thompson (1971), Harville (1974), and Cooper and Thompson (1977). However, outliers can influence both MLE and REML (see Cheng (2010)). Without specifying the variance function, the WLAD procedure for model (1) essentially follows both Hubert and Rousseeuw (1997) and Giloni et al. (2006).

(4)

Hubert and Rousseeuw (1997) propose the RDL1 estimator for robust regression

with both continuous and categorical predictors. The RDL1 consists of three stages:

identifying leverage points, downweighting the leverage points when estimating the parameters, and estimating the residual scale. To adapt the RDL1 estimator for

model (1) with heteroscedastic errors, this study first computes the robust distance of continuous regressors (if discrete regressors are included in X, they are excluded at this stage, as in Hubert and Rousseeuw (1997)), as follows:

RD(xi) =

q

(xi− t)TC−1(xi− t), i = 1, · · · , n, (2)

where t and C are the robust location and scale estimates of the X matrix, re-spectively. Hubert and Rousseeuw (1997) use the minimum volume ellipsoid (MVE) estimator for t and C. These distances (2) can identify the leverage points for the space of continuous regressors and serve as the weights for estimating the regression coefficients by a weighted L1 procedure in the second stage. The current approach

applies the minimum covariance determinant (MCD) estimator to obtain the robust location and scale estimates of the X matrix, and then obtains the distance (2) for the weights. Both MVE and MCD estimators provide a high breakdown of the robust es-timation of multivariate location and shape (Rousseeuw and Leroy 1987). Moreover, Butler, Davies, and Jhun (1993) show that the MCD estimator has better theoretical properties than the MVE. Woodruff and Rocke’s (1994) empirical results show that the MCD is preferable to the MVE for their applications. Croux and Haesbroeck (1999) discuss other statistical properties and the robustness of the MCD.

At the second stage, the parameters β of model (1) can be estimated by min β n X i=1 wi|ri(β)|, (3) where ri(β) = yi− xTi β, and wi = min ( 1, p (RD(xi))2 ) (4) for i = 1, 2, · · · , n. The final stage requires calculating the estimate of the scale of the residuals:

ˆ

(5)

where the constant 1.4826 leads to a consistent estimator under a normality assump-tion. The standardized residual for the ith case then can be defined as

ti =

ri

ˆ

σ. (6)

An observation is flagged as an outlier if its absolute value from equation (6) exceeds 2.5 with an assumption of constant errors. The breakdown property of RDL1 is

referred to Hubert and Rousseeuw (1997).

Maronna and Yohai (2000) suggest there must be some null residuals by a well-known property of the weighted L1 estimate (3) lead to an underestimation of the

error variability. Instead of equation (5), they suggest using ˆ

σ∗ = 1.4826s, (7)

where s∗ is the median of absolute non-null residuals. The standardized residual (6)

then is replaced by ˆσ∗, as follows:

t∗i = ri ˆ

σ∗. (8)

Maronna and Yohai’s suggestions are appropriate for the following discussion. Fur-thermore, the entire computation is easy to conduct. Both MCD and L1 estimations

are built-in functions in R and other commercial statistical packages, such as SPLUS and SAS.

Giloni et al. (2006) discuss some properties of the weighed L1 estimator and

suggest using the weight qminj(hjj)hii, where hij is the (i, j)th element of the hat

matrix, H = X(XTX)−1XT. This method enables them to match the results of

Ellis and Morgenthaler (1992), who argue that breakdown is related to distance rather than squared distance.

2.1

Jigsaw plot

As mentioned in the previous section, the critical values that flag the possible outliers for standardized residuals (6) and (8) are ±2.5 under constant errors for model (1). However, these values may not fulfil the requirement when heteroscedasticity exists in the data, and therefore, the estimation uses the unequal weights. To identify outlying

(6)

cases with an approach based on the WLAD estimate, the proposed method uses the half normal plot with envelopes (see Section 4.2 of Atkinson (1985)). The envelope then determines the threshold for the identification of outliers.

To construct the envelopes, the matrix X is fixed, such that the weight (4) remains the same throughout the simulation procedure. The response variable is generated from a normal distribution with zero mean, and its variance appears in Cook and Weisberg’s (1983) study as follows for the ith case:

σ2{(1 − h

ii)wi+

X

k6=i

wihik/(1 − hii)}. (9)

Suppose that on the mth simulation of the n observations, the absolute values of the standardization residual is denoted by t∗

mi, i = 1, · · · , n. The corresponding order is

given by t∗

m(i). The simulation can be repeated a fixed number of times, such as 80

times, which roughly coincides with the previous cut point, 2.5, for the percentile under normality. The simulated limits are given by

t∗l(i)= min

m t m(i),

t∗

u(i)= maxm t∗m(i), (10)

where t∗

l(i)and t∗l(i)form the lower and upper envelopes, respectively. The lower bound

may be (near) 0 for all i due to null residuals of the L1 estimate. This scenario results

in a jigsaw shape when plotting the lower and upper bounds from equation (10) together.

There are two kind of estimates for σ in equation (9), which differentiate outliers and heteroscedastic structures in the data. One is the estimated scale from equa-tion (7), and the other is the standardized residual in equaequa-tion (8) for case i. The former provide a threshold for the test of outliers, whereas the latter reflects the het-eroscedastic error for each case. The weighted hat matrix also might be used instead of H, which may yield some different jigsaw plots according to the data structure. Nevertheless, the conclusion does not vary in either case. The weighted hat matrix is denoted by Hw = Xw(XTwXw)−1XTw, where Xw = W X, and W is a matrix with

(7)

2.2

Types of outlying cases

Rousseeuw and van Zomeren (1990) propose a diagnostic plot of Studentised residuals (based on the least median squares estimate) versus robust distances (using MVE) of the X matrix, on the basis of which they classify the different types of outliers. Rousseeuw and Van Driessen (1999) adapt it as a D-D plot for the standardized residuals (using the least trimmed squares estimates) versus robust distances based on MCD. Employing a similar idea, this subsection describes the data types for the linear regression model with heteroscedastic error.

Part (a) of Figure 1 presents a scatter plot of 30 simulated data points, in which they are classified according to regular point, good leverage point, vertical outlier, and bad leverage point. Applying the proposed approach to this dataset results in Part (b) of Figure 1, which shows the diagnostic plot based on the WLAD estimate and locates all observations into their corresponding areas. The cutoff lines to separate these areas are ±2.5 and qχ2

p,0.975 (here, p = 1) for horizontal and vertical lines,

respectively.

In Figure 1, Parts (c) and (d) are the jigsaw plots of the absolute values of the standardized residuals (8), denoted by the symbol ×, together with the envelopes generated by using (7) and (8) for the σ of equation (9), respectively. The former reveals that cases 29 and 30 are outlying cases, whereas the latter indicates which observations are attributable to the heteroscedastic structure in the data. Both plots coincide with the data pattern in Part (a) in terms of outliers and heteroscedasticity. The jigsaw shape in Part (c) provides the corresponding threshold for each observation by identifying whether it is an outlier, which takes into account the unequal weight property. Regular points with labeled case numbers yield larger values for the upper bound of the envelope in Part (d), which indicates the source of heteroscedasticity.

2.3

Artificial data

This subsection identifies high leverage points for the heteroscedastic regression model using simulated data. The following model is employed for good data

(8)

where xi is generated from a uniform distribution U(1, 7) and ²i =

5xiη. Here,

ηiid∼ N(0, 0.52). For bad data, the following bivariate normal distribution applies:

à yi xi ! ∼ MN Ãà 3 20 ! , à 0.5 0 0 0.5 !! , i = 39, . . . , 50.

Therefore, the data contain 24% outliers. Part (a) of Figure 2 shows the scatter plot for this artificial dataset, which actually resulted from an adaption of the famous synthetic data provided by Rousseeuw (1984).

The analysis of this data set begins by applying the MM estimator of Yohai (1987). The function lmrob in the library robustbase provides the solution of an

MM -regression estimator. It uses a bi-square re-descending score function, and by

default, it returns a highly robust and highly efficient estimator. The computation of the robust standard errors relies on the formulas provided by Croux, Dhaene, and Hoorelbeke (2003). Part (b) in Figure 2 presents the resulting diagnostic plot, which identifies 12 bad leverage cases as good leverage points and three good observations as vertical outliers. Without taking into account heteroscedastic errors, the swamping and masking effects exist even though the high breakdown estimator is used.

On the contrary, WLAD successfully identifies leverage points, as shown in the standardized residual plot and jigsaw plot of Parts (c) and (d) of Figure 2, respec-tively. The latter presents the heteroscedastic configuration as a dashed line as well as the cutoff values for outliers, depicted by a doted line. These leverage points cause heteroscedasticity in the data as well, which may partly explain the existence of the masking and swamping effects when the MM estimate gets applied to these data.

3

Estimation of heteroscedasticity consistent

co-variance matrix

White (1980) proposes an estimator of the variance covariance matrix of the least squares regression coefficient that is consistent in certain conditions, which is also known as the HC0 estimator. Tests based on a heteroscedasticity-consistent covari-ance matrix (HCCM) estimator are popular in application, because there is no need to specify the structural form of heteroscedasticity, and it is easy to compute.

(9)

Despite this popularity of White’s HC0 estimator, several critiques and improve-ments have been proposed. Chesher and Jewitt (1987) show that the estimator ex-hibits bias even for large samples under certain regression designs. MacKinnon and White (1985) thus propose a close variant form of HC0 based on the unreplicated “almost unbiased estimator” of Horn , Horn, and Duncan (1975). Long and Ervin (2000) compare several HCCM estimators with Monte Carlo studies. Cribari-Neto (2004) proposes a new estimator for HCCM that takes into account the leverage ef-fect of the design matrix on associated quasi-t tests. For some additional alternatives, readers should turn to Bera , Surprayitno. and Premaratne (2002).

Several authors also argue that the leverage points are more decisive for the finite sample behavior than the degree of heteroscedasticity in the HCCM estimation (see Chesher and Jewitt 1987; Kempthorne and Mendel 1990; Furno 1997; Cribari-Neto and Zarkos 2001; and Cribari-Neto 2004). However, these approaches are based on ordinary least squares (OLS) estimator, which is notoriously influenced by outliers. Zhou and Portnoy (1998) provide inferential results derived from heteroscedastic mod-els based on regression quantiles, where the median regression is a special case. The variance function for weights is specified but may be estimated by regressing the local estimates of standard errors on regression.

The most popular regression estimator for model (1) is the OLS estimate of β, as given by ˆβ = (XTX)−1XTy. It is unbiased, and the corresponding variance

covariance matrix V ar(ˆβ) = Ψ denoted by

Ψ = (XTX)−1XTΩX(XTX)−1. (12)

Under homoscedasticity, this equation simplifies to (XTX)−1σˆ2, where ˆσ2 =Pe2

i/(n−

p) denotes the estimated variance of model (1). The OLS estimator for the regression

coefficients is consistent but inefficient in the general linear model with heteroscedastic errors.

Various heteroscedasticity consistent (HC) estimators for Ψ have been suggested and can be constructed by plugging an estimate of ˆΩ = diag(ˆω1, · · · , ˆωn) into equation

(12). These estimators differ in the choice of ˆωi, given as follows:

(10)

HC0: ˆωi = e2i, (14) HC1: ˆωi = n n − pˆe 2 i, (15) HC2: ˆωi = e 2 i 1 − hii , (16) HC3: ˆωi = e2 i (1 − hii)2 , (17) HC4: ˆωi = e2 i (1 − hii)δi , (18)

where δi = min{4, hii/¯h}, and ¯h is the average of hii, i = 1, · · · , n.

Equation (13) yields the standard estimator ˆΨ for homoscedastic errors; the other all lead to different kinds of HC estimators. The estimators HC1, HC2, and HC3, according to MacKinnon and White (1985), improve the performance in small sam-ples. Long and Ervin (2000) conduct an extensive simulation study based on sample samples and conclude that HC3 provides the best performance. Cribari-Neto (2004) instead recommends the estimator HC4, which takes into account the leverage effect of the design matrix. Zeileis (2004) provides an R package for all these HC estimates.

3.1

Robust HCCM estimator

It is well-established that HCCM is influenced by outliers, especially the leverage points (e.g., Cribari-Neto (2004)). All the estimates in equations (13) to (18), based on OLS, provide consistent results under heteroscedasticity, but this property may vanish when outliers exist in the data. This subsection therefore contains the robust estimation for (12) that avoids the influence of outliers, using the approach discussed in Section 2.

With the WLAD estimate, it is possible to consider the choices of ˆωi, which are

analogous to those of classical HCCM as follows:

RHC0: ˆωi = ri2, (19) RHC1: ˆωi = n n − p − 1r 2 i, (20) RHC2: ˆωi = r2 i 1 − hwii , (21)

(11)

RHC3: ˆωi = r

2

i

(1 − hwii)2

. (22)

In these cases, ri is the ith residual based on the weighted L1 estimate in equation

(3), and hwii denotes the ith diagonal element of the weighted hat matrix Hw.

3.2

Simulation study

A examination of the capability of the proposed robust test procedure simulates model (1) under heteroscedasticity. The data generation follows the method described in Subsection 2.3, with a focus on the problem of bad leverage points. The good data can be generated by the following model:

y = β0+ β1x1+ · · · + β4x4+ ²,

where the parameter β0 = 20, and all other β are 1. The values of x1can be generated

from a uniform distribution with values between 0 and 10, and the other x variables have values between 0 and 20. To simplify the study, only one explanatory variable,

x1, is related to the error function, namely, ² ∼ N(0, exp(σ2i)), where σi = δ0+ δ1x1i.

The values of the parameters are set to δ0 = 0.001 and δ1 = 0.0, 0.3, 0.6, or 0.9

(these different values of δ1 are denoted by data types a, b, c, and d, respectively, in

the subsequent discussion). The differing values of δ1 produce a constant error and

a relatively moderate to very severe degree of heteroscedasticity. The “bad” data derive from a multivariate distribution with the following form:

à xT i yi ! ∼ MN Ãà xT m ym− 20 ! , diag(0.252, · · · , 0.252) ! ,

where xm is the maximum value of good x2 plus 10, and ym denotes the smallest

values of good yi. The adjustment of xm and ym allows for more distinct distances

between bad and good data when heteroscedasticity is more severe in the data. The sample sizes are 50, 100, 200, and 400, and each dataset contains 0%, 5%, 10%, 15%, and 20% outliers. One thousand replications compare the coverage of

β when the robust results from equations (19) to (22) help estimate equation (12).

The comparison of the ratio of the number of tests H0 : βj = β0j, j = 0, 1, · · · , 4,

(12)

the average of the empirical p-values for tests of βj according to different approaches,

proportions of outliers, sample sizes, and the severity of the heteroscedasticity. All classical HCCM using (14) to (18) are spoiled by outliers, though they may have good properties without any single outlier in the data, as discussed in Long and Ervin (2000). Therefore, only the results pertaining to HC3 are reported here.

The robust standard error of the MM estimator proposed by Croux et al. (2003) provides reasonable results when the degree of heteroscedasticity is not severe and/or the proportions of outliers are not too large in the data. The sample size is a factor that influences the behavior of the test with regard to data types c and d with the

MM estimate. The different versions of the robust HCCM from equations (19) to

(22) supply similar results, regardless of the configurations of the data structure. All empirical p-values are close to 0.05. These four robust HCCM yield almost the same results for the same dataset, whereas the classical HCCM in equations (14) to (18) produce quite varied outcomes. The quasi-t tests using the robust HCCM also can resist bad leverage points.

References

Aitkin, M., 1987. Modelling variance heterogeneity in normal regression using GLIM. Applied Statistics 36, 332-339.

Atkinson, A.C., 1985. Plots, Transformations and Regression. Oxford University Press, Oxford.

Atkinson, A.C., Riani, M., 2000. Robust Diagnostic and Regression Analysis. Springer, New York.

Bianco, A., Boente, G., di Rienzo, J., 2000. Some results for robust GM-based estimators in heteroscedastic regression models. Journal of Statistical Planning and Inference 89, 215-242.

Bera, A.K., Surprayitno, T., Premaratne, G., 2002. On some heteroskedasticity-robust estimators of variance matrix of the least-squares estimators. Journal of

(13)

Statistical Planning and Inference 108, 121-136.

Butler, R.W., Davies, P.L., Jhun, M., 1993. Asymptotics for the minimum covari-ance determinant estimator. The Annals of Statistics 21, 1385-1400.

Cheng, T.-C., 2010. Discussion: the forward search: theory and data analysis. Journal of the Korean Statistical Society 39, 153-159.

Chesher, A., Jewitt, I., 1987. The bias of a heteroskedasticity consistent covariance matrix estimator. Econometrica 55, 1217-1222.

Cook, R. D., 1986. Assessment of local influence (with discussion). Journal of the Royal Statistical Society B 48, 133-169.

Cook, R.D., Weisberg, S., 1983. Diagnostics for heteroscedasticity in regression. Biometrika 70, 1-10.

Cooper, D.M., Thompson, R., 1977. A note on the estimation of the parameters of the autoregressive-moving average process. Biometrika 64, 625-628.

Cribari-Neto, F., 2004. Asymptotic inference under heteroskedasticity of unknown form. Computational Statistical and Data Analysis 45, 215-233.

Cribari-Neto, F., Zarkos, S.G., 2001. Heteroskedasticity-consistent covariance ma-trix estimation: White’s estimator and the bootstrap. Journal of Statistical Computation and Simulation 68, 391-411.

Cribari-Neto, F., Zarkos, S.G., 2004. Leverage-adjusted heteroskedastic bootstrap methods. Journal of Statistical Computation and Simulation 74, 215-232. Croux, C., Dhaene, G., Hoorelbeke, D., 2003. Robust standard errors for robust

estimators. Discussion Papers Series 03.16, K. U. Leuven, CES.

Croux, C., Haesbroeck, G., 1999. Influence function and efficiency of the mini-mum covariance determinant scatter matrix estimator. Journal of Multivariate Analysis 71, 161-190.

(14)

Draper, N.R., Smith, H., 1998. Applied Regression Analysis, 3rd ed., New York: John Wiley.

Ellis, S.P., Morgenthaler, S., 1992. Leverage and breakdown in L1-regression.

Jour-nal of the American Statistical Association 87, 143-148.

Furno, M., 1997. A robust heteroskedasticity consistent covariance matrix estimator. Statistics 30, 201-219.

Giloni, A., Simonoff, J.S., Sengupta, B., 2006. Robust weighted LAD regression. Computational Statistical and Data Analysis 50, 3124-3140.

Greene, W., 1997. Econometric Methods, 3rd ed., Prentice-Hall, Englewood Cliffs, NJ.

Hallina, M., Mizera, I., 2001. Sample heterogeneity and M-estimation. Journal of Statistical Planning and Inference 93, 139-160.

Harvey, A.C., 1976. Estimating regression models with multiplicative heteroscedas-ticity. Econometrika 38, 375-386.

Harville, D.A., 1974. Bayesian inference for variance components using only error contrasts. Biometrika 61, 383-385.

Horn, S.D., Horn R.A., Duncan, D.B., 1975. Estimating heteroscedastic variances in linear model. Journal of the American Statistical Association 70, 380-385. Hubert, M., Rousseeuw, P.J., 1997. Robust regression with both continuous and

binary regressors. Journal of Statistical Planning and Inference 57, 153-163. Kempthorne, P.J., Mendel, M.B., 1990. Comment, Journal of the American

Statis-tical Association 85, 647-648.

Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 211-244. Long, J.S., Ervin, L.H., 2000. Using heteroskedasticity consistent standard errors in

(15)

MacKinnon, J.G., White, H., 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econo-metrics 29, 305-325.

Maronna, R.A., Yohai, V.J., 2000. Robust regression with both continuous and categorical predictors. Journal of Statistical Planning and Inference 89, 197-214.

Patterson, H.D., Thompson, R., 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 54, 545-554.

Rousseeuw, P.J., 1984. Least median of squares regression. Journal of the American Statistical Association 79, 871-880.

Rousseeuw, P.J., Leroy, A.M., 1987. Robust Regression and Outlier Detection. John Wiley, New York.

Rousseeuw, P.J., van Driessen, K., 1999. A fast algorithm for the minimum covari-ance determinant estimator. Technometrics 41, 212-223.

Rousseeuw, P.J., van Zomeren, B.C., 1990. Unmasking multivariate outliers and leverage points (with discussion). Journal of the American Statistical Associa-tion 85, 633-651.

Verbyla, A.P., 1993. Modelling variance heterogeneity: residual maximum likelihood and diagnostics. Journal of the Royal Statistical Society B 55, 493-508.

Wei, B.-C., Hickernell, F.J., 1996. Regression transformation diagnostics for ex-planatory variables. Statistica Sinica 6, 433-454

Weisberg, S., 1980. Applied Linear Regression. John Wiley, New York.

White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-838.

(16)

Woodruff, D.L., Rocke, D.M., 1994. Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association 89, 888-896.

Yohai, V.J., 1987. High breakdown-point and high efficiency estimates for regression. The Annals of Statistics 15, 642-665.

Zeileis, A., 2004. Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software 11, 1-17.

Zhou, K.Q., Portnoy, S.L., 1998. Statistical inference on heteroscedastic models based on regression quantiles. Journal of Nonparametric Statistics 10, 239-260.

(17)

0 5 10 15 20 0 5 10 15 20 x y (a)

good leverage point

vertical outlier

bad leverage point

28 29 30 7 13 5 12 14 25 2 0 1 2 3 4 −5 0 5 RDi ti * (b) 28 29 30 x x x x x x xx x x x x xx x x x x x xx x x xx xx x x x 0 5 10 15 20 25 30 0 2 4 6 8 ti * (c) 20 10 23 16 3 1 19 2218 6 26 4 11 15 27 21 17 24 9 8 25142 12 5 28 13 730 29 x x x x x x x x x x x x x x x x x x x x x x xx x x x x x x 0 5 10 15 20 25 30 0 10 20 30 40 50 60 ti * (d) 20102316 31 192218 6 26 4 11 15 27 211724 9 8 2514 2 12 528 13 7 30 29

(18)

5 10 15 20 5 10 15 x y (a) 0 1 2 3 4 5 6 7 −2 −1 0 1 2 3 RDi ti (b) 0 10 20 30 40 50 −10 −5 0 Index ti * (c) xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxx x xxxxxxxxxxx x 0 10 20 30 40 50 0 5 10 15 20 ti * 18364 614372616 34 21535 19 11 2232 17 7 205 27 1238 21 30 28 10 25 24 8 1 3129 33 133 9 23 42 48 47 44 45 46 41 49 39 50 43 40 (d)

Figure 2: Simulated data with bad leverage points: (a) scatter plot; (b) diagnostic plot based on MM estimate; (b) standardized WLAD residual plot; (d) jigsaw plot.

(19)

observed probability of type I error por tion of outliers 0 0.05 0.1 0.15 0.2 0.0 0.2 0.4 0.6 0.8 1.0 a HCCM3 b HCCM3 0.0 0.2 0.4 0.6 0.8 1.0 c HCCM3 d HCCM3 0 0.05 0.1 0.15 0.2 a MM b MM c MM d MM 0 0.05 0.1 0.15 0.2 a RHCCM0 b RHCCM0 c RHCCM0 d RHCCM0 0 0.05 0.1 0.15 0.2 a RHCCM1 b RHCCM1 c RHCCM1 d RHCCM1 0 0.05 0.1 0.15 0.2 a RHCCM2 b RHCCM2 c RHCCM2 d RHCCM2 0 0.05 0.1 0.15 0.2 a RHCCM3 0.0 0.2 0.4 0.6 0.8 1.0 b RHCCM3 c RHCCM3 0.0 0.2 0.4 0.6 0.8 1.0 d RHCCM3 n=50 n=100 n=200 n=400

Figure 3: Average of empirical p-values for t tests based on different HCCM estimates and approaches

(20)

Monitoring Profile Based on a Linear Regression Model with

Correlated Errors

Tsung-Chi Cheng

Department of Statistics

National Chengchi University

64 Zhih-Nan Road, Section 2

Taipei 11623, Taiwan

E-mail: chengt@nccu.edu.tw

(21)

Profile monitoring

Profile monitoring is the use of control charts for cases in

which the quality of a process or product can be characterized

by a functional relationship between a response variable and

one or more explanatory variables.

Linear regression model: Kang and Albin (2000); Kim

et al.

(2003); Mahmoud and Woodall (2004); Wang and Tsung

(2005); Gupta

et al.

(2006); Mahmoud

et al.

(2006)

Nonparametric regression model: Zhou

et al.

(2007)

Nonlinear mixed model: Jensen et al. (2008); Jensen and

Birch (2009)

General: Woodall

et al.

(2004); Woodall (2007)

A simple linear profile with AR(1) error: Noorossana

et al.

(22)

Linear regression model with ARMA errors

Consider the linear regression model

y

t

= x

Tt

β

+ 

t

,

t = 1, 2, · · · , T

(1)

where

y

t

is the response variable,

x

t

is

k × 1

vector of explanatory variables, and

β

is

a vector of unknown parameters.

The random error



t

follows an ARMA process, which can be expressed as

Φ(B)

t

= Θ(B)ν

t

,

(2)

where

Θ(B)

=

1 + θ

1

B + . . . + θ

q

B

q

Φ(B)

=

1 − ϕ

1

B − ϕ

2

B

2

− . . . − ϕ

p

B

p

and

ν

t

∼ W N (0, σ

2

)

.

(23)

Maximum likelihood estimation

The computation for the estimates can be easily

implemented by means of converting models

(1) and (2) into the state space form and

applying the Kalman filter recursive approach.

See Harvey (1989); Durbin and Koopman

(2001).

(24)

Hotelling’s

T

2

test for coefficients

To monitor the departure of coefficients,

δ

= {β

0

, β

1

, · · · , β

k

, φ

1

, · · · , φ

p

, θ

1

, · · · , θ

q

}

,

from the profile models (1) and (2) applied to

m

datasets, the analogous Hotelling’s

T

2

test is then

T

1i2

= (ˆ

δ

i

− ¯

δ)

T

−1

δ

i

− ¯

δ), i = 1, 2, · · · , m,

where

δ

ˆ

i

denotes the estimate of

δ

for the

i

th dataset,

δ

¯

entails the averages of all

ˆ

δ

i

’s, and

∆ =

m

X

i=1

δ

i

− ¯

δ

)(ˆ

δ

i

− ¯

δ

)

T

/(m − 1)

.

The

100(1 − α)

percentile of the

F

distribution can be used to construct an upper

control limit (UCL) represented by

r(T −1)

(T −r)

F

α,r,T −r

for phase I control chart

r(T +1)(T −1)

T(T −r)

F

α,r,T −r

for phase II monitoring scheme

(25)

T

2

test based on residuals

If

e

i

denotes the

T × 1

residual vector for the

i

th dataset

and

σ

ˆ

i

2

is the corresponding estimate of

σ

2

for dataset

i

,

then we check the stability of the variance,

σ

2

, in the

profile using the following test statistic,

T

2i

2

= (e

i

− 0)

T

Σ

−1

e

(e

i

− 0), i = 1, 2, · · · , m,

where

Σ

e

= ¯

σ

2

I

,

σ

¯

2

is the average of all

σ

ˆ

i

2

’s, and

I

is the

identity matrix.

The UCL for this test statistic is

χ

2

α,T −1

, which denotes the

100(1 − α)

percentile of the

χ

2

distribution with

T − 1

(26)

Simulation study

Consider the following model

y

t

=

β

0

+ β

1

x

1t

+ β

2

x

2t

, t = 1, 2, · · · , T,



t

=

φ

1



t−1

+ φ

2



t−2

+ φ

3



t−3

+ ν

t

where

ν

t

∼ W N (0, σ

2

)

.

Given that

β

0

= β

2

= 1

,

φ

2

= 0.1

and

φ

3

= −0.1

, we focus on evaluating the impact

of changes in

β

1

,

φ

1

, and

σ

2

on the monitoring profile.

There are 50000 replicates used for Phase II diagnostic monitoring, while 20000

replicates are carried out for Phase I control chart scheme.

For the latter, both values of

β

1

and

σ

2

are assigned to be 1, while the values of

φ

1

vary from -0.6 and 0.6 to avoid non-stationary series occurring in the data generating

process.

(27)

In-control ARL

The combination of

T

1

2

and

T

2

2

control chart

schemes is considered to yield an overall

in-control average run length (ARL) of

approximately 185.

The overall in-control ARL can be calculated by

1/ARL

overall

= 1 − (1 − α

1

)(1 − α

2

)

, where

α

1

and

α

2

denote the probability of committing

false alarms for

T

1

2

and

T

2

(28)

The simulated ARL values under the change of

β

1

from 1

β

1

φ

1

1

1.02

1.04

1.06

1.08

1.10

T = 150

-0.6

145.349

19.231

1.588

1.007

1.000

1.000

-0.4

158.228

29.851

2.249

1.044

1.000

1.000

-0.2

178.571

46.685

3.618

1.194

1.004

1.000

0.0

175.439

66.225

7.198

1.663

1.053

1.001

0.2

171.233

79.872

13.221

2.685

1.257

1.022

0.4

190.114

107.527

25.253

5.247

1.882

1.164

0.6

170.068

125.313

40.783

10.156

3.181

1.573

T = 300

-0.6

177.305

4.224

1.011

1.000

1.000

1.000

-0.4

188.679

6.973

1.068

1.000

1.000

1.000

-0.2

177.936

11.743

1.249

1.001

1.000

1.000

0.0

188.679

20.309

1.750

1.016

1.000

1.000

0.2

187.266

34.916

2.962

1.116

1.001

1.000

0.4

194.553

54.171

5.503

1.468

1.031

1.000

0.6

181.159

68.213

9.750

2.190

1.152

1.010

(29)

The simulated ARL values under the varying values of

φ

1

φ

1

T

φ

1

0

-0.1

-0.2

-0.3

-0.4

-0.5

-0.6

150

-0.6

1.000

1.001

1.024

1.283

3.089

19.216

147.929

-0.4

1.149

1.968

6.539

43.365

180.505

112.867

16.706

-0.2

10.301

60.976

159.236

108.225

20.938

3.544

1.305

300

-0.6

1.000

1.000

1.000

1.003

1.282

7.504

142.450

-0.4

1.000

1.056

2.058

17.094

177.936

39.777

2.302

-0.2

2.802

27.337

189.394

44.326

3.294

1.111

1.001

0

0.1

0.2

0.3

0.4

0.5

0.6

150

0.0

175.439

78.125

12.994

2.935

1.317

1.032

1.001

0.2

13.203

76.453

149.254

77.760

12.713

2.621

1.206

0.4

1.260

2.460

9.651

60.168

166.113

79.365

10.231

0.6

1.000

1.006

1.094

1.709

5.572

38.580

163.399

300

0.0

188.679

33.201

3.043

1.142

1.002

1.000

1.000

0.2

3.128

32.489

171.821

34.626

2.903

1.097

1.001

0.4

1.001

1.094

2.511

24.606

191.571

31.726

2.169

(30)

The simulated ARL values under the shifts of

σ from 1

σ

φ

1

1

1.1

1.2

1.3

1.4

1.5

T = 150

-0.6

145.349

5.677

1.415

1.034

1.001

1.000

-0.4

158.228

5.578

1.399

1.034

1.001

1.000

-0.2

178.571

5.648

1.410

1.033

1.002

1.000

0.0

175.439

5.752

1.411

1.033

1.002

1.000

0.2

171.233

5.633

1.407

1.032

1.001

1.000

0.4

190.114

5.476

1.399

1.032

1.001

1.000

0.6

170.068

5.725

1.412

1.033

1.001

1.000

T = 300

-0.6

177.305

2.619

1.040

1.000

1.000

1.000

-0.4

188.679

2.571

1.040

1.000

1.000

1.000

-0.2

177.936

2.588

1.040

1.000

1.000

1.000

0.0

188.679

2.614

1.041

1.000

1.000

1.000

0.2

187.266

2.605

1.040

1.000

1.000

1.000

0.4

194.553

2.582

1.041

1.000

1.000

1.000

0.6

181.159

2.618

1.041

1.000

1.000

1.000

(31)

Conclusions from simulation

The test statistics are sensible to the alterations

of the coefficients, namely

β

1

,

φ

1

, and

σ

2

.

The ARL value is more sensible for the

difference when

T = 150

than that for

T = 300

.

The detection of the change in

β

1

may depend

on the values of

φ

1

.

The pattern about how this differs may

depend on more factors, such as the

complexity of error function (2) and/or shifts

in different parameters simultaneously.

To verify this, more simulations should be

expected.

(32)

Real data illustration

The “Babyfinder” is a device designed to detect if any event of concern occurs.

For example, it could be widely used in healthcare, warehouse, for baby sisters,

heart trouble patients, stolen bicycles, etc.

It includes transceiver and receiver.

Once there is a distance between transceiver and receiver, a signal strength is

generated.

The signal strength is called Received Signal Strength Indicator (RSSI), a

measurement of the power presents in a received radio signal (measured in

decibels, dBs), in wireless communication technology.

In wireless communication theory, the functional relationship between RSSI and

distance should be expressed by the model:

RSSI = a + b log (distance),

where a is the intercept and b is the slope.

(33)

Phase I control chart

The study problem of Babyfinder is to analyze the behavior of the wireless signal

strength through various action models.

The occurring events would change the functional relationship of RSSI and

distance.

Hence, it is important to effectively detect if the functional relationship of RSSI

and distance has changed.

Suppose that the Babyfinder is developed to protect a bicycle from being stolen.

To collect the data of RSSI under specified distances, seventeen no-stolen

experiments (or in-control experiments) are designed, which result from different

situations and environments.

The following regression model is first applied to analyze these datasets:

y

t

= β

0

+ β

1

log(x

t

) + 

t

,

t = 1, 2, · · · , 147,

(34)

      (a)      (b) 

Figure 1. Babyfinder: (a) Transceiver and (b) receiver                                     (a)      (b)              (c) (d)  

Figure 2. The pictures of an experiment: (a) the receiver is in the owner’s bag; (b) the transceiver in on the bicycle; (c) the rope and the marked distances; (d) the owner leaves the bicycle and walks ahead.

(35)

Line charts

0 1 2 3 4

50

60

70

80

90

100

110

log(x)

y

(36)

Residuals of fitting a regression with constant errors

0 10 20 30 40 50 60 −0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF

(a−1) Dataset A residuals

0 10 20 30 40 50 60 0.0 0.2 0.4 0.6 Lag Partial ACF

(a−2) Dataset A residuals

0 10 20 30 40 50 60 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF (b−1) Dataset B residuals 0 10 20 30 40 50 60 −0.2 0.0 0.2 0.4 0.6 Lag Partial ACF (b−2) Dataset B residuals 0 10 20 30 40 50 60 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF (c−1) Dataset C residuals 0 10 20 30 40 50 60 −0.1 0.0 0.1 0.2 Lag Partial ACF (c−2) Dataset C residuals

(37)

Profile model

A candidate profile model is then developed to

analyze these datasets:

y

t

= β

0

+ β

1

log(x

t

) + 

t

, t = 1, 2, · · · , 147,



t

= φ

1



t−1

+ φ

2



t−2

+ φ

3



t−3

+ ν

t

where

ν

t

∼ W N (0, σ

2

(38)

Residuals of fitting a regression with AR(3) errors

0 10 20 30 40 50 60 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF

(a−1) Dataset A residuals

0 10 20 30 40 50 60 −0.15 −0.05 0.00 0.05 0.10 0.15 Lag Partial ACF

(a−2) Dataset A residuals

0 10 20 30 40 50 60 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF (b−1) Dataset B residuals 0 10 20 30 40 50 60 −0.15 −0.05 0.00 0.05 0.10 0.15 Lag Partial ACF (b−2) Dataset B2 residuals 0 10 20 30 40 50 60 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF (c−1) Dataset C residuals 0 10 20 30 40 50 60 −0.15 −0.05 0.00 0.05 0.10 0.15 Lag Partial ACF (c−2) Dataset C residuals

(39)

Plots of

T

2

statistics

Dataset

T

1 2 A B C D E F G H I J K L M N O P Q

0

5

10

15

20

(a) Dataset

T

2 2 A B C D E F G H I J K L M N O P Q

0

50

100

150

200

250

(b)

(40)

Boxplots of residuals

A B C D E F G H I J K L M N O P Q

−10

−5

0

5

10

residual

(41)

Phase II monitoring scheme

To evaluate the capability of the proposed

approaches, Eleven stolen experiments (or

out-of-control experiments) are designed by the

occurring special causes, such as moving

(42)

Plots of

T

2

statistics under out-of-control cases

Dataset

T

1 2 a b c d e f g h i j k

0

20

40

60

80

Dataset

T

2 2 a b c d e f g h i j k

0

50

100

150

200

250

(43)
(44)

98 年度專題研究計畫研究成果彙整表

計畫主持人:鄭宗記 計畫編號: 98-2118-M-004-006-計畫名稱:異質變異數矩陣之穩健估計 量化 成果項目 實際已達成 數(被接受 或已發表) 預期總達成 數(含實際已 達成數) 本計畫實 際貢獻百 分比 單位 備 註 ( 質 化 說 明:如 數 個 計 畫 共 同 成 果、成 果 列 為 該 期 刊 之 封 面 故 事 ... 等) 期刊論文 0 1 100% 研究報告/技術報告 0 0 100% 研討會論文 1 0 100% 篇 論文著作 專書 0 0 100% 申請中件數 0 0 100% 專利 已獲得件數 0 0 100% 件 件數 0 0 100% 件 技術移轉 權利金 0 0 100% 千元 碩士生 0 0 100% 博士生 0 0 100% 博士後研究員 0 0 100% 國內 參與計畫人力 (本國籍) 專任助理 0 0 100% 人次 期刊論文 0 0 100% 研究報告/技術報告 0 0 100% 研討會論文 0 0 100% 篇 論文著作 專書 0 0 100% 章/本 申請中件數 0 0 100% 專利 已獲得件數 0 0 100% 件 件數 0 0 100% 件 技術移轉 權利金 0 0 100% 千元 碩士生 0 0 100% 博士生 0 0 100% 博士後研究員 0 0 100% 國外 參與計畫人力 (外國籍) 專任助理 0 0 100% 人次

(45)

其他成果

(

無法以量化表達之成 果如辦理學術活動、獲 得獎項、重要國際合 作、研究成果國際影響 力及其他協助產業技 術發展之具體效益事 項等,請以文字敘述填 列。) 無 成果項目 量化 名稱或內容性質簡述 測驗工具(含質性與量性) 0 課程/模組 0 電腦及網路系統或工具 0 教材 0 舉辦之活動/競賽 0 研討會/工作坊 0 電子報、網站 0 目 計畫成果推廣之參與(閱聽)人數 0

(46)
(47)

國科會補助專題研究計畫成果報告自評表

請就研究內容與原計畫相符程度、達成預期目標情況、研究成果之學術或應用價

值(簡要敘述成果所代表之意義、價值、影響或進一步發展之可能性)

、是否適

合在學術期刊發表或申請專利、主要發現或其他有關價值等,作一綜合評估。

1. 請就研究內容與原計畫相符程度、達成預期目標情況作一綜合評估

■達成目標

□未達成目標(請說明,以 100 字為限)

□實驗失敗

□因故實驗中斷

□其他原因

說明:

2. 研究成果在學術期刊發表或申請專利等情形:

論文:□已發表 ■未發表之文稿 □撰寫中 □無

專利:□已獲得 □申請中 ■無

技轉:□已技轉 □洽談中 ■無

其他:(以 100 字為限)

已將本研究成果撰寫成論文送至國際期刊審稿中。

3. 請依學術成就、技術創新、社會影響等方面,評估研究成果之學術或應用價

值(簡要敘述成果所代表之意義、價值、影響或進一步發展之可能性)(以

500 字為限)

本研究藉由加權最小絕對離差(weighted least absolute deviation,WLAD)估計法,成 功地利用圖形方式區分資料之異質性與離群值;就文獻及方法而言,有其極大的突破與貢 獻。本文之結果,在資料分析上提供一個有利的工具;預期本文將可於統計計算方面的國 際期刊發表。

數據

Figure 1: Data types: (a) scatter plot; (b) diagnostic plot; (c) jigsaw plot using (7);
Figure 2: Simulated data with bad leverage points: (a) scatter plot; (b) diagnostic plot based on MM estimate; (b) standardized WLAD residual plot; (d) jigsaw plot.
Figure 3: Average of empirical p-values for t tests based on different HCCM estimates and approaches
Figure 2. The pictures of an experiment: (a) the receiver is in the owner’s bag; (b) the  transceiver in on the bicycle; (c) the rope and the marked distances; (d) the owner leaves  the bicycle and walks ahead

參考文獻

相關文件

Project 1.3 Use parametric bootstrap and nonparametric bootstrap to approximate the dis- tribution of median based on a data with sam- ple size 20 from a standard normal

In particular, we present a linear-time algorithm for the k-tuple total domination problem for graphs in which each block is a clique, a cycle or a complete bipartite graph,

In this paper, we propose a practical numerical method based on the LSM and the truncated SVD to reconstruct the support of the inhomogeneity in the acoustic equation with

Based on [BL], by checking the strong pseudoconvexity and the transmission conditions in a neighborhood of a fixed point at the interface, we can derive a Car- leman estimate for

Monopolies in synchronous distributed systems (Peleg 1998; Peleg

Corollary 13.3. For, if C is simple and lies in D, the function f is analytic at each point interior to and on C; so we apply the Cauchy-Goursat theorem directly. On the other hand,

Corollary 13.3. For, if C is simple and lies in D, the function f is analytic at each point interior to and on C; so we apply the Cauchy-Goursat theorem directly. On the other hand,

• elearning pilot scheme (Four True Light Schools): WIFI construction, iPad procurement, elearning school visit and teacher training, English starts the elearning lesson.. 2012 •