National Chiao Tung University
Institute of Statistics
Master's Thesis
Nonparametric Test based on Combined Variation
for Gene Expression Analysis
Student: Tzu-Ju Lian
Advisor: Dr. Lin-An Chen
A Thesis
Submitted to the Institute of Statistics, College of Science, National Chiao Tung University,
in Partial Fulfillment of the Requirements for the Degree of Master in Statistics
June 2011
Hsinchu, Taiwan, Republic of China
Nonparametric Test based on Combined Variation for Gene Expression Analysis
Student: Tzu-Ju Lian Advisor: Dr. Lin-An Chen
Institute of Statistics
National Chiao Tung University
ABSTRACT (translated from Chinese)
In the problem of detecting disease genes, the finding of Tomlins et al. (2005) has made the study of the outlier distribution an important topic. Unlike the outlier mean, which can detect only a change in location, we propose a statistic that can simultaneously detect both location and dispersion changes. This statistic has a further advantage: the test built on it does not require estimating values of the unknown density function. We use simulation to compare the powers of several tests, and we also carry out a simple real data analysis.
SUMMARY
Following the observation of Tomlins et al. (2005), detecting a shift in the outlier distribution has become a useful new topic in gene expression analysis. As an alternative to the outlier mean test, we introduce a nonparametric statistic that can simultaneously detect a location shift and a variation shift in the outlier distribution. Compared with the outlier mean, the test based on this statistic has the advantage of requiring no estimation of distributional densities. We compare this statistic with other methods by simulation, in terms of the mean square errors of the estimators of their population parameters and the powers of the tests in detecting disease genes. Finally, a simple real data analysis is presented.
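Acknowledgments
As graduation approaches, I look back on two years of graduate study in which I not only deepened my professional training but also gained much knowledge beyond coursework.
First and foremost, I thank Professor Lin-An Chen for his guidance and teaching. From him I learned the proper attitude toward scholarship, and he inspired my enthusiasm for research; both have been the driving force behind my continued effort. He also often shared lessons about life, from which I benefited greatly. A simple thank-you carries my deepest respect. I thank the oral examination committee members, Professors 許文郁, 謝文萍 and 黃冠華, for their guidance and suggestions, which improved this thesis.
I thank all my classmates. The mutual encouragement, the days spent battling our theses, and your company and support kept me going, and you made my master's years richer and more colorful. I thank my dear roommates, who were my best listeners in research and in life, for their selfless care. I thank 郭姐 of the institute office for helping us with matters large and small so that we could concentrate on research. I thank my family for their support, which let me continue my studies without worry and was my strongest backing.
To everyone who has ever helped me, I offer my deepest gratitude.
Tzu-Ju Lian
Institute of Statistics, National Chiao Tung University
June 2011

Contents
Chinese Abstract
English Abstract
Acknowledgments
Contents
1. Introduction
2. Combined Outlier Quantity
3. The Test based on Combined Outlier Quantity
4. Power Comparison by Simulation and a Simple Real Data Analysis
5. Appendix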
Key words: Gene expression analysis; Outlier mean; Outlier sum; t-test.
1. Introduction
DNA microarray technology, which simultaneously probes thousands of gene expression profiles, has been successfully used in medical research for disease classification (Agrawal et al. (2002), Alizadeh et al. (2000), Ohki et al. (2005), Sorlie et al. (2003)). Among the existing techniques for detecting differential genes, common statistical methods for two-group comparisons, such as the t-test, are not appropriate because of the large number of gene expressions and the limited number of subjects available. Several statistical approaches have been proposed to identify those genes for which only a subset of the samples has high expression. Among them, Tomlins et al. (2005) observed that there is a small number of outliers in samples of differential genes and introduced a method called cancer outlier profile analysis, which identifies outlier profiles by a statistic based on the median and the median absolute deviation of a gene expression profile. Following this observation, a sequence of approaches concentrated on detecting differential genes based on outlier samples; Tibshirani and Hastie (2007) and Wu (2007) suggested using an outlier sum, the sum of all the gene expression values in the disease group that are greater than a specified cutoff point. The common disadvantage of these techniques is that the distribution theory of the proposed statistics has not been developed, so that a distribution-based p-value cannot be applied. Recently Chen, Chen and Chan (2010) considered the outlier mean (the average of the outlier sum) and developed its large sample theory, which allows us to formulate the p-value based on its asymptotic distribution. For evaluation, they performed simulation studies in a parametric setting by specifying the normal distribution. Although the outlier sum and the outlier mean have been shown to be useful in detecting influential genes through statistical analysis and some real data analyses, these techniques can detect only a location shift in the outlier distribution, not a change in its variation.
We propose a statistic that can simultaneously detect the location shift and the variation shift of the outlier distribution; it is generalized from the combined control chart applied in quality control (see Cheng and Thaga (2006) for a review). In Section 2, we present the reasons for the need for the combined outlier quantity. In Section 3, we introduce an asymptotic distribution for the combined outlier quantity and use this theory to introduce a new test for gene expression analysis, together with a discussion of the power of this new test. In Section 4, a comparison between this test and a test combined from the outlier mean and outlier variance is given. Finally, the proofs of the theorems are provided in Section 5.
2. Combined Outlier Quantity
In a general study that consists of $n_1$ subjects in the normal control group and $n_2$ subjects in the disease group, suppose that there are $m$ genes to be investigated. Their gene expressions can be represented as $X_{ij}$, $i = 1, 2, \ldots, n_1$, $j = 1, \ldots, m$, for the normal control group and $Y_{ij}$, $i = 1, 2, \ldots, n_2$, $j = 1, 2, \ldots, m$, for the disease group. In our study, however, we restrict attention to one gene, with expression variable $X$ for the group of normal subjects and expression variable $Y$ for the group of disease subjects, whose distribution functions are $F_X$ and $F_Y$ respectively. We assume that we have observations $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ from the two groups.
An important observation by Tomlins et al. (2005), from a study of prostate cancer, is that outlier genes are over-expressed only in a small number of disease samples. Defining a cutoff point $\hat\eta$ determined from the data of the variable $X$, Tibshirani and Hastie (2007) and Wu (2007) considered the sum of the $Y_i$'s that exceed the cutoff point $\hat\eta$, given by $\sum_{i=1}^{n_2} Y_i I(Y_i \ge \hat\eta)$, as a test statistic for detecting whether the disease group distribution is different from the normal group distribution. Later, Chen, Chen and Chan (2010) developed the asymptotic distribution of its average, called the outlier mean,

$$\bar Y_{out} = \Big(\sum_{i=1}^{n_2} I(Y_i \ge \hat\eta)\Big)^{-1} \sum_{i=1}^{n_2} Y_i I(Y_i \ge \hat\eta),$$

for constructing a distribution-based p-value. In this paper, we choose $\eta = F_X^{-1}(\gamma)$, the population $\gamma$th quantile, and $\hat\eta = \hat F_X^{-1}(\gamma)$, the $\gamma$th empirical quantile from the sample $X_1, \ldots, X_{n_1}$. Then the population-type outlier means for the distributions of $X$ and $Y$ are

$$\mu_{Xout} = E(X \mid X \ge F_X^{-1}(\gamma)) \quad\text{and}\quad \mu_{Yout} = E(Y \mid Y \ge F_X^{-1}(\gamma)) \tag{2.1}$$

and the population-type outlier variances are

$$\sigma^2_{Xout} = Var(X \mid X \ge F_X^{-1}(\gamma)) \quad\text{and}\quad \sigma^2_{Yout} = Var(Y \mid Y \ge F_X^{-1}(\gamma)). \tag{2.2}$$

The outlier mean based analysis tests whether $\mu_{Yout}$ is statistically different from $\mu_{Xout}$, and the outlier variance based analysis tests whether $\sigma^2_{Yout}$ is statistically different from $\sigma^2_{Xout}$.
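The outlier sum and the outlier mean described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the empirical quantile is taken to be the order statistic $X_{(\lceil n\gamma\rceil)}$, a common convention that is assumed here rather than stated in the text.

```python
import math

def empirical_quantile(x, gamma):
    """Order-statistic quantile X_(ceil(n*gamma)); an assumed convention."""
    xs = sorted(x)
    k = max(1, math.ceil(len(xs) * gamma))  # 1-based rank of the quantile
    return xs[k - 1]

def outlier_sum(y, cutoff):
    """Sum of the disease-group values at or above the cutoff."""
    return sum(v for v in y if v >= cutoff)

def outlier_mean(y, cutoff):
    """Average of the disease-group values at or above the cutoff."""
    tail = [v for v in y if v >= cutoff]
    if not tail:
        raise ValueError("no observation reaches the cutoff")
    return sum(tail) / len(tail)

# Small made-up samples, for illustration only.
x = [0.1, -0.4, 1.2, 0.7, -1.0, 0.3, 0.9, -0.2]   # control group
y = [0.2, 2.5, -0.3, 3.1, 0.8, 1.0, -0.6, 0.4]    # disease group
eta_hat = empirical_quantile(x, 0.75)              # cutoff from the control sample
print(eta_hat, outlier_sum(y, eta_hat), outlier_mean(y, eta_hat))
```

The cutoff is computed from the control sample only, so the same threshold is applied to both groups, as in definitions (2.1) and (2.2).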
For the following two distribution settings,

Normal: $X \sim N(0,1)$, $Y \sim N(\mu, \sigma^2)$, $\sigma = 0.5$,
Mixed normal: $X \sim N(0,1)$, $Y \sim 0.9N(0,1) + 0.1N(\mu, \sigma^2)$, $\sigma = 0.5$,

we choose values of the parameter $\mu$ such that either the outlier means are equal, i.e., $\mu_{Xout} = \mu_{Yout}$ (case (I)), or the outlier variances are equal, i.e., $\sigma^2_{Xout} = \sigma^2_{Yout}$ (case (II)). In Table 1, we display, for each distribution setting, the two outlier means and the two outlier variances.
Table 1. Outlier means, outlier variances and combined outlier quantities

Setting             $\gamma$  $\mu$   $\mu_{Xout}$  $\mu_{Yout}$  $\sigma^2_{Xout}$  $\sigma^2_{Yout}$  $\sigma^2_{YX}$
Normal (I)          0.85      1.313   1.554         1.554         0.194              0.125              0.125
                    0.9       1.465   1.754         1.754         0.169              0.112              0.112
                    0.95      1.695   2.062         2.062         0.138              0.096              0.096
Normal (II)         0.85      1.799   1.554         1.866         0.194              0.194              0.292
                    0.9       1.861   1.754         1.977         0.169              0.169              0.218
                    0.95      2.012   2.062         2.210         0.138              0.138              0.159
Mixed normal (I)    0.85      1.313   1.554         1.554         0.194              0.170              0.170
                    0.9       1.465   1.754         1.754         0.169              0.145              0.145
                    0.95      1.695   2.062         2.062         0.138              0.115              0.115
Mixed normal (II)   0.85      1.638   1.554         1.630         0.194              0.194              0.200
                    0.9       1.768   1.754         1.833         0.169              0.169              0.175
                    0.95      1.971   2.062         2.140         0.138              0.138              0.144

We have several comments on the results in Table 1:
We see that the outlier means $\mu_{Xout}$ and $\mu_{Yout}$ for the three $\gamma$'s in (I) and the outlier variances $\sigma^2_{Xout}$ and $\sigma^2_{Yout}$ for the three $\gamma$'s in (II) are all identical. This indicates that, for any underlying distribution, there is a chance that using the outlier mean or the outlier variance alone to test the equality of two distributions may not be appropriate.
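As a check on how such table entries arise, the sketch below recomputes one row of the Normal setting, case (I) with $\gamma = 0.85$, $\mu = 1.313$ and $\sigma = 0.5$, from the standard closed-form moments of a normal distribution truncated from below. The helper name is hypothetical and the code is an illustration under these stated values, not the computation used in the thesis.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def truncated_normal_tail(mu, sigma, cutoff):
    """Mean and variance of N(mu, sigma^2) conditional on being >= cutoff."""
    a = (cutoff - mu) / sigma
    lam = STD.pdf(a) / (1.0 - STD.cdf(a))          # inverse Mills ratio
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + a * lam - lam ** 2)
    return mean, var

gamma = 0.85
cutoff = STD.inv_cdf(gamma)                         # F_X^{-1}(gamma) for X ~ N(0,1)
mx, vx = truncated_normal_tail(0.0, 1.0, cutoff)    # outlier mean and variance of X
my, vy = truncated_normal_tail(1.313, 0.5, cutoff)  # Y ~ N(1.313, 0.5^2)
combined = vy + (my - mx) ** 2                      # E{(Y - mu_Xout)^2 | Y >= cutoff}
print(round(mx, 3), round(my, 3), round(vx, 3), round(vy, 3), round(combined, 3))
```

The two outlier means agree closely while the outlier variances differ, which is the situation in which a test based on the outlier mean alone has nothing to detect.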
We then consider a test that can simultaneously capture the combined change in both the outlier mean $\mu_{Yout}$ and the outlier variance $\sigma^2_{Yout}$. The combined outlier quantity is defined as

$$\sigma^2_{YX} = E\{(Y - \mu_{Xout})^2 \mid Y \ge F_X^{-1}(\gamma)\}.$$

When $Y$ and $X$ have the same distribution, this combined outlier quantity is

$$\sigma^2_{Xout} = E\{(X - \mu_{Xout})^2 \mid X \ge F_X^{-1}(\gamma)\}.$$

The aim of the combined outlier quantity is to verify whether $\sigma^2_{YX}$ and $\sigma^2_{Xout}$ are identical. In Table 1, the values of the combined outlier quantity $\sigma^2_{YX}$ for both distribution settings and the different $\gamma$'s are displayed. Comparing $\sigma^2_{YX}$ with $\sigma^2_{Xout}$, we see that the two differ in every case. This allows us to propose a combined outlier quantity based test for gene expression analysis.
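The relation between the combined outlier quantity and the outlier mean and variance can be made explicit. Writing $Y - \mu_{Xout} = (Y - \mu_{Yout}) + (\mu_{Yout} - \mu_{Xout})$ and taking the conditional expectation given $Y \ge F_X^{-1}(\gamma)$, the cross term vanishes. The short derivation below is added for clarity and follows directly from the definitions above:

```latex
\sigma^2_{YX}
  = E\{(Y-\mu_{Yout})^2 \mid Y \ge F_X^{-1}(\gamma)\}
    + (\mu_{Yout}-\mu_{Xout})^2
  = \sigma^2_{Yout} + (\mu_{Yout}-\mu_{Xout})^2 .
```

For example, in the Normal setting (II) of Table 1 with level 0.85, $0.194 + (1.866 - 1.554)^2 \approx 0.291$, which agrees with the tabulated 0.292 up to rounding. The combined quantity therefore departs from $\sigma^2_{Xout}$ whenever either the outlier mean or the outlier variance shifts.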
We further consider the following three types of distribution setting,

Type 1: $X \sim N(0,1)$, $Y \sim 0.9N(0,1) + 0.1(\chi^2(10) + \mu)$,
Type 2: $X \sim t(10)$, $Y \sim 0.9t(10) + 0.1N(\mu, \sigma^2)$, $\sigma = 1$,
Type 3: $X \sim t(10)$, $Y \sim 0.9t(10) + 0.1(\chi^2(10) + \mu)$,

and present the differences of the outlier means, outlier variances and combined outlier quantities,

$$Df_m = \mu_{Yout} - \mu_{Xout}, \qquad Df_v = \sigma^2_{Yout} - \sigma^2_{Xout}, \qquad Df_{comb} = \sigma^2_{YX} - \sigma^2_{Xout},$$

in Table 2.

Table 2. Differences of outlier means, outlier variances and combined outlier quantities

Type     $\mu$  $\gamma$  $Df_m$   $Df_v$   $Df_{comb}$
Type 1   0      0.85      3.594    25.86    38.78
                0.9       4.340    27.38    46.22
                0.95      5.480    27.16    57.20
         2      0.85      4.444    35.10    54.85
                0.9       5.392    36.60    65.67
                0.95      6.853    34.83    81.80
         4      0.85      5.296    46.29    74.33
                0.9       6.444    47.81    89.35
                0.95      8.232    44.19    111.9
Type 2   2      0.85      0.222    0.167    0.216
                0.9       0.205    0.122    0.164
                0.95      0.153    0.044    0.067
         4      0.85      0.965    1.519    2.451
                0.9       1.062    1.337    2.467
                0.95      1.118    0.953    2.203
Type 3   0      0.85      3.517    25.05    37.42
                0.9       4.218    26.33    44.12
                0.95      5.245    25.85    53.37
         2      0.85      4.368    34.11    53.19
                0.9       5.268    35.31    63.08
                0.95      6.614    33.23    76.99
         4      0.85      5.219    45.12    72.36
                0.9       6.321    46.29    86.26
                0.95      7.994    42.30    106.2
It is seen that the differences of the combined outlier quantities are much larger than the other two differences. This suggests that the combined outlier quantity may be more efficient in detecting influential genes.
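The sample estimator of the combined outlier quantity is defined as

$$S^2_{YX} = \Big[\sum_{i=1}^{n_2} I(Y_i \ge \hat F_X^{-1}(\gamma))\Big]^{-1} \sum_{i=1}^{n_2} (Y_i - \hat\mu_{Xout})^2 I(Y_i \ge \hat F_X^{-1}(\gamma)),$$

where the sample outlier mean is $\hat\mu_{Xout} = \big[\sum_{i=1}^{n_1} I(X_i \ge \hat F_X^{-1}(\gamma))\big]^{-1} \sum_{i=1}^{n_1} X_i I(X_i \ge \hat F_X^{-1}(\gamma))$. It is also of interest to evaluate the efficiencies in estimating the outlier mean, the outlier variance and the combined outlier quantity. We denote the mean square errors for the estimators of $\mu_{Xout}$, $\mu_{Yout}$, $\sigma^2_{Xout}$, $\sigma^2_{Yout}$ and $\sigma^2_{YX}$, respectively, by $MSE_{\mu_{Xout}}$, $MSE_{\mu_{Yout}}$, $MSE_{\sigma^2_{Xout}}$, $MSE_{\sigma^2_{Yout}}$ and $MSE_{\sigma^2_{YX}}$. Under the following distribution setting, with $n = 30$,

$$X_1, \ldots, X_n \ \text{iid} \ N(0,1), \qquad Y_1, \ldots, Y_n \ \text{iid} \ 0.9N(0,1) + 0.1N(\mu, 1),$$

we display these results in Table 3.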
Table 3. Comparison of MSEs for the parameter estimators ($n_1 = n_2 = n = 30$)

$\mu$  $\gamma$  $MSE_{\mu_{Xout}}$  $MSE_{\mu_{Yout}}$  $MSE_{\sigma^2_{Xout}}$  $MSE_{\sigma^2_{Yout}}$  $MSE_{\sigma^2_{YX}}$
1      0.85      0.0977              0.0996              0.0288                   0.0426                   0.1007
       0.9       0.1235              0.1283              0.0283                   0.0382                   0.1269
       0.95      0.2191              0.1788              0.0263                   0.0335                   0.1580
3      0.85      0.0981              0.2419              0.0276                   0.3354                   1.2345
       0.9       0.1240              0.2895              0.0306                   0.3415                   1.3698
       0.95      0.2137              0.3587              0.0265                   0.3270                   1.8171

It is seen that the MSEs for the combined outlier quantity are relatively larger than those for the outlier mean and the outlier variance. This is because estimating a quantity that simultaneously captures the differences in outlier mean and outlier variance is more difficult. The appropriateness of the test based on the combined outlier quantity needs to be justified through power comparisons.
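The sample estimator $S^2_{YX}$ can be sketched as follows. This is a minimal illustration with hypothetical function names; the empirical quantile is again assumed to be the order statistic $X_{(\lceil n_1\gamma\rceil)}$, which the text does not specify.

```python
import math

def empirical_quantile(x, gamma):
    """Order-statistic quantile X_(ceil(n*gamma)); an assumed convention."""
    xs = sorted(x)
    k = max(1, math.ceil(len(xs) * gamma))
    return xs[k - 1]

def combined_outlier_quantity(x, y, gamma):
    """Sample combined outlier quantity S^2_YX.

    Averages the squared deviations of the Y outliers from the sample
    outlier mean of X, with the cutoff taken from the X sample.
    """
    cutoff = empirical_quantile(x, gamma)
    x_tail = [v for v in x if v >= cutoff]
    y_tail = [v for v in y if v >= cutoff]
    if not y_tail:
        raise ValueError("no Y observation reaches the cutoff")
    mu_x_out = sum(x_tail) / len(x_tail)            # sample outlier mean of X
    return sum((v - mu_x_out) ** 2 for v in y_tail) / len(y_tail)

# Made-up data, for illustration only.
x = list(range(10))
y = [1, 4, 10, 3]
s2 = combined_outlier_quantity(x, y, 0.5)
print(s2)
```

Deviations are measured from the control-group outlier mean rather than from the outlier mean of Y, which is what lets the statistic react to a location shift as well as to a change in spread.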
3. The Test based on Combined Outlier Quantity
We here introduce some asymptotic properties of the combined outlier quantity and then provide a test based on its asymptotic distribution.
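Theorem 3.1.
(a) We have

$$n_2^{1/2}(S^2_{YX} - \sigma^2_{YX}) = n_1^{-1/2} \sum_{i=1}^{n_1} \big[\delta_1 (\gamma - I(X_i \le F_X^{-1}(\gamma))) + \delta_2 (X_i - \mu_{Xout}) I(X_i \ge F_X^{-1}(\gamma))\big] + \beta_Y^{-1} n_2^{-1/2} \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Xout})^2 - \sigma^2_{YX}\big] I(Y_i \ge F_X^{-1}(\gamma)) + o_p(1),$$

where we let

$$\delta_1 = -\big[\beta_Y^{-1} (F_X^{-1}(\gamma) - \mu_{Xout})^2 f_Y(F_X^{-1}(\gamma)) f_X^{-1}(F_X^{-1}(\gamma)) + 2\beta_X^{-1} F_X^{-1}(\gamma)(\mu_{Yout} - \mu_{Xout})\big],$$

$$\delta_2 = -2\beta_X^{-1}(\mu_{Yout} - \mu_{Xout}),$$

with $\beta_Y = P(Y \ge F_X^{-1}(\gamma))$ and $\beta_X = 1 - \gamma$.
(b) We have that $n_2^{1/2}(S^2_{YX} - \sigma^2_{YX})$ converges in distribution to $N(0, v_Y)$, where

$$v_Y = \gamma(1-\gamma)\delta_1^2 + \delta_2^2 E\big[(X - \mu_{Xout})^2 I(X \ge F_X^{-1}(\gamma))\big] - 2\delta_1\delta_2 (1-\gamma) E\big[(X - \mu_{Xout}) I(X \ge F_X^{-1}(\gamma))\big] + \beta_Y^{-2} E\{(Y - \mu_{Xout})^4 I(Y \ge F_X^{-1}(\gamma))\} - \sigma^4_{YX},$$

and where $\beta_Y^{-2} E\{(Y - \mu_{Xout})^4 I(Y \ge F_X^{-1}(\gamma))\} - \sigma^4_{YX} = Var\big[(Y - \mu_{Xout})^2 \mid Y \ge F_X^{-1}(\gamma)\big]$.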
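From the above theorem, under $H_0: F_X = F_Y$, we have

$$P_{H_0}\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \sigma^2_{XX}}{\sqrt{v_Y}} \le z\Big\} \to \int_{-\infty}^{z} \phi(t)\,dt$$

for $z \in R$, where $\phi$ represents the probability density function of $N(0,1)$. If we further have $\hat\sigma^2_{XX}$ and $\hat v_Y$, respectively, estimates of $\sigma^2_{XX}$ and $v_Y$, we may define an outlier combined test as rejecting $H_0$ if

$$n_2^{1/2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_Y}} \ge z_\alpha. \tag{3.1}$$

Having this outlier combined test, it is desirable to examine its power when there is a distributional shift in the disease group distribution. An approximate power at significance level $\alpha$ may be derived as follows:

$$\pi_Y = P_{F_Y}\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_Y}} \ge z_\alpha\Big\} = P_{F_Y}\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \sigma^2_{YX}}{\sqrt{v_Y}} \ge \frac{z_\alpha\sqrt{\hat v_Y} + \sqrt{n_2}(\hat\sigma^2_{XX} - \sigma^2_{YX})}{\sqrt{v_Y}}\Big\} \approx P\Big\{Z \ge z_\alpha + \sqrt{n_2}\,\frac{\sigma^2_{XX} - \sigma^2_{YX}}{\sqrt{v_Y}}\Big\}. \tag{3.2}$$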
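The final expression in (3.2) is straightforward to evaluate once the population quantities are known. The sketch below does so with the standard library; the function name and the numerical inputs are hypothetical and chosen only to show the behaviour of the formula.

```python
from statistics import NormalDist

def approximate_power(sigma2_xx, sigma2_yx, v_y, n2, alpha=0.05):
    """Approximate power P{Z >= z_alpha + sqrt(n2)(sigma2_xx - sigma2_yx)/sqrt(v_y)}."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha)              # upper-alpha critical point
    shift = (n2 ** 0.5) * (sigma2_xx - sigma2_yx) / (v_y ** 0.5)
    return 1.0 - std.cdf(z_alpha + shift)

# Illustrative values only (not taken from the tables).
power_null = approximate_power(0.2, 0.2, 1.0, 30)   # no shift
power_alt = approximate_power(0.2, 0.5, 1.0, 30)    # combined quantity increased
print(power_null, power_alt)
```

When the two combined quantities coincide, the expression reduces to the nominal level, and it increases as the combined outlier quantity of Y exceeds that of X.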
The test defined in (3.1) requires that the estimator $\hat v_Y$ be consistent for the parameter $v_Y$. It is difficult to provide efficient estimates of the densities involved in $\delta_1$. One way to avoid this difficulty is to note that the size of a level $\alpha$ test is determined only under the case where the two distributions $F_Y$ and $F_X$ are identical.
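Corollary 3.2.
When $Y$ and $X$ have the same distribution, we have, by the fact that $\sigma^2_{XX} = \sigma^2_{Xout}$,

$$n_2^{1/2}(S^2_{YX} - \sigma^2_{Xout}) = -\beta_X^{-1}(F_X^{-1}(\gamma) - \mu_{Xout})^2\, n_1^{-1/2} \sum_{i=1}^{n_1} (\gamma - I(X_i \le F_X^{-1}(\gamma))) + \beta_X^{-1} n_2^{-1/2} \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Xout})^2 - \sigma^2_{Xout}\big] I(Y_i \ge F_X^{-1}(\gamma)) + o_p(1).$$

We have that $n_2^{1/2}(S^2_{YX} - \sigma^2_{Xout})$ converges in distribution to $N(0, v_X)$, where

$$v_X = \beta_X^{-2}\gamma(1-\gamma)(F_X^{-1}(\gamma) - \mu_{Xout})^4 + \beta_X^{-2} E\big[(X - \mu_{Xout})^4 I(X \ge F_X^{-1}(\gamma))\big] - \sigma^4_{Xout}.$$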
Suppose that we have estimators $\hat\sigma^2_{XX}$ and $\hat v_X$ of $\sigma^2_{XX}$ and $v_X$, respectively. We can then define the following test:

$$\text{Combined test: reject } H_0 \text{ if } n_2^{1/2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_X}} > z_\alpha. \tag{3.3}$$

The appeal of the test (3.3) is that $v_X$ itself involves no density values, so that its estimation is much easier. We can similarly derive the approximate power for the above test as

$$\pi_X = P_X\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_X}} \ge z_\alpha\Big\}. \tag{3.4}$$
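A possible implementation of the statistic in (3.3) is sketched below. The text does not say how $\hat\sigma^2_{XX}$ and $\hat v_X$ are obtained, so simple plug-in estimators built from the control sample are assumed here: sample moments replace the expectations in the expression for $v_X$ of Corollary 3.2, with $\beta_X = 1 - \gamma$. The function names are hypothetical.

```python
import math

def empirical_quantile(x, gamma):
    """Order-statistic quantile X_(ceil(n*gamma)); an assumed convention."""
    xs = sorted(x)
    k = max(1, math.ceil(len(xs) * gamma))
    return xs[k - 1]

def combined_test_statistic(x, y, gamma):
    """Standardized statistic sqrt(n2) (S^2_YX - s2_xx) / sqrt(v_x) of (3.3)."""
    n1, n2 = len(x), len(y)
    q = empirical_quantile(x, gamma)                # estimated cutoff
    x_tail = [v for v in x if v >= q]
    y_tail = [v for v in y if v >= q]
    if not y_tail:
        raise ValueError("no Y observation reaches the cutoff")
    mu = sum(x_tail) / len(x_tail)                  # sample outlier mean of X
    s2_yx = sum((v - mu) ** 2 for v in y_tail) / len(y_tail)
    s2_xx = sum((v - mu) ** 2 for v in x_tail) / len(x_tail)   # assumed plug-in
    beta_x = 1.0 - gamma
    # Sample version of E[(X - mu_Xout)^4 I(X >= cutoff)].
    fourth = sum((v - mu) ** 4 for v in x_tail) / n1
    v_x = (gamma * (1.0 - gamma) * (q - mu) ** 4 + fourth) / beta_x ** 2 - s2_xx ** 2
    if v_x <= 0:
        raise ValueError("variance estimate is not positive")
    return math.sqrt(n2) * (s2_yx - s2_xx) / math.sqrt(v_x)

# Made-up data, for illustration only.
x = [float(v) for v in range(10)]
stat_same = combined_test_statistic(x, x, 0.5)
stat_shift = combined_test_statistic(x, [1.0, 4.0, 10.0, 3.0], 0.5)
print(stat_same, stat_shift)
```

No density estimate appears anywhere in the computation, which is the practical advantage of (3.3) over (3.1).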
Power representations (3.2) and (3.4) provide approximate powers for the tests in (3.1) and (3.3). We display the powers of the test (3.1) in Table 4 when the underlying distributions for the control group and the disease group are

$$X \sim N(0,1) \quad\text{and}\quad Y \sim (1-\epsilon)N(0,1) + \epsilon N(\mu, 1).$$

Table 4. Asymptotic power $\pi_Y$ for the mixed normal distribution ($n = 30$)

$\epsilon$  $\gamma$  $\mu = 1$  $\mu = 3$  $\mu = 5$  $\mu = 10$
0.1         0.8       0.078      0.281      0.281      0.541
            0.85      0.074      0.247      0.414      0.534
0.2         0.8       0.098      0.422      0.682      0.825
            0.85      0.090      0.345      0.618      0.810

Without a simulation study, it is not known whether (3.2) and (3.4) give appropriate powers for these two tests. If they are in fact inappropriate, the critical points $z_\alpha$ require an adjustment. We will answer this in the next section.
4. Power Comparison by Simulation and a Simple Real Data Analysis
Two tasks will be done in this section. First, we will show by simulation that the critical point $z_\alpha$ of (3.4) set by the approximation theory is too conservative, and we will present the appropriate level $\alpha$ critical point. Second, we will compare this outlier combined test with a combination of the t-test and the F-test in terms of power. The classical t-test is designed to detect a change in the distributional mean, and the F-test a change in the distributional variation. Hence, a combination of the t-test and the F-test detects shifts in mean and variation simultaneously. It is then of interest to compare the powers of these two combined tests.
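A t and F combined test rejects $H_0$ if

$$\frac{|\bar Y - \bar X|}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > t_{\alpha/2}(n_1 + n_2 - 2) \quad\text{or}\quad \frac{S_X^2}{S_Y^2} > F_{\alpha/2}(n_1 - 1, n_2 - 1) \quad\text{or}\quad \frac{S_X^2}{S_Y^2} < \frac{1}{F_{\alpha/2}(n_2 - 1, n_1 - 1)},$$

where $S_p^2 = \dfrac{\sum_{i=1}^{n_1}(X_i - \bar X)^2 + \sum_{i=1}^{n_2}(Y_i - \bar Y)^2}{n_1 + n_2 - 2}$, $S_X^2 = \dfrac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar X)^2$ and $S_Y^2 = \dfrac{1}{n_2 - 1}\sum_{i=1}^{n_2}(Y_i - \bar Y)^2$.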
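The t and F combined rule can be sketched as below. To keep the example free of external libraries, the critical values $t_{\alpha/2}(n_1+n_2-2)$, $F_{\alpha/2}(n_1-1,n_2-1)$ and $F_{\alpha/2}(n_2-1,n_1-1)$ are supplied by the caller, for instance from statistical tables. The function names are hypothetical, and the two-sided form of the t comparison is an assumption consistent with the use of $t_{\alpha/2}$.

```python
import math

def t_and_f_statistics(x, y):
    """Pooled two-sample t statistic and variance ratio S_X^2 / S_Y^2."""
    n1, n2 = len(x), len(y)
    xbar, ybar = sum(x) / n1, sum(y) / n2
    ssx = sum((v - xbar) ** 2 for v in x)
    ssy = sum((v - ybar) ** 2 for v in y)
    sp = math.sqrt((ssx + ssy) / (n1 + n2 - 2))     # pooled standard deviation
    t_stat = (ybar - xbar) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))
    f_stat = (ssx / (n1 - 1)) / (ssy / (n2 - 1))
    return t_stat, f_stat

def t_f_combined_reject(x, y, t_crit, f_crit_12, f_crit_21):
    """Reject H0 if the t part or either side of the F part exceeds its critical value.

    t_crit    : t_{alpha/2}(n1 + n2 - 2)
    f_crit_12 : F_{alpha/2}(n1 - 1, n2 - 1)
    f_crit_21 : F_{alpha/2}(n2 - 1, n1 - 1)
    """
    t_stat, f_stat = t_and_f_statistics(x, y)
    return abs(t_stat) > t_crit or f_stat > f_crit_12 or f_stat < 1.0 / f_crit_21

# Made-up data; critical values for alpha = 0.05 with 6 and (3, 3) degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 4.0, 5.0]
t_stat, f_stat = t_and_f_statistics(x, y)
reject = t_f_combined_reject(x, y, t_crit=2.447, f_crit_12=15.44, f_crit_21=15.44)
print(t_stat, f_stat, reject)
```

Because the rule rejects when any one of three comparisons is significant, its overall size exceeds that of each component unless the component levels are adjusted.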
We consider a simulation with sample size $n = n_1 = n_2$ and $m = 100000$ replications to evaluate the power when $X$ and $Y$ are from the following distributions:

$$X \sim N(0,1) \quad\text{and}\quad Y \sim 0.9N(0,1) + 0.1N(\mu, 1).$$

In Tables 5 and 6, we display the simulated results for $n = 50$ and $n = 100$ when the level of significance is $\alpha = 0.05$, and in Table 7 we display the simulated results for $n = 50$ when $\alpha = 0.1$.
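A simulation of this kind can be sketched as follows. Several choices here are assumptions rather than details given in the text: the critical point is calibrated as an empirical quantile of the statistic simulated under $H_0$, the variance $v_X$ is estimated by plugging sample moments into Corollary 3.2, the empirical quantile is an order statistic, and the number of replications is far smaller than the 100000 used in the thesis so that the example runs quickly. All function names are hypothetical.

```python
import math
import random

def empirical_quantile(x, gamma):
    xs = sorted(x)
    return xs[max(1, math.ceil(len(xs) * gamma)) - 1]

def combined_statistic(x, y, gamma):
    """Standardized combined outlier statistic with a plug-in variance estimate."""
    n1, n2 = len(x), len(y)
    q = empirical_quantile(x, gamma)
    x_tail = [v for v in x if v >= q]
    y_tail = [v for v in y if v >= q]
    if not y_tail:
        return float("-inf")                        # no Y outlier: never reject
    mu = sum(x_tail) / len(x_tail)
    s2_yx = sum((v - mu) ** 2 for v in y_tail) / len(y_tail)
    s2_xx = sum((v - mu) ** 2 for v in x_tail) / len(x_tail)
    fourth = sum((v - mu) ** 4 for v in x_tail) / n1
    v_x = (gamma * (1 - gamma) * (q - mu) ** 4 + fourth) / (1 - gamma) ** 2 - s2_xx ** 2
    if v_x <= 0:
        return float("-inf")
    return math.sqrt(n2) * (s2_yx - s2_xx) / math.sqrt(v_x)

def draw_normal(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def draw_mixture(rng, n, mu, eps=0.1):
    """Sample from (1 - eps) N(0,1) + eps N(mu, 1)."""
    return [rng.gauss(mu, 1.0) if rng.random() < eps else rng.gauss(0.0, 1.0)
            for _ in range(n)]

def simulate_power(mu, n=50, gamma=0.85, alpha=0.05, reps=400, seed=1):
    rng = random.Random(seed)
    # Step 1: calibrate the critical point from the null distribution.
    null_stats = sorted(
        combined_statistic(draw_normal(rng, n), draw_normal(rng, n), gamma)
        for _ in range(reps)
    )
    crit = null_stats[math.ceil(reps * (1 - alpha)) - 1]
    # Step 2: rejection rate under the contaminated alternative.
    rejections = sum(
        combined_statistic(draw_normal(rng, n), draw_mixture(rng, n, mu), gamma) > crit
        for _ in range(reps)
    )
    return rejections / reps

power_null = simulate_power(0.0, reps=200)          # mu = 0: Y has the null distribution
power_far = simulate_power(10.0, reps=200)          # strongly shifted contamination
print(power_null, power_far)
```

Calibrating the critical point by simulation sidesteps the question of whether the normal approximation is accurate at moderate sample sizes, at the cost of an extra set of null replications.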
We have the following comments on the results in Tables 5, 6 and 7:
(a) Although the contamination percentage of outliers in the mixed normal distribution is as small as 0.1, the tests based on the combined outlier quantity with cutoffs for small $\gamma$'s are more powerful than those with larger $\gamma$'s.
(b) The tests based on the combined outlier quantity with cutoffs for small $\gamma$'s are relatively more powerful than the t and F combined test. This indicates that simultaneously detecting shifts in the outlier mean and the outlier variance is appropriate when $\gamma$ is chosen appropriately for the cutoff.
(c) The power of the test based on the combined outlier quantity increases as the contaminated location shift increases.
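We next consider alternative distributions with a constant shift:
Setting I: $X \sim N(0,1)$ and $Y \sim (1-\epsilon)N(0,1) + \epsilon\{\mu\}$
Setting II: $X \sim N(0,1)$ and $Y \sim (1-\epsilon)t(10) + \epsilon\{\mu\}$
where $\{\mu\}$ denotes a point mass at the constant $\mu$. We list the simulated results in Tables 8-11.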
We consider a real data set of a control group and a disease group that includes 22283 genes. At the significance level $\alpha = 0.05$, the constants $z$ in the table are critical points chosen so that the sizes of the tests for the respective $\gamma$'s are approximately 0.05. We then evaluate the percentages of genes rejected by each test for each $\gamma$. The computed results are displayed in Table 12.

Table 12. Percentages of genes larger than critical values ($\alpha = 0.05$)

$\gamma$  Outlier mean          Outlier variance     Combined quantity    t-test
0.6       0 (z = 2.68)          0.1247 (z = 2.85)    0.1195 (z = 3.07)    0.0325
0.65      0 (z = 2.46)          0.1263 (z = 2.97)    0.1168 (z = 3.28)
0.7       0 (z = 2.21)          0.1286 (z = 3.11)    0.1165 (z = 3.51)
0.75      0 (z = 1.86)          0.1256 (z = 3.42)    0.1157 (z = 3.85)
0.8       0 (z = 1.54)          0.1242 (z = 3.74)    0.1092 (z = 4.35)
0.85      0 (z = 1.23)          0.1230 (z = 4.45)    0.1102 (z = 5.28)
0.9       0.00004 (z = 1.01)    0.1247 (z = 5.54)    0.1049 (z = 7.25)
0.95      0.0004 (z = 0.77)     0.1267 (z = 9.15)    0.1154 (z = 15.5)
We have several comments on the results in this table:
(a) The outlier mean test performed poorly, with very low percentages of genes rejected. This shows that it can hardly detect any gene as influential.
(b) The tests based on the outlier variance and the outlier combined quantity yield relatively moderate percentages of genes claimed to be influential. Since the genes are measured simultaneously from the same subjects, a simultaneous test is needed, which would markedly reduce the percentages of genes claimed to be influential. We will not pursue this further here. However, we see that only the outlier variance and the outlier combined quantity show promise in finding influential genes.
5. Appendix
Three assumptions for the two-sample outlier variance test are as follows.
ASSUMPTION 1: The limit $\rho = \lim_{n_1, n_2 \to \infty} n_1^{-1} n_2$ exists.
ASSUMPTION 2: The probability density function $f_X$ of the distribution $F_X$ is bounded away from zero in neighborhoods of $F_X^{-1}(\gamma)$ for $\gamma \in (0,1)$ and of the population cutoff point $\eta$.
ASSUMPTION 3: The probability density function $f_Y$ is bounded away from zero in a neighborhood of the population cutoff point $\eta$.
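Proof of Theorem 3.1: First, we consider the following expansion:

$$\sum_{i=1}^{n_2} (Y_i - \hat\mu_{Xout})^2 I(Y_i \ge \hat F_X^{-1}(\gamma)) = \sum_{i=1}^{n_2} (Y_i - \mu_{Xout})^2 I(Y_i \ge \hat F_X^{-1}(\gamma)) + (\hat\mu_{Xout} - \mu_{Xout})^2 \sum_{i=1}^{n_2} I(Y_i \ge \hat F_X^{-1}(\gamma)) - 2(\hat\mu_{Xout} - \mu_{Xout}) \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Yout}) + (\mu_{Yout} - \mu_{Xout})\big] I(Y_i \ge \hat F_X^{-1}(\gamma)). \tag{5.1}$$

From the theory for the outlier mean by Chen, Chen and Chan (2010), we may see that $n_2^{1/2}(\hat\mu_{Yout} - \mu_{Yout}) = O_p(1)$, $n_1^{1/2}(\hat\mu_{Xout} - \mu_{Xout}) = O_p(1)$ and $n_2^{-1/2} \sum_{i=1}^{n_2} (Y_i - \mu_{Yout}) I(Y_i \ge \hat F_X^{-1}(\gamma)) = O_p(1)$. We then, from (5.1), may rewrite the combined quantity as

$$n_2^{1/2}(S^2_{YX} - \sigma^2_{YX}) = n_2^{1/2} \Big(\sum_{i=1}^{n_2} I(Y_i \ge \hat F_X^{-1}(\gamma))\Big)^{-1} \Big\{ \sum_{i=1}^{n_2} (Y_i - \mu_{Xout})^2 \big[I(Y_i \ge F_X^{-1}(\gamma) + n_2^{-1/2} T) - I(Y_i \ge F_X^{-1}(\gamma))\big] + \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Xout})^2 - \sigma^2_{YX}\big] I(Y_i \ge F_X^{-1}(\gamma)) \Big\} - 2(\mu_{Yout} - \mu_{Xout})\, n_2^{1/2}(\hat\mu_{Xout} - \mu_{Xout}) + o_p(1), \tag{5.2}$$

where we let $T = n_1^{1/2}(\hat F_X^{-1}(\gamma) - F_X^{-1}(\gamma))$.
With Assumptions 2 and 3, and techniques from Ruppert & Carroll (1980) and Chen & Chiang (1996), we may see that

$$n_2^{-1/2} \sum_{i=1}^{n_2} (Y_i - \mu_{Xout})^2 \big[I(Y_i \ge F_X^{-1}(\gamma) + n_2^{-1/2} T) - I(Y_i \ge F_X^{-1}(\gamma))\big] = -(F_X^{-1}(\gamma) - \mu_{Xout})^2 f_Y(F_X^{-1}(\gamma))\, T + o_p(1) \tag{5.3}$$

for any sequence $T = O_p(1)$.
We may also see from Chen, Chen and Chan (2010) that the outlier mean $\hat\mu_{Xout}$ has the following representation:

$$n_1^{1/2}(\hat\mu_{Xout} - \mu_{Xout}) = \beta_X^{-1} F_X^{-1}(\gamma)\, n_1^{-1/2} \sum_{i=1}^{n_1} (\gamma - I(X_i \le F_X^{-1}(\gamma))) + \beta_X^{-1} n_1^{-1/2} \sum_{i=1}^{n_1} (X_i - \mu_{Xout}) I(X_i \ge F_X^{-1}(\gamma)) + o_p(1). \tag{5.4}$$

The result of this theorem is obtained by plugging (5.3) and (5.4) into (5.2) and applying a representation for the empirical quantile $\hat F_X^{-1}(\gamma)$ in Chen, Chen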