Nonparametric Test based on Combined Variation for Gene Expression Analysis

National Chiao Tung University
Institute of Statistics
Master's Thesis

Student: Tzu-Ju Lian
Advisor: Dr. Lin-An Chen

A Thesis Submitted to Institute of Statistics, College of Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Statistics

June 2011

Hsinchu, Taiwan, Republic of China

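Abstract

In the problem of detecting disease genes, the finding of Tomlins et al. (2005) has made the study of outlier distributions an important topic. Unlike the outlier mean, which can only detect a change in central location, we propose a statistic that can simultaneously detect both location and dispersion variation. This statistic has a further advantage: the test statistic built from it does not require estimating the density function values of the unknown distribution. We use simulation to compare the powers of several testing methods. We also carry out a simple real data analysis.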


SUMMARY

Following the observation of Tomlins et al. (2005), detection of a shift in the outlier distribution is a new topic useful in gene expression analysis. As an alternative to the outlier mean test, we introduce a nonparametric statistic that can simultaneously detect the location shift and the variation shift in the outlier distribution. Compared with the outlier mean, the test based on this statistic has the advantage that it requires no estimation of distributional densities. Comparisons of this test statistic with some other methods, in terms of mean square errors for estimation of their population parameters and powers for detection of disease genes, are simulated and displayed. Finally, a simple real data analysis is also performed and presented.

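Acknowledgments

As graduation approaches, I look back on two years of graduate study in which I not only deepened my professional training but also learned a great deal beyond coursework. First and foremost, I thank Professor Lin-An Chen for his guidance and teaching throughout this time. From him I learned the attitude that scholarship requires, and he inspired my enthusiasm for research; both have been the driving force behind my continued effort. He also often shared lessons about life from which I benefited greatly. A simple thank-you carries my deepest respect.

I thank the members of my oral examination committee, Professors 許文郁, 謝文萍 and 黃冠華, for their guidance and suggestions, which made this thesis more complete. I thank all my classmates: the mutual encouragement, every day spent struggling with our theses, and your company and support kept me going and made my time as a master's student far richer. I thank my dear roommates, who were my best listeners in research and in daily life, for their selfless care. I thank Ms. 郭 of the institute office, who handled matters large and small so that we could concentrate on research. I thank my family, whose support let me continue my studies without worry and who have been my strongest backing. To everyone who has ever helped me, I offer my deepest gratitude.

Tzu-Ju Lian
Institute of Statistics, National Chiao Tung University
June 2011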

Contents

Chinese Abstract
English Abstract
Acknowledgments
Contents
1. Introduction
2. Combined Outlier Quantity
3. The Test based on Combined Outlier Quantity
4. Power Comparison by Simulation and a Simple Real Data Analysis
5. Appendix


Nonparametric Test based on Combined Variation

for Gene Expression Analysis


Key words: gene expression analysis; outlier mean; outlier sum; t-test.

1. Introduction

DNA microarray technology, which simultaneously probes thousands of gene expression profiles, has been successfully used in medical research for disease classification (Agrawal et al. (2002), Alizadeh et al. (2000), Ohki et al. (2005), Sorlie et al. (2003)). Among the existing techniques for detecting differential genes, common statistical methods for two-group comparisons, such as the t-test, are not appropriate because of the large number of gene expressions and the limited number of subjects available. Several statistical approaches have been proposed to identify those genes where only a subset of the sample genes has high expression. Among them, Tomlins et al. (2005) observed that there is a small number of outliers in samples of differential genes and introduced a method called cancer outlier profile analysis, which identifies outlier profiles by a statistic based on the median and the median absolute deviation of a gene expression profile. Following this observation, a sequence of approaches concentrated on detecting differential genes based on outlier samples; in particular, Tibshirani and Hastie (2007) and Wu (2007) suggested using an outlier sum, the sum of all the gene expression values in the disease group that are greater than a specified cutoff point. The common disadvantage of these techniques is that the distribution theory of the proposed methods has not been established, so a distribution-based p value cannot be applied. Recently, Chen, Chen and Chan (2010) considered the outlier mean (the average of the outlier sum) and developed its large sample theory, which allows the p value to be formulated from its asymptotic distribution. For evaluation, they performed simulation studies in a parametric setting by specifying the normal distribution. Although the outlier sum and the outlier mean have been shown to be useful for detecting influential genes through statistical analysis and some real data analysis, these techniques can detect only the location shift in the outlier distribution, not the distributional variation.

We propose a statistic that can simultaneously detect the location shift and the variation shift of the outlier distribution; it is generalized from the combined control chart applied in quality control (see Cheng and Thaga (2006) for a review). In Section 2, we present the reasons why the combined outlier quantity is needed. In Section 3, we introduce an asymptotic distribution for the combined outlier quantity and use this theory to introduce a new test for gene expression analysis, together with a discussion of the power of this new test. In Section 4, a comparison between this test and a test combined from the outlier mean and outlier variance is given. Finally, the proofs of the theorems are provided in Section 5.

2. Combined Outlier Quantity

In a general study that consists of $n_1$ subjects in the normal control group and $n_2$ subjects in the disease group, suppose that there are $m$ genes to be investigated. Their gene expressions can be represented as $X_{ij}$, $i = 1, 2, \ldots, n_1$, $j = 1, \ldots, m$, for the normal control group and $Y_{ij}$, $i = 1, 2, \ldots, n_2$, $j = 1, 2, \ldots, m$, for the disease group. In our study, however, we restrict attention to one gene, with expression variable $X$ for the group of normal subjects and expression variable $Y$ for the group of disease subjects, whose distribution functions are $F_X$ and $F_Y$, respectively. We assume that we have observations $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ from the two groups.

An important observation by Tomlins et al. (2005), from a study of prostate cancer, is that outlier genes are over-expressed only in a small number of disease samples. Defining a cutoff point $\hat\eta$ determined from the data of the variable $X$, Tibshirani and Hastie (2007) and Wu (2007) considered the sum of the variables $Y_i$ that exceed the cutoff point $\hat\eta$, given by $\sum_{i=1}^{n_2} Y_i I(Y_i \ge \hat\eta)$, as a test statistic for detecting whether the disease group distribution differs from the normal group distribution. Later, Chen, Chen and Chan (2010) developed the asymptotic distribution of its average, called the outlier mean,

$$\hat\mu_{Yout} = \Big(\sum_{i=1}^{n_2} I(Y_i \ge \hat\eta)\Big)^{-1} \sum_{i=1}^{n_2} Y_i I(Y_i \ge \hat\eta),$$

for constructing a distribution-based p value. In this paper, we choose $\eta = F_X^{-1}(\gamma)$, the population $\gamma$th quantile, and $\hat\eta = \hat F_X^{-1}(\gamma)$, the $\gamma$th empirical quantile from the sample $X_1, \ldots, X_{n_1}$. Then the population type outlier means for the distributions of $X$ and $Y$ are

$$\mu_{Xout} = E(X \mid X \ge F_X^{-1}(\gamma)) \quad \text{and} \quad \mu_{Yout} = E(Y \mid Y \ge F_X^{-1}(\gamma)) \tag{2.1}$$

and the population type outlier variances are

$$\sigma^2_{Xout} = Var(X \mid X \ge F_X^{-1}(\gamma)) \quad \text{and} \quad \sigma^2_{Yout} = Var(Y \mid Y \ge F_X^{-1}(\gamma)). \tag{2.2}$$

The outlier mean based analysis tests whether $\mu_{Yout}$ is statistically different from $\mu_{Xout}$, and the outlier variance based analysis tests whether $\sigma^2_{Yout}$ is statistically different from $\sigma^2_{Xout}$.
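For normal distributions, the population outlier mean and outlier variance in (2.1) and (2.2) have closed forms given by the standard truncated-normal moments. The following is a minimal sketch under that assumption; the function name `normal_outlier_moments` and the example parameter values (a disease distribution with mean 1.313 and standard deviation 0.5, cutoff at the 0.85 quantile of N(0, 1)) are chosen here purely for illustration.

```python
from statistics import NormalDist


def normal_outlier_moments(mu, sigma, cutoff):
    """Mean and variance of N(mu, sigma^2) conditional on exceeding `cutoff`.

    Uses the standard truncated-normal formulas with the inverse Mills ratio.
    """
    std = NormalDist()
    a = (cutoff - mu) / sigma                # standardized cutoff
    lam = std.pdf(a) / (1.0 - std.cdf(a))    # inverse Mills ratio
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + a * lam - lam ** 2)
    return mean, var


gamma = 0.85                                  # quantile level of the cutoff
cutoff = NormalDist().inv_cdf(gamma)          # F_X^{-1}(gamma) for X ~ N(0, 1)

# Control group X ~ N(0, 1) and an illustrative disease group Y ~ N(1.313, 0.5^2).
mu_x_out, var_x_out = normal_outlier_moments(0.0, 1.0, cutoff)
mu_y_out, var_y_out = normal_outlier_moments(1.313, 0.5, cutoff)

print("X outlier mean/variance:", round(mu_x_out, 3), round(var_x_out, 3))
print("Y outlier mean/variance:", round(mu_y_out, 3), round(var_y_out, 3))
```

With these illustrative values the two outlier means nearly coincide while the outlier variances differ, which is the kind of situation in which a test based on the outlier mean alone has nothing to detect.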

For the following two distribution settings,

Normal: $X \sim N(0, 1)$, $Y \sim N(\mu, \sigma^2)$, $\sigma = 0.5$,
Mixed normal: $X \sim N(0, 1)$, $Y \sim 0.9N(0, 1) + 0.1N(\mu, \sigma^2)$, $\sigma = 0.5$,

we choose values of the parameter $\mu$ such that either the outlier means are equal, i.e., $\mu_{Xout} = \mu_{Yout}$, or the outlier variances are equal, i.e., $\sigma^2_{Xout} = \sigma^2_{Yout}$. In Table 1, we display, for each distribution setting, the two outlier means and the two outlier variances.

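Table 1. Outlier means, outlier variances and combined outlier quantity when the outlier means are equal (I) or the outlier variances are equal (II)

| Setting | $\gamma$ | $\mu$ | $\mu_{Xout}$ | $\mu_{Yout}$ | $\sigma^2_{Xout}$ | $\sigma^2_{Yout}$ | $\sigma^2_{YX}$ |
|---|---|---|---|---|---|---|---|
| Normal (I) | 0.85 | 1.313 | 1.554 | 1.554 | 0.194 | 0.125 | 0.125 |
| | 0.9 | 1.465 | 1.754 | 1.754 | 0.169 | 0.112 | 0.112 |
| | 0.95 | 1.695 | 2.062 | 2.062 | 0.138 | 0.096 | 0.096 |
| Normal (II) | 0.85 | 1.799 | 1.554 | 1.866 | 0.194 | 0.194 | 0.292 |
| | 0.9 | 1.861 | 1.754 | 1.977 | 0.169 | 0.169 | 0.218 |
| | 0.95 | 2.012 | 2.062 | 2.210 | 0.138 | 0.138 | 0.159 |
| Mixed normal (I) | 0.85 | 1.313 | 1.554 | 1.554 | 0.194 | 0.170 | 0.170 |
| | 0.9 | 1.465 | 1.754 | 1.754 | 0.169 | 0.145 | 0.145 |
| | 0.95 | 1.695 | 2.062 | 2.062 | 0.138 | 0.115 | 0.115 |
| Mixed normal (II) | 0.85 | 1.638 | 1.554 | 1.630 | 0.194 | 0.194 | 0.200 |
| | 0.9 | 1.768 | 1.754 | 1.833 | 0.169 | 0.169 | 0.175 |
| | 0.95 | 1.971 | 2.062 | 2.140 | 0.138 | 0.138 | 0.144 |

We have several comments on the results in Table 1.

The outlier means $\mu_{Xout}$ and $\mu_{Yout}$ are identical for all three $\gamma$'s in (I), and the outlier variances $\sigma^2_{Xout}$ and $\sigma^2_{Yout}$ are identical for all three $\gamma$'s in (II). This indicates that, for any underlying distribution, there is a chance that using the outlier mean alone or the outlier variance alone to test the equality of two distributions is not appropriate.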

We then consider a test that can simultaneously capture the combined change in both the outlier mean $\mu_{Yout}$ and the outlier variance $\sigma^2_{Yout}$. The combined outlier quantity is defined as

$$\sigma^2_{YX} = E\{(Y - \mu_{Xout})^2 \mid Y \ge F_X^{-1}(\gamma)\}.$$

When $Y$ and $X$ have the same distribution, this combined outlier quantity is

$$\sigma^2_{Xout} = E\{(X - \mu_{Xout})^2 \mid X \ge F_X^{-1}(\gamma)\}.$$

The aim of the combined outlier quantity is to verify whether $\sigma^2_{YX}$ and $\sigma^2_{Xout}$ are identical. In Table 1, the values of the combined outlier quantity $\sigma^2_{YX}$ for both distribution settings and the different $\gamma$'s are displayed. Comparing $\sigma^2_{YX}$ with $\sigma^2_{Xout}$, we see that the two differ in every case. This allows us to propose a combined outlier quantity based test for gene expression analysis.
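The reason the combined outlier quantity reacts to both kinds of shift can be made explicit with a short worked expansion. Write $\mu_{Xout}$ and $\mu_{Yout}$ for the outlier means in (2.1), $\sigma^2_{Yout}$ for the outlier variance of $Y$ in (2.2), $\sigma^2_{YX}$ for the combined outlier quantity, and $A$ for the event that $Y$ exceeds the cutoff $F_X^{-1}(\gamma)$ (the symbol $A$ is introduced here only for brevity). Adding and subtracting $\mu_{Yout}$ inside the square gives

$$
\begin{aligned}
\sigma^2_{YX}
&= E\{(Y - \mu_{Yout} + \mu_{Yout} - \mu_{Xout})^2 \mid A\} \\
&= E\{(Y - \mu_{Yout})^2 \mid A\} + 2(\mu_{Yout} - \mu_{Xout})\,E\{Y - \mu_{Yout} \mid A\} + (\mu_{Yout} - \mu_{Xout})^2 \\
&= \sigma^2_{Yout} + (\mu_{Yout} - \mu_{Xout})^2 ,
\end{aligned}
$$

because $E\{Y - \mu_{Yout} \mid A\} = 0$ by the definition of $\mu_{Yout}$. As a check against Table 1, the Normal (II) row with quantile level 0.85 gives $0.194 + (1.866 - 1.554)^2 \approx 0.291$, which agrees with the tabulated 0.292 up to rounding.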

We further consider the following three types of distribution setting,

Type 1: $X \sim N(0, 1)$, $Y \sim (\chi^2(10) + \mu)$,
Type 2: $X \sim t(10)$, $Y \sim 0.9t(10) + 0.1N(\mu, \sigma^2)$, $\sigma = 1$,
Type 3: $X \sim t(10)$, $Y \sim 0.9t(10) + 0.1(\chi^2(10) + \mu)$,

and present the differences of outlier means, outlier variances and combined outlier quantities,

$$Df_m = \mu_{Yout} - \mu_{Xout}, \quad Df_v = \sigma^2_{Yout} - \sigma^2_{Xout}, \quad Df_{comb} = \sigma^2_{YX} - \sigma^2_{Xout},$$

in Table 2.

Table 2. Differences of outlier means, outlier variances and combined outlier quantities

| Type | $\mu$ | $\gamma$ | $Df_m$ | $Df_v$ | $Df_{comb}$ |
|---|---|---|---|---|---|
| Type 1 | 0 | 0.85 | 3.594 | 25.86 | 38.78 |
| | | 0.9 | 4.340 | 27.38 | 46.22 |
| | | 0.95 | 5.480 | 27.16 | 57.20 |
| | 2 | 0.85 | 4.444 | 35.10 | 54.85 |
| | | 0.9 | 5.392 | 36.60 | 65.67 |
| | | 0.95 | 6.853 | 34.83 | 81.80 |
| | 4 | 0.85 | 5.296 | 46.29 | 74.33 |
| | | 0.9 | 6.444 | 47.81 | 89.35 |
| | | 0.95 | 8.232 | 44.19 | 111.9 |
| Type 2 | 2 | 0.85 | 0.222 | 0.167 | 0.216 |
| | | 0.9 | 0.205 | 0.122 | 0.164 |
| | | 0.95 | 0.153 | 0.044 | 0.067 |
| | 4 | 0.85 | 0.965 | 1.519 | 2.451 |
| | | 0.9 | 1.062 | 1.337 | 2.467 |
| | | 0.95 | 1.118 | 0.953 | 2.203 |
| Type 3 | 0 | 0.85 | 3.517 | 25.05 | 37.42 |
| | | 0.9 | 4.218 | 26.33 | 44.12 |
| | | 0.95 | 5.245 | 25.85 | 53.37 |
| | 2 | 0.85 | 4.368 | 34.11 | 53.19 |
| | | 0.9 | 5.268 | 35.31 | 63.08 |
| | | 0.95 | 6.614 | 33.23 | 76.99 |
| | 4 | 0.85 | 5.219 | 45.12 | 72.36 |
| | | 0.9 | 6.321 | 46.29 | 86.26 |
| | | 0.95 | 7.994 | 42.30 | 106.2 |

In most cases the differences of the combined outlier quantities are much larger than the other two differences. This suggests that the combined outlier quantity may be more efficient in detecting influential genes.

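The sample estimator of the combined outlier quantity is defined as

$$S^2_{YX} = \Big[\sum_{i=1}^{n_2} I(Y_i \ge \hat F_X^{-1}(\gamma))\Big]^{-1} \sum_{i=1}^{n_2} (Y_i - \hat\mu_{Xout})^2 I(Y_i \ge \hat F_X^{-1}(\gamma)),$$

where the sample outlier mean is

$$\hat\mu_{Xout} = \Big[\sum_{i=1}^{n_1} I(X_i \ge \hat F_X^{-1}(\gamma))\Big]^{-1} \sum_{i=1}^{n_1} X_i I(X_i \ge \hat F_X^{-1}(\gamma)).$$

It is also of interest to evaluate the efficiencies in estimating the outlier mean, the outlier variance and the combined outlier quantity. We denote the mean square errors of the estimators of $\mu_{Xout}$, $\mu_{Yout}$, $\sigma^2_{Xout}$, $\sigma^2_{Yout}$ and $\sigma^2_{YX}$ by $MSE_{\mu_{Xout}}$, $MSE_{\mu_{Yout}}$, $MSE_{\sigma^2_{Xout}}$, $MSE_{\sigma^2_{Yout}}$ and $MSE_{\sigma^2_{YX}}$, respectively. Under the following distribution setting, with $n = 30$,

$$X_1, \ldots, X_n \ \text{iid} \ N(0, 1), \quad Y_1, \ldots, Y_n \ \text{iid} \ 0.9N(0, 1) + 0.1N(\mu, 1),$$

we display these results in Table 3.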

Table 3. Comparison of MSEs for the parameter estimates ($n_1 = n_2 = n = 30$)

| $\mu$ | $\gamma$ | $MSE_{\mu_{Xout}}$ | $MSE_{\mu_{Yout}}$ | $MSE_{\sigma^2_{Xout}}$ | $MSE_{\sigma^2_{Yout}}$ | $MSE_{\sigma^2_{YX}}$ |
|---|---|---|---|---|---|---|
| 1 | 0.85 | 0.0977 | 0.0996 | 0.0288 | 0.0426 | 0.1007 |
| | 0.9 | 0.1235 | 0.1283 | 0.0283 | 0.0382 | 0.1269 |
| | 0.95 | 0.2191 | 0.1788 | 0.0263 | 0.0335 | 0.1580 |
| 3 | 0.85 | 0.0981 | 0.2419 | 0.0276 | 0.3354 | 1.2345 |
| | 0.9 | 0.1240 | 0.2895 | 0.0306 | 0.3415 | 1.3698 |
| | 0.95 | 0.2137 | 0.3587 | 0.0265 | 0.3270 | 1.8171 |

The MSEs for the combined outlier quantity are generally larger than those for the outlier mean and the outlier variance, markedly so when $\mu = 3$. This is because a quantity that simultaneously reflects the difference in outlier mean and in outlier variance is harder to estimate. The appropriateness of the test based on the combined outlier quantity therefore needs to be justified through power comparisons.
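The sample estimator of the combined outlier quantity described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used for Table 3: the empirical quantile is taken to be the order statistic of rank ceil(gamma * n), which is one common convention, and the helper names (`empirical_quantile`, `outlier_mean`, `combined_outlier_quantity`) are hypothetical.

```python
import math
import random


def empirical_quantile(sample, gamma):
    """gamma-th empirical quantile, taken as the order statistic of rank ceil(gamma * n)."""
    ordered = sorted(sample)
    rank = max(1, math.ceil(gamma * len(ordered) - 1e-9))  # tolerance guards float round-off
    return ordered[rank - 1]


def outlier_mean(sample, cutoff):
    """Average of the observations that are at least as large as the cutoff."""
    tail = [v for v in sample if v >= cutoff]
    return sum(tail) / len(tail)


def combined_outlier_quantity(x, y, gamma):
    """Average squared distance of the disease outliers from the control outlier mean."""
    cutoff = empirical_quantile(x, gamma)      # cutoff estimated from the control group
    mu_x_out = outlier_mean(x, cutoff)         # sample outlier mean of the control group
    y_tail = [v for v in y if v >= cutoff]     # disease observations above the cutoff
    if not y_tail:
        return math.nan                        # no disease outliers: estimator undefined
    return sum((v - mu_x_out) ** 2 for v in y_tail) / len(y_tail)


# Example with n = 30 and a 10% contaminated disease group (illustrative values).
random.seed(0)
n = 30
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [random.gauss(3.0, 1.0) if random.random() < 0.1 else random.gauss(0.0, 1.0)
     for _ in range(n)]
print(combined_outlier_quantity(x, y, 0.85))
```

Returning `nan` when no disease observation exceeds the cutoff is a design choice made here so that small samples do not raise an exception; the text does not say how that case is handled.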

3. The Test based on Combined Outlier Quantity

We here introduce some asymptotic properties of the combined outlier quantity and then provide a test based on its asymptotic distribution.

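Theorem 3.1.

(a)

$$
\begin{aligned}
n_2^{1/2}(S^2_{YX} - \sigma^2_{YX})
={}& n_1^{-1/2} \sum_{i=1}^{n_1} \big[\theta_1 (\gamma - I(X_i \le F_X^{-1}(\gamma))) + \theta_2 (X_i - \mu_{Xout}) I(X_i \ge F_X^{-1}(\gamma))\big] \\
&+ \beta_Y^{-1} n_2^{-1/2} \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Xout})^2 - \sigma^2_{YX}\big] I(Y_i \ge F_X^{-1}(\gamma)) + o_p(1),
\end{aligned}
$$

where we let

$$\theta_1 = -\big[\beta_Y^{-1} (F_X^{-1}(\gamma) - \mu_{Xout})^2 f_Y(F_X^{-1}(\gamma)) f_X^{-1}(F_X^{-1}(\gamma)) + 2\beta_X^{-1} F_X^{-1}(\gamma) (\mu_{Yout} - \mu_{Xout})\big],$$

$$\theta_2 = -2\beta_X^{-1} (\mu_{Yout} - \mu_{Xout}),$$

with $\beta_Y = P(Y \ge F_X^{-1}(\gamma))$ and $\beta_X = 1 - \gamma$.

(b) We have that $n_2^{1/2}(S^2_{YX} - \sigma^2_{YX})$ converges in distribution to $N(0, v_Y)$, where

$$
\begin{aligned}
v_Y ={}& \gamma(1 - \gamma)\theta_1^2 + \theta_2^2 E[(X - \mu_{Xout})^2 I(X \ge F_X^{-1}(\gamma))] - 2\theta_1\theta_2 (1 - \gamma) E[(X - \mu_{Xout}) I(X \ge F_X^{-1}(\gamma))] \\
&+ \beta_Y^{-2} E\{(Y - \mu_{Xout})^4 I(Y \ge F_X^{-1}(\gamma))\} - \sigma^4_{YX},
\end{aligned}
$$

and where $\beta_Y^{-2} E\{(Y - \mu_{Xout})^4 I(Y \ge F_X^{-1}(\gamma))\} - \sigma^4_{YX} = Var[(Y - \mu_{Xout})^2 \mid Y \ge F_X^{-1}(\gamma)]$.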

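From the above theorem, under $H_0: F_X = F_Y$, we have

$$P_{H_0}\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \sigma^2_{XX}}{\sqrt{v_Y}} \le z\Big\} \to \int_{-\infty}^{z} \phi(z)\,dz$$

for $z \in R$, where $\phi$ represents the probability density function of $N(0, 1)$. If we further have $\hat\sigma^2_{XX}$ and $\hat v_Y$, respectively, estimates of $\sigma^2_{XX}$ and $v_Y$, we may define an outlier combined test as

$$\text{rejecting } H_0 \text{ if } n_2^{1/2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_Y}} \ge z_\alpha. \tag{3.1}$$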

Given this outlier combined test, it is desirable to verify its power when there is a distributional shift in the disease group distribution. An approximate power at significance level $\alpha$ may be derived as follows:

$$
\begin{aligned}
\pi_Y &= P_{F_Y}\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_Y}} \ge z_\alpha\Big\} \\
&= P_{F_Y}\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \sigma^2_{YX}}{\sqrt{v_Y}} \ge \frac{z_\alpha \sqrt{\hat v_Y} + \sqrt{n_2}(\hat\sigma^2_{XX} - \sigma^2_{YX})}{\sqrt{v_Y}}\Big\} \\
&\approx P\Big\{Z \ge z_\alpha + \sqrt{n_2}\,\frac{\sigma^2_{XX} - \sigma^2_{YX}}{\sqrt{v_Y}}\Big\}. 
\end{aligned}
\tag{3.2}
$$

The test defined in (3.1) requires that the estimator $\hat v_Y$ be consistent for the parameter $v_Y$. It is difficult to provide efficient estimates of the densities involved in $\theta_1$. There is one way to avoid this difficulty, since a level $\alpha$ test only needs to have size $\alpha$ when the two distributions $F_Y$ and $F_X$ are identical.

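Corollary 3.2. When $Y$ and $X$ have the same distribution, we have, by the fact that $\sigma^2_{XX} = \sigma^2_{Xout}$,

$$
\begin{aligned}
n_2^{1/2}(S^2_{YX} - \sigma^2_{Xout})
={}& -\beta_X^{-1} (F_X^{-1}(\gamma) - \mu_{Xout})^2\, n_1^{-1/2} \sum_{i=1}^{n_1} (\gamma - I(X_i \le F_X^{-1}(\gamma))) \\
&+ \beta_X^{-1} n_2^{-1/2} \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Xout})^2 - \sigma^2_{Xout}\big] I(Y_i \ge F_X^{-1}(\gamma)) + o_p(1).
\end{aligned}
$$

We have that $n_2^{1/2}(S^2_{YX} - \sigma^2_{Xout})$ converges in distribution to $N(0, v_X)$, where

$$v_X = \beta_X^{-2}\gamma(1 - \gamma)(F_X^{-1}(\gamma) - \mu_{Xout})^4 + \beta_X^{-2} E[(X - \mu_{Xout})^4 I(X \ge F_X^{-1}(\gamma))] - \sigma^4_{Xout}.$$

Suppose that we have estimators $\hat\sigma^2_{XX}$ and $\hat v_X$, respectively, of $\sigma^2_{XX}$ and $v_X$. We can then define the following test:

$$\text{Combined test: rejecting } H_0 \text{ if } n_2^{1/2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_X}} > z_\alpha. \tag{3.3}$$

The appeal of the test (3.3) is that $v_X$ itself involves no density value, so that estimating it is much easier. We can similarly derive the approximate power for the above test as

$$\pi_X = P_X\Big\{\sqrt{n_2}\,\frac{S^2_{YX} - \hat\sigma^2_{XX}}{\sqrt{\hat v_X}} \ge z_\alpha\Big\}. \tag{3.4}$$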
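A minimal sketch of how the statistic in (3.3) might be computed with moment plug-in estimates is given below. Several choices are assumptions made for illustration and are not stated in the text: the empirical quantile is the order statistic of rank ceil(gamma * n1), the null value of the combined quantity is estimated by the average squared deviation of the control outliers from their own mean, the tail probability is set to 1 - gamma, and the variance follows the expression printed in Corollary 3.2 with sample moments in place of expectations. The function name `combined_outlier_test` is hypothetical.

```python
import math


def combined_outlier_test(x, y, gamma):
    """Statistic sqrt(n2) * (S2_YX - s2_xx) / sqrt(v_x) with moment plug-in estimates."""
    n1, n2 = len(x), len(y)
    ordered = sorted(x)
    cutoff = ordered[max(1, math.ceil(gamma * n1 - 1e-9)) - 1]  # empirical quantile of x
    x_tail = [v for v in x if v >= cutoff]
    y_tail = [v for v in y if v >= cutoff]
    if not y_tail:
        return math.nan                       # no disease outliers above the cutoff

    mu_x = sum(x_tail) / len(x_tail)          # sample outlier mean of the control group
    s2_yx = sum((v - mu_x) ** 2 for v in y_tail) / len(y_tail)
    s2_xx = sum((v - mu_x) ** 2 for v in x_tail) / len(x_tail)

    beta_x = 1.0 - gamma                      # tail probability of the control group
    # Sample version of E[(X - mu)^4 I(X >= cutoff)]: sum over the tail, divide by n1.
    fourth = sum((v - mu_x) ** 4 for v in x_tail) / n1
    v_x = (gamma * (1.0 - gamma) * (cutoff - mu_x) ** 4 + fourth) / beta_x ** 2 - s2_xx ** 2
    if v_x <= 0.0:
        return math.nan                       # degenerate variance estimate
    return math.sqrt(n2) * (s2_yx - s2_xx) / math.sqrt(v_x)


# Usage: a control sample compared with a shifted copy of itself.
control = [0.1 * i for i in range(1, 41)]
disease = [v + 2.0 for v in control]
print(combined_outlier_test(control, disease, 0.85))
```

The returned value would be compared with a critical point; Section 4 reports that the normal-theory critical point needs adjustment, so a simulated critical value is the safer reference.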

The power representations (3.2) and (3.4) provide approximate powers for the tests in (3.1) and (3.3). In Table 4 we display the powers of the test (3.1) when the underlying distributions for the control group and the disease group are

$$X \sim N(0, 1) \quad \text{and} \quad Y \sim (1 - \epsilon)N(0, 1) + \epsilon N(\mu, 1).$$

Table 4. Asymptotic power $\pi_Y$ for the mixed normal distribution ($n = 30$)

| $\epsilon$ | $\gamma$ | $\mu = 1$ | $\mu = 3$ | $\mu = 5$ | $\mu = 10$ |
|---|---|---|---|---|---|
| 0.1 | 0.8 | 0.078 | 0.281 | 0.281 | 0.541 |
| | 0.85 | 0.074 | 0.247 | 0.414 | 0.534 |
| 0.2 | 0.8 | 0.098 | 0.422 | 0.682 | 0.825 |
| | 0.85 | 0.090 | 0.345 | 0.618 | 0.810 |

Without a simulation study, it is not known whether (3.2) and (3.4) give appropriate powers for these two tests. If they are in fact inappropriate, the critical points $z_\alpha$ require an adjustment. We address this in the next section.
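Once the population quantities are available, the normal approximation to the power in (3.2) is a one-line computation. The sketch below assumes that the critical point is the upper-alpha quantile of the standard normal distribution and takes the two combined outlier quantities and the asymptotic variance as inputs; the function name `approx_power` and the numbers in the usage lines are illustrative only and are not taken from Table 4.

```python
import math
from statistics import NormalDist


def approx_power(sigma2_xx, sigma2_yx, v_y, n2, alpha=0.05):
    """Approximate power P{Z >= z_alpha + sqrt(n2) * (sigma2_xx - sigma2_yx) / sqrt(v_y)}."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha)       # upper-alpha critical point (assumed convention)
    shift = math.sqrt(n2) * (sigma2_xx - sigma2_yx) / math.sqrt(v_y)
    return 1.0 - std.cdf(z_alpha + shift)


# Under the null the two quantities coincide and the power reduces to alpha.
print(approx_power(0.2, 0.2, 1.0, 30))
# A larger combined quantity under the alternative raises the approximate power.
print(approx_power(0.2, 0.5, 1.0, 30))
```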

4. Power Comparison by Simulation and a Simple Real Data Analysis

Two tasks are carried out in this section. First, we show by simulation that the critical point $z_\alpha$ of (3.4) set by the approximation theory is too conservative, and we present the appropriate level $\alpha$ critical point. Second, we compare this outlier combined test with a combination of the t-test and the F-test in terms of power. The classical t-test is designed to detect a change in the distributional mean, and the F-test is designed to detect a change in the distributional variation. Hence a combination of the t-test and the F-test detects a shift in mean and in variation simultaneously. It is then of interest to compare the powers of these two combined tests.

A t and F combined test rejects $H_0$ if

$$\frac{\bar Y - \bar X}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > t_{\alpha/2}(n_1 + n_2 - 2) \quad \text{or} \quad \frac{S_X^2}{S_Y^2} > F_{\alpha/2}(n_1 - 1, n_2 - 1) \quad \text{or} \quad \frac{S_X^2}{S_Y^2} < \frac{1}{F_{\alpha/2}(n_2 - 1, n_1 - 1)},$$

where

$$S_p^2 = \frac{\sum_{i=1}^{n_1} (X_i - \bar X)^2 + \sum_{i=1}^{n_2} (Y_i - \bar Y)^2}{n_1 + n_2 - 2}, \quad S_X^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (X_i - \bar X)^2 \quad \text{and} \quad S_Y^2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (Y_i - \bar Y)^2.$$

We consider a simulation with sample size $n = n_1 = n_2$ and $m = 100000$ replications to evaluate the power when $X$ and $Y$ follow the distribution setting

$$X \sim N(0, 1) \quad \text{and} \quad Y \sim 0.9N(0, 1) + 0.1N(\mu, 1).$$

In Tables 5 and 6, we display the simulated results for $n = 50$ and $n = 100$ when the level of significance is 0.05, and in Table 7 we display the simulated results for $n = 50$ when $\alpha = 0.1$.

We have the following comments on the results in Tables 5, 6 and 7:

(a) Although the contamination percentage of outliers in the mixed normal distribution is as small as 0.1, the combined outlier quantity with a cutoff based on a small $\gamma$ is more powerful than with a larger $\gamma$.

(b) The tests based on the combined outlier quantity with a cutoff based on a small $\gamma$ are relatively more powerful than the t and F combined test. This indicates that simultaneously detecting the shift in outlier mean and outlier variance is appropriate when $\gamma$ is chosen appropriately for the cutoff.

(c) The power of the test based on the combined outlier quantity increases as the contaminated location shift increases.
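The simulation logic described in this section, namely calibrating the critical point under the null and then estimating power under the contaminated alternative, can be sketched as follows. This is an illustration under assumptions and not the code behind Tables 5 to 7: the statistic uses the same moment plug-in choices as a direct reading of (3.3), replications with no disease observation above the cutoff are counted as non-rejections, the replication count is reduced to 2000 to keep the run short, and all function names are hypothetical.

```python
import math
import random


def combined_stat(x, y, gamma):
    """Plug-in version of the statistic in (3.3); -inf means 'no evidence against H0'."""
    n1, n2 = len(x), len(y)
    ordered = sorted(x)
    cutoff = ordered[max(1, math.ceil(gamma * n1 - 1e-9)) - 1]
    x_tail = [v for v in x if v >= cutoff]
    y_tail = [v for v in y if v >= cutoff]
    if not y_tail:
        return -math.inf
    mu_x = sum(x_tail) / len(x_tail)
    s2_yx = sum((v - mu_x) ** 2 for v in y_tail) / len(y_tail)
    s2_xx = sum((v - mu_x) ** 2 for v in x_tail) / len(x_tail)
    fourth = sum((v - mu_x) ** 4 for v in x_tail) / n1
    v_x = (gamma * (1 - gamma) * (cutoff - mu_x) ** 4 + fourth) / (1 - gamma) ** 2 - s2_xx ** 2
    if v_x <= 0.0:
        return -math.inf
    return math.sqrt(n2) * (s2_yx - s2_xx) / math.sqrt(v_x)


def simulate(n, gamma, eps, mu, reps, rng):
    """Draw `reps` pairs of samples: X ~ N(0,1), Y ~ (1-eps) N(0,1) + eps N(mu,1)."""
    stats = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(mu, 1.0) if rng.random() < eps else rng.gauss(0.0, 1.0)
             for _ in range(n)]
        stats.append(combined_stat(x, y, gamma))
    return stats


rng = random.Random(2011)
n, gamma, alpha, reps = 50, 0.85, 0.05, 2000

# Step 1: empirical (1 - alpha) point of the statistic when both groups are N(0, 1).
null_stats = sorted(simulate(n, gamma, 0.0, 0.0, reps, rng))
crit = null_stats[math.ceil((1 - alpha) * reps - 1e-9) - 1]

# Step 2: rejection rate under 10% contamination shifted to mean 3.
alt_stats = simulate(n, gamma, 0.1, 3.0, reps, rng)
power = sum(s > crit for s in alt_stats) / reps

print("simulated critical point:", round(crit, 3))
print("simulated power:", round(power, 3))
```

Comparing the simulated critical point with the normal quantile 1.645 gives a direct view of how far the asymptotic approximation is from the finite-sample behaviour for a given sample size and quantile level.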

We next consider alternative distributions with a constant shift:

Setting I: $X \sim N(0, 1)$ and $Y \sim (1 - \epsilon)N(0, 1) + \epsilon\{\mu\}$,
Setting II: $X \sim N(0, 1)$ and $Y \sim (1 - \epsilon)t(10) + \epsilon\{\mu\}$.

We list the simulated results in Tables 8-11.

We consider a real data set of a control group and a disease group that includes 22283 genes. At the significance level $\alpha = 0.05$, the constants $z$ in the table are the critical points chosen so that the sizes of the tests for each $\gamma$ are approximately 0.05. We then evaluate, for each test and each $\gamma$, the percentage of genes that are rejected. The computed results are displayed in Table 12.

Table 12. Percentages of genes larger than critical values ($\alpha = 0.05$)

| $\gamma$ | Outlier mean | Outlier variance | Combined quantity | t-test |
|---|---|---|---|---|
| 0.6 | 0 (z = 2.68) | 0.1247 (z = 2.85) | 0.1195 (z = 3.07) | 0.0325 |
| 0.65 | 0 (z = 2.46) | 0.1263 (z = 2.97) | 0.1168 (z = 3.28) | |
| 0.7 | 0 (z = 2.21) | 0.1286 (z = 3.11) | 0.1165 (z = 3.51) | |
| 0.75 | 0 (z = 1.86) | 0.1256 (z = 3.42) | 0.1157 (z = 3.85) | |
| 0.8 | 0 (z = 1.54) | 0.1242 (z = 3.74) | 0.1092 (z = 4.35) | |
| 0.85 | 0 (z = 1.23) | 0.1230 (z = 4.45) | 0.1102 (z = 5.28) | |
| 0.9 | 0.00004 (z = 1.01) | 0.1247 (z = 5.54) | 0.1049 (z = 7.25) | |
| 0.95 | 0.0004 (z = 0.77) | 0.1267 (z = 9.15) | 0.1154 (z = 15.5) | |

We have several comments on the results in this table:

(a) The outlier mean test performed poorly, rejecting almost no genes. In this data set it detects essentially no gene as influential.

(b) The tests based on the outlier variance and on the combined outlier quantity declare a moderate percentage of genes (roughly 10% to 13%) to be influential. Since the genes are measured simultaneously on the same subjects, a simultaneous test is needed, and it would markedly reduce the percentage of genes claimed to be influential. We do not pursue this further here. However, among the outlier-based tests, only the outlier variance and the combined outlier quantity show promise for finding influential genes.

5. Appendix

Three assumptions for the two sample outlier variance test are as follows.

ASSUMPTION 1: The limit $\lambda = \lim_{n_1, n_2 \to \infty} n_1^{-1} n_2$ exists.

ASSUMPTION 2: The probability density function $f_X$ of the distribution $F_X$ is bounded away from zero in neighborhoods of $F_X^{-1}(\gamma)$ for $\gamma \in (0, 1)$ and of the population cutoff point $\eta$.

ASSUMPTION 3: Probability density function $f_Y$ is bounded away from

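Proof of Theorem 3.1: First, we consider the following expansion:

$$
\begin{aligned}
\sum_{i=1}^{n_2} (Y_i - \hat\mu_{Xout})^2 I(Y_i \ge \hat F_X^{-1}(\gamma))
={}& \sum_{i=1}^{n_2} (Y_i - \mu_{Xout})^2 I(Y_i \ge \hat F_X^{-1}(\gamma)) + (\hat\mu_{Xout} - \mu_{Xout})^2 \sum_{i=1}^{n_2} I(Y_i \ge \hat F_X^{-1}(\gamma)) \\
&- 2(\hat\mu_{Xout} - \mu_{Xout}) \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Yout}) + (\mu_{Yout} - \mu_{Xout})\big] I(Y_i \ge \hat F_X^{-1}(\gamma)).
\end{aligned}
\tag{5.1}
$$

From the theory for the outlier mean by Chen, Chen and Chan (2010), we see that $n_2^{1/2}(\hat\mu_{Yout} - \mu_{Yout}) = O_p(1)$, $n_1^{1/2}(\hat\mu_{Xout} - \mu_{Xout}) = O_p(1)$ and $n_2^{-1/2} \sum_{i=1}^{n_2} (Y_i - \mu_{Yout}) I(Y_i \ge \hat F_X^{-1}(\gamma)) = O_p(1)$. From (5.1), we may then rewrite the combined quantity as

$$
\begin{aligned}
n_2^{1/2}(S^2_{YX} - \sigma^2_{YX})
={}& n_2^{1/2} \Big(\sum_{i=1}^{n_2} I(Y_i \ge \hat F_X^{-1}(\gamma))\Big)^{-1} \Big\{ \sum_{i=1}^{n_2} (Y_i - \mu_{Xout})^2 \big[I(Y_i \ge F_X^{-1}(\gamma) + n_2^{-1/2} T) - I(Y_i \ge F_X^{-1}(\gamma))\big] \\
&+ \sum_{i=1}^{n_2} \big[(Y_i - \mu_{Xout})^2 - \sigma^2_{YX}\big] I(Y_i \ge F_X^{-1}(\gamma)) \Big\} \\
&- 2(\mu_{Yout} - \mu_{Xout})\, n_2^{1/2} (\hat\mu_{Xout} - \mu_{Xout}) + o_p(1),
\end{aligned}
\tag{5.2}
$$

where we let $T = n_1^{1/2}(\hat F_X^{-1}(\gamma) - F_X^{-1}(\gamma))$.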

With Assumptions 2 and 3, and techniques from Ruppert & Carroll (1980) and Chen & Chiang (1996), we may see that

$$n_2^{-1/2} \sum_{i=1}^{n_2} (Y_i - \mu_{Xout})^2 \big[I(Y_i \ge F_X^{-1}(\gamma) + n_2^{-1/2} T) - I(Y_i \ge F_X^{-1}(\gamma))\big] = -(F_X^{-1}(\gamma) - \mu_{Xout})^2 f_Y(F_X^{-1}(\gamma))\, T + o_p(1) \tag{5.3}$$

for any sequence $T = O_p(1)$.

We may also see from Chen, Chen and Chan (2010) that the outlier mean $\hat\mu_{Xout}$ has the following representation:

$$
\begin{aligned}
n_1^{1/2}(\hat\mu_{Xout} - \mu_{Xout})
={}& \beta_X^{-1} F_X^{-1}(\gamma)\, n_1^{-1/2} \sum_{i=1}^{n_1} (\gamma - I(X_i \le F_X^{-1}(\gamma))) \\
&+ \beta_X^{-1} n_1^{-1/2} \sum_{i=1}^{n_1} (X_i - \mu_{Xout}) I(X_i \ge F_X^{-1}(\gamma)) + o_p(1).
\end{aligned}
\tag{5.4}
$$

The result of this theorem follows by plugging (5.3) and (5.4) into (5.2) and applying the representation for the empirical quantile $\hat F_X^{-1}(\gamma)$ in Chen, Chen and Chan (2010).
