
Outlier Proportion for Gene Expression Analysis


Academic year: 2021


National Chiao Tung University

Institute of Statistics

Master's Thesis

Outlier Proportion for Gene Expression Analysis

Student: Guo-Cheng Shu (徐國誠)

Advisor: Prof. Lin-An Chen (陳鄰安)


Outlier Proportion for Gene Expression Analysis

Student: Guo-Cheng Shu
Advisor: Prof. Lin-An Chen

A Thesis Submitted to the Institute of Statistics, College of Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Statistics

June 2012 (Republic of China year 101)

Hsinchu, Taiwan, Republic of China


Outlier Proportion for Gene Expression Analysis

Student: Guo-Cheng Shu    Advisor: Lin-An Chen

Institute of Statistics, National Chiao Tung University

Abstract

Discovering influential genes by detecting outliers in samples from disease-group subjects is a new and important approach to gene expression analysis. Extending the outlier mean of Chen, Chen and Chan (2010), we develop the asymptotic distribution of the outlier proportion for the linear regression model. A power comparison shows that tests based on this outlier estimator are competitive and promising for detecting a shift in the tail of the parent distribution.

Key words: Cancer outlier profile analysis, Gene expression, Outlier mean, Outlier proportion, Regression model.


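Abstract (Chinese version)

Outlier Proportion for Gene Expression Analysis
Student: Guo-Cheng Shu    Advisor: Prof. Lin-An Chen
Institute of Statistics, College of Science, National Chiao Tung University

Discovering influential genes by detecting outliers in disease samples is currently a very new and important approach to gene expression analysis. Continuing the outlier-mean idea of Chen, Chen and Chan (2010), we develop the asymptotic distribution of the outlier proportion based on a regression model. A comparison of power shows that tests based on this outlier estimator are highly competitive and promising for detecting a shift in the underlying population distribution.

Keywords: cancer outlier profile analysis, gene expression, outlier mean, outlier proportion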


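Acknowledgments

I am honored to have been advised by Prof. Lin-An Chen. In the more than one year I worked with him, he taught me not only how to carry out the research for this thesis but also how to apply what I have learned to solve problems. I believe this will help me think through the greater challenges I face in the future more efficiently and logically. I also thank the members of my oral examination committee, Prof. 許文郁, Prof. 蕭金福, and Prof. 彭南夫, for their suggestions and directions for revision at my defense, which made the results of this study more complete.

During my two years in the master's program, I thank all my classmates in Lab 409. Whenever I ran into programming problems, they always helped me through the difficulties, which made it possible to finish this thesis on schedule. In our spare time we went to the courts together to exercise, which relieved the pressure of coursework and made my two years as a master's student very enjoyable.

Finally, I thank my parents and my younger brother, who have supported me the most. When I was busy and under pressure, they always gave me timely encouragement and comfort, which allowed me to devote myself fully to my research. After leaving school, I will work even harder and focus on doing everything well.

Guo-Cheng Shu (徐國誠)
Institute of Statistics, National Chiao Tung University
June 2012 (Republic of China year 101)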


Contents

1. Introduction

2. Formalization of Outlier Proportion

3. Outlier Proportion Estimator and Its Large Sample Theory

4. Power Performance for a Test based on Outlier Proportion

5. Appendix


List of Tables

1. Outlier proportion differences $D_{op}$ and efficiency $\mathrm{eff}$

2. Mean square error for outlier proportion estimation

3. Variance $v_{b,\mathrm{out}}$ comparison

4. Evaluation of probability of type I error

5. Power performance of outlier proportion test under mixed normal distribution

6. Power performance of outlier proportion test

7. Power comparison of two outlier LSE based tests


1. Introduction.

Among existing techniques for detecting differential genes, common statistical methods for two-group comparisons, such as the t-test, are not appropriate because of the large number of genes and the limited number of subjects available. Tomlins et al. (2005) observed in a study of prostate cancer that differential genes are over-expressed in only a small number of disease samples. The problem of constructing statistical procedures based on outlier samples has attracted considerable recent attention. Tibshirani and Hastie (2007) and Wu (2007) suggested using an outlier sum, the sum of all gene expression values in the disease group that exceed a specified cutoff point. A common disadvantage of these techniques is that the distribution theory of the proposed statistics has not been derived, so a distribution-based p-value cannot be applied. Recently, Chen, Chen and Chan (2010) considered the outlier mean (an average version of the outlier sum) and developed its large sample theory, which allows a distribution-based p-value to be formulated. Simulation studies and data analysis show desirable efficiency for tests based on the outlier mean.
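The outlier sum and outlier mean described above can be illustrated with a short Python example. This is a simplified sketch based on assumptions, not the cited authors' procedures. The cutoff is passed in directly, whereas Tibshirani and Hastie (2007) also standardize each gene and then apply a data-driven cutoff. The outlier mean is taken to be the outlier sum divided by the disease-group sample size, which is our assumption. The function names are hypothetical.

```python
def outlier_sum(disease_values, cutoff):
    """Sum of disease-group expression values strictly greater than the cutoff."""
    return sum(v for v in disease_values if v > cutoff)


def outlier_mean(disease_values, cutoff):
    """Outlier sum divided by the disease-group sample size (assumed definition)."""
    return outlier_sum(disease_values, cutoff) / len(disease_values)


if __name__ == "__main__":
    # Hypothetical expression values of one gene in five disease samples.
    disease = [0.2, 1.1, 3.5, 0.4, 4.0]
    cutoff = 2.0  # illustrative, e.g. an upper quantile of healthy-group values
    print(outlier_sum(disease, cutoff))   # 7.5
    print(outlier_mean(disease, cutoff))  # 1.5
```

Only the two values above the cutoff (3.5 and 4.0) contribute. This is why these statistics react to over-expression in a small subset of disease samples.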

Gene expression levels may also depend causally on one or more biological conditions, which act as independent variables (see Jin, Si et al. (2006), Huang and Pan (2003), Rambow, Piton et al. (2008) and Muller, Chiou and Leng (2008)). In light of the observation of Tomlins et al. (2005), investigating and verifying the characteristics of the tail of the parent distribution of disease-group data in linear regression models, through estimation and hypothesis testing, is a new but important topic to be explored.

We use the sample conditional quantile computed from healthy-group data as the cutoff for determining outliers. We then introduce the sample proportion of disease-group observations exceeding this cutoff as a way to monitor the outlier distribution, and we derive its asymptotic distribution. A simulation study shows that the test based on the outlier proportion has desirable power.

2. Formalization of Outlier Proportion

We consider two gene expression variables: $y_a$ for the normal (control) group population and $y_b$ for the disease group population. They follow the linear regression models

$$y_a = x'\beta_a + \varepsilon \quad\text{and}\quad y_b = x'\beta_b + \delta,$$

where $x$ is a $p$-vector of covariates (biological conditions) whose first element is the constant one. A key to quantifying the outlier information in the model for the gene response variable $y_b$ is to reparametrize the regression parameters. The reparametrized parameters should characterize the information contained in the tail of the distribution of this variable.

Denote the conditional distributions of $y_a$ and $y_b$ given $x$ by $F_{y_a}(\cdot\mid x)$ and $F_{y_b}(\cdot\mid x)$. The main objective of gene expression analysis is then to test the hypothesis of distributional equality

$$H_F:\ F_{y_a}(\cdot\mid x) = F_{y_b}(\cdot\mid x) \ \text{ for all } x \in R^+, \qquad (2.1)$$

where $R^+ = \{(1, x_1')' : x_1 \in R^{p-1}\}$. The classical approach tests this hypothesis by checking whether the conditional means are equal,

$$x'\beta_b = x'\beta_a \ \text{ for all } x \in R^+, \qquad (2.2)$$

which is equivalent to checking whether $\beta_b = \beta_a$. Following their observation, Tomlins et al. (2005) proposed examining parameters of the outlier distribution instead of the original distributions $F_{y_a}$ and $F_{y_b}$. In the regression setting, this requires a formalization of the regression parameters that captures the information in the tail of the distribution.

Given a fixed $x$, we take the $\alpha$-th quantile of $y_a$ as the outlier detection threshold, which may be written as $F^{-1}_{y_a}(\alpha) = x'\beta_a(\alpha)$. Here $\beta_a(\alpha) = \beta_a + F_\varepsilon^{-1}(\alpha)e$ is the population regression quantile of Koenker and Bassett (1978), $e = (1, 0, \ldots, 0)'$ is a $p$-vector, and $F_\varepsilon^{-1}(\alpha)$ is the $\alpha$-th quantile of the error variable $\varepsilon$ with distribution function $F_\varepsilon$. Observations of $y_a$ and $y_b$ above this quantile point are considered outliers. Our approach builds on Tomlins et al.'s observation and extends the outlier mean to the regression model. To test hypothesis (2.1), we examine the conditional outlier proportion of $y_b$,

$$\pi_{b,\mathrm{out}}(x) = P\big(y_b \ge x'\beta_a(\alpha)\big).$$

Since $y_b = x'\beta_b + \delta$, we may see that

$$\pi_{b,\mathrm{out}}(x) = P\big(\delta \ge F_\varepsilon^{-1}(\alpha) + x'(\beta_a - \beta_b)\big), \qquad (2.3)$$

and the conditional outlier proportion of $y_a$ is $\pi_{a,\mathrm{out}}(x) = P(y_a \ge x'\beta_a(\alpha)) = 1 - \alpha$. For testing hypothesis (2.1), we propose to verify the relation

$$\pi_{b,\mathrm{out}}(x) = 1 - \alpha \ \text{ for all } x \in R^+. \qquad (2.4)$$

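Denote the difference between the two conditional outlier proportions by $D_{op}(x) = \pi_{b,\mathrm{out}}(x) - (1-\alpha)$. To examine whether this difference is informative, consider the model settings

$$y_a = 1 + 2x + \varepsilon \quad\text{and}\quad y_b = 1.1 + 2.1x + \delta,$$
$$\varepsilon \sim N(0,1) \quad\text{and}\quad \delta \sim \lambda N(0,1) + (1-\lambda)N(\mu,1).$$

With sample size $n = 100$, Table 1 displays the sizes of $D_{op}(x)$ and the efficiency $\mathrm{eff} = D_{op}/\pi_{a,\mathrm{out}}$.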

Table 1.

Outlier proportion differences $D_{op}$ (efficiency $\mathrm{eff}$ in parentheses)

                  α = 0.1         α = 0.3         α = 0.6         α = 0.9
μ = 1   x = 1     0.037 (0.041)   0.084 (0.121)   0.113 (0.284)   0.072 (0.725)
        x = 2     0.048 (0.053)   0.112 (0.160)   0.152 (0.380)   0.098 (0.976)
        x = 3     0.058 (0.064)   0.137 (0.196)   0.190 (0.478)   0.125 (1.248)
        x = 4     0.066 (0.073)   0.160 (0.229)   0.227 (0.568)   0.154 (1.542)
μ = 5   x = 1     0.037 (0.042)   0.089 (0.127)   0.131 (0.327)   0.126 (1.257)
        x = 2     0.049 (0.054)   0.116 (0.165)   0.167 (0.417)   0.147 (1.468)
        x = 3     0.058 (0.065)   0.140 (0.200)   0.202 (0.506)   0.170 (1.701)
        x = 4     0.066 (0.074)   0.163 (0.232)   0.238 (0.594)   0.196 (1.955)

The differences $D_{op}(x)$ in this table are sizable. The relative differences $\mathrm{eff}$ grow with both $x$ and $\alpha$. Together, these results suggest that detecting a difference in conditional outlier proportions may be more powerful than detecting a difference in conditional means.
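Formula (2.3) can be evaluated in closed form for the Table 1 setting. The sketch below assumes $\varepsilon \sim N(0,1)$ and $\delta \sim \lambda N(0,1) + (1-\lambda)N(\mu,1)$ with a single covariate, so $x = (1, x_1)'$. The text does not state the mixing weight $\lambda$. The value $\lambda = 0.9$ is our assumption, chosen because the entries of Table 1 we spot-checked match it to within rounding. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # distribution of eps, assumed N(0, 1)


def outlier_prop_b(x1, alpha, beta_a, beta_b, lam, mu):
    """pi_{b,out}(x) from eq. (2.3) for x = (1, x1)."""
    threshold = (STD_NORMAL.inv_cdf(alpha)
                 + (beta_a[0] - beta_b[0]) + x1 * (beta_a[1] - beta_b[1]))
    tail_null = 1.0 - STD_NORMAL.cdf(threshold)            # P(N(0,1) >= threshold)
    tail_shift = 1.0 - NormalDist(mu, 1.0).cdf(threshold)  # P(N(mu,1) >= threshold)
    return lam * tail_null + (1.0 - lam) * tail_shift


def d_op_and_eff(x1, alpha, lam, mu, beta_a=(1.0, 2.0), beta_b=(1.1, 2.1)):
    """Return (D_op(x), eff): D_op = pi_{b,out}(x) - (1 - alpha), eff = D_op / (1 - alpha)."""
    diff = outlier_prop_b(x1, alpha, beta_a, beta_b, lam, mu) - (1.0 - alpha)
    return diff, diff / (1.0 - alpha)


if __name__ == "__main__":
    LAM = 0.9  # assumption: the mixing weight is not stated in the text
    for mu in (1.0, 5.0):
        for x1 in (1, 2, 3, 4):
            cells = [d_op_and_eff(x1, a, LAM, mu) for a in (0.1, 0.3, 0.6, 0.9)]
            print(mu, x1, [f"{d:.3f} ({e:.3f})" for d, e in cells])
```

When $\beta_a = \beta_b$ and $\lambda = 1$, the function returns $1 - \alpha$. This is relation (2.4) under the null hypothesis.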

These results are a positive sign for using the outlier proportion of $y_b$ in gene expression analysis. However, a consistent estimator of the outlier proportion is complicated to construct. It would require a consistent estimator of the $x$-dependent parameter function $\pi_{b,\mathrm{out}}(x)$ in (2.3). This difficulty can be resolved by reparametrizing the regression parameters. Write $\beta_b = (\beta_{b0}, \beta_{b1}')'$ and $\beta_a = (\beta_{a0}, \beta_{a1}')'$, where $\beta_{b0}$ and $\beta_{a0}$ are the intercept parameters and $\beta_{b1}$ and $\beta_{a1}$ are the vectors of slope parameters. We then impose the restriction

$$\beta_{a1} = \beta_{b1}, \qquad (2.5)$$

which holds when the null hypothesis (2.1) is true. Under (2.5), $x'(\beta_a - \beta_b) = \beta_{a0} - \beta_{b0}$ for every $x \in R^+$, so the outlier proportion no longer depends on $x$:

$$\pi_{b,\mathrm{out}} = P\big(\delta \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big). \qquad (2.6)$$

The next objective is to establish an estimator of the outlier proportion $\pi_{b,\mathrm{out}}$ in (2.6). We then develop its distributional theory in order to construct a test of hypothesis (2.1).

3. Outlier Proportion Estimator and Its Large Sample Theory

For this gene expression study, we assume that there are $n_1$ subjects in the normal control group and $n_2$ subjects in the disease group, and that $m$ genes are to be investigated. The gene expressions of normal-group subjects follow the regression model

$$y_{ai} = x_i'\beta_a + \varepsilon_i, \quad i = 1, \ldots, n_1, \qquad (3.1)$$

where the $\varepsilon_i$'s are iid error variables with distribution function $F_\varepsilon$. The gene expressions of disease-group subjects follow the regression model

$$y_{bi} = x_i'\beta_b + \delta_i, \quad i = 1, \ldots, n_2, \qquad (3.2)$$

where the $\delta_i$'s are iid error variables with distribution function $F_\delta$.

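We let the sample threshold be $\hat F^{-1}_{y_a}(\alpha) = x'\hat\beta_a(\alpha)$, where $\hat\beta_a(\alpha)$ is the sample regression quantile of Koenker and Bassett (1978). It solves

$$\min_{b \in R^p} \sum_{i=1}^{n_1} (y_{ai} - x_i'b)\big(\alpha - I(y_{ai} \le x_i'b)\big).$$

The estimator of the outlier proportion is

$$\hat\pi_{b,\mathrm{out}} = n_2^{-1}\sum_{i=1}^{n_2} I\big(y_{bi} \ge x_i'\hat\beta_a(\alpha)\big). \qquad (3.3)$$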
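Below is a minimal Python sketch of estimator (3.3). It assumes one covariate plus an intercept ($p = 2$). It does not use a linear-programming solver. Instead, it finds the regression quantile by exhaustive search over lines through two observations. This search is exact because the check-loss objective is piecewise linear and some minimizer interpolates $p$ observations (Koenker and Bassett, 1978). The cost is $O(n^3)$, which is acceptable only for small samples. The names `regression_quantile` and `outlier_proportion` and the uniform covariate design in the usage block are hypothetical.

```python
import itertools
import random


def check_loss(u, alpha):
    """Koenker-Bassett check function rho_alpha(u) = u * (alpha - I(u < 0))."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def regression_quantile(x, y, alpha):
    """Exact alpha-th regression quantile (b0, b1) for the model y = b0 + b1 * x."""
    best, best_obj = None, float("inf")
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # these two points do not determine a line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        obj = sum(check_loss(yk - b0 - b1 * xk, alpha) for xk, yk in zip(x, y))
        if obj < best_obj:
            best, best_obj = (b0, b1), obj
    return best


def outlier_proportion(xb, yb, beta_a_hat):
    """Sample outlier proportion (3.3): share of y_bi at or above x_i' beta_a_hat."""
    b0, b1 = beta_a_hat
    return sum(1 for xi, yi in zip(xb, yb) if yi >= b0 + b1 * xi) / len(yb)


if __name__ == "__main__":
    rng = random.Random(1)
    n, alpha = 50, 0.8
    # Hypothetical design: x ~ Uniform(0, 1); both groups follow y = 1 + 2x + N(0, 1).
    xa = [rng.random() for _ in range(n)]
    ya = [1 + 2 * xi + rng.gauss(0, 1) for xi in xa]
    xb = [rng.random() for _ in range(n)]
    yb = [1 + 2 * xi + rng.gauss(0, 1) for xi in xb]
    beta_hat = regression_quantile(xa, ya, alpha)
    print(beta_hat, outlier_proportion(xb, yb, beta_hat))  # proportion near 1 - alpha
```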

We conduct a simulation study of the efficiency of the sample outlier proportion with the following models:

$$y_{ai} = 1 + 2x_i + \varepsilon_i, \quad i = 1, \ldots, n_1, \quad \text{where the } \varepsilon_i\text{'s are iid } N(0,1),$$
$$y_{bi} = \beta_0 + 2x_i + \delta_i, \quad i = 1, \ldots, n_2, \quad \text{where the } \delta_i\text{'s are iid } F_\delta. \qquad (3.4)$$

Under $F_\delta = 0.9N(0,1) + 0.1N(1,1)$, we perform $m = 1000$ replications with $n = n_1 = n_2$ from the above models. Table 2 displays the MSEs. For $n = 50$, the corresponding true outlier proportions are listed in parentheses.

Table 2.

Mean square error for outlier proportion estimation (true outlier proportions in parentheses for n = 50)

                   β_0 = 1.1         β_0 = 1.3         β_0 = 1.5
n = 50    α = 0.7  0.0099 (0.3738)   0.0110 (0.4482)   0.0111 (0.5247)
          α = 0.8  0.0081 (0.2664)   0.0101 (0.3323)   0.0115 (0.4041)
          α = 0.9  0.0056 (0.1496)   0.0075 (0.1975)   0.0097 (0.2541)
n = 100   α = 0.7  0.0049            0.0054            0.0053
          α = 0.8  0.0044            0.0051            0.0055
          α = 0.9  0.0029            0.0039            0.0052
n = 200   α = 0.7  0.0026            0.0027            0.0028
          α = 0.8  0.0025            0.0027            0.0029
          α = 0.9  0.0019            0.0025            0.0029
n = 500   α = 0.7  0.0012            0.0011            0.0012
          α = 0.8  0.0015            0.0014            0.0013
          α = 0.9  0.0014            0.0016            0.0016

The MSEs are small and decrease as $n$ grows. This shows that the outlier proportion can be estimated accurately and is an appropriate parameter for statistical inference about hypothesis (2.1).
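The true proportions shown in parentheses in Table 2 follow directly from (2.6). The sketch below assumes $\varepsilon \sim N(0,1)$ and $\delta \sim (1-w)N(0,1) + wN(\mu,1)$. With $w = 0.1$ and $\mu = 1$, this is the $F_\delta = 0.9N(0,1) + 0.1N(1,1)$ stated in the text, and $\beta_{a0} = 1$ as in (3.4). The function name and the parameter name `mix_weight` are hypothetical.

```python
from statistics import NormalDist


def true_outlier_proportion(alpha, beta_a0, beta_b0, mix_weight=0.1, mu=1.0):
    """pi_{b,out} = P(delta >= F_eps^{-1}(alpha) + beta_a0 - beta_b0), eq. (2.6)."""
    c = NormalDist().inv_cdf(alpha) + beta_a0 - beta_b0
    tail_null = 1.0 - NormalDist().cdf(c)          # N(0,1) component
    tail_shift = 1.0 - NormalDist(mu, 1.0).cdf(c)  # N(mu,1) component
    return (1.0 - mix_weight) * tail_null + mix_weight * tail_shift


if __name__ == "__main__":
    for alpha in (0.7, 0.8, 0.9):
        row = [true_outlier_proportion(alpha, 1.0, b0) for b0 in (1.1, 1.3, 1.5)]
        print(alpha, [round(p, 4) for p in row])
```

The parenthesized values we spot-checked agree to about four decimals. Each value is larger than $1 - \alpha$ because $\beta_0 > \beta_{a0}$ and because of the shifted mixture component.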

For the asymptotic properties of the outlier proportion estimator, we need the following assumptions:

(a) Assumption 2: $\lim_{n_1, n_2 \to \infty} n_2/n_1 = \ell_{yx}$, a fixed constant.
(b) Assumption 3: $\lim_{n_2 \to \infty} n_2^{-1}\sum_{i=1}^{n_2} x_i = \mu_x$, a fixed $p$-vector.

From Assumptions (a) and (b), we see that $\lim_{n_1 \to \infty} n_1^{-1}\sum_{i=1}^{n_1} x_i = \ell_{yx}\mu_x$. We further denote by $f_\varepsilon$ and $f_\delta$ the probability density functions of $F_\varepsilon$ and $F_\delta$, respectively. For the rest of this section, we assume that condition (2.4)

Theorem 3.1.

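(a) The outlier proportion estimator $\hat\pi_{b,\mathrm{out}}$ has the representation

$$\begin{aligned} n_2^{1/2}(\hat\pi_{b,\mathrm{out}} - \pi_{b,\mathrm{out}}) ={}& -f_\delta\big(F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big)\,\ell_{yx}^{1/2}\, f_\varepsilon^{-1}\big(F_\varepsilon^{-1}(\alpha)\big)\,\mu_x'Q_x^{-1}\, n_1^{-1/2}\sum_{i=1}^{n_1} x_i\big(\alpha - I(\varepsilon_i \le F_\varepsilon^{-1}(\alpha))\big) \\ &+ n_2^{-1/2}\sum_{i=1}^{n_2}\Big[I\big(\delta_i \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big) - \pi_{b,\mathrm{out}}\Big] + o_p(1). \end{aligned}$$

(b) $n_2^{1/2}(\hat\pi_{b,\mathrm{out}} - \pi_{b,\mathrm{out}})$ converges in distribution to a normal random variable with distribution $N(0, v_{b,\mathrm{out}})$, where

$$v_{b,\mathrm{out}} = \alpha(1-\alpha)\,\ell_{yx}\Big[f_\delta\big(F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big)\, f_\varepsilon^{-1}\big(F_\varepsilon^{-1}(\alpha)\big)\Big]^2\,\mu_x'Q_x^{-1}\mu_x + \pi_{b,\mathrm{out}}(1 - \pi_{b,\mathrm{out}}). \qquad (3.5)$$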

Letting $X \sim N(0,1)$ and $Y \sim N(m,1)$, we compute $v_{b,\mathrm{out}}$ for comparison.

Table 3.

Variance $v_{b,\mathrm{out}}$ comparison

F_Y        α      m = 0    m = 1    m = 3    m = 5
N(m,1)     0.7    0.42     0.437    0.007    3.8E-6
           0.8    0.32     0.562    0.018    1.6E-5
           0.9    0.18     0.667    0.065    0.000
λ = 0.1    0.7    0.42     0.434    0.405    0.403
           0.8    0.32     0.353    0.334    0.331
           0.9    0.18     0.224    0.232    0.226
λ = 0.2    0.7    0.420    0.447    0.384    0.381
           0.8    0.320    0.385    0.339    0.333
           0.9    0.180    0.271    0.271    0.259
λ = 0.3    0.7    0.420    0.456    0.358    0.353
           0.8    0.320    0.415    0.334    0.325
           0.9    0.180    0.317    0.296    0.277
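Formula (3.5) can be evaluated directly. The sketch below rests on several assumptions that the text does not spell out. First, $\varepsilon \sim N(0,1)$ and $\beta_{a0} = \beta_{b0}$. Second, $\ell_{yx}\,\mu_x'Q_x^{-1}\mu_x = 1$. Because the first covariate is the constant 1, we have $Q_x e = \mu_x$ and hence $\mu_x'Q_x^{-1}\mu_x = 1$, so the remaining assumption is equal group sizes. Third, $F_Y = (1-\lambda)N(0,1) + \lambda N(m,1)$. We inferred this mixture form because it reproduces the Table 3 entries we checked. Setting $\lambda = 1$ gives the $N(m,1)$ rows. The function name is hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def v_b_out(alpha, m, lam, ell_quad=1.0):
    """Asymptotic variance (3.5) of the sample outlier proportion.

    Assumptions (ours): eps ~ N(0,1), delta ~ (1 - lam) N(0,1) + lam N(m,1),
    beta_a0 = beta_b0, and ell_quad = ell_yx * mu_x' Q_x^{-1} mu_x = 1.
    """
    q = STD.inv_cdf(alpha)                      # F_eps^{-1}(alpha)
    shifted = NormalDist(m, 1.0)
    f_delta = (1 - lam) * STD.pdf(q) + lam * shifted.pdf(q)
    pi_b = (1 - lam) * (1 - STD.cdf(q)) + lam * (1 - shifted.cdf(q))
    ratio = f_delta / STD.pdf(q)                # f_delta(.) / f_eps(F_eps^{-1}(alpha))
    return alpha * (1 - alpha) * ell_quad * ratio ** 2 + pi_b * (1 - pi_b)


if __name__ == "__main__":
    for lam in (1.0, 0.1, 0.2, 0.3):
        for alpha in (0.7, 0.8, 0.9):
            print(lam, alpha, [round(v_b_out(alpha, m, lam), 3) for m in (0, 1, 3, 5)])
```

For $m = 0$, the two error distributions coincide and the variance reduces to $2\alpha(1-\alpha)$. This matches the 0.42, 0.32, and 0.18 values in the first column.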

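Except for the pure location shifts $N(m,1)$ with $m > 0$, a larger quantile level $\alpha$ gives the outlier proportion estimator $\hat\pi_{b,\mathrm{out}}$ a smaller asymptotic variance. Hence, larger values of $\alpha$ may also give $\hat\pi_{b,\mathrm{out}}$ better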

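The above asymptotic distribution allows us to consider an asymptotic pivotal quantity based on the outlier proportion,

$$\frac{\sqrt{n_2}\,\big(\hat\pi_{b,\mathrm{out}} - (1-\alpha)\big)}{\sqrt{\hat v_{b,\mathrm{out}}}},$$

where $\hat v_{b,\mathrm{out}}$ is an estimator of $v_{b,\mathrm{out}}$. However, this quantity involves the densities $f_\varepsilon$ and $f_\delta$. When these densities are unknown, estimating them can be very inefficient at moderate sample sizes. When (2.1) is true, $\delta + \beta_{b0}$ and $\varepsilon + \beta_{a0}$ have the same distribution. Therefore $f_\delta(F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}) = f_\varepsilon(F_\varepsilon^{-1}(\alpha))$, and $v_{b,\mathrm{out}}$ in (3.5) reduces to

$$v_{a,\mathrm{out}} = \alpha(1-\alpha)\,\ell_{yx}\,\mu_x'Q_x^{-1}\mu_x + \pi_{b,\mathrm{out}}(1 - \pi_{b,\mathrm{out}}).$$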

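Let $\hat v_{a,\mathrm{out}}$ be an estimate of $v_{a,\mathrm{out}}$. We then define an outlier-proportion-based test of hypothesis (2.1) that

$$\text{rejects } H_0 \text{ if } \frac{n_2^{1/2}\big(\hat\pi_{b,\mathrm{out}} - (1-\alpha)\big)}{\sqrt{\hat v_{a,\mathrm{out}}}} \ge z_\gamma, \qquad (3.6)$$

where $\gamma$ is the significance level and $z_\gamma$ is the $(1-\gamma)$th quantile of the standard normal distribution. This extends the classical proportion test to the outlier gene problem.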

4. Power Performance for a Test based on Outlier Proportion

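We consider the following design:

$$y_{ai} = 1 + 2x_i + \varepsilon_i, \quad i = 1, \ldots, n_1, \quad \text{where the } \varepsilon_i\text{'s are iid } F_\varepsilon,$$
$$y_{bi} = \beta_0 + h x_i + \delta_i, \quad i = 1, \ldots, n_2, \quad \text{where the } \delta_i\text{'s are iid } F_\delta, \qquad (4.1)$$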

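for evaluating the test (3.6). In addition to the outlier proportion $\hat\pi_{b,\mathrm{out}}$ and the regression quantile $\hat\beta_a(\alpha)$, we define the following estimates:

$$\hat\mu_x = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{bi}, \quad \hat\ell_{yx} = \frac{n_2}{n_1}, \quad \hat Q_x = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{bi}x_{bi}', \quad \hat v_{a,\mathrm{out}} = \alpha(1-\alpha)\,\hat\ell_{yx}\,\hat\mu_x'\hat Q_x^{-1}\hat\mu_x + \hat\pi_{b,\mathrm{out}}(1 - \hat\pi_{b,\mathrm{out}}).$$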
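The plug-in variance and the test (3.6) can be combined in a short Python sketch. It assumes $p = 2$, so that $x_{bi} = (1, x_i)'$, and it takes the estimate $\hat\pi_{b,\mathrm{out}}$ as input. One property is worth noting. Because the first covariate is the constant 1, $\hat Q_x e = \hat\mu_x$, so $\hat\mu_x'\hat Q_x^{-1}\hat\mu_x$ equals 1 for any design. The code still computes it so that it mirrors the formula. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def solve_2x2(q, v):
    """Solve the 2x2 linear system q w = v by Cramer's rule."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    return [(v[0] * q[1][1] - q[0][1] * v[1]) / det,
            (q[0][0] * v[1] - q[1][0] * v[0]) / det]


def outlier_proportion_test(pi_hat, alpha, xb, n1, gamma=0.05):
    """One-sided test (3.6) with the plug-in variance v_hat_{a,out}.

    xb holds disease-group covariate vectors x_bi = (1, x_i); returns (statistic, reject).
    """
    n2 = len(xb)
    mu = [sum(row[k] for row in xb) / n2 for k in range(2)]              # mu_hat_x
    q = [[sum(row[j] * row[k] for row in xb) / n2 for k in range(2)]
         for j in range(2)]                                              # Q_hat_x
    ell = n2 / n1                                                        # ell_hat_yx
    quad = sum(m * w for m, w in zip(mu, solve_2x2(q, mu)))              # mu' Q^{-1} mu
    v_hat = alpha * (1 - alpha) * ell * quad + pi_hat * (1 - pi_hat)
    stat = math.sqrt(n2) * (pi_hat - (1 - alpha)) / math.sqrt(v_hat)
    z_gamma = NormalDist().inv_cdf(1 - gamma)                            # (1 - gamma) quantile
    return stat, stat >= z_gamma


if __name__ == "__main__":
    xb = [(1.0, i / 100) for i in range(100)]  # hypothetical covariates
    print(outlier_proportion_test(0.4, 0.8, xb, n1=100))
```

With $\gamma = 0.05$, the critical value is $z_{0.05} \approx 1.645$, which is the constant used in (4.2).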


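With $\gamma = 0.05$, we evaluate the approximate power

$$\widehat{\mathrm{power}} = \frac{1}{m}\sum_{j=1}^{m} I\left(\frac{n_2^{1/2}\big(\hat\pi_{b,\mathrm{out}}^{(j)} - (1-\alpha)\big)}{\sqrt{\hat v_{a,\mathrm{out}}^{(j)}}} \ge 1.645\right), \qquad (4.2)$$

where the superscript $(j)$ denotes the estimate from the $j$th of $m$ replications.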

Setting $h = 2$ and $F_\varepsilon = F_\delta = F$, we first measure the type I error probability of this test. We consider several distributions $F$; Table 4 displays the simulated sizes under this setting.

Table 4.

Evaluation of probability of type I error

F               α = 0.5   α = 0.6   α = 0.7   α = 0.8   α = 0.9
N(0,1)          0.049     0.049     0.045     0.041     0.057
t(3)            0.051     0.052     0.046     0.05      0.063
t(5)            0.047     0.05      0.047     0.05      0.057
t(10)           0.048     0.056     0.044     0.051     0.053
Laplace(0,1)    0.048     0.049     0.046     0.057     0.049
Laplace(1,1)    0.049     0.052     0.054     0.058     0.057

The outlier-proportion-based test is quite robust. All simulated sizes range from 0.041 to 0.063 and are close to the specified significance level of 0.05.

Next we evaluate the power of the outlier-proportion-based test. We keep the same experimental design except for $h$, set $F_\varepsilon = N(0,1)$, and use several choices of $F_\delta$. Table 5 displays the simulated powers.

We also consider a mixed distribution for $y_b$ with $F_\varepsilon = N(0,1)$ and $F_\delta = 0.9N(0,1) + 0.1N(\mu,1)$. Table 6 displays the simulated powers.

Table 5.

Power performance of outlier proportion test under mixed normal distribution

F_δ            μ       α      h = 2.1   h = 2.3   h = 2.5
N(μ,1)         μ = 1   0.5    1         1         1
                       0.7    1         1         1
                       0.8    1         1         1
                       0.9    0.999     1         1
               μ = 2   0.5    1         1         1
                       0.7    1         1         1
                       0.8    1         1         1
                       0.9    1         1         1
χ²(3) − 2.5            0.7    0.542     0.933     0.998
                       0.8    0.875     0.993     1
                       0.9    0.995     1         1
t(10) + 0.5            0.7    0.993     1         1
                       0.8    0.980     1         1
                       0.9    0.955     1         1

Table 6.

Power performance of outlier proportion test

μ       α      h = 2.1   h = 2.3   h = 2.5
μ = 1   0.5    0.552     0.991     1
        0.7    0.455     0.987     1
        0.8    0.368     0.964     1
        0.9    0.283     0.896     0.999
μ = 2   0.5    0.785     0.999     1
        0.7    0.672     0.998     1
        0.8    0.564     0.992     1
        0.9    0.423     0.966     1

The powers displayed in these two tables show that the outlier proportion test performs quite satisfactorily for gene expression analysis.

Lai, Chen and Chen (2012, unpublished) proposed a least squares estimator

$$\hat\beta_{ls} = (X_b'A_bX_b)^{-1}X_b'A_by_b, \qquad (4.3)$$

where $X_b = (x_{b1}, x_{b2}, \ldots, x_{bn_2})'$, the trimming matrix is $A_b = \mathrm{diag}\{a_{ii} = I(y_{bi} \ge x_{bi}'\hat\beta_a(\alpha)),\ i = 1, \ldots, n_2\}$, and $y_b = (y_{b1}, \ldots, y_{bn_2})'$. They also introduced two tests based on this estimator, denoted OL1 and OL2. With these two outlier-LSE-based tests available, it is of interest to check whether they are competitive with the outlier proportion test (OP). We keep the same experimental design except for various values of $h$, set $F_\varepsilon = N(0,1)$, and use several distributions $F_\delta$. Table 7 displays the simulated powers.
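Because $A_b$ is a diagonal matrix of 0/1 indicators, estimator (4.3) is ordinary least squares fitted only to the disease-group observations at or above the healthy-group regression quantile. The sketch below makes this concrete for one covariate plus an intercept, which is our assumption. It takes $\hat\beta_a(\alpha)$ as given. The function name and the data are hypothetical. It is not the unpublished authors' implementation.

```python
def trimmed_lse(xb, yb, beta_a_hat):
    """Outlier-trimmed least squares estimator (4.3) for y = b0 + b1 * x.

    A_b = diag{I(y_bi >= x_bi' beta_a_hat)}, so (X_b' A_b X_b)^{-1} X_b' A_b y_b
    is ordinary least squares on the observations that the indicator keeps.
    """
    a0, a1 = beta_a_hat
    kept = [(x, y) for x, y in zip(xb, yb) if y >= a0 + a1 * x]
    if len(kept) < 2:
        raise ValueError("need at least two outlying observations")
    n = len(kept)
    mean_x = sum(x for x, _ in kept) / n
    mean_y = sum(y for _, y in kept) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in kept)
    if sxx == 0:
        raise ValueError("kept covariates are all equal; X_b' A_b X_b is singular")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in kept)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Hypothetical data; (1, 2) stands in for the regression quantile beta_a_hat.
    xb = [0, 1, 2, 3, 4, 5]
    yb = [0, 10, 2, 13, 5, 16]
    print(trimmed_lse(xb, yb, (1.0, 2.0)))  # (8.5, 1.5)
```

In this example only the points at $x = 1, 3, 5$ lie on or above the line $1 + 2x$. The fit uses those three points alone.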

Table 7.

Power comparison of two outlier LSE based tests

F_δ            μ       α      Test   h = 2.1   h = 2.3   h = 2.5
N(μ,1)         μ = 1   0.8    OL1    0.213     0.31      0.397
                              OL2    0.634     0.799     0.871
                              OP     1         1         1
                       0.85   OL1    0.157     0.186     0.258
                              OL2    0.573     0.706     0.819
                              OP     1         1         1
               μ = 2   0.8    OL1    0.681     0.755     0.828
                              OL2    0.985     0.997     0.998
                              OP     1         1         1
                       0.85   OL1    0.444     0.502     0.589
                              OL2    0.97      0.983     0.989
                              OP     1         1         1
χ²(3) − 2.5            0.8    OL1    0.899     0.932     0.928
                              OL2    0.960     0.982     0.984
                              OP     0.875     0.993     1
                       0.85   OL1    0.823     0.821     0.845
                              OL2    0.957     0.967     0.979
                              OP     0.962     1         1
t(10) + 0.5            0.8    OL1    0.899     0.932     0.928
                              OL2    0.960     0.982     0.984
                              OP     0.98      1         1
                       0.85   OL1    0.823     0.821     0.845
                              OL2    0.957     0.967     0.979
                              OP     0.97      1         1

5. Appendix


The assumptions used for the asymptotic representation of the sample outlier proportion are as follows.

(c) The probability density function $f_X$ of distribution $F_X$ is bounded away from zero in neighborhoods of $F_X^{-1}(\alpha)$ for $\alpha \in (0,1)$ and of the population cutoff point.

(d) The probability density function $f_Y$ is bounded away from zero in a neighborhood of the population cutoff point.

Proof of Theorem 3.1.

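From the expression for the outlier proportion in (3.3) and the linear regression model (3.2), we have

$$\begin{aligned}\hat\pi_{b,\mathrm{out}} ={}& n_2^{-1}\sum_{i=1}^{n_2}\Big[I\big(\delta_i \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0} + n_1^{-1/2}x_i'T_a\big) - I\big(\delta_i \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big)\Big] \\ &+ n_2^{-1}\sum_{i=1}^{n_2} I\big(\delta_i \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big), \end{aligned} \qquad (5.1)$$

where $T_a = n_1^{1/2}\big(\hat\beta_a(\alpha) - \beta_a(\alpha)\big)$.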

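With (2.5), the first term on the right-hand side of (5.1), multiplied by $n_2^{1/2}$, can be shown (Ruppert and Carroll (1980); Chen and Chiang (1996)) to satisfy

$$n_2^{-1/2}\sum_{i=1}^{n_2}\Big[I\big(\delta_i \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0} + n_1^{-1/2}x_i'T_n\big) - I\big(\delta_i \ge F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big)\Big] = -\ell_{yx}^{1/2} f_\delta\big(F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}\big)\,\mu_x'T_n + o_p(1) \qquad (5.2)$$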

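for any sequence $T_n = O_p(1)$. Also, a representation of the regression quantile $\hat\beta_a(\alpha)$ may be formulated as

$$\sqrt{n_1}\big(\hat\beta_a(\alpha) - \beta_a(\alpha)\big) = f_\varepsilon^{-1}\big(F_\varepsilon^{-1}(\alpha)\big)\,Q_x^{-1}\,n_1^{-1/2}\sum_{i=1}^{n_1} x_i\big[\alpha - I(\varepsilon_i \le F_\varepsilon^{-1}(\alpha))\big] + o_p(1); \qquad (5.3)$$

see, for example, Ruppert and Carroll (1980). By letting $T_n = T_a$ and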

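The proof is cut off at this point. The argument can plausibly be finished as follows. This is our own sketch, not the author's text. It assumes that the healthy-group and disease-group samples are independent and that $Q_x$ is the limit of $n_1^{-1}\sum_{i=1}^{n_1} x_i x_i'$, a definition not restated in this section. Write $c = F_\varepsilon^{-1}(\alpha) + \beta_{a0} - \beta_{b0}$. Substituting (5.3) into (5.2) with $T_n = T_a$ gives Theorem 3.1(a). The two independent sums then give the variance (3.5):

```latex
\begin{aligned}
n_2^{1/2}(\hat\pi_{b,\mathrm{out}}-\pi_{b,\mathrm{out}})
&= -\ell_{yx}^{1/2} f_\delta(c)\,\mu_x' T_a
 + n_2^{-1/2}\sum_{i=1}^{n_2}\big[I(\delta_i\ge c)-\pi_{b,\mathrm{out}}\big] + o_p(1),\\
T_a &= f_\varepsilon^{-1}\big(F_\varepsilon^{-1}(\alpha)\big)\,Q_x^{-1}\,n_1^{-1/2}\sum_{i=1}^{n_1}x_i\big[\alpha-I(\varepsilon_i\le F_\varepsilon^{-1}(\alpha))\big]+o_p(1),\\
\operatorname{Var}\Big(n_1^{-1/2}\sum_{i=1}^{n_1}x_i\big[\alpha-I(\varepsilon_i\le F_\varepsilon^{-1}(\alpha))\big]\Big) &\to \alpha(1-\alpha)\,Q_x,
\qquad \operatorname{Var}\big(I(\delta_i\ge c)\big)=\pi_{b,\mathrm{out}}(1-\pi_{b,\mathrm{out}}),\\
v_{b,\mathrm{out}} &= \ell_{yx}\big[f_\delta(c)\,f_\varepsilon^{-1}(F_\varepsilon^{-1}(\alpha))\big]^2\,
\mu_x'Q_x^{-1}\big(\alpha(1-\alpha)Q_x\big)Q_x^{-1}\mu_x+\pi_{b,\mathrm{out}}(1-\pi_{b,\mathrm{out}})\\
&= \alpha(1-\alpha)\,\ell_{yx}\big[f_\delta(c)\,f_\varepsilon^{-1}(F_\varepsilon^{-1}(\alpha))\big]^2\,
\mu_x'Q_x^{-1}\mu_x+\pi_{b,\mathrm{out}}(1-\pi_{b,\mathrm{out}}).
\end{aligned}
```

The first variance uses $P(\varepsilon_i \le F_\varepsilon^{-1}(\alpha)) = \alpha$. The cross term vanishes because the two sums come from independent samples.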

References

Agrawal, D., Chen, T., Irby, R., et al. (2002). Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling. J. Natl. Cancer Inst., 94, 513-521.

Alizadeh, A. A., Eisen, M. B., Davis, R. E., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503-511.

Chen, L.-A., Chen, D.-T. and Chan, W. (2010). The p value for the outlier sum in differential gene expression analysis. Biometrika, 97, 246-253.

Chen, L.-A. and Chiang, Y. C. (1996). Symmetric type quantile and trimmed means for location and linear regression model. Journal of Nonparametric Statistics, 7, 171-185.

Cheng, S. W. and Thaga, K. (2006). On single variable control charts: an overview. Quality and Reliability Engineering International, 22, 811-820.

Hoaglin, D. C., Mosteller, F. and Tukey, J. W. (1983). Understanding Robust and Exploratory Data Analysis. Wiley: New York.

Huang, X. and Pan, W. (2003). Linear regression and two-class classification with gene expression data. Bioinformatics, 19, 2072-2078.

Jin, R., Si, L., Srivastava, S., Li, Z. and Chan, C. A knowledge driven regression model for gene expression and microarray analysis. Conference Proceedings - IEEE Engineering in Medicine and Biology Society, 1, 5326-5329.

Koenker, R. and Bassett, G. J. (1978). Regression quantiles. Econometrica, 46, 33-50.

Luan, H. L. Y. (2003). Kernel Cox regression models for linking gene expression profiles to censored survival data. Pacific Symposium on Biocomputing, 8, 65-76.

Muller, H.-G., Chiou, J.-M. and Leng, X. (2008). Inferring gene expression ... 60.

Ohki, R., Yamamoto, K., Ueno, S., et al. (2005). Gene expression profiling of human atrial myocardium with atrial fibrillation by DNA microarray analysis. Int. J. Cardiol., 102, 233-238.

Rambow, F., Piton, G., Bouet, S., Leplat, J.-J., Baulande, S., Marrau, A., Stam, M., Horak, V. and Vincent-Naulleau, S. (2008). Gene expression signature for spontaneous cancer regression in Melanoma pigs. Neoplasia, 10, 714-726.

Rong, J., Luo, S., Srivastava, S., Zheng, L. and Chan, C. (2006). A knowledge driven regression model for gene expression and microarray analysis. EMBS 06, 28th Annual International Conference of the IEEE, 5326-5329.

Ruppert, D. and Carroll, R. J. (1980). Trimmed least squares estimation in the linear model. Journal of the American Statistical Association, 75, 828-838.

Sorlie, T., Tibshirani, R., Parker, J., et al. (2003). Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. U.S.A., 100, 8418-8423.

Tibshirani, R. and Hastie, T. (2007). Outlier sums for differential gene expression analysis. Biostatistics, 8, 2-8.

Tomlins, S. A., Rhodes, D. R., Perner, S., et al. (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 310, 644-648.

Vinciotti, V. and Yu, K. (2009). M-quantile regression analysis of temporal gene expression data. Statistical Applications in Genetics and Molecular Biology, 8(1), 41.

Wu, B. (2007). Cancer outlier differential gene expression detection.
