(1)

Statistical Issues on Genomic Composite Biomarker Classifiers

Jen-pei Liu¹,² and Shein-Chung Chow³

¹ Division of Biometry, Department of Agronomy, National Taiwan University, Taipei, Taiwan
² Division of Biostatistics and Bioinformatics, National Health Research Institutes, Zhunan, Taiwan
³ Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27706, U.S.A.

At the 2006 Joint Statistical Meeting, Seattle, Washington, U.S.A.

(2)

Outline

• Introduction
• Selection of Genes and Representation
• Distribution
• Agreement and Reproducibility
• Estimation of Treatment Effects in Targeted Clinical Trials

(3)
(4)

Introduction

• Post-HGP (Human Genome Project) Era
• Pharmacogenetics and Pharmacogenomics
• Biochip Products
• Targeted Clinical Trials
• Personalized Medicine
• Diagnosis and Individualized Treatment

(5)

Introduction

• TAILORx (JCO, 2006)
  • Patients
    • 10,000 patients with early-stage breast cancer
    • ER- and/or PR-positive, HER2/neu-negative, with no spread to lymph nodes
  • Diagnostic Device
    • Likelihood of distant recurrence
    • Based on 21 prospectively selected genes in paraffin-embedded tumor tissue

(6)

Introduction

• TAILORx (JCO, 2006)
• Treatment
  • Recurrence scores
    • >25: chemotherapy + hormonal therapy
    • <11: hormonal therapy
    • 11–25: randomization between
      • Standardized combination chemotherapy + adjuvant hormonal therapy
      • Adjuvant hormonal therapy
• Follow-up: 10 years, additional 20 years after initial …

(7)

Introduction

• Issues in GCB classifiers
  • Selection of genes
  • Functional form for the overall representation of expression levels
  • Distribution of GCB classifiers and determination of thresholds
  • Evaluation of agreement and reproducibility for GCB classifiers

(8)

Selection of Genes and Representation

• Differentially expressed genes for diagnosis of molecular targets
• Current hypothesis
  H0: μTi − μCi = 0 vs. Ha: μTi − μCi ≠ 0, i = 1,…,G
• Statistical significance based on the hypothesis of a difference does not take into account the magnitude of the expression levels of differentially expressed genes or their biological significance
• Fold change does not take into consideration …

(9)

Selection of Genes and Representation

• The statistical hypothesis for identification of differentially expressed genes should take biological significance into consideration
  H0: μTi − μCi ≥ δLi and μTi − μCi ≤ δUi vs.
  Ha: μTi − μCi < δLi or μTi − μCi > δUi

(10)

Selection of Genes and Representation

A two one-sided test:

$$T_{Li} = \frac{\bar{Y}_{Ti} - \bar{Y}_{Ci} - \delta_{Li}}{\sqrt{s_i^2\left(\frac{1}{n_{Ti}} + \frac{1}{n_{Ci}}\right)}} \quad\text{and}\quad T_{Ui} = \frac{\bar{Y}_{Ti} - \bar{Y}_{Ci} - \delta_{Ui}}{\sqrt{s_i^2\left(\frac{1}{n_{Ti}} + \frac{1}{n_{Ci}}\right)}}$$

Gene i is claimed to be differentially expressed at the α significance level if

$$T_{Ui} > t(\alpha,\, n_{Ti} + n_{Ci} - 2) \quad\text{or}\quad T_{Li} < -t(\alpha,\, n_{Ti} + n_{Ci} - 2), \qquad i = 1,\dots,G.$$
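The two one-sided statistics on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes that s_i² is the pooled two-sample variance (which matches the n_T + n_C − 2 degrees of freedom), that the critical value is the upper α percentile of the t distribution and is supplied by the caller, and the function names and data are hypothetical.

```python
import math
from statistics import mean


def interval_test_statistics(test, control, delta_l, delta_u):
    """Return (T_L, T_U) for one gene, using a pooled variance estimate."""
    n_t, n_c = len(test), len(control)
    mean_t, mean_c = mean(test), mean(control)
    # Pooled variance with n_t + n_c - 2 degrees of freedom (an assumption).
    ss_t = sum((y - mean_t) ** 2 for y in test)
    ss_c = sum((y - mean_c) ** 2 for y in control)
    pooled_var = (ss_t + ss_c) / (n_t + n_c - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_t + 1.0 / n_c))
    diff = mean_t - mean_c
    return (diff - delta_l) / std_err, (diff - delta_u) / std_err


def is_differentially_expressed(t_l, t_u, t_crit):
    """Reject H0 (difference inside [delta_L, delta_U]) on either side."""
    return t_u > t_crit or t_l < -t_crit


# Hypothetical log2 expression levels for one gene.
test_arm = [3.1, 2.9, 3.0, 3.2, 2.8]
control_arm = [1.0, 1.1, 0.9, 1.2, 0.8]
t_l, t_u = interval_test_statistics(test_arm, control_arm, -1.0, 1.0)
# 1.860 is the tabulated upper 5% point of t with 8 degrees of freedom.
flag = is_differentially_expressed(t_l, t_u, 1.860)
```

Passing the critical value in keeps the sketch free of third-party dependencies; in practice it would come from a t-distribution routine.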

(11)

Selection of Genes and Representation

• The average type I error rate is controlled at the nominal level
• The power function is parabola-shaped, symmetric, and reaches its minimum at (δLi + δUi)/2
• Simulation studies were conducted to compare with current methods
  • Unadjusted, Bonferroni adjustment, fold changes

(12)

Selection of Genes and Representation

• Functional Form
  • Differentially expressed genes
    • Over-expressed in T and under-expressed in C
    • Under-expressed in T and over-expressed in C
  • Representation
    • Difference of expression levels of differentially expressed genes between the test and control
    • Ratio of expression levels of differentially …

(13)

Distributions

• Genomic Composite Biomarker (GCB) Classifier
  • The number of differentially expressed genes for diagnosis of certain molecular targets is a random variable
  • The number of genes in the GCB classifier is a random variable
  • The expression level for each selected gene in …

(14)

Distributions

• Only consider the linear function
  • X = w1Y1 + w2Y2 + … + wgYg,
  • g is the number of differentially expressed genes selected from a pool of a total of G genes
  • Yi is the expression level of gene i based on log 2 …

(15)

Distributions

• Suppose all weights are equal, the Yi are i.i.d. with mean μ and variance σ², and the number of differentially expressed genes follows a Poisson distribution with mean λ
• The distribution of X can be expressed by convolution

$$F(x) = \sum_{g=0}^{\infty} p_g\, S^{g*}(x),$$

where $p_g$ is the probability that the number of differentially expressed genes is g, and …

(16)

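Distributions

• E(X) = λμ
• Var(X) = λσ² + λμ²
• Asymptotic normality

Let $\bar{Y}$ and $s^2$ be the sample mean and sample variance of $Y_1,\dots,Y_g$. An estimate of the variance of X is given as

$$\widehat{\mathrm{Var}}(X) = m\bar{Y}^2 + ms^2, \qquad Z = \frac{X - E(X)}{\sqrt{\widehat{\mathrm{Var}}(X)}} \to N(0,1).$$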
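The moments E(X) = λμ and Var(X) = λσ² + λμ² can be checked with a short simulation. The sketch below makes two assumptions that are not on the slide: all weights equal 1, and the Yi are normally distributed (the slide only requires them to be i.i.d. with mean μ and variance σ²). The function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_classifier(lam, mu, sigma, rng):
    """One draw of X = Y1 + ... + Yg, with g ~ Poisson(lam) and unit weights."""
    g = sample_poisson(lam, rng)
    return sum(rng.gauss(mu, sigma) for _ in range(g))


def simulate_moments(lam, mu, sigma, n_draws=20000, seed=1):
    """Return the simulated mean and variance of X."""
    rng = random.Random(seed)
    draws = [sample_classifier(lam, mu, sigma, rng) for _ in range(n_draws)]
    return mean(draws), variance(draws)


# With lam = 5, mu = 2, sigma = 1 the formulas give E(X) = 10 and Var(X) = 25.
sim_mean, sim_var = simulate_moments(5.0, 2.0, 1.0)
```

The simulated values should land close to the theoretical ones, up to Monte Carlo error.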

(17)

Agreement and Reproducibility

• Only recognition and evaluation of agreement and reproducibility:
  • Dobbin et al. (Clinical Cancer Research, 2005)
  • Larkin et al. (Nature Methods, 2005)
  • Irizarry et al. (Nature Methods, 2005)
  • Members of the Toxicogenomic Research Consortium (Nature Methods, 2005)
  • Tan et al. (Nucleic Acids Research, 2003)
  • Yauk et al. (Nucleic Acids Research, 2004)

(18)

Agreement and Reproducibility

• Correlation coefficient
  • A measure for association
  • Not a measure for similarity (or agreement)
• Euclidean distance
  • A measure for agreement

(19)

Agreement and Reproducibility

• Example

  Case I      Case II     Case III
  X1   X2     X1   X2     X1   X2
   1    1      1    2      1    4
   2    2      2    4      2    8
   3    3      3    6      3   12
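The point of the three cases can be verified numerically: the correlation coefficient is the same in every case, while the Euclidean distance grows as X2 drifts away from X1. The sketch below uses only the data in the example; the helper names are hypothetical.

```python
import math

# The three cases from the example: (X1, X2) pairs.
CASES = {
    "Case I": ([1, 2, 3], [1, 2, 3]),
    "Case II": ([1, 2, 3], [2, 4, 6]),
    "Case III": ([1, 2, 3], [4, 8, 12]),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


summary = {name: (pearson(x, y), euclidean(x, y)) for name, (x, y) in CASES.items()}
for name, (r, d) in summary.items():
    print(f"{name}: r = {r:.3f}, distance = {d:.3f}")
```

Only Case I shows actual agreement (distance 0), although all three cases are perfectly correlated.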

(20)

Agreement and Reproducibility

• Testing the hypothesis of no correlation cannot establish agreement or reproducibility
• With 5000 genes, a correlation of 0.05 is statistically significantly different from 0 at the 1% level
• Use of concordance correlation coefficient
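The claim about 5000 genes can be checked with the usual test statistic for a correlation, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below assumes a standard normal reference distribution, which is a close approximation to the t distribution at this sample size; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_test(r, n):
    """Return the test statistic and an approximate two-sided p-value for H0: rho = 0."""
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    # Normal approximation to the t distribution with n - 2 degrees of freedom.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


t_stat, p_value = correlation_test(0.05, 5000)
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.5f}")
```

A statistic of about 3.5 is well beyond the 1% threshold even though a correlation of 0.05 indicates almost no association, which is why significance of the correlation says little about agreement.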

(21)

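Agreement and Reproducibility

We want to evaluate the agreement of expression levels of two replicates of genes, i.e., $Y_{1i} = Y_{2i}$, $i = 1,\dots,G$, and

$$\begin{pmatrix} Y_{1i} \\ Y_{2i} \end{pmatrix} \sim N\left[\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}\right]$$

Concordance Correlation Coefficient (CCC):

$$\rho = \frac{2\sigma_{12}}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2}$$

The sample estimate is given as 2s …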

(22)

Agreement and Reproducibility

• Hypothesis for Agreement
  • H0: ρ ≤ ρa vs. Ha: ρ > ρa, where ρa is the minimal required level of agreement
• Reject H0 and conclude agreement if the lower limit of the 100(1−α)% asymptotic C.I. is greater than ρa
• Use the concept of generalized pivotal …

(23)

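Agreement and Reproducibility

$$U_2 \sim \chi^2(G-1), \quad U_{1.2} \sim \chi^2(G-2), \quad Z_1 \sim N(0,1), \quad\text{and}\quad Z_2 \sim N(0,1);$$

$U_2$, $U_{1.2}$, $Z_1$, $Z_2$ are independent.

Define

$$R_2 = \frac{s_2^2}{U_2}, \qquad R_{12} = \left[\frac{s_{12}}{U_2} - \frac{Z_1}{\sqrt{U_{1.2}}} \times s_{1.2} \times s_2 \times \frac{1}{U_2}\right], \qquad R_1 = \frac{s_{1.2}^2}{U_{1.2}} + \frac{R_{12}^2}{R_2}.$$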

(24)

Agreement and Reproducibility

$$R_{\mu_1 - \mu_2} = (\bar{y}_1 - \bar{y}_2) - Z_2\sqrt{\frac{R_1}{G} + \frac{R_2}{G} - \frac{2R_{12}}{G}}, \qquad s_{1.2}^2 = s_1^2 - \frac{s_{12}^2}{s_2^2}.$$

A GPQ for CCC is given as

$$R_\rho = \frac{2R_{12}}{R_1 + R_2 + R_{\mu_1 - \mu_2}^2}$$

A 100(1−α)% C.I. can be obtained by the Monte Carlo method.
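The Monte Carlo step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses the standard Wishart-based construction for a bivariate normal sample, works with corrected sums of squares and cross-products (so no G − 1 factors appear), and draws chi-square variates through `random.gammavariate`. The function names and the data are hypothetical.

```python
import math
import random


def sums_of_squares(y1, y2):
    """Means and corrected sums of squares and cross-products."""
    g = len(y1)
    m1, m2 = sum(y1) / g, sum(y2) / g
    a11 = sum((u - m1) ** 2 for u in y1)
    a22 = sum((v - m2) ** 2 for v in y2)
    a12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2))
    return g, m1, m2, a11, a22, a12


def sample_ccc(y1, y2):
    """Plug-in concordance correlation coefficient."""
    g, m1, m2, a11, a22, a12 = sums_of_squares(y1, y2)
    return 2.0 * a12 / (a11 + a22 + g * (m1 - m2) ** 2)


def ccc_gpq_lower_limit(y1, y2, alpha=0.05, n_draws=5000, seed=0):
    """One-sided lower 100(1 - alpha)% limit for the CCC from GPQ draws."""
    rng = random.Random(seed)
    g, m1, m2, a11, a22, a12 = sums_of_squares(y1, y2)
    a11_2 = max(a11 - a12 ** 2 / a22, 0.0)  # residual sum of squares of y1 given y2
    draws = []
    for _ in range(n_draws):
        u2 = rng.gammavariate((g - 1) / 2.0, 2.0)   # chi-square(G - 1)
        u12 = rng.gammavariate((g - 2) / 2.0, 2.0)  # chi-square(G - 2)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        r2 = a22 / u2
        r12 = a12 / u2 - z1 * math.sqrt(a11_2 * a22) / (math.sqrt(u12) * u2)
        r1 = a11_2 / u12 + r12 ** 2 / r2
        r_mu = (m1 - m2) - z2 * math.sqrt(max(r1 + r2 - 2.0 * r12, 0.0) / g)
        draws.append(2.0 * r12 / (r1 + r2 + r_mu ** 2))
    draws.sort()
    return draws[int(alpha * n_draws)]


# Hypothetical replicate measurements with close agreement.
rep1 = [0.5 * i for i in range(20)]
rep2 = [v + (0.2 if i % 2 == 0 else -0.2) for i, v in enumerate(rep1)]
point_estimate = sample_ccc(rep1, rep2)
lower_limit = ccc_gpq_lower_limit(rep1, rep2)
```

Agreement would be concluded when `lower_limit` exceeds the required level ρa; the number of draws and the seed are arbitrary choices.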

(25)

Agreement and Reproducibility

• Reproducibility
  • Dobbin et al. (Clinical Cancer Research, 2005)
  • Reproducibility of 4 different labs based on the Affymetrix Human Genome U133A array
  • Each of 12 tumor tissues was divided into 6 blocks

(26)

Agreement and Reproducibility

Site 1: AABBCCD DEEFFGHI JKL
Site 2: AABBCDE FGGHHIIJ JKL
Site 3: ABCCDDF GGHHIJK KLL
Site 4: ABCDEEF F G H I I J J K LL

The underlined sample was a failed sample.

(27)

Agreement and Reproducibility

• Source of variation
  • Between laboratory (αi)
  • Between samples (βj)
  • Imbalance
• Model: Two-way classification random-effects model without interaction
  • Yijk = μ + αi + βj + eijk,
    i = 1,…,a; j = 1,…,b; k = 1,…,nij;
    where μ is an unknown constant, αi ~ N(0, σα²), βj ~ N(0, σβ²), and e …

(28)

Agreement and Reproducibility

• Parameters of Reproducibility

$$\rho_\alpha = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\beta^2 + \sigma_e^2}$$

• Hypothesis of Reproducibility
  H0: ρα ≤ ρα0 vs. Ha: ρα > ρα0,
  where ρα0 is the minimal required level for reproducibility

(29)

Agreement and Reproducibility

• Reject the null hypothesis if the lower limit of the 100(1−α)% C.I. for ρα is greater than ρα0
• For the imbalanced case and small sample sizes, apply the method of generalized pivotal quantity (GPQ) to obtain the exact C.I.

(30)

Agreement and Reproducibility

Application of GPQ to obtain the 100(1−α)% C.I. for ρα

The generalized pivotal quantity for ρα is given as

$$T(Y, y, \xi) = \frac{A}{A + B + C},$$

where

$$A = \max\left[0,\ \frac{1}{b}\left(\frac{ss(\alpha)}{U_A} - \lambda_{\max}(D_0)\,\frac{ss_2(e)}{U_E}\right)\right], \quad B = \max\left[0,\ \frac{1}{a}\left(\frac{ss(\beta)}{U_B} - \lambda_{\max}(D_0)\,\frac{ss_2(e)}{U_E}\right)\right], \quad\text{and}\quad C = \frac{ss_2(e)}{U_E}.$$

(31)

Agreement and Reproducibility

$$U_A \sim \chi^2(a-1), \quad U_B \sim \chi^2(b-1), \quad U_E \sim \chi^2(n - ab - a - b + 2),$$

P is an ab × ab orthogonal matrix with the first row being $\mathbf{1}_{ab}'/\sqrt{ab}$ such that

$$\frac{1}{b}\,P(I_a \otimes J_b)P' = \operatorname{diag}(1, I_{a-1}, 0, 0) \quad\text{and}\quad \frac{1}{a}\,P(J_a \otimes I_b)P' = \operatorname{diag}(1, 0, I_{b-1}, 0);$$

$D_0$ is the (a+b−2) × (a+b−2) submatrix corresponding to α and β under P, the orthogonal transformation of the vector of cell means.

(32)

Agreement and Reproducibility

$$ss_0(e) = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_{ij}} (Y_{ijk} - \bar{Y}_{ij.})^2 = y'Hy, \qquad H = I_n - \operatorname{diag}\{J_{n_{ij}}/n_{ij}\}$$

There exists an orthogonal matrix G such that

$$H = G\,\operatorname{diag}(I_{n-ab}, 0)\,G' = (G_1 : G_2 : G_3)\,\operatorname{diag}(I_{a+b-2}, I_{n-ab-a-b+2}, 0)\,(G_1 : G_2 : G_3)' = G_1G_1' + G_2G_2';$$

$$ss_0(e) = ss_1(e) + ss_2(e) = y'G_1G_1'y + y'G_2G_2'y;$$

$$w = (w_\alpha', w_\beta')' = (z_\alpha', z_\beta')' + [\lambda_{\max}(D_0)\,I_{a+b-2} - D_0]^{1/2}\,G_1'y;$$

$$ss(\alpha) = w_\alpha'w_\alpha \quad\text{and}\quad ss(\beta) = w_\beta'w_\beta.$$

(33)

Treatment Effects for Targeted Clinical Trial

• Enrichment design for targeted clinical trials
  • Patients with a positive diagnosis for the molecular targets were randomized into the test drug or control group
  • The diagnostic device for the molecular targets is not a perfect device
  • Patients with a positive diagnosis may not …

(34)

Treatment Effects for Targeted Clinical Trial

• Enrichment design for targeted clinical trials
  • The objective of targeted clinical trials is to estimate the treatment effect of the test drug
  • Because of the false positive rate (FPR), the observed mean difference between the test and control groups underestimates the true treatment effect

(35)

Treatment Effects for Targeted Clinical Trial

• Enrichment design for targeted clinical trials
  • The expected value of the mean difference

$$\mu_T - \mu_C = \mathrm{PPV}\,(\mu_{+T} - \mu_{+C}) + (1 - \mathrm{PPV})\,(\mu_{-T} - \mu_{-C})$$

  • Under-estimation of the true treatment effect becomes more severe as the prevalence rate of molecular targets decreases
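The dependence on prevalence can be made concrete with a small calculation. The sketch below assumes that the observed mean difference is a PPV-weighted mixture of the effect in patients with the target and the effect in patients without it, and it obtains the PPV from Bayes' rule; the sensitivity, specificity, and effect sizes are hypothetical numbers chosen only for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV of a diagnostic device by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def expected_observed_difference(ppv, effect_with_target, effect_without_target):
    """Expected test-minus-control mean difference among device-positive patients."""
    return ppv * effect_with_target + (1.0 - ppv) * effect_without_target


# Hypothetical device (95% sensitivity and specificity) and a true effect of 10
# in patients with the target, 0 in patients without it.
observed_by_prevalence = {}
for prevalence in (0.5, 0.2, 0.1):
    ppv = positive_predictive_value(0.95, 0.95, prevalence)
    observed_by_prevalence[prevalence] = expected_observed_difference(ppv, 10.0, 0.0)
    print(f"prevalence {prevalence:.1f}: PPV = {ppv:.3f}, "
          f"expected difference = {observed_by_prevalence[prevalence]:.2f}")
```

Even with an accurate device, a low prevalence pulls the PPV down and the observed difference moves further below the true effect of 10.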

(36)

Treatment Effects under Targeted Clinical Trials

• The true status of molecular targets for each patient is not available
• The positive predictive value (PPV) can be estimated from the clinical validation trials for the medical diagnostic device
• Under the normality assumption, the EM algorithm can be applied to estimate the true treatment effect of the test drug for the patients with molecular targets, assuming the PPV is a constant

(37)

Treatment Effects under Targeted Clinical Trials

Let Yij be the observation of patient j receiving treatment i; i = T, C; j = 1,…,ni.

Let πij be the indicator for the molecular target:
πij = 1 (0) if patient j in treatment group i has (does not have) the molecular target.

The πij are i.i.d. …

(38)

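Treatment Effects under Targeted Clinical Trials

Under the normality assumption, given a value of πij, for treatment i, yij has a normal distribution with density

$$f(y_{ij} \mid \pi_{ij}) \propto \frac{1}{\sigma}\exp\left\{-\frac{1}{2\sigma^2}\left[\pi_{ij}(y_{ij} - \mu_{+i})^2 + (1 - \pi_{ij})(y_{ij} - \mu_{-i})^2\right]\right\} \qquad \text{(a)}$$

The conditional probability given yij:

$$P(\pi_{ij} = 1 \mid y_{ij}) = 1\Big/\left\{1 + \exp\left[(\mu_{+i} - \mu_{-i})(\mu_{+i} + \mu_{-i} - 2y_{ij})/2\sigma^2\right]\right\} \qquad \text{(b)}$$

The log-likelihood:

$$l = -n_i\log\sigma - \left\{\sum_{j=1}^{n_i}\left[\pi_{ij}(y_{ij} - \mu_{+i})^2 + (1 - \pi_{ij})(y_{ij} - \mu_{-i})^2\right]\right\}\Big/\,2\sigma^2 \qquad \text{(c)}$$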

(39)

Treatment Effects under Targeted Clinical Trials

Procedure

(1) E step: substitute the "current" estimates of μ+T, μ−T, μ+C, μ−C, and σ² into (b) to obtain provisional values for the expectation of πij
(2) M step: obtain the MLE of μ+T, μ−T, μ+C, μ−C, and σ² after replacing πij in (c) by its provisional value
(3) Repeat (1) and (2) until the estimates of …
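The E and M steps can be sketched as a short Python routine. This is a minimal illustration under stated assumptions rather than the authors' implementation: each arm is modeled as a two-component normal mixture with a common variance, the PPV is a known constant, and the E step multiplies the exponential term in (b) by the prior odds (1 − PPV)/PPV, which is an assumption added here and reduces to (b) when PPV = 0.5. The function name, starting values, and stopping rule are hypothetical.

```python
import math


def em_targeted_trial(y_by_arm, ppv, max_iter=500, tol=1e-8):
    """EM estimates of (mu_plus, mu_minus) per arm and the common variance.

    y_by_arm maps an arm label (for example "T" or "C") to a list of responses.
    """
    n_total = sum(len(ys) for ys in y_by_arm.values())
    means, ss = {}, 0.0
    for arm, ys in y_by_arm.items():
        m = sum(ys) / len(ys)
        arm_ss = sum((y - m) ** 2 for y in ys)
        spread = math.sqrt(arm_ss / len(ys))
        means[arm] = (m + spread, m - spread)  # crude start with mu_plus > mu_minus
        ss += arm_ss
    sigma2 = ss / n_total
    prior_odds = (1.0 - ppv) / ppv
    for _ in range(max_iter):
        new_means, ss = {}, 0.0
        for arm, ys in y_by_arm.items():
            mu_p, mu_m = means[arm]
            # E step: provisional expectation of each indicator pi_ij.
            w = []
            for y in ys:
                expo = (mu_p - mu_m) * (mu_p + mu_m - 2.0 * y) / (2.0 * sigma2)
                w.append(1.0 / (1.0 + prior_odds * math.exp(min(expo, 700.0))))
            # M step: weighted means and residual sum of squares.
            sw = sum(w)
            new_p = sum(wi * y for wi, y in zip(w, ys)) / sw
            new_m = sum((1.0 - wi) * y for wi, y in zip(w, ys)) / (len(ys) - sw)
            ss += sum(wi * (y - new_p) ** 2 + (1.0 - wi) * (y - new_m) ** 2
                      for wi, y in zip(w, ys))
            new_means[arm] = (new_p, new_m)
        new_sigma2 = ss / n_total
        change = max(abs(a - b) for arm in means
                     for a, b in zip(means[arm], new_means[arm]))
        change = max(change, abs(new_sigma2 - sigma2))
        means, sigma2 = new_means, new_sigma2
        if change < tol:
            break
    return means, sigma2
```

With the fitted values, the treatment effect for patients with the molecular target is estimated by `means["T"][0] - means["C"][0]`. The starting values assume that patients with the target respond higher than those without it; a different ordering would need a different start.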

(40)

Discussion and Summary

• Issues in evaluation of quality and utility of GCB classifiers
• Identification of differentially expressed genes
• Distribution of GCB classifiers
• Agreement and reproducibility
• Estimation of treatment effects for …

(41)

Discussion and Summary

• Identification of differentially expressed genes
  • Determination of thresholds
  • Definition of type I error
  • Sample size estimation
  • Optimal functional form for an overall …

(42)

Discussion and Summary

• Distribution of GCB classifiers
  • Correlated expression levels among genes
  • Expression levels are not identically distributed
  • Estimation of weights
  • Exact distribution
  • Determination of decision thresholds and evaluation of their systematic bias

(43)

Discussion and Summary

• Agreement and Reproducibility
  • Determination of minimal required threshold
  • Correlation of expression levels among genes
• Estimation of Treatment Effects
  • Two or more molecular targets for different pathways
  • Variability of the estimated positive …
