110/07/17 Copyright by Jen-pei Liu, PhD. Product names only for illustration.
Statistical Methods for
Biotechnology Products
Part II: Biochip Diagnostic Products
Evaluation of Performance of
Microarray Products
by
Professor, Jen-pei Liu, PhD
Division of Biometry, Department of Agronomy
National Taiwan University, and
Division of Biostatistics and Bioinformatics
National Health Research Institutes
Outline
Introduction
Performance Evaluation of Affymetrix GeneChip
Performance Evaluation of Scanner
Performance Evaluation of GeneChip Fluidics Station 450
Examples of Association between Expression Levels and Clinical Outcomes
Performance Evaluation of Different Platforms
Introduction
Special instrument requirements for Roche AmpliChip CYP450 Microarray:
Affymetrix GeneChip Microarray Instrumentation System (Platform)
GeneChip Fluidics Station 450DX
GeneChip Scanner 3000DX with Autoloader
Data station for GeneChip Operating Software and AmpliChip CYP450 Data Analysis Software
GeneChip Operating Software (GCOS) Version 1.
Types of Microarray Platforms
Characteristics of AmpliChip CYP450 Microarray
15,129 probes, with ~10^7 copies of the specific oligonucleotide probe each in an 11 µm square position, for a ~2 cm glass
Length of probe sequence: 16-22 bases
A single Probe Set consists of 4 Probes (or Features) which have a fixed target except at the substitution position, where an A, C, G, and T are included to generate four unique probes: one Perfect Match (PM) and three Mismatch (MM)
High-density oligonucleotide microarray probe-tiling approach to genotyping
Types of Microarray Platforms
Characteristics of AmpliChip CYP450 Microarray
A Probe Set Pair consists of a Wild-type Probe Set
(for Wild-type allele) and a Mutant Probe Set (for
a known polymorphism).
To distinguish 29 polymorphisms in CYP2D6
including gene duplications and gene deletion
To identify 27 distinct alleles including 7 CYP2D6
gene duplication genes
Types of Microarray Platforms
25mer oligonucleotide probe sets printed by photolithography
Single spotted 30mer oligonucleotide
Single spotted 60mer oligonucleotide synthesized in situ
Covalent attachment of prefabricated oligonucleotide
Introduction
Statistical Evaluations for
Each Instrument
a Complete System
Different Platforms
Between Laboratories
Issues Concerning Statistical Methods for
Performance Evaluation of Affymetrix GeneChip
Technology
Semiconductor fabrication techniques
Solid phase chemistry
Random access combinatory chemistry
Molecular biology
Performance Evaluation of Affymetrix GeneChip
Quality Controls for Manufacturing
Probe Design and Selection
Synthesis Process
Random Access Combinatory Chemistry
Signal Intensity
Performance Evaluation of Affymetrix GeneChip
Manufacturing (Fabrication)
Photolithographic process
Light-sensitive chemical compound to
prevent coupling
Lithographic masks
5”x5” wafer surface flooded with solution
of A,T,C or G
Nucleotides also coupled with
Performance Evaluation of Affymetrix GeneChip
Photolithographic fabrications
http://www.affymetrix.com/technology/manufacturing/index.affx
Performance Evaluation of Affymetrix GeneChip
Combinations of probes are
simultaneously synthesized through
repeated cycles of random access
combinatorial chemistry
High-density arrays with over 10^7 probes
Each wafer is diced into tens, hundreds or
thousands of individual arrays
Sampling of arrays from every wafer by
running control hybridization using
standardized control probes
Performance Evaluation of Affymetrix GeneChip
Random Access Combinatorial Chemistry
The number of compounds of length N, composed of Y different subunits, is equal to Y^N
One can synthesize each of the Y^N compounds in Y times N steps
25mer probes of A, C, G, T
Synthesis of 4^25 probes in 4 times 25, or 100, steps
Can examine the entire 3.1 billion bases in the human genome in 100 steps or less
Easy implementation of in-process control for
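As a quick check of the counting argument on this slide, the following Python sketch evaluates Y^N and Y times N for 25mer probes built from the four nucleotides. The function names are illustrative only and do not come from the source.

```python
def n_compounds(y: int, n: int) -> int:
    """Number of distinct sequences of length n over y subunits (Y^N)."""
    return y ** n


def n_synthesis_steps(y: int, n: int) -> int:
    """Number of coupling steps that suffice to build any of them (Y times N)."""
    return y * n


# 25mer probes over the alphabet A, C, G, T
probes = n_compounds(4, 25)
steps = n_synthesis_steps(4, 25)
print(f"{probes:.3e} possible 25mers can be addressed in {steps} steps")
```

The point of the comparison is that the number of possible probes grows exponentially in N while the number of synthesis steps grows only linearly.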
Performance Evaluation of Affymetrix GeneChip
Probe Design and Selection
25mer probes (why?)
22 probes for expression measurements
40 probes for genotype calls
Balance between sensitivity and specificity
Discrimination between signal and
Performance Evaluation of Affymetrix GeneChip
Mismatch Probes
To detect and eliminate false or contaminating fluorescence
Hybridization to nonspecific sequences as effectively as perfect match
To serve as an internal control for perfect match
To quantify cross-hybridization
Performance Evaluation of Affymetrix GeneChip
Design Verification
Design of photolithographic masks
Automatic software tests to verify the design by array in silico
Goal: the correct probe sequences are synthesized in the correct (x,y) positions in the array
Performance Evaluation of Affymetrix GeneChip
Synthesis Verification
Manufacturing execution software system
Correct reagents
Correct mask
Each mask in a design set has a unique identification
number
Correct wafer
Correct timing
Performance Evaluation of Affymetrix GeneChip
Combinatorial Chemistry Strategy
Goal: oligonucleotide probes on the array
are synthesized correctly
Control probes
Probe 1 is built in cycles 1,2,4, and 5
Probe 2 is built in cycles 3,5,7, and 8
Analyze probes 1 and 2 over all 8 cycles
No need to analyze all probes in an array
Performance Evaluation of Affymetrix GeneChip
Signal Verification
Use of a second panel of control probes
Recombinase gene cre from P1 bacteriophage
Genes in the biotin synthesis of E. coli
Performance Evaluation of Affymetrix GeneChip
Analytical Sensitivity
Quantitative Testing: The change in
response of a measuring system or
instrument divided by the corresponding
change in the stimulus
Qualitative Testing: The test's ability to obtain
positive results in concordance with
positive results obtained by the reference
method
Performance Evaluation of Affymetrix GeneChip
Analytical Specificity
Quantitative Testing: the ability of an analytical method to determine only the component it purports to measure, or the extent to which the assay responds only to all subsets of a specified analyte and not to other substances present in the sample
Qualitative (Semi-quantitative, i.e., ordinal) Testing: the method's ability to obtain negative results in concordance with negative results by the reference method
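For qualitative tests, the two definitions above reduce to concordance proportions against the reference method. A minimal Python sketch follows; the counts are hypothetical and chosen only for illustration, and the function names are not from the source.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-positive samples that the test also calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of reference-negative samples that the test also calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical comparison against a reference method (not data from the source)
tp, fn = 90, 10   # reference positive: test positive / test negative
tn, fp = 95, 5    # reference negative: test negative / test positive

print(f"analytical sensitivity = {sensitivity(tp, fn):.2f}")
print(f"analytical specificity = {specificity(tn, fp):.2f}")
```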
Performance Evaluation of Affymetrix GeneChip
New Statistical Algorithms
Statistical significance for detection and
change calls
Confidence limits for log ratio values (fold
changes)
Fine tuning parameters to vary the
stringency of the analyses
Elimination of negative expression values
Performance Evaluation of Affymetrix GeneChip
Single Array Analysis
Detection Algorithm
Detection p-value
Detection call
Signal Algorithm
Comparison Analysis
Change Algorithm
Robust Normalization
Change p-value
Change Call
Performance Evaluation of Affymetrix GeneChip
The Detection p-value is compared against user-definable cut-offs to determine the Detection call
A Call indicates whether a transcript is reliably detected (Present) or not detected (Absent)
A signal value is calculated which assigns
a relative measure of abundance to the
transcript
Performance Evaluation of Affymetrix GeneChip
A transcript is represented as a probe set
A probe set is made up of probe pairs of Perfect Match (PM) and Mismatch (MM)
Intensity of probe pair: key ingredient for expression measurement
This measurement is calculated for each probe set in the form of qualitative and quantitative values
Expression measurements of a baseline and an experiment array can be compared to understand the relative change in abundance of a transcript
Performance Evaluation of Affymetrix GeneChip
Detection p-value
Calculation of the Discriminant score for each probe pair
R = (PM-MM)/(PM+MM)
Test the Discriminant score against the use
Performance Evaluation of Affymetrix GeneChip
PM intensity is 80
MM intensity increases from 10 to 100
Discriminant score decreases as MM intensity
increases
The ability to discriminate between PM and MM decreases as MM increases
The dashed line is the threshold τ = 0.015
The one-sided Wilcoxon’s signed rank test to
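The behaviour described on this slide can be reproduced numerically. The sketch below, in Python, evaluates R = (PM-MM)/(PM+MM) with PM fixed at 80 while MM rises from 10 to 100, and compares each score with the threshold of 0.015. The step of 10 between MM values is an assumption made for the illustration.

```python
def discriminant_score(pm: float, mm: float) -> float:
    """Discriminant score R = (PM - MM) / (PM + MM) for one probe pair."""
    return (pm - mm) / (pm + mm)


TAU = 0.015   # threshold shown as the dashed line
PM = 80.0     # fixed perfect-match intensity

# MM intensity rising from 10 to 100 (step of 10 is an illustrative choice)
rows = [(mm, discriminant_score(PM, mm)) for mm in range(10, 101, 10)]
for mm, r in rows:
    print(f"MM = {mm:3d}   R = {r:+.3f}   R > tau: {r > TAU}")
```

The score falls steadily as MM increases, reaches 0 when MM equals PM, and is negative once MM exceeds PM.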
Wilcoxon Signed-rank Test
Example:
A frozen foods manufacturer
A: own frozen product
B: competitor's frozen product
Purchasing managers: 10-point Likert scale
(High score: better product)
Price, package design, variety, company pr
Wilcoxon Signed-rank Test
Likert Score
Manager   A     B      Difference (A-B)
1         6.5   4.0     2.5
2         4.0   8.0    -4.0
3         5.5   6.0    -1.0
4         5.5   10.0   -4.5
5         7.0   8.0    -1.0
6         4.0   9.0    -5.5
Wilcoxon Signed-rank Test
Likert Scale
Impression
The manager is asked to mark his/her response on the scale
The mark defines an assumed measure of acceptability for the product in question
Very poor    Acceptable    Outstanding
Wilcoxon Signed-rank Test
Exact Methods for Small Samples (n ≤ 30)
1. Calculate the difference (x_A - x_B) for each of the n pairs. Differences equal to zero are eliminated, and the number of pairs, n, is reduced accordingly.
2. Rank the absolute values of the differences, assigning 1 to the smallest, 2 to the second smallest, and so on. Tied observations are assigned the average of the ranks that would have been assigned with no ties.
Wilcoxon Signed-rank Test
3. Calculate the rank sum for the negative differences and label this value T-. Similarly, calculate T+, the rank sum for the positive differences.
4. For a two-tailed test we use the smaller of these two quantities, T, as the test statistic to test the null hypothesis that the two population relative frequency histograms are identical.
Wilcoxon Signed-rank Test
Methods for Larger Samples (n > 30)
1. Null hypothesis: H0: The population relative frequency distributions for A and B are identical.
2. Alternative hypothesis: Ha: The population relative frequency distributions differ in location (a two-tailed test). Or Ha: The population relative frequency distribution for A is shifted to the right (or left) of the relative frequency distribution for population B (a one-tailed test).
Wilcoxon Signed-rank Test
3. Test statistic:
$$z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
and T can be either T+ or T-
Wilcoxon Signed-rank Test
4. Rejection region:
Reject H0 if $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ for a two-tailed test. For a one-tailed test, place all of $\alpha$ in one tail of the z distribution. To detect a shift in the distribution of A observations to the right of the distribution of B observations, let T = T+ and reject H0 when $z \ge z_{\alpha}$. To detect a shift in the opposite direction, let T = T- and reject H0 if $z \ge z_{\alpha}$. Tabulated values of z are given in Table 3 in the Appendix.
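The large-sample procedure can be sketched in Python as follows. The function computes z from a rank sum T and the number of non-zero pairs n, using the mean n(n+1)/4 and variance n(n+1)(2n+1)/24 under H0. The values n = 40 and T+ = 560 are hypothetical, and the normal tail area is obtained from math.erfc rather than from a table.

```python
import math


def signed_rank_z(t: float, n: int) -> float:
    """Large-sample z statistic for a Wilcoxon signed-rank sum t with n pairs."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd


def upper_tail_p(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical example: n = 40 non-zero differences, rank sum T+ = 560
z = signed_rank_z(560, 40)
print(f"z = {z:.3f}, one-tailed p = {upper_tail_p(z):.4f}")
```

For a one-tailed test at level α, H0 is rejected when this one-tailed p-value falls below α, which is equivalent to z being at least z_α.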
Wilcoxon Signed-rank Test
Example (continued)
Likert score
Manager   A     B      Difference (A-B)   Rank of Abs. Diff   Signed Rank(-)   Signed Rank(+)
1         6.5   4.0     2.5               3                   0                3
2         4.0   8.0    -4.0               4                   4                0
3         5.5   6.0    -1.0               1.5                 1.5              0
4         5.5   10.0   -4.5               5                   5                0
5         7.0   8.0    -1.0               1.5                 1.5              0
6         4.0   9.0    -5.5               6                   6                0
Total                                                         T- = 18          T+ = 3
Wilcoxon Signed-rank Test
N=7<30: Exact Method
Two-tailed: T = min(T-, T+) = min(18, 3) = 3
α = 0.05, T0 = 2; T = 3 > T0 = 2
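The rank sums in the example can be recomputed with a short Python sketch. It starts from the six differences as tabulated, assigns average ranks to ties, and forms T- and T+. As an addition that is not part of the slides, it also enumerates all sign assignments to obtain an exact p-value instead of using a table of critical values.

```python
from itertools import product

# Differences (A - B) exactly as listed in the example table
diffs = [2.5, -4.0, -1.0, -4.5, -1.0, -5.5]


def average_ranks(values):
    """Rank from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


nonzero = [d for d in diffs if d != 0]          # zero differences are dropped
ranks = average_ranks([abs(d) for d in nonzero])
t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
t = min(t_plus, t_minus)

# Under H0 every rank is equally likely to carry a + or a - sign
count = sum(
    1
    for signs in product([0, 1], repeat=len(ranks))
    if sum(r for s, r in zip(signs, ranks) if s) <= t
)
p_two_sided = min(1.0, 2 * count / 2 ** len(ranks))

print(f"T+ = {t_plus}, T- = {t_minus}, T = {t}")
print(f"exact two-sided p-value = {p_two_sided:.4f}")
```

Enumeration is feasible here because there are only 2^6 sign patterns; for larger n the normal approximation is the practical choice.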
Performance Evaluation of Affymetrix GeneChip
↑ the number of false present calls (↑ sensitivity) and the number of true present calls (specificity)
Before the detection call, the level of photomultiplier saturation for each probe is evaluated.
If all probe pairs in a probe set are saturated, the probe set is immediately given a Present call.
A probe pair is rejected for further analysis when
Performance Evaluation of Affymetrix GeneChip
Signal Algorithm
Signal is a quantitative metric calculated for each probe set, which represents the relative level of expression of a transcript
Weighted mean by One-step Tukey's Biweight Estimate
The vote: an estimate of the real signal due to hybridization of the target
The MM intensity is used to estimate stray signal
The real signal is estimated by taking the log of the PM intensity after subtracting the stray signal estimate
Performance Evaluation of Affymetrix GeneChip
Signal Algorithm
The probe pair signal close to the median
for a probe set is weighted more strongly
The quantitative metric Signal is the mean
of the weighted intensity values for a probe
set
MM > PM because of cross hybridization
and no additional information
Use of an imputed value = Change
Performance Evaluation of Affymetrix GeneChip
(1) If MM < PM, MM is informative and is an
estimate of stray signal
(2) If MM is generally informative for a probe
set and only a few noninformative MM, use
adjusted MM for the noninformative MM
(3) If MM is generally noninformative for a probe
set, this probe set is called Absent by the
Performance Evaluation of Affymetrix GeneChip
Comparison Analysis (Experiment vs.
Baseline)
Change Algorithm
Robust Normalization
Change p-value
Change Call
Performance Evaluation of Affymetrix GeneChip
Change Algorithm
Paired comparison: each probe set on the
experimental array is compared to its
counterpart on the baseline array
Change p-value: increase, decrease, or no
change in gene expression (quantitative)
Change calls: Increase, Marginal Increase,
No change, Decrease, or Marginal
Decrease
Performance Evaluation of Affymetrix GeneChip
Normalization and scaling are applied to the data from a selected user-defined group of probe sets or to all probe sets
Normalization: intensities of the probe sets on the experimental array are normalized to those on the baseline array
Scaling: intensities of probe sets from both experimental and baseline arrays are scaled to a user-defined target intensity
Global scaling is recommended for comparing
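A minimal Python sketch of scaling to a target intensity follows. It assumes the scaling factor is the target divided by the plain mean of the array's signals; the actual software may use a different average (for example a trimmed mean). The signal values and the target of 500 are hypothetical.

```python
def scale_to_target(signals, target):
    """Return (scaling factor, scaled signals) so that the mean equals target.

    Assumption: the array average is the plain arithmetic mean.
    """
    mean_signal = sum(signals) / len(signals)
    factor = target / mean_signal
    return factor, [s * factor for s in signals]


# Hypothetical probe-set signals from a baseline and an experimental array
baseline = [100.0, 200.0, 300.0]
experiment = [150.0, 300.0, 450.0]
TARGET = 500.0

sf_base, scaled_base = scale_to_target(baseline, TARGET)
sf_exp, scaled_exp = scale_to_target(experiment, TARGET)
print(f"baseline factor = {sf_base:.3f}, experiment factor = {sf_exp:.3f}")
```

After scaling, both arrays share the same average intensity, so probe-set signals can be compared across them.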
Performance Evaluation of Affymetrix GeneChip
Additional normalization is performed by adjusting the normalization factor up and down using a user-modified parameter called perturbation
The range of perturbation is from 1.00 to 1.49 and the default value is 1.1
The higher value of perturbation, the
Performance Evaluation of Affymetrix GeneChip
Calculation of Change p-values
Wilcoxon signed rank test
PM – MM of experimental array
PM – background of baseline array
Two-tailed p-value
P-value near 0 increase
P-value near 1 decrease
Performance Evaluation of Affymetrix GeneChip
Change Call
The change p-value is categorized by two cutoff values, γ1 and γ2
γ1 and γ2 are derived from γiL and γiH, i = 1, 2
Default for γ1: γ1 = γ1L = γ1H = 0.0025
Performance Evaluation of Affymetrix GeneChip
The final cutoffs for a given probe set are
calculated using a linear interpolation
between the L and the H limits, based on the
probe set’s Signal position over the entire
array signal change
These fine-tuning cutoffs are used if Change
call accuracy is not optimal at one or both
extremes of Signal range of the array
Performance Evaluation of Affymetrix GeneChip
The Signal Log Ratio estimates
Magnitude
Direction
Cancel out differences due to different
probe binding coefficients
Performance Evaluation of Affymetrix GeneChip
A mean of the log ratios of probe pair intensities across the two arrays, calculated by a one-step Tukey's Biweight method
A 95% confidence interval is also provided
The biweight location estimate is defined as:
where
and
C = 6 (using 6 means that residuals up to approximately 4σ are included).
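Because the defining equations did not survive on this slide, the following Python sketch uses a commonly cited form of the one-step Tukey biweight as an assumption: each value is standardized as u = (x - median) / (C × MAD), values with |u| < 1 receive the weight (1 - u²)², and the estimate is the weighted mean. The function name and the sample data are illustrative only.

```python
import statistics


def one_step_biweight(values, c=6.0):
    """One-step Tukey biweight location estimate (assumed standard form)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return med                      # no spread around the median
    num = 0.0
    den = 0.0
    for v in values:
        u = (v - med) / (c * mad)
        if abs(u) < 1:                  # points far from the median get weight 0
            w = (1 - u * u) ** 2
            num += w * v
            den += w
    return num / den


# Illustrative data with one gross outlier
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(f"mean = {statistics.mean(data):.2f}, biweight = {one_step_biweight(data):.4f}")
```

The outlier receives zero weight, so the estimate stays close to the bulk of the data while the ordinary mean is pulled far away.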
Performance Evaluation of Affymetrix GeneChip
Comparison between MAS 5.0 and MAS 4.0
Selection of the components for the new algorithm
A “training set”
Each transcript group was spiked into a labeled mixture of RNA from a tissue source
A 14x14 Latin square to monitor the detectability of transcripts over a range of concentrations
Performance Evaluation of Affymetrix GeneChip
Validation of GeneChip Human Genome U133 set
Key feature: the number of probe pairs per sequence is reduced from 16 to 11
To increase sequence content per array
Validation as compared to HG-U95
Sensitivity
Concordance
Control genes
Performance Evaluation of Affymetrix GeneChip
Sensitivity
A set of 50 human control clones as control
set
14x14 Latin square design for validation
experiment with 14 spiked concentrations
described previously
Detection Calls
Performance Evaluation of Affymetrix GeneChip
Detection Call
Over 80% of the spikes are called Present at a concentration of 1.5 pM, which corresponds to approximately one transcript in 100,000 or 3.5 copies per cell
False positive rate is less than 10%
Comparison Call
2-fold change detected in 80% of spikes between 1.5 pM and 3.0 pM
4-fold change detected in 80% of spikes between
Performance Evaluation of Affymetrix GeneChip
Concordance Call with HG-U95
The percent of concordance calls (percent agreement) is 80% using MAS 5.0
The percent of concordance calls (percent agreement) drops to 78% using MAS 5.0 for HG-U133 and MAS 4.0 for HG-U95
R^2 of signal log ratios between HG-U95 and HG-U133 is only 0.54 for all calls and
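McNemar's Test for Paired Samples
In goodness-of-fit tests, tests of independence, and tests of homogeneity, each observation comes from a different individual, so the observations are mutually independent.
The two voting intentions recorded before and after a televised debate come from the same voter, so these observations are correlated and form paired binomial random variables.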
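Example: Effect of a televised debate between legislative candidates on voters' choices
                        After debate
Before debate           Intend to vote A   Intend to vote B   Total
Intend to vote A        491                9                  500
Intend to vote B        1                  499                500
Total                   492                508                1000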
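$H_0$: The televised debate has no effect
$H_a$: The televised debate has an effect
$H_0: p_{1\cdot} = p_{\cdot 1}$ versus $H_a: p_{1\cdot} \neq p_{\cdot 1}$
$H_0: p_{12} = p_{21}$ versus $H_a: p_{12} \neq p_{21}$
since $p_{1\cdot} = p_{11} + p_{12}$ and $p_{\cdot 1} = p_{11} + p_{21}$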
                 After debate
Before debate    A               B               Total
A                n_11 (p_11)     n_12 (p_12)     n_1. (p_1.)
B                n_21 (p_21)     n_22 (p_22)     n_2. (p_2.)
Total            n_.1 (p_.1)     n_.2 (p_.2)     n (p)
$\hat{p}_{ij} = n_{ij}/n$, $\hat{p}_{i\cdot} = n_{i\cdot}/n$, $\hat{p}_{\cdot j} = n_{\cdot j}/n$
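Only $n_{12}$ and $n_{21}$ need to be considered
Let $S = n_{12} + n_{21}$
If $H_0$ is true, $n_{12}$ is binomial, $\mathrm{Bin}(S, 1/2)$
Its expected value is $(n_{12} + n_{21})/2$
$$\chi^2 = \sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i} = \frac{\left[n_{12} - (n_{12}+n_{21})/2\right]^2}{(n_{12}+n_{21})/2} + \frac{\left[n_{21} - (n_{12}+n_{21})/2\right]^2}{(n_{12}+n_{21})/2} = \frac{(n_{12}-n_{21})^2}{n_{12}+n_{21}}$$
Decision rule: reject $H_0$ if $\chi^2 > \chi^2_{\alpha,1}$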
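Televised debate
$$\chi^2 = \frac{(n_{12}-n_{21})^2}{n_{12}+n_{21}} = \frac{(9-1)^2}{9+1} = \frac{8^2}{10} = 6.4 > \chi^2_{0.05,1} = 3.84$$
Reject $H_0$: the televised debate has an effect on voters' voting behavior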
Call Concordance Data
$$\chi^2 = \frac{(n_{12}-n_{21})^2}{n_{12}+n_{21}} = \frac{(3616-2678)^2}{3616+2678} = \frac{938^2}{6294} = 139.79 > \chi^2_{0.05,1} = 3.84$$
Reject H0: the proportion of Present calls by HG-U133 is different from that of HG-U95
Continuity Correction
$$\chi^2_c = \frac{\left(|n_{12}-n_{21}| - 1\right)^2}{n_{12}+n_{21}}$$
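The McNemar calculations in the two examples can be checked with a small Python sketch. It uses only the discordant counts n12 and n21, with an optional continuity correction, and compares the statistic with the 5% critical value of 3.84 for one degree of freedom. The function name is illustrative.

```python
def mcnemar_chi2(n12: int, n21: int, continuity: bool = False) -> float:
    """McNemar chi-square statistic computed from the discordant counts."""
    diff = abs(n12 - n21)
    if continuity:
        diff = max(diff - 1, 0)        # (|n12 - n21| - 1), not allowed below 0
    return diff ** 2 / (n12 + n21)


CHI2_CRIT_05_1DF = 3.84  # upper 5% point of chi-square with 1 degree of freedom

debate = mcnemar_chi2(9, 1)            # televised debate example
calls = mcnemar_chi2(3616, 2678)       # call concordance data
debate_c = mcnemar_chi2(9, 1, continuity=True)

print(f"debate: {debate:.2f}, reject H0: {debate > CHI2_CRIT_05_1DF}")
print(f"call concordance: {calls:.2f}, reject H0: {calls > CHI2_CRIT_05_1DF}")
print(f"debate with continuity correction: {debate_c:.2f}")
```

The correction matters most when the number of discordant pairs is small, as in the debate example.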
Performance Evaluation of Affymetrix GeneChip
Tissue-Specific Expression Studies
Fifteen genes exhibiting tissue-specific expression in one of four tissues (heart, fetal, pancreas, and placenta) were analyzed and confirmed using real-time RT-PCR
Change threshold values were plotted against log
signal data
Specific genes produce a high log signal value with
Performance Evaluation of Affymetrix GeneChip
Normalization Control Probe Sets
a set of 100 normalization control genes
common characteristic of being called Present
over a wide range of expression levels
relatively low variability
Identify the corresponding probes via BLAST
100 normalization probe sets ID number 200000
Performance Evaluation of Affymetrix GeneChip
Bacterial Control Probe Sets
Indicators of array quality, proper
hybridization and staining
18 11-probe pair bacterial probe sets vs.
Performance Evaluation of Affymetrix GeneChip
Validation of Human Genome U133 Plus 2.0 and Human Genome U133A 2.0
U133 Plus 2.0: 54,000 probe sets for 47,000 transcripts
U133A 2.0: 22,000 probe sets for 18,400 transcripts
Performance Evaluation of Affymetrix GeneChip
Spikes are detected at a concentration of 0.75 pM, approximately one transcript per 200,000.
At 0.75 pM, two-fold increases in concentration were routinely detected
A labeling reagent based on biotinylated
Performance Evaluation of Scanner
DNA Microarray Scanner
Independent of other system components
Nominally identical oligo microarrays
19,777 features of 60mer oligos
100 probes with 10 replicates
The same targets
All three microarrays were scanned 8 times
Performance Evaluation of Scanner
Affymetrix GeneChip Scanner 3000
Beta test plan: external testing at 6 customer sites (academia, biotech, biopharma)
Alpha test plan: 144 internal site scans (2 probe array designs x 3 lots of each probe array design x 3 replicates for each lot x 4 spike targets x 2 scans)
Evaluation Metrics
Present calls, false change between replicates, correlation coefficients
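The size of the alpha test plan follows directly from its factorial layout. A short Python check, with factor labels paraphrased from the slide:

```python
from math import prod

# Factors of the alpha test plan as listed on the slide
alpha_plan = {
    "probe array designs": 2,
    "lots per design": 3,
    "replicates per lot": 3,
    "spike targets": 4,
    "scans": 2,
}

total_scans = prod(alpha_plan.values())
print(f"total internal site scans = {total_scans}")
```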
Performance Evaluation of Autoloader
Autoloader for Affymetrix GeneChip Scanner 3000
Experiment 1: Impact of storage at 15 °C on array performance
Experiments 2 and 3: Autoloader vs. Manual Scanning
Performance metrics: unscaled signal intensity, present call percentage, detection call accuracy at a concentration of 1.5 pM, false change rate, detection call concordance
Performance Evaluation of Fluidics Station 450
After loading the GeneChip arrays and all tubes, the FS 450 runs unattended until completion
Beta test plan
Pair of one FS 450 and one FS 400
Use of HG U133A from a single lot
53 spiked transcripts at 0, 0.75, 1.50, and 3
Limitations of Validation
Not independent evaluations by third party
Use of their own favorable
Designs
Target samples
With their own products
Performance metrics
Statistical methods
Correlation is for association not for agreement
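The last point can be illustrated with a Python sketch using hypothetical paired measurements in which one method reads exactly twice the other. The Pearson correlation is perfect, yet the two methods never agree. The data and function name are illustrative and not from the source.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements: method Y reads twice method X
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v for v in x]

r = pearson_r(x, y)
mean_diff = sum(b - a for a, b in zip(x, y)) / len(x)
print(f"correlation = {r:.3f}, mean difference (Y - X) = {mean_diff:.1f}")
```

A correlation of 1 shows only that the two sets of readings are linearly related; the non-zero mean difference shows that they do not agree.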
Performance Evaluation
Standardization
Protocol
Design
Target samples
Reference samples
Performance criteria
Statistical methods
Michiels, Koscielny, and Hill (2005, Lancet)
To identify a subset of genes most differentially expressed in patients with different outcomes (a molecular signature) in a training set
Estimate the proportion of misclassification in an independent validation set of patients
To suggest a strategy by multiple random sets
to investigate the stability of molecular
signature and the proportion of
Two-gene expression ratio predicts clinical outcome (Ma et al., 2004)
Tamoxifen for breast cancer
A competitive inhibitor of estrogen binding to estrogen receptor (ER)
Reduction of 40%-50% in annual risk of recurrence
5.6% improvement in 10-year survival
ER and progesterone receptor (PR, an indicator of a functional ER pathway) are currently the best predictors of tamoxifen response
25% of ER+/PR+, 66% of ER+/PR-, and 55% of ER-/PR- fail to respond
To predict tamoxifen treatment outcome in early-stage brea
Evaluation of Commercial Platforms (Tan et al., 2003)
Platforms
Affymetrix (U95Av2, GeneChips, 25mer oligo probe sets)
Agilent (Human 1, cDNA probes)
Amersham (Codelink UniSet Human I Bioarrays, 30mer oligo probes)
A group of 2009 common genes present on
Standardization between Laboratories and
across Platforms
(MTRC, Nature Methods, 2005)
Different platforms
Different protocols
Different computational and statistical
tools
Reproducibility
Standard array (intensity, ratio)
Resident arrays