Instructor:
Jen-pei Liu, Ph.D.
Department of Statistics, National Cheng-Kung University
Division of Biostatistics, National Health Research Institutes
Lecture III:
Statistical Principles for Analysis of Clinical Data
Design and Analysis of Clinical Trials
Statistical Methods for Biotechnology Products II
Statistical Principles for Analysis of Clinical Data
Instructor: Jen-pei Liu, Ph.D.
Division of Biometry, Department of Agronomy, National Taiwan University, and
Division of Biostatistics and Bioinformatics, National Health Research Institutes
Types of Data
Continuous Endpoints
Numerical discrete data
Heart beats per minute
Total NIHSS
Total Hamilton Rating Scale for Depression
Total Alzheimer’s Disease Assessment Scale
Types of Data
Continuous Endpoints
Numerical continuous data
Age
Weight
ALT
Peak flow rate (liters per minute)
FEV1
Types of Data
Categorical Endpoints
Nominal scale data
Classification of patients according to their attributes
Gender
Race
Occurrence of a particular adverse reaction
Types of Data
Ordered (ordinal scale) categorical data
A certain order among different categories
Symptom score
0 = no symptom, 1 = mild, 2 = moderate, 3 = severe
Severity of adverse reactions
Severity of disease
Types of Data
Censored Endpoints
Time to the occurrence of a pre-defined event
Time (continuous) and occurrence (categorical)
The occurrence of the event may not be observed for some patients; the time to the occurrence of the event for these patients is then censored.
Types of Data
Chapman, et al (NEJM 1991; 324: 788-94)
The use of prednisone in reduction of relapse
within 21 days of the treatment of acute asthma in the emergency room.
Primary endpoint
Time to an unscheduled clinic visit because of worsening asthma.
Types of Data
Cross-sectional vs. longitudinal data
Cross-sectional data (snapshot at one time point)
Clinical data are collected and evaluated at a particular time point during the trial.
Longitudinal data (snapshots at several time points)
Clinical data are collected and evaluated over a series of time points during the trial.
Example
Knapp et al (JAMA 1994; 271: 985-991)
A multi-center trial with 33 centers
Double-blind, randomized, 4 parallel groups
Forced escalation
30 weeks of randomized treatment
6 visits
At the start of randomized treatment (baseline) and at 6, 12, 18, 24, and 30 weeks
Cross-sectional data
CIBI and ADAS-cog evaluated at the start of randomized treatment
Longitudinal
Types of Comparison
Within-group (patient) comparison
Comparison of the changes within the same
patients at different time points during the
trial.
Between-group (patient) comparison
Comparison between groups of patients
under different treatments.
Example: Major depressive disorder
Stark and Hardison (JCP 1985; 46: 53-58)
Cohn and Wilcox (JCP 1985; 46: 21-31)
Double-blind, randomized, three parallel groups
One-week placebo washout period
Fluoxetine vs. imipramine vs. placebo
6 weeks of randomized treatment
Primary efficacy endpoint
HAM-D score at the last follow-up visit
Within each group
Change from baseline in HAM-D score
Between groups
Endpoints
Raw measurements at a time point.
Change at a time point from baseline.
Percent change at a time point from baseline.
Clinically meaningful targeted value attained at a time point, e.g., sitting DBP ≤ 85 mm Hg.
Time points should be selected so that they can measure the effect of the intervention.
Selection of Endpoints
Endpoints should reflect the change in clinical status caused by the intervention.
Endpoints should be sensitive to the change in clinical status caused by the intervention.
Endpoints should be validated.
Raw measurements at a time point can only measure the static clinical status.
Change at a time point from baseline can measure the magnitude of the change in clinical status caused by the intervention.
Change from baseline has the same unit as the raw measurement.
Selection of Endpoints
Percent change at a time point from baseline measures the relative magnitude of the change in clinical status caused by the intervention.
Percent change from baseline is unitless.
The same percent change may reflect different absolute magnitudes of change, and the same absolute change yields different percent changes at different baselines.
Selection of Endpoints
One of the key inclusion criteria for clinical trials in the treatment of mild to moderate essential hypertension is a sitting DBP between 95 and 115 mm Hg.
Three changes of 10 mm Hg from baseline: 115 → 105, 105 → 95, and 95 → 85 mm Hg.
Percent changes from baseline: 8.7%, 9.5%, and 10.5%.
Only 95 → 85 reaches the clinically meaningful targeted value (sitting DBP ≤ 85 mm Hg).
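The percent changes above come from dividing the same 10 mm Hg drop by three different baselines. A minimal Python sketch (the name `change_summary` and the constant `TARGET_DBP` are illustrative, not from the lecture) reproduces the three values and the target check; the percentages print as negative numbers because DBP decreases.

```python
# Sketch: absolute change, percent change, and target attainment for
# the three sitting-DBP scenarios (values in mm Hg).
TARGET_DBP = 85.0  # clinically meaningful targeted value: sitting DBP <= 85 mm Hg


def change_summary(baseline: float, follow_up: float) -> dict:
    """Return the change, percent change, and whether the target is attained."""
    change = follow_up - baseline
    return {
        "change": change,                            # same unit as the raw data
        "percent_change": 100.0 * change / baseline,  # unitless
        "target_met": follow_up <= TARGET_DBP,
    }


pairs = [(115, 105), (105, 95), (95, 85)]
results = [change_summary(b, f) for b, f in pairs]
for (b, f), r in zip(pairs, results):
    print(f"{b} -> {f}: {r['change']:+} mm Hg, "
          f"{r['percent_change']:+.1f}%, target met: {r['target_met']}")
```

Only the last scenario meets the 85 mm Hg target, even though all three absolute changes are identical.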
Selection of Endpoints
Endpoints should have clinically meaningful interpretation and applicability.
Order of preference: clinically meaningful targeted value > change from baseline > percent change from baseline.
Clinical investigators should be responsible for determining the efficacy endpoints used in clinical trials.
Selection of Endpoints
Drug class | LDL | HDL | TG
Targeted value | < 100 mg/dL | 40-60 mg/dL | < 150 mg/dL
Bile acid binding resin | 15-30% | 3-5% | No change
Nicotinic acid | 5-25% | 15-35% | 15-25%
Fibric acid | 5-20% | 10-20% | 20-50%
HMG-CoA inhibitor | 18-55% | 3-5% | 7-30%
Descriptive Statistics
All statistics are estimates subject to sampling error.
Continuous Data
Central tendency
Mean: arithmetic average of all observations, ȳ
Median: the middle observation
Dispersion
Standard deviation: s
Minimum: the smallest observation
Maximum: the largest observation
Range: maximum minus minimum
Log transformation: mean on the log scale (back-transformed, this is the geometric mean)
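A minimal sketch of these summaries with Python's standard `statistics` module. The ALT values are hypothetical and serve only as illustration. `statistics.stdev` uses the n - 1 divisor, and the mean on the log scale is back-transformed with `math.exp`, which yields the geometric mean (so all values must be positive).

```python
import math
import statistics


def describe(values):
    """Descriptive statistics for one group of continuous observations."""
    values = list(values)
    logs = [math.log(v) for v in values]  # requires strictly positive data
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, divisor n - 1
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        # mean on the log scale, back-transformed: the geometric mean
        "geometric_mean": math.exp(statistics.mean(logs)),
    }


alt = [12, 15, 20, 33, 90]  # hypothetical ALT values (U/L), illustration only
summary = describe(alt)
print(summary)
```

The geometric mean (about 25.5 here) is pulled far less by the single large value than the arithmetic mean of 34.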
Descriptive Statistics
Presentation of results
Individual groups
Comparative difference
Example: Adkinson, et al (NEJM 1997; 336: 324-31). Immunotherapy for asthma in allergic children
Endpoint: PEFR | Placebo | Immunotherapy | Difference
Mean
N | 60 | 61
Categorical Data
Proportion of patients with a certain attribute: the number of patients with the attribute divided by the total number of patients in the group
Present both counts and proportions: m (p)
Chapman, et al (NEJM 1991; 324: 788-94)
The use of prednisone in reduction of relapse
within 21 days of the treatment of acute asthma in the emergency room
Characteristic | Prednisone | Placebo
N | 48 | 45
Smoking status
Current | 21 (50.0%) | 13 (31.0%)
Former | 5 (11.9%) | 6 (14.3%)
Never | 16 (38.1%) | 24 (54.8%)
Use of oral steroids for previous exacerbations
Yes | 15 (36.6%) | 13 (32.5%)
No | 26 (63.4%) | 27 (67.5%)
Standard deviations are usually not given for categorical data because, given the number of patients in each group, they can be calculated directly from the proportion. The standard deviation √[p(1 - p)] is at its maximum (0.5) when p = 0.5.
Measures for comparison
between groups
Difference in the proportions
Relative risk
The ratio of the proportion in the test group to that in the control group.
Odds ratio
The ratio of the odds in the test group to those in the control group.
Odds
The ratio of the number of patients with the attribute to the number without it.
The US Physicians’ Health Study (NEJM 1989; 321: 129-35)
Outcome | Aspirin | Placebo
N | 11037 | 11034
MI | 139 (1.26%) | 239 (2.17%)
No MI | 10898 (98.74%) | 10795 (97.83%)
Difference in proportion of MI = 1.26% - 2.17% = -0.91% (on average, 91 fewer MIs per 10,000 patients)
Relative risk of MI for aspirin = 1.26% / 2.17% = 0.581 (aspirin reduces the risk of MI by 42%)
Odds ratio of MI for aspirin = (139 / 10898) / (239 / 10795) = 1.275% / 2.214% = 0.576 (aspirin reduces the odds of MI by 42%)
Difference in proportions and relative risk can only be used in prospective studies, while the odds ratio can be used in both prospective and retrospective studies.
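The three measures can be recomputed directly from the 2×2 counts. The helper `two_by_two_measures` is a hypothetical name used only for this sketch; because it uses the exact counts rather than rounded percentages, it gives a risk difference of about -0.0091, a relative risk of about 0.581, and an odds ratio of about 0.576.

```python
def two_by_two_measures(events_t, n_t, events_c, n_c):
    """Risk difference, relative risk, and odds ratio (test vs. control)."""
    p_t, p_c = events_t / n_t, events_c / n_c
    odds_t = events_t / (n_t - events_t)  # events : non-events, test group
    odds_c = events_c / (n_c - events_c)  # events : non-events, control group
    return {
        "risk_difference": p_t - p_c,
        "relative_risk": p_t / p_c,
        "odds_ratio": odds_t / odds_c,
    }


# Physicians' Health Study: MI in 139/11037 (aspirin) vs. 239/11034 (placebo).
phs = two_by_two_measures(139, 11037, 239, 11034)
for name, value in phs.items():
    print(f"{name}: {value:.4f}")
```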
Categorical Endpoints
Difference in proportions provides the absolute magnitude of difference.
Both relative risk and odds ratio give the relative magnitude of difference.
A reduction from 50% to 25% and a reduction from 0.05% to 0.025% both yield a relative risk of 50%, but the differences in proportions are 25% and 0.025%, respectively.
Relative risk and odds ratio are appropriate when the proportion of the event in the control group is small (< 5%).
When the proportion of the event is small (< 5%), the odds ratio approximates the relative risk (0.576 vs. 0.581 in the aspirin example).
Censored Data
Kaplan-Meier curve (Actuarial probabilities)
The proportions of the patients with occurrence of a pre-defined event over a period of time.
Median survival
The time by which the pre-defined event (e.g., death) has occurred in 50% of the patients.
Hazard ratio
The ratio of the hazard of the pre-defined event in the test group to that in the control group.
Example: Crawford, et al
(NEJM 1989; 321: 419-24)
A controlled trial of leuprolide with and without
flutamide in prostatic carcinoma
Randomized, double-blind, 2 parallel groups
Primary endpoint: overall survival
Treatment | Median survival
Leuprolide + flutamide | 35.6 months
[Figure: Kaplan-Meier Estimates of the Risk of Serious CV Events in the APC Trial by Treatment Arm (C 200 BID, C 400 BID, placebo); y-axis: estimated probability of CV death, MI, stroke, or CHF; x-axis: months since first dose (0-36); log-rank statistic 8.74 (p = 0.013)]
Inferential Statistics
Inference from the sample to the target
population
A decision process for clinical hypotheses
based on the trial objective through
Example: Farlow et al
(JAMA 1992; 268: 2523-2529)
Randomized, double-blind, parallel groups
Objective
To compare tacrine (20, 40, and 80 mg per day) with placebo in patients with probable Alzheimer’s disease
Null hypothesis
No difference in ADAS-cog scale between 80 mg of tacrine and placebo.
Alternative hypothesis
There exists a true difference in ADAS-cog scale between 80 mg of tacrine and placebo.
Example: The NINDS rt-PA Stroke
Study Group (NEJM 1996; 335:
841-7)
Objective for Part I
A greater proportion of patients with acute ischemic stroke treated with t-PA, as compared with those given placebo, have early improvement (an improvement of ≥ 4 points from baseline on the NIHSS).
Primary efficacy endpoint
Proportion of patients with improvement
Null hypothesis
No difference in the proportions of patients with improvement between t-PA and placebo.
Alternative hypothesis
The minimal difference in the proportions of patients with improvement between t-PA and placebo is at least 24%.
Decision Based on Results
True state \ Decision | No difference | Minimal difference of 24%
No difference | Correct | Type I error (false positive)
Minimal difference of 24% | Type II error (false negative) | Correct
Decision Based on Results
Significance level = the consumer’s risk
The probability that the decision based on the results concludes a minimal difference of 24% in improvement between t-PA and placebo when in fact there is no difference.
Power = 1 – producer’s risk
The probability that the decision based on the results concludes a minimal difference of 24% in improvement between t-PA and placebo when in fact such a difference exists.
Statistical Testing Procedures
Step1
State the null and alternative hypotheses
Null hypothesis: the one to be questioned
No difference in the proportions of patients with improvement between t-PA and placebo.
Alternative hypothesis: the one of particular interest to investigators
The minimal difference in the proportions of patients with improvement between t-PA and placebo is at least 24%.
Statistical Testing Procedures
Step 2
Choose an appropriate test statistic, such as the two-sample t-statistic.
Step 3
Select the nominal significance level: the risk of Type I error you are willing to commit, usually 5%.
Statistical Testing Procedures
Step 4
Determine the critical value, rejection region and decision
rule
For large samples, a two-sided alternative, and α = 0.05, the critical value is z(0.025) = 1.96, and the rejection region consists of all values of the test statistic whose absolute value is greater than 1.96.
Decision rule
Reject the null hypothesis if the resulting test statistic falls in the rejection region.
Statistical Testing Procedures
Step 1 to step 4 should be determined and
pre-specified in the Statistical Method
section of the protocol before initiation of
the study.
Statistical Testing Procedures
Step 5
When the study is completed or the data are available for interim analysis, compute the value of the test statistic specified in Step 2 (protocol).
Step 6
Make a decision based on the resulting value of the test statistic and the decision rule specified in Step 4 (protocol).
Statistical Testing Procedures
Conclusion
Reject the null hypothesis
Sampling error is an unlikely explanation of the discrepancy between the null hypothesis and the observed values, and the alternative hypothesis is concluded at a 5% risk of Type I error.
Fail to reject null hypothesis
The sampling error is a likely explanation and the data fail to provide sufficient evidence to doubt the validity of the null hypothesis.
P - value
The p-value is the probability, if there is no difference in ADAS-cog between the two groups (i.e., the null hypothesis is true), of obtaining a mean difference at least as large as the observed mean difference.
A small p-value implies that the observed difference is unlikely to occur if there is no difference in ADAS-cog scale between 80 mg of tacrine and placebo.
P - value
How small must the p-value be to conclude that there exists a true difference in ADAS-cog scale between 80 mg of tacrine and placebo?
It depends upon the risk that the investigator is
willing to take for committing type I error.
Nominal significance level = risk of type I error
(The chance of concluding existence of a true difference in ADAS-cog when in fact there is no difference)
P - value
If the observed p-value < the nominal significance level (i.e., the observed p-value < the risk of Type I error), then conclude that there exists a true difference in ADAS-cog.
The nominal significance level = 5% or 1%
The p-value for the observed difference in mean ADAS-cog is 0.015.
If the nominal significance level is 5%, then it is concluded that there is a difference in ADAS-cog between 80 mg of tacrine and placebo in the target population of patients with probable Alzheimer’s disease.
P - value
We cannot make the same decision if the nominal significance level is chosen to be 1%.
The observed p-value should always be reported, so that readers and reviewers can judge the strength of evidence themselves, rather than reporting only “p-value < 0.05”.
Confidence Interval
Example
Adkinson, et al (NEJM 1997; 336: 324-31)
Immunotherapy for asthma in allergic children
Endpoint | Placebo | Immuno. | Mean diff. | 95% C.I. | P-value
N | 60 | 61
PEFR
Baseline | 84.8 ± 8.6 | 81.9 ± 10.8 | -2.9 ± 9.8 | (-0.6, 6.4) | 0.17
Change | -1.4 ± 11.1 | 2.1 ± 11.1 | 3.8 ± 11.1 | (-7.8, 0.1) | 0.05
P-value | 0.11 | 0.24
Symptom score
Baseline | 0.37 ± 0.35 | 0.34 ± 0.27 | 0.03 ± 0.31 | (-0.09, 0.14) | 0.98
Confidence Interval
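Interval estimates of the true population difference.
Random intervals that can differ if the same trial is repeated.
A 95% confidence interval means that, if the trial were repeated many times, 95% of the intervals so constructed would cover the true difference in average PEFR between the two groups; a single computed interval such as (-7.8, 0.1) either covers the true difference or does not.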
Statistical Testing Procedures
---Continuous data
Within-group
Parametric methods: paired t-test
Nonparametric methods: Wilcoxon signed rank test
Between-group
Parametric methods: unpaired t-test, analysis of variance
Nonparametric methods: Wilcoxon rank sum test
Computation of Test Statistics
A test statistic quantifies whether the discrepancy between the observed descriptive statistic and the hypothetical value assumed under the null hypothesis exceeds the sampling error expected under the null hypothesis.
A test statistic usually has the general form Difference / Standard error
where:
Difference: the difference between the observed descriptive statistic and the hypothetical value assumed under the null hypothesis
Standard error: the standard error of the descriptive statistic
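The Difference / Standard error recipe, together with the Step 4 decision rule, can be sketched in Python using only the standard library: `statistics.NormalDist` supplies z(α/2) and the normal tail area for a two-sided p-value. The function names and the summary statistics in the usage lines are hypothetical. The critical value t(α/2, n1 + n2 - 2) for the pooled t-statistic is not available in the standard library and would come from a t table.

```python
import math
from statistics import NormalDist


def z_two_sample(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Large-sample z: [(Y1bar - Y2bar) - delta0] / sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - delta0) / se


def t_pooled(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Unpaired t with pooled SD; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return ((mean1 - mean2) - delta0) / se, df


def two_sided_z_decision(z, alpha=0.05):
    """Critical value z(alpha/2), reject/not-reject decision, two-sided p-value."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return abs(z) > critical, p_value, critical


# Hypothetical summary data (not from any trial in this lecture).
z = z_two_sample(5.0, 2.0, 50, 4.0, 2.0, 50)
reject, p, crit = two_sided_z_decision(z)
print(f"z = {z:.2f}, critical value = {crit:.2f}, p = {p:.4f}, reject H0: {reject}")
```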
Computation of Test Statistics
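Large-sample z-statistic
$$Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \quad \text{reject } H_0 \text{ if } |Z| \ge z(\alpha/2)$$
Unpaired t-statistic
$$t = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{s\sqrt{1/n_1 + 1/n_2}}, \quad \text{reject } H_0 \text{ if } |t| > t(\alpha/2, n_1 + n_2 - 2)$$
where s is the pooled standard deviation, $s^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$.
Computation of Confidence Intervals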
A general formula for 95% C.I.
Descriptive statistic ± z(0.025) × standard error
where
Descriptive statistics:
mean, difference in means
proportion, difference in proportions
odds, odds ratio
hazards, hazard ratio
z(0.025): the upper 2.5% percentile of the standard normal distribution (= 1.96).
Computation of Confidence Intervals
For continuous data and small samples (n < 30), use the corresponding percentile of Student’s t-distribution instead of z(0.025).
Standard error of descriptive statistic
Confidence Intervals – Continuous Data
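One-sample
Large-sample (n ≥ 30)
$$\left(\bar{Y} - z(\alpha/2)\frac{s}{\sqrt{n}},\ \bar{Y} + z(\alpha/2)\frac{s}{\sqrt{n}}\right)$$
Small-sample (n < 30)
$$\left(\bar{Y} - t(\alpha/2, n-1)\frac{s}{\sqrt{n}},\ \bar{Y} + t(\alpha/2, n-1)\frac{s}{\sqrt{n}}\right)$$
Paired sample: apply the same formulas to the within-patient differences.
Confidence Intervals – Continuous Data
Two independent samples
Large-sample (n ≥ 30)
$$(\bar{Y}_1 - \bar{Y}_2) \pm z(\alpha/2)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Small-sample (n < 30)
$$(\bar{Y}_1 - \bar{Y}_2) \pm t(\alpha/2, n_1 + n_2 - 2)\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Nonparametric confidence intervals are also available; see Chow and Liu (2004).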
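The interval formulas translate almost line by line into code. In this sketch (function names are illustrative), `one_sample_ci` takes the critical value as an argument so that either z(α/2) or a tabulated t(α/2, n - 1) can be supplied; for paired data it is applied to the within-patient differences. The Python standard library has no t quantile function, so the small-sample versions assume the caller looks up the t value.

```python
import math
import statistics
from statistics import NormalDist


def z_crit(alpha=0.05):
    """z(alpha/2): upper alpha/2 point of the standard normal (1.96 for alpha = 0.05)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def one_sample_ci(values, crit):
    """Ybar -/+ crit * s / sqrt(n); crit is z(alpha/2) or a tabulated t(alpha/2, n - 1).

    For paired data, pass the within-patient differences as `values`.
    """
    mean = statistics.mean(values)
    half = crit * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half, mean + half


def two_sample_ci_large(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """(Y1bar - Y2bar) -/+ z(alpha/2) * sqrt(s1^2/n1 + s2^2/n2)."""
    diff = mean1 - mean2
    half = z_crit(alpha) * math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - half, diff + half


def two_sample_ci_pooled(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """(Y1bar - Y2bar) -/+ t(alpha/2, n1 + n2 - 2) * s * sqrt(1/n1 + 1/n2), s pooled."""
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    diff = mean1 - mean2
    half = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return diff - half, diff + half


# Hypothetical data for illustration only.
print(two_sample_ci_large(5.0, 2.0, 50, 4.0, 2.0, 50))
print(one_sample_ci([1.2, -0.4, 2.1, 0.8, 1.5], crit=2.776))  # t(0.025, 4) from a t table
```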
Pulmonary Function in Patients Receiving Prednisone and Those Receiving Placebo
before and after Treatment in the Emergency Room and Improvement over the Course of Treatment*
Variable | Before treatment: Prednisone (N=34) | Before treatment: Placebo (N=34) | At discharge: Prednisone (N=47) | At discharge: Placebo (N=44) | Percent improvement#: Prednisone (N=34) | Percent improvement#: Placebo (N=34)
PEFR (liters/min) | 246 ± 119 | 225 ± 106 | 323 ± 134 | 321 ± 136 | +47.9 ± 52.0 | +58.4 ± 78.1
FVC (liters) | 2.83 ± 1.4 | 2.67 ± 1.1 | 3.55 ± 1.1 | 3.47 ± 1.1 | +44.6 ± 59.1 | +36.4 ± 36.6
FEV1 (liters) | 1.84 ± 0.9 | 1.59 ± 0.7 | 2.27 ± 0.9 | 2.24 ± 0.9 | +39.2 ± 33.6 | +47.3 ± 37.6
MMEFR (liters/sec) | 1.42 ± 0.8 | 1.23 ± 0.7 | 1.75 ± 1.1 | 1.70 ± 1.0 | +56.4 ± 56.1 | +53.2 ± 52.6
V50 (liters/sec) | 1.68 ± 0.9 | 1.45 ± 0.9 | 1.98 ± 1.2 | 1.99 ± 1.1 | +48.7 ± 44.6 | +57.2 ± 56.3
V25 (liters/sec) | 0.86 ± 0.5 | 0.60 ± 0.4 | 0.76 ± 0.5 | 0.76 ± 0.5 | +34.4 ± 61.4 | +43.4 ± 68.0
*Plus-minus values are means ± SD. PEFR denotes peak expiratory flow rate, MMEFR mean midexpiratory flow rate, V50 instantaneous flow rate at 50 percent of vital capacity, and V25 instantaneous flow rate at 25 percent of vital capacity.
# Expressed as a percentage of pretreatment results. Source: Chapman, et al (1991)
Pulmonary Function before Treatment in the Emergency Room, at the Time of Discharge, and Improvement Observed during the First Home Visit after Treatment, According to the Presence or Absence of Relapse at Any Time during the 21-Day Follow-up*
Variable | Before treatment: No relapse (N=51) | Before treatment: Relapse (N=17) | At discharge: No relapse (N=67) | At discharge: Relapse (N=24) | Visit 1, % change from discharge value: No relapse (N=61) | Visit 1, % change from discharge value: Relapse (N=21)
PEFR (liters/min) | 233 ± 103 | 235 ± 129 | 333 ± 136 | 309 ± 136 | +16.2 ± 24.9 | +9.17 ± 25.5
FVC (liters) | 2.86 ± 1.32 | 2.36 ± 0.99 | 3.62 ± 1.14 | 3.21 ± 1.02 | +6.6 ± 16.9 | +0.5 ± 16.5
FEV1 (liters) | 1.74 ± 0.80 | 1.55 ± 0.80 | 2.26 ± 0.85 | 2.24 ± 0.85 | +12.2 ± 22.7 | -2.7 ± 17.0 #1
MMEFR (liters/sec) | 1.34 ± 0.73 | 1.37 ± 0.90 | 1.65 ± 1.03 | 1.95 ± 1.07 | +29.6 ± 74.6 | -7.2 ± 28.6 #2
V50 (liters/sec) | 1.59 ± 0.91 | 1.54 ± 0.98 | 1.87 ± 1.12 | 2.27 ± 1.22 | +34.8 ± 83.6 | -3.6 ± 31.2 #3
V25 (liters/sec) | 0.72 ± 0.39 | 0.66 ± 0.48 | 0.71 ± 0.46 | 0.90 ± 0.56 | +33.7 ± 76.3 | -7.0 ± 31.9 #2
*Plus-minus values are means ± SD. PEFR denotes peak expiratory flow rate, MMEFR mean midexpiratory flow rate, V50 instantaneous flow rate at 50 percent of vital capacity, and V25 instantaneous flow rate at 25 percent of vital capacity.
#1 p < 0.05 for the comparison with the nonrelapse group. #2 p < 0.005 for the comparison with the nonrelapse group. #3 p < 0.01 for the comparison with the nonrelapse group.
Wilcoxon Signed Rank Test
-
within-group comparison
Compute the change from baseline in morning PEF: change = visit - baseline.
Take the absolute values of the changes and rank them from the smallest to the largest (tied values receive the average rank).
Define a sign variable equal to 1 if the change from baseline is positive and 0 if the change is negative.
Wilcoxon Signed Rank Test
---
for within-group comparison
Multiply the rank of the absolute difference by the sign variable. This is called the signed rank.
Sum the signed ranks.
Compare the sum of the signed ranks with the critical values from the table.
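Patient number | Placebo | Hydrochlorothiazide | Difference | Absolute difference | Rank | Sign | Sign variable | Signed rank
1 | 211 | 181 | -30 | 30 | 9 | - | 0 | 0
2 | 210 | 172 | -38 | 38 | 10 | - | 0 | 0
3 | 210 | 196 | -14 | 14 | 4 | - | 0 | 0
4 | 203 | 191 | -12 | 12 | 2 | - | 0 | 0
5 | 196 | 167 | -29 | 29 | 7.5 | - | 0 | 0
6 | 190 | 161 | -29 | 29 | 7.5 | - | 0 | 0
7 | 191 | 178 | -13 | 13 | 3 | - | 0 | 0
8 | 177 | 160 | -17 | 17 | 5 | - | 0 | 0
9 | 173 | 149 | -24 | 24 | 6 | - | 0 | 0
10 | 170 | 119 | -51 | 51 | 11 | - | 0 | 0
11 | 163 | 156 | -7 | 7 | 1 | - | 0 | 0
Total | | | | | | | | 0
Critical values: w(0.025, 11) = 11, w(0.975, 11) = 55
Sum of signed ranks = 0 < w(0.025, 11) = 11, so the null hypothesis of no change is rejected at the 5% level (two-sided).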
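The steps above can be checked with a short sketch. It ranks absolute differences with average ranks for ties (so the two differences of 29 share rank 7.5), sums the ranks of positive differences (W+, the sum of the signed ranks), and, instead of the printed critical-value table, computes an exact two-sided p-value by enumerating all 2^n equally likely sign patterns, which is feasible only for small n. Dropping zero differences is an assumption (a common convention) not stated in the lecture; these data contain none. Function names are illustrative.

```python
from itertools import product


def average_ranks(values):
    """Rank values from smallest (1) to largest; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # sorted positions i..j share one rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def signed_rank_test(baseline, follow_up):
    """W+ (sum of ranks of positive changes) and an exact two-sided p-value."""
    diffs = [f - b for b, f in zip(baseline, follow_up) if f != b]  # zeros dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each of the 2**n sign patterns is equally likely.
    null = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product((False, True), repeat=len(ranks))]
    p_low = sum(w <= w_plus for w in null) / len(null)
    p_high = sum(w >= w_plus for w in null) / len(null)
    return ranks, w_plus, min(1.0, 2 * min(p_low, p_high))


placebo = [211, 210, 210, 203, 196, 190, 191, 177, 173, 170, 163]
hctz = [181, 172, 196, 191, 167, 161, 178, 160, 149, 119, 156]
ranks, w_plus, p_value = signed_rank_test(placebo, hctz)
print(ranks)
print(w_plus, p_value)
```

Because every change is negative, W+ = 0 and the exact two-sided p-value is 2/2048, about 0.001.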
Wilcoxon Rank Sum Test
Combine all observations from two
independent samples.
Rank all observations in the combined sample.
Sum all ranks from the test group.
This is called the sum of ranks.
Compare the sum of ranks from the test group
with the critical values from the table (Chow
and Liu, 1998).
12.3 Mann-Whitney-Wilcoxon Test
Two independent random samples
Example
Effectiveness of two CRA training programs
for GCP certification
8 divisions of a large drug firm
50 junior CRAs from each division
Number of junior CRAs who passed the GCP certification
12.3 Mann-Whitney-Wilcoxon Test
Example
Training Program
1 | 2
28 | 33
31 | 29
27 | 35
25 | 30
12.3 Mann-Whitney-Wilcoxon Test
Method
Rank the observations in the combined
sample from the smallest (1) to the largest (n1+n2)
In case of ties, use the averaged rank
12.3 Mann-Whitney-Wilcoxon Test
Example
Training Program
1 2
obs. Rank obs. Rank
28 3 33 7
31 6 29 4
27 2 35 8
25 1 30 5
12.3 Mann-Whitney-Wilcoxon Test
Exact Method for Small Samples (n1 + n2 ≤ 30)
1. Null hypothesis: H0: The population relative frequency distributions for 1 and 2 are identical.
2. Alternative hypothesis: Ha: The population relative frequency distributions are shifted with respect to their relative locations (a two-tailed test), or Ha: The population relative frequency distribution for population 1 is shifted to the right of the relative frequency distribution for population 2 (a one-tailed test).
12.3 Mann-Whitney-Wilcoxon Test
3. Test statistic: For a two-tailed test, use U, the smaller of
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - T_1 \quad \text{and} \quad U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - T_2,$$
where T1 and T2 are the rank sums for samples 1 and 2, respectively.
For a one-tailed test, use U1.
12.3 Mann-Whitney-Wilcoxon Test
4. Rejection region:
a. For the two-tailed test and a given value of α, reject H0 if U ≤ U0, where P(U ≤ U0) = α/2.
[Note: Observe that U0 is the value such that P(U ≤ U0) is equal to half of α.]
b. For a one-tailed test and a given value of α, reject H0 if U1 ≤ U0, where P(U1 ≤ U0) = α.
12.3 Mann-Whitney-Wilcoxon Test
Method for Larger Samples (n1 + n2 > 30)
1. Null hypothesis: H0: The population relative frequency distributions for 1 and 2 are identical.
2. Alternative hypothesis: Ha: The population relative frequency distributions are not identical, or Ha: The population relative frequency distribution for 1 is shifted to the right (or left) of the relative frequency distribution for 2.
12.3 Mann-Whitney-Wilcoxon Test
3. Test Statistic:
$$Z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}},$$
and let U = U1 for a one-sided alternative.
12.3 Mann-Whitney-Wilcoxon Test
4. Rejection region:
Reject H0 if z > zα/2 or z < -zα/2 for a two-tailed test. For a one-tailed test, place all of α in one tail of the z distribution. To detect a shift in distribution 1 to the right of distribution 2, let U = U1 and reject H0 when z < -zα. To detect a shift in the opposite direction, let U = U1 and reject H0 when z > zα.
Tabulated values of z are given in Table 3 in the Appendix.
12.3 Mann-Whitney-Wilcoxon Test
Example:
n1 = 4, n2 = 4, n1 + n2 = 8; T1 = 12
U1 = (4)(4) + 4(4 + 1)/2 - 12 = 14
U2 = (4)(4) - 14 = 2, so U = U2 = 2
One-tailed: P(U ≤ 1) = 0.0286; two-tailed: 2P(U ≤ 1) = 0.0572
Two-tailed test with α = 0.05: U0 = 0
Since U = 2 > U0 = 0, fail to reject H0 of no difference between the training programs.
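A sketch of the rank-sum calculation for the CRA training example. `average_ranks` assigns average ranks to ties, as the method slide specifies; `exact_p_u_at_most` reproduces the tabulated probabilities by enumerating all C(n1 + n2, n1) equally likely rank sets, which assumes no ties and is practical only for small samples. `large_sample_z` implements the normal approximation intended for n1 + n2 > 30; it is included for completeness and is not appropriate for this 8-observation example. All function names are illustrative.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks 1..n in the combined sample; tied values get the average rank."""
    ordered = sorted(values)
    n = len(ordered)
    first = {v: ordered.index(v) + 1 for v in values}       # first sorted position
    last = {v: n - ordered[::-1].index(v) for v in values}  # last sorted position
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney_u(sample1, sample2):
    """Return rank sums T1, T2 and U1, U2 as defined on the slides."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    t1, t2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    return t1, t2, u1, u2


def exact_p_u_at_most(n1, n2, u0):
    """P(U <= u0) under H0 (no ties), enumerating all C(n1+n2, n1) rank sets."""
    sets = list(combinations(range(1, n1 + n2 + 1), n1))
    hits = sum(1 for s in sets if n1 * n2 + n1 * (n1 + 1) / 2 - sum(s) <= u0)
    return hits / len(sets)


def large_sample_z(u, n1, n2):
    """Normal approximation intended for n1 + n2 > 30."""
    return (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


program1 = [28, 31, 27, 25]
program2 = [33, 29, 35, 30]
t1, t2, u1, u2 = mann_whitney_u(program1, program2)
print(t1, t2, u1, u2)                        # 12.0 24.0 14.0 2.0
print(round(exact_p_u_at_most(4, 4, 1), 4))  # P(U <= 1) = 0.0286
```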
The Number of Subjects with Malignant Neoplasms in the Beta Carotene Component of the Physicians’ Health Study
Malignant neoplasm | Beta carotene | Placebo
N | 11036 | 11035
Yes | 1273 | 1293
Year 1-2 | 120 | 130
Year 3-4 | 157 | 136
Year 5-9 | 500 | 567
≥ 10 years | 496 | 460
No | 9763 | 9742
The Number of Subjects with Clinical Improvement
Stratum | rt-PA | Placebo
Part 1, 0–90 min
N | 71 | 68
Yes | 36 | 31
No | 35 | 37
Part 1, 91–180 min
N | 73 | 79
Yes | 31 | 26
No | 42 | 53
Part 2, 0–90 min
N | 86 | 77
Yes | 51 | 30
No | 35 | 47
Part 2, 91–180 min
N | 82 | 88
Yes | 29 | 35
No | 53 | 53
Summary of Frequencies of Transition of Status Score from Baseline to Visit 3
Treatment: Placebo
Visit 3
Baseline Terrible Poor Fair Good Excellent Total
Terrible 0 0 0 0 0 0
Poor 4 2 3 1 1 11
Fair 4 1 9 4 2 20
Good 1 3 2 4 9 19
Excellent 2 0 0 2 3 7
Total 11 6 14 11 15 57
Treatment: Test Drug
Visit 3
Baseline Terrible Poor Fair Good Excellent Total
Terrible 0 0 2 1 0 3
Poor 0 3 2 1 4 10
Fair 2 0 5 3 8 18
Statistical Testing Procedures
---Categorical data
Within-group
McNemar test for two categories
Stuart-Maxwell-Bhapkar test for more than two categories
Between-group
2×2 Table
Fisher’s exact test
Pearson’s chi-square test
2×2 tables for ordered categories
Mantel-Haenszel test
Logistic regression
Poisson regression
Log-linear model
Data Structure of Binary Endpoint for
a Parallel Two-group Trial
Binary Response
Treatment | No | Yes | Total
Test drug | y10 (p10) | y11 (p11) | y1. (1)
Placebo | y20 (p20) | y21 (p21) | y2. (1)
Categorical Data
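Large sample tests---One sample
$$Z = \frac{p - p_0}{\sqrt{v(p)}}, \quad v(p) = \frac{p(1 - p)}{n}, \quad \text{reject } H_0 \text{ if } |Z| > z(\alpha/2)$$
Paired samples---McNemar test
$$X_M^2 = \frac{(y_{01} - y_{10})^2}{y_{01} + y_{10}}, \quad \text{compared with } \chi^2(\alpha, 1)$$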
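Both statistics are one-liners. In this sketch, `y01` and `y10` stand for the two discordant-pair counts of the paired 2×2 table (an assumed labeling), and the chi-square(1) tail probability is obtained from the standard normal via P(χ² > x) = 2[1 - Φ(√x)], so only the standard library is needed. The counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Z = (p - p0) / sqrt(v(p)), with v(p) = p(1 - p)/n."""
    p = successes / n
    return (p - p0) / math.sqrt(p * (1 - p) / n)


def mcnemar(y01, y10):
    """McNemar statistic from the two discordant-pair counts, with chi2(1) p-value."""
    x2 = (y01 - y10) ** 2 / (y01 + y10)
    # For 1 degree of freedom, P(chi2 > x) = 2 * [1 - Phi(sqrt(x))].
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(x2)))
    return x2, p_value


# Hypothetical counts, for illustration only.
print(one_sample_proportion_z(60, 100, 0.5))
print(mcnemar(15, 5))
```

Only the discordant pairs enter the McNemar statistic; concordant pairs carry no information about a change in proportion.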
Computation of Statistics for Hypothesis Based on Binary Data for a Parallel Two-group Trial
Binary Response
Treatment | Row | No | Yes | Total
Test drug | O | y10 | y11 | y1.
 | E | m10 | m11 |
 | O - E | y10 - m10 | y11 - m11 |
 | (O - E)²/E | (y10 - m10)²/m10 | (y11 - m11)²/m11 |
Placebo | O | y20 | y21 | y2.
 | E | m20 | m21 |
 | O - E | y20 - m20 | y21 - m21 |
 | (O - E)²/E | (y20 - m20)²/m20 | (y21 - m21)²/m21 |
Total | | y.0 | y.1 | y..
X²_P = sum of the four (O - E)²/E entries
Two independent samples
Large sample tests (expected cell counts greater than 5)
Pearson’s chi-square test
$$X_P^2 = \sum_{i=1}^{2}\sum_{j=0}^{1} \frac{(y_{ij} - m_{ij})^2}{m_{ij}}, \quad m_{ij} = \frac{y_{i.}\,y_{.j}}{N}$$
Randomization chi-square test
$$X_R^2 = \frac{(y_{11} - m_{11})^2}{v_{11}}, \quad v_{11} = \frac{y_{1.}\,y_{2.}\,y_{.0}\,y_{.1}}{N^2(N - 1)}, \quad X_R^2 = \frac{N - 1}{N} X_P^2$$
Both statistics are compared with χ²(α, 1).
Fisher’s exact test
Computation of the p-value from all 2×2 tables with the same margins that are at least as extreme as the observed table.
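The two chi-square statistics and Fisher's exact test can be sketched with the standard library (`math.comb` requires Python 3.8+). The two-sided Fisher p-value here sums the probabilities of all tables with the same margins that are no more likely than the observed one; that is one common convention, assumed rather than taken from the lecture. The NINDS counts are pooled over the four strata of the clinical-improvement table (147/312 vs. 122/312 improved). Function names are illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson and randomization chi-square for [[y10, y11], [y20, y21]]."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    x2_p = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    x2_r = (n - 1) / n * x2_p  # randomization chi-square
    return x2_p, x2_r


def fisher_exact_two_sided(table):
    """Sum hypergeometric probabilities of tables (same margins) no more likely than observed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # P(first cell = x | all margins fixed)
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))


# NINDS pooled over parts and onset-time strata; columns are (No, Yes).
ninds = [[165, 147], [190, 122]]
print(chi_square_2x2(ninds))
print(fisher_exact_two_sided(ninds))
```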
Confidence Intervals – Categorical Data
One-sample
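Large-sample (n ≥ 30)
$$p \pm z(\alpha/2)\sqrt{v(p)}, \quad v(p) = \frac{p(1 - p)}{n}$$
Paired sample
$$(p_{01} - p_{10}) \pm z(\alpha/2)\sqrt{v(p_{01} - p_{10})}, \quad v(p_{01} - p_{10}) = \frac{1}{n}\left[(p_{01} + p_{10}) - (p_{01} - p_{10})^2\right]$$
Confidence Intervals – Categorical Data
Two independent samples
Difference in proportions
$$(p_{11} - p_{21}) \pm z(\alpha/2)\sqrt{v(d)}, \quad v(d) = \frac{p_{11}(1 - p_{11})}{n_1} + \frac{p_{21}(1 - p_{21})}{n_2}$$
Relative risk
$$\widehat{RR} = \frac{p_{11}}{p_{21}}, \quad \exp\left[\ln(\widehat{RR}) \pm z(\alpha/2)\sqrt{\frac{1 - p_{11}}{y_{11}} + \frac{1 - p_{21}}{y_{21}}}\right]$$
Odds ratio
$$\widehat{OR} = \frac{p_{11}/p_{10}}{p_{21}/p_{20}}, \quad \exp\left[\ln(\widehat{OR}) \pm z(\alpha/2)\sqrt{\frac{1}{y_{10}} + \frac{1}{y_{11}} + \frac{1}{y_{20}} + \frac{1}{y_{21}}}\right]$$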
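A sketch of the three large-sample intervals for two independent samples. The function names are illustrative; the usage lines apply them to the NINDS counts pooled over strata (147/312 vs. 122/312) and to the beta carotene neoplasm counts (1273/11036 vs. 1293/11035). With these counts the difference, its variance (about 0.0016), and the interval (0.0027, 0.1576) agree with the NINDS summary table, and the relative-risk interval rounds to (0.92, 1.06).

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # z(0.025), about 1.96


def diff_ci(y11, n1, y21, n2, z=Z_975):
    """Difference in proportions p11 - p21 with its variance and Wald CI."""
    p1, p2 = y11 / n1, y21 / n2
    d = p1 - p2
    v = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    half = z * math.sqrt(v)
    return d, v, (d - half, d + half)


def rr_ci(y11, n1, y21, n2, z=Z_975):
    """Relative risk p11/p21; CI built on the log scale."""
    p1, p2 = y11 / n1, y21 / n2
    rr = p1 / p2
    se = math.sqrt((1 - p1) / y11 + (1 - p2) / y21)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))


def or_ci(y11, n1, y21, n2, z=Z_975):
    """Odds ratio with the log-scale (Woolf-type) CI."""
    y10, y20 = n1 - y11, n2 - y21
    odds_ratio = (y11 / y10) / (y21 / y20)
    se = math.sqrt(1 / y10 + 1 / y11 + 1 / y20 + 1 / y21)
    return odds_ratio, (odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se))


print(diff_ci(147, 312, 122, 312))        # NINDS, pooled over strata
print(rr_ci(1273, 11036, 1293, 11035))    # beta carotene vs. placebo
print(or_ci(1273, 11036, 1293, 11035))
```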
Comparison of Proportions of Subjects
with Improvement for the NINDS Trial
Treatment | N | Improvement | Difference (variance) | 95% confidence interval
rt-PA | 312 | 147 (47.12%) | 0.0801 (0.0016) | (0.0027, 0.1576)
Placebo | 312 | 122 (39.10%)
Summary of Estimated Odds Ratio and Relative Risk for Malignant Neoplasm Due to Beta-carotene in the U.S. Physicians’ Health Study
Beta-carotene (N = 11036) | Placebo (N = 11035) | Odds ratio (95% C.I.) | Relative risk (95% C.I.)
1273 (11.53%) | 1293 (11.72%) | 0.98 (0.91, 1.07) | 0.98 (0.92, 1.06)
Data Structure of Binary Endpoint with
H strata for a Parallel Two-group Trial
Binary Response
Treatment | No | Yes | Total
Test drug | yh10 (ph10) | yh11 (ph11) | yh1. (1)
Placebo | yh20 (ph20) | yh21 (ph21) | yh2. (1)
Total | yh.0 | yh.1 | yh..
Combining Results of 2×2 Tables
from Different Strata
Mantel-Haenszel’s Technique
For each 2×2 Table, following the randomization chi-square
test, compute the expected number of patients who respond to the test drug and variance of the observed number of patients who respond to the test drug.
Add the observed numbers, expected numbers, and variances
over all strata.
Square the difference between the sums of the observed and
expected numbers.
Divide the squared difference in sums by the sum of the variances.
Summary of Results of Binary
Endpoint from H Strata
Stratum | Observed frequency | Expected frequency | Difference | Variance
1 | y111 | m111 = y11. y1.1 / N1 | y111 - m111 | v1
h | yh11 | mh11 = yh1. yh.1 / Nh | yh11 - mh11 | vh
H | yH11 | mH11 = yH1. yH.1 / NH | yH11 - mH11 | vH
Sum | Σ yh11 | Σ mh11 | Σ (yh11 - mh11) | Σ vh
Combining 2×2 tables across strata
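Cochran-Mantel-Haenszel test
$$X_{MH}^2 = \frac{\left[\sum_{h=1}^{H} (y_{h11} - m_{h11})\right]^2}{\sum_{h=1}^{H} v_h} \sim \chi^2(1), \quad m_{h11} = \frac{y_{h1.}\,y_{h.1}}{N_h}, \quad v_h = \frac{y_{h1.}\,y_{h2.}\,y_{h.0}\,y_{h.1}}{N_h^2(N_h - 1)}, \quad h = 1, \dots, H$$
Common odds ratio
$$\widehat{OR}_c = \exp\left[\frac{\sum_{h=1}^{H} (y_{h11} - m_{h11})}{\sum_{h=1}^{H} v_h}\right]$$
100(1 - α)% confidence interval
$$\exp\left[\frac{\sum_{h=1}^{H} (y_{h11} - m_{h11}) \pm z(\alpha/2)\sqrt{\sum_{h=1}^{H} v_h}}{\sum_{h=1}^{H} v_h}\right]$$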
Summary of Comparison in Proportions of the Subjects with Clinical
Improvement between rt-PA and Placebo, Adjusted for Part and Time from Onset to Treatment, in the NINDS Trial
Part | Time (min) | Observed frequency Oh | Expected frequency Eh | Oh - Eh | Vh | (Oh - Eh)²/Vh
1 | 0–90 | 36 | 34.2230 | 1.7770 | 8.7351 | 0.3615
1 | 91–180 | 31 | 27.3750 | 3.6250 | 8.9513 | 1.4680
2 | 0–90 | 51 | 42.7362 | 8.2638 | 10.2188 | 6.6829
2 | 91–180 | 29 | 30.8706 | -1.8706 | 10.0230 | 0.3491
Sum | | 147 | 135.2048 | 11.7952 | 37.9281 | 8.8615
The observed and expected frequencies refer to patients with improvement in the rt-PA group.
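The stratified computation can be reproduced from the clinical-improvement counts. In this sketch (function name illustrative), each stratum is a 2×2 table with rows test drug and placebo and columns no and yes; the "No" counts for Part 2, 91-180 min are taken as N minus Yes. The sums of observed, expected, and variance match the summary table (147, 135.2048, 37.928), and the resulting X²_MH is about 3.67. The common odds ratio uses the exp[Σ(O - E)/ΣV] form, not the classical Mantel-Haenszel ratio estimator.

```python
import math
from statistics import NormalDist


def cmh(strata, alpha=0.05):
    """Cochran-Mantel-Haenszel test over 2x2 tables [[y10, y11], [y20, y21]].

    Rows: test drug, placebo; columns: no, yes. Returns the observed, expected,
    and variance sums, X2_MH, the common odds ratio exp[sum(O - E)/sum(V)], and its CI.
    """
    observed = expected = var = 0.0
    for (y10, y11), (y20, y21) in strata:
        r1, r2 = y10 + y11, y20 + y21  # row totals
        c0, c1 = y10 + y20, y11 + y21  # column totals
        n = r1 + r2
        observed += y11
        expected += r1 * c1 / n
        var += r1 * r2 * c0 * c1 / (n ** 2 * (n - 1))  # hypergeometric variance
    diff = observed - expected
    x2_mh = diff ** 2 / var
    z = NormalDist().inv_cdf(1 - alpha / 2)
    or_c = math.exp(diff / var)
    ci = (math.exp((diff - z * math.sqrt(var)) / var),
          math.exp((diff + z * math.sqrt(var)) / var))
    return observed, expected, var, x2_mh, or_c, ci


ninds_strata = [
    [[35, 36], [37, 31]],  # Part 1, 0-90 min
    [[42, 31], [53, 26]],  # Part 1, 91-180 min
    [[35, 51], [47, 30]],  # Part 2, 0-90 min
    [[53, 29], [53, 35]],  # Part 2, 91-180 min ("No" = N - Yes)
]
print(cmh(ninds_strata))
```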
Computation of Kaplan-Meier Survival
Divide the time into intervals by the time points where the
pre-defined event (death) occurred.
For each interval, count the number of patients who were alive at the beginning of the interval and the number of patients who were still alive at the end of the interval.
Compute the survival rate for each interval as the number of patients still alive at the end of the interval divided by the number of patients alive at the beginning of the interval.
For each time point where the pre-defined event occurred, the Kaplan-Meier estimate is the product of the survival rates of all intervals up to and including that time point.
Time in Months to Progression of Patients with Stage II or IIIA Ovarian Carcinoma by Low-grade (Well-differentiated) or High-grade Cancer
Patient Number   Time in Months   Progression Observed (No = censored)   Cell Grade
1 0.92 Yes Low Grade
2 2.93 Yes Low Grade
3 5.76 Yes Low Grade
4 6.41 Yes Low Grade
5 10.16 Yes Low Grade
6 12.40 No Low Grade
7 12.93 No Low Grade
8 13.85 No Low Grade
9 14.70 No Low Grade
10 15.20 Yes Low Grade
11 23.32 No Low Grade
12 24.47 No Low Grade
13 25.33 No Low Grade
14 36.38 No Low Grade
15 39.67 No Low Grade
19 6.55 Yes High Grade
20 9.21 Yes High Grade
21 9.57 Yes High Grade
22 9.84 No High Grade
23 9.87 No High Grade
24 10.16 Yes High Grade
25 11.55 Yes High Grade
26 11.78 Yes High Grade
27 12.14 Yes High Grade
28 12.14 Yes High Grade
29 12.17 Yes High Grade
30 12.34 Yes High Grade
31 12.57 Yes High Grade
32 12.89 Yes High Grade
33 14.11 Yes High Grade
34 14.84 Yes High Grade
35 36.81 No High Grade
Data Layout for Computation of Kaplan-Meier Estimates of Survival Function
Ordered distinct event time | Number of events | Number censored in [y(k), y(k+1)] | Number in risk set | S(y)
y(0) = 0 | d0 = 0 | m0 | n0 | 1
y(1) | d1 | m1 | n1 | 1 - d1/n1
y(2) | d2 | m2 | n2 | (1 - d1/n1)(1 - d2/n2)
y(k) | dk | mk | nk | (1 - d1/n1)(1 - d2/n2)…(1 - dk/nk)
Computation of Kaplan-Meier Estimates of Survival Function for Patients with Low-grade Cancer
Ordered distinct progression time | Number of events | Number censored in [y(k), y(k+1)] | Number in risk set | S(y)
0 | 0 | 0 | 15 | 1
0.92 | 1 | 0 | 15 | 0.9333
2.93 | 1 | 0 | 14 | 0.8667
5.76 | 1 | 0 | 13 | 0.8000
6.41 | 1 | 0 | 12 | 0.7333
10.16 | 1 | 4 | 11 | 0.6667
15.20 | 1 | 5 | 6 | 0.5556
Kaplan-Meier Estimates of Proportions for Patients with Low-grade (Well-differentiated) Ovarian Carcinoma
Time in months | Censored | Estimated proportion | Estimated variance | Standard error | Lower 95% limit | Upper 95% limit
0.92 | No | 0.93333 | 0.004148 | 0.06440 | 0.80710 | 1.00000
2.93 | No | 0.86667 | 0.007703 | 0.08777 | 0.69464 | 1.00000
5.76 | No | 0.80000 | 0.010666 | 0.10328 | 0.59758 | 1.00000
6.41 | No | 0.73333 | 0.013037 | 0.11418 | 0.50955 | 0.95712
10.16 | No | 0.66667 | 0.014814 | 0.12171 | 0.42811 | 0.90523
12.40 | Yes | 0.66667
12.93 | Yes | 0.66667
13.85 | Yes | 0.66667
14.70 | Yes | 0.66667
15.20 | No | 0.55556 | 0.020575 | 0.14344 | 0.27441 | 0.83670
23.32 | Yes | 0.55556
24.47 | Yes | 0.55556
25.33 | Yes | 0.55556
36.38 | Yes | 0.55556
39.67 | Yes | 0.55556
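The product-limit steps, together with a variance for the estimate, can be sketched as follows. The sketch reads the listing's "Yes" entries for patients 1-15 as observed progressions (this matches the Kaplan-Meier tables, where exactly those times lower the estimate). Greenwood's formula and a plain S ± z·SE interval truncated to [0, 1] are assumptions on our part, since the lecture does not name the variance formula; with these choices the sketch reproduces the variances and limits in the table. The function name is illustrative.

```python
import math
from statistics import NormalDist


def kaplan_meier(times, events, alpha=0.05):
    """Kaplan-Meier estimates with Greenwood variance and a linear CI.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    Returns one row per distinct event time:
    (time, number at risk, events, S(t), variance, lower, upper).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s, greenwood = 1.0, 0.0
    rows = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        s *= 1 - d / at_risk                        # product-limit step
        if at_risk > d:                             # Greenwood term undefined when all fail
            greenwood += d / (at_risk * (at_risk - d))
        var = s ** 2 * greenwood
        half = z * math.sqrt(var)
        rows.append((t, at_risk, d, s, var, max(0.0, s - half), min(1.0, s + half)))
    return rows


# Low-grade patients 1-15 from the listing ("Yes" = progression observed).
times = [0.92, 2.93, 5.76, 6.41, 10.16, 12.40, 12.93, 13.85, 14.70,
         15.20, 23.32, 24.47, 25.33, 36.38, 39.67]
events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
for row in kaplan_meier(times, events):
    print("t=%.2f n=%d d=%d S=%.5f var=%.6f CI=(%.5f, %.5f)" % row)
```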
Statistical Testing Procedures
---Censored data
Within-group
France-Lewis-Kay and Liu-Chow test
Between-group
Log-rank test: more sensitive to differences later in time
Gehan’s test: more sensitive to differences early in time
Cox’s proportional hazards model
Logrank Test for Comparison of
Two Independent Survival Curves
Divide the time into intervals by the time
points where the pre-defined event (death)
occurred.
Then there are a series of 2×2 tables
stratified by the time points where the
pre-defined event occurred.
Apply the Mantel-Haenszel technique to combine the results.
Data Structure of Comparing Two Survival Functions at y(k) by Log-rank Method
Status
Treatment | Event | No event | Total
Test drug | d1k | n1k - d1k | n1k
Placebo | d2k | n2k - d2k | n2k
Total | dk | nk - dk | nk
k = 1, …, K
Computation of Log-rank Statistic
Ordered distinct   Observed number   Expected number   Difference   Variance
event time         of events         of events
y(1)               d11               e11               d11 - e11    v1
y(2)               d12               e12               d12 - e12    v2
…
y(K)               d1K               e1K               d1K - e1K    vK
Total              d1                e1                d1 - e1      v
where e1k = n1k dk / nk and vk = n1k n2k dk (nk - dk) / [nk² (nk - 1)].
Censored Data
Logrank test statistic
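$X_{LR} = \frac{(d_1 - e_1)^2}{v}$, where $d_1 = \sum_{k=1}^{K} d_{1k}$, $e_1 = \sum_{k=1}^{K} e_{1k}$, $e_{1k} = \frac{n_{1k} d_k}{n_k}$, $v = \sum_{k=1}^{K} v_k$, and $v_k = \frac{n_{1k} n_{2k} d_k (n_k - d_k)}{n_k^2 (n_k - 1)}$.
Under the null hypothesis, $X_{LR}$ is approximately chi-square with 1 degree of freedom.
Confidence Intervals – Censored Data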
Hazard ratio
Point estimate: $\hat{\lambda} = \exp\left[\frac{d_1 - e_1}{v}\right]$
Confidence interval: $\exp\left[\frac{d_1 - e_1}{v} \pm \frac{z(\alpha/2)}{\sqrt{v}}\right]$
Computation of Log-rank Test Statistic for the Data of the Patients with Stage Ⅱ or ⅢA Ovarian Carcinoma
Time in   d1k   d2k   dk   n1k   n2k   nk   e1k       d1k - e1k   vk
months
0.92      1     0     1    15    20    35   0.42857    0.57143    0.24490
1.12      0     1     1    14    20    34   0.41176   -0.41176    0.24221
2.89      0     1     1    14    19    33   0.42424   -0.42424    0.24426
2.92      1     0     1    14    18    32   0.43750    0.56250    0.24609
4.51      0     1     1    13    18    31   0.41935   -0.41935    0.24350
5.76      1     0     1    13    17    30   0.43333    0.56667    0.24556
6.41      1     0     1    12    17    29   0.41379    0.58621    0.24257
6.55      0     1     1    11    17    28   0.39286   -0.39286    0.23852
9.21      0     1     1    11    16    27   0.40741   -0.40741    0.24143
9.57      0     1     1    11    15    26   0.42308   -0.42308    0.24408
10.16     1     1     2    11    12    23   0.95652    0.04348    0.47637
11.56     0     1     1    10    11    21   0.47619   -0.47619    0.24943
11.78     0     1     1    10    10    20   0.50000   -0.50000    0.25000
12.14     0     2     2    10     9    19   1.05263   -1.05263    0.47091
12.17     0     1     1    10     7    17   0.58824   -0.58824    0.24221
12.34     0     1     1    10     6    16   0.62500   -0.62500    0.23438
12.57     0     1     1     9     5    14   0.64286   -0.64286    0.22959
12.89     0     1     1     9     4    13   0.69231   -0.69231    0.21302
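Each row of this table follows from its four counts. The sketch below recomputes e1k, d1k - e1k and vk with a hypothetical helper `logrank_row`; the rows are typed in from the table above. Note that the table as shown stops at 12.89 months (for example, the low-grade event at 15.20 months is not listed), so summing these rows does not reproduce the totals d1 - e1 = -5.33279 and v = 5.10898 used in the test below.

```python
def logrank_row(d1, d2, n1, n2):
    """Expected count, O - E, and hypergeometric variance for group 1 at one event time."""
    d, n = d1 + d2, n1 + n2
    e1 = n1 * d / n                                   # e_1k = n_1k d_k / n_k
    v = n1 * n2 * d * (n - d) / (n ** 2 * (n - 1)) if n > 1 else 0.0
    return e1, d1 - e1, v

# (time, d1k, d2k, n1k, n2k) copied from the table above
ROWS = [
    (0.92, 1, 0, 15, 20), (1.12, 0, 1, 14, 20), (2.89, 0, 1, 14, 19),
    (2.92, 1, 0, 14, 18), (4.51, 0, 1, 13, 18), (5.76, 1, 0, 13, 17),
    (6.41, 1, 0, 12, 17), (6.55, 0, 1, 11, 17), (9.21, 0, 1, 11, 16),
    (9.57, 0, 1, 11, 15), (10.16, 1, 1, 11, 12), (11.56, 0, 1, 10, 11),
    (11.78, 0, 1, 10, 10), (12.14, 0, 2, 10, 9), (12.17, 0, 1, 10, 7),
    (12.34, 0, 1, 10, 6), (12.57, 0, 1, 9, 5), (12.89, 0, 1, 9, 4),
]

for t, d1, d2, n1, n2 in ROWS:
    e1, diff, v = logrank_row(d1, d2, n1, n2)
    print(f"{t:6.2f} {e1:.5f} {diff:9.5f} {v:.5f}")
```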
Logrank test
$X_{LR} = (-5.33279)^2 / 5.10898 = 5.5664 > \chi^2(0.05, 1) = 3.84$
Estimate of hazard ratio is
$\exp[-5.33279/5.10898] = 0.35211$
95% CI for hazard ratio
$\exp[(-5.33279/5.10898) \pm 1.96/\sqrt{5.10898}] = (0.14794, 0.83805)$
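All of these summaries follow from the two totals d1 - e1 and v. A minimal sketch, assuming z = 1.96 and using the fact that a chi-square tail probability with 1 degree of freedom equals the two-sided normal tail of √X_LR; `logrank_summary` is a hypothetical name. It reproduces X_LR ≈ 5.566 and the limits (0.14794, 0.83805); the point estimate it returns is exp(-1.04381) ≈ 0.35211, which is also the geometric mean of the two confidence limits.

```python
import math
from statistics import NormalDist

def logrank_summary(o_minus_e, v, z=1.96):
    """Log-rank chi-square, its p-value, and the one-step hazard-ratio estimate with CI.

    o_minus_e: d_1 - e_1 summed over event times; v: summed variance.
    """
    chi2 = o_minus_e ** 2 / v                                # X_LR, ~ chi-square(1) under H0
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))   # chi2(1) tail = two-sided normal tail
    log_hr = o_minus_e / v                                   # log of exp[(d1 - e1)/v]
    half_width = z / math.sqrt(v)
    ci = (math.exp(log_hr - half_width), math.exp(log_hr + half_width))
    return chi2, p_value, math.exp(log_hr), ci

chi2, p, hr, ci = logrank_summary(-5.33279, 5.10898)
print(round(chi2, 4), round(p, 4), round(hr, 5), tuple(round(c, 5) for c in ci))
```

Because the interval is built on the log scale, it is symmetric around log(HR) but not around the HR itself.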
Statistical Testing Procedures
---Longitudinal data
Multivariate analysis of variance
Regression methods
Random effects models
Repeated measurement models
Time series
Proportional odds models
Robustness of Statistical Analysis: Reproducibility
All statistical procedures are derived from some assumptions
Robustness of statistical procedures
Use of statistical methods with fewer assumptions to be verified
Use of model-free methods vs. model-dependent methods
Use of simpler, parsimonious models
Robustness of Statistical Analysis: Reproducibility
Robustness of Results
Consistency in
Estimated treatment effects
Primary conclusion of the trial
With respect to
different statistical procedures
parametric vs. nonparametric
different statistical assumptions
normal vs. non-normal
Limitation of the data
sub-group analyses
Different analyzed datasets
Two-sided versus One-sided Hypotheses
Evans et al (NEJM 1997; 337: 1412-8)
Randomized, double-blind, two parallel groups
Objective
To compare low-dose inhaled budesonide (400 µg bid) plus theophylline (250 or 375 mg bid) and high-dose inhaled budesonide (800 µg bid) for
Two-sided versus One-sided Hypotheses
Two-sided hypothesis
Null hypothesis: no difference in average FVC between the two groups
Alternative hypothesis: there exists a difference in average FVC between the two groups
One-sided hypothesis
Null hypothesis: no difference in average FVC between the two groups
Alternative hypothesis: low-dose inhaled budesonide plus theophylline improves FVC more than high-dose inhaled budesonide
Two-sided versus One-sided Hypotheses
Two-sided hypotheses evaluate whether a difference exists between the test drug and the control. The difference may be either positive (better) or negative (worse).
Two-sided hypotheses are for newly developed pharmaceuticals with unknown and unproven efficacy and safety.
One-sided hypotheses are to prove that the
Two-sided versus One-sided Hypotheses
With a significance level of 5%, the level of proof is 1/20 and 1/40 for one-sided and two-sided hypotheses, respectively. For two trials (US FDA requirement), the level of proof is 1/400 and 1/1600 for one-sided and two-sided hypotheses, respectively.
With a significance level of 5% and 80% power, the required sample size is about 27% larger for a two-sided hypothesis than for a one-sided hypothesis.
Most regulatory agencies suggest (require) two-sided hypotheses for approval.
The protocol needs to specify whether one-sided or two-sided hypotheses are used and provide the justification if a one-sided hypothesis is chosen.
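The 27% figure can be checked with the usual normal-approximation sample-size formula, in which n is proportional to (z(α) + z(β))² for a one-sided test and (z(α/2) + z(β))² for a two-sided test at the same α. The original slide does not state the power; this sketch assumes 80%, which gives a ratio of about 1.27 (at 90% power the increase is closer to 23%). The function name is hypothetical.

```python
from statistics import NormalDist

def two_vs_one_sided_ratio(alpha=0.05, power=0.80):
    """Ratio of required sample sizes, two-sided / one-sided, for a z-test at the same alpha."""
    z = NormalDist().inv_cdf
    z_beta = z(power)
    two_sided = (z(1 - alpha / 2) + z_beta) ** 2
    one_sided = (z(1 - alpha) + z_beta) ** 2
    return two_sided / one_sided

print(two_vs_one_sided_ratio())            # about 1.27 at 80% power
print(two_vs_one_sided_ratio(power=0.90))  # about 1.23 at 90% power
```

The "level of proof" figures are simpler still: a one-sided 5% test leaves 1/20 in the favorable direction, a two-sided 5% test leaves 1/40, and two independent trials square these to 1/400 and 1/1600.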
Missing Values
Little and Rubin (1987), Little (1995)
Missing patterns
Dropouts
The data are missing after the visit where patients withdrew from the study
Intermittent
The patients complete the study but a few visits are missed by the patients
Missing Values
Little and Rubin (1987), Little (1995)
Missing patterns
Incomplete data
One or a few items of some scales or scores (NIHSS, ADAS, HAM-D, nasal symptom) are missing.
Diary entries for symptoms, PEFR, or awakenings are missing for a few days.
Missing Mechanism
Yo: data observed
Ym: data supposed to be observed but missed
Missing completely at random (MCAR)
The missing mechanism is independent of both Yo and Ym (the missing data are ignorable)
Inference based on the complete data is valid but less efficient (Lachin, 1988)
Missing at random (MAR)
The missing mechanism is independent of Ym but may depend on Yo.
e.g., older males are more likely to have missing values than young males or females
Missing Mechanism
Informative missing
The missing mechanism is dependent on Ym.
e.g., termination due to hepatotoxicity
Assumptions:
Missing patterns:
Diggle and Kenward (1994)
Completely random dropout: Diggle (1989), Ridout (1991)
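The practical difference between MCAR and informative missingness shows up even for a simple mean. The sketch below simulates a standard normal endpoint and compares the complete-case mean under two hypothetical mechanisms: MCAR (each value missing with probability 0.4 regardless of its size) and informative missingness (values above 0.5 missing with probability 0.8, so the chance of being missing depends on Ym itself). The rates and the function name are made up for illustration; MAR is not simulated here.

```python
import random
import statistics

def complete_case_means(n=20000, seed=2024):
    """Simulate Y ~ N(0, 1); return the full-data mean and two complete-case means."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # MCAR: missingness has nothing to do with the value
    mcar = [v for v in y if rng.random() >= 0.4]
    # Informative: large values tend to go missing
    informative = [v for v in y if not (v > 0.5 and rng.random() < 0.8)]
    return statistics.fmean(y), statistics.fmean(mcar), statistics.fmean(informative)

full, mcar, informative = complete_case_means()
print(f"all data {full:.3f}  MCAR {mcar:.3f}  informative {informative:.3f}")
```

Under MCAR the complete cases only cost precision; under informative missingness their mean is shifted well below zero (about -0.37 in expectation for these settings), and nothing in the observed values alone reveals the shift.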
Analysis Sets
Intention-to-treat (all randomized) sets
All randomized patients, analyzed according to their randomized treatments
Per protocol (evaluable) sets
Patients are included if they meet the following:
A minimal exposure to the treatments
The availability of measurements of the primary efficacy endpoints
The absence of major protocol violations
The choice of analysis sets
To minimize bias
Analysis Sets
The intention-to-treat set is at least as large as (>=) the per protocol set
For a superiority trial
The intention-to-treat analysis is conservative
The per protocol analysis maximizes the chance of proving efficacy
For an equivalence trial, the roles are reversed
Perform both analyses as a sensitivity analysis
Consistent results increase confidence
Lachin (2000)
An unbiased trial
Unbiased in estimation of the treatment effects
Unbiased in testing, i.e., in controlling the type I error rate
Randomization
A necessary, but not sufficient, condition for an unbiased trial
Lachin (2000)
Two other necessary conditions
The outcomes should be evaluated in a like and unbiased manner
Blinding (or masking)
Missing data, if any, do not bias the comparison between treatments
Lachin (2000)
Available methods (Simon and Simonoff, 1986; Little, 1988) can disprove MCAR but cannot prove it.
Many other assumptions for MCAR or MAR are in fact untestable.
Many methods for imputation of missing values also rest on untestable assumptions.
Lachin (2000)
Last Observation Carried Forward (LOCF)
Another imputation method
Assumes that the last observation is an unbiased estimate of the missing values
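Mechanically, LOCF is a single forward pass over each patient's visits. A minimal sketch, assuming one patient's measurements are stored in visit order with `None` for a missed visit (a hypothetical representation); a value missing before the first observed visit stays missing. The code fills the gaps but cannot check the assumption stated above.

```python
def locf(visits):
    """Last observation carried forward for one patient's ordered visits (None = missing)."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value          # remember the most recent observed value
        filled.append(last)       # missing visits take that value (None if nothing seen yet)
    return filled

print(locf([10, None, 12, None, None]))
print(locf([None, 5, None]))
```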
Lachin (2000)
The only complete solution of the “missing data” problem is not to have them (Cochran, 1957, p. 82)
The best way to deal with the problem of missing data is to have as little missing data as possible (Lachin, 2000; Begg, 2000)
Intention-to-treat Analysis and Design
Peto, et al (1976); Lachin (2000)
Intention-to-treat analysis
All randomized patients are included in the analysis, including those who
did not receive treatment
received the wrong treatment
withdrew due to AEs or other reasons
Intention-to-treat Analysis and Design
Peto, et al (1976); Lachin (2000)
Intention-to-treat design
The ITT principle requires complete follow-up of all randomized patients
All patients should be followed and all scheduled evaluations should be performed until the death of the patient or the end of the study, irrespective of withdrawal from treatment
In an ITT design, withdrawal from treatment does not
Adjustment of Covariates
Covariates are factors that affect the primary efficacy endpoints
Prognostic, risk, or confounding factors: age, gender, race, disease severity, etc.
Patient-specific covariates
Covariates measured before randomization
Baseline FEV1, FVC, etc.
Time-dependent covariates
Covariates measured after randomization
May be affected by the treatments
CD
Adjustment of Covariates
Stratify on known covariates before randomization and the conduct of the trial.
Adjustment for covariates in the analysis improves the precision of the estimated treatment effects.
The estimated treatment effect is unbiased without adjustment for covariates as long as treatment assignment is random.
Avoid adjusting the primary endpoints for covariates measured after randomization.
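The first two points can be seen in a small simulation. This sketch assumes a continuous endpoint that depends linearly on one baseline covariate, with made-up parameters (true treatment effect 1.0, slope 1.0, residual SD 0.5, 50 patients per arm), and uses the ANCOVA estimate: the difference in means minus the pooled within-arm slope times the difference in baseline means. The function name is hypothetical. Because assignment is random, both estimators average close to the true effect, but the adjusted one varies much less across repeated trials.

```python
import random
import statistics

def ancova_vs_unadjusted(n_per_arm=50, reps=1000, effect=1.0, slope=1.0,
                         noise_sd=0.5, seed=7):
    """Repeat a randomized two-arm trial; collect unadjusted and covariate-adjusted effects."""
    rng = random.Random(seed)
    unadjusted, adjusted = [], []
    for _ in range(reps):
        arms = []
        for arm in (0, 1):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]   # baseline covariate
            y = [effect * arm + slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
            arms.append((x, y))
        (x0, y0), (x1, y1) = arms
        diff_y = statistics.fmean(y1) - statistics.fmean(y0)
        diff_x = statistics.fmean(x1) - statistics.fmean(x0)
        # pooled within-arm regression slope of y on x (the ANCOVA slope)
        sxy = sxx = 0.0
        for x, y in arms:
            mx, my = statistics.fmean(x), statistics.fmean(y)
            sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
            sxx += sum((a - mx) ** 2 for a in x)
        beta = sxy / sxx
        unadjusted.append(diff_y)
        adjusted.append(diff_y - beta * diff_x)   # covariate-adjusted difference
    return unadjusted, adjusted

unadj, adj = ancova_vs_unadjusted()
for name, est in (("unadjusted", unadj), ("adjusted", adj)):
    print(name, round(statistics.fmean(est), 3), round(statistics.variance(est), 4))
```

The covariate here is measured before randomization; adjusting for a post-randomization variable that the treatment itself affects would remove part of the treatment effect, which is why the last point above advises against it.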
Baseline Comparisons
Objectives
Description of patient characteristics with respect to inclusion and exclusion criteria (targeted population)
Measurements of initial disease severity
Comparability between treatment groups
Reference values
Change from baseline
Baseline Comparisons
Baseline Data
Demographic data: age, gender, race
Disease factors
Entry criteria
Duration, stage, severity of disease
Baseline values of primary efficacy and safety endpoints
Concomitant illness
Relevant previous diseases
Relevant previous treatments
Baseline Comparisons
Multiple Baseline
The FDA guidelines (Boyarsky and Paulson, 1987) for benign prostatic hyperplasia
Baseline measurements are collected in a placebo run-in period of 28 days.
Stability of the disease state
Placebo effects
Existence
Estimation
Baseline Comparisons
Stabilization of baseline
Use of Hotelling's T² statistic with a Helmert matrix for p multiple baseline measurements; the jth row (j = 1, …, p-1) is
The first j-1 elements are 0
The jth element is 1
The remaining p-j elements are -1/(p-j)
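A sketch of the stability check, assuming each patient's p baseline measurements are stored as a list and using Helmert rows whose remaining elements are -1/(p-j), so that every row is a contrast summing to zero (row j compares visit j with the mean of the later visits). The one-sample statistic is T² = n d̄' S_d⁻¹ d̄ on the contrast scores, and (n-q)/(q(n-1)) T² with q = p-1 is referred to F(q, n-q) under multivariate normality; a large value suggests the baseline has not stabilized. All function names are hypothetical, and a small Gauss-Jordan solver stands in for a linear-algebra library.

```python
from fractions import Fraction

def helmert(p):
    """(p-1) x p Helmert contrasts: row j compares visit j with the mean of later visits."""
    return [[Fraction(0)] * (j - 1) + [Fraction(1)] + [Fraction(-1, p - j)] * (p - j)
            for j in range(1, p)]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def hotelling_t2(baselines, contrast):
    """One-sample Hotelling T^2 testing that all contrasts of the baseline means are zero."""
    d = [[sum(float(c) * y for c, y in zip(row, obs)) for row in contrast] for obs in baselines]
    n, q = len(d), len(contrast)
    mean = [sum(col) / n for col in zip(*d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in d) / (n - 1)
            for b in range(q)] for a in range(q)]
    t2 = n * sum(mi * xi for mi, xi in zip(mean, solve(cov, mean)))
    f_stat = (n - q) / (q * (n - 1)) * t2      # ~ F(q, n - q) under H0
    return t2, f_stat

print(helmert(3))
```

Any nonsingular set of p-1 contrasts gives the same T², so the Helmert choice mainly makes each row easy to interpret.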
Combination of stabilized baseline
Clinical judgment
Average of three measurements of diastolic BP taken at 1-minute intervals, if no measurement differs from the average by more than 5 mm Hg.
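The clinical rule for diastolic BP is easy to state as code. A minimal sketch with a hypothetical function name; returning `None` when the rule fails (so that the measurements would be repeated) is an assumption, since the rule itself does not say what to do in that case.

```python
def stabilized_dbp(readings, tol=5.0):
    """Average of repeated diastolic BP readings (mm Hg), or None if any reading
    differs from that average by more than `tol` mm Hg."""
    avg = sum(readings) / len(readings)
    if all(abs(r - avg) <= tol for r in readings):
        return avg
    return None  # assumed handling: baseline not yet stable, repeat the measurements

print(stabilized_dbp([88, 90, 92]))
print(stabilized_dbp([80, 90, 100]))
```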
Statistical justifications
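Generalized least squares procedure (O’Brien, 1984)
Let $Y_i$ be the $p \times 1$ vector of baseline measurements of patient i, $\mathbf{1}_p$ a $p \times 1$ vector of ones, $\Sigma$ the covariance matrix of $Y_i$, and $S$ its estimate.
Generalized least squares: $X_i = (\mathbf{1}_p' \Sigma^{-1} \mathbf{1}_p)^{-1} \mathbf{1}_p' \Sigma^{-1} Y_i$; variance is given as $(\mathbf{1}_p' \Sigma^{-1} \mathbf{1}_p)^{-1}$
Estimated generalized least squares (EGLS): $\hat{X}_i = (\mathbf{1}_p' S^{-1} \mathbf{1}_p)^{-1} \mathbf{1}_p' S^{-1} Y_i$; estimated variance $(\mathbf{1}_p' S^{-1} \mathbf{1}_p)^{-1}$
Simple average: $Z_i = \mathbf{1}_p' Y_i / p$; estimated variance $\mathbf{1}_p' S \mathbf{1}_p / p^2$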
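The EGLS combination is a weighted average with weights S⁻¹1 / (1'S⁻¹1). A sketch, assuming the covariance of the p baseline measurements has already been estimated (for example, the sample covariance across patients); the function names are hypothetical, and a small Gauss-Jordan solver stands in for a linear-algebra library. When S has compound symmetry (equal variances, equal correlations), S·1 is proportional to 1, so the GLS weights are all 1/p and GLS reduces to the simple average; otherwise the GLS variance is never larger than that of the simple average.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def gls_combined_baseline(y, s):
    """EGLS combination of p baselines y given the (estimated) covariance matrix s."""
    s_inv_1 = solve(s, [1.0] * len(y))           # S^{-1} 1
    info = sum(s_inv_1)                          # 1' S^{-1} 1
    weights = [w / info for w in s_inv_1]        # weights sum to 1
    estimate = sum(w * yi for w, yi in zip(weights, y))
    return estimate, 1.0 / info, weights

def simple_average(y, s):
    """Simple average of the baselines and its variance 1' S 1 / p^2."""
    p = len(y)
    return sum(y) / p, sum(sum(row) for row in s) / p ** 2

print(gls_combined_baseline([10, 20], [[1, 0], [0, 4]]))
print(simple_average([10, 20], [[1, 0], [0, 4]]))
```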
Issues of Active Control Trials
Efficacy of a treatment should be established against placebo
Equivalence or non-inferiority of the test treatment to the active control is consistent with
both being superior to placebo
both being inferior to placebo
The efficacy of the active control in the relevant indication should have been clearly established and quantified in well-designed and well-documented superiority trials, and the active control can reliably be expected to exhibit similar efficacy in the contemplated active control trial
Issues of Active Control Trials
Same design features
Inclusion/exclusion criteria
Dose
Primary endpoints
The objective of equivalence or non-inferiority must be stated in the protocol together with the equivalence limits
Inclusion of a concurrent placebo control for internal validity
Superiority of the active concurrent control to placebo
Multicenter Trials
Conducted under a single protocol
SOP for conduct and evaluations
Centralized data management system
Questions for analysis
Separate interpretation for small centers?
Domination by several large centers?
Centers out of line: reasons?
Trend in the wrong direction?
Multicenter Trials
Purpose of analysis of a multicenter trial
Verify a consistent treatment effect
Obtain an estimate of the overall treatment effect
Fixed or random effects for centers
Definition of the overall treatment effect
Stratified analysis
Multicenter Trials
Analysis of a multicenter trial under a mixed effects model can be very complicated (Fleiss, 1986), and only approximate results are available (Chakravorti and Grizzle, 1975; Mielke and McHugh, 1965)
Selection of centers
Requirement of a minimum number of patients
Expertise and experience of investigators
Special equipment provided by center
Multicenter Trials
It is more appropriate to consider center as a fixed effect than as a random effect (Fleiss, 1986; Goldberg and Koury, 1990)
The weights in a stratified analysis may not be optimal in the presence of treatment-by-center interaction; they reflect the effort that centers make to enroll and retain patients and do not represent the composition of the target population.
Descriptive Statistics for Site j of a Multicenter Trial
                          Treatment
Statistics                Placebo   Test Drug   Difference
N                         nPj       nTj
Mean                      ȲPj       ȲTj         dj = ȲTj - ȲPj
Standard deviation        sPj       sTj         sj
Confidence interval       CIPj      CITj        CIj
where
$s_j^2 = \frac{(n_{Pj} - 1) s_{Pj}^2 + (n_{Tj} - 1) s_{Tj}^2}{n_{Pj} + n_{Tj} - 2}$, $w_j = \frac{1}{n_{Pj}} + \frac{1}{n_{Tj}}$,
$CI_{ij} = \bar{Y}_{ij} \pm t(\alpha/2, n_{ij} - 1)\, s_{ij} / \sqrt{n_{ij}}$, $i = P, T$,
$CI_j = d_j \pm t(\alpha/2, n_{Pj} + n_{Tj} - 2)\, s_j \sqrt{w_j}$
Definition of an overall treatment effect
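Simple average of the treatment effects over all centers
Estimate: $\bar{d} = \frac{1}{J} \sum_{j=1}^{J} d_j = \frac{1}{J} \sum_{j=1}^{J} (\bar{Y}_{Tj\cdot} - \bar{Y}_{Pj\cdot})$
Estimated variance: $v(\bar{d}) = \frac{1}{J^2} \sum_{j=1}^{J} s_j^2 w_j$
The test based on the point estimate and its estimated variance is still valid regardless of treatment-by-center interaction and sample size.
Test for qualitative interaction (Gail and Simon, 1985)
$Q^{+} = \sum_{j=1}^{J} \frac{d_j^2}{s_j^2 w_j} I[d_j > 0]$ and $Q^{-} = \sum_{j=1}^{J} \frac{d_j^2}{s_j^2 w_j} I[d_j < 0]$, where $s_j^2 w_j$ is the estimated variance of $d_j$
A qualitative interaction is concluded if min(Q+, Q-) exceeds the critical value tabulated by Gail and Simon (1985).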
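The per-center quantities, the unweighted overall effect, and the two Gail-Simon sums can be computed together. A sketch with hypothetical function names and two made-up centers (center 1 favors the test drug, center 2 favors placebo); it computes Q+ and Q- but not the critical value, which has to be taken from the table in Gail and Simon (1985).

```python
def site_summary(mean_p, sd_p, n_p, mean_t, sd_t, n_t):
    """d_j = mean_T - mean_P, pooled variance s_j^2, and w_j = 1/n_Pj + 1/n_Tj for one center."""
    d = mean_t - mean_p
    s2 = ((n_p - 1) * sd_p ** 2 + (n_t - 1) * sd_t ** 2) / (n_p + n_t - 2)
    w = 1 / n_p + 1 / n_t
    return d, s2, w

def overall_effect(sites):
    """Unweighted average of the center effects and its estimated variance."""
    J = len(sites)
    d_bar = sum(d for d, _, _ in sites) / J
    v = sum(s2 * w for _, s2, w in sites) / J ** 2
    return d_bar, v

def gail_simon_q(sites):
    """Q+ and Q-: squared standardized effects summed over positive and negative centers."""
    q_plus = sum(d * d / (s2 * w) for d, s2, w in sites if d > 0)
    q_minus = sum(d * d / (s2 * w) for d, s2, w in sites if d < 0)
    return q_plus, q_minus

# Two hypothetical centers: (placebo mean, SD, n, test-drug mean, SD, n)
sites = [site_summary(10, 2, 10, 12, 2, 10), site_summary(10, 2, 10, 9, 2, 10)]
print(overall_effect(sites), gail_simon_q(sites))
```

The statistic min(Q+, Q-) is large only when there is strong evidence of effects in both directions, which is what distinguishes a qualitative interaction from centers that merely differ in the size of a same-direction effect.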
Example of Multiplicity
Lepor (NEJM 1996; 335: 533-9)
Targeted population
Patients with benign prostatic hyperplasia
Treatment
Terazosin (10 mg daily): α1 blocker
Finasteride (5 mg daily): 5α-reductase inhibitor
Design: 2×2 factorial design
Group Terazosin Finasteride
Ⅰ Placebo Placebo
Example of Multiplicity
Primary efficacy endpoints
Sum of AUA symptom score
Peak urinary flow rate (ml/sec)
Statistical parameters
Raw measurements Change from baseline
Time of evaluations
Baseline and 2, 4, 13, 26, 39, and 52 weeks post-randomization
Sub-groups
Race: Caucasian vs. non-Caucasian
Age: < 65 vs. >= 65 years old
Summary of Possible Number of Comparisons
Item                            Number of Comparisons
Pairwise comparison
  Primary                       3
  Secondary                     3
Visit                           7
Primary endpoint                2
Response                        2
Race                            2
Baseline Severity of Disease