An Introduction to Statistical Evaluation of Drug Products

(1)

by

Jen-pei Liu, Ph.D.

National Taiwan University and National Health Research Institutes

[email protected] at Feng-Chia University November 17, 2005 Taichung, Taiwan


(2)

(3)

Introduction

Effectiveness of Drug Products

Equivalence of Generic Drug Products

Estimation of Shelf-life of Drug Products

Quality Control of Drug Products

Evaluation of Diagnostic Devices

Summary

(4)

Introduction

Evidence from clinical trials must prove that the drug is efficacious – drug is better than no drug

Inference from the sample (patients in trials) to the targeted population (patients in clinical practice)

A decision process for clinical hypotheses based on the trial objectives through statistical testing procedures

(5)

Introduction - Clinical Trials?

FDA (21 CFR 312.3, April 1994)

A clinical trial is the clinical investigation of a drug which is administered or dispensed to, or used involving, one or more human subjects.

Chow and Liu (2004)

A clinical trial is the clinical investigation in which treatments are administered, dispensed or used involving one or more human subjects for evaluation of the treatments.

(6)

Introduction –

Three Key Components

Experimental unit

A subject from a targeted population under study. For example

Healthy human subjects

(7)

Introduction –

Three Key Components

Treatment

It could be a placebo or any combination of

A new pharmaceutical entity
A new diet
A surgical procedure
A diagnostic test

(8)

Introduction –

Three Key Components

Evaluation

--Efficacy analysis
Clinical endpoints

--Safety assessment
Adverse experience
Laboratory test results

--Quality of life assessment

--Pharmacoeconomics analysis

--Outcomes research

(9)

Introduction – Statistical Designs

Parallel Group Designs

The patients are randomized to one of two or more groups, each group being allocated to a different treatment.

Advantages

Simple and easy to implement.

Less complicated analysis and interpretation.

Drawbacks

Relatively large variability

(10)

(11)

Effectiveness of Drug Products

Example: Farlow et al (JAMA 1992; 268: 2523-2529)

Randomized, double-blind, parallel groups

Objective

To compare tacrine (20, 40, 80 mg per day) versus placebo for probable Alzheimer’s disease

Null hypothesis

No difference in ADAS-cog scale between 80 mg of tacrine and placebo.

Alternative hypothesis

There exists a true difference in ADAS-cog scale between 80 mg of tacrine and placebo.

(12)

Effectiveness of Drug Products

Example: The NINDS rt-PA Stroke Study Group (NEJM 1996; 335: 841-7)

Objective for Part I

To show that a greater proportion of patients with acute ischemic stroke treated with t-PA, as compared with those given placebo, have early improvement (an improvement of >= 4 points from baseline on the NIHSS).

Primary efficacy endpoint

Proportion of patients with improvement

Null hypothesis

No difference in the proportions of patients with improvement between t-PA and placebo.

Alternative hypothesis

The minimal difference in the proportions of patients with improvement between t-PA and placebo is at least 24%.

(13)

Effectiveness of Drug Products

Statistical Hypothesis

H_o: P_T – P_R = 0 vs. H_a: P_T – P_R ≥ 24%

A statistically significant difference indicates that

the new drug is better than the control.

(14)

Effectiveness of Drug Products

Decision Based on Results

True State                | Decision: No difference        | Decision: Minimal difference of 24%
No difference             | Correct                        | Type I Error (false positive)
Minimal difference of 24% | Type II Error (false negative) | Correct

(15)

Effectiveness of Drug Products

Decision Based on Results

Significance level: The consumer’s risk

The chance that the decision based on the results concludes that there is a minimal difference of 24% in improvement between t-PA and placebo when in fact there is no difference.

Power = 1 – producer’s risk

The chance that the decision based on the results concludes that there is a minimal difference of 24% in improvement between t-PA and placebo when in fact there is one.
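The two risks above can be made concrete with a small calculation. The following is a minimal sketch, assuming a large-sample two-sample Z-test for proportions with equal group sizes; the control improvement rate (20%) and the group sizes are hypothetical values chosen only for illustration and are not taken from the NINDS study.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, delta, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample Z-test for proportions.

    Only the upper rejection region is counted; the lower tail is
    negligible when the true difference delta is positive.
    """
    z = NormalDist()
    p_treat = p_control + delta
    p_bar = (p_control + p_treat) / 2
    # Standard error under the null hypothesis (pooled proportion)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error under the alternative hypothesis
    se_alt = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf((z_crit * se_null - delta) / se_alt)


# Hypothetical inputs: 20% improvement on placebo, true difference of 24%
for n in (50, 100, 150):
    print(n, round(power_two_proportions(0.20, 0.24, n), 3))
```

With a true difference of zero the function returns the upper-tail share of the significance level, which is one way to see that the significance level is the probability of a false positive.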

(16)

Effectiveness of Drug Products

Statistical Testing Procedures

Step 1

State the null and alternative hypotheses

Null hypothesis: the one to be questioned

No difference in the proportions of patients with improvement between t-PA and placebo.

Alternative hypothesis: the one of particular interest to investigators

The minimal difference in the proportions of patients with improvement between t-PA and placebo is at least 24%.

(17)

Effectiveness of Drug Products

Statistical Testing Procedures

Step 2

Choose an appropriate test statistic such as the two-sample Z-statistic or t-statistic.

Step 3

Select the nominal significance level

the risk of Type I error you are willing to commit

(18)

Effectiveness of Drug Products

Statistical Testing Procedures

Step 4

Determine the critical value, rejection region and decision rule

For large samples, a two-sided alternative and α = 0.05, the critical value is z(0.025) = 1.96 and the rejection region is the one in which the absolute value of the test statistic is greater than 1.96.

Decision rule

reject the null hypothesis if the resulting test statistic is in the rejection region.

(19)

Effectiveness of Drug Products

Statistical Testing Procedures

Step 1 to step 4 should be determined and

pre-specified in the Statistical Method

section of the protocol before initiation of

the study.

(20)

Effectiveness of Drug Products

Statistical Testing Procedures

Step 5

When the study is completed, compute the value of the test statistic specified in Step 2 (protocol).

Step 6

Make decision based on the resulting value of the test statistic and decision rule specified in Step 4 (protocol).
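Steps 2 to 6 can be sketched in a few lines of code. This is only an illustration, assuming the pooled two-sample Z-statistic for proportions as the pre-specified test; the improvement counts (70 of 150 versus 50 of 150) are hypothetical and are not the NINDS results.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x_t, n_t, x_r, n_r, alpha=0.05):
    """Return (test statistic, critical value, reject H_o?) for a two-sided test."""
    p_t, p_r = x_t / n_t, x_r / n_r
    # Step 2: pooled two-sample Z-statistic for the difference in proportions
    pooled = (x_t + x_r) / (n_t + n_r)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_r))
    z_stat = (p_t - p_r) / se
    # Steps 3 and 4: nominal significance level and critical value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: decision rule, reject if the statistic is in the rejection region
    return z_stat, z_crit, abs(z_stat) > z_crit


# Step 5: compute the statistic once the (hypothetical) data are available
z_stat, z_crit, reject = two_proportion_z_test(70, 150, 50, 150)
print(round(z_stat, 2), round(z_crit, 2), reject)
```

The function takes the significance level as an argument so that the decision rule is fixed before the data are examined, mirroring the pre-specification in the protocol.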

(21)

Effectiveness of Drug Products

Statistical Testing Procedures

Conclusion

Reject the null hypothesis

Sampling error is an unlikely explanation of the discrepancy between the null hypothesis and the observed values, and the alternative hypothesis is concluded at a 5% risk of Type I error.

Fail to reject null hypothesis

The sampling error is a likely explanation and the data fail to provide sufficient evidence to doubt the validity of the null hypothesis.

(22)

Effectiveness of Drug Products

P - value

If there is no difference in the proportions of patients with improvement between the two groups (i.e., the null hypothesis is true), the p-value is the chance of obtaining a difference at least as large as the observed difference.

If the p-value is small, it implies that the observed difference is unlikely to occur if there is no difference in the proportions of patients with improvement between t-PA and placebo.

(23)

Effectiveness of Drug Products

P - value

How small must the p-value be to conclude that there exists a true difference in the proportions of patients with improvement between t-PA and placebo?

It depends upon the risk that the investigator is willing to take of committing a Type I error.

Nominal significance level = risk of Type I error

(The chance of concluding existence of a true difference in the proportions of patients with improvement between t-PA and placebo when in fact there is no difference)

(24)

Effectiveness of Drug Products

P - value

If the observed p-value < the nominal significance level (i.e., the observed p-value < risk of Type I error), then conclude there exists a true difference in the proportions of patients with improvement between t-PA and placebo

The nominal significance level = 5% or 1%

The p-value for the observed difference in the proportions of patients with improvement between t-PA and placebo is 0.015.

If the nominal significance level is 5%, then it is concluded that there is a difference in the proportions of patients with improvement between t-PA and placebo in the target population of patients with acute ischemic stroke.

(25)

Effectiveness of Drug Products

P - value

We cannot make the same decision if the nominal significance level is chosen to be 1%.

Always report the observed p-value and let readers and reviewers judge the strength of evidence for themselves; do not report only “p-value < 0.05”.
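The dependence of the decision on the chosen significance level can be shown directly. The sketch below assumes a two-sided test based on a standard normal statistic; the helper names are illustrative, and only the p-value of 0.015 comes from the slides.

```python
from statistics import NormalDist


def two_sided_p_value(z_stat):
    """Chance, under H_o, of a statistic at least as extreme as z_stat."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


def decide(p_value, alpha):
    """Compare the observed p-value with the nominal significance level."""
    return "reject H_o" if p_value < alpha else "fail to reject H_o"


observed_p = 0.015  # p-value quoted for the t-PA example
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(observed_p, alpha)}")
```

The same p-value leads to opposite decisions at 5% and 1%, which is why the observed value itself should be reported.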

(26)

Equivalence of Generic Drug Products

New Drug Development (Innovative Drugs)

Length: an average of 12 years

Cost: an average of 800 million US dollars

Success rate:
1 out of 10000 molecules screened
60% failure rate during clinical development

(27)

Equivalence of Generic Drug Products

Abbreviated New Drug Application (Generic

Drugs)

After the patent of the innovative drug has expired, all other manufacturers can produce the same drug product

Patents of most innovative drugs expire by 2005: big market

Requires evidence of bioequivalence between

(28)

Equivalence of Generic Drug Products

Pharmacokinetic Measures

Absorption
Distribution
Metabolism
Elimination

Based on the plasma concentrations of active ingredients C₀, C₁,…,C_K measured at 0,

(29)

Equivalence of Generic Drug Products

Total Exposures

AUC (0-t_K), AUC (0-∞)

Peak Exposure

C_max – peak drug concentration

Partial Exposure

Partial AUC: AUC(0-t_i)

Other Measures
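The exposure measures above are computed from the concentration-time profile. The following is a minimal sketch, assuming the linear trapezoidal rule for AUC(0-t_K) (a commonly used method that the slides do not name); the sampling times and concentrations are hypothetical.

```python
def auc_trapezoid(times, conc):
    """AUC(0-t_K) by the linear trapezoidal rule over the sampling times."""
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need matching time and concentration lists")
    total = 0.0
    for i in range(1, len(times)):
        total += (times[i] - times[i - 1]) * (conc[i] + conc[i - 1]) / 2
    return total


def c_max(times, conc):
    """Peak concentration and the time at which it is first observed."""
    peak = max(conc)
    return peak, times[conc.index(peak)]


# Hypothetical profile: hours after dosing and plasma concentrations
times = [0, 0.5, 1, 2, 4, 8]
conc = [0, 4, 6, 5, 3, 1]
print(auc_trapezoid(times, conc), c_max(times, conc))
```

A partial AUC, AUC(0-t_i), follows from the same function by passing only the observations up to time t_i.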

(30)

Equivalence of Generic Drug Products

Equivalence hypothesis

θ = μ_T - μ_R

H_o: μ_T - μ_R ≤ θ_L or μ_T - μ_R ≥ θ_U

vs. H_a: θ_L < μ_T - μ_R < θ_U

(31)

Equivalence of Generic Drug Products

- Average Bioequivalence

Two one-sided hypotheses:

H_oL: μ_T - μ_R ≤ θ_L vs. H_aL: μ_T - μ_R > θ_L

and

H_oU: μ_T - μ_R ≥ θ_U vs. H_aU: μ_T - μ_R < θ_U

The parameter space of H_o is the union of the parameter spaces of H_oL and H_oU.

The parameter space of H_a is the intersection of the parameter spaces of H_aL and H_aU.

(32)

Equivalence of Generic Drug Products

- Average Bioequivalence

Schuirmann’s Two One-sided Tests Procedure (TOST, 1987)

Conclude ABE if

T_L = (f - θ_L)/v(f) > t(α, n₁+n₂–2)

and

T_U = (f - θ_U)/v(f) < -t(α, n₁+n₂–2),

where f is the LSE for θ

(33)

Equivalence of Generic Drug Products

- Average Bioequivalence

Confidence Interval Approach

If a (1-2α)100% confidence interval for the difference μ_T - μ_R or the ratio μ’_T/μ’_R is within the acceptance limits recommended by the regulatory agency, then accept the test formulation; otherwise reject it (Westlake, 1981).

α = 5% ⇒ 90% C.I.

log-scale: μ_T - μ_R: ±0.2231

Original Scale: μ’_T/μ’_R: (80%, 125%)

TOST is operationally equivalent to CI approach

This is required by most health regulatory agencies in the world
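The operational equivalence of the two approaches can be checked numerically. This is a sketch under stated assumptions: `diff_log` stands for the estimate f on the log scale, `se` for its standard error, and `t_crit` for t(α, n₁+n₂–2); the value 1.7171 is the tabulated 5% upper t quantile for 22 degrees of freedom, and the estimate and standard error are hypothetical.

```python
from math import log

# Acceptance limits on the log scale: ln(0.80) and ln(1.25), i.e. -0.2231 and 0.2231
THETA_L, THETA_U = log(0.80), log(1.25)


def abe_by_confidence_interval(diff_log, se, t_crit):
    """Conclude ABE if the (1-2*alpha)100% CI lies within (THETA_L, THETA_U)."""
    lower = diff_log - t_crit * se
    upper = diff_log + t_crit * se
    return THETA_L < lower and upper < THETA_U


def abe_by_tost(diff_log, se, t_crit):
    """Conclude ABE if both one-sided tests reject their null hypotheses."""
    t_lower = (diff_log - THETA_L) / se
    t_upper = (diff_log - THETA_U) / se
    return t_lower > t_crit and t_upper < -t_crit


# Hypothetical estimate and standard error; t_crit for 22 degrees of freedom
for diff in (0.05, 0.15):
    print(diff, abe_by_confidence_interval(diff, 0.06, 1.7171), abe_by_tost(diff, 0.06, 1.7171))
```

Both functions return the same decision for every input because each inequality in one is an algebraic rearrangement of the corresponding inequality in the other.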

(34)

Estimation of Shelf-life

Shelf-life (expiration dating period)

Time interval during which a drug product is expected to remain within the specifications, provided that it is stored under the conditions defined on the container label

Expiration date

The date placed on the container label of a drug product designating the time prior to which a batch of the product is expected to remain within the approved shelf-life specification, if stored under defined conditions, and after which it must not be used.

(35)

Estimation of Shelf-life

ICH Q1A(R2) guidance (2003) P.16

“An approach for analyzing data of a quantitative attribute that is expected to change with time is to determine the time at which the 95% one-sided confidence limit for the mean curve intersects the acceptance criterion”

ICH Q1E guidance (2004) p.11

A two-sided 95% confidence interval or a 95% one-sided upper or lower confidence limit can also be used.

One-sided lower limit: known degradation
One-sided upper limit: known impurities
Two-sided interval: unknown situation about increase or decrease of the assay with time

(36)

Estimation of Shelf-life

[Figure: degradation curve of % of label claim against storage time (months; axis ticks 0, 3, 6, 9, 12), showing the lower specification limit and the 95% lower confidence limit]

(37)

2006/8/24 Copyright by Jen-pei Liu, PhD

Estimation of Shelf-life

Only consider the case where the drug product

characteristic decreases linearly with time.

Model:

$$Y_j = \alpha + \beta X_j + \varepsilon_j, \quad j = 1, 2, \ldots, n$$

Y_j: jth response of assay at time X_j,
α: intercept (batch effect),
β: slope (degradation rate),
X_j: time at which Y_j is observed,
ε_j: random error ~ N(0, σ²).

(38)

Estimation of Shelf-life

Construct a (1-2α)100% C.I. for X for which the pth upper quantile of the distribution of Y given X is equal to some specified value η.

The pth upper quantile of the distribution of Y given X is α+βX+σz_p, where z_p is the pth upper quantile of a standard normal distribution.

The values of X for which the hypothesis H₀: [(η - α - z_pσ)/ β] ≤ X

is not rejected at the 2α significance level will constitute a (1-2α)100% C.I. for X.

(39)

Estimation of Shelf-life

Stability study: mean degradation => p = 0.5 => z_p = 0.

H₀: [(η - α)/β] ≤ X
=> H₀: η - α – βX ≤ 0 vs. H_a: η - α – βX > 0
=> H₀: α + βX ≥ η vs. H_a: α + βX < η
=> H₀: (η – α)/β ≤ X vs. H_a: (η – α)/β > X
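For a single batch with linear degradation, the ICH approach can be sketched as follows: fit the line by least squares and find the time at which the 95% one-sided lower confidence limit for the mean falls below the specification η. The assay data, the limit η = 90% of label claim, and the grid search are illustrative assumptions; the t quantile is passed in from a table (2.132 for 4 degrees of freedom).

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit of y = a + b*x plus the quantities the limit needs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(rss / (n - 2))  # residual standard deviation
    return a, b, s, x_bar, sxx, n


def lower_limit(x0, fit, t_crit):
    """One-sided lower confidence limit for the mean response at time x0."""
    a, b, s, x_bar, sxx, n = fit
    return a + b * x0 - t_crit * s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


def shelf_life(x, y, eta, t_crit, step=0.01, max_time=120.0):
    """First grid time at which the lower limit falls below eta (None if never)."""
    fit = fit_line(x, y)
    for k in range(int(max_time / step) + 1):
        t = k * step
        if lower_limit(t, fit, t_crit) < eta:
            return t
    return None


# Hypothetical stability data: months on storage and assay (% of label claim)
months = [0, 3, 6, 9, 12, 18]
assay = [100.1, 99.2, 98.4, 97.3, 96.6, 94.8]
estimate = shelf_life(months, assay, eta=90.0, t_crit=2.132)
print(estimate)
```

Because the confidence limit lies below the fitted line, the estimated shelf-life is always shorter than the time at which the fitted line itself reaches η.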

(40)

Estimation of Shelf-life

Stability study: mean degradation => p = 0.5 => z_p = 0.

H₀: [(η - α)/ β ] ≤ X => H₀: η - α – βX ≤ 0

The set of values of X for which H₀ is not rejected at the 2α significance level is

A = {X: [η - (a + bX)]² ≤ t²

(41)

Estimation of Shelf-life

Common intercept, common slope
Different intercepts, common slope
Common intercept, different slopes
Different intercepts, different slopes

(42)

Quality Control of Drug Products

Sampling Plan and Acceptance Criteria

Content uniformity of dosage units
USP/NF general chapter [905]

Dissolution Testing
USP/NF general chapter [711]

Disintegration Testing

(43)


Disintegration Testing

USP/NF general chapter [711]

p9

(44)

(45)

Disintegration Testing

Let Y be the disintegration time. Again we assume that Y follows a normal distribution with mean μ and variance σ².

Also, let p = P{0 < Y < UL},

where UL denotes the specified limit. Since the disintegration test involves only one acceptance criterion at both stages of the sampling plan, the exact probability can be computed. Let

C₁₁ = {all six units disintegrate completely},
C₁₂ = {one unit fails to disintegrate completely},
C₁₃ = {two units fail to disintegrate completely},
C₂₁ = {11 of 12 additional units disintegrate completely},
C₂₂ = {all 12 additional units disintegrate completely}.

(46)

Then the exact probability of passing the disintegration test is given as follows:

$$
\begin{aligned}
P\{\text{pass}\} &= P\{C_{11}\} + P\{C_{21} \cup C_{22} \mid C_{12}\}\,P\{C_{12}\} + P\{C_{22} \mid C_{13}\}\,P\{C_{13}\} \\
&= p^{6} + \left\{\binom{12}{11} p^{11}(1-p) + p^{12}\right\}\left\{\binom{6}{1} p^{5}(1-p)\right\} + p^{12}\left\{\binom{6}{2} p^{4}(1-p)^{2}\right\} \\
&= p^{6} + 6p^{17}(1-p) + 87p^{16}(1-p)^{2}.
\end{aligned}
$$
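The closed form can be checked against the stage-by-stage events, and the value of p that gives a 50% chance of passing can be found numerically. This is a minimal sketch; the function names and the bisection search are illustrative choices.

```python
from math import comb


def pass_probability(p):
    """Closed form: p^6 + 6 p^17 (1-p) + 87 p^16 (1-p)^2."""
    return p ** 6 + 6 * p ** 17 * (1 - p) + 87 * p ** 16 * (1 - p) ** 2


def pass_probability_by_stages(p):
    """Same probability assembled from the events C11, C12, C13, C21, C22."""
    q = 1 - p
    c11 = p ** 6                          # all six units disintegrate
    c12 = comb(6, 1) * p ** 5 * q         # one of six fails
    c13 = comb(6, 2) * p ** 4 * q ** 2    # two of six fail
    c21 = comb(12, 11) * p ** 11 * q      # 11 of 12 additional units disintegrate
    c22 = p ** 12                         # all 12 additional units disintegrate
    return c11 + (c21 + c22) * c12 + c22 * c13


def solve_p(target=0.5, tol=1e-10):
    """Bisection for the p at which the passing probability equals target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pass_probability(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(solve_p(0.5), 3))
```

Bisection is sufficient here because the passing probability increases with p over the interval from 0 to 1.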

(47)

It can easily be verified that if the desired probability of passing the disintegration test is 0.5, p is approximately 0.831. If, in addition, the specified time limit, UL, is 30 min, it follows that

$$p = P\{Y < UL\} = P\{Y < 30\} = P\left\{\frac{Y - \mu}{\sigma} < \frac{30 - \mu}{\sigma}\right\} = P\{Z < Z(0.169)\} = 0.831,$$

where Z is a standard normal variable and Z(0.169) is the 16.9% upper quantile of a standard normal distribution. Therefore

$$\frac{30 - \mu}{\sigma} = Z(0.169) = 0.957.$$

Hence the contour for μ and σ² is a linear decreasing function of σ given by

$$\mu = 30 - 0.957\,\sigma,$$

where 0.957 = Z(0.169).
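The contour can be reproduced with the standard normal quantile function. The sketch below assumes, as the derivation does, that P{Y < 0} is negligible so that p = P{Y < UL}; the function name and the chosen values of σ are illustrative.

```python
from statistics import NormalDist


def mean_on_contour(sigma, p=0.831, upper_limit=30.0):
    """Mean disintegration time giving P{Y < upper_limit} = p for a given sigma."""
    z = NormalDist().inv_cdf(p)  # upper (1-p) quantile of the standard normal
    return upper_limit - z * sigma


z_quantile = NormalDist().inv_cdf(0.831)
print(round(z_quantile, 3))
for sigma in (1.0, 2.0, 4.0):
    print(sigma, round(mean_on_contour(sigma), 2))
```

Larger variability in disintegration time therefore requires a smaller mean to keep the same 50% chance of passing.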

(48)

(49)

(50)

Simplest Situation: Binary Outcomes from marker test (+, -), Binary Classification of Disease (Yes, No)

Design Matrix for Diagnostic Marker Tests

Diagnosis Made from Marker Test | Correct (“Gold Standard”) True State of Disease
                                | Present (D) | Absent (D̄) | Total
Positive (T)                    | a (1-β)     | b (α)       | m₁
Negative (T̄)                   | c (β)       | d (1-α)     | m₂
Total                           | n₁          | n₂          | N

(51)

Evaluation of Diagnostic Devices

Retrospective Sampling Plan (case-control)

Sensitivity (True Positive rate): Capacity for making a correct diagnosis in subjects with the disease

Estimated Sensitivity: 100% x a/(a+c)

Specificity (True Negative rate): Capacity for making a correct diagnosis in subjects without the disease

Estimated Specificity: 100% x d/(b+d)

(52)

Evaluation of Diagnostic Devices

Positive Predictive Value (Positive Predictive Accuracy): the proportion of subjects with the disease given the positive results.
= 100% x a/(a+b)

Negative Predictive Value (Negative Predictive Accuracy): the proportion of subjects without the disease given the negative results.
= 100% x d/(c+d)

False positive rate: given the positive results, the proportion of subjects without the disease
= 1 – positive predictive value = 100% x b/(a+b)

False negative rate: given the negative results, the proportion of subjects with the disease
= 1 – negative predictive value = 100% x c/(c+d)

(53)

Other Definitions of False Positive Rate and False Negative Rate

False positive rate: given the subjects without the disease, the proportion of subjects with positive results = b/(b+d) = b/n₂

False negative rate: given the subjects with the disease, the proportion of subjects with negative results = c/(a+c) = c/n₁

False positive rate = 1 - specificity
False negative rate = 1 - sensitivity

(54)

Evaluation of Diagnostic Devices

Example (Feinstein, 2002)

New Marker Test Result | Diseased Cases | Nondiseased Control | Total
Positive               | 46             | 2                   | 48
Negative               | 4              | 48                  | 52
Total                  | 50             | 50                  | 100

(55)

Evaluation of Diagnostic Devices

Data from Example 2 (Feinstein, 2002)

Sensitivity = 100% x 46/50 = 92.0%
Specificity = 100% x 48/50 = 96.0%
Prevalence = 100% x 50/100 = 50.0%
Positive Predictive Value = 100% x 46/48 = 95.8%
Negative Predictive Value = 100% x 48/52 = 92.3%
False Positive Rate = 100% x 2/48 = 4.2%
False Negative Rate = 100% x 4/52 = 7.7%
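The figures above follow directly from the four cell counts of the 2 x 2 table. The sketch below uses the cell labels a, b, c, d from the design matrix and the predictive-value-based definitions of the false positive and false negative rates; the function name is illustrative.

```python
def diagnostic_summary(a, b, c, d):
    """Accuracy measures from a 2 x 2 table.

    a: diseased and test positive, b: nondiseased and test positive,
    c: diseased and test negative, d: nondiseased and test negative.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "prevalence": (a + c) / (a + b + c + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "false_positive_rate": b / (a + b),  # 1 - positive predictive value
        "false_negative_rate": c / (c + d),  # 1 - negative predictive value
    }


# Cell counts from the Feinstein (2002) example
summary = diagnostic_summary(a=46, b=2, c=4, d=48)
for name, value in summary.items():
    print(f"{name}: {100 * value:.1f}%")
```

Note that the prevalence of 50% is fixed by the case-control design, so the predictive values from such a sample do not carry over to a population with a different prevalence.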

(56)

Evaluation of Diagnostic Devices

Type of Diagnostic Markers

Binary Test Results (+,-)

Multiple Categorical Results
Abnormality Rating
Severity Rating
Urine test: None, trace, 1+, 2+
HER2 test: 0, 1+, 2+, 3+

Continuous Test Results
PSA
Intraocular Pressure
Glucose tolerance test
Gene expression level

(57)

Evaluation of Diagnostic Devices

To convert a ranking scale or a continuous measurement into a binary outcome (+,–), we need a cutoff point or threshold.

Example:

FBG > 126mg/dL DM (+)

≤ 126mg/dL DM (–)

S-T Depression in Exercise Stress Test Class D < 1.5 min CAD (+)

(58)

Evaluation of Diagnostic Devices

At a specific threshold, the relationship among sensitivity, specificity, and the false positive and false negative rates can be interpreted through hypothesis testing:

H0: Absence of the disease
H1: Presence of the disease

α = Pr[Type I Error] = Pr[test positive | no disease]
β = Pr[Type II Error] = Pr[test negative | disease]

(59)

Evaluation of Diagnostic Devices

[Figure: distributions of the variable X for the normal (mean μ_N) and diseased (mean μ_D) populations, with a threshold marking α, β, specificity = 1-α, and sensitivity = 1-β]

(60)

Evaluation of Diagnostic Devices

Sensitivity = Pr[test positive | disease] = 1 – β = power of the statistical procedure

Specificity = Pr[test negative | no disease] = 1 – α

α↑ ⇒ β↓ ⇒ (1-β)↑

A test with a high sensitivity also has a high incorrect positive rate but a low incorrect negative rate.

A test with a high specificity also has a high incorrect

(61)

Evaluation of Diagnostic Devices

At each individual threshold (cut-off), sensitivity and

specificity can be computed.

A Receiver Operating Characteristic (ROC) curve is a

graphic presentation of sensitivity against 1-specificity.

It is a path in the unit square, from the lower left

corner to the upper right corner. In fact, it can be viewed as a cumulative distribution function.
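An empirical ROC curve can be traced by sweeping the threshold over the observed marker values. The following is a minimal sketch, assuming that higher marker values indicate disease and that a result is positive when the value is at or above the threshold; the marker values and disease labels are hypothetical, and the area is computed by the trapezoidal rule.

```python
def roc_points(scores, diseased):
    """(1 - specificity, sensitivity) pairs, from lower left to upper right."""
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step moves along the curve
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, diseased) if d and s >= threshold)
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical marker values; 1 = diseased, 0 = nondiseased
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
diseased = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, diseased)
print(curve[0], curve[-1], area_under_curve(curve))
```

The first and last points are always (0, 0) and (1, 1), corresponding to a threshold above every observed value and a threshold at the smallest observed value.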

(62)

Evaluation of Diagnostic Devices

In a useless marker test, the ROC curve will be a straight line at a 45° angle.

The area under the ROC curve provides a summary index for diagnostic accuracy across all possible threshold values.

The range of the area under the ROC curve is from 0.5 (50%) to 1.0 (100%)

In a useless marker test, the area under the ROC curve is 50%, which is the same as flipping a fair coin.

For non-inferiority or equivalence test based on the paired

ROC curve area, see Liu, et al. (2005, Statistics in

(63)

[Figure: ROC curves (sensitivity against 1-specificity, both axes from 0.0 to 1.0) for a perfect test, an ordinary test, and a useless test. Source: Feinstein (2002)]

(64)

(65)

Summary

Descriptive Statistics

Description of characteristics and

estimation of special attributes of drug and device products

Inferential Statistics

Decision-making tool for approval of drug and device products for marketing

(66)

References

Chow, SC and Liu, JP (2004) Design and Analysis of Clinical Trials, 2nd Ed. Wiley

Chow, SC and Liu, JP (2000) Design and Analysis of Bioavailability and Bioequivalence Studies, Marcel Dekker, Inc.

Chow, SC and Liu, JP (1995) Statistical Design and Analysis in Pharmaceutical Sciences, Marcel Dekker, Inc.