
An Introduction to Statistical Evaluation of Drug Products


(1)

by Jen-pei Liu, Ph.D.

National Taiwan University and National Health Research Institutes

[email protected]

at Feng-Chia University, Taichung, Taiwan, November 17, 2005

An Introduction to Statistical

Evaluation of Drug Products

(2)
(3)

Introduction

Effectiveness of Drug Products

Equivalence of Generic Drug Products

Estimation of Shelf-life of Drug Products

Quality Control of Drug Products

Evaluation of Diagnostic Devices

Summary

(4)

Introduction

• Evidence from clinical trials must prove that the drug is efficacious: the drug is better than no drug

• Inference from the sample (patients in trials) to the targeted population (patients in clinical practice)

• A decision process for clinical hypotheses based on the trial objectives through statistical testing procedures

(5)

Introduction - Clinical Trials?

• FDA (21 CFR 312.3, April 1994)

A clinical trial is the clinical investigation of a drug which is administered or dispensed to, or used involving, one or more human subjects.

• Chow and Liu (2004)

A clinical trial is the clinical investigation in which treatments are administered, dispensed, or used involving one or more human subjects for evaluation of the treatments.

(6)

Introduction –

Three Key Components

• Experimental unit

A subject from a targeted population under study, for example:

• Healthy human subjects

(7)

Introduction –

Three Key Components

• Treatment

It could be a placebo or any combination of:

• A new pharmaceutical entity

• A new diet

• A surgical procedure

• A diagnostic test

(8)

Introduction –

Three Key Components

• Evaluation

-- Efficacy analysis

• Clinical endpoints

-- Safety assessment

• Adverse experience

• Laboratory test results

-- Quality of life assessment

-- Pharmacoeconomics analysis

-- Outcomes research

(9)

Introduction – Statistical Designs

• Parallel Group Designs

The patients are randomized to one of two or more groups, each group being allocated to a different treatment.

• Advantages

• Simple and easy to implement.

• Less complicated analysis and interpretation.

• Drawbacks

• Relatively large variability

(10)
(11)

Effectiveness of Drug Products

• Example: Farlow et al. (JAMA 1992; 268: 2523-2529)

• Randomized, double-blind, parallel groups

• Objective

To compare tacrine (20, 40, 80 mg per day) versus placebo for probable Alzheimer's disease

• Null hypothesis

No difference in ADAS-cog scale between 80 mg of tacrine and placebo.

• Alternative hypothesis

There exists a true difference in ADAS-cog scale between 80 mg of tacrine and placebo.

(12)

Effectiveness of Drug Products

• Example: The NINDS rt-PA Stroke Study Group (NEJM 1996; 335: 841-7)

• Objective for Part I

A greater proportion of patients with acute ischemic stroke treated with t-PA, as compared with those given placebo, have early improvement (an improvement of at least 4 points from baseline on the NIHSS).

• Primary efficacy endpoint

Proportion of patients with improvement

• Null hypothesis

No difference in the proportions of patients with improvement between t-PA and placebo.

• Alternative hypothesis

The minimal difference in the proportions of patients with improvement between t-PA and placebo is at least 24%.

(13)

Effectiveness of Drug Products

Statistical Hypothesis

$H_0: P_T - P_R = 0$ vs. $H_a: P_T - P_R \ge 24\%$

A statistically significant difference indicates that

the new drug is better than the control.

(14)

Effectiveness of Drug Products

Decision Based on Results

True State                   Decision: No difference          Decision: Minimal difference of 24%

No difference                Correct                          Type I Error (false positive)

Minimal difference of 24%    Type II Error (false negative)   Correct

(15)

Effectiveness of Drug Products

Decision Based on Results

• Significance level: The consumer's risk

The chance that the decision based on the results concludes there is a minimal difference of 24% in improvement between t-PA and placebo when in fact there is no difference.

• Power = 1 – producer's risk

The chance that the decision based on the results concludes there is a minimal difference of 24% in improvement between t-PA and placebo when in fact there is.
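As a rough illustration of how the significance level and power interact, the sketch below approximates the power of a two-sided two-sample z-test for proportions. The function name, the 20% placebo response rate, and the sample sizes are assumptions made only for illustration; they are not taken from the NINDS study.

```python
from statistics import NormalDist


def power_two_proportions(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for proportions.

    Normal approximation with equal group sizes (hypothetical helper).
    """
    z = NormalDist()
    delta = p_treatment - p_control
    p_bar = (p_control + p_treatment) / 2
    # Standard error under the null hypothesis (pooled proportion)
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    # Standard error under the alternative (separate proportions)
    se_alt = (p_control * (1 - p_control) / n_per_group
              + p_treatment * (1 - p_treatment) / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls in either tail of the rejection region
    return (z.cdf((delta - z_crit * se_null) / se_alt)
            + z.cdf((-delta - z_crit * se_null) / se_alt))


# Assumed scenario: 20% improvement on placebo, 44% on treatment (a 24% difference)
power_small = power_two_proportions(0.20, 0.44, n_per_group=50)
power_large = power_two_proportions(0.20, 0.44, n_per_group=150)
```

When the two true proportions are equal, the function returns the significance level itself, which matches the definition of the consumer's risk above.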

(16)

Effectiveness of Drug Products

Statistical Testing Procedures

• Step 1

State the null and alternative hypotheses

• Null hypothesis: the one to be questioned

No difference in the proportions of patients with improvement between t-PA and placebo.

• Alternative hypothesis: the one of particular interest to investigators

The minimal difference in the proportions of patients with improvement between t-PA and placebo is at least 24%.

(17)

Effectiveness of Drug Products

Statistical Testing Procedures

• Step 2

Choose an appropriate test statistic, such as the two-sample Z-statistic or t-statistic.

• Step 3

• Select the nominal significance level: the risk of type I error you are willing to commit

(18)

Effectiveness of Drug Products

Statistical Testing Procedures

• Step 4

• Determine the critical value, rejection region and decision rule

For large samples, a two-sided alternative and α = 0.05, the critical value is z(0.025) = 1.96 and the rejection region is the one such that the absolute value of the test statistic is greater than 1.96.

• Decision rule

Reject the null hypothesis if the resulting test statistic is in the rejection region.
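The critical value and decision rule of Step 4 can be sketched with the Python standard library. The function names are hypothetical, and the sketch assumes the large-sample normal approximation and a two-sided alternative.

```python
from statistics import NormalDist


def critical_value(alpha=0.05, two_sided=True):
    """Upper critical value of the standard normal distribution."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def reject_null(test_statistic, alpha=0.05):
    """Two-sided decision rule: reject when |statistic| exceeds the critical value."""
    return abs(test_statistic) > critical_value(alpha)


z_crit = critical_value(0.05)  # close to the 1.96 quoted in Step 4
```

Because both the level and the rule are fixed in advance, the decision in Step 6 is a direct lookup once the statistic is computed.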

(19)

Effectiveness of Drug Products

Statistical Testing Procedures

Steps 1 to 4 should be determined and pre-specified in the Statistical Method section of the protocol before initiation of the study.

(20)

Effectiveness of Drug Products

Statistical Testing Procedures

• Step 5

When the study is completed, compute the value of the test statistic specified in Step 2 (protocol).

• Step 6

Make the decision based on the resulting value of the test statistic and the decision rule specified in Step 4 (protocol).

(21)

Effectiveness of Drug Products

Statistical Testing Procedures

• Conclusion

• Reject the null hypothesis

Sampling error is an unlikely explanation of the discrepancy between the null hypothesis and the observed values, and the alternative hypothesis is accepted at a 5% risk of type I error.

• Fail to reject the null hypothesis

Sampling error is a likely explanation, and the data fail to provide sufficient evidence to doubt the validity of the null hypothesis.

(22)

Effectiveness of Drug Products

P - value

• The p-value is the chance of obtaining a difference at least as large as the observed difference if there is no difference in the proportions of patients with improvement between the two groups (i.e., the null hypothesis is true).

• If the p-value is small, it implies that the observed difference is unlikely to occur if there is no difference in the proportions of patients with improvement between t-PA and placebo.
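A minimal sketch of a p-value calculation for a difference in two proportions, using the pooled two-sample z-test. The counts below (60 of 100 versus 45 of 100) are invented for illustration and are not the NINDS trial data; the function name is also an assumption.

```python
from statistics import NormalDist


def two_proportion_p_value(x_t, n_t, x_r, n_r):
    """Two-sided p-value for H0: P_T - P_R = 0 (pooled z-test)."""
    p_t, p_r = x_t / n_t, x_r / n_r
    pooled = (x_t + x_r) / (n_t + n_r)
    # Standard error of the difference when the null hypothesis is true
    se = (pooled * (1 - pooled) * (1 / n_t + 1 / n_r)) ** 0.5
    z = (p_t - p_r) / se
    # Chance of a difference at least this large in either direction under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60/100 improved on treatment, 45/100 on placebo
z_stat, p_val = two_proportion_p_value(60, 100, 45, 100)
```

The resulting p-value is then compared with the nominal significance level chosen in the protocol.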

(23)

Effectiveness of Drug Products

P - value

• How small must the p-value be to conclude that there exists a true difference in the proportions of patients with improvement between t-PA and placebo?

• It depends upon the risk that the investigator is willing to take of committing a type I error.

• Nominal significance level = risk of type I error (the chance of concluding existence of a true difference in the proportions of patients with improvement between t-PA and placebo when in fact there is no difference)

(24)

Effectiveness of Drug Products

P - value

• If the observed p-value < the nominal significance level (i.e., the observed p-value < risk of type I error), then conclude there exists a true difference in the proportions of patients with improvement between t-PA and placebo

• The nominal significance level = 5% or 1%

• The p-value for the observed difference in the proportions of patients with improvement between t-PA and placebo is 0.015.

• If the nominal significance level is 5%, then it is concluded that there is a difference in the proportions of patients with improvement between t-PA and placebo in the target population of patients with acute ischemic stroke.

(25)

Effectiveness of Drug Products

P - value

• We cannot make the same decision if the nominal significance level is chosen to be 1%.

• Always report the observed p-value and let readers and reviewers judge the strength of evidence by themselves; do not report only "p-value < 0.05".

(26)

Equivalence of Generic Drug Products

• New Drug Development (Innovative Drugs)

• Length: an average of 12 years

• Cost: an average of 800 million US dollars

• Success rate:

• 1 out of 10000 molecules screened

• 60% failure rate during clinical development

(27)

Equivalence of Generic Drug Products

• Abbreviated New Drug Application (Generic Drugs)

• After the patent of the innovative drug has expired, all other manufacturers can produce the same drug product

• Patents of most innovative drugs expire by 2005: big market

• Requires evidence of bioequivalence between

(28)

Equivalence of Generic Drug Products

• Pharmacokinetic Measures

• Absorption

• Distribution

• Metabolism

• Elimination

Based on the plasma concentrations of active ingredients C0, C1,…,CK measured at 0,

(29)

Equivalence of Generic Drug Products

• Total Exposures

• AUC (0-tK), AUC (0-∞)

• Peak Exposure

• Cmax – peak drug concentration

• Partial Exposure

• Partial AUC: AUC(0-ti)

• Other Measures

(30)

Equivalence of Generic Drug Products

Equivalence hypothesis

$\theta = \mu_T - \mu_R$

$H_0: \mu_T - \mu_R \le \theta_L$ or $\mu_T - \mu_R \ge \theta_U$

vs. $H_a: \theta_L < \mu_T - \mu_R < \theta_U$

(31)

Equivalence of Generic Drug Products

- Average Bioequivalence

Two one-sided hypotheses:

$H_{0L}: \mu_T - \mu_R \le \theta_L$ vs. $H_{aL}: \mu_T - \mu_R > \theta_L$

and

$H_{0U}: \mu_T - \mu_R \ge \theta_U$ vs. $H_{aU}: \mu_T - \mu_R < \theta_U$

The parameter space of $H_0$ is the union of the parameter spaces of $H_{0L}$ and $H_{0U}$.

The parameter space of $H_a$ is the intersection of the parameter spaces of $H_{aL}$ and $H_{aU}$.

(32)

Equivalence of Generic Drug Products

- Average Bioequivalence

Schuirmann’s Two One-sided Tests

Procedure (TOST, 1987)

Conclude ABE if

$T_L = (f - \theta_L)/v(f) > t(\alpha, n_1 + n_2 - 2)$

and

$T_U = (f - \theta_U)/v(f) < -t(\alpha, n_1 + n_2 - 2)$,

where f is the LSE for θ

(33)

Equivalence of Generic Drug Products

- Average Bioequivalence

Confidence Interval Approach

If a (1-2α)100% confidence interval for the difference μT - μR or the ratio μ’T/μ’R is within the acceptance limits as recommended by the regulatory agency, then accept the test formulation; otherwise reject it. Westlake (1981)

α = 5% ⇒ 90% C.I.

log-scale: μT - μR: ±0.2231

Original Scale: μ’T/μ’R: (80%, 125%)

TOST is operationally equivalent to CI approach

This is the requirement of most health regulatory agencies in the world
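The operational equivalence of TOST and the 90% confidence interval approach can be checked numerically. In this sketch the estimated log-scale difference, its standard error, and the degrees of freedom (22, giving a tabulated t(0.05, 22) of about 1.717) are hypothetical inputs, and the function name is an assumption.

```python
import math


def assess_abe(diff_log, se_diff, t_crit, lower_ratio=0.80, upper_ratio=1.25):
    """Average bioequivalence on the log scale via the CI approach and TOST.

    diff_log: estimated difference of log means (test minus reference)
    se_diff:  standard error of that estimate
    t_crit:   upper alpha quantile of the t distribution (taken from a table)
    """
    theta_l, theta_u = math.log(lower_ratio), math.log(upper_ratio)
    # (1 - 2*alpha)100% confidence interval on the log scale
    ci = (diff_log - t_crit * se_diff, diff_log + t_crit * se_diff)
    # Two one-sided test statistics
    t_l = (diff_log - theta_l) / se_diff
    t_u = (diff_log - theta_u) / se_diff
    return {
        "ratio_ci": (math.exp(ci[0]), math.exp(ci[1])),
        "ci_within_limits": theta_l < ci[0] and ci[1] < theta_u,
        "tost_concludes_abe": t_l > t_crit and t_u < -t_crit,
    }


# Hypothetical estimates with alpha = 0.05 and 22 degrees of freedom
passing = assess_abe(diff_log=0.05, se_diff=0.06, t_crit=1.717)
failing = assess_abe(diff_log=0.15, se_diff=0.06, t_crit=1.717)
```

Both criteria agree in each case, since T_L > t is algebraically the same as the lower confidence limit exceeding θL, and likewise for the upper limit.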

(34)

Estimation of Shelf-life

• Shelf-life (expiration dating period)

Time interval during which a drug product is expected to remain within the specifications, provided that it is stored under the conditions defined on the container label

• Expiration date

The date placed on the container label of a drug product designating the time prior to which a batch of the product is expected to remain within the approved shelf life, if stored under defined conditions, and after which it must not be used.

(35)

Estimation of Shelf-life

ICH Q1A(R2) guidance (2003) P.16

“An approach for analyzing data of quantitative attribute that is expected to change with time is to determine the time at which the 95% one-sided confidence limit for the mean curve

intersects the acceptance criterion”

ICH Q1E guidance (2004) p.11

A two-sided 95% confidence interval or 95% one-sided upper or lower confidence interval can be also used.

One-sided lower limit: known degradation

One-sided upper limit: known impurities

Two-sided interval: unknown situation about increase or decrease of the assay with time

(36)

Estimation of Shelf-life

[Figure: degradation curve of % of label claim against storage time (months, 0 to 12), showing the lower specification limit and the 95% lower confidence limit]

(37)

2006/8/24 Copyright by Jen-pei Liu, PhD

Estimation of Shelf-life

• Only consider the case where the drug product characteristic decreases linearly with time.

• Model:

$Y_j = \alpha + \beta X_j + \varepsilon_j, \quad j = 1, 2, \ldots, n$

$Y_j$: jth response of assay at time $X_j$,

α: intercept (batch effect),

β: slope (degradation rate),

$X_j$: time at which $Y_j$ is observed,

$\varepsilon_j$: random error ~ N(0, σ²).
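A minimal single-batch sketch of the shelf-life estimate under this linear model: fit the line by least squares, then find the time at which the 95% one-sided lower confidence limit for the mean crosses the lower specification limit. The assay values, the 90% specification limit, and the tabulated t(0.05, 3) of about 2.353 are assumptions for illustration only.

```python
import math


def fit_line(times, assays):
    """Ordinary least squares fit of assay = a + b * time."""
    n = len(times)
    x_bar = sum(times) / n
    y_bar = sum(assays) / n
    sxx = sum((x - x_bar) ** 2 for x in times)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, assays))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(times, assays))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    return a, b, s, x_bar, sxx, n


def lower_limit(x, fit, t_crit):
    """One-sided lower confidence limit for the mean response at time x."""
    a, b, s, x_bar, sxx, n = fit
    return a + b * x - t_crit * s * math.sqrt(1 / n + (x - x_bar) ** 2 / sxx)


def shelf_life(fit, t_crit, spec, horizon=120.0):
    """Bisection for the time where the lower limit meets the specification."""
    lo, hi = 0.0, horizon
    for _ in range(100):
        mid = (lo + hi) / 2
        if lower_limit(mid, fit, t_crit) > spec:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical stability data: months and % of label claim
times = [0, 3, 6, 9, 12]
assays = [100.1, 99.2, 98.4, 97.3, 96.5]
fit = fit_line(times, assays)
estimate = shelf_life(fit, t_crit=2.353, spec=90.0)
```

The estimate is always shorter than the time at which the fitted line itself reaches the specification, because the confidence limit lies below the fitted mean.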

(38)

Estimation of Shelf-life

Construct a (1-2α)100% C.I. for X for which the pth upper quantile of the distribution of Y given X is equal to some specified value η.

The pth upper quantile of the distribution of Y given X is α + βX + σz_p, where z_p is the pth upper quantile of a standard normal distribution.

The values of X for which the hypothesis H0: [(η - α - z_pσ)/β] ≤ X is not rejected at the 2α significance level constitute a (1-2α)100% C.I. for X.

(39)

Estimation of Shelf-life

• Stability study: mean degradation => p = 0.5 => z_p = 0.

H0: [(η - α)/β] ≤ X

=> H0: η - α – βX ≤ 0 vs. Ha: η - α – βX > 0

=> H0: α + βX ≥ η vs. Ha: α + βX < η

=> H0: (η – α)/β ≤ X vs. Ha: (η – α)/β > X

(40)

Estimation of Shelf-life

• Stability study: mean degradation => p = 0.5 => z_p = 0.

H0: [(η - α)/ β ] ≤ X => H0: η - α – βX ≤ 0

The set of values of X for which H0 is not rejected at the 2α significance level is

A = {X: [η - (a + bX)]2 ≤ t2

(41)

Estimation of Shelf-life

Common intercept, common slope

Different intercepts, common slope

Common intercept, different slopes

Different intercepts, different slopes

(42)

Quality Control of Drug Products

• Sampling Plan and Acceptance Criteria

Content uniformity of dosage units: USP/NF general chapter [905]

Dissolution Testing: USP/NF general chapter [711]

Disintegration Testing

(43)


Disintegration Testing

USP/NF general chapter [711]

p9

(44)
(45)

• Disintegration Testing

Let Y be the disintegration time. Again we assume that Y follows a normal distribution with mean μ and variance σ².

• Also, let p = P{0 < Y < UL}, where UL denotes the specified limit. Since the disintegration test involves only one acceptance criterion at both stages of the sampling plan, the exact probability can be computed. Let

C11 = {all six units disintegrate completely},

C12 = {one unit fails to disintegrate completely},

C13 = {two units fail to disintegrate completely},

C21 = {11 of 12 additional units disintegrate completely},

C22 = {all 12 additional units disintegrate completely}.

(46)

• Then the exact probability of passing the disintegration test is given as follows:

$$
\begin{aligned}
P\{\text{pass}\} &= P\{C_{11}\} + P\{C_{21} \cup C_{22} \mid C_{12}\}\,P\{C_{12}\} + P\{C_{22} \mid C_{13}\}\,P\{C_{13}\} \\
&= p^{6} + \left\{\binom{12}{11} p^{11}(1-p) + p^{12}\right\}\left\{\binom{6}{1} p^{5}(1-p)\right\} + p^{12}\left\{\binom{6}{2} p^{4}(1-p)^{2}\right\} \\
&= p^{6} + 6p^{17}(1-p) + 87p^{16}(1-p)^{2}.
\end{aligned}
$$
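The passing probability can be checked numerically by building it from the events C11 to C22 and comparing with the simplified polynomial. This is a sketch under the assumption that units disintegrate independently, each with probability p; the function names are hypothetical.

```python
from math import comb


def prob_pass(p):
    """Probability of passing the two-stage disintegration test, event by event."""
    stage1_all = p ** 6                                   # C11
    stage1_one_fail = comb(6, 1) * p ** 5 * (1 - p)       # C12
    stage1_two_fail = comb(6, 2) * p ** 4 * (1 - p) ** 2  # C13
    stage2_11_of_12 = comb(12, 11) * p ** 11 * (1 - p)    # C21
    stage2_all_12 = p ** 12                               # C22
    return (stage1_all
            + stage1_one_fail * (stage2_11_of_12 + stage2_all_12)
            + stage1_two_fail * stage2_all_12)


def prob_pass_closed_form(p):
    """Simplified polynomial form of the same probability."""
    return p ** 6 + 6 * p ** 17 * (1 - p) + 87 * p ** 16 * (1 - p) ** 2


def solve_for_p(target, tol=1e-10):
    """Bisection for the unit-level probability p giving a target pass probability."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_pass(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_half = solve_for_p(0.5)  # unit-level probability giving a 50% chance of passing
```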

(47)

• It can easily be verified that if the desired probability of passing the disintegration test is 0.5, p is approximately 0.831. If, in addition, the specified time limit, UL, is 30 min, it follows that

$$p = P\{Y < UL\} = P\{Y < 30\} = P\left\{\frac{Y - \mu}{\sigma} < \frac{30 - \mu}{\sigma}\right\} = P\{Z < Z(0.169)\} = 0.831,$$

where Z is a standard normal variable and Z(0.169) is the 16.9% upper quantile of a standard normal distribution. Therefore

$$\frac{30 - \mu}{\sigma} = Z(0.169) = 0.957.$$

Hence the contour for μ and σ is a linear decreasing function of μ given by

$$\sigma = \frac{30 - \mu}{0.957},$$

where 0.957 = Z(0.169).
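The quantile and the resulting contour can be reproduced with the standard library. This sketch assumes the 30 min limit and p = 0.831 from the text; the function name and the example mean of 20 min are hypothetical.

```python
from statistics import NormalDist


def sigma_on_contour(mu, upper_limit=30.0, p=0.831):
    """Standard deviation for which P{Y < upper_limit} = p when Y ~ N(mu, sigma^2)."""
    z = NormalDist().inv_cdf(p)  # 16.9% upper quantile when p = 0.831
    return (upper_limit - mu) / z


z_quantile = NormalDist().inv_cdf(0.831)
sigma_at_20 = sigma_on_contour(20.0)  # assumed mean disintegration time of 20 min
```

The computed quantile differs from the quoted 0.957 only in the third decimal place, which is within rounding of p.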

(48)
(49)
(50)

Simplest Situation: Binary Outcomes from marker test (+, -), Binary Classification of Disease (Yes, No)

Design Matrix for Diagnostic Marker Tests

Diagnosis Made from       Correct ("Gold Standard") True State of Disease

Marker Test               Present (D)    Absent (D̄)    Total

Positive (T)              a (1-β)        b (α)          m1

Negative (T̄)              c (β)          d (1-α)        m2

Total                     n1             n2             N

(51)

Evaluation of Diagnostic Devices

Retrospective Sampling Plan (case-control)

• Sensitivity (True Positive rate): Capacity for making a correct diagnosis in subjects with the disease

• Estimated Sensitivity: 100% x a/(a+c)

• Specificity (True Negative rate): Capacity for making a correct diagnosis in subjects without disease

• Estimated Specificity:

(52)

Evaluation of Diagnostic Devices

• Positive Predictive Value (Positive Predictive Accuracy): the proportion of subjects with the disease given the positive results.

= 100% x a/(a+b)

• Negative Predictive Value (Negative Predictive Accuracy): the proportion of subjects without the disease given the negative results.

= 100% x d/(c+d)

• False positive rate: given the positive results, the proportion of subjects without the disease

= 1 – positive predictive value = 100% x b/(a+b)

• False negative rate: given the negative results, the proportion of subjects with the disease

(53)

Other Definitions of False Positive Rate and False Negative Rate

False positive rate: given the subjects without the disease, the proportion of subjects with positive results = b/(b+d) = b/n2

False negative rate: given the subjects with the disease, the proportion of subjects with negative results = c/(a+c) = c/n1

False positive rate = 1 - specificity

False negative rate = 1 - sensitivity

(54)

Evaluation of Diagnostic Devices

Example (Feinstein, 2002)

New Marker Test Result    Diseased Cases    Nondiseased Control    Total

Positive                  46                2                      48

Negative                  4                 48                     52

Total                     50                50                     100

(55)

Evaluation of Diagnostic Devices

Data from Example 2 (Feinstein, 2002)

• Sensitivity = 100% x 46/50 = 92.0%

• Specificity = 100% x 48/50 = 96.0%

• Prevalence = 100% x 50/100 = 50.0%

• Positive Predictive Value = 100% x 46/48 = 95.8%

• Negative Predictive Value = 100% x 48/52 = 92.3%

• False Positive Rate = 100% x 2/48 = 4.2%

• False Negative Rate = 100% x 4/52 = 7.7%
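The figures above can be reproduced from the four cell counts of the 2 x 2 table. The function and key names in this sketch are hypothetical; a, b, c, d follow the design matrix notation (a = true positives, b = false positives, c = false negatives, d = true negatives). Both definitions of the false positive and false negative rates are returned, since the two differ.

```python
def diagnostic_summary(a, b, c, d):
    """Accuracy measures from a 2 x 2 diagnostic table."""
    n1, n2 = a + c, b + d  # diseased and nondiseased column totals
    m1, m2 = a + b, c + d  # positive and negative row totals
    return {
        "sensitivity": a / n1,
        "specificity": d / n2,
        "prevalence": n1 / (n1 + n2),
        "positive_predictive_value": a / m1,
        "negative_predictive_value": d / m2,
        # Conditioned on the test result (1 - predictive value)
        "false_positive_rate_given_positive": b / m1,
        "false_negative_rate_given_negative": c / m2,
        # Conditioned on the true disease state (1 - specificity, 1 - sensitivity)
        "false_positive_rate_given_nondiseased": b / n2,
        "false_negative_rate_given_diseased": c / n1,
    }


# Counts from the Feinstein (2002) example: 46, 2, 4, 48
summary = diagnostic_summary(a=46, b=2, c=4, d=48)
```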

(56)

Evaluation of Diagnostic Devices

Type of Diagnostic Markers

• Binary Test Results (+,-)

• Multiple Categorical Results

Abnormality Rating

Severity Rating

Urine test: None, trace, 1+, 2+

HER2 test: 0, 1+, 2+, 3+

• Continuous Test Results

PSA

Intraocular Pressure

Glucose tolerance test

Gene expression level

(57)

Evaluation of Diagnostic Devices

• To convert a ranking scale or a continuous measurement into a binary outcome (+,–), we need a cutoff point or threshold.

• Example:

• FBG > 126 mg/dL: DM (+)

• FBG ≤ 126 mg/dL: DM (–)

• S-T Depression in Exercise Stress Test

• Class D < 1.5 min CAD (+)

(58)

Evaluation of Diagnostic Devices

At a specific threshold, the relationship of sensitivity, specificity, false positive and false negative rates can be interpreted through hypothesis testing:

H0: Absence of the disease

H1: Presence of the disease

α = Pr[Type I Error] = Pr[test positive | no disease]

β = Pr[Type II Error]

(59)

Evaluation of Diagnostic Devices

[Figure: distributions of the variable X for normal subjects (mean μN) and diseased subjects (mean μD), with a threshold marking α, β, specificity = 1-α and sensitivity = 1-β]

(60)

Evaluation of Diagnostic Devices

Sensitivity = Pr[test positive | disease] = 1 – β = power of the statistical procedure

Specificity = Pr[test negative | no disease] = 1 – α

• α↑ ⇒ β↓ ⇒ (1-β)↑

• A test with a high sensitivity also has a high incorrect positive rate but a low incorrect negative rate. A test with a high specificity also has a high incorrect

(61)

Evaluation of Diagnostic Devices

• At each individual threshold (cut-off), sensitivity and specificity can be computed.

• A Receiver Operating Characteristic (ROC) curve is a graphic presentation of sensitivity against 1-specificity.

• It is a path in the unit square, from the lower left corner to the upper right corner. In fact, it can be viewed as a cumulative distribution function.

(62)

Evaluation of Diagnostic Devices

• In a useless marker test, the ROC curve will be a straight line at a 45° angle.

• The area under the ROC curve provides a summary index for diagnostic accuracy across all possible values of thresholds.

• The range of the area under the ROC curve is from 0.5 (50%) to 1.0 (100%).

• In a useless marker test, the area under the ROC curve is 50%, which is the same as flipping a fair coin.

• For non-inferiority or equivalence test based on the paired ROC curve area, see Liu, et al. (2005, Statistics in

(63)

[Figure: ROC curves plotting sensitivity against 1-specificity for a perfect test, an ordinary test and a useless test. Source: Feinstein (2002)]
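An empirical ROC curve and its area can be sketched by sweeping the threshold over every observed value. The marker values below are invented, the function names are hypothetical, and the sketch assumes that higher values point toward disease (a result at or above the threshold counts as positive).

```python
def roc_points(diseased, nondiseased):
    """(1 - specificity, sensitivity) pairs, from the strictest threshold down."""
    thresholds = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every value: nothing is positive
    for t in thresholds:
        sensitivity = sum(x >= t for x in diseased) / len(diseased)
        false_positive = sum(x >= t for x in nondiseased) / len(nondiseased)
        points.append((false_positive, sensitivity))
    return points


def area_under_curve(points):
    """Trapezoidal area under the path through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical marker values
curve = roc_points(diseased=[3, 4, 5], nondiseased=[1, 2, 3])
auc = area_under_curve(curve)
```

With identical marker values in both groups the path follows the 45° line, and with fully separated groups it runs along the left and top edges of the unit square.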

(64)
(65)

Summary

• Descriptive Statistics

• Description of characteristics and estimation of special attributes of drug and device products

• Inferential Statistics

• Decision-making tool for approval of drug and device products for marketing

(66)

References

• Chow, SC and Liu, JP (2004) Design and Analysis of Clinical Trials, 2nd Ed., Wiley.

• Chow, SC and Liu, JP (2000) Design and Analysis of Bioavailability and Bioequivalence Studies, Marcel Dekker, Inc.

• Chow, SC and Liu, JP (1995) Statistical Design and Analysis in Pharmaceutical Sciences, Marcel Dekker, Inc.
