Summary - Literature Review - A Study of Statistical Evaluation Methods for Linearity Verification of Quantitative In Vitro Diagnostic Reagents

CHAPTER 2 Literature Review

2.4 Summary

As introduced above, both the estimation method of the current CLSI EP6-A guideline and the uncorrected Kroll method for linearity assessment in assay validation inflate the type I error. In particular, the uncorrected Kroll method can also conclude linearity incorrectly because its hypothesis and the corresponding rejection rule are formulated incorrectly. In Chapter 3, we introduce various measures for assessing linearity based on the aggregate criterion and the disaggregate criterion, which are discussed and compared in our research. In addition, statistical testing procedures that remedy the shortcomings of the current methods reviewed in this chapter will also be proposed in Chapter 3. Their empirical sizes and powers are compared by a simulation study.

Chapter 3 Criterion for Assessing Linearity

In this chapter, we summarize the measures for assessing linearity based on the disaggregate criterion and the aggregate criterion that are reviewed and proposed in our research. Their corresponding statistical hypotheses are also introduced. In addition, we discuss how the difference between the disaggregate and aggregate criteria affects their performance in assessing linearity.

3.1 Disaggregate Criterion

As introduced in Section 2.2 of Chapter 2, following the experiment recommended by EP6-A (Tholen et al., 2003), the guideline proposes that even when the best-fitted model is not linear, the linearity of the analytical procedure can be claimed if the magnitude of the deviation from linearity at each concentration is within some pre-specified allowable limit $\delta_0$. The hypothesis corresponding to the proposed evaluation rule can be formulated as

$$H_{0i}: |\mu_{Pi} - \mu_{Li}| \ge \delta_0 \quad \text{vs.} \quad H_{ai}: |\mu_{Pi} - \mu_{Li}| < \delta_0, \quad \text{for all } i = 1, \ldots, L, \tag{3.1.1}$$

where the difference in predicted means between the best-fitted nonlinear model and the linear model, $\mu_{Pi} - \mu_{Li}$, represents a measure of the degree of deviation from linearity at each concentration level. Since hypothesis (3.1.1) requires all differences in the predicted means between the best-fitted and linear models to be within the pre-specified allowable limit, it is a disaggregate criterion.
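The disaggregate rule can be illustrated with a minimal Python sketch that checks every point-wise deviation against the allowable limit. The function name `deviations_within_limit` is hypothetical, and the check is applied to point estimates only, so it ignores sampling variability. The deviations and the limit of 0.4 are the values used in the numerical example of Section 4.4.

```python
def deviations_within_limit(deviations, delta0):
    """Return one flag per concentration level: True if |deviation| < delta0."""
    return [abs(d) < delta0 for d in deviations]


# Differences (best-fitted minus linear) taken from Table 4.4.3;
# delta0 = 0.4 is the allowable limit used in Section 4.4.
deviations = [0.296, -0.374, -0.146, 0.228, -0.005]
flags = deviations_within_limit(deviations, delta0=0.4)

# The disaggregate criterion is met only if every level passes.
criterion_met = all(flags)
print(flags, criterion_met)
```

Because the criterion must hold at every level, a single level outside the limit is enough for the criterion to fail.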

3.2 Aggregate Criterion

3.2.1 Average Deviation from Linearity (ADL)

Recall that the ADL proposed by Kroll et al. (Kroll, 2000) is defined as

$$\theta = \frac{1}{\mu}\sqrt{\frac{1}{L}\sum_{i=1}^{L}(\mu_{Pi} - \mu_{Li})^2}, \tag{3.2.1.1}$$

where $\mu$ is the population mean concentration for all solutions of the assay.

ADL is a scaled deviation defined as the square root of the average sum of squares of the differences in predicted means between the best-fitted and linear models, divided by $\mu$. The correct hypothesis for evaluating linearity based on the ADL of Kroll et al. (Kroll, 2000) is

$$H_0: \theta \ge \theta_0 \quad \text{vs.} \quad H_a: \theta < \theta_0, \tag{3.2.1.2}$$

where $\theta_0$ is the maximum allowable average deviation from linearity.

Unlike the evaluation rule of EP6-A, which requires $\mu_{Pi} - \mu_{Li}$ to be within some pre-specified allowable limit $\delta_0$ at all concentration levels, hypothesis (3.2.1.2) only requires a summary measure, the ADL, to be less than $\theta_0$. Since ADL is a function of the standardized sum of squares of the differences in the predicted means between the best-fitted and linear models, it is an aggregate criterion.
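As a small worked illustration, the following Python sketch computes an observed ADL, assuming (as the verbal definition above describes) that ADL is the root mean square of the deviations divided by the overall mean concentration. The function name is hypothetical. The predicted means are those reported in Table 4.4.3 of the numerical example, and 2.913 is the average of the five observed dilution means in that table.

```python
import math


def average_deviation_from_linearity(pred_best, pred_linear, overall_mean):
    """Root mean square of (best-fitted - linear) predictions, scaled by the mean."""
    n_levels = len(pred_best)
    sum_sq = sum((p - l) ** 2 for p, l in zip(pred_best, pred_linear))
    return math.sqrt(sum_sq / n_levels) / overall_mean


# Predicted means from Table 4.4.3 (cubic = best-fitted model).
pred_cubic = [1.031, 1.450, 2.767, 4.230, 5.086]
pred_linear = [0.735, 1.824, 2.913, 4.002, 5.091]
observed_means = [0.995, 1.595, 2.550, 4.375, 5.050]
overall_mean = sum(observed_means) / len(observed_means)

adl = average_deviation_from_linearity(pred_cubic, pred_linear, overall_mean)
print(round(adl, 4))  # the text reports an observed ADL of 0.0842
```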

3.2.2 Sum of Squares of Deviations from Linearity (SSDL)

According to the approved CLSI guideline EP6-A (Tholen et al., 2003), the linearity of the proposed analytical method can be concluded if the deviation from linearity is smaller than some pre-specified limit $\delta_0$ at all concentrations:

$$|\mu_{Pi} - \mu_{Li}| < \delta_0, \quad \text{for } i = 1, \ldots, L.$$

As a result, a natural aggregate metric for assessing assay linearity is the sum of squares of deviations from linearity (SSDL), denoted by $\tau$ and defined as

$$\tau = \sum_{i=1}^{L}(\mu_{Pi} - \mu_{Li})^2. \tag{3.2.2.1}$$

It follows that the hypotheses for proving assay linearity can be formulated based on SSDL as follows:

$$H_0: \tau \ge \tau_0 \quad \text{vs.} \quad H_a: \tau < \tau_0, \tag{3.2.2.2}$$

where $\tau_0$ is the pre-specified allowable upper limit of SSDL.

Similar to ADL, SSDL is an aggregate criterion, but it is formulated directly from the disaggregate criterion proposed by the CLSI guideline and takes the form of a model-by-dilution interaction. However, the corresponding statistical hypothesis is not meant to detect the existence of the model-by-dilution interaction but rather to verify whether the model-by-dilution interaction is within some pre-specified allowable upper limit.

3.2.3 Coefficient of Variation of the Deviations from Linearity (CVDL)

The CVDL is the deviation from linearity scaled by $\sigma$, the variability (repeatability) of the best-fitted model. It is defined as the square root of the average sum of squares of the deviations scaled by $\sigma$:

$$\eta = \sqrt{\frac{1}{L}\sum_{i=1}^{L}\left(\frac{\mu_{Pi} - \mu_{Li}}{\sigma}\right)^2}. \tag{3.2.3.1}$$

The hypotheses for evaluating linearity based on CVDL are

$$H_0: \eta \ge \eta_0 \quad \text{vs.} \quad H_a: \eta < \eta_0. \tag{3.2.3.2}$$

Since $\sum_{i=1}^{L}(\mu_{Pi} - \mu_{Li})^2$ is also a component of CVDL, CVDL is an aggregate criterion. Moreover, CVDL contains not only the information on the deviation from linearity but also the repeatability, expressed by the residual mean square obtained from the best-fitted model.
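To make the relationship between the aggregate measures concrete, the sketch below computes plug-in values of SSDL and CVDL from the same deviations. It assumes SSDL is the plain sum of squared deviations and CVDL is the root mean square of the deviations divided by σ, following the verbal definitions above. The function names are hypothetical, the deviations come from Table 4.4.3, and σ is replaced by the residual standard error of the cubic fit from Table 4.4.2. The resulting numbers are illustrative and are not reported in the text.

```python
import math


def ssdl(deviations):
    """Sum of squares of deviations from linearity."""
    return sum(d * d for d in deviations)


def cvdl(deviations, sigma):
    """Root mean square of the deviations, scaled by the repeatability sigma."""
    return math.sqrt(ssdl(deviations) / len(deviations)) / sigma


# Deviations (cubic minus linear) from Table 4.4.3.
deviations = [0.296, -0.374, -0.146, 0.228, -0.005]
# Plug-in assumption: residual standard error of the cubic model (Table 4.4.2).
sigma_hat = 0.1799

tau_hat = ssdl(deviations)
eta_hat = cvdl(deviations, sigma_hat)
print(round(tau_hat, 4), round(eta_hat, 2))
```

Under these definitions ADL and CVDL share the same numerator, the root mean square deviation, and differ only in the scaling quantity (the overall mean for ADL, σ for CVDL).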

3.3 Summary

As introduced above, the disaggregate criterion proposed by the CLSI EP6-A guideline (Tholen et al., 2003) requires $\mu_{Pi} - \mu_{Li}$ to be within some pre-specified allowable limit at all concentration levels, while the aggregate criteria of ADL, SSDL and CVDL only require a summary measure of the deviations to be within its corresponding allowable limit. As a result, an evaluation based on a disaggregate criterion is more conservative than one based on an aggregate criterion, since it requires an intersection-union test. In addition, $\sum_{i=1}^{L}(\mu_{Pi} - \mu_{Li})^2$ is actually the model-by-dilution interaction. However, unlike the traditional hypothesis that tests for the existence of the interaction, our goal is to test whether the magnitude of the interaction is within the allowable bound. In the next few chapters, statistical testing procedures are proposed for assessing linearity based on the disaggregate and aggregate criteria introduced in this chapter. The proposed methods are also compared with the current methods via simulation studies and numerical examples.

Chapter 4 TOST Procedure and Corrected Kroll’s Method

In this chapter, we propose the two one-sided tests (TOST) procedure and the corrected Kroll method, which are more suitable for assessing linearity because they remedy the shortcomings of the current methods. The proposed TOST procedure is the testing counterpart of the estimation method of EP6-A (Tholen et al., 2003), which ignores the variability of the estimators, while the corrected Kroll method corrects the inappropriate statistical hypothesis of the uncorrected Kroll method proposed by Kroll et al. (Kroll, 2000).

4.1 Two One-sided Test Procedure

The interval hypothesis in (2.2.1) can be decomposed into two sets of one-sided hypotheses:

$$\begin{aligned} H_{0iL}: \mu_{Pi} - \mu_{Li} \le -\delta_0 \quad &\text{vs.} \quad H_{aiL}: \mu_{Pi} - \mu_{Li} > -\delta_0, \quad \text{for all } i = 1, \ldots, L, \\ &\text{and} \\ H_{0iU}: \mu_{Pi} - \mu_{Li} \ge \delta_0 \quad &\text{vs.} \quad H_{aiU}: \mu_{Pi} - \mu_{Li} < \delta_0, \quad \text{for all } i = 1, \ldots, L. \end{aligned} \tag{4.1.1}$$

An unbiased estimator of $\mu_{Pi} - \mu_{Li}$ is the least-squares (LS) estimator $\hat{Y}_{Pi} - \hat{Y}_{Li}$, $i = 1, \ldots, L$. Let $\hat{\sigma}_{di}$ denote the estimated standard error of $\hat{Y}_{Pi} - \hat{Y}_{Li}$. It is based on $\hat{\sigma}^2$, the residual mean square obtained from the best-fitted model with $LJ - d - 1$ degrees of freedom, and on the matrix $W = W_P - W_L$, where $W_P$ and $W_L$ are the projection matrices corresponding to the column spaces spanned by the design matrices of the best-fitted and linear models, respectively, i.e., $W_P = X_P(X_P'X_P)^{-1}X_P'$ and $W_L = X_L(X_L'X_L)^{-1}X_L'$.

It follows that the $100(1 - 2\alpha)\%$ confidence interval for $\mu_{Pi} - \mu_{Li}$ is given as

$$(\hat{Y}_{Pi} - \hat{Y}_{Li}) \pm t_{\alpha, LJ-d-1}\,\hat{\sigma}_{di}, \quad i = 1, \ldots, L, \tag{4.1.2}$$

where $t_{\alpha, LJ-d-1}$ is the upper $\alpha$ percentile of a central t distribution with $LJ - d - 1$ degrees of freedom.

The linearity of an analytical method can be concluded at the $\alpha$ significance level if the $100(1 - 2\alpha)\%$ confidence interval for $\mu_{Pi} - \mu_{Li}$ is completely contained within $(-\delta_0, \delta_0)$ at each concentration level, $i = 1, \ldots, L$. This method is referred to as the two one-sided tests (TOST) procedure. It is a statistical testing procedure proposed in place of the estimation method of EP6-A.
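The decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The function name `tost_linearity` is hypothetical, and the t critical value is passed in by the caller rather than computed. The standard errors used in the example (0.08 at every level) are hypothetical placeholders, not values from the thesis. Only the differences and the limit of 0.4 come from the numerical example of Section 4.4.

```python
def tost_linearity(differences, std_errors, t_crit, delta0):
    """Return (overall decision, per-level (lower, upper, inside) tuples).

    Linearity is concluded only if every confidence interval
    diff +/- t_crit * se lies strictly inside (-delta0, delta0).
    """
    per_level = []
    for diff, se in zip(differences, std_errors):
        lower = diff - t_crit * se
        upper = diff + t_crit * se
        per_level.append((lower, upper, -delta0 < lower and upper < delta0))
    return all(inside for _, _, inside in per_level), per_level


# Differences from Table 4.4.3; standard errors are HYPOTHETICAL placeholders.
differences = [0.296, -0.374, -0.146, 0.228, -0.005]
std_errors = [0.08] * 5
# Upper 5% point of the t distribution with 6 degrees of freedom (90% interval).
t_crit = 1.943

linear, per_level = tost_linearity(differences, std_errors, t_crit, delta0=0.4)
inside_flags = [inside for _, _, inside in per_level]
print(linear, inside_flags)
```

Unlike a plug-in check of the point estimates, a level can fail here even when its estimated deviation is inside the limit, because the whole interval has to fit inside the margin.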

4.2 Corrected Kroll’s Method

The main drawback of the method for evaluating linearity proposed by Kroll et al. (Kroll, 2000) is the incorrect formulation of the hypotheses. We suggest that the hypothesis for assessing linearity based on ADL be formulated as follows:

$$H_0: \theta \ge \theta_0 \quad \text{vs.} \quad H_a: \theta < \theta_0, \tag{4.2.1}$$

where $\theta_0$ is the allowable margin of ADL for linearity.

Consequently, the linearity of an analytical procedure is concluded at the 5% significance level if

$$\hat{\theta} < \frac{\hat{\sigma}}{\hat{\mu}}\sqrt{\frac{q_{0.05}}{LJ}}, \tag{4.2.2}$$

where $q_{0.05}$ is the 5th percentile of a non-central chi-square distribution with $d - 1$ degrees of freedom and non-centrality parameter $LJ\theta_0^2/(\hat{\sigma}/\bar{X})^2$. This method is referred to as the corrected Kroll method.
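The critical value of the corrected Kroll method can be evaluated numerically. The sketch below is a self-contained illustration, not the author's implementation. It assumes the critical value has the form (σ̂/X̄)·sqrt(q/(LJ)), and it obtains the non-central chi-square percentile from a Poisson mixture of central chi-square distributions with a simple bisection search. All function names are hypothetical. The inputs are taken from the numerical example of Section 4.4: residual standard error 0.1799, overall mean 2.913, L = 5 dilutions, J = 2 replicates, cubic model (d = 3), and θ0 = 0.05.

```python
import math


def chi2_cdf(x, k):
    """Central chi-square CDF via the series for the regularized lower gamma."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def ncx2_cdf(x, k, nc, n_terms=200):
    """Non-central chi-square CDF as a Poisson(nc/2) mixture of central CDFs."""
    weight = math.exp(-nc / 2.0)
    total = 0.0
    for j in range(n_terms):
        total += weight * chi2_cdf(x, k + 2 * j)
        weight *= (nc / 2.0) / (j + 1)
    return total


def ncx2_ppf(p, k, nc):
    """Quantile by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while ncx2_cdf(hi, k, nc) < p:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(mid, k, nc) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corrected_kroll_critical_value(sigma_hat, mean_hat, L, J, d, theta0, alpha=0.05):
    """Critical value for the observed ADL; linearity is concluded below it."""
    cv = sigma_hat / mean_hat
    nc = L * J * theta0 ** 2 / cv ** 2
    q = ncx2_ppf(alpha, d - 1, nc)
    return cv * math.sqrt(q / (L * J))


critical = corrected_kroll_critical_value(0.1799, 2.913, L=5, J=2, d=3, theta0=0.05)
observed_adl = 0.0842
conclude_linearity = observed_adl < critical
print(round(critical, 4), conclude_linearity)  # the text reports a critical value of 0.0237
```

The mixture representation is used here only to keep the sketch free of external dependencies; a statistics library with a non-central chi-square quantile function would serve the same purpose.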

4.3 Simulation Study

We conduct a simulation study to compare the empirical sizes and powers of the uncorrected Kroll method, the corrected Kroll method, the estimation method of EP6-A and the TOST procedure. Following the specifications of the experimental designs for evaluating linearity, the number of solutions (or dilutions) of different concentrations is set to 5 or 7, and the number of replicates at each concentration is 2, 3, or 4.

Throughout the simulation, the allowable margin of linearity based on ADL, $\theta_0$, is specified as 0.05, while the margin for the estimation method and the TOST procedure, $\delta_0$, is specified as 0.2. There are two types of size comparison. The first compares the sizes of the uncorrected and corrected Kroll methods, for which the data were generated with a true ADL of 0.05, as recommended by Kroll et al. (Kroll, 2000). The second compares the sizes of the estimation method suggested in the approved CLSI guideline and the TOST procedure, for which the data were generated with the true difference $\mu_{Pi} - \mu_{Li}$ at some solutions being either 0.2 or -0.2. In addition, the standard deviation of the normal random error was specified as 0.1 or 0.2. Table 4.3.1 provides the specifications of the parameter values in the simulation for the evaluation of size. For each of the 12 combinations, five thousand (5,000) random samples are generated. At the 5% nominal significance level, a simulation study with 5,000 random samples implies that 95% of the empirical sizes evaluated at the equivalence limits will fall between 0.04396 and 0.05604 if the proposed methods adequately control the size at the nominal level of 0.05. The specifications of the parameters for the investigation of power are given in Table 4.3.2.
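The bounds 0.04396 and 0.05604 follow from the normal approximation to the binomial distribution of an empirical size. A minimal sketch of the arithmetic, with a hypothetical function name and the usual two-sided 95% normal multiplier of 1.96 assumed:

```python
import math


def empirical_size_band(nominal=0.05, n_sim=5000, z=1.96):
    """95% band for an empirical size when the true size equals the nominal level."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / n_sim)
    return nominal - half_width, nominal + half_width


lower, upper = empirical_size_band()
print(round(lower, 5), round(upper, 5))
```

An empirical size outside this band suggests that the method does not hold the nominal 5% level for that parameter combination.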

Table 4.3.3 presents the results of the empirical sizes. For the comparison between the uncorrected and corrected Kroll methods, all empirical sizes of the uncorrected Kroll method are above 0.92. On the other hand, the empirical size of the corrected Kroll method ranges from 0.0516 to 0.0780. Only 8.33% (1/12) of the empirical sizes of the corrected Kroll method are within 0.04396 and 0.05604. The extremely high empirical size of the uncorrected Kroll method results from the incorrect formulation of the hypothesis for proving the linearity of the analytical method. The type I error with respect to proving linearity is the error of claiming that the analytical method is linear when in fact it is not. Therefore, the empirical size of the uncorrected Kroll method at the 5% nominal level should be close to 95%. In contrast, the empirical size of the corrected Kroll method should be close to 5% at the 5% nominal level. However, one needs to estimate the non-centrality parameter of the non-central $\chi^2$ distribution of the observed ADL. In addition, the critical value in Eq. (4.2.2) also contains the estimator $\bar{X}$. Therefore, both the uncorrected and corrected Kroll methods ignore the variability of the estimators in the non-centrality parameter and the critical value.

As a result, although the empirical size of the corrected Kroll method is close to 0.05, it is still inflated. The empirical sizes of the estimation method and the TOST procedure for the same specifications are also provided in Table 4.3.3. From Table 4.3.1, when the true ADL is 0.05, $\mu_{Pi} - \mu_{Li}$ at some solutions is either greater than 0.2 or smaller than -0.2. It follows that all empirical sizes of the TOST procedure are less than 0.02. In contrast, the empirical size of the estimation method suggested in the approved CLSI guideline EP6-A can reach as high as 0.30, even when the differences in means between the best-fitted curve and the linear regression equation are outside the margin of (-0.2, 0.2) at three of the five solutions.

For the comparison between the estimation method in the approved CLSI guideline EP6-A and the TOST procedure, the empirical sizes of the TOST procedure range from 0.0440 to 0.0564. Only 8.33% (1/12) of the empirical sizes are not included in (0.04396, 0.05604). The one outside (0.04396, 0.05604) is 0.0564, which is just 0.00036 above 0.05604. However, the empirical sizes of the estimation method range from 0.4930 to 0.5066. Recall that the estimation method suggested in the approved CLSI guideline EP6-A (Tholen et al., 2003) ignores the variation of the estimates of $\mu_{Pi} - \mu_{Li}$. When $\mu_{Pi} - \mu_{Li}$ is equal to either 0.2 or -0.2 at some solutions, then under the normality assumption the size should be equal to 0.5, as confirmed by the empirical sizes of the simulation.
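The value of 0.5 can be reproduced with a deliberately simplified Monte Carlo sketch. It considers a single concentration level whose true deviation sits exactly on the margin and assumes the estimated deviation is normally distributed around it. The function name, the standard error of 0.05, the number of simulated samples and the seed are all hypothetical choices. The full simulation in this section, which fits the regression models at several levels, is not reproduced here.

```python
import random


def estimation_method_size(true_diff=0.2, delta0=0.2, se=0.05, n_sim=20000, seed=1):
    """Share of simulated estimates that fall inside (-delta0, delta0).

    The plug-in rule claims linearity whenever the point estimate is inside
    the margin, so this share is the type I error when true_diff is on the margin.
    """
    rng = random.Random(seed)
    claims = 0
    for _ in range(n_sim):
        estimate = rng.gauss(true_diff, se)
        if abs(estimate) < delta0:
            claims += 1
    return claims / n_sim


size = estimation_method_size()
print(size)
```

Because the normal distribution is symmetric about the margin, about half of the point estimates land inside the limit regardless of the standard error, which is why comparing a point estimate with the limit cannot control the type I error.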

Table 4.3.1 Specifications of parameters for size (Uncorrected Kroll vs. Corrected Kroll and Estimation Method vs. TOST)

Table 4.3.2 Specifications of parameters for power (Uncorrected Kroll vs. Corrected Kroll and Estimation Method vs. TOST)

No. of Solution Levels   True ADL   Solution Level   True μ_Pi - μ_Li
5                        0.00151    1                -0.02
                                    2                 0.01
                                    3                 0.02
                                    4                 0.01
                                    5                -0.02
7                        0.00494    1                -0.10
                                    2                 0.00
                                    3                 0.06
                                    4                 0.08
                                    5                 0.06
                                    6                 0.00
                                    7                -0.10

Table 4.3.4 presents the results of the empirical powers. For the simulation, the true ADL is specified as 0.00151 or 0.00494 when the number of solutions is 5 or 7, respectively. Therefore, with an allowable margin of 5%, 91.67% (11/12) of the empirical powers of the uncorrected and corrected Kroll methods reach 1. On the other hand, the empirical powers of the estimation method and the TOST procedure are smaller than those of the uncorrected and corrected Kroll methods. In addition, the empirical powers of the estimation method suggested in the approved CLSI guideline EP6-A (Tholen et al., 2003) and of the TOST procedure increase as the number of replicates increases or the standard deviation decreases. The results in Table 4.3.4 show that the empirical power of the estimation method is greater than that of the TOST procedure. However, from Table 4.3.3, the uncorrected and corrected Kroll methods and the estimation method fail to control the size at the nominal level. Therefore, the power advantage of these methods comes at the expense of an inflated type I error rate. From the simulation results in Table 4.3.4, the power of the TOST procedure is greater than 0.9 when the standard deviation is 0.1 and the number of replicates is at least 3.

Table 4.3.3 Results of empirical sizes (Uncorrected Kroll vs. Corrected Kroll and Estimation Method vs. TOST)

Sol.: Solution; Rep.: Replications; Uncorr.: Uncorrected; Corr.: Corrected.

Table 4.3.4 Results of empirical powers (Uncorrected Kroll vs. Corrected Kroll and Estimation Method vs. TOST)

                               Kroll's Method
No. of Sol.   No. of Rep.   SD    Uncorr.   Corr.    EP6-A    TOST
5             2             0.1   1.0000    1.0000   0.9954   0.7616
                            0.2   1.0000    1.0000   0.9052   0.2976
              3             0.1   1.0000    1.0000   0.9998   0.9232
                            0.2   1.0000    1.0000   0.9470   0.4452
              4             0.1   1.0000    1.0000   0.9998   0.9754
                            0.2   1.0000    1.0000   0.9664   0.5518
7             2             0.1   1.0000    1.0000   0.9954   0.7754
                            0.2   1.0000    1.0000   0.9014   0.3168
              3             0.1   0.9998    0.9998   0.9994   0.9164
                            0.2   1.0000    1.0000   0.9450   0.4468
              4             0.1   1.0000    1.0000   1.0000   0.9704
                            0.2   1.0000    1.0000   0.9660   0.5570

Sol.: Solution; Rep.: Replications; Uncorr.: Uncorrected; Corr.: Corrected.

4.4 Numerical Example

We consider a hypothetical experiment for evaluating the linearity of a new analytical procedure for the determination of β-HCG (β-human chorionic gonadotropin, mIU/mL). The design consists of 5 dilutions with two replicates at each dilution. Table 4.4.1 presents a set of hypothetical measurements under this design. For the purpose of illustration, the allowable percent bound for ADL is set to 0.05 for the uncorrected and corrected Kroll methods. On the other hand, the allowable limit of $|\mu_{Pi} - \mu_{Li}|$ is set to 0.4 for the estimation method suggested in the approved CLSI guideline EP6-A and for the TOST procedure.

Table 4.4.2 provides the results of the regression analyses for the linear, quadratic and cubic regression models. These results demonstrate that all estimates of the regression coefficients of the cubic model are significantly different from 0 at the 5% level ($t_{0.025, 6} = 2.4469$). In addition, the standard error of the residuals from the estimated cubic regression equation is 0.1799, which is at least 40% smaller than those from the linear and quadratic models. Furthermore, the coefficient of determination, $R^2$, is also above 0.99. As a result, the cubic model is the best-fitted model among the three models recommended by the approved CLSI guideline EP6-A. Figure 4.4.1 presents the fitted cubic and linear regression equations and the means at each of the five dilutions. It clearly shows that the relationship between the dilutions and the analytical results is nonlinear and that the cubic model is a better fit than the simple linear regression model.

Table 4.4.1 Measurement of β-HCG (mIU/mL)

Dilution   Replicate 1   Replicate 2
1          1.00          0.99
2          1.60          1.59
3          2.50          2.60
4          4.36          4.39
5          5.10          5.00

Table 4.4.2 Summary of results of regression analyses of β-HCG

Order       Coefficient   Value    SE      t-test   Std. err. S_y.x   Degrees of freedom
Linear      α′            -0.354   0.234   -1.51
            β1′            1.089   0.071   15.44    0.3154            8
Quadratic   α″             0.156   0.461    0.34
            β1″            0.652   0.351    1.85
            β2″            0.073   0.058    1.27    0.3041            7
Cubic       α‴             2.263   0.626    3.62
            β1‴           -2.308   0.818   -2.82
            β2‴            1.202   0.304    3.96
            β3‴           -0.125   0.034   -3.74    0.1799            6

Figure 4.4.1 Regression curves for cubic versus linear models of β-HCG

Table 4.4.3 gives the predicted means from the cubic and linear regression models at each of the five dilutions as well as their corresponding differences, while Table 4.4.4 presents the summarized results on linearity by the four methods. From these differences and the observed mean concentration, the observed ADL is 0.0842. With respect to the hypothesis in Eq. (2.3.2) and a percent bound of 5%, the critical value in Eq. (2.3.3) is 0.0851, which is greater than the observed ADL of 0.0842. According to the decision rule of the uncorrected Kroll method, the analytical method can be concluded to be linear at the 5% significance level. However, it should be noted that in this example, even though the observed ADL of 0.0842 is already greater than the allowable percent bound of 0.05, the linearity of the analytical method can still be claimed by the uncorrected Kroll method. On the other hand, with respect to the hypothesis in Eq. (4.2.1), the critical value in Eq. (4.2.2) is 0.0237. Since the observed ADL of 0.0842 is greater than 0.0237, we cannot reject the null hypothesis and cannot conclude the linearity of the analytical method at the 5% significance level. Unlike that of the uncorrected Kroll method, the conclusion of the corrected Kroll method is consistent with the evidence that the observed ADL of 0.0842 is greater than the allowable percent bound of 0.05.

With respect to the estimation method suggested in the approved CLSI guideline EP6-A, the observed differences in the predicted means between the cubic and linear regression models at all dilutions are within the allowable margin of ±0.4. As a result, linearity is claimed by the estimation method. On the other hand, the results of the TOST procedure show that the 95% confidence intervals for $\mu_{Pi} - \mu_{Li}$ at the first two dilutions are not contained within (-0.4, 0.4). With respect to the hypotheses in Eq. (4.1.1), the analytical method cannot be concluded to be linear at the 5% significance level. Because the estimation method completely ignores the variability in the observed differences in the predicted means, its conclusion is made without any statement of the probability of a type I error. In fact, as demonstrated by the simulation, the probability of a type I error of the estimation method far exceeds the nominal significance level.

Table 4.4.3 Mean differences between the best-fitted curve and simple linear regression equation of β-HCG

Result Mean   Predicted (Linear)   Predicted (Cubic)   Difference   % Difference
0.995         0.735                1.031                0.296       28.7
1.595         1.824                1.450               -0.374       25.8
2.550         2.913                2.767               -0.146        5.3
4.375         4.002                4.230                0.228        5.4
5.050         5.091                5.086               -0.005        0.1

Table 4.4.4 Results of the linearity by four different methods of β-HCG

NL: Conclusion of nonlinearity at the 5% nominal level; L: Conclusion of linearity at the 5% nominal level.

4.5 Summary

With respect to the disaggregate criterion, the estimation method suggested by the approved CLSI guideline ignores the variation of the estimates of the differences in the predicted means and is not a formal statistical inference procedure. On the other hand, the procedure based on the aggregate criterion of ADL proposed by Kroll et al. (Kroll, 2000) incorrectly formulated the hypothesis for proving linearity as the null hypothesis. As a result, the uncorrected Kroll method cannot control the type I error in deciding whether to conclude linearity. Therefore, we proposed the TOST procedure for the disaggregate criterion and the corrected Kroll method for the aggregate criterion based on ADL, both of which formulate the hypothesis for proving linearity as the alternative hypothesis. The simulation results and the numerical example described above demonstrate that the proposed TOST procedure and the corrected Kroll method not only can adequately control the type I error rate but also reach the conclusion consistent with the
