CHAPTER 4 TOST Procedure and Corrected Kroll’s Method
4.4 Numerical Example
We consider a hypothetical experiment for evaluating the linearity of a new analytical procedure for the determination of β-HCG (β-human chorionic gonadotropin, mIU/mL). The design consists of five dilutions with two replicates at each dilution. Table 4.4.1 presents a set of hypothetical measurements under this design. For the purpose of illustration, the allowable margin of percent bound for ADL is set at 0.05 for both the uncorrected and the corrected Kroll’s methods.
On the other hand, the allowable limit of μPi − μLi is set as 0.4 for the estimation method
Table 4.3.3 Results of empirical sizes (Uncorrected Kroll vs. Corrected Kroll and Estimation Method vs. TOST)
Sol.: Solution; Rep.: Replications; Uncorr.: Uncorrected; Corr.: Corrected.
Table 4.3.4 Results of empirical powers (Uncorrected Kroll vs. Corrected Kroll and Estimation Method vs. TOST)

                                Kroll’s Method
No. of Sol.  No. of Rep.  SD    Uncorr.  Corr.    EP6-A    TOST
5            2            0.1   1.0000   1.0000   0.9954   0.7616
                          0.2   1.0000   1.0000   0.9052   0.2976
             3            0.1   1.0000   1.0000   0.9998   0.9232
                          0.2   1.0000   1.0000   0.9470   0.4452
             4            0.1   1.0000   1.0000   0.9998   0.9754
                          0.2   1.0000   1.0000   0.9664   0.5518
7            2            0.1   1.0000   1.0000   0.9954   0.7754
                          0.2   1.0000   1.0000   0.9014   0.3168
             3            0.1   0.9998   0.9998   0.9994   0.9164
                          0.2   1.0000   1.0000   0.9450   0.4468
             4            0.1   1.0000   1.0000   1.0000   0.9704
                          0.2   1.0000   1.0000   0.9660   0.5570

Sol.: Solution; Rep.: Replications; Uncorr.: Uncorrected; Corr.: Corrected.
suggested in the approved CLSI guideline EP6-A, and for the TOST procedure.
Table 4.4.2 provides the results of the regression analyses for the linear, quadratic and cubic regression models. These results demonstrate that all estimates of the regression coefficients of the cubic model are significantly different from 0 at the 5% level (t0.025, 6 = 2.4469). In addition, the standard error of the residuals from the estimated cubic regression equation is 0.1799, which is at least 40% smaller than those from the linear and quadratic models.
Furthermore, the coefficient of determination, R², is also above 0.99. As a result, the cubic model is the best-fitted model among the three models recommended by the approved CLSI guideline EP6-A. Figure 4.4.1 presents the fitted cubic and linear regression equations together with the means at each of the five dilutions. It clearly shows that the relationship between the dilutions and the analytical results is nonlinear and that the cubic model is a better fit than the simple linear regression model.
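The model comparison described above can be sketched in a few lines of Python. The snippet below is only an illustration, assuming ordinary least squares on the raw replicates of Table 4.4.1 with the dilution number (1 to 5) as the regressor; the helper name `fit_polynomial` is hypothetical and not part of any guideline.

```python
import numpy as np

# Hypothetical β-HCG measurements from Table 4.4.1 (two replicates per dilution).
dilution = np.repeat(np.arange(1.0, 6.0), 2)
y = np.array([1.00, 0.99, 1.60, 1.59, 2.50, 2.60, 4.36, 4.39, 5.10, 5.00])


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit (hypothetical helper).

    Returns the coefficients (constant term first), the residual
    standard error S_y.x and the coefficient of determination R^2.
    """
    design = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = len(y) - (degree + 1)                          # LJ - d - 1
    s_yx = np.sqrt(resid @ resid / dof)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, s_yx, r2


fits = {d: fit_polynomial(dilution, y, d) for d in (1, 2, 3)}
for d, (coef, s_yx, r2) in fits.items():
    print(d, np.round(coef, 3), round(float(s_yx), 4), round(float(r2), 4))
```

Under these assumptions the fitted coefficients and residual standard errors agree with Table 4.4.2 to the reported precision, and the cubic residual standard error is roughly 41% smaller than that of the quadratic model.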
Table 4.4.3 gives the predicted means from the cubic and linear regression models at each of the five dilutions as well as their corresponding differences, while Table 4.4.4 presents the summarized results of linearity by the four methods. From these differences and the observed mean concentration, the observed ADL is 0.0842. With respect to the hypothesis in Eq. (2.3.2) and a margin of percent bound of 5%, the critical value in Eq. (2.3.3) is 0.0851, which is greater than the observed ADL of 0.0842. According to the decision rule of the uncorrected Kroll method, the analytical method can be concluded to be linear at the 5% significance level. However, it should be noted that, in this example, the uncorrected Kroll method still claims linearity even though the observed ADL of 0.0842 already exceeds the allowable percent bound of 0.05. On the other hand, with respect to the hypothesis in Eq. (4.2.1),
Table 4.4.1 Measurement of β-HCG (mIU/mL)

Dilution  Replicate 1  Replicate 2
1         1.00         0.99
2         1.60         1.59
3         2.50         2.60
4         4.36         4.39
5         5.10         5.00
Table 4.4.2 Summary of results of regression analyses of β-HCG

Order      Coefficient  Value    SE     t-test  Std err Sy.x  Degrees of freedom
Linear     α'           -0.354   0.234  -1.51
           β'1           1.089   0.071  15.44   0.3154        8
Quadratic  α''           0.156   0.461   0.34
           β''1          0.652   0.351   1.85
           β''2          0.073   0.058   1.27   0.3041        7
Cubic      α'''          2.263   0.626   3.62
           β'''1        -2.308   0.818  -2.82
           β'''2         1.202   0.304   3.96
           β'''3        -0.125   0.034  -3.74   0.1799        6
Figure 4.4.1 Regression curves for cubic versus linear models of β-HCG
the critical value in Eq. (4.2.2) is 0.0237. Since the observed ADL of 0.0842 is greater than 0.0237, we cannot reject the null hypothesis and cannot conclude the linearity of the analytical method at the 5% significance level. Unlike that of the uncorrected Kroll method, the conclusion of the corrected Kroll method is consistent with the evidence that the observed ADL of 0.0842 is greater than the allowable percent bound of 0.05.
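The arithmetic behind the observed ADL can be checked with a short Python sketch. It assumes that ADL is the square root of the mean squared difference between the two sets of predicted means, divided by the overall mean concentration; this form is an assumption here, although it does reproduce the reported value of 0.0842 from the entries of Table 4.4.3. The critical values are simply the ones quoted in the text, and the function name `observed_adl` is hypothetical.

```python
import math

# Values transcribed from Table 4.4.3 (β-HCG example).
observed_means = [0.995, 1.595, 2.550, 4.375, 5.050]
pred_linear = [0.735, 1.824, 2.913, 4.002, 5.091]
pred_cubic = [1.031, 1.450, 2.767, 4.230, 5.086]


def observed_adl(pred_best, pred_lin, means):
    """Assumed ADL form: root mean squared deviation between the two
    fitted curves, divided by the overall mean concentration."""
    n_levels = len(means)
    ssq = sum((p - l) ** 2 for p, l in zip(pred_best, pred_lin))
    return math.sqrt(ssq / n_levels) / (sum(means) / n_levels)


adl = observed_adl(pred_cubic, pred_linear, observed_means)

# Critical values quoted in the text, not recomputed here.
uncorrected_critical_value = 0.0851  # Eq. (2.3.3)
corrected_critical_value = 0.0237    # Eq. (4.2.2)

# Both rules claim linearity only when the observed ADL is below the cut-off.
uncorrected_claims_linear = adl < uncorrected_critical_value
corrected_claims_linear = adl < corrected_critical_value

print(round(adl, 4), uncorrected_claims_linear, corrected_claims_linear)
```

With the tabulated values, the uncorrected rule claims linearity while the corrected rule does not, which matches the discussion above.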
With respect to the estimation method suggested in the approved CLSI guideline EP6-A, the observed differences in the predicted means between the cubic and linear regression models at all dilutions are within the allowable margin of ±0.4. As a result, linearity is claimed by the estimation method. On the other hand, the results of the TOST procedure show that the 95% confidence intervals for μPi − μLi at the first two dilutions are not contained within (-0.4, 0.4). With respect to the hypotheses in Eq. (4.1.1), the analytical method cannot be concluded to be linear at the 5% significance level. Because the estimation method completely ignores the variability in the observed differences in the predicted means, its conclusion is made without any statement of the probability of type I error. In fact, as demonstrated by the simulation, the probability of type I error of the estimation method far exceeds the nominal significance level.
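The contrast between the two disaggregate rules can be illustrated with the following Python sketch. It rests on several assumptions that are not spelled out in this section: the difference in predicted means is estimated by WY with W = W_P − W_L, its variance is s² times the diagonal of WW', and the TOST intervals are built from two one-sided 95% bounds (equivalently, a 90% two-sided interval) using the tabulated quantile t0.05, 6 = 1.9432. All variable names are hypothetical.

```python
import numpy as np

# Hypothetical β-HCG data from Table 4.4.1.
x = np.repeat(np.arange(1.0, 6.0), 2)
y = np.array([1.00, 0.99, 1.60, 1.59, 2.50, 2.60, 4.36, 4.39, 5.10, 5.00])


def hat_matrix(x, degree):
    """Projection matrix of the polynomial model of the given degree."""
    design = np.vander(x, degree + 1, increasing=True)
    return design @ np.linalg.pinv(design)


H_cubic, H_linear = hat_matrix(x, 3), hat_matrix(x, 1)
W = H_cubic - H_linear

# Difference in predicted means (cubic minus linear), one value per dilution.
diff = (W @ y)[::2]

# Residual standard deviation of the cubic model with LJ - d - 1 = 6 d.f.
resid = y - H_cubic @ y
dof = len(y) - 4
s = np.sqrt(resid @ resid / dof)
se = s * np.sqrt(np.diag(W @ W.T))[::2]

MARGIN = 0.4
T_CRIT = 1.9432  # t_{0.05, 6}: assumed one-sided 95% bound on each side

lower, upper = diff - T_CRIT * se, diff + T_CRIT * se

estimation_ok = np.abs(diff) < MARGIN               # point estimates only
tost_ok = (lower > -MARGIN) & (upper < MARGIN)      # whole interval inside

for i in range(5):
    print(i + 1, round(float(diff[i]), 3), round(float(lower[i]), 3),
          round(float(upper[i]), 3), bool(estimation_ok[i]), bool(tost_ok[i]))
```

Under these assumptions every point estimate lies inside ±0.4, but the intervals at the first two dilutions extend beyond the margin, so the estimation method claims linearity whereas the interval-based rule does not.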
4.5 Summary
With respect to the disaggregate criterion, the estimation method suggested by the approved CLSI guideline ignores the variation of the estimates of the differences in the predicted means and is not a formal statistical inference procedure. On the other hand, the procedure based on the aggregate criterion of ADL proposed by Kroll et al. (Kroll, 2000) incorrectly formulated the hypothesis for proving linearity as the null hypothesis.
As a result, the uncorrected Kroll method cannot control the type I error in
Table 4.4.3 Mean differences between the best-fitted curve and simple linear regression equation of β-HCG

Result Mean  Predicted (Linear)  Predicted (Cubic)  Difference  % Difference
0.995        0.735               1.031               0.296      28.7
1.595        1.824               1.450              -0.374      25.8
2.550        2.913               2.767              -0.146       5.3
4.375        4.002               4.230               0.228       5.4
5.050        5.091               5.086              -0.005       0.1
Table 4.4.4 Results of the linearity by four different methods of β-HCG
NL: Conclusion of nonlinearity at the 5% nominal level; L: Conclusion of linearity at the 5% nominal level
concluding linearity. Therefore, we proposed the TOST procedure for the disaggregate criterion and the corrected Kroll method for the aggregate criterion based on ADL, both of which formulate the hypothesis of linearity as the alternative hypothesis. The simulation results and the numerical example described above demonstrate that the proposed TOST procedure and the corrected Kroll method not only control the type I error rate adequately but also reach conclusions consistent with the data.
The TOST procedure is constructed on a disaggregate criterion, which requires all differences in the predicted means between the best-fitted and linear models to be within the pre-specified allowable limit. It is therefore more conservative than the corrected Kroll’s method, which is based on an aggregate criterion and only requires ADL, a function of the standardized sum of squares of the differences in the predicted means between the best-fitted and linear models, to be within the pre-specified allowable percent bound. However, as mentioned before, inference based on ADL involves the estimation of the unknown non-centrality parameter and the average population mean concentration. When these estimates are treated as fixed constants, the simulation study shows that the empirical size can be inflated up to 0.078 at the 0.05 significance level. In the next chapter, we propose a GPQ-based statistical testing procedure for ADL to overcome the issue of the unknown parameters of the distribution of ADL.
Chapter 5
Generalized Pivotal Quantity Approach of ADL
In Chapter 4, we introduced the corrected Kroll’s method, which reformulates the inappropriate statistical hypothesis of the uncorrected Kroll’s method. However, as observed in the simulation results for the corrected Kroll’s method, the type I error rate is still inflated owing to variability in the estimation of the unknown non-centrality parameter of the chi-square distribution. To solve this issue, in this chapter we propose an alternative statistical testing procedure based on ADL by applying the generalized pivotal quantity approach introduced by Tsui and Weerahandi (Weerahandi, 1993).
5.1 Generalized Pivotal Quantity (GPQ)
Weerahandi (Tsui and Weerahandi, 1989) used a generalized p-value for comparing parameters of two regressions with unequal variances. Motivated by that application, Tsui and Weerahandi (Tsui and Weerahandi, 1989) gave the explicit definition of generalized p-values and showed that a generalized p-value is an exact probability of an extreme region.
Their proposed method has been successfully used to provide small-sample solutions for many hypothesis testing problems in which nuisance parameters are present and frequentist testing procedures are difficult to obtain or even nonexistent. Furthermore, Weerahandi (Weerahandi, 1993) extended the concept of generalized p-values and presented the generalized confidence interval (GCI) to construct an exact interval estimate.
Suppose that V is a random variable whose distribution depends on a vector of unknown parameters ζ = (θ, η), where θ is the parameter of interest and η is a vector of nuisance parameters. Let V also denote a random sample from this distribution and let v be its observed value.
Also let R = R(V; v, ζ) be a function of V, v and ζ. The random quantity R is said to be a GPQ if it satisfies the following two conditions:
(a) The distribution of R does not depend on any unknown parameters.
(b) The observed value of R, say r= R(v; v, ζ), is free of the vector of nuisance parameters η. In other words, the value of R at V = v should be a function only of (v, θ).
Specifically, if the observed value r = θ, then the GPQ is called a fiducial generalized pivotal quantity (FGPQ), and generalized confidence intervals (GCI) based on an FGPQ are proven to have asymptotically correct frequentist coverage probability in Hannig et al. (Hannig et al., 2006). Consequently, an upper 100(1-α)% generalized confidence limit for θ is given by R1−α, where R1−α is the 100(1-α)th percentile of the distribution of R. The percentiles of R can be estimated using Monte-Carlo algorithms.
5.2 Generalized Pivotal Quantity of ADL
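Following the regression models in Eq. (2.1.1), we adopt the same matrix form as in Chapter 4:

$\mathbf{Y}$ = the LJ×1 vector of observations,
$\boldsymbol{\mu}_P$ = the LJ×1 predicted mean vector of the best-fitted model,
$\boldsymbol{\mu}_L$ = the LJ×1 predicted mean vector of the linear model,

where $\mathbf{1}$ is the LJ×1 vector of 1s, $\mathbf{X} = (X_i)$, $\mathbf{X}^2 = (X_i^2)$, and $\mathbf{X}^3 = (X_i^3)$. L and J are the number of concentrations and the number of replicates, respectively.

We have $\hat{\mathbf{Y}}_P = \mathbf{W}_P\mathbf{Y}$ and $\hat{\mathbf{Y}}_L = \mathbf{W}_L\mathbf{Y}$ as the LS estimators of the predicted mean vectors, so that $\hat{\mathbf{Y}}_P - \hat{\mathbf{Y}}_L = \mathbf{W}\mathbf{Y}$ with $\mathbf{W} = \mathbf{W}_P - \mathbf{W}_L$, and $S^2$ is the residual mean square obtained from the best-fitted polynomial model with $LJ-d-1$ degrees of freedom. Under the assumption that the random errors in the above regression model are independently and identically distributed as a normal distribution with mean zero and variance $\sigma^2$, $\hat{\mathbf{Y}}_P - \hat{\mathbf{Y}}_L$ follows a multivariate normal distribution with mean $\boldsymbol{\mu}_P - \boldsymbol{\mu}_L$ and variance $\boldsymbol{\Sigma} = \sigma^2\mathbf{W}\mathbf{W}'$. In addition, $\bar{Y}$ can be expressed as $\mathbf{1}'\mathbf{Y}/LJ$, which follows a univariate normal distribution with mean $\mu$ and variance $\sigma^2/LJ$.

It is easy to verify that the estimators $\mathbf{W}\mathbf{Y}$, $S^2$ and $\bar{Y}$ are associated with pivotal quantities $\mathbf{Z}$, $U$ and $Z_\mu$, which are independent with the following distributions:

$$\mathbf{Z} = \boldsymbol{\Sigma}^{-1/2}\left[\mathbf{W}\mathbf{Y} - (\boldsymbol{\mu}_P - \boldsymbol{\mu}_L)\right] \sim N(\mathbf{0}, \mathbf{I}_{LJ}), \qquad U = \frac{(LJ-d-1)S^2}{\sigma^2} \sim \chi^2_{LJ-d-1}, \qquad Z_\mu = \frac{\bar{Y} - \mu}{\sigma/\sqrt{LJ}} \sim N(0, 1),$$

where the matrix $\boldsymbol{\Lambda}^{1/2}$ denotes the positive definite square root of a positive definite matrix $\boldsymbol{\Lambda}$ and $\boldsymbol{\Lambda}^{-1/2} = (\boldsymbol{\Lambda}^{1/2})^{-1}$. $N(\mathbf{0}, \mathbf{I}_{LJ})$, $\chi^2_{LJ-d-1}$ and $N(0, 1)$ denote the multivariate standard normal distribution of an LJ×1 random vector, the chi-square distribution with $LJ-d-1$ degrees of freedom and the univariate standard normal distribution, respectively.

Recall that ADL, denoted by $\theta$, is defined as

$$\theta = \frac{\sqrt{\sum_{i=1}^{L}(\mu_{Pi} - \mu_{Li})^2 / L}}{\mu}.$$

The quantity

$$R_\mu = \bar{y} - Z_\mu\left[\frac{(LJ-d-1)s^2}{LJ \cdot U}\right]^{1/2} = \bar{y} - \frac{(\bar{Y} - \mu)\,s}{S} \qquad (5.2.5)$$

has a distribution that is free of parameters and thus does not depend on any unknown parameters. When $\bar{Y}$ and $S^2$ are substituted by their observed values $\bar{y}$ and $s^2$ in (5.2.5), the observed value of $R_\mu$ is $\bar{y} - (\bar{y} - \mu)$, which is equal to $\mu$ and free of the nuisance parameters. Hence, it fulfills the requirements of (a) and (b) for being a GPQ for $\mu$.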
5.3 Generalized Confidence Interval of ADL
An upper 100(1-α)% generalized confidence limit for ADL can be obtained from the following Monte-Carlo algorithm:
Step 1: Choose a large simulation sample size, say K = 10,000. For k = 1, ..., K, carry out the following two steps.
Step 2: Generate an LJ×1 standard normal random vector Z, a univariate standard normal variable Zμ, and a central chi-square random variable U with LJ-d-1 degrees of freedom.
Step 3: For the realized values of Ȳ and S², compute Rθ,k defined in (5.2.9).
The required upper 100(1-α)th percentile of the distribution of the GPQ for ADL, which is also the upper 100(1-α)% generalized confidence limit for ADL, is then estimated by the 100(1-α)th sample percentile of the collection of K = 10,000 realizations Rθ,1, Rθ,2, ..., Rθ,10000.
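The Monte-Carlo steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of Eq. (5.2.9), which is not reproduced here: the GPQ for ADL is assumed to be obtained by plugging GPQs for the difference vector, for σ and for μ into the definition of ADL. The function name `gpq_adl_upper_limit`, the seed and the use of the calcium data of Table 5.6.1 as input are all choices made for this illustration.

```python
import numpy as np


def gpq_adl_upper_limit(x, y, degree, alpha=0.05, K=10_000, seed=0):
    """Monte-Carlo sketch of an upper generalized confidence limit for ADL
    (hypothetical helper; the form of R_theta is an assumption)."""
    rng = np.random.default_rng(seed)
    n = len(y)                                   # n = LJ

    def hat(deg):
        design = np.vander(x, deg + 1, increasing=True)
        return design @ np.linalg.pinv(design)

    H_best, H_lin = hat(degree), hat(1)
    W = H_best - H_lin                           # W = W_P - W_L
    dof = n - degree - 1                         # LJ - d - 1
    resid = y - H_best @ y
    s2 = resid @ resid / dof                     # observed S^2
    delta_hat = W @ y                            # observed WY
    y_bar = y.mean()

    # Symmetric square root of WW' via its eigendecomposition.
    vals, vecs = np.linalg.eigh(W @ W.T)
    root = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    # Step 2: draw the pivotal quantities K times.
    Z = rng.standard_normal((K, n))
    Z_mu = rng.standard_normal(K)
    U = rng.chisquare(dof, size=K)

    # Step 3: assumed GPQs for sigma, the difference vector, mu and ADL.
    R_sigma = np.sqrt(dof * s2 / U)
    R_delta = delta_hat - R_sigma[:, None] * (Z @ root)
    R_mu = y_bar - R_sigma * Z_mu / np.sqrt(n)
    R_theta = np.sqrt((R_delta ** 2).mean(axis=1)) / R_mu

    # Upper limit: the 100(1 - alpha)th sample percentile of R_theta.
    return float(np.quantile(R_theta, 1.0 - alpha))


# Calcium data of Table 5.6.1; quadratic taken as the best-fitted model.
x = np.repeat(np.arange(1.0, 6.0), 2)
y = np.array([4.7, 4.6, 7.8, 7.6, 10.4, 10.2, 13.0, 13.1, 15.5, 15.3])
upper = gpq_adl_upper_limit(x, y, degree=2)
print(round(upper, 4), upper < 0.05)
```

Because each predicted mean is repeated J times in the LJ×1 vectors, averaging the squared entries over all LJ positions equals averaging over the L concentrations. With the calcium data the limit should land near the 0.0218 reported for that example, up to Monte-Carlo error and the assumptions listed above.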
5.4 Statistical Testing Procedure
With respect to the hypothesis H0: θ ≥ θ0 vs. Ha: θ < θ0 based on ADL, the upper 100(1-α)% generalized confidence limit for ADL based on the GPQ can be used to test the statistical hypothesis for the assessment of linearity. The null hypothesis is rejected and the linearity of an analytical method is concluded at the α significance level if the upper 100(1-α)% generalized confidence limit for ADL is less than θ0.
5.5 Simulation Study
A simulation study is performed to compare the empirical sizes and powers of the corrected Kroll’s and GPQ-based ADL methods. The specifications of the simulation are as follows. The number of concentrations is set to be 5 or 7, the number of replicates at each concentration is 2, 3, or 4, and the standard deviation of the random error is 0.1 or 0.2. Throughout the simulation, the mean concentration μ is assumed to be 4. The allowable margin of linearity based on ADL, θ0, is specified as 0.05, as recommended by Kroll et al. (Kroll, 2000). For each of the 12 combinations, ten thousand (10,000) random samples are generated. For the 5% nominal significance level, a simulation study with 10,000 random samples implies that 95 percent of the empirical sizes evaluated at the allowable margins will be within 0.0457 and 0.0543 if the proposed methods can adequately control the size at the nominal level of 0.05.
The results of the empirical sizes are provided in Table 5.5.1. All empirical sizes of the corrected Kroll’s method are larger than 0.0543. This indicates that the corrected Kroll’s method inflates the size and is quite liberal in concluding the linearity of an analytical procedure. On the other hand, all empirical sizes of the GPQ-based ADL method are within the range between 0.0457 and 0.0543. The simulation results reveal that the GPQ-based method for ADL can adequately control the size at the nominal level. The reason for the better performance of the GPQ-based method may be that the distributions of the GPQs are free of their respective nuisance parameters.
On the other hand, the corrected Kroll’s method fails to take into account the variability in the estimator of the non-centrality parameter of the non-central chi-square distribution.
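The acceptance band of 0.0457 to 0.0543 follows from the normal approximation to the binomial distribution with 10,000 simulated samples. The short Python check below reproduces it and applies it to the empirical sizes of Table 5.5.1; the quantile 1.96 and the variable names are choices made for this illustration.

```python
import math

nominal, n_sim = 0.05, 10_000
z_975 = 1.96  # two-sided 95% normal quantile (assumed approximation)

half_width = z_975 * math.sqrt(nominal * (1.0 - nominal) / n_sim)
lower, upper = nominal - half_width, nominal + half_width
print(round(lower, 4), round(upper, 4))

# Empirical sizes transcribed from Table 5.5.1, in table order.
corrected_kroll = [0.0702, 0.0763, 0.0623, 0.0655, 0.0594, 0.0595,
                   0.0655, 0.0635, 0.0592, 0.0583, 0.0562, 0.0571]
gpq_based = [0.0467, 0.0517, 0.0502, 0.0517, 0.0505, 0.0508,
             0.0501, 0.0494, 0.0509, 0.0498, 0.0498, 0.0510]

kroll_all_above = all(size > upper for size in corrected_kroll)
gpq_all_inside = all(lower <= size <= upper for size in gpq_based)
print(kroll_all_above, gpq_all_inside)
```

Every corrected Kroll entry lies above the upper end of the band, while every GPQ-based entry lies inside it.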
The results of the empirical powers are presented in Table 5.5.2, in which the true value of ADL is assumed to be 0.005 for both 5 and 7 solutions. The results in Table 5.5.2 show that the empirical power is an increasing function of the number of replicates and the number of solutions. Although the empirical power of the corrected Kroll’s method is larger than that of the GPQ-based ADL method, its better performance in empirical power results from inflation of the size above the nominal
Table 5.5.1 Empirical sizes (Corrected Kroll’s method vs. GPQ-based ADL method)

No. of Solutions  No. of Replicates  Standard Deviation  Corrected Kroll  GPQ-based ADL
5                 2                  0.1                 0.0702           0.0467
                                     0.2                 0.0763           0.0517
                  3                  0.1                 0.0623           0.0502
                                     0.2                 0.0655           0.0517
                  4                  0.1                 0.0594           0.0505
                                     0.2                 0.0595           0.0508
7                 2                  0.1                 0.0655           0.0501
                                     0.2                 0.0635           0.0494
                  3                  0.1                 0.0592           0.0509
                                     0.2                 0.0583           0.0498
                  4                  0.1                 0.0562           0.0498
                                     0.2                 0.0571           0.0510
Table 5.5.2 Empirical powers with the true ADL=0.005 (Corrected Kroll’s method vs. GPQ-based ADL method)

No. of Solutions  No. of Replicates  Standard Deviation  Corrected Kroll  GPQ-based ADL
5                 2                  0.1                 1.0000           1.0000
                                     0.2                 0.9670           0.9331
                  3                  0.1                 1.0000           1.0000
                                     0.2                 0.9965           0.9942
                  4                  0.1                 1.0000           1.0000
                                     0.2                 0.9996           0.9995
7                 2                  0.1                 1.0000           1.0000
                                     0.2                 0.9923           0.9888
                  3                  0.1                 1.0000           1.0000
                                     0.2                 0.9996           0.9994
                  4                  0.1                 1.0000           1.0000
                                     0.2                 1.0000           1.0000
level. Figures 5.5.1 and 5.5.2 present the empirical powers when σ is 0.1 and 0.2, respectively, with 5 solutions and 3 replicates. The true values of ADL range from 0 to 0.08. A comparison of Figure 5.5.1 and Figure 5.5.2 reveals that the power of both methods is a decreasing function of σ. In Figure 5.5.1, when ADL = 0.05, the empirical sizes of the corrected Kroll’s and the GPQ-based ADL methods are 0.0623 and 0.0502, respectively. Similar findings are observed in Figure 5.5.2. Again, these results show that the corrected Kroll’s method inflates the size above the 0.05 level while the GPQ-based procedure can adequately control the size at the nominal level of 5%.
5.6 Numerical Example
Table 5.6.1 presents the duplicate determinations at the first five concentrations given in Example 2 of CLSI guideline EP6-A (Tholen et al., 2003), which we use to illustrate the proposed testing procedures for evaluating the linearity of an analytical procedure. Following EP6-A (Tholen et al., 2003), the criterion of μPi − μLi for linearity is set as 0.2 mg/dL for all 5 concentrations. In this example, the allowable margin of percent bound for ADL is set as 0.05 for all methods based on ADL, as suggested by Kroll et al. (Kroll, 2000). The results of the regression analyses for the linear, quadratic and cubic regression models are given in Table 5.6.2. From Table 5.6.2, the estimate of the regression coefficient β''2 of the quadratic model is statistically significantly different from 0 at the 5% level (t0.025, 7 = 2.3646), while neither β'''2 nor β'''3 is significantly different from 0 for the cubic model. In addition, the standard error of the residuals from the estimated quadratic regression equation is 0.124, which is smaller than the 0.2 set by the
Figure 5.5.1 Empirical powers when standard deviation of normal random error is 0.1, number of solutions is 5, and number of replicates is 3 (Corrected Kroll’s method vs. GPQ-based ADL method). [Plot of power (0.0 to 1.0) against ADL (0.00 to 0.08); curves: Corrected Kroll, GPQ-based ADL.]
Figure 5.5.2 Empirical powers when standard deviation of normal random error is 0.2, number of solutions is 5, and number of replicates is 3 (Corrected Kroll’s method vs. GPQ-based ADL method). [Plot of power (0.0 to 1.0) against ADL (0.00 to 0.08); curves: Corrected Kroll, GPQ-based ADL.]
manufacturer. Furthermore, R² is also above 0.99. As a result, the quadratic model is the best-fitted model among the three models recommended by the approved CLSI guideline EP6-A (Tholen et al., 2003).
The observed predicted means from the quadratic and linear regression models at each of the five dilutions as well as their corresponding differences are given in Table 5.6.3. The results of the corrected Kroll’s and the GPQ-based ADL methods are provided in Table 5.6.4. From the differences in the observed predicted means between the quadratic and linear regression models and the observed mean concentrations, the observed ADL is 0.0146. With respect to a margin of percent bound of 5%, the critical value is 0.0437, which is greater than the observed ADL of 0.0146. According to the decision rule of the corrected Kroll method, the analytical method can be concluded to be linear at the 5% significance level. The 95% upper confidence limit for ADL computed by the GPQ-based ADL method is 0.0218, which is smaller than the allowable upper limit of 0.05. Hence, the linearity of the analytical procedure can be concluded at the 5% significance level by the GPQ-based ADL procedure.
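The numbers in this example can be retraced from the raw calcium data with the Python sketch below. It assumes ordinary least squares on the dilution number (1 to 5) and the same ADL form as before (root mean squared difference in predicted means divided by the overall mean), which is an assumption of this illustration; the critical value 0.0437 and the generalized confidence limit 0.0218 are taken from the text rather than recomputed, and the helper name `fitted` is hypothetical.

```python
import numpy as np

# Calcium measurements (mg/dL) from Table 5.6.1, two replicates per dilution.
x = np.repeat(np.arange(1.0, 6.0), 2)
y = np.array([4.7, 4.6, 7.8, 7.6, 10.4, 10.2, 13.0, 13.1, 15.5, 15.3])


def fitted(x, y, degree):
    """Fitted values of a least-squares polynomial of the given degree."""
    design = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


# One predicted mean per dilution (every second entry of the LJ x 1 vectors).
pred_linear = fitted(x, y, 1)[::2]
pred_quadratic = fitted(x, y, 2)[::2]
difference = pred_quadratic - pred_linear

# Assumed ADL form: root mean squared difference over the overall mean.
adl = float(np.sqrt(np.mean(difference ** 2)) / y.mean())

# Decision rules with the values quoted in the text.
corrected_kroll_linear = adl < 0.0437   # observed ADL below the critical value
gpq_linear = 0.0218 < 0.05              # upper limit below the allowable bound

print(np.round(pred_linear, 2), np.round(pred_quadratic, 2))
print(round(adl, 4), corrected_kroll_linear, gpq_linear)
```

Under these assumptions the predicted means agree with Table 5.6.3 up to rounding, and the observed ADL rounds to 0.0146.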
5.7 Summary
The ADL proposed by Kroll et al. (Kroll, 2000) is an aggregate criterion constructed from the deviations from linearity scaled by the mean concentration. However, the sampling distribution of the observed ADL involves the unknown nuisance parameters μ and σ. On the other hand, the observed values of GPQs are free of the nuisance parameters. We therefore apply the GPQ method to the inference for the evaluation of linearity based on ADL. The simulation results presented above show that the corrected Kroll’s method inflates the type I error rate and the GPQ-based ADL method can control
Table 5.6.1 Measurement of calcium (mg/dL)

Dilution  Replicate 1  Replicate 2
1         4.7          4.6
2         7.8          7.6
3         10.4         10.2
4         13.0         13.1
5         15.5         15.3

Source: The approved CLSI guideline EP6-A (2003)
Table 5.6.2 Summary of results of regression analyses for the example of calcium

Order      Coefficient  LS Estimates  SE    t-test  SE Sy.x  Degrees of freedom
Linear     α'            2.16         0.15  14.3
           β'1           2.68         0.05  59.0    0.204    8
Quadratic  α''           1.54         0.19   8.2
           β''1          3.22         0.14  22.4
           β''2         -0.09         0.02  -3.8    0.124    7
Cubic      α'''          1.47         0.47   3.15
           β'''1         3.32         0.61   5.45
           β'''2        -0.13         0.23  -0.56
           β'''3         0.004        0.02   0.17   0.134    6

Source: The approved CLSI guideline EP6-A (2003)
the size at the nominal level. Moreover, the GPQ-based ADL procedure not only adequately controls the type I error rate but also has power similar to that of the corrected Kroll’s method. Therefore, we conclude that the GPQ-based ADL procedure is better than the corrected Kroll’s method for evaluating linearity in assay validation.
Table 5.6.3 Mean differences between the best-fitted curve and simple linear regression equation for the example of calcium

Result Mean  Predicted (Linear)  Predicted (Quadratic)  Difference  % Difference
4.65         4.85                4.67                   -0.18       -3.9
7.70         7.54                7.62                    0.08        1.0
10.30        10.22               10.40                   0.18        1.8
13.05        12.90               12.99                   0.09        0.7
15.40        15.59               15.41                  -0.18       -1.2

Source: The approved CLSI guideline EP6-A (2003)
Table 5.6.4 Results of the linearity evaluation for the example of calcium by corrected Kroll’s and GPQ-based ADL methods

Method           Sample Statistic / Critical Value or Allowable Bound  Conclusion
Corrected Kroll  Sample ADL              0.0146
                 Critical Value          0.0437                        Linear
GPQ-based ADL    Upper 95% C.L.          0.0218
                 Allowable Upper Bound   0.05                          Linear

Upper 95% C.L.: Upper 95% confidence limit
Chapter 6
Alternative Aggregate Criterion - Sum of Square of the Deviation from Linearity (SSDL)
In this chapter, we propose a new measure of the assessment of linearity named Sum of Square of the Deviation from Linearity (SSDL). As mentioned in Section 3.2.1 of Chapter 3, SSDL is formulated directly by the nature of disaggregate criterion proposed by CLSI guideline as the form of model-by-dilution interaction. However, its corresponding statistical hypothesis and testing procedure is not to detect existence of the model-by-dilution interaction but rather to verify whether the model-by-dilution