CHAPTER 6 Alternative Aggregate Criterion – Sum of Square of the Deviation from
7.3 Generalized Confidence Interval of CVDL
An upper 100(1-α)th percentile GCI for CVDL can be obtained from the following Monte-Carlo algorithm:
Step 1: Choose a large simulation sample size, say K=10,000. For k equal to 1 through K, carry out the following two steps.
Step 2: Independently generate an LJ×1 standard normal random vector Z and a chi-square random variable U with LJ-d-1 degrees of freedom.
Step 3: For the realized values of Y and S², compute $R_{\eta}$ as defined in (7.2.3).
The required upper 100(1-α)th percentile of the distribution of the GPQ for CVDL is then estimated by the 100(1-α)th sample percentile of the collection of K=10,000 realizations $R_{\eta,1}, R_{\eta,2}, \ldots, R_{\eta,10000}$.
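The Monte-Carlo algorithm above can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the function name `gci_upper_limit` and the callable `gpq_realization` are hypothetical, and the callable stands in for the computation of $R_{\eta}$ in (7.2.3), which must be supplied by the user for the observed data.

```python
import numpy as np

def gci_upper_limit(gpq_realization, n_obs, df, alpha=0.05, K=10_000, seed=0):
    """Estimate the upper 100(1-alpha)th percentile of a GPQ by simulation.

    gpq_realization(Z, U) is a user-supplied (hypothetical) callable that
    returns one realization of the GPQ for the observed data, given a
    standard normal vector Z of length n_obs (= LJ) and a chi-square
    variate U with df (= LJ - d - 1) degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(K)
    for k in range(K):
        Z = rng.standard_normal(n_obs)  # Step 2: LJ x 1 standard normal vector
        U = rng.chisquare(df)           # Step 2: chi-square random variable
        draws[k] = gpq_realization(Z, U)  # Step 3: one realization of the GPQ
    # Sample percentile of the K realizations
    return float(np.quantile(draws, 1.0 - alpha))
```

The random number generator is seeded so that repeated runs give the same limit; a different seed changes the estimate only by Monte-Carlo error.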
7.4 Statistical Testing Procedure
The upper 100(1-α)% generalized confidence limit for CVDL based on the GPQ can be used to test the statistical hypothesis in (7.1.2) for linearity. The null hypothesis in (7.1.2) is rejected, and the linearity of an analytical method is concluded at the α significance level, if the upper 100(1-α)% generalized confidence limit for CVDL is less than η0.
7.5 Simulation Study
A simulation study is performed to compare the empirical sizes and powers of the corrected Kroll’s and GPQ methods based on CVDL. The specifications of the simulation study are as follows. The number of solutions (or dilutions) of different concentrations is set to 5 or 7, and the number of replications at each concentration is 2, 3, or 4. Throughout the simulation, the mean concentration μ is assumed to be 4. The allowable margin of linearity based on ADL, θ0, is specified at 0.05 as recommended by Kroll et al. (Kroll, 2000). The allowable limit for CVDL follows from the relationship CVDL = θ × μ/σ, where μ is the population mean concentration for all solutions of the assay and θ is ADL, so that η0 is 2 and 1 for σ being 0.1 and 0.2, respectively. For each of the 12 combinations, ten thousand (10,000) random samples are generated. For the 5% nominal significance level, a simulation study with 10,000 random samples implies that 95 percent of the empirical sizes evaluated at the allowable margins will be within 0.0457 and 0.0543 if the proposed methods can adequately control the size at the nominal level of 0.05.
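The range of 0.0457 to 0.0543 quoted above can be reproduced with a short calculation. The sketch below assumes that the range is the usual normal-approximation interval 0.05 ± 1.96 × sqrt(0.05 × 0.95 / 10,000); the function name `size_tolerance_range` is hypothetical, and the list of sizes is copied from the GPQ-based CVDL column of Table 7.5.1.

```python
import math

def size_tolerance_range(nominal=0.05, n_sim=10_000, z=1.96):
    """Normal-approximation 95% range for an empirical size
    when the true size equals the nominal level."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / n_sim)
    return nominal - half_width, nominal + half_width

low, high = size_tolerance_range()
print(f"{low:.4f} to {high:.4f}")

# GPQ-based CVDL empirical sizes copied from Table 7.5.1
gpq_sizes = [0.0540, 0.0467, 0.0513, 0.0511, 0.0489, 0.0509,
             0.0490, 0.0504, 0.0473, 0.0504, 0.0529, 0.0452]

# Entries that fall outside the range
outside = [s for s in gpq_sizes if not (low <= s <= high)]
print(outside)
```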
The results of the empirical sizes are provided in Table 7.5.1. All but one of the empirical sizes of the GPQ method based on CVDL are within the range between 0.0457 and 0.0543 (the exception is 0.0452, for 7 solutions, 4 replicates, and σ = 0.2, which lies slightly below the range), while all empirical sizes of the corrected Kroll’s method are larger than 0.0543. The simulation results reveal that the GPQ-based CVDL method can adequately control the size at the nominal level. The reason for the better performance of the GPQ-based CVDL method may be that the distribution of the GPQ is free of nuisance parameters. On the other hand, the corrected Kroll’s method fails to take into account the variability in the estimator of the non-centrality parameter of the non-central chi-square distribution.
The results of the empirical powers are presented in Table 7.5.2. In Table 7.5.2, the true value of ADL is assumed to be 0.005 for both 5 and 7 solutions. The results in Table 7.5.2 show that the empirical power of both methods increases with the number of replicates and the number of solutions. In addition, the empirical power of the GPQ-based CVDL method is competitive with that of the corrected Kroll’s method. Although the empirical power of the corrected Kroll’s method is larger than that of the GPQ-based CVDL method, its advantage in empirical power results from inflation of the size above the nominal level.
Figures 7.5.1 and 7.5.2 present the empirical powers of the two methods when σ is 0.1 and 0.2, respectively, with 5 solutions and 3 replicates. The true values of ADL range from 0 to 0.08. A comparison of Figure 7.5.1 and Figure 7.5.2 reveals that the power of both methods is a decreasing function of σ. The power curve of the GPQ-based CVDL method is uniformly lower than that of the corrected Kroll’s method. However, the empirical power of the GPQ-based CVDL method at ADL=0.05 is 0.0511, while that of the corrected Kroll’s method is 0.0623. This shows that the GPQ-based CVDL method can control the size at the nominal level while the corrected Kroll’s method cannot.
7.6 Numerical Example
The same calcium data used in the previous chapters are used to illustrate the proposed testing procedures for evaluating the linearity of an analytical procedure.
Following EP6-A (Tholen et al., 2003), the criterion of $\mu_{Pi} - \mu_{Li}$ for linearity is set as 0.2 mg/dL for all 5 concentrations. In this example, the allowable margin of percent bound for ADL is set as 0.05. The allowable limit of the GPQ-based SSDL is set as 0.2, which is the square of 0.2 mg/dL multiplied by the 5 concentrations. We also assume that the allowable repeatability set by the manufacturer is 0.2. Therefore, the allowable margin of the GPQ-based CVDL is 1, which is the square root of the allowable margin of 0.2 for SSDL divided by the product of 5 (concentrations) and the square of the repeatability of 0.2, i.e., $\eta = \sqrt{\tau/(L\sigma^2)}$. The results of the corrected Kroll’s and the GPQ-based CVDL methods are provided in Table 7.6.1. Linearity is concluded by the corrected Kroll’s method since the observed ADL of 0.0146 is less than the critical value of 0.0437 with respect to a margin of percent bound of 5%.
On the other hand, the 95% upper confidence limit for CVDL is 1.9125, which is larger than the allowable upper limit of 1.
As a result, the GPQ-based CVDL method cannot conclude the linearity of the analytical procedure at the 5% significance level. As shown in simulation results and
Table 7.5.1 Empirical sizes (corrected Kroll’s method vs. GPQ-based CVDL method)
No. of Solutions   No. of Replicates   Standard Deviation   Corrected Kroll   GPQ-based CVDL
5                  2                   0.1                  0.0702            0.0540
5                  2                   0.2                  0.0763            0.0467
5                  3                   0.1                  0.0623            0.0513
5                  3                   0.2                  0.0655            0.0511
5                  4                   0.1                  0.0594            0.0489
5                  4                   0.2                  0.0595            0.0509
7                  2                   0.1                  0.0655            0.0490
7                  2                   0.2                  0.0635            0.0504
7                  3                   0.1                  0.0592            0.0473
7                  3                   0.2                  0.0583            0.0504
7                  4                   0.1                  0.0562            0.0529
7                  4                   0.2                  0.0571            0.0452
Table 7.5.2 Empirical powers with the true ADL=0.005 (corrected Kroll’s method vs. GPQ-based CVDL method)
No. of Solutions   No. of Replicates   Standard Deviation   Corrected Kroll   GPQ-based CVDL
5                  2                   0.1                  1.0000            0.9876
5                  2                   0.2                  0.9670            0.7754
5                  3                   0.1                  1.0000            0.9989
5                  3                   0.2                  0.9965            0.9212
5                  4                   0.1                  1.0000            0.9998
5                  4                   0.2                  0.9996            0.9678
7                  2                   0.1                  1.0000            0.9979
7                  2                   0.2                  0.9923            0.8994
7                  3                   0.1                  1.0000            0.9999
7                  3                   0.2                  0.9996            0.9742
7                  4                   0.1                  1.0000            1.0000
7                  4                   0.2                  1.0000            0.9932
Figure 7.5.1 The empirical powers when standard deviation of normal random error is 0.1, number of solutions is 5, and number of replicates is 3 (corrected Kroll’s method vs. GPQ-based CVDL method). [Plot: power (0.0 to 1.0) against ADL (0.00 to 0.08); curves: Corrected Kroll, GPQ-based CVDL]
Figure 7.5.2 The empirical powers when standard deviation of normal random error is 0.2, number of solutions is 5, and number of replicates is 3 (corrected Kroll’s method vs. GPQ-based CVDL method). [Plot: power (0.0 to 1.0) against ADL (0.00 to 0.08); curves: Corrected Kroll, GPQ-based CVDL]
Table 7.6.1 Results of the linearity evaluation for the example of calcium by corrected Kroll’s and GPQ-based CVDL methods
Method             Sample Statistic / Critical Value or Allowable Bound   Conclusion
Corrected Kroll    Sample ADL: 0.0146                                     Linear
                   Critical Value: 0.0437
GPQ-based CVDL     Upper 95% C.L.: 1.9125                                 Nonlinear
                   Allowable Upper Bound: 1
95% C.L.: Upper 95% confidence limit
7.7 Summary
Both ADL and CVDL are aggregate criteria for assessment of linearity in assay validation. The main difference between these two criteria is that the proposed CVDL contains information not only on the deviations from linearity but also on the repeatability of the analytical procedure. The simulation results presented above show that the corrected Kroll’s method inflates the type I error rate, whereas the GPQ-based CVDL method can control the size at the nominal level. In addition, the GPQ-based CVDL method maintains good power. Therefore, we conclude that the GPQ-based CVDL method, with respect to the statistical hypothesis in (7.1.2), is better than the corrected Kroll’s method for evaluating linearity in assay validation.
Chapter 8
Discussion and Summary
Various aggregate criteria, including ADL, SSDL and CVDL, for evaluating linearity in assay validation were introduced in Chapters 2 to 7. Although these criteria are built from different components and therefore have different characteristics, their common feature is that all of them contain the sum of squares of the deviations from linearity as the major component. In this chapter, we discuss the relationship among these criteria. In addition, the results of the simulation study and the numerical example are used to compare the performance and characteristics of each aggregate criterion for the assessment of linearity in assay validation.
8.1 Relationship of the Aggregate Criteria
Recall that SSDL (τ), ADL (θ) and CVDL (η) were defined in the previous chapters as follows:
$$\tau = \sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^2, \qquad \theta = \frac{\sqrt{\tau/L}}{\mu}, \qquad \eta = \frac{\sqrt{\tau/L}}{\sigma},$$
where $\mu_{Pi}$ and $\mu_{Li}$ are the predicted means of the best-fit polynomial model and the linear regression model at the ith concentration, i = 1,.., L, respectively; μ is the population mean concentration for all solutions of the assay; and σ² is the residual variance under the best-fitted model. SSDL is the common component of each criterion. Their relationship can easily be written as follows:
$$\tau = L(\mu\theta)^2 = L(\eta\sigma)^2 \qquad (8.1.1)$$
Unlike SSDL, which is an unscaled measure defined as the pure sum of squares of the deviations from linearity, both CVDL and ADL are scaled measures. ADL is the square root of the average squared deviation, scaled by μ, while CVDL is scaled by the variability or repeatability of the best-fitted model.
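As a quick numerical check of relationship (8.1.1), the following sketch converts an ADL margin into the corresponding SSDL and CVDL margins. The function names are hypothetical; the input values (θ0 = 0.05, μ = 4, L = 5 or 7, σ = 0.1 or 0.2) are the settings used in the simulation studies of this work.

```python
import math

def ssdl_from_adl(theta, mu, L):
    """tau = L * (mu * theta)^2, the first equality in (8.1.1)."""
    return L * (mu * theta) ** 2

def cvdl_from_ssdl(tau, sigma, L):
    """eta solved from tau = L * (eta * sigma)^2, the second equality in (8.1.1)."""
    return math.sqrt(tau / L) / sigma

def adl_from_ssdl(tau, mu, L):
    """theta solved from tau = L * (mu * theta)^2."""
    return math.sqrt(tau / L) / mu

for L in (5, 7):
    tau0 = ssdl_from_adl(theta=0.05, mu=4.0, L=L)
    for sigma in (0.1, 0.2):
        eta0 = cvdl_from_ssdl(tau0, sigma, L)
        print(f"L={L}, sigma={sigma}: tau0={tau0:.2f}, eta0={eta0:.2f}")
```

Up to floating-point rounding, the SSDL margins come out as 0.2 and 0.28 for 5 and 7 concentrations, and the CVDL margins as 2 and 1 for σ of 0.1 and 0.2.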
8.2 Comparison by Simulation Study
A simulation study was employed to compare the empirical sizes and powers of the three GPQ-based methods for ADL, SSDL and CVDL. Parts of the results from the same simulation study were presented in Chapters 5 and 7 for comparing the performance of the corrected Kroll’s method with the GPQ-based ADL and GPQ-based CVDL methods, respectively. As described in Chapters 5 and 7, the specifications of the simulation study are as follows. The number of solutions (or dilutions) of different concentrations is set to 5 or 7, and the number of replications at each concentration is 2, 3, or 4.
Throughout the simulation, the mean concentration μ is assumed to be 4. The allowable margin of linearity based on ADL, θ0, is specified at 0.05. From the relationship $\tau = L(\mu\theta)^2 = L(\eta\sigma)^2$ in Eq. (8.1.1), it follows that the margins for SSDL are 0.2 and 0.28 for 5 and 7 concentrations, respectively. In addition, with the standard deviation of the normal random error specified as 0.1 and 0.2, the allowable limit η0 is 2 and 1, respectively. For each of the 12 combinations, ten thousand (10,000) random samples are generated. For the 5% nominal significance level, a simulation study with 10,000 random samples implies that 95 percent of the empirical sizes evaluated at the allowable margins will be within 0.0457 and 0.0543 if the proposed methods can adequately control the size at the nominal level of 0.05.
The results of the empirical sizes are provided in Table 8.2.1. All but one of the empirical sizes of the GPQ methods based on ADL, SSDL and CVDL are within the range between 0.0457 and 0.0543; the exception is 0.0452 for the GPQ-based CVDL method with 7 solutions, 4 replicates, and σ = 0.2, which lies slightly below the range. The simulation results reveal that the GPQ-based methods for SSDL, ADL, and CVDL can adequately control the size at the nominal level. On the other hand, according to the empirical powers of the three GPQ-based methods presented in Table 8.2.2, the empirical power of the GPQ-based ADL method is at least as large as that of the GPQ-based SSDL method, which in turn is generally larger than that of the GPQ-based CVDL method (the exception in Table 8.2.2 is the case of 5 solutions, 2 replicates, and σ = 0.2).
Figures 8.2.1 and 8.2.2 present the empirical powers of the three GPQ-based methods when σ is 0.1 and 0.2, respectively, with 5 solutions and 3 replicates. For Figure 8.2.1, when ADL = 0.05, the empirical sizes of the three GPQ-based methods are 0.0499, 0.0513, and 0.0502 for SSDL, CVDL, and ADL, respectively.
Similar findings are observed in Figure 8.2.2. Again, these results show that the three GPQ-based procedures can adequately control the size at the nominal level of 5%.
Moreover, both figures demonstrate that the GPQ-based ADL procedure is uniformly more powerful than the GPQ-based SSDL method, which is in turn uniformly more powerful than the GPQ-based CVDL method. For example, in Figure 8.2.1 when ADL is 0.03, the empirical powers are 0.9426, 0.7803, and 0.5337 for the GPQ-based ADL, SSDL, and CVDL methods, respectively. In other words, the GPQ-based ADL
Table 8.2.1 Empirical sizes (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods)
No. of Solutions   No. of Replicates   Standard Deviation   GPQ-based SSDL   GPQ-based CVDL   GPQ-based ADL
5                  2                   0.1                  0.0462           0.0540           0.0467
5                  2                   0.2                  0.0523           0.0467           0.0517
5                  3                   0.1                  0.0499           0.0513           0.0502
5                  3                   0.2                  0.0522           0.0511           0.0517
5                  4                   0.1                  0.0498           0.0489           0.0505
5                  4                   0.2                  0.0504           0.0509           0.0508
7                  2                   0.1                  0.0504           0.0490           0.0501
7                  2                   0.2                  0.0495           0.0504           0.0494
7                  3                   0.1                  0.0505           0.0473           0.0509
7                  3                   0.2                  0.0495           0.0504           0.0498
7                  4                   0.1                  0.0498           0.0529           0.0498
7                  4                   0.2                  0.0498           0.0452           0.0510
Table 8.2.2 Empirical powers with the true ADL=0.005 (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods)
No. of Solutions   No. of Replicates   Standard Deviation   GPQ-based SSDL   GPQ-based CVDL   GPQ-based ADL
5                  2                   0.1                  0.9995           0.9876           1.0000
5                  2                   0.2                  0.6976           0.7754           0.9331
5                  3                   0.1                  1.0000           0.9989           1.0000
5                  3                   0.2                  0.9326           0.9212           0.9942
5                  4                   0.1                  1.0000           0.9998           1.0000
5                  4                   0.2                  0.9814           0.9678           0.9995
7                  2                   0.1                  1.0000           0.9979           1.0000
7                  2                   0.2                  0.9123           0.8994           0.9888
7                  3                   0.1                  1.0000           0.9999           1.0000
7                  3                   0.2                  0.9850           0.9742           0.9994
7                  4                   0.1                  1.0000           1.0000           1.0000
7                  4                   0.2                  0.9981           0.9932           1.0000
Figure 8.2.1 The empirical powers when standard deviation of normal random error is 0.1, number of solutions is 5, and number of replicates is 3 (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods). [Plot: power (0.0 to 1.0) against ADL (0.00 to 0.08); curves: GPQ-based SSDL, GPQ-based CVDL, GPQ-based ADL]
Figure 8.2.2 The empirical powers when standard deviation of normal random error is 0.2, number of solutions is 5, and number of replicates is 3 (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods). [Plot: power (0.0 to 1.0) against ADL (0.00 to 0.08); curves: GPQ-based SSDL, GPQ-based CVDL, GPQ-based ADL]
procedure is about 40 percentage points more powerful than the GPQ-based CVDL method and about 16 percentage points more powerful than the GPQ-based SSDL method at an ADL of 0.03. Therefore, the improvement in power provided by the GPQ-based ADL method is substantial.
8.3 Numerical Example
The previous example of calcium is used to illustrate the proposed testing procedures for evaluating the linearity of an analytical procedure. With the criterion of $\mu_{Pi} - \mu_{Li}$ for linearity and the criterion for repeatability set at 0.2 mg/dL and 0.2 mg/dL, respectively, and the allowable margin of percent bound for ADL set as 0.05, the corresponding criteria and results of the three GPQ-based ADL, SSDL and CVDL methods are presented in Table 8.3.1. The results show that the 95% upper confidence limit for ADL computed by the GPQ method is 0.0218, which is smaller than the allowable upper limit of 0.05. Hence, the linearity of the analytical procedure can be concluded at the 5% significance level by the GPQ-based ADL procedure. On the other hand, the 95% upper confidence limits of the GPQ-based SSDL and CVDL methods are 0.2471 and 1.9125, respectively.
Both 95% upper confidence limits are larger than their respective allowable upper limits of 0.2 and 1. As a result, neither method can conclude the linearity of the analytical procedure at the 5% significance level. The results presented above show that the GPQ-based methods can reach different conclusions. As shown in the simulation results, all three GPQ-based methods can control the size at the nominal level of 0.05, and the GPQ-based ADL method is uniformly more powerful than the other two GPQ-based methods.
This might be one of the reasons why the linearity can be claimed by the GPQ-based ADL method.
Table 8.3.1 Results of the linearity evaluation by three different methods
Method             Sample Statistic / Critical Value or Allowable Bound   Conclusion
GPQ-based SSDL     Upper 95% C.L.: 0.2471                                 Nonlinear
                   Allowable Upper Bound: 0.2
GPQ-based CVDL     Upper 95% C.L.: 1.9125                                 Nonlinear
                   Allowable Upper Bound: 1
GPQ-based ADL      Upper 95% C.L.: 0.0218                                 Linear
                   Allowable Upper Bound: 0.05
95% C.L.: Upper 95% confidence limit
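The decision rule applied in this example can be written out in a few lines. The sketch below only restates the comparison of each upper 95% confidence limit with its allowable upper bound, using the values reported for the calcium example; the function name `conclude_linearity` is hypothetical.

```python
def conclude_linearity(upper_limit, allowable_bound):
    """Linearity is concluded when the upper confidence limit
    is less than the allowable upper bound."""
    return upper_limit < allowable_bound

# (upper 95% confidence limit, allowable upper bound) from the calcium example
results = {
    "GPQ-based SSDL": (0.2471, 0.2),
    "GPQ-based CVDL": (1.9125, 1.0),
    "GPQ-based ADL": (0.0218, 0.05),
}

conclusions = {
    method: "Linear" if conclude_linearity(ucl, bound) else "Nonlinear"
    for method, (ucl, bound) in results.items()
}
for method, verdict in conclusions.items():
    print(f"{method}: {verdict}")
```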
8.4 Summary
In this chapter, we discuss the relationship among the three aggregate criteria ADL, SSDL and CVDL. As mentioned in Section 8.1, SSDL, i.e., the sum of squares of the deviations from linearity, is the basis of all three criteria. On the other hand, ADL and CVDL are scaled measures, scaled by the average concentration and the repeatability, respectively. As demonstrated by the simulation results, all three GPQ-based ADL, SSDL, and CVDL methods can control the size at the nominal level.
Moreover, the simulation results reveal that the GPQ-based ADL procedure is uniformly more powerful than the GPQ-based SSDL and CVDL methods. In addition, the CVDL method is the most conservative procedure among the three GPQ-based methods. This may be because it is scaled by the repeatability and requires both the predicted means and the repeatability of the best-fitted model to meet the allowable limits. On the other hand, the GPQ-based ADL procedure not only adequately controls the type I error rate but is also uniformly more powerful than the other GPQ-based methods. Therefore, the GPQ-based ADL procedure is recommended as the better procedure for evaluating linearity in assay validation among the three GPQ-based methods.
However, as the GPQ-based CVDL procedure considers linearity and repeatability simultaneously in one measure, one may consider using CVDL as the criterion for assay validation if accuracy and reliability are to be evaluated simultaneously.
Chapter 9
Concluding Remarks
9.1 Conclusion
One of the most important characteristics for evaluation of accuracy and precision in assay validation is linearity. Even if the best-fitted model is not linear, linearity of the analytical procedure can still be claimed if the difference in the predicted means between the best-fitted and linear models is smaller than some pre-specified allowable limit at all concentrations employed in the validation experiment. As a result, the deviation from linearity is the fundamental unit for assessing bias in the evaluation of linearity.
With respect to the disaggregate criterion, the approved CLSI EP6-A guideline proposes an estimation method that compares the estimates of the differences in the predicted means directly with the pre-specified allowable limit, without a formal statistical inference procedure. The method completely ignores the variation of the estimates and inflates the type I error of the evaluation. On the other hand, the ADL proposed by Kroll et al. (Kroll, 2000) is an aggregate criterion constructed from the sum of squares of the deviations from linearity scaled by the mean concentration.
However, the statistical testing procedure proposed by Kroll et al. (Kroll, 2000) not only incorrectly formulates the hypothesis for proving linearity but also contains unknown nuisance parameters in the distribution of ADL, which causes a problem in controlling the size at the nominal level. Therefore, we propose the TOST procedure and the corrected Kroll’s method to address the shortcomings of the above two methods, by providing a formal statistical testing procedure instead of the estimation method and by correctly reformulating the hypothesis of the uncorrected Kroll’s method, respectively.
The simulation results show that the proposed methods can control the size better than the two current methods. On the other hand, to overcome the issue raised by the unknown nuisance parameters in the distribution of ADL, we propose the GPQ-based ADL method, which eliminates the unknown parameter in the distribution by applying the concept of the generalized confidence interval proposed by Weerahandi (Weerahandi, 1993).
The proposed GPQ method not only controls the size at the nominal level better than the corrected Kroll’s method but also maintains good power for the assessment of linearity in assay validation.
In addition to the ADL proposed by Kroll et al. (Kroll, 2000), we also introduce two new alternative criteria, SSDL and CVDL. SSDL is an unscaled measure formulated as the sum of squares of the deviations from linearity, while CVDL is a scaled measure, scaled by the variability of the best-fitted model, for the assessment of linearity. The major difference between CVDL and the other two aggregate criteria is that CVDL combines both the accuracy and the reliability of an analytical method into one measure. With respect to SSDL, one may consider the following test statistic for evaluating linearity using an F-test:
Under the null hypothesis in (3.2.2.2), i.e., $\sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^2 = L\delta_0^2$, ψ is distributed as a non-central $F_{d-1,\,LJ-d-1}$ distribution with non-centrality parameter $LJ\delta_0^2/\sigma^2$. However, there is still an unknown parameter $\sigma^2$ in the non-centrality parameter of the distribution. If the statistical testing is performed based on ψ with the non-central $F_{d-1,\,LJ-d-1}$ distribution by substituting an estimate for $\sigma^2$, the type-I error may still