CHAPTER 9 Concluding Remarks
9.1 Conclusion
Linearity is one of the most important characteristics for evaluating accuracy and precision in assay validation. Even when the best-fitted model is not linear, linearity of the analytical procedure can still be claimed if the difference in the predicted means between the best-fitted and linear models is smaller than a pre-specified allowable limit at all concentrations employed in the validation experiment. As a result, the deviation from linearity is the fundamental unit for assessing bias in the evaluation of linearity.
With respect to the disaggregate criterion, the approved CLSI EP6-A guideline proposes an estimation method that compares the estimated differences in the predicted means directly with the pre-specified allowable limit, without a formal statistical inference procedure. This method completely ignores the variation of the estimates and therefore inflates the type I error of the evaluation. On the other hand, the ADL proposed by Kroll et al. (Kroll, 2000) is an aggregate criterion constructed from the sum of squares of the deviations from linearity, scaled by the mean concentration.
However, the statistical testing procedure proposed by Kroll et al. (Kroll, 2000) not only formulates the hypothesis for proving linearity incorrectly, but the distribution of ADL also contains unknown nuisance parameters, which makes it difficult to control the size at the nominal level. Therefore, we propose the TOST procedure and the corrected Kroll’s method to remedy the shortcomings of these two methods: the former provides a formal statistical testing procedure in place of the estimation method, and the latter reformulates the hypothesis of the uncorrected Kroll’s method correctly.
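The contrast between comparing a point estimate with the allowable limit and applying a formal equivalence test can be illustrated with a small simulation. The sketch below is a simplified illustration and not the simulation study reported in this work. It considers a single concentration level, assumes the estimated deviation is normally distributed with a known standard error (so normal rather than t quantiles are used), and places the true deviation exactly on the allowable limit, i.e., on the boundary of the null hypothesis of nonlinearity. The names `rejection_rates`, `delta`, and `se` and all numerical settings are hypothetical.

```python
import random
from statistics import NormalDist


def rejection_rates(delta=1.0, se=0.25, alpha=0.05, n_sim=20000, seed=1):
    """Estimate how often linearity is claimed when the true deviation equals delta.

    Simplifying assumptions (not from the source): one concentration level,
    estimate ~ Normal(true deviation, se^2) with known se.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)  # one-sided normal critical value
    naive = 0
    tost = 0
    for _ in range(n_sim):
        est = rng.gauss(delta, se)  # true deviation sits exactly on the limit
        # Direct comparison: claim linearity if the point estimate is inside the limit.
        if abs(est) < delta:
            naive += 1
        # TOST: claim linearity only if the (1 - 2*alpha) interval lies in (-delta, delta).
        if est - z * se > -delta and est + z * se < delta:
            tost += 1
    return naive / n_sim, tost / n_sim


naive_rate, tost_rate = rejection_rates()
print(f"direct comparison: {naive_rate:.3f}, TOST: {tost_rate:.3f}")
```

Under these assumptions, the direct comparison claims linearity in roughly half of the simulated experiments, whereas the rate for the TOST rule stays close to the nominal level `alpha`.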
The simulation results show that the proposed methods control the size better than the two current methods. Furthermore, to overcome the issue raised by the unknown nuisance parameters in the distribution of ADL, we propose the GPQ-based ADL method, which eliminates the unknown parameters from the distribution by applying the concept of the generalized confidence interval proposed by Weerahandi (Weerahandi, 1993).
The proposed GPQ method not only controls the size at the nominal level better than the corrected Kroll’s method but also maintains good power for the assessment of linearity in assay validation.
In addition to the ADL proposed by Kroll et al. (Kroll, 2000), we also introduce two new alternative criteria, SSDL and CVDL. SSDL is an unscaled measure formulated as the sum of squares of the deviations from linearity, while CVDL is a scaled measure, scaled by the variability of the best-fitted model. The major difference between CVDL and the other two aggregate criteria is that CVDL combines both the accuracy and the reliability of an analytical method into one measure. With respect to SSDL, one may consider the following test statistic for evaluating linearity using an F-test:
Under the null hypothesis of hypothesis (3.2.2.2), i.e., $\sum_{i=1}^{L} (\mu_{P_i} - \mu_{L_i})^2 = L\delta_0^2$, $\psi$ follows a non-central $F_{d-1,\,LJ-d-1}$ distribution with non-centrality parameter $LJ\delta_0^2/\sigma^2$. However, the non-centrality parameter of this distribution still contains the unknown parameter $\sigma^2$. If the test is performed based on $\psi$ and the non-central $F_{d-1,\,LJ-d-1}$ distribution with $\sigma^2$ replaced by its estimate, the type I error may still be inflated due to the variability of the estimate of $\sigma^2$. Therefore, the GPQ approach is proposed to resolve the issue of unknown parameters in the distributions of the estimators of each aggregate criterion. Our simulation results show that all three GPQ-based methods (ADL, SSDL, and CVDL) not only control the size better than the corrected Kroll’s method but also maintain good power. They also show that the GPQ-based ADL procedure is uniformly more powerful than the GPQ-based SSDL and CVDL methods.
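The effect of substituting an estimate for the error variance in the non-centrality parameter can be made concrete with a short simulation. The sketch below is a minimal illustration under assumed settings and does not reproduce the simulation study of this work. It assumes the variance estimate, multiplied by its degrees of freedom LJ-d-1 and divided by the true variance, follows a chi-square distribution, and it shows how widely the plug-in non-centrality parameter can scatter around its true value. The function name and all numerical values (`L=5`, `J=2`, `d=2`, `delta0=0.5`, `sigma2=1.0`) are hypothetical.

```python
import random


def plug_in_noncentrality(L=5, J=2, d=2, delta0=0.5, sigma2=1.0,
                          n_sim=20000, seed=1):
    """Simulate the plug-in non-centrality parameter L*J*delta0^2 / sigma2_hat.

    Assumes sigma2_hat * df / sigma2 ~ chi-square(df) with df = L*J - d - 1,
    the denominator degrees of freedom quoted in the text.
    """
    rng = random.Random(seed)
    df = L * J - d - 1
    true_nc = L * J * delta0 ** 2 / sigma2
    draws = []
    for _ in range(n_sim):
        # A chi-square(df) variate is a Gamma variate with shape df/2 and scale 2.
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        sigma2_hat = sigma2 * chi2 / df
        draws.append(L * J * delta0 ** 2 / sigma2_hat)
    draws.sort()
    p05 = draws[int(0.05 * n_sim)]  # empirical 5th percentile
    p95 = draws[int(0.95 * n_sim)]  # empirical 95th percentile
    return true_nc, p05, p95


true_nc, p05, p95 = plug_in_noncentrality()
print(f"true: {true_nc:.2f}, plug-in 5th-95th percentile: {p05:.2f} to {p95:.2f}")
```

With so few error degrees of freedom, the plug-in value spans a wide range around the true non-centrality parameter, so the reference distribution used for the test can differ noticeably from the correct one.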
In addition to the proposed GPQ approach, a bootstrap procedure may be a reasonable approach to the evaluation of linearity under the proposed aggregate criteria. However, bootstrap procedures suffer from the disadvantage that the sampling distributions of the observed ADL, SSDL, and CVDL involve unknown nuisance parameters, which must be replaced by their estimates when generating the bootstrap samples. Bootstrap procedures will therefore still inflate the type I error rate due to the variability of the estimates of the unknown parameters. On the other hand, the derivation of the generalized pivotal quantities is based on the sampling distributions of the sample mean and the mean square of the best-fitted model. As a result, our proposed GPQ procedures do incorporate the sampling variability of the estimated parameters. In addition, the observed GPQ is free of the nuisance parameters. This is another novelty of our proposed procedure, which applies the GPQ technique to resolve the issue of nuisance parameters in the inference on the proposed aggregate criteria for the evaluation of linearity.
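The way a generalized pivotal quantity absorbs the sampling variability of the estimates can be sketched for a deliberately simplified case. The code below is not the GPQ derived in this work for ADL, SSDL, or CVDL. It treats a single estimated deviation that is assumed to be normally distributed with variance `c` times the error variance, independent of a mean square whose scaled value follows a chi-square distribution, and it builds a generalized upper confidence limit for the squared deviation. The function name, the constant `c`, and the input values are hypothetical.

```python
import random


def gpq_upper_limit(d_obs, s2_obs, df, c, alpha=0.05, n_draws=20000, seed=1):
    """Generalized upper confidence limit for a squared deviation theta = d^2.

    Assumes d_hat ~ Normal(d, c * sigma^2), independent of
    df * s^2 / sigma^2 ~ chi-square(df).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        u = rng.gammavariate(df / 2.0, 2.0)      # chi-square(df) draw
        z = rng.gauss(0.0, 1.0)                  # standard normal draw
        r_sigma2 = df * s2_obs / u               # GPQ for sigma^2
        r_d = d_obs - z * (c * r_sigma2) ** 0.5  # GPQ for the deviation d
        draws.append(r_d ** 2)                   # GPQ for theta = d^2
    draws.sort()
    # The (1 - alpha) empirical quantile of the GPQ draws is the upper limit.
    return draws[int((1.0 - alpha) * n_draws)]


upper = gpq_upper_limit(d_obs=0.2, s2_obs=0.04, df=7, c=0.5)
print(f"generalized 95% upper limit for d^2: {upper:.3f}")
```

In this simplified setting, linearity would be claimed when the returned upper limit falls below the squared allowable limit. Only the observed values enter the computation, and the random draws do not depend on any unknown parameter.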
Another issue to be noted concerns the design of the experiment for the evaluation of linearity. It is well known that the variability of the predicted values of the fitted regression models becomes larger at concentration levels close to the start and end points of the range of selected concentration levels. Therefore, an optimal design should be considered that selects appropriate concentration levels, including the number of concentration levels, their values, and the number of samples at each level, while taking into account how the variability of the predicted values changes across concentration levels. Because one purpose of the evaluation of linearity is to determine the range of concentration levels over which the method is linear, after selecting the concentration levels without nonlinearity according to the criteria of the EP6-A guideline (Tholen et al., 2003), an equally spaced design, i.e., one with equal differences between neighboring concentration levels, is recommended as the most efficient design.
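The growth of the prediction variance toward the ends of the concentration range can be checked numerically for the simplest case. The sketch below assumes a straight-line fit with equal replication at each level and uses the standard result that the variance of the predicted mean at x, divided by the error variance, equals 1/n + (x - x_bar)^2 / Sxx. The function name and the five equally spaced levels are hypothetical, and higher-order polynomial fits are not covered.

```python
def prediction_variance_factors(levels, replicates):
    """Return Var(y_hat(x)) / sigma^2 at each level for a straight-line fit.

    Uses the simple-linear-regression formula 1/n + (x - x_bar)^2 / Sxx,
    with `replicates` observations at every concentration level.
    """
    n = len(levels) * replicates
    x_bar = sum(levels) / len(levels)
    sxx = replicates * sum((x - x_bar) ** 2 for x in levels)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in levels]


factors = prediction_variance_factors([1, 2, 3, 4, 5], replicates=2)
print([round(f, 3) for f in factors])  # prints [0.3, 0.15, 0.1, 0.15, 0.3]
```

In this example the variance factor at the lowest and highest levels is three times that at the center of the range.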
In our research, we introduce the TOST procedure for the disaggregate criterion as well as the GPQ-based procedures for the different aggregate criteria. All of the proposed procedures perform well in terms of size control and power for the assessment of linearity in assay validation. In addition, the evaluation procedure based on the disaggregate criterion is more conservative than that based on the aggregate criterion, because it requires the differences in predicted means between the best-fitted and linear models to be within the pre-specified limit for all solutions, whereas the aggregate criterion only requires the sum of squared deviations from linearity to be controlled within an aggregate limit. The choice between the disaggregate-based and aggregate-based procedures may depend on how accurate the assay method is required to be. The GPQ-based ADL procedure is recommended for the assessment of linearity in assay validation because it is uniformly more powerful than the other two GPQ-based methods. However, one may consider using CVDL as the criterion for assay validation if accuracy and reliability are to be evaluated simultaneously.
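The difference in stringency between the disaggregate and aggregate criteria can be seen in a small numerical example. The sketch below compares point estimates only and is not a substitute for the TOST or GPQ-based tests. It takes the disaggregate limit as `delta0` at every level and the aggregate limit as L times `delta0` squared for the unscaled sum of squared deviations, which mirrors the boundary of the null hypothesis stated earlier. The function name and the predicted means are hypothetical.

```python
def assess_point_estimates(best_fit_means, linear_means, delta0):
    """Compare point estimates with a disaggregate and an aggregate limit.

    Illustrative only: uses point estimates, not the TOST or GPQ tests.
    """
    devs = [p - l for p, l in zip(best_fit_means, linear_means)]
    ssdl = sum(dv ** 2 for dv in devs)  # unscaled sum of squared deviations
    # Disaggregate: every deviation must lie within the limit.
    disaggregate_ok = all(abs(dv) < delta0 for dv in devs)
    # Aggregate: only the total squared deviation is bounded.
    aggregate_ok = ssdl < len(devs) * delta0 ** 2
    return devs, ssdl, disaggregate_ok, aggregate_ok


best_fit = [1.1, 2.1, 3.1, 4.1, 5.9]  # hypothetical best-fitted predicted means
linear = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical linear predicted means
devs, ssdl, dis_ok, agg_ok = assess_point_estimates(best_fit, linear, delta0=0.5)
print(f"SSDL = {ssdl:.2f}, disaggregate: {dis_ok}, aggregate: {agg_ok}")
```

Here a single large deviation at the highest level violates the disaggregate limit, while the sum of squared deviations remains inside the aggregate limit, which is consistent with the disaggregate criterion being the more conservative of the two.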