CVDL and Statistical Hypothesis - Alternative Aggregate Criterion

CHAPTER 6 Alternative Aggregate Criterion – Sum of Square of the Deviation from

7.1 CVDL and Statistical Hypothesis

As introduced in Section 3.2.3 of Chapter 3, CVDL is defined as:

$$\eta = \frac{\sqrt{\sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^{2} / L}}{\sigma}, \qquad (7.1.1)$$

CVDL contains information on both the deviation from linearity and the repeatability, in terms of $\sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^{2}$, the sum of squares of the differences in predicted values between the best-fitted model and the linear model, and σ², the residual variance under the best-fitted model as defined in the previous chapters.

The corresponding hypothesis for assessing linearity is then given as:

$$H_0: \eta \ge \eta_0 \quad \text{vs.} \quad H_a: \eta < \eta_0, \qquad (7.1.2)$$

where η0 is the allowable limit of CVDL.

An estimator of CVDL can also be expressed in terms of the estimated SSDL as

$$\hat{\eta} = \frac{\sqrt{\sum_{i=1}^{L} (\hat{Y}_{Pi} - \hat{Y}_{Li})^{2} / L}}{s}, \qquad (7.1.3)$$

where s is the square root of the residual mean square obtained from the best-fitted model with LJ-d-1 degrees of freedom, and d is the degrees of freedom for regression of the best-fitted model.
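As a rough illustration of the estimator in (7.1.3), the following Python sketch computes the estimate from the fitted values of the two models and the residual standard deviation s. It assumes the estimator is the square root of the average squared difference between the two sets of predicted values, divided by s. The function name `cvdl_estimate` and the numbers in the example call are hypothetical and are not taken from the calcium data.

```python
import math


def cvdl_estimate(pred_best, pred_linear, s):
    """Estimate CVDL as sqrt(mean squared deviation from linearity) / s.

    pred_best   : predicted values of the best-fitted model, one per concentration
    pred_linear : predicted values of the linear model at the same concentrations
    s           : square root of the residual mean square of the best-fitted model
    """
    if len(pred_best) != len(pred_linear) or not pred_best:
        raise ValueError("both prediction lists must be non-empty and of equal length")
    if s <= 0:
        raise ValueError("s must be positive")
    # Estimated SSDL: sum of squared differences between the two sets of predictions.
    ssdl = sum((p - q) ** 2 for p, q in zip(pred_best, pred_linear))
    return math.sqrt(ssdl / len(pred_best)) / s


# Hypothetical predictions at L = 5 concentrations with s = 0.2.
eta_hat = cvdl_estimate([2.0, 3.1, 4.0, 4.9, 6.0], [2.0, 3.0, 4.0, 5.0, 6.0], 0.2)
print(round(eta_hat, 4))
```

The estimate alone does not account for sampling variability, which is why the testing procedure relies on an upper confidence limit rather than on this point estimate.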

7.2 Generalized Pivotal Quantity of CVDL

As can be seen from the definition of CVDL in Eq. (7.1.1), the numerator involves exactly the SSDL, whose GPQ, denoted by $R_{\tau}$, was introduced in Chapter 5. In addition, a GPQ of σ² can be obtained as

$$R_{\sigma^2} = \frac{(LJ-d-1)\, s^{2}}{U} \qquad (7.2.1)$$

$$\phantom{R_{\sigma^2}} = \frac{\sigma^{2}}{S^{2}}\, s^{2}, \qquad (7.2.2)$$

where $U = (LJ-d-1) S^{2} / \sigma^{2}$ is the same chi-square random variable with LJ-d-1 degrees of freedom defined in Eq. (5.2.2).

From (7.2.1), $R_{\sigma^2}$ has a distribution that is free of unknown parameters. In addition, when S² is replaced by its observed value s² in (7.2.2), the observed value of $R_{\sigma^2}$, denoted by $r_{\sigma^2}$, is equal to σ² and is free of any nuisance parameter. Hence, it fulfills the two requirements (a) and (b) described in Section 5.1 for being a GPQ for σ².

Therefore, a GPQ of CVDL can be obtained by:

$$R_{\eta} = \sqrt{\frac{R_{\tau}}{L\, R_{\sigma^2}}}. \qquad (7.2.3)$$

7.3 Generalized Confidence Interval of CVDL

An upper 100(1-α)% generalized confidence limit for CVDL can be obtained from the following Monte-Carlo algorithm:

Step 1: Choose a large simulation sample size, say K=10,000. For k equal to 1 through K, carry out the following two steps.

Step 2: Independently generate an LJ×1 standard normal random vector Z and a chi-square random variable U with LJ-d-1 degrees of freedom.

Step 3: For the realized values of Y and S², compute $R_{\eta}$ as defined in (7.2.3).

The required upper 100(1-α)th percentile of the distribution of the GPQ for CVDL is then estimated by the 100(1-α)th sample percentile of the collection of K=10,000 realizations $R_{\eta,1}, R_{\eta,2}, \ldots, R_{\eta,10000}$.
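The algorithm above can be sketched in Python as follows. This is only a sketch under stated assumptions: the GPQ of σ² is taken to be (LJ-d-1)s²/U, the GPQ of CVDL is assumed to have the plug-in form sqrt(R_τ / (L · R_σ²)), and the GPQ of SSDL from Chapter 5 is not reproduced in this chapter, so it enters through a hypothetical caller-supplied function `r_tau(z, u)`. All function and parameter names are illustrative.

```python
import math
import random


def upper_gcl_cvdl(r_tau, s2, n_obs, df, n_levels, alpha=0.05, K=10_000, seed=0):
    """Monte Carlo estimate of the upper 100(1-alpha)% generalized confidence limit.

    r_tau    : hypothetical callable (z, u) -> realized GPQ of SSDL (Chapter 5)
    s2       : observed residual mean square of the best-fitted model
    n_obs    : total number of observations, L * J
    df       : residual degrees of freedom, L * J - d - 1
    n_levels : number of concentrations, L
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(K):                                    # Step 1: K replications
        z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]   # Step 2: Z, standard normal vector
        u = rng.gammavariate(df / 2.0, 2.0)               # Step 2: U ~ chi-square(df)
        r_sigma2 = df * s2 / u                            # assumed GPQ of sigma^2
        # Step 3: assumed plug-in GPQ of CVDL; negative R_tau values are floored at 0.
        r_eta = math.sqrt(max(r_tau(z, u), 0.0) / (n_levels * r_sigma2))
        draws.append(r_eta)
    draws.sort()
    # Index of the 100(1-alpha)th sample percentile among the K sorted realizations.
    index = min(K - 1, math.ceil((1.0 - alpha) * K) - 1)
    return draws[index]
```

Z and U are drawn inside the same iteration so that a caller-supplied `r_tau` can share the same U with the GPQ of σ², as Step 2 requires. The standard library's `random.gammavariate(df / 2, 2)` is used because a gamma variable with shape df/2 and scale 2 is chi-square distributed with df degrees of freedom.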

7.4 Statistical Testing Procedure

The upper 100(1-α)% generalized confidence limit for CVDL based on the GPQ can be used to test the statistical hypothesis in (7.1.2) for linearity. The null hypothesis in (7.1.2) is rejected, and the linearity of an analytical method is concluded at the α significance level, if the upper 100(1-α)% generalized confidence limit for CVDL is less than η0.
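The decision rule amounts to a single comparison, which can be written as a small Python helper. The function name `conclude_linearity` is hypothetical; the values in the example call are the upper limit 1.9125 and the allowable limit 1 reported for the calcium example in Table 7.6.1.

```python
def conclude_linearity(upper_limit, eta0):
    """Return True when H0: eta >= eta0 is rejected, i.e. linearity is concluded."""
    return upper_limit < eta0


# Calcium example: 1.9125 is not below the allowable limit of 1, so H0 is not rejected.
print(conclude_linearity(1.9125, 1.0))
```

The comparison is strict, matching the requirement that the upper limit be less than η0.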

7.5 Simulation Study

A simulation study is performed to compare the empirical sizes and powers of the corrected Kroll’s method and the GPQ method based on CVDL. The specifications of the simulation study are as follows. The number of solutions (or dilutions) of different concentrations is set to 5 or 7, and the number of replicates at each concentration is 2, 3, or 4. Throughout the simulation, the mean concentration μ is assumed to be 4. The allowable margin of linearity based on ADL, θ0, is specified at 0.05, as recommended by Kroll et al. (Kroll, 2000). Using the relationship CVDL = θ × μ / σ, where μ is the population mean concentration for all solutions of the assay and θ is the ADL, the allowable limit η0 is 2 and 1 for σ equal to 0.1 and 0.2, respectively. For each of the 12 combinations, ten thousand (10,000) random samples are generated. At the 5% nominal significance level, a simulation study with 10,000 random samples implies that 95 percent of the empirical sizes evaluated at the allowable margins will lie between 0.0457 and 0.0543 if the proposed methods adequately control the size at the nominal level of 0.05.
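The two numerical facts used in this specification, the allowable limit η0 and the 95% range for an empirical size, can be checked with a few lines of Python. The sketch assumes the usual normal approximation to a binomial proportion with the 1.96 quantile; the function names are illustrative only.

```python
import math


def allowable_cvdl(theta0, mu, sigma):
    """Allowable CVDL limit from the relationship CVDL = theta * mu / sigma."""
    return theta0 * mu / sigma


def size_range(nominal=0.05, n_sim=10_000, z=1.96):
    """Approximate 95% range for an empirical size when the true size is nominal."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / n_sim)
    return nominal - half_width, nominal + half_width


# Allowable limits for sigma = 0.1 and sigma = 0.2 with theta0 = 0.05 and mu = 4.
print(round(allowable_cvdl(0.05, 4, 0.1), 4), round(allowable_cvdl(0.05, 4, 0.2), 4))

# Range within which 95% of empirical sizes should fall for 10,000 samples.
low, high = size_range()
print(round(low, 4), round(high, 4))
```

With these inputs the limits come out as 2 and 1, and the range rounds to 0.0457 and 0.0543, in agreement with the values quoted in the text.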

The empirical sizes are provided in Table 7.5.1. All but one of the empirical sizes of the GPQ method based on CVDL lie within the range from 0.0457 to 0.0543; the single exception (0.0452, for 7 solutions, 4 replicates and σ = 0.2) falls slightly below the lower bound, on the conservative side. In contrast, all empirical sizes of the corrected Kroll’s method are larger than 0.0543. The simulation results indicate that the GPQ-based CVDL method adequately controls the size at the nominal level. A possible reason for its better performance is that the distribution of the GPQ is free of nuisance parameters, whereas the corrected Kroll’s method fails to take into account the variability in the estimator of the non-centrality parameter of the non-central chi-square distribution.

The empirical powers are presented in Table 7.5.2, where the true value of ADL is assumed to be 0.005 for both 5 and 7 solutions. The results show that the empirical power of both methods increases with the number of replicates and with the number of solutions. In addition, the empirical power of the GPQ-based CVDL method is competitive with that of the corrected Kroll’s method. Although the empirical power of the corrected Kroll’s method is larger than that of the GPQ-based CVDL method, this apparent advantage results from inflation of the size above the nominal level.

Figures 7.5.1 and 7.5.2 present the empirical powers of the two methods when σ is 0.1 and 0.2, respectively, with 5 solutions and 3 replicates. The true values of ADL range from 0 to 0.08. A comparison of Figure 7.5.1 and Figure 7.5.2 reveals that the power of both methods is a decreasing function of σ. The power curve of the GPQ-based CVDL method is uniformly lower than that of the corrected Kroll’s method. However, the empirical power of the GPQ-based CVDL method at ADL = 0.05 is 0.0511, while that of the corrected Kroll’s method is 0.0623. This shows that the GPQ-based CVDL method can control the size at the nominal level while the corrected Kroll’s method cannot.

7.6 Numerical Example

The same calcium data used in the previous chapters are used to illustrate the proposed testing procedures for evaluating the linearity of an analytical procedure.

Following EP6-A (Tholen et al., 2003), the criterion for linearity on $\mu_{Pi} - \mu_{Li}$ is set at 0.2 mg/dL for all 5 concentrations. In this example, the allowable margin of the percent bound for ADL is set at 0.05. The allowable limit of the GPQ-based SSDL is set at 0.2, which is the square of 0.2 mg/dL multiplied by the 5 concentrations. We also assume that the allowable repeatability set by the manufacturer is 0.2. Therefore, the allowable margin of the GPQ-based CVDL is 1, obtained from the allowable margin of 0.2 for SSDL divided by the product of 5 (concentrations) and the square of the repeatability of 0.2, i.e., $\eta_0 = \sqrt{\tau_0 / (L\sigma^{2})}$. The results of the corrected Kroll’s method and the GPQ-based CVDL method are provided in Table 7.6.1. Linearity is concluded by the corrected Kroll’s method, since the observed ADL of 0.0146 is less than the critical value of 0.0437 corresponding to a percent bound margin of 5%.

On the other hand, the 95% upper confidence limit for CVDL is 1.9125, which is larger than the allowable upper limit of 1. As a result, the GPQ-based CVDL method cannot conclude the linearity of the analytical procedure at the 5% significance level. As shown in simulation results and

Table 7.5.1 Empirical sizes (corrected Kroll’s method vs. GPQ-based CVDL method)

No. of      No. of       Standard    Corrected   GPQ-based
Solutions   Replicates   Deviation   Kroll       CVDL
5           2            0.1         0.0702      0.0540
5           2            0.2         0.0763      0.0467
5           3            0.1         0.0623      0.0513
5           3            0.2         0.0655      0.0511
5           4            0.1         0.0594      0.0489
5           4            0.2         0.0595      0.0509
7           2            0.1         0.0655      0.0490
7           2            0.2         0.0635      0.0504
7           3            0.1         0.0592      0.0473
7           3            0.2         0.0583      0.0504
7           4            0.1         0.0562      0.0529
7           4            0.2         0.0571      0.0452

Table 7.5.2 Empirical powers with the true ADL=0.005 (corrected Kroll’s method vs. GPQ-based CVDL method)

No. of      No. of       Standard    Corrected   GPQ-based
Solutions   Replicates   Deviation   Kroll       CVDL
5           2            0.1         1.0000      0.9876
5           2            0.2         0.9670      0.7754
5           3            0.1         1.0000      0.9989
5           3            0.2         0.9965      0.9212
5           4            0.1         1.0000      0.9998
5           4            0.2         0.9996      0.9678
7           2            0.1         1.0000      0.9979
7           2            0.2         0.9923      0.8994
7           3            0.1         1.0000      0.9999
7           3            0.2         0.9996      0.9742
7           4            0.1         1.0000      1.0000
7           4            0.2         1.0000      0.9932

Figure 7.5.1 The empirical powers when standard deviation of normal random error is 0.1, number of solutions is 5, and number of replicates is 3 (corrected Kroll’s method vs. GPQ-based CVDL method). [Plot not reproduced: power (0.0 to 1.0) on the vertical axis against ADL (0.00 to 0.08) on the horizontal axis, with one curve each for Corrected Kroll and GPQ-based CVDL.]

Figure 7.5.2 The empirical powers when standard deviation of normal random error is 0.2, number of solutions is 5, and number of replicates is 3 (corrected Kroll’s method vs. GPQ-based CVDL method). [Plot not reproduced: power (0.0 to 1.0) on the vertical axis against ADL (0.00 to 0.08) on the horizontal axis, with one curve each for Corrected Kroll and GPQ-based CVDL.]

Table 7.6.1 Results of the linearity evaluation for the example of calcium by corrected Kroll’s and GPQ-based CVDL methods

Method            Sample Statistic / Critical Value or Allowable Bound   Value    Conclusion
Corrected Kroll   Sample ADL                                             0.0146   Linear
                  Critical Value                                         0.0437
GPQ-based CVDL    Upper 95% C.L.                                         1.9125   Nonlinear
                  Allowable Upper Bound                                  1

95% C.L.: Upper 95% confidence limit

7.7 Summary

Both ADL and CVDL are aggregate criteria for the assessment of linearity in assay validation. The main difference between the two is that the proposed CVDL contains information not only on the deviations from linearity but also on the repeatability of the analytical procedure. The simulation results presented above show that the corrected Kroll’s method inflates the type I error rate, whereas the GPQ-based CVDL method can control the size at the nominal level. In addition, the GPQ-based CVDL method retains good power. Therefore, we conclude that the GPQ-based CVDL method, with respect to the statistical hypothesis in (7.1.2), is better than the corrected Kroll’s method for evaluating linearity in assay validation.

Chapter 8 Discussion and Summary

Various aggregate criteria for evaluating linearity in assay validation, including ADL, SSDL and CVDL, were introduced in Chapters 2 to 7. Although these criteria are built from different components, which give them different characteristics, they share a common feature: all of them contain the sum of squares of the deviations from linearity as the major component. In this chapter, we discuss the relationship among these criteria. In addition, the results of the simulation study and the numerical example are used to compare the performance and characteristics of each aggregate criterion for the assessment of linearity in assay validation.

8.1 Relationship of the Aggregate Criteria

Recall that ADL, SSDL and CVDL were defined in the previous chapters as follows:

$$\theta = \frac{\sqrt{\sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^{2} / L}}{\mu}, \qquad \tau = \sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^{2}, \qquad \eta = \frac{\sqrt{\sum_{i=1}^{L} (\mu_{Pi} - \mu_{Li})^{2} / L}}{\sigma},$$

where $\mu_{Pi}$ and $\mu_{Li}$ are the predicted means of the best-fit polynomial model and the linear regression model at the ith concentration, i = 1, ..., L, respectively; μ is the population mean concentration for all solutions of the assay; and σ² is the residual variance under the best-fitted model. SSDL is the common component of all three criteria, and their relationship can be expressed as follows:

$$\tau = L(\mu\theta)^{2} = L(\eta\sigma)^{2}. \qquad (8.1.1)$$

Unlike SSDL, which is the unscaled sum of squares of the deviations from linearity, both CVDL and ADL are scaled measures. ADL is the square root of the average squared deviation scaled by μ, while CVDL is scaled by the variability or repeatability of the best-fitted model.
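For completeness, the relationship in (8.1.1) follows directly from the definitions, assuming that ADL and CVDL are the root mean squared deviation from linearity divided by μ and by σ, respectively. The last line uses the simulation settings quoted in this thesis (μ = 4, θ0 = 0.05, L = 5, σ = 0.1) purely as a numerical illustration.

```latex
\begin{align*}
\theta &= \frac{\sqrt{\tau / L}}{\mu} \;\Rightarrow\; \tau = L(\mu\theta)^{2}, \\
\eta &= \frac{\sqrt{\tau / L}}{\sigma} \;\Rightarrow\; \tau = L(\eta\sigma)^{2}, \\
\eta &= \frac{\mu\,\theta}{\sigma}, \\
\tau_0 &= 5\,(4 \times 0.05)^{2} = 0.2, \qquad \eta_0 = \frac{4 \times 0.05}{0.1} = 2 .
\end{align*}
```

The third line makes explicit that, for a fixed ADL, the corresponding CVDL grows as the repeatability σ improves, which is why the allowable limit η0 depends on σ while θ0 does not.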

8.2 Comparison by Simulation Study

A simulation study was employed to compare the empirical sizes and powers of the three GPQ-based methods for ADL, SSDL and CVDL. Parts of the results from the same simulation study were presented in Chapters 5 and 7 to compare the performance of the corrected Kroll’s method with the GPQ-based ADL and GPQ-based CVDL methods, respectively. As described in Chapters 5 and 7, the specifications of the simulation study are as follows. The number of solutions (or dilutions) of different concentrations is set to 5 or 7, and the number of replicates at each concentration is 2, 3, or 4.

Throughout the simulation, the mean concentration μ is assumed to be 4. The allowable margin of linearity based on ADL, θ0, is specified at 0.05. From the relationship τ = L(μθ)² = L(ησ)² in Eq. (8.1.1), it follows that the margins for SSDL are 0.2 and 0.28 for 5 and 7 concentrations, respectively. In addition, with the standard deviation of the normal random error specified as 0.1 and 0.2, the allowable limit η0 is 2 and 1, respectively. For each of the 12 combinations, ten thousand (10,000) random samples are generated. At the 5% nominal significance level, a simulation study with 10,000 random samples implies that 95 percent of the empirical sizes evaluated at the allowable margins will lie between 0.0457 and 0.0543 if the proposed methods adequately control the size at the nominal level of 0.05.

The empirical sizes are provided in Table 8.2.1. All empirical sizes of the GPQ methods based on ADL and SSDL, and all but one of those based on CVDL (0.0452, for 7 solutions, 4 replicates and σ = 0.2, slightly below the lower bound), are within the range from 0.0457 to 0.0543. The simulation results indicate that the GPQ-based methods for SSDL, ADL, and CVDL adequately control the size at the nominal level. Regarding the empirical powers of the three GPQ-based methods presented in Table 8.2.2, the power of the GPQ-based ADL method is at least as large as that of the other two methods in every configuration. The power of the GPQ-based SSDL method is in turn at least as large as that of the GPQ-based CVDL method, except for 5 solutions, 2 replicates and σ = 0.2, where the table reports 0.6976 for SSDL and 0.7754 for CVDL.

Figures 8.2.1 and 8.2.2 present the empirical powers of the three GPQ-based methods when σ is 0.1 and 0.2, respectively, with 5 solutions and 3 replicates. In Figure 8.2.1, when ADL = 0.05, the empirical sizes of the three GPQ-based methods are 0.0499, 0.0513, and 0.0502 for SSDL, CVDL, and ADL, respectively. Similar findings are observed in Figure 8.2.2. Again, these results show that the three GPQ-based procedures can adequately control the size at the nominal level of 5%.

Moreover, both figures demonstrate that the GPQ-based ADL procedure is uniformly more powerful than the GPQ-based SSDL method, which is in turn uniformly more powerful than the GPQ-based CVDL method. For example, in Figure 8.2.1 when ADL is 0.03, the empirical powers are 0.9426, 0.7803, and 0.5337 for the GPQ-based ADL, SSDL, and CVDL methods, respectively. In other words, the GPQ-based ADL

Table 8.2.1 Empirical sizes (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods)

No. of      No. of       Standard    GPQ-based   GPQ-based   GPQ-based
Solutions   Replicates   Deviation   SSDL        CVDL        ADL
5           2            0.1         0.0462      0.0540      0.0467
5           2            0.2         0.0523      0.0467      0.0517
5           3            0.1         0.0499      0.0513      0.0502
5           3            0.2         0.0522      0.0511      0.0517
5           4            0.1         0.0498      0.0489      0.0505
5           4            0.2         0.0504      0.0509      0.0508
7           2            0.1         0.0504      0.0490      0.0501
7           2            0.2         0.0495      0.0504      0.0494
7           3            0.1         0.0505      0.0473      0.0509
7           3            0.2         0.0495      0.0504      0.0498
7           4            0.1         0.0498      0.0529      0.0498
7           4            0.2         0.0498      0.0452      0.0510

Table 8.2.2 Empirical powers with the true ADL=0.005 (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods)

No. of      No. of       Standard    GPQ-based   GPQ-based   GPQ-based
Solutions   Replicates   Deviation   SSDL        CVDL        ADL
5           2            0.1         0.9995      0.9876      1.0000
5           2            0.2         0.6976      0.7754      0.9331
5           3            0.1         1.0000      0.9989      1.0000
5           3            0.2         0.9326      0.9212      0.9942
5           4            0.1         1.0000      0.9998      1.0000
5           4            0.2         0.9814      0.9678      0.9995
7           2            0.1         1.0000      0.9979      1.0000
7           2            0.2         0.9123      0.8994      0.9888
7           3            0.1         1.0000      0.9999      1.0000
7           3            0.2         0.9850      0.9742      0.9994
7           4            0.1         1.0000      1.0000      1.0000
7           4            0.2         0.9981      0.9932      1.0000

Figure 8.2.1 The empirical powers when standard deviation of normal random error is 0.1, number of solutions is 5, and number of replicates is 3 (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods). [Plot not reproduced: power (0.0 to 1.0) on the vertical axis against ADL (0.00 to 0.08) on the horizontal axis, with one curve each for GPQ-based SSDL, GPQ-based CVDL and GPQ-based ADL.]

Figure 8.2.2 The empirical powers when standard deviation of normal random error is 0.2, number of solutions is 5, and number of replicates is 3 (GPQ-based SSDL vs. GPQ-based CVDL vs. GPQ-based ADL methods). [Plot not reproduced: power (0.0 to 1.0) on the vertical axis against ADL (0.00 to 0.08) on the horizontal axis, with one curve each for GPQ-based SSDL, GPQ-based CVDL and GPQ-based ADL.]

procedure has about 41 percentage points more power than the GPQ-based CVDL method (0.9426 versus 0.5337) and about 16 percentage points more power than the GPQ-based SSDL method (0.9426 versus 0.7803) at an ADL of 0.03. The gain in power provided by the GPQ-based ADL method is therefore substantial.

8.3 Numerical Example

The previous calcium example is used to illustrate the proposed testing procedures for evaluating the linearity of an analytical procedure. With the criteria for linearity on $\mu_{Pi} - \mu_{Li}$ and for repeatability both set at 0.2 mg/dL, and the allowable margin of the percent bound for ADL set at 0.05, the corresponding criteria and results of the three GPQ-based methods for ADL, SSDL and CVDL are presented in Table 8.3.1. The results show that the 95% upper confidence limit for ADL computed by the GPQ method is 0.0218, which is smaller than the allowable upper limit of 0.05. Hence, the linearity of the analytical procedure can be concluded at the 5% significance level by the GPQ-based ADL procedure. On the other hand, the 95% upper confidence limits for SSDL and CVDL obtained by the GPQ-based methods are 0.2471 and 1.9125, respectively. Both are larger than their respective allowable upper limits of 0.2 and 1. As a result, neither method can conclude the linearity of the analytical procedure at the 5% significance level. The GPQ-based methods therefore lead to different conclusions. As shown in the simulation results, all three GPQ-based methods can control the size at the nominal level of 0.05, but the GPQ-based ADL method is uniformly more powerful than the other two GPQ-based methods.

This might be one of the reasons why the linearity can be claimed by the GPQ-based

Table 8.3.1 Results of the linearity evaluation by three different methods

Method            Sample Statistic / Critical Value or Allowable Bound   Value    Conclusion
GPQ-based SSDL    Upper 95% C.L.                                         0.2471   Nonlinear
                  Allowable Upper Bound                                  0.2
GPQ-based CVDL    Upper 95% C.L.                                         1.9125   Nonlinear
                  Allowable Upper Bound                                  1
GPQ-based ADL     Upper 95% C.L.                                         0.0218   Linear
                  Allowable Upper Bound                                  0.05

95% C.L.: Upper 95% confidence limit

ADL method.

8.4 Summary

In this chapter, we discussed the relationship among the three aggregate criteria ADL, SSDL and CVDL. As mentioned in Section 8.1, the SSDL, i.e., the sum of squares of the deviations from linearity, is the basis of all three criteria, while ADL and CVDL are scaled measures, scaled by the average concentration and by the repeatability, respectively. As demonstrated by the simulation results, all three GPQ-based methods for ADL, SSDL and CVDL can control the size at the nominal level.

Moreover, the simulation results reveal that the GPQ-based ADL procedure is uniformly more powerful than the GPQ-based SSDL and CVDL methods. In addition, the CVDL method is the most conservative procedure among the three GPQ-based methods. This may be because it is scaled by the repeatability, so that both the predicted means and the repeatability of the best-fitted model must meet the allowable limits. The GPQ-based ADL procedure, on the other hand, not only adequately controls the type I error rate but is also uniformly more powerful than the other GPQ-based methods. Therefore, the GPQ-based ADL procedure is recommended as the better procedure for evaluating linearity in assay validation among the three GPQ-based methods.

However, since the GPQ-based CVDL procedure considers linearity and repeatability simultaneously in one measure, one may consider using CVDL as the criterion for assay validation when accuracy and reliability are to be evaluated simultaneously.

Chapter 9 Concluding Remarks

9.1 Conclusion

One of the most important characteristics for evaluation of accuracy and precision in assay validation is linearity. Even though the best-fitted model is not linear, linearity of the analytical procedure can still be claimed if the difference in the predicted means between the best-fitted and linear models is smaller than some pre-specified allowable limit at all concentrations employed in the validation experiment. As a result, the deviation from linearity is the fundamental unit for assessment of bias for evaluation of linearity.

With respect to the disaggregate criterion, the approved CLSI EP6-A guideline proposes an estimation method that compares the estimates of the differences in the predicted means directly with the pre-specified allowable limit, without a formal statistical inference procedure. The method completely ignores the variation of the estimates and inflates the type I error of the evaluation. On the other hand, the ADL proposed by Kroll et al. (Kroll, 2000) is an aggregate criterion constructed from the sum of squares of the deviations from linearity scaled by the mean concentration.

However, the statistical testing procedure proposed by Kroll et al. (Kroll, 2000) not only incorrectly formulates the hypothesis for proving linearity, but the distribution of ADL also contains unknown nuisance parameters, which makes it difficult to control the size at the nominal level. Therefore, we propose the TOST procedure and the corrected Kroll’s method to address the shortcomings of these two methods, by providing a formal statistical testing procedure in place of the estimation method and by reformulating the hypothesis of the uncorrected Kroll’s method, respectively.

The simulation results show that the proposed methods can control the size better than the two current methods. On the other hand, to overcome the issue raised by the unknown
