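Recovery of Higher-Level Parameters - Effectiveness of Differential Item Functioning Detection within the Cognitive Diagnostic Measurement Framework (在認知診斷測量架構中的試題差異功能偵測效果探討)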

CHAPTER 4 RESULTS

4.1.1 Recovery of Higher-Level Parameters

The higher-level parameters estimated in the RHO-RDINA and RHO-RDINO models include attribute discrimination, γ, and attribute difficulty, β. Recovery of these parameters was assessed with RMSE and bias. The recovery results under all testing conditions are presented in Tables 4.1 and 4.2. Accuracy of the examinee attribute mastery parameter, α, was computed as the correct classification rate, which is presented in Tables 4.3 and 4.4.
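
As a minimal sketch of how the two recovery indices could be computed, the following assumes the usual definitions: bias is the mean of (estimate - generating value) over replications, and RMSE is the square root of the mean squared difference. The function name `bias_and_rmse` and the example estimates are hypothetical and are not taken from the study.

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) for replicated estimates of a single parameter.

    Assumed definitions (not quoted from the study):
      bias = mean(estimate - true_value)
      RMSE = sqrt(mean((estimate - true_value) ** 2))
    """
    errors = np.asarray(estimates, dtype=float) - true_value
    bias = errors.mean()
    rmse = np.sqrt((errors ** 2).mean())
    return bias, rmse

# Hypothetical estimates of a discrimination parameter generated at 1.5.
example_estimates = [1.4, 1.6, 1.5, 1.3]
bias, rmse = bias_and_rmse(example_estimates, 1.5)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

In the study each index would be taken over the 25 replications of a condition; four values are used here only to keep the arithmetic easy to follow.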

In Table 4.1, it can be seen that recovery was generally good for the discrimination parameter with the RHO-RDINA model when the ability distributions of the focal and reference groups were equal, and almost as good in the unequal ability condition, under the various DIF patterns. The RMSEs for attribute difficulty and attribute discrimination range from .04 to .16, which indicates generally good recovery. Recovery of β and γ did not appear to be affected much by any of the different scenarios (i.e., combinations of DIF pattern and ability distribution difference). Recovery was less accurate, however, for the short test length. This may have occurred because the Q-matrix structure in the 40-item and 60-item conditions is the same as in the short test length condition, and the attribute difficulty parameters were set to the same values across test lengths. Because the number of items measuring each attribute increases with test length, attribute difficulty is estimated more accurately on longer tests.
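
The test-length argument can be illustrated with a small sketch. It assumes that a longer test is built by repeating the same Q-matrix block; the 4-item, 3-attribute block below is hypothetical and is not the Q-matrix used in the study. The point is only that the number of items measuring each attribute grows in proportion to test length when the structure is held fixed.

```python
import numpy as np

# Hypothetical Q-matrix block: rows are items, columns are attributes.
# An entry of 1 means the item measures that attribute.
q_block = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
])

def items_per_attribute(q_matrix):
    """Count how many items measure each attribute (column sums)."""
    return q_matrix.sum(axis=0)

# Stack 1, 2, and 3 copies of the block to mimic short, medium, and long tests.
for copies in (1, 2, 3):
    q_matrix = np.tile(q_block, (copies, 1))
    print(q_matrix.shape[0], "items:", items_per_attribute(q_matrix))
```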

A similar pattern was found in the recovery of each of the parameters with the RHO-RDINO model, as can be seen in Table 4.2. The RMSEs for attribute difficulty and attribute discrimination range from .04 to .20, which indicates generally good recovery. Recovery of β and γ did not appear to be affected much by any of the different scenarios (i.e., combinations of DIF pattern and ability distribution difference).

Table 4.1 Bias and RMSEs of Attribute Difficulty, A, and Discrimination, γ, over 25 Replications with RHO-RDINA Model

                        Equal                                     Unequal
                        Balanced      One-sided     No-DIF        Balanced      One-sided     No-DIF
Length    par     gen   Bias   RMSE   Bias   RMSE   Bias   RMSE   Bias   RMSE   Bias   RMSE   Bias   RMSE
20 items  A[1]   -1.5  -0.08   0.14  -0.06   0.16  -0.04   0.11  -0.06   0.15  -0.08   0.14  -0.06   0.12
          A[2]   -1.0  -0.06   0.11  -0.07   0.13  -0.04   0.09  -0.04   0.11  -0.04   0.11  -0.05   0.11
          A[3]   -1.0  -0.04   0.10  -0.04   0.11  -0.03   0.09  -0.03   0.08  -0.04   0.12  -0.04   0.10
          A[4]   -0.5  -0.01   0.08  -0.03   0.08   0.00   0.07  -0.02   0.08  -0.01   0.08   0.00   0.07
          A[5]   -0.5  -0.01   0.07  -0.03   0.08  -0.01   0.07  -0.02   0.06  -0.01   0.08   0.00   0.08
          γ       1.5  -0.05   0.12  -0.04   0.11  -0.04   0.11  -0.02   0.13  -0.05   0.11  -0.05   0.10
40 items  A[1]   -1.5  -0.02   0.09  -0.08   0.14  -0.05   0.11  -0.05   0.09  -0.05   0.12  -0.04   0.11
          A[2]   -1.0   0.00   0.07  -0.04   0.09  -0.03   0.10  -0.03   0.07  -0.02   0.08  -0.02   0.09
          A[3]   -1.0  -0.02   0.08  -0.03   0.08  -0.03   0.10  -0.04   0.07  -0.03   0.07  -0.03   0.09
          A[4]   -0.5   0.00   0.06  -0.02   0.06   0.00   0.05  -0.02   0.05  -0.01   0.06   0.00   0.06
          A[5]   -0.5   0.00   0.04  -0.01   0.07  -0.01   0.05  -0.02   0.05  -0.01   0.05   0.00   0.06
          γ       1.5  -0.01   0.09  -0.06   0.09  -0.04   0.11  -0.04   0.10  -0.02   0.10  -0.03   0.10
60 items  A[1]   -1.5  -0.05   0.09  -0.02   0.09  -0.08   0.11  -0.05   0.10  -0.03   0.08  -0.03   0.06
          A[2]   -1.0  -0.02   0.07   0.00   0.08  -0.03   0.07   0.00   0.07  -0.02   0.07  -0.01   0.07
          A[3]   -1.0  -0.02   0.07  -0.01   0.07  -0.03   0.08  -0.01   0.07  -0.02   0.07  -0.01   0.05
          A[4]   -0.5  -0.01   0.06   0.00   0.05  -0.01   0.06   0.01   0.05   0.00   0.07   0.01   0.06
          A[5]   -0.5  -0.02   0.06   0.00   0.06  -0.03   0.06   0.00   0.06  -0.02   0.06  -0.01   0.04
          γ       1.5  -0.02   0.08  -0.01   0.06  -0.05   0.09  -0.04   0.07  -0.01   0.08  -0.01   0.06

Table 4.2 Bias and RMSEs of Attribute Difficulty, A, and Discrimination, γ, over 25 Replications with RHO-RDINO Model

                        Equal                                     Unequal
                        Balanced      One-sided     No-DIF        Balanced      One-sided     No-DIF
Length    par     gen   Bias   RMSE   Bias   RMSE   Bias   RMSE   Bias   RMSE   Bias   RMSE   Bias   RMSE
20 items  A[1]   -1.5  -0.09   0.14  -0.12   0.19  -0.12   0.19  -0.12   0.17  -0.07   0.12  -0.11   0.16
          A[2]   -1.0  -0.04   0.13  -0.07   0.14  -0.06   0.14  -0.06   0.10  -0.04   0.09  -0.09   0.13
          A[3]   -1.0  -0.08   0.12  -0.06   0.14  -0.06   0.11  -0.06   0.12  -0.05   0.12  -0.05   0.11
          A[4]   -0.5  -0.02   0.15  -0.02   0.14  -0.01   0.14   0.03   0.08  -0.01   0.11   0.02   0.10
          A[5]   -0.5   0.01   0.17   0.03   0.13   0.08   0.17   0.03   0.13  -0.01   0.12   0.01   0.12
          γ       1.5  -0.12   0.17  -0.12   0.16  -0.10   0.18  -0.12   0.19  -0.07   0.18  -0.11   0.20
40 items  A[1]   -1.5  -0.05   0.11  -0.02   0.11  -0.04   0.08  -0.05   0.09  -0.04   0.10  -0.07   0.11
          A[2]   -1.0  -0.01   0.08   0.00   0.07  -0.03   0.08  -0.02   0.07  -0.01   0.08  -0.04   0.07
          A[3]   -1.0  -0.03   0.07  -0.03   0.09  -0.05   0.09  -0.01   0.05  -0.02   0.08  -0.04   0.08
          A[4]   -0.5   0.00   0.08  -0.01   0.08   0.00   0.07   0.00   0.07  -0.01   0.06  -0.02   0.07
          A[5]   -0.5   0.00   0.08   0.00   0.07  -0.01   0.07  -0.01   0.06  -0.01   0.08  -0.01   0.07
          γ       1.5  -0.05   0.10  -0.04   0.10  -0.06   0.11  -0.03   0.10  -0.02   0.08  -0.07   0.11
60 items  A[1]   -1.5  -0.03   0.09  -0.06   0.12  -0.05   0.12  -0.07   0.12  -0.04   0.10  -0.04   0.11
          A[2]   -1.0  -0.02   0.07  -0.02   0.08  -0.03   0.09  -0.04   0.09  -0.02   0.07  -0.04   0.08
          A[3]   -1.0  -0.02   0.07  -0.03   0.10  -0.02   0.07  -0.04   0.07  -0.01   0.07  -0.01   0.07
          A[4]   -0.5   0.01   0.07  -0.02   0.05   0.01   0.06  -0.02   0.06   0.02   0.06  -0.01   0.05
          A[5]   -0.5  -0.02   0.04  -0.02   0.06  -0.01   0.06  -0.04   0.06   0.00   0.05  -0.02   0.05
          γ       1.5  -0.02   0.07  -0.03   0.11  -0.03   0.12  -0.04   0.10  -0.04   0.11   0.00   0.09

Table 4.3 shows the correct classification rates of attribute mastery based on the RHO-RDINA model. The correct classification rate was computed by comparing the estimated classification against the deterministic classification obtained using the true abilities. The table shows that the correct classification rates of attribute mastery were relatively high, ranging from .89 to .99. The last row of Table 4.3 shows the proportion of examinees whose entire attribute vector was correctly estimated (overall consistency). Both the attribute-level correct classification rates and the overall consistency were lower in the unequal ability distribution condition than in the equal condition.
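
The two accuracy indices described above can be sketched as follows. This is an illustration rather than the study's own code: the function name `classification_rates` and the small example matrices are hypothetical. Rows are examinees, columns are attributes, and entries are 1 (master) or 0 (non-master).

```python
import numpy as np

def classification_rates(true_alpha, est_alpha):
    """Return (per-attribute correct classification rates, overall consistency).

    The per-attribute rate is the share of examinees classified correctly on
    that attribute. Overall consistency is the share of examinees whose whole
    attribute vector is classified correctly.
    """
    true_alpha = np.asarray(true_alpha)
    est_alpha = np.asarray(est_alpha)
    match = true_alpha == est_alpha          # examinee-by-attribute agreement
    per_attribute = match.mean(axis=0)       # one rate per attribute
    overall = match.all(axis=1).mean()       # whole-vector agreement
    return per_attribute, overall

# Hypothetical true and estimated profiles for 4 examinees and 3 attributes.
true_profiles = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
est_profiles = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0]]
rates, consistency = classification_rates(true_profiles, est_profiles)
print("per-attribute rates:", rates)
print("overall consistency:", consistency)
```

Because a vector counts as correct only when every attribute in it is correct, overall consistency can never exceed the lowest per-attribute rate, which is why the last row of each classification table is lower than the attribute rows.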

Table 4.4 shows the correct classification rates of attribute mastery based on the RHO-RDINO model. The ability distribution difference had an effect on the accuracy of attribute classification and on the overall consistency with the RHO-RDINO model: both increased in the unequal ability distribution condition compared with the equal condition. This may have occurred because, in the unequal condition, abilities were generated from N(-1, 1) for the focal group and from N(0, 1) for the reference group. This may have led to more non-masters in this condition than in the equal ability distribution condition, and hence to a large discrepancy between the numbers of masters and non-masters. Thus, the accuracy of attribute classification and the overall consistency decreased in the unequal ability distribution condition with the RHO-RDINA model, whereas they increased with the RHO-RDINO model because of the nature of that model.
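
The claim that a lower-ability focal group yields more non-masters can be checked with a rough simulation. The sketch below assumes a logistic higher-order link of the form P(α_k = 1 | θ) = logistic(γ(θ - β_k)); this functional form is an assumption made for illustration and may differ from the parameterization used in the study. The generating values (difficulties of -1.5, -1.0, -1.0, -0.5, -0.5 and a discrimination of 1.5) are the ones listed in the "gen" column of Tables 4.1 and 4.2; the function names are hypothetical.

```python
import numpy as np

def mastery_probability(theta, difficulty, discrimination):
    """Assumed link: P(alpha_k = 1 | theta) = logistic(discrimination * (theta - difficulty_k))."""
    logits = discrimination * (theta[:, None] - difficulty[None, :])
    return 1.0 / (1.0 + np.exp(-logits))

def expected_mastery(ability_mean, difficulty, discrimination, n=100_000, seed=0):
    """Monte Carlo estimate of the mastery proportion per attribute for N(ability_mean, 1)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(ability_mean, 1.0, size=n)
    return mastery_probability(theta, difficulty, discrimination).mean(axis=0)

difficulty = np.array([-1.5, -1.0, -1.0, -0.5, -0.5])  # generating values from the tables
discrimination = 1.5                                    # generating value from the tables

reference = expected_mastery(0.0, difficulty, discrimination)   # N(0, 1)
focal = expected_mastery(-1.0, difficulty, discrimination)      # N(-1, 1)
print("reference group mastery:", np.round(reference, 2))
print("focal group mastery:    ", np.round(focal, 2))
```

Under any link that increases with ability, shifting the focal group's mean down by one unit lowers its mastery proportion on every attribute, so the pooled sample in the unequal condition contains more non-masters than in the equal condition.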

In sum, the recovery of the higher-order parameters was within a reasonable range. The RMSEs did not exceed .20 and the absolute bias did not exceed .12 for the two proposed models. The easier the attribute, the larger the bias and RMSE. For both proposed models, the recovery of the attribute discrimination and attribute difficulty parameters was independent of DIF pattern and ability distribution difference, but test length had a slight impact on the recovery of the discrimination parameter, the attribute difficulty parameters, and the correct classification rate of attribute mastery. As test length increased, the correct classification rates increased. The overall consistency, that is, the proportion of examinees whose attribute vector was correctly estimated, also increased with test length, especially from the short to the medium test length condition. This may have occurred because the longer tests provided more information and thus improved the estimation of attribute mastery. Moreover, the correct classification rates of attribute mastery were at least .89 across all conditions for the RHO-RDINA model and at least .80 for the RHO-RDINO model, which indicates that the two proposed models estimated examinee attribute profiles accurately under these conditions.

Table 4.3 Percent of RHO-RDINA Correct Classification by Attribute and Vector

Test length           20                            40                            60
                      Equal          Unequal        Equal          Unequal        Equal          Unequal
Classification          BA   ON   NO   BA   ON   NO   BA   ON   NO   BA   ON   NO   BA   ON   NO   BA   ON   NO
Attribute 1           0.93 0.93 0.93 0.89 0.89 0.89 0.97 0.97 0.97 0.95 0.95 0.95 0.98 0.98 0.96 0.97 0.97 0.97
Attribute 2           0.94 0.95 0.95 0.92 0.93 0.92 0.98 0.98 0.98 0.97 0.97 0.97 0.99 0.99 0.99 0.97 0.98 0.98
Attribute 3           0.93 0.93 0.93 0.91 0.91 0.90 0.97 0.98 0.98 0.96 0.96 0.96 0.99 0.99 0.99 0.98 0.98 0.98
Attribute 4           0.92 0.92 0.92 0.90 0.90 0.90 0.97 0.97 0.97 0.96 0.96 0.96 0.99 0.99 0.99 0.98 0.98 0.98
Attribute 5           0.95 0.95 0.95 0.94 0.94 0.94 0.98 0.98 0.98 0.98 0.97 0.98 0.99 0.99 0.99 0.99 0.99 0.99
Overall consistency   0.76 0.77 0.76 0.68 0.68 0.68 0.90 0.90 0.90 0.85 0.85 0.85 0.95 0.95 0.95 0.91 0.91 0.91

Note: BA denotes balanced DIF pattern; ON denotes one-sided DIF pattern; NO denotes no DIF pattern.

Table 4.4 Percent of RHO-RDINO Correct Classification by Attribute and Vector

Test length           20                            40                            60
                      Equal          Unequal        Equal          Unequal        Equal          Unequal
Classification          BA   ON   NO   BA   ON   NO   BA   ON   NO   BA   ON   NO   BA   ON   NO   BA   ON   NO
Attribute 1           0.93 0.93 0.93 0.93 0.93 0.93 0.95 0.93 0.96 0.97 0.97 0.97 0.98 0.98 0.98 0.98 0.98 0.98
Attribute 2           0.90 0.90 0.90 0.91 0.91 0.91 0.93 0.92 0.95 0.96 0.96 0.96 0.96 0.96 0.96 0.97 0.97 0.97
Attribute 3           0.88 0.88 0.88 0.89 0.89 0.89 0.92 0.91 0.94 0.95 0.95 0.95 0.97 0.97 0.97 0.97 0.97 0.97
Attribute 4           0.80 0.80 0.80 0.82 0.82 0.82 0.85 0.86 0.88 0.90 0.90 0.90 0.93 0.92 0.92 0.94 0.94 0.94
Attribute 5           0.89 0.89 0.89 0.90 0.90 0.90 0.90 0.91 0.94 0.95 0.95 0.94 0.96 0.96 0.96 0.97 0.97 0.96
Overall consistency   0.55 0.56 0.55 0.59 0.59 0.59 0.68 0.70 0.73 0.77 0.77 0.77 0.82 0.82 0.82 0.85 0.85 0.85

Note: BA denotes balanced DIF pattern; ON denotes one-sided DIF pattern; NO denotes no DIF pattern.
