Derivation of the Hierarchical Structure CAT Algorithm


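$\hat{\varsigma}$ was the mean estimate across replications. If the computed Z statistic was beyond the critical range of ±2.576, which corresponds to the .01 nominal level, the corresponding estimator was judged to be biased. The absolute value of relative bias (ARB) was computed as:

$$\mathrm{ARB} = \left| \frac{\hat{\varsigma} - \varsigma}{\varsigma} \right|,$$

where $\varsigma$ is the generating value of the parameter.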

When ARB was less than .05, the estimator for the parameter was deemed acceptable (Hoogland & Boomsma, 1998).
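The ARB criterion can be illustrated with a minimal sketch. It assumes the usual definition of relative bias (mean estimate minus generating value, divided by the generating value); the function name and the replication values are hypothetical and serve only as an illustration.

```python
import numpy as np

def absolute_relative_bias(estimates, true_value):
    """ARB = |(mean estimate - generating value) / generating value| (assumed form)."""
    mean_estimate = float(np.mean(estimates))
    return abs((mean_estimate - true_value) / true_value)

# Hypothetical estimates of one parameter over five replications.
replication_estimates = np.array([0.82, 0.79, 0.81, 0.78, 0.80])
arb = absolute_relative_bias(replication_estimates, true_value=0.80)

# Acceptability criterion cited in the text: ARB below .05.
acceptable = arb < 0.05
```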

3.2 Study 2: The CAT Procedure based on the HSIRM

3.2.1 Derivation of the Hierarchical Structure CAT Algorithm

Following the approach used to develop the MCAT algorithm (Segall, 1996), latent traits with a hierarchical structure were incorporated into the CAT procedure.

Five methods (three direct, or one-stage, methods and two indirect, or two-stage, methods) were proposed to evaluate the accuracy of the ability estimates and are discussed below. As in Study 1, only one overall ability and three domain abilities are of interest.

3.2.1.1 Multidimensional CAT Approach

The hierarchical structure item response model can be considered a special case of traditional MIRT models if the overall ability is not isolated from the domain abilities. Consequently, the original MCAT can be used to estimate the three domain abilities, and a score for the higher-order factor can then be assigned through a linear transformation. The details are as follows:

Step 1. Set the factor loadings on the overall ability to β1, β2, and β3 for the three domain abilities, respectively. The correlation matrix of the domain abilities can be obtained as follows:

$$R = \begin{pmatrix} 1 & \beta_1\beta_2 & \beta_1\beta_3 \\ \beta_1\beta_2 & 1 & \beta_2\beta_3 \\ \beta_1\beta_3 & \beta_2\beta_3 & 1 \end{pmatrix},$$

where R is a symmetric matrix that serves as prior information when MCAT is administered to adaptively select items for an examinee and to estimate the three domain abilities through Bayesian estimation under a fixed-length stopping rule.

Step 2. Thompson’s method (also called the regression method) is commonly used in the factor analysis literature to obtain a factor score (Johnson, 1998). When only one overall ability is of interest, the factor score, or general ability score, can be computed as follows:

$$\hat{\theta}_g = \Lambda' R^{-1} \hat{\boldsymbol{\theta}}_s, \qquad (3.16)$$

where $\hat{\theta}_g$ and $\hat{\boldsymbol{\theta}}_s$ are the scalar estimate of the overall ability and the vector of domain ability estimates obtained with a pre-specified number of items, respectively; $\Lambda$ is the vector of factor loadings as defined above; and $R$ is the correlation matrix of the domain abilities defined in Step 1. As a result, estimates of both the overall ability and the domain abilities are obtained. The overall ability estimate is not used for item selection or for updating the ability estimates; it is computed only at the final phase, after the domain abilities have been adaptively estimated.
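The two steps of the multidimensional CAT approach can be sketched numerically. The sketch assumes standardized domain abilities, so that the correlation between two domains equals the product of their loadings; the function names and the domain estimates are hypothetical, and the MCAT run that would produce those estimates is not shown.

```python
import numpy as np

def implied_correlation(loadings):
    """Correlation matrix of the domain abilities implied by one overall factor.

    Assumption: domain abilities are standardized, so off-diagonal entries
    are beta_i * beta_j and the diagonal is 1.
    """
    lam = np.asarray(loadings, dtype=float)
    R = np.outer(lam, lam)
    np.fill_diagonal(R, 1.0)
    return R

def regression_factor_score(loadings, domain_estimates):
    """Overall ability score Lambda' R^{-1} theta_s (regression method)."""
    lam = np.asarray(loadings, dtype=float)
    theta_s = np.asarray(domain_estimates, dtype=float)
    R = implied_correlation(lam)
    # Solve R x = theta_s rather than forming the explicit inverse.
    return float(lam @ np.linalg.solve(R, theta_s))

loadings = [0.9, 0.8, 0.7]       # "high" loading condition from the design
theta_s_hat = [0.5, 0.3, -0.1]   # hypothetical MCAT domain estimates
R = implied_correlation(loadings)
theta_g_hat = regression_factor_score(loadings, theta_s_hat)
```

Because the score is a fixed linear combination of the domain estimates, it can be computed once at the end of the test, which matches the description that the overall ability plays no role in item selection.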

3.2.1.2 Unidimensional CAT Approach

Because each domain ability is a linear function of the overall ability, examinees' responses are also determined by the overall ability, although indirectly. It may therefore be assumed that a single latent trait (the overall ability) drives the item responses, with unit discrimination in a 1PL IRT model and varying discriminations in a 2PL IRT model. Under this assumption, the overall ability can be estimated through the UCAT procedure, and the domain abilities can then be computed using prior information derived from the overall ability estimate. The algorithm is as follows:

Step 1. Administer UCAT to an examinee to obtain the overall ability estimate, with a unit discrimination parameter in the 1PL IRT model and varying discrimination parameters in the 2PL IRT model.

Step 2. After the examinee completes an adaptively selected item, the overall ability is estimated by MAP estimation with a normal prior N(0, 1).

Step 3. Given the overall ability estimate for an examinee and the three factor loadings (β1, β2, and β3), the MCAT algorithm, which is congruent with the generating model, is implemented to estimate the three domain abilities from the same items selected by the UCAT algorithm. The novelty of this computation is that, when MCAT is used to estimate the domain abilities, the prior for an examinee's subtest abilities is set as:

$$\boldsymbol{\theta}_s \sim \mathrm{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}),$$

where the mean vector µ and the diagonal covariance matrix Σ define the prior distribution of the domain abilities. This can be viewed as an adaptive Bayesian estimation, similar to adaptive EAP estimation in the UCAT environment (Raîche, Blais, & Magis, 2007), because every examinee has an individual prior distribution for estimating his or her abilities.

Step 4. Given the updated overall ability estimate, go back to Step 2 to administer the next adaptive item to the examinee and then update the overall and domain ability estimates again.

Step 5. Repeat the process until a pre-specified number of items has been administered.
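Steps 2 and 3 can be illustrated with a minimal sketch. It makes two assumptions that are not stated in the text: the logistic model is used without a scaling constant, and the individual prior has mean β_s times the current overall ability estimate with residual variances 1 − β_s² on the diagonal. The function names are hypothetical, and item selection and the MCAT update of the domain abilities are omitted.

```python
import numpy as np

def map_overall_ability(responses, a, b, n_iter=25):
    """MAP estimate of the overall ability under a 2PL model with a N(0, 1) prior."""
    responses, a, b = (np.asarray(x, dtype=float) for x in (responses, a, b))
    theta = 0.0  # starting value, as in the simulation design
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        gradient = np.sum(a * (responses - p)) - theta     # slope of the log-posterior
        hessian = -np.sum(a ** 2 * p * (1.0 - p)) - 1.0    # always negative
        theta -= gradient / hessian                        # Newton-Raphson step
    return float(theta)

def individual_domain_prior(theta_g_hat, loadings):
    """Examinee-specific prior for the domain abilities (assumed form)."""
    lam = np.asarray(loadings, dtype=float)
    mu = lam * theta_g_hat              # assumed prior mean: beta_s * theta_g_hat
    sigma = np.diag(1.0 - lam ** 2)     # assumed residual variances on the diagonal
    return mu, sigma

# Hypothetical responses to three administered items.
theta_g_hat = map_overall_ability([1, 1, 0], a=[1.0, 1.2, 0.8], b=[-0.5, 0.0, 0.5])
mu, sigma = individual_domain_prior(theta_g_hat, [0.9, 0.6, 0.3])
```

Setting all discriminations to 1 gives the 1PL case. The prior would be recomputed after every item, since the overall ability estimate changes as the test proceeds.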

However, this method based on the UCAT algorithm may encounter problems. Because the random effects, or residual variances, of the subtests are neglected, the UCAT approach is expected to yield poorer estimates of the overall and domain abilities, especially when the factor loadings are diverse. Even if the proposed method does not perform well, a series of simulations will help clarify the impact of ignoring the random effects of subtests.

3.2.1.3 HSIRM-CAT

Because the HSIRM can be seen as a special case of the testlet IRT model in Equation (2.8), the TCAT algorithm can be used to estimate the ability parameters. As in the designs above, only one overall ability and three domain abilities are considered. The steps are as follows:

Step 1. Administer the CAT to an examinee based on the HSIRM to obtain the overall ability estimate, which is analogous to the intended ability estimate in the TCAT environment, and the residual estimates for the three domain abilities, which are analogous to the testlet random effects. For a given item in a subtest, the overall ability and the domain residual share the same discrimination parameter, except that the discrimination on the overall ability is multiplied by the corresponding factor loading of the domain ability.

Step 2. Given the starting or updated values of the overall ability and the domain residuals, the next item is selected for the examinee according to the maximum information criterion. The difference between TCAT and HSIRM-CAT is that, in the HSIRM-CAT environment, both the overall and the domain ability estimates are considered when a single item is selected across subtests to maximize the information function. If Bayesian selection is used, the prior distribution for the overall ability and the domain residuals can be expressed as follows:

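$$(\theta_g, \varepsilon_1, \varepsilon_2, \varepsilon_3)' \sim \mathrm{MVN}\big((0, 0, 0, 0)', \boldsymbol{\Sigma}\big),$$

where $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$ denote the three domain residuals and $\boldsymbol{\Sigma}$ is a symmetric matrix whose diagonal elements are the variances of the overall ability and the three domain residuals, respectively.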

Step 3. Given the estimates of the overall ability and the domain residuals, go back to Step 2 to administer the next adaptive item to the examinee and then update the overall and domain residual estimates.

Step 4. Repeat the process until a pre-specified number of items is completed.

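Step 5. The overall ability estimate is obtained directly, and the domain ability estimates are computed by the following formula:

$$\hat{\boldsymbol{\theta}}_s = \Lambda \hat{\theta}_g + \hat{\boldsymbol{\varepsilon}},$$

where $\hat{\boldsymbol{\theta}}_s$ represents the three domain ability estimates; $\Lambda' = (\beta_1, \beta_2, \beta_3)$ represents the factor loadings for the subtests; $\hat{\theta}_g$ is the estimate of the overall ability from the above steps; and $\hat{\boldsymbol{\varepsilon}}$ is the vector of residual estimates from the above steps.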
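The parameterization in Step 1 and the recomposition in Step 5 can be sketched as follows. The logistic form of the response function is an assumption made for illustration (the model itself is defined elsewhere in the thesis), and the function names and numerical values are hypothetical.

```python
import numpy as np

def hsirm_probability(theta_g, residuals, loadings, subtest, a, b):
    """P(correct) for an item in `subtest` under an assumed 2PL-HSIRM form.

    The item discrimination `a` applies directly to the domain residual and
    to the overall ability after multiplication by the subtest loading.
    """
    domain_ability = loadings[subtest] * theta_g + residuals[subtest]
    return 1.0 / (1.0 + np.exp(-a * (domain_ability - b)))

def domain_ability_estimates(theta_g_hat, residual_hats, loadings):
    """Recompose the domain abilities: theta_s = Lambda * theta_g + residuals."""
    lam = np.asarray(loadings, dtype=float)
    return lam * theta_g_hat + np.asarray(residual_hats, dtype=float)

loadings = [0.9, 0.6, 0.3]         # "diverse" loading condition from the design
residual_hats = [0.1, -0.2, 0.4]   # hypothetical residual estimates
theta_s_hat = domain_ability_estimates(0.5, residual_hats, loadings)
p_item = hsirm_probability(0.5, residual_hats, loadings, subtest=0, a=1.2, b=0.0)
```

With a low loading such as .3, most of the domain ability estimate comes from the residual term, which is the component the unidimensional approach ignores.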

3.2.1.4 Unidimensional CAT Approach with Confirmatory Factor Analysis

To compare conventional estimation methods with the three novel methods described above in terms of the overall ability estimates, the researcher implemented a separate UCAT procedure to estimate the domain ability for each subtest and then computed the overall ability through confirmatory factor analysis. This approach is regarded as a two-stage method, in contrast to the one-stage methods discussed above. The steps are as follows:

Step 1. Administer a UCAT to examinees to obtain domain ability estimates under a pre-specified number of items for each subtest.

Step 2. Given all domain abilities estimated by the UCAT procedure, a confirmatory factor analysis with a common overall factor is conducted, and the resulting factor scores serve as the overall ability estimates.

3.2.1.5 Unidimensional CAT Approach with Average Values

As a second two-stage method for obtaining the overall ability, the average of the domain ability estimates can be used as the overall ability estimate after separate UCATs have been conducted for each subtest. The steps are as follows:

Step 1. Administer a UCAT to examinees to obtain domain ability estimates under a pre-specified number of items for each subtest.

Step 2. Given all domain abilities estimated by the UCAT procedure, the average of the domain ability estimates is taken as the overall ability estimate.
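Both two-stage methods can be sketched under simplifying assumptions. A one-factor model with three indicators is just-identified, so its loadings can be recovered in closed form from the three correlations; this is used here as a stand-in for a full CFA fit and is not necessarily the procedure used in the study. The function names are hypothetical, and the simulated "domain estimates" replace the separate UCAT runs, which are not shown.

```python
import numpy as np

def one_factor_loadings(R):
    """Closed-form loadings of a one-factor model with three indicators.

    Requires all three correlations to be positive.
    """
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    return np.sqrt(np.array([r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12]))

def two_stage_overall(domain_estimates):
    """Return (regression-method factor scores, simple averages) per examinee."""
    X = np.asarray(domain_estimates, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each subtest
    R = np.corrcoef(Z, rowvar=False)
    lam = one_factor_loadings(R)
    factor_scores = Z @ np.linalg.solve(R, lam)   # Lambda' R^{-1} z for each row
    averages = X.mean(axis=1)
    return factor_scores, averages

# Simulated stand-in for UCAT domain estimates of 1000 examinees.
rng = np.random.default_rng(0)
theta_g = rng.standard_normal(1000)
lam_true = np.array([0.9, 0.8, 0.7])
domain_hat = theta_g[:, None] * lam_true + rng.standard_normal((1000, 3)) * np.sqrt(1 - lam_true ** 2)
scores, means = two_stage_overall(domain_hat)
```

The factor scores weight the subtests by their estimated loadings, whereas the average weights them equally; the two would be expected to diverge most when the loadings are diverse.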

3.2.2 Simulation Design

Overall ability and three domain abilities were of interest in this study in the context of the hierarchical structure item response model. Five major independent variables were manipulated: (a) test length: 30 and 60 items; (b) item pool size: 600 items (200 per subtest) and 1,200 items (400 per subtest); (c) the number of item parameters in the HSIRM: 1PL- and 2PL-HSIRMs; (d) the magnitude of the factor loadings: high (.9, .8, and .7) and diverse (.9, .6, and .3); and (e) the five methods of trait-level estimation: the multidimensional CAT approach, the unidimensional CAT approach, the HSIRM-CAT approach, the unidimensional CAT approach with confirmatory factor analysis, and the unidimensional CAT approach with average values. In addition, the item difficulty parameters were drawn from U(-3, 3) and the item discrimination parameters were drawn from N(1, 0.25).
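One cell of this design can be generated with a short sketch. Two readings are assumptions rather than facts from the text: N(1, 0.25) is taken as mean 1 and variance 0.25 (standard deviation 0.5), and the domain residual variances are taken as 1 − β² so that the domain abilities have unit variance. The function names and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed for reproducibility

def generate_item_pool(items_per_subtest, n_subtests=3, two_pl=True):
    """Draw item parameters for one pool; returns (a, b, subtest index)."""
    n_items = items_per_subtest * n_subtests
    b = rng.uniform(-3.0, 3.0, size=n_items)        # difficulties ~ U(-3, 3)
    # Assumption: N(1, 0.25) means variance 0.25, i.e. SD 0.5.
    a = rng.normal(1.0, 0.5, size=n_items) if two_pl else np.ones(n_items)
    subtest = np.repeat(np.arange(n_subtests), items_per_subtest)
    return a, b, subtest

def generate_simulees(n, loadings):
    """Draw overall and domain abilities with a hierarchical structure."""
    lam = np.asarray(loadings, dtype=float)
    theta_g = rng.standard_normal(n)
    # Assumption: residual variance 1 - beta^2 gives unit-variance domain abilities.
    residuals = rng.standard_normal((n, lam.size)) * np.sqrt(1.0 - lam ** 2)
    theta_s = theta_g[:, None] * lam + residuals
    return theta_g, theta_s

# 600-item pool, 2PL-HSIRM, diverse loadings, 1000 simulees.
a, b, subtest = generate_item_pool(items_per_subtest=200)
theta_g, theta_s = generate_simulees(1000, [0.9, 0.6, 0.3])
```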

The trait levels of the simulees were randomly generated based on the hierarchical structure item response model, as described in Study 1. For each condition, 1,000 simulees were sampled. The starting ability estimates were set at 0. The following dependent variables were used to compare the methods across conditions: (a) bias, that is, the difference between the ability estimates and the generating values, as shown in Equation (3.22); (b) RMSE, as a measure of accuracy, as shown in Equation (3.23); (c) test reliability, represented as one minus the overall mean square error (MSE) for each estimator, as shown in Equation (3.24), where the overall MSE is computed from the overall and domain abilities according to Equation (3.25); and (d) the relative efficiency of the one-stage methods over the two-stage methods, assessed as the ratio of the overall MSEs of the two methods, as shown in Equation (3.26).

domain abilities respectively; N was equal to 1000 representing the sample size in the

In document: Hierarchical Structure Item Response Models and Their Application to Computerized Adaptive Testing (pp. 62-68)