In contrast to traditional CBT, computerized adaptive testing (CAT) is a more precise and efficient way to measure examinees’ ability. When an examinee answers the current item correctly, a more difficult item is selected as the next one; otherwise, an easier item is selected. The steps of ability estimation and item selection are repeated until the ability estimate is precise enough (Ho, 1989; Wainer, et al., 1990, p. 103). In this way, a more precise ability estimate can be expected, and fewer items (and thus less time) are required by the CAT procedure.
Based on IRT, the CAT procedure comprises the Starting, Continuing, and Stopping steps (Wainer, et al., 1990, p. 108). A CAT system can be implemented in practice by following these three steps. Figure 2.5 shows the flowchart of the CAT procedure.
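[Figure 2.5 flowchart. Starting: Start, first item selection. Continuing: item presentation, the item is responded to by the examinee, collection of the examinee's responses, estimation of the examinee's latent trait or ability, check of whether the stopping criteria are satisfied; if not (N), item selection and return to item presentation. Stopping: if the criteria are satisfied (Y), Stop.]
Figure 2.5 The procedure of CAT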
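The loop in the flowchart can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the source: the Rasch (1PL) model, the grid-based posterior mean as the ability estimator, the simulated item bank, and the hypothetical names `run_cat`, `eap`, `se_target`, and `max_items`.

```python
import math
import random

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to 4


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap(responses):
    """Posterior mean and SD of theta on GRID under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) prior (assumed)
        for b, u in responses:
            p = p_correct(t, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, se_target=0.4, max_items=30):
    """Starting -> Continuing -> Stopping, following the flowchart."""
    remaining = list(bank)
    responses = []
    theta, se = 0.0, float("inf")  # Starting: assume medium ability
    while remaining and len(responses) < max_items and se > se_target:
        # Item selection: under the Rasch model the most informative item
        # is the one whose difficulty is closest to the current estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        u = answer(b)                  # item presentation and response
        responses.append((b, u))
        theta, se = eap(responses)     # ability estimation
    return theta, se, len(responses)


# Simulated examinee with a hypothetical true ability of 1.0.
rng = random.Random(0)
bank = [rng.uniform(-3, 3) for _ in range(200)]
true_theta = 1.0
theta_hat, se, n_used = run_cat(
    bank, lambda b: rng.random() < p_correct(true_theta, b)
)
print(round(theta_hat, 2), round(se, 2), n_used)
```

The loop ends either when the posterior SD drops to the assumed target or when the assumed maximum test length is reached, so different simulated examinees may take different numbers of items.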
2.2.1 Starting
In the Starting step, the CAT system presets the examinee's initial ability level and selects the first test item accordingly. Methods for selecting the first item are as follows:
1. Medium-difficulty item: Selecting a medium-difficulty item as the first one is a commonly applied method because the majority of examinees are of medium ability. It is convenient, but it often results in overuse of medium-difficulty items. Furthermore, if examinees are at either end of the θ scale, this method may waste their time on items that are outside their ability range (Wainer, et al., 1990, p. 109).
2. Random entry: The CAT system randomly selects an item from the item bank as the first item. This method improves test security because each item has an equal chance of being selected. In general, however, the CAT system randomly administers one of the candidate items whose difficulty lies between -0.5 and 0.5, that is, items of medium difficulty.
3. Self-adaptive (SA): In the self-adaptive method, examinees decide the difficulty level of the first test item according to their self-assessed ability. Examinees can also choose the difficulty level of the remaining items administered during the CAT procedure (Wise, Plake, Johnson, & Roos, 1992).
4. Referring to related data: Another method for selecting the first test item refers to related data. Examinees’ achievement, intelligence quotient, or age are possible data to consider (Vispoel, 1998). This method supports precise ability estimation because the difficulty level of the first item is close to the examinee's performance in a similar domain or subject. However, collecting such data is difficult, and it increases the cost of testing.
Lord (1977) claimed that if examinees take enough test items, the method used to set the initial point makes no significant difference to the ability estimate. In other words, the ability estimate is stable once enough test items are taken, whichever method is used.
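The starting rules above can be sketched as small selection functions. This is only an illustration under stated assumptions: the item bank is represented as a plain list of difficulty values, and the function names are hypothetical. A self-adaptive start could reuse `prior_data_start` with the difficulty level the examinee chooses.

```python
import random


def medium_difficulty_start(bank):
    """Pick the item whose difficulty is closest to 0 (the middle of the scale)."""
    return min(bank, key=lambda b: abs(b))


def random_entry_start(bank, rng, low=-0.5, high=0.5):
    """Pick uniformly at random among items with difficulty in [low, high]."""
    candidates = [b for b in bank if low <= b <= high]
    return rng.choice(candidates or bank)  # fall back to the whole bank


def prior_data_start(bank, prior_theta):
    """Pick the item closest to an ability guess derived from related data."""
    return min(bank, key=lambda b: abs(b - prior_theta))


bank = [-2.0, -0.4, 0.1, 0.3, 1.5]  # hypothetical item difficulties
print(medium_difficulty_start(bank))
print(random_entry_start(bank, random.Random(1)))
print(prior_data_start(bank, prior_theta=1.2))
```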
2.2.2 Continuing
The Continuing step iterates between estimating the examinee's ability and selecting the next item. After the examinee answers an item, the CAT system immediately re-estimates the ability from the responses so far. The system then administers the next item according to the current ability estimate. The procedure is repeated until the ability estimate is precise enough, that is, until the standard error of the estimate is small enough.
1. Examinees’ ability estimation: The most commonly used methods for ability estimation are maximum likelihood estimation (MLE) and Bayesian estimation. The former is simple, but the estimate cannot converge if an examinee answers all items correctly or all items incorrectly (Hambleton & Swaminathan, 1985, p. 35). The latter is more complex and less efficient. However, it avoids the non-convergence problem because it assumes a prior distribution of examinees’ abilities (Ho & Hsu, 1989).
2. Item selection strategies: The common methods for item selection are the maximum information strategy and Bayesian strategies (Baker, 1992, p. 209).
The key concept of the maximum information strategy is that each item provides a different amount of information for ability estimation. The relation between the information of an item and examinees’ abilities can be plotted as an item information curve (IIC; see Fig. 2.4). The more information an item provides at the current ability estimate, the more likely it is to be selected as the next item. This strategy selects, from the items not yet administered, the one that provides maximum information (Birnbaum, 1968).
In contrast to the maximum information strategy, the Bayesian strategy is more complicated. It assumes a normal prior distribution of examinees’ abilities. After the examinee answers an item, the posterior distribution of ability and its variance are estimated, and these results become the indices for selecting the next item. This strategy selects, from the items not yet administered, the one that yields the minimum posterior variance (Owen, 1975).
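The contrast between the two estimators, and the maximum information rule, can be made concrete with a short sketch. It assumes a two-parameter logistic (2PL) model, which the text does not specify, and it uses a grid approximation of the posterior rather than Owen's procedure. All function names are hypothetical. Under the 2PL model, item information at θ is a²P(θ)(1 − P(θ)).

```python
import math

GRID = [g / 20.0 for g in range(-80, 81)]  # theta from -4 to 4 in steps of 0.05


def p2pl(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def mle(responses, iterations=50):
    """Newton-Raphson MLE; returns None when no finite maximum exists."""
    scores = [u for _, _, u in responses]
    if all(scores) or not any(scores):
        return None  # all correct or all incorrect: the likelihood is monotone
    theta = 0.0
    for _ in range(iterations):
        grad = sum(a * (u - p2pl(theta, a, b)) for a, b, u in responses)
        info = sum(item_information(theta, a, b) for a, b, _ in responses)
        step = max(-1.0, min(1.0, grad / info))  # damp large steps
        theta += step
        if abs(step) < 1e-8:
            break
    return theta


def eap(responses, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean and variance of theta on GRID under a normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        for a, b, u in responses:
            p = p2pl(t, a, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, var


def select_max_information(theta, bank, used):
    """Index of the unused item with the largest information at theta."""
    unused = [i for i in range(len(bank)) if i not in used]
    return max(unused, key=lambda i: item_information(theta, *bank[i]))


bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.5), (2.0, 1.0)]  # hypothetical (a, b)
all_correct = [(1.0, -1.0, 1), (1.5, 0.0, 1)]             # (a, b, score)
print(mle(all_correct))   # no finite MLE for an all-correct pattern
print(eap(all_correct))   # the prior keeps the Bayesian estimate finite
print(select_max_information(0.0, bank, used={1}))
```

For the all-correct pattern the likelihood keeps increasing with θ, so the MLE routine reports that no finite estimate exists, while the prior in the Bayesian estimate pulls the posterior mean back to a finite value.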
2.2.3 Stopping
When a CAT is finished, different examinees may have completed different numbers of test items. The conditions for the Stopping step depend on the purpose of the research (洪碧霞、吳鐵雄, 1989). Four conditions under which testing stops are as follows:
1. Setting the maximum standard error (SE): The CAT ends when the SE of the ability estimate falls below the preset maximum SE. This constraint is often combined with the Bayesian selection strategy (Wainer, et al., 1990, p. 114).
2. All suitable items were used: The CAT ends when no remaining item can increase the information. This constraint is often combined with the maximum information selection strategy.
3. Setting the testing length: The CAT ends when the preset test length is reached. This constraint is easy to implement and is therefore often used in simulation research. The item exposure rate can be controlled, but the precision of the ability estimate may be unstable (Wainer et al., 1990, p. 114).
4. Setting the test time: The CAT is terminated when the test time runs out. This rule is often used for convenience, especially in test administration (洪碧霞、吳鐵雄, 1989). However, it is unfair to examinees who answer questions slowly.
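The four stopping conditions can be combined into a single check, as in the sketch below. The function name, the argument names, and every threshold value are hypothetical, and the "all suitable items were used" rule is interpreted here, as an assumption, as the best remaining item offering less than a small amount of information.

```python
def should_stop(se, n_items, elapsed_s, best_info,
                max_se=0.3, max_items=30, time_limit_s=1800.0, min_info=0.01):
    """Return the name of the first stopping rule that fires, or None."""
    if se < max_se:
        return "standard_error"        # rule 1: estimate is precise enough
    if best_info < min_info:
        return "no_informative_items"  # rule 2: no remaining item adds information
    if n_items >= max_items:
        return "test_length"           # rule 3: preset test length reached
    if elapsed_s >= time_limit_s:
        return "test_time"             # rule 4: time limit reached
    return None


# Hypothetical state after 12 items and 10 minutes of testing.
print(should_stop(se=0.45, n_items=12, elapsed_s=600.0, best_info=0.2))
print(should_stop(se=0.28, n_items=12, elapsed_s=600.0, best_info=0.2))
```

Checking the SE rule first means that a test which is already precise enough stops for that reason even if the length or time limit is reached at the same moment; a different research purpose might order the rules differently.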
The advantages of IRT-based CAT (Ho, 1989; Baker, 1992, p. 2) lie in the following three aspects. First, it is an effective and efficient measurement tool for examinees. With the help of information technology, examinees take fewer items and obtain a more accurate ability estimate. Second, owing to the group invariance of item parameters, the abilities of examinees in different groups are comparable because they are on the same scale. In other words, the test results are easier to interpret. Third, it is an economical tool for administration because it provides examinees with a standard testing procedure, improves the efficiency of mass data processing, and reduces the cost of testing. Owing to the efficiency and effectiveness of IRT-based CAT, it is unsurprising that a growing number of language testing systems adopting the IRT-based CAT procedure are becoming available (Brown, 1997; Dunkel, 1997; 1999; Madsen & Larson, 1985).