
Chapter 2. Literature Review

2.5. Virtual Item Bank Theory

This section reviews the literature on item security. In addition to surveying research on algorithmic control of item exposure parameters, it describes in detail the VIB theory used in this study.

When VIB theory is applied, the item exposure rate and the item overlap rate remain within a secure range, so item security can be well controlled.

CBT and CAT are administered by selecting items from an item bank. However, test security problems caused by item overexposure arise once a large number of examinees have taken the test over time. Test security can be assessed by two key indicators: the item exposure rate and the item overlap rate. Chang (2003) tried to protect test security by selecting items at random to distribute item usage more evenly, but this method did not produce satisfactory results. Some researchers focused solely on controlling the item exposure rate in the hope of solving this problem. One of the most discussed control methods in these studies is the SH procedure proposed by Sympson and Hetter (1985).
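The two indicators can be made concrete with a small sketch. It assumes the usual definitions, which the text does not spell out: the exposure rate of an item is the share of examinees who received it, and the overlap rate is the average share of items that two examinees have in common. It also assumes a fixed test length. The function names and the data are hypothetical.

```python
from itertools import combinations


def item_exposure_rates(tests, bank_size):
    """Share of examinees who were administered each item in the bank."""
    counts = [0] * bank_size
    for items in tests:
        for item in set(items):
            counts[item] += 1
    return [count / len(tests) for count in counts]


def mean_overlap_rate(tests):
    """Average share of items shared by a pair of examinees (fixed test length)."""
    rates = []
    for a, b in combinations(tests, 2):
        shared = len(set(a) & set(b))
        rates.append(shared / len(a))
    return sum(rates) / len(rates)


# Hypothetical data: three examinees, four items each, from a bank of six items.
tests = [[0, 1, 2, 3], [0, 1, 4, 5], [0, 2, 4, 5]]
exposure = item_exposure_rates(tests, bank_size=6)
overlap = mean_overlap_rate(tests)
```

In this toy data item 0 is given to every examinee, so its exposure rate is 1.0, which is the kind of overexposure the control procedures below try to prevent.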

This method uses the ability distribution of a group of simulated examinees to control the item exposure rate before the test is administered. For the control to be effective, the ability distribution of the simulated group should resemble that of the real examinee population. To this end, different exposure control parameters were used for examinees at different ability levels.
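The pre-test simulation can be sketched as follows. This is a simplified illustration rather than the published procedure: it assumes a toy selection rule (items closest in difficulty to the simulated ability are tried first, and the true ability is used instead of a running estimate), a standard normal ability distribution, and the commonly described update in which an item selected more often than the target rate r_max receives K = r_max / P(selected), while all other items keep K = 1. All names and default values are hypothetical.

```python
import random


def sh_exposure_parameters(difficulties, r_max=0.3, test_length=20,
                           n_examinees=1000, n_cycles=10, seed=1):
    """Iteratively adjust one exposure control parameter K per item."""
    rng = random.Random(seed)
    n_items = len(difficulties)
    k = [1.0] * n_items                      # start without exposure control
    administered = [0] * n_items
    for _ in range(n_cycles):
        selected = [0] * n_items             # times each item was selected
        administered = [0] * n_items         # times it was actually given
        for _ in range(n_examinees):
            theta = rng.gauss(0.0, 1.0)      # simulated ability (assumption)
            # Toy selection rule: try the items closest to theta first.
            order = sorted(range(n_items),
                           key=lambda i: abs(difficulties[i] - theta))
            given = 0
            for i in order:
                if given == test_length:
                    break
                selected[i] += 1
                if rng.random() < k[i]:      # administer with probability K_i
                    administered[i] += 1
                    given += 1
        # Lower K for items that were selected more often than r_max allows.
        for i in range(n_items):
            p_selected = selected[i] / n_examinees
            k[i] = r_max / p_selected if p_selected > r_max else 1.0
    rates = [count / n_examinees for count in administered]
    return k, rates


# Hypothetical bank: 100 items with difficulties spread evenly over [-3, 3].
difficulties = [-3 + 6 * i / 99 for i in range(100)]
k, rates = sh_exposure_parameters(difficulties)
```

Items near the centre of the ability distribution end up with K below 1, while rarely selected items at the extremes keep K = 1. The sketch also shows why the procedure is time-consuming: every cycle replays the whole simulated population.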


In 2003, Chang proposed the Sympson and Hetter conditional (SHC) procedure. SHC divides examinees into groups by ability level, obtains the exposure control parameter of each item at each ability level, and combines these parameters into an exposure control matrix that serves as the basis for exposure control in a real test. Because relatively few examinees have very high or very low ability, the maximum expected exposure parameter for those groups should be set higher. Conversely, because many examinees have medium ability, the maximum expected exposure parameter for that group should be set lower to increase the usage rate of the items (Chen, 2007).

Other methods for controlling the item exposure rate include the unconditional multinomial (SL) procedure (Stocking & Lewis, 1995), the conditional multinomial (SLC) procedure (Stocking & Lewis, 1998), the Davey and Parshall (DP) procedure (Davey & Parshall, 1995), and the SH online procedure with freeze control (SHOF) (Chen, 2005). However, these methods do not take the item overlap rate into consideration, so item overlap problems remain.

Arguing that the item exposure rate and the item overlap rate are interdependent rather than independent (Chen, 2004), Chen and Lei (2005) developed SHT, which complements SH by controlling both the item exposure rate and the item overlap rate.

Like SH, SHT requires exposure parameters to be simulated in advance, so both procedures are time-consuming and depend on the simulated test scenario. To solve this problem, Chen, Lei, and Liao (2008) extended SHT into SHTO, which controls the item exposure rate and the item overlap rate online in real time without pre-simulated exposure parameters and thereby greatly improves the efficiency of exposure control.

Nevertheless, both SHT and SHTO can only control the item overlap rate between two examinees. In fact, an examinee can obtain test information from more than one person.

Therefore, it is necessary to control the item overlap rate between a prospective examinee and a group of examinees who have already taken the test. To control the item overlap rate more broadly, Chen (2008) proposed the SHGT control method. Like SHTO, SHGT controls the item exposure rate and the overlap rate online in real time. The two differ in that SHTO can only control the item overlap rate between two examinees, whereas SHGT can do so between one prospective examinee and α past examinees (α ≥ 1).
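The difference between pairwise overlap and overlap with a group can be illustrated with a short sketch. It assumes that overlap with a group means the share of the new examinee's items that at least one of the α past examinees has already seen. This definition, the function name, and the data are assumptions made for illustration and are not taken from the cited work.

```python
def general_overlap_rate(new_test, previous_tests):
    """Share of the new examinee's items seen by any of the past examinees."""
    seen = set().union(*previous_tests)
    return len(set(new_test) & seen) / len(new_test)


# Hypothetical data: one prospective examinee and two past examinees.
new_test = [0, 1, 2, 3]
past = [[0, 4, 5, 6], [1, 7, 8, 9]]

pairwise = general_overlap_rate(new_test, past[:1])   # one past examinee
group = general_overlap_rate(new_test, past)          # two past examinees
```

Each past examinee alone shares a quarter of the items with the new examinee, yet together they cover half of the new test, which is why controlling only pairwise overlap can understate the risk.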


Although researchers have devised various ways to control both the item exposure rate and the item overlap rate, test disclosure remains a problem when too many examinees take the test over time (Chang, 2003). Some researchers have therefore used Automatic Item Generation (AIG) techniques to generate items. Although AIG was proposed some 30 years ago, it was not put to use until recently (Irvine & Kyllonen, 2002). There are numerous approaches to generating items by computer (Millman & Westman, 1989), but they generally require an item model. An item model (Bejar, 2002; Drasgow et al., 2006) is a general prototypical representation of the items to be generated, and each component of an item model can contain both fixed and variable elements (Lai, Alves, & Gierl, 2009). Using an item model, AIG can generate countless items, which addresses the item exposure rate and overlap rate problems. However, this method cannot be applied in CBT or CAT because it does not allow an examinee's ability to be estimated accurately.
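A minimal sketch of an item model with fixed and variable elements is given below. The class, the stem, and the value lists are hypothetical and only illustrate the idea; they are not taken from any of the cited systems.

```python
import random
from dataclasses import dataclass


@dataclass
class ItemModel:
    stem: str        # fixed text with {placeholders} for the variable elements
    variables: dict  # placeholder name -> list of allowed values

    def count(self):
        """Number of distinct items this model can generate."""
        total = 1
        for options in self.variables.values():
            total *= len(options)
        return total

    def generate(self, rng):
        """Fill every placeholder with one randomly chosen value."""
        values = {name: rng.choice(options)
                  for name, options in self.variables.items()}
        return self.stem.format(**values), values


# Hypothetical item model: the sentence is fixed, the numbers vary.
model = ItemModel(
    stem="A train travels at {speed} km/h for {hours} hours. How far does it go?",
    variables={"speed": [40, 60, 80], "hours": [2, 3, 4]},
)
text, values = model.generate(random.Random(0))
answer_key = values["speed"] * values["hours"]
```

Even this tiny model yields nine distinct items. Note that nothing in the sketch assigns a calibrated difficulty to the generated items, which is the gap that prevents direct use in CBT or CAT.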

Designing CAT and CBT is challenging because creating the item bank takes a great deal of time and resources. According to a study by Chen (2007), only 78 research papers by doctoral and graduate students in Taiwan concern testing (10 on traditional CBT, 35 on CAT, and 33 on online testing). Papers on figural testing are rarer still. It is therefore important for researchers to help test editors design item banks for figural tests with less manpower, fewer resources, and less time.

The term Virtual Item Bank first appeared in the study “Design a Virtual Item Bank Based on Image Processing Technique” by Liao (2002). Unlike traditional item banks, VIBs are not physical lists of items but innumerable items generated by two tools, which are described as follows.


2.5.1. Virtual Item Bank System

In VIBS, the item database no longer stores large numbers of items; instead, it stores two kinds of elements that replace the traditional items:

1. Basic figure elements: The system no longer needs to store a large number of figural items. Instead, items are built from three basic figure types: line, circle, and polygon. This not only lowers storage requirements but also reduces the probability of item exposure.

2. Solution processes: In VIBS, the processes and abilities that examinees need to solve an item are defined by specialists and converted into mathematical formulas that can be manipulated by computers and stored in the virtual item database.

The VIB which replaces the traditional item bank is illustrated by the flow chart below.

Figure 2-6. Flow chart of VIB

The VIBS contains three subsystems: item rule definition subsystem, item generation subsystem, and answer retrieval subsystem. Each subsystem has different tasks and functions and is described below.

1. Item rule definition subsystem: This subsystem provides test editors with a


position on the system interface and choose the method to process images.

The subsystem then estimates the item difficulty and asks the test editors to adjust the difficulty level. Finally, the item initiation subsystem saves this information into the database. The item generation rules and the items generated by these rules are shown in Figure 2-7:

Figure 2-7. Item generation rules without binary operations (panels: Addition Rule; Diagonal Rule; Oblique Rule; Allocation Change Rule; Quantitative Pair Wise Progression Rule; Size Change Rule; Move Rule; Straight and Oblique Rule; Overlap Rule; Bias Rule; Angle Change Rule; Image operation with no changes)

These 12 item generation rules can be combined with 4 binary operations yielding 48 different rules, and the difficulty parameters of these 48 item types were then calibrated.
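The count of 48 follows from pairing every rule with every binary operation, as the sketch below shows. The rule names are taken from the panel labels of Figure 2-7. Treating And, Or, Xor, and Sub as the four binary operations is an assumption based on the operations named later in this section.

```python
from itertools import product

# The 12 rule names from Figure 2-7.
RULES = [
    "Addition", "Diagonal", "Oblique", "Allocation Change",
    "Quantitative Pair Wise Progression", "Size Change", "Move",
    "Straight and Oblique", "Overlap", "Bias", "Angle Change",
    "Image operation with no changes",
]

# Assumed to be the four binary operations (named later in the text).
OPERATIONS = ["And", "Or", "Xor", "Sub"]

# Every (rule, operation) pair is one item type with its own difficulty.
item_types = list(product(RULES, OPERATIONS))
```

Each of the 12 x 4 = 48 pairs would then receive its own calibrated difficulty parameter.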

2. Item generation subsystem: The main idea of this subsystem is to generate all the data needed for items, with the aim of producing a practically unlimited number of items. It contains three subsystems:

a. Image processing subsystem: This subsystem performs image processing according to the variation required for the items. In addition to the 2D processing described above, it supports object size variations, position movements, and other special processing functions.

b. Data retrieval subsystem: This subsystem retrieves items and makes sure that each generated item has only one reasonable solution. It therefore prevents unreasonable items from appearing in the test to ensure item accuracy.

c. Item shape control subsystem: This subsystem controls the item shape, generates the answer zone, and makes sure that the answer is the only correct one. It also constrains the direction in which the item's logic runs (top to bottom, left to right, right to left, or bottom to top) in order to provide a variety of items to examinees.

The features of the item generation subsystem are:

a. Defining the needed abilities and strategies in order to solve the item.

b. Determining the object shape of each item.

c. Identifying the difficulty parameter.

d. Parameter conversion: The system converts the data mentioned above into mathematical formulas, and saves them in the VIB.

e. Automatic generation system: The item generation subsystem can automatically generate items according to the defined strategy, difficulty level, and selection.


3. Answer retrieval subsystem: The alternative options of each item are generated by image comparison. First, the RGB values of each figure's pixels are computed as its characteristic values. Then the characteristic values are saved in a two-dimensional matrix and compared with those of the figures in the database. The similarity of two figures is measured by the Euclidean distance between their characteristic values (as shown in Equation 2-8), and the three figures with the lowest distances are selected as the alternative options.

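$$d(Q, I) = \sqrt{\sum_{i} \left(f_{Q,i} - f_{I,i}\right)^{2}}$$    Equation 2-8

where $f_{Q,i}$ and $f_{I,i}$ are the i-th characteristic values of the two figures Q and I being compared.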
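The selection of the three closest figures can be sketched as follows. The sketch assumes that each figure's characteristic values have been flattened into a plain list of numbers. The function names and the toy database are hypothetical.

```python
import math


def euclidean_distance(f_q, f_i):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((q - i) ** 2 for q, i in zip(f_q, f_i)))


def closest_figures(target, database, n_options=3):
    """Names of the n_options figures with the smallest distance to target."""
    ranked = sorted(database,
                    key=lambda name: euclidean_distance(target, database[name]))
    return ranked[:n_options]


# Hypothetical feature vectors for the figures stored in the database.
database = {
    "a": [1, 0, 0],
    "b": [0, 2, 0],
    "c": [3, 0, 0],
    "d": [0, 0, 4],
    "e": [5, 5, 5],
}
target = [0, 0, 0]
options = closest_figures(target, database)
```

Sorting the whole database is the simplest choice for a sketch; for a large figure database, a partial selection such as `heapq.nsmallest` would avoid the full sort.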

Together, these subsystems make up the VIBS and control the item shape, item difficulty, answer, and all parameters.

2.5.2. CAT system

The function of the CAT system is to select appropriate items for examinees and to evaluate their ability based on an IRT model. With respect to item production, the CAT system is only an application interface and does not perform image processing, item design, or retrieval. These tasks are done by the VIBS, and the results are sent back to the CAT system, which administers the test. Ability evaluation is carried out using IRT.
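As an illustration of IRT-based ability evaluation, the sketch below uses the Rasch model, the one-parameter IRT model adopted in the study reviewed here. The estimation method (maximum likelihood by Newton-Raphson) is an assumption, since the text does not say how ability is estimated, and the function names and data are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response given ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, n_iter=20):
    """Maximum likelihood estimate of theta by Newton-Raphson.

    responses holds 1 (correct) or 0 (incorrect) for each item. The estimate
    does not exist when all responses are correct or all are incorrect.
    """
    theta = 0.0
    for _ in range(n_iter):
        p = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(u - pi for u, pi in zip(responses, p))
        information = sum(pi * (1.0 - pi) for pi in p)
        theta += gradient / information
    return theta


# Hypothetical example: three items, the two easier ones answered correctly.
difficulties = [-1.0, 0.0, 1.0]
responses = [1, 1, 0]
theta_hat = estimate_ability(responses, difficulties)
```

At the estimate, the expected number of correct answers equals the observed score, which is the defining property of the Rasch maximum likelihood estimate.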

Because the system simplifies the factors that affect the items, the Rasch model was used in their study. The functions of the editing tool and the CAT system are described as follows.

1. Editing tool: The major functions of this tool are to obtain all initial parameters of figural items and to store the item parameters in the VIB. When an examinee takes a test, the VIBS generates suitable items for that examinee.

The following system instructions show how to input the elements and functions into VIBs.

Step 1: Deciding the location of figural objects

Figure 2-8. Deciding the location of normal figural objects (callout: "Select the locations of figural objects")

Step 2: Deciding the image processing operation rules

Figure 2-9. Deciding on the image processing rules (callout: "Decide the image processing rules")

Step 3: Choosing the next figural objects and saving them into the VIB

Figure 2-10. Choosing the next figural objects and storing them into the VIB (callout: "Save the rule into VIBs and generate items")

2. CAT system: The major functions of the CAT system are to collect data from examinees and evaluate their ability. The figure above represents the problems and demands of item bank generation, in addition to the development of research tools. These tools helped test editors to solve the problem of the item exposure rate. A simulation of the item overlap rate is also discussed and verified in this research.

Finally, their research proposed a new technique called VIB to address test security problems. This technique integrates AIG, Content-Based Image Data Retrieval, item exposure rate control, and item overlap rate control. They report that administering a test with VIB can rule out item exposure and overlap problems and that the examinee's true ability can still be estimated precisely.

Figure 2-11. CAT system

To validate the study, they conducted a test that used APM as the material for building a VIB for figural testing and linked the VIB to the CAT system. Drawing on earlier research on APM, they identified the combination rules of APM items and used image processing operations such as And, Or, Xor, and Sub to implement these rules. Using image processing techniques made it easy and quick to generate items.
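The four operations can be illustrated on small binary bitmaps, where 1 marks a drawn pixel. Reading Sub as "pixels in the first figure that are not in the second" is an assumption, as are the function name and the example figures.

```python
def combine(a, b, operation):
    """Combine two equally sized binary bitmaps pixel by pixel."""
    operations = {
        "And": lambda x, y: x & y,
        "Or": lambda x, y: x | y,
        "Xor": lambda x, y: x ^ y,
        "Sub": lambda x, y: x & (1 - y),   # assumed meaning: in a but not in b
    }
    op = operations[operation]
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Hypothetical 2 x 3 figures.
figure_a = [[1, 1, 0],
            [0, 1, 0]]
figure_b = [[0, 1, 1],
            [0, 1, 1]]

results = {name: combine(figure_a, figure_b, name)
           for name in ["And", "Or", "Xor", "Sub"]}
```

Applying a different operation to the same pair of figures yields a different third figure, which is how a single pair of basic elements can give rise to several item types.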

To address the technical problems with distracters, their research aimed to prevent overly similar distracters that may confuse examinees. They used a content-based image retrieval technique to analyze the similarity of two options. Options with a high level of similarity are removed by the VIB. Likewise, items with similar stems are also removed by the VIB so that the items make more sense to examinees.

Combining all of the above techniques, their research developed research tools that included the item rule definition subsystem, the item generation subsystem, the answer retrieval subsystem, and the CAT system. Using these tools, test editors can easily build a VIB. This study refers to APM to build the basic elements of figural testing and transforms the item combination rules of APM into image processing operations to be stored in the VIB for final testing and validation.

The results of their VIB-based test correlated positively with APM results, with a desirable correlation coefficient (r = 0.683, n = 301, p < .001). The item exposure rate was extremely low, ranging from 0 to 1.0128E-4, while the item overlap rate was 2.43488E-10, which is negligible.
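With r = 0.683 and n = 301, the p-value is far too small to print to three decimal places. The sketch below shows this with the standard t statistic for testing a Pearson correlation against zero, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The function name is hypothetical, and the reported values are simply plugged in.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Values reported in the text.
t_value = correlation_t_statistic(0.683, 301)
degrees_of_freedom = 301 - 2
```

The resulting t value is about 16 on 299 degrees of freedom, far beyond the critical value of roughly 3.3 for a two-sided test at the .001 level.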

In conclusion, their results indicate that using VIB provides a high level of test security while still allowing an examinee's ability to be estimated correctly.

Above all, the VIB building process proposed in their research was well received by both test editors and experts. Researchers can use this technique to build VIBs for other tests as well. With regard to the use of the research tools, both test experts involved in the study considered the tools easy to use. Using graphic design techniques to build objects and rules makes it easy to build a VIB. On the test interface, the CAT system can generate an item quickly. In addition, neither the test operators nor the experts saw any duplicate items during the study, which suggests that test security was maintained throughout.
