
Chapter 2. Literature Review

2.2. Item Response Theory

On the other hand, IRT is a model-based paradigm: it starts by modeling the relationship between the latent variable being measured and the item responses. IRT is generally regarded as an improvement over CTT. The name item response theory reflects the focus of the theory on the item, as opposed to the test-level focus of CTT.

Thus IRT models the response of each examinee with a given ability to each item in the test.

IRT rests on two basic postulates (Hambleton, Rogers, & Swaminathan, 1995): (a) the performance of an examinee on a test item can be predicted or explained by a set of factors called abilities or latent traits, expressed mathematically by θ; and (b) the relationship between examinees' item performance and the set of traits underlying item performance can be described by a monotonically increasing function called an item characteristic function (ICF). Though some research indicated that CTT methods and the IRT Rasch model function almost identically (Lawson, 1991), most research has shown that IRT provides more accurate and precise estimates than CTT (Schulz, Kolen, & Nicewander, 1999; Pommerich, 2006; Schulz & Lee, 2002). The primary advantages of IRT over CTT are: (a) examinees can be placed on the same scale, and (b) item parameter estimates obtained from different samples are equivalent, within sampling fluctuation, up to a linear transformation (Lord, 1980).

2.2.1. IRT Assumptions

To achieve the useful characteristics described above, IRT models currently in use require stringent assumptions, including unidimensionality, local independence, and nonspeededness, which must hold before the models can be used to analyze the data (Ackerman, 1989; Hambleton & Swaminathan, 1985).


1. Unidimensionality: IRT models measure a single, unidimensional trait denoted by θ. The trait is further assumed to be measurable on a scale, typically a standard scale with a mean of .00 and a standard deviation of 1.00. This assumption is strict and hard to meet in practice, since several cognitive, personality, and test-taking factors can affect test performance. Fortunately, research designed to assess the impact of violations of the unidimensionality assumption has suggested that unidimensional IRT models are relatively robust to moderate violations of strict unidimensionality, and that the most important issue is the degree to which the item pool is dominated by a single latent trait (Harvey & Hammer, 1999).

2. Local independence: This assumption means that items are unrelated except for the fact that they measure the same trait; that is, the abilities specified in the model are the only factors influencing examinees' responses to test items.

3. Nonspeededness: A special case of the preceding assumptions is nonspeededness. That is, examinees who fail to answer test items correctly do so because of limited ability and not because they failed to reach the items (Hambleton & Swaminathan, 1985). However, in most educational assessment settings, tests are designed to be power tests, meaning that even given unlimited time, not every student would achieve a near-perfect score. Since it is unrealistic to provide examinees with unlimited time in educational tests, the time limit still has an effect on examinees' performance (Yen, 2010).
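The local independence assumption can be made concrete in code: if it holds, the probability of a whole response pattern is simply the product of the individual item probabilities. The following sketch assumes an illustrative logistic item function; the specific item difficulties are hypothetical values chosen for the example.

```python
import math

def item_prob(theta, b):
    # Illustrative logistic item response function:
    # probability of a correct answer given ability theta and difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_likelihood(theta, responses, difficulties):
    # Under local independence, the joint probability of a response
    # pattern factors into a product over items: P(u) appears for a
    # correct response (u = 1) and 1 - P(u) for an incorrect one.
    L = 1.0
    for u, b in zip(responses, difficulties):
        p = item_prob(theta, b)
        L *= p if u == 1 else 1.0 - p
    return L

# Three items answered (correct, correct, incorrect) by an examinee
# with theta = 0; difficulties are hypothetical.
L = pattern_likelihood(0.0, [1, 1, 0], [-1.0, 0.0, 1.0])
```

Maximizing this product (or its logarithm) over θ is the basis of maximum-likelihood ability estimation.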

2.2.2. One-, two-, and three-parameter logistic IRT models

According to the number of parameters describing each item, IRT models can be classified into three categories: the one-parameter logistic (1PL), two-parameter logistic (2PL), and three-parameter logistic (3PL) models.


If one wishes to construct a measurement scale with a limited sample size, the 1PL model may be the most appropriate model to use (Lord, 1983). The 1PL, or Rasch (Lord & Novick, 1968), model is one of the simplest IRT models; as its name implies, it assumes that a single item parameter is sufficient to represent the item response process. In the 1PL model, the probability that an examinee with ability θ answers an item with difficulty b correctly can be expressed as

P_1PL(θ) = 1 / (1 + exp[−D(θ − b)]),  Equation 2-4

or, equivalently,

P_1PL(θ) = exp[D(θ − b)] / (1 + exp[D(θ − b)]),  Equation 2-5

where the difficulty parameter, b, indicates how difficult the item is, and D is a scaling factor (usually set to 1.702) that makes the logistic function as close as possible to the normal ogive function (Baker, 1992). Figure 2-1 illustrates an example of the ICC for the 1PL model. The x-axis represents the examinee's latent ability scale and the y-axis represents the probability that the examinee answers the item correctly.
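The 1PL formula above can be sketched directly in Python; the function below is a minimal illustration, using the scaling factor D = 1.702 mentioned in the text.

```python
import math

def p_1pl(theta, b, D=1.702):
    # 1PL (Rasch) item characteristic function: probability that an
    # examinee with ability theta answers an item of difficulty b
    # correctly. D = 1.702 scales the logistic curve to approximate
    # the normal ogive.
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))

# When ability equals difficulty (theta == b), the probability is
# exactly 0.5 -- the midpoint of the ICC.
print(p_1pl(0.0, 0.0))  # 0.5
```

Note that the curve depends only on the difference θ − b, which is what forces every item in a 1PL test to share the same ICC shape.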

Figure 2-1. Example of a 1PL ICC with b = 0.


One main potential drawback of the 1PL model is its assumption that all items in the test share identically shaped ICCs, which would be quite unusual in most applied assessment situations (Yen, 2010). A slightly more complex IRT model is the 2PL model. According to the 2PL model, an examinee's response to an item is determined by his/her ability (θ), the item difficulty (b), and the item discrimination (a). The mathematical form of the 2PL model can be written as (Lord, 1980)

P_2PL(θ) = 1 / (1 + exp[−Da(θ − b)]).  Equation 2-6

The new parameter a allows items to discriminate differently among examinees. Figure 2-2 shows two examples of 2PL ICCs with b = 0 and a = 0.5 and 1.5, respectively. P_2PL(θ) is the probability that an examinee with ability θ answers an item with difficulty b correctly. In both the 1PL and 2PL models, the probability of passing ranges from 0 to 1 as θ goes from −∞ to ∞.
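The effect of the discrimination parameter can be checked numerically; this sketch evaluates the 2PL function at the same ability for the two a values discussed above.

```python
import math

def p_2pl(theta, a, b, D=1.702):
    # 2PL item characteristic function: the discrimination a controls
    # how steeply the curve rises around theta = b.
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# At theta = 1 (one unit above the difficulty b = 0), the more
# discriminating item (a = 1.5) gives a higher probability of a
# correct response than the flatter item (a = 0.5).
low_a = p_2pl(1.0, 0.5, 0.0)
high_a = p_2pl(1.0, 1.5, 0.0)
```

Symmetrically, below the difficulty point the steeper curve gives a lower probability, which is precisely what "discriminating among examinees" means.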

Figure 2-2. Two examples of 2PL ICCs with b = 0 and a = 0.5 and 1.5, respectively.


For the 1PL and 2PL models it is a tacit assumption that as examinee ability becomes very low (approaching negative infinity), the probability of a correct response approaches zero. For many assessments, however, this may not be appropriate. For example, on multiple-choice assessments a low-ability examinee may answer an item correctly simply by guessing. The 3PL model allows for this possibility through the inclusion of a guessing parameter. The resulting 3PL model is

P_3PL(θ) = c + (1 − c)P_2PL(θ),  Equation 2-7

where c represents the probability that an examinee of extremely low ability gets the item correct.
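A small numerical check makes the role of c concrete: the 3PL curve is the 2PL curve compressed into the range [c, 1], so its lower asymptote is c rather than zero.

```python
import math

def p_3pl(theta, a, b, c, D=1.702):
    # 3PL item characteristic function: c is the pseudo-guessing
    # parameter, the lower asymptote of the ICC.
    p2 = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return c + (1.0 - c) * p2

# For an extremely low-ability examinee the probability approaches c,
# not zero; for a very high-ability examinee it still approaches 1.
floor = p_3pl(-10.0, 1.0, 0.0, 0.2)
ceiling = p_3pl(10.0, 1.0, 0.0, 0.2)
```

This matches the multiple-choice intuition: with four options, purely random guessing alone would suggest c near 0.25, although in practice c is estimated from data.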

If P_3PL(θ) is plotted as a function of ability θ, the result is an ICC as shown in Figure 2-3. The greater the ability, the greater the probability of answering the item correctly; and, of course, the more accurate the item parameters, the more precise the estimate of the examinee's ability.

Figure 2-3. A typical ICC for the 3PL model with b = 0, a = 1, and c = 0.1.
