Latent Trait Theory (LTT), or Item Response Theory, is another widely used approach
in language testing. LTT assumes that two variables determine an individual’s
performance on a test: the ability of the test-taker and the characteristics of the items.
Underlying LTT is the belief that the performance of subjects on a test can be
explained or predicted by estimating their abilities, or traits, as embodied in the test
score. These traits are referred to as latent traits because they are abstract and
unobservable, and therefore cannot be measured directly (Hambleton & Swaminathan,
1984). Unlike CTT, LTT can produce indices that are comparable across different tests
and groups of subjects. Hambleton and Swaminathan (1984) identified three key
advantages of LTT:
(1) Item parameter estimates are independent of the group of examinees used.
(2) Test-taker ability estimates are independent of the particular set of test items used.
(3) The precision of ability estimates is known.
The first two advantages are known as invariance, the major feature distinguishing
LTT from CTT. Invariance means that item parameters do not depend on the ability
distribution of the test-takers, and that ability estimates do not depend on the
particular items administered. Even if a test is administered to two groups of
test-takers with different ability levels, the resulting item parameter estimates
should be the same, apart from sampling error, once they are placed on a common scale.
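The contrast with CTT can be sketched in a few lines of Python, assuming the Rasch model holds exactly and using hypothetical ability values. The classical difficulty index (proportion correct) of the same item differs between a low-ability and a high-ability group, whereas the item's b parameter, and therefore its ICC, is the same for both. This sketch performs no estimation; it only shows why a model-based item parameter can stay fixed while a group-based statistic shifts.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


ITEM_B = 0.5  # hypothetical difficulty: a property of the item, not of a group

# Hypothetical ability values for two groups of test-takers.
low_group = [-2.0, -1.5, -1.0, -0.5]
high_group = [0.5, 1.0, 1.5, 2.0]


def expected_p_value(thetas, b):
    """CTT difficulty index: expected proportion of the group answering correctly."""
    return sum(rasch_prob(t, b) for t in thetas) / len(thetas)


print("CTT p-value, low group: ", round(expected_p_value(low_group, ITEM_B), 2))
print("CTT p-value, high group:", round(expected_p_value(high_group, ITEM_B), 2))

# The ICC does not depend on which group takes the test:
# anyone with theta equal to b has a 0.5 chance of answering correctly.
print("P(correct | theta = b):", rasch_prob(ITEM_B, ITEM_B))
```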
Compared to CTT, LTT rests on stronger assumptions. One key assumption,
unidimensionality, holds that a single trait suffices to explain a subject's test
performance. That is, a single dominant trait accounts for the score pattern and
ranking of the test-takers. If more than one dominant trait is needed to explain
performance, a multidimensional model is required, and such models lack sufficient
empirical support (Davies et al., 1999; Hambleton & Swaminathan, 1984).
Another central assumption is local independence: “when abilities influencing test
performance are held constant, examinees’ responses to any pair of items are
statistically independent” (Hambleton et al., 1991). If local independence holds, a
test-taker’s responses to different items are unrelated once ability is taken into
account. This condition is met when traits other than the ability being measured are
ruled out.
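Local independence is what allows the probability of a whole response pattern to be written as a product of item-level probabilities at a fixed ability. The Python sketch below assumes the Rasch model and hypothetical item difficulties, and uses that product (the likelihood) to find the ability that best explains one response pattern. The simple grid search is a stand-in chosen for transparency, not the estimation routine of any particular IRT program.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_likelihood(theta, responses, difficulties):
    """P(response pattern | theta), assuming local independence:
    the joint probability is the product of the item-level probabilities."""
    likelihood = 1.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        likelihood *= p if x == 1 else (1.0 - p)
    return likelihood


difficulties = [-1.0, 0.0, 1.0]  # hypothetical items
responses = [1, 1, 0]            # correct, correct, incorrect

# Grid search over theta from -4 to 4 in steps of 0.01 for the maximum likelihood.
grid = [i / 100 for i in range(-400, 401)]
theta_hat = max(grid, key=lambda t: pattern_likelihood(t, responses, difficulties))
print(f"ML ability estimate: {theta_hat:.2f}")
```

Without local independence, the joint probability could not be factored item by item, and this product would not be a valid likelihood.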
Each test item has an item characteristic curve (ICC), which shows the relationship
between the latent trait and the probability of answering the item correctly. In
principle, LTT models can be built from many different combinations of parameters;
the most commonly used are the one-parameter, two-parameter, and three-parameter
logistic models.
The one-parameter logistic (1PL) model, also known as the Rasch model, takes into
account a single parameter: item difficulty, denoted b_i. The b_i parameter is the point
on the ability scale at which the probability of answering correctly is 0.5. As
Figure 3 indicates, the farther to the right a curve lies on the ability scale, the more
difficult the item is. To have at least a 50% chance of answering item 2 correctly, a
test-taker needs an ability of 2 or higher, whereas an ability of -1 or higher suffices
for item 3. The only difference between the curves in the one-parameter logistic model
is their location on the ability scale. This model requires at least 100 participants
and about 25 items to produce reliable results.
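The 1PL curve can be written P_i(θ) = exp(θ - b_i) / (1 + exp(θ - b_i)). The minimal Python sketch below uses the two difficulty values discussed above (b = 2 for item 2 and b = -1 for item 3) to show that P = 0.5 exactly where θ = b_i, and that changing b only shifts the curve along the ability scale without changing its shape.

```python
import math


def icc_1pl(theta: float, b: float) -> float:
    """Rasch / 1PL item characteristic curve."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))


items = {"item 2": 2.0, "item 3": -1.0}  # difficulties described for Figure 3

for name, b in items.items():
    probs = [round(icc_1pl(theta, b), 2) for theta in (-2, -1, 0, 1, 2, 3)]
    print(name, "b =", b, "P at theta = -2..3:", probs)

# Shifting b shifts the curve: P(theta, b) equals P(theta - b, 0).
print(icc_1pl(1.0, -1.0) == icc_1pl(2.0, 0.0))
```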
Figure 3.
An Example of One-Parameter Logistic Model. Adapted from Hambleton, 1991

As for the two-parameter logistic (2PL) model, it takes item discrimination into account
along with item difficulty. The item discrimination parameter, denoted a_i, is reflected
in the steepness of the curve: an item with high discrimination has a steeper curve
than an item with lower discrimination. The acceptable range of a_i is (0, 2), and the
higher the value, the more discriminating the item. As Figure 4 shows, ICCs in the
two-parameter logistic model differ in both location and slope. While item 1 and
item 2 are the most difficult, item 4 has the greatest discriminating power, since it has
the steepest slope of the four curves. Alderson et al. (1995) proposed that a minimum
of 200 subjects is needed to run the two-parameter model, while McNamara (1996)
suggested that 500 subjects and 20 items are required.

Figure 4.
An Example of Two-Parameter Logistic Model. Adapted from Hambleton, 1991
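A minimal Python sketch of the 2PL curve, P_i(θ) = 1 / (1 + exp(-a_i(θ - b_i))). Some texts multiply a_i by a scaling constant D of about 1.7; here D is assumed to be 1. The two items are hypothetical, share the same difficulty, and have discrimination values inside the (0, 2) range mentioned above. At θ = b_i the slope of the curve equals a_i / 4, so a larger a_i means a sharper separation between test-takers just below and just above the item's difficulty.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve (scaling constant D assumed to be 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def slope_at(theta: float, a: float, b: float, h: float = 1e-5) -> float:
    """Numerical slope of the ICC at theta, using a central difference."""
    return (icc_2pl(theta + h, a, b) - icc_2pl(theta - h, a, b)) / (2 * h)


# Hypothetical items with the same difficulty but different discrimination.
items = {"flat item": (0.5, 0.0), "steep item": (1.8, 0.0)}  # (a, b)

for label, (a, b) in items.items():
    slope = slope_at(b, a, b)
    # How much more likely a test-taker half a unit above b is to succeed
    # than one half a unit below b.
    gap = icc_2pl(b + 0.5, a, b) - icc_2pl(b - 0.5, a, b)
    print(f"{label}: slope at b = {slope:.3f} (a/4 = {a / 4}), gap = {gap:.3f}")
```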
The three-parameter logistic (3PL) model includes an additional parameter, the
pseudo-chance-level (guessing) parameter c_i. It models the lower end of the ability
scale, where even test-takers of very low ability have some chance of answering
correctly by guessing, and therefore provides valuable information on tests that use a
selected-response format, such as multiple-choice questions. Figure 5 shows an example
of the three-parameter model: the lower asymptote of each curve is no longer zero.
Item 3, with a guessing value of 0.25, is more susceptible to guessing than the other
items, which have lower guessing values. Because the 3PL model involves three
parameters, it requires a minimum of 1,000 participants and 60 items.

Figure 5.
An Example of Three-Parameter Logistic Model. Adapted from Hambleton, 1991
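The 3PL curve adds the guessing parameter to the 2PL: P_i(θ) = c_i + (1 - c_i) / (1 + exp(-a_i(θ - b_i))). In the minimal Python sketch below, only the guessing value of 0.25 for item 3 comes from the text; the other parameter values are hypothetical, and the scaling constant D is again assumed to be 1. It shows that the curve levels off at c_i rather than at zero, and that at θ = b_i the probability is (1 + c_i) / 2 rather than 0.5.

```python
import math


def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL ICC with pseudo-chance (guessing) parameter c; D assumed to be 1."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# (a, b, c): only c = 0.25 for item 3 comes from the text; the rest is hypothetical.
items = {
    "item 1": (1.0, 0.0, 0.05),
    "item 3": (1.0, 0.0, 0.25),
}

for name, (a, b, c) in items.items():
    low_end = icc_3pl(-6.0, a, b, c)  # very low ability: close to c, not to 0
    at_b = icc_3pl(b, a, b, c)        # equals (1 + c) / 2
    print(f"{name}: P(theta=-6) = {low_end:.3f}, P(theta=b) = {at_b:.3f}")
```

Setting c_i to 0 recovers the 2PL curve, and additionally fixing a_i recovers the 1PL curve.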
Aside from providing information on test items and latent traits, Rasch analysis
can single out problematic test items and subjects. This is done by estimating
goodness of fit, that is, through fit analysis. In fit analysis, the latent trait of each
examinee is estimated, and predictions of the examinee’s performance are then made
from that estimate. If the predictions match the observed responses, goodness of fit is
attained; where the two diverge, misfitting items or persons are identified.
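Two commonly reported Rasch item-fit statistics are the outfit mean-square (the plain average of squared standardized residuals) and the infit mean-square (squared residuals weighted by their model variance). Values near 1 indicate that responses are about as noisy as the model predicts, while values well above 1 indicate more unexpected responses than the model allows. The Python sketch below assumes that ability and difficulty estimates are already available; all values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Outfit and infit mean-squares for one item under the Rasch model.

    responses: 0/1 scores of each person on this item
    thetas:    ability estimates of the same persons (assumed already estimated)
    b:         the item's difficulty estimate
    """
    sq_std_residuals = []
    sq_residuals, variances = 0.0, 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)  # model variance of the 0/1 response
        sq_std_residuals.append((x - p) ** 2 / w)
        sq_residuals += (x - p) ** 2
        variances += w
    outfit = sum(sq_std_residuals) / len(sq_std_residuals)  # unweighted mean-square
    infit = sq_residuals / variances                        # variance-weighted
    return outfit, infit


thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical person abilities
plausible = [0, 1, 0, 1, 1]           # mostly rises with ability, a little noise
reversed_item = [1, 1, 1, 0, 0]       # low-ability persons succeed, high-ability fail

for label, resp in [("plausible", plausible), ("reversed", reversed_item)]:
    outfit, infit = item_fit(resp, thetas, b=0.0)
    print(f"{label}: outfit = {outfit:.2f}, infit = {infit:.2f}")
```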
Misfitting items suggest two possible situations. On the one hand, they might be
poorly written items with low discriminating power that need further revision. On the
other hand, they might not be assessing the single latent trait the whole test claims to
measure. Such information enables test developers to identify, revise, or exclude poor
items and thereby refine a test. Likewise, poor person-fit statistics imply that the test
does not accurately estimate the ability of the misfitting person. This might happen
when a person guesses randomly throughout the test, or when a test proves to be a
poor measure for a large population of examinees (Davies et al., 1999;
McNamara & Candlin, 1996).
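The same residual logic applies to persons. In the Python sketch below, two hypothetical test-takers have the same raw score of 3 on five hypothetical items, so under the Rasch model they receive the same ability estimate; θ = 0.6 is used as an approximation of that estimate. The consistent person succeeds on the easy items and fails the hard ones, while the erratic person shows the reversed pattern that random guessing can produce, and only the latter yields a large outfit value.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_outfit(responses, theta, difficulties):
    """Outfit mean-square for one person across all items (Rasch model)."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))  # squared standardized residual
    return total / len(difficulties)


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical, easiest to hardest
THETA = 0.6  # assumed shared ability estimate for a raw score of 3 on these items

consistent = [1, 1, 1, 0, 0]  # passes the easy items, fails the hard ones
erratic = [0, 0, 1, 1, 1]     # fails the easy items, passes the hard ones

for label, resp in [("consistent", consistent), ("erratic", erratic)]:
    print(f"{label}: outfit = {person_outfit(resp, THETA, difficulties):.2f}")
```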