In this chapter, the proposed ability estimation is evaluated through a simulation study. To investigate the characteristics of the proposed method, we first analyze the convergence speed and the error distance between the ground truth and the estimated ability. Next, we present an example that shows the benefits of taking historical data into consideration. Finally, the proposed ability estimation is compared with related work. The details of the experimental designs are described in the following subsections.
6.1 Setting
To understand the performance of the proposed method, we conducted a simulation. According to the one-parameter logistic model in Item Response Theory (Embretson & Reise, 2000), the probability of a correct response is 0.5 when the item difficulty equals the examinee’s ability. In the simulation, we referred to this probability for the estimation of the variable s. We used the one-parameter logistic model to predict the probability of a correct response given the ability and an item, and then conditionally sampled the variable s at random from s ~ N(given ability, 0.2).
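The response model described above can be sketched in Python as follows. This is a minimal sketch of the standard one-parameter logistic model and of drawing a simulated 0/1 response from it. The function names are hypothetical. The sketch does not reproduce how the variable s is conditioned on the response, because this section does not specify it.

```python
import math
import random
from typing import List


def prob_correct(ability: float, difficulty: float) -> float:
    """One-parameter logistic (Rasch) model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate_responses(ability: float, difficulties: List[float],
                       rng: random.Random) -> List[int]:
    """Draw one simulated 0/1 response per item from the 1PL probability."""
    return [1 if rng.random() < prob_correct(ability, b) else 0 for b in difficulties]


rng = random.Random(42)  # fixed seed so the simulated responses are reproducible
print(prob_correct(3.0, 3.0))  # 0.5: difficulty equals ability
print(simulate_responses(3.0, [2, 2, 3, 3, 3, 3, 3, 3, 4, 4], rng))
```

The probability depends only on the difference between ability and difficulty, so items one grade above and one grade below the ability have complementary probabilities of a correct response.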
In each simulation, ten items were generated according to the examinee’s ability at the time. The difficulties of these items approximately follow a normal distribution centered on that ability. For example, given an examinee’s ability of 3, the difficulties of a test are {2, 2, 3, 3, 3, 3, 3, 3, 4, 4}. Ability and difficulty in this study range from one to six, corresponding to the school grades. In practice, an examinee’s school grade is taken as their initial ability, and the ability is updated by the responses in each test. Thus, each simulation starts from a grade between one and six, in order to simulate students in different grades with various abilities. It then updates the estimated ability and terminates 100 iterations after the convergence point. We located the convergence point and then computed the Root Mean Square Error (RMSE) over those 100 iterations. The convergence point is defined as the iteration at which the difference between the estimated ability and the ground truth has been smaller than a threshold (thd = 0.25 in the simulation) for four consecutive iterations. Each simulation was repeated 1000 times. RMSE measures the average distance between the ground truth and the estimated results; a smaller RMSE indicates that the estimated ability is closer to the ground truth. In addition, we discuss the weighting parameter in equation (16), which is expressed in terms of n time periods and represents the weight of the observation at the present time. The variable n was set from one to twelve.
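The bookkeeping in this setting can be sketched as follows. The sketch covers test generation, detection of the convergence point, and the RMSE. The item generator is a hypothetical rule chosen only to reproduce the example {2, 2, 3, 3, 3, 3, 3, 3, 4, 4}: two items one grade below the ability, six at it, and two one grade above, clipped to grades 1 to 6. The convergence point is returned as the first iteration of the four-iteration run below thd. This is an assumption, since the text could also be read as the last iteration of that run.

```python
import math
from typing import List, Optional, Sequence

MIN_GRADE, MAX_GRADE = 1, 6  # abilities and difficulties range over grades 1..6


def make_test(ability: float) -> List[int]:
    """Hypothetical generator matching the example in the text: two items one
    grade below, six at, and two one grade above the rounded ability,
    clipped to the grade range."""
    a = round(ability)
    raw = [a - 1] * 2 + [a] * 6 + [a + 1] * 2
    return [min(MAX_GRADE, max(MIN_GRADE, b)) for b in raw]


def convergence_point(estimates: Sequence[float], truth: float,
                      thd: float = 0.25, run: int = 4) -> Optional[int]:
    """1-based iteration where |estimate - truth| < thd first holds for `run`
    consecutive iterations (the first iteration of that run); None if never."""
    streak = 0
    for i, est in enumerate(estimates, start=1):
        if abs(est - truth) < thd:
            streak += 1
            if streak == run:
                return i - run + 1
        else:
            streak = 0
    return None


def rmse(estimates: Sequence[float], truth: float) -> float:
    """Root mean square error of the estimates against a fixed ground truth."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


estimates = [5.0, 5.9, 6.1, 5.8, 6.0, 6.2]
cp = convergence_point(estimates, truth=6.0)
print(make_test(3))  # [2, 2, 3, 3, 3, 3, 3, 3, 4, 4]
print(cp)            # 2
# In the study, the RMSE would cover the 100 iterations after convergence.
print(rmse(estimates[cp - 1:], truth=6.0))
```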
6.2 The characteristics of the proposed ability estimation
Table 7 shows the average convergence points for each value of n (the parameter in equation (16)) and for each degree of difference between the initial estimated ability and the ground truth. It also shows the RMSE during the 100 iterations after the convergence points. The results show that the proposed method can estimate abilities within a finite number of iterations. Specifically, an examinee’s ability can be estimated more precisely as he or she takes more tests. Furthermore, after convergence, the error distances between the estimated abilities and the ground truths are low enough to be acceptable. That is, an examinee’s ability can be measured steadily during long-term observation.
The parameter 2/(n+1) in equation (16) is the exponential weight of the current ability. Here n represents the number of time periods taken into consideration, such as tests or days. When n=1, the weight is 1, so the examinee’s ability reflects only the current estimated ability, without the historical record. In Table 7, the shaded values are the average convergence points that are fewer than those obtained with n=1. This result shows that, when the historical record is considered, the estimated abilities are found more quickly and the error distances decrease. This is particularly apparent when the initial grade is equal to the ground truth. When n is small (e.g., n=2 with a weight of 2/3, or n=3 with a weight of 1/2), the estimated ability is mainly determined by the current ability. In this case the convergence points are the smallest, and the RMSE is slightly smaller than that obtained with n=1. In contrast, as n increases, the estimated ability is composed mainly of the abilities accumulated from the past up to now. If an examinee’s initial ability is not close to his or her actual ability, more information is needed to estimate it accurately. Although convergence takes longer, the RMSE clearly decreases as n increases.
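Equation (16) is not reproduced in this section. The sketch below assumes it has the standard exponential-moving-average form, in which the new estimate blends the current ability with the previous estimate using the weight 2/(n+1). As in the simulation setting, the starting value is the school grade. The function names are hypothetical.

```python
from typing import Iterable, List


def ema_weight(n: int) -> float:
    """Exponential weight of the current ability: 2 / (n + 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2.0 / (n + 1)


def smooth_abilities(initial: float, current_estimates: Iterable[float],
                     n: int) -> List[float]:
    """Assumed update: est_t = w * current_t + (1 - w) * est_{t-1},
    with w = 2 / (n + 1) and est_0 = initial (e.g. the school grade)."""
    w = ema_weight(n)
    est = initial
    history = []
    for current in current_estimates:
        est = w * current + (1.0 - w) * est
        history.append(est)
    return history


print({n: ema_weight(n) for n in (1, 2, 3)})  # {1: 1.0, 2: 0.6666666666666666, 3: 0.5}
print(smooth_abilities(1.0, [6.0, 6.0, 6.0], n=3))  # [3.5, 4.75, 5.375]
```

With n=1 the weight is 1, so each estimate simply equals the current ability. With larger n, each step moves the estimate only part of the way toward the current ability.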
Table 7 Average convergence points and RMSE (each row d gives the difference, in grades, between the initial ability and the actual ability; each column n gives the number of time periods considered by the exponential weight of the current ability; the last row gives the RMSE during the 100 iterations after convergence)
d \ n  1       2       3       4       5       6       7       8       9       10      11      12
0      20.61   13.88*  11.72*  11.53*  10.98*  10.90*  10.26*  10.52*  10.16*  10.35*  10.18*  10.04*
1      21.96   16.17*  15.74*  16.31*  17.40*  19.07*  20.43*  22.29   23.98   25.45   26.92   28.42
2      22.91   18.08*  18.54*  19.91*  21.90*  24.18   26.64   29.06   31.50   33.53   35.62   38.58
3      23.86   19.67*  19.91*  21.91*  24.59   27.62   30.33   32.90   35.74   38.43   41.52   44.13
4      24.30   20.73*  21.52*  23.51*  26.71   29.68   32.96   36.00   40.19   42.83   45.45   48.65
5      24.50   21.41*  22.66*  25.22   29.10   31.92   35.97   38.22   42.62   46.40   49.18   53.12
RMSE   0.39    0.32    0.28    0.26    0.24    0.23    0.22    0.21    0.20    0.19    0.19    0.18
* shaded: fewer convergence points than with n=1
Consider a dramatic example that illustrates the properties of the proposed method. Assume that a first-grade student, whose real ability is at the sixth-grade level, learns in a web-based learning system and takes a test there once a day.
Figure 6 illustrates how the estimated ability computed by the proposed method changes under different weights. The black horizontal line at the sixth grade represents the student’s actual ability as the ground truth. The other curves depict the estimated abilities under different weights: a red dotted line, n=1; a green solid line, n=3; a purple solid line, n=6; and a blue solid line, n=12. The labeled marks on each line are the convergence points, where the difference stays smaller than thd = 0.25 for four consecutive iterations. The estimated abilities clearly converge faster as n decreases. However, although the estimate converges within few iterations when n=1, the red dotted line fluctuates drastically after the convergence point. In other words, if the ability estimation considers only the current responses instead of past performance, the estimated abilities may have a large variance. In this situation, selecting test questions on the basis of inaccurate ability estimates could confuse the examinee. In contrast, when n>1 the estimation error decreases gradually, even though the estimated abilities take longer to converge than when n=1. In this situation, the student’s estimated ability is updated gradually, and the difficulties of the items increase incrementally. There is thus a trade-off between speed and precision.
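The trade-off between speed and precision can be made concrete under an idealized assumption that is not stated in the text. Suppose each per-test ability estimate were unbiased, with independent noise of equal variance around a fixed true ability. Two standard properties of an exponential moving average with weight w = 2/(n+1) then follow. First, an initial gap shrinks by the factor (n-1)/(n+1) per test. Second, the steady-state variance of the estimate is w/(2-w) = 1/n times the single-test variance. The helper names below are hypothetical. The step counts are noise-free idealizations for the example above (an initial gap of five grades), not the values in Table 7 or Figure 6.

```python
import math


def steps_to_close_gap(gap: float, n: int, thd: float = 0.25) -> int:
    """Smallest t with (1 - w)**t * gap < thd, where w = 2 / (n + 1):
    the noise-free number of tests for the initial gap to fall below thd."""
    if gap < thd:
        return 0
    decay = (n - 1) / (n + 1)  # equals 1 - w
    if decay == 0.0:  # n = 1: the estimate jumps to the current value at once
        return 1
    return math.floor(math.log(thd / gap) / math.log(decay)) + 1


def steady_state_variance_factor(n: int) -> float:
    """Var(smoothed estimate) / Var(per-test noise) in steady state: w / (2 - w)."""
    w = 2.0 / (n + 1)
    return w / (2.0 - w)


for n in (1, 3, 6, 12):
    # Larger n: more tests to close the gap, but a less noisy estimate afterwards.
    print(n, steps_to_close_gap(5.0, n), round(steady_state_variance_factor(n), 3))
```

Under these assumptions, increasing n from 1 to 12 cuts the steady-state variance to one twelfth, at the cost of more tests before the initial gap closes. This matches the qualitative behavior described for Figure 6.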
Figure 6 The changes in the estimated ability computed from the proposed method for
the different weights (n=1, n=3, n=6, n=12)
6.3 The comparison with other ability estimations
To understand the performance of the proposed ability estimation, we compare our results (with n=1 in this section) to those of MLE (Embretson & Reise, 2000) and Lee (2012). MLE is one of the typical ability estimation methods in Item Response Theory. It forms the likelihood of a student’s responses by multiplying, over all items, the probability of each observed response given by the item response function. The ability that maximizes this likelihood, found with the Newton-Raphson method, is the maximum likelihood estimate of the student’s ability. Lee (2012) extended BME in Item Response Theory and proposed a conventional approach to approximate the posterior distribution of the student’s ability obtained from the subsequent responses.
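For reference, the MLE baseline described above can be sketched for the one-parameter logistic model used in this simulation. This minimal sketch has two safeguards, both our own choices rather than part of the compared method as described: it starts from the mean item difficulty, and it caps each Newton-Raphson step at one grade. The sketch also reflects a well-known property of MLE: no finite estimate exists when all responses are correct or all are incorrect. Lee’s (2012) Bayesian approach is not sketched, because its details are not given in this section.

```python
import math
from typing import Optional, Sequence


def mle_ability_1pl(difficulties: Sequence[float], responses: Sequence[int],
                    theta0: Optional[float] = None, max_step: float = 1.0,
                    tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum likelihood ability under the 1PL model, found by Newton-Raphson.

    log L(theta) = sum_i [u_i log P_i + (1 - u_i) log(1 - P_i)],
    P_i = 1 / (1 + exp(-(theta - b_i))).
    First derivative: sum_i (u_i - P_i); second derivative: -sum_i P_i (1 - P_i).
    """
    if len(difficulties) != len(responses) or not responses:
        raise ValueError("need one response per item and at least one item")
    if all(u == 1 for u in responses) or all(u == 0 for u in responses):
        # The likelihood keeps increasing toward +/- infinity: no finite MLE.
        raise ValueError("MLE does not exist for all-correct or all-incorrect responses")
    theta = sum(difficulties) / len(difficulties) if theta0 is None else theta0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        score = sum(u - pi for u, pi in zip(responses, p))
        info = sum(pi * (1.0 - pi) for pi in p)  # minus the second derivative
        step = max(-max_step, min(max_step, score / info))  # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta


print(mle_ability_1pl([2, 4], [1, 0]))  # close to 3.0 by symmetry
```

Capping the step matters because an undamped Newton iteration on the logistic likelihood can overshoot and diverge when it starts far from the solution.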
Table 8 shows the RMSE of the proposed estimation and of the other estimations. Each row represents the level of simulated student ability, and each column represents the given difficulty of a test. When the difficulty levels of the items were equal to the abilities of the simulated students (the diagonals of the matrices), MLE and Lee (2012) gave similar results, but the estimates of the proposed method were closer to the ground truth. As the difference between student abilities and item difficulties increased, the proposed estimation produced clearly more accurate abilities than the other estimations. When questions were more difficult (the upper right of the matrices) or easier (the lower left of the matrices) than the students’ abilities, none of these methods estimated the student abilities correctly, because the responses were highly uncertain. However, the errors of the proposed method were mostly within two grades, whereas the errors of MLE and Lee’s method reached four to five grades. This demonstrates that the proposed method is robust, especially when a student’s ability is unknown. Moreover, the proposed method used in this section did not incorporate historical data during the estimation. This suggests that the estimated abilities would be even more accurate if both the current responses and past performance were used in the ability estimation, as shown in the previous section.
Table 8 The results of RMSE between MLE, Lee (2012), and the proposed ability estimation (rows s: simulated student ability; columns t: test difficulty)
MLE
s \ t  1     2     3     4     5     6
1      0.22  1.00  2.01  2.99  4.04  5.13
2      1.00  0.23  1.00  2.02  3.03  4.04
3      2.00  0.99  0.22  1.01  2.01  3.03
4      2.96  1.99  1.00  0.23  1.03  2.01
5      3.98  3.01  1.98  1.00  0.24  1.01
6      4.91  3.93  2.98  2.00  1.00  0.23
Lee (2012)
s \ t  1     2     3     4     5     6
1      0.21  1.01  2.04  3.05  4.13  5.17
2      1.01  0.22  1.01  2.05  3.11  4.15
3      2.03  1.00  0.21  1.02  2.04  3.11
4      3.05  2.02  1.01  0.22  1.04  2.05
5      4.09  3.07  2.01  1.01  0.23  1.02
6      4.74  3.78  2.84  1.87  0.89  0.11
The proposed method
s \ t  1     2     3     4     5     6
1      0.13  0.52  1.04  1.51  1.95  2.18
2      0.51  0.13  0.52  1.04  1.54  1.95
3      1.03  0.52  0.13  0.53  1.03  1.53
4      1.50  1.02  0.51  0.13  0.53  1.03
5      1.93  1.53  1.01  0.52  0.13  0.52
6      2.16  1.92  1.51  1.03  0.52  0.13