
Simulation on ability estimation

In this chapter, the proposed ability estimation is evaluated through a simulation study. To investigate the characteristics of the proposed method, we first analyze the convergence speed and the error distance between the ground truth and the estimated ability. Next, we present an example that shows the benefit of taking historical data into consideration. Finally, we compare the proposed ability estimation with related work. The experimental designs are described in the following subsections.

6.1 Setting

To understand the performance of the proposed method, we conducted a simulation. According to the one-parameter logistic model in Item Response Theory (Embretson & Reise, 2000), the probability of a correct response is 0.5 when an item’s difficulty equals an examinee’s ability. In the simulation, we referred to this probability for the estimation of the variable s. We used the one-parameter logistic model to predict the probability of a correct response given the ability and an item, and then conditionally sampled the variable s at random, s ~ N(given ability, 0.2).
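The response model described above can be sketched as follows. This is a minimal illustration that assumes the standard one-parameter logistic form, P = 1 / (1 + exp(-(ability - difficulty))). The function names are hypothetical, and treating 0.2 as a standard deviation rather than a variance is an assumption, since the text does not say which is meant. How s enters the estimator is defined elsewhere and is not modeled here.

```python
import math
import random


def p_correct(ability, difficulty):
    """1PL probability of a correct response (standard logistic form)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def sample_s(ability, spread=0.2, rng=random):
    """Draw s ~ N(ability, spread).

    Assumption: `spread` is used as the standard deviation; the source
    only writes N(given ability, 0.2).
    """
    return rng.gauss(ability, spread)


def simulate_response(ability, difficulty, rng=random):
    """Return 1 (correct) with probability p_correct, otherwise 0."""
    return 1 if rng.random() < p_correct(ability, difficulty) else 0


if __name__ == "__main__":
    # When difficulty equals ability, the probability is exactly 0.5.
    print(p_correct(3.0, 3.0))
```

Passing a seeded `random.Random` instance as `rng` makes a simulated run reproducible.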

In each simulation, ten items were generated according to the examinee’s ability at the time. The difficulties of these items approximately follow a normal distribution. For example, given an examinee’s ability of 3, the difficulties of a test are {2, 2, 3, 3, 3, 3, 3, 3, 4, 4}. Ability and difficulty in this study range from one to six, corresponding to school grades. In practice, an examinee’s school grade is taken as the initial ability, and the ability is updated by the responses in each test. Thus, each simulation starts from a grade between one and six, in order to simulate students of different grades with various abilities, updates the estimated ability after each test, and terminates 100 iterations after the convergence point. The convergence point is defined as the point at which the difference between the estimated ability and the ground truth has been smaller than a threshold (thd = 0.25 in the simulation) four consecutive times. We located the convergence point and then computed the Root Mean Square Error (RMSE) over the following 100 iterations. Each simulation was run 1000 times. RMSE represents the average distance between the ground truth and the generated results; a smaller RMSE indicates that the estimated ability is closer to the ground truth. In addition, we discuss the weight parameter in equation (16). This parameter is expressed in terms of n time periods and represents the weight of the observation at the present time. The variable n was set from one to twelve.
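The convergence criterion and the RMSE measure can be sketched as below. The function names are hypothetical. Reporting the convergence point as the last iteration of the run of four (rather than the first) is an assumption, because the text does not specify which iteration is counted.

```python
import math


def convergence_point(estimates, truth, thd=0.25, run=4):
    """Return the 1-based iteration at which |estimate - truth| has been
    below `thd` for `run` consecutive iterations, or None if that never
    happens. Assumption: the last iteration of the run is reported."""
    streak = 0
    for i, est in enumerate(estimates, start=1):
        if abs(est - truth) < thd:
            streak += 1
            if streak == run:
                return i
        else:
            streak = 0  # the run must be uninterrupted
    return None


def rmse(estimates, truth):
    """Root mean square error of the estimates against one true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


if __name__ == "__main__":
    trace = [1.0, 2.0, 5.9, 6.1, 5.5, 5.9, 6.0, 6.1, 5.8]
    print(convergence_point(trace, truth=6.0))
    print(rmse([5.0, 7.0], truth=6.0))
```

In the simulation described above, `rmse` would be applied to the 100 estimates that follow the convergence point.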

6.2 The characteristics of the proposed ability estimation

Table 7 shows the average convergence points for each value of n in the weight parameter of equation (16) and for each degree of difference between the initial ability and the ground truth, together with the RMSE during the 100 iterations after the convergence points. The proposed method estimates abilities within a finite number of iterations. Specifically, an examinee’s ability can be estimated more precisely as he or she takes more tests. Furthermore, the error distances between the estimated abilities and the ground truths are low after convergence (RMSE between 0.18 and 0.39). That is, an examinee’s ability can be measured stably over long-term observation.

The weight parameter in equation (16), which equals 2/(n+1), is an exponential weight on the current ability, and n represents the number of time periods, such as occasions or days, taken into consideration. When n=1, the examinee’s ability is determined by the current estimated ability alone, without the history record. In Table 7, the shaded values are the average convergence points that are smaller than the value obtained with n=1 in the same row. This result shows that, when the history record is considered, the estimated abilities are found more quickly and the error distances decrease. The effect is most apparent when the initial grade equals the ground truth. When n is small (e.g., n=2, weight 2/3; n=3, weight 1/2), the estimated ability is mainly determined by the current ability. The convergence points are smallest, and the RMSE is slightly smaller than that obtained with n=1. In contrast, as n increases, the estimated ability is principally composed of abilities from the past up to the present. If an examinee’s initial ability is not close to his or her actual ability, more information is needed to estimate it accurately. Although this takes time, the RMSE clearly shrinks.
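The role of the weight can be illustrated with a short sketch. Equation (16) is not reproduced in this chapter, so the update below assumes the usual exponentially weighted form, new estimate = w * current + (1 - w) * previous, with w = 2/(n+1). The function names are hypothetical.

```python
def weight(n):
    """Exponential weight of the current ability for n time periods."""
    return 2.0 / (n + 1)


def update_ability(previous, current, n):
    """Blend the current estimate with the history.

    Assumption: equation (16) has the standard exponentially weighted
    moving average form; the chapter does not restate it.
    """
    w = weight(n)
    return w * current + (1.0 - w) * previous


if __name__ == "__main__":
    # n=1 gives weight 1, so the history is ignored entirely.
    print(update_ability(previous=1.0, current=6.0, n=1))
    # n=3 gives weight 1/2, so the update moves halfway.
    print(update_ability(previous=1.0, current=6.0, n=3))
```

Larger n lowers the weight, so each new test moves the estimate less and the history counts for more.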

Table 7 The results of convergence point and RMSE (each row represents the degree of difference d between the initial ability and the actual ability, and each column represents the number of time periods n considered by the exponential weight of the current ability)

d \ n       1       2       3       4       5       6       7       8       9      10      11      12
0       20.61   13.88*  11.72*  11.53*  10.98*  10.90*  10.26*  10.52*  10.16*  10.35*  10.18*  10.04*
1       21.96   16.17*  15.74*  16.31*  17.40*  19.07*  20.43*  22.29   23.98   25.45   26.92   28.42
2       22.91   18.08*  18.54*  19.91*  21.90*  24.18   26.64   29.06   31.50   33.53   35.62   38.58
3       23.86   19.67*  19.91*  21.91*  24.59   27.62   30.33   32.90   35.74   38.43   41.52   44.13
4       24.30   20.73*  21.52*  23.51*  26.71   29.68   32.96   36.00   40.19   42.83   45.45   48.65
5       24.50   21.41*  22.66*  25.22   29.10   31.92   35.97   38.22   42.62   46.40   49.18   53.12
RMSE     0.39    0.32    0.28    0.26    0.24    0.23    0.22    0.21    0.20    0.19    0.19    0.18

* average convergence point smaller than the n=1 value in the same row

Consider a dramatic example to explain the properties of the proposed method. Assume that a first-grade student, whose real ability is at the sixth-grade level, studies and takes a test in a web-based learning system once a day.

Figure 6 illustrates the changes in the estimated ability computed by the proposed method under different weights. The black horizontal line at the sixth grade represents the student’s actual ability, that is, the ground truth. The other curves depict the estimated abilities under the different weights: a red dotted line, n=1; a green solid line, n=3; a purple solid line, n=6; and a blue solid line, n=12. The marks on each line are the convergence points (where the error has been smaller than thd = 0.25 four consecutive times). The estimated abilities converge in fewer iterations as n decreases. Although the ability is estimated within a few iterations when n=1, the red dotted line fluctuates drastically after the convergence point. In other words, if the ability estimation takes only the current responses into consideration and ignores past performance, the variance of the estimated ability may be large. In this situation, selecting the questions of a test from an inaccurate ability estimate could confuse the examinee. In contrast, when n>1 the estimation error decreases gradually, even though the estimate takes more time to reach than with n=1. In this situation, the student’s ability is updated gradually and the difficulties of the items increase incrementally. This is thus a trade-off between speed and precision.
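The trade-off can be made concrete with an idealized sketch. It assumes an exponentially weighted update with w = 2/(n+1) and observations equal to the true ability plus independent noise. In the study itself the observations come from tests whose difficulty follows the current estimate, so these numbers are not expected to reproduce Table 7 or Figure 6. Under that assumption, a gap shrinks by a factor (1 - w) per update, and the steady-state noise variance is scaled by w / (2 - w), which equals 1/n. The function names are hypothetical.

```python
def steps_to_close_gap(n, gap=5.0, thd=0.25):
    """Number of noiseless updates until the remaining gap is below thd.

    Idealization: every observation equals the true ability, so the gap
    is multiplied by (1 - w) at each update.
    """
    w = 2.0 / (n + 1)
    steps = 0
    while gap >= thd:
        gap *= 1.0 - w
        steps += 1
    return steps


def steady_state_noise_ratio(n):
    """Variance of the smoothed estimate relative to the variance of a
    single observation, for independent noise: w / (2 - w) = 1 / n."""
    w = 2.0 / (n + 1)
    return w / (2.0 - w)


if __name__ == "__main__":
    for n in (1, 3, 6, 12):
        print(n, steps_to_close_gap(n), steady_state_noise_ratio(n))
```

In this idealization, n=1 closes a five-grade gap in one update but keeps all of the observation noise, while n=3 needs five updates and keeps one third of the noise variance.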

Figure 6 The changes in the estimated ability computed from the proposed method for the different weights (n=1, n=3, n=6, n=12)

6.3 The comparison with other ability estimations

To understand the performance of the proposed ability estimation, we compare our results (with n=1 in this section) to those of MLE (Embretson & Reise, 2000) and Lee (2012). MLE is one of the typical ability estimations in Item Response Theory: the likelihood is obtained by multiplying the item response functions of the items, and the ability that maximizes it, found with the Newton-Raphson method, is the maximum likelihood estimate of the student’s ability. Lee (2012) extended BME in Item Response Theory and proposed a conventional approach to approximate the posterior distribution of the student’s ability obtained from the subsequent responses.
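The MLE baseline can be sketched for the one-parameter logistic model as follows. This is a minimal illustration, not the implementation used in the comparison. For this model the derivative of the log-likelihood is the sum of (response - probability) and the second derivative is minus the sum of probability * (1 - probability), which gives the Newton-Raphson step used below. When all responses are correct or all are incorrect, no finite maximum exists; returning the boundary of the grade range [1, 6] in that case is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def p_correct(theta, b):
    """1PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_ability(difficulties, responses, lo=1.0, hi=6.0, tol=1e-8, max_iter=50):
    """Newton-Raphson maximum likelihood estimate of ability (1PL)."""
    total = sum(responses)
    # Assumption: with no finite maximum, fall back to the range limits.
    if total == 0:
        return lo
    if total == len(responses):
        return hi
    theta = sum(difficulties) / len(difficulties)  # starting value
    for _ in range(max_iter):
        probs = [p_correct(theta, b) for b in difficulties]
        grad = sum(u - p for u, p in zip(responses, probs))  # d logL / d theta
        info = sum(p * (1.0 - p) for p in probs)             # -d2 logL / d theta2
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return min(hi, max(lo, theta))  # keep the estimate within the grade range


if __name__ == "__main__":
    items = [2, 2, 3, 3, 3, 3, 3, 3, 4, 4]
    print(mle_ability(items, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))
```

The boundary fallback shows why a likelihood-based estimate can land several grades away from the truth when a test is far too easy or far too hard for the student.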

Table 8 shows the RMSE of the proposed estimation and of the other estimations. Each row represents the simulated student ability, and each column represents the given difficulty of a test. When the difficulty levels of the items were equal to the abilities of the simulated students (the diagonals of the matrices), the results estimated by MLE and Lee (2012) were similar (RMSE mostly between 0.21 and 0.24), but those estimated by the proposed method were closer to the ground truth (RMSE of 0.13). As the difference between the student abilities and the item difficulties increased, the proposed estimation produced clearly more accurate estimates than the other estimations. When the questions were more difficult (the upper right of the matrices) or easier (the lower left of the matrices) than the students’ abilities, all of these methods failed to estimate the correct student abilities because the uncertainty in the responses was unpredictable. However, the errors of the proposed method were mostly within two grades, whereas the errors of MLE and Lee’s method reached four to five grades. This demonstrates that the proposed method is robust, especially when a student’s ability is unknown. Moreover, note that the proposed method as used in this section did not incorporate historical data during the estimation. This suggests that the estimated abilities would be more accurate if both the current responses and past performance were used in the ability estimation, as the previous section showed.

Table 8 The results of RMSE between MLE, Lee (2012) and the proposed ability estimation (rows: simulated student ability s; columns: test difficulty t)

MLE
s \ t      1     2     3     4     5     6
1       0.22  1.00  2.01  2.99  4.04  5.13
2       1.00  0.23  1.00  2.02  3.03  4.04
3       2.00  0.99  0.22  1.01  2.01  3.03
4       2.96  1.99  1.00  0.23  1.03  2.01
5       3.98  3.01  1.98  1.00  0.24  1.01
6       4.91  3.93  2.98  2.00  1.00  0.23

Lee (2012)
s \ t      1     2     3     4     5     6
1       0.21  1.01  2.04  3.05  4.13  5.17
2       1.01  0.22  1.01  2.05  3.11  4.15
3       2.03  1.00  0.21  1.02  2.04  3.11
4       3.05  2.02  1.01  0.22  1.04  2.05
5       4.09  3.07  2.01  1.01  0.23  1.02
6       4.74  3.78  2.84  1.87  0.89  0.11

The proposed method
s \ t      1     2     3     4     5     6
1       0.13  0.52  1.04  1.51  1.95  2.18
2       0.51  0.13  0.52  1.04  1.54  1.95
3       1.03  0.52  0.13  0.53  1.03  1.53
4       1.50  1.02  0.51  0.13  0.53  1.03
5       1.93  1.53  1.01  0.52  0.13  0.52
6       2.16  1.92  1.51  1.03  0.52  0.13
