

On the Development of a Computer-Assisted Testing System With Genetic Test Sheet-Generating Approach

Gwo-Jen Hwang, Bertrand M. T. Lin, Hsien-Hao Tseng, and Tsung-Liang Lin

Abstract—Over the last decade, computer-assisted testing has proven to be an efficient and effective way of evaluating students’ learning status so that proper tutoring strategies can be adopted to improve their learning performance. A good test not only helps the instructor evaluate the learning status of the students, but also facilitates the diagnosis of the problems embedded in the students’ learning process. One of the most important and challenging issues in conducting a good test is the construction of test sheets that can meet various assessment requirements. A previous study has indicated that selecting test items to best fit multiple assessment requirements can be formulated as a mixed integer programming model. The problem is known to be NP-hard in the literature and, hence, computational challenges hinder the development of efficient solution methods. We therefore seek quality approximate solutions in a reasonable time. Two approximation methods based upon a genetic approach are developed. Statistics from a series of computational experiments indicate that our approach is able to efficiently generate near-optimal combinations of test items that satisfy the specified requirements or constraints.

Index Terms—Computer-assisted testing, genetic algorithm (GA), mixed integer programming, test sheet generating.

I. BACKGROUND AND MOTIVATIONS

In recent years, educators have reported the importance of conducting an interactive and personalized tutoring process, which helps train creativity and improve learning performance in children. The need for interactive and personalized tutoring environments has encouraged the development of computer-assisted-instruction (CAI) systems, which are able to record the learning status of each student and provide adaptive subject materials and practice drills. Therefore, it is very important to precisely determine the learning status of each student so that proper tutoring strategies can be applied accordingly [10], [17]. A high-quality test is the major criterion for determining the learning status of students.

Computer-based tests have been proven to be more effective and efficient than traditional paper-and-pencil tests for several reasons: first, the test sheets can be composed dynamically based on practical requirements; second, a richer variety of test items can be presented in multimedia styles; third, the student testing portfolio can be recorded and analyzed to improve learning performance [5], [15], [19].

The key to a high-quality test depends not only on the quality of the test items, but also on the way the test sheet is constructed [10], [14]. As the number of test items in an item bank is usually large and the number of feasible combinations to form test sheets thus grows exponentially, it is very difficult to find an optimal test sheet in a timely manner [3], [6], [11], [12]. Such an issue is likely to grow in importance owing to the rapid advent of Internet technologies and the fast growth of network-based educational systems and the online learning population. Along with the growth of distance learning through the Internet, computer-based assessment systems are also in increasing demand.

Manuscript received June 15, 2003; revised October 28, 2004. This work was supported by the National Science Council of the Republic of China under Contracts NSC-91-2520-S-260-003 and NSC92-2524-S-260-002. This paper was recommended by Guest Editor S. H. Rubin.

G.-J. Hwang is with the Department of Information and Learning Technology, National University of Tainan, Tainan, Taiwan 700, R.O.C. (e-mail: gjhwang@mail.nutn.edu.tw).

B. M. T. Lin is with the Department of Information and Finance Management, Institute of Information Management, National Chiao Tung University, Hsinchu, Taiwan 300, R.O.C.

H.-H. Tseng and T.-L. Lin are with the Department of Information Management, National Chi Nan University, Pu-Li, Taiwan 545, R.O.C.

Digital Object Identifier 10.1109/TSMCC.2004.843184

1094-6977/$20.00 © 2005 IEEE

Although many computer-assisted testing systems have been proposed, few of them have addressed the problem of systematically composing test sheets for multiple assessment requirements [2], [17]. Most of the existing systems construct a test sheet by manually or randomly selecting test items from their item banks. Such manual or random selection strategies are inefficient and usually cannot meet multiple assessment requirements simultaneously. Some previous investigations attempted to employ a dynamic programming algorithm to find an optimal composition of the test items [11]. As the time complexity of the dynamic programming algorithm is exponential in the size of the input data, the required execution time becomes unacceptably long if the number of candidate test items is large.

To cope with the increasingly hard situations encountered in developing optimal test sheets, we shall present two mixed integer programming models to formulate the problems of finding a set of test items that fit multiple assessment requirements. As the problems are strongly NP-hard, we propose two genetic algorithms [4], [7], [13], [16]–[18] to find quality approximate solutions in acceptable time. Computational experiments will also be presented to study the performance of the proposed algorithms.

II. MIXED INTEGER PROGRAMMING MODELS

In an item bank, a subset of $n$ candidate test items $Q_1, Q_2, \ldots, Q_n$ will be selected for composing a test sheet. In the following subsections, we shall present two models that formulate the test sheet-generating problem under different assessment considerations. The first model, proposed in [12], aims at optimizing the discrimination degree of the generated test sheets within a specified range of assessment time and under several other constraints. The second model, proposed in this paper, formulates the optimization of the discrimination degree of the generated test sheets with a fixed number of test items as the major constraint.

A. Specified Length of Assessment Time (SLAT) Problem

In the SLAT problem, the major consideration is to confine the length of time required by the students to answer the selected items. Assume there are $n$ items in the item bank and $m$ concepts are involved in the test. The variables used in the formulated models are defined as follows:

• Decision variables $x_i$, $1 \le i \le n$: $x_i$ is 1 if test item $i$ is selected; 0, otherwise.

• Coefficient $t_i$, $1 \le i \le n$: Expected time needed for answering item $Q_i$.

• Coefficient $d_i$, $1 \le i \le n$: Degree of discrimination of $Q_i$.

• Coefficient $r_{ij}$, $1 \le i \le n$, $1 \le j \le m$: Degree of association between $Q_i$ and concept $C_j$.

• Right-hand side $h_j$, $1 \le j \le m$: Lower bound on the expected relevance of $C_j$.

• Right-hand side $l$: Lower bound on the expected time needed for answering the selected items.

• Right-hand side $u$: Upper bound on the expected time needed for answering the selected items.

Objective function

$$\text{Maximize } Z = \frac{\sum_{i=1}^{n} d_i x_i}{\sum_{i=1}^{n} x_i}$$

Subject to

$$\sum_{i=1}^{n} r_{ij} x_i \ge h_j, \quad j = 1, 2, \ldots, m \tag{1}$$

$$\sum_{i=1}^{n} t_i x_i \ge l \tag{2}$$

$$\sum_{i=1}^{n} t_i x_i \le u \tag{3}$$

$$x_i = 0 \text{ or } 1, \quad i = 1, 2, \ldots, n.$$

In the above formula, binary variable $x_i$ reflects the decision about whether test item $i$ is included or not. Constraint set (1) indicates that the selected items must have a total relevance no less than the expected relevance to each concept to be addressed. Constraints (2) and (3), respectively, specify the lower and upper limits on the time needed to answer the selected items. In the objective function, $\sum_{i=1}^{n} d_i x_i$ is the total discrimination summed over the selected test items and $\sum_{i=1}^{n} x_i$ is the number of test items selected. Therefore, this model aims to select a subset of test items such that the average discrimination is maximized.
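As a minimal sketch of how the SLAT model scores one candidate selection, the Python fragment below checks constraints (1) to (3) and computes the average discrimination $Z$. The function names and the four-item bank are hypothetical and serve only as an illustration; they are not taken from the authors' Java programs.

```python
def slat_feasible(x, t, r, h, l, u):
    """Check constraints (1)-(3) for a 0/1 selection vector x."""
    n, m = len(x), len(h)
    # (1): total relevance to each concept must reach its lower bound h[j]
    for j in range(m):
        if sum(r[i][j] * x[i] for i in range(n)) < h[j]:
            return False
    # (2) and (3): expected answering time must lie in [l, u]
    total_time = sum(t[i] * x[i] for i in range(n))
    return l <= total_time <= u


def slat_objective(x, d):
    """Average discrimination Z over the selected items."""
    selected = sum(x)
    if selected == 0:
        raise ValueError("no test item selected")
    return sum(di * xi for di, xi in zip(d, x)) / selected


# Hypothetical item bank: 4 items, 2 concepts (illustrative values only)
d = [0.8, 0.6, 0.9, 0.4]           # discrimination degrees d_i
t = [5, 10, 15, 5]                 # expected answering times t_i
r = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.2]]  # relevance r_ij
h = [1.0, 1.0]                     # relevance lower bounds h_j
x = [1, 0, 1, 0]                   # select items 1 and 3
```

With these values, the selection takes 20 time units in total, so it is feasible for a time window such as $[15, 30]$ and infeasible once the lower bound exceeds 20.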

B. Fixed Number of Test Items (FNTI) Problem

In the FNTI problem, the number of test items is specified and fixed as $q\_num \le n$. The variables used in this model are given as follows.

• Decision variables: $x_i$ is an integer variable that reflects the decision about which test item would be selected and designated as question $i$; $1 \le x_i \le n$, $i = 1, 2, \ldots, q\_num$.

• Right-hand side $h_j$, $1 \le j \le m$: Lower bound on the expected relevance of concept $C_j$.

Objective function

$$\text{Maximize } Z = \sum_{i=1}^{q\_num} d_{x_i}$$

Subject to

$$\sum_{i=1}^{q\_num} r_{x_i j} \ge h_j, \quad j = 1, 2, \ldots, m \tag{4}$$

$$x_1 \ge 1 \tag{5}$$

$$x_{i+1} > x_i, \quad 1 \le i \le q\_num - 1. \tag{6}$$

In the above formula, constraint set (4) indicates that the selected test items must have a total relevance that is no less than the expected relevance to each concept to be covered. Constraints (5) and (6) indicate that no test item can be selected twice or more. In the objective function, $\sum_{i=1}^{q\_num} d_{x_i}$ is the total discrimination summed over the selected test items. Therefore, this model seeks to select a fixed number of test items such that the total discrimination is maximized.
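The FNTI model can be illustrated in the same way. The sketch below, with hypothetical function names and illustrative data, treats $x$ as a list of 1-based item numbers, checks constraints (4) to (6), and sums the discrimination of the chosen items. It assumes a non-empty list.

```python
def fnti_feasible(x, r, h, n):
    """x lists the selected item numbers (1-based), one per question."""
    # (5) and (6): strictly increasing item numbers within 1..n,
    # which rules out selecting the same item twice
    if x[0] < 1 or x[-1] > n:
        return False
    if any(x[i + 1] <= x[i] for i in range(len(x) - 1)):
        return False
    # (4): relevance lower bound for every concept
    return all(
        sum(r[item - 1][j] for item in x) >= h[j] for j in range(len(h))
    )


def fnti_objective(x, d):
    """Total discrimination Z of the selected items."""
    return sum(d[item - 1] for item in x)


# Hypothetical item bank: 4 items, 2 concepts (illustrative values only)
d = [0.8, 0.6, 0.9, 0.4]
r = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.2]]
h = [1.0, 1.0]
```

Here the sheet `[1, 3]` satisfies all constraints, whereas `[3, 1]` violates the ordering in (6) and `[1, 4]` falls short of the relevance bound for the second concept.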

III. GENETIC ALGORITHMS FOR TEST SHEET GENERATION

In this section, we shall propose two genetic algorithms (GAs), the concept lower-bound first genetic approach (CLFG) and the feasible item first genetic approach (FIFG), to find quality approximate solutions for the SLAT and FNTI problems. In CLFG, we shall select a set of test items to meet the lower bound on the expected relevance of each concept first, and then substitute some of the selected test items with the candidate test items to meet the upper bound and lower bound of the expected answering time. In FIFG, we confine the number of test items of the test sheet first, and then substitute some of the selected items with the candidate items to meet the lower bound on the expected relevance of each concept.

A. Concept Lower-Bound First Genetic (CLFG) Approach

To cope with the SLAT problem, we propose the CLFG approach as follows:


1) Step 1. Create Population (Encode): Let variable $S$ denote the set of initially generated chromosomes and variable $K$ be the size of the population in $S$. Chromosome $S_k$ is represented as an $n$-bit binary string $[x_{k1}, x_{k2}, \ldots, x_{kn}]$ consisting of $n$ genes, where $x_{ki}$ is either 1 or 0, indicating whether test item $i$ is currently selected or not. An initial set of binary strings, such as $X = [0, 0, 1, \ldots, 0]$, is randomly generated to represent the status of each test item.
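A minimal sketch of this encoding step follows. The function name and the optional `rng` argument are assumptions made for reproducibility; the paper does not specify how the random bits are drawn.

```python
import random


def create_population(n, K, rng=None):
    """Return K random n-bit chromosomes; gene i is 1 if item i is selected."""
    rng = rng or random.Random()
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(K)]
```

For example, `create_population(30, 20)` would give 20 candidate test sheets over a 30-item bank.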

2) Step 2. Fitness Ranking: To satisfy the constraints for the lower bound on the expected relevance of each concept, we define a penalty function to approximate the constraints as $R = dc \times ipt$, where $R$ is a penalty score, $dc = \sum_{j=1}^{m} \max\{h_j - \sum_{i=1}^{n} r_{ij} x_i, 0\}$ is the sum of deviations between the relevance of each concept in the currently selected test items and the corresponding lower bound, and $ipt$ is the penalty weight defined by the instructor.

Moreover, two penalty terms are needed for the cases in which the total testing time of the selected test items is less than the lower bound or greater than the upper bound. For the selected test items that have a total testing time of less than the lower bound, the penalty is $w \times dtl \times ipt\_l$, where $w = \sum_{i=1}^{n} x_i d_i / \mathrm{average}(u, l)$ represents the average discrimination weight of a chromosome, $dtl = \max\{l - \sum_{i=1}^{n} t_i x_i, 0\}$, and $ipt\_l$ is the user-defined penalty weight for violation of the lower-bound constraint.

For the selected test items that have a total testing time greater than the upper bound, the penalty is $w \times dtu \times ipt\_u$, where $w$ is defined as above, $dtu = \max\{\sum_{i=1}^{n} t_i x_i - u, 0\}$, and $ipt\_u$ is a user-defined penalty weight for the case where the upper-bound constraint is violated. The evaluation function is aggregated from the aforementioned penalties and defined as

$$v(S_k) = \frac{\sum_{i=1}^{n} d_i x_i - w \times dtl \times ipt\_l - w \times dtu \times ipt\_u - R}{\sum_{i=1}^{n} x_i}.$$
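The evaluation function of Step 2 can be sketched in Python as follows. The function name, the illustrative data, and the handling of an empty selection (returning 0.0) are assumptions; the source does not state what happens when no item is selected.

```python
def clfg_fitness(x, d, t, r, h, l, u, ipt, ipt_l, ipt_u):
    """Penalized average discrimination v(S_k) of a 0/1 chromosome x."""
    n, m = len(x), len(h)
    selected = sum(x)
    if selected == 0:
        return 0.0  # assumption: an empty sheet gets fitness 0
    total_d = sum(d[i] * x[i] for i in range(n))
    total_t = sum(t[i] * x[i] for i in range(n))
    # dc: summed shortfall of concept relevance below each lower bound h_j
    dc = sum(
        max(h[j] - sum(r[i][j] * x[i] for i in range(n)), 0)
        for j in range(m)
    )
    R = dc * ipt
    w = total_d / ((u + l) / 2)          # total discrimination over average(u, l)
    dtl = max(l - total_t, 0)            # time shortfall below l
    dtu = max(total_t - u, 0)            # time excess above u
    penalty_time = w * dtl * ipt_l + w * dtu * ipt_u
    return (total_d - penalty_time - R) / selected


# Hypothetical item bank (illustrative values only)
d = [0.8, 0.6, 0.9, 0.4]
t = [5, 10, 15, 5]
r = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.2]]
h = [1.0, 1.0]
x = [1, 0, 1, 0]
```

When every constraint holds, all penalties vanish and the value reduces to the average discrimination of the SLAT objective; any violation lowers the score in proportion to the size of the deviation.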

3) Step 3. Selection: The roulette wheel approach is adopted in the fitness-proportional selection procedure, which selects a new population with respect to the probability distribution based on fitness values. The probability that chromosome $S_k$ is selected is defined as $p_k = v(S_k)/V$, where

$$V = \sum_{k=1}^{pop\_size + offspring\_size} v(S_k).$$
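A minimal sketch of roulette wheel selection is given below. It assumes that all fitness values are non-negative and that their sum is positive; the source does not say how negative evaluation values (possible under heavy penalties) are handled, so that case is rejected here. The function name is hypothetical.

```python
import random


def roulette_select(population, fitness, count, rng=None):
    """Draw `count` chromosomes with probability p_k = v(S_k) / V."""
    rng = rng or random.Random()
    total = sum(fitness)  # V in the text
    if total <= 0 or min(fitness) < 0:
        raise ValueError("sketch assumes non-negative fitness with positive sum")
    selected = []
    for _ in range(count):
        spin = rng.random() * total
        running = 0.0
        for chromosome, value in zip(population, fitness):
            running += value
            if spin < running:
                selected.append(chromosome)
                break
        else:
            # guard against floating-point round-off at the upper end
            selected.append(population[-1])
    return selected
```

A chromosome with zero fitness is never chosen, and one holding the entire fitness mass is chosen every time.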

4) Step 4. Crossover: The one-cut-point method is used to perform the “crossover” operation by randomly selecting a cut point and exchanging the right parts of two parents to generate offspring. In this application, the value of the crossover rate is 0.2, which was derived from the results of a series of preliminary experiments.
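The one-cut-point operation can be sketched as follows. The function name is hypothetical, and the sketch assumes both parents have the same length of at least two genes; applying the crossover rate of 0.2 to decide which pairs mate is left to the caller.

```python
import random


def one_cut_point_crossover(parent_a, parent_b, rng=None):
    """Exchange the right parts of two parents at a random cut point."""
    rng = rng or random.Random()
    cut = rng.randint(1, len(parent_a) - 1)  # keep at least one gene per side
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Each child keeps the left part of one parent and receives the right part of the other, so the two children together still contain every gene of both parents.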

5) Step 5. Mutation: Mutation alters one or more genes with the mutation rate $P = n^{-1}$. A sequence of real random numbers $y_1, y_2, \ldots, y_{nK}$ is then generated, with each $y_i$ being a real number in [0, 1]. If $y_i$, $1 \le i \le nK$, is greater than $P$, then the $r$th bit, $r = i - (\lceil i/n \rceil - 1)n$, of the $\lceil i/n \rceil$th chromosome will be complemented.
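The index arithmetic in this step maps the position $i$ of a random number to a chromosome and a bit within it. A minimal sketch of that mapping, using 1-based indices as in the text and a hypothetical function name:

```python
def locate_gene(i, n):
    """Map the i-th random number to (chromosome index, bit index), both 1-based."""
    chromosome = (i + n - 1) // n      # integer form of ceil(i / n)
    r = i - (chromosome - 1) * n       # bit position within that chromosome
    return chromosome, r
```

With $n = 5$, the first five random numbers address bits 1 to 5 of the first chromosome, and the sixth addresses bit 1 of the second chromosome.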

Steps 2 to 5 constitute a generation. In our procedure, the whole process iterates generation by generation until either no better solution has been attained within the most recent ten generations or 1500 generations have been examined. When the procedure stops, the best solution encountered is reported.
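The stopping rule can be sketched as a small driver loop. Here `step` is a hypothetical callable that runs one generation (Steps 2 to 5) and returns the best fitness found in it; the parameter names are assumptions, while the limits of ten stale generations and 1500 total generations come from the text.

```python
def run_generations(step, initial_best, patience=10, max_generations=1500):
    """Iterate until no improvement for `patience` generations or the cap is hit."""
    best = initial_best
    stale = 0            # generations since the last improvement
    generations = 0
    while generations < max_generations and stale < patience:
        candidate = step()
        generations += 1
        if candidate > best:
            best, stale = candidate, 0
        else:
            stale += 1
    return best, generations
```

If the best fitness improves for three generations and then stalls, the loop stops after thirteen generations in total.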

B. Feasible Item First Genetic (FIFG) Approach

To cope with the FNTI problem, we propose the FIFG approach. This GA differs from the previous one in representation, fitness function, and mutation scheme. Therefore, we introduce only these parts.

TABLE I
BRIEF DESCRIPTION OF EACH ITEM BANK

1) Step 1. Create Population (Encode): Let variable $K$ be the number of the chromosomes in $S$, the initial population, and variable $S_k$ be the $k$th chromosome of $S$. Chromosome $S_k$ is represented as $[x_{k1}, x_{k2}, \ldots, x_{k,q\_num}]$, consisting of $q\_num$ genes, each of which denotes a selected item. A set of integers is randomly generated to represent the test item numbers, for example, $X = [25, 908, \ldots, 113]$. Note that $x_i \ne x_j$ for $1 \le i \ne j \le q\_num$.
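A minimal sketch of this integer encoding, assuming sampling without replacement is an acceptable way to guarantee distinct item numbers (the function name and `rng` argument are hypothetical):

```python
import random


def create_fifg_population(n, q_num, K, rng=None):
    """Return K chromosomes, each holding q_num distinct item numbers in 1..n."""
    rng = rng or random.Random()
    return [rng.sample(range(1, n + 1), q_num) for _ in range(K)]
```

Unlike the binary strings used by CLFG, the chromosome length here depends on $q\_num$ rather than on the size of the item bank.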

2) Step 2. Fitness Ranking: To satisfy the constraints of the lower bound on the expected relevance of each concept, we define a penalty function $R = dc \times ipt$, where $R$ is a penalty score, $dc = \sum_{j=1}^{m} \max\{h_j - \sum_{i=1}^{q\_num} r_{x_i j}, 0\}$ is the sum of distances between the relevance of each concept for the currently selected test items and the corresponding lower bound, and $ipt$ is the user-defined penalty weight. The evaluation function is defined as

$$v(S_k) = \sum_{i=1}^{q\_num} d_{x_i} - R.$$
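The FIFG evaluation function can be sketched in a few lines. The function name and data are hypothetical; the chromosome `x` is a list of 1-based item numbers.

```python
def fifg_fitness(x, d, r, h, ipt):
    """Total discrimination of the items in x minus the relevance penalty R."""
    # dc: summed shortfall of concept relevance below each lower bound h_j
    dc = sum(
        max(h[j] - sum(r[item - 1][j] for item in x), 0)
        for j in range(len(h))
    )
    R = dc * ipt
    return sum(d[item - 1] for item in x) - R


# Hypothetical item bank (illustrative values only)
d = [0.8, 0.6, 0.9, 0.4]
r = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.2]]
h = [1.0, 1.0]
```

Because the number of items is fixed, no time penalty and no division by the sheet length are needed, in contrast to the CLFG evaluation function.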

3) Step 5. Mutation: The “mutation” operation alters one or more genes with the mutation rate $P = n^{-1}$. A sequence of real random numbers $y_1, y_2, \ldots, y_{q\_num \times K}$ is then generated, with $y_i$ being a real number in [0, 1], for $i = 1$ to $q\_num \times K$. A random number selected from 1 to $n$ is used to replace the value of the $i$th gene if $y_i < P$.
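A minimal sketch of this mutation step is shown below, with a hypothetical function name. One caveat: a replacement drawn from 1 to $n$ may coincide with an item already in the chromosome, and the source does not say how such duplicates are repaired, so this sketch leaves them as they are.

```python
import random


def fifg_mutate(population, n, rng=None):
    """Replace each gene with a random item number in 1..n with probability 1/n."""
    rng = rng or random.Random()
    P = 1.0 / n  # mutation rate
    for chromosome in population:
        for i in range(len(chromosome)):
            if rng.random() < P:
                chromosome[i] = rng.randint(1, n)  # may duplicate an existing gene
    return population
```

The population is modified in place and also returned for convenience.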

IV. EXPERIMENTS AND EVALUATION

To evaluate the performance of the proposed algorithms, two experiments have been conducted to compare the execution time and the solution quality of four solution-seeking strategies: CLFG, FIFG, random selection, and exhaustive search. The random selection program generates the test sheet by selecting test items randomly to meet the constraints of time interval or number of test items, while the exhaustive search program examines every feasible combination of the test items to find the optimal solution. Eight item banks of K7 to K9 mathematics courses have been employed in the experiments. Table I shows a brief description of each item bank, where $N$ indicates the total number of test items. The platform of the experiments is a personal computer with a Pentium III 1.0-GHz CPU and 256-MB random-access memory (RAM). The programs are coded in Java.

The experiment is conducted by applying CLFG and FIFG twenty times on each item bank, with the average execution time and discrimination degree recorded. Tables II–IV show the experimental results for the lower bounds of testing time being 30, 60, and 120 min, respectively. It can be seen that for most cases, it is time-consuming to derive optimal solutions. For $N = 30$ and $l = 60$, it takes nearly 3 h (i.e., 187 min) to find an optimal solution. Such a lengthy process is obviously unacceptable. When the values of $N$ and $l$ increase, it becomes very unlikely to find optimal solutions in reasonable time. This indicates the need for heuristic algorithms to derive approximate solutions at a certain quality level.

It can be seen that test sheets with near-optimal discrimination degrees can be obtained in a much shorter time by employing CLFG than by the random selection approach. The line charts also show that the results of CLFG are very close to the known optimal solutions. For each case with more than 250 candidate test items, the execution time for finding optimal solutions is more than 1 000 000 min, which is not acceptable, while CLFG can still generate test sheets with a degree of discrimination greater than 0.9.

TABLE II
EXPERIMENTAL RESULTS FOR $l = 30$

TABLE III
EXPERIMENTAL RESULTS FOR $l = 60$

TABLE IV
EXPERIMENTAL RESULTS FOR $l = 120$

Figs. 1 and 2 compare the execution time of CLFG with that of finding optimal solutions. When the number of candidate test items exceeds 40, it is almost impossible to find an optimal solution, while CLFG can find near-optimal solutions in a very short time (less than 1 min).

Moreover, the statistics show that CLFG can efficiently select proper test items from an item bank containing two 500-candidate test items. Even when the number of test items in the item bank increases to 4000, the execution time of CLFG is still acceptable (about 25 to 35 s).

It is also interesting to compare the performances of CLFG and FIFG, although they are used to solve different problems with different GA representations. In Tables V–VII, the experimental results of FIFG and CLFG are given to compare the execution time and discrimination degree of each generated test sheet. In each table, $dc = \sum_{j=1}^{m} \max\{h_j - \sum_{i=1}^{n} r_{ij} x_i, 0\}$ is the sum of distances between the relevance of each concept for the currently selected test items and the corresponding lower bound, and $q\_num$ is the number of test items selected in the generated test sheet.

Fig. 1. Runtimes of CLFG and Optimum for $l = 30$.

Fig. 2. Runtimes of CLFG and Optimum for $l = 60$.

TABLE V
EXPERIMENTAL RESULTS FOR $q\_num = 18$

From Tables V–VII, it can be seen that the discrimination degrees reported by FIFG and CLFG are close to each other. Sometimes the discrimination degree of FIFG even exceeds that of CLFG with less time elapsed. The number of candidate test items has no significant impact on the runtime of FIFG. That is, as the number of candidate test items increases, FIFG still performs well.

V. CONCLUSION AND FUTURE WORK

In this paper, we have proposed two genetic algorithms, CLFG and FIFG, to cope with the test sheet-generating problems. Experimental results show that test sheets with near-optimal discrimination degrees can be obtained in a much shorter time by employing our approaches. The two algorithms have been embedded in a CAI system, Intelligent Tutoring, Evaluation, and Diagnosis (ITED-II), to provide a more informative, flexible, and capable tool for the instructors and learners [9]. The testing subsystem of ITED-II accepts assessment requirements and reads the test items from the item bank to generate test sheets. After conducting a test, the test results are transmitted to the tutoring subsystem for arranging adaptive subject materials. The commercial version of ITED-II is funded by an e-learning company and is scheduled for release in November 2005. This version will incorporate the following development strategies.

TABLE VI
EXPERIMENTAL RESULTS FOR $q\_num = 36$

TABLE VII
EXPERIMENTAL RESULTS FOR $q\_num = 72$

1) The subject materials and item banks are designed to completely match the contents of textbooks for primary schools and junior high schools.

2) Several functions suggested by the primary school and junior high school teachers, including adaptive learning, adaptive testing, personalized learning diagnosis, and guiding, are provided.

3) A client program is delivered to the teachers and students as a low-price compact-disc read-only memory (CD-ROM) bundled with the textbooks. The trial CD-ROM contains limited functions to demonstrate part of the subject materials, test items, and learning diagnosis functions.

4) The users need to register to the server for accessing advanced functions and complete subject materials, which are charged by month, semester, or year.

Several other AI- or optimization-based technologies, such as Tabu search, Ant systems, and heuristic algorithms, could be employed to develop more efficient test sheet-generating approaches for very large item banks. To facilitate possible comparisons between different problem-solving approaches, the test sheet-generating programs and the database schema of the item bank are available from the corresponding author upon request.

REFERENCES

[1] C. Chou, “Constructing a computer-assisted testing and evaluation system on the world wide web-the CATES experience,” IEEE Trans. Educ., vol. 43, no. 3, pp. 266–272, Aug. 2000.

[2] J. M. Feldman and J. Jones Jr., “Semiautomatic testing of student software under Unix(R),” IEEE Trans. Educ., vol. 40, no. 2, pp. 158–161, May 1997.

[3] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: Freeman, 1979.

[4] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[5] A. V. Gonzalez and L. R. Ingraham, “Automated exercise progression in simulation-based training,” IEEE Trans. Syst., Man, Cybern., vol. 24, no. 6, pp. 863–874, Jun. 1994.

[6] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research, 7th ed. New York: McGraw-Hill, 2001.

[7] K. Hitomi, Manufacturing Systems Engineering, 2nd ed. London, U.K.: Taylor & Francis.

[8] S. Hopper, “Cooperative learning and computer-based instruction,” Educ. Technol. Res. Develop., vol. 40, no. 3, pp. 21–38, 1992.

[9] G.-J. Hwang, “On the development of a cooperative tutoring environment on computer networks,” IEEE Trans. Syst., Man, Cybern. Part C, vol. 32, no. 3, pp. 272–278, Aug. 2002.

[10] G.-J. Hwang, “A concept map model for developing intelligent tutoring systems,” Comput. Educ., vol. 40, no. 3, pp. 217–235, 2003.

[11] G.-J. Hwang, “A test sheet generating algorithm for multiple assessment requirements,” IEEE Trans. Educ., vol. 46, no. 3, pp. 329–337, Aug. 2003.

[12] G. J. Hwang, T. L. Lin, and B. M. T. Lin, “An effective approach to the composition of test sheets from large item banks,” in Proc. 5th Int. Congr. Industrial Applied Mathematics, Sydney, Australia, July 7–11, 2003.

[13] J. T. Linderoth and M. W. P. Savelsbergh, “A computational study of search strategies for mixed integer programming,” INFORMS J. Comput., vol. 11, no. 2, pp. 173–187, 1999.

[14] P. Lira, M. Bronfman, and J. Eyzaguirre, “MULTITEST II: A program for the generation, correction, and analysis of multiple choice tests,” IEEE Trans. Educ., vol. 33, no. 4, pp. 320–325, Nov. 1990.

[15] J. B. Olsen, D. D. Maynes, D. Slawson, and K. Ho, “Comparison and equating of paper-administered, computer-administered, and computerized adaptive tests of achievement,” in Proc. Annu. Meet. American Educational Research Association, California, Apr. 16–20, 1986.

[16] H. R. Parsaei, Genetic Algorithms and Engineering Design. New York: Wiley, 1997.

[17] K. Rasmussen, P. Northrup, and R. Lee, “Implementing web-based instruction,” in Web-Based Instruction, B. H. Khan, Ed. Englewood Cliffs, NJ: Educational Technology, 1997, pp. 341–346.

[18] N. Singh, Systems Approach to Computer-Integrated Design and Manufacturing. New York: Wiley, 1996.

[19] H. Wainer, Computerized Adaptive Testing: A Primer. Hillsdale, NJ: Lawrence Erlbaum Associates, 1990.
