Subproject 2: Feasibility Study of a Networked Creative Learning Environment (III)


Final Report of a Research Project Funded by the National Science Council, Executive Yuan

A Study of Network-Based Learning and Assessment Systems for Information Technology Education of Secondary School Students

Subproject 2: Feasibility Study of a Networked Creative Learning Environment (3/3)

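Project type: □ Individual project ■ Integrated project

Project number: NSC 89-2520-S-009-011

Project period: August 1, 2000 to July 31, 2001

Principal investigator: Prof. Ying-Dar Lin (林盈達)

Co-principal investigator:

Attachments included with this report, as required:

□ One report on overseas travel or study

□ One report on travel or study in mainland China

□ One report on attending an international academic conference, together with the presented paper

□ One overseas research report for an international cooperative research project

Executing institution: Department of Computer and Information Science, National Chiao Tung University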

September 15, 2001 (ROC year 90)


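I. Chinese Abstract

This paper proposes a novel networked computer-assisted testing system, DIYexamer (Do-It-Yourself Examer). Three features clearly distinguish it from existing systems: student-created test items, item-bank sharing, and an automatic assessment mechanism. DIYexamer accepts test items provided by teachers and students, and shares item-banks with DIYexamer installations used by other institutions. We adopt an algorithm that dynamically assesses the discriminability of the items in an item-bank and deletes unqualified items from it. This not only ensures the quality of the items in the item-bank but also effectively increases the number of items it contains.

Keywords: computer-assisted testing system, test item evaluation, test item acquisition, discriminability, distance learning.

Abstract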

This paper presents a novel networked CAT system, DIYexamer (Do-It-Yourself Examer). Three features differentiate it from existing CAT systems: student DIY items, item-bank sharing, and automatic assessment of item discriminability. DIYexamer accepts test items contributed by teachers as well as students, and allows limited item sharing between item-banks that may be maintained by different organizations. An algorithm is applied dynamically to assess the discriminability of the items in the item-banks in order to filter out less qualified contributions, thereby assuring the quality of stored items while scaling up the size of the item-banks.

Keywords: Computer-Assisted Testing, Test Evaluation, Test Item Acquisition, Discriminability, Distance Learning

II. Background and Objectives

Computer-assisted testing (CAT), or computer-based testing (CBT), the use of computers for testing purposes, has a history spanning more than twenty years. The documented advantages of computer-administered testing include reduced testing time, increased test security, instant scoring, and an individualized adaptive testing environment [1][2][3][4]. Three categories of CAT are currently employed: standalone packages, test centers, and networked systems.

Regardless of which CAT system is employed, a critical issue in developing CAT is the construction of the test item-bank. Traditionally, the item-bank is built by asking teachers and content experts to submit items. This traditional method has three major drawbacks:

1) Limited number of items: Teachers and content experts tend to have similar views on the test subject; that is, the subject matter they regard as vital in a given field tends to be confined to the same topics. Therefore, even when more teachers and content experts are invited to contribute test items, the total number of distinct items remains low.

2) Passive learning attitude: Students are conventionally excluded from the computer-assisted testing process. Teachers generate the tests, the system presents the test sheets, and students then complete them. That is, students play a passive role within the testing system and are not afforded the opportunity to conduct “meta-learning” or “meta-analysis.”

3) No guarantee of item quality: Permitting students to generate test items may solve the two problems above. However, it raises a new problem: assuring that the items are of sufficient quality to be stored and used in later tests. Even when the whole item-bank is contributed by teachers and content experts, ways to dynamically assess and filter test items are needed.

1 The DIYexamer Solution

DIYexamer [5] provides a web interface through which users remotely control and operate the system. Three kinds of users are supported: administrators, teachers, and students. DIYexamer allows students to contribute test items and provides an effective means of verifying the discriminability of these items. Its three main ideas are introduced below:

1) Item DIY by students: DIYexamer allows students to submit test items to the item-banks online, as shown in Fig 1, while teachers can query the items generated by students. In addition to rapidly increasing the total number of items in an item-bank, this feature also encourages students to develop meta-learning, i.e. creative learning. In order to submit items, students must thoroughly study the learning materials, develop higher-level overviews of the materials, and practice cognitive and creative thinking.

2) Assessment of item discriminability: DIYexamer provides an item-discriminability assessment method to ensure the quality of the stored items. In addition to ensuring the internal consistency of existing test items, this method continuously and dynamically screens new items as they are added to the item-bank.

Fig 1: Students submit items to the item-bank

3) Item-bank sharing: DIYexamer, a scalable multi-server system, connects many item-banks stored on different servers, so that more items can be accessed and shared via the Internet. The sharing is limited and controlled in the sense that one server issues a request to another server describing the criteria of the test items it wants.
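The report does not specify how screening thresholds or sharing requests are encoded. The following Python sketch illustrates the two mechanisms under stated assumptions: the `Item`, `ShareRequest`, and `ItemBank` classes, the `min_discriminability` criterion, and the screening threshold of 0.5 are hypothetical names and values chosen for illustration, not part of DIYexamer.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    subject: str
    discriminability: float  # assumed to be a DIYexamer-style value in [0, 1]


@dataclass
class ShareRequest:
    # Hypothetical criteria sent by one server to another; the actual
    # DIYexamer request format is not described in the report.
    subject: str
    min_discriminability: float
    max_items: int


@dataclass
class ItemBank:
    items: list = field(default_factory=list)

    def screen(self, threshold):
        """Remove items below a hypothetical discriminability threshold.

        Returns the IDs of the removed items.
        """
        removed = [it.item_id for it in self.items if it.discriminability < threshold]
        self.items = [it for it in self.items if it.discriminability >= threshold]
        return removed

    def answer(self, request):
        """Return at most request.max_items items satisfying the request,
        most discriminating first."""
        matches = [
            it for it in self.items
            if it.subject == request.subject
            and it.discriminability >= request.min_discriminability
        ]
        matches.sort(key=lambda it: it.discriminability, reverse=True)
        return matches[:request.max_items]


# Usage with made-up items.
bank = ItemBank([
    Item("q1", "networking", 0.36),
    Item("q2", "networking", 0.80),
    Item("q3", "networking", 0.64),
    Item("q4", "databases", 0.72),
])
removed = bank.screen(0.5)                                # drops q1
shared = bank.answer(ShareRequest("networking", 0.6, 5))  # q2, then q3
```

In this sketch screening and sharing are separate steps, so a server that screens its bank before answering requests only shares items that have passed its own quality check.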

DIYexamer has additional advantages. Because it generates test sheets on demand in real time, it helps prevent cheating. It also provides an item cross-analysis function with which the degree of difficulty of each test, as well as of the entire test base, can be accurately measured.

2 Discriminability Assessment of DIYexamer

When selecting sample students, only those whose scores differ substantially from the average score should be considered. Accordingly, students whose scores fall within the top 30% of the score range are defined as the “high-score group (H’)”, while those whose scores fall within the bottom 30% of the score range are defined as the “low-score group (L’)”.
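A minimal Python sketch of this range-based selection follows. It assumes that the 30% is measured on the score range (highest minus lowest score) and that students exactly on a cutoff are included; the function name `score_range_groups` and the student labels are illustrative. The usage lines reuse the highest score (92), the lowest score (34), and student X's score (66) from the Fig 2 example.

```python
def score_range_groups(scores, fraction=0.3):
    """Select DIYexamer-style sample groups by score range, not head count.

    scores: dict mapping student -> total test score.
    Returns (high_group, low_group, high_cutoff, low_cutoff).
    """
    top, bottom = max(scores.values()), min(scores.values())
    band = fraction * (top - bottom)  # width of each 30% band of the range
    high_cutoff, low_cutoff = top - band, bottom + band
    # Assumption: scores exactly on a cutoff belong to the group.
    high = {s for s, v in scores.items() if v >= high_cutoff}  # H'
    low = {s for s, v in scores.items() if v <= low_cutoff}    # L'
    return high, low, high_cutoff, low_cutoff


# Scores from the Fig 2 example; the student names are made up.
high, low, hc, lc = score_range_groups({"top": 92, "bottom": 34, "X": 66})
# hc is about 74.6 and lc about 51.4, so X (66) falls in neither group.
```

If every student obtains the same score, the band width is zero and every student falls into both groups, so a practical implementation would need to handle that case separately.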

To show how the traditional method and the DIYexamer method differ in their criteria for choosing samples, and the effect of that difference, Fig 2 depicts the score distribution of a test. In this example, the highest score is 92, the lowest score is 34, and the average score is 69. The high-score and low-score groups are chosen according to each of the two methods. Take student X as an example: X scored 66, only 3 points below the average. Information associated with X should therefore have little, if any, referential value in computing item discriminability. However, the traditional method chooses X as a sample in the high rank group. This fallacy results from using rank groups, defined by a count of students, as the criterion for choosing samples. DIYexamer does not choose X, because it uses score groups, defined by the range of scores, rather than rank groups; only students whose scores differ substantially from the average are chosen as samples.

[Figure 2: score axis from 20 to 100 marking MIN (34), X (66), AVG (69), and MAX (92), with the traditional rank groups L and H and the DIYexamer score groups L’ and H’.]

Fig 2: Comparison of samples taken in the traditional method and the DIYexamer method

Because different samples have different impacts on discriminability, a referential value with respect to an item is generated for each student selected as a sample. We first define the item discriminability as the average of all associated referential values:

$$\text{Discriminability} = \frac{\text{Sum of the referential values of sampled students}}{\text{Number of sampled students}}$$

Since the referential values depend on students’ scores, they are computed from the ratios of correct and incorrect answers of the sampled students, which are defined as follows:

$$\text{Ratio of correct answers} = \frac{\text{Number of items answered correctly}}{\text{Number of items on the test}}$$

$$\text{Ratio of incorrect answers} = \frac{\text{Number of items answered incorrectly}}{\text{Number of items on the test}}$$

According to Table 1, the referential value of a student who answered an item correctly is that student’s ratio of correct answers. Conversely, the referential value of a student who answered an item incorrectly is that student’s ratio of incorrect answers. This policy reflects the principle that an item should gain discriminability when answered correctly by a competent student, and lose discriminability when answered correctly by a less competent student. In this way, a competent student contributes a large referential value to a correctly answered item and a small referential value to an incorrectly answered item, and vice versa for a less competent student.
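The procedure above (the two ratios, the Table 1 rule, and the averaging formula) can be sketched in Python as follows. This is an illustrative sketch, not the DIYexamer source: it assumes every sampled student answered every item either correctly or incorrectly (no blank answers), and the function name `item_discriminability` and the two-student data set are hypothetical.

```python
def item_discriminability(responses, item, sampled):
    """Average referential value of one item over the sampled students.

    responses: dict mapping student -> list of bools, one per test item
               (True means the item was answered correctly).
    item:      index of the item being assessed.
    sampled:   students selected as samples, e.g. the H' and L' groups.
    """
    values = []
    for student in sampled:
        answers = responses[student]
        n = len(answers)
        # Assumption: no blanks, so every item is either correct or incorrect.
        ratio_correct = sum(answers) / n          # items correct / items on test
        ratio_incorrect = (n - sum(answers)) / n  # items incorrect / items on test
        # Table 1: a correct answer contributes the ratio of correct answers,
        # an incorrect answer contributes the ratio of incorrect answers.
        values.append(ratio_correct if answers[item] else ratio_incorrect)
    return sum(values) / len(values)


# A competent student (4/5 correct) and a less competent one (1/5 correct).
responses = {
    "A": [True, True, True, True, False],
    "B": [False, False, False, False, True],
}
d_good = item_discriminability(responses, 0, ["A", "B"])  # 0.8
d_poor = item_discriminability(responses, 4, ["A", "B"])  # 0.2
```

Item 0 separates the two students in the expected direction and scores high; item 4 is answered correctly only by the weaker student and scores low, which makes it a candidate for removal from the item-bank.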

III. Results and Discussion

1 Evaluation of the Discriminability Assessment in DIYexamer

We evaluated the fairness and performance of DIYexamer in an experiment in which 10 students took a 10-item test online using DIYexamer. The discriminability of each item was computed with both the traditional method and the DIYexamer method. The traditional method yields discriminability values between -1 and 1, whereas the DIYexamer method yields values between 0 and 1. To compare the two methods, both ranges of discriminability were normalized to 0 to 10, as shown in Fig 3.
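The report gives neither the traditional formula nor the normalization explicitly. A common index with the stated range of -1 to 1 is the upper-lower index D = p_H - p_L, where p_H and p_L are the proportions of the high and low rank groups that answer the item correctly. The Python sketch below assumes that index, rank groups of 30% of the students by count, and a linear mapping of both scales onto 0 to 10; all function names and the ten-student data set are hypothetical. Under these assumptions, 10 students give rank groups of 3, so D is a multiple of 1/3 and non-negative values normalize to 5, 6.67, 8.33, or 10, which appears consistent with the traditional values plotted in Fig 3.

```python
def rank_groups(totals, fraction=0.3):
    """Traditional rank groups: top and bottom `fraction` of students by count.

    Assumption: 30% by count; classical texts often use 27% instead.
    """
    ordered = sorted(totals, key=totals.get, reverse=True)
    k = max(1, round(fraction * len(ordered)))
    return ordered[:k], ordered[-k:]


def traditional_index(correct, high, low):
    """Assumed upper-lower discrimination index D = p_H - p_L, in [-1, 1]."""
    p_high = sum(correct[s] for s in high) / len(high)
    p_low = sum(correct[s] for s in low) / len(low)
    return p_high - p_low


def normalize_traditional(d):
    """Map the traditional range [-1, 1] linearly onto [0, 10] (assumed)."""
    return (d + 1) * 5


def normalize_diyexamer(d):
    """Map the DIYexamer range [0, 1] linearly onto [0, 10] (assumed)."""
    return d * 10


# Ten hypothetical students: total scores and results (1 = correct) on one item.
totals = dict(zip("ABCDEFGHIJ", [92, 88, 80, 75, 70, 66, 60, 55, 45, 34]))
correct = dict(zip("ABCDEFGHIJ", [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]))
high, low = rank_groups(totals)             # ['A', 'B', 'C'], ['H', 'I', 'J']
d = traditional_index(correct, high, low)   # 2/3
score_trad = normalize_traditional(d)       # about 8.33
score_diy = normalize_diyexamer(0.8)        # 8.0
```

Because the rank groups always contain a fixed number of students, they can include near-average students, which is the behavior the range-based DIYexamer groups are designed to avoid.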

[Figure 3: bar chart of normalized item discriminability (0 to 10) for items 1 to 10. Plotted values:]

Item        | 1   | 2   | 3   | 4   | 5    | 6  | 7  | 8   | 9  | 10
DIYexamer   | 3.6 | 6.4 | 6.4 | 6   | 8    | 8  | 8  | 7.2 | 8  | 8
Traditional | 5   | 5   | 5   | 6.7 | 8.33 | 10 | 10 | 6.7 | 10 | 10

Fig 3: Comparison of item discriminability


Table 1: Referential values used to compute item discriminability

Student                                       | Answer    | Item discriminability | Referential value used
Competent (high ratio of correct answers)     | Correct   | High                  | Ratio of correct answers
Competent (high ratio of correct answers)     | Incorrect | Low                   | Ratio of incorrect answers
Less competent (low ratio of correct answers) | Correct   | Low                   | Ratio of correct answers
Less competent (low ratio of correct answers) | Incorrect | High                  | Ratio of incorrect answers

2 Conclusion

This paper has presented a novel architecture for a networked CAT system, DIYexamer. It supports item DIY by students, item-bank sharing, and item discriminability assessment.

For discriminability assessment, new calculation formulas were proposed. The main difference from the traditional assessment scheme is that samples are taken from the top and bottom 30% of the score range, rather than from rank groups defined by a count of students. Item discriminability is therefore reflected more accurately, particularly when the tested students have similar scores.

Item-bank sharing and item DIY by students have increased both the number and the variety of questions in the item-banks. Item DIY by students promotes creative learning, while automatic discriminability assessment assures better item quality than traditional CAT systems provide.

IV. Self-Evaluation of Project Results

A questionnaire was used to survey students’ subjective attitudes toward DIYexamer, and the results showed that most students were interested in item DIY. The proposed technique is useful in general teaching, not only to improve the quality and fairness of test items but also to save the time spent generating questions and computing scores. We recommend that DIYexamer be adopted more widely in schools.

V. References

[1] C. V. Bunderson, D. K. Inouye, and J. B. Olsen, “The four generations of computerized educational measurement,” in Educational Measurement (3rd ed.), R. L. Linn, Ed. New York: American Council on Education/Macmillan, pp. 367-407, (1989).

[2] S. L. Wise and B. S. Plake, “Research on the effects of administering tests via computers,” Educational Measurement: Issues and Practice, vol. 8, no. 3, pp. 5-10, (1989).

[3] A. C. Bugbee, Examination on Demand: Findings in Ten Years of Testing by Computers 1982-1991. Edina, MN: TRO Learning, (1992).

[4] F. M. Lord, Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Erlbaum, (1980).

[5] “DIYexamer system,” http://speed.cis.nctu.edu.tw/~diy
