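Broadband Internet Quality-of-Service Guarantees, Subproject VI: Interoperation Techniques for Integrated Services and Differentiated Services Networks (II)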


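National Science Council, Executive Yuan: Research Project Report

A Study of Web-based Learning and Assessment Systems for Information Technology for Secondary School Students, Subproject 2:

Feasibility Study of a Networked Creative Learning Environment (2/3)

DIYexamer: A Web-based Multi-Server Testing System with Dynamic Test Item Acquisition and Discriminability Assessment

Project number: NSC 89-2520-S-009-006

Project period: August 1, 1999 to July 31, 2000

Principal investigator: Prof. 林盈達, Department of Computer and Information Science, National Chiao Tung University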

Abstract

This paper presents a novel networked CAT system, DIYexamer (Do-It-Yourself Examer). Three features differentiate it from existing CAT systems: student DIY items, item-bank sharing, and automatic assessment of item discriminability. DIYexamer accepts test items contributed by teachers as well as students, and allows limited item sharing between item-banks that may be maintained by different organizations. An algorithm is applied dynamically to assess the discriminability of items in the item-banks and filter out less qualified contributions, thereby assuring the quality of stored items while scaling up the size of the item-banks.

Keywords: Computer Assisted Testing, Test Evaluation, Test Acquisition, Discriminability, Distance Learning

1 Introduction

Computer-assisted Testing (CAT) or Computer-based Testing (CBT), the use of computers for testing purposes, has a history spanning more than twenty years. The documented advantages of computer-administered testing include reduced testing time, increased test security, instant scoring, and an individualized adaptive testing environment [1][2][3][4]. Three categories of CAT are currently employed: standalone packages, test centers, and networked systems. Regardless of which CAT system is employed, a critical issue in developing CAT is the construction of a test item-bank. Traditionally, the item-bank is built by asking teachers and content experts to submit items. This traditional method has three major drawbacks:

1) Limitation of item amount: Teachers and content experts tend to have similar views on the test subject, so the subject matter regarded as vital in a given field tends to be confined. Therefore, even when more teachers and content experts are invited to contribute test items, the total number of distinct items remains low.

2) Passive learning attitude: Students are conventionally excluded from the creation of tests. In a typical computer-assisted testing system, teachers generate tests, the system presents test sheets, and students then complete the tests. That is, within the system of testing, students play a passive role and are not afforded the opportunity to conduct “meta-learning” or “meta-analysis.”

3) No guarantee on item quality: Permitting students to generate tests may be a possible solution to the aforementioned problems. However, this raises a new problem: assuring quality, that is, ensuring that the items are worth storing and using in further tests. Even when the whole item-bank is contributed by teachers and content experts, ways to dynamically assess and filter test items are needed.

2 The DIYexamer Solution

DIYexamer [5] provides a web interface through which users remotely control and operate the system. Three types of users are supported: administrators, teachers, and students. The system allows students to contribute test items and provides an effective means of verifying the discriminability of these items. Its three main ideas are as follows:

1) Item DIY by students: DIYexamer allows students to submit test items to the item-banks online, as shown in Fig 1. Teachers can query these student-generated items, as shown in Fig 2. In addition to rapidly increasing the total number of items in an item-bank, this feature also encourages students to develop meta-learning, i.e. creative learning. In order to submit items, students must thoroughly study the learning materials, develop higher-level overviews of the materials, and practice cognitive and creative thinking.

Fig 1: Students generate items into the item-bank

Fig 2: Student DIY items as queried by teachers

2) Assessment of item discriminability: DIYexamer provides an item-discriminability assessment method to ensure the quality of the stored items. In addition to ensuring the internal consistency of existing test items, this method also continuously and dynamically screens additional new items in the item-bank. Fig 3 shows the average item discriminabilities of several item-banks.

Fig 3: Average item discriminabilities of item-banks

3) Item-bank sharing: DIYexamer, a scalable multi-server system, connects many item-banks stored on different servers, so more items can be accessed and shared via the Internet. The sharing is limited and controlled in the sense that a server issues a request to another server describing the criteria of the test items it wants; a server does not open up its item-bank for unlimited access.

Two additional advantages have been identified. Because DIYexamer generates test sheets on demand in real time, cheating is avoided. DIYexamer also provides an item cross-analysis function with which the degree of difficulty of each test, as well as of the entire test base, can be accurately measured.
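The limited, criteria-driven sharing described for item-banks could be sketched as follows. This is only an illustration: the report does not specify a message format or selection policy, so every name here (`Item`, `ItemRequest`, `serve_request`, `per_request_cap`) and the choice of criteria fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A stored test item (hypothetical fields, for illustration only)."""
    subject: str
    discriminability: float
    text: str


@dataclass(frozen=True)
class ItemRequest:
    """Criteria one server sends to another when asking for items (hypothetical)."""
    subject: str
    min_discriminability: float
    max_items: int


def serve_request(bank, request, per_request_cap=5):
    """Return only items that match the criteria, never the whole bank.

    per_request_cap is an assumed server-side limit that keeps sharing bounded
    even if the requester asks for more.
    """
    matches = [
        item for item in bank
        if item.subject == request.subject
        and item.discriminability >= request.min_discriminability
    ]
    # Prefer the most discriminating items first.
    matches.sort(key=lambda item: item.discriminability, reverse=True)
    return matches[:min(request.max_items, per_request_cap)]


if __name__ == "__main__":
    bank = [
        Item("networks", 0.8, "Q1"),
        Item("networks", 0.4, "Q2"),
        Item("databases", 0.9, "Q3"),
    ]
    shared = serve_request(bank, ItemRequest("networks", 0.5, max_items=10))
    print([item.text for item in shared])
```

The point of the sketch is the shape of the exchange: the requesting server names what it needs, and the serving server decides how much to release.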

3 Discriminability Assessment of DIYexamer

When selecting sample students, only those whose scores differ substantially from the average score should be considered. Accordingly, students whose scores fall in the top 30% of the score range are defined as the “high-score group (H’)”, while those whose scores fall in the bottom 30% of the score range are defined as the “low-score group (L’)”.

To show the different criteria and effects of choosing samples in the traditional method and the DIYexamer method, Fig 4 depicts the score distribution in a test. In this example, the highest score is 92, the lowest score is 34, and the average score is 69. The “high rank score group” and the “low rank score group” are chosen according to these two methods. Take student X as an example: the score of X is 66, which differs by only 3 points from the average score. The information associated with X should have little, if any, referential value in computing item discriminability. However, X is chosen as a sample in the high rank group in the traditional method. This fallacy results from using the rank group, in terms of count, as the criterion for choosing samples. In DIYexamer, X is not chosen, since the score group, in terms of range, is used rather than the rank group. Only students whose scores differ substantially from the average score are chosen as samples.

Fig 4: Comparison of samples taken in the traditional method and DIYexamer method
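The range-based sampling can be sketched in a few lines. The sketch assumes that “top 30% in terms of range” means scores within 30% of the score span below the maximum, and symmetrically for the bottom group, with inclusive cutoffs; the function name `select_score_groups` and the student labels are hypothetical. Only the values 92, 34, and 66 come from the example above; the other scores are made up for illustration.

```python
def select_score_groups(scores, fraction=0.3):
    """Pick high and low score groups by score range rather than by head count.

    scores: dict mapping a student label to that student's score.
    Returns (high_group, low_group) as sets of student labels.
    """
    lowest = min(scores.values())
    highest = max(scores.values())
    span = highest - lowest
    high_cutoff = highest - fraction * span  # assumed inclusive
    low_cutoff = lowest + fraction * span    # assumed inclusive
    high = {name for name, score in scores.items() if score >= high_cutoff}
    low = {name for name, score in scores.items() if score <= low_cutoff}
    return high, low


if __name__ == "__main__":
    # Maximum 92, minimum 34, and X = 66 follow the Fig 4 example;
    # students B and C are invented to fill out the distribution.
    scores = {"A": 92, "B": 80, "X": 66, "C": 50, "D": 34}
    high, low = select_score_groups(scores)
    print(sorted(high), sorted(low))
```

With a span of 58 points, the cutoffs land at roughly 74.6 and 51.4, so a student scoring 66 falls in neither group and is left out of the discriminability computation.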

A referential value with respect to an item is generated for each student selected as a sample. We first define the item discriminability as the average of all associated referential values, as shown below:

$$\text{Discriminability} = \frac{\text{Sum of the referential values of sampled students}}{\text{Number of sampled students}}$$

Since the referential values depend on students’ scores, the referential values are computed according to the ratio of correct and incorrect answers of the sampled students. The ratios of correct and incorrect answers are defined as follows:

$$\text{Ratio of correct answer} = \frac{\text{Number of items answered correctly}}{\text{Number of items on the test}}$$

$$\text{Ratio of incorrect answer} = \frac{\text{Number of items answered incorrectly}}{\text{Number of items on the test}}$$

According to Table 1, the referential value of a student who answered an item correctly is that student's ratio of correct answers. Conversely, the referential value of a student who answered an item incorrectly is that student's ratio of incorrect answers. This policy reflects the idea that an item should gain discriminability when answered correctly by a competent student and lose discriminability when answered correctly by a less competent student. In this way, a competent student contributes a large referential value to an item answered correctly and a small referential value to an item answered incorrectly, and vice versa for a less competent student.
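The referential-value rule and the averaging formula can be written out as a short sketch. It assumes every item is scored as either correct or incorrect, so the ratio of incorrect answers equals one minus the ratio of correct answers; the function names and the sample numbers are hypothetical.

```python
def referential_value(answered_correctly, num_correct, num_items):
    """Referential value of one sampled student with respect to one item.

    Uses the student's ratio of correct answers if the item was answered
    correctly, otherwise the student's ratio of incorrect answers.
    """
    ratio_correct = num_correct / num_items
    ratio_incorrect = 1 - ratio_correct  # assumes no skipped or unscored items
    return ratio_correct if answered_correctly else ratio_incorrect


def discriminability(samples, num_items):
    """Average referential value over the sampled students.

    samples: list of (answered_correctly, num_correct) pairs, one per
    sampled student.
    """
    values = [referential_value(ok, n, num_items) for ok, n in samples]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical 10-item test: a strong student (9 correct) gets the item
    # right and a weak student (2 correct) gets it wrong.
    print(discriminability([(True, 9), (False, 2)], num_items=10))
    # Reversed outcome: the strong student misses it and the weak one gets it.
    print(discriminability([(False, 9), (True, 2)], num_items=10))
```

In the first case the two referential values are 0.9 and 0.8, giving a discriminability of about 0.85; in the reversed case they are 0.1 and 0.2, giving about 0.15, which is the behaviour Table 1 describes.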

4 Evaluation of the Discriminability Assessment in DIYexamer

The fairness and performance of DIYexamer were evaluated. We conducted an experiment in which 10 students took a 10-item test online using DIYexamer. Table 2 summarizes the test results, and Fig 5 shows the score distribution of the experiment. The discriminability of each item is computed using both the traditional method and the DIYexamer method. However, the discriminability originally falls between -1 and 1 under the traditional method, while it falls between 0 and 1 under the DIYexamer method. To compare the two methods, both ranges of discriminability are normalized to 0 to 10, as shown in Fig 6.

Fig 5: Score distribution of the test experiment

| Item | DIYexamer | Traditional |
|---|---|---|
| 1 | 3.6 | 5 |
| 2 | 6.4 | 5 |
| 3 | 6.4 | 5 |
| 4 | 6 | 6.7 |
| 5 | 8 | 8.33 |
| 6 | 8 | 10 |
| 7 | 8 | 10 |
| 8 | 7.2 | 6.7 |
| 9 | 8 | 10 |
| 10 | 8 | 10 |

Fig 6: Comparison of item discriminability
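The normalized values in Fig 6 can be recomputed from the Table 2 answers with the sketch below. Several details are assumptions rather than statements from the report: the traditional method is taken to be the common index (proportion correct in the top 30% of students by count minus the proportion correct in the bottom 30%), the DIYexamer groups use inclusive cutoffs at 30% of the score range, and normalization is a linear rescaling of each method's range onto 0 to 10. All function names are hypothetical.

```python
# Answers from Table 2: one row per student, one column per item (1 = correct).
ANSWERS = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # student1
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # student2
    [0, 1, 1, 0, 1, 0, 0, 1, 0, 0],  # student3
    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1],  # student4
    [0, 1, 0, 1, 0, 0, 1, 1, 0, 1],  # student5
    [1, 1, 1, 0, 0, 1, 0, 1, 1, 0],  # student6
    [0, 1, 1, 1, 1, 0, 0, 1, 1, 0],  # student7
    [1, 0, 0, 1, 1, 1, 1, 0, 1, 1],  # student8
    [1, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # student9
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # student10
]


def diyexamer_discriminability(answers, fraction=0.3):
    """Per-item discriminability on a 0-10 scale, sampling by score range."""
    num_items = len(answers[0])
    scores = [sum(row) for row in answers]
    lowest, highest = min(scores), max(scores)
    span = highest - lowest
    sampled = [
        i for i, s in enumerate(scores)
        if s >= highest - fraction * span or s <= lowest + fraction * span
    ]
    result = []
    for item in range(num_items):
        total = 0.0
        for i in sampled:
            ratio_correct = scores[i] / num_items
            # Correct answer: ratio of correct answers; otherwise ratio of incorrect.
            total += ratio_correct if answers[i][item] else 1 - ratio_correct
        result.append(10 * total / len(sampled))  # [0, 1] -> [0, 10]
    return result


def traditional_discriminability(answers, fraction=0.3):
    """Per-item discriminability on a 0-10 scale, sampling by rank (head count)."""
    num_items = len(answers[0])
    scores = [sum(row) for row in answers]
    ranked = sorted(range(len(answers)), key=lambda i: scores[i])
    k = round(fraction * len(answers))
    low, high = ranked[:k], ranked[-k:]
    result = []
    for item in range(num_items):
        p_high = sum(answers[i][item] for i in high) / k
        p_low = sum(answers[i][item] for i in low) / k
        d = p_high - p_low              # in [-1, 1]
        result.append(10 * (d + 1) / 2)  # [-1, 1] -> [0, 10]
    return result


if __name__ == "__main__":
    print([round(v, 2) for v in diyexamer_discriminability(ANSWERS)])
    print([round(v, 2) for v in traditional_discriminability(ANSWERS)])
```

Under these assumptions both series agree with the Fig 6 values to within rounding, which suggests the assumptions match the computation behind the figure. On this data the range-based rule samples five students (student1, student2, student8, student9, student10), while the rank-based rule samples six by also including student3.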

TABLE 1: Principle to compute the referential value of a student with respect to an item

| Student | Answer | Item discriminability | Referential value to compute discriminability |
|---|---|---|---|
| Competent (with high ratio of correct answer) | Correct | High | Ratio of correct answer |
| Competent (with high ratio of correct answer) | Incorrect | Low | Ratio of incorrect answer |
| Less competent (with low ratio of correct answer) | Correct | Low | Ratio of correct answer |
| Less competent (with low ratio of correct answer) | Incorrect | High | Ratio of incorrect answer |

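TABLE 2: Result of the test experiment (1 = correct, 0 = incorrect)

| Student | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 | Item 7 | Item 8 | Item 9 | Item 10 | Number of correct answers (score) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| student1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| student2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| student3 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 4 |
| student4 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 5 |
| student5 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 5 |
| student6 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 6 |
| student7 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 6 |
| student8 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 7 |
| student9 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 7 |
| student10 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |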


5 Conclusion

This paper has presented a novel architecture for a networked CAT system, DIYexamer. It supports item DIY by students, item-bank sharing, and item discriminability assessment.

For discriminability assessment, a new calculation formula was proposed. Compared with the traditional assessment scheme, the main difference is that the top and bottom 30% of the score group, in terms of the range of scores, are selected rather than the rank group, in terms of the count of students. Thus, item discriminability is reflected more accurately, particularly when the tested students have close scores.

Item-bank sharing and item DIY by students have increased both the number and the variety of questions in item-banks. Item DIY by students promotes creative learning among students, while automatic discriminability assessment assures better item quality than in traditional CAT systems.

A questionnaire was used to survey subjective attitudes of students about DIYexamer. As shown in Table 3, the outcome revealed that most students were interested in item DIY.

The technique proposed herein is useful in general instruction, not only for improving the quality and fairness of test items but also for saving the time spent generating questions and computing scores. We recommend that DIYexamer be adopted widely in schools.

6 References

[1] C. V. Bunderson, D. K. Inouye, and J. B. Olsen, “The four generations of computerized educational measurement,” in Educational Measurement (3rd ed.), R. L. Linn, Ed. New York: American Council on Education/Macmillan, pp. 367-407, (1989).

[2] S. L. Wise and B. S. Plake, “Research on the effects of administering tests via computers,” Educational Measurement: Issues and Practice, vol. 8, no. 3, pp. 5-10, (1989).

[3] A. C. Bugbee, Examination on Demand: Findings in Ten Years of Testing by Computers 1982-1991. Edina, MN: TRO Learning, (1992).

[4] F. M. Lord, Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Erlbaum, (1980).

[5] “DIYexamer system,” http://speed.cis.nctu.edu.tw/~diy (accessed on August 22, 2000).

TABLE 3: DIYexamer questionnaire results: percentage of students, with the number of students in parentheses, for each response to each question

| Question | Strongly agree | Agree | No opinion | Disagree | Strongly disagree |
|---|---|---|---|---|---|
| Item DIY is interesting. | 12.3 (7) | 43.9 (25) | 21.1 (12) | 15.8 (9) | 7.0 (4) |
| Item DIY is fanciful. | 17.5 (10) | 49.1 (28) | 21.1 (12) | 10.5 (6) | 1.8 (1) |
| I am curious about the testing result of my DIY item. | 26.3 (15) | 59.6 (34) | 10.5 (6) | 3.5 (2) | 0.0 (0) |
| I learned a lot when creating items. | 12.3 (7) | 47.4 (27) | 22.8 (13) | 17.5 (10) | 0.0 (0) |
| I am curious about the teacher’s opinion about my DIY item. | 22.8 (13) | 50.9 (29) | 22.8 (13) | 1.8 (1) | 1.8 (1) |
| I am curious about other students’ opinions about my DIY item. | 15.8 (9) | 56.1 (32) | 21.1 (12) | 7.0 (4) | 0.0 (0) |
| I studied harder to prepare item DIY. | 10.5 (6) | 54.4 (31) | 21.1 (12) | 14.0 (8) | 0.0 (0) |
| Judging the difficulties of my DIY items is easy. | 40.4 (23) | 38.6 (22) | 14.0 (8) | 7.0 (4) | 0.0 (0) |
| Judging the fitness of my DIY items is difficult. | 36.8 (21) | 49.1 (28) | 8.8 (5) | 5.3 (3) | 0.0 (0) |
| Item DIY by students comes from the laziness of teachers. | 7.0 (4) | 12.3 (7) | 43.9 (25) | 33.3 (19) | 3.5 (2) |
| If possible, I hope such item DIY mode through the whole course can replace conventional testing. | 1.8 (1) | 10.5 (6) | 35.1 (20) | 38.6 (22) | 14.0 (8) |
| Items generated by students are easier than by the teacher. | 7.0 (4) | 36.8 (21) | 28.1 (16) | 24.6 (14) | 3.5 (2) |

I knew more about the testing material after item
