

Chapter 2. Literature Review

2.4. Automatic Item-Generation (AIG)

In recent years, automatic item generation (AIG) has become a prominent topic in the testing field. Current AIG systems have been widely applied to vocabulary, reading, grammar, cloze, and figural tests. This study reviewed the literature related to AIG, discussed methods of producing distractors, and surveyed the fields to which AIG could be applied. In addition, this study sought to understand whether AIG was suitable for teaching systems and whether AIG-generated items could help students learn. Finally, this study investigated figural tests produced with AIG to understand their system architecture, performance, and use of image processing to generate figural items, in the expectation that these findings could inform the construction of the system in this study.

The following paragraphs review studies on distractor-generation mechanisms, in the hope that they could be applied in this study. Stevens (1991) proposed a method for constructing vocabulary tests: he used the concordancing technology of natural language processing to produce vocabulary items from general corpora. Wilson (1997), on the other hand, suggested automatically generating practice items for CALL (Computer-Assisted Language Learning) from electronic dictionaries and a parsed corpus.

Coniam (1997) used word-frequency statistics from corpora to automatically generate multiple-choice cloze items. His method was to collect a large corpus, use the Automatic Grammatical Tagging System (AGTS) to mark the lexical category of each word, and compile part-of-speech statistics word by word. Using these two kinds of information, Coniam's system could accept three different item-generation conditions:

(I) always remove the n-th word in each sentence and use it as the answer (to give students a preliminary grasp of the whole article, deletion usually started from the second sentence), (II) limit the answers to certain ranges of word frequency, and (III) limit the part of speech of the answers. After the answers of the cloze test were decided, the system chose three words with the same part of speech and a word frequency similar to the answer's to serve as the distractors of each item. Coniam thus provided a basic process for generating cloze tests automatically from a corpus. However, the generated items still needed human review, for two main reasons. First, the answer-selection strategy was not optimized, which could yield items that were too easy or too hard. Second, after the distractors were generated, the system did not check whether any of them could replace the answer as the best choice, so a multiple-choice item could end up with more than one defensible option.
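As a minimal sketch of this process (the data structures and frequency band below are hypothetical illustrations, not Coniam's actual implementation), the blanking and distractor-selection steps could look like the following:

    import random

    # Hypothetical corpus statistics: each word maps to its dominant part of
    # speech and its corpus frequency (Coniam derived these with AGTS).
    CORPUS_STATS = {
        "quickly": ("RB", 1200), "slowly": ("RB", 900), "badly": ("RB", 1100),
        "eagerly": ("RB", 800), "ran": ("VBD", 3000), "dog": ("NN", 5000),
    }

    def make_cloze_item(tokens, n, stats, band=0.5, k=3):
        # Condition (I): blank out the n-th word and use it as the answer.
        answer = tokens[n]
        pos, freq = stats[answer]
        lo, hi = freq * (1 - band), freq * (1 + band)
        # Conditions (II)/(III): distractors share the answer's part of
        # speech and fall within a similar frequency band.
        pool = [w for w, (p, f) in stats.items()
                if w != answer and p == pos and lo <= f <= hi]
        distractors = random.sample(pool, k)
        stem = tokens[:n] + ["____"] + tokens[n + 1:]
        options = random.sample([answer] + distractors, k + 1)  # shuffled
        return " ".join(stem), options, answer

    stem, options, key = make_cloze_item(
        ["The", "dog", "ran", "quickly", "home"], 3, CORPUS_STATS)

Note that, exactly as Coniam observed, nothing in this sketch verifies that a sampled distractor could not itself complete the stem correctly.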

Poel and Weatherly (1997) proposed another method of constructing cloze tests. Their answer-selection strategy was to remove a word every six or seven words, starting from the third sentence of an article. The resulting stems were first administered as fill-in-the-blank items to a group of students of a known ability level, and the three wrong words those students answered most frequently were taken as the distractors. This method had a more serious problem of choosing inappropriate answers: when the answer was a definite article, the item was too easy and students could answer it without understanding the whole article, and when the answer was a proper noun, students could not answer it at all. Besides, whether students' wrong answers make good distractors still needed to be verified experimentally.
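The fixed-ratio deletion scheme itself is simple to state; the following is a rough sketch (the empirical distractor step, which requires collecting student responses, is not shown):

    def fixed_ratio_deletions(sentences, start_sentence=2, interval=7):
        # Blank every `interval`-th word, counting from the start of the
        # third sentence (index 2), and collect the removed words as answers.
        stems, answers, counter = [], [], 0
        for i, sentence in enumerate(sentences):
            words = sentence.split()
            if i >= start_sentence:
                for j, word in enumerate(words):
                    counter += 1
                    if counter % interval == 0:
                        answers.append(word)
                        words[j] = "____"
            stems.append(" ".join(words))
        return stems, answers

Because the deletion points are purely positional, nothing prevents the blank from landing on an article or a proper noun, which is exactly the weakness noted above.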

AWETS (Automatic Web-based English Testing System), developed by Kao (2000), integrated sub-systems for automatic vocabulary-item generation, test delivery, automatic scoring, and grade recording. Its item-generation process collected articles from Project Gutenberg (http://promo.net/pg/) and Taiwan Panorama magazine (http://www.sinorama.com.tw/) on the internet, preprocessed them with natural language processing techniques, and saved the sentences in a corpus. Test editors could specify the answer or the difficulty parameter of an item, retrieve sentences matching those conditions from the corpus, and generate distractors from the information provided by electronic dictionaries and word classification. According to the frequency ranking of the answer word, difficulty was broadly classified as hard, medium, or easy. Kao's method still could not ensure that the distractors would be confusable with the answer, because there was no mechanism for examining them. In addition, because the parts of speech of words were not monitored, the tenses of distractors and answers could be inconsistent.
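A minimal sketch of this frequency-based difficulty classification might look as follows (the rank cutoffs are hypothetical, not Kao's actual values):

    def difficulty_by_frequency_rank(rank, easy_cutoff=2000, medium_cutoff=6000):
        # Frequent (low-rank) answer words yield easy items; rare words,
        # hard ones.
        if rank <= easy_cutoff:
            return "easy"
        if rank <= medium_cutoff:
            return "medium"
        return "hard"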

Mitkov (2003) developed a system that automatically generated English multiple-choice tests, in which the stems were question sentences and the answers were limited to nouns or noun phrases because the corpus was not large. Mitkov used electronic teaching materials as the item source, took nouns or noun phrases from fact-stating sentences as the answers, transformed the original statements into interrogative sentences to serve as question stems, and finally selected words or phrases semantically similar to the answer as distractors. For instance, from the sentence "A prepositional phrase at the beginning of a sentence constitutes an introductory modifier," we could take "introductory modifier" as the answer, then change the statement into an interrogative sentence to obtain the following item:


What does a prepositional phrase at the beginning of a sentence constitute?

1. a modifier that accompanies a noun
2. an associated modifier
3. an introductory modifier
4. a misplaced modifier

Mitkov's experimental results indicated that items generated by this algorithm did not differ significantly from human-written items in difficulty and discrimination. Moreover, the quality of the distractors improved, and item production was several times faster than manual writing.
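As a toy sketch of the stem-transformation step (a single regex over one verb pattern; Mitkov's actual system relied on full parsing and semantically informed distractor selection), consider:

    import re

    def declarative_to_question(sentence):
        # Handle only the pattern "<subject> constitutes <answer>."
        m = re.match(r"(?P<subj>.+?) constitutes (?P<ans>.+?)\.$", sentence)
        if m is None:
            return None
        subj = m.group("subj")
        subj = subj[0].lower() + subj[1:]   # lower-case the leading article
        return "What does %s constitute?" % subj, m.group("ans")

    stem, answer = declarative_to_question(
        "A prepositional phrase at the beginning of a sentence "
        "constitutes an introductory modifier.")
    # stem   -> "What does a prepositional phrase ... constitute?"
    # answer -> "an introductory modifier"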

The following paragraphs discuss studies on the application of AIG in teaching. Li (2011) developed a Chinese idiom practice system based on an ontology. He first analyzed information from multiple sources in the literature to construct an idiom ontology. After summarizing the common reasons that students misuse idioms when making sentences, he designed a diagnosis mechanism implementable on computers. Finally, he constructed the ontology-based Chinese idiom practice system using a system-development methodology.

Li's system could provide online idiom teaching materials and automatically generate true/false, multiple-choice, and matching questions. Besides, it supported situational sentence making, in which learners use idioms and words to form a complete sentence for a given situation, as a way to practice idioms. The system could also judge the answers, diagnose whether a sentence was reasonable, and provide immediate feedback targeting the learner's blind spots in idiom usage. Li's system overcame the limitation of existing online idiom practice systems, which could only draw questions from a preset item bank, and also provided an environment for applying idioms in situational sentence making.

As for system performance and satisfaction, the evaluation results showed that the system could improve students' learning, especially for middle- and low-achieving students. In using idioms to make sentences, students assisted by the system showed a significantly higher level of correctness in the syntax and meaning of idioms than those who received traditional instruction. Students also found the system easy to use and useful, and were satisfied with it. Finally, regarding feasibility, the interviewed teachers all considered the system applicable to students' self-study of idioms.

Jiang (2010) proposed a template-based automatic item-generation system. Each item template specified a subject, verb, object, a question sentence, and variable numbers, and items were produced by substituting different words and values into these template elements. To keep the generated sentences smooth and fluent, the amount of manual intervention was reduced and the rationality of the combinations of elements was considered. The proposed system could quickly create a series of items, reducing the time and manpower costs of item writing.
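A minimal sketch of the idea (the template format, word lists, and rationality check below are hypothetical, not Jiang's actual design) is:

    import itertools
    import random

    TEMPLATE = ("{subj} {verb} {num} {obj}. How many {obj} does {subj} "
                "have left after giving away {given}?")
    SUBJECTS = ["Mary", "Tom"]
    VERBS = ["buys", "picks"]
    OBJECTS = ["apples", "oranges"]

    def generate_items(k=3):
        items = []
        for subj, verb, obj in itertools.product(SUBJECTS, VERBS, OBJECTS):
            num = random.randint(5, 9)
            given = random.randint(1, num - 1)  # rationality: cannot give
                                                # away more than you have
            stem = TEMPLATE.format(subj=subj, verb=verb, num=num,
                                   obj=obj, given=given)
            items.append((stem, num - given))   # (question stem, answer key)
            if len(items) == k:
                break
        return items

One template thus yields many surface-distinct items, which is where the savings in time and manpower come from.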

Next, the application of AIG to figural tests is discussed, in the hope that it can inform the system development of this study. Liu, Liang, and Lin (2001) have researched computerized adaptive figural testing since 1998. Their research is based on an analysis of the structure of Raven's Advanced Progressive Matrices (APM). In addition, Lin was responsible for developing the New Figure Reasoning Test (NFRT).

NFRT contains two main systems: the automatic item-generation system and the online testing system. The online testing system, based on IRT, is essentially an interface for administering items and estimating examinees' ability. The focus here is the automatic item-generation system, discussed in the following paragraphs.

The IRT parameters of APM in Lin’s study are discussed as follows:

1. Difficulty: According to Hambleton and Swaminathan (1985), item difficulty parameters typically range between -2.0 and 2.0. By this criterion, the difficulty of the APM items fell between -2.0 and 2.0, with a mean of -.868. An example APM item is shown in Figure 2-5.


2. Discrimination: For ability tests, the discrimination parameter should be greater than 0; in APM the values were relatively low, and item 8 had the lowest discrimination (.014).

3. Guessing: The estimated average guessing parameter of the APM items was .219. Since each APM item has 8 choices, the expected value under random guessing would be 12.5% (.125), so the average estimated guessing value was higher than expected. The model relating these three parameters is sketched after Figure 2-5.

Figure 2-5. An example item of APM (Liu et al., 2001)
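For reference, the three parameters above are those of the standard three-parameter logistic (3PL) IRT model; the following is an illustrative sketch of that standard formula, not code from Lin's system:

    import math

    def p_correct(theta, a, b, c):
        # 3PL model: P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))
        # a = discrimination, b = difficulty, c = guessing parameter.
        return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

    # With the APM averages reported above (b = -.868, c = .219), the success
    # probability of a very low-ability examinee approaches the guessing
    # floor c.
    print(p_correct(theta=-6.0, a=1.0, b=-0.868, c=0.219))  # ~0.224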

The automatic item-generation system contains an item-generation algorithm and an item-generation engine based on APM. The functions, strengths, and restrictions of this system are described as follows.

1. Item generation engine: The engine can automatically generate a specific item with particular content features, combining different types of geometric figures in a systematic fashion to produce items that match the measurement goal. The purpose of the measurement was to evaluate examinees' reasoning ability in induction (inference of relations) and deduction (inference of correlates) through the figure-partition characteristics of the item and the manipulation of the spatial relationships between figures. An example APM item is shown in Figure 2-5.

2. Item generation algorithm: The item-generation algorithm was based on an analysis of the features of APM items. Its key considerations were the IRT parameters and the problem-solving processes of APM. A rough illustration of what such an engine does is sketched below.
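As a rough illustration only (a toy sketch with made-up transformation rules, not Lin's actual algorithm), a matrix item can be generated by applying attribute rules across rows and columns and perturbing the key to obtain distractors:

    import random

    SHAPES = ["circle", "square", "triangle"]

    def generate_matrix_item():
        # Each cell is (shape, count): the shape changes down the columns
        # and the count increases across the rows.
        matrix = [[(SHAPES[row], col + 1) for col in range(3)]
                  for row in range(3)]
        key = matrix[2][2]            # the bottom-right cell becomes the blank
        matrix[2][2] = None
        distractors = [(key[0], key[1] - 1),   # wrong count (too few)
                       (key[0], key[1] + 1),   # wrong count (too many)
                       (SHAPES[0], key[1])]    # wrong shape
        options = distractors + [key]
        random.shuffle(options)
        return matrix, options, key

In a real engine such as Lin's, the choice of rules and figure types would additionally be steered by target IRT parameters rather than fixed as here.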