1.1 Background
For many years, educational assessment has played an important role in teaching
and learning (Gronlund, 1993). It can evaluate the effectiveness of teaching, diagnose
the state of learning, and help the development of students’ learning (Chen, Lee, &
Chen, 2005; Chen & Chung, 2008; Johns, Hsingchin, & Lixun, 2008; Barla et al.,
2010). With the development of computers and the Internet, Computer Adaptive Testing (CAT) is now a developing way to administer tests that adapt to learners' knowledge or competence in language learning (Troubley, Heireman, & Walle, 1996). Based on
adaptive tests, examinees’ abilities can be more accurately measured by fewer suitable
questions (Weiss & Kingsbury, 1984; Van der Linden & Glas, 2000); moreover, student performance has also been shown to improve (Barla et al., 2010). CAT can not only provide questions but also be combined with scaffolding hints and instructional feedback (Feng, Heffernan, & Koedinger, 2010). This facilitates students' learning and
helps them acquire knowledge with external help. However, when a great number of questions is needed, there is a shortage of assessment resources because it is time-consuming and cost-intensive for human experts to manually produce questions.
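The adaptive logic behind CAT can be pictured with a minimal sketch. Under a two-parameter logistic (2PL) IRT model, the next question is the unadministered item that carries the most Fisher information at the examinee's current ability estimate, which is why fewer, better-targeted questions suffice. The item bank and parameter values below are hypothetical illustrations, not part of any cited system:

```python
import math

def p_correct(theta, a, b):
    """2PL IRT: probability that an examinee with ability theta answers
    an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta, bank, administered):
    """Maximum-information item selection over the unadministered items."""
    candidates = [item for item in bank if item["id"] not in administered]
    return max(candidates, key=lambda i: item_information(theta, i["a"], i["b"]))

# A toy item bank (hypothetical parameters).
bank = [
    {"id": 1, "a": 1.2, "b": -1.0},  # easy item
    {"id": 2, "a": 1.0, "b": 0.0},   # medium item
    {"id": 3, "a": 1.5, "b": 1.5},   # hard item
]
print(pick_next_item(0.1, bank, administered={1})["id"])
```

After each response, the ability estimate would be updated (e.g., by maximum likelihood) and the selection repeated; operational CAT systems also add exposure control and stopping rules.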
In recent years, there has been increasing attention to computer-aided question
generation (also called automatic question generation or automatic quiz generation) in
the field of e-learning and Natural Language Processing (NLP). It is useful in multiple subareas and has been proposed for use in generating instructions in tutoring systems (Mostow & Chen, 2009), assessing domain knowledge (Mitkov & Ha, 2003), evaluating language proficiency (Brown, Frishkoff, & Eskenazi, 2005), assisting academic writing (Liu, Calvo, Aditomo, & Pizzato, 2012), and question answering (Pasca, 2011).
In order to make learning environments more effective and efficient, many researchers have been exploring the possibility of automatic question generation in various contexts. For example, applications in a wide variety of domains, such as Linguistics (Mitkov, Ha, & Karamanis, 2006) and Biology (Agarwal & Mannem, 2011), have identified important concepts in textbooks and generated multiple-choice and gap-fill questions. In the domain of language learning, a growing number of studies (Turney, 2001; Turney, Littman, Bigham, & Shnayder, 2003; Liu, Wang, Gao, & Huang, 2005; Sumita et al., 2005; Lee & Seneff, 2007; Lin, Sung, & Chen, 2007; Pino, Heilman, & Eskenazi, 2008; Smith, Avinesh, & Kilgarriff, 2010) now cover not only drills and exercises, including vocabulary, grammar, and reading questions, but also formal exams, including SAT (Scholastic Aptitude Test) analogy questions and the TOEFL (Test of English as a Foreign Language) synonym task. To support academic writing,
Liu et al. (2012) used Wikipedia and the conceptual graph structures of research papers
and generated specific trigger questions for supporting literature review writing.
Several studies have addressed the benefits of facilitating learning and teaching with automatic question generation. The use of computer-aided question generation for educational purposes was motivated by research on reading comprehension, which consistently found that assessment is helpful in learning and enhances learners' retention of material (Anderson & Biddle, 1975). Mitkov et al. (2006) demonstrated that computer-aided question generation was more time-efficient than manual labor. Turney et al. (2003) showed that the generated SAT and TOEFL questions are comparable to those generated by experts. Liu et al. (2012) found that the generated trigger questions were more useful than manually written generic questions and that the questions could prompt students to reflect on key concepts, because the questions were generated based on what students read. With the advantage of automatic question generation, students can practice without waiting for a teacher to compose a quiz, and teachers can spend more time on teaching; moreover, besides evaluating students' understanding, automatic question generation can be designed with additional functions.
1.2 Research problem
Recent theories on learning have focused increasing attention on understanding
and measuring student ability. There is now a general consensus on Vygotsky's (1978)
observation that a learner’s ability in the Zone of Proximal Development (ZPD)—the
difference between a learner’s actual ability and his or her potential development—can
progress well with external help. Instructional scaffolding (Wood, Bruner, & Ross,
1976), closely related to the concept of ZPD, suggests that appropriate support during
the learning process helps learners achieve their learning goals. However, effective
instructional support requires identifying students' prior knowledge, tailoring assistance
to meet their initial needs, and then removing this aid when they acquire sufficient
knowledge.
Even though previous studies in the field of computer-aided question generation automatically generate all possible questions based on their proposed approaches in an attempt to reduce the time and monetary cost of manual question generation, such an exhaustive list of questions is inappropriate for language learning, because it can lead to redundant, over-simplistic test questions that are unsuitable for evaluating student progress. Moreover, it is hard to achieve meaningful test purposes and maximize examinees' learning outcomes because personalized design (Fehr et al., 2012; Hsiao, Chang, Chen, Wu, & Lin, 2013; Wu, Su, & Liu, 2013) is still critically lacking.
1.3 Research purpose
This work is intended to provide personalized computer-aided question generation for formative assessment to assess students' receptive skills in English as a foreign or second language. It generates three question types, namely vocabulary, grammar, and reading comprehension, and differs from previous studies in that learners' language proficiency levels are considered in the generation process and questions are generated with assigned difficulties. Here, "personalization" refers to the adjustment to learner needs by matching the difficulty of questions to their knowledge level. In other words, questions are generated based on an individual's ability even though students read the same learning material.
This work, the personalized computer-aided question generation, is based on a concept related to the age of acquisition (AOA). The basic idea of age of acquisition is the age at which a word, a concept, or even specific knowledge is acquired. For instance, people learn some words such as "dog" and "cat" before others such as "calculus" and "statistics". Numerous studies in psychology and cognitive science have shown the positive influence of age of acquisition on cognitive processes, such as object recognition (Urooj et al., 2013), object naming (Carrolla & Whitea, 1973; Morrison, Ellis, & Quinlan, 1992; Alario, Ferrand, Laganaro, New, Frauenfelder, & Segui, 2005; Davies, Barbón, & Cuetos, 2013), and language learning (Brysbaert, Wijnendaele, & Deyne, 2000; McDonald, 2000; Izura & Ellis, 2002; Zevin & Seidenberg, 2002). Today, with the vast amount of content available from the Web and other digital resources, this concept can be realized with advanced technologies, Information Retrieval (Baeza-Yates & Ribeiro-Neto, 1999; Manning, Raghavan, & Schütze, 2008) and Natural Language Processing (Manning & Schütze, 1999), which count word frequencies and calculate the probability that a word is acquired at a certain school grade given a group of documents. With a large enough resource, such as an extensive collection of all the learning materials that people read and learn from, the acquisition grade distributions can be computed and implemented. For example, based on textbooks authored specifically for students at grade level six, questions can be generated from concepts in these textbooks; depending on whether a student answers these questions correctly, the student can be said to either have or lack the skills of grade level six. This relies on the premise that learning materials, such as textbooks, are written with the intent to represent what learners at a certain grade level learn and acquire. Two works related to this concept are a readability prediction (Kidwell, Lebanon, & Collins-Thompson, 2011), which mapped a document to a numerical value corresponding to a grade level based on the distribution of acquisition ages, and a word difficulty estimation (Kireyev & Landauer, 2011), which modeled language acquisition with Latent Semantic Analysis to compute the degree of knowledge of words at different learning stages.
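The counting idea above can be sketched concretely: given materials grouped by the grade at which they are taught, a word's relative frequency at each grade can be normalized into a distribution over grades. The tiny corpus and the normalization rule below are hypothetical simplifications; the cited works use considerably more sophisticated models:

```python
from collections import Counter

# Hypothetical graded corpus: grade level -> words from textbooks at that grade.
graded_corpus = {
    1: "the dog ran the cat sat".split(),
    3: "the dog chased the ball quickly".split(),
    6: "statistics describe the data distribution".split(),
}

def acquisition_distribution(word, corpus):
    """Normalize a word's relative frequency per grade into a probability
    distribution over grades (a crude acquisition grade distribution)."""
    freqs = {g: Counter(words)[word] / len(words) for g, words in corpus.items()}
    total = sum(freqs.values())
    if total == 0:
        return {}  # the word never occurs in the corpus
    return {g: f / total for g, f in freqs.items()}

print(acquisition_distribution("dog", graded_corpus))
```

A word like "dog" that appears only in low-grade materials yields a distribution concentrated on early grades, matching the intuition that it is acquired early.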
In response to the personalized design based on the acquisition grade distributions, we propose a personalized automatic quiz generation to generate multiple-choice questions with varying difficulty, a reading difficulty estimation to predict the difficulty level of an article for learners of English as a foreign language, as well as an interpretable and statistical ability estimation to estimate a student's ability with the inherent randomness of the acquisition process, specifically in a Web-based learning environment, as shown in Figure 1.
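The personalization step can be pictured, under purely illustrative assumptions, as matching question difficulty to the learner's estimated grade level. The question pool, grade labels, and the nearest-difficulty rule below are a hypothetical sketch rather than the proposed system's actual algorithm:

```python
def select_questions(ability_grade, pool, k=3):
    """Pick the k questions whose difficulty grade is closest to the
    learner's estimated grade level (nearest-difficulty matching)."""
    return sorted(pool, key=lambda q: abs(q["grade"] - ability_grade))[:k]

# A hypothetical pool of generated questions with assigned difficulty grades.
quiz_pool = [
    {"id": "v1", "type": "vocabulary", "grade": 2},
    {"id": "g1", "type": "grammar", "grade": 5},
    {"id": "r1", "type": "reading", "grade": 6},
    {"id": "v2", "type": "vocabulary", "grade": 9},
]
print([q["id"] for q in select_questions(5.5, quiz_pool, k=2)])
```

Two learners reading the same article would thus receive different questions, because selection is driven by each learner's estimated ability rather than by the text alone.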
The purpose of personalized testing is not only to measure the achievement performance of students, but also to help them improve their own learning process and correct their mistakes by understanding what they have and have not yet learned. Through this approach, students can read any materials online and then do more exercises to understand their strengths and to improve their weaknesses, as a strategy to guide them toward language acquisition.
Figure 1 The architecture of the personalized computer-aided question generation.
The main research questions addressed in this study are:
(1) Does the proposed personalized design with the appropriate instructional
scaffolding help students advance their learning progress?
(2) Does the proposed personalized question selection help students correct their
unclear concepts?
(3) What are students' perceptions of and experiences with the proposed personalized
computer-aided question generation?
We also conduct simulation and empirical evaluations to investigate the following properties:
(4) What are the representative features of the proposed reading difficulty
estimation in English as a foreign or second language?
(5) How does the performance of the proposed reading difficulty estimation
compare with that of other reading difficulty estimations?
(6) What are the characteristics of the proposed ability estimation based on the
quantiles of acquisition grade distributions and item response theory?
(7) How does the performance of the proposed ability estimation compare with
that of other ability estimations?
(8) How does the proposed ability estimation perform with empirical data in a
Web-based learning environment?
The rest of this article is organized as follows. Chapter 2 describes related work.
In Chapter 3, we present the design of automatic quiz generation and the mechanism
for assigning question difficulty. Chapter 4 outlines the personalization framework,
consisting of reading difficulty estimation, ability estimation and quiz selection. In
Chapter 5 and Chapter 6, we present simulation evaluations of reading difficulty estimation and ability estimation, respectively. Chapter 7 evaluates the effectiveness of personalized computer-aided question generation in the empirical study. Finally, Chapter 8 summarizes the contributions, limitations, and potential applications.