In this chapter, we present the personalized quiz strategy based on automatic quiz
generation. The strategy has three purposes. First, we build a model to estimate
reading difficulty and investigate the optimal combination of features for improving
that estimate. Second, an examinee’s grade level is estimated from both the current
test responses and his or her historical data; in contrast, previous work considered
only the current test responses. Finally, questions are selected according to not only
their difficulty but also the concepts that the examinee’s previous incorrect responses
showed to be unclear. A student’s previous mistakes are recorded and revisited in
order to confirm whether he or she has learned the concept. Through this iterative
practice, students’ understanding is enhanced as they absorb many different reading
materials.
5.1 Reading difficulty estimation
As mentioned above, almost all past work was designed for native readers and built on
articles written for native readers. For second language learners, however, word
difficulty depends on the structure of the material they study, not on a word’s
popularity in the real world. In this section, we design a reading difficulty
estimation for second language learners. We investigate the effectiveness of several
meaningful lexical and grammatical features from early work, and then further consider
organized grading indices of vocabulary from different sources, as well as grammar
patterns collected from textbooks. These represent the words and grammar patterns that
language learners have acquired at various grade levels, that is, the word and grammar
acquisition grade distributions. Furthermore, we propose features that take word sense
and coreference resolution into consideration.
Let D represent a document and S a sentence in D. Suppose that D has n sentences,
s_1, s_2, …, s_n, so that D = {s_1, s_2, …, s_n}. Let W be the set of words in D.
Suppose D has m distinct words, w_1, w_2, …, w_m, so that W = {w_1, w_2, …, w_m}. We
further suppose that a sentence S has k words, w_1, w_2, …, w_k, so that S =
{w_1, w_2, …, w_k}, m > k. For a given training data set, the features are extracted
and passed to a linear regression procedure, which yields a linear model containing a
weight for each feature. The linear model is then applied to a document to estimate its
difficulty level. The following sections explain and define the features used in the
proposed estimation.
5.1.1 Baseline features
Word Number: A basic assumption is that a longer document is more difficult
than a shorter one. Almost all prior work assumed that the number of words in a
document accurately estimates reading difficulty (Flesch, 1948; Dale and Chall, 1948;
Gunning, 1952; McLaughlin, 1969; Coleman and Liau, 1975; Kincaid et al., 1975).
Pitler and Nenkova (2008) pointed out that this feature is significantly correlated with
readability. For second language learners, we assume that a longer document takes
more time to read. Therefore, the number of words in a document is used in this
study as one of the features to estimate reading difficulty. Word number is defined as
follows:
word\_number = \log |D| \quad (1)
where |D| is the number of words in document D.
Sentence length: Past studies have also taken sentence length into account,
assuming that a shorter sentence is easier than a longer one (Flesch, 1948; Dale and Chall,
1948; Gunning, 1952; McLaughlin, 1969; Coleman and Liau, 1975; Kincaid et al.,
1975). Thus for each document, we consider the average number of words per sentence
as sentence length. Sentence length is defined as follows:
sentence\_length = \frac{|D|}{n} \quad (2)
where |D| is the number of words in the document and n is its number of sentences.
Syllables: A syllable is a unit of organization for a sequence of speech sounds.
For example, the word water is composed of two syllables: wa and ter. Some related
work has also taken syllables into consideration (Flesch, 1948; Gunning, 1952;
McLaughlin, 1969; Kincaid et al., 1975). One notable example is the SMOG formula
(McLaughlin, 1969), which estimates the reading difficulty of a document using only
the average number of polysyllables per sentence.
Even though syllables have proven to be a useful measure of reading difficulty for
first-language users, similarities between the sounds of a speaker’s mother tongue
and those of the second language can affect second-language learning. For instance,
a word in an Asian language usually has one syllable, while a word in a Western
language often has several. A second-language learner could use similar-sounding
syllables from their first language to learn vocabulary (called the L1-phonology
effect hypothesis; Yamada, 2004). We assume that the number of syllables in a word may
affect the difficulty of a document. Thus, we use the average number of syllables per
word in a document to measure reading difficulty. The syllable difficulty of a
document is defined as follows:
syllables = \frac{1}{|D|} \sum_{i=1}^{|D|} word\_syllables_i \quad (3)
where word\_syllables_i is the number of syllables within word i.
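The three baseline features can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the document is assumed to arrive already split into tokenized sentences, and the vowel-group syllable counter is a rough stand-in introduced here for illustration, not the syllable counting used in this study.

```python
import math
import re


def count_syllables(word):
    """Rough heuristic (an assumption for illustration): one syllable per
    group of consecutive vowels, with a minimum of one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def baseline_features(sentences):
    """sentences: a non-empty document given as a list of tokenized sentences."""
    words = [w for sentence in sentences for w in sentence]
    n_words = len(words)          # |D|, the number of words
    n_sentences = len(sentences)  # n, the number of sentences
    return {
        # log of the number of words in the document
        "word_number": math.log(n_words),
        # average number of words per sentence
        "sentence_length": n_words / n_sentences,
        # average number of syllables per word
        "syllables": sum(count_syllables(w) for w in words) / n_words,
    }


# Toy document with two sentences.
doc = [["The", "water", "is", "cold"], ["We", "swim"]]
features = baseline_features(doc)
```

The heuristic counter miscounts words with silent vowels, so a pronunciation dictionary would be a better choice in practice.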
5.1.2 The word acquisition grade distributions features
It is crucial to understand when a word is acquired by target readers. Kireyev and
Landauer (2011) tried using latent semantic analysis to capture word difficulty.
Even though no existing dictionary presents the word acquisition distribution
across school grades, second language learners learn vocabulary within a limited range,
which is usually decided by experts or teachers. Similar to Wan et al. (2010), we build
two word grading indices. This helps better identify the word acquisition grade
distributions resulting from random draws from the population of second language
learners. As in Section 4.1, two resources, the General English Proficiency Test
Reference Vocabulary and the Vocabulary Quotient, are used to estimate the word
acquisition grade distributions in our study.
GEPT Word Lists: The General English Proficiency Test (GEPT; Wu and Liao,
2010) is designed to evaluate student proficiency in English as a second language. It
provides a reference vocabulary list with about 8,000 words divided into three word
levels: elementary (gept1), intermediate (gept2) and high-intermediate (gept3). Words
not found in the GEPT word list are assigned to an out-of-list level (gept0). For each
word in a document, we identify its vocabulary difficulty by looking up the word’s
level in the GEPT word lists. We then count the number of distinct words in each level
and normalize each count by the total number of distinct words in the document.
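A minimal Python sketch of this grade distribution feature follows. The four-entry word list is a hypothetical stand-in for the GEPT reference vocabulary, and the sketch assumes that each level count is divided by the number of distinct words in the document.

```python
from collections import Counter

# Hypothetical mini word list; the real reference list has about 8,000 words.
GEPT_LEVELS = {
    "water": "gept1",
    "river": "gept1",
    "bank": "gept2",
    "erosion": "gept3",
}
LEVELS = ("gept0", "gept1", "gept2", "gept3")


def gept_distribution(words):
    """Share of a (non-empty) document's distinct words that fall in each level."""
    distinct = {w.lower() for w in words}
    # Words missing from the list fall into the out-of-list level gept0.
    counts = Counter(GEPT_LEVELS.get(w, "gept0") for w in distinct)
    total = len(distinct)  # assumed normalizer: distinct words in the document
    return {level: counts[level] / total for level in LEVELS}


dist = gept_distribution(["Water", "water", "bank", "erosion", "fjord"])
```

The same function shape would apply to any other graded word list by swapping the lookup table and the level names.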
Age of Word Acquisition: In addition to the GEPT, we also collected a word list
from an organization, Vocabulary Quotient (VQ; Ho and Huong, 2011). This
organization collected more than 10,000 words and labeled them in reference to other
educational institutions, such as the Elementary School Reference Vocabulary and the
Junior High School English Reference Vocabulary texts made by the Ministry of Education
of Taiwan, and the High School English Reference Vocabulary text made by the College
Entrance Examination Center of Taiwan. The word list is divided into fourteen levels
(vq3-vq16), which represent the words learned by second language learners from
elementary school to college. Just as with the GEPT list, some words are absent
from the Vocabulary Quotient word list; these words are assigned to an
out-of-vocabulary level (vq0). For each word in a document, we identify its difficulty
by looking up its level in the word list. We then count the number of distinct words in
each level and normalize each count by the total number of distinct words in the
document.
5.1.3 Frequency features
Besides the word acquisition grade distributions features, word frequency is
another approach to estimating word difficulty. Word frequency is based on the
assumption that words used more frequently are easier for readers. In addition to the
word acquisition grade distributions, we look up each word’s frequency in the BNC
corpus and also use a Google search result count as an alternative frequency, for every
word in a document.
Word Frequency in BNC Corpus: The British National Corpus (BNC; Lou and
Guy, 1998) is a 100 million word collection of written and spoken language from a
wide range of sources, designed to represent a wide cross-section of British English
from the later 20th century. For each word in a document, we calculate the distinct
word frequency (wf) that refers to the times it appears in the BNC corpus. Word
frequency is defined as follows:
wf_i = \frac{n_i}{|d_j|} \quad (4)
where n_i is the number of occurrences of the considered distinct word w_i in
document d_j, and the denominator is the sum of the number of occurrences of all
distinct words in document d_j, that is, the size of the document |d_j|. For each word
in a given document, we also take the log of its word frequency and average these
values. The document’s difficulty value based on word frequency in the BNC corpus is
defined as follows:
bnc\_frequency = \frac{1}{|D|} \sum_{i=1}^{|D|} \log wf_i \quad (5)
Google Search Result Count: For a given query, Google returns a list of
documents containing the queried words and a search result count. We use the search
result count as a measure of word frequency, like the word frequency from a corpus.
For each word in a given document, we again take the log of the frequency and average
these values. The document’s difficulty value based on word frequency from Google is
defined as follows:
google\_frequency = \frac{1}{|D|} \sum_{i=1}^{|D|} \log google_i \quad (6)
where google_i is the search result count of word i from Google.
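Both frequency features reduce to an average of log frequencies over the words of a document, which the following Python sketch illustrates. The count table is hypothetical, raw counts are used where the text uses BNC frequencies or search result counts, and the add-one smoothing for unseen words is an assumption introduced here so that the logarithm stays defined.

```python
import math


def average_log_frequency(words, counts, smoothing=1):
    """Average of log(count + smoothing) over the words of a document.

    `smoothing` is an assumption added for illustration: it keeps words
    that are missing from `counts` (count 0) from producing log(0).
    """
    logs = [math.log(counts.get(w.lower(), 0) + smoothing) for w in words]
    return sum(logs) / len(logs)


# Hypothetical counts standing in for corpus frequencies or result counts.
toy_counts = {"the": 6_000_000, "river": 9_000, "bank": 20_000}
score = average_log_frequency(["The", "river", "bank"], toy_counts)
```

A higher score indicates that the document uses more common words on average.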
5.1.4 Parse features
Syntactic constructions affect the understanding of a sentence. We assume that
the more complicated a sentence, the greater its difficulty. Schwarm and Ostendorf
(2005) proposed four syntactic features for their measure of reading difficulty: the
average parse tree height, the average number of noun phrases per sentence, the average
number of verb phrases per sentence, and the average number of subordinate clauses
per sentence (SBAR). Sentences with multiple noun phrases require the reader
to remember more entities, and Barzilay and Lapata (2008) found that documents written
for adults tended to contain more noun phrases than those written for children. In
addition, while including more verb phrases in each sentence increases sentence
complexity, adults might prefer to have related clauses explicitly grouped together.
Pitler and Nenkova (2008) also found a strong correlation between readability and the
number of verb phrases. These works show that the more complicated the parse
features in a document, the more likely it was written for adults. Hence, we also
examine the influence of parse features for second language learners.
Prepositions are a class of words that indicate relationships between nouns,
pronouns and other words in a sentence. Prepositions can be divided into two kinds:
simple prepositions and compound prepositions. Simple prepositions are single-word
prepositions, while compound prepositions consist of more than one word. We assume that
more prepositional phrases in a sentence also increase its complexity, and second
language learners might be confused by complex prepositional phrases. Thus, in addition
to the parsing features proposed by Schwarm and Ostendorf (2005), we also present
the average number of prepositional phrases as a new feature to capture grammatical
complexity.
In summary, for a document we consider the following syntactic features, computed
from parse results generated by the Stanford parser (Klein and Manning, 2003):
the average parse tree height, the average number of noun phrases, the average number
of verb phrases, the average number of SBAR and the average number of prepositional
phrases.
Average Parse Tree Height: Suppose the parse tree of sentence i has height h_i.
The average parse tree height difficulty of a document is defined as follows:
parse\_height = \frac{1}{n} \sum_{i=1}^{n} h_i \quad (7)
Average Number of Noun Phrases: Suppose sentence i has np_i noun phrases.
The average noun phrase difficulty of a document is defined as follows:
noun\_phrases = \frac{1}{n} \sum_{i=1}^{n} np_i \quad (8)
Average Number of Verb Phrases: Suppose sentence i has vp_i verb phrases. The
average verb phrase difficulty of a document is defined as follows:
verb\_phrases = \frac{1}{n} \sum_{i=1}^{n} vp_i \quad (9)
Average Number of SBAR: Subordinating conjunctions, for example because,
unless, even though, and until, are placed at the beginning of a subordinate clause
(SBAR) and link the subordinate clause to the main clause. SBAR is an indicator
of sentence complexity. Suppose sentence i has sbar_i subordinate clauses. The SBAR
difficulty of a document is defined as follows:
sbar = \frac{1}{n} \sum_{i=1}^{n} sbar_i \quad (10)
Average Number of Prepositional Phrases: Suppose sentence i has pp_i
prepositional phrases. The average prepositional phrase difficulty of a document is
defined as follows:
prepositional\_phrases = \frac{1}{n} \sum_{i=1}^{n} pp_i \quad (11)
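The five parse features of this subsection are all per-sentence averages over parse trees, which the following Python sketch illustrates. It assumes the parser output is available as Penn Treebank style bracketed strings; the small bracket reader and the height convention (a word has height 0, its part-of-speech node height 1) are assumptions made for illustration, and the two example parses are hand-written rather than real parser output.

```python
def parse_tree(text):
    """Read a bracketed parse into nested (label, children) tuples.
    Leaves (the words) are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # skip ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def height(node):
    """Assumed convention: a word has height 0."""
    if isinstance(node, str):
        return 0
    return 1 + max(height(child) for child in node[1])


def count_label(node, label):
    """Number of constituents with the given label in the tree."""
    if isinstance(node, str):
        return 0
    return (node[0] == label) + sum(count_label(c, label) for c in node[1])


def parse_features(parses):
    """Average height and average NP, VP, SBAR and PP counts per sentence."""
    trees = [parse_tree(p) for p in parses]
    n = len(trees)
    feats = {"height": sum(height(t) for t in trees) / n}
    for label in ("NP", "VP", "SBAR", "PP"):
        feats[label.lower()] = sum(count_label(t, label) for t in trees) / n
    return feats


# Hand-written example parses (not actual parser output).
parses = [
    "(S (NP (PRP She)) (VP (VBD left) (SBAR (IN because) "
    "(S (NP (PRP it)) (VP (VBD rained))))))",
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) "
    "(NP (DT the) (NN mat)))))",
]
feats = parse_features(parses)
```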
5.1.5 The grammar acquisition grade distributions features
Heilman et al. (2007) found that grammatical features played an important role in
reading difficulty estimation for second-language learners. A model with
complex syntactic grammatical feature sets achieved more accurate results than
simpler models. In their work, they examined the ratio of grammatical occurrences per
100 words: both the passive voice and the past participle showed obvious differences
between the lowest and highest levels in the second-language corpus. Thus, we measure
grammatical difficulty as a linguistic processing factor in estimating reading
difficulty for second language learners.
Grading Index of Grammar (grammar1-grammar6): To decide the grammatical
difficulty level of a document, we use the same method described in Section 4.2. We
first collected sentences from the six versions of second-language textbooks and parsed
the sentences to find their grammar patterns, for a total of 44 grammar patterns.
Manually identifying these grammar patterns allows the parse tool to then automatically
find the same patterns within a given document. Next, using a parse tree structure
searching tool (Levy and Andrew, 2006), the grammatical structures were assigned to
the textbook grade in which they frequently appear.
5.1.6 Semantic features
For any given word, its meaning may vary broadly depending on the context. For
example, the word “bank” has two distinct meanings (also called two senses),
“financial institution” and “sloping mound”, not to mention its other colloquial uses.
For both the word acquisition grade distributions and frequency features, we assume
that a word has only one sense, because this still results in accurate performance with
many language technologies, such as information retrieval or text classification.
However, it cannot be claimed that a second language learner who has learned a word
knows every sense of the word. Therefore, we designed semantic features to identify
word senses in a document.
Average Number of WordNet Synsets: We adopted WordNet (Miller et al., 1990)
as a resource for understanding the senses of a word. WordNet is a large lexical
database of English. The database contains 155,287 words, with each word annotated with
its senses. A set of near-synonyms is defined as a synset, which represents a concept
of a word.
For each word in a document, we total the number of the word’s synsets using
WordNet. To represent this feature, we develop seven categories
(wordnet1-wordnet7) that reflect the number of synsets of each word in a
document. Here, suppose a word has ws_i synsets. The number is normalized by taking
its square root and then rounded down to an integer, which serves as the feature index.
For example, if the number of synsets of a word is 17, the square root is about 4.12
and the word is assigned to wordnet4. If the number of synsets of a word is 49 or
greater, it is assigned to wordnet7. Finally, we count the number of distinct words in
each WordNet category and normalize by the total number of distinct words.
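The category assignment can be sketched in a few lines of Python. The synset counts in the example are made-up values rather than real WordNet lookups, and placing words with fewer than one synset in wordnet1 is an assumption, since the text does not cover that case.

```python
import math
from collections import Counter


def wordnet_category(n_synsets):
    """Map a synset count to wordnet1..wordnet7 via floor(sqrt(count))."""
    index = math.isqrt(n_synsets)   # floor of the square root
    # Upper clamp follows the text (large counts go to wordnet7); the lower
    # clamp to wordnet1 is an assumption for words with no synsets.
    index = min(max(index, 1), 7)
    return f"wordnet{index}"


def synset_distribution(synset_counts):
    """synset_counts maps each distinct word of a document to its synset count."""
    counts = Counter(wordnet_category(n) for n in synset_counts.values())
    total = len(synset_counts)
    return {f"wordnet{i}": counts[f"wordnet{i}"] / total for i in range(1, 8)}


# Made-up synset counts for four distinct words.
toy = {"alpha": 17, "beta": 60, "gamma": 2, "delta": 3}
dist = synset_distribution(toy)
```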
5.1.7 Relation features
Coreference is a grammatical relation in which two referring expressions
refer to the same entity. This entity is called an antecedent, and the referring
expression is called an anaphora. We assume that coreference represents the implicit
relations between sentences, which second language learners must recognize well in
order to follow a document. We count the number of pronouns per document, the number of
proper nouns per document, the number of antecedents per document, the average number
of anaphora per coreference chain and the average distance between anaphora and
antecedents per chain.
Average Number of Pronouns: We assume that the greater the number of
pronouns in a document, the more entities the reader needs to remember, and this
increases reading difficulty. Thus, we compute the average number of pronouns in a
document.
Average Number of Proper Nouns: If a sentence contains more than one proper
noun, a reader must remember more objects in a document. Barzilay and Lapata (2008)
found that documents written for adults tended to contain more entities than those
written for children. Hence, we count the average number of proper nouns in a
document.
The Number of Antecedents per Document: Antecedents represent real entities
mentioned in the document. Similar to the average number of proper nouns, we assume
that if a document contains fewer entities, the document is easier to read. We total
the number of antecedents as the number of entities to capture this idea.
The Average Number of Anaphora per Coreference Chain (corefer_chain):
We assume that with more anaphora per coreference chain, second language learners
need more knowledge to resolve them; consequently, we count the average number of
anaphora per chain.
The Average Distance between Anaphora and Antecedents per Chain
(corefer_distance): This captures the distance between antecedents and anaphora. We
assume that if an antecedent and an anaphora are in the same sentence, the sentence
will be easy to understand. In contrast, if they are several sentences apart, it is
probable that the document is more complex to read.
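The chain-based relation features can be sketched in Python as follows. The input format is an assumption made for illustration: each coreference chain is given as the list of sentence indices of its mentions, with the antecedent first and at least one anaphora after it, and distance is measured in sentences.

```python
def coreference_features(chains):
    """chains: list of coreference chains. Each chain lists the sentence
    index of every mention, antecedent first (assumed input format), and
    must contain at least one anaphora."""
    n_chains = len(chains)
    anaphora_per_chain = [len(chain) - 1 for chain in chains]
    # Mean sentence distance from the antecedent to each anaphora, per chain.
    distances = [
        sum(s - chain[0] for s in chain[1:]) / (len(chain) - 1)
        for chain in chains
    ]
    return {
        "antecedents": n_chains,  # one antecedent per chain
        "corefer_chain": sum(anaphora_per_chain) / n_chains,
        "corefer_distance": sum(distances) / n_chains,
    }


# Two toy chains: the first antecedent is in sentence 0 with anaphora in
# sentences 0 and 2; the second is in sentence 1 with an anaphora in sentence 4.
feats = coreference_features([[0, 0, 2], [1, 4]])
```

In practice the chains would come from a coreference resolution system, whose output would first need to be converted into this form.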
5.1.8 Regression model
Linear regression is an approach to modeling the relationship between a scalar
variable Y and one or more variables denoted X. The prediction for a given document is
the inner product of the vector of feature values for the document and a vector of
regression coefficients estimated from the training data, plus an intercept.
Y = \alpha + \sum_{i=1}^{n} \beta_i x_i + \varepsilon \quad (12)
where Y is the difficulty value of a document, \alpha is the intercept parameter,
X = {x_1, x_2, …, x_n} represents the feature values, \beta = {\beta_1, \beta_2, …,
\beta_n} refers to the regression coefficient for each feature value i, and lastly
\varepsilon is an unobserved random variable that represents noise in the linear
relationship between the dependent variable and the regressors.
The primary reason for adopting linear regression for a reading difficulty model is
that the output scores are continuous and related to each other, whereas the outputs
of other methods, such as classification, are discrete and unrelated between levels.
Kate et al. (2010) evaluated the performance of reading difficulty estimation among
several machine learning methods and reported that all methods of the regression family
outperformed the baseline. In our study, the readability of a text is represented by a
score or a class, which is typically indicated in terms of school grades. Overall, the
content difficulty of textbooks increases incrementally. Thus, we opt for linear
regression as our model, as we assert that our estimated results are correlated.
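To make the fitting step concrete, the following Python sketch estimates the intercept and the feature weights by ordinary least squares and then scores a new document. The choice of solving the normal equations with Gauss-Jordan elimination, the function names and the toy data are all assumptions made for illustration; the text does not specify which solver was used.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    Returns [alpha, beta_1, ..., beta_n]. Assumes the feature columns are
    not collinear; otherwise a pivot becomes zero and the division fails.
    """
    rows = [[1.0] + list(x) for x in X]   # leading 1 models the intercept
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k] for row in aug]


def predict(coeffs, x):
    """Difficulty estimate: intercept plus the weighted sum of feature values."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Toy training data generated from Y = 1 + 2*x_1 + 0.5*x_2 (no noise).
X_train = [[1, 2], [2, 1], [3, 5], [4, 3]]
y_train = [4.0, 5.5, 9.5, 10.5]
coeffs = fit_linear_regression(X_train, y_train)
estimate = predict(coeffs, [5, 4])
```

With real feature vectors, a numerical library's least squares routine would be a more robust choice than this hand-written solver.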
5.2 Ability estimation
In this section, we propose an interpretable, statistical ability estimation that
accounts for the inherent randomness in the acquisition process, specifically in a
Web-based learning environment. This model draws a connection between students’
abilities and the acquisition grade distributions. For a student who is said to be at
grade level six in this work, our method is able to estimate how much the student has
acquired, expressed as a percentage of the knowledge in a population, when he or she
correctly answers a certain percentage of items on a test.
We propose the following interpretation of the quantitative definition: an
examinee is said to have ability if s percent of the items in a test T = (t_1, …, t_m)
have each been correctly answered by r percent of the population. We first consider
that each item t_i in a test T has been correctly answered by r percent of the
population. In general, there is specific knowledge behind each tested item t_i. The
difficulty level of this specific knowledge represents the age at which most people
have acquired knowledge of t_i. Most people understand some knowledge at an early age,
whereas some understand this knowledge later in life. Here, we precisely denote the
level of the specific knowledge by the age at which r percent of the population has
acquired knowledge of t_i, where age refers to school grades. Given an item of
knowledge t_i and a population, the probability distribution of grade acquisition
p_t() can be calculated. Let the quantile function q_t of