Personalization - A Study of Personalized Computer-Assisted Question Generation for English Learning

In this chapter, the personalized quiz strategy based on automatic quiz generation

is presented. This personalized quiz strategy aims to achieve the following three

purposes. First, we not only build a model to estimate reading difficulty, but also

investigate the optimal combination of features for improving reading difficulty estimation.

Next, an examinee’s grade level is estimated by considering the test responses and his

or her historical data; in contrast, previous work only considered the current test

responses. Finally, questions are selected according to not only their difficulty but

also the concepts the examinee was unclear about in previous incorrect responses. A student’s

previous mistakes are recorded and considered in advance in order to confirm whether

he or she has learned the material. Through iterative practice, students’ understanding will be

enhanced by reading a wide range of materials.

5.1 Reading difficulty estimation

As mentioned above, almost all past literature was designed for native readers, […]

of articles written for native readers. But for second language learners, word

difficulty depends on the structure of the material they study, not its popularity in the real

world. In this section, we design a reading difficulty estimation method for second language

learners. We investigate the effectiveness of several meaningful lexical and

grammatical features from early work, and then further consider organized grading indices of

vocabulary from different sources, as well as grammar patterns collected from

textbooks, which represent words and grammar patterns that language learners have

acquired at various grade levels, such as the word and grammar acquisition grade

distributions. Furthermore, we also propose features that take into consideration word sense

and coreference resolution.

Let D represent a document, while S represents a sentence in D. Suppose that

D has n sentences, s1, s2, …, sn, so that D = {s1, s2, …, sn}. Let W be the set of words in

D. Suppose D has m distinct words, w1, w2, …, wm, so that W = {w1, w2, …,

wm}. We further suppose that the sentence S has k words, w1, w2, …, wk, so that S =

{w1, w2, …, wk}, m > k. For a given training data set, the features are extracted and

sent to a linear regression process to obtain a linear model that includes the weight of

each feature. The linear model is then applied to a document to estimate the difficulty

level. In the following sections we explain and define the features used in the proposed

estimation.

5.1.1 Baseline features

Word Number: A basic assumption is that a longer document is more difficult

than a shorter one. Almost all prior work assumed that the number of words in a

document accurately estimates reading difficulty (Flesch, 1948; Dale and Chall, 1948;

Gunning, 1952; McLaughlin, 1969; Coleman and Liau, 1975; Kincaid et al., 1975).

Pitler and Nenkova (2008) pointed out that this feature is significantly correlated with

readability. For second language learners, we assume that a longer document takes

more time to consume. Therefore, the number of words in a document is used in this

study as one of the features to estimate reading difficulty. Word count is defined as

follows:

$$\mathrm{word\_number} = \log |D| \tag{1}$$

Sentence length: Past studies have also taken sentence length into account,

assuming that a shorter sentence is easier than a longer one (Flesch, 1948; Dale and Chall,

1948; Gunning, 1952; McLaughlin, 1969; Coleman and Liau, 1975; Kincaid et al.,

1975). Thus for each document, we consider the average number of words per sentence

as sentence length. The difficulty of a sentence is defined as follows:

$$\mathrm{sentence\_length} = \frac{\mathrm{word\_number}}{n} \tag{2}$$

Syllables: A syllable is a unit of organization for a sequence of speech sounds.

For example, the word water is composed of two syllables: wa and ter. Some related

work has also taken syllables into consideration (Flesch, 1948; Gunning, 1952;

McLaughlin, 1969; Kincaid et al., 1975). One notable example is the SMOG formula

(McLaughlin, 1969), which estimates the reading difficulty of a document using only

the average number of polysyllabic words per sentence.

Even though syllables have proven to be a useful measure of reading difficulty for

first-language users, similarities between sounds of a native speaker’s mother tongue

and their adopted second language can impact second-language learning. For instance,

a word in an Asian language usually has one syllable, while a word in Western

languages […] second-language learner could use similar-sounding syllables from their first language to

learn vocabulary (called the L1-phonology effect hypothesis; Yamada, 2004). We assume

the number of syllables in a word may affect the difficulties of documents. Thus, we

find the average number of syllables of every word in a document to measure reading

difficulty. The syllable difficulty of a document is defined as follows:

where word_syllablesi is the number of syllables within a word i.
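The three baseline features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document is already split into sentences and tokens, syllables are approximated by counting vowel groups (the source does not say how syllables were counted), sentence length follows the prose definition (raw word count divided by the number of sentences), and the function names are hypothetical.

```python
import math
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups (a heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def baseline_features(sentences: list[list[str]]) -> dict[str, float]:
    """Word number, sentence length, and average syllables per word.

    `sentences` is a tokenized document: one list of word tokens per sentence.
    """
    words = [w for s in sentences for w in s]
    n = len(sentences)
    return {
        "word_number": math.log(len(words)),        # log of the document size
        "sentence_length": len(words) / n,          # average words per sentence
        "syllables": sum(count_syllables(w) for w in words) / len(words),
    }

doc = [["the", "water", "is", "cold"], ["we", "drink", "water"]]
features = baseline_features(doc)
```

A dictionary-based syllable counter would be more accurate than the vowel-group heuristic; the heuristic is used here only to keep the sketch self-contained.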

5.1.2 The word acquisition grade distributions features

It is crucial to understand when a word is acquired by target readers. Kireyev and

Landauer (2011) have tried using latent semantic analysis to capture word difficulty.

Even though there is no existing dictionary presenting the word acquisition distribution

in school grades, second language learners learn vocabulary in a limited range, which

is usually decided by experts or teachers. Similar to Wan et al. (2010), we build two […]

the word grading. This helps better identify the word acquisition grade distributions

resulting from random draws from the population of second language learners. As in

Section 4.1, two resources, the General English Proficiency Test Reference

Vocabulary and the Vocabulary Quotient, are used to estimate the word acquisition grade

distributions in our study.

GEPT Word Lists: The General English Proficiency Test (GEPT; Wu and Liao,

2010) is designed to evaluate student proficiency in English as a second language. It

provides a reference vocabulary list with about 8,000 words divided into three word

levels: elementary (gept1), intermediate (gept2) and high-intermediate (gept3). Words

not found in the GEPT word list are assigned to an out-of-list level

(gept0). For each word from a document, we identify its vocabulary difficulty by

looking up the word’s level in the GEPT word lists, counting the number of

distinct words in each level, and finally normalizing by the total number of distinct words

in each level.

Age of Word Acquisition: In addition to the GEPT, we also collected a word list

from an organization, Vocabulary Quotient (VQ; Ho and Huong, 2011). This

organization collected more than 10,000 words and labeled them in reference to other

educational institutions, such as the Elementary School Reference Vocabulary and the Junior

High School English Reference Vocabulary texts made by the Ministry of Education of

Taiwan, and the High School English Reference vocabulary text made by the College

Entrance Examination Center of Taiwan. The word list is divided into fourteen levels

(vq3–vq16), which represent the words learned by second language learners from

elementary school to college. Just as with the GEPT list, some words are still absent

from the Vocabulary Quotient word list; these words are assigned to an out-of-vocabulary

level (vq0). For each word from a document, we identify its difficulty by first

looking up its difficulty level in those word lists, and, after counting the number of

distinct words in each level, normalizing by the total number of distinct words in each

level.
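The level-distribution features for both word lists can be sketched as follows. The word list here is a hypothetical toy dictionary, not the GEPT or VQ data, and the sketch assumes that the counts are normalized by the number of distinct words in the document so that the shares sum to one; the function and variable names are illustrative only.

```python
from collections import Counter

def level_distribution(words, level_of, levels, oov_level):
    """Share of a document's distinct words that fall in each level."""
    distinct = {w.lower() for w in words}
    # Words missing from the list go to the out-of-list level.
    counts = Counter(level_of.get(w, oov_level) for w in distinct)
    total = len(distinct)
    return {lvl: counts[lvl] / total for lvl in [oov_level, *levels]}

# Hypothetical toy word list; the real GEPT list has about 8,000 entries.
toy_gept = {"water": "gept1", "cold": "gept1", "estimate": "gept2", "coherent": "gept3"}
doc_words = ["Water", "cold", "water", "estimate", "zeitgeist"]
dist = level_distribution(doc_words, toy_gept, ["gept1", "gept2", "gept3"], "gept0")
```

The same function would serve the VQ list by passing the levels vq3 to vq16 and vq0 as the out-of-list level.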

5.1.3 Frequency features

Besides the word acquisition grade distribution features, word frequency is

another approach to estimating word difficulty. Word frequency is based on the

assumption […] word acquisition grade distributions, we find its word frequency from the BNC corpus

and also use a Google search result count as an alternative frequency, for every word in

a document.

Word Frequency in BNC Corpus: The British National Corpus (BNC; Lou and

Guy, 1998) is a 100 million word collection of written and spoken language from a

wide range of sources, designed to represent a wide cross-section of British English

from the later 20th century. For each word in a document, we calculate the distinct

word frequency (wf), which refers to the number of times it appears in the BNC corpus. Word

frequency is defined as follows:

$$wf_i = \frac{n_i}{|d_j|} \tag{4}$$

where ni is the number of occurrences of the considered distinct word wi in

document dj, and the denominator is the sum of the number of occurrences of all distinct

words in document dj, that is, the size of the document |dj|. For each word in a given

document, we also calculate the average of the log word frequency. The

document’s difficulty value based on word frequency in the BNC corpus is defined as

follows:

Google Search Result Count: For a given query, Google will return a list of

documents containing the queried words and a search result count. We use the search

result count as a measure of word frequency, like the word frequency from a corpus.

For each word in a given document, we also calculate the average of the log word

frequency. The document’s difficulty value based on word frequency from Google is

defined as follows:

where googlei is the search result count of a word i from Google.
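The averaged log-frequency score, whether the counts come from the BNC or from search result counts, can be sketched as below. Several details are assumptions rather than the source's definition: the average is taken over distinct words, raw counts are used instead of relative frequencies, and missing or zero counts are floored at 1 so the logarithm stays defined. The counts in the example are made up.

```python
import math

def average_log_frequency(words, frequency, floor=1.0):
    """Mean of log(frequency) over the distinct words of a document.

    `floor` replaces missing or zero counts so that log() stays defined;
    this smoothing choice is an assumption, not taken from the text.
    """
    distinct = sorted(set(words))
    logs = [math.log(max(frequency.get(w, 0), floor)) for w in distinct]
    return sum(logs) / len(logs)

# Hypothetical counts standing in for BNC frequencies or search result counts.
toy_counts = {"the": 1000, "water": 100, "estimate": 10}
score = average_log_frequency(
    ["the", "water", "the", "estimate", "zeitgeist"], toy_counts
)
```

A lower score indicates rarer vocabulary on average, which under the frequency assumption corresponds to a harder document.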

5.1.4 Parse features

Syntactic constructions affect the understanding of a sentence. This assumes that

the more complicated a sentence, the greater its difficulty. Schwarm and Ostendorf

(2005) proposed four syntactic features for their measure of reading difficulty: the

average parse tree height, the average number of noun phrases per sentence, the average

number of verb phrases per sentence, and the average number of subordinate clauses

per sentence (SBAR). Sentences with multiple noun phrases require the reader

to remember more entities, and Barzilay and Lapata (2008) found that documents written

for adults tended to contain more noun phrases than those written for children. In

addition, while including more verb phrases in each sentence increases sentence complexity,

adults might prefer to have related clauses explicitly grouped together. Pitler and

Nenkova (2008) also found a strong correlation between readability and the

number of verb phrases. These works show that the more complicated the parse

features in a document, the more likely it was written for adults. Hence, we also examine

the influence of parse features for second language learners.

Prepositions are a class of words that indicate relationships between nouns,

pronouns and other words in a sentence. Prepositions can be divided into two kinds:

simple prepositions and compound prepositions. Simple prepositions are single-word

prepositions, while compound prepositions consist of more than one word. We assume that

more prepositional phrases in a sentence also increase its complexity, and second

language learners might be confused by complex prepositional phrases. Thus, in addition

to the parsing features proposed by Schwarm and Ostendorf (2005), we also present

the average number of prepositional phrases as a new feature to capture grammatical

complexity.

Thus, from the outline above, for a document we consider the following syntactic

features from parse results generated by the Stanford parser (Klein and Manning, 2003):

the average parse tree height, the average number of noun phrases, the average number

of verb phrases, the average number of SBAR and the average number of prepositional

phrases.

Average Parse Tree Height: Suppose the height of the parse tree of sentence i is hi.

The average parse tree height difficulty of a document is defined as follows:

Average Number of Noun Phrases: Suppose a sentence has npi noun phrases.

The average noun phrase difficulty of a document is defined as follows:

Average Number of Verb Phrases: Suppose a sentence has vpi verb phrases. The

average verb phrase difficulty of a document is defined as follows:

Average Number of SBAR: Subordinating conjunctions (SBAR), for example,

because, unless, even though, and until, are placed at the beginning of a subordinate

clause and link the subordinate clause to the main clause. SBAR is an indicator

of sentence complexity. The SBAR difficulty of a document is defined as

follows:

Average Number of Prepositional Phrases: Suppose a sentence has ppi

prepositional phrases. The average prepositional phrase difficulty of a document is

defined as follows:
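The five parse features can be illustrated with a small self-contained sketch. It assumes that each sentence has already been parsed into a Penn Treebank style bracketed string (the kind of output a constituency parser produces), that every feature is a per-sentence average over the n sentences of the document, and that a bare word counts as height 1. The tiny reader below is a stand-in for a real parser library, and all names are hypothetical.

```python
def parse_tree(text):
    """Read a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])   # a leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def height(node):
    """Tree height, counting a bare word as height 1 (an assumed convention)."""
    if isinstance(node, str):
        return 1
    return 1 + max(height(child) for child in node[1])

def count_label(node, label):
    """Number of constituents in the tree that carry the given label."""
    if isinstance(node, str):
        return 0
    return int(node[0] == label) + sum(count_label(c, label) for c in node[1])

def parse_features(trees):
    """Per-sentence averages of height and of NP, VP, SBAR, and PP counts."""
    n = len(trees)
    feats = {"height": sum(height(t) for t in trees) / n}
    for label in ("NP", "VP", "SBAR", "PP"):
        feats[label] = sum(count_label(t, label) for t in trees) / n
    return feats

bracketed = [
    "(S (NP (DT the) (NN dog)) (VP (VBD slept) (PP (IN on) (NP (DT the) (NN mat)))))",
    "(S (NP (PRP it)) (VP (VBD left) (SBAR (IN because) (S (NP (PRP we)) (VP (VBD called))))))",
]
feats = parse_features([parse_tree(b) for b in bracketed])
```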

5.1.5 The grammar acquisition grade distributions features

Heilman et al. (2007) found that grammatical features played an

important role in reading difficulty estimation for second-language learners. A model with

complex syntactic grammatical feature sets achieved more accurate results than

simpler models. In their work, they examined the rate of grammatical occurrences per 100

words: both the passive voice and past participle had obvious differences between the

lowest and highest levels in the second-language corpus. Thus, we measure

grammatical difficulty as a linguistic processing factor in estimating reading difficulty for

second language learners.

Grading Index of Grammar (grammar1–grammar6): To decide the

grammatical difficulty level of a document, we use the same method described in Section 4.2. We

first collected sentences from the six versions of second-language textbooks and parsed

the sentences to find their grammar patterns, for a total of 44 grammar patterns.

Manually identifying these grammar patterns allows the parse tool to then automatically find

these same patterns within a given document. Next, using this parse tree structure

searching tool (Levy and Andrew, 2006), the grammatical structures were assigned to

the textbook grade in which they frequently appear.

5.1.6 Semantic features

For any given word, its meaning may vary broadly depending on the context. For

example, the word “bank” has two distinct meanings (also called two senses),

“financial institution” and “sloping mound”, not to mention its other colloquial uses. For

both the word acquisition grade distributions and frequency features, we assume that a

word only has one sense, because this still results in accurate performance with many

language technologies, such as information retrieval or text classification. However, it

cannot be claimed that a second language learner having learned a word knows every

sense of the word. Therefore, we designed semantic features to identify word senses in

a document.

Average Number of WordNet Synsets: We adopted WordNet (Miller et al., 1990)

as a resource for understanding the senses of a word. WordNet is a large lexical

database of English. The database contains 155,287 words, with each word annotated with […]

A set of near-synonyms is defined as a synset, which represents a concept of a word.

For each word in a document, we count the word’s synsets using

WordNet. To determine the representation of this feature, we develop seven categories

(wordnet1–wordnet7) to represent the number of synsets of each word in a

document. Here, suppose a word has wsi synsets. The number is normalized by taking its square

root and then rounded down to an integer, which serves as the feature index. For example, if the

number of synsets of a word is 17, it is attributed to wordnet4. If the number of

synsets of a word is greater than 49, it is assigned to wordnet7. Finally, we count the

number of distinct words in each WordNet category and normalize by the total number

of distinct words.
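The category assignment can be checked with a short sketch. It assumes the index is the integer part of the square root of the synset count, capped at 7, which agrees with both worked examples in the text (17 synsets falls in wordnet4, and counts above 49 fall in wordnet7). How words with no synsets are handled is not stated in the source, so the sketch simply rejects them; the synset counts in the example are hypothetical rather than real WordNet lookups.

```python
import math
from collections import Counter

def wordnet_category(synset_count: int) -> int:
    """Map a synset count to a category index 1..7 via floor(sqrt(count))."""
    if synset_count < 1:
        # Handling of words without synsets is not specified in the text.
        raise ValueError("expects a word with at least one synset")
    return min(7, math.isqrt(synset_count))

def wordnet_distribution(synsets_per_word: dict[str, int]) -> dict[str, float]:
    """Share of distinct words in each category wordnet1..wordnet7."""
    counts = Counter(wordnet_category(c) for c in synsets_per_word.values())
    total = len(synsets_per_word)
    return {f"wordnet{k}": counts[k] / total for k in range(1, 8)}

# Hypothetical synset counts; real values would come from a WordNet lookup.
toy = {"bank": 17, "run": 57, "mat": 3, "cold": 16}
dist = wordnet_distribution(toy)
```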

5.1.7 Relation features

Coreference is a grammatical relation in which two referring expressions

refer to the same entity. This entity is called an antecedent, and the referring expression

is called an anaphora. We assume that coreference represents the implicit relations

between sentences. When second language learners recognize the coreferent relation well, […]

count the number of pronouns per document, the number of proper nouns per

document, the number of antecedents per document, the average number of anaphora per

coreference chain and the average distance between anaphora and antecedents per

chain.

Average Number of Pronouns: We assume that the greater the number of

pronouns in a document, the more entities the reader needs to remember, and this

increases reading difficulty. Thus, we compute the average number of pronouns in a document.

Average Number of Proper Nouns: If a sentence contains more than one proper

noun, a reader must remember more objects in a document. Barzilay and Lapata (2008)

found that documents written for adults tended to contain more entities than those

written for children. Hence, we count the average number of proper nouns in a

document.

The Number of Antecedents per Document: Antecedents represent real entities

mentioned in the document. Similar to the average number of proper nouns, we assume

that if a document contains fewer entities, the document is easier to read. We total the

number of antecedents as the number of entities to capture this idea.

The Average Number of Anaphora per Coreference Chain (corefer_chain):

We assume that with more anaphora per coreference chain, second language learners

need more knowledge to resolve them; consequently, we count the average number of

anaphora per chain.

The Average Distance between Anaphora and Antecedents per Chain

(corefer_distance): This captures the distance between antecedents and anaphora. We

assume if an antecedent and anaphora are in the same sentence, the sentence will be

easy to understand. In contrast, if they are several sentences apart, it is probable that

the document is more complex to read.
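The two chain-level features can be sketched as follows, assuming a coreference resolver has already produced the chains. The representation is an assumption made for illustration: each chain is a list of sentence indices, one per mention, with the antecedent first and the anaphora after it, and distance is measured in sentences. Chains with no anaphora are skipped. The source does not specify these details.

```python
def coreference_features(chains):
    """Return (corefer_chain, corefer_distance) for a document.

    Each chain lists the sentence index of every mention, antecedent first.
    Distance is counted in sentences between an anaphora and its antecedent.
    """
    usable = [c for c in chains if len(c) > 1]
    if not usable:
        return 0.0, 0.0
    # Number of anaphora in each chain (all mentions except the antecedent).
    anaphora_counts = [len(c) - 1 for c in usable]
    # Mean sentence distance from the antecedent, computed per chain.
    chain_distances = [
        sum(s - c[0] for s in c[1:]) / (len(c) - 1) for c in usable
    ]
    corefer_chain = sum(anaphora_counts) / len(usable)
    corefer_distance = sum(chain_distances) / len(usable)
    return corefer_chain, corefer_distance

# Hypothetical chains: the third has an antecedent but no anaphora.
chains = [[0, 0, 2], [1, 4], [3]]
corefer_chain, corefer_distance = coreference_features(chains)
```

A distance of zero means the anaphora and its antecedent share a sentence, the case the text treats as easiest to understand.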

5.1.8 Regression model

Linear regression is an approach to modeling the relationship between a scalar

variable Y and variables denoted X. A prediction of a given document is the inner

product of a vector of feature values for the document and a vector of regression

coefficients estimated from the training data.

$$Y = \alpha + \sum_{i=1}^{n} \beta_i x_i + \varepsilon, \quad i = 1, 2, \ldots, n \tag{12}$$

where Y is the difficulty value of a document, α is the intercept parameter, X = {x1, x2, …, xn} represents the feature values, β = {β1, β2, …, βn} refers to the regression

coefficient for each feature value i, and lastly ε is an unobserved random variable that represents noise in the linear relationship between the dependent variable and

regressors.

The primary reason for adopting linear regression for a reading difficulty model is

that the output scores are continuous and related to each other, whereas the outputs

of other methods, such as classification, are discrete and unrelated between levels.

Kate et al. (2010) evaluated the performance of reading difficulty estimation among several

machine learning methods and reported that all methods in the regression family outperformed the

baseline. In our study, the readability of a text is represented by a score or a class,

which is typically indicated in terms of school grades. Overall, the content difficulty of

textbooks increases incrementally. Thus, we opt for linear regression as our model, as

we assert that our estimated results are correlated.
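As a minimal sketch of the regression step, the following fits the intercept and coefficients by ordinary least squares and then scores a new document as an inner product plus the intercept. The solver (normal equations with Gaussian elimination) is an assumption chosen to keep the example dependency-free; the source does not say how the model was fitted. The training data is synthetic and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(features, grades):
    """Least-squares fit; returns (alpha, betas)."""
    rows = [[1.0] + list(f) for f in features]         # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, grades)) for i in range(k)]
    coef = solve(xtx, xty)
    return coef[0], coef[1:]

def predict(alpha, betas, x):
    """Intercept plus the inner product of coefficients and feature values."""
    return alpha + sum(b * v for b, v in zip(betas, x))

# Synthetic training set: two feature values per document and a grade label.
train_x = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
train_y = [1.0 + 0.5 * a + 0.25 * b for a, b in train_x]
alpha, betas = fit(train_x, train_y)
estimate = predict(alpha, betas, (2.0, 2.0))
```

Because the prediction is continuous, estimates for adjacent grades remain comparable, which is the property the text cites as the reason for preferring regression over classification.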

5.2 Ability estimation

In this section, we propose an interpretable and statistical ability estimation with

inherent randomness in the acquisition process, specifically in the Web-based learning

environment. This model draws a connection between students’ abilities and the

acquisition grade distributions. For a student who is said to be at grade level six in this work,

our method is able to estimate how much the student has acquired as a certain

percentage of the knowledge in a population when he or she correctly answers a certain

percentage of items on a test.

We propose the following interpretation of the quantitative definition: an

examinee is said to have ability if s percent of items in a test T = (t1, …, tm) have each been

correctly answered by r percent of the population. We first consider that each item

ti in a test T has been correctly answered by r percent of the population. In general,

there is specific knowledge behind each tested item ti. The difficulty level of the

specific knowledge represents the age at which most people have acquired knowledge of ti.

Most people understand some knowledge at an early age, whereas some understand

this knowledge later in life. Here, we precisely denote the level of the specific knowledge

by the age at which r percent of the population has acquired knowledge of ti, where age

refers to school grades. Given a piece of knowledge ti and a population, the probability

distribution of grade acquisition pt() can be calculated. Let the quantile function qt of
