In this section, analyses of data are presented in four parts. First, the
participants’ immediate word gain and delayed word retention are examined through
two-way repeated measures MANOVA, one-way MANOVAs, forgetting rate analyses
and independent-samples t tests between the word gain after correct and incorrect
responses made during the unglossed word encounter. Second, analyses of the
learners’ self-report of the gloss use are presented. Third, effects of the interventions
on reading comprehension are reported quantitatively from the participants’
performance on the reading comprehension test and qualitatively from the think-aloud
data. Finally, the participants’ preferences for, and perceived usefulness of, the
different intervention patterns are presented.
Vocabulary Learning Performance
Research Question 1: Which lexical intervention yields better word gain? Glossing
preceded by inference, glossing followed by retrieval, or full glossing?
To answer the first research question, all the vocabulary test scores from the
participants who did not think aloud were analyzed quantitatively. Since only the
data of the participants who showed no knowledge of the target words before
the treatment were analyzed, the participants’ baseline target word knowledge
was assumed to be zero and any knowledge they showed on the vocabulary tests was
             IGG                      GRG                      GGG
             FR      MR      REC      FR      MR      REC      FR      MR      REC
Immediate    44.00%  68.75%  74.38%   44.63%  75.75%  80.50%   33.88%  55.75%  66.00%
Delayed      15.63%  44.88%  57.50%   19.00%  58.13%  66.63%   15.00%  39.13%  53.50%
Note. Maximum word gain score was eight in each cell. Assessment tasks: form recall (FR), meaning recall (MR) and meaning recognition (REC).
5 According to Knight (1994), the learning percentage is calculated by dividing the difference between the exposure mean score and the no-exposure mean score by the number of target words and then
Judging from the mean scores, gloss-retrieval-gloss (GRG) yielded the most
word gain and retention on all of the tests, followed by inference-gloss-gloss (IGG),
with full glossing (GGG) yielding the least. In the full glossing condition, more than
one third of the word forms and more than half of the word meanings were recalled, and
almost two thirds of the word meanings were recognized on the immediate measures.
Two weeks later, 15% of the forms and almost 40% of the meanings were retained and
more than half of the meanings were still recognized. In the second best condition,
inference-gloss-gloss, the immediate word gain ranged from 44% in form
recall and more than 68% in meaning recall to almost 75% in meaning recognition, and
the gain was retained at more than 15% in form recall, almost 45% in meaning recall
and 57.5% in meaning recognition. In the best condition, gloss-retrieval-gloss, more
than 44% of the forms and 75% of the meanings were recalled and as much as 80.5% of
the meanings were recognized immediately; 2 weeks later, almost one fifth of the forms
were retained, and more than 58% and 66% of the meanings were recalled and recognized,
respectively. The two-way repeated measures MANOVA showed that there were
significant main effects for treatment conditions, F(6, 226) = 2.94, p < .01, and time,
F(3, 113) = 61.43, p < .01, but no significant interaction between the two factors, F(6,
226) = .90, n.s.
The effect for treatment conditions was further examined through tests of
between-subject effects. They revealed that significant differences were
observed on the meaning recall test, F(2, 115) = 8.32, p < .01, and the meaning
recognition test, F(2, 115) = 4.31, p < .05, but not on the form recall test,
F(2, 115) = 2.15, n.s. The differences in the two meaning-related tests were
examined via Scheffé’s multiple comparisons, which revealed that the GRG
condition was significantly better than the GGG condition in meaning recall and
recognition, as had been expected, but that the IGG condition was not
significantly better than the GGG condition, contrary to the hypothesis (see Table
6).
Table 6
Overall multiple comparisons between treatments
               Form Recall             Meaning Recall          Meaning Recognition
Treatments     Mean Difference  Sig.   Mean Difference  Sig.   Mean Difference  Sig.
IGG vs. GRG    -.16             .12    -.82             .11    -.61             .27
IGG vs. GGG    .43              .34    .75              .15    .49              .42
GRG vs. GGG    .59              .14    1.56             .00**  1.10             .02*
The finding that the GRG learners outperformed the GGG learners in meaning
learning but not in form learning is consistent with Nation (2001). Nation mentioned
that the most effective kind of form learning was implicit learning involving noticing
such as repeated word encounters while the most effective type of meaning learning
was strong explicit learning that required depth of processing. Since GRG and GGG
may differ only in depth of processing, not in the number of word repetitions, the
treatment differences were observed only in meaning recall and recognition.
The tests for the effect of time showed that the amount of word knowledge varied significantly across time in
all the three types of tests, including the form recall test, F(1, 115) = 165.08, p < .01,
the meaning recall test, F(1, 115) = 66.49, p < .01, and the meaning recognition test,
F(1, 115) = 36.18, p < .01.
To further examine the treatment differences at different test periods, two
one-way MANOVAs were conducted: one with word knowledge measured
immediately and the other with word retention measured 2 weeks later. For the
immediate measures, the significant effects were observed under the form recall test,
F(2, 115) = 3.09, p < .05, the meaning recall test, F(2, 115) = 6.97, p < .01, and the
meaning recognition test, F(2, 115) = 3.81, p < .05. The Scheffé’s multiple
comparisons revealed that the GRG condition was significantly better than the GGG
condition in meaning recall and recognition (see Table 7).
Table 7
Multiple comparisons between treatments on immediate posttests
               Form Recall             Meaning Recall          Meaning Recognition
Treatments     Mean Difference  Sig.   Mean Difference  Sig.   Mean Difference  Sig.
IGG vs. GRG    -.05             .99    -.56             .43    -.49             .51
IGG vs. GGG    .83              .12    1.04             .06    .67              .28
GRG vs. GGG    .86              .09    1.60             .00**  1.15             .03*
For the delayed measures, a significant effect was observed on the
meaning recall test, F(2, 115) = 5.72, p < .01, but not on the form recall test, F(2, 115)
= .64, n.s., or the meaning recognition test, F(2, 115) = 2.67, n.s. Scheffé’s
multiple comparisons revealed that the GRG condition was significantly better than
the GGG condition in meaning recall (see Table 8).
Table 8
Multiple comparisons between treatments on delayed posttests
               Form Recall             Meaning Recall          Meaning Recognition
Treatments     Mean Difference  Sig.   Mean Difference  Sig.   Mean Difference  Sig.
IGG vs. GRG    -.27             .67    -1.07            .46    -.73             .29
IGG vs. GGG    .05              .99    .46              .61    .32              .79
GRG vs. GGG    .32              .58    1.53             .00**  1.05             .08
Table 9
The forgetting rate (FR) on each test under each treatment
               Form Recall               Meaning Recall            Meaning Recognition
Treatments     Mean Difference  FR       Mean Difference  FR       Mean Difference  FR
IGG            2.27             64.49%   1.91             34.73%   1.35             22.69%
GRG            2.05             57.42%   1.41             23.27%   1.11             17.24%
GGG            1.51             55.72%   1.33             29.82%   1.00             18.94%
The participants’ immediate and delayed vocabulary performances were further
subjected to forgetting rate analyses. Based on Groot (2000), the forgetting rate
was calculated by dividing the difference between the mean scores on the immediate
and delayed posttests by the mean score on the immediate posttest and then converting
the quotient into a percentage. Table 9 shows that the greatest rates of attrition
occurred on the form recall test (55.72%~64.49%), followed by the meaning recall test
(23.27%~34.73%) and the meaning recognition test (17.24%~22.69%), in all of the
treatments. In the comparison among the treatments, all of the highest forgetting
rates were observed in the IGG condition and almost all of the smallest forgetting
rates were found in the GRG condition except for the form recall test, the rate in
which was a little higher than its counterpart in the GGG condition by less than 2%.
The forgetting rate difference between the GRG condition and the GGG condition was
not large, only ranging from 1.70% on the form recall and meaning recognition tests
to 6.55% on the meaning recall test. In contrast, the forgetting rate difference
between the IGG condition and the other two conditions was relatively larger. The
differences between the IGG condition and the GRG condition ranged from 5.45% on
the meaning recognition test, 7.07% on the form recall test to 11.46% on the meaning
recall test. The differences between the IGG condition and the GGG condition
ranged from 3.75% on the meaning recognition test, 4.91% on the meaning recall test
to 8.77% on the form recall test. The smaller rate differences in the GRG and GGG
conditions showed that both conditions, though the former generated the highest word
gain whereas the latter the least, were less susceptible to attrition than the IGG
condition. The GRG condition may have some superiority over the other two
conditions, so its word gain and retention were the highest and most of its forgetting
rates were the lowest. As for the GGG condition, its low forgetting rates could be
attributed to a floor effect since it generated the least word gain.
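The forgetting rate arithmetic can be illustrated with a minimal sketch that starts from the immediate and delayed percentages reported earlier in this section. Two points are assumptions rather than statements from the text: that the nine percentage columns are ordered IGG, GRG, GGG (inferred from the prose description of each condition), and that each percentage can be converted back to a mean score by multiplying by the maximum of eight. The names `percent_correct`, `forgetting_rate` and `summarize` are hypothetical.

```python
# Percentages (immediate, delayed) per condition and test, as reported earlier.
# Assumption: columns are ordered IGG, GRG, GGG and FR, MR, REC within each.
MAX_SCORE = 8  # maximum word gain score per cell, per the table note

percent_correct = {
    "IGG": {"FR": (44.00, 15.63), "MR": (68.75, 44.88), "REC": (74.38, 57.50)},
    "GRG": {"FR": (44.63, 19.00), "MR": (75.75, 58.13), "REC": (80.50, 66.63)},
    "GGG": {"FR": (33.88, 15.00), "MR": (55.75, 39.13), "REC": (66.00, 53.50)},
}


def forgetting_rate(immediate, delayed):
    """Return the loss between posttests as a percentage of the immediate score."""
    return (immediate - delayed) / immediate * 100


def summarize(table):
    """Map condition -> test -> (mean difference in points, forgetting rate in %)."""
    result = {}
    for condition, tests in table.items():
        result[condition] = {}
        for test, (imm_pct, del_pct) in tests.items():
            imm = imm_pct / 100 * MAX_SCORE  # back-convert percentage to mean score
            dly = del_pct / 100 * MAX_SCORE
            result[condition][test] = (imm - dly, forgetting_rate(imm, dly))
    return result


summary = summarize(percent_correct)
for condition, tests in summary.items():
    for test, (diff, rate) in tests.items():
        print(f"{condition} {test}: mean difference {diff:.2f}, forgetting rate {rate:.2f}%")
```

Under these assumptions the computed values agree with Table 9 to within rounding, since the published percentages are themselves rounded means.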
Table 10
Means and standard deviations of word gain after correct and incorrect responses
vocabulary gain and retention, the learners’ responses during the inferring stage in the
IGG condition or the retrieval stage in the GRG condition were scored by two
independent raters. Interrater reliability analyses showed that the agreement
reached .80 (p < .01) in the IGG condition, .95 (p < .01) in the GRG condition,
and .88 (p < .01) across the two conditions combined. Analyses of the participants’
responses during inference or retrieval revealed that the rate of correct responses was
higher during retrieval (53.85%) than during inferencing (34.38%).
Overall in Table 10, the mean scores after correct responses were higher than those
after incorrect responses, with the exception of the immediate meaning recognition test.
In the IGG condition, higher mean scores were likewise observed after correct responses,
except on the immediate meaning recognition test and the delayed meaning
recall test. In the GRG condition, higher mean scores were also observed after correct
responses, except on the immediate meaning recognition test and the delayed form and
meaning recall tests.
Table 11
Independent-samples t test results between correct and incorrect responses
             Form recall                Meaning recall             Meaning recognition
             IM           DE            IM           DE            IM           DE
Treatments   MD    Sig.   MD    Sig.    MD    Sig.   MD    Sig.    MD    Sig.   MD    Sig.
IGG          .04   .57    .10   .02*    .19   .00**  -.02  .73     -.09  .11    .09   .11
GRG          .08   .05*   -.00  .86     .01   .76    -.02  .68     -.00  .97    .05   .35
IGG+GRG      .06   .12    .05   .05*    .11   .00**  .01   .83     -.02  .57    .09   .02*
Independent-samples t tests between the word gain after correct and incorrect
responses showed that significant differences were observed only where the mean
score after correct responses was higher than that after incorrect responses (see Table 11).
In other words, correct responses were more likely to contribute to word learning than
incorrect responses. In the IGG and GRG conditions combined, correctness of the responses
was a significant factor in the delayed form recall performance, p < .05, the immediate
meaning recall performance, p < .01, and the delayed meaning recognition
performance, p < .05. In the IGG condition, the correctness of the inferences played
a significant role in the delayed form recall performance, p < .05, and the immediate
meaning recall performance, p < .01. In the GRG condition, a significant
difference was observed only on the immediate form recall test, p < .05. Therefore,
whether the response during the inferencing or the retrieval stage was correct or not
might be a factor in determining the word gain and retention, especially in the IGG
condition.
To sum up, among the three treatment conditions, the GRG condition yielded the
best word gain as well as retention, and it significantly outperformed the GGG
condition on the immediate meaning-related tests and on the delayed meaning recall
test. In addition, the forgetting rate analyses revealed that the GRG condition
was generally the least subject to forgetting.
Examination of response correctness during inferencing or retrieval and the
independent-samples t tests suggested that the superiority of the GRG condition over the
IGG condition might lie in the higher rate of correct responses during retrieval in the
GRG condition.
Gloss Use
Research question 2: How do learners make use of glosses in each gloss encounter
under different lexical interventions?
The post-treatment survey showed that the non-think-aloud participants made
use of glosses differently in different gloss encounters. The chi-square test showed
that the participants’ gloss use differed by the ordinal position of the gloss encounter, with
χ²(117) = 33.70, p < .01 in the GGG condition, χ²(78) = 14.54, p < .01 in the GRG
condition, and χ²(80) = 10.95, p < .05 in the IGG condition.
Generally speaking, as the number of gloss encounters increased, the
participants’ reliance on glosses decreased. In the first gloss encounter, the learners
in the three treatment conditions tended to refer to glosses directly; in the second gloss
encounter, they tended to think about the meaning first before referring to glosses.
The chi-square test revealed that how glosses were used in the first two gloss
encounters did not differ by treatment conditions, with χ²(118) = 6.31, n.s. in the first
gloss encounter, and χ²(118) = 9.80, n.s. in the second gloss encounter. In the third
gloss encounter, which occurred only in the GGG condition, there were far fewer
learners checking the glosses directly and more learners ignoring them altogether.
The detailed gloss use distribution in each treatment condition is reported as follows.
Table 12
Note. Gloss use strategies: usually referred to the gloss directly (G), usually thought about the meaning first and then referred to the gloss (TG), sometimes referred to the gloss directly but sometimes thought about the meaning first before referring to the gloss (G/TG), usually thought about the meaning by myself without referring to the gloss (T), and usually kept reading without thinking about the meaning or referring to the gloss (X).
Table 12 shows that most of the GGG learners checked the gloss directly at first
sight of the target words or the glosses. Some thought about the meaning first with
the meaning unchecked or checked against the gloss, but few ignored the gloss or the
word. In the second encounter of the target words and their glosses, the majority of
the learners still checked the gloss, but more of them did not do so until they thought
about the meaning themselves first. In the third encounter of the target words and
their glosses, there were fewer learners checking the gloss directly and more learners
ignoring the words together with their glosses or thinking about the meaning with or
without the meaning checked against the gloss afterward.
Table 13
Gloss use in the GRG condition
                                       Gloss Strategies
Encounters                             G       TG      G/TG    T       X
First (N = 39)    Count                16      3       15      1       4
                  % within encounter   41.03   7.69    38.46   2.56    10.26
                  % within strategy    69.57   21.43   53.57   12.50   80.00
                  % of total           20.51   3.85    19.23   1.28    5.13
Second (N = 39)   Count                7       11      13      7       1
                  % within encounter   17.95   28.21   33.33   17.95   2.56
                  % within strategy    30.43   78.57   46.43   87.50   20.00
                  % of total           8.97    14.10   16.67   8.97    1.28
Total (N = 78)    Count                23      14      28      8       5
                  % of total           29.49   17.95   35.90   10.26   6.41
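As a check on how the encounter-by-strategy comparison works, the counts in Table 13 can be run through a Pearson chi-square test of independence. The following is a minimal sketch using only the standard library; the helper names `pearson_chi_square` and `chi_square_sf_even_df` are hypothetical, and the upper-tail formula is a closed form that holds only for even degrees of freedom, which happens to apply to this 2 x 5 table.

```python
import math

# Observed counts from Table 13 (GRG condition), columns G, TG, G/TG, T, X.
counts = {
    "First": [16, 3, 15, 1, 4],
    "Second": [7, 11, 13, 7, 1],
}


def pearson_chi_square(rows):
    """Return (statistic, degrees of freedom) for a contingency table of counts."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


stat, df = pearson_chi_square(list(counts.values()))
p_value = chi_square_sf_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

The statistic computed this way rounds to 14.54, the value reported above for the GRG condition, with p below .01. The sketch yields 4 degrees of freedom, so the 78 shown in parentheses in the reported statistic appears to correspond to the total number of observations rather than to the degrees of freedom; this reading is an inference from the table, not something the text states.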
Table 13 shows that, as in the GGG condition, most of the GRG learners
checked the glosses directly in their first word and gloss encounter. In addition,
some of them thought about the meaning before checking the gloss, but few
learners ignored the gloss or the word. The similarity in gloss use between the
two conditions in this word and gloss encounter was supported by the chi-square test,
χ²(78) = 4.27, n.s. In the second gloss encounter of the GRG condition (i.e., the
third word encounter) after the retrieval task, the gloss use distribution was similar to
that in the second gloss and word encounter in the GGG condition, and the similarity
of the two conditions was supported by the chi-square test, χ²(78) = 6.86, n.s.
However, the distribution of the gloss use was less similar to that in the third word
< .05. There were much fewer GRG learners ignoring the word together with the
Chi-square value in the gloss use comparisons between treatments
Table 14 shows that the distribution of the gloss use in the first gloss encounter
(i.e., the second word encounter) among the IGG learners after the inferencing task
was similar to that in the first gloss encounter in the other two treatment conditions.
The similarity between the IGG condition and the other two conditions in this gloss
encounter was supported by the chi-square tests, with χ²(87) = .89, n.s. between IGG
and GRG and χ²(79) = 4.72, n.s. between IGG and GGG. Compared with the second word and
gloss encounter in the GGG condition, there were far more IGG learners referring to
the gloss directly in the second word encounter. However, the chi-square test
revealed that in the second word encounter, the gloss use of the IGG and GGG
learners did not differ significantly, with χ²(79) = 7.36, n.s.⁶ In the second gloss
encounter of the IGG condition (i.e., the third word encounter), a growing
number of learners thought about the meaning themselves first, as in the GRG
condition, but unlike the GRG condition, more IGG learners ignored the words
together with their glosses. The chi-square test showed that there was a significant
difference between the two conditions in the third word or the second gloss encounter,
with χ²(79) = 11.28, p < .05. In contrast, the chi-square tests between the IGG
condition and the GGG condition either in the second gloss encounter or in the third
word encounter did not reach a significant difference, with χ²(79) = 6.67, n.s. and
χ²(79) = 5.99, n.s., respectively. What the IGG second gloss encounter and the GGG
second and third word encounters had in common was that they followed the previous
gloss encounter consecutively. The tendency seemed to suggest that the learners’ use
of glosses was influenced by whether glosses occurred in consecutive encounters or not.
Table 15 displays the chi-square test results between treatments in different word or
gloss encounters.
6 When the data collected from the think-aloud participants were included in the analysis, the
The results from the gloss use survey seemed to support the second hypothesis.
The learners in the inference-gloss-gloss condition tended to check the gloss directly
in the second target word encounter probably to verify their inference and might have
referred to the gloss only when they failed to retrieve the word meaning in the third
target word encounter. In the gloss-retrieval-gloss condition, learners referred to the
gloss in the first target word encounter and compared with the other conditions, more
GRG learners still did so in the third target word encounter probably to verify their
retrieval. In the full glossing condition, learners referred to the gloss in the first
target word encounter, and their attention to glosses decreased in the following two
target word encounters.
Reading Comprehension
Research question 3: What is the effect of the lexical interventions on text
comprehension?
To answer this question, all the data from the participants who did not think
aloud were analyzed quantitatively and all the data from the think-aloud participants
were analyzed qualitatively. The quantitative analyses showed that the treatment
conditions made little difference to the participants’ reading
comprehension (see Table 16). Although the average score in the full glossing
condition was a little higher than that in the other two treatments, especially the IGG
condition, the ANOVA showed that there was no significant difference between
treatment conditions, F(6, 226) = 1.99, n.s. As had been expected, full glossing,
though yielding the least word learning, contributed to reading comprehension quite
well. However, the tendency went against the hypothesis that inference-gloss-gloss
might result in the best reading comprehension and that the gloss-retrieval-gloss
condition might hinder comprehension the most.
Table 16
Means and standard deviations of reading comprehension
        IGG (N = 40)   GRG (N = 39)   GGG (N = 39)   Total (N = 118)
Mean    2.35           2.54           2.82           2.57
SD      .74            .91            1.00           .90
Note. Maximum score was four.
Although the treatment conditions did not affect the reading comprehension
score significantly, the think-aloud protocol showed that learners might have
interpreted a proposition incorrectly when the text was not assisted with glosses.
Take the following excerpt for example.
Excerpt 1 (wrong interpretation)
Input passage: However, if the way the craquelures go through the signature of a painting is like how they go through the rest of the paint, the signature is more likely to be real.
Learner’s interpretation: rán ér rú guǒ [pause] zhè gè de fāng shì, go through [pause] chuàng zuò, qiān míng de fāng shì, yě xiàng tā men chuàng, go through the rest of the [pause] yòng qiān míng de fāng shì hé tā men, zài huà zuò pǐn de fāng shì shì yī yàng de, nà me zhè gè qiān míng, nà me zhè gè, zhè gè shǔ míng, kě néng jiù shì, zhēn de.⁷ (GRG2)
Learner’s interpretation translated in English: However if [pause] this way,