CHAPTER FOUR RESULTS - Effects of Lexical Inference or Retrieval Before or After Glossing on Vocabulary Learning

In this section, analyses of data are presented in four parts. First, the

participants’ immediate word gain and delayed word retention are examined through

two-way repeated measures MANOVA, one-way MANOVAs, forgetting rate analyses

and independent-samples t tests between the word gain after correct and incorrect

responses made during the unglossed word encounter. Second, analyses of the

learners’ self-report of the gloss use are presented. Third, effects of the interventions

on reading comprehension are reported quantitatively from the participants’

performance on the reading comprehension test and qualitatively from the think-aloud

data. Finally, the participants’ preference and perceived usefulness of different

intervention patterns are presented.

Vocabulary Learning Performance

Research Question 1: Which lexical intervention yields better word gain? Glossing

preceded by inference, glossing followed by retrieval, or full glossing?

To answer the first research question, all the vocabulary test scores from the participants who did not think aloud were analyzed quantitatively. Since only the data of the participants who showed no knowledge of the target words before the treatment were analyzed, the participants’ baseline target word knowledge was assumed to be zero and any knowledge they showed on the vocabulary tests was [...]

             IGG                       GRG                       GGG
             FR      MR      REC       FR      MR      REC       FR      MR      REC
Immediate    44.00%  68.75%  74.38%    44.63%  75.75%  80.50%    33.88%  55.75%  66.00%
Delayed      15.63%  44.88%  57.50%    19.00%  58.13%  66.63%    15.00%  39.13%  53.50%

Note. Maximum word gain score was eight in each cell. Assessment tasks: form recall (FR), meaning recall (MR) and meaning recognition (REC).

5 According to Knight (1994), the learning percentage is calculated by dividing the difference between the exposure mean score and the no-exposure mean score by the number of target words and then [...]

Judging from the mean scores, gloss-retrieval-gloss (GRG) yielded the most word gain and retention on all of the tests, followed by inference-gloss-gloss (IGG), while full glossing (GGG) yielded the least. In the full glossing condition, more than one third of the word forms and more than half of the word meanings were recalled, and almost two thirds of the word meanings were recognized on the immediate measures. Two weeks later, 15% of form and almost 40% of meaning were retained and more than half of the meaning was still recognized. In the second best condition, inference-gloss-gloss, the immediate word gain was 44% in form recall, more than 68% in meaning recall and almost 75% in meaning recognition, and the gain was retained at more than 15% in form recall, almost 45% in meaning recall and 57.5% in meaning recognition. In the best condition, gloss-retrieval-gloss, more than 44% of form and 75% of meaning were recalled and as much as 80.5% of meaning was recognized immediately; 2 weeks later, almost one fifth of form was retained, and more than 58% and 66% of meaning was recalled and recognized respectively.

The two-way repeated measures MANOVA showed that there were significant main effects for treatment conditions, F(6, 226) = 2.94, p < .01, and time, F(3, 113) = 61.43, p < .01, but no significant interaction between the two factors, F(6, 226) = .90, n.s.

The effect for treatment conditions was further examined through tests of between-subject effects. They revealed significant differences on the meaning recall test, F(2, 115) = 8.32, p < .01, and the meaning recognition test, F(2, 115) = 4.31, p < .05, but not on the form recall test, F(2, 115) = 2.15, n.s. The differences in the two meaning-related tests were examined via Scheffé’s multiple comparisons. These revealed that, as had been expected, the GRG condition was significantly better than the GGG condition in meaning recall and recognition, but that, contrary to the hypothesis, the IGG condition was not significantly better than the GGG condition (see Table 6).

Table 6

Overall multiple comparisons between treatments

               Form Recall          Meaning Recall       Meaning Recognition
Treatments     Mean Diff.   Sig.    Mean Diff.   Sig.    Mean Diff.   Sig.
IGG vs. GRG    -.16         .12     -.82         .11     -.61         .27
IGG vs. GGG    .43          .34     .75          .15     .49          .42
GRG vs. GGG    .59          .14     1.56         .00**   1.10         .02*

The finding that the GRG learners outperformed the GGG learners in meaning learning but not in form learning is consistent with Nation (2001). Nation mentioned that the most effective kind of form learning was implicit learning involving noticing, such as repeated word encounters, while the most effective type of meaning learning was strong explicit learning that required depth of processing. Since GRG and GGG may differ only in depth of processing, not in the number of word repetitions, the treatment differences were observed only in meaning recall and recognition.

The effect for time was examined in the same way. The tests showed that the amount of word knowledge varied significantly across time in all three types of tests: the form recall test, F(1, 115) = 165.08, p < .01, the meaning recall test, F(1, 115) = 66.49, p < .01, and the meaning recognition test, F(1, 115) = 36.18, p < .01.

To further examine the treatment differences at different test periods, two one-way MANOVAs were conducted: one with word knowledge measured immediately and the other with word retention measured 2 weeks later. For the immediate measures, significant effects were observed on the form recall test, F(2, 115) = 3.09, p < .05, the meaning recall test, F(2, 115) = 6.97, p < .01, and the meaning recognition test, F(2, 115) = 3.81, p < .05. Scheffé’s multiple comparisons revealed that the GRG condition was significantly better than the GGG condition in meaning recall and recognition (see Table 7).

Table 7

Multiple comparisons between treatments on immediate posttests

               Form Recall          Meaning Recall       Meaning Recognition
Treatments     Mean Diff.   Sig.    Mean Diff.   Sig.    Mean Diff.   Sig.
IGG vs. GRG    -.05         .99     -.56         .43     -.49         .51
IGG vs. GGG    .83          .12     1.04         .06     .67          .28
GRG vs. GGG    .86          .09     1.60         .00**   1.15         .03*

For the delayed measures, a significant effect was observed on the meaning recall test, F(2, 115) = 5.72, p < .01, but not on the form recall test, F(2, 115) = .64, n.s., or the meaning recognition test, F(2, 115) = 2.67, n.s. Scheffé’s multiple comparisons revealed that the GRG condition was significantly better than the GGG condition in meaning recall (see Table 8).

Table 8

Multiple comparisons between treatments on delayed posttests

               Form Recall          Meaning Recall       Meaning Recognition
Treatments     Mean Diff.   Sig.    Mean Diff.   Sig.    Mean Diff.   Sig.
IGG vs. GRG    -.27         .67     -1.07        .46     -.73         .29
IGG vs. GGG    .05          .99     .46          .61     .32          .79
GRG vs. GGG    .32          .58     1.53         .00**   1.05         .08

Table 9

The forgetting rate (FR) on each test under each treatment

               Form Recall            Meaning Recall         Meaning Recognition
Treatments     Mean Diff.   FR        Mean Diff.   FR        Mean Diff.   FR
IGG            2.27         64.49%    1.91         34.73%    1.35         22.69%
GRG            2.05         57.42%    1.41         23.27%    1.11         17.24%
GGG            1.51         55.72%    1.33         29.82%    1.00         18.94%

The participants’ immediate and delayed vocabulary performances further underwent forgetting rate analyses. Based on Groot (2000), the forgetting rate was calculated by dividing the difference between the mean score on the immediate posttest and the mean score on the delayed posttest by the mean score on the immediate posttest and then converting the quotient into a percentage. Table 9 shows that in all of the treatments the greatest rates of attrition occurred on the form recall test (55.72%~64.49%), followed by the meaning recall test (23.27%~34.73%) and the meaning recognition test (17.24%~22.69%). Among the treatments, all of the highest forgetting rates were observed in the IGG condition and almost all of the lowest forgetting rates were found in the GRG condition. The exception was the form recall test, on which the GRG rate was higher than its counterpart in the GGG condition by less than 2%. The forgetting rate difference between the GRG condition and the GGG condition was not large, ranging from 1.70% on the form recall and meaning recognition tests to 6.55% on the meaning recall test. In contrast, the forgetting rate difference between the IGG condition and the other two conditions was relatively larger. The differences between the IGG condition and the GRG condition were 5.45% on the meaning recognition test, 7.07% on the form recall test and 11.46% on the meaning recall test. The differences between the IGG condition and the GGG condition were 3.75% on the meaning recognition test, 4.91% on the meaning recall test and 8.77% on the form recall test. The smaller rate differences between the GRG and GGG conditions showed that both conditions, though the former generated the highest word gain and the latter the least, were less susceptible to attrition than the IGG condition. The GRG condition may have some superiority over the other two conditions, so its word gain and retention were the highest and most of its forgetting rates were the lowest. As for the GGG condition, its low forgetting rates could be attributed to a floor effect since it generated the least word gain.
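The forgetting rate calculation can be illustrated with a minimal Python sketch. It starts from the immediate and delayed learning percentages reported earlier in this section and assumes that the two-decimal mean scores can be recovered by multiplying each percentage by the maximum score of eight. That recovery step and all function names are illustrative assumptions, not part of the original analysis.

```python
# Immediate and delayed learning percentages reported earlier in this section.
# Keys: treatment -> test -> (immediate %, delayed %).
# FR = form recall, MR = meaning recall, REC = meaning recognition.
MAX_SCORE = 8  # maximum word gain score in each cell

PERCENTAGES = {
    "IGG": {"FR": (44.00, 15.63), "MR": (68.75, 44.88), "REC": (74.38, 57.50)},
    "GRG": {"FR": (44.63, 19.00), "MR": (75.75, 58.13), "REC": (80.50, 66.63)},
    "GGG": {"FR": (33.88, 15.00), "MR": (55.75, 39.13), "REC": (66.00, 53.50)},
}


def mean_score(percentage):
    """Assumption: recover the two-decimal mean score from a percentage."""
    return round(percentage / 100 * MAX_SCORE, 2)


def forgetting_rate(immediate_mean, delayed_mean):
    """Share of the immediate score lost by the delayed posttest, in percent."""
    return (immediate_mean - delayed_mean) / immediate_mean * 100


def forgetting_table():
    """Return {(treatment, test): (mean difference, forgetting rate)}."""
    table = {}
    for treatment, tests in PERCENTAGES.items():
        for test, (immediate_pct, delayed_pct) in tests.items():
            immediate = mean_score(immediate_pct)
            delayed = mean_score(delayed_pct)
            table[(treatment, test)] = (
                round(immediate - delayed, 2),
                round(forgetting_rate(immediate, delayed), 2),
            )
    return table


if __name__ == "__main__":
    for (treatment, test), (diff, rate) in forgetting_table().items():
        print(f"{treatment} {test}: mean difference = {diff:.2f}, forgetting rate = {rate:.2f}%")
```

Under these assumptions the sketch gives, for example, a mean difference of 2.27 and a forgetting rate of 64.49% for form recall in the IGG condition, in line with Table 9.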

Table 10

Means and standard deviations of word gain after correct and incorrect responses

[...] vocabulary gain and retention, the learners’ responses during the inferring stage in the IGG condition or the retrieval stage in the GRG condition were scored by two independent raters. Interrater reliability analyses showed that the agreement reached .80 (p < .01) in the IGG condition, .95 (p < .01) in the GRG condition, and .88 (p < .01) in the two conditions combined. Analyses of the participants’ responses during inference or retrieval revealed that the rate of correct responses was higher during retrieval (53.85%) than during inferencing (34.38%).

Overall in Table 10, the mean scores after correct responses were higher than those after incorrect responses, with the exception of the immediate meaning recognition test. In the IGG condition, higher mean scores were likewise observed after correct responses except on the immediate meaning recognition test and the delayed meaning recall test. In the GRG condition, higher mean scores were also observed after correct responses except on the immediate meaning recognition test and the delayed form and meaning recall tests.

Table 11

Independent-samples t test results between correct and incorrect responses

             Form recall                  Meaning recall               Meaning recognition
             IM            DE             IM            DE             IM            DE
Treatments   MD    Sig.    MD    Sig.     MD    Sig.    MD    Sig.     MD    Sig.    MD    Sig.
IGG          .04   .57     .10   .02*     .19   .00**   -.02  .73      -.09  .11     .09   .11
GRG          .08   .05*    -.00  .86      .01   .76     -.02  .68      -.00  .97     .05   .35
IGG+GRG      .06   .12     .05   .05*     .11   .00**   .01   .83      -.02  .57     .09   .02*

Independent-samples t tests between the word gain after correct and incorrect responses showed that significant differences were observed only when the mean score after correct responses was higher than that after incorrect responses (see Table 11). In other words, correct responses were more likely to contribute to word learning than incorrect responses. In the IGG and GRG conditions combined, correctness of the responses was a significant factor in the delayed form recall performance, p < .05, the immediate meaning recall performance, p < .01, and the delayed meaning recognition performance, p < .05. In the IGG condition, the correctness of the inferences played a significant role in the delayed form recall performance, p < .05, and the immediate meaning recall performance, p < .01. In the GRG condition, a significant difference was observed only on the immediate form recall test, p < .05. Therefore, whether the response during the inferencing or the retrieval stage was correct might be a factor in determining word gain and retention, especially in the IGG condition.

To sum up, among the three treatment conditions, the GRG condition yielded the best word gain as well as retention, and it significantly outperformed the GGG condition on the immediate meaning-related tests and on the delayed meaning recall test. Besides generating the greatest word gain and retention, the GRG condition was also generally the least subject to forgetting, as the forgetting rate analyses revealed. Examination of response correctness during inferencing or retrieval and the independent-samples t tests suggested that the superiority of the GRG condition over the IGG condition might lie in the higher rate of correct responses during retrieval in the GRG condition.

Gloss Use

Research Question 2: How do learners make use of glosses in each gloss encounter

under different lexical interventions?

The post-treatment survey showed that the participants who did not think aloud made use of glosses differently in different gloss encounters. The chi-square test showed that the participants’ gloss use differed according to the gloss encounter, with χ²(117) = 33.70, p < .01 in the GGG condition, χ²(78) = 14.54, p < .01 in the GRG condition, and χ²(80) = 10.95, p < .05 in the IGG condition.

Generally speaking, as the number of gloss encounters increased, the participants’ reliance on glosses decreased. In the first gloss encounter, the learners in the three treatment conditions tended to refer to glosses directly; in the second gloss encounter, they tended to think about the meaning first before referring to glosses. The chi-square test revealed that how glosses were used in the first two gloss encounters did not differ by treatment conditions, with χ²(118) = 6.31, n.s. in the first gloss encounter, and χ²(118) = 9.80, n.s. in the second gloss encounter. In the third gloss encounter, which only occurred in the GGG condition, far fewer learners checked the glosses directly and more learners ignored them altogether. Detailed gloss use distribution in each treatment condition is reported as follows.

Table 12

Note. Gloss use strategies: usually referred to the gloss directly (G), usually thought about the meaning first and then referred to the gloss (TG), sometimes referred to the gloss directly but sometimes thought about the meaning first before referring to the gloss (G/TG), usually thought about the meaning by myself without referring to the gloss (T), and usually kept reading without thinking about the meaning or referring to the gloss (X).

Table 12 shows that most of the GGG learners checked the gloss directly at first

sight of the target words or the glosses. Some thought about the meaning first with

the meaning unchecked or checked against the gloss, but few ignored the gloss or the

word. In the second encounter of the target words and their glosses, the majority of

the learners still checked the gloss, but more of them did not do so until they thought

about the meaning themselves first. In the third encounter of the target words and

their glosses, there were fewer learners checking the gloss directly and more learners

ignoring the words together with their glosses or thinking about the meaning with or

without the meaning checked against the gloss afterward.

Table 13

Gloss use in the GRG condition

                                        Gloss Strategies
Encounters                              G       TG      G/TG    T       X
First (N = 39)    Count                 16      3       15      1       4
                  % within encounter    41.03   7.69    38.46   2.56    10.26
                  % within strategy     69.57   21.43   53.57   12.50   80.00
                  % of total            20.51   3.85    19.23   1.28    5.13
Second (N = 39)   Count                 7       11      13      7       1
                  % within encounter    17.95   28.21   33.33   17.95   2.56
                  % within strategy     30.43   78.57   46.43   87.50   20.00
                  % of total            8.97    14.10   16.67   8.97    1.28
Total (N = 78)    Count                 23      14      28      8       5
                  % of total            29.49   17.95   35.90   10.26   6.41
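As an illustration, the counts in Table 13 can be used to recompute the chi-square statistic for gloss use across the two gloss encounters in the GRG condition. The sketch below is a plain Pearson chi-square on the 2 x 5 table of counts, written without external libraries. The function name is hypothetical, and it is only an assumption that the reported test was computed in this way. The value in parentheses in the reported χ²(78) appears to be the number of observations, since the degrees of freedom for a 2 x 5 table are 4.

```python
# Counts from Table 13 (GRG condition); columns are G, TG, G/TG, T, X.
FIRST_ENCOUNTER = [16, 3, 15, 1, 4]
SECOND_ENCOUNTER = [7, 11, 13, 7, 1]

# Standard critical value of the chi-square distribution for df = 4, alpha = .01.
CRITICAL_VALUE_DF4_P01 = 13.277


def pearson_chi_square(rows):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of equally long rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of encounter and strategy.
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    chi2, dof = pearson_chi_square([FIRST_ENCOUNTER, SECOND_ENCOUNTER])
    print(f"chi-square = {chi2:.2f}, df = {dof}")
    print("significant at .01:", chi2 > CRITICAL_VALUE_DF4_P01)
```

With these counts the statistic comes to about 14.54 with 4 degrees of freedom, which exceeds the .01 critical value and agrees with the value reported for the GRG condition.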

Table 13 shows that similar to the GGG condition, most of the GRG learners

checked the glosses directly in their first word and gloss encounter. In addition,

some of them also thought about the meaning before checking the gloss, but few

learners ignored the gloss or the word. The similarity in the gloss use between the

two conditions in this word and gloss encounter was supported by the Chi-square test,

with χ²(78) = 4.27, n.s. In the second gloss encounter of the GRG condition (i.e. the

third word encounter) after the retrieval task, the gloss use distribution was similar to

that in the second gloss and word encounter in the GGG condition, and the similarity

of the two conditions was supported in the chi-square test, χ²(78) = 6.86, n.s.

However, the distribution of the gloss use was less similar to that in the third word [...] p < .05. There were far fewer GRG learners ignoring the word together with the [...]

Chi-square value in the gloss use comparisons between treatments

Table 14 shows that the distribution of the gloss use in the first gloss encounter (i.e. the second word encounter) among the IGG learners after the inferencing task was similar to that in the first gloss encounter in the other two treatment conditions. The similarity between the IGG condition and the other two conditions in this gloss encounter was supported by the chi-square tests, with χ²(87) = .89, n.s. between IGG and GRG and χ²(79) = 4.72, n.s. between IGG and GGG. Compared with the second word and gloss encounter in the GGG condition, there were far more IGG learners referring to the gloss directly in the second word encounter. However, the chi-square test revealed that in the second word encounter, the gloss use of the IGG and GGG learners did not differ significantly, with χ²(79) = 7.36, n.s.⁶ In the second gloss encounter of the IGG condition (i.e. the third word encounter), a growing number of learners thought about the meaning themselves first, as in the GRG condition, but unlike in the GRG condition, more IGG learners ignored the words together with their glosses. The chi-square test showed a significant difference between the two conditions in the third word encounter (i.e. the second gloss encounter), with χ²(79) = 11.28, p < .05. In contrast, the chi-square tests between the IGG condition and the GGG condition, either in the second gloss encounter or in the third word encounter, did not reach significance, with χ²(79) = 6.67, n.s. and χ²(79) = 5.99, n.s., respectively. What the IGG second gloss encounter and the GGG second and third word encounters had in common was that they immediately followed the previous gloss encounter. This tendency seemed to suggest that the learners’ use of glosses was influenced by whether glosses appeared in consecutive encounters. Table 15 displays the chi-square test results between treatments in different word or gloss encounters.

6 When the data collected from the think-aloud participants were included in the analysis, the [...]

The results from the gloss use survey seemed to support the second hypothesis.

The learners in the inference-gloss-gloss condition tended to check the gloss directly

in the second target word encounter probably to verify their inference and might have

referred to the gloss only when they failed to retrieve the word meaning in the third

target word encounter. In the gloss-retrieval-gloss condition, learners referred to the

gloss in the first target word encounter and compared with the other conditions, more

GRG learners still did so in the third target word encounter probably to verify their

retrieval. In the full glossing condition, learners referred to the gloss in the first

target word encounter, and their attention to glosses decreased in the following two

target word encounters.

Reading Comprehension

Research Question 3: What is the effect of the lexical interventions on text

comprehension?

To answer this question, all the data from the participants who did not think aloud were analyzed quantitatively and all the data from the think-aloud participants were analyzed qualitatively. The quantitative analyses showed that the treatment conditions made little difference to the participants’ reading comprehension (see Table 16). Although the average score in the full glossing condition was a little higher than that in the other two treatments, especially the IGG condition, the ANOVA showed that there were no significant differences between treatment conditions, F(6, 226) = 1.99, n.s. As had been expected, full glossing, though yielding the least word learning, supported reading comprehension quite well. However, the tendency went against the hypothesis that inference-gloss-gloss might result in the best reading comprehension and that the gloss-retrieval-gloss condition might hinder comprehension the most.

Table 16

Means and standard deviations of reading comprehension

IGG (N = 40) GRG (N = 39) GGG (N = 39) Total (N = 118)

Mean 2.35 2.54 2.82 2.57

SD .74 .91 1.00 .90

Note. Maximum score was four.

Although the treatment conditions did not affect the reading comprehension

score significantly, the think-aloud protocol showed that learners might have

interpreted a proposition incorrectly when the text was not assisted with glosses.

Take the following excerpt for example.

Excerpt 1 (wrong interpretation)

Input passage: However, if the way the craquelures go through the signature of a painting is like how they go through the rest of the paint, the signature is more likely to be real.

Learner’s interpretation: rán ér rú guǒ [pause] zhè gè de fāng shì, go through [pause] chuàng zuò, qiān míng de fāng shì, yě xiàng tā men chuàng, go through the rest of the [pause] yòng qiān míng de fāng shì hé tā men, zài huà zuò pǐn de fāng shì shì yī yàng de, nà me zhè gè qiān míng, nà me zhè gè, zhè gè shǔ míng, kě néng jiù shì, zhēn de⁷. (GRG2)

Learner’s interpretation translated in English: However if [pause] this way,
