This chapter presents the results of this research in order to answer research questions 2 to 4. First, the statistical results of both groups’ proficiency tests are reported and compared. Second, qualitative data on the EG participants’ strategy use in English speaking are examined by analyzing transcripts of nine EG participants’ posttest performances, their self-interpreting assignments, and in-class discussions. Transcripts, however, can show only the surface use of strategies. To probe further into the inner mechanisms and metacognitive aspects of strategy use, insights and comments from the EG participants’ written reflections, retrospective interviews, individual interviews, and group interviews are also analyzed. Finally, to understand the EG participants’ perceptions of the interpreting training, quantitative data from the End-of-term Questionnaire are cross-referenced with comments from their group interviews.
The Participants’ Proficiency Gains
To answer research question 2, “How does this Chinese-to-English (C-to-E) interpreting strategy training affect the L2 learners’ English oral proficiency?”, interrater reliability and equivalent forms reliability were first computed, followed by comparisons of the two groups’ pretest and posttest mean total scores, their mean scores on specific aspects of proficiency, and the subgroups’ mean total scores.
Interrater reliability. There are two types of interrater reliability: consistency estimates and consensus estimates (Salkind, 2007). The former focuses on whether the raters’ differences in applying a scoring rubric are systematic, while the latter emphasizes exact agreement in the raters’ interpretations of a scoring rubric (Salkind, 2007). The consistency estimate was of greater concern in this study, since it was the average of the two raters’ scores, rather than the scores of either rater alone, that was used in subsequent analyses. However, both kinds of estimates were computed to understand the consistency and agreement between the two raters.
Intraclass correlation coefficients (ICCs) were computed because the ICC is the most conservative and best measure of interrater reliability for interval data (Salkind, 2010, p. 627). For interpreting ICC values, Cicchetti (1994) states that when the value is “between .60 and .74, the level of clinical significance is good” (p. 286); values of .75 and above are classified as excellent. In this study, the ICC showed excellent consistency between the two raters (ICC = .92) and good absolute agreement (ICC = .66). This means that the two raters were highly consistent and systematic in how they applied the scoring rubric, and their interpretations of the descriptors of the four judging criteria were similar. In other words, even where the two raters’ scores differed, the difference was consistent. This may justify my decision to use the average of the two raters’ scores in subsequent analyses, rather than re-rating the speech samples with score discrepancies or having a third rater score them.
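The difference between a consistency ICC and an agreement ICC can be made concrete with a small computation. The sketch below assumes the standard two-way, single-measure ICC formulas computed from ANOVA mean squares; the text does not state which ICC model or form was used in this study, so the function name `two_way_icc` and the scores are hypothetical and purely illustrative.

```python
def two_way_icc(scores):
    """Single-measure consistency and absolute-agreement ICCs (two-way model).

    scores: list of rows, one row per speech sample, one column per rater.
    Returns (consistency, agreement). Illustrative sketch only.
    """
    n = len(scores)            # number of speech samples
    k = len(scores[0])         # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into samples, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Consistency: a constant offset between raters does not lower the value.
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    # Absolute agreement: systematic rater differences (ms_cols) are penalized.
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, agreement


# Hypothetical scores: rater B is always exactly one point above rater A.
samples = [[4, 5], [5, 6], [6, 7], [7, 8], [8, 9]]
consistency, agreement = two_way_icc(samples)
print(round(consistency, 2), round(agreement, 2))  # 1.0 0.83
```

Because rater B sits a constant one point above rater A, the consistency ICC is 1.0 while the agreement ICC drops to about .83, an exaggerated version of the pattern reported above (.92 versus .66). When every sample is scored by both raters, such a constant offset shifts all averaged scores equally, which fits the rationale given above for prioritizing the consistency estimate.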
Equivalent forms reliability. To check equivalent forms reliability, independent t-tests were conducted on the scores of Test A and Test B. There was no significant difference between the two forms in either the pretest (M = 70.99, SD = 9.32, n = 32 for Test A; M = 72.69, SD = 9.87, n = 35 for Test B; t(65) = .72, p = .474) or the posttest (M = 74.94, SD = 9.27, n = 35 for Test A; M = 72.79, SD = 9.17, n = 32 for Test B; t(65) = .95, p = .345).
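Because the means, standard deviations, and sample sizes are reported in full, the t values above can be recomputed directly from them. The sketch below assumes Student's pooled-variance t-test, which is consistent with the integer df of 65 (n1 + n2 - 2); the function name `pooled_t_from_summary` is hypothetical, and it returns only t and df. For p-values, `scipy.stats.ttest_ind_from_stats` accepts the same summary statistics.

```python
import math


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (equal variances assumed).

    Works from summary statistics only. Returns (t, df); a positive t
    means group 1 has the higher mean.
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Pretest: Test B versus Test A, using the reported means, SDs, and ns.
t_pre, df_pre = pooled_t_from_summary(72.69, 9.87, 35, 70.99, 9.32, 32)
# Posttest: Test A versus Test B.
t_post, df_post = pooled_t_from_summary(74.94, 9.27, 35, 72.79, 9.17, 32)
print(f"pretest  t({df_pre}) = {t_pre:.2f}")    # pretest  t(65) = 0.72
print(f"posttest t({df_post}) = {t_post:.2f}")  # posttest t(65) = 0.95
```

Applied to the EG and CG pretest totals reported in the next paragraph (73.22, 9.51, 43 versus 69.47, 9.42, 24), the same function gives t(65) ≈ 1.55. Small discrepancies in other comparisons would be expected, since the reported means and SDs are rounded to two decimals.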
Comparison of the two groups’ total scores. To examine between-group differences in total scores, independent t-tests were conducted to compare the experimental and control groups’ mean scores on the pretest and on the posttest. There was no significant difference in the pretest mean scores between the EG (M = 73.22, SD = 9.51) and the CG (M = 69.47, SD = 9.42); t(65) = 1.55, p = .125. This result suggests that the participants in both groups started off at a comparable level of speaking proficiency. However, no significant difference was found in the posttest mean scores between the EG (M = 75.23, SD = 9.02) and the CG (M = 71.54, SD = 9.27) either; t(65) = 1.59, p = .117. This result implies that the C-to-E interpreting training did not give the EG a significant advantage over the CG in overall oral proficiency.
To examine within-group differences, paired t-tests were conducted to compare the pretest and posttest mean scores of the EG and of the CG. For the EG, the posttest mean score was significantly higher than the pretest mean score; t(42) = 2.38, p = .022. For the CG, the posttest mean score was also higher than the pretest mean score, but the difference did not reach significance at the .05 level; t(23) = 1.94, p = .065. Taken together, the between-group and within-group comparisons of the pretest and posttest mean scores, shown in Table 5.1, suggest that the EG’s significant improvement in overall oral proficiency relative to its own pretest mean cannot be directly attributed to the interpreting training, although it may be related to it.
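The distinction between the within-group and between-group tests can be seen in how the paired statistic is built. A paired t-test works on each learner's own posttest-minus-pretest difference, so its degrees of freedom are n - 1 (42 for the EG with 43 learners, 23 for the CG with 24), and it tests change within one group rather than the effect of the treatment relative to the other group. The sketch below is illustrative; the function name `paired_t` and the five learners' scores are hypothetical. `scipy.stats.ttest_rel(post, pre)` should give the same t together with a p-value.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-samples t statistic for post - pre; returns (t, df)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same learners")
    diffs = [b - a for a, b in zip(pre, post)]  # each learner's own gain
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD with an n - 1 denominator
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical pretest and posttest totals for five learners.
pre = [60, 65, 70, 72, 80]
post = [62, 66, 73, 72, 82]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.2f}")  # t(4) = 3.14
```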
Table 5.1
Means (M) and Standard Deviations (SD) of the Pretest and Posttest Scores for Both the Experimental (EG) and the Control Groups (CG), as well as Between-group and Within-group t-values
To see if the interpreting training exerted influence on detailed aspects of oral proficiency, the mean scores of each of the four judging criteria in the pretest and the posttest were compared between the two groups and within each group, as shown in Table 5.2.
Independent t-tests showed no significant between-group differences in the pretest on any of the four criteria (t(65) = .98, p = .333 for Fluency, t(65) = 1.63, p = .109 for Coherence, t(65) = 1.82, p = .073 for Lexical resource, and t(65) = 1.69, p = .095 for Grammatical range and accuracy). Nor did the posttest show any significant between-group differences on these four criteria (t(65) = 1.49, p = .142 for Fluency, t(65) = 1.78, p = .080 for Coherence, t(65) = 1.39, p = .171 for Lexical resource, and t(65) = 1.54, p = .130 for Grammatical range and accuracy). These results once again indicate that the two groups were comparable in oral proficiency at the starting point, and that any improvement in the posttest could not be directly attributed to the interpreting training.
Table 5.2
Means (M) and Standard Deviations (SD) of the Four Judging Criteria in the Pretest and Posttest for Both the Experimental (EG) and the Control Groups (CG), as well as Between-group and Within-group t-values
EG (n = 43) CG (n = 24)
Notes. * p < .05. The maximum total score for each criterion is 27 (the maximum score for each criterion was 9 in each of the three parts of the speaking test; 9 × 3 = 27).
F = Fluency; C = Coherence; L = Lexical resource; G = Grammatical range and accuracy.
On the other hand, paired t-tests revealed notable within-group differences in both groups. For the CG, two criteria showed significant improvement. The mean score for Lexical resource in the posttest (M = 17.98, SD = 2.45) was significantly higher than that in the pretest (M = 17.42, SD = 2.33); t(23) = 2.21, p = .037. Similarly, the mean score for Grammatical range and accuracy in the posttest (M = 18.44, SD = 2.03) was significantly higher than that in the pretest (M = 17.71, SD = 2.25); t(23) = 2.57, p = .017.
In contrast, the EG showed significant improvement on three criteria. Its mean score for Fluency increased significantly from M = 17.80 (SD = 2.59) in the pretest to M = 18.44 (SD = 2.44) in the posttest; t(42) = 2.58, p = .013. In addition, its mean score for Coherence rose significantly from M = 18.23 (SD = 2.54) in the pretest to M = 18.72 (SD = 2.36) in the posttest; t(42) = 2.22, p = .032. As in the CG, its mean score for Grammatical range and accuracy also improved significantly, from M = 18.68 (SD = 2.25) in the pretest to M = 19.24 (SD = 2.06) in the posttest; t(42) = 2.48, p = .017.
These results may imply that communicative language classes, with or without the interpreting treatment, could enhance grammatical range and accuracy, and that the class without the interpreting training could also enhance lexical resource, whereas the class with the interpreting component might be particularly effective in enhancing the learners’ fluency and coherence.
In addition to the mean scores of the four judging criteria, the scores for each of the three parts of the speaking test, and for the same four criteria within each part, were examined to see if the interpreting training exerted influence on these finer-grained items. The main purpose of this examination was to see if the interpreting treatment had an impact on different task types.
As shown in Table 5.3, independent t-tests on between-group differences showed that the EG performed significantly better than the CG in the pretest on three items: Part 2 as a whole (t(65) = 2.02, p = .048), Lexical resource in Part 2 (t(65) = 2.42, p = .018), and Grammatical range and accuracy, also in Part 2 (t(65) = 2.54, p = .013). Although these results indicate that the two groups did not start off at exactly the same level, the items on which the EG significantly outperformed the CG in the pretest were all limited to Part 2. This suggests that, given one minute of planning time, the EG had an edge in narrative and descriptive discourse in the pretest, whereas in the more responsive and spontaneous Q&A format of Part 1 and Part 3, the two groups showed no significant difference at the starting point.
In the posttest, on the other hand, the EG significantly outperformed the CG only on Fluency in Part 2 (EG: M = 6.33, SD = .81; CG: M = 5.88, SD = .78); t(65) = 2.24, p = .029. This result suggests that the C-to-E interpreting treatment was the cause of the improvement in fluency, but only under a less spontaneous condition with a narrative/descriptive task type.
In terms of within-group differences, paired t-tests showed that the EG’s improvement outshone the CG’s. For the CG, the only item that improved significantly in the posttest was Grammatical range and accuracy in Part 2 (t(23) = 3.76, p = .001), indicating that the CG’s improvement was limited to the grammatical aspect of narrative and descriptive discourse when the learners had time to plan their responses; the improvement did not extend to the more spontaneous Q&A format of Part 1 or Part 3.
The EG, on the other hand, showed significant improvement in the posttest on four items. Fluency in Part 1 improved significantly from M = 5.88 (SD = 1.11) in the pretest to M = 6.14 (SD = .97) in the posttest; t(42) = 2.19, p = .034. The other three items were all related to Part 3. First, the mean score for Part 3 as a whole rose significantly from M = 23.56 (SD = 4.45) in the pretest to M = 24.59 (SD = 3.52) in the posttest; t(42) = 2.20, p = .034. Similarly, the mean score for Coherence in Part 3 increased significantly from M = 5.87 (SD = 1.21) in the pretest to M = 6.21 (SD = 1.03) in the posttest; t(42) = 2.68, p = .010. Finally, the mean score for Grammatical range and accuracy in Part 3 went up significantly from M = 6.00 (SD = 1.09) in the pretest to M = 6.24 (SD = .80) in the posttest; t(42) = 2.02, p = .050.
It was hypothesized that the interpreting training would be more beneficial to language learners when they have to deal with questions requiring deeper reflection and reasoning, such as those in Part 3. According to data from the After-test Self-evaluation Questionnaire administered after both the pretest and the posttest, Part 3 was indeed perceived to be the hardest. Part 3 (M = 4.61, SD = 1.31, N = 135) was rated significantly more challenging than Part 2 (M = 4.19, SD = 1.29, N = 135), t(134) = 3.82, p < .001, and Part 2 was in turn rated significantly harder than Part 1 (M = 3.92, SD = 1.26, N = 135), t(134) = 2.85, p = .005. The fact that the EG’s within-group improvements were mainly in Part 3 (Part 3 as a whole, Coherence in Part 3, and Grammatical range and accuracy in Part 3), while none of the CG’s within-group improvements were in Part 3, suggests that the interpreting treatment may facilitate the learners’ oral output on more complicated topics that require comparing and contrasting or justifying opinions. However, it is important to bear in mind that these within-group results do not establish the interpreting training as their direct cause.
Table 5.3
Means (M) and Standard Deviations (SD) of Each Part (P) and the Four Judging Criteria Under Each Part in the Pretest and Posttest for Both the Experimental (EG) and the Control Groups (CG), as well as Between-group and Within-group t-values
EG (n=43) CG (n=24)
Notes. * p ≤ .05, ** p ≤ .01, *** p ≤ .001. The maximum total score for each part is 36 (the maximum score for each of the four criteria was 9; 9 × 4 = 36). P1 = Part 1; P2 = Part 2; P3 = Part 3. F = Fluency; C = Coherence; L = Lexical resource; G = Grammatical range and accuracy.
Comparison of the higher and lower subgroups’ total scores. It was also hypothesized that the interpreting training might be more beneficial to learners with lower oral proficiency than to those with higher oral proficiency. This is because, for unbalanced bilinguals such as lower-proficiency L2 learners, “L1 items and rules are more frequently used; therefore, they have a higher resting level of activation than L2 items and procedures” (Kormos, 2006, p. 174). Such learners may therefore have Chinese words and structures come to mind more often while speaking English, which is precisely the situation several of the interpreting strategies target. The strategies under PRINCIPLE 1. BE FLEXIBLE were designed to deal with Chinese words, phrases, and expressions that may come to mind. Likewise, the strategies under PRINCIPLE 2. ONE CHUNK AT A TIME were designed to deal with time pressure in speaking, and may also help learners handle Chinese sentences that pop up in their minds by breaking them down into manageable, simple English sentences.
To test this hypothesis, the participants with higher and lower oral proficiency in each group were first identified based on their pretest scores, using the top 35% and the bottom 35% as cutoffs. The participants’ self-reported frequencies of Chinese appearing in their minds were then calculated. Each participant’s self-reported frequency was the sum of seven items (Questions 8 to 14) rated on five-point Likert scales in the Start-of-term Questionnaire, with a maximum total of 35. The participants in both groups as a whole had a mean of 16.96 (SD = 4.68, N = 67), suggesting that Chinese sometimes appeared in the learners’ minds when they spoke English. An independent t-test showed that the EG’s lower 35% did indeed report Chinese appearing in their minds more frequently (M = 19.47, SD = 4.14, n = 15) than the EG’s higher 35% (M = 13.60, SD = 3.72, n = 15), and the difference was significant, t(28) = 4.08, p < .001.
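The subgrouping and scoring procedure can be sketched as follows. The code assumes that subgroup size is 35% of the group rounded down (rounding to the nearest integer gives the same sizes here), which yields the 15 learners per EG subgroup and 8 per CG subgroup implied by the reported degrees of freedom: t(14) and t(7) for the paired tests, and t(21) = 15 + 8 - 2 for the between-group subgroup comparisons. It also assumes Likert responses coded 1 to 5, so that the seven items range from 7 to 35. The function names, the tie-breaking rule, and the scores are hypothetical.

```python
def split_extremes(pretest_scores, proportion=0.35):
    """Return (higher, lower) subgroups of learner IDs ranked by pretest total.

    pretest_scores: dict mapping a learner ID to a pretest total score.
    Subgroup size is floor(proportion * n). Ties at the cutoff are broken
    by insertion order (Python's sort is stable), an assumption of this sketch.
    """
    k = int(len(pretest_scores) * proportion)
    if k == 0:
        raise ValueError("group too small for the requested proportion")
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


def chinese_in_mind_total(responses):
    """Sum seven five-point Likert items (Questions 8 to 14); range 7 to 35."""
    if len(responses) != 7 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected seven responses coded 1 to 5")
    return sum(responses)


# Hypothetical pretest totals for a group of 43 and a group of 24.
eg = {f"EG{i:02d}": 55 + 0.5 * i for i in range(43)}
cg = {f"CG{i:02d}": 55 + 0.5 * i for i in range(24)}
eg_high, eg_low = split_extremes(eg)
cg_high, cg_low = split_extremes(cg)
print(len(eg_high), len(eg_low), len(cg_high), len(cg_low))  # 15 15 8 8
print(chinese_in_mind_total([3, 2, 4, 3, 2, 3, 3]))           # 20
```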
In terms of proficiency scores, as illustrated in Table 5.4, independent t-tests showed no significant difference in mean scores between the higher 35% subgroups of the two groups in either the pretest (t(21) = 1.80, p = .087) or the posttest (t(21) = 1.61, p = .123), nor between the lower 35% subgroups of the two groups in the pretest (t(21) = 1.83, p = .081) or the posttest (t(21) = 1.18, p = .252). These results once again indicate that the two groups, and specifically their respective subgroups, were comparable at the starting point, and that the interpreting training did not have a significant impact on the English oral proficiency of either the EG’s higher or lower subgroup.
Table 5.4
Means (M) and Standard Deviations (SD) of the Pretest and Posttest Scores for the Higher 35% (High) and the Lower 35% (Low) Subgroups in Both the Experimental (EG) and the Control Groups (CG), as well as Between-group and Within-group t-values
In terms of within-group comparisons, paired t-tests showed that the EG’s lower 35% was the only subgroup whose posttest mean score (M = 68.25, SD = 4.23) was significantly higher than its pretest mean score (M = 63.57, SD = 4.27); t(14) = 3.65, p = .003. There was no significant difference between the pretest and posttest mean scores for the EG’s higher 35% (t(14) = .72, p = .481), or for either the higher 35% (t(7) = .35, p = .735) or the lower 35% (t(7) = 2.06, p = .078) of the CG.
The fact that the EG’s lower 35% improved significantly in the posttest while its CG counterpart did not seems to indicate a trend consistent with the hypothesis: the interpreting training was more beneficial to learners with lower oral proficiency, who reported Chinese appearing in their minds significantly more frequently than those with higher oral proficiency did. However, it is important to bear in mind that these results do not establish causal relationships among a learner’s oral proficiency level, the frequency of Chinese appearing in mind, and the effectiveness of interpreting training. First, the four subgroups were small (15 learners in each EG subgroup and 8 in each CG subgroup), which limits the statistical power and generalizability of these comparisons. Second, the significant result came from a within-group comparison, so the improvement of the EG’s lower 35% cannot be directly attributed to the interpreting treatment. Therefore, it is only safe to say that the significant within-group improvement in the posttest mean score of the EG’s lower 35% might have something to do with the interpreting training, whose effectiveness might be partially related to the fact that Chinese appeared significantly more frequently in these learners’ minds when they spoke English.
In summary, the between-group comparisons indicate that the two groups were largely comparable at the outset of the experiment, suggesting that although this was a quasi-experiment with participants from intact classes, the conditions were close to those of a true experiment with randomly assigned participants. After the C-to-E interpreting training, the EG significantly outperformed the CG on Fluency in Part 2, suggesting that the interpreting treatment was the direct cause of the improvement in fluency, but only when pre-planning time was allowed and the task type was mainly narrative/descriptive.
Within-group comparisons yielded more significant differences. While the CG showed significant improvement in the posttest on two judging criteria (Lexical resource and Grammatical range and accuracy) and on one item (Grammatical range and accuracy in Part 2), the EG showed significant improvement in the posttest mean total score, on three judging criteria (Fluency, Coherence, and Grammatical range and accuracy), on four items (Fluency in Part 1; Part 3 as a whole; Coherence in Part 3; and Grammatical range and accuracy in Part 3), and in its lower 35% subgroup. In other words, the EG surpassed the CG by improving significantly on far more aspects in the posttest, suggesting that the interpreting training might be related to, although not the direct cause of, these improvements. How the different aspects of proficiency gains might be related to specific interpreting principles and strategies will be explored in Chapter 6.
The Learners’ Use and Perceptions of Strategies
To answer research question 3, “How do the L2 learners apply interpreting principles and strategies to their English speaking?”, actual strategy use extracted from the transcripts of nine EG participants’ posttest performances and self-interpreting assignments, as well as segments from in-class debates, will shed light on the learners’ application of these principles and strategies. Furthermore, the EG participants’ written reflections on the Post-task Self-evaluation Worksheets (referred to as the Worksheet hereafter), the same nine participants’ verbal comments during retrospective and individual interviews, and focus group interviews with the EG’s higher and lower 35% subgroups will give us insights into the learners’ perceived difficulties with and usefulness of strategy use, as well as their mental processes during English speaking. The learners’ strategy application and perceptions are reported following the order of the interpreting principles and strategies presented in Table 3.1.
For convenience, Table 3.1 is reproduced here.
Table 3.1
List of Interpreting Strategies for the Experimental Group
PRINCIPLES / Strategies
1. BE FLEXIBLE (靈活變通)
1-1. Use a more general term (往上搜詞): Use a term of higher rank or broader category to replace a word or a list of items/concepts.
1-2. Use a similar term (橫向搜詞): Use an approximation, a synonym, or a near equivalent term, which may be followed by synonymic phrases, examples, or explanatory remarks to enhance accuracy.
1-3. Explain (解釋): Describe one or more traits of a concrete concept/item.
1-4. Paraphrase (換句話說): Put ideas in other words.
1-4-1. Paraphrase from the opposite angle (反向操作): A term, phrase, or clause opposite from the intended message is used after “not” or “no.”
1-4-2. Use plain but clear English to disambiguate the meaning of metaphors, idioms, slang, four-character idioms (成語), euphemisms, quips, figures of speech, etc. (淺白至上)
2-3. Produce short, simple, direct, and self-contained sentences in the target language. (簡單句)
3. BE CLEAR (條理分明)
3-1. (Re)structure messages from main idea to supporting details or from general to specific. (重整思路)
3-2. Add cohesive words to explicate the logical relationships between ideas. (加銜接詞)
4. BE CONCISE (簡潔扼要)
4-1. Omit redundant, secondary, superfluous, or repetitive parts of speech. (去蕪)
4-2. Select important messages. (存菁)
The use of strategies under PRINCIPLE 1. BE FLEXIBLE. According to the learners’ comments in group interviews, the most obvious benefit of BE FLEXIBLE as a group of strategies was that they learned how to use alternatives to get their meaning across. In the past, they often got stuck on words or even avoided expressing certain ideas, but after the strategy training, they found it easier to express themselves and would try to exhaust their resources to express their intentions. As S40-L34 said:
Before I learned [these strategies], because I was thinking in Chinese, and if I didn’t know how to say something in English, I might just stop there without trying to explain, etc. I would keep deliberating over words I didn’t know [in English], but I still couldn’t produce them. Now, after this semester, I have changed. I can somewhat use another way to express what I want to say. Even if the words I produce are not very precise, I can still make myself understood. (Group interview)
Strategies 1-1 & 1-2. Use a more general term or a similar term. These two