
CHAPTER THREE METHODOLOGY

This chapter delineates the rationales for research design and procedures for data collection and subsequent statistical analyses. In section 1, the background profiles of the participants are stated. In section 2, rationales for material selection and test design are expounded in full detail. In section 3, the procedure for test administration is specified. In section 4, approaches to data collection and statistical analyses are presented.

Participants

Initially, a total of 384 students at a senior high school in central Taiwan participated in the study. The participants comprised 10th, 11th and 12th graders. They were all male, aged between 16 and 18. The whole subject pool included nine complete classes, three for each grade. At this school, students were grouped academically according to their intended fields of study. In Grade 10, there was no specific grouping for the three classes. In Grade 11, one of the three classes was a sciences class and two were social sciences classes. In Grade 12, two of the three classes were sciences classes and one was a social sciences class. Students who missed either test session were not counted as participants, and their results were excluded from the data analyses.

In addition, the DST was administered to the participants in the second half of the 2007 school year, during which time the 12th graders were preparing for the DRET scheduled for July 2007. It was possible that some of these students had taken some of the past DST exams. Due to concern about the practice effect, the test performance of participants who reported having taken the past exams was also excluded. The final subject pool for the present study thus consisted of 354 students: 124, 115 and 115 in Grades 10, 11 and 12, respectively.


Instruments

This section presents the rationales for the selection of materials for the DST and the RCT. Because the development of the self-made RCT is the main concern of the present study, more emphasis is placed upon the development of the RCT.

Materials for the Discourse Structure Test

To investigate the significance of cohesion and its hypothetically correlated dimensions in performance on the DST, materials for the DST were directly adopted from past DRET exams from 2002 to 2005. The obtained text materials consisted of six individual short passages featuring a variety of topics. Each passage served as a question set. A total of 30 DST items were offered, with five items for each of the question sets (see Appendix A).

Materials for the Rational Cloze Test

Textbook materials were used to construct the RCT in order to draw pedagogical implications, namely to show whether English teachers could likewise use textbook materials to design a cohesion-based RCT. Due to concerns about the practice effect, the textbook adopted at the school was excluded. The text materials subjected to the rational cloze procedure were three complete passages drawn from different versions of senior high school English textbooks¹. To equate the RCT with the DST, the DST texts were analyzed first, and the results were applied to the selection of materials for the RCT. The appropriateness of materials for the RCT was examined through textual analyses of text type², readability³, and vocabulary frequency⁴.

1 For studies that utilized textbook materials for test design, see Abraham and Chapelle (1992); Farhady and Keramati (1996); and Yamashita (2003).

2 A considerable body of research has indicated that the selection of appropriate texts for test design can be a significant factor determining test difficulty. See Alderson and Urquhart (1985); Jonz (1991); Klein-Braley (1997); Sasaki (2000); Shohamy (1988); and Swain (1993).

3 For past research that performed an analysis of readability on texts chosen for test design, see Baker (1989); Brown (1993); Dastjerdi and Talebinezhad (2006); Farhady and Keramati (1996); and Kobayashi (2002).

4 Little research on the design of the multiple-choice cloze format has analyzed the frequency levels of the choices. The present study aimed to explore this issue.

Initially, about ten complete passages from different readers were eligible and selected for the subsequent analysis of text type. For readability, these passages were subjected to the Flesch–Kincaid readability tests (Flesch, 1948), which could be performed via Microsoft® Word. The Flesch–Kincaid tests yield two readability indexes that can be cross-referenced: the Flesch–Kincaid Grade Level and the Flesch Reading Ease. The numerical value of the Flesch–Kincaid Grade Level corresponds to the grade level at which readers can be expected to comprehend the text. The higher the value, the higher the grade level required for proper comprehension of a chosen material. The numerical values for the Flesch Reading Ease range between 0 and 100. The higher the number, the easier a given text is. For ease of interpretation, the Flesch–Kincaid Grade Level index was primarily utilized in the study.

The results of the analyses can be seen in Table 1. In this table, exposition accounts for the majority of the text types for the DST, with a ratio of 4:6 (simplified as 2:3). By the same token, expository materials also account for the majority of the RCT texts, with the same ratio of 2:3. For readability, Table 1 shows that the DST texts as a whole exhibit a wider range of readability levels than the RCT texts. However, the mean Flesch–Kincaid Grade Level for the RCT is higher than that for the DST (9.3 and 8.65, respectively).
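The two readability indexes described above can be illustrated with a minimal sketch. It uses the commonly published Flesch Reading Ease and Flesch–Kincaid Grade Level formulas; the function names are hypothetical, the word, sentence and syllable counts must be supplied by the caller, and Microsoft® Word's own counting rules may give slightly different values. The second half simply recomputes the mean grade levels from the values reported in Table 1.

```python
from statistics import mean


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: higher scores indicate harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# F-KGL values as reported in Table 1 (DST A-F and Cloze A-C).
DST_FKGL = [7.0, 9.0, 6.1, 12.0, 10.5, 7.3]
RCT_FKGL = [10.8, 9.0, 8.1]

dst_mean = round(mean(DST_FKGL), 2)  # mean grade level of the six DST texts
rct_mean = round(mean(RCT_FKGL), 2)  # mean grade level of the three RCT texts

if __name__ == "__main__":
    print("DST mean F-KGL:", dst_mean)
    print("RCT mean F-KGL:", rct_mean)
```

Both formulas depend only on average sentence length and average syllables per word, which is why the two indexes can be cross-referenced.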


Table 1. Text Analyses of the DST and RCT Subtests

Subtest    Text Type      Words    F-KGL    FRE
DST A      exposition     208      7.0      65.1
DST B      biography      219      9.0      56.4
DST C      description    307      6.1      71.6
DST D      exposition     255      12.0     34.3
DST E      exposition     287      10.5     47.6
DST F      exposition     265      7.3      65.8
Cloze A    biography      680      10.8     50.6
Cloze B    exposition     614      9.0      54.6
Cloze C    exposition     714      8.1      60.2

Note.

1. The Flesch–Kincaid Grade Level is abbreviated as F-KGL.

2. The Flesch Reading Ease is abbreviated as FRE.

To further examine the use of vocabulary, the Lexical Frequency Profile measure (Laufer & Nation, 1995) was applied to all the subtexts of the DST and the RCT, respectively. Because the two tests differed in total number of words, the proportions of words falling within each vocabulary band were compared rather than the raw counts. As shown in Tables 2 and 3, the proportions of words covered by the first two word lists (i.e., K1 and K2) or the first three (i.e., K1, K2 and AWL) are closely comparable for the texts of both tests. Analyses of lexical frequency for individual subtexts are presented in Appendix B.


Table 2. An Analysis of the Lexical Frequency Profile on the DST Text

Word List    Families    Types    Tokens    Percentage (%)    Cumulative (%)
K1           315         418      1233      80.01             80.01
K2           64          73       113       7.33              87.34
AWL          49          50       66        4.28              91.62
Off-List     -           80       129       8.37              100

Note. The sum of tokens would be equal to the total number of words in the text (n = 1541).

Table 3. An Analysis of the Lexical Frequency Profile on the RCT Text

Word List    Families    Types    Tokens    Percentage (%)    Cumulative (%)
K1           388         514      1644      81.87             81.87
K2           79          88       103       5.13              87.00
AWL          47          50       69        3.44              90.44
Off-List     -           131      192       9.56              100

Note.

1. The K1 list covers the first most frequent 1000 words, and the K2 involves the second 1000 words.

2. K1 and K2 typically include words in the General Service List (West, 1953).

3. The Academic Word List (AWL) consists of important words for reading at university level (Coxhead, 2000), a replacement of the original University Word List (Xue & Nation, 1984).

4. The Off-List represents vocabulary excluded from the first three lists.

5. The sum of tokens would be equal to the total number of words in the text (n = 2008).
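The percentage columns of Tables 2 and 3 follow directly from the token counts, as the following minimal sketch shows. The function name `profile` and the dictionary layout are hypothetical; the token counts are those reported in the two tables.

```python
def profile(tokens_by_band):
    """Return (total, rows); each row is (band, tokens, percent, cumulative percent)."""
    total = sum(tokens_by_band.values())
    rows = []
    running = 0
    for band, tokens in tokens_by_band.items():
        running += tokens
        percent = round(100 * tokens / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((band, tokens, percent, cumulative))
    return total, rows


# Token counts as reported in Table 2 (DST) and Table 3 (RCT).
DST_TOKENS = {"K1": 1233, "K2": 113, "AWL": 66, "Off-List": 129}
RCT_TOKENS = {"K1": 1644, "K2": 103, "AWL": 69, "Off-List": 192}

dst_total, dst_rows = profile(DST_TOKENS)
rct_total, rct_rows = profile(RCT_TOKENS)

if __name__ == "__main__":
    for name, rows in (("DST", dst_rows), ("RCT", rct_rows)):
        print(name)
        for band, tokens, percent, cumulative in rows:
            print(f"  {band:<9}{tokens:>6}{percent:>8.2f}{cumulative:>8.2f}")
```

Because this sketch cumulates raw token counts before rounding, a cumulative value may differ from the tabled one in the last digit (for instance, 87.35 rather than 87.34 for K1 plus K2 on the DST); the tables appear to sum the already rounded percentages.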

Development of the Rational Cloze Test

This subsection first provides rationales for the specific item types adopted for the RCT. The execution of the rational deletion procedure is then presented. Finally, the construction of distractors is specified.

Item Type

Cohesion is typically divided into five major types: reference, substitution, ellipsis, conjunction, and lexical cohesion (Halliday & Hasan, 1976). Of the five, ellipsis and substitution tend to be more characteristic of spoken discourse and associated with informal contexts (Thompson, 1996, 1999), which may account for their relatively fewer occurrences in written discourse. The researcher found that the marked scarcity of ellipsis and substitution in the chosen texts would render RCT subtests on these types difficult and impractical to design. Had such subtests been constructed, the items would have been too few and underrepresentative for testing purposes. Therefore, for theoretical and practical reasons, reference, lexical cohesion and conjunction were adopted in the present study as the major item types for the RCT subtests. Reference included three subcategories: personal reference (PR), demonstrative reference (DR) and comparative reference (CR). Lexical cohesion involved two subcategories: reiteration (REI) and collocation (COL). Conjunction featured four subcategories: adversative (ADV), temporal (TEM), additive (ADD) and causative (CAU).

Different genres, text types and topics may exhibit disparate amounts and proportions of cohesive ties (Halliday, 1994; Tierney & Mosenthal, 1981). Concerning item design, the appropriateness and utility of a given text for assessing a specific cohesion type were therefore examined in terms of the total number of cohesive ties identified in the passage. Cloze A, with the topic “England’s Greatest Playwright,” was selected to measure knowledge of reference; Cloze B, with the topic “A Prescription for Mozart,” was intended to assess knowledge of lexical cohesion; and Cloze C, with the topic “Writing Is a Process,” was intended to test knowledge of conjunction.

Rational Deletion Procedure

Within each cloze subtest, a presupposed item (i.e., typically an antecedent) and its presupposing counterpart, which together constituted a specific type of cohesive chain, were first identified. For example, the following is a stretch of adjacent sentences from the text of Cloze A:

William Shakespeare was born at Stratford-upon-Avon, a fairly important English market town about 80 miles northwest of London, in 1564. This town, which lies in the beautiful green valley of the River Avon, is one of the oldest towns in England.

Within the two adjacent sentences, “a fairly important English market town” serves as the presupposed item to which the presupposing item “This town” refers. Such a cohesive chain spans sentence boundaries (i.e., it is intersentential), which was intended to match the context of the DST, where successful task performance was assumed to involve intersentential reading. Accordingly, the presupposing item served as a target item and was accompanied by a set of multiple-choice alternatives.

The researcher worked with two other graduate students, in linguistics and TESOL respectively, to examine the execution of the rational deletion procedure and to ensure the legitimacy and validity of specific cohesive ties. Items whose cohesion types were problematic to code were resolved and modified through detailed discussion. The categorization of cohesion types for each RCT item is presented in Appendix C.

Design of Distractors

For a given item, distractors were designed and controlled according to two criteria: a) part of speech, and b) vocabulary frequency. To reduce possible guessing effects, a target choice and its accompanying distractors were made syntactically equivalent, so that test-takers could not gain conspicuous clues from easily detectable syntactic differences. Participants were expected to make reasonable judgments or intelligent guesses by reading the text within a broader context. They might need to read intersententially and identify the missing cohesive chains for successful restoration. Take Item 6 in Cloze A for instance:

One of the reasons for Shakespeare’s worldwide appeal is the number and variety of characters he created. __6__ may include persons of all types, who came from all walks of life.

(A) It (B) He (C) Another (D) They

In this item, each of the four choices is feasible for closure when a test-taker resorts merely to intrasentential reading. However, (D) is semantically the most desirable option when the preceding sentence is considered. The target choice “They” refers anaphorically to “characters,” forming a cohesive tie of PR. In the same vein, for Item 23 in Cloze C, intersentential reading becomes even more critical for judgment:

Writing well is seldom the result of a one-time effort. __23__, it’s a continual struggle. Why? It’s because writing is a process which involves searching, planning, organizing, and, above all, rewriting.

(A) Likewise (B) Instead (C) Therefore (D) Next

Syntactically, all the alternatives are feasible when the sentence containing the target item is taken independently. Nevertheless, semantically the surrounding sentences render (B) the more appropriate choice.

After the construction of target choices and the accompanying distractors, all the alternatives were subjected to analyses of lexical frequency against three word lists, organized in two sets, to show whether any modification was needed. One set was a package comprising the General Service List (West, 1953) and the Academic Word List (Coxhead, 2000)⁵. The other was the CEEC High School English Word List (Jeng, 2002)⁶. The frequency analysis aimed to avoid giving unintended clues through apparent differences in lexical frequency between a target choice and its distractors. That is, if test-takers resorted to guessing or elimination, they might either adopt or abandon a seemingly low-frequency choice as a test-taking strategy. The results of the frequency analyses can be seen in Appendixes D and E.

5 The software for this package is available at http://www.lextutor.ca/vp/eng/

6 The PDF file for the list is available at http://www.ceec.edu.tw/Research/paper_doc/ce37/ce37.htm

After the rational procedure had been applied to the design of target choices and distractors, there were a total of 30 items for the RCT, 10 for each of the three subtests. For purposes of test equating, the number of RCT items equaled that of the DST. The final version of the RCT is presented in Appendix F.
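The frequency check on target choices and distractors described in this subsection can be sketched as a simple band comparison. Everything below is a hypothetical illustration: the function names are invented, and `TOY_BANDS` is a tiny hand-made lookup standing in for the real word lists, not an extract from them. The options are those of Item 6 in Cloze A.

```python
def band_spread(options, band_of):
    """Return the set of frequency bands covered by an item's options."""
    return {band_of.get(word.lower(), "Off-List") for word in options}


def needs_review(options, band_of):
    """Flag an item whose options do not all fall within the same band."""
    return len(band_spread(options, band_of)) > 1


# Hypothetical lookup for illustration only; a real check would load the
# General Service List, the Academic Word List and the CEEC list instead.
TOY_BANDS = {"it": "K1", "he": "K1", "another": "K1", "they": "K1"}

item_6 = ["It", "He", "Another", "They"]

if __name__ == "__main__":
    print("Item 6 bands:", band_spread(item_6, TOY_BANDS))
    print("Item 6 needs review:", needs_review(item_6, TOY_BANDS))
```

An item is flagged only when its options span more than one band, which is the situation in which a seemingly low-frequency choice might be adopted or abandoned as a guessing strategy.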

Procedures

The DST and the RCT were administered to the participants in two separate classroom periods, with the exact administration dates scheduled for individual classes. The DST session was held first, followed by the RCT session. A pre-trial of both tests indicated that each required around thirty to forty-five minutes. The formal sessions lasted a maximum of fifty minutes, which sufficed for test completion. Based on the assumption of equivalent tests (Henning, 1987), the interval between the first session and the second was kept as short as possible. Because of school administrative constraints, the interval between the sessions was set within two weeks.

Data Collection and Analysis

Upon test completion, the data were immediately collected and first checked for correct answers by the participants. The same data were double-checked by the researcher to avoid any scoring errors. A dichotomous scoring scheme was applied to both tests. One point was assigned to a correct response, while a zero was recorded for an incorrect response. The full score for each test was 30, with a maximum of 10 for each of the three RCT subtests, and a maximum of 5 for each of the six DST subtests.
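The dichotomous scoring scheme can be sketched as follows. The function names and the answer key are hypothetical and serve only to illustrate the scoring rule and the subtest maxima (10 items per RCT subtest, 5 per DST subtest); they are not the actual keys of either test.

```python
def score_items(responses, key):
    """Assign 1 to each correct response and 0 to each incorrect one."""
    return [1 if response == answer else 0 for response, answer in zip(responses, key)]


def subtest_totals(item_scores, items_per_subtest):
    """Sum item scores within consecutive subtests of equal length."""
    return [
        sum(item_scores[start:start + items_per_subtest])
        for start in range(0, len(item_scores), items_per_subtest)
    ]


# Hypothetical 30-item key and one participant's responses (illustration only).
example_key = list("ABCD" * 7 + "AB")
example_responses = list(example_key)
example_responses[0] = "D"   # one error in the first subtest
example_responses[29] = "A"  # one error in the last subtest

item_scores = score_items(example_responses, example_key)
rct_style_totals = subtest_totals(item_scores, 10)  # three subtests of 10 items
dst_style_totals = subtest_totals(item_scores, 5)   # six subtests of 5 items

if __name__ == "__main__":
    print("Total:", sum(item_scores))
    print("RCT-style subtest totals:", rct_style_totals)
    print("DST-style subtest totals:", dst_style_totals)
```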

Independent and Dependent Variables

For test equating, the grade level of the participants was treated as the independent variable, and the scores on the DST and the RCT served as the dependent variables. For regression analyses, the scores on the RCT and its subtests (i.e., Cloze A, Cloze B and Cloze C) served as the independent variables, and the score on the DST was the dependent variable.

Data Analysis

The overall data were analyzed in three consecutive phases. In the first phase, analyses of reliability and validity were performed. Internal consistency coefficients for both tests were computed as Cronbach’s alpha in SPSS® (version 14.0). For validity, a qualitative⁷ examination was conducted by the researcher, two graduate students in linguistics and TESOL, and three in-service English teachers. While the DST was an established test format, validity was specifically examined for the RCT, because it was a self-made, research-oriented test requiring more scrutiny. The results of the analyses may provide implications for test designers and future research.
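For readers unfamiliar with the internal consistency coefficient, the following is a minimal sketch of the standard Cronbach's alpha formula applied to a participants-by-items matrix of dichotomous scores. It is not the SPSS® routine used in the study; the function name and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for rows of participants and columns of item scores."""
    k = len(score_matrix[0])                      # number of items
    items = list(zip(*score_matrix))              # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data (illustration only): four participants, three perfectly consistent items.
toy_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

toy_alpha = cronbach_alpha(toy_scores)

if __name__ == "__main__":
    print("alpha:", toy_alpha)
```

The coefficient approaches 1 as the items covary more strongly, because the variance of the total scores then greatly exceeds the sum of the individual item variances.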

In the second phase, test equating was carried out based on classical equating methods for equivalence in means, variances and inter-form covariance. The statistical significance of the difference in means between the DST and the RCT was examined by a paired-samples t-test. The significance of the difference in variances was tested by Hartley’s F-max test. For the examination of inter-form covariance, the participants’ term grades for the latest semester served as the external criterion measure. This measure might be indicative of the participants’ current proficiency levels (see Laufer & Nation, 1999). The significance of the difference in inter-form covariance was determined by Hotelling’s (1940) t-test.
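The three equating statistics named above can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the procedure actually run for the study: the function names are hypothetical, only the test statistics and degrees of freedom are computed (no p-values), and the Hotelling statistic follows the commonly cited formula for comparing two dependent correlations that share one variable, here taken to be the external criterion.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two score lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def f_max(*groups):
    """Hartley's F-max: largest sample variance divided by the smallest."""
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)


def hotelling_t(r_xz, r_yz, r_xy, n):
    """Hotelling's t for the difference between dependent correlations.

    r_xz and r_yz are the correlations of each test form with the criterion z,
    r_xy is the correlation between the two forms, and n is the sample size.
    """
    determinant = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    t = (r_xz - r_yz) * sqrt((n - 3) * (1 + r_xy) / (2 * determinant))
    return t, n - 3


if __name__ == "__main__":
    # Toy scores for illustration only.
    form_a = [1, 2, 3, 4]
    form_b = [0, 1, 1, 3]
    print("paired t:", paired_t(form_a, form_b))
    print("F-max:", f_max(form_a, form_b))
    print("Hotelling t:", hotelling_t(0.5, 0.5, 0.6, 100))
```

Under the equivalence assumption, none of the three statistics should reach significance: the forms should have similar means, similar variances, and similar correlations with the external criterion.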

7 A frequently used quantitative approach to the construct validity of test items is factor analysis, one of the latent variable models. However, this approach rests on the assumption or axiom of local independence, i.e., test items are assumed to be independent of each other (Bartholomew & Knott, 1999; Lazarsfeld & Henry, 1968). Research on cloze has indicated that the assumption of item independence may not be justified, because deletions may still be highly interdependent with each other for successful closure (see Sasaki, 2000, p. 92). Therefore, factor analysis may not be an appropriate approach to the examination of the RCT items in the present study.

In the third phase, regression models were built for the regression of the DST on the RCT and on its three subtests, respectively. A simple linear regression model was computed for the regression of the DST on the RCT as the sole predictor. A series of multiple linear regression models and hierarchical regression models were constructed to examine the predictive power of the three RCT subtests on DST performance.
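The simple linear regression of the DST on the RCT can be sketched with ordinary least squares in closed form. The function names and the toy scores are hypothetical; the multiple and hierarchical models mentioned above extend the same idea by adding predictors and comparing the resulting R² values across nested models, which this sketch does not attempt.

```python
from statistics import mean


def simple_regression(x, y):
    """Ordinary least squares fit of y on a single predictor x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Proportion of variance in y accounted for by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_bar) ** 2 for b in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy scores for illustration only: predictor (RCT) and outcome (DST).
    rct_scores = [1, 2, 3, 4]
    dst_scores = [3, 5, 7, 9]
    slope, intercept = simple_regression(rct_scores, dst_scores)
    print("slope:", slope, "intercept:", intercept)
    print("R squared:", r_squared(rct_scores, dst_scores, slope, intercept))
```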

Summary

In this chapter, the background of the participants was stated. The rationales for the selection and construction of test instruments were specified. The self-made RCT was designed to measure knowledge of intersentential cohesion, comprising three major cohesion types: reference, lexical cohesion and conjunction. The results and discussion of research findings will be presented in the next chapter.