Validity issues: accuracy; quality control; timeliness; meaningfulness; misuse issues; challenges; retakes
11. Item banking: security issues; usefulness and flexibility; principles for effective item banking
12. Test technical report: systematic, thorough, detailed documentation of validity evidence; 12-step organization; recommendations
In Downing’s framework, purposes are crucial to the development of an overall
plan for the test. They are critical in determining test content, in selecting the format
of test items, and in guiding the evaluation of test uses and interpretations of scores.
Once the purposes of the test are clarified, the test content should then be precisely
specified. Clear content specifications “specify the number or proportion of items that
assess each content and process/skill area; the format of items, responses, and scoring
rubrics and procedures; and the desired psychometric properties of the items and test
such as the distribution of item difficulty and discrimination indices” (AERA, APA, & NCME, 1999, p. 183).
As Downing notes, appropriate identification of test content facilitates effective
test items, which are designed to “measure important content at an appropriate
cognitive level.” Downing emphasizes the importance of effective item creation and
describes it as “more art than science.” In many large-scale high-stakes tests, the
multiple choice format is widely used because multiple choice items can be
administered in a short time and the test taker responses can be objectively and
efficiently scored. Despite research evidence for the principles of writing effective
multiple choice items (Alderson, Clapham, & Wall, 1995; Haladyna, 2004; Haladyna
& Downing, 1989, 2004; Haladyna, Downing, & Rodriguez, 2002), the creation of
effective test items remains a challenge for test developers. As Downing states:
…The principles of writing effective, objectively scored
multiple-choice items are well-established and many of these principles have a solid basis in the research literature… Yet, knowing the
principles of effective item writing is no guarantee of an item writer’s ability to actually produce effective test questions. Knowing is not necessarily doing. Thus, one of the more important validity issues associated with test development concerns the selection and training of item writers…The most essential characteristic of an effective item writer is content expertise…many other item writer characteristics such as regional geographic balance, content subspecialization, and racial, ethnic, and gender balance must be considered in the selection of item writers.
(Downing, 2006, p. 11)
Weir’s Framework for Language Test Development
Based on current developments in theory and practice, Weir (2005) presents a
coherent “evidence-based validity” framework for language test design and
implementation, which involves providing evidence relating to context validity,
theory-based validity, criterion-based validity, scoring validity, and consequential
validity. According to Weir, the framework is specifically designed for testing English for Speakers of Other Languages (ESOL) but also applies to all forms of educational assessment. In the construction of tests, test developers must provide all five
types of validity evidence to “justify the correctness of our interpretations of abilities
from test scores” (p. 2). Among the five types of evidence, context validity and
theory-based validity evidence, which are collected before the test event, are
concerned with what abilities a test is intended to measure and how the choice of tasks
in a test is representative of the abilities required in “real life language use.” The other
three types of validity evidence (i.e., scoring validity, criterion-based validity, and
consequential validity), which are generated after the test has been administered, are
concerned with the reliability of test scores, the extent to which test scores correlate
with external criteria of real life performance, and the consequences of test use for test
stakeholders: learners, teachers, parents, government and official bodies, and the
marketplace.
Weir further describes in detail a socio-cognitive framework specifically
designed for validating reading tests, which is presented as a flowchart of boxes, from
test taker characteristics, theory-based validity, and context validity, to scoring validity,
consequential validity, and criterion-related validity. As shown in Figure 1, Weir’s
framework provides us with insights into what type of evidence can be collected at
different stages of reading test construction and how the different types of validity
evidence fit together.
Figure 1. A Socio-cognitive Framework for Validating Reading Skills
From Language testing and validation: An evidence-based approach (p.44), by C. J.
Weir, 2005. New York: Palgrave Macmillan.
Research in Second Language Reading
This section provides an overview of research in second language reading. We
will begin with theoretical accounts of reading comprehension and reading strategies,
which pave the way for a later review of research in second language reading.
Reading Comprehension and Strategy Use
Reading comprehension has been discussed from a number of perspectives. In a
review of reading comprehension research, Pressley (2000) concludes that
comprehension depends on a number of lower order processes (e.g., skilled decoding
of words) and higher order processes (e.g., relating text content to background
knowledge; use of comprehension strategies). As Pressley notes, reading
comprehension “begins with decoding of words, processing of those words in relation
to one another to understand the many small ideas in the text, and then, both
unconsciously and consciously, operating on the ideas in the text to construct the
overall meaning encoded in the text” (p. 551). Along with previous research, Pressley
confirms that accurate and fluent (automatic) word recognition is a prerequisite for
reading comprehension (Carver, 1997; LaBerge & Samuels, 1974; Perfetti, 1997;
Pressley, 1998, 2000). During the process of reading, language comprehension
processes interact with higher-level processes. Readers may automatically relate text
content to prior knowledge and/or consciously activate comprehension strategies.
When readers’ activation of schematic knowledge is relevant to the information in the
text, reading is successful. While good readers typically make inferences based on
prior knowledge directly relevant to the ideas in the text, poor readers make
“unwarranted and unnecessary” inferences by drawing on prior knowledge not
directly relevant to the most important ideas in the text (Anderson & Pearson, 1984;
Hudson, 1990; Hudson & Nelson, 1983; Rosenblatt, 1978; Williams, 1993).
In terms of strategy use, good readers use a variety of strategies, including
being aware of reading purposes, overviewing the text, reading selectively, making
associations, evaluating and revising hypotheses, revising prior knowledge, figuring
out unknown words in text, underlining and making notes, interpreting text,
evaluating the text, reviewing the text, and using the information in the text (Pressley
& Afflerbach, 1995). Given the importance of both lower order and higher order
processes in reading comprehension, Pressley further suggests that teachers promote
learners’ comprehension abilities by improving word-level competences, building
background knowledge, and promoting use of comprehension strategies.
In developing a proposed research agenda for reading comprehension, the
RAND Reading Study Group (2002) defines reading comprehension as “the process
of simultaneously extracting and constructing meaning through interaction and
involvement with written language” (p. 11). According to the proposal, three key
elements are essential in reading comprehension: the reader, the text, and the activity
(e.g., purpose for reading, processes while reading, and consequences of reading). In
reading comprehension, the three elements are interrelated within a larger sociocultural context that interacts with each of them, as illustrated in
Figure 2.
Figure 2. A Heuristic for Thinking about Reading Comprehension
From Reading for understanding: Toward a R&D program in reading comprehension (p.12), by RAND Reading Study Group, 2002. Santa Monica, CA: Science and Technology Policy Institute, RAND Education.
In this framework, good readers have a wide range of capacities and abilities,
including cognitive capacities (e.g., attention, memory, critical analytic ability,
inferencing, visualization ability), motivation (e.g., a purpose for reading, an interest
in the content being read, self-efficacy as a reader), and different types of knowledge
(e.g., vocabulary, domain and topic knowledge, linguistic and discourse knowledge,
knowledge of specific comprehension strategies). Before reading, readers have
purposes in mind. While reading, they process the text with regard to the purposes.
They construct various representations of the text that are important for
comprehension, including the surface code (e.g., the exact wording of the text), the
text base (e.g., idea units representing the meaning), and a representation of mental
models embedded in the text. Reading activities may have direct consequences in
knowledge, application, and engagement, or other long-term consequences. In reading comprehension, all three key elements (the reader, the text, and the activity) are interrelated within a sociocultural context.
Issues in Second Language Reading
In the context of second language reading, research has stressed the interactive
nature of bottom-up and top-down processing (Bernhardt, 1991; Carrell, Devine &
Eskey, 1988). While reading, readers engage in both bottom-up and top-down
processing. In bottom-up processing, readers “begin with the printed words, recognize
graphic stimuli, decode them to sound, recognize words and decode meaning.” In
top-down processing, readers “activate what they consider to be relevant existing
schemata, and map incoming information onto them” (Alderson, 2000, pp. 16-17).
Drawing on an extensive review of research in reading comprehension, Alderson
concludes that bottom-up and top-down approaches are both important in reading and
“the balance between the two approaches is likely to vary with text, reader, and
purpose” (p. 20). According to Alderson, variables that affect the nature of reading are
mainly “the interaction between reader and text variables in the process of reading”
(p. 32). Reader variables include schemata and background knowledge, knowledge of
language, knowledge of genre/text type, metalinguistic knowledge and metacognition,
content schemata, knowledge of subject matter/topic, knowledge of the world, cultural
knowledge, reader skills and abilities, reader motivation, reader affect, etc. Text
variables include text topic and content, text type and genre, text organization,
linguistic variables, text readability, typographical features, the medium of text
presentation, etc.
Bernhardt and Kamil (1995) provide a thorough review of research and claim that
second language reading is an interaction of L1 reading ability and L2 linguistic
knowledge (e.g., word knowledge and syntax). While L1 literacy accounts for 20% of the variance in L2 reading ability, L2 linguistic ability accounts for 30% of the variance (27% from word knowledge and 3% from syntax). A number of studies have
confirmed the contribution of L1 to L2 reading development (Grabe, 2009; Guthrie,
1988; Koda, 2005; Rutherford, 1983). Koda (2005) argues that L1 processing
experience has influence on the development of L2 reading skills. Grabe (2009) also
suggests that L1 reading abilities such as metalinguistic awareness and basic cognitive
skills are likely to transfer to L2 reading contexts.
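Bernhardt and Kamil's variance figures can be restated as a simple sum (a back-of-envelope summary of the percentages cited above, not a formula taken from the original study):

```latex
\underbrace{0.20}_{\text{L1 literacy}} \;+\; \underbrace{0.30}_{\substack{\text{L2 linguistic knowledge}\\ (0.27\ \text{word knowledge}\,+\,0.03\ \text{syntax})}} \;=\; 0.50
```

On this arithmetic, roughly half of the variance in L2 reading ability remains unaccounted for by L1 literacy and L2 linguistic knowledge combined.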
Alderson (1984, 2000) concludes from a number of studies that both L2
language knowledge and L1 reading knowledge are important factors in second
language reading, with L2 language knowledge being a more powerful factor than L1
reading ability. He also confirms that a “linguistic threshold” exists and that L2
learners transfer their L1 reading ability to L2 reading contexts only when they reach
a certain proficiency level. In other words, less proficient L2 learners need to improve
their linguistic knowledge so as to engage in L2 reading. Alderson (2000)
further suggests that learners’ linguistic threshold varies with task: “the more
demanding the task, the higher the linguistic threshold.”
Research in second language reading has shown that fluent word recognition,
processing efficiency, and reading rate are vital in reading comprehension (Alderson,
2000; Bernhardt, 1991, 2000; Grabe, 1991; Grabe & Stoller, 2002; Koda, 1996, 1997).
Insufficient linguistic knowledge constrains second language reading processes
(Alderson, 2000; Bernhardt, 1991, 2000; Brisbois, 1995). Vocabulary difficulty has
consistently been shown to have an effect on comprehension for L1 and L2 readers
(Alderson, 2000; Carver, 1994; Freebody & Anderson, 1983; Hu & Nation, 2000;
Laufer, 1992; Nation, 1990, 2001; Read, 2000; Williams & Dallas, 1984). Brisbois
(1995) argues that L2 knowledge is critical in reading comprehension, especially
among learners at the beginning levels. Other studies also suggest that insufficient
vocabulary hinders L2 reading performance (Hu & Nation, 2000; Segalowitz, 1986;
Segalowitz, Poulsen, & Komoda, 1991), and that lower-level processing predominates in the reading process among beginning L2 learners (Clarke, 1979; Horiba, 1993).
Meanwhile, reading speed is related to fluency: with increased L2 proficiency, reading
rate improves (Favreau & Segalowitz, 1982; Haynes & Carr, 1990), and error rate
decreases (Bernhardt, 1991).
Research in Second Language Reading Assessment
This section provides an overview of research in second language reading
assessment. We will first highlight the model proposed by Urquhart and Weir (1998)
and Weir (2005) in the construct of second language reading tests. Next, we will
address issues in assessment, including the use of verbal report in assessment,
individual differences in strategy use, and item difficulty in reading assessment.
Construct of Second Language Reading Tests
In the context of second language reading assessment, how reading ability is
defined affects the construct of a test. One prevalent perspective is to view reading as
a set of comprehension processes (Alderson, 2000; Grabe, 1991, 1999, 2000; Grabe & Stoller, 2002; Urquhart & Weir, 1998; Weir, 2005) that can be broken down into
reading skills and strategies needed for testing purposes (Urquhart & Weir, 1998; Weir,
2005). Based on Urquhart and Weir’s model (1998), Weir (2005) develops a model of
the reading process, as presented in Figure 3, to account for four types of reading, as
shown in Table 2.
Figure 3. Urquhart and Weir’s Model of the Reading Process
From Language testing and validation: An evidence-based approach (p. 92), by C. J.
Weir, 2005. New York: Palgrave Macmillan. Adapted from Reading in a second language: Process, product and practice (p. 106), by A. H. Urquhart & C. J. Weir, 1998. Harlow: Longman.
Table 2. Types of Reading
- Discourse topic and main ideas, or structure of text, or relevance to needs.
- Search reading to locate quickly and understand information relevant to predetermined needs.
- Scanning to locate specific points of information.
Note. From Language testing and validation: An evidence-based approach (p. 90), by C. J. Weir, 2005. New York: Palgrave Macmillan. Adapted from Reading in a second language: Process, product and practice (p. 123), by A. H. Urquhart & C. J. Weir, 1998. Harlow: Longman.
In this model, Goalsetter and Monitor, which are metacognitive mechanisms,
“mediate among different processing skills and knowledge sources available to a
reader” and “enable a reader to activate different levels of strategies and skills to cope
with different reading purposes” (Weir, 2005, pp. 95-96). Once the test takers have clear
purposes for reading, they choose the most appropriate strategies in response to the
task demand. The higher the demand of a task, the more components of the model are involved (Urquhart & Weir, 1998; Weir, 2005).
As illustrated in Table 2, the process of reading involves the use of different
skills and strategies. According to Urquhart and Weir (1998) and Weir (2005), reading comprehension can be either global or local. Global reading is comprehension beyond the
sentence level such as reading for main idea or important details, whereas local
reading is comprehension within the sentence level, such as reading for word meaning
or pronominal reference. In a reading test, the demand of a careful reading item at the
global level is usually higher than that of a scanning item since the former requires the
test taker to go through the whole text and activate all components of the model, while the latter might involve just a few components. Weir (2005) further points out
that test developers should consider the appropriateness of different questions and
reading strategies for different types of texts. In a scanning test, for example, the text
should provide sufficient and varied specific details for readers. In a careful reading
test, the text should include enough main ideas or important points. In an inferencing
test, the text should include pieces of information that can be linked together. In a
skimming or search reading test, the text should have a clear organization and provide
explicit ideas at the surface level.
Verbal Report in Assessment
Research in reading comprehension assessment has consistently recognized the
importance of investigating the examinees’ cognitive processing, thought process, and
strategy use through verbal report measures as part of the process of test validation
(Afflerbach, 2007; Anderson, 1991; Anderson, Bachman, Perkins, & Cohen, 1991;
Cheng, Fox, & Zheng, 2007; Cohen, 1984, 1988, 2000; Cohen & Upton, 2006, 2007;
Ericsson & Simon, 1993; Gass & Mackey, 2000; Green, 1998; Perkins, 1992; Phakiti,
2003; Pressley & Afflerbach, 1995; Urquhart & Weir, 1998; Weir, 2005; Weir, Yang,
& Jin, 2000). Green (1998) defines verbal reports or verbal protocols as “the data
gathered from an individual under special conditions, where the person is asked to
either think aloud or to talk aloud” (p. 1). According to Green, verbal protocols may
be gathered concurrently (i.e., while the task is carried out) or retrospectively (i.e., after the task has been carried out). In either concurrent or
retrospective verbal reports, the prompts given to the individual can be non-mediated (e.g., requests such as ‘keep talking’) or mediated (e.g., requests for explanations or
justifications). Green provides a comprehensive and in-depth overview of the use of
verbal protocols in language assessment and concludes that verbal protocol analysis
has the potential to “elucidate the abilities that need to be measured, and also to
provide a means for identifying relevant test methods and selecting appropriate test
content” (p. 120).
Verbal protocol analysis is widely used to probe into the examinees’ use of
reading and test taking strategies during the test (Alderson, 2005; Anderson, Bachman,
Perkins, & Cohen, 1991; Cheng, Fox, & Zheng, 2007; Urquhart & Weir, 1998; Weir,
2005; Weir, Yang, & Jin, 2000). As Ellis (2004) states, “collecting verbal
explanations…would appear, on the face of it, to provide the most valid measure of a
learner’s explicit knowledge” (p. 263). Weir (2005) suggests that test developers and
teachers use verbal report measures to investigate the examinees’ mental process
while taking a test. The analysis of verbal reports allows test developers and teachers
to: (1) evaluate whether the test measures what it is intended to measure; and (2) compare the use of reading skills and strategies between good and poor readers.
Individual Differences in Strategy Use
In a review of reading comprehension research, Perfetti (1997) claims that
research on individual differences among readers is crucial to understanding the
nature of reading abilities. In other words, if we want to understand the nature of
reading comprehension, we need to know the sources of individual differences
between good and poor readers. Perfetti suggests that good readers differ from poor
readers in the following aspects: processing efficiencies (e.g., speed and automaticity of word recognition), word knowledge, processing efficiencies in working memory,
fluency in syntactic parsing and proposition integration, and the development of an
accurate and reasonably complete text model of comprehension.
In the evaluation of second language reading assessment, Weir (2005) claims
that when proficient readers process different reading tasks (e.g., skimming, scanning,
search reading, careful reading) through skills and strategies appropriate to the
purposes of the tasks, then the test measures what it is intended to measure and is
valid in terms of theory-based validity. Conversely, if examinees successfully
process the tasks through test taking strategies instead of applying appropriate reading
skills and strategies, then the test does not measure what it is intended to measure and
provides weak evidence for theory-based validity. According to Weir, typical test
taking strategies include: (1) matching words in the question with the same words in
the text; (2) using clues in other questions to answer the question under consideration;
(3) using prior knowledge to answer the questions; (4) blind guessing not based on
any particular rationale (p. 94).
Pressley and Afflerbach (1995) classify reading strategies into three types:
planning and identifying strategies, by which readers construct the text meaning;
monitoring strategies, by which readers regulate comprehension and learning; and
evaluating strategies, by which readers reflect or respond to the text. Research in
second language learning has shown that L2 readers use this same range of strategies to
comprehend, interpret, and evaluate texts (Carrell & Grabe, 2002; Cohen & Upton,
2007; Upton, Lee-Thompson, & Li-Chun, 2001).
Extensive studies have demonstrated that readers use their prior knowledge to
determine the importance of information in the text and make inferences about the
text. While good readers typically make inferences based on prior knowledge directly
relevant to the ideas in the text, poor readers make inferences by drawing on prior