Literature Review - A Study of the Adoption of Standardized English Tests as an Assessment Mechanism for Civil Servants in Taiwan: From 2002 to 2010

國立政治大學 National Chengchi University

Chapter 3 Literature Review

This literature review begins by discussing ancient China’s imperial exam system, which remained in existence for over two thousand years. The current testing of civil servants in Taiwan is influenced both by the history of testing in Asia and the development of objective testing methods in the West in the twentieth century. Next, literature related to language policy is discussed, starting with a policy analysis framework that will be employed in this research.

Then language testing policies are introduced, beginning broadly by examining different analytical approaches, and then turning to the influence of the CEFR and challenges associated with the adoption of its common reference levels as a proficiency yardstick. Finally, this section ends with a discussion of current English language testing policy issues in Asia and Taiwan. The third section of this chapter addresses the use of high-stakes language tests. The topic of consequential validity of language tests is introduced in order to establish the key concerns for appropriate test use and validation procedures. The theory of assessment use arguments is next outlined, and the potential value of this method for validation is discussed. The issues of test washback and impact are reviewed with the goal of helping to establish the rationale for carrying out this research project. In the final section, the literature review turns its attention to lifelong learning and motivation associated with language learning and testing. The promotion of lifelong learning is one of the rationales behind the effort to encourage civil servants in Taiwan to improve their English language ability. To identify and explain the role of factors that could influence the efforts of Taiwan’s government employees to develop their English proficiency, theories of adult learning and education are discussed. Motivation for learning plays an essential role in education and is influenced by testing. Efforts to describe the influence of motivation on Taiwanese learners of English are reviewed, and the potential implications of such context-specific motivation for civil servants in Taiwan are addressed.

History of Civil Service Exams

The use of examinations as a gate-keeping device for entrance to government employment began with the Chinese imperial examination system. The long history of exam-oriented education in Asia has had a strong influence on test use and educational practice in Taiwan. A review of the history of civil service exams in China, as well as the use of such exams in Europe in the nineteenth century, may offer insight into the use of English proficiency examinations as a criterion for the promotion of civil servants in Taiwan. Of particular relevance is discussion of the test preparation strategies employed by the exam-takers, the changes to the exam procedure and contents that were intended to minimize unintended consequences of test use, and the narrowing of curriculum that occurred as a result of the use of high-stakes tests.

Cheng (2010) notes that the imperial examination system, founded in 206 BCE during the Han Dynasty and lasting until 1904, had as its purpose the selection of candidates for service in the imperial government on the basis of their merit rather than their membership in the aristocracy. Cheng adds that the imperial exam had two complementary purposes, the selection of officials and the testing of knowledge and ability. The content and categories of the examination changed over time under the influence of several factors: the separation of the functions of selection and appointment; the adoption of printing, which increased both the number of candidates and the difficulty of the exam; and the adoption of the eight-legged essay format, which turned the examination into a “contest of regurgitation.” Cheng describes the examination as extremely competitive, noting that 2 million candidates took the exam at the district level in 1850, and only 300 passed the metropolitan exam, giving candidates a 1 in 6,000 chance of passing all three levels.

Suen and Yu (2006) similarly examine the development of these high-stakes exams, noting the impact of test preparation strategies on exam scores and attempts by exam officials to repeatedly modify the test tasks over time in order to reduce the impact of test preparation.

These authors found that the imperial exam system led to a narrowing of the education system and a rise in the use of exam-taking strategies. Scholars writing at that time judged such methods as having the effect of subverting the intended purpose of the exams, thereby weakening the validity of their scores. Suen and Yu conclude that the exam officials were never able to revise the exam content sufficiently to eliminate the influence of exam-taking strategies, and that the continued reliance on high-stakes testing in the modern day is bound to encounter the same difficulty.

The view that these authors share is significant in that it relates a modern phenomenon to a historical trend, but the context of high-stakes testing has clearly been altered in the more than two thousand years since the imperial examination system was first introduced. One could argue that the stakes currently associated with civil service exams are much lower than in past centuries given the development of non-governmental social institutions. Similarly, the test methods now being employed, particularly on proficiency tests that assess communicative competence, are substantially different from those used in early exams. Given the long history of the use of examinations in Asia, it may not be possible to draw too many conclusions about long-term trends based on observation and analysis of the test methods in use today or in recent decades.

The adoption of civil service testing in England in 1855 was a direct result of the need to select staff for the British colonial government in India, but it had its roots in the introduction of Chinese testing principles to Europe by the Jesuit missionaries who returned from Asia (Cheng, 2009; Spolsky, 1995). Like the Chinese imperial exams, competitive exams in Europe were a means for replacing patronage in the selection of individuals for government positions, but it was also acknowledged that such testing did produce a narrowing of the education process, subordinating teaching to testing (Cheng, 2009). Spolsky (2009) describes the introduction of competitive exams to the United States in the early twentieth century as being understood to have three purposes: to stimulate education, to exert control over the education system, and to select civil servants or award professional qualifications. It was in the U.S. that the development of intelligence testing and the subsequent founding of statistics occurred, and these were to be key antecedents to the development of objective testing and psychometric measurement, two innovations that increased the reliability of tests while temporarily putting aside questions about the validity of the measurements that were being made.

The introduction of the Chinese imperial examination system to the West was significant in that it influenced examination procedures used for both government employment and education. The development of objective testing and psychometric methodology in the West transformed the traditional testing practices that had been employed in Asia for two thousand years. While the testing methods have changed, the purposes of testing have largely remained the same. Performance on competitive examinations is seen to be an indicator of an individual’s merit and also of potential contribution to society. The use of tests for the recruitment of civil servants in Taiwan today is seen as democratic, fair, and expedient. It may be that the use of tests as gate-keepers in Taiwan benefits as much from the history of examinations in Asia as from the development of objective testing in the West, with the advantages of each tradition making up for the other’s shortcomings. Tests can serve as both an objective method for selection and as an indicator of ethical standards with deep cultural roots.

Language Policies and Testing

This section begins with a discussion of the policy analysis framework that is employed in this research. Next, it considers principles of language planning and policies. Then it turns its attention to the CEFR. Finally, it reviews the literature on language testing policies in Taiwan.

Policy analysis framework. Policy analysis frameworks are useful for describing policy measures in a systematic manner that facilitates analysis of the component stages in a policy cycle and the relationships among the various agencies and institutions. It is believed that the application of a policy analysis framework could clarify the roles of the different agencies responsible for making and implementing the English language proficiency testing policy for civil servants. Further, it may provide a basis for evaluating the effectiveness of the policy and help to identify factors that influence the policy’s outcomes.

In 2002, the Geelhoed-Schouwstra (G-S) framework for policy analysis was created for use by the Dutch Ministry of Finance in the evaluation of national and international policies. The framework was based on a conceptualization of the policy-making cycle as being composed of six steps: goals, objectives, methods/instruments, activities, performances, and evaluation. The purpose of the framework was to “identify factors that cause policy outcomes to diverge from the intended results,” and it may be used both when making and evaluating policy (Geelhoed and Ellman, 2006, p. 1). One drawback of the original framework was that it did not establish the way in which the policy would bring about its intended effects; therefore, the basic framework was expanded to incorporate conceptual and institutional frameworks to account for key contextual elements that influence policy-makers. The conceptual framework accounts for the effects of ideology, social norms and values, fundamental theories and assumptions, definitions, and attitudes and behaviors and is itself subject to influence by culture, geography, and history, both individual and collective. The institutional framework is composed of elements that include the political, social, and economic setting, the institutional and legal setting, and stakeholders. The authors note that the various elements of the institutional framework will be weighted differently according to the type of policy that is being analyzed, and that education policies (such as the testing policy that is the focus of this study) might warrant much greater attention to elements of social setting than the others. They recommend that the basic framework be applied first in order to identify the key features of a policy, and then the conceptual and institutional frameworks be systematically applied to bring greater transparency to the analysis.

A key feature of this extended framework is that it focuses attention on policy stakeholders and the relationships between them that may influence the outcome of the policy. The conflicting interests of the various stakeholders could have a negative influence on a policy, and are thus possible sources of divergence from expected outcomes.

The extended G-S framework is used to trace the development of the testing policy from the goals it was intended to accomplish through the evaluation of the performances that it produced. The conceptual and institutional frameworks provide an opportunity to introduce contextual features into the analysis of the decision making that is at the core of the policy cycle.

Language policies. Policies that call for language testing are part of what Cooper terms language planning. This he defines as “deliberate efforts to influence the behavior of others with respect to the acquisition, structure, or functional allocation of their language codes” (Cooper, 1989, p. 45). Language planning plays a role in the political process as decisions are made by authorities and then implemented for the benefit of society via policies. Cooper outlines a descriptive decision analysis as an examination of “what people actually do in arriving at their decisions, good or bad,” with a focus on the three separate roles of individuals, institutions, and the public arena. He offers this summative statement to express how such an analysis would gather data about language planning decisions: “Who makes what decisions, why, how, under what conditions, and with what effect” (Cooper, p. 88).

Insofar as Cooper addresses acquisition planning as one subset of language planning, he recognizes two fundamental bases for such efforts. These are overt language planning goals, such as acquisition of a second or foreign language, and the methods employed to reach the goal.

These methods could include the creation or improvement of opportunity to learn, the incentive to learn, or a combination of both opportunity and incentive. Cooper notes that it is difficult to evaluate the effectiveness of acquisition policies, whether in terms of the degree of improvement or the relative contributions of various factors. Generalizing about effective acquisition planning, Cooper makes two points that are relevant to this research into language policies in Taiwan.

First, he asserts that unless the language serves a necessary function for the targeted population, the policy is unlikely to be effective. This implies that the goals of the Action Plan adopted by Taiwan’s government, which the testing policy is designed to support, are more likely to be achieved if they are based on actual need. If policy makers overestimate civil servants’ need for English proficiency, the policy’s goals may prove difficult to realize, and resistance to the policy may be encountered. Second, implementing a policy may “require repeated efforts by planners to cope with the resistance of those they seek to influence” (Cooper, p. 185). As the testing policy timeline in Chapter 2 shows, the initial targets for the number of civil servants who had demonstrated their English proficiency by taking an English test were not met, and additional measures to encourage compliance, such as increasing the value of the incentives, were put into effect.

Shohamy (2008) traces a development within the field of language testing that sees a shift from focusing mainly on psychometric traits of tests toward a greater interest in test impact, the ethics of testing, fairness, and consequences. Promoting use-oriented design and critical language testing, Shohamy calls for the questioning of tests, not just how tests are designed, or what content they test, but also their societal role, such as how test-takers are affected by tests, what knowledge is created by tests, what decisions are made based on test results, and what motivations are behind the introduction of tests (Shohamy, p. 363). Testers have recognized the need to examine the role of tests in relation to their powerful impact on society and washback on educational practices. Policy makers also understand the power of tests to influence the behavior of others, and Shohamy (2001) has argued that it is this awareness that has led those in authority to introduce tests as instruments of policy.

According to Shohamy (2008), language testing and language policy influence each other in two directions. One is the policy of introducing the test itself, as an entrance or graduation requirement, or as a promotion criterion, and this perspective, she says, has been the subject of little research, particularly into the intentions, reasons, and arguments for introducing particular language tests (Shohamy, p. 366). The other direction is related to the consequences of introducing a test, specifically the influence on language practices. This research study makes an effort to examine both directions of the English proficiency testing policy. The interviews with representatives of the agencies responsible for formulating and implementing the policy are intended to identify why the goal of encouraging civil servants to improve their English proficiency was adopted and how using tests to certify English proficiency was expected to produce the desired result. The survey of employees of the three agencies is intended to explore the consequences of introducing the testing policy, looking at its influence on studying, motivation, and professional duties.

One such consequence of test policies that is of interest to Shohamy is the effect of creating uniformity with regard to ideologies about language learning and use. Writing on the CEFR, Shohamy identifies several areas, such as the absence of language context and advanced language proficiency use, in which the adoption of its rating scales could lead to a prescriptive view about the hierarchy of language acquisition and use that is not based in reality. Rejecting language policies that impose uniform ideologies through the imposition of tests with insufficient regard for their consequences, Shohamy calls for adaptive language policies that recognize that language learners bring different knowledge and experience to the act of language learning.

From this perspective, one could expect that a test policy that assumes a uniform need for English among employees in different agencies and different positions might be ill-suited to the needs and interests of those it seeks to influence. An adaptive policy would recognize diversity and avoid imposing standards for language learning and use that are not based in actual conditions.

The way forward, Shohamy proposes, is asking how tests can lead to language policies that do not simply promote a top-down view. From this viewpoint, the challenge for testers is to explore how tests can be “more inclusive, democratic, open, just, fair and equal, and less biased” (Shohamy, 2008, p. 371). Shohamy calls on testers to recognize language tests not just as measurement tools, but as devices with the power to influence education, society, and language policy. She sees that language testing and language policies share a common interest, not to “confirm ‘bad policies’ but rather to pose questions about how language tests can be instrumental in the development of ‘good’ and ‘just’ policies which are more in line with current language practices” (Shohamy, p. 371). The significance of this view of tests and test policies as powerful instruments is that the power need not be wielded without regard for the interests of those who are affected. The goals for which tests are employed can benefit society and individuals alike, and widening the range of stakeholders who participate in the discussion of how best to utilize tests and formulate testing policies can increase the value that they may produce. Language tests, like other types of tests, have a history and a future. The present task is to develop an understanding of how to maximize benefits of test use in the future based on an analysis of their use in the past.

CEFR. In 2005, the MOE adopted the CEFR Common Reference Levels as an English proficiency yardstick. An English examination scoring table produced by the CPA in that same year equated scores on different English proficiency tests on the basis of their alignment with the CEFR levels. Concern that the Common Reference Levels were not a suitable basis for test
