CHAPTER THREE. METHODOLOGY - Constructing and Validating an English Vocabulary Knowledge Test

Test Construction

The present vocabulary test includes 180 items. The 6050 most frequent words were taken from the new frequency list (NFL) (Gardner & Davies, 2013). These words were ordered by frequency, and each successive group of one thousand words was treated as one level, yielding six levels.

To determine whether the subjects had reached a particular word frequency level, 30 target words were chosen from the margin between two adjacent levels, that is, from the words ranked 950 to 1050 relative to the lower boundary of the lower level. For example, for Level Two, 30 target words were chosen randomly from the words ranked 1950 to 2050. In addition, thirty test items were chosen randomly from each frequency level. To avoid systematic error, the items were split into two forms, Form A and Form B. Each form contained 90 items, with target words from all six levels.
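The margin sampling described above can be illustrated with a minimal Python sketch. It is not the author's actual procedure: the function names, the fixed random seed, and the random split into two forms are assumptions made for illustration, and the word list is a placeholder rather than the real NFL.

```python
import random

def sample_margin_words(ranked_words, level, n_targets=30, half_width=50, seed=0):
    """Randomly pick n_targets words from the margin around rank level * 1000.

    ranked_words is ordered from most to least frequent (index 0 holds rank 1).
    For level 2 the window covers ranks 1950 to 2050 inclusive.
    """
    boundary = level * 1000
    low, high = boundary - half_width, boundary + half_width  # inclusive ranks
    window = ranked_words[low - 1:high]  # 1-based ranks to a 0-based slice
    return random.Random(seed).sample(window, n_targets)

def split_into_forms(targets, seed=0):
    """Split the targets of one level into two equal halves (assumed random split)."""
    shuffled = list(targets)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Placeholder list standing in for the 6050 most frequent words.
ranked = [f"w{rank}" for rank in range(1, 6051)]
level_two_targets = sample_margin_words(ranked, level=2)
form_a, form_b = split_into_forms(level_two_targets)
```

Under these assumptions, each level contributes 30 target words, 15 to Form A and 15 to Form B, which gives 90 items per form across six levels.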

A four-choice multiple-choice (MC) format was adopted in this test. The subjects had to match the definition given in the stem to the word closest in meaning. The definition-word matching format was chosen because it made it simpler to write a stem with high-frequency words. A stem containing words that were too difficult might lead subjects to misunderstand the item, so it was essential to keep the words in a stem as easy as possible. As a result, the definitions in the stems were all written with words from the first two high-frequency levels, so as to minimize the possibility that the subjects failed an item because of obscure words in the stem. Four well-known dictionaries, the Longman Dictionary, Oxford Dictionary, Macmillan Dictionary, and Cambridge Dictionary, were used as references to keep the stems concise and clear. To ensure that the three distractors were comparable in difficulty, they were picked from the same frequency band as the target word. To reduce the guessing effect, if the target word had an obvious suffix or prefix that would reveal information about it, the distractors also contained that affix. Take the target word “legislation,” for example. Its distractors were combination, motion, and contribution. The suffix -tion indicates that the word is a noun; therefore, if the distractors did not share a similar form, the subjects might get the correct answer without knowing the meaning.
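The distractor rule above (same frequency band, same visible affix) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `pick_distractors`, the suffix list, and the toy band are hypothetical, and the sketch handles suffixes only, although the text also mentions prefixes.

```python
import random

def pick_distractors(target, band_words, suffixes=("tion",), n=3, seed=0):
    """Choose n distractors from the same frequency band as the target.

    If the target ends with one of the listed suffixes, only candidates
    sharing that suffix are eligible, so the word form gives nothing away.
    """
    candidates = [w for w in band_words if w != target]
    for suffix in suffixes:
        if target.endswith(suffix):
            candidates = [w for w in candidates if w.endswith(suffix)]
            break
    if len(candidates) < n:
        raise ValueError("not enough eligible candidates in this band")
    return random.Random(seed).sample(candidates, n)

# Toy band for illustration only; not the actual frequency band of these words.
band = ["legislation", "combination", "motion", "contribution", "criminal", "audience"]
distractors = pick_distractors("legislation", band)
```

With this toy band, the only eligible candidates for “legislation” are the three -tion nouns named in the text, in some order.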

Besides reducing the guessing factor, the test was designed to prevent possible fatigue. For instance, a continuous sequence of items from only low-frequency or only high-frequency levels might make subjects tired or bored. As a result, in the present study the target words were not presented in the original order of the word frequency levels from one to six. Instead, they were organized in the following order: Level One, Level Two, Level Five, Level Three, Level Four, and Level Six. In each form, there were fifteen target words for each level.

The test items were closely examined by the author, three experienced senior high school teachers, the supervising professor, and a foreign consultant. After this examination, ten high school students took the test as a pretest so that the author could identify remaining errors and revise problematic items.

The sample items were as follows. The correct answer is shown here in bold font. For the whole vocabulary test, please see Appendix A (Form A) and Appendix B (Form B).

Level 1

1. the part of the day between the afternoon and night
(A) evening (B) unit (C) adult (D) fear

2. a large formal meeting
(A) writer (B) conference (C) camera (D) chair

3. a way of doing something
(A) board (B) doctor (C) husband (D) style

Level 2

4. someone who breaks the law
(A) criminal (B) audience (C) participant (D) labor

5. to go or do something very quickly
(A) stare

10. mental or physical pain
(A) accounting (B) counterpart (C) suffering (D) departure

11. a part that you make or join to another part
(A) transaction (B) addition (C) determination (D) pension

12. a person who works in another person's house, doing jobs such as cooking and cleaning

13. the players in a sports team who play in a particular game
(A) forecast (B) lineup (C) rebound (D) cleanup

14. to use something or someone instead of another thing or person
(A) formulate

16. showing what something is like
(A) outrageous (B) reflective (C) miniature (D) lengthy

17. relating to muscles
(A) marginal (B) muscular (C) sensible (D) naïve

18. to go regularly to and from work
(A) commute (B) multiply (C) offset (D) notify

Participants and Test Administration

This test was administered in five senior high schools and a national university, located in Taipei, Changhua, Yilan, and Keelung. The participants' percentile rank (PR) ranged from 70 to 95. Tenth graders had about four hours of English class every week, whereas eleventh graders had five. The college students were non-English-major seniors taking an English class as an elective course. In total, 1,198 participants took the test, of whom 849 were senior high school students and 349 were college students. The senior high school students took either Form A or Form B as a pilot test and had 50 minutes to finish it. The test administrators were all English teachers, each of whom chose one form to administer in each class. Neither the teachers nor the participants knew the purpose of the test, which was part of the research, and none of them were informed of the source of the vocabulary words. The participants' English teachers asked their students to take the test seriously in order to measure their own vocabulary size. In addition, based on the test results, each English teacher would choose the top five students in the class to participate in the national English vocabulary contest. The test administrator distributed the test papers and collected them after the bell rang. After the pilot tests showed good test validity, the formal test, which combined Forms A and B, was administered to the senior students at the national university. The formal test followed the same procedure as the pilot tests; it consisted of 180 test items and lasted 100 minutes.

Scoring and Coding

The test was split into two forms. In either Form A or Form B, each item accounted for 1%. Correct answers received one point, while wrong answers received zero points. The participants marked their answers on computer cards, and the test results were collected by computer.
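The one-point/zero-point coding described above amounts to turning each answer card into a row of 0s and 1s. The following Python sketch shows one way this could be done; the function name `score_form` is hypothetical, and treating a blank answer as incorrect is an assumption that the text does not state.

```python
def score_form(responses, answer_key):
    """Code each response as 1 (correct) or 0 (incorrect) and sum the points.

    responses and answer_key are equal-length sequences of option letters.
    A blank response (None) is scored 0 here, which is an assumption.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key must have the same length")
    item_scores = [1 if r == k else 0 for r, k in zip(responses, answer_key)]
    return item_scores, sum(item_scores)

# Illustrative four-item key and one participant's answers.
key = ["A", "B", "D", "A"]
answers = ["A", "C", "D", None]
item_scores, raw_score = score_form(answers, key)
```

The resulting 0/1 vectors, stacked across participants, form the dichotomous response matrix that an item-level analysis takes as input.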

Data Analysis

The present study aims to establish an instrument that provides valid and reliable estimates of vocabulary size for L2 English learners and to explore how item difficulty, discrimination, and pseudo-guessing parameters interact with frequency bands. The 3PL model of Latent Trait Theory (LTT) was adopted to meet these goals for the following reasons. First, LTT provides statistics of overall model fit that reveal whether the test as a whole is an appropriate measure of latent ability. Once overall model fit is established, it supports the validity of the present vocabulary test. Information on reliability can also be calculated by means of the LTT model. Second, the 3PL model offers insight into three item parameters: difficulty, discrimination, and pseudo-guessing. Hence, the author would be able to gather information on the three parameters and compare them across frequency bands. The information on difficulty and discrimination would also enable the author to separate well-written items from poor ones. Third, since multiple-choice items are susceptible to the effects of guessing, the 3PL model provides critical information on the pseudo-guessing effect, allowing the author to identify unreliable individual examinees. Thus, misfitting items and test-takers can be spotted and ruled out. Finally, the analysis can show to what extent the selected margin words in each vocabulary band demonstrate that the subjects have passed a particular frequency band. XCalibre was used to analyze the data.
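For reference, the standard 3PL item response function gives the probability of a correct answer as c + (1 - c) / (1 + exp(-D a (θ - b))), where θ is the examinee's latent ability and a, b, and c are the item's discrimination, difficulty, and pseudo-guessing parameters. The Python sketch below evaluates this function; it is not the XCalibre estimation procedure, the function name is hypothetical, and the scaling constant D = 1.0 is an assumption (some software uses 1.7).

```python
import math

def p_correct_3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the standard 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing (lower asymptote); D: scaling constant (assumed 1.0).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Illustrative values only: c = 0.25 is the chance level on a four-option item,
# not an estimate from the present data.
p_at_difficulty = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.25)
p_low_ability = p_correct_3pl(theta=-50.0, a=1.2, b=0.0, c=0.25)
```

When ability equals the item difficulty, the probability is halfway between c and 1, and for very low ability it approaches c, which is why the pseudo-guessing parameter is informative for multiple-choice items.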
