3.4.2 Assessment Procedures

Prior to the beginning of the standard setting, judges were mailed a Panelist Information Form. A copy of this form can be found in Appendix 5. Most of the information requested on the form was demographic, although some of the questions concerned knowledge of testing.

In addition to the items selected from the operational EPT and the mock EPT used during Day 1 of training, this study used four different measurement scales and activities. These included

1. a test of matching CEFR items with statements about linguistic competency

2. assessment of knowledge of standard setting and the Angoff method

3. assessment of knowledge of the CEFR

4. assessment of knowledge of the Practical English Test

Day 2 used a further two evaluations of the validity of the Angoff method and the relationship with training. These included

5. an evaluation and overview of the procedure at the beginning of Day 2

6. a final evaluation of the activities of the standard setting procedure at the end of Day 2

ASSESSMENT 1, Day 1 Knowledge of CEFR Descriptors

The first measurement taken during the study was the PLD sorting procedure described above. Judges were given a sheet of paper that listed 20 CEFR reading descriptors and another sheet that listed 20 CEFR listening descriptors, taken from the CEFR listening and reading Global Descriptors for CEFR levels A1 to B2. They were then asked to write down the CEFR level of each of the 40 descriptors, in effect placing them in the order in which the CEFR ranks them.

Cronbach’s Alpha was calculated for the reading descriptor scale and for the listening descriptor scale. Usually, in this sort of study, the minimum acceptable Alpha is set at 0.70. However, the items were descriptors taken from the CEFR handbooks (Council of Europe, 2001, 2009). The handbooks state that the “Can do” statements of the CEFR standard are not intended to be used as a proficiency scale. Rather, they are intended for use as statements of learner competence. As a result, the items may not be psychometrically sound. In fact, the listening scale was unable to reach Alpha = 0.70, and the reading scale could only reach Alpha = 0.70 by deleting a large number of items. As such, the criterion for inclusion was set at Alpha = 0.60.

The Alpha for the 20-item listening scale was 0.49. Two items were deleted one at a time according to the suggestions provided by SPSS 20. The final Alpha for the 18-item scale was 0.61. Similarly, the Alpha calculated for the 20-item reading scale was -0.12. Seven items were deleted one at a time according to the suggestions provided by SPSS 20. The final Alpha for the 13-item scale was 0.71. Scores for Alpha reported in this study are for the 18-item listening scale and for the 13-item reading scale.
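The item-deletion procedure described above can be illustrated with a minimal sketch. It is not the SPSS routine used in the study: the function names, the toy data, and the stopping rule (delete the item whose removal raises Alpha the most until the target is reached) are assumptions made for illustration only. Alpha is computed with the standard formula, k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's Alpha for a judges-by-items table.

    scores: list of rows, one row per judge, one score per item.
    Requires at least two items and non-zero variance in the totals.
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item removed in turn (needs >= 3 items)."""
    k = len(scores[0])
    return {
        i: cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in scores])
        for i in range(k)
    }


def prune_items(scores, target=0.60):
    """Delete items one at a time until Alpha reaches the target.

    The stopping rule is an assumption for this sketch, not the rule
    documented for the study. Returns (kept item indices, final Alpha).
    """
    kept = list(range(len(scores[0])))
    alpha = cronbach_alpha(scores)
    while alpha < target and len(kept) > 2:
        current = [[row[i] for i in kept] for row in scores]
        candidates = alpha_if_item_deleted(current)
        drop_pos = max(candidates, key=candidates.get)
        if candidates[drop_pos] <= alpha:
            break  # no single deletion improves Alpha any further
        alpha = candidates[drop_pos]
        del kept[drop_pos]
    return kept, alpha


if __name__ == "__main__":
    # Hypothetical data: 5 judges, 4 items; the last item runs against the others.
    toy = [[x, x, x, 6 - x] for x in [1, 2, 3, 4, 5]]
    print(cronbach_alpha(toy))
    print(prune_items(toy, target=0.60))
```

A negative Alpha, such as the -0.12 reported for the 20-item reading scale, arises in this formula when the sum of the item variances exceeds the variance of the total scores, which happens when items covary negatively on average.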

ASSESSMENT 2, Day 1 Confidence in Standard Setting Procedures

The second measurement was an 8-item scale constructed solely for this study, although the items resemble those found in Cizek (2001). Cizek’s later book (2012) supplies a wider range of possible items, but these were not available when this study was conducted. Loomis (2012) reports that NAEP has a series of questions that it has been asking since its standard setting studies began.

Neither Cizek nor Loomis reports results for these questions, so their results cannot be compared with those of this study. The Likert-scale scores selected by the judges were used to calculate the judges’ confidence in their knowledge of standard setting procedures, and of the Angoff method in particular.

Assessment 2 is aimed at measuring the judges’ basic knowledge about standard setting after they had been introduced to the idea and to the Angoff method of conducting the procedure. A copy of the form can be found in Appendix 6. With N = 18, the Cronbach’s Alpha for the assessment is 0.70.

ASSESSMENT 3, Day 1 Familiarity with the CEFR

Assessment 3 is a 9-item scale constructed solely for this study and designed to assess the judges’ basic familiarity with, and opinions about, the CEFR following training. The form can be found in Appendix 7. It should be noted that this scale was not designed to assess the judges’ ability to use the CEFR properly. Rather, it was designed to measure the judges’ confidence in their understanding of, and ability to use, the CEFR. With N = 18, the Cronbach’s Alpha for this scale is 0.81.

ASSESSMENT 4, Day 1 Familiarity with the Practical English Program

Assessment 4 is an 8-item survey designed to provide the standard setting facilitators with an overview of how well the judges believed they understood the events of the entire Day 1 training. The items of Assessment 4 are contained in Appendix 8. The Cronbach’s Alpha for Assessment 4 is 0.79.

ASSESSMENT 5, Day 2 Morning Confidence in Training Skills

Assessment 5 was the first activity conducted on the second day of the standard setting and was conducted as part of a warm-up session to remind judges about the procedures. The main research purpose of the activity was to survey the judges’ feelings of confidence as they entered the operational part of the standard setting. Assessment 5 was an 8-item survey of the judges’ perceptions of their current confidence with the procedures and skills that had been covered in the previous day’s training (see Appendix 9). The survey was not intended as a scale, although all of the questions were connected to an understanding of the various parts of the Angoff procedure and the instructions given the day before. Judges were asked if they felt qualified to conduct the standard setting and, in another question, if they felt ready to start the procedure. The Likert-scale scores selected by the judges were used to calculate the judges’ confidence in their knowledge of standard setting procedures, and of the Angoff method in particular, and their confidence in their preparation to perform a standard setting.

Even though the questions were not intended as a scale, treating them as a scale and calculating Cronbach’s Alpha produced an Alpha of 0.90.

ASSESSMENT 6, Day 2 Final Evaluation of Confidence in Overall Performance

Assessment 6 was the measure of the judges’ overall perceptions of the standard setting. More than any other assessment, this was not a scale (see Appendix 10). The Likert-scale scores selected by the judges were used to calculate the judges’ confidence in their performance during the standard setting.

Once again, even though the questions in the evaluation were not designed as a scale, if Appendix 10 is treated as a scale, the Alpha is high, at 0.86.

3.5 Assessment Expectations

The goal of this study was to determine the efficacy of training on the ability of judges during an Angoff standard setting. Raw scores were treated as the unit of analysis and were not converted using latent trait models such as Rasch modeling. Two aspects of training were investigated to see if they had an effect on the final performance of the judges: the efficacy of the training and the self-reported familiarity of judges.

3.5.1 Efficacy

Effectiveness of performance was determined through a series of different types of assessments.

The first of these was Assessment 1. Although this was done as part of the training exercises, it was intended as a measure of the judges’ understanding of the PLDs.

Efficacy was measured using PLDs through comparison of p-value correlations and Root Mean Square Error (RMSE). P-value correlations were determined by calculating the Pearson correlation between the empirical p-value of each item, measured during actual use of the item in high-stakes tests, and the judge’s estimate of the p-value obtained during the operational standard setting (Brandon, 2004). The RMSE was calculated for the performance of each member of a panel.
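The two accuracy measures can be sketched as follows for a single judge. This is an illustration under stated assumptions, not the study's analysis script: the function names and the example values are hypothetical, "p-value" is taken in its test-item sense (the proportion of examinees answering the item correctly), and the RMSE is assumed to be computed between the judge's estimated p-values and the empirical p-values, which the text does not state explicitly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rmse(estimates, empirical):
    """Root Mean Square Error between a judge's estimates and empirical values."""
    squared_errors = [(e - p) ** 2 for e, p in zip(estimates, empirical)]
    return sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical values for five items (not data from the study).
    empirical_p = [0.35, 0.50, 0.62, 0.71, 0.88]   # observed proportion correct
    judge_estimate = [0.40, 0.45, 0.70, 0.65, 0.90]  # one judge's Angoff estimates
    print(pearson_r(empirical_p, judge_estimate))
    print(rmse(judge_estimate, empirical_p))
```

The two measures capture different things: the correlation reflects whether a judge ranks items by difficulty in the same order as the empirical data, while the RMSE reflects how far the estimates are from the empirical values in absolute terms. A judge who overestimates every item by the same amount would have a perfect correlation but a non-zero RMSE.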