

CHAPTER 5 CONCLUSIONS & DISCUSSION

5.1 Summary of Results

Did knowledge of the PLDs predict a judge's final ability? This study found no evidence of such a relationship. Some of the judges who showed strong knowledge of the PLDs were among the lowest in demonstrated ability; likewise, some of the judges who scored poorly on the PLD Test were among the best-performing judges in terms of final ability. The hypothesis that knowledge of the PLDs is related to the ability to determine cutscores is therefore rejected.

This finding is especially sensitive because this study used the CEFR descriptors as its Performance Level Descriptors. The CEFR is widely used by teachers and researchers throughout the world, yet one of the findings of this study was the poor psychometric performance of the CEFR-based PLD Test.

The PLD Test was composed of descriptors taken directly from the CEFR, and these descriptors could be adapted for use in this study only with difficulty. The Council of Europe warns against using them as a measure of language acquisition, describing them instead as statements of language competence; even so, it is not entirely clear why the two ideas should diverge as sharply as they did in this study. Because of this problem, the CEFR descriptors should be used only cautiously in research, and measures of reliability and of their suitability as a scale, such as Cronbach's alpha, should also be reported. The problem also suggests that future research using the CEFR descriptors as measurement items should develop a proper measurement instrument rather than rely on a simple list of items like the one used in this study.
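Cronbach's alpha, recommended above as a check on descriptor-based items, compares the sum of the item variances with the variance of respondents' total scores. The following is a minimal sketch, assuming a respondents-by-items matrix of item scores (for example, 1 for a correctly placed descriptor and 0 otherwise); the scoring scheme, the function name `cronbach_alpha`, and the demo data are illustrative assumptions, not the scoring actually used for the PLD Test.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is respondent r's score on item i (hypothetical layout).
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least two items")
    # Variance of each item (column) across respondents.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score across the items.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical dichotomous scores: four respondents, three descriptor items.
    demo = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
    print(round(cronbach_alpha(demo), 3))  # prints 0.75
```

Population variances are used for both terms; because the same divisor scales the numerator and the denominator, sample variances would give the same alpha.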

Do self-report measures of familiarity with, and confidence in, the standard setting procedures predict a judge's accuracy? The answer from this study is mixed. Measures taken on Day 1 showed only a marginal relationship with judges' final ability. On Day 2, however, the measures of familiarity and confidence showed a much stronger relationship with judges' final ability.

These findings could be interpreted to mean that self-report measures are of only limited value in predicting judges' final performance in an Angoff standard setting. However, another interpretation is possible. The standard setting described in this study is unusual in that it lasted two days; most standard settings are one-day events. It may be no accident that the measures taken on the first day all show only a marginal relationship with final performance. This is true not only of the self-report measures but also of the PLD Test, which was also administered on Day 1. The weak correlation between the PLD Test and judges' final performance may therefore have resulted not from the poor performance of the PLD Test itself, but from the fact that the standard setting was held over two days and the test was administered on Day 1.
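The argument above turns on how strongly judge-level measures (PLD Test scores, Day 1 and Day 2 self-report ratings) correlate with judges' final ability. The exact statistics are not restated in this section, so the following is only a minimal sketch, assuming one value per judge on each measure and a Pearson product-moment correlation; the function name `pearson_r` and the demo values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two judge-level measures of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two judges")
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either measure is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for five judges: a PLD Test score and a final ability estimate.
    pld_scores = [12, 15, 9, 14, 10]
    final_ability = [0.4, -0.3, 0.8, 0.1, 0.5]
    print(round(pearson_r(pld_scores, final_ability), 2))
```

Computing the same coefficient separately for Day 1 and Day 2 measures is one way to express the contrast described above; with panels as small as a typical standard setting's, each coefficient is imprecise.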


Establishing whether a standard setting should last one day or two would require experimental manipulation of the length of training. Such a study is beyond the resources of this project; however, the optimal length of training remains a question of interest.

Two related factors that may have played a hidden role in masking the effects of PLD knowledge and the self-report measures are social influence and decision-making style.

These factors were mentioned several times in the text as plausible explanations for some of the differences observed in the study. The potential role of social influence is large, especially during training and standard setting, before judges have developed a strong image of the borderline candidate.

Social influences on standard setting have been largely ignored in more recent research, which has focused instead on the role of feedback and the way panelists use it to make their own estimates. The results reported in this study make clear that factors other than knowledge of the standard setting procedures are responsible for a judge's final performance. Social influences may play an important role, at least in some aspects of standard setting. In this study, judge failure was clustered within panels and seemed to stem from problems with groups of individuals within a panel, rather than from lone judges or whole panels. This is not necessarily the case for every standard setting panel, or even for every panel that fails, but it certainly raises the issue. A revival of research into the social influences at work in standard setting may help overcome some of the problems caused by the small number of panelists involved.


Despite these problems, the Angoff method worked as planned. Most judges changed their judgments between Round 1 and Round 2, after they had been exposed to feedback data that included information about item difficulty. This feedback gave judges the information they needed to position their estimates for the borderline candidate.

Judges who were unable to do this appeared to have problems understanding the instructions and performing their duties as judges. Interviews conducted with judges after the standard setting indicated that they had used the feedback information as intended: to help determine the difference between their first estimates and their true estimates of the ability of the borderline student.
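The Round 1 and Round 2 procedure discussed above can be sketched as follows. This assumes the common modified-Angoff convention: each judge estimates, for every item, the probability that a borderline candidate would answer it correctly; a judge's cutscore is the sum of those estimates; and the panel cutscore is the mean across judges. The section does not state the aggregation rule used in this study, and the function names, judge IDs, and estimates are hypothetical.

```python
from statistics import mean

# Hypothetical layout: judge ID -> one probability estimate per item, each the
# judged chance that a borderline candidate answers that item correctly.
Estimates = dict[str, list[float]]


def judge_cutscore(item_estimates: list[float]) -> float:
    """One judge's cutscore: the expected raw score of the borderline candidate."""
    if any(not 0.0 <= p <= 1.0 for p in item_estimates):
        raise ValueError("Angoff estimates must lie in [0, 1]")
    return sum(item_estimates)


def panel_cutscore(round_estimates: Estimates) -> float:
    """Panel cutscore for one round: the mean of the judges' cutscores."""
    return mean(judge_cutscore(e) for e in round_estimates.values())


def round_shifts(round1: Estimates, round2: Estimates) -> dict[str, float]:
    """How far each judge's cutscore moved between Round 1 and Round 2."""
    return {j: judge_cutscore(round2[j]) - judge_cutscore(round1[j]) for j in round1}


if __name__ == "__main__":
    # Illustrative numbers only: three judges, four items.
    r1 = {"J1": [0.6, 0.7, 0.5, 0.8], "J2": [0.9, 0.9, 0.8, 0.9], "J3": [0.4, 0.5, 0.5, 0.6]}
    r2 = {"J1": [0.6, 0.6, 0.5, 0.7], "J2": [0.7, 0.8, 0.6, 0.8], "J3": [0.5, 0.5, 0.5, 0.6]}
    print(round(panel_cutscore(r1), 2), round(panel_cutscore(r2), 2))
    print({j: round(d, 2) for j, d in round_shifts(r1, r2).items()})
```

The between-round feedback itself is not modeled; the sketch only shows how a change in item-level estimates after feedback translates into a shift in each judge's cutscore and in the panel cutscore.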
