
Many-Faceted Rasch Measurement Model (MFRM)

In oral or writing tests, or in performance assessment of any kind, multiple factors, or facets, such as test items, test takers, and raters, may operate concurrently and result in unexpected or unfair judgements. People of different proficiency levels, ethnic backgrounds, or genders may be appraised drastically differently in performance assessment if only a single variable, such as their background, is taken into account. Variables such as the task itself, the raters, and other related factors may produce a wide range of results and thus misrepresent the real competence of the test taker. Owing to the complicated nature of performance assessment, namely its manifold variables, it is essential to estimate reliably the extent to which test takers can perform. Hence, the Many-Faceted Rasch measurement model has been introduced to compensate for the deficiencies of raw scores; it is suitable for gauging the competence of test takers, the comparative difficulty of test items, and the stringency of raters on a common logit scale. Further, a model can be created to forecast the likelihood of any test taker receiving a given rating from a rater of particular stringency on a specific test item. This matters tremendously in performance assessment, since the objective is to produce credible and stable scores, which may be affected by various factors collectively and concurrently (Bachman, 2004; McNamara, 1996).

The Many-Faceted Rasch Model derives from the basic Rasch measurement model, which is suited to processing dichotomous data: each response is coded as 1 (success) or 0 (failure). Raw scores can then be transformed into linear, calculable, and replicable estimates. The parameters of each test item, test taker, and rater are estimated separately in the Rasch model, which allows researchers to compare and contrast them without subjectivity. The advantage of MFRM lies in the fact that each parameter can be conditioned on during estimation (Brentari & Golia, 2007). This enables the researcher to place the facets, such as item difficulty, personal competence, and rater severity, on the same logit scale for comparison. In addition to specific factors, latent factors such as personal characteristics can also be revealed through the calibration of a Rasch map (Figure 1).
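For reference, the basic dichotomous case can be written in the following standard form (the notation $B_n$ for person ability and $D_i$ for item difficulty is supplied here for illustration and anticipates the fuller model presented later in this section):

$$\log\!\left[\frac{P_{ni1}}{P_{ni0}}\right] = B_n - D_i$$

where $P_{ni1}$ is the probability of test taker $n$ succeeding on item $i$ and $P_{ni0}$ the probability of failing. When ability exceeds difficulty, the probability of success exceeds .5.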


Figure 1. The FACETS variable map of this study.

Unidimensionality and local independence are two fundamental assumptions of the Rasch measurement model that need to be distinguished. The former refers to the condition in which a single attribute is assessed at a time and all the assessment items gauge exclusively a single construct. Every examination question contributes to assessing one attribute, with the estimates of test takers' competence and test items' difficulty in the data matrix taken into consideration. Unidimensionality is not unusual in educational statistics, and it operates when a single quality or facet in an examination is adopted to construe the principal measurement underlying the total test scores (Bond & Fox, 2007).


Local independence also plays a vital role in the functioning of the Rasch measurement model under the unidimensionality scheme. It is presumed that each test item is linked to a latent trait value of its own. Latent trait refers to the idea that the data comply with an aligned measurement line, that is, the fundamental construct. Every individual examination item carries a value of its own, and the people who take the test respond to these items independently. The test taker's competence, the corresponding test item difficulty, and the stringency of raters can thus be placed on a logit scale, and a model can be produced to forecast each individual's likelihood of success given a single rater with a given stringency on a specific item (McNamara, 1996).
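For dichotomous scoring with a rater facet, such a model can be sketched as follows (an illustrative extension of the basic form above; the symbol $C_j$ for rater severity is a label chosen here, not taken from the cited sources):

$$\log\!\left[\frac{P_{nij1}}{P_{nij0}}\right] = B_n - D_i - C_j$$

so that the log-odds of success fall as item difficulty or rater severity rises relative to the test taker's ability.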

Researchers will be informed about the interrelatedness of the data by the fit analysis, which comes with MFRM and demonstrates how well each examination item conforms to the intended construct. Abnormal signs, if any, will be flagged by the fit analysis. Unfitting examination items are recommended to be redesigned, substituted, or simply restated in other forms (Bond & Fox, 2007). Besides examination items, researchers also suggest that, with the help of the fit analysis, aberrant raters can be identified, whether misfitting or overfitting, thus pinpointing raters who require further training, or even prompting the recruitment of new ones (Bachman, 2004; McNamara, 1996). Hence, relative to the expectations of the model, any aberrant characteristic of the raters will be revealed and summarized by the fit statistics (McNamara, 1996). In the meantime, investigators can inspect the condition of the examination items and keep track of them through the fit analysis.

Furthermore, Rasch analysis yields two fit indicators, infit and outfit. Infit is an information-weighted mean square of the residuals, so it is most sensitive to unexpected responses on items close to the anticipated value for a given person. Outfit, on the other hand, is the unweighted mean of the squared standardized residuals and is more sensitive to isolated, highly unexpected ratings. Contrary to standard statistical methods, the infit and outfit mean squares tend to be less sensitive to sample size, and their values are determined by test takers' response patterns, which is why they are broadly adopted (Smith, Schumacker, & Bush, 1995).
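In the standard formulation (notation supplied here for illustration; it does not appear in the sources cited above), with observed rating $x_{ni}$, expected rating $E_{ni}$, and model variance $W_{ni}$, the standardized residual is $z_{ni} = (x_{ni} - E_{ni})/\sqrt{W_{ni}}$, and the two mean squares are

$$\text{Outfit MS} = \frac{1}{N}\sum_{n} z_{ni}^{2}, \qquad \text{Infit MS} = \frac{\sum_{n} W_{ni}\, z_{ni}^{2}}{\sum_{n} W_{ni}},$$

with values near 1 indicating data that accord with the model's expectations.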

To sum up, MFRM is an essential measurement model whose functions include gauging the characteristics of the participants and the examination itself under common, suitable measurement conditions. It is devised to discern the likelihood of success determined by the difference between the individual's competence and the item difficulty, presented in a table of expected likelihood. Investigators are able to infer each individual's true competence and optimally interpret the examination, since the probabilistic estimates are based on actual test performance.
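As a worked illustration (the numbers here are chosen for exposition, not taken from the study): when a test taker's ability exceeds an item's difficulty by one logit, $B_n - D_i = 1$, the expected probability of success is

$$P = \frac{e^{1}}{1 + e^{1}} \approx .73,$$

whereas a difference of zero logits yields $P = .50$.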

It is also worth mentioning that in the Rasch model test takers can be ordered according to their competence, and examination items can likewise be sequenced by their difficulty. This is a vital feature, since raters are crucial to the outcome of performance assessment. MFRM may thus serve as a remedy safeguarding the fairness of the grades given by raters in performance assessment.

Engelhard (1992) indicated that MFRM is an impartial and trustworthy mechanism, especially applicable to the assessment of writing competence. Traditionally, if a test taker is judged simply by the raw scores given, his or her real competence may be under- or overrated.

Besides, each individual rater may be relatively more or less stringent on each examination, and it is possible that the very same test taker will receive drastically different grades from different raters. Even if raters are trained in advance, some disparities may persist, in high-stakes examinations in particular. The application of MFRM hence becomes critical, since raw scores alone may misrepresent test takers' true competence. McNamara (1996) indicated that the raw score should not be considered a trustworthy index of the test taker's true competence, since many variables may affect the result of any single examination. The measure produced by the Rasch model takes miscellaneous rating circumstances into consideration and is thus more reliable than raw scores. Figure 2 demonstrates the measurement model for the writing assessment, in which the theoretical aspect is clearly illustrated as a prototype.

Figure 2. Measurement Model for the Writing Assessment (Engelhard, 1992, p. 174)

The MFRM model that reflects the conceptual model of writing ability takes the following general form (Engelhard, 1992, p. 175):

$$\log\!\left[\frac{P_{nijmk}}{P_{nijm(k-1)}}\right] = B_n - T_i - R_j - D_m - F_k$$

where

$P_{nijmk}$ = probability of student $n$ being rated $k$ on translation task $i$ by rater $j$ for domain $m$;
$P_{nijm(k-1)}$ = probability of student $n$ being rated $k-1$ on translation task $i$ by rater $j$ for domain $m$;
$B_n$ = translation ability of student $n$;
$T_i$ = difficulty of translation task $i$;
$R_j$ = severity of rater $j$;
$D_m$ = difficulty of domain $m$;
$F_k$ = difficulty of rating step $k$ relative to step $k-1$.
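As a computational illustration, the category probabilities implied by this equation can be obtained as in the following sketch (a minimal illustration assuming the rating-scale parameterization above; the function name and example parameter values are hypothetical, not drawn from the study):

```python
import numpy as np

def mfrm_category_probs(B_n, T_i, R_j, D_m, F):
    """Category probabilities P(k), k = 0..K, for one student-task-rater-domain
    combination under log[P_k / P_{k-1}] = B_n - T_i - R_j - D_m - F_k.
    F holds the step difficulties F_1..F_K."""
    theta = B_n - T_i - R_j - D_m  # combined facet logit
    # Cumulative log-numerators: psi_0 = 0, psi_k = k*theta - (F_1 + ... + F_k)
    psi = np.concatenate(([0.0], np.cumsum(theta - np.asarray(F, dtype=float))))
    probs = np.exp(psi - psi.max())  # subtract the max for numerical stability
    return probs / probs.sum()

# Hypothetical values: an able student, an average task, a slightly lenient
# rater, and a four-category scale with step difficulties -1.0, 0.0, 1.0.
print(mfrm_category_probs(B_n=1.2, T_i=0.3, R_j=-0.2, D_m=0.1, F=[-1.0, 0.0, 1.0]))
```

By construction, the log-odds between adjacent categories in the output equal $\theta - F_k$, which is the defining property of the model.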


In this model, the observed rating is the dependent variable. The three primary facets characterizing the intervening variables are domain difficulty, the difficulty of the writing task, and rater severity. The test taker's writing ability is the fourth, underlying facet. Apart from the facets mentioned above, the structure of the rating scale is a vital factor as well.

Conventionally, inter-rater reliability is adopted to inspect rater effects, that is, to see whether the rating patterns among raters deviate from one another. Nonetheless, inter-rater reliability is unable to distinguish each rater's distinctness in terms of rating stringency and leniency (Bond & Fox, 2007).

What makes MFRM powerful is that the latent volatility among raters can be revealed. Once rater volatility has been adjusted for, the test taker's real competence becomes clear. According to Bond and Fox (2007), the problem with inter-rater reliability lies in the fact that even when the rank orders of test takers are in conformity, the variation in stringency or leniency between raters fails to be exhibited. MFRM is capable of mending this gap by modelling the relatedness among raters, and thus safeguards the consistency of rating (Bond & Fox, 2007). Since inter-rater reliability is deficient in proffering an exhaustive picture of rating patterns between raters, the implementation of MFRM becomes essential.

Another noteworthy feature of MFRM lies in the interplay between a particular rater and a specific facet of significance. In performance assessment of any kind, the interplay of facets may lead to bias; for instance, a specific rater may behave differently under particular rating circumstances. The Many-Faceted Rasch measurement offers a feature called bias analysis, which can detect these sub-patterns analytically. With this function of MFRM, the differences between observed and expected values, called residuals, can be analyzed and contrasted. In-depth analysis will reveal whether any sub-patterns exist, such as systematic interplay within or between groups (McNamara, 1996).
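A minimal sketch of the residuals underlying such a bias analysis is given below (illustrative only; the helper name is hypothetical, and `mfrm_category_probs` refers to the earlier sketch). Standardized residuals averaged within, say, a rater-by-task cell can then flag systematic interactions:

```python
import numpy as np

def standardized_residual(x, probs):
    """z = (observed - expected) / sqrt(model variance) for one observed
    rating x, given the model's category probabilities for that observation."""
    k = np.arange(len(probs))
    expected = k @ probs                       # model-expected rating
    variance = ((k - expected) ** 2) @ probs   # model variance of the rating
    return (x - expected) / np.sqrt(variance)

# Example: an observed rating of 1 against the probabilities computed earlier
# yields a markedly negative residual, i.e., an unexpectedly low score.
z = standardized_residual(1, mfrm_category_probs(1.2, 0.3, -0.2, 0.1, [-1.0, 0.0, 1.0]))
```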

Therefore, the FACETS program will be employed in the current study.

