The comparisons of PIRT and NIRT

Chapter 2 Literature Review

2.4 The comparisons of PIRT and NIRT

National Chengchi University

2.4.1 The limitations of PIRT

The choice of estimation method constrains the use of PIRT. The estimation methods for PIRT, whether marginal maximum likelihood estimation (MMLE) or Bayesian estimation, require a large sample size to converge. In addition, a number of assumptions and premises have to be satisfied before the analysis (Dyehouse, 2009). MMLE applies, in theory, under the condition of a large sample. The requirements concern both the sample size and the number of items, which should be more than 20. These requirements guard against incorrect results caused by atypical responses and a potentially non-normal distribution of ability. When MMLE is adopted as the method of estimation, the data cannot include response patterns that are all correct or all incorrect. Although Bayesian estimation offers a solution for this condition, it is a biased estimator, unlike the unbiased MMLE, and it is affected by the prior distribution (Yu, 2009). These limitations of the estimation methods are an issue for applications of PIRT.
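The difficulty with all-correct and all-incorrect response patterns can be illustrated with a minimal sketch. It assumes a two-parameter logistic model and hypothetical item parameters, neither of which comes from the source, and it evaluates only the likelihood of ability for a single respondent; it is not the MMLE procedure itself. Under these assumptions, the likelihood of a mixed pattern peaks at a finite ability, whereas the likelihood of a perfect or zero pattern keeps increasing toward the edge of any search grid.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of one response pattern at ability theta."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

# Hypothetical item parameters: (discrimination a, difficulty b).
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]
# Ability grid from -6 to 6 in steps of 0.5.
GRID = [x / 2.0 for x in range(-12, 13)]

def grid_argmax(responses):
    """Grid value of theta with the highest log-likelihood."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, ITEMS))

mixed_peak = grid_argmax([1, 1, 0, 0])    # maximum lies inside the grid
perfect_peak = grid_argmax([1, 1, 1, 1])  # maximum sits on the upper bound
zero_peak = grid_argmax([0, 0, 0, 0])     # maximum sits on the lower bound
```

Widening the grid moves `perfect_peak` and `zero_peak` outward with it, which is why such patterns have no finite maximum likelihood estimate of ability in this sketch.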

The assumption of invariance is a crucial feature of PIRT. A number of IRT-related applications, such as the measurement of item bias, equating, and computerized adaptive testing, rely on the assumption of invariance. However, invariance holds only when the assumed model fits the data (Wells & Bolt, 2008). When the model does not fit the data, IRT may be misapplied (Bolt, 2002). That is, it is vital to ensure a match between the assumed model and the real data.

Numerous PIRT models have been used to analyze dichotomous and polytomous items. To select an appropriate method of analysis, researchers have to know the distribution of their data (Higgins, 2004), because PIRT makes an assumption about the distribution of ability (Meijer & Baneke, 2004; Sodano & Tracey, 2011).

Parametric methods assume a normal distribution (Granberg-Rademacker, 2010). In addition, the item response curve is assumed to take a specific form, such as a logistic or monotonic function. The relationship between item and trait is assumed to be symmetric (Meijer & Baneke, 2004; Sodano & Tracey, 2011), which is sometimes unreasonable in practice. A number of methods can correct or transform data from a non-normal distribution before parametric statistics are used (see Granberg-Rademacker, 2010). Alternatively, researchers have to select methods based on other models. For example, Woods (2006) suggested that handling skewed data with Ramsay-curve item response theory (RC-IRT) could yield more accurate estimates.
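As an illustration of what a "specific form" means here, the two-parameter logistic model is one widely used parametric item response function. It is given only as an example; the symbols a_i (discrimination), b_i (difficulty), and θ (latent ability) are notation assumed for this sketch and do not appear in the source.

```latex
% Two-parameter logistic (2PL) item response function, shown only as an example
P_i(\theta) = \frac{\exp\left[a_i(\theta - b_i)\right]}{1 + \exp\left[a_i(\theta - b_i)\right]},
\qquad \theta \sim N(0, 1)

% Symmetry of the logistic curve around the item difficulty b_i, for any d:
P_i(b_i + d) = 1 - P_i(b_i - d)
```

With a_i > 0 the curve is monotonically increasing in θ, and the second line shows the symmetry around b_i that a real item need not exhibit.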

2.4.2 NIRT

Given the limitations of PIRT, NIRT is a valuable alternative. First, with respect to estimation, several methods have been developed within the scope of NIRT, such as kernel smoothing, bootstrapping, and expectation-maximization (EM) estimation. These allow NIRT to reduce its reliance on the normal distribution and logistic ogive models (Molenaar, 2001).
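Kernel smoothing can be sketched as a weighted average of item responses, with weights that decay with distance from the ability value of interest (a Nadaraya-Watson estimator with a Gaussian kernel). The example below is a minimal sketch under several assumptions that are not in the source: the data are simulated, the ability values are treated as known, and the function name `kernel_smoothed_icc` and the bandwidth of 0.4 are hypothetical. In applied work the abilities are unobserved and are usually replaced by a proxy such as a transformed rank of the total score.

```python
import math
import random

def kernel_smoothed_icc(abilities, responses, grid, bandwidth):
    """Nadaraya-Watson estimate of P(correct | ability) at each grid point."""
    curve = []
    for point in grid:
        # Gaussian kernel weight for every respondent relative to this point.
        weights = [math.exp(-0.5 * ((t - point) / bandwidth) ** 2)
                   for t in abilities]
        weighted_correct = sum(w * u for w, u in zip(weights, responses))
        curve.append(weighted_correct / sum(weights))
    return curve

def true_icc(theta):
    """Curve used only to simulate data; the estimator never sees it."""
    return 1.0 / (1.0 + math.exp(-1.5 * theta))

random.seed(0)  # fixed seed so the simulated data are reproducible
N_PERSONS = 2000
abilities = [random.gauss(0.0, 1.0) for _ in range(N_PERSONS)]
responses = [1 if random.random() < true_icc(t) else 0 for t in abilities]

GRID = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = kernel_smoothed_icc(abilities, responses, GRID, bandwidth=0.4)
```

The estimator imposes no logistic shape on `curve`; the shape comes from the data, and the bandwidth controls how smooth the estimated curve is.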

Compared to PIRT, NIRT models do not require the data to fit a model; rather, the models are based on the real data (Embretson & Reise, 2000; Reise & Henson, 2003; Tate, 2002; Sodano & Tracey, 2011). NIRT models do not define parameters and are less restrictive about the type of responses and the relationship between item and latent trait (Meijer & Baneke, 2004; Sodano & Tracey, 2011). Because of these weaker restrictions, the target models are less confined. As such, when the data do not fit the models assumed in PIRT, NIRT models can usually fit adequately (Yu, 2009).

Parametric models work efficiently when the data meet their strict assumptions. However, it is difficult for a real dataset to satisfy all of the requirements of PIRT. Treating an ordinal scale as a continuous scale is common, but the validity of this practice is questionable, even though it has been accepted conventionally (Robie, Zickar, & Schmitt, 2001; Waller, Thompson, & Wenk, 2000).

NIRT models theoretically assume that the data are on an ordinal scale and do not constrain the form of the item characteristic curve (ICC) (Dyehouse, 2009; Meijer & Baneke, 2004). This assumption is closer to real datasets. Higgins (2004) also claimed that NIRT models are more suitable when datasets are on categorical or ordinal scales.

In comparison to PIRT models, NIRT models depend less on strong model assumptions. For PIRT, the functions have to satisfy independence, monotonicity of the item response function, and unidimensionality of the latent trait. NIRT can be used even when the data meet only some of these assumptions (Yu, 2009; Junker & Sijtsma, 2001). Owing to this comparative flexibility, NIRT is more suitable than PIRT for exploratory studies and for the early stages of a study (Meijer & Baneke, 2004; Sodano & Tracey, 2011). NIRT models provide a more flexible and realistic context in which to develop methodology and analysis. They offer possible models for data analysis beyond those of PIRT (Junker & Sijtsma, 2001).
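The monotonicity assumption mentioned above can be inspected without fitting any parametric curve. One common nonparametric check, which is not described in the source and is offered here only as a sketch, groups respondents by their rest score (the total score excluding the item under study) and examines whether the proportion correct on that item is non-decreasing across groups. The data matrix and the function names below are hypothetical.

```python
def rest_score_proportions(data, item):
    """Proportion correct on `item` within each rest-score group."""
    groups = {}
    for row in data:
        rest = sum(row) - row[item]  # total score without the studied item
        groups.setdefault(rest, []).append(row[item])
    return {rest: sum(v) / len(v) for rest, v in sorted(groups.items())}

def monotonicity_violations(proportions, tolerance=0.0):
    """Count adjacent rest-score groups where the proportion decreases."""
    values = list(proportions.values())
    return sum(1 for low, high in zip(values, values[1:])
               if high < low - tolerance)

# Hypothetical 0/1 responses: rows are respondents, columns are items.
DATA = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
]

props = rest_score_proportions(DATA, item=1)
violations = monotonicity_violations(props)
```

In this toy matrix the proportions for the second item never decrease as the rest score rises, so `violations` is zero. With real data, small decreases are expected from sampling error, which is why the sketch includes a `tolerance` argument.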

NIRT requires fewer assumptions than PIRT, as its analyses and models are distribution-free: the distribution can take any form. That is, NIRT models make no assumption about the population mean or the normality of the data (Sprinthall, 1997). When the theory and data are insufficient to identify appropriate models for analysis, the features of the population are unclear, assumptions about the population cannot be set, or the distribution of the data is unknown, NIRT models are the superior choice. As such, NIRT models can be used with skewed data, and the scale can be either ordinal or categorical (Sprinthall, 1997; Dyehouse, 2009). Sample size is a further weakness of PIRT.

Parametric models require a large sample to calibrate the function (Dyehouse, 2009). A short test and a small sample size would result in misfit between model and data (Junker & Sijtsma, 2001). While educational examinations are large in scale, studies in psychological and social fields may recruit only a limited number of participants. In such cases, NIRT models are a suitable tool.

NIRT models have flaws as well. It has been suggested that, when the data satisfy the assumptions of PIRT, analysis with NIRT models loses power. NIRT is also inferior in detecting differences between groups (Sprinthall, 1997). As such, when the data are expected to meet the assumptions, researchers prefer PIRT methods, which have higher power for detecting discrepancies. However, in real conditions it is difficult to satisfy all of these assumptions (Dyehouse, 2009). Van den Wittenboer, Hox, and De Leeuw (2000) stated that scales which fit the strict assumptions of IRT models, such as the Rasch model, are scarce. Results obtained under insufficient premises could be incorrect. Therefore, the selection depends on the aims and options of the researchers.

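Source document: The Effectiveness of Person-Fit in Detecting Faking: Simulation and Application of Nonparametric Item Response Theory, NCCU Academic Hub (pp. 40-44)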