Chapter 1 Introduction

1.1 Background

National Chengchi University


Tests are widely used and are usually categorized into achievement tests and psychological tests. The former can be used to assess the state of learning and may serve as a determinant for school entrance (Yu, 2002). The latter can help identify the personality and characteristics of individuals and can therefore be used in the selection of employees (Chen, Lee, & Yen, 2004; Lai, Yu, & Hsu, 2009). The purpose of using tests is to rank and filter participants. That is, test results carry consequences for participants, such as pass or fail, and accept or reject. Participants are therefore likely to give answers that will benefit them. Such answers create a mismatch between the test score and the real ability of the participant, and they also make the test unfair. Obtaining the true ability or latent trait score of the participant is thus a vital task for test experts. That is, faking has to be detected accurately and promptly in order to maintain the fairness of the test.

Two main methods are commonly used to detect faking behavior in psychological tests. The first is the Social Desirability Scale: if a participant's score on the scale is higher than a pre-determined cut-off score, the faking behavior is confirmed. The second is the use of forced-choice questions, which avoids faking through the arrangement of the types of test items (Chen et al., 2004). Both methods are used to determine whether the answers represent the true ability or latent trait of participants. The concept of person-fit provides a similar function. Person-fit concerns atypical test performance (Emons, 2008; Meijer & Sijtsma, 2001) and is based on the level of dependency between item-score patterns and individual response patterns (LaHuis & Copeland, 2009; Meijer & Sijtsma, 2001). Therefore, person-fit is an alternative method to detect faking.

A number of studies consider the level of person-fit as an indicator of faking. Schmitt, Chan, Sacco, McFarland, and Jennings (1999) used lz under parametric item response theory as an indicator of faking in personality tests. LaHuis and Copeland (2009) used the person-fit technique in a multilevel logistic regression approach to explore faking. Schmitt et al. (1999) indicated that the slope variance of a person response curve (PRC) reflects attempts by participants to obtain a higher score, which leads to poorer person-fit. Zickar and Drasgow (1996) claimed that parametric item response theory was more effective than the Social Desirability Scale in classifying faking and non-faking participants. Using person-fit as an alternative technique for detecting faking is therefore not new. Whether its efficiency holds across various conditions, such as non-normal distributions and small samples, remains an open question, which the current study investigates.
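To make the lz indicator mentioned above concrete, the following is a minimal sketch of the standardized log-likelihood person-fit statistic for dichotomous items. It assumes a two-parameter logistic model with known item parameters and a known ability value; the function names, the five-item scale, and the two response patterns are hypothetical and are not taken from the studies cited.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a keyed (1) response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, theta, items):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    items is a list of (a, b) pairs. Treating theta and the item
    parameters as known is an assumption of this sketch.
    """
    log_lik = expected = variance = 0.0
    for u, (a, b) in zip(responses, items):
        p = irf_2pl(theta, a, b)
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)   # observed log-likelihood
        expected += p * math.log(p) + q * math.log(q)        # its expectation
        variance += p * q * math.log(p / q) ** 2              # its variance
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item scale ordered from easy to hard (a = 1 for all items).
ITEMS = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

# A pattern that agrees with the item ordering, and its reversal.
consistent = lz_statistic([1, 1, 1, 0, 0], theta=0.0, items=ITEMS)
aberrant = lz_statistic([0, 0, 1, 1, 1], theta=0.0, items=ITEMS)
```

Under these assumptions, the reversed pattern yields a clearly negative value, which is the direction usually read as misfit. In practice the ability value is estimated rather than known, so this sketch shows only the logic of the statistic.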

Person-fit is a technique within the scope of item response theory. Parametric item response theory (PIRT) has been widely used; however, nonparametric item response theory (NIRT) is becoming a preferred choice (Cliff & Keats, 2003; Sijtsma & Molenaar, 2002; Stout, 1990). Mokken (1971) proposed the theory and a set of procedures for dichotomous items, known as Mokken scale analysis. Subsequent development covered estimation methods, software applications, and various types of models, and NIRT models have been extended from dichotomous to polytomous items. Applied Psychological Measurement (2001) devoted a special issue to NIRT. These provide strong evidence of the rapid development of NIRT. NIRT can be applied to areas and topics similar to those of PIRT. Junker and Sijtsma (2001) applied the technique to cognitive analysis and argued for its applicability given the fewer restrictions of NIRT. Sijtsma, Emons, Bouwmeester, Nyklicek, and Roorda (2008) and Stewart, Watson, Clark, Ebmeier, and Deary (2010) found that NIRT models could efficiently match the data format in scale analysis. For differential item functioning (DIF), Glickman, Seal, and Susan (2009) used nonparametric Bayesian estimation to diagnose DIF in the IRT model. Emons (2008) conducted a simulation study to investigate the stability of person-fit detection under NIRT and asserted its effectiveness in detecting unusual response patterns. Nozawa (2008) and Xu (2004) applied NIRT models to computerized adaptive testing and compared the adaptability of PIRT and NIRT in a nonequivalent group design.

As such, the development and widespread application of NIRT are apparent. It has extended from cognitive measurement to psychological tests, and from dichotomous to polytomous items. NIRT is highly efficient in a number of common fields, such as DIF, computerized adaptive testing, and person-fit.
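As an illustration of how person-fit can be assessed without a parametric model, the sketch below counts Guttman errors for one respondent. This index is not named in the text above and is used here only as an assumed, simple example of a nonparametric person-fit measure. The function names and the small data matrix are hypothetical.

```python
def item_popularity(data):
    """Proportion of keyed (1) responses per item in a persons-by-items 0/1 matrix."""
    n_persons = len(data)
    n_items = len(data[0])
    return [sum(row[j] for row in data) / n_persons for j in range(n_items)]


def guttman_errors(responses, popularity):
    """Count item pairs in which the less popular item is endorsed
    while the more popular item is not."""
    # Order the items from most to least popular (easy to hard).
    order = sorted(range(len(responses)), key=lambda j: popularity[j], reverse=True)
    ordered = [responses[j] for j in order]
    errors = 0
    for easy in range(len(ordered)):
        for hard in range(easy + 1, len(ordered)):
            if ordered[easy] == 0 and ordered[hard] == 1:
                errors += 1
    return errors


# Hypothetical sample of six respondents on four items.
DATA = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],  # endorses the hard items but not the easy ones
]

popularity = item_popularity(DATA)
errors_per_person = [guttman_errors(row, popularity) for row in DATA]
```

The index needs only the ordering of items by popularity in the sample, with no assumption about the distribution of ability. A respondent with many such reversals has an atypical pattern relative to the group.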

The use of NIRT offers a number of advantages. Nonparametric item response theory was created because the premises of parametric item response theory do not hold under a number of conditions. Those premises, which concern the type of distribution of population ability, the scale of measurement (ordinal rather than continuous), and the required sample size, are conventionally agreed upon. Because NIRT imposes fewer restrictions and fewer unrealistic assumptions than PIRT, it may yield a better fit under more realistic conditions (Meijer & Baneke, 2004). Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) state that, compared to the two-parameter and three-parameter logistic models (abbreviated as 2PL and 3PL, respectively, with PL indicating that the model is based on a logistic function) (Osterlind & Everson, 2008), the nonparametric item response model may achieve a better fit in analyzing the Sixteen Factor Personality Scale. Because the NIRT model relies on the simple covariance structure between items and on nonparametric regression, the results are easier to interpret and understand, and the software is easier to use. The sample size required in NIRT for a reliable measurement of psychological properties is relatively small (Emons, 2008).
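To clarify what the 2PL and 3PL abbreviations refer to, the sketch below writes out the logistic item response function in its common textbook form. The parameter names (a for discrimination, b for difficulty, c for the lower asymptote), the item values, and the ability grid are assumptions made for illustration.

```python
import math


def irf_3pl(theta, a, b, c=0.0):
    """Probability of a keyed response under a three-parameter logistic model.

    a: discrimination, b: difficulty (location), c: lower asymptote.
    With c = 0 the function reduces to the two-parameter logistic model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def is_nondecreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Evaluate one hypothetical item on a coarse ability grid.
GRID = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
curve_2pl = [irf_3pl(t, a=1.2, b=0.5) for t in GRID]
curve_3pl = [irf_3pl(t, a=1.2, b=0.5, c=0.2) for t in GRID]
```

Both curves are nondecreasing in ability. Nonparametric models typically keep an ordering requirement of this kind while dropping the specific logistic form, which is one way to read the "fewer restrictions" discussed above.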

Sample size is an issue especially for mixed-method research, which is currently a common approach: recruiting one hundred participants for interviews or observations is exhausting work, yet a sample of this size is still small for quantitative analysis. The sample size constrains the choice of analysis, given the premises of a number of statistical methods, and may limit the contribution of the research. Based on the discussion above, measuring person-fit under the NIRT model is a method well worth developing. Meijer and Sijtsma (2001) conducted a comprehensive review of the measurement of person-fit and identified a number of merits and demerits of classical test theory and item response theory in person-fit measurement. They also stated that further research is required on model-free and robust NIRT methods.

The roles of PIRT and NIRT are complementary rather than opposed. PIRT can provide point estimates and is therefore more applicable and more widely used for estimation and calculation (Yu, 2009). PIRT is more applicable than NIRT when real data meet the premises of PIRT (Nozawa, 2008). However, these assumptions and premises, such as normally distributed data, an interval scale, and a sufficient sample size, are not easy to satisfy (Dyehouse, 2009). As such, it is essential to develop a method that is easier to use and closer to real situations. Various studies have investigated the relation between person-fit and faking detection under PIRT (Zickar & Drasgow, 1996) and have explored analytic techniques for polytomous items under NIRT (Emons, 2008). However, there is limited research on the effectiveness of using person-fit to detect faking under NIRT.

Therefore, this study intends to compare the detection rates of NIRT and PIRT person-fit statistics under several potential empirical issues, such as sample size and the distribution of ability, in order to provide appropriate evidence for the use of the less constrained model.