
Chapter 4 Results

4.2 Study 2: Empirical study

4.2.2 Social desirability scale and person-fit statistics

Table 8 Descriptive statistics and normal distribution test by sample size

Sample size   Mean [1]   SD      Skewness   Kurtosis   Shapiro test
30            0.152      0.815   0.352      2.140      5/11 [2]

Note.

1. Except for the sample size of 30, the values of the mean, SD, skewness, and kurtosis are the means of 30 replications. Because of GRM model-fit limitations, the results for the sample size of 30 are based on 11 replications.

2. In the Shapiro-test column, the numerator is the number of replications in which the null hypothesis (the data are normally distributed) was rejected, and the denominator is the number of replications.

4.2.2 Social desirability scale and person-fit statistics

The purpose of the sub-study was to understand the possibility of using three person-fit indicators to detect faking under different sample sizes.

We first examined the social desirability scale, a 6-item version with each item scored from 1 to 4. A participant who gives positive answers to most of the extreme items (e.g., “I have never intensely disliked anyone”) is classified as faking (Zickar & Drasgow, 1996). We chose 19 as the cutoff score for the faking criterion: a participant with a total score of 19 or higher is classified as faking on these items. The classification result from the social desirability scale is used to calculate the detection rate, or the fraction of participants who are identified as faking by a given person-fit statistic, and

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

We then analyzed the three person-fit statistics for the same participants in each sample. The critical values for Gp and U3p were determined using the bootstrap method (Emons, 2008), and the cutoff score for lz was set at -2.
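The classification and flagging rules described here can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study: the function names and the toy data are hypothetical, and the reading of the detection rate as the share of scale-classified fakers that a person-fit statistic also flags is an assumption made for the example.

```python
import numpy as np

def classify_by_sd_scale(sd_items, cutoff=19):
    """Classify respondents as faking when their social desirability total
    (6 items, each scored 1 to 4) reaches the cutoff of 19 used in the text."""
    totals = np.asarray(sd_items).sum(axis=1)
    return totals >= cutoff

def flag_by_lz(lz_values, cutoff=-2.0):
    """Flag respondents whose lz is below the cutoff of -2.
    ASSUMPTION: a strict inequality is used at the cutoff."""
    return np.asarray(lz_values, dtype=float) < cutoff

def detection_rate(criterion_fakers, flagged):
    """ASSUMPTION: the detection rate is the proportion of criterion-classified
    fakers who are also flagged by the person-fit statistic."""
    criterion_fakers = np.asarray(criterion_fakers, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)
    n_fakers = criterion_fakers.sum()
    if n_fakers == 0:
        return float("nan")  # no criterion fakers, so the rate is undefined
    return float((criterion_fakers & flagged).sum() / n_fakers)

# Toy data (hypothetical): four respondents, six social desirability items.
sd_items = np.array([
    [4, 4, 4, 4, 3, 3],   # total 22
    [1, 2, 2, 1, 2, 2],   # total 10
    [4, 3, 3, 3, 3, 3],   # total 19
    [4, 4, 4, 4, 4, 4],   # total 24
])
lz_values = [-2.5, -3.0, -0.4, -1.0]

fakers = classify_by_sd_scale(sd_items)
flagged = flag_by_lz(lz_values)
rate = detection_rate(fakers, flagged)
```

In this toy example three respondents reach the cutoff of 19 and only one of them has lz below -2. The same `detection_rate` function could be applied to flags derived from Gp or U3p once their bootstrap critical values are available.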

Table 9 shows the detection-rate results. When the sample size is 200 or more, U3p has a detection rate of about 0.7, and the Guttman error index (Gp) has a detection rate of about 0.6. When the sample size drops to 100 or less, the Gp detection rate is about 0.5. In contrast, lz does not agree with the social desirability scale to the same degree: its detection rate stays between 0.13 and 0.18 across sample sizes.


Table 9 Detection rate of Gp, U3p, and lz by sample size

Sample size Gp U3p lz

30 0.51 0.03 0.13

100 0.52 0.46 0.16

200 0.60 0.67 0.16

500 0.58 0.65 0.17

1000 0.57 0.65 0.18

3000 0.59 0.67 0.16

Figure 16. Detection rate of three person-fit statistics (Gp, U3p, and lz) by sample size

Note. The x-axis represents sample size (30 to 3000). The y-axis represents the detection rate (0 to 1).


Chapter 5 Discussion and Conclusion

Both a simulation and an empirical data analysis were employed to investigate the possibility of using person-fit statistics as an alternative method to detect faking. In the simulation study, four factors (sample size, ability distribution, aberrant rate, and faking degree) were manipulated to compare the accuracy of parametric and nonparametric person-fit indices. The empirical study used a data set of responses on both a personality scale and a social desirability scale to test this possibility in a real context. The conclusions and suggestions derived from the results are as follows.

5.1 Discussion on major findings

5.1.1 Sample size

In discussing the influence of sample size on person-fit statistics, St-Onge et al. (2009) indicated that parametric person-fit statistics have greater accuracy, even in a small sample. Our findings qualify this claim. For a small sample, the selection of the most appropriate index should be based on the ability distribution and the percentage of faked items in the scale. When the sample size is 30 or 100, under the four distributions set in this study, in partial-item faking and slight faking,


is 100, the two types of person-fit statistics have similar efficiency under the normal and platykurtic distributions. The Guttman error index also rises when the sample size exceeds 500. However, when all items are faked, lz is the candidate to adopt.

The results show no large differences in detection rates across sample sizes for a given person-fit statistic. However, this conclusion is tentative because of a computational constraint: when the sample size is too small, the data do not provide sufficient information for the computation. This limitation raises the issue of the accuracy of the computed statistics. In the original program setting for the lz computation, each response option of every item has to be selected at least once. This requirement reveals the importance of using items with normal characteristics and implies the need for some method to fill the empty cells (Reise & Yu, 1990). Filling empty cells with such a technique affects the fidelity of the data set and the accuracy of the results. This is not a problem for the Guttman error index, because its computation is based on summed scores and the frequency of each option is not critical. In conclusion, although parametric person-fit statistics might yield a higher detection rate in some situations, certain prerequisites must be satisfied. For the Guttman error index, the nonparametric indicator, all of the information used in the computation is original.
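The contrast between the two requirements can be made concrete with a short Python sketch. It is illustrative only and is not the program adopted from Emons (2008): the function names are hypothetical, item scores are assumed to be coded 0 to m (a 1 to 4 scale would be recoded first), and the Guttman error count follows one common item-step definition for polytomous items, in which item steps are ordered by popularity and an error is a passed step that follows a failed easier step.

```python
import numpy as np

def empty_categories(responses, n_categories):
    """Return (item, category) pairs that no respondent selected.
    A non-empty result is the situation in which a program that needs
    every option chosen at least once cannot run on the raw data."""
    X = np.asarray(responses)
    return [(j, c)
            for j in range(X.shape[1])
            for c in range(n_categories)
            if not np.any(X[:, j] == c)]

def guttman_errors(responses, n_categories):
    """Count polytomous Guttman errors per respondent (ASSUMED definition).
    Scores are coded 0..m with m = n_categories - 1."""
    X = np.asarray(responses)
    n_persons = X.shape[0]
    thresholds = np.arange(1, n_categories)             # item steps k = 1..m
    # steps[v, j, k-1] = 1 when person v scored at least k on item j
    steps = (X[:, :, None] >= thresholds).astype(int)
    steps = steps.reshape(n_persons, -1)                # persons x (items * m)
    popularity = steps.mean(axis=0)                     # share passing each step
    order = np.argsort(-popularity, kind="stable")      # easiest step first
    ordered = steps[:, order]
    # For every passed step, count the easier steps that were failed.
    failed_so_far = np.cumsum(1 - ordered, axis=1)
    return (ordered * failed_so_far).sum(axis=1)

# Toy data (hypothetical): four respondents, two items scored 0, 1, or 2.
X = np.array([[2, 1],
              [2, 0],
              [1, 0],
              [0, 2]])
missing = empty_categories(X, n_categories=3)
errors = guttman_errors(X, n_categories=3)
```

The error count needs only the observed step popularities, so it stays defined even when `empty_categories` reports unused options, which mirrors the point made in the text. In the toy data the fourth respondent fails both steps of the easier item while passing both steps of the harder item, and is the only one with a nonzero count.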


5.1.2 Faking degree

We considered the magnitude of faking as a factor in the detection rate. Emons (2009) indicated that the lz detection rate increases with the degree of aberrance. The findings of this study partially corroborate his observations. The theta shifts in the Emons (2009) study were 1 and 1.5, and the detection rate rose as the shift increased. The theta shifts in our study were 1, 2, and 3. The detection rate increased when the shift rose from 1 to 2; however, it decreased slightly when the shift rose from 2 to 3, for all three person-fit statistics.

This nonlinear relation between the detection rate and the size of the theta shift points to a restriction on the use of person-fit statistics for faking detection.
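To make the idea of a theta shift concrete, the sketch below simulates one respondent under the graded response model (GRM), the parametric model that Emons (2009) is described as using. It is not the data generator of the present study, which is based on monotone nondecreasing IRFs. The function names, the item parameters, and the choice to raise theta on the first `n_faked` items are assumptions made for illustration.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under the graded response model.
    theta: ability; a: discrimination; b: increasing thresholds (length m).
    Returns m + 1 probabilities for scores 0..m."""
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k), k = 1..m
    upper = np.concatenate(([1.0], p_star))
    lower = np.concatenate((p_star, [0.0]))
    return upper - lower                              # P(X = k), k = 0..m

def simulate_respondent(theta, a, b, shift, n_faked, rng):
    """Draw item scores for one respondent.
    ASSUMPTION: faking raises theta by `shift` on the first n_faked items,
    so n_faked / n_items plays the role of the aberrant rate."""
    scores = []
    for j in range(len(a)):
        effective_theta = theta + shift if j < n_faked else theta
        probs = grm_category_probs(effective_theta, a[j], b[j])
        scores.append(rng.choice(len(probs), p=probs))
    return np.array(scores)

# Hypothetical parameters: six items, four categories each (scores 0..3).
a = np.full(6, 1.5)
b = np.tile([-1.0, 0.0, 1.0], (6, 1))
rng = np.random.default_rng(0)

honest = simulate_respondent(0.0, a, b, shift=0.0, n_faked=0, rng=rng)
faked = simulate_respondent(0.0, a, b, shift=2.0, n_faked=2, rng=rng)

# Expected item score with and without a shift of 2 on a faked item.
expected_honest = float(np.dot(np.arange(4), grm_category_probs(0.0, 1.5, b[0])))
expected_faked = float(np.dot(np.arange(4), grm_category_probs(2.0, 1.5, b[0])))
```

Under these assumptions, a shift of 1, 2, or 3 moves the expected score on each faked item upward while the remaining items keep the honest theta. That mismatch between faked and honest items is what a person-fit statistic has to pick up.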

The question of which indices suit which degree of faking also arises. Under slight faking, the Guttman error index is more suitable, and lz has a lower detection rate than the nonparametric indicators. In comparison, under medium faking, lz is an appropriate index in all four distributions. When all items are faked and the faking magnitude is severe, U3p is suitable for the negatively skewed and platykurtic distributions, whereas lz is appropriate for the normal and positively skewed distributions.


5.1.3 Aberrant rate

The aberrant rate is influential when comparing the detection rates of the three person-fit indices. St-Onge et al. (2011) investigated the detection rates of four person-fit indices (lz, ECI2z, HT, and U3) for cheating with dichotomous data. The lz value rose with an increasing aberrant rate. Its detection-rate peak appeared when the aberrant rate was 0.25, and the U3 peak appeared when the aberrant rate was between 0.35 and 0.45. The present study found no significant difference in detection rate between the 33% and 66% aberrant rates, but a decreasing trend is obvious when the aberrant rate equals 1. The results suggest that researchers and practitioners should consider the likely aberrant rate when choosing a person-fit index as a tool. In the St-Onge et al. (2011) study, when the test length was 40 and the aberrant rate was 33%, the detection rate of lz ranged from 0.19 to 0.70, and that of U3 ranged from 0.23 to 0.75. Compared with these results, the present study shows higher U3p and lz detection rates under similar conditions: with the same sample size and aberrant rate, the lz detection rate is between 0.5 and 0.9, and the U3p detection rate is between 0.2 and 0.9. One possible reason is the different number of response options, because a greater number of options could lead to a higher detection rate (Emons, 2008). In another study, Emons (2008) simulated polytomous responses. When the aberrant rate rose from 0.5 to 1, the detection rates of all three person-fit indices (lz, U3p, and Gp) increased.

However, he did not simulate atypical faking behavior. Therefore, the three person-fit statistics in this study could possibly perform well for other atypical behaviors.

Also, based on the Emons (2009) results, lz has different detection rates for different atypical behaviors, such as careless responding to reverse-worded items and a tendency to choose extreme response options, which agrees with the statement by Meijer (2003).

5.1.4 The superior indicator

Several factors affect the three person-fit indices, which implies that the suitable person-fit index differs by condition. The Guttman-error detection rate is almost zero when all items are faked. The lz detection rate is higher than that of the nonparametric indices, but only under the normal distribution; this advantage does not appear under the other three distributions. The U3p index is less affected by sample size and faking degree. When the aberrant rate equals one, the U3p detection rate is better than that of the other nonparametric index, Gp. However, U3p may only be appropriate under a normal distribution because its detection rate is only 0.5 under the other three distributions. These conclusions are in line with Emons (2009): the two person-fit approaches, lz and the sum-score-based approach, perform best in different situations. The detection rate of sum-score patterns is higher for local aberrant responding, such as careless responding to reverse-worded items, whereas lz is good at detecting global aberrant responding, such as the tendency to choose extreme response options.


5.1.5 The methods of data simulation

The data-simulation method is an issue that should be considered when comparing detection rates between parametric and nonparametric person-fit indices. St-Onge et al. (2009) indicated that the data-generation method could affect the accuracy of person-fit statistics. Emons et al. (2002) also claimed that the choice of logistic IRFs would limit the generalization of the results, particularly for nonparametric situations and indices. The data-simulation method is one contribution of the present study: we simulated polytomous data sets based on monotone nondecreasing IRFs. Previously, only dichotomous response patterns had been simulated under the assumption of monotone nondecreasing IRFs.

To investigate the effectiveness of the new method used in this study and to understand its influence, the detection rate was compared with those of previous studies under similar conditions. Under the parametric approach, Emons (2009) used the GRM to simulate atypical responses under a normal distribution. In his study, the theta (θ) shifts were 1σθ and 1.5σθ, the proportion of affected persons was 10%, and the aberrant rate was 33%. The lz detection rate in the Emons (2009) study was 0.43. Similarly, in the study by St-Onge et al. (2011), the lz detection rate ranged from 0.2 to 0.7. Under similar conditions in the present study, with the response-simulation method as the only difference, the lz detection rate is 0.55. This consistency provides evidence for the effectiveness of the data-generation method used in this study.


In another comparison, the U3 detection rate in the St-Onge et al. (2011) study ranged from 0.23 to 0.75, which differs clearly from the U3p value of 0.9 in this study. The discrepancy may be caused by the different index and the different percentage of faking persons: the St-Onge et al. study involved fewer affected persons and dichotomous responses. We can conclude that the nonparametric simulation method may raise the detection rate of the nonparametric indices, but it does not suppress the efficiency of the parametric index.

5.2 Discussion on Empirical Study

In addition to the simulation study, we analyzed empirical data to investigate the possibility of using person-fit statistics as a tool to detect faking in a practical context. Our results indicate that samples drawn from the given data set at several sample sizes, from small to large, are not normally distributed. According to the descriptive analysis, most of them would be classified as leptokurtic distributions. The findings show that the parametric-index assumption of a normal ability distribution is not easily satisfied. We also examined the consistency between the results of the three person-fit indices and the social desirability scale. The nonparametric indices agree more closely with the social desirability scale, with a detection rate reaching about 0.7. Further analysis shows that when the sample size is 100 or less, the Gp and U3p detection rates are about 0.5 or lower, whereas when the sample size is 200 or more, the Gp detection rate is about 0.6 and that of U3p is about 0.7. However, the lz detection rate is relatively low in the empirical study.

The findings from the empirical data raise some points that need to be discussed. The discussion focuses on the inconsistency between the simulation study and the empirical study. That is, taking the social desirability scale as the criterion, the detection rates of the three person-fit statistics are not high, whereas the detection rates in our simulation study are higher than those in the empirical data analysis. There are at least two possible reasons. First, although some studies use the social desirability scale as a means to detect faking, its validity is still under discussion because of some flaws of the tool (Lai, 2010). For example, participants may identify the items of the social desirability scale, which makes its validity questionable (Zickar & Drasgow, 1996). As a potential solution, future studies could adopt data from an experimental study, rather than a survey, as the criterion.

Second, the high detection rate in our simulation study indirectly supports the feasibility of using person-fit statistics as a tool to detect faking. Compared with the simulation detection rate, the low consistency in the empirical data may be due to the non-normal data distribution. According to the trends derived from the simulation results, the low detection rate of lz suggests that faking in the data set might be slight and that only certain items are faked.

5.3 Suggested steps for application

The manipulated variables in this study comprised sample size, aberrant rate, faking degree, and ability distribution. The study simulated these conditions to identify the suitable person-fit index for each situation. One contribution of the research is guidance on how to choose a person-fit index, based on these findings, when a researcher faces a set of collected data. According to the results and the factors included in the study, we suggest some points to consider and steps for application in a real context.

Because not all of the simulated variables, such as aberrant rate and faking degree, can be known at the data-collection stage, we suggest making the following prejudgments before starting.

A. Prejudgment of aberrant rate: Is it possible that the fakers fake on all of the test items? Normally, faking on part of the test is more likely than faking on all of it. That is, in practical situations, a person-fit index that performs well when part of the test is faked could be adopted most often.


B. Prejudgment of faking degree: Because the faking degree is not easy to determine from a set of data, the stakes and purpose of the test could be taken as criteria. When the test result is related to an individual’s benefits, as in recruitment and selection, the faking degree is expected to be higher. Otherwise, we could assume that the degree of faking is lower.

C. Prejudgment of ability distribution: Each measured trait has a likely distribution in the target group. When the sample size is large, the distribution could be assumed to be normal. However, the trait distribution varies across target groups. Researchers could refer to past studies for distribution information in order to choose a suitable person-fit index.

Accordingly, the following systematic judgments are proposed to help practitioners select a person-fit index that fits their conditions. The suggestions are divided into two conditions.

A. Where part of the items are faked

(a) When the trait distribution is normal, positively skewed, or negatively skewed, regardless of sample size, the Guttman error index and U3p are suggested for a low-stakes test. For a high-stakes test, lz would be the better choice.


(b) When the trait distribution is platykurtic, the Guttman error index and U3p are suggested for a low-stakes test. For a high-stakes test, the sample size has to be considered: with more than 500 participants, the Guttman error index and U3p could be adopted, whereas with fewer than 500 participants, lz would be the better selection.

B. Where all of the items are faked

(a) When the trait is normally distributed, lz is suggested.

(b) When the trait distribution is positively skewed, lz is suggested unless there are fewer than 200 participants. For such small samples, the nonparametric indices are suggested, and the Guttman error index is superior to U3p.

(c) When the trait distribution is negatively skewed or platykurtic, U3p is recommended for a high-stakes test. In contrast, lz is suitable for low- and medium-stakes tests.
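The suggested steps can be written down as a small lookup function, which makes the branching explicit. This is a sketch of the rules as stated in this section, not a validated selection tool. The function name, the argument labels, and the list return format are hypothetical. Two points the text leaves open are handled by labeled assumptions: a sample of exactly 500 is grouped with the larger samples, and a medium-stakes test under partial faking raises an error because no suggestion is given for it.

```python
NONPARAMETRIC = ["Gp", "U3p"]  # Guttman error index listed first

def recommend_index(all_items_faked, distribution, stakes, n):
    """Return the person-fit indices suggested in Section 5.3.

    all_items_faked: True for condition B, False for condition A.
    distribution: "normal", "positive_skew", "negative_skew", or "platykurtic".
    stakes: "low", "medium", or "high".
    n: number of participants.
    """
    if distribution not in {"normal", "positive_skew",
                            "negative_skew", "platykurtic"}:
        raise ValueError(f"unknown distribution: {distribution}")
    if stakes not in {"low", "medium", "high"}:
        raise ValueError(f"unknown stakes level: {stakes}")

    if not all_items_faked:
        # Condition A: part of the items are faked.
        if stakes == "medium":
            # The text gives no suggestion for this case.
            raise ValueError("medium stakes is not covered for partial faking")
        if stakes == "low":
            return list(NONPARAMETRIC)
        # High stakes: lz, except for large platykurtic samples.
        # ASSUMPTION: n == 500 is grouped with "more than 500".
        if distribution == "platykurtic" and n >= 500:
            return list(NONPARAMETRIC)
        return ["lz"]

    # Condition B: all of the items are faked.
    if distribution == "normal":
        return ["lz"]
    if distribution == "positive_skew":
        # Small samples: nonparametric indices, Gp preferred over U3p.
        return list(NONPARAMETRIC) if n < 200 else ["lz"]
    # Negatively skewed or platykurtic.
    return ["U3p"] if stakes == "high" else ["lz"]

# Example: a high-stakes selection test, partial faking, 300 participants,
# platykurtic trait distribution.
choice = recommend_index(False, "platykurtic", "high", 300)
```

Raising an error for the uncovered case, rather than guessing a default, keeps the function from suggesting more than the simulation results support.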

5.4 Limitations of Research

The influence of ability distribution is one of the variables of interest in this study. However, item characteristics should also be considered because they may affect the distribution of responses as well. In the data simulation of this study, certain theta


depending on the different item characteristics. Observing the varied response patterns, we realized that although theta is generated from the specified distribution, the distribution of the response matrix might not be exactly the same as the specified distribution. One reason is that the response matrix is also affected by item difficulty. Theoretically, all response patterns represent the participants’ ability. However, even when item difficulty is medium overall, or the difficulty distribution is normal, participants with high ability may still select a lower option on a difficult item, which makes the representativeness of the responses questionable. Unfortunately, difficulty is not a variable of interest in this study and was therefore set to follow a normal distribution.

The social desirability scale was taken as the criterion in our empirical study, which presumes that the validity of the scale has been confirmed. However, whether responses to the social desirability scale are themselves faked requires further research. The wide use of the social desirability scale in research supports its basic validity, but its cutoff score is still under discussion. The method in our study is only one way to classify participants, and thus the results should be interpreted with caution.

The program for deriving lz and U3p was adopted from Emons (2008), who set each option to be chosen at least once for the program to run successfully. The computational information required by the three person-fit statistics adopted in this study differs. For U3p, each option should be chosen at least once; otherwise, “Nan” is returned. For lz, item-step difficulties and ability values must be provided. When item-step difficulty is