
We ran each chain for 250000 MCMC iterations. Based on the convergence criteria, we discarded the first 50000 iterations as burn-in and thinned the remaining draws by a factor of 10 to reduce the correlation between successive draws. Table 5 reports the Bayes factors obtained under the different response style settings. In the ARS case, the Bayes factors decisively support the ARS pattern and are 0 for the other three response styles. In the ERS case, the Bayes factors support both the ERS and ARS patterns. In the DARS case, the Bayes factors decisively support the DARS pattern and are 0 for the other three response styles. In the MRS case, the Bayes factors decisively support the MRS pattern in Chains 1 and 3, but not in Chain 2.
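As a quick check on the sampler settings above, the number of draws retained per chain follows from the iteration count, the burn-in, and the thinning interval. The helper name `retained_draws` is hypothetical and is not part of the original analysis.

```python
def retained_draws(n_iter: int, burn_in: int, thin: int) -> int:
    """Number of draws kept after discarding burn-in and keeping every `thin`-th draw."""
    if burn_in > n_iter or thin < 1:
        raise ValueError("need burn_in <= n_iter and thin >= 1")
    return (n_iter - burn_in) // thin


# Simulation settings stated in the text: 250000 iterations, 50000 burn-in, thinning 10.
print(retained_draws(250_000, 50_000, 10))
```

With these settings each chain keeps 20000 draws, which is the number of posterior draws referred to in the Discussion.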

6 Real data

In 1998, the International Association for the Evaluation of Educational Achievement (IEA) established the Second Information Technology in Education Study (SITES), an international comparative research project on the infusion of Information and Communication Technology (ICT) into the educational systems of different countries. The main purpose of the project was to provide policy-makers and educational practitioners with information on how ICT can contribute to the system reforms needed to meet the needs of the Information Society.

Table 5: The Bayes factor under different response style settings.

In this project, the IEA conducted a survey in 26 countries. The survey collected samples of at least 200 computer-using schools from at least one of the primary, lower secondary, and upper secondary levels. Van Herk, Poortinga & Verhallen (2004) found that acquiescence and extreme response styles are more prevalent in Mediterranean countries such as Greece, Italy and Spain than in Northwestern European countries such as Germany, France and the United Kingdom. Since both Italy and France were included in the survey, we test for acquiescence and extreme response styles between Italy and France. There are 768 respondents from France and 425 from Italy. The original questionnaire contains 24 questions, each with five response categories (strongly disagree, slightly disagree, uncertain, slightly agree, and strongly agree), i.e., a five-point Likert scale. Here, we select only seven questions from the survey, which presumably measure a single factor: the extent to which one thinks ICT can improve students' achievement or ability. The questions are listed in Table 6.

We set France as group 1 and Italy as group 2, and all the hyperparameters of the prior distributions are the same as in (35). The starting values for the MCMC runs are as follows:

$\Lambda^{(g)} = \mathbf{1}_{7\times 1}, \quad \alpha^{(g)} = \mathbf{1}_{7\times 1}$

Table 6: The seven questions chosen from the original survey on “Information and Communication Technology” from SITES.

| Item | Item description |
|---|---|
| 1 | Students are more attentive when computers are used in class. |
| 8 | ICT can effectively enhance problem solving and critical thinking skills of students. |
| 11 | ICT-based learning enables students to take more responsibility for their own learning. |
| 15 | ICT improves the monitoring of student’s learning progress. |
| 18 | The achievement of students can be increased when using computers for teaching. |
| 19 | The use of e-mail increases the motivation of students. |
| 23 | Using computers in class leads to more productivity of students. |

For the other two chains, we change only the setting of $\alpha^{(2)}$:

$\alpha^{(g)} = \mathbf{1}_{7\times 1}$

Since we are interested in testing for ARS and ERS, we constrain the second threshold (i.e., c = 2) of the first item to be equal between Italy and France, according to Table 1. That is, we set $\alpha_{1,2}^{(1)} = \alpha_{1,2}^{(2)}$.

6.1 Results

We ran each chain for 150000 iterations. After discarding the first 50000 iterations as burn-in, we thinned by a factor of 10 to collect a total of 10000 posterior draws. Convergence is supported by shrink factor values of 1, 1.01, 1.01, and 1 for the four thresholds of item 1, as shown in Figure 2. Figure 3 shows the shrink factors for $\mu^{(2)}$ for Italy, and for $\phi^{(1)}$ and $\phi^{(2)}$ for France and Italy, respectively. With shrink factor values of 1.05, 1, and 1 in Figure 3, these chains are also judged to have converged after 50000 iterations. We therefore report only the results from Chain 1. Tables 7 and 8 report the posterior means and standard deviations of all the model parameters for France and Italy. Note that we do not discuss the posteriors of $\lambda_1^{(1)}$, $\lambda_1^{(2)}$ and $\mu^{(1)}$ because they are fixed for identifiability rather than estimated: the two loadings at 1 and $\mu^{(1)}$ at 0.
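For readers unfamiliar with the shrink factor, the following is a minimal sketch of the basic potential scale reduction factor of Gelman and Rubin (1992) for a single scalar parameter. It is an illustration only: the function name `shrink_factor` and the simulated chains are assumptions, and diagnostic software may apply additional corrections, so reported values can differ slightly from this basic form.

```python
import numpy as np


def shrink_factor(chains: np.ndarray) -> float:
    """Basic potential scale reduction factor for one scalar parameter.

    `chains` has shape (m, n): m chains with n retained draws each.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    chain_vars = chains.var(axis=1, ddof=1)
    within = chain_vars.mean()                        # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)             # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(var_hat / within))


# Three simulated chains that sample the same distribution (illustrative data only).
rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 10_000))
print(shrink_factor(mixed))

# The same chains shifted apart, mimicking chains that have not converged.
stuck = mixed + np.array([[0.0], [1.0], [2.0]])
print(shrink_factor(stuck))
```

Values close to 1, such as those reported for Figures 2 and 3, indicate that the between-chain variability is negligible relative to the within-chain variability.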

Furthermore, we depict the posterior distributions of the model parameters. Figure 4 contains the posterior distributions of the thresholds of the first item for France. Figure 5 gives the posterior distributions of the factor loadings for both France and Italy. Note that we do not show the factor loading of the first item because it is set to 1 in both countries for identification. The distributions for the same parameter are paired for easy comparison. For example, with the loading of item 1 constrained to 1 in both countries, the loadings of items 2 to 7 for Italy appear to be larger than those for France.

Figure 2: Convergence assessment using shrink factor of thresholds of item 1 for France with 1=threshold 1, 2=threshold 2, 3=threshold 3, 4=threshold 4.

Moreover, since we set the latent factor mean of France to 0, we show only the posterior distribution of the latent factor mean of Italy in Figure 6.

Based on Figure 6, the posterior distribution of the factor mean of Italy has a mean close to 0 relative to its standard deviation. That is, the mean attitude towards the benefit of using ICT does not seem to differ much between France and Italy. Figure 7 shows the factor variances for France and Italy, which appear to be very similar.

Figure 3: Convergence assessment using shrink factor of factor mean and variance with 2=factor mean for Italy, 3=factor variance for France, and 4=factor variance for Italy.

Finally, we compute the Bayes factors to test for the different types of response styles. Table 9 reports the Bayes factors obtained from all three chains. The large Bayes factors for ARS, together with values of 0 for the other response styles, imply that Italy shows ARS relative to France in responding to these ICT questions.

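Table 7: The posterior means and standard deviations (SD) of the threshold parameters, α.

| Country | Parameter | Threshold 1 mean | Threshold 1 SD | Threshold 2 mean | Threshold 2 SD | Threshold 3 mean | Threshold 3 SD | Threshold 4 mean | Threshold 4 SD |
|---|---|---|---|---|---|---|---|---|---|
| France | α1 | -2.322 | 0.116 | -1.861 | 0.078 | -0.534 | 0.050 | 1.135 | 0.066 |
| France | α2 | -3.136 | 0.201 | -2.115 | 0.110 | -0.197 | 0.060 | 1.523 | 0.083 |
| France | α3 | -3.988 | 0.343 | -2.509 | 0.131 | -0.859 | 0.069 | 1.258 | 0.083 |
| France | α4 | -3.495 | 0.298 | -2.145 | 0.108 | -0.488 | 0.055 | 1.154 | 0.067 |
| France | α5 | -3.559 | 0.230 | -2.906 | 0.171 | -0.715 | 0.076 | 1.554 | 0.093 |
| France | α6 | -2.237 | 0.118 | -1.702 | 0.082 | 0.027 | 0.051 | 1.282 | 0.067 |
| France | α7 | -3.125 | 0.187 | -2.154 | 0.122 | 0.106 | 0.067 | 2.059 | 0.125 |
| Italy | α1 | -2.987 | 0.257 | -1.861 | 0.078 | -0.798 | 0.107 | -0.283 | 0.113 |
| Italy | α2 | -3.137 | 0.312 | -2.131 | 0.265 | -1.088 | 0.241 | -0.021 | 0.234 |
| Italy | α3 | -3.146 | 0.330 | -2.134 | 0.297 | -0.629 | 0.273 | 0.624 | 0.284 |
| Italy | α4 | -3.135 | 0.319 | -2.002 | 0.262 | -0.954 | 0.240 | 0.106 | 0.227 |
| Italy | α5 | -4.230 | 0.468 | -2.914 | 0.388 | -1.670 | 0.345 | -0.503 | 0.331 |
| Italy | α6 | -2.416 | 0.248 | -1.717 | 0.220 | -0.580 | 0.200 | 0.259 | 0.194 |
| Italy | α7 | -3.785 | 0.425 | -2.656 | 0.378 | -1.127 | 0.338 | -0.100 | 0.332 |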

7 Discussion

In the simulations, for data with ARS the Bayes factors decisively supported the ARS pattern and were 0 for the other three response styles. For data with ERS, large Bayes factors were obtained in support of both the ARS and ERS patterns. In the DARS case, the Bayes factors decisively supported the DARS pattern and were 0 for the other three response styles. In the MRS case, the Bayes factors decisively supported the MRS pattern in Chains 1 and 3, but not in Chain 2. In the analysis of the real data from SITES1998, the Bayes factors suggested that Italy showed an ARS pattern compared to France.

However, these results lead us to suspect that the commonly used guideline for interpreting the Bayes factor, shown in Table 2, may not be appropriate for our proposed method of testing response styles. When response styles were present, the Bayes factors obtained from both the simulations and the real data analysis were far larger than 150, the criterion for “Decisive” support. For example, in the ARS case shown in Table 5, a Bayes factor of 39038 was obtained in Chain 1 when testing for the ARS pattern.

Table 8: The posterior means and standard deviations (SD) of Λ, φ, and µ.

| Parameter | France mean | France SD | Italy mean | Italy SD |
|---|---|---|---|---|
| $\lambda_2^{(g)}$ | 1.667 | 0.237 | 2.373 | 0.45 |
| $\lambda_3^{(g)}$ | 2.004 | 0.270 | 2.836 | 0.515 |
| $\lambda_4^{(g)}$ | 1.380 | 0.195 | 2.247 | 0.424 |
| $\lambda_5^{(g)}$ | 2.266 | 0.306 | 3.492 | 0.645 |
| $\lambda_6^{(g)}$ | 1.009 | 0.155 | 2.023 | 0.379 |
| $\lambda_7^{(g)}$ | 2.335 | 0.318 | 3.407 | 0.628 |
| $\phi^{(g)}$ | 0.270 | 0.060 | 0.245 | 0.078 |
| $\mu^{(g)}$ | 0* | 0* | -0.027 | 0.101 |

* $\mu^{(g)} = 0$ for France for identification.

Table 9: The Bayes factor for testing various response styles of Italy versus France.

| Chain | ARS | DARS | ERS | MRS |
|---|---|---|---|---|
| 1 | 428 | 0 | 0 | 0 |
| 2 | 517 | 0 | 0 | 0 |
| 3 | 635 | 0 | 0 | 0 |

Such extreme values of the Bayes factor arise from the small complexity implied by our definition of a response style in the MCCFA model. More specifically, when testing for response styles, we expect the respondents' answers to reflect their response style pattern on all the questions in the questionnaire. For a questionnaire with P items, the complexity of each response style is therefore a constant smaller than one raised to the power of P, so $c_i$ decreases as the number of items P increases. Because $c_i$ is so small, the value of $(1 - c_i)/c_i$ becomes very large, as shown in Table 10.
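The growth of $(1 - c_i)/c_i$ with the number of items can be sketched numerically. The per-item constants are not stated in this section; the values 5/14 (ARS and DARS), 3/14 (ERS) and 9/70 (MRS) used below were back-calculated from the entries of Table 10 and should be read as assumptions, as should the function name `inverse_complexity_odds`.

```python
from fractions import Fraction

# Per-item constants back-calculated from Table 10 (assumptions, not stated in the text).
PER_ITEM = {
    "ARS": Fraction(5, 14),
    "DARS": Fraction(5, 14),
    "ERS": Fraction(3, 14),
    "MRS": Fraction(9, 70),
}


def inverse_complexity_odds(style: str, n_items: int) -> float:
    """Return (1 - c_i) / c_i when c_i = k ** P for a per-item constant k < 1."""
    c = PER_ITEM[style] ** n_items  # exact rational arithmetic
    return float((1 - c) / c)


# Truncate to whole numbers for comparison with Table 10.
for p in (7, 8, 9, 10):
    print(p, {s: int(inverse_complexity_odds(s, p)) for s in PER_ITEM})
```

Because $c_i$ shrinks geometrically in P, each additional item multiplies the inverse complexity odds by roughly 1/k, which is why the MRS column grows fastest.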

Figure 4: Posterior distributions of thresholds for France.

As $c_i$ is the proportion of the prior distribution of $H_u$ in agreement with $H_i$, an extremely small $c_i$ implies that the corresponding response style pattern is very unlikely under the prior. If no posterior draw has parameters consistent with the response style pattern, i.e., $f_i = 0$, the Bayes factor is 0. On the other hand, as soon as even one posterior draw has threshold parameters satisfying the inequality constrained hypothesis $H_i$ for the response style, the Bayes factor can become extremely large. For illustration, suppose that exactly one of the 20000 posterior draws satisfies the inequality constrained hypothesis for a particular response style. The resulting Bayes factors according to (25) are shown in Table 11.
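The single-draw illustration can be reproduced with a few lines of code. Equation (25) is not reproduced in this section; the sketch assumes it has the form $BF = \frac{f_i}{1 - f_i} \cdot \frac{1 - c_i}{c_i}$, which is consistent with the entries of Tables 10 and 11. The function name `bayes_factor` is hypothetical.

```python
def bayes_factor(f_i: float, inv_complexity_odds: float) -> float:
    """Bayes factor of H_i against its complement (assumed form of equation (25))."""
    if f_i >= 1.0:
        return float("inf")
    return f_i / (1.0 - f_i) * inv_complexity_odds


# (1 - c_i)/c_i for a 7-item questionnaire, copied from Table 10.
TABLE_10_SEVEN_ITEMS = {"ARS": 1348, "DARS": 1348, "ERS": 48199, "MRS": 1721822}

n_draws = 20_000
f_one_draw = 1 / n_draws  # exactly one posterior draw satisfies H_i

for style, odds in TABLE_10_SEVEN_ITEMS.items():
    print(style, round(bayes_factor(f_one_draw, odds), 3))
```

Under this assumed form, one draw out of 20000 already yields a Bayes factor above 86 for MRS with seven items, matching the first row of Table 11.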

The results in Table 11 show that a single posterior draw can have a large impact on the magnitude of the Bayes factor. It may therefore be necessary to check the value of $f_i$ when computing the Bayes factor, since a seemingly decisive value could also result from a small $f_i$ combined with a large inverse complexity odds, $(1 - c_i)/c_i$. We therefore went back to examine the $f_i$ values for our simulations and the real data analysis, which are reported in Table 12.
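One way to carry out such a check is to report, next to each Bayes factor, how many posterior draws actually support the hypothesis. The sketch below again assumes that equation (25) has the form $\frac{f_i}{1 - f_i} \cdot \frac{1 - c_i}{c_i}$. The function name `summarize`, the cutoff `min_draws`, and the assumption that the simulated questionnaire has 10 items (chosen because it reproduces the Bayes factor of 39038 quoted for ARS Chain 1) are all illustrative and not taken from the original analysis.

```python
def summarize(f_i: float, n_draws: int, inv_complexity_odds: float, min_draws: int = 100):
    """Return (supporting draws, Bayes factor, flag) for one hypothesis.

    The flag is True when the Bayes factor rests on fewer than `min_draws`
    posterior draws; `min_draws` is an arbitrary illustrative cutoff.
    """
    k = round(f_i * n_draws)
    bf = f_i / (1.0 - f_i) * inv_complexity_odds if f_i < 1.0 else float("inf")
    return k, bf, 0 < k < min_draws


# Real data, Chain 1, ARS: f_i from Table 12, 10000 draws, 7 items (Table 10).
print(summarize(0.2411, 10_000, 1348))

# Simulation, ARS setting, Chain 1: f_i from Table 12, 20000 draws, 10 items assumed.
print(summarize(0.56860, 20_000, 29618))

# Simulation, MRS setting, Chain 1: f_i from Table 12, 20000 draws, 10 items assumed.
print(summarize(0.00025, 20_000, 810131101))
```

In the first two cases thousands of draws support the hypothesis, whereas in the MRS case a very large Bayes factor rests on only a handful of draws and is flagged.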

Figure 5: Posterior distribution of factor loadings for both France (Group 1) and Italy (Group 2).

Figure 6: Posterior distribution of factor mean µ(2) for Italy.

Table 10: The inverse of the complexity odds, $(1 - c_i)/c_i$, for different numbers of questions.

| Number of items | ARS | DARS | ERS | MRS |
|---|---|---|---|---|
| 7 | 1348 | 1348 | 48199 | 1721822 |
| 8 | 3777 | 3777 | 224932 | 13391962 |
| 9 | 10577 | 10577 | 1049688 | 104159712 |
| 10 | 29618 | 29618 | 4898551 | 810131101 |
| 11 | 82934 | 82934 | 22859910 | 6301020000 |
| 12 | 232217 | 232217 | 106679600 | 49007930000 |

In the ARS case, the $f_i$ of the ARS pattern was over 50%; in the ERS case, the $f_i$ values for ARS and ERS were both over 10%; and in the DARS case, the $f_i$ of the DARS pattern was over 45%. Thus, we believe that the decisive Bayes factors obtained in these cases were not caused simply by the very large value of $(1 - c_i)/c_i$, but by the considerable number of posterior draws satisfying the corresponding inequality constrained hypothesis. In other words, the ARS pattern is correctly supported by the Bayes factor for data with ARS, and the DARS pattern is correctly supported for data with DARS. However, both the ERS and ARS patterns are supported for data with ERS. To investigate whether this happened by chance, we generated a different dataset with $n_q = 1000$ from the same ERS setting, and found a Bayes factor of 0 for ARS and an extremely large value for ERS. More investigation is necessary to determine whether the large Bayes factor for ARS under the ERS data is simply due to chance. Nevertheless, we would conclude that both the ARS and ERS patterns can be correctly supported for data with ARS and ERS by using the Bayes factor for inequality constrained hypotheses. In the MRS case, the Bayes factors of the three chains gave different results, and the $f_i$ values of the MRS pattern in Chains 1 and 3 were only 0.00025 and 0.00015 (Table 12), close to the single-draw value of 0.00005. In other words, the MRS case did not fit well in this study. In any case, it is clear that the commonly used guideline in Table 2 should not be directly applied to define or interpret the strength of a Bayes factor value. A guideline for interpreting Bayes factors for hypotheses that rarely occur under the prior distributions is much needed and is left for future research.

In this paper, we analyzed response styles in the real data by grouping respondents according to their nationality, which is the grouping available from the questionnaires. However, respondents of the same nationality may also have different response styles; in other words, respondents may instead cluster by response style. The concept of latent groups is therefore another direction for future research.

Figure 7: Posterior distribution of φ for France and Italy.

As for the running time of the proposed Bayes factor method using Gibbs sampling: in the ARS case, Chains 1, 2 and 3 took 55.25, 54.48 and 52.38 hours; in the ERS case, 49.65, 50.52 and 55.21 hours; in the DARS case, 48.45, 49.41 and 48.53 hours; in the MRS case, 49.88, 50.01 and 51.31 hours; and in the analysis of the real data from Italy and France, 48.64, 49.52 and 50.20 hours. The running time needed to obtain the joint posterior distributions via Gibbs sampling depends on the number of iterations, the sample size, the number of items in the questionnaire, and the choice of starting values. Since we can only test hypotheses after the Gibbs sampler has converged, starting values that are too far from the posterior require more iterations to reach convergence and therefore more running time. Although Gibbs sampling takes longer to produce Bayesian estimates than other estimation methods, its main benefit is that we can test inequality constrained hypotheses on the posterior distributions of the corresponding parameters. To reduce the running time, we could use starting values estimated by other methods, for example with Mplus (Muthén & Muthén, 1998-2015), or values already used by previous researchers. If the Gibbs sampler converges earlier, fewer iterations are needed to obtain the posterior draws and the running time is reduced accordingly.

Table 11: The Bayes factor when a single posterior draw satisfies the hypothesis, for each response style and different numbers of items.

| Number of items | ARS | DARS | ERS | MRS |
|---|---|---|---|---|
| 7 | 0.067 | 0.067 | 2.410 | 86.09 |
| 8 | 0.188 | 0.188 | 11.24 | 669.6 |
| 9 | 0.528 | 0.528 | 52.48 | 5208 |
| 10 | 1.481 | 1.481 | 244.9 | 40508 |
| 11 | 4.146 | 4.146 | 1143 | 315066 |
| 12 | 11.61 | 11.61 | 5334 | 2450519 |

8 Conclusion

In this study, we discussed a Bayesian approach to the MCCFA model under minimal identifiability constraints for polytomous data via Gibbs sampling, and used the Bayes factor to test for response styles through inequality constraints among the corresponding threshold parameters. We conclude that the Bayes factor is effective in testing for different types of response styles, although the MRS case did not fit well in our simulations. We also conclude that the commonly used guideline for interpreting Bayes factor values in Table 2 may not be appropriate when using our proposed method to test for response styles.

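Table 12: $f_i$ in the simulations and the real data.

| Setting | Chain | ARS | DARS | ERS | MRS |
|---|---|---|---|---|---|
| ARS | 1 | 0.56860 | 0 | 0 | 0 |
| ARS | 2 | 0.65660 | 0 | 0 | 0 |
| ARS | 3 | 0.54865 | 0 | 0 | 0 |
| DARS | 1 | 0 | 0.45770 | 0 | 0 |
| DARS | 2 | 0 | 0.56776 | 0 | 0 |
| DARS | 3 | 0 | 0.49296 | 0 | 0 |
| ERS | 1 | 0.1607 | 0 | 0.20945 | 0 |
| ERS | 2 | 0.11955 | 0 | 0.28165 | 0 |
| ERS | 3 | 0.18065 | 0 | 0.23310 | 0 |
| MRS | 1 | 0 | 0 | 0 | 0.00025 |
| MRS | 2 | 0 | 0 | 0 | 0 |
| MRS | 3 | 0 | 0 | 0 | 0.00015 |
| Real data | 1 | 0.2411 | 0 | 0 | 0 |
| Real data | 2 | 0.2773 | 0 | 0 | 0 |
| Real data | 3 | 0.3204 | 0 | 0 | 0 |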

References

[1] Arminger, G., & Muthén, B. O. (1998). A Bayesian approach to nonlinear latent variable models using the Gibbs sampler and the Metropolis-Hastings algorithm. Psychometrika, 63, 271-300.

[2] Boone, H. N., & Boone, D. A. (2012). Analyzing Likert data. Journal of Extension, 50, 1-5.

[3] Broemeling, L. D. (1985). Bayesian analysis of linear models. New York: Marcel Dekker.

[4] Chang, Y. W., Hsu, N.-J., & Tsai, R. (2016). Unifying differential item functioning of categorical CFA and GRM under a discretization of a normal variant. Manuscript submitted for publication.

[5] Chang, Y. W., Huang, W. K., & Tsai, R. C. (2015). DIF detection using multiple-group categorical CFA with minimum free baseline approach. Journal of Educational Measurement, 52, 181-199.

[6] Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are the central issues. Psychological Bulletin, 95, 134-135.

[7] Friedman, H. H., Herskovitz, P. J., & Pollack, S. (1993). The biasing effects of scale-checking styles on response to a Likert scale. In JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association. (pp. 792-795). Retrieved from http://www.amstat.org/sections/srms/Proceedings/papers/1993 133.pdf

[8] Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data Analysis. Second Edition. Chapman & Hall/CRC.

[9] Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.

[10] Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

[11] Hoijtink, H. (2013). Objective Bayes Factors for Inequality Constrained Hypotheses. International Statistical Review, 81, 207-229.

[12] Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Proceedings of the Cambridge Philosophical Society, 31, 203-222.

[13] Jeffreys, H. (1961). Theory of Probability. Third Edition. Oxford, U. K.: Oxford University Press.

[14] Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

[15] Harzing, A. W. (2006). Response styles in cross-national mail survey research: A 26-country study. The International Journal of Cross-cultural Management, 6, 243-266.

[16] Klugkist, I., & Hoijtink, H. (2007). The Bayes factor for inequality and about equality constrained models. Computational Statistics & Data Analysis, 51, 6367-6379.

[17] Lee, S.-Y. (1981). A Bayesian approach to confirmatory factor analysis. Psychometrika, 46, 153-160.

[18] Lee, S.-Y., & Zhu, H.-T. (2000). Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 53, 209-232.

[19] Lindley, D. V., & Smith, A. F. M. (1972). Bayes estimates for the linear model (with discussion). Journal of the Royal Statistical Society, Series B, 34, 1-42.

[20] Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.

[21] Millsap, R. E., & Tein, Y. J. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39, 479-515.

[22] Plummer, M., Best, N., Cowles, K., & Vines, K. (2006). CODA: Convergence Diagnosis and Output Analysis for MCMC, R News, 6, 7-11.

[23] R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

[24] Robert, C., & Casella, G. (2010). Introducing Monte Carlo Methods with R. New York: Springer.

[25] Song, X.-Y., & Lee, S.-Y. (2001). Bayesian estimation and test for factor analysis model with continuous and polytomous data in several populations. British Journal of Mathematical and Statistical Psychology, 54, 237-263.

[26] Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528-540.

[27] van Herk, H., Poortinga, Y. H., & Verhallen, T. M. M. (2004). Response styles in rating scales: Evidence of method bias in data from six EU countries. Journal of Cross-Cultural Psychology, 35, 346-360.

Appendix

Figure 8: Questionnaire of SITES1998 part1

Figure 9: Questionnaire of SITES1998 part2
