The Effects of Q-matrix Misspecification on Parameter Estimates and Classification Accuracy of the G-DINA Model


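Master's Thesis, Master's Program, Department of Mathematics, National Taiwan Normal University. Advisor: Dr. Rung-Ching Tsai (蔡蓉青). The Effects of Q-matrix Misspecification on Parameter Estimates and Classification Accuracy of the Generalized DINA Model. Graduate student: Po-Fu Kuo (郭柏甫). July 2013 (Republic of China year 102).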

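Acknowledgments

My life as a graduate student of a little over a year is coming to a close, and as this thesis goes to press my heart is full of joy and gratitude. Although I ran into many difficulties while writing it, I was finally able to finish with the encouragement and help of good teachers and friends.

I thank my advisor, Professor Rung-Ching Tsai (蔡蓉青), for her careful guidance and encouragement throughout the writing of this thesis, and for conscientiously pointing out my shortcomings over the course of my studies. Beyond academic matters, she often asked after my daily life and offered help. Both her attitude toward scholarship and the way she conducts herself have benefited me greatly, and I hope that, if I begin a career as a full-time teacher, I can inspire my own students the way she helped me. I am also very grateful to the two members of my oral defense committee, Professor 郭伯臣 and Professor 蔡碧紋. Professor 郭伯臣 offered many valuable suggestions from his area of expertise, which made the content of this thesis more substantial. Professor 蔡碧紋 is the teacher who guided me in learning the statistical software R over the past year or so; with that training I went from being afraid of programming to being able to use R to solve many problems. The process was hard, but looking back on it makes me very happy.

I thank my senior 育緯 for the valuable suggestions and encouragement he offered in meeting after meeting; my comrade 辰諭, who worked alongside me and supported me in many aspects of the thesis; and 克謙 and 淑貞, with whom I exchanged encouragement. Special thanks go to my good friend 瑋翔 for a great deal of technical support, which taught me many good ways to write programs. I am fortunate to have had all of you on this path; you helped me overcome one difficulty after another until the work was done.

I thank the many senior colleagues at 樹林高中 for looking after me this semester, and the lovely students of 218 for the cards and encouragement they wrote, which gave me higher expectations of myself and the energy to face challenges. I thank the younger members who have supported the 文林愛樂管樂團 wind band with me since my first year in the master's program, for the Saturday rehearsals and the several performances we went through together, which let me relieve stress through music and start afresh whenever I hit a bottleneck in my studies or my thesis. Thanks also to the beloved trombone I bought this semester, which kept me company through many late nights (with a mute installed, of course).

I thank my father and mother for supporting my graduate studies, and my younger sister: I am sorry that I did not have much time to tutor you while you were preparing for the Advanced Subjects Test (指考), but fortunately your mathematics score turned out well in the end, a blessing amid misfortune, haha.

With graduation at hand, I am about to leave student life and make my own future outside. Along with the excitement there is some reluctance to leave. After completing my military service I will throw myself fully into the formal teacher recruitment examinations. Once again, I thank everyone I met during my master's studies for completing this journey with me.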

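Abstract (translated from the Chinese original)

This study examines the effects of Q-matrix overspecification and underspecification on parameter estimation and classification accuracy in the G-DINA model, using the mean absolute deviation (MAD), the attribute-specific classification accuracy, and the attribute-pattern classification accuracy as evaluation indices. The results show that Q-matrix underspecification affects the model parameter estimates, the classification accuracy, and the probability of a correct response for some specific attribute patterns, whereas Q-matrix overspecification has little effect in any of these respects. In addition, factors such as sample size and the distribution of attribute patterns also have an effect when the Q-matrix is underspecified.

Keywords: Q-matrix misspecification, G-DINA model, classification accuracy.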

The Effects of Q-matrix Misspecification on Parameter Estimates and Classification Accuracy of the Generalized DINA Model

Po-Fu Kuo
July 8, 2013
Advisor: Rung-Ching Tsai
Mathematics, National Taiwan Normal University, Taipei, Taiwan

Abstract

This study investigates the influence of different types of Q-matrix misspecification on parameter estimates and classification accuracy of the G-DINA model. In particular, underspecification and overspecification are the two types of Q-matrix misspecification under consideration. The mean absolute deviation and classification accuracy indices are used to evaluate parameter estimates and classification accuracy, respectively. Our results show that underspecification has a great impact on item parameter estimates, as well as on the probability of answering an item correctly for some latent mastery patterns. In contrast, overspecification has little impact on parameter estimates. Classification accuracy is also influenced by underspecification, and this influence interacts with sample size and with the distribution of the underlying cognitive attribute patterns.

Keywords: Q-matrix misspecification, G-DINA model, classification accuracy.

Contents

1 Introduction
2 Method
  2.1 The G-DINA Model
  2.2 Estimation
  2.3 Evaluation Indices
3 Simulation
  3.1 Data generation
    3.1.1 The Q-matrix
    3.1.2 Parameter Values
  3.2 Manipulated factors
    3.2.1 Sample size
    3.2.2 Distribution of the attribute patterns
  3.3 Condition settings
    3.3.1 Underspecification
    3.3.2 Overspecification
4 Results
  4.1 Effect on parameter estimates
  4.2 Effect on classification accuracy
    4.2.1 Effect on the attribute-specific classification accuracy
    4.2.2 Effect on the overall classification accuracy
5 Summary and Conclusion
6 Reference

List of Tables

1 The Q-matrix for data generation
2 Parameter values of the G-DINA model
3 Underspecification on the Q-matrix
4 Overspecification on the Q-matrix
5 The MADs of δ̂_j for Q-matrix underspecification, discrete uniform distribution of attribute patterns, and sample size of 1000
6 The MADs of δ̂_j for Q-matrix underspecification, underlying multivariate normal distribution with ρ = 0.3 of attribute patterns, and sample size of 1000
7 The MADs of δ̂_j for Q-matrix underspecification, underlying multivariate normal distribution with ρ = 0.8 of attribute patterns, and sample size of 1000
8 The MADs of δ̂_j for Q-matrix underspecification, underlying higher-order multivariate normal distribution of attribute patterns, and sample size of 1000
9 The EBs of P̂_j(m) with and without Q-matrix underspecification, discrete uniform distribution of attribute patterns, and sample size of 1000
10 The EBs of P̂_j(m) with and without Q-matrix underspecification, underlying multivariate normal distribution (ρ = 0.3) of attribute patterns, and sample size of 1000
11 The EBs of P̂_j(m) with and without Q-matrix underspecification, underlying multivariate normal distribution (ρ = 0.8) of attribute patterns, and sample size of 1000
12 The EBs of P̂_j(m) with and without Q-matrix underspecification, underlying higher-order distribution of attribute patterns, and sample size of 1000
13 The overall classification accuracy for Q-matrix underspecification with different distributions of attribute patterns and each sample size
14 The overall classification accuracy for Q-matrix overspecification with different distributions of attribute patterns and each sample size

List of Figures

1 Distributions of the cognitive attribute patterns
2 The interaction plots of the type of Q-matrix misspecification and sample size on the mean MAD values of the parameter estimates of those misspecified items
3 The interaction plots of the type of Q-matrix misspecification and the underlying distribution of attribute patterns on the mean MAD values of the parameter estimates of those misspecified items
4 The attribute-specific classification accuracy with respect to the type of Q-matrix misspecification and the underlying distributions of attribute patterns with sample size of 1000
5 The interaction plots of the type of Q-matrix misspecification and sample size and the underlying distribution of attribute patterns on the arcsin transformed overall classification accuracy

1. Introduction

In recent years, many cognitive diagnostic models (CDMs) have been proposed to support classroom instruction and assess student learning. These CDMs include the Rule Space Model (RSM; Tatsuoka, 1983), the Noisy Inputs, Deterministic And Gate model (NIDA; Maris, 1999), the Deterministic Inputs, Noisy And Gate model (DINA; de la Torre, 2009; Junker & Sijtsma, 2001), the Deterministic Input, Noisy Or Gate model (DINO; Templin & Henson, 2006), the Log-linear Cognitive Diagnostic Model (LCDM; Henson, Templin, & Willse, 2009), the General Diagnostic Model (GDM; von Davier, 2005), and the Generalized DINA model (G-DINA; de la Torre, 2011). Unlike conventional item response theory models (IRMs), CDMs provide more information for the practical diagnosis of students' specific strengths and weaknesses, so instructors can understand which skills and knowledge each individual student has mastered. With such feedback, effective remedial instruction can be designed for each student.

Skills are generically referred to as attributes in cognitive diagnostic models. To portray clearly the relation between each test item and the relevant attributes in a test, the use of a CDM requires the specification of a Q-matrix as a comparison table (Tatsuoka, 1983). Each element of the Q-matrix is either a "1" or a "0", indicating respectively whether or not answering an item requires mastery of a particular attribute. In short, the Q-matrix reflects the design of the assessment and therefore plays an important role in providing a proper diagnosis. If the Q-matrix is correct, it can be used to provide a useful diagnosis of each student's mastery or non-mastery status on each attribute.

However, the Q-matrix is commonly specified by experts in the particular field of interest. The Q-matrix specification might therefore differ from one expert to another, and any difference in the Q-matrix may result in different feedback for the students. To reduce this uncertainty, methods for choosing a "valid" Q-matrix have been proposed. de la Torre (2008) proposed an empirically based method for validating a Q-matrix. DeCarlo (2012) suggested a Bayesian approach for deciding those elements of the Q-matrix that carry greater uncertainty because the experts' specifications disagree. Both methods were proposed and illustrated for the DINA model.

Several studies have examined the impact of Q-matrix misspecification on parameter estimates and classification accuracy under different cognitive diagnostic models. Corter and Im (2011) studied attribute misspecification in the RSM. Rupp and Templin (2008) used the DINA model to investigate the impact of Q-matrix misspecification on parameter estimates and classification accuracy under overspecification, underspecification, and incorrect logical dependencies between the attributes. They found that, in the case of Q-matrix underspecification (required attributes deleted), the slipping parameter was typically overestimated while the estimate of the guessing parameter was typically accurate. In contrast, in the case of Q-matrix overspecification (unnecessary attributes added), the guessing parameter was typically overestimated whereas the slipping parameter estimate was usually accurate.

The DINA model is a restricted latent class model in which, for each item, respondents are classified into two cognitive groups: those who have mastered all the attributes required for the item, and those who lack at least one of the required attributes. Respondents whose attribute vectors fall in the same group are assumed to have the same probability of answering the item correctly. This assumption may not always hold for the latter group, because respondents in that group have varying degrees of deficiency with respect to the required attributes, so their probabilities of success may not be identical. Because the plausibility of this assumption has been challenged, there has been a trend toward specifying and estimating cognitive diagnostic models within general modeling families. The most commonly discussed families are the LCDM, the GDM, and the G-DINA model.

Under the LCDM framework, Choi, Templin, Cohen, and Atwood (2010) were the first to look at the impact of Q-matrix misspecification on the interaction parameters. They found that underspecification of the Q-matrix had a large impact on both the recovery of item parameters and the resulting respondent classifications, whereas overspecification had only a small impact on parameter estimates and classification recovery. Kunina-Habenicht, Rupp, and Wilhelm (2012) varied the levels of the factors and added other types of model misspecification to investigate parameter recovery and classification accuracy under the same framework. Their results showed that while misspecification of the interaction effects had little impact on classification accuracy, some Q-matrix misspecifications led to a notable decrease in classification accuracy.

To date, the influence of Q-matrix misspecification on the Generalized DINA model has not been investigated. Although the modeling frameworks of the G-DINA model and the LCDM are similar, there is a small but essential difference in the interpretation of their parameters. The G-DINA model describes the additive effects of attribute mastery on the probability of success, whereas the LCDM describes the multiplicative impact of attribute mastery on the probability of success.

This study uses a simulation design to investigate the impact of Q-matrix misspecification on parameter estimates and classification accuracy of the G-DINA model. The two types of Q-matrix misspecification under consideration are underspecification and overspecification. In addition to assessing the estimates of the original model parameters, we also examine the probability of answering an item correctly for those latent cognitive groups that are expected to be most affected by the different types of Q-matrix misspecification. For classification accuracy, we report the classification rates for comparison. Furthermore, because the sample size and the distribution of cognitive patterns have been found to be important factors in the estimation of the G-DINA model (de la Torre, 2011), we also investigate how these two factors interact with Q-matrix misspecification in their effects on parameter estimates and classification accuracy.

In the next section, we briefly present the G-DINA model and introduce the evaluation indices for parameter estimates and classification accuracy used in this study. The remainder of the thesis focuses on the design and results of the simulation studies that investigate the impact of Q-matrix misspecification and of the other relevant factors. Finally, we summarize our results with some general concluding remarks.

2. Method

In this section we first describe the G-DINA model and its parameter estimation. Next, we describe the evaluation indices for the effects of Q-matrix misspecification on parameter estimates and classification accuracy.

2.1. The G-DINA Model

de la Torre (2011) proposed a generalization of the DINA model, namely the generalized DINA (G-DINA) model. Like many other cognitive diagnosis models, the G-DINA model requires a pre-specified $J \times K$ Q-matrix as input, where $J$ is the number of items and $K$ is the number of attributes associated with the items in the test. The Q-matrix links each item to its associated attributes: its $(j,k)$th element satisfies $q_{jk} = 1$ if solving item $j$ is related to the mastery of attribute $k$, and $q_{jk} = 0$ otherwise, for $j = 1, \ldots, J$ and $k = 1, \ldots, K$. Each row of the Q-matrix indicates the attributes related to a particular item, and each column indicates which items are associated with a particular attribute.

Compared with the DINA model, the G-DINA model relaxes the assumption that all attribute mastery patterns lacking at least one of the attributes required for an item have an equal probability of a correct answer. To see how the G-DINA model partitions the latent groups, let $\alpha_l$ be the $l$th attribute pattern, a vector of dimension $K$, for $l = 1, 2, \ldots, 2^K$, and let $\alpha_{lk}$ be the $k$th element of $\alpha_l$. As above, $q_{jk}$ is the $k$th element of $q_j$, the $j$th row of the Q-matrix. Both $\alpha_l$ and $q_j$ are binary vectors with elements "0" or "1". In the G-DINA model, the information in $\alpha_l$ relevant to item $j$ is contained in the elements $\alpha_{lk}$ for which $q_{jk} = 1$, $k = 1, \ldots, K$. In other words, for each item $j$ the attribute pattern $\alpha_l$ can be reduced to the vector $\alpha^*_{lj}$, which contains only the elements associated with the attributes required for item $j$ and has length $K_j^* = \sum_{k=1}^{K} q_{jk}$. Accordingly, the number of latent attribute groups for item $j$ is reduced from $2^K$ to $2^{K_j^*}$.

For example, suppose $\alpha_l = (010)$ and $q_j = (101)$. Keeping the first and third elements, which correspond to the attributes required by $q_j$, gives $\alpha^*_{lj} = (00)$, and the reduced latent groups for item $j$ are (00), (10), (01), and (11). In the DINA model, however, the first three latent groups (00), (10), and (01) are treated as a single group, which highlights the difference between the latent groups of the DINA and G-DINA models.

For brevity, we denote the probability that a student with reduced pattern $\alpha^*_{lj}$ answers item $j$ correctly by $P(X_j = 1 \mid \alpha^*_{lj}) = P(\alpha^*_{lj})$. With this notation, the G-DINA model is formulated as

\[
P(\alpha^*_{lj}) = \delta_{j,0} + \sum_{k=1}^{K_j^*} \delta_{j,k}\,\alpha_{lk} + \sum_{k'=k+1}^{K_j^*} \sum_{k=1}^{K_j^*-1} \delta_{j,kk'}\,\alpha_{lk}\alpha_{lk'} + \cdots + \delta_{j,12\ldots K_j^*} \prod_{k=1}^{K_j^*} \alpha_{lk}. \tag{1}
\]

There are $2^{K_j^*}$ parameters for item $j$. $\delta_{j,0}$ is the intercept for item $j$, which represents the probability of correctly answering item $j$ when the respondent has none of the required attributes. $\delta_{j,k}$ is the main effect of the $k$th attribute of item $j$, that is, the change in the probability of correctly answering item $j$ due to mastering the single attribute $k$. $\delta_{j,kk'}$ is the interaction effect on the probability of correctly answering item $j$ due to the mastery of both the $k$th and the $k'$th attributes, and $\delta_{j,12\cdots K_j^*}$ is the interaction effect on the probability of a correct response associated with the mastery of all the attributes required for item $j$.

2.2. Estimation

To obtain the estimates of the G-DINA model, de la Torre (2011) used the marginalized maximum likelihood estimation (MMLE) approach, which we briefly review below. In the G-DINA model, the probability of the response vector $X_i$ for student $i$ with attribute pattern $\alpha_l$ can be formulated as

\[
P(X_i \mid \alpha_l) = \prod_{j=1}^{J} P(\alpha^*_{lj})^{X_{ij}} \left[1 - P(\alpha^*_{lj})\right]^{1 - X_{ij}}.
\]

The likelihood of the model given the response data $X = (X_1, \ldots, X_I)$ is defined accordingly, and the log-marginalized likelihood can be written as

\[
l(X) = \log[L(X)] = \log \prod_{i=1}^{I} \sum_{l=1}^{2^K} P(X_i \mid \alpha_l)\, p(\alpha_l), \tag{2}
\]

where $p(\alpha_l)$ is the prior probability of $\alpha_l$, and the inner sum is the marginal likelihood $L(X_i)$ of the response vector $X_i$. Taking the derivative of $l(X)$ with respect to $P(\alpha^*_{lj})$ gives the marginal maximum likelihood estimate of $P(\alpha^*_{lj})$:

\[
\hat{P}(\alpha^*_{lj}) = \frac{R_{\alpha^*_{lj}}}{I_{\alpha^*_{lj}}}, \tag{3}
\]

where $I_{\alpha^*_{lj}} = \sum_{i=1}^{I} p(\alpha^*_{lj} \mid X_i)$ is the number of students expected to be in the latent group $\alpha^*_{lj}$, $R_{\alpha^*_{lj}} = \sum_{i=1}^{I} p(\alpha^*_{lj} \mid X_i) X_{ij}$ is the expected number of students in the latent group $\alpha^*_{lj}$ who answer item $j$ correctly, and $p(\alpha^*_{lj} \mid X_i)$ is the posterior probability that student $i$ belongs to the latent group $\alpha^*_{lj}$, which can be calculated from $p(\alpha_l \mid X_i)$.

The second derivative of the log-marginalized likelihood with respect to $P(\alpha^*_{lj})$ and $P(\alpha^*_{l'j})$ is

\[
\frac{\partial^2 l(X)}{\partial P(\alpha^*_{lj})\, \partial P(\alpha^*_{l'j})}
= -\sum_{i=1}^{I} \left[ L^{-2}(X_i)\, \frac{\partial L(X_i)}{\partial P(\alpha^*_{lj})}\, \frac{\partial L(X_i)}{\partial P(\alpha^*_{l'j})} \right]
= -\sum_{i=1}^{I} \left\{ p(\alpha^*_{lj} \mid X_i)\, \frac{X_{ij} - P(\alpha^*_{lj})}{P(\alpha^*_{lj})[1 - P(\alpha^*_{lj})]} \right\} \left\{ p(\alpha^*_{l'j} \mid X_i)\, \frac{X_{ij} - P(\alpha^*_{l'j})}{P(\alpha^*_{l'j})[1 - P(\alpha^*_{l'j})]} \right\}.
\]

Using $\hat{P}(\alpha^*_{lj})$ and the observed $X$, we can obtain the approximate information matrix for all the $\hat{P}(\alpha^*_{lj})$'s. Writing $P_j = \{P(\alpha^*_{lj})\}$, we have $I(\hat{P}_j)$, the resulting information matrix for item $j$, and the square root of the $l$th diagonal element of $I^{-1}(\hat{P}_j)$ is an estimate of the standard error of $\hat{P}(\alpha^*_{lj})$.

In the G-DINA model, $P_j$ is the vector of the probabilities $P(\alpha^*_{lj})$ of correctly answering item $j$, and therefore it contains $2^{K_j^*}$ probabilities. Let $\delta_j = (\delta_{j,0}, \delta_{j,1}, \ldots, \delta_{j,K_j^*}, \delta_{j,12}, \ldots, \delta_{j,12\cdots K_j^*})$ be the parameter vector for item $j$. There is a one-to-one correspondence between $P_j$ and $\delta_j$ given by a linear transformation with matrix representation $M_j$, that is, $P_j = M_j \delta_j$.

To specify the matrix $M_j$, let $A^*_j = \{\alpha^*_{lk}\}$ be the $2^{K_j^*} \times K_j^*$ matrix of all possible combinations of the attributes required for item $j$. The elements of $M_j$ correspond to the $\alpha_{lk}$ terms in (1): the first column consists of ones; it is followed by $K_j^*$ columns of elements $\alpha_{lk}$ ($l = 1, \ldots, 2^{K_j^*}$, $k = 1, \ldots, K_j^*$), then by $K_j^*(K_j^* - 1)/2$ columns of elements $\alpha_{lk}\alpha_{lk'}$, the products of $\alpha_{lk}$ and $\alpha_{lk'}$ for all $l$ and all pairs $k < k'$, and so on up to the last column of elements $\prod_{k=1}^{K_j^*} \alpha_{lk}$. As a result, $M_j$ has dimension $2^{K_j^*} \times 2^{K_j^*}$. For instance, for an item $j$ with $K_j^* = 3$ we have

\[
A^*_j = \begin{pmatrix}
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 1
\end{pmatrix},
\quad \text{and} \quad
M_j = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}.
\]

Because of the one-to-one correspondence defined by $M_j$, we can obtain estimates of $\delta_j$ from $\hat{\delta}_j = M_j^{-1} \hat{P}_j$. Moreover, the standard errors of $\hat{\delta}_j$ are obtained as the square roots of the diagonal elements of $M_j^{-1}\, \mathrm{Cov}(\hat{P}_j)\, (M_j^{-1})^t$.

2.3. Evaluation Indices

For parameter estimates, we examine estimation accuracy using the mean absolute deviation (MAD) and the empirical bias (EB) in a simulation study. More specifically, we use

\[
\mathrm{MAD}_{\delta_{j(m)}} = \frac{\sum_{n=1}^{N} \left| \hat{\delta}^{(n)}_{j(m)} - \delta^{(n)}_{j(m)} \right|}{N}
\quad \text{and} \quad
\mathrm{EB}_{P_{j(m)}} = \frac{\sum_{n=1}^{N} \left( \hat{P}^{(n)}_{j(m)} - P^{(n)}_{j(m)} \right)}{N},
\]

where $\hat{\delta}^{(n)}_{j(m)}$ denotes the estimate, obtained at replication $n$, of the $m$th element of $\delta_j = (\delta_{j,0}, \delta_{j,1}, \ldots, \delta_{j,12\ldots K_j^*})$, which is denoted by $\delta^{(n)}_{j(m)}$. The notations $P^{(n)}_{j(m)}$ and $\hat{P}^{(n)}_{j(m)}$ are defined similarly. For every parameter of each item, the MAD index is the mean absolute deviation of the estimates over the $N$ replications. The MAD index has been used to evaluate the estimation performance of the DINA model (Rupp & Templin, 2008; de la Torre, 2010). We adopt the same criterion: parameters with a MAD value greater than 0.1 and an EB value outside the range of -0.1 to 0.1 are considered relatively poorly estimated.

For classification accuracy, the posterior probability of belonging to the latent attribute group $\alpha_l$ given the response vector $X_i$ can be obtained from

\[
p(\alpha_l \mid X_i) = \frac{p(X_i \mid \alpha_l)\, p(\alpha_l)}{p(X_i)} = \frac{p(X_i \mid \alpha_l)\, p(\alpha_l)}{\sum_{l'=1}^{2^K} p(X_i \mid \alpha_{l'})\, p(\alpha_{l'})}.
\]

Let $A = \{\alpha_l\}$ be the $2^K \times K$ matrix of all possible combinations of the attributes required in a test, and let $\mathbf{P}(\alpha \mid X_i)$ be the $2^K \times 1$ vector of the posterior probabilities $p(\alpha_l \mid X_i)$ for $l = 1, \ldots, 2^K$. Define $T_i = A^t\, \mathbf{P}(\alpha \mid X_i)$. Then $T_i = (T_{i1}, \ldots, T_{iK})^t$ is the $K \times 1$ vector of the posterior probabilities that respondent $i$ has mastered each individual attribute. To determine the mastery status of respondent $i$ on each attribute, we define the indicator variables, for $k = 1, \ldots, K$,

\[
I_{ik} = \begin{cases} 1 & \text{if } T_{ik} \ge \gamma, \\ 0 & \text{if } T_{ik} < \gamma, \end{cases}
\]

where $\gamma$ is the pre-specified threshold for mastery status. The default value of $\gamma$ in the Ox code written by de la Torre (2011) is 0.5. That is, once the posterior probability that respondent $i$ has mastered an attribute is at least 0.5, he or she is considered to have mastered that attribute. Based on this calculation, respondent $i$ is classified into the attribute pattern $I_i = (I_{i1}, \ldots, I_{iK})^t$.

In the simulation study, both the attribute pattern $\alpha_{l_i}$ and the test responses $X_i$ of each respondent $i$ are generated and known, so we can use this information to calculate the empirical classification accuracy. Two kinds of classification accuracy are examined in this study. First, to understand the impact of Q-matrix misspecification on the classification accuracy of each attribute, we consider the attribute-specific classification accuracy. Second, we investigate the overall classification accuracy, which captures the rate at which respondents are correctly classified into their true attribute patterns.

The attribute-specific classification accuracy, denoted by $P_{as_k}$, is defined for each dataset as

\[
P_{as_k} = \frac{\sum_{i=1}^{I} I(I_{ik} = \alpha_{l_i k})}{I},
\]

where $I$ is the number of respondents in the dataset and $I(I_{ik} = \alpha_{l_i k})$ is an indicator variable for whether $I_{ik} = \alpha_{l_i k}$, that is, whether the classified mastery status on attribute $k$ equals the $k$th element of the true attribute pattern $\alpha_{l_i}$ of respondent $i$. In other words, it indicates whether respondent $i$ is correctly identified as mastering attribute $k$ or not. To summarize the attribute-specific classification accuracy over the $N$ replications, we use

\[
P_{a_k} = \sum_{n=1}^{N} \frac{P^{(n)}_{as_k}}{N},
\]

where $P^{(n)}_{as_k}$ is the classification accuracy of the $k$th attribute for the dataset in replication $n$.

For the overall classification accuracy, we define the index for each dataset as

\[
P_{ca} = \frac{\sum_{i=1}^{I} I(I_i = \alpha_{l_i})}{I},
\]

where $I(I_i = \alpha_{l_i})$ is an indicator variable for whether $I_i = \alpha_{l_i}$, that is, whether the classified attribute pattern is in fact the true attribute pattern $\alpha_{l_i}$ of respondent $i$. Similarly, to summarize the overall classification accuracy over the $N$ replications, we use

\[
P_c = \sum_{n=1}^{N} \frac{P^{(n)}_{ca}}{N},
\]

where $P^{(n)}_{ca}$ is the overall classification accuracy for the dataset in replication $n$.
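The classification rule and the two per-dataset accuracy indices of Section 2.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Ox or R code used in the thesis: the function names, the two-attribute toy test, its item success probabilities, and the "true" patterns are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import product
from math import prod


def attribute_patterns(K):
    """All 2^K binary attribute patterns, as tuples."""
    return list(product((0, 1), repeat=K))


def likelihood(x, p_correct):
    """P(X_i | alpha_l): product over items of P^x * (1 - P)^(1 - x)."""
    return prod(p if xj == 1 else 1.0 - p for xj, p in zip(x, p_correct))


def posterior(x, item_probs, prior):
    """p(alpha_l | X_i) for every pattern, by Bayes' rule."""
    joint = [likelihood(x, item_probs[l]) * prior[l] for l in range(len(prior))]
    total = sum(joint)
    return [v / total for v in joint]


def classify(post, patterns, gamma=0.5):
    """T_i = A^t P(alpha | X_i); attribute k is 'mastered' if T_ik >= gamma."""
    K = len(patterns[0])
    T = [sum(post[l] * patterns[l][k] for l in range(len(patterns))) for k in range(K)]
    return tuple(int(t >= gamma) for t in T), T


def accuracy(classified, truth):
    """Attribute-specific and overall classification accuracy for one dataset."""
    n, K = len(truth), len(truth[0])
    per_attr = [sum(c[k] == t[k] for c, t in zip(classified, truth)) / n for k in range(K)]
    overall = sum(c == t for c, t in zip(classified, truth)) / n
    return per_attr, overall


# Hypothetical toy test: K = 2 attributes, 3 items (item 1 needs attribute 1,
# item 2 needs attribute 2, item 3 needs both). Rows follow the pattern order.
patterns = attribute_patterns(2)  # (0,0), (0,1), (1,0), (1,1)
item_probs = [
    [0.1, 0.1, 0.1],  # pattern (0,0)
    [0.1, 0.9, 0.3],  # pattern (0,1)
    [0.9, 0.1, 0.3],  # pattern (1,0)
    [0.9, 0.9, 0.9],  # pattern (1,1)
]
prior = [0.25] * 4                 # discrete uniform prior
responses = [(1, 1, 1), (1, 0, 0)]
truth = [(1, 1), (0, 0)]           # hypothetical true patterns

classified = [classify(posterior(x, item_probs, prior), patterns)[0] for x in responses]
per_attr, overall = accuracy(classified, truth)
```

In this toy case the second respondent is classified as (1, 0) although the assumed true pattern is (0, 0), so the attribute-specific accuracy differs between the two attributes while the overall pattern accuracy counts the whole pattern as a miss. Averaging these per-dataset values over replications would give the summary indices.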

3. Simulation

In this section, we give details of the simulation studies conducted to investigate the effects of Q-matrix misspecification on parameter estimates and classification accuracy of the G-DINA model. We first describe the characteristics of the Q-matrix and the parameter values used for data generation. Two types of condition settings for Q-matrix misspecification were considered. For each condition, three levels of sample size and four different distributions of the respondents' underlying cognitive patterns were manipulated. Consequently, we have a total of 12 combinations for each condition. For each combination, 100 replications were run. To fit the G-DINA model to the simulated data and obtain parameter estimates, we used the R code written by Hong (2013), which was translated from the Ox code originally written by de la Torre (2011).

3.1. Data generation

3.1.1. The Q-matrix

It is known that both the number of attributes and the length of the test in the Q-matrix have an impact on item parameter estimates (Wang, 2010). Many studies have suggested that the greater the number of attributes in the Q-matrix, the longer the test must be in order to provide enough items for reliable information. Choi, Templin, Cohen, and Atwood (2010) considered a Q-matrix with 40 items and four attributes in their study. Kunina-Habenicht, Rupp, and Wilhelm (2012) used 25 and 50 items together with three and five attributes in their simulation design. Both studies used no more than five attributes to investigate the effects of Q-matrix misspecification under the log-linear modeling framework. In our study, we chose a Q-matrix with 30 items and 5 attributes, which is the same as the Q-matrix used in the simulation studies for the G-DINA model by de la Torre (2011). The Q-matrix is reported in Table 1. The choice of the Q-matrix specifies that the test is composed of three types of items, namely one-attribute, two-attribute, and three-attribute items, with ten items of each type.
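The entries of Table 1 follow a regular pattern that can be generated programmatically: each single attribute twice, then every pair of attributes, then every triple, in lexicographic order. The sketch below is an observation about the table's layout rather than a description of how the original Q-matrix was built; the names `K`, `row`, and `Q` are illustrative.

```python
from itertools import combinations

K = 5  # number of attributes


def row(attrs):
    """Binary Q-matrix row with 1s at the given 0-based attribute positions."""
    return tuple(int(k in attrs) for k in range(K))


singles = [row((k,)) for k in range(K)]
pairs = [row(c) for c in combinations(range(K), 2)]    # 10 two-attribute rows
triples = [row(c) for c in combinations(range(K), 3)]  # 10 three-attribute rows

# Items 1-10: each single attribute twice; items 11-20: pairs; items 21-30: triples.
Q = singles + singles + pairs + triples

# Number of required attributes per item (the row sums of Q).
k_star = [sum(q) for q in Q]
```

Because a 5-attribute test has exactly 10 pairs and 10 triples of attributes, this layout yields ten items of each type, and every pair and triple of attributes is measured by exactly one item.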

Table 1: The Q-matrix for data generation.

item  A1  A2  A3  A4  A5      item  A1  A2  A3  A4  A5
  1    1   0   0   0   0       16    0   1   0   1   0
  2    0   1   0   0   0       17    0   1   0   0   1
  3    0   0   1   0   0       18    0   0   1   1   0
  4    0   0   0   1   0       19    0   0   1   0   1
  5    0   0   0   0   1       20    0   0   0   1   1
  6    1   0   0   0   0       21    1   1   1   0   0
  7    0   1   0   0   0       22    1   1   0   1   0
  8    0   0   1   0   0       23    1   1   0   0   1
  9    0   0   0   1   0       24    1   0   1   1   0
 10    0   0   0   0   1       25    1   0   1   0   1
 11    1   1   0   0   0       26    1   0   0   1   1
 12    1   0   1   0   0       27    0   1   1   1   0
 13    1   0   0   1   0       28    0   1   1   0   1
 14    1   0   0   0   1       29    0   1   0   1   1
 15    0   1   1   0   0       30    0   0   1   1   1

3.1.2. Parameter Values

The parameter values used to generate the datasets are presented in Table 2.

Table 2: Parameter values of the G-DINA model.

One-attribute items

item   δ0      δ1*     qj
  1    0.094   0.800   (10000)
  2    0.084   0.807   (01000)
  3    0.102   0.819   (00100)
  4    0.097   0.809   (00010)
  5    0.142   0.766   (00001)
  6    0.091   0.807   (10000)
  7    0.111   0.783   (01000)
  8    0.115   0.777   (00100)
  9    0.105   0.791   (00010)
 10    0.104   0.787   (00001)

Two-attribute items

item   δ0      δ1*     δ2*      δ12*    qj
 11    0.092   0.024    0.003   0.797   (11000)
 12    0.081   0.043    0.023   0.777   (10100)
 13    0.110   0.033    0.006   0.737   (10010)
 14    0.076   0.032    0.042   0.779   (10001)
 15    0.098   0.064    0.020   0.703   (01100)
 16    0.103   0.039    0.015   0.744   (01010)
 17    0.079   0.016   -0.006   0.787   (01001)
 18    0.091   0.002    0.003   0.855   (00110)
 19    0.072   0.020    0.019   0.821   (00101)
 20    0.093   0.015    0.005   0.767   (00011)

Three-attribute items

item   δ0      δ1*      δ2*      δ3*      δ12*     δ13*     δ23*     δ123*   qj
 21    0.105    0.034    0.035   -0.023   -0.102    0.001   -0.013   0.861   (11100)
 22    0.098   -0.028    0.012    0.001    0.034    0.057    0.018   0.663   (11010)
 23    0.125   -0.026   -0.006   -0.033   -0.004    0.012    0.060   0.814   (11001)
 24    0.179   -0.074   -0.089   -0.118    0.096    0.161    0.099   0.640   (10110)
 25    0.033    0.099    0.086    0.081   -0.117   -0.125   -0.128   0.989   (10101)
 26    0.097   -0.037   -0.039    0.002   -0.091    0.030    0.020   0.774   (10011)
 27    0.075    0.023    0.008    0.005   -0.035   -0.003    0.028   0.791   (01110)
 28    0.148   -0.040   -0.029   -0.077    0.011    0.093    0.060   0.701   (01101)
 29    0.042    0.041    0.069    0.070   -0.061   -0.053   -0.112   0.861   (01011)
 30    0.158   -0.078   -0.109   -0.061    0.169    0.091    0.121   0.551   (00111)

δ0: the intercept parameter; δk*: the main effect parameter of the kth attribute of that item; δkk'*: the interaction parameter of the kth and k'th attributes of that item.
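To see how a row of Table 2 translates into success probabilities, the sketch below evaluates equation (1) for every reduced attribute pattern of item 24 and then inverts the mapping. It is a minimal illustration, not the thesis's estimation code: the function names are hypothetical, the δ values are read from the item 24 row of Table 2, and the inversion uses the inclusion-exclusion form of applying the inverse design matrix rather than an explicit matrix inverse.

```python
from itertools import combinations, product
from math import prod


def design_terms(k_star):
    """Index sets for the intercept, main effects, two-way terms, and so on."""
    return [c for r in range(k_star + 1) for c in combinations(range(k_star), r)]


def success_probabilities(delta, k_star):
    """Evaluate equation (1) for every reduced pattern; returns {pattern: P}."""
    terms = design_terms(k_star)
    return {
        alpha: sum(d * prod(alpha[k] for k in t) for d, t in zip(delta, terms))
        for alpha in product((0, 1), repeat=k_star)
    }


def deltas_from_probabilities(P, k_star):
    """Invert the mapping: delta_S = sum over T subset of S of (-1)^(|S|-|T|) P(alpha_T)."""
    out = []
    for s in design_terms(k_star):
        total = 0.0
        for r in range(len(s) + 1):
            for t in combinations(s, r):
                alpha = tuple(int(k in t) for k in range(k_star))
                total += (-1) ** (len(s) - r) * P[alpha]
        out.append(total)
    return out


# Item 24 of Table 2, ordered as (d0, d1, d2, d3, d12, d13, d23, d123).
delta_24 = [0.179, -0.074, -0.089, -0.118, 0.096, 0.161, 0.099, 0.640]
P_24 = success_probabilities(delta_24, 3)
recovered = deltas_from_probabilities(P_24, 3)
```

For this item the probability of a correct response is about 0.18 when none of the three required attributes is mastered and about 0.89 when all three are, while every partial-mastery pattern stays below 0.15. The large three-way interaction therefore makes the item behave much like a DINA item.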

(21) 3.2 3.2.1. Manipulated factors Sample size. de la Torre, Hong, and Deng (2010) and DeCarlo (2011) have found that a sample size of 1,000 is sufficient for the DINA model in providing accurate parameter estimates. Choi et al. (2010) found that although intercepts and main effects appeared to be estimated consistently in samples with at least 500 respondents, higher sample sizes were required to reliably estimate of the two-way or higher interactions in the log linear model. Their study also suggested that with sample sizes of 200 or more, relative fit indexes were able to point to the true generating model. Based on these findings, to find out the effects of Q-matrix misspecification on parameter estimates and classification accuracy in the G-DINA model, we used three levels of sample sizes: 200, 500, and 1000, each respectively representing the cases of a small, a medium, and a large sample size. 3.2.2. Distribution of the attribute patterns. As shown in equation (2) that the distribution of cognitive patterns p(αl ) is employed in obtaining the log-likelihood of the model. However, the distribution of cognitive patterns is mostly unknown while analyzing test data in practice. Thus, we further investigate whether the effects of Q-matrix misspecification will differ for populations with different p(αl ). To be specific, four different distributions of cognitive patterns were used and described below. We first consider a discrete uniform distribution for p(αl ). That is, we assume that there are equal proportions of people with each attribute pattern in the population. Because the Q-matrix contains five attributes, the probability of randomly choosing 1 . any attribute pattern αl is equal to p(αl ) = 215 = 32 For the other three cases, we consider that the distributions of cognitive patterns result from discretizing underlying distributions of some continuum associated to the mastery of the relevant attributes. 
In particular, in the second and third cases the underlying continuum is assumed to be multivariate normal with the same mean vector but different correlation coefficients. When the continuum value of a respondent exceeds the threshold of 0, the respondent is said to master the corresponding attribute (Kunina-Habenicht, Rupp, & Wilhelm, 2012). Let $Y \sim \mathrm{MVN}(\mu, \Sigma)$ be a 5 × 1 vector representing the mastery level of each attribute. The mean vector and covariance matrix used to generate the underlying continuum are as follows:

Case (1):
\[
\mu = \begin{pmatrix} -0.50 \\ -0.25 \\ 0 \\ 0.25 \\ 0.50 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} 1 & 0.3 & 0.3 & 0.3 & 0.3 \\ 0.3 & 1 & 0.3 & 0.3 & 0.3 \\ 0.3 & 0.3 & 1 & 0.3 & 0.3 \\ 0.3 & 0.3 & 0.3 & 1 & 0.3 \\ 0.3 & 0.3 & 0.3 & 0.3 & 1 \end{pmatrix}.
\]

Case (2):
\[
\mu = \begin{pmatrix} -0.50 \\ -0.25 \\ 0 \\ 0.25 \\ 0.50 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} 1 & 0.8 & 0.8 & 0.8 & 0.8 \\ 0.8 & 1 & 0.8 & 0.8 & 0.8 \\ 0.8 & 0.8 & 1 & 0.8 & 0.8 \\ 0.8 & 0.8 & 0.8 & 1 & 0.8 \\ 0.8 & 0.8 & 0.8 & 0.8 & 1 \end{pmatrix}.
\]

Following the setting of Kunina-Habenicht, Rupp, and Wilhelm (2012), the above specification of $\mu$ led to marginal attribute mastery proportions of approximately 70, 60, 50, 40, and 30 percent. In other words, the specification of $\mu$ can also be regarded as representing the marginal difficulty of the attributes. Two covariance matrices with different strengths of correlation between the latent continuum variables are used. The correlations used here are $\rho = 0.3$ and $\rho = 0.8$ (Cui, Gierl, & Chang, 2012).

Finally, we consider the case with a higher-order structure for the distribution of the latent continuum. This concept can be traced back to the study of de la Torre and Douglas (2004). More specifically, the latent variable $\theta$, standing for the student's ability or mastery level, was randomly generated from a standard normal distribution, and the relationship between the latent continuum and the ability is as follows:

\[
Y_i = \beta_0 + \beta_1 \theta_i + \varepsilon_i,
\]

where $\theta_i \sim N(0, 1)$,

\[
\beta_0 = \begin{pmatrix} -0.50 \\ -0.25 \\ 0 \\ 0.25 \\ 0.50 \end{pmatrix}, \quad
\beta_1 = \begin{pmatrix} 0.9 \\ 0.8 \\ 0.7 \\ 0.6 \\ 0.5 \end{pmatrix},
\]

and $\varepsilon_i$ is normally distributed with mean 0 and

\[
\mathrm{Cov}(\varepsilon_i) = \begin{pmatrix} 1 - (0.9)^2 & 0 & 0 & 0 & 0 \\ 0 & 1 - (0.8)^2 & 0 & 0 & 0 \\ 0 & 0 & 1 - (0.7)^2 & 0 & 0 \\ 0 & 0 & 0 & 1 - (0.6)^2 & 0 \\ 0 & 0 & 0 & 0 & 1 - (0.5)^2 \end{pmatrix}.
\]

Consequently, $Y \sim \mathrm{MVN}(\mu, \Sigma)$, with $\mu = \beta_0$ and $\Sigma = \beta_1 \beta_1^t + (I - \mathrm{Diag}(\beta_1 \beta_1^t))$. The following figure presents the resulting probability mass functions of the attribute patterns under the four different underlying distribution specifications.

Figure 1: Distributions of the cognitive attribute patterns. (a) Discrete uniform distribution. (b) Multivariate normal distribution ($\rho = 0.3$). (c) Multivariate normal distribution ($\rho = 0.8$). (d) Higher-order distribution.
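The three continuum-based distributions can be illustrated with a short Python sketch. It is not the code used in the study: the function names are hypothetical, and the sampler uses a one-factor shortcut (a shared normal factor plus independent noise), which reproduces an equicorrelated covariance when all loadings equal $\sqrt{\rho}$ and the higher-order covariance when the loadings are $\beta_1$. Only $\mu$, $\beta_1$, the threshold of 0, and the formula for $\Sigma$ come from the text.

```python
import random
from statistics import NormalDist

MU = [-0.50, -0.25, 0.0, 0.25, 0.50]   # mean vector (also beta_0) from the text
BETA1 = [0.9, 0.8, 0.7, 0.6, 0.5]      # higher-order loadings beta_1 from the text


def higher_order_cov(beta1):
    """Sigma = beta1 beta1^t + (I - Diag(beta1 beta1^t))."""
    k = len(beta1)
    return [
        [beta1[r] * beta1[c] + ((1.0 - beta1[r] ** 2) if r == c else 0.0)
         for c in range(k)]
        for r in range(k)
    ]


def exceedance_proportions(mu):
    """P(Y_k > 0) for unit-variance normal margins, which equals Phi(mu_k)."""
    std_normal = NormalDist()
    return [std_normal.cdf(m) for m in mu]


def sample_pattern(mu, loadings, rng):
    """Draw one attribute pattern by thresholding a one-factor normal vector at 0.

    Y_k = mu_k + loadings_k * theta + sqrt(1 - loadings_k^2) * e_k, so each Y_k
    has unit variance and Corr(Y_k, Y_k') = loadings_k * loadings_k'.
    """
    theta = rng.gauss(0.0, 1.0)
    pattern = []
    for m, b in zip(mu, loadings):
        y = m + b * theta + (1.0 - b * b) ** 0.5 * rng.gauss(0.0, 1.0)
        pattern.append(1 if y > 0 else 0)
    return tuple(pattern)


rng = random.Random(2013)
equicorrelated = [0.3 ** 0.5] * 5          # common correlation rho = 0.3
print(sample_pattern(MU, equicorrelated, rng))
print(sample_pattern(MU, BETA1, rng))
print([round(p, 3) for p in exceedance_proportions(MU)])
```

Under the threshold rule exactly as written (mastery when $Y_k > 0$), the five marginal proportions are $\Phi(\mu_k)$, roughly 0.31, 0.40, 0.50, 0.60, and 0.69. These are the same five values as the quoted 70 to 30 percent; which attribute receives which proportion depends on the sign and ordering convention, which this sketch does not settle.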

3.3. Condition settings. The misspecifications of the Q-matrix investigated in this study fall into two conditions. The first is excluding an essential attribute that is needed for solving the test item, and the second is including an attribute that may not be necessary to solve the test item. They are referred to as the underspecification and overspecification conditions, respectively.

3.3.1. Underspecification. In the underspecification setting, we consider the situation in which both attributes 4 and 5 are required by an item, but attribute 5 is mistakenly excluded whenever attribute 4 is present. As a result, the misspecified items in our Q-matrix are items 20, 26, 29, and 30. Among these items, item 20 measures two attributes and the other three measure three attributes according to the true Q-matrix.

Table 3: Underspecification on the Q-matrix.

item  A1  A2  A3  A4  A5       item  A1  A2  A3  A4  A5
1     1   0   0   0   0        16    0   1   0   1   0
2     0   1   0   0   0        17    0   1   0   0   1
3     0   0   1   0   0        18    0   0   1   1   0
4     0   0   0   1   0        19    0   0   1   0   1
5     0   0   0   0   1        20    0   0   0   1   1→0
6     1   0   0   0   0        21    1   1   1   0   0
7     0   1   0   0   0        22    1   1   0   1   0
8     0   0   1   0   0        23    1   1   0   0   1
9     0   0   0   1   0        24    1   0   1   1   0
10    0   0   0   0   1        25    1   0   1   0   1
11    1   1   0   0   0        26    1   0   0   1   1→0
12    1   0   1   0   0        27    0   1   1   1   0
13    1   0   0   1   0        28    0   1   1   0   1
14    1   0   0   0   1        29    0   1   0   1   1→0
15    0   1   1   0   0        30    0   0   1   1   1→0

3.3.2. Overspecification. In the overspecification setting, we consider the situation in which only attribute 4, not attribute 5, is required by an item, but attribute 5 is mistakenly included whenever attribute 4 is present. As a result, the misspecified items in our Q-matrix are items 4, 9, 13, 16, 18, 22, and 27. In the true Q-matrix, items 4 and 9 measure one attribute (only attribute 4), items 13, 16, and 18 measure two attributes, and items 22 and 27 measure three attributes.

Table 4: Overspecification on the Q-matrix.

item  A1  A2  A3  A4  A5       item  A1  A2  A3  A4  A5
1     1   0   0   0   0        16    0   1   0   1   0→1
2     0   1   0   0   0        17    0   1   0   0   1
3     0   0   1   0   0        18    0   0   1   1   0→1
4     0   0   0   1   0→1      19    0   0   1   0   1
5     0   0   0   0   1        20    0   0   0   1   1
6     1   0   0   0   0        21    1   1   1   0   0
7     0   1   0   0   0        22    1   1   0   1   0→1
8     0   0   1   0   0        23    1   1   0   0   1
9     0   0   0   1   0→1      24    1   0   1   1   0
10    0   0   0   0   1        25    1   0   1   0   1
11    1   1   0   0   0        26    1   0   0   1   1
12    1   0   1   0   0        27    0   1   1   1   0→1
13    1   0   0   1   0→1      28    0   1   1   0   1
14    1   0   0   0   1        29    0   1   0   1   1
15    0   1   1   0   0        30    0   0   1   1   1

It is important to point out that although these two conditions changed the number of items measuring some attributes, the average number of items measuring an attribute and the average number of attributes measured by an item both remained approximately the same in the misspecified Q-matrix as in the true Q-matrix. This is called a balanced design, which is expected to offer more robust results across various simulation conditions. Balanced designs have been considered in research on Q-matrix misspecification (Rupp & Templin, 2008; Kunina-Habenicht, Rupp, & Wilhelm, 2012). Furthermore, having enough one-attribute items (more than three) is crucial for maintaining the stability of parameter estimates and classification accuracy; this was also taken into account in our condition settings.
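The two misspecification conditions can be reproduced from the true Q-matrix with a few lines of Python. This is a minimal sketch with hypothetical function names; the q-vectors are transcribed from Tables 3 and 4. The underspecified items follow directly from the stated rule. For overspecification the sketch takes the item list from the text rather than deriving it from the rule, because Table 4 leaves item 24 unchanged even though it requires attribute 4 without attribute 5.

```python
# True q-vectors for items 1..30 over attributes A1..A5, transcribed from Tables 3 and 4.
TRUE_Q = [
    "10000", "01000", "00100", "00010", "00001",
    "10000", "01000", "00100", "00010", "00001",
    "11000", "10100", "10010", "10001", "01100",
    "01010", "01001", "00110", "00101", "00011",
    "11100", "11010", "11001", "10110", "10101",
    "10011", "01110", "01101", "01011", "00111",
]

OVERSPECIFIED_ITEMS = [4, 9, 13, 16, 18, 22, 27]   # item list as given in the text


def underspecify(q_rows):
    """Drop attribute 5 from every item that requires both attributes 4 and 5."""
    return [q[:4] + "0" if q[3] == "1" and q[4] == "1" else q for q in q_rows]


def overspecify(q_rows, items):
    """Add attribute 5 to the listed items (1-based item numbers)."""
    chosen = set(items)
    return [q[:4] + "1" if i in chosen else q
            for i, q in enumerate(q_rows, start=1)]


def changed_items(q_true, q_mis):
    """1-based numbers of the items whose q-vector differs."""
    return [i for i, (a, b) in enumerate(zip(q_true, q_mis), start=1) if a != b]


def mean_attributes_per_item(q_rows):
    return sum(q.count("1") for q in q_rows) / len(q_rows)


q_under = underspecify(TRUE_Q)
q_over = overspecify(TRUE_Q, OVERSPECIFIED_ITEMS)
print(changed_items(TRUE_Q, q_under))
print(changed_items(TRUE_Q, q_over))
print(mean_attributes_per_item(TRUE_Q),
      round(mean_attributes_per_item(q_under), 3),
      round(mean_attributes_per_item(q_over), 3))
```

With these q-vectors the mean number of attributes per item is 2.0 for the true Q-matrix, about 1.87 after underspecification, and about 2.23 after overspecification, which gives a sense of how far the "approximately the same" balance extends.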

4. Results. In this section, we first report the overall impact of Q-matrix misspecification on parameter estimates. Since the results for different sample sizes are similar, we report and interpret the results for a sample size of 1000 under the four distributions of attribute patterns. Tables 5 to 8 report the results for $\hat{\delta}_j$ and Tables 9 to 12 those for $\hat{P}_j$. Furthermore, to understand how the manipulated factors influence the effect of Q-matrix misspecification, we examine the interaction effects of the type of misspecification and the other factors on parameter estimates. More specifically, two-way interaction plots of the type of misspecification with sample size and with the distribution of the attribute patterns are depicted in Figures 2 and 3, respectively.

To better understand how the type of Q-matrix misspecification interacts with the manipulated factors on attribute-specific classification accuracy, Figure 4 shows the attribute-specific classification accuracy for the different types of Q-matrix misspecification under the four distributions of cognitive attribute patterns. Again, we only report the cases with a sample size of 1000 because similar results are obtained for the other sample sizes. Tables 13 and 14 present the results for the overall classification accuracy, and Figure 5 gives the interaction plots between the type of misspecification and the manipulated factors on the overall classification accuracy.

4.1. Effect on parameter estimates. We first focus on the effect of misspecification on the quality of item parameter estimates. Tables 5 to 8 present the MADs of each $\hat{\delta}_j$ in the G-DINA model under the underspecification condition with a sample size of 1000. The underspecified Q-matrix has a large impact on parameter recovery for the misspecified items (items 20, 26, 29, and 30). For these items, the MAD values of some parameter estimates related to the fourth attribute are greatly inflated. For instance, as shown in Table 5 for simulated data with a sample size of 1000 and a discrete uniform distribution of attribute patterns, the MAD value is 0.409 for $\hat{\delta}_{20,4}$ of item 20 with $q_{20} = (00011) \to (00010)$, 0.401 for $\hat{\delta}_{26,14}$ of item 26 with $q_{26} = (10011) \to (10010)$, 0.445 for $\hat{\delta}_{29,24}$ of item 29 with $q_{29} = (01011) \to (01010)$, and 0.299 for $\hat{\delta}_{30,34}$ of item 30 with $q_{30} = (00111) \to (00110)$. The common characteristic is that each of these inflated estimates belongs to the highest-order parameter retained for the misspecified item (the main effect for item 20 and the two-way interaction for items 26, 29, and 30). On the other hand, no MAD value exceeds 0.1 for the parameters of the correctly specified items.

In contrast, overspecification has little impact on the recovery of item parameters. The MAD values under this condition are all smaller than 0.1, even for the misspecified items. In other words, parameter estimates appear to be unaffected by overspecification of the Q-matrix.

To investigate possible interaction effects between the type of Q-matrix misspecification and other factors on parameter estimates, we first look at the interaction between the type of misspecification and sample size. Figure 2 shows an interaction plot for the type of Q-matrix misspecification and sample size on the mean MAD values of the parameter estimates of the misspecified items. More specifically, the mean MAD value of items 20, 26, 29, and 30 is computed for the comparison between the underspecification and true Q-matrix specification conditions. For the effect of overspecification, the mean MAD value of items 4, 9, 13, 16, 18, 22, and 27 is computed for comparison. We find that the mean MAD values become smaller as the sample size increases under the underspecification condition, whereas those for the three levels of sample size under the overspecification condition are nearly the same. That is, sample size only affects the impact of Q-matrix underspecification. Similar interaction patterns are also present for the other three distributions of attribute patterns, so they are omitted for brevity.

Secondly, Figure 3 presents the interaction plot for the type of Q-matrix misspecification and the distribution of cognitive patterns on the mean MAD values of the parameter estimates of the misspecified items. The results show that the MAD values are the largest for the higher-order distribution, followed by the multivariate normal distributions with $\rho = 0.3$ and $\rho = 0.8$, and lastly the uniform distribution. In other words, the discrete uniform distribution of the attribute patterns results in the best parameter recovery. This pattern remains the same for the three levels of sample size. A three-factor analysis of variance on the type of misspecification, sample size, and distribution of cognitive patterns found no significant three-way interaction effect (p = 0.88).

Figure 2: The interaction plots of the type of Q-matrix misspecification and sample size on the mean MAD values of the parameter estimates of those misspecified items.

We also look at Tables 5 to 8 to investigate whether the MADs differ across items with different numbers of attributes and across parameter types. The results show that the MAD values are the highest for items measuring three attributes and the lowest for those measuring only one attribute. Moreover, the MADs of the different types of item parameters, i.e., the intercept, the main effects, and the two-way and three-way interaction parameters, are also examined. The MADs are much greater for the estimates of the higher-order interaction parameters of the misspecified items 26, 29, and 30. Similarly, the MAD for the main effect of item 20 is higher than expected under the underspecification, in which its true number of measured attributes, two, is misspecified as one.

Figure 3: The interaction plots of the type of Q-matrix misspecification and the underlying distribution of attribute patterns on the mean MAD values of the parameter estimates of those misspecified items.
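As a reading aid for Tables 5 to 12, the two recovery indices can be sketched in Python. The sketch assumes the usual definitions: the MAD is the mean absolute difference between an estimate and its generating value across replications, and the empirical bias (EB) is the mean signed difference. The function names and the toy numbers are hypothetical and are not taken from the simulation.

```python
def mean_absolute_deviation(estimates, true_value):
    """Average absolute distance of the replicated estimates from the true value."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)


def empirical_bias(estimates, true_value):
    """Average signed distance of the replicated estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)


# Toy replications of one parameter whose generating value is 0.50.
replicated = [0.40, 0.60]
print(mean_absolute_deviation(replicated, 0.50))
print(empirical_bias(replicated, 0.50))
```

In this toy case the two errors cancel in the bias but not in the MAD, which is why a small EB under the true Q-matrix and a large MAD are not contradictory, whereas a systematic shift shows up in both indices.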

Table 5: The MADs of $\hat{\delta}_j$ for Q-matrix underspecification, discrete uniform distribution of attribute patterns, and sample size of 1000.

item  δ0     δ1*    qj
1     0.001  0.001  (10000)
2     0.001  0.002  (01000)
3     0.001  0.002  (00100)
4     0.007  0.006  (00010)
5     0.004  0.005  (00001)
6     0.001  0.002  (10000)
7     0.001  0.002  (01000)
8     0.001  0.001  (00100)
9     0.005  0.004  (00010)
10    0.003  0.005  (00001)

item  δ0     δ1*    δ2*    δ12*   qj
11    0.001  0.002  0.002  0.003  (11000)
12    0.001  0.002  0.002  0.003  (10100)
13    0.002  0.006  0.003  0.007  (10010)
14    0.002  0.004  0.005  0.007  (10001)
15    0.001  0.002  0.001  0.002  (01100)
16    0.002  0.006  0.004  0.008  (01010)
17    0.002  0.005  0.004  0.007  (01001)
18    0.002  0.005  0.004  0.008  (00110)
19    0.002  0.005  0.004  0.008  (00101)
20    0.010  0.409  -      -      (00011)

item  δ0     δ1*    δ2*    δ3*    δ12*   δ13*   δ23*   δ123*  qj
21    0.002  0.004  0.003  0.003  0.006  0.005  0.005  0.007  (11100)
22    0.002  0.002  0.003  0.005  0.006  0.006  0.006  0.010  (11010)
23    0.005  0.006  0.008  0.009  0.012  0.011  0.014  0.016  (11001)
24    0.003  0.004  0.004  0.006  0.006  0.007  0.007  0.009  (10110)
25    0.005  0.006  0.006  0.009  0.009  0.010  0.012  0.015  (10101)
26    0.018  0.023  0.026  -      0.401  -      -      -      (10011)
27    0.003  0.004  0.003  0.005  0.005  0.008  0.007  0.010  (01110)
28    0.005  0.007  0.007  0.009  0.011  0.012  0.010  0.016  (01101)
29    0.031  0.030  0.061  -      0.445  -      -      -      (01011)
30    0.034  0.049  0.061  -      0.299  -      -      -      (00111)

δ0: The intercept parameter; δk*: The main effect parameter of the kth attribute of that item; δkk'*: The interaction parameter of the kth and k'th attributes of that item.

Table 6: The MADs of $\hat{\delta}_j$ for Q-matrix underspecification, underlying multivariate normal distribution with $\rho = 0.3$ of attribute patterns, and sample size of 1000.

item  δ0     δ1*    qj
1     0.001  0.001  (10000)
2     0.002  0.002  (01000)
3     0.002  0.002  (00100)
4     0.029  0.023  (00010)
5     0.008  0.009  (00001)
6     0.001  0.002  (10000)
7     0.027  0.019  (01000)
8     0.002  0.002  (00100)
9     0.027  0.019  (00010)
10    0.008  0.009  (00001)

item  δ0     δ1*    δ2*    δ12*   qj
11    0.001  0.003  0.001  0.004  (11000)
12    0.001  0.003  0.001  0.004  (10100)
13    0.002  0.025  0.004  0.025  (10010)
14    0.003  0.013  0.005  0.015  (10001)
15    0.001  0.004  0.002  0.004  (01100)
16    0.003  0.020  0.004  0.021  (01010)
17    0.003  0.010  0.004  0.011  (01001)
18    0.003  0.025  0.006  0.029  (00110)
19    0.004  0.010  0.006  0.011  (00101)
20    0.016  0.517  -      -      (00011)

item  δ0     δ1*    δ2*    δ3*    δ12*   δ13*   δ23*   δ123*  qj
21    0.002  0.005  0.004  0.002  0.008  0.007  0.005  0.010  (11100)
22    0.003  0.007  0.005  0.006  0.018  0.011  0.010  0.023  (11010)
23    0.004  0.012  0.011  0.006  0.023  0.014  0.014  0.025  (11001)
24    0.003  0.004  0.004  0.006  0.006  0.007  0.007  0.009  (10110)
25    0.005  0.008  0.005  0.005  0.012  0.013  0.007  0.016  (10101)
26    0.018  0.046  0.039  -      0.577  -      -      -      (10011)
27    0.003  0.008  0.006  0.005  0.012  0.012  0.009  0.016  (01110)
28    0.005  0.015  0.009  0.008  0.022  0.018  0.016  0.027  (01101)
29    0.039  0.048  0.074  -      0.524  -      -      -      (01011)
30    0.040  0.060  0.083  -      0.595  -      -      -      (00111)

δ0: The intercept parameter; δk*: The main effect parameter of the kth attribute of that item; δkk'*: The interaction parameter of the kth and k'th attributes of that item.

Table 7: The MADs of $\hat{\delta}_j$ for Q-matrix underspecification, underlying multivariate normal distribution with $\rho = 0.8$ of attribute patterns, and sample size of 1000.

item  δ0     δ1*    qj
1     0.001  0.001  (10000)
2     0.001  0.002  (01000)
3     0.001  0.002  (00100)
4     0.007  0.006  (00010)
5     0.004  0.005  (00001)
6     0.001  0.002  (10000)
7     0.001  0.002  (01000)
8     0.001  0.001  (00100)
9     0.005  0.004  (00010)
10    0.003  0.005  (00001)

item  δ0     δ1*    δ2*    δ12*   qj
11    0.001  0.002  0.002  0.003  (11000)
12    0.001  0.002  0.002  0.003  (10100)
13    0.002  0.006  0.003  0.007  (10010)
14    0.002  0.004  0.005  0.007  (10001)
15    0.001  0.002  0.001  0.002  (01100)
16    0.002  0.006  0.004  0.008  (01010)
17    0.002  0.005  0.004  0.007  (01001)
18    0.002  0.005  0.004  0.008  (00110)
19    0.002  0.005  0.004  0.008  (00101)
20    0.010  0.562  -      -      (00011)

item  δ0     δ1*    δ2*    δ3*    δ12*   δ13*   δ23*   δ123*  qj
21    0.002  0.004  0.003  0.003  0.006  0.005  0.005  0.007  (11100)
22    0.002  0.002  0.003  0.005  0.006  0.006  0.006  0.010  (11010)
23    0.005  0.006  0.008  0.009  0.012  0.011  0.014  0.016  (11001)
24    0.003  0.004  0.004  0.006  0.006  0.007  0.007  0.009  (10110)
25    0.005  0.006  0.006  0.009  0.009  0.010  0.012  0.015  (10101)
26    0.018  0.023  0.026  -      0.593  -      -      -      (10011)
27    0.003  0.004  0.003  0.005  0.005  0.008  0.007  0.010  (01110)
28    0.005  0.007  0.007  0.009  0.011  0.012  0.010  0.016  (01101)
29    0.031  0.030  0.061  -      0.547  -      -      -      (01011)
30    0.034  0.049  0.061  -      0.563  -      -      -      (00111)

δ0: The intercept parameter; δk*: The main effect parameter of the kth attribute of that item; δkk'*: The interaction parameter of the kth and k'th attributes of that item.

Table 8: The MADs of $\hat{\delta}_j$ for Q-matrix underspecification, underlying higher-order multivariate normal distribution of attribute patterns, and sample size of 1000.

item  δ0     δ1*    qj
1     0.001  0.001  (10000)
2     0.002  0.002  (01000)
3     0.001  0.002  (00100)
4     0.031  0.026  (00010)
5     0.011  0.011  (00001)
6     0.001  0.001  (10000)
7     0.001  0.002  (01000)
8     0.002  0.002  (00100)
9     0.029  0.022  (00010)
10    0.009  0.010  (00001)

item  δ0     δ1*    δ2*    δ12*   qj
11    0.001  0.004  0.003  0.006  (11000)
12    0.001  0.007  0.002  0.007  (10100)
13    0.002  0.020  0.005  0.020  (10010)
14    0.027  0.012  0.004  0.013  (10001)
15    0.001  0.005  0.003  0.006  (01100)
16    0.003  0.021  0.005  0.023  (01010)
17    0.025  0.013  0.004  0.014  (01001)
18    0.026  0.032  0.007  0.037  (00110)
19    0.004  0.014  0.006  0.015  (00101)
20    0.017  0.649  -      -      (00011)

item  δ0     δ1*    δ2*    δ3*    δ12*   δ13*   δ23*   δ123*  qj
21    0.001  0.020  0.007  0.003  0.028  0.023  0.009  0.030  (11100)
22    0.002  0.038  0.008  0.004  0.053  0.042  0.012  0.055  (11010)
23    0.003  0.035  0.021  0.005  0.048  0.036  0.024  0.050  (11001)
24    0.004  0.029  0.004  0.005  0.038  0.033  0.007  0.044  (10110)
25    0.003  0.056  0.011  0.005  0.066  0.058  0.011  0.068  (10101)
26    0.015  0.021  0.030  -      0.747  -      -      -      (10011)
27    0.002  0.023  0.007  0.005  0.030  0.028  0.011  0.037  (01110)
28    0.005  0.030  0.011  0.007  0.031  0.033  0.014  0.036  (01101)
29    0.039  0.100  0.071  -      0.787  -      -      -      (01011)
30    0.036  0.064  0.075  -      0.723  -      -      -      (00111)

δ0: The intercept parameter; δk*: The main effect parameter of the kth attribute of that item; δkk'*: The interaction parameter of the kth and k'th attributes of that item.

Since the parameter estimates $\hat{\delta}_j$ can be transformed one-to-one to and from the probabilities of correctly answering an item, $\hat{P}_j$, we also analyze $\hat{P}_j$ to understand how the probabilities of correctly answering an item are influenced by the two types of Q-matrix misspecification for respondents with the attribute patterns related to the misspecified item. Firstly, under Q-matrix underspecification, the discrete uniform distribution of attribute patterns, and a sample size of 1000, the empirical bias of the probability of a correct answer for respondents with attribute pattern (00010), denoted by $EB_{P_{20}(00010)}$, was 0.371 for item 20; $EB_{P_{26}(10010)}$ was 0.380 for item 26, $EB_{P_{29}(01010)}$ was 0.384 for item 29, and $EB_{P_{30}(00110)}$ was 0.339 for item 30. On the other hand, Q-matrix underspecification caused a considerable underestimation of the probabilities of answering correctly for the attribute patterns mastering both the fourth and the fifth attributes: $EB_{P_{20}(00011)}$ was -0.401 for item 20, $EB_{P_{26}(10011)}$ was -0.446 for item 26, $EB_{P_{29}(01011)}$ was -0.381 for item 29, and $EB_{P_{30}(00111)}$ was -0.363 for item 30. In Tables 9 to 12, we list all the $EB_{P_{j(m)}}$ values affected by Q-matrix underspecification together with the corresponding attribute patterns. The symbol EB0 represents the empirical bias of the estimates under the true Q-matrix averaged over 100 replications, and the symbol EBq represents the empirical bias of the estimates under Q-matrix misspecification averaged over 100 replications. When Q-matrix overspecification occurs, the EBs for all attribute patterns across all items ranged from -0.1 to 0.1. That is, overspecification has no apparent impact on the probability of answering correctly for any attribute pattern. This is consistent with the results for the MADs of $\hat{\delta}_j$.
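The one-to-one correspondence between $\delta_j$ and $P_j$ can be sketched as follows, assuming the identity-link form of the G-DINA model (de la Torre, 2011), in which the success probability of a reduced attribute pattern is the sum of the intercept, the main effects, and the interaction effects of the mastered attributes. The function names and the coefficients of the two-attribute item below are hypothetical.

```python
from itertools import combinations, product


def subsets(indices):
    """All subsets of a sorted tuple of attribute indices, as sorted tuples."""
    for size in range(len(indices) + 1):
        yield from combinations(indices, size)


def delta_to_prob(delta, k):
    """Success probability of every reduced pattern under the identity link.

    delta maps a tuple of attribute indices to its coefficient; () is the intercept.
    """
    probs = {}
    for pattern in product((0, 1), repeat=k):
        mastered = tuple(i for i, a in enumerate(pattern) if a == 1)
        probs[pattern] = sum(delta.get(s, 0.0) for s in subsets(mastered))
    return probs


def prob_to_delta(probs, k):
    """Invert delta_to_prob by inclusion-exclusion over the mastered attributes."""
    delta = {}
    for s in subsets(tuple(range(k))):
        total = 0.0
        for t in subsets(s):
            pattern = tuple(1 if i in t else 0 for i in range(k))
            total += (-1) ** (len(s) - len(t)) * probs[pattern]
        delta[s] = total
    return delta


# Hypothetical item requiring two attributes.
delta = {(): 0.2, (0,): 0.1, (1,): 0.2, (0, 1): 0.4}
probs = delta_to_prob(delta, 2)
print(probs)
print(prob_to_delta(probs, 2))
```

Because the mapping is invertible, a distortion in one interaction parameter necessarily appears as a distortion in the success probabilities of specific attribute patterns, which is what Tables 9 to 12 report.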

Table 9: The EBs of $\hat{P}_{j(m)}$ with and without Q-matrix underspecification, discrete uniform distribution of attribute patterns, and sample sizes of 200, 500, and 1000.

item  misspecification   bias  pattern  n=200   n=500   n=1000   pattern  n=200   n=500   n=1000
20    (00011)→(00010)    EB0   00010     0.074  -0.003  -0.038   00011    -0.074  -0.035   0.009
                         EBq   00010     0.346   0.386   0.371   00011    -0.426  -0.386  -0.401
26    (10011)→(10010)    EB0   10010    -0.079  -0.041   0.021   10011     0.010  -0.015  -0.018
                         EBq   10010     0.291   0.353   0.380   10011    -0.535  -0.473  -0.446
29    (01011)→(01010)    EB0   01010     0.068  -0.047   0.033   01011    -0.039  -0.034  -0.008
                         EBq   01010     0.326   0.320   0.384   01011    -0.440  -0.446  -0.381
30    (00111)→(00110)    EB0   00110     0.034   0.035  -0.006   00111    -0.065  -0.026  -0.020
                         EBq   00110     0.311   0.343   0.339   00111    -0.391  -0.359  -0.363

Number of replications = 100; EB0: Empirical bias of estimates under the correct Q-matrix; EBq: Empirical bias of estimates under Q-matrix misspecification.

Table 10: The EBs of $\hat{P}_{j(m)}$ with and without Q-matrix underspecification, underlying multivariate normal distribution ($\rho = 0.3$) of attribute patterns, and sample sizes of 200, 500, and 1000.

item  misspecification   bias  pattern  n=200   n=500   n=1000   pattern  n=200   n=500   n=1000
20    (00011)→(00010)    EB0   00010    -0.032  -0.014  -0.034   00011    -0.014   0.009   0.018
                         EBq   00010     0.531   0.508   0.505   00011    -0.141  -0.164  -0.117
26    (10011)→(10010)    EB0   10010     0.273  -0.110   0.095   10011     0.033  -0.017   0.002
                         EBq   10010     0.594   0.585   0.514   10011    -0.187  -0.124  -0.072
29    (01011)→(01010)    EB0   01010     0.024  -0.021   0.149   01011    -0.001  -0.074   0.015
                         EBq   01010     0.564   0.548   0.530   01011    -0.103  -0.108  -0.072
30    (00111)→(00110)    EB0   00110     0.046   0.025  -0.017   00111    -0.038  -0.017  -0.015
                         EBq   00110     0.520   0.512   0.510   00111    -0.190  -0.192  -0.174

Number of replications = 100; EB0: Empirical bias of estimates under the correct Q-matrix; EBq: Empirical bias of estimates under Q-matrix misspecification.

Table 11: The EBs of $\hat{P}_{j(m)}$ with and without Q-matrix underspecification, underlying multivariate normal distribution ($\rho = 0.8$) of attribute patterns, and sample sizes of 200, 500, and 1000.

item  misspecification   bias  pattern  n=200   n=500   n=1000   pattern  n=200   n=500   n=1000
20    (00011)→(00010)    EB0   00010     0.074  -0.003  -0.038   00011    -0.074  -0.035   0.009
                         EBq   00010     0.586   0.577   0.561   00011    -0.426  -0.401  -0.381
26    (10011)→(10010)    EB0   10010    -0.079  -0.041   0.021   10011     0.010  -0.015  -0.018
                         EBq   10010     0.591   0.553   0.540   10011    -0.535  -0.473  -0.446
29    (01011)→(01010)    EB0   01010     0.068  -0.047   0.033   01011    -0.039  -0.034  -0.008
                         EBq   01010     0.526   0.520   0.514   01011    -0.440  -0.446  -0.381
30    (00111)→(00110)    EB0   00110     0.034   0.035  -0.006   00111    -0.065  -0.026  -0.020
                         EBq   00110     0.571   0.543   0.539   00111    -0.391  -0.359  -0.363

Number of replications = 100; EB0: Empirical bias of estimates under the correct Q-matrix; EBq: Empirical bias of estimates under Q-matrix misspecification.

Table 12: The EBs of $\hat{P}_{j(m)}$ with and without Q-matrix underspecification, underlying higher-order distribution of attribute patterns, and sample sizes of 200, 500, and 1000.

item  misspecification   bias  pattern  n=200   n=500   n=1000   pattern  n=200   n=500   n=1000
20    (00011)→(00010)    EB0   00010     0.064  -0.045  -0.027   00011    -0.057  -0.028   0.043
                         EBq   00010     0.665   0.630   0.481   00011    -0.169  -0.159  -0.126
26    (10011)→(10010)    EB0   10010    -0.089  -0.052  -0.005   10011     0.034   0.006   0.009
                         EBq   10010     0.758   0.728   0.708   10011    -0.197  -0.298  -0.272
29    (01011)→(01010)    EB0   01010     0.119   0.087   0.044   01011     0.055   0.001  -0.014
                         EBq   01010     0.667   0.666   0.651   01011    -0.235  -0.210  -0.274
30    (00111)→(00110)    EB0   00110     0.014  -0.138  -0.055   00111    -0.008   0.023  -0.012
                         EBq   00110     0.608   0.594   0.553   00111    -0.248  -0.208  -0.204

Number of replications = 100; EB0: Empirical bias of estimates under the correct Q-matrix; EBq: Empirical bias of estimates under Q-matrix misspecification.

4.2. Effect on classification accuracy.

4.2.1. Effect on the attribute-specific classification accuracy. Figure 4 presents the attribute-specific classification accuracy for the different types of Q-matrix misspecification under the four underlying distributions of cognitive attribute patterns with a sample size of 1000. Our results show that in the case of underspecification, the attribute-specific classification accuracy for the fourth attribute, $P_{a4}$, and that for the fifth attribute, $P_{a5}$, are always smaller than those in the true Q-matrix specification (original) case. In particular, for $P_{a4}$ and $P_{a5}$ under underspecification, the multivariate normal distribution with $\rho = 0.8$ results in the highest attribute-specific classification accuracy, followed by the discrete uniform distribution and the multivariate normal distribution with $\rho = 0.3$. The higher-order distribution results in relatively poor attribute-specific classification accuracy for the fourth and fifth attributes. In contrast, in the case of overspecification, the attribute-specific classification accuracy indices are close to those of the original case for each of the attributes.

Figure 4: The attribute-specific classification accuracy with respect to the type of Q-matrix misspecification and the underlying distributions of attribute patterns with sample size of 1000. (a) Discrete uniform distribution. (b) Multivariate normal distribution ($\rho = 0.3$). (c) Multivariate normal distribution ($\rho = 0.8$). (d) Higher-order distribution.
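The two accuracy indices discussed in this section can be sketched in Python. The sketch assumes the common definitions: attribute-specific accuracy is the proportion of respondents whose estimated mastery status of an attribute matches the generating one, and overall accuracy is the proportion whose entire attribute pattern matches. The function names and the four toy respondents are hypothetical.

```python
def attribute_accuracy(true_patterns, est_patterns):
    """Per-attribute proportion of respondents classified correctly."""
    n = len(true_patterns)
    k = len(true_patterns[0])
    return [
        sum(t[a] == e[a] for t, e in zip(true_patterns, est_patterns)) / n
        for a in range(k)
    ]


def pattern_accuracy(true_patterns, est_patterns):
    """Proportion of respondents whose whole attribute pattern is recovered."""
    hits = sum(t == e for t, e in zip(true_patterns, est_patterns))
    return hits / len(true_patterns)


# Toy data: errors occur only on attributes 4 and 5.
true_patterns = [(1, 0, 0, 1, 1), (0, 1, 0, 1, 0), (1, 1, 1, 0, 0), (0, 0, 0, 1, 1)]
est_patterns = [(1, 0, 0, 1, 0), (0, 1, 0, 1, 0), (1, 1, 1, 0, 0), (0, 0, 0, 0, 1)]
print(attribute_accuracy(true_patterns, est_patterns))
print(pattern_accuracy(true_patterns, est_patterns))
```

The toy data also show why the overall index reacts more strongly than the attribute-specific ones: a single misclassified attribute is enough to count the whole pattern as wrong.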

4.2.2. Effect on the overall classification accuracy. For the effect of Q-matrix misspecification on the overall classification accuracy, we give two interaction plots to observe how the factors interact with the effect of Q-matrix misspecification. Figure 5-(a) presents the interaction between the type of misspecification and sample size on the mean overall classification accuracy averaged over 100 replications. We find that the overall classification accuracy increases with sample size under both underspecification and overspecification, and that there is nearly no difference between the overall classification accuracy based on the true and the overspecified Q-matrix. Again, the trend is similar across the four distributions of cognitive patterns, so the three-way interaction plot is omitted. Figure 5-(b) presents the interaction between the type of misspecification and the distribution of attribute patterns on the mean overall classification accuracy averaged over 100 replications. The results show that the multivariate normal distribution with $\rho = 0.8$ leads to the best classification accuracy, followed by the discrete uniform distribution and the multivariate normal distribution with $\rho = 0.3$. In addition, classification accuracy goes down across all the distributions when the Q-matrix is underspecified. The higher-order distribution results in a larger decline in the overall classification accuracy than the other three distributions. However, there is nearly no decline in the overall classification accuracy with Q-matrix overspecification for any distribution. The overall classification accuracy indices for Q-matrix underspecification and overspecification for each distribution of attribute patterns are presented in Tables 13 and 14, respectively. Because sample size has only a slight impact on the overall classification accuracy, we only report the results with sample size of 1000.

Figure 5: The interaction plots of the type of Q-matrix misspecification and sample size and the underlying distribution of attribute patterns on the arcsin transformed overall classification accuracy. (a) Q-matrix misspecification and sample size. (b) Q-matrix misspecification and the underlying distribution of cognitive patterns.

Table 13: The overall classification accuracy for Q-matrix underspecification with different distributions of attribute patterns and each sample size.

              uniform         Mv0.3           Mv0.8           higher-order
sample size   Pc0    Pc1      Pc0    Pc1      Pc0    Pc1      Pc0    Pc1
200           0.893  0.800    0.894  0.827    0.914  0.819    0.902  0.787
500           0.907  0.863    0.906  0.862    0.918  0.889    0.910  0.836
1000          0.915  0.896    0.911  0.889    0.930  0.905    0.916  0.874

Pc0: The classification accuracy for the true Q-matrix; Pc1: The classification accuracy for the underspecified Q-matrix.

Table 14: The overall classification accuracy for Q-matrix overspecification with different distributions of attribute patterns and each sample size.

              uniform         Mv0.3           Mv0.8           higher-order
sample size   Pc0    Pc2      Pc0    Pc2      Pc0    Pc2      Pc0    Pc2
200           0.893  0.890    0.894  0.891    0.847  0.819    0.902  0.895
500           0.907  0.908    0.906  0.902    0.888  0.889    0.910  0.906
1000          0.915  0.916    0.911  0.909    0.899  0.905    0.916  0.914

Pc0: The classification accuracy for the true Q-matrix; Pc2: The classification accuracy for the overspecified Q-matrix.

5. Summary and Conclusion. Accurate item parameter estimates and accurate classification are important for cognitive diagnosis models because valid inferences depend on them. This study contributes to a better understanding of the effects of Q-matrix misspecification on these two practical issues for the G-DINA model. For parameter estimates, underspecification of a row vector of the Q-matrix caused an overestimation of the last parameter of the misspecified item as well as of the corresponding probabilities of answering that item correctly. These affected parameters were all related to the excluded attribute. Additionally, higher-order interaction parameters appeared more difficult to recover under underspecification than under overspecification, whereas the recovery of main effects was poorer than that of the two-way interaction parameters. Also, the fewer attributes an item requires, the better its parameter estimates will be. These results are consistent with previous studies (de la Torre, 2008; Rupp & Templin, 2008; Choi, Templin, Cohen, & Atwood, 2010). For classification accuracy, the attribute-specific classification accuracy for the misspecified attributes went down with an underspecification of the Q-matrix. However, overspecification had no significant impact on attribute-specific classification accuracy. Interestingly, the response probabilities for respondents who have mastered all the measured attributes were underestimated when Q-matrix underspecification occurred. For instance, item 26 was misspecified from $q_{26} = (10011)$ to $(10010)$, and the results showed that $\delta_{26,12}$ was overestimated. Its corresponding probability estimate $\hat{P}_{26}(10010)$ was likewise overestimated, whereas the probability $P_{26}(10011)$ was underestimated. This phenomenon may result from the change in the number of latent cognitive groups of the G-DINA model under Q-matrix misspecification.
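For item 26, the merging of latent cognitive groups can be made concrete with a small Python sketch. It rests on a simplifying assumption that is not taken from the study: the probability fitted to a merged group is the average of the true success probabilities of the patterns merged into it, with every attribute pattern weighted equally (discrete uniform population). The function names and the true probabilities are hypothetical; only the two q-vectors come from the text.

```python
from itertools import product

Q_TRUE, Q_UNDER = "10011", "10010"   # q-vectors of item 26 from the text


def reduce_pattern(pattern, q):
    """Keep only the attributes that the q-vector marks as required."""
    return tuple(a for a, needed in zip(pattern, q) if needed == "1")


def latent_groups(q, k=5):
    """Distinct reduced patterns, i.e. the latent groups the item can separate."""
    return {reduce_pattern(p, q) for p in product((0, 1), repeat=k)}


def pooled_probabilities(true_prob, q_true, q_mis, k=5):
    """Average the true success probabilities within each group defined by q_mis.

    All 2^k attribute patterns receive equal weight (uniform population).
    """
    sums, counts = {}, {}
    for pattern in product((0, 1), repeat=k):
        group = reduce_pattern(pattern, q_mis)
        sums[group] = sums.get(group, 0.0) + true_prob[reduce_pattern(pattern, q_true)]
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical true success probabilities over (A1, A4, A5); not from the thesis.
TRUE_PROB = {
    (0, 0, 0): 0.2, (1, 0, 0): 0.2, (0, 1, 0): 0.2, (0, 0, 1): 0.2,
    (1, 1, 0): 0.3, (1, 0, 1): 0.3, (0, 1, 1): 0.3, (1, 1, 1): 0.9,
}

pooled = pooled_probabilities(TRUE_PROB, Q_TRUE, Q_UNDER)
print(len(latent_groups(Q_TRUE)), len(latent_groups(Q_UNDER)))
print(pooled[(1, 1)] - TRUE_PROB[(1, 1, 0)])   # shift for pattern (10010)
print(pooled[(1, 1)] - TRUE_PROB[(1, 1, 1)])   # shift for pattern (10011)
```

Under these assumptions the merged group receives a probability between the two true values, so the pattern without attribute 5 is shifted upward and the pattern with attribute 5 downward, the same direction as the signs of the EBq entries in Tables 9 to 12.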
Taking item 26 as an example, the number of latent cognitive groups is reduced from eight to four due to underspecification. Hence the probabilities of answering item 26 correctly for the two groups (10010) and (10011), namely $P_{26}(10010)$ and $P_{26}(10011)$, which are distinct when the Q-matrix is correctly specified, are constrained to be equal, because the two groups are treated as falling in the same latent cognitive group (1001) when attribute 5 is omitted from the Q-matrix. However, overspecification does not show any apparent impact on the estimates of either the item parameters or the probabilities of answering correctly.

Furthermore, some factors may interact with the impact of Q-matrix misspecification on the parameter estimates and classification accuracy in the G-DINA model. For the distribution of cognitive attribute patterns, the discrete uniform distribution performed the best in parameter recovery, and the multivariate normal distribution with a high correlation coefficient gave the highest attribute-specific and overall classification accuracy. These results indicate that different distributions of cognitive patterns in the population interact with the impact of Q-matrix misspecification on parameter estimates and classification accuracy. For sample size, both parameter estimates and classification accuracy improved with an increase in sample size when Q-matrix underspecification was present, but there was no difference when Q-matrix overspecification occurred. This indicates that a large sample size helps reduce the impact of Q-matrix underspecification on both parameter estimates and classification accuracy.

In summary, underspecification of the Q-matrix has a great impact on parameter estimates, while the impact of overspecification is minor. In addition, the estimation of all other parameters for items whose row vectors were not misspecified was unaffected by either type of misspecification. Factors such as sample size and the distribution of cognitive attribute patterns interacted with the impact of Q-matrix misspecification on parameter estimates and classification accuracy in the G-DINA model.

6. References.

Choi, H.-J., Templin, J., Cohen, A., & Atwood, C. (2010). The impact of model misspecification on estimation accuracy in diagnostic classification models. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, Colorado.

Corter, J. E., & Im, S. (2011). Statistical consequences of attribute misspecification in the rule space method. Educational and Psychological Measurement, 71(4), 712-731.

Cui, Y., Gierl, M. J., & Chang, H.-H. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49, 19-38.

de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179-199.

de la Torre, J., Hong, Y., & Deng, W. (2010). Factors affecting the item parameter estimation and classification accuracy of the DINA model. Journal of Educational Measurement, 47(2), 227-249.

de la Torre, J. (2009). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34(1), 115-130.

de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45(4), 343-362.

de la Torre, J., & Douglas, J. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333-353.

DeCarlo, L. T. (2012). Recognizing uncertainty in the Q-matrix via a Bayesian extension of the DINA model. Applied Psychological Measurement, 36(6), 447-468.

DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35(1), 8-26.

DiBello, L. V., Roussos, L. A., & Stout, W. F. (2007). Review of cognitively diagnostic assessment and a summary of psychometric models. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics, 26 (pp. 979-1030). Amsterdam: Elsevier.

Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191-210.

Hong, C. Y. (2013). Estimation of generalized DINA model with order restrictions. Master's thesis. Taipei, Taiwan: National Taiwan Normal University.

Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.

Kunina-Habenicht, O., Rupp, A. A., & Wilhelm, O. (2012).
The impact of model misspecification on parameter estimation and item-fit assessment in log linear diagnostic classification models. Journal of Educational Measurement, 49, 59-81. Maris,E. (1999). Estimating multiple classification latent class models. Psychometrika, 43.

(47) 64, 187-212. Rupp, A, A., & Templin, J. (2008). The Effects of Q-Matrix misspecification on parameter estimates and classification accuracy in the DINA Model. Educational and Psychological Measurement, 68, 76-96. Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354. von Davier, M. (2010). Hierarchical mixtures of diagnostic models. Psychological Test and Assessment Modeling, 52(1), 8-28. von Davier, M. (2005). A general diagnostic model applied to language testing data. (ETS Research Report RR-05-16). Wang, W, C. (2010). Compare the Parameters Estimated by DINA Model with by G-DINA Model. master thesis. Taiwan, Taichung: National Taichung University of Education.. 44.

(48)
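The item 26 example from the conclusion, in which dropping a required attribute merges latent cognitive groups, can be illustrated with a short Python sketch. This is not the code used in the study. The q-vectors are assumptions: the text only states that attribute 5 is the underspecified attribute and that the number of groups falls from eight to four, so the other two required attributes (1 and 4 here) are chosen purely for illustration. The helper `latent_groups` is a hypothetical name.

```python
from itertools import product


def latent_groups(q_vector, patterns):
    """Group attribute patterns by their values on the attributes an item requires.

    Patterns that agree on every required attribute fall into the same latent
    group and therefore share one probability of a correct response.
    """
    required = [k for k, q in enumerate(q_vector) if q == 1]
    groups = {}
    for alpha in patterns:
        key = tuple(alpha[k] for k in required)
        groups.setdefault(key, []).append(alpha)
    return groups


# All 2^5 = 32 attribute mastery patterns for five attributes.
patterns = list(product((0, 1), repeat=5))

# Hypothetical q-vectors for item 26 (assumed, not taken from the thesis):
q_true = (1, 0, 0, 1, 1)   # requires attributes 1, 4 and 5
q_under = (1, 0, 0, 1, 0)  # attribute 5 omitted (underspecification)

groups_true = latent_groups(q_true, patterns)
groups_under = latent_groups(q_under, patterns)

print(len(groups_true), len(groups_under))  # prints "8 4"

# The two patterns discussed in the text.
a = (1, 0, 0, 1, 0)
b = (1, 0, 0, 1, 1)

# Under the assumed true q-vector they belong to different latent groups ...
same_true = any(a in g and b in g for g in groups_true.values())
# ... but under the underspecified q-vector they are merged into one group.
same_under = any(a in g and b in g for g in groups_under.values())
```

Under these assumptions, `same_true` is `False` and `same_under` is `True`: once attribute 5 is dropped from the q-vector, the model has a single parameter for both patterns, which is why P26(10010) and P26(10011) are forced to be equal.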