5. Simulation study
5.2 Simulation results
For each case, the results of the simulation study are reported in five tables: the average parameter estimates (two tables), the average conditional probabilities, the average latent prevalences, and the average correlation coefficients, each computed over 100 replications. We explain these results below. The results for the 3-class model with sample size 100 are presented in Tables 7 to 11; for the 3-class model with sample size 500, in Tables 12 to 16; for the 6-class model with sample size 300, in Tables 17 to 21; for the 6-class model with sample size 1000, in Tables 22 to 26; for the 2-class model with sample size 150, in Tables 27 to 31; and for the 2-class model with sample size 750, in Tables 32 to 36. Tables 7 to 36 show that the results of the k-means method using the correlation coefficient measure are similar to those of the k-means method using the covariance measure. We therefore omit the results of the k-means method using the covariance measure from the following discussion.
First, we discuss the simulation results for the 3-class model with sample size 500, which are presented in Tables 12 to 16.
Average parameter estimates:
In Table 12 and Table 13, the column “TRUE” lists all of the parameters $\{\beta_{pj}, \gamma_{jmk}, \alpha_{qmk}\}$ used in the simulated data. The averages of the estimates of $\{\beta_{pj}, \gamma_{jmk}, \alpha_{qmk}\}$ obtained from the k-means method using the correlation coefficient measure (K_Corr) and the covariance measure (K_Cova), and from the hierarchical method using the covariance measure (H_Cova), are also shown in Table 12 and Table 13. The correlation coefficient measure is not used for the hierarchical method because, at the initial stage, the correlation coefficient for two objects is always one. Table 12 and Table 13 show that the parameter estimates obtained from the k-means method are reasonably close to the true parameters, whereas the estimates obtained from the hierarchical procedure are poor. The poor performance of the hierarchical procedure results from the fact that it makes no provision for reallocating objects that may have been “incorrectly” grouped at an early stage. Furthermore, the hierarchical procedure is sensitive to cluster structure: it has a chance of performing well only when the cluster structure is clear.
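The reallocation step that distinguishes k-means from the hierarchical procedure can be illustrated with a minimal sketch. It is not the procedure used in this study: the function name kmeans_1d, the one-dimensional toy data, and the use of absolute distance in place of the correlation coefficient or covariance measure are all assumptions made for illustration.

```python
def kmeans_1d(points, initial_centers, max_iter=100):
    """Lloyd-style k-means on 1-D data; returns (labels, centers).

    Illustrative only: absolute distance stands in for the paper's
    correlation/covariance measures.
    """
    centers = list(initial_centers)  # copy so the caller's list is not mutated
    labels = None
    for _ in range(max_iter):
        # Reallocation step: every object may move to its nearest centre,
        # regardless of where it was placed in earlier passes.
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: recompute each centre as the mean of its members.
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


# Hypothetical toy data with a deliberately poor starting configuration.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels, centers = kmeans_1d(points, initial_centers=[1.0, 1.2])
print(labels, centers)
```

In this toy run the object 1.2 is first grouped with the objects near 5 because of the poor initial centres, and it is moved to the other cluster on the second pass. An agglomerative hierarchical procedure has no comparable step once two objects have been merged.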
Table 12 and Table 13 also include the standard errors of the parameter estimates from the multinomial regressions, (4.1) and (4.2), and the average sample standard errors of the parameter estimates over 100 replications. The sample standard errors over 100 replications reflect two sources of variation: the multinomial regression itself and the creation of cluster membership. Because the multinomial regression estimates the parameters under the assumption that cluster membership is known, its standard errors do not include the variation due to creating cluster membership. Therefore, the standard errors from the multinomial regressions should be smaller than the sample standard errors over 100 replications. Table 12 and Table 13 confirm this. Table 7 and Table 8 do not, however, for the 3-class model with sample size 100, which provides very few individuals per parameter. For such sparse data, the estimated standard errors from the multinomial regressions are not accurate, so they are not always smaller than the sample standard errors over 100 replications.
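The two kinds of standard error compared above can be sketched for a single parameter as follows. This is an illustrative summary only: the function name summarize_replications and the numbers are hypothetical and are not taken from Tables 7 to 36, and the sample standard error is assumed to be the usual sample standard deviation of the estimates across replications.

```python
from statistics import mean, stdev


def summarize_replications(estimates, model_ses):
    """Summarize one parameter across simulation replications.

    estimates: the parameter estimate from each replication.
    model_ses: the regression-based standard error from each replication,
               computed as if cluster membership were known.
    Returns (average estimate, average model-based SE, sample SE), where the
    sample SE is the standard deviation of the estimates across replications.
    """
    return mean(estimates), mean(model_ses), stdev(estimates)


# Hypothetical values for five replications (not results from this study).
estimates = [0.9, 1.1, 1.0, 1.2, 0.8]
model_ses = [0.10, 0.12, 0.11, 0.10, 0.12]

avg_est, avg_model_se, sample_se = summarize_replications(estimates, model_ses)
print(avg_est, avg_model_se, sample_se)
```

With these made-up numbers the average model-based standard error is smaller than the sample standard error, which is the pattern expected when the clustering step contributes additional variation.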
Average conditional probabilities:
Table 14, under the column “TRUE”, displays the RLCA conditional probabilities evaluated at the sample means of the incorporated covariates:
The averages of the estimated conditional probabilities over 100 replications for the k-means (K_Corr and K_Cova) and hierarchical (H_Cova) methods are also shown in Table 14. The estimated conditional probabilities for the k-means and hierarchical methods are:
Overall, the average conditional probabilities for the k-means method are closer to the true conditional probabilities than those for the hierarchical method.
Average latent prevalence:
Table 15 under the column “TRUE” displays the sample average of the RLCA prevalences:
The averages of the estimated prevalences over 100 replications for the k-means (K_Corr and K_Cova) and hierarchical (H_Cova) methods are also shown in Table 15.
The estimated prevalences are:
Overall, the average latent prevalences for the k-means method are closer to the true latent prevalences than those for the hierarchical method.
Average correlation coefficients:
We evaluated $MCov_k$ for the objects in the same cluster $k$. Table 16 displays the average of $MCov_k$ over 100 replications for each cluster $k$. As expected, the k-means method resulted in smaller average correlation coefficients than the hierarchical method.
Next, we discuss the simulation results for the 6-class model with sample size 1000, which are presented in Tables 22 to 26. These tables show that the results of both the k-means procedure and the hierarchical procedure are clearly poorer than those for the 3-class model with sample size 500. Figure 3 shows the dendrogram of the hierarchical procedure for the 6-class model with sample size 1000. The dendrogram indicates the cluster structure in this model. We therefore conjecture that the objects in the 6-class model with sample size 1000 should be divided into two clusters, which is demonstrated by the 2-class model. For the 2-class model with sample size 750, the simulation results, which are presented in Tables 32 to 36, are much better than those for the 6-class model with sample size 1000.
Returning to Figure 3, the hierarchical procedure produces inversions (Morgan, Byron and Andrew P.G., 1995). An inversion occurs when an object joins a cluster at a smaller covariance than that of a previous consolidation.
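The definition of an inversion can be turned into a simple check on the sequence of levels at which consolidations occur. The sketch below is a simplified illustration, not the procedure used to produce Figure 3: the function name find_inversions and the merge levels are hypothetical, and the levels are assumed to be listed in the order in which the consolidations were made.

```python
def find_inversions(merge_levels):
    """Return the indices of consolidations that are inversions.

    merge_levels[i] is the level (here, a covariance-type criterion) at
    which the i-th consolidation occurred. A step is flagged when its level
    is smaller than that of some earlier consolidation.
    """
    inversions = []
    highest_so_far = float("-inf")
    for step, level in enumerate(merge_levels):
        if level < highest_so_far:
            inversions.append(step)  # joined below an earlier consolidation
        else:
            highest_so_far = level
    return inversions


# Hypothetical merge levels (not taken from Figure 3).
levels = [0.10, 0.25, 0.20, 0.40, 0.35]
inversion_steps = find_inversions(levels)
print(inversion_steps)  # [2, 4]
```

A dendrogram without inversions has non-decreasing merge levels, in which case the function returns an empty list.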
When maximum likelihood is used to estimate the parameters in (3.2), the quality of the maximum likelihood estimate (MLE) depends on the number of individuals per parameter. For sparse data, which provide fewer individuals per parameter, the MLE either cannot be obtained or is not a good estimate. For the three models that provide fewer individuals per parameter (the 3-class RLCA with sample size 100, the 6-class RLCA with sample size 300, and the 2-class RLCA with sample size 150), the simulation results are not worse than those for the models that provide more individuals per parameter.
This demonstrates that our clustering procedure is insensitive to the number of individuals per parameter.