Chapter 5 Experiments and Results
5.3 Analysis of Experimental Results
5.3.2 Execution Speed Analysis
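This subsection examines execution speed. As before, we compare our method against NaiveBayes, Logistic, JRip, and DecisionTree. As Table 5.1 shows, our method takes the longest. There are two reasons: the rules we learn are relatively fragmented, and our program runs on a single thread, so building the model is considerably slower than with the methods provided by Weka, which generally make use of multithreading. In future work we plan to reimplement our method with multiple threads.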
[Figure: Precision or Recall value versus the multiple-support weight parameter]
Chapter 6 Conclusions and Future Work
6.1 Conclusions
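In this study we addressed the problem of predicting student suspension and dropout at a university and built a prediction system for it. The system can effectively predict, at the end of each semester, whether an enrolled student will suspend or drop out, or will remain enrolled. It can also analyze the mined classification rules and output the most important ones as a reference for decision makers. Within this system we developed a new classification method called GACMS. The method builds on associative classification and adds a multiple-minimum-support mechanism. This addresses the problem that classification conditions which occur rarely but matter a great deal are filtered out by a single support threshold, so more useful classification information is retained through model generation and the prediction accuracy of the model improves. The method also uses an expert-defined hierarchy tree, which gives scattered but useful rules a chance to be consolidated into classification rules of sufficient strength, and it prunes redundant conditions from the rules of the trained model so that the rules become more compact.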
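The effect of per-item minimum supports can be illustrated with a small sketch. It assumes the assignment scheme of Liu et al. [13], MIS(i) = max(beta * f(i), LS), where f(i) is the relative frequency of item i, beta is a weight parameter, and LS is the lowest allowed support; the exact formula used in GACMS may differ. The function names and the toy records below are hypothetical and are not taken from the experimental data.

```python
from collections import Counter


def item_min_supports(transactions, beta, lowest_support):
    """Assign each item its own minimum support: max(beta * f(i), lowest_support)."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    return {item: max(beta * c / n, lowest_support) for item, c in counts.items()}


def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def is_frequent(transactions, itemset, mis):
    """An itemset is frequent if its support reaches the lowest MIS among its items."""
    return support(transactions, itemset) >= min(mis[i] for i in itemset)


# Hypothetical toy records: ten students, each a set of conditions plus an outcome.
transactions = (
    [{"low_gpa", "dropout"}] * 2
    + [{"low_gpa", "enrolled"}] * 2
    + [{"high_gpa", "enrolled"}] * 5
    + [{"family_crisis", "high_gpa", "dropout"}]
)
rare = {"family_crisis", "dropout"}

kept_single = support(transactions, rare) >= 0.2   # one global threshold
mis = item_min_supports(transactions, beta=0.5, lowest_support=0.05)
kept_multi = is_frequent(transactions, rare, mis)  # item-specific thresholds

print(kept_single, kept_multi)
```

In this toy data the itemset {family_crisis, dropout} has support 0.1. A single threshold of 0.2 discards it, whereas its item-specific threshold of 0.05 retains it, so the script prints False True.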
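In the experiments, our method was slower than the other algorithms. However, when the rules we obtained were evaluated on the test data, they achieved far higher precision than the rules produced by the other methods.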
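In summary, the proposed GACMS method is well suited to problems in which the frequencies of the conditions in the data set vary widely and rich hierarchical classification information is available. Beyond the study of student suspension and dropout, for example, it could be used to study students' future employment, or applied in other domains to mine useful classification rules.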
6.2 Future Work
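The prediction of student suspension and dropout has been studied extensively, but to our knowledge applying associative classification to it is a new attempt. Building on this study, we plan to continue research in the following directions: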
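(1) In this study we found that some suspension and dropout rules are temporal. For example, more than 60% of the students who dropped out had previously suspended their studies. Adding a mining method for temporal classification rules could uncover rules of this kind and improve the accuracy of the classification model.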
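(2) In the multiple-support association rule mining used in this thesis, the weight parameter plays an important role. One study [25] proposes adjusting this parameter according to confidence. In the future this approach could be consulted to find a better way of tuning the parameter and thereby improve the speed of model training.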
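(3) Because this study uses multiple-support association rule mining, much useful information is retained, but computation is correspondingly much slower. Effective use of multitasking should resolve this. How to partition the proposed method into multiple threads that run without interfering with one another is therefore another direction for future research. The most feasible place at present is the rule sorting and pruning step of GACMS: since each rule is sorted and pruned as soon as it is produced, this step is a strong candidate for parallelization, provided the threads can be kept from contending for resources.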
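The partitioning idea in item (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GACMS implementation: the pruning criterion (drop a condition when doing so does not lower the rule's confidence on the training records), the rule representation, and all function names are hypothetical. Each worker prunes one rule and only reads the shared records, so no locking is needed; the final sort is done once in the calling thread.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def confidence(conditions, label, records):
    """Share of records matching all conditions that also carry the label."""
    covered = [r for r in records if conditions <= r["items"]]
    if not covered:
        return 0.0
    return sum(r["label"] == label for r in covered) / len(covered)


def prune_rule(rule, records):
    """Drop conditions whose removal does not lower confidence (hypothetical criterion)."""
    conditions, label = rule
    best = confidence(conditions, label, records)
    for cond in sorted(conditions):
        trial = conditions - {cond}
        if trial:  # never prune a rule down to zero conditions
            trial_conf = confidence(trial, label, records)
            if trial_conf >= best:
                conditions, best = trial, trial_conf
    return conditions, label, best


def prune_and_rank(rules, records, workers=4):
    """Prune rules in parallel, then rank by confidence and by fewer conditions."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pruned = list(pool.map(partial(prune_rule, records=records), rules))
    return sorted(pruned, key=lambda r: (-r[2], len(r[0])))


# Hypothetical toy training records and candidate rules.
records = [
    {"items": {"low_gpa", "male"}, "label": "dropout"},
    {"items": {"low_gpa", "female"}, "label": "dropout"},
    {"items": {"high_gpa", "male"}, "label": "enrolled"},
    {"items": {"high_gpa", "female"}, "label": "enrolled"},
]
rules = [
    (frozenset({"low_gpa", "male"}), "dropout"),
    (frozenset({"male"}), "dropout"),
    (frozenset({"high_gpa"}), "enrolled"),
]
ranked = prune_and_rank(rules, records)
```

Here the first rule loses its redundant condition "male" and is ranked ahead of the weaker rule that relies on "male" alone. Note that in CPython, threads do not speed up CPU-bound pure-Python work because of the global interpreter lock; concurrent.futures.ProcessPoolExecutor offers the same map interface and may be the more suitable choice in that setting.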
References
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207-216, 1993.
[2] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conferences on Very Large Data Bases, pp. 487-499, 1994.
[3] F. Araque, C. Roldan, and A. Salguero, “Factors influencing university drop out rates,” Computers & Education, vol. 53, pp. 563–574, 2009.
[4] G.W. Dekker, M. Pechenizkiy, and J.M. Vleeshouwers, “Predicting students drop out: A case study,” in Proceedings of the 2nd International Conference on Educational Data Mining, pp. 41–50, 2009.
[5] M. Feng, N. Heffernan, and K. Koedinger, “Looking for sources of error in predicting student’s knowledge,” in Proceedings of AAAI Workshop on Education Data Mining, pp. 1–8, 2005.
[6] J. Han and Y. Fu, “Discovery of multiple-level association rules from large databases,” in Proceedings of the 21st International Conference on Very Large Data Bases, pp. 420-431, 1995.
[7] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1-12, 2000.
[8] S. Kotsiantis, “Educational data mining: A case study for predicting dropout-prone students,” International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 1, no. 2, pp. 101–111, 2009.
[9] S. Kotsiantis, K. Patriarcheas, and M. Xenos, “A combinational incremental ensemble of classifiers as a technique for predicting students performance in distance education,” Knowledge-Based Systems, vol. 23, no. 6, pp. 529–535, 2010.
[10] S.B. Kotsiantis and P.E. Pintelas, “Predicting students’ marks in Hellenic Open University,” in Proceedings of IEEE International Conference on Advanced Learning Technologies, pp. 664–668, 2005.
[11] W. M. Li, J. W. Han, and J. Pei, “CMAR: Accurate and efficient classification based on multiple class-association rules,” in Proceedings of IEEE International Conference on Data Mining, pp. 369-376, 2001.
[12] B. Liu, W. Hsu, and Y. Ma, “Integrating classification and association rule mining,” in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 80–86, 1998.
[13] B. Liu, W. Hsu, and Y. Ma, “Mining association rules with multiple minimum supports,” in Proceedings of SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 337-341, 1999.
[14] C. L. Lui and F. L. Chung, “Discovery of generalized association rules with multiple minimum supports,” in Proceedings of 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 510-515, 2000.
[15] I. Lykourentzou, I. Giannoukos, V. Nikolopoulos, G. Mpardis, and V. Loumos, “Dropout prediction in elearning courses through the combination of machine learning techniques,” Computers & Education, vol. 53, pp. 950–965, 2009.
[16] C. Márquez-Vera, A. Cano, C. Romero, and S. Ventura, “Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data,” Applied Intelligence, vol. 38, pp. 315-330, 2013.
[17] C. Márquez-Vera, C. Romero, and S. Ventura, “Predicting school failure using data mining,” in Proceedings of 4th International Conference on Educational Data Mining, pp. 271-276, 2011.
[18] D. Martinez, “Predicting student outcomes using discriminant function analysis,” in Proceedings of Annual Meeting of the Research and Planning Group, pp. 163–173, 2001.
[19] G. Mendez, T.D. Buskirk, S. Lohr, and S. Haag, “Factors associated with persistence in science and engineering majors: An exploratory study using classification trees and random forests,” Journal of Engineering Education, vol. 97, no. 1, pp. 57-70, 2008.
[20] A. Parker, “A study of variables that predict dropout from distance education,” International Journal of Educational Technology, vol. 1, no. 2, pp. 1–11, 1999.
[21] M.N. Quadril and N.V. Kalyankar, “Drop out feature of student data for academic performance using decision tree techniques,” Journal of Computer Science and Technology, vol. 10, pp. 2–5, 2010.
[22] C. Romero and S. Ventura, “Educational data mining: A review of the state of the art,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 40, no. 6, pp. 601-618, 2010.
[23] R. Srikant and R. Agrawal, “Mining generalized association rules,” in Proceedings of the 21st International Conference on Very Large Data Bases, pp. 407-419, 1995.
[24] J.F. Superby, J.P. Vandamme, and N. Meskens, “Determination of factors influencing the achievement of the first year university students using data mining methods,” in Proceedings of AAAI Workshop on Educational Data Mining, pp. 1–8, 2006.
[25] M.C. Tseng and W.Y. Lin, “Maintenance of generalized association rules with multiple minimum supports,” Intelligent Data Analysis, vol. 8, pp. 417-436, 2004.
[26] M.C. Tseng and W.Y. Lin, “Efficient mining of generalized association rules with non-uniform minimum support,” Data and Knowledge Engineering, vol. 62, no. 1, pp. 41-64, 2007.
[27] T.Y. Tang and G. McCalla, “Student modeling for a web-based learning environment: A data mining approach,” in Proceedings of the 18th National Conference on Artificial Intelligence, pp. 967–968, 2002.
[28] W. Veitch, “Identifying characteristics of high school dropouts: Data mining with a decision tree model,” in Proceedings of Annual Meeting of the American Educational Research Association, pp. 1–11, 2004.
[29] L. Wegner, A.J. Flisher, P. Chikobvu, C. Lombard, and G. King, “Leisure boredom and high school dropout in Cape Town, South Africa,” Journal of Adolescence, vol. 31, pp. 421–431, 2008.
[30] M.V. Yudelson, O. Medvedeva, E. Legowski, M. Castine, D. Jukic, and D. Rebecca, “Mining student learning data to develop high level pedagogic strategy in a medical ITS,” in Proceedings of AAAI Workshop on Educational Data Mining, pp. 1–8, 2006.
[31] ETToday (東森新聞雲), “Ministry of Education to launch a plan in ROC year 102 (2013) to merge six national universities,” http://www.ettoday.net/news/20121119/129267.htm
[32] P.N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Pearson, 2005, pp. 373-374.