Research Limitations and Future Research Directions

Chapter 5 Conclusions and Recommendations

National Chengchi University

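To stay close to real-world applications, we experimented with many missing-data scenarios and summarized the results by scenario to identify suitable value-filling strategies. We hope that, when companies build predictive models or classify a new record, they can obtain the important missing information at lower cost and as efficiently as possible, reducing bias in future predictions.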

Section 3 Research Limitations and Future Research Directions

5.3.1 Research Limitations

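1. No cost information was available for the data

The experimental data in this study were collected mainly from the UCI Machine Learning website. Because of how the data were collected, cost was not taken into account. As a result, when I Sampling is used to choose which attribute values to fill, the key attribute values it selects may well be expensive to acquire.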

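2. Few experiments on larger datasets

Because of limited experiment time, we did not try larger datasets, so we could not test or compare whether the proposed I Sampling is suitable for larger data volumes.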

5.3.2 Suggestions for Future Research

1. Consider the cost of acquiring information as well

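Future studies could look for datasets that include cost information and evaluate the cost of each missing value together with its importance. Weighting, or some other evaluation scheme, might then yield a value-filling strategy that is more efficient and closer to real-world conditions.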
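One possible way to weigh cost against importance is sketched below in Python. It is only an illustration, not the method used in this study: the function name `rank_attributes`, the `weight` parameter, the ratio-based score, and the toy importance and cost figures are all assumptions introduced here.

```python
def rank_attributes(importance, cost, weight=1.0):
    """Rank attributes by importance per unit of weighted acquisition cost.

    importance: dict mapping attribute name to an importance score (assumed given)
    cost:       dict mapping attribute name to a positive acquisition cost (assumed given)
    weight:     hypothetical exponent; 0 ignores cost, larger values penalize cost more
    """
    scores = {}
    for name, imp in importance.items():
        if cost[name] <= 0:
            raise ValueError(f"cost of {name!r} must be positive")
        scores[name] = imp / (cost[name] ** weight)
    # Highest score first: important and cheap attributes are filled first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up numbers purely for illustration.
importance = {"age": 0.20, "income": 0.35, "lab_test": 0.45}
cost = {"age": 1.0, "income": 5.0, "lab_test": 20.0}

print(rank_attributes(importance, cost))              # ['age', 'income', 'lab_test']
print(rank_attributes(importance, cost, weight=0.0))  # ['lab_test', 'income', 'age']
```

With `weight=0.0` the ranking falls back to importance alone, so the same function can compare a cost-blind ordering with a cost-aware one.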


2. Incorporate the idea of Error Sampling into I Sampling

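Consider the efficiency of the value-filling strategy. Suppose the budget covers only 22 values, but the key attribute selected by I Sampling has 40 missing values. Rather than choosing at random among the 40, future work could combine Error Sampling with I Sampling and first fill the values that belong to misclassified records (the Error Sampling idea).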
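The budgeted selection described above could be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation from this study: the function `select_cells_to_fill`, its arguments, and the set of misclassified record ids are hypothetical, and the sketch assumes a classifier has already marked which records it gets wrong.

```python
import random


def select_cells_to_fill(candidate_ids, misclassified_ids, budget, seed=0):
    """Choose up to `budget` records whose missing key-attribute value gets filled.

    Misclassified records come first (the Error Sampling idea); any budget
    left over is spent on randomly chosen remaining records.
    """
    rng = random.Random(seed)
    wrong = [i for i in candidate_ids if i in misclassified_ids]
    right = [i for i in candidate_ids if i not in misclassified_ids]
    rng.shuffle(wrong)   # random order within each group
    rng.shuffle(right)
    return (wrong + right)[:budget]


# 40 records miss the key attribute; 10 of them are assumed misclassified.
candidates = list(range(40))
misclassified = set(range(0, 40, 4))   # hypothetical ids
chosen = select_cells_to_fill(candidates, misclassified, budget=22)
print(len(chosen))                     # 22
```

In this example all 10 misclassified records are selected and the remaining 12 slots are filled at random, which matches the 22-out-of-40 scenario in the text.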

3. Improve the program's performance

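Running experiments that apply the I Sampling idea to larger-scale data would allow suitable ways of filling missing values to be identified more precisely.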

References

Foreign-Language References

1. Bennett, D. A. (2001). How can I deal with missing data in my study? Australian and New Zealand Journal of Public Health, 25(5), 464-469.

2. Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD Process for Extracting Useful Knowledge from Volumes of Data. Communications of the ACM, 39(11), 27-35.

3. Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Introducing Markov chain Monte Carlo. In Markov chain Monte Carlo in practice (pp. 1-19). London: Chapman & Hall/CRC.

4. Kohavi, R. (1995, August). A study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).

5. Levin, N., & Zahavi, J. (2001). Predictive modeling using segmentation. Journal of Interactive Marketing, 15(2), 2-22.

6. Lindenbaum, M., Markovitch, S., & Rusakov, D. (2004). Selective Sampling for Nearest Neighbor Classifiers. Machine Learning, 54(2), 125-152.

7. Lizotte, D. J., Madani, O., & Greiner, R. (2002, August). Budgeted learning of Naive-Bayes Classifiers. In Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence (pp. 378-385). Morgan Kaufmann Publishers Inc.


8. Melville, P., Saar-Tsechansky, M., Provost, F., & Mooney, R. (2004, November). Active Feature-Value Acquisition for Classifier Induction. In Proceedings of the 4th IEEE International Conference on Data Mining (pp. 483-486). Brighton, UK.

9. Peng, C. Y. J., Harwell, M., Liou, S. M., & Ehman, L. H. (2006). Advances in missing data methods and implications for educational research. In Real data analysis (pp. 31-78). North Carolina, US: Information Age Publishing.

10. Pyle, D. (1999). Data Preparation for Data Mining. Massachusetts: Morgan Kaufmann.

11. Quinlan, J. R. (1989, December). Unknown attribute values in induction. In ML (pp. 164-168).

12. Redman, T. C. (1996). Data quality for the information age. Massachusetts: Artech House, Incorporated.

13. Rubin, D. B. (1987). Multiple imputation for non-response in surveys. New York: John Wiley & Sons.

14. Saar-Tsechansky, M., Melville, P., & Provost, F. (2009). Active Feature-Value Acquisition. Management Science, 55(4), 664-684.

15. Schafer, J. L. (1999). Multiple imputation: a primer. Statistical methods in medical research, 8(1), 3-15.

16. Schlomer, G. L., Bauman, S., & Card, N. A. (2010). Best Practices for Missing Data Management in Counseling Psychology. Journal of Counseling Psychology, 57(1), 1-10.

17. Settles, B. (2010). Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin, Madison, 52, 55-66.


18. Simon, H. A., & Lea, G. (1974). Problem solving and rule induction: A unified view. Knowledge and cognition. Oxford, England: Lawrence Erlbaum.

19. Tong, S., & Koller, D. (2001, August). Active learning for structure in Bayesian networks. In International joint conference on artificial intelligence (Vol. 17, No. 1, pp. 863-869).

20. Vinod, N. C., & Punithavalli, D. M. (2011). Classification of Incomplete Data Handling Techniques-An Overview. International Journal on Computer Science and Engineering, 3(1), 340-344.

21. Zheng, Z., & Padmanabhan, B. (2002). On Active Learning for Data Acquisition. In Proceedings of IEEE International Conference on Data Mining (pp. 562-569).

22. Zhu, X., & Wu, X. (2005). Cost-Constrained Data Acquisition for Intelligent Data Preparation. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1542-1556.

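Chinese-Language References

1. 麥爾荀伯格、庫基耶 [Mayer-Schönberger, V., & Cukier, K.] (2013). 大數據 [Big Data] (1st ed.; 林俊宏, Trans.). Taipei: 天下文化. (Original work published 2013)

2. 王鴻龍、楊孟麗、陳俊如、林定香 (2012). 缺失資料在因素分析上的處理方法之研究 [A study of methods for handling missing data in factor analysis]. 教育科學研究期刊, 57(1), 29-50.

3. 吳元彰、沈永勝、楊鍵樵 (2007). 應用加權式灰關聯法與自動分群技術於遺失值填補問題 [Applying weighted grey relational analysis and automatic clustering to the missing-value filling problem]. 技術學刊, 22(1), 77-87.

4. 彼得杜拉克 [Drucker, P. F.] (1980). 動盪時代下的經營 [Managing in Turbulent Times] (1st ed.; 李辛模, Trans.). Taipei: 現代企業經營管理. (Original work published 1980)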


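5. 林惠玲、陳正倉 (2004). 統計學：方法與應用 [Statistics: Methods and Applications]. Taipei: 雙葉書廊.

6. 林曉芳 (2002). 以 Hot deck 插補法推估成就測驗之不完整作答反應 [Estimating incomplete responses on achievement tests with hot-deck imputation]. Unpublished doctoral dissertation, Educational Psychology and Counseling Division, Department of Education, National Chengchi University, Taipei.

7. 翁頌舜、梁德馨 (2002). 資料採礦資料缺值插補之變異數分析 [Analysis of variance of missing-value imputation in data mining]. 輔仁管理評論, 9(3), 163-180.

8. 馬芳資、林我聰 (2003). 決策樹形式知識之線上預測系統架構 [An online prediction system architecture for decision-tree knowledge]. 圖書館學與資訊科學, 29(2), 60-76.

9. 陳信木、林佳瑩 (1997). 調查資料之遺漏值的處置─以熱卡插補法為例 [Handling missing values in survey data: the case of hot-deck imputation]. 調查研究─方法與應用, 3, 75-106.

10. 黃齡葦 (2005). 遺失資料之多重插補法模擬比較 [A simulation comparison of multiple imputation methods for missing data]. Unpublished master's thesis, Graduate Institute of Agronomy, National Taiwan University, Taipei.

Web Resources

1. UCI Machine Learning Repository. (n.d.). Retrieved from https://archive.ics.uci.edu/ml/index.html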

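Source document: 預測模型的遺失值處理─選值順序的研究 (Missing-Value Handling for Predictive Models: A Study of Value-Selection Order), 政大學術集成 (NCCU Academic Hub), pp. 66-71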
