
CHAPTER 7 CONCLUSIONS AND FUTURE WORK


we concluded that the concept drifts in the TAIEX Futures data are not gradual or incremental. To identify reoccurring concept drifts, we proposed a method based on the evolving ensemble method and used it to observe the relative weights of the sub-classifiers.

The experimental results show that concept drift occurs frequently and that reoccurring concept drift exists. Furthermore, our method achieves higher accuracy than AUE on the TAIEX Futures data. Our method has another advantage: it can adaptively adjust the chunk size. It is therefore well suited to streaming data collected from a dynamic environment.

7.2 Future Work

There are many possible directions in which to extend the work presented in this thesis. First, we used only transaction records for classification, and additional data sources may lead to better results. Hence, data from multiple sources can be combined to broaden the base of data samples. Feature selection will become more important when multiple data sources are considered, and how to retrieve associations among the attributes of different data sources is a practical problem. Alternatively, the problem can be modeled as a multi-class classification problem or as a regression problem. Another issue that deserves attention is how to explain the associations between the time points of concept drift observed in the experimental results and the trend of the stock in the real world, including an investigation of reoccurring concept drift. This also includes exploring ways to incorporate our methods into the creation of adaptive ensembles. For the purpose of profit, a cost-sensitive strategy is worth considering; however, the cost must be defined from the data. Domain experts may be interested in the rules of the models. Observing the rule sets at various times and comparing them to locate concept drift may help domain experts understand the problem and make decisions.
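As a sketch of the cost-sensitive strategy mentioned above, a classifier can output the class with the lowest expected cost instead of the most probable class. The cost values, the class labels "up" and "down", and the function name `min_expected_cost_class` are hypothetical and chosen only for illustration; in practice the costs would have to be derived from the data.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected misclassification cost.

    probs: dict mapping each actual class to its estimated probability.
    cost:  dict mapping (predicted, actual) pairs to a cost.
    Both inputs are illustrative assumptions, not values from the thesis.
    """
    def expected_cost(predicted):
        return sum(p * cost[(predicted, actual)] for actual, p in probs.items())

    return min(probs, key=expected_cost)


# Hypothetical costs: wrongly predicting "up" is five times as costly
# as wrongly predicting "down".
cost = {
    ("up", "up"): 0.0, ("up", "down"): 5.0,
    ("down", "up"): 1.0, ("down", "down"): 0.0,
}
probs = {"up": 0.7, "down": 0.3}
print(min_expected_cost_class(probs, cost))  # down (expected cost 0.7 against 1.5)
```

With these assumed costs the less probable class is chosen, because a wrong "up" prediction is penalized more heavily than a wrong "down" prediction.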


To deal with reoccurring concept drift, we designed the ADE algorithm, which classifies by weighted majority vote. However, selecting the result of the most appropriate sub-classifier, the one that reflects the current concept, may be a better way to classify. The simplest way to select is to choose the sub-classifier with the highest weight, although there may be much better ways to select the sub-classifier. Moreover, we found that ADE generated a few similar sub-classifiers. Similar sub-classifiers reduce the advantage in dealing with reoccurring concept drift because the number of active sub-classifiers is limited. Hence, combining similar sub-classifiers, or avoiding generating them, would be helpful.
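The two combination rules contrasted above can be sketched as follows. This is a minimal illustration, not the ADE implementation: the function names `weighted_majority_vote` and `select_top_weighted`, and the representation of sub-classifier outputs as a list of (label, weight) pairs, are assumptions made for this example.

```python
from collections import defaultdict


def weighted_majority_vote(votes):
    """Return the label with the largest total weight.

    votes: list of (label, weight) pairs, one per sub-classifier
    (a hypothetical representation chosen for this sketch).
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return max(totals, key=totals.get)


def select_top_weighted(votes):
    """Return the label predicted by the single highest-weighted sub-classifier."""
    label, _ = max(votes, key=lambda pair: pair[1])
    return label


# Three sub-classifiers: the highest-weighted one disagrees with the other two.
votes = [("up", 0.5), ("down", 0.3), ("down", 0.3)]
print(weighted_majority_vote(votes))  # down (0.6 against 0.5)
print(select_top_weighted(votes))     # up
```

The example shows a case where the two rules disagree: two lower-weighted sub-classifiers outvote the highest-weighted one under the majority rule, whereas the selection rule follows the highest-weighted sub-classifier alone.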
