Conclusions and Future Work - An Online Learning Method Based on the Chinese Restaurant Process

5.1 Research Summary

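In most big-data problems, neither the classes present in the data nor their number is known in advance, and this information cannot be obtained from expert knowledge or through experiments either. Nonparametric methods are therefore better suited to big data than machine learning methods with a fixed number of parameters. In our experiments with large data volumes, the proposed Online CRP matched the classification performance of supervised learning methods and also ran faster than many of the compared methods, which supports the conclusion that it can handle big-data problems accurately and efficiently.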
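To illustrate why a CRP-based model does not need the number of classes fixed in advance, here is a minimal Python sketch of the Chinese restaurant process prior alone. Each new customer (data point) joins an existing table (cluster) with probability proportional to the number of customers already there, or opens a new table with probability proportional to the concentration parameter alpha. This is not the thesis's Online CRP classifier, which also uses the data likelihood and labeled data; the function names `crp_seating_probs` and `simulate_crp` are hypothetical.

```python
import random
from typing import List


def crp_seating_probs(counts: List[int], alpha: float) -> List[float]:
    """CRP seating probabilities for the next customer.

    counts[k] is the number of customers already at table k. The result has
    len(counts) + 1 entries: one per existing table, then one for a new table.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = sum(counts)  # customers already seated
    denom = n + alpha
    probs = [c / denom for c in counts]  # join table k with prob n_k / (n + alpha)
    probs.append(alpha / denom)  # open a new table (a new cluster)
    return probs


def simulate_crp(num_customers: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat customers one at a time and return each customer's table index."""
    rng = random.Random(seed)
    counts: List[int] = []
    assignments: List[int] = []
    for _ in range(num_customers):
        probs = crp_seating_probs(counts, alpha)
        table = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if table == len(counts):
            counts.append(1)  # the last option means "new table"
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments
```

For example, `crp_seating_probs([3, 1], alpha=1.0)` gives the probabilities 0.6, 0.2 and 0.2. Because a new table always has nonzero probability, the number of clusters in `simulate_crp` keeps growing as more data arrive; its expected value grows roughly like alpha·ln(1 + n/alpha).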

5.2 Future Work

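Most existing research on graphical models in an online setting studies only how topics evolve over time, and few studies examine how labeled data can be used to adjust the model's parameter estimates. Future work could therefore use labeled data to explore combining other graphical models with online learning.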

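In this thesis, labeled data are incorporated into the graphical model through a single random variable that represents the label. A future direction is to design additional random variables into the graphical model, for example to incorporate Universum data, so that they influence the estimation of the system's probability parameters.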

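The proposed method grows and trains itself dynamically and accommodates real-time learning and expansion into new domains, so integrating it into a practical application system is another direction worth pursuing. In addition, the parameters change continually as new training data are introduced, and Figure 4.5-1 shows that the parameter changes caused by adding new training data do not necessarily improve performance. Future work could therefore borrow the idea of the Average Perceptron: record all or some of the parameter snapshots, and use the average of the probabilities predicted under those snapshots as the final prediction.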
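The snapshot-averaging idea above can be sketched in Python as follows. This is an assumption-laden illustration, not the thesis's design: `SnapshotAverager`, `max_snapshots` and the toy model `softmax_predict` are hypothetical, and the model-specific `predict_proba(params, x)` function stands in for however Online CRP turns its parameters into class probabilities. Note that the classic averaged perceptron averages the weight vectors themselves; this sketch instead follows the text and averages the predicted probabilities.

```python
import math
from collections import deque
from typing import Any, Callable, Deque, List, Optional


class SnapshotAverager:
    """Average class probabilities predicted under recorded parameter snapshots.

    predict_proba(params, x) is a hypothetical model-specific function that
    returns one probability per class. max_snapshots=None keeps every snapshot
    ("all"); an integer keeps only the most recent ones ("part").
    """

    def __init__(self, predict_proba: Callable[[Any, Any], List[float]],
                 max_snapshots: Optional[int] = None) -> None:
        self.predict_proba = predict_proba
        self.snapshots: Deque[Any] = deque(maxlen=max_snapshots)

    def record(self, params: Any) -> None:
        # Pass a copy, so later in-place parameter updates do not change it.
        self.snapshots.append(params)

    def predict(self, x: Any) -> List[float]:
        if not self.snapshots:
            raise ValueError("no parameter snapshots recorded")
        per_snapshot = [self.predict_proba(p, x) for p in self.snapshots]
        k = len(per_snapshot)
        # Average each class's probability across snapshots.
        return [sum(col) / k for col in zip(*per_snapshot)]


def softmax_predict(weights: List[float], x: float) -> List[float]:
    """Hypothetical toy model: class c has score weights[c] * x."""
    scores = [w * x for w in weights]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

A usage pattern would be to call `record(list(weights))` after each online update and `predict(x)` at query time. Keeping only recent snapshots bounds memory on long data streams, at the cost of averaging over a shorter history.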

