6.2 Generative Adversarial Networks
6.2.2 Experimental Results of the Generative Adversarial Network
Model / Dataset    IMDB      20Newsgroups
MLP                74.35%    65.48%
GAN                75.61%    66.83%
Table 6.3 Experimental results of the generative adversarial network
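Table 6.3 presents the results of the generative adversarial network (GAN) experiments. The GAN improves on the MLP baseline by 1.26 percentage points on IMDB (75.61% vs. 74.35%) and by 1.35 percentage points on 20Newsgroups (66.83% vs. 65.48%), but it does not perform as well as the Siamese neural network discussed earlier. We believe the main reason is that the dimension of the noise input is a critical setting: how the generative model makes use of this noise cannot be controlled, so both the training process and the results of the GAN are currently hard to control. To stabilize GAN training, many researchers have since proposed improved models and theoretical analyses.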
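The paragraph above attributes the GAN's instability to how the generator uses its noise input. As a minimal, hypothetical sketch of where that noise dimension enters training, the code below runs the standard alternating GAN updates (discriminator ascent on log D(real) + log(1 − D(fake)), then the non-saturating generator loss log D(G(z))) on a 1-D toy problem. Everything here is an assumption for illustration and not the thesis's text model: the Gaussian "real" data, the linear generator G(z) = a·z + b, the logistic-regression discriminator, the hyperparameters, and the names `train_toy_gan`, `sample` and `noise_dim`.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(noise_dim, steps=1500, lr=0.05, batch=32, seed=0):
    """Alternating GAN updates on a hypothetical 1-D toy problem.

    Real data:     x ~ N(4, 1.25^2)
    Generator:     G(z) = a . z + b, with z ~ N(0, I) of size noise_dim
    Discriminator: D(x) = sigmoid(w * x + c)
    """
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 0.5) for _ in range(noise_dim)]
    b = 0.0
    w, c = 0.0, 0.0
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.25) for _ in range(batch)]
        zs = [[rng.gauss(0.0, 1.0) for _ in range(noise_dim)] for _ in range(batch)]
        fake = [sum(ai * zi for ai, zi in zip(a, z)) + b for z in zs]

        # Discriminator: gradient ascent on mean log D(real) + mean log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            r = 1.0 - sigmoid(w * x + c)  # d/ds log sigmoid(s)
            gw += r * x
            gc += r
        for x in fake:
            r = -sigmoid(w * x + c)  # d/ds log(1 - sigmoid(s))
            gw += r * x
            gc += r
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator: non-saturating loss, gradient ascent on mean log D(G(z)).
        ga = [0.0] * noise_dim
        gb = 0.0
        for z, x in zip(zs, fake):
            dx = (1.0 - sigmoid(w * x + c)) * w  # d/dx log sigmoid(w * x + c)
            for i in range(noise_dim):
                ga[i] += dx * z[i]
            gb += dx
        a = [ai + lr * g / batch for ai, g in zip(a, ga)]
        b += lr * gb / batch
    return a, b


def sample(a, b, n, seed=1):
    """Draw n generator outputs G(z) = a . z + b."""
    rng = random.Random(seed)
    return [sum(ai * rng.gauss(0.0, 1.0) for ai in a) + b for _ in range(n)]


if __name__ == "__main__":
    # Compare the same training loop under two noise dimensions.
    for k in (1, 16):
        a, b = train_toy_gan(noise_dim=k)
        xs = sample(a, b, 2000)
        m = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
        print(f"noise_dim={k:2d}  mean={m:.2f}  std={sd:.2f}")
```

Changing `noise_dim`, `lr` or `seed` and rerunning is a cheap way to see how sensitive even this tiny setup is. Because the toy discriminator is linear in x, it can only signal where the real data lie, not how spread out they are, so the generated standard deviation is not matched; real text GANs use far richer networks on both sides.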
Chapter 7 Conclusions and Future Work
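This thesis proposes using Siamese neural networks and generative adversarial networks to learn representations better suited to automatic text classification. With the Siamese neural network, the model learns the topical association between a document and a category, which effectively improves performance on the automatic text classification task.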
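As a hypothetical illustration of learning the association between documents and categories, one common way to use a Siamese network for classification is to encode the document and a short description of each category with the same weight-shared sub-network, then pick the category whose encoding is most similar. The sketch below assumes a toy embedding table, a mean-pooling encoder, cosine similarity, hand-written class descriptions, and a margin-based contrastive pair loss written on similarity; none of these are the thesis's actual configuration, and the names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy word embeddings; in a real model these (and any layers on
# top) are the shared weights learned by the Siamese training objective.
EMBED: Dict[str, Vector] = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "team": [0.7, 0.0, 0.2],
    "vote": [0.0, 0.9, 0.1],
    "election": [0.1, 0.8, 0.0],
    "senate": [0.0, 0.7, 0.3],
    "gpu": [0.1, 0.0, 0.9],
    "kernel": [0.0, 0.2, 0.8],
}


def encode(tokens: Sequence[str]) -> Vector:
    """Shared sub-network: mean of known word vectors (same weights for both inputs)."""
    dim = len(next(iter(EMBED.values())))
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def classify(doc: Sequence[str], classes: Dict[str, Sequence[str]]) -> str:
    """Return the class whose encoded description is most similar to the document."""
    d = encode(doc)
    return max(classes, key=lambda name: cosine(d, encode(classes[name])))


def contrastive_loss(sim: float, same: bool, margin: float = 0.5) -> float:
    """Pair loss on similarity: pull matching pairs toward 1, push others below margin."""
    return (1.0 - sim) ** 2 if same else max(0.0, sim - margin) ** 2


# Hypothetical category descriptions, encoded by the same sub-network as documents.
CLASSES = {
    "sports": ["team", "match"],
    "politics": ["election", "vote"],
    "computing": ["gpu", "kernel"],
}

if __name__ == "__main__":
    print(classify(["goal", "team", "unknownword"], CLASSES))  # prints: sports
```

During training, `contrastive_loss` would be applied to (document, category) pairs labelled as matching or not, so that gradients shape the shared encoder; the sketch only shows the loss and the inference step.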
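In future work, for the Siamese neural network we will try more complex sub-network architectures and study the relationship between the sub-network and the similarity function. For the generative adversarial network, we will try other adversarial network variants, compare their differences, and aim to build an architecture dedicated to automatic text classification.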