Chapter 5 Conclusions and Future Directions
Section 2 Future Directions
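This study builds news topic threads through news topic detection and tracking (TDT) and network linking, and extracts news sentiment to examine its relationship with stock prices. Several limitations and directions for improvement remain: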
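(1) More diverse data sources:
The news in this study comes only from the Central News Agency (中央社) and is restricted to economic news, so the computed sentiment may not give a comprehensive view for sentiment analysis. Future work could write crawlers suited to more news websites, and even extend to social networking sites, so that the data sources are more complete and better reflect public opinion in the wider environment.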
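(2) Improving the effectiveness of news topic detection and tracking:
The topic detection and tracking technique used in this study does group news with similar content, but it often produces invalid clusters that contain too few articles, and similar news events that occur at different times and places are sometimes assigned to the same cluster. Two lines of further research could address this. The first is invalid-cluster merging: break up clusters with too few articles and put their articles back into the detection and tracking queue. The second is detecting which news events are still developing: a developing event can accept incoming articles into its cluster, whereas once a specified time window has passed and the event is no longer developing, similar news events should be assigned to a new cluster.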
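The two refinements in item (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the names `Cluster`, `track`, and `split_small`, the use of cosine similarity over term-weight dictionaries, and the values of `sim_threshold`, `window_days`, and `min_size` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    members: list = field(default_factory=list)  # (day, vector) pairs
    last_seen: int = 0                           # day of the newest member

    def centroid(self):
        """Average term weights over all member articles."""
        total = {}
        for _, vec in self.members:
            for term, w in vec.items():
                total[term] = total.get(term, 0.0) + w
        return {t: w / len(self.members) for t, w in total.items()}


def track(articles, sim_threshold=0.5, window_days=7):
    """Single-pass clustering of (day, vector) pairs sorted by day.

    Only clusters updated within window_days are treated as still
    developing and may accept a new article (hypothetical rule).
    """
    clusters = []
    for day, vec in articles:
        active = [c for c in clusters if day - c.last_seen <= window_days]
        best = max(active, key=lambda c: cosine(vec, c.centroid()), default=None)
        if best is not None and cosine(vec, best.centroid()) >= sim_threshold:
            best.members.append((day, vec))
            best.last_seen = day
        else:
            # No developing event is similar enough: start a new cluster.
            clusters.append(Cluster(members=[(day, vec)], last_seen=day))
    return clusters


def split_small(clusters, min_size=2):
    """Keep sufficiently large clusters; return the rest for re-queueing."""
    kept = [c for c in clusters if len(c.members) >= min_size]
    requeue = sorted(
        (m for c in clusters if len(c.members) < min_size for m in c.members),
        key=lambda m: m[0],
    )
    return kept, requeue


# Toy data: two similar events a month apart, plus one unrelated article.
articles = [
    (0, {"rate": 1.0, "hike": 1.0}),
    (1, {"rate": 1.0, "hike": 1.0, "bank": 0.5}),
    (30, {"rate": 1.0, "hike": 1.0}),
    (31, {"typhoon": 1.0}),
]
clusters = track(articles)
kept, requeue = split_small(clusters)
print([len(c.members) for c in clusters])  # [2, 1, 1]
print([day for day, _ in requeue])         # [30, 31]
```

In this toy run the article on day 30 is textually identical to the one on day 0, yet it opens a new cluster because the earlier event has left the time window. The two single-article clusters are returned for re-queueing; how they should be re-clustered (for example with a relaxed threshold) is left open here.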
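(3) Causal relationships among news events:
Links between clusters in this study are established from inter-cluster similarity. This only confirms that the contents of two clusters are similar; it does not allow the causal relationship between clusters to be analyzed. Future research could investigate causal relationships between clusters in depth, which would also make the generated threads of news events and news topics more accurate.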
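The limitation in item (3) can be seen in a small sketch of similarity-based linking. The similarity measure and threshold used in this study are not restated in this section, so cosine similarity between cluster centroids, the function name `link_clusters`, and the threshold value are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_clusters(centroids, threshold=0.3):
    """Return undirected edges (i, j, similarity) between similar clusters."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(centroids), 2):
        sim = cosine(a, b)
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


# Toy centroids (hypothetical term weights).
centroids = [
    {"rate": 1.0, "hike": 1.0},
    {"rate": 1.0, "stock": 1.0},
    {"typhoon": 1.0},
]
edges = link_clusters(centroids)
print([(i, j) for i, j, _ in edges])  # [(0, 1)]
```

Because cosine similarity is symmetric, the edge between clusters 0 and 1 carries no direction: it cannot say whether the rate news led to the stock news or the reverse, which is why a separate causal analysis would be needed.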
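(4) Improving sentiment analysis:
This study uses keyword spotting for news sentiment analysis, so its effectiveness depends on how complete the sentiment dictionary is. However, the National Taiwan University Sentiment Dictionary (台大意見詞辭典) used as the reference does not cover economic sentiment vocabulary well, which introduces error when judging the sentiment of economic news. Future work could follow the example of researchers abroad and build a Chinese sentiment lexicon for the economic domain to improve sentiment analysis.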
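The dependence of keyword spotting on lexicon coverage can be illustrated with a small sketch. The word lists below are toy examples invented for this illustration and are not taken from the dictionary used in this study; the function name `sentiment_score`, the scoring formula, and the polarity assigned to each word are likewise assumptions. The input is assumed to be already segmented into words.

```python
# Toy general-purpose lexicon (hypothetical entries).
GENERAL_POS = {"開心", "成功", "優秀"}   # happy, success, excellent
GENERAL_NEG = {"失敗", "糟糕", "悲傷"}   # failure, terrible, sad

# Toy economic-domain additions (hypothetical entries and polarities).
ECON_POS = {"獲利", "成長"}              # profit, growth
ECON_NEG = {"升息", "虧損", "衰退"}      # rate hike, loss, recession


def sentiment_score(tokens, positive, negative):
    """Keyword-spotting score in [-1, 1]; 0.0 when no lexicon word matches."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    hits = pos + neg
    return (pos - neg) / hits if hits else 0.0


# "Central bank rate hike, mortgage burden increases" (pre-segmented).
tokens = ["央行", "升息", "房貸", "負擔", "加重"]

general = sentiment_score(tokens, GENERAL_POS, GENERAL_NEG)
domain = sentiment_score(tokens, GENERAL_POS | ECON_POS, GENERAL_NEG | ECON_NEG)
print(general)  # 0.0
print(domain)   # -1.0
```

With the general-purpose lists alone, no word in the sentence matches and the article is scored as neutral. Once domain terms are added, the same sentence is scored as negative, which is the kind of gap a dedicated economic lexicon is meant to close.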
Tremendous Value For People And Businesses. Retrieved from Business Insider website:http://www.businessinsider.com/growth-in-the-internet-of-things-2013-10
Allan, J., Papka, R., & Lavrenko, V. (1998). On-line new event detection and tracking.
Paper presented at the Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, Melbourne, Australia.
Atzori, L., Iera, A., & Morabito, G. (2010). The internet of things: A survey. Computer
networks, 54(15), 2787-2805.
Ballve, M. (2013). Big Data Will Drive The Next Phase Of Innovation In Mobile Computing. Retrieved from Business Insider website:
http://www.businessinsider.com/big-data-is-growing-thanks-to-mobile-2013-12
Bar-Haim, R., Dinur, E., Feldman, R., Fresko, M., & Goldstein, G. (2011). Identifying and following expert investors in stock microblogs. Paper presented at the Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.
Brown, G. W. (1999). Volatility, sentiment, and noise traders. Financial Analysts Journal, 82-90.
Cambria, E., Rajagopal, D., Olsher, D., & Das, D. (2013). Big social data analysis. Big
Data Computing, 401-414.
Chen, C., Chen, Y.-T., Sun, Y., & Chen, M. (2003). Life Cycle Modeling of News Events Using Aging Theory. In N. Lavrač, D. Gamberger, H. Blockeel & L. Todorovski (Eds.), Machine Learning: ECML 2003 (Vol. 2837, pp. 47-59): Springer Berlin Heidelberg.
Cieri, C., Strassel, S., Graff, D., Martey, N., Rennert, K., & Liberman, M. (2002). Corpora for topic detection and tracking. In Topic detection and tracking (pp. 33-66): Springer.
Davenport, T. H., & Dyché, J. (2013). Big Data in Big Companies.
Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
Devitt, A., & Ahmad, K. (2007). Sentiment polarity identification in financial news: A
cohesion-based approach. Paper presented at the ACL.
Esuli, A., & Sebastiani, F. (2006). Determining Term Subjectivity and Term Orientation
for Opinion Mining. Paper presented at the EACL.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Commun.
ACM, 56(4), 82-89. doi: 10.1145/2436256.2436274
Feldman, R., Rosenfeld, B., Bar-Haim, R., & Fresko, M. (2011). The stock sonar: sentiment analysis of stocks based on a hybrid approach. Paper presented at the Twenty-Third IAAI Conference.
Gantz, J., & Reinsel, D. (2012). THE DIGITAL UNIVERSE IN 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. IDC: IDC.
Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. Paper presented at the ACM SIGOPS Operating Systems Review.
Gloor, P. A., Krauss, J., Nann, S., Fischbach, K., & Schoder, D. (2009). Web science 2.0: Identifying trends through semantic social network analysis. Paper presented at the Computational Science and Engineering, 2009. CSE'09. International Conference on.
Gold, M. K. (2012). Debates in the Digital Humanities: University of Minnesota Press.
Handcock, M. S., Raftery, A. E., & Tantrum, J. M. (2007). Model‐based clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in
Society), 170(2), 301-354.
Hatzivassiloglou, V., & McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. Paper presented at the Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
Hu, M., & Liu, B. (2004). Mining opinion features in customer reviews.
Huang, Y.-L. (2013). The Asymmetric Effect of Investor Sentiment and Stock Returns.
IBM. What is big data? Bringing big data to the enterprise. Retrieved 3/15, 2014, from http://www-01.ibm.com/software/au/data/bigdata/
Ikeda, D., Fujiki, T., & Okumura, M. (2006). Automatically Linking News Articles to Blog Entries. Paper presented at the AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
Intel. What Happens In An Internet Minute? Retrieved 3/21, 2014, from http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html
Issenberg, S. (2013). How President Obama's campaign used big data to rally individual voters. Technology Review, 116(1), 38-49.
Ku, L.-W., Lo, Y.-S., & Chen, H.-H. (2007). Using polarity scores of words for sentence-level opinion extraction. Paper presented at the Proceedings of NTCIR-6 workshop meeting.
Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety.
Laney, D. (2012). The Importance of 'Big Data': A Definition: Gartner.
Lin, F.-r., & Liang, C.-H. (2008). Storyline-based summarization for news topic retrospection. Decision Support Systems, 45(3), 473-490.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human
Language Technologies, 5(1), 1-167.
Liu, B., Mobasher, B., & Nasraoui, O. (2011). Web Usage Mining. In Web Data Mining (pp. 527-603): Springer Berlin Heidelberg.
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks. The Journal of Finance, 66(1), 35-65.
Magnusson, J. (2012). Social Network Analysis Utilizing Big Data Technology.
Melnik, S., Gubarev, A., Long, J. J., Romer, G., Shivakumar, S., Tolton, M., & Vassilakis, T. (2010). Dremel: interactive analysis of web-scale datasets. Proceedings of the
VLDB Endowment, 3(1-2), 330-339.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International journal of lexicography, 3(4), 235-244.
Mishne, G. (2006). Multiple ranking strategies for opinion retrieval in blogs. Paper presented at the Online Proceedings of TREC.
Mohammad, S. M., & Turney, P. D. (2010). Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. Paper presented at the Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text.
NIST. (2004). 2004 Topic Detection and Tracking (TDT-2004) Evaluation. Retrieved 12/25, 2013, from http://www.itl.nist.gov/iad/mig/tests/tdt/2004/
Normandeau, K. (2013). Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity. Retrieved 3/21, 2014, from http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/
Papka, R. (1999). On-line new event detection, clustering, and tracking. University of Massachusetts Amherst.
Popescu, A. R. (2001). Implementation of term weighting in a simple IR system.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval.
Information processing & management, 24(5), 513-523.
Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval.
Salton, G., Wong, A., & Yang, C.-S. (1975). A vector space model for automatic indexing.
Communications of the ACM, 18(11), 613-620.
Scherer, M. (2012). Inside the secret world of the data crunchers who helped Obama win. swampland.time.com/2012/11/07/inside-thesecret-world-of-quants-and-data-crunchers-who-helped-obama-win.
Stone, P., Dunphy, D. C., Smith, M. S., & Ogilvie, D. (1968). The general inquirer: A computer approach to content analysis. Journal of Regional Science, 8(1), 113-116.
Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. Paper presented at the Proceedings of the 40th annual meeting on association for computational linguistics.
Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315-346.
Uramoto, N., & Takeda, K. (1998). A method for relating multiple newspaper articles by using graphs, and its application to webcasting. Paper presented at the Proceedings of the 17th international conference on Computational linguistics-Volume 2.
Vigna, P. (2013). Stocks Plunge, Quickly Recover, on Fake Tweet. Retrieved from The Wall Street Journal website: http://blogs.wsj.com/moneybeat/2013/04/23/stocks-plunge-quickly-recover-on-fake-tweet/
Vu, D. Q., Hunter, D. R., & Schweinberger, M. (2013). Model-based clustering of large networks. The Annals of Applied Statistics, 7(2), 1010-1039.
Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. Paper presented at the Proceedings of the conference on human language technology and empirical methods in natural language processing.
Wu, H.-H., Tsai, A. C.-R., Tsai, R. T.-H., & Hsu, J. Y.-J. (2013). Building a Graded Chinese Sentiment Dictionary Based on Commonsense Knowledge for Sentiment Analysis of Song Lyrics. Journal of Information Science & Engineering, 29(4).
Yang, Y., Ault, T., Pierce, T., & Lattimer, C. W. (2000). Improving text categorization methods for event tracking. Paper presented at the Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval.
Yang, Y., Pierce, T., Archibald, B. T., Carbonell, J. G., Brown, R. D., & Liu, X. (1999). Learning approaches for detecting and tracking news events. IEEE Intelligent Systems, 14(4), 32-43.
Zhang, W., & Skiena, S. (2010). Trading Strategies to Exploit Blog and News Sentiment. Paper presented at the ICWSM.
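古倫維. (2000). 中英文新聞文件主題偵測方法之研究 [A study of topic detection methods for Chinese and English news documents]. 國立臺灣大學 (National Taiwan University).
李啟菁. (2010). 中文部落格文章之意見分析 [Opinion analysis of Chinese blog articles]. (Master's thesis), 國立台北科技大學 (National Taipei University of Technology).
胡家瑜. (2009). 追蹤進行中新聞議題產生事件主軸摘要 [Tracking ongoing news topics to generate event storyline summaries]. 清華大學 (Tsing Hua University). Available from Airiti AiritiLibrary database. (2009)
孫瑛澤, 陳建良, 劉峻杰, 劉昭麟, & 蘇豐文. (2010). 中文短句之情緒分類 [Emotion classification of short Chinese sentences].
婁鑫坡, 柴., 昝紅英,韓英傑. (2012). 微博情感倾向性分析 [Sentiment orientation analysis of microblogs].
許凱玲. (2011). Twitter「情緒指數」成預測股市走勢利器 [Twitter "mood index" becomes a tool for predicting stock market trends]. Retrieved from 數位時代 website: http://www.bnext.com.tw/focus/view/cid/103/id/20060
郭敏華. (2009). 如何測量投資人情緒? [How can investor sentiment be measured?].
戴尚學. (2003). 運用事件偵測與追蹤技術於中文多文件摘要之研究 [A study on applying event detection and tracking techniques to Chinese multi-document summarization].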