Conclusion and Future Work - A Study on Improving Micro-blog Government Services Using Short-Text Classification Techniques

A growing number of government services use Web 2.0 applications to obtain instant feedback from the public. Micro-blogs such as Twitter can provide seamless communication between governments and citizens. However, information overload and text sparseness make micro-blog management difficult. To address these problems, we have proposed a classification framework that incorporates an external knowledge base and the temporal information of tweets into the Naive Bayes classification model. The framework employs two smoothing techniques to leverage the temporal distribution of tweets, uses user content to enrich the training tweets, and incorporates WordNet synonyms into the classification model. Experiments on the 311NYC dataset demonstrate that the proposed framework classifies tweets accurately and achieves a significant improvement over the Naive Bayes model.
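As a rough illustration of how publication time can enter a Naive Bayes classifier, the sketch below adds a smoothed hour-of-day likelihood to a standard multinomial model. It is not the framework evaluated in this paper: the class name `TemporalNaiveBayes`, the 24 hourly bins, the add-alpha smoothing, and the toy data are all assumptions made for the example, and the WordNet synonym expansion and user-content enrichment are omitted.

```python
import math
from collections import Counter, defaultdict


class TemporalNaiveBayes:
    """Multinomial Naive Bayes with one extra feature: the hour of posting.

    Hypothetical sketch: score(c) = log P(c) + log P(hour | c) + sum log P(w | c),
    with add-alpha smoothing on both the word and the hour distributions.
    """

    def __init__(self, alpha=1.0, n_time_bins=24):
        self.alpha = alpha              # smoothing constant (assumed value)
        self.n_time_bins = n_time_bins  # assumed: one bin per hour of the day

    def fit(self, docs, hours, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word frequencies
        self.time_counts = defaultdict(Counter)  # class -> hour frequencies
        self.vocab = set()
        for tokens, hour, label in zip(docs, hours, labels):
            self.word_counts[label].update(tokens)
            self.time_counts[label][hour] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens, hour):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, n_docs in self.class_counts.items():
            # Class prior.
            score = math.log(n_docs / total_docs)
            # Smoothed temporal likelihood P(hour | c).
            score += math.log(
                (self.time_counts[c][hour] + self.alpha)
                / (n_docs + self.alpha * self.n_time_bins)
            )
            # Smoothed word likelihoods; words unseen in training are skipped.
            n_words = sum(self.word_counts[c].values())
            for w in tokens:
                if w in self.vocab:
                    score += math.log(
                        (self.word_counts[c][w] + self.alpha)
                        / (n_words + self.alpha * vocab_size)
                    )
            scores[c] = score
        return scores

    def predict(self, tokens, hour):
        scores = self.log_scores(tokens, hour)
        return max(scores, key=scores.get)


# Toy data (invented): noise complaints at night, street complaints by day.
docs = [
    ["noise", "loud", "party"],
    ["loud", "music", "noise"],
    ["pothole", "street", "road"],
    ["road", "broken", "pothole"],
]
hours = [23, 22, 9, 10]
labels = ["noise", "noise", "street", "street"]

model = TemporalNaiveBayes().fit(docs, hours, labels)
night_label = model.predict(["loud", "pothole"], 23)
morning_label = model.predict(["loud", "pothole"], 9)
```

In this toy setup the query words are equally likely under both classes, so the predicted label is decided entirely by the smoothed temporal term, which is the kind of signal a short, sparse tweet cannot supply from its text alone.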

This paper focuses on the publication time of tweets. In future work, we will investigate other tweet metadata to enhance the proposed framework. For instance, the social network of Twitter users could be analysed to identify opinion leaders for a given service class. Analysing the tweets published by opinion leaders would help governments identify the subjects that citizens regard as most important. We believe that incorporating text mining techniques into Government 2.0 will make government services more transformative, available, and interactive.

