國立政治大學 ‧ National Chengchi University
Chapter 5 Conclusion
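This study designs a new multiple imputation method for filling in missing values in consumer review data. Conventional multiple imputation imputes the entire data set at once. The proposed method instead recognizes that a consumer writing a review has read only the reviews posted earlier, so the computation for each review uses only data posted before it. As a consequence, when new consumer reviews arrive, the whole data set does not have to be recomputed; only the new records need to be processed.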
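The idea of imputing each review from earlier reviews only can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the similarity measure, the donor-drawing rule, the parameter names `k` and `m`, and all function names are hypothetical choices made for the example. Reviews are assumed to be sorted by posting time, with `None` marking a missing feature rating.

```python
import random


def similarity(a, b):
    """Hypothetical similarity: negative mean absolute difference over the
    features observed in both reviews (None marks a missing value)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("-inf")
    return -sum(abs(x - y) for x, y in shared) / len(shared)


def impute_review(review, earlier, k=2, rng=random):
    """Fill each missing feature of `review` with a value drawn from the k most
    similar reviews in `earlier` that have that feature available."""
    filled = list(review)
    for j, value in enumerate(review):
        if value is not None:
            continue
        donors = [e for e in earlier if e[j] is not None]
        donors.sort(key=lambda e: similarity(review, e), reverse=True)
        if donors:  # with no usable earlier review, the value stays missing
            filled[j] = rng.choice(donors[:k])[j]
    return filled


def impute_stream(reviews, m=3, seed=0):
    """Return m completed copies of `reviews` (assumed sorted by posting time).
    Review i is imputed from the already completed reviews 0..i-1 only."""
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        done = []
        for review in reviews:
            done.append(impute_review(review, done, rng=rng))
        completed.append(done)
    return completed
```

In this sketch, a newly arrived review is handled by a single call to `impute_review` against each completed copy, without touching the earlier records. It also shows why missing values in the earliest records are hard to fill: the first review has no earlier data to draw from, so its missing entries remain missing.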
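In terms of parameter estimation, this method outperforms the other imputation methods considered, and it identifies quite accurately how strongly each feature mentioned by consumers affects a product's overall score. Whether the imputed values come close to consumers' true opinions, however, remains open to discussion. In the simulations of Chapter 3, when the first five percent of the data had no missing values, the imputed values did recover the original distribution to a certain extent. When the first five percent of the data also contained missing values and the overall missing rate was quite high, this was almost impossible.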
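In addition, although this study uses simulation to compare the estimation results of different methods, the actual patterns by which consumers write reviews cannot be known, so it cannot be confirmed that the proposed method effectively imputes all consumer review data. Moreover, as more and more people use online product reviews, the ways in which consumers write reviews are bound to become more diverse and will not be limited to the two types of reviewers assumed in this study. Future researchers may be able to design methods with fewer assumptions to accommodate a wider range of data.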
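Another drawback of this study is that the similarity between each record and all earlier records must be computed, which is far less efficient than processing all of the data simultaneously. If the computation procedure is modified, every recomputation takes a great deal of time. Future research might design a better computational procedure so that firms can analyze large volumes of consumer reviews more efficiently.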
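As a rough illustration of this cost, assume there are n reviews sorted by posting time and that review i is compared once with each of the i − 1 reviews posted before it (the symbols n and i are introduced here for illustration only). The total number of similarity evaluations is then

```latex
\sum_{i=1}^{n} (i - 1) = \frac{n(n-1)}{2}
```

which grows quadratically in n. For n = 10,000, for example, this is 49,995,000 evaluations, and the whole sum is incurred again each time the procedure is changed and rerun from the start.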