CHAPTER 8 Conclusions
National Chengchi University
may depend on many factors and may admit more than one interpretation. As a consequence, the relationships between annual earnings and MWEs can go beyond the literal meanings of the MWEs. For example, the negative MWE “may be uncollectible” is spuriously correlated with positive SUE because of an omitted latent variable, earnings management. However, earnings management is an unobservable activity, and we cannot easily find a variable that precisely captures whether a company engages in it [28]. Hence, automatic semantic analysis of textual content has been, and will remain, a difficult problem.
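The omitted-variable effect described above can be illustrated with a small simulation. This is only a sketch on synthetic data: the hidden flag, the effect size of 2.0, and the noise levels are assumptions chosen for illustration and are not estimated from the corpus. A hidden binary flag (standing in for earnings management) raises both the MWE intensity and SUE, so the two are correlated overall even though they are unrelated once the flag is held fixed.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=4000, seed=0):
    """Synthetic firms: a hidden flag drives both the MWE intensity and SUE."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5               # latent flag (unobserved in practice)
        mwe = 2.0 * z + rng.gauss(0.0, 1.0)  # intensity of "may be uncollectible"
        sue = 2.0 * z + rng.gauss(0.0, 1.0)  # standardized unexpected earnings
        rows.append((z, mwe, sue))
    return rows


rows = simulate()

# Correlation when the latent flag is ignored.
overall = pearson([m for _, m, _ in rows], [s for _, _, s in rows])

# Correlation within each level of the latent flag.
within = {
    flag: pearson(
        [m for z, m, _ in rows if z == flag],
        [s for z, _, s in rows if z == flag],
    )
    for flag in (False, True)
}

print(f"overall correlation: {overall:.2f}")
for flag, r in within.items():
    print(f"correlation with latent flag = {flag}: {r:.2f}")
```

In this toy setting the overall correlation is clearly positive, while the correlation within each group is close to zero, which is the pattern a spurious association produces.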
8.2 Future work
Several limitations arise from attempting to solve non-linear problems with linear machine learning methods. First, dependencies between features may bias the parameter estimates of the CRF models. Second, our linear-chain CRF models cannot capture the full syntactic structure expressed in parse trees. Finally, a linear-chain CRF cannot model long-distance dependencies. In future work, principal component analysis (PCA) could be applied to reduce the dependency between syntactic features, and CRFs with arbitrary graph structures may alleviate the long-distance dependency problem.
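As a minimal sketch of how PCA removes linear dependency between features, the following example rotates two correlated features onto their principal axes. The two features (`depth` and `length`) and their distribution are hypothetical and serve only as an illustration; they are not features used in our models, and a real feature set would need a general eigen-decomposition rather than this closed-form two-dimensional case.

```python
import math
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pca_2d(xs, ys):
    """Project two features onto their principal axes (closed form for 2-D)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * (x - mx) + s * (y - my) for x, y in zip(xs, ys)]
    pc2 = [-s * (x - mx) + c * (y - my) for x, y in zip(xs, ys)]
    return pc1, pc2


# Hypothetical correlated features, e.g. parse-tree depth and phrase length.
rng = random.Random(0)
depth = [rng.gauss(0.0, 1.0) for _ in range(500)]
length = [0.8 * d + 0.3 * rng.gauss(0.0, 1.0) for d in depth]

pc1, pc2 = pca_2d(depth, length)

print(f"covariance before PCA: {covariance(depth, length):.3f}")
print(f"covariance after PCA:  {covariance(pc1, pc2):.3f}")
```

The projected components are uncorrelated by construction, and the first component carries the larger share of the variance.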
We think our CRF models could be applied to Chinese corpora, although the useful linguistic features may differ substantially from those we used for English. In English, verbs convey the type of event. However, no literature provides evidence that Chinese verbs convey the type of event, so it is questionable whether the predicate-argument structure is a good linguistic feature for identifying opinion labels in Chinese. The morphological, orthographical, and semantic features of Chinese also differ from those of English because of different cultures and language origins. Hence, we shall look for Chinese linguistic features that better capture the nature of the language in order to tackle the Chinese opinion labeling problem.
References
[1] W. Antweiler and M. Z. Frank, “Is all that Talk just Noise? The Information Content of Internet Stock Message Boards,” Journal of Finance, 59(3), pp. 1259-1294, 2004.
[2] Apache Lucene 3.0.0, http://lucene.apache.org/java/docs/index.html.
[3] Automatic Statistical SEmantic Role Tagger-v0.14b (ASSERT), http://cemantix.org/assert.html.
[4] Charniak Parser, http://www.cs.brown.edu/~ec/#software.
[5] Y. Choi, C. Cardie, E. Riloff and S. Patwardhan, “Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns,” Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 355-362, 2005.
[6] M. J. Collins, Head-Driven Statistical Models for Natural Language Parsing, Ph.D.
thesis, University of Pennsylvania, 1999.
[7] Electronic Data Gathering, Analysis and Retrieval system (EDGAR), http://www.sec.gov/edgar.shtml.
[8] FrameNet, http://framenet.icsi.berkeley.edu.
[9] D. Gildea and D. Jurafsky, “Automatic Labeling of Semantic Roles,” Computational Linguistics, 28(3), pp. 245-288, 2002.
[10] W. H. Greene, Econometric Analysis, Pearson Prentice Hall, 2008.
[11] Illinois Chunker, http://cogcomp.cs.illinois.edu/page/software.
[12] S.-M. Kim and E. Hovy, “Identifying Opinion Holders for Question Answering in Opinion Texts,” Proceedings of AAAI Workshop on Question Answering in Restricted
Domains, pp. 20-26, 2005.
[13] J. D. Lafferty, A. McCallum and F. C. N. Pereira, “Conditional Random Fields:
Probabilistic Models for Segmenting and Labeling Sequence Data,” Proceedings of the
International Conference on Machine Learning, pp. 282-289, 2001.
[14] F. Li, “Do Stock Market Investors Understand The Risk Sentiment Of Corporate Annual Reports?” University of Michigan Working Paper, 2006.
[15] D. Lin, “Automatic Retrieval and Clustering of Similar Words,” Proceedings of the International Conference on Computational Linguistics (COLING), pp. 768-774, 1998.
[16] LingPipe 3.9 sentence model, http://alias-i.com/lingpipe.
[17] B. Liu, “Sentiment Analysis and Subjectivity,” Handbook of Natural Language Processing, N. Indurkhya and F. J. Damerau (editors), CRC Press, Second Edition, 2010.
[18] T. Loughran and B. McDonald, “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of Finance, 66(1), pp. 67-97, 2011.
[19] MAchine Learning for LanguagE Toolkit-2.0.6 (MALLET), http://mallet.cs.umass.edu.
[20] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2009.
[21] Multi-Perspective Question Answering 2.0 (MPQA), http://www.cs.pitt.edu/mpqa.
[22] B. Pang, L. Lee and S. Vaithyanathan, “Thumbs up? Sentiment Classification Using Machine Learning Techniques,” Proceedings of the Conference on Empirical Methods in
Natural Language Processing, pp. 79-86, 2002.
[23] F. Peng, F. Feng and A. McCallum, “Chinese Segmentation and New Word Detection using Conditional Random Fields,” Proceedings of the conference on Computational
Linguistics, 2004.
[24] R.W. Picard, E. Vyzas and J. Healey, “Toward Machine Emotional Intelligence: Analysis of Affective Physiological State,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 23(10), pp. 1175-1191, 2001.
[25] S. Pradhan, W. Ward, K. Hacioglu, J. Martin and D. Jurafsky, “Shallow Semantic Parsing Using Support Vector Machines,” Proceedings of the Human Language
Technology Conference/North American Chapter of the ACL, 2004.
[26] L. A. Ramshaw and M. P. Marcus, “Text Chunking Using Transformation-based Learning,” Proceedings of the ACL Workshop on Very Large Corpora, pp. 82-94, 1995.
[27] E. Riloff and J. Wiebe, “Learning Extraction Patterns for Subjective Expressions,”
Proceedings of the Conference on Empirical Methods in Natural Language Processing,
pp. 25-32, 2003.
[28] J. Ronen and V. Yaari, Earnings Management: Emerging Insights in Theory, Practice, and Research, Springer-Verlag, 2008.
[29] Standard & Poor’s Compustat Research Insight 8.4.1, http://www.compustat.com.
[30] Stanford Dependencies manual,
http://nlp.stanford.edu/software/dependencies_manual.pdf.
[31] Stanford NLP Toolkits, http://nlp.stanford.edu/software.
[32] Stata dataset of Compustat Quarterly Match to SEC Filings, http://faculty.chicagobooth.edu/amir.sufi/data.htm.
[33] Stata/MP 11.2, http://www.stata.com.
[34] P. C. Tetlock, “Giving Content to Investor Sentiment: The Role of Media in the Stock Market,” Journal of Finance, 62(3), pp. 1139-1168, 2007.
[35] P. C. Tetlock, M. Saar-Tsechansky and S. Macskassy, “More than Words: Quantifying Language to Measure Firms' Fundamentals,” Journal of Finance, 63(3), pp. 1437-1467, 2008.
[36] J. Wiebe, R. Bruce and T. O’Hara, “Development and Use of a Gold Standard Data Set for Subjectivity Classifications,” Proceedings of the Annual Meeting of the ACL, pp.
246-253, 1999.
[37] T. Wilson, J. Wiebe and P. Hoffmann, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis,” Proceedings of the Conference on Human Language Technology
and Empirical Methods in Natural Language Processing, pp. 347-354, 2005.
expending opinion holder and subjective expression pairs in the same sentence.

Frequent opinion pattern combinations in sentences
Opinion pattern combination | Total freq. | Doc freq.
company AND "will be able" | 110758 | 693
company AND "could materially adversely" | 44644 | 73
company AND "successfully" | 95549 | 738
revenue AND "could adversely" | 52679 | 61
company AND "adversely affected" | 115858 | 518
each AND "will be able" | 53460 | 60
we AND "successfully" | 68912 | 542
they AND "adversely affected" | 63667 | 48
company AND "could adversely" | 94135 | 390
customer AND "may not be able" | 59123 | 51
we AND "will be able" | 64307 | 555
revenue AND "reasonably assured" | 21596 | 139
we AND "may not be able" | 76491 | 451
management AND "will be successful" | 47701 | 62
we AND "may not be able to" | 73531 | 428
revenue AND "will be able" | 47945 | 61
we AND "adversely affected" | 83124 | 378
management AND "may not be able" | 66012 | 44
company AND "better" | 66216 | 449
management AND "may not be able to" | 65952 | 43
company AND "could adversely affect" | 89387 | 322
condition AND "not successful" | 72502 | 39
company AND "going concern" | 74693 | 314
customer AND "may not be able to" | 58947 | 47
company AND "will be successful" | 66896 | 333
party AND "will be able" | 56351 | 49
condition AND "adversely affected" | 65934 | 306
revenue AND "may not be able" | 49295 | 56
company AND "as a going concern" | 66038 | 288
revenue AND "could adversely affect" | 51737 | 53
condition AND "could adversely" | 73747 | 222
we AND "may not be successful" | 24896 | 110
management AND "successfully" | 77041 | 209
company AND "may be unable to" | 32765 | 83
condition AND "successfully" | 71416 | 214
company AND "will not be able to" | 38912 | 69
company AND "may not be able" | 80483 | 172
systems AND "could adversely" | 55208 | 48
condition AND "could adversely affect" | 71790 | 192
each AND "better" | 44108 | 60
company AND "could be adversely" | 73545 | 182
agreement AND "adversely affected" | 73006 | 36
company AND "may not be able to" | 79885 | 156
each AND "adversely affected" | 70318 | 35
we AND "will be successful" | 40584 | 304
systems AND "adversely affected" | 39877 | 60
we AND "could adversely" | 52831 | 229
consumer AND "better" | 41142 | 58
we AND "may be unable to" | 52028 | 232
investment AND "successfully" | 56465 | 42
management AND "will be able" | 67883 | 152
i AND "successfully" | 30737 | 77
customer AND "successfully" | 61236 | 165
party AND "may not be able" | 60251 | 39
we AND "could adversely affect" | 50632 | 197
party AND "may not be able to" | 60251 | 39
systems AND "successfully" | 54330 | 173
they AND "may not be able to" | 58165 | 40
management AND "could adversely" | 76650 | 119
they AND "may not be able" | 58165 | 40
management AND "better" | 51467 | 171
revenue AND "may not be able to" | 49014 | 47
we AND "better" | 30432 | 268
systems AND "could adversely affect" | 54305 | 42
management AND "could adversely affect" | 75619 | 105
customer AND "could be adversely" | 53239 | 42
company AND "not successful" | 75814 | 101
you AND "successfully" | 25175 | 87
customer AND "better" | 50077 | 149
systems AND "may not be able" | 47602 | 46
we AND "could be adversely" | 47413 | 144
management AND "could be adversely" | 57103 | 38
company AND "could adversely affect the" | 27152 | 235
investment AND "adversely affected" | 63296 | 34
each AND "successfully" | 62692 | 97
condition AND "will be successful" | 51592 | 41
customer AND "adversely affected" | 50663 | 120
systems AND "may not be able to" | 47564 | 44
we AND "not successful" | 35381 | 167
condition AND "could materially adversely" | 23797 | 87
company AND "could be adversely affected" | 37927 | 154
customer AND "will be successful" | 47745 | 43
company AND "adequately" | 21619 | 267
management AND "not successful" | 72635 | 28
management AND "adversely affected" | 64469 | 89
condition AND "could adversely affect our" | 23043 | 88
revenue AND "adversely affected" | 63717 | 87
management AND "adequately" | 14856 | 130
revenue AND "successfully" | 56594 | 97
company AND "despite the" | 13779 | 140
condition AND "will be able" | 64948 | 83
each AND "may not be able" | 56375 | 34
party AND "successfully" | 65285 | 78
liability AND "adversely affected" | 34524 | 55
you AND "will be able" | 32928 | 147
agreement AND "may not be able" | 64713 | 28
agreement AND "will be able" | 57814 | 79
each AND "may not be able to" | 56361 | 32
company AND "may be able" | 47987 | 94
company AND "may not be successful" | 41769 | 43
we AND "could be adversely affected" | 34566 | 130
revenue AND "better" | 43630 | 41
party AND "could adversely" | 68832 | 64
the company AND "successfully" | 2568 | 685
we AND "could adversely affect our" | 36058 | 122
we AND "may be able" | 13844 | 125
condition AND "could be adversely" | 49172 | 88
the company AND "will be able" | 2644 | 654
they AND "will be able" | 49506 | 87
company AND "could negatively" | 45171 | 38
systems AND "better" | 48689 | 84
revenue AND "could be adversely" | 60216 | 28
customer AND "could adversely" | 65023 | 62
agreement AND "may not be able to" | 64677 | 26
condition AND "may not be able" | 66325 | 60
they AND "could adversely" | 55384 | 30
we AND "adequately" | 18973 | 202
condition AND "could negatively" | 48438 | 34
agreement AND "successfully" | 60942 | 62
they AND "may be able" | 36935 | 44
they AND "successfully" | 58647 | 64
agreement AND "better" | 47681 | 34
party AND "adversely affected" | 61424 | 61
management AND "going concern" | 40515 | 40
party AND "could adversely affect" | 67817 | 55
agreement AND "going concern" | 45856 | 34
customer AND "will be able" | 55974 | 66
executive AND "successfully" | 39665 | 39
customer AND "could adversely affect" | 64351 | 56
we AND "will not be able to" | 15576 | 99
condition AND "may not be able to" | 66240 | 53
each AND "could adversely" | 56972 | 27
systems AND "will be able" | 50292 | 69
we AND "going concern" | 23560 | 65
customer AND "may be able" | 45983 | 73
they AND "better" | 41111 | 37
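Counts of the kind listed in the table above could be computed along the following lines. This is a simplified sketch and not the pipeline used in this work: it assumes that the total frequency is the number of sentences in which both the opinion holder and the subjective expression occur, and that the document frequency is the number of documents containing at least one such sentence. The function name `cooccurrence_counts`, the naive regular-expression sentence splitter, and the three toy documents are all hypothetical.

```python
import re


def cooccurrence_counts(documents, holder, expression):
    """Return (total_freq, doc_freq) for sentences containing both strings.

    total_freq: number of sentences, over all documents, that contain both
                the holder and the expression (assumed definition).
    doc_freq:   number of documents with at least one such sentence.
    """
    holder_re = re.compile(r"\b" + re.escape(holder) + r"\b", re.IGNORECASE)
    expr_re = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
    total_freq = 0
    doc_freq = 0
    for text in documents:
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        hits = sum(
            1 for s in sentences if holder_re.search(s) and expr_re.search(s)
        )
        total_freq += hits
        if hits:
            doc_freq += 1
    return total_freq, doc_freq


# Toy documents for illustration only.
docs = [
    "We may not be able to raise capital. We will be able to repay the loan. "
    "We believe we will be able to grow.",
    "The company will be able to expand. Revenue grew.",
    "We expect growth. Competitors will be able to enter the market.",
]

print(cooccurrence_counts(docs, "we", "will be able"))       # (2, 1)
print(cooccurrence_counts(docs, "company", "will be able"))  # (1, 1)
```

Matching on word boundaries keeps "we" from matching inside longer words. A production system would rely on a proper sentence model and an inverted index instead of scanning every document.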
The following table lists all 112 explanatory variables (MWEs) in the small dataset (see Section 7.2 for details).
All 112 explanatory variables (MWEs) in small dataset
v1 adequately | v31 could seriously | v61 may make it difficult | v91 ultimately an adverse determination
v2 are not successful | v32 could seriously harm our business | v62 may never achieve | v92 unable or
v3 as a going concern | v33 could significantly and adversely | v63 may not achieve | v93 unable to timely
v4 better | v34 cumbersome | v64 may not be able | v94 uncompetitive or obsolete
v5 can be achieved | v35 despite the | v65 may not be able to | v95 unduly
v6 can be faulty | v36 do not violate | v66 may not be able to identify and | v96 very difficult
v7 can be no assurance that company will be able | v37 doubt about ability to continue as a going concern | v67 may not be successful | v97 vigorously contested and management does not
v8 can efficiently | v38 fully support its contention | v68 might adversely | v98 vigorously defending itself
v9 can successfully | v39 going concern | v69 might be impaired | v99 well tolerated
v10 cautioned not to place undue | v40 highly volatile | v70 must also gain industry acceptance | v100 will always be able to
v11 collectibility is reasonably assured | v41 if a dispute | v71 must successfully | v101 will be able to do so
v12 complaint is without merit | v42 if an event of default | v72 no longer infringes | v102 will be successful
v13 concern about the | v43 if impaired | v73 not adversely affect us | v103 will not able to
v14 could adversely | v44 if successful | v74 not be impaired | v104 will not adversely
v15 could adversely affect | v45 if the company unable | v75 not materially adversely | v105 will not be able to
v16 could adversely affect our | v46 if we do not successfully | v76 not successful | v106 will not be challenged
v17 could adversely affect our business | v47 improperly | v77 not ultimately prevail in this dispute | v107 will not successfully
v18 could adversely affect the | v48 increasingly subject to infringement | v78 outstanding immediately | v108 will successfully
v19 could also adversely | v49 is impaired | v79 reasonably assured | v109 would likely decline
v20 could be adversely | v50 is not effective | v80 satisfactorily | v110 would not adversely
v21 could be adversely affected | v51 materially adversely affect our business | v81 seriously harmed | v111 would not be able to
v22 could be impaired | v52 may adversely impact our business | v82 severely burned patients | v112 would not be able to realize all or part of its
v24 could be negatively | v54 may be adversely | v84 shall not be unreasonably
v25 could declare an event of default | v55 may be adversely affected | v85 shall not be unreasonably withheld
v26 could disrupt our operations | v56 may be impaired | v86 shall not unreasonably
v27 could materially adversely | v57 may be inadequate | v87 significantly harmed
v28 could materially adversely affect the | v58 may be reluctant to purchase services | v88 subject to volatility
v29 could materially and adversely | v59 may be successful | v89 successfully defend a
v30 could negatively | v60 may be unable to | v90 thereby adversely
The following table lists all 174 explanatory variables (MWEs) in the large dataset (see Section 7.3 for details).
All 174 explanatory variables (MWEs) in large dataset
v1 adequately | … liability | v121 satisfactorily | v151 well as exclusivity
v2 adversely affected | v32 could be inaccurate | v62 if the company unable | v92 may make it difficult | v122 seriously harmed | v152 well tolerated
v3 are not successful | v33 could be materially adversely affected | v63 if we do not successfully | v93 may never achieve | v123 severely burned patients | v153 will actually be achieved
v4 as a going concern | v34 could be negatively | v64 if we fail to have | v94 may not achieve | v124 shall be effective | v154 will always be able to
v5 because of simple error or mistake | v35 could declare an event of default | v65 if you are successful | v95 may not be able | v125 shall be effective and | v155 will be able
v6 better | v36 could disrupt our operations | v66 improperly | v96 may not be able to | v126 shall be effective only | v156 will be able to do so
v7 can be achieved | v37 could materially adversely | v67 improperly recognized | v97 may not be able to identify and | v127 shall not be unreasonably | v157 will be successful
v8 can be faulty | v38 could materially adversely affect the | v68 increasingly subject to infringement | v98 may not be able to maintain an | v128 shall not be unreasonably withheld | v158 will likely suffer
v9 can be no assurance that company will be able | v39 could materially and adversely | v69 intends to defend the action vigorously | v99 may not be achieved | v129 shall not unreasonably | v159 will not able to
v10 can efficiently | … successful | v130 should be evaluated for impairment | v160 will not adversely
v11 can greatly | v41 could negatively | v71 intends to defend this matter vigorously | v101 may not be unreasonably | v131 should be revoked | v161 will not be able to
v12 can not assure you
… penalties | v73 is not effective | v103 might adversely | v133 should remain stable | v163 will not be impaired
v14 can successfully compete | v44 could seriously | v74 longer outstanding | v104 might be impaired | v134 should the company be unable to continue as a going concern | v164 will not materially impair its ability
v15 cautioned not to place undue | v45 could seriously harm our business | v75 marginally profitable or unprofitable | v105 much more effective | v135 significantly harmed | v165 will not successfully
v16 collectibility is reasonably assured | v46 could significantly adversely | v76 materially | v136 subject to volatility | v166 will successfully
v17 complaint is without merit | v47 could significantly and adversely | v77 may adversely impact our business | v107 must successfully | v137 successful or profitable | v167 would be able to recover
v18 concern about the | v48 cumbersome | v78 may also be terminated for cause | v108 no longer infringes | v138 successfully | v168 would be honored
v19 could adversely | … affect us | v139 successfully defend a | v169 would be impractical
v20 could adversely affect | v50 do not violate | v80 may be able to develop products | v110 not be challenged or invalidated | v140 thereby adversely | v170 would likely decline
v21 could adversely affect its operating | v51 doubt about ability to continue as a going concern | v81 may be adversely | v111 not be impaired | v141 ultimately an adverse determination | v171 would not adversely
v22 could adversely affect our | v52 even worse | v82 may be adversely affected | v112 not materially adversely | v142 unable or | v172 would not be able to
v23 could adversely affect our business | v53 fully support its contention | v83 may be deprived of any value | v113 not materially impair its ability | v143 unable to timely | v173 would not be able to realize all or part of its
v24 could adversely affect the | v54 going concern | v84 may be impaired | v114 not successful | v144 uncompetitive or obsolete | v174 would not materially adversely
v25 could also adversely | v55 have not yet responded to the complaint | v85 may be inadequate | v115 not successfully assert | v145 unduly
v27 could be adversely | v57 if a dispute | v87 may be reluctant to purchase services | v117 now better | v147 very difficult
v28 could be adversely affected | v58 if an event of default | v88 may be successful | v118 outstanding immediately | v148 …
… volatile | v149 vigorously defending itself
v30 could be damaged or disrupted | v60 if successful | v90 may be uncollectible | v120 reasonably assured | v150 we must satisfy