
CHAPTER 8 Conclusions


National Chengchi University


may depend on many factors and may have more than one possible situation. As a consequence, the relationships between annual earnings and MWEs can go beyond the literal meanings of the MWEs. For example, the negative MWE “may be uncollectible” is spuriously correlated with positive SUE because of a missing latent variable, “earnings management”. However, earnings management is an unobservable activity, and we cannot easily find a variable that precisely captures whether a company engages in it [28]. Hence, automatic semantic analysis of textual content remains a hard problem, as it has been in the past and will be in the future.

8.2 Future work

Some constraints arise when we attempt to solve non-linear problems with linear machine learning methods. First, dependency between features may bias the estimation of the parameters in the CRF models. Second, our linear CRF models cannot capture the full syntactic structure information that is expressed in parse trees. Finally, the linear-chain CRF cannot model long-distance dependencies. In the future, principal component analysis (PCA) could be used to reduce the dependency between syntactic features, and CRFs with arbitrary structure may relieve the long-distance dependency problem.
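As an illustration of the PCA idea mentioned above, the following minimal sketch decorrelates a small feature matrix using power iteration with deflation. It is not part of our system: the toy feature values and the function names (`covariance_matrix`, `top_eigenvector`, `pca`) are assumptions made only for this example, and a real experiment would more likely use an established numerical library.

```python
import math

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def top_eigenvector(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]      # non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:                      # no variance left to explain
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca(rows, n_components):
    """Project rows onto the first n_components principal directions."""
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        lam, v = top_eigenvector(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the direction just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return projected, variances

# Hypothetical toy data: the first two features are strongly correlated.
features = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.9],
    [4.0, 7.8, 0.3],
    [5.0, 10.1, 0.7],
]
projected, variances = pca(features, n_components=2)
print(variances)
```

The projected columns are uncorrelated by construction, so they could replace the original dependent features as inputs to a model, at the cost of losing the direct interpretation of each feature.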

We believe our CRF models could be applied to a Chinese corpus, although the linguistic features might be entirely different from the ones we used. In English, verbs convey the type of event. However, no literature provides evidence that Chinese verbs convey the type of event, so it is questionable whether the predicate-argument structure is a good linguistic feature for identifying the opinion labels. The morphological, orthographical, and semantic features of Chinese are entirely different from those of English because of different cultures and language origins. Hence, we shall find different Chinese linguistic features that better capture the nature of the Chinese language in order to tackle the Chinese opinion labeling problem.


References

[1] W. Antweiler and M. Z. Frank, “Is all that Talk just Noise? The Information Content of Internet Stock Message Boards,” Journal of Finance, 59(3), pp. 1259-1294, 2004.

[2] Apache Lucene 3.0.0, http://lucene.apache.org/java/docs/index.html.

[3] Automatic Statistical SEmantic Role Tagger-v0.14b (ASSERT), http://cemantix.org/assert.html.

[4] Charniak Parser, http://www.cs.brown.edu/~ec/#software.

[5] Y. Choi, C. Cardie, E. Riloff and S. Patwardhan, “Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns,” Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 355-362, 2005.

[6] M. J. Collins, Head-Driven Statistical Models for Natural Language Parsing, Ph.D. thesis, University of Pennsylvania, 1999.

[7] Electronic Data Gathering, Analysis and Retrieval system (EDGAR), http://www.sec.gov/edgar.shtml.

[8] FrameNet, http://framenet.icsi.berkeley.edu.


[9] D. Gildea and D. Jurafsky, “Automatic Labeling of Semantic Roles,” Computational Linguistics, 28(3), pp. 245-288, 2002.

[10] W. H. Greene, Econometric Analysis, Pearson Prentice Hall, 2008.

[11] Illinois Chunker, http://cogcomp.cs.illinois.edu/page/software.

[12] S.-M. Kim and E. Hovy, “Identifying Opinion Holders for Question Answering in Opinion Texts,” Proceedings of AAAI Workshop on Question Answering in Restricted Domains, pp. 20-26, 2005.

[13] J. D. Lafferty, A. McCallum and F. C. N. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” Proceedings of the International Conference on Machine Learning, pp. 282-289, 2001.

[14] F. Li, “Do Stock Market Investors Understand The Risk Sentiment Of Corporate Annual Reports?” University of Michigan Working Paper, 2006.

[15] D. Lin, “Automatic Retrieval and Clustering of Similar Words,” Proceedings of the International Conference on Computational Linguistics (COLING), pp. 768-774, 1998.

[16] LingPipe 3.9 sentence model, http://alias-i.com/lingpipe.

[17] B. Liu, “Sentiment Analysis and Subjectivity,” Handbook of Natural Language Processing, N. Indurkhya and F. J. Damerau (editors), CRC Press, Second Edition, 2010.

[18] T. Loughran and B. McDonald, “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of Finance, 66(1), pp. 67-97, 2011.

[19] MAchine Learning for LanguagE Toolkit-2.0.6 (MALLET), http://mallet.cs.umass.edu.


[20] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2009.

[21] Multi-Perspective Question Answering 2.0 (MPQA), http://www.cs.pitt.edu/mpqa.

[22] B. Pang, L. Lee and S. Vaithyanathan, “Thumbs up? Sentiment Classification Using Machine Learning Techniques,” Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 79-86, 2002.

[23] F. Peng, F. Feng and A. McCallum, “Chinese Segmentation and New Word Detection using Conditional Random Fields,” Proceedings of the Conference on Computational Linguistics, 2004.

[24] R. W. Picard, E. Vyzas and J. Healey, “Toward Machine Emotional Intelligence: Analysis of Affective Physiological State,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), pp. 1175-1191, 2001.

[25] S. Pradhan, W. Ward, K. Hacioglu, J. Martin and D. Jurafsky, “Shallow Semantic Parsing Using Support Vector Machines,” Proceedings of the Human Language Technology Conference/North American Chapter of the ACL, 2004.

[26] L. A. Ramshaw and M. P. Marcus, “Text Chunking Using Transformation-based Learning,” Proceedings of the ACL Workshop on Very Large Corpora, pp. 82-94, 1995.

[27] E. Riloff and J. Wiebe, “Learning Extraction Patterns for Subjective Expressions,” Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 25-32, 2003.

[28] J. Ronen and V. Yaari, Earnings Management: Emerging Insights in Theory, Practice, and Research, Springer-Verlag, 2008.

[29] Standard & Poor’s Compustat Research Insight 8.4.1, http://www.compustat.com.

[30] Stanford Dependencies manual, http://nlp.stanford.edu/software/dependencies_manual.pdf.

[31] Stanford NLP Toolkits, http://nlp.stanford.edu/software.

[32] Stata dataset of Compustat Quarterly Match to SEC Filings, http://faculty.chicagobooth.edu/amir.sufi/data.htm.

[33] Stata/MP 11.2, http://www.stata.com.

[34] P. C. Tetlock, “Giving Content to Investor Sentiment: The Role of Media in the Stock Market,” Journal of Finance, 62(3), pp. 1139-1168, 2007.

[35] P. C. Tetlock, M. Saar-Tsechansky and S. Macskassy, “More than Words: Quantifying Language to Measure Firms' Fundamentals,” Journal of Finance, 63(3), pp. 1437-1467, 2008.

[36] J. Wiebe, R. Bruce and T. O’Hara, “Development and Use of a Gold Standard Data Set for Subjectivity Classifications,” Proceedings of the Annual Meeting of the ACL, pp. 246-253, 1999.

[37] T. Wilson, J. Wiebe and P. Hoffmann, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis,” Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347-354, 2005.

expending opinion holder and subjective expression pairs in the same sentence.

Frequent opinion pattern combinations in sentences

Opinion pattern combination | Total freq. | Doc freq.
company AND "will be able" | 110758 | 693
company AND "could materially adversely" | 44644 | 73
company AND "successfully" | 95549 | 738
revenue AND "could adversely" | 52679 | 61
company AND "adversely affected" | 115858 | 518
each AND "will be able" | 53460 | 60
we AND "successfully" | 68912 | 542
they AND "adversely affected" | 63667 | 48
company AND "could adversely" | 94135 | 390
customer AND "may not be able" | 59123 | 51
we AND "will be able" | 64307 | 555
revenue AND "reasonably assured" | 21596 | 139
we AND "may not be able" | 76491 | 451
management AND "will be successful" | 47701 | 62
we AND "may not be able to" | 73531 | 428
revenue AND "will be able" | 47945 | 61
we AND "adversely affected" | 83124 | 378
management AND "may not be able" | 66012 | 44
company AND "better" | 66216 | 449
management AND "may not be able to" | 65952 | 43
company AND "could adversely affect" | 89387 | 322
condition AND "not successful" | 72502 | 39
company AND "going concern" | 74693 | 314
customer AND "may not be able to" | 58947 | 47
company AND "will be successful" | 66896 | 333
party AND "will be able" | 56351 | 49
condition AND "adversely affected" | 65934 | 306
revenue AND "may not be able" | 49295 | 56
company AND "as a going concern" | 66038 | 288
revenue AND "could adversely affect" | 51737 | 53
condition AND "could adversely" | 73747 | 222
we AND "may not be successful" | 24896 | 110
management AND "successfully" | 77041 | 209
company AND "may be unable to" | 32765 | 83
condition AND "successfully" | 71416 | 214
company AND "will not be able to" | 38912 | 69
company AND "may not be able" | 80483 | 172
systems AND "could adversely" | 55208 | 48
condition AND "could adversely affect" | 71790 | 192
each AND "better" | 44108 | 60
company AND "could be adversely" | 73545 | 182
agreement AND "adversely affected" | 73006 | 36
company AND "may not be able to" | 79885 | 156
each AND "adversely affected" | 70318 | 35
we AND "will be successful" | 40584 | 304
systems AND "adversely affected" | 39877 | 60
we AND "could adversely" | 52831 | 229
consumer AND "better" | 41142 | 58
we AND "may be unable to" | 52028 | 232
investment AND "successfully" | 56465 | 42
management AND "will be able" | 67883 | 152
i AND "successfully" | 30737 | 77
customer AND "successfully" | 61236 | 165
party AND "may not be able" | 60251 | 39
we AND "could adversely affect" | 50632 | 197
party AND "may not be able to" | 60251 | 39
systems AND "successfully" | 54330 | 173
they AND "may not be able to" | 58165 | 40
management AND "could adversely" | 76650 | 119
they AND "may not be able" | 58165 | 40
management AND "better" | 51467 | 171
revenue AND "may not be able to" | 49014 | 47
we AND "better" | 30432 | 268
systems AND "could adversely affect" | 54305 | 42
management AND "could adversely affect" | 75619 | 105
customer AND "could be adversely" | 53239 | 42
company AND "not successful" | 75814 | 101
you AND "successfully" | 25175 | 87
customer AND "better" | 50077 | 149
systems AND "may not be able" | 47602 | 46
we AND "could be adversely" | 47413 | 144
management AND "could be adversely" | 57103 | 38
company AND "could adversely affect the" | 27152 | 235
investment AND "adversely affected" | 63296 | 34
each AND "successfully" | 62692 | 97
condition AND "will be successful" | 51592 | 41
customer AND "adversely affected" | 50663 | 120
systems AND "may not be able to" | 47564 | 44
we AND "not successful" | 35381 | 167
condition AND "could materially adversely" | 23797 | 87
company AND "could be adversely affected" | 37927 | 154
customer AND "will be successful" | 47745 | 43
company AND "adequately" | 21619 | 267
management AND "not successful" | 72635 | 28
management AND "adversely affected" | 64469 | 89
condition AND "could adversely affect our" | 23043 | 88
revenue AND "adversely affected" | 63717 | 87
management AND "adequately" | 14856 | 130
revenue AND "successfully" | 56594 | 97
company AND "despite the" | 13779 | 140
condition AND "will be able" | 64948 | 83
each AND "may not be able" | 56375 | 34
party AND "successfully" | 65285 | 78
liability AND "adversely affected" | 34524 | 55
you AND "will be able" | 32928 | 147
agreement AND "may not be able" | 64713 | 28
agreement AND "will be able" | 57814 | 79
each AND "may not be able to" | 56361 | 32
company AND "may be able" | 47987 | 94
company AND "may not be successful" | 41769 | 43
we AND "could be adversely affected" | 34566 | 130
revenue AND "better" | 43630 | 41
party AND "could adversely" | 68832 | 64
the company AND "successfully" | 2568 | 685
we AND "could adversely affect our" | 36058 | 122
we AND "may be able" | 13844 | 125
condition AND "could be adversely" | 49172 | 88
the company AND "will be able" | 2644 | 654
they AND "will be able" | 49506 | 87
company AND "could negatively" | 45171 | 38
systems AND "better" | 48689 | 84
revenue AND "could be adversely" | 60216 | 28
customer AND "could adversely" | 65023 | 62
agreement AND "may not be able to" | 64677 | 26
condition AND "may not be able" | 66325 | 60
they AND "could adversely" | 55384 | 30
we AND "adequately" | 18973 | 202
condition AND "could negatively" | 48438 | 34
agreement AND "successfully" | 60942 | 62
they AND "may be able" | 36935 | 44
they AND "successfully" | 58647 | 64
agreement AND "better" | 47681 | 34
party AND "adversely affected" | 61424 | 61
management AND "going concern" | 40515 | 40
party AND "could adversely affect" | 67817 | 55
agreement AND "going concern" | 45856 | 34
customer AND "will be able" | 55974 | 66
executive AND "successfully" | 39665 | 39
customer AND "could adversely affect" | 64351 | 56
we AND "will not be able to" | 15576 | 99
condition AND "may not be able to" | 66240 | 53
each AND "could adversely" | 56972 | 27
systems AND "will be able" | 50292 | 69
we AND "going concern" | 23560 | 65
customer AND "may be able" | 45983 | 73
they AND "better" | 41111 | 37
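To make the two frequency columns concrete, the sketch below counts holder and expression pairs that occur in the same sentence. It assumes one plausible reading of the columns: the total frequency is the number of sentences containing both items, and the document frequency is the number of documents with at least one such sentence. The function name `count_pattern_combinations`, the tokenization, and the toy documents are assumptions for illustration only and do not reproduce the counting procedure used for the table above.

```python
import re
from collections import Counter

def count_pattern_combinations(documents, holders, expressions):
    """Count (holder, expression) pairs co-occurring within one sentence.

    documents: list of documents, each given as a list of sentence strings.
    Returns (total_freq, doc_freq) as Counters keyed by (holder, expression).
    """
    total_freq, doc_freq = Counter(), Counter()
    for sentences in documents:
        seen_in_doc = set()
        for sentence in sentences:
            words = re.findall(r"[a-z]+", sentence.lower())
            word_set = set(words)
            # Pad with spaces so expressions only match on word boundaries.
            joined = " " + " ".join(words) + " "
            for holder in holders:
                if holder not in word_set:
                    continue
                for expression in expressions:
                    if f" {expression} " in joined:
                        pair = (holder, expression)
                        total_freq[pair] += 1
                        seen_in_doc.add(pair)
        # Each pair counts at most once per document.
        doc_freq.update(seen_in_doc)
    return total_freq, doc_freq

# Hypothetical toy corpus of two documents.
documents = [
    ["The company will be able to expand.",
     "We may not be able to repay the loan."],
    ["We believe the company will be able to compete.",
     "The company will be able to grow.",
     "Revenue increased."],
]
holders = ["company", "we"]
expressions = ["will be able", "may not be able"]
total_freq, doc_freq = count_pattern_combinations(documents, holders, expressions)
for pair in sorted(total_freq):
    print(pair, total_freq[pair], doc_freq[pair])
```

Under this reading, a pair with a high total frequency but a low document frequency is concentrated in a few documents.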

The following table lists all 112 explanatory variables (MWEs) in the small dataset (see Section 7.2 for detailed descriptions).

All 112 explanatory variables (MWEs) in small dataset

v1 adequately | v31 could seriously | v61 may make it difficult | v91 ultimately an adverse determination
v2 are not successful | v32 could seriously harm our business | v62 may never achieve | v92 unable or
v3 as a going concern | v33 could significantly and adversely | v63 may not achieve | v93 unable to timely
v4 better | v34 cumbersome | v64 may not be able | v94 uncompetitive or obsolete
v5 can be achieved | v35 despite the | v65 may not be able to | v95 unduly
v6 can be faulty | v36 do not violate | v66 may not be able to identify and | v96 very difficult
v7 can be no assurance that company will be able | v37 doubt about ability to continue as a going concern | v67 may not be successful | v97 vigorously contested and management does not
v8 can efficiently | v38 fully support its contention | v68 might adversely | v98 vigorously defending itself
v9 can successfully | v39 going concern | v69 might be impaired | v99 well tolerated
v10 cautioned not to place undue | v40 highly volatile | v70 must also gain industry acceptance | v100 will always be able to
v11 collectibility is reasonably assured | v41 if a dispute | v71 must successfully | v101 will be able to do so
v12 complaint is without merit | v42 if an event of default | v72 no longer infringes | v102 will be successful
v13 concern about the | v43 if impaired | v73 not adversely affect us | v103 will not able to
v14 could adversely | v44 if successful | v74 not be impaired | v104 will not adversely
v15 could adversely affect | v45 if the company unable | v75 not materially adversely | v105 will not be able to
v16 could adversely affect our | v46 if we do not successfully | v76 not successful | v106 will not be challenged
v17 could adversely affect our business | v47 improperly | v77 not ultimately prevail in this dispute | v107 will not successfully
v18 could adversely affect the | v48 increasingly subject to infringement | v78 outstanding immediately | v108 will successfully
v19 could also adversely | v49 is impaired | v79 reasonably assured | v109 would likely decline
v20 could be adversely | v50 is not effective | v80 satisfactorily | v110 would not adversely
v21 could be adversely affected | v51 materially adversely affect our business | v81 seriously harmed | v111 would not be able to
v22 could be impaired | v52 may adversely impact our business | v82 severely burned patients | v112 would not be able to realize all or part of its
v24 could be negatively | v54 may be adversely | v84 shall not be unreasonably
v25 could declare an event of default | v55 may be adversely affected | v85 shall not be unreasonably withheld
v26 could disrupt our operations | v56 may be impaired | v86 shall not unreasonably
v27 could materially adversely | v57 may be inadequate | v87 significantly harmed
v28 could materially adversely affect the | v58 may be reluctant to purchase services | v88 subject to volatility
v29 could materially and adversely | v59 may be successful | v89 successfully defend a
v30 could negatively | v60 may be unable to | v90 thereby adversely

The following table lists all 174 explanatory variables (MWEs) in the large dataset (see Section 7.3 for detailed descriptions).

All 174 explanatory variables (MWEs) in large dataset

v1 adequately | liability | v121 satisfactorily | v151 well as exclusivity
v2 adversely affected | v32 could be inaccurate | v62 if the company unable | v92 may make it difficult | v122 seriously harmed | v152 well tolerated
v3 are not successful | v33 could be materially adversely affected | v63 if we do not successfully | v93 may never achieve | v123 severely burned patients | v153 will actually be achieved
v4 as a going concern | v34 could be negatively | v64 if we fail to have | v94 may not achieve | v124 shall be effective | v154 will always be able to
v5 because of simple error or mistake | v35 could declare an event of default | v65 if you are successful | v95 may not be able | v125 shall be effective and | v155 will be able
v6 better | v36 could disrupt our operations | v66 improperly | v96 may not be able to | v126 shall be effective only | v156 will be able to do so
v7 can be achieved | v37 could materially adversely | v67 improperly recognized | v97 may not be able to identify and | v127 shall not be unreasonably | v157 will be successful
v8 can be faulty | v38 could materially adversely affect the | v68 increasingly subject to infringement | v98 may not be able to maintain an | v128 shall not be unreasonably withheld | v158 will likely suffer
v9 can be no assurance that company will be able | v39 could materially and adversely | v69 intends to defend the action vigorously | v99 may not be achieved | v129 shall not unreasonably | v159 will not able to
v10 can efficiently | successful | v130 should be evaluated for impairment | v160 will not adversely
v11 can greatly | v41 could negatively | v71 intends to defend this matter vigorously | v101 may not be unreasonably | v131 should be revoked | v161 will not be able to
v12 can not assure you
penalties | v73 is not effective | v103 might adversely | v133 should remain stable | v163 will not be impaired
v14 can successfully compete | v44 could seriously | v74 longer outstanding | v104 might be impaired | v134 should the company be unable to continue as a going concern | v164 will not materially impair its ability
v15 cautioned not to place undue | v45 could seriously harm our business | v75 marginally profitable or unprofitable | v105 much more effective | v135 significantly harmed | v165 will not successfully
v16 collectibility is reasonably assured | v46 could significantly adversely | v76 materially | v136 subject to volatility | v166 will successfully
v17 complaint is without merit | v47 could significantly and adversely | v77 may adversely impact our business | v107 must successfully | v137 successful or profitable | v167 would be able to recover
v18 concern about the | v48 cumbersome | v78 may also be terminated for cause | v108 no longer infringes | v138 successfully | v168 would be honored
v19 could adversely | affect us | v139 successfully defend a | v169 would be impractical
v20 could adversely affect | v50 do not violate | v80 may be able to develop products | v110 not be challenged or invalidated | v140 thereby adversely | v170 would likely decline
v21 could adversely affect its operating | v51 doubt about ability to continue as a going concern | v81 may be adversely | v111 not be impaired | v141 ultimately an adverse determination | v171 would not adversely
v22 could adversely affect our | v52 even worse | v82 may be adversely affected | v112 not materially adversely | v142 unable or | v172 would not be able to
v23 could adversely affect our business | v53 fully support its contention | v83 may be deprived of any value | v113 not materially impair its ability | v143 unable to timely | v173 would not be able to realize all or part of its
v24 could adversely affect the | v54 going concern | v84 may be impaired | v114 not successful | v144 uncompetitive or obsolete | v174 would not materially adversely
v25 could also adversely | v55 have not yet responded to the complaint | v85 may be inadequate | v115 not successfully assert | v145 unduly
v27 could be adversely | v57 if a dispute | v87 may be reluctant to purchase services | v117 now better | v147 very difficult
v28 could be adversely affected | v58 if an event of default | v88 may be successful | v118 outstanding immediately | v148
volatile | v149 vigorously defending itself
v30 could be damaged or disrupted | v60 if successful | v90 may be uncollectible | v120 reasonably assured | v150 we must satisfy
