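A study of semantic prosody of three near-synonyms of cause in Mandarin Chinese under different topics: A quantitative corpus-based perspective - 政大學術集成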

Full text

National Chengchi University
Graduate Institute of Linguistics
Master's Thesis

A study of semantic prosody of three near-synonyms of cause in Mandarin Chinese under different topics: A quantitative corpus-based perspective
(Chinese title: 以量化語料庫方法研究中文“導致”的三個近義詞在不同主題下之語義韻)

Advisors: Dr. Alvin Cheng-Hsien Chen (陳正賢), Dr. I-Ping Wan (萬依萍)
Student: Meng-Chen Wu (巫孟宸)
January 2019

DOI:10.6814/THE.NCCU.GIL.001.2019.A07

A study of semantic prosody of three near-synonyms of cause in Mandarin Chinese under different topics: A quantitative corpus-based perspective

By Meng-Chen Wu

A Thesis Submitted to the Graduate Institute of Linguistics in Partial Fulfillment of the Requirements for the Degree of Master of Arts

January 2019

Copyright © 2019 Meng-Chen Wu. All Rights Reserved.

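Acknowledgements

Now that I have finally reached the last stage of writing this thesis, I find myself unexpectedly calm. I know well that this calm did not come from nowhere: it rests on the help and warmth given to me by the teachers, classmates, friends, and family who have appeared along my academic journey.

First, I would like to express my deepest gratitude to my two advisors, Dr. Alvin Cheng-Hsien Chen and Dr. I-Ping Wan. Without their teaching and patient companionship, this thesis could not have been completed. I thank Dr. Chen for agreeing to advise me across universities and for giving me the valuable opportunity to work as a research assistant. Through that work, I was able to give full play to my passion for programming and to discuss with him, time and again, the theoretical side of computational linguistics and its possible future applications. I am also very grateful to him for continually giving me the most accurate and up-to-date knowledge in quantitative corpus linguistics and, while I was writing, for guiding me with extraordinary patience and care in revising the structure of the thesis and the logic of its argumentation. I thank Dr. Wan, who, besides guiding my thesis writing, constantly shared information on applications that combine artificial intelligence and linguistics, which made me look forward even more to natural language processing and computational linguistics. I thank her for the valuable first-hand news from industry, which helped me better understand the state of the related fields in Taiwan. Finally, I thank her for her moral support, thanks to which I could keep running without giving up throughout the writing of this thesis.

I would like to thank the two members of my thesis committee, Professors 戴智偉 and 許展嘉. I thank them for taking time during a busy semester to attend my oral defense and for offering many constructive and specific suggestions, which pointed out blind spots in the thesis and allowed me to raise its overall quality. I would also like to thank the professors who taught me at the Graduate Institute of Linguistics at National Chengchi University and in the linguistics track of the Department of English at National Taiwan Normal University: Professors 黃瓊之, 何萬順, 蕭宇超, 戴智偉, and 甯俐馨. Your teaching deepened my professional knowledge of linguistics. I especially thank the teaching assistant, my senior schoolmate 惠玲, who tirelessly reminded me of many important administrative matters so that I could complete graduate school smoothly. I thank my classmates of the class of 105 at National Chengchi University, 吳璐, 佩璇, 淑丰, and 迎婕, and my classmates at National Taiwan Normal University, 仲恩, Vicky, 筱芸, and 又萱. Thank you for being in class with me and filling my graduate years with color. I thank the family who love me most: my father, my mother, and my younger sister. Thank you for your love and care and for your moral support. Thank you for always being on my side; I love you. I thank the brothers and sisters of my church for praying for me unceasingly and accompanying my spiritual growth. I thank Steve, Allison, Yawen, Celine, Tina, Naomi, Gloria, Mark, Guru, Philip, Mitzuris, Cristina, Ginny, Jenny, David, Grace, Patrick, Deb, Max, Cate, Jamison, Maisie, Fei, Carmen, Angel, and Yena.

Finally, I give all the glory to God, who loves me most. I am grateful for everything that happened while I was writing this thesis; whether joyful or sad, I know these things are nourishment for my life. Thanks be to the Lord.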

Table of Contents

Abstract
1 Introduction
2 Literature Review
  2.1 Semantic Prosody
  2.2 Context-dependent Semantic Prosody
  2.3 Semantic Prosody of CAUSE
    2.3.1 Significant Collocate Analysis
    2.3.2 Concordance Line Analysis
  2.4 Semantic Network Analysis
3 Methodology
  3.1 Data Collection and Preprocessing
    3.1.1 Data Source
    3.1.2 Data Preprocessing
  3.2 Automatic Concordance Line Analysis
    3.2.1 ANTUSD
    3.2.2 Judgement of Evaluative Meanings of Concordances and the SP of a Node Word
  3.3 Semantic Network Analysis
    3.3.1 Attracted Collocates Extraction
    3.3.2 Semantic Network Construction
4 Results
  4.1 Semantic Prosody of Chansheng
    4.1.1 Concordance Semantic Prosody Analysis
    4.1.2 Network Analysis
  4.2 Semantic Prosody of Niangcheng
    4.2.1 Concordance Semantic Prosody Analysis
    4.2.2 Network Analysis
  4.3 Semantic Prosody of Cucheng
    4.3.1 Concordance Semantic Prosody Analysis
    4.3.2 Network Analysis
  4.4 Internal Summary
5 Discussion
  5.1 Topics Triggering Positive SP
  5.2 Topics Triggering Negative SP
  5.3 Topics Triggering Positive and Negative SP
6 Conclusion
References
Appendix

List of Tables

Table 1 Distribution of article number under each of the nine topics in ADN corpus
Table 2 Number of concordances of each node word across different topics
Table 3 Entry of milian ‘obsessed’ in the ANTUSD
Table 4 Entry of wugu ‘innocent’ in the ANTUSD
Table 5 Frequency distribution of the max value (>0) in the evaluation fields of each word in the ANTUSD
Table 6 Cohen’s kappa statistics on interrater agreement in the three node words
Table 7 An example of four types of frequencies necessary for calculating the Delta P value between the node word chansheng and cuojui ‘illusion’
Table 8 Number of collocates of chansheng across each topic before/after the filtering
Table 9 Number of collocates of niangcheng across each topic before/after filtering
Table 10 Number of collocates of cucheng across each topic before/after filtering
Table 11 Standardized Residuals in a Chi-Square Contingency Table for the SP distribution of the node word chansheng under each topic
Table 12 Negative prototypical collocates under topic society
Table 13 Top 20 positive prototypical collocates under topic entertainment
Table 14 Top 20 negative prototypical collocates under topic international
Table 15 Top 20 positive prototypical collocates under topic sports
Table 16 Positive prototypical collocates under topic finance
Table 17 Top 20 negative prototypical collocates under topic finance
Table 18 Standardized Residuals in a Chi-Square Contingency Table for the SP distribution of the node word niangcheng under each topic
Table 19 Top 20 negative prototypical collocates under topic international
Table 20 Positive prototypical collocates under topic sports
Table 21 Positive prototypical collocates under topic lifestyle
Table 22 Standardized Residuals in a Chi-Square Contingency Table for the SP distribution of the node word cucheng under each topic
Table 23 Top 10 negative prototypical collocates under topic international
Table 24 Semantic prosody profile of the three node words
Table 25 Comprehensive view of SP tendencies across topics

List of Figures

Figure 1. The flowchart of method of procedure
Figure 2. Distribution of the evaluative meanings of the collocate types of the node word chansheng in the ADN
Figure 3. Subset semantic network under topic society with negative collocates highlighted
Figure 4. Subset semantic network under topic entertainment with positive collocates highlighted
Figure 5. Subset semantic network under topic international with negative collocates highlighted
Figure 6. Subset semantic network under topic sports with positive collocates highlighted
Figure 7. Subset semantic network under topic finance with positive collocates highlighted
Figure 8. Subset semantic network under topic finance with negative collocates highlighted
Figure 9. Distribution of the evaluative meanings of the concordances of the node word niangcheng under the seven topics
Figure 10. Subset semantic network under topic international with negative collocates highlighted
Figure 11. Subset semantic network under topic sports with positive collocates highlighted
Figure 12. Subset semantic network under topic lifestyle with positive collocates highlighted
Figure 13. Distribution of the evaluative meanings of the collocates of the node word cucheng in the ADN
Figure 14. Subset semantic network under topic international with negative collocates highlighted

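National Chengchi University Master's Thesis Abstract (Chinese-language abstract, translated)

Institute: Graduate Institute of Linguistics
Thesis title: 以量化語料庫方法研究中文“導致”的三個近義詞在不同主題下之語義韻 (A study of semantic prosody of three near-synonyms of cause in Mandarin Chinese under different topics: A quantitative corpus-based perspective)
Advisors: Dr. Alvin Cheng-Hsien Chen, Dr. I-Ping Wan
Student: Meng-Chen Wu
Abstract (one volume, 20,344 words, six chapters):

The main purpose of this thesis is to investigate how the topic of a text affects the semantic prosody of a lexical item. Topic is defined here as the different categories of articles within the news genre, covering a narrower range of texts than register. We examined the semantic prosody distributions of one mixed-prosody word (產生 chansheng) and two strong-prosody words (釀成 niangcheng, 促成 cucheng) under different topics in Apple Daily (蘋果日報). We determined the semantic prosody distributions of the three words with a rule-based concordance line method and used semantic network analysis to explore their prototypical semantic fields. The results indicate that topic has a moderately strong effect on the semantic prosody of chansheng but only a small effect on niangcheng and cucheng, which suggests that lexical semantic prosody is topic-dependent. Our analysis indicates that the content of news articles under a given topic may be the source that strengthens a positive/negative prosody tendency, and it also reveals the conventionalized usage of a word under that topic.

Keywords: semantic prosody, semantic preference, topic, rule-based evaluation, semantic network analysis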

Abstract

The objective of this study is to investigate how the semantic prosody (SP) of a lexical item may be mediated by the topic of the texts. Topic is defined as different categories of articles in the news genre, covering a smaller scope of texts than register. In particular, we examine the SP distributions of three near-synonyms: a mixed-SP node word, i.e., chansheng, and two strong-SP node words, i.e., niangcheng and cucheng, under different topics in the self-collected Apple Daily News corpus. We determine their SP distributions via a rule-based concordance line analysis on the Apple Daily News corpus and utilize semantic network analysis to further discover their prototypical semantic fields. The results indicate that topic has a moderately strong effect on chansheng, a mixed-SP node word, but a weak effect on niangcheng and cucheng, strong-SP node words, suggesting the topic-dependency of lexical SP. Our analysis suggests that the subject matter of the news articles under a given topic may be the source that intensifies the positive/negative SP tendency of a node word, and it also reveals the conventionalized usage of the node word under the topic.

Keywords: semantic prosody, semantic preference, topic, rule-based evaluation, semantic network analysis

1 Introduction

Semantic prosody (SP) has been a popular topic in corpus linguistics research over the past three decades. Several researchers have studied the SP of a lexical item (i.e., a word or a phrase) based on the evaluative meaning of its concordances or its most typical collocates in corpora and have shed light on how a node item acquires a connotational meaning (Hunston, 1995; Hunston, 2007; Louw, 1993; Louw, 2000; Partington, 1998; Partington, 2004; Sinclair, 1991; Sinclair, 2004; Stubbs, 1995). For instance, Louw (1993) examined the collocational patterns of utterly, bent on, and symptomatic of, and found that they tended to co-occur with words with negative evaluative meaning, indicating negative SP. Also, words such as persistent (Hunston, 2007), commit and dealings (Partington, 1998, pp. 66-74), budge (Sinclair, 2004, pp. 142-147), and cause (Stubbs, 1995) were found to be mainly associated with negative contexts, suggesting an unfavorable connotation.

The fruitful SP studies on English words have also sparked scholars’ interest in Mandarin Chinese (Chinese) words via cross-linguistic analysis. One representative case is Xiao and McEnery (2006), who conducted a comparative study of SP and semantic preference (i.e., semantic features that are associated with the collocates of a node item) of three sets of near-synonyms in English and their notional equivalents in Chinese and found that these synonyms showed different SP patterns and semantic preferences in each language. Specifically, synonyms of cause in Chinese such as zhishi, niangcheng, and zaocheng had a very strong negative SP, with very few concordances with positive/neutral evaluation; cucheng had a very strong positive SP; however, other synonyms such as chansheng and dailai had a rather mixed SP, with similar proportions of negative, positive, and neutral concordances. Similarly, Wei and Li (2014) tested the assumption that the SP of a word in one language would be the same as that of its translation equivalent in another language. They studied the SP and semantic preference of four English words and their translation equivalents in Chinese and revealed that each translation pair had, to a certain degree, different SP patterns and semantic preferences. In addition to comparative studies, Li and Jiao (2017) investigated the SP of Chinese logical resultative formulae (e.g., yinwei ‘because’ and youyu ‘due to’) and discovered that yinwei had a neutral SP and youyu a mixed SP.

Research on SP has also been extended to the analysis of the relationship between SP and register (e.g., categories such as hard news, soft news, and sports reporting are considered registers in the news genre) (Hunston, 2007; O'Halloran, 2007). That is, a lexical item may showcase a positive SP in one register but a negative SP in another. For example, the word erupted had a positive SP in the sports report register, as it was associated with contents of cheering, but a negative SP in the hard news register, where it was associated with natural forces (O'Halloran, 2007). In addition, Partington (2017) suggested topic as a possible factor that influences the SP of words. He found that orchestrate had a positive evaluation in the topics of music and sports, but a negative evaluation in the topic of politics. However, the evidence he provided was limited and the classification of topic type was based on the researcher’s intuition.

Therefore, it is possible that the existence of a mixed-SP word, as revealed in the study of Xiao and McEnery (2006), may have a certain affinity to text categories such as register or topic. In this study, topic is defined as different categories of articles in the news genre, and covers a smaller scope of texts than register. Because a given lexical item may have a mixed SP, and little research has addressed the link between topic and the mixture of SP on a large scale, the current study aims to examine more closely the relationship between TOPIC and SEMANTIC PROSODY. In particular, we investigate three near-synonymous verbs of cause in Chinese, i.e., 產生 chansheng, 釀成 niangcheng, and 促成 cucheng, relating their SP to the topics of their co-occurring texts in a news genre corpus.

The objective of this study is to investigate how the SP of a lexical item may be mediated by the topic of the texts. We examined the SP distributions of chansheng, a mixed-SP node word, and both niangcheng and cucheng, strong-SP node words, under different topics in the Apple Daily News corpus. The relationship between TOPIC and SEMANTIC PROSODY is examined under three hypotheses. The strong hypothesis predicts that topic has an effect on both mixed-SP and strong-SP words. The moderate hypothesis predicts that topic only affects mixed-SP words. The null hypothesis predicts that topic has limited influence on the SP of words. If the strong hypothesis is true, we expect to see the SP distributions of chansheng, niangcheng, and cucheng vary significantly across different topics. If the moderate hypothesis is true, we expect to see only chansheng having a topic-dependent SP distribution. If the null hypothesis is true, we expect the SP distributions of the three target words to be unresponsive to topic change. Finally, support for either the strong or the moderate hypothesis would shed new light on how the semantic representation of a lexical item may be contextually dependent, even at the level of topic.

To explore the SP of a lexical item, two methods were conventionalized in previous studies. Before introducing these methods, we define a few related terms. A node word is a central, target lexical item within a certain window size (e.g., a 3:3 window size indicates a span of three words to the left and right of a node word). Collocates are words that co-occur with a node word within a stretch of text. A concordance is a span of words that contains both a node word and its collocates within a pre-defined window size. Beginning with significant collocate analysis (Stubbs, 1995; Stubbs, 2001a), the collocates of a node word within a certain window size which pass a researcher-defined threshold are classified into one of three evaluations (i.e., positive, neutral, and negative). The SP of a node word is determined by the semantic distribution of these collocates. Another, more common approach to determining the SP of a node word is concordance line analysis (Hunston, 2004; Hunston, 2007; Louw, 1993; Partington, 1998; Partington, 2017; Wei & Li, 2014; Xiao & McEnery, 2006). In this approach, the concordance lines of a node word in a corpus are often manually examined and categorized into positive, neutral, or negative evaluation by researchers. However, concordance line analysis is often time-consuming and labor-intensive, and often needs to limit the samples to a tractable size. Also, criteria for determining SP may be ambiguous across different annotators, leading to an issue of consistency.

In the present study, we adopt a hybrid method: rule-based concordance line analysis. This method is based on concordance line analysis with some modifications. Since it is impossible to manually annotate all the concordances of the three node words (more than 27,000 instances) in the corpus within the limited time, we developed a set of rules to efficiently classify each concordance of a node word into one of three evaluative meaning groups under each topic. The idea was inspired by sentiment analysis, a computational linguistics approach, which has been effectively applied in mining opinions within customer reviews or finance news (Khedr, Salama, & Yaseen, 2017; Pang & Lee, 2008).

If the results from the above method show that there is an intimate interaction between TOPIC and SEMANTIC PROSODY, we can further identify the source that amplifies the SP distributions of the three node words, specifically under each topic in the corpus. As the research of Partington (2004) and Wei and Li (2014) suggests, the SP of a node word has a strong relationship with the semantic features of the collocates of that node word. Based on their proposition, we hope to articulate the connection between the topic-dependent SP and semantic features. Hence, we utilized semantic network analysis to find the top 20 prototypical collocates of the node words under topics which significantly enhance either the positive or the negative SP tendency of a node word. Subsequently, we categorized the semantic features of those prototypical collocates and examined a few concordances of the node words under those topics, so as to verify whether such semantic features contribute to the rise of positive/negative SP.
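The terms defined in this chapter (node word, window size, collocates, concordance) can be made concrete with a minimal sketch. It assumes the text has already been word-segmented into a list of tokens; the function names, the romanized example sentence, and the default 3:3 window are illustrative assumptions and are not taken from the procedure used in this study.

```python
from collections import Counter


def concordances(tokens, node, left=3, right=3):
    """Yield (left context, node, right context) for each occurrence of `node`.

    `left`/`right` give the window size, e.g. 3:3 means three tokens on
    either side of the node word.
    """
    for i, token in enumerate(tokens):
        if token == node:
            yield tokens[max(0, i - left):i], token, tokens[i + 1:i + 1 + right]


def collocate_counts(tokens, node, left=3, right=3):
    """Count every token that falls inside the window of any occurrence of `node`."""
    counts = Counter()
    for left_ctx, _, right_ctx in concordances(tokens, node, left, right):
        counts.update(left_ctx)
        counts.update(right_ctx)
    return counts


# Hypothetical, already word-segmented sentence (romanized):
# 'this heavy rain caused a serious disaster'
tokens = ["zhe", "chang", "dayu", "niangcheng", "yanzhong", "zaiqing"]

hits = list(concordances(tokens, "niangcheng"))
counts = collocate_counts(tokens, "niangcheng")
```

Here `hits` holds one concordance for the node word niangcheng, and `counts` lists the collocates inside its window. In significant collocate analysis the collocates would then be filtered by an association threshold; in concordance line analysis each element of `hits` would be judged as a whole.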

2 Literature Review

2.1 Semantic Prosody

The notion of semantic prosody (SP) was first made public by Louw, who described it as “a consistent aura of meaning with which a form is imbued by its collocates” (1993, p. 157) and “a form of meaning which is established through the proximity of a consistent series of collocates” (2000, p. 57). A node item may acquire a connotational attribute of positive or negative evaluative meaning from the collocates it habitually co-occurs with. From another perspective, Partington added that SP is “the spreading of connotational coloring beyond single word boundaries” (1998, p. 68). Thus, the favorable/unfavorable meaning of a node word is the product of the node word coordinating with its collocates within a certain span. The main function of SP is to express the language user’s evaluation, attitude, or opinion toward a given topic in a pragmatic setting (Louw, 2000; Sinclair, 2004, p. 34; Stubbs, 2001b, p. 98).

Another important notion in research on SP is semantic preference, that is, the common semantic features of the collocates associated with a node item (Stubbs, 2001b). For example, Wei and Li (2014) compared the semantic preferences of the English-Chinese translation pair orchestrate and cehua and found that the collocates of orchestrate typically come from the semantic subsets of (1) destructive/violent operations and events and (2) political and social activities/events, and the collocates of cehua from (1) destructive/violent operations and events, (2) malicious plots, and (3) constructive events/activities. Partington (2004, p. 151) further explained the interplay between semantic prosody and semantic preference: semantic prosody “dictates the general environment which constrains the preferential choices of the node item,” while semantic preference “contributes powerfully to building semantic prosody.” Therefore, there is an intimate relationship between SP and semantic preference, such that the SP of a node word is conditioned by the semantic fields of its co-occurring words.

In terms of the grading scheme for the evaluative meaning of SP, most scholars adopt a three-way distinction: positive, neutral, and negative (Li & Jiao, 2017; Partington, 2004; Stubbs, 1995; Wei & Li, 2014; Xiao & McEnery, 2006). On the other hand, Sinclair (2004) regarded SP as attitudinal, and thus the categorical grading scheme is not applicable. For instance, the sequence [attempt + negation + budge], as in the sentence “the kid tried to roll the stone, but it did not budge,” expresses a sense of frustration at the failure of an endeavor on something difficult. The SP of this sequence is rather complex and cannot be simply classified into one of the three evaluative categories.

There are also two streams of methods to discover the SP of a given node item: significant collocate analysis and concordance line analysis. The first method focuses on the consistent evaluative polarity of collocates of a node item. For instance, Stubbs (1995) extracted the significant collocates of a given node item within a certain span (e.g., 3 words before and after a node item) and determined the SP of the node word according to the evaluative meaning that the majority of the collocates reveal. However, such a method may ignore the phenomenon of evaluative embedding discussed in Morley and Partington (2009) and Partington (2017). The concept of evaluative embedding notes that evaluative polarity is contingent upon the higher syntactic level in a string of words. Let us consider an example provided by Partington (2017, p. 196): in “Global poverty is falling rapidly,” while the phrase global poverty conveys a negative evaluation, the overarching meaning of the sentence is instead positive.

On the other hand, concordance line analysis focuses on the consistent evaluative polarity of the concordances in which a node item occurs. Researchers adopting this method extract either all or a randomly sampled subset of the concordances of a given node item and manually examine the evaluative meaning each concordance conveys (Louw, 1993; Louw, 2000; Partington, 1998; Partington, 2004; Sinclair, 2004; Wei & Li, 2014; Xiao & McEnery, 2006). Nonetheless, concordance line analysis has limited scalability, since it is time-consuming to examine every concordance of a node item. Meanwhile, the operational criteria for determining the evaluative meaning of concordances may not be consistent and may differ from one scholar to another.

2.2 Context-dependent Semantic Prosody

Researchers have also drawn attention to the connection between SP and text category. O'Halloran (2007) proposed the notion of register prosody and stated that “some prosodies have probabilistic relationships to register.” At a level below genre, a register is defined as a “configuration of meanings that are typically associated with a particular situational configuration of field, tenor, and mode” (Halliday & Hasan, 1989, p. 89). Thus, within the genre of news journalism, categories such as hard news, soft news, sports report, and recipes are considered registers (O'Halloran, 2007). In his study, the concordances of words such as erupted and simmering were examined under different registers. The results showed that erupted had negative SP under the hard news register and positive SP under the sports report register; simmering was associated with negative SP under the hard news register but neutral SP under the recipe register. This suggested that both words have a register-dependent SP. In parallel, Hunston (2007) pointed out that the word cause lost its negative SP and gained a neutral SP in the scientific register. She suggested that the different SP phenomena of cause were the results of register selection.

Furthermore, at a level below register, Partington (2017) pointed out the role of topic in the SP of the word orchestrate. Under the topics of music and sports, since the word was used in its literal meaning ‘combine harmoniously,’ it transmitted a positive SP; under the topic of politics, it was used metaphorically to mean ‘plotting,’ thus leading to a negative SP. He claimed that the two senses of the node word orchestrate are the cause of the topic-dependent SP. However, there are issues with his argumentation: only a small number of concordances were examined, and the judgement of the topic type of the concordances was purely subjective. What sets our study apart is that (1) the three target node words do not have metaphorical senses, (2) a large number of instances are investigated, and (3) each article in the news genre corpus was classified by the editors into different topics at the time of publication.

2.3 Semantic Prosody of CAUSE

2.3.1 Significant Collocate Analysis

Stubbs (1995) studied the SP of the node word cause in both its nominal (e.g., the cause of) and verbal (e.g., cause, caused) uses by examining the 120-million-word Cobuild Corpus and the 1-million-word LOB Corpus. In the former corpus, he identified the top 50 collocates within the 4:4 window (four words to the left and right of the node word) that were most associated with the node word cause in terms of t-scores and MI scores, and the top 50 frequent collocates within the 3:3 window. Among the top 50 collocates ranked by t-scores, the content words were predominantly negative, e.g., anxiety, concern, crisis, damage, distress, embarrassment, aids, blood, cancer, and death. Among the adjective collocates, however, more were neutral, e.g., common, considerable, great, major, root. On the other hand, the top 50 collocates ranked by MI scores revealed more low-frequency collocates with a strong association with the node word, e.g., consternation, grievous, uproar; célèbre, irreparable. With regard to the top 50 frequent collocates, many of them have a negative meaning, e.g., problem(s), damage, death(s), disease, cancer, pain, trouble, and serious (Stubbs, 1995, p. 15).

In general, the node word cause has a clear negative SP as it co-occurs with negative words, which is attested in collocates based on both statistical association and raw frequencies. However, Stubbs (1995) pointed out the disadvantage of using significant collocate analysis. Take the top 12 frequent collocates for example: although 8 of the 12 collocates are negative, the other words, e.g., great, major, and common, are apparently not negative. These words are in fact not modifiers of the node word cause, but of other nouns that happened to be included in the 4:4 window. For instance, the word natural in cause for natural great concern. Also, they are not the direct object of the verb cause, but the adjectival modifier of the object argument of cause, such as grievous in cause grievous bodily harm (Stubbs, 1995). Moreover, the researcher needed to manually remove the highly frequent function words within the window, which were not considered as contributing features to the SP of cause.

2.3.2 Concordance Line Analysis

Xiao and McEnery (2006) conducted a comparative study of the SP of the cause group: cause and its near-synonyms (i.e., arouse, lead to, result in/from, give rise to, bring about) in English and eleven close translations of cause in Mandarin Chinese (i.e., zhishi, niangcheng, zaocheng, yinfa, daozhi, dailai, yinqi, chansheng, xingcheng, cushi, cucheng). For the English words, they used the Freiburg-LOB Corpus of British English (FLOB) and the Freiburg-Brown Corpus of American English, and supplemented these corpora with the Brown University Corpus of American English and the British National Corpus when the former two corpora could not provide enough occurrences of the target words. For Mandarin Chinese, they used the Lancaster Corpus of Mandarin Chinese (LCMC) and supplemented the corpus with the People's Daily Corpus (2000).
Adopting the concordance line analysis, they investigated the SP of the target words according to the distributions of the evaluative meanings of all the concordances of the cause group in the corpora. Moreover, they used a 4:4 window size to capture the significant collocates (with an MI score greater than 3 and a joint frequency of at least 3) of the target words in order to investigate the collocational behavior of the cause group.

According to the proportions of evaluative meaning of the concordances reported in Xiao and McEnery (2006, p. 117), the synonym arouse had a strong negative SP similar to that of cause (65% of the concordances of arouse were negative and 78% of the concordances of cause were negative), while the other synonyms such as lead to, result in/from, and give rise to had weaker negative SP (49%, 47%, and 46% negative concordances respectively), and bring about had a mixed negative and positive SP (38% negative and 46% positive).

On the other hand, five of the eleven members of the cause group in Chinese generally showed a strong negative SP, namely zhishi, niangcheng, zaocheng, yinfa, and daozhi. However, the words dailai, yinqi, chansheng, and cushi had inconsistent SP: dailai had 49% negative, 27% positive, and 24% neutral concordances; yinqi had 43% negative, 15% positive, and 42% neutral concordances; chansheng had 31% negative, 24% positive, and 45% neutral concordances; cushi had 5% negative, 59% positive, and 36% neutral concordances. The node word xingcheng had a strong neutral SP with 64% neutral concordances, and cucheng had a very strong positive SP with 98% positive concordances. According to these results, Xiao and McEnery (2006) concluded that SP is also observable in Chinese, and that the near-synonyms of cause cannot be substituted for each other in either language because of their different SPs and collocational patterns.

The SP profiles of the three target node words in this study are based on their findings: chansheng, the mixed-SP node word, and niangcheng and cucheng, the strong-SP node words.

2.4 Semantic Network Analysis

To conduct a semantic network analysis (SNA), we first need to identify the significant collocates (i.e., collocations) of node words. The notion of collocation refers to the relation between two lexical items that co-occur 'with greater than random probability in its (textual) context' (Hoey, 1991, pp. 6-7). To capture collocation, many association measures (AMs) have been proposed. The AMs can further be classified into bidirectional and directional ones. The bidirectional AMs include MI, PMI, chi-square (χ²), t, z, LLR, and the Fisher exact test; the directional AMs include Delta P (Evert, 2008; Gries, 2013; Manning & Schütze, 1999, pp. 169-183).

If the significant relation holds between a grammatical construction and a lexical item instead, such a pattern is called collostruction (Stefanowitsch & Gries, 2003). Here, the lexical items appearing in the grammatical slots of a construction are called collexemes. In parallel to collocation, the same AMs introduced above can also be applied to find collostructions, and such an approach is called collostructional analysis (Stefanowitsch & Gries, 2003).

Ellis and his collaborators have utilized collostructional analysis and the SNA to study the learning of verb argument constructions (VACs) in language acquisition, e.g., the [Verb Locative] construction (A kangaroo jumped over the fence) and the [Verb + Object + Object] construction (She baked me a cookie) (Ellis & Ogden, 2017; Ellis, Römer, & O'Donnell, 2016). They were interested in the statistical patterns in language use that may account for the acquisition of verbs in different types of VAC, including verb frequency, verb contingency, and semantic prototypicality. They derived the values of these three factors from the British National Corpus (BNC).

Verb contingency is the degree of faithfulness a verb has toward a construction. In the collostructional analysis, they used Delta P as the AM to test how strongly a verb (i.e., collexeme) is attracted to a construction, or how likely a verb can be predicted given the construction. The formula of Delta P is detailed in section 3.5 (Gries, 2013).

Semantic prototypicality of a verb is defined by its degree of betweenness centrality (BC) within the semantic network of a VAC where that verb appears (Ellis et al., 2016, pp. 82-86). Verbs that are attracted to a particular VAC can together form a semantic network belonging to that construction. Originally, BC was developed to quantify an individual's contribution to a solution in a human communication network (Freeman, 1977). Likewise, a verb's relation to other verbs in the semantic network of a VAC can be expressed as a degree of BC: the number of shortest paths between other verbs that pass through that verb, divided by the number of shortest paths between any other pairs of verbs. The higher the degree of BC a verb has, the more prototypical the verb is. As in prototype theory, where the prototype of a category often summarizes most of the features shared by the members of the category (Rosch, 1975), a prototypical verb should bear the most resemblance to most of the other verbs in a semantic network (Ellis et al., 2016, pp. 92-94).

Ellis and Ogden (2017) and Ellis et al. (2016) found that the three factors are crucial in both L1 and L2 acquisition. In L1 acquisition, the verb slots in children's VACs were mainly occupied by verbs that were more frequent and more faithful to their VACs in adult language, and that were semantically prototypical. These verbs were also believed to drive the acquisition of VACs. In L2 acquisition, Ellis et al. (2016) found that the first-acquired verb in each type of VAC (1) had more frequent occurrences in the verb slot of that construction, (2) was more faithful to that construction, and (3) was a more prototypical representation of the construction's functional meaning. Moreover, the acquisition of the verb in a construction related significantly to its frequency and contingency, but not to its semantic prototypicality.

In our study, we employed the same SNA approach illustrated in Ellis et al. (2016, pp. 82-86) to retrieve the prototypical collocates of a node word. Subsequently, we investigated the semantic preference of those prototypical collocates so that we could grasp the typical contents with which a given node word is often associated in a topic.

3 Methodology

Two analyses were conducted in the current study: automatic concordance line analysis and semantic network analysis. Figure 1 presents a flowchart of our procedure. The details of each step are explained in the following sub-sections.

[Figure 1. Flowchart of the procedure: (1) data collection and preprocessing (data source, data preprocessing); (2) rule-based concordance analysis (ANTUSD, judgement of evaluative meanings of concordances); (3) semantic network analysis (attracted collocates extraction, semantic network construction).]

3.1 Data Collection and Preprocessing

3.1.1 Data Source

All the news articles [1] published online from 2003-05-02 to 2018-07-19 in the Apple Daily News [2] (ADN) were automatically collected using a self-developed web crawler written in Python. We categorized the articles in the ADN corpus into different topics [3] based on the annotations from the news provider. Among these topics, we filtered out general topics that included cross-topic contents (e.g., yaowen 'important news', difangzonghe 'sum of local news', luntan 'forum', and luntanyuzhuanlan 'forum') and topics that had few articles (e.g., nuanliu 'warm current' and pinglunzhenxian 'Apple comment ranks'). Also, we merged topics with similar contents, i.e., fating 'court' and tousu 'file a lawsuit' as topic forum. In the end, seven topics were chosen for our analysis, i.e., society, politics, entertainment, international, sports, finance, and lifestyle.

[1] The total number of articles from 2003-05-02 to 2018-07-19 is 1,354,185.
[2] https://tw.appledaily.com/
[3] The topic categories include: 生活 shenghuo 'life', 地方綜合 difangzonghe 'sum of local news', 投訴 tousu 'sue', 法庭 fating 'court', 社會 shehui 'society', 政治 zhengzhi 'politics', 要聞 yaowen 'important news', 搶鏡頭 qiangjingtou 'scene stealing', 暖流 nuanliu 'warm current', 號外 haowai 'extra', 網路新聞 wangluxinwen 'internet news', 論壇 luntan 'forum', 頭條 toutiao 'headline', 蘋果爆破社 pingguobaoposhe 'Apple blasting club', 蘋論陣線 pinglunzhenxian 'comment ranks', 娛樂 yule 'entertainment', 蘋果國際 pingguoguoji 'international', 體育 tiyu 'sports', 財經 caijing 'finance', 地產 dichan 'home', 副刊 fukan, 論壇與專欄 luntanyuzhuanlan 'forum'.

3.1.2 Data Preprocessing

First, we removed all the metadata in the articles, including hyperlinks, the author names of the articles, and image links and captions. Second, the jiebaR package [4] was utilized to conduct word segmentation and part-of-speech (POS) tagging. The POS information was useful for retrieving content words (i.e., nouns, verbs, adjectives, and adverbs), which were crucial to our later collocate extraction procedure. Also, based on the punctuation, we were able to break texts into chunk-sized units. A chunk is delimited by punctuation marks [5] and symbols, and may correspond to a sentence, clause, or phrase. The statistics of the article number and token number in each topic are summarized in Table 1.

[4] https://github.com/qinwf/jiebaR The typical tags in jiebaR include n (noun), v (verb), a (adjective), t (time), r (pronoun), m (quantifier), d (adverb), p (preposition), c (conjunction), u (particle), e (exclamation), o (onomatopoeia), x (symbol), and w (punctuation).
[5] The punctuation marks are those inside the square brackets: [─，、:：；。！？!?]; the symbols include ◎ and ●.
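The chunking step can be illustrated with a minimal Python sketch. It is not the preprocessing code used in this study (which relied on jiebaR); the regular expression and the function name `split_into_chunks` are assumptions introduced here, built only from the delimiters listed in footnote 5.

```python
import re

# Delimiters taken from footnote 5 (punctuation marks plus the symbols ◎ and ●).
# The regular expression itself is an illustrative assumption.
CHUNK_DELIMITERS = re.compile(r"[─，、:：；。！？!?◎●]")


def split_into_chunks(paragraph: str) -> list[str]:
    """Split one paragraph into non-empty chunks at the delimiters."""
    pieces = CHUNK_DELIMITERS.split(paragraph)
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    # Example (2b) yields two chunks.
    print(split_into_chunks("砸中在下方的工人，釀成意外。"))
```

In this sketch the delimiters themselves are discarded, so each chunk holds only the text between two delimiters.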

Table 1
Distribution of article number under each of the nine topics in ADN corpus

| Topic name | Society | Politics | Enter. | Inter. | Sports | Finance | Lifestyle | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of articles | 62527 | 36923 | 225314 | 97487 | 157972 | 139069 | 151729 | 871021 |
| % | 6.47 | 3.82 | 23.32 | 10.09 | 16.35 | 14.39 | 15.71 | 100 |
| Token number | 17979820 | 11731761 | 52569840 | 23767343 | 36180487 | 56116376 | 64776597 | 263122224 |
| Average token number | 287.55 | 317.74 | 233.32 | 243.8 | 229.03 | 403.51 | 426.92 | 305.98 |

Third, we extracted the concordances of the three node words for the SP analyses. We defined the span of a concordance line as a 1:1 chunk-based window. The concordance lines did not span paragraph boundaries. Examples are provided in (2):

(2)
a. 今年賽制轉變產生差異在於成績領先車手先出發，可能因賽道上抓地力較差而吃虧，
jinnian saizhi zhuanbian chansheng chayi zaiyu chengji lingxian cheshou xian chufa, keneng yin saidao shang zhuadili jiaocha er chikui,
This-year game-system change cause difference at score leading driver first start, may because track on grip worse and suffer-loss,
'The difference in the system of this year's competition is that the drivers with the leading scores start first and may suffer disadvantage from poor grip on the track,'

b. 砸中在下方的工人，釀成意外。
za zhong zai xiafang de gongren, niangcheng yiwai.
Hit among at below ATTR worker, cause accident
'hitting the worker below, causing the accident.'

c. 美國和俄羅斯居中斡旋、促成的敘利亞停火協議，昨凌晨零時生效，
meiguo he eluosi juzhong woxuan, cucheng de xuliya tinghuo xieyi, zuo lingchen lingshi shengxiao,
US and Russia amidst mediate, cause/promote DE Syria ceasefire agreement, yesterday early-morning 12:00 am take-effect
'The ceasefire agreement of Syria mediated and caused/promoted by the US and Russia took effect at 12:00 am early yesterday morning'

In (2a) and (2b), since the current chunk of the node word chansheng/niangcheng was at the beginning/end of a paragraph, only the current chunk (i.e., the chunk where the node word occurs) and the following/preceding chunk were included in the concordance line. When the chunk of the node word was surrounded by punctuation, as with cucheng in (2c), the concordance line included the current chunk as well as one chunk preceding and one chunk following it. Table 2 shows the number of concordances of each node word under the seven topics.
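The 1:1 chunk-based window described above can be sketched as follows. This is a minimal illustration rather than the extraction script of this study; the function name `concordance_lines` and the data layout (one paragraph as a list of chunks, each chunk a list of segmented words) are assumptions made for the example.

```python
def concordance_lines(paragraph_chunks, node_word, span=1):
    """Return the concordance lines of node_word within ONE paragraph.

    paragraph_chunks: list of chunks, each chunk being a list of segmented words.
    span: number of chunks kept on each side of the current chunk (1:1 by default).
    Because only one paragraph is passed in, no line can cross a paragraph boundary.
    """
    lines = []
    for index, chunk in enumerate(paragraph_chunks):
        if node_word not in chunk:
            continue
        # Clip the window at the paragraph edges, as in examples (2a) and (2b).
        start = max(0, index - span)
        end = min(len(paragraph_chunks), index + span + 1)
        lines.append(paragraph_chunks[start:end])
    return lines


if __name__ == "__main__":
    paragraph = [["砸中", "在", "下方", "的", "工人"], ["釀成", "意外"]]
    # The node word sits in the last chunk, so only the preceding chunk is added.
    print(concordance_lines(paragraph, "釀成"))
```

One simplification should be noted: a chunk that contains the node word more than once still yields a single concordance line in this sketch.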

Table 2
Number of concordances of each node word across different topics

| Topic name | Society | Politics | Enter. | Inter. | Sports | Finance | Lifestyle |
| --- | --- | --- | --- | --- | --- | --- | --- |
| chansheng | 1102 | 1112 | 2074 | 2808 | 1730 | 5593 | 8999 |
| niangcheng | 606 | 50 | 120 | 556 | 125 | 73 | 141 |
| cucheng | 29 | 403 | 631 | 586 | 200 | 805 | 149 |

3.2 Rule-based Concordance Line Analysis

3.2.1 ANTUSD

We classified the collocates in each concordance line into one of three evaluative meaning groups according to the manually annotated sentiment labels (i.e., positive, neutral, and negative) of words in the ANTUSD [6] (Wang & Ku, 2016). The ANTUSD is the combination of words from the National Taiwan University Sentiment Dictionary (NTUSD), the NTCIR MOAT Task Dataset (NTCIR), the Chinese Opinion Treebank (COT), and the Advanced Chinese Bi-Character Word Morphological Analyzer Corpus (ACBiMA). It provides five sentiment-related fields for each annotated word: Pos, Neu, Neg, Non, and Not. These five fields are the number of times a given word was tagged as positive, neutral, negative, non-opinion word, and non-word respectively.

The NTUSD was the former version of the ANTUSD, containing 2812 positive and 8276 negative words, and was constructed for the purpose of sentiment analysis (Ku, Liang, & Chen, 2006). The sentiment of each word was determined by the majority among the three annotators.

[6] http://academiasinicanlplab.github.io/

The two sentiment corpora are the NTCIR MOAT Task Dataset (NTCIR) and the Chinese Opinion Treebank (COT). In the two corpora, three annotators labeled each sentence as positive, neutral, or negative based on the majority decision. Meanwhile, each annotator individually labeled the words that he/she considered as sentiment words in each sentence with one of the three evaluations.

Lastly, ACBiMA is a dataset of Chinese word morphological types. The ACBiMA contains about 11 thousand Chinese words along with their morphological types, and was developed to improve sentiment analysis by providing morphological information (Huang, Chen, & Kong, 2015). The words in the dataset were randomly extracted from the Chinese Treebank 5.1 [7] and the NTCIR CIRB040 news corpus (Huang, Ku, & Chen, 2010), and were labeled as positive, neutral, negative, non-opinioned, and not-a-word by the agreement of the two annotators.

Due to its multiple sources, there were words with duplicated entries or with multiple values under the five fields in the ANTUSD. We followed a procedure to assign each matched collocate a consistent sentiment label. First, if a collocate was found to have only one entry in the ANTUSD (94.5% of words have only one entry), we assigned the collocate the sentiment label of the max value within the evaluation fields (i.e., Pos, Neu, and Neg). However, in cases where two evaluation fields shared the same max value, we assigned the collocate the positive label if Pos and Neu shared the same max value; the negative label if Neg and Neu shared the same max value; and the neutral label if Pos and Neg shared the same max value. Let us consider one example in Table 3. Since the word milian 'obsessed' was tagged 2 times as positive and 2 times as negative, it was labeled as neutral.

[7] https://catalog.ldc.upenn.edu/LDC2005T01

Table 3
Entry of milian 'obsessed' in the ANTUSD

| Word | Pos | Neu | Neg |
| --- | --- | --- | --- |
| 迷戀 milian | 2 | 1 | 2 |

Second, if a collocate was found to have more than one entry in the ANTUSD, we chose the entry with the max value within the evaluation fields and assigned the collocate the sentiment label based on that entry. Consider the example in Table 4. There were two entries in the ANTUSD for the word wugu 'innocent'. The word was labeled as negative based on the first entry.

Table 4
Entry of wugu 'innocent' in the ANTUSD

| Word | Pos | Neu | Neg |
| --- | --- | --- | --- |
| 無辜 wugu | 1 | 0 | 3 |
| 無辜 wugu | 2 | 0 | 2 |

Third, if a matched collocate was found to have a value of 0 across the evaluation fields, it was not considered a sentiment word, and thus no sentiment label was given. The frequency distribution of Pos, Neu, and Neg with the max value among the evaluation fields of each word in the ANTUSD is summarized in Table 5.
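The three labeling steps can be summarized in a short Python sketch. It is only an illustration of the procedure described above, not the script used in this study; the function name `label_from_entries` and the tuple layout `(Pos, Neu, Neg)` are assumptions. The three-way tie is not covered by the procedure (Table 5 reports no such case), so its treatment here is likewise an assumption.

```python
def label_from_entries(entries):
    """Assign one sentiment label to a word from its ANTUSD entries.

    entries: list of (pos, neu, neg) count tuples, one tuple per entry.
    Returns 'positive', 'neutral', 'negative', or None (not a sentiment word).
    """
    # Step 2: with several entries, keep the entry holding the largest value.
    pos, neu, neg = max(entries, key=max)
    top = max(pos, neu, neg)
    # Step 3: all-zero evaluation fields mean no sentiment label.
    if top == 0:
        return None
    winners = {
        name
        for name, value in (("positive", pos), ("neutral", neu), ("negative", neg))
        if value == top
    }
    # Step 1: a single max value decides the label directly.
    if len(winners) == 1:
        return winners.pop()
    # Tie rules: Pos = Neu -> positive; Neg = Neu -> negative; Pos = Neg -> neutral.
    if winners == {"positive", "neutral"}:
        return "positive"
    if winners == {"negative", "neutral"}:
        return "negative"
    # Pos = Neg tie, or the three-way tie (ASSUMPTION: also treated as neutral).
    return "neutral"


if __name__ == "__main__":
    print(label_from_entries([(2, 1, 2)]))             # milian, Table 3 -> neutral
    print(label_from_entries([(1, 0, 3), (2, 0, 2)]))  # wugu, Table 4 -> negative
```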

Table 5
Frequency distribution of the max value (>0) in the evaluation fields of each word in the ANTUSD

| Evaluation fields | pos | neu | neg | pos and neu | neu and neg | pos and neg | pos, neu and neg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number of max value | 11489 | 1791 | 13243 | 244 | 145 | 43 | 0 |

Note. There are 2708 instances with 0 across the evaluation fields.

3.2.2 Judgement of Evaluative Meanings of Concordances and the SP of a Node Word

With the sentiment labels of the collocates established from the ANTUSD, the evaluative meaning of a given concordance was automatically determined. Inspired by the method of Li and Jiao (2017), we developed a set of heuristics to optimize the accuracy of the judgement of the evaluation of a concordance. First, if a concordance consisted of collocates which were uniformly tagged as positive/neutral/negative, that concordance was classified as positive/neutral/negative. Second, if none of the collocates in a concordance were found in the ANTUSD, it was classified as neutral. Third, since the three target node words are all verbs, their objects are the main contributors to the evaluative meaning of concordances. Thus, if one of the collocates following a node word in the current chunk was tagged as positive/negative, that concordance was classified as a positive/negative evaluation; however, if there were both positive and negative collocates following the node word, the evaluation of that concordance was determined by the majority of the positive/negative sentiment tags. Moreover, we took into account the effect of evaluative embedding discussed in Partington (2017), who pointed out that if a node word was preceded by a negator (shown in Appendix 1), either the positive or the negative evaluative meaning of a concordance determined from the above rules would be reversed (cf. (3a) [8]). Similarly, if a node word was preceded by a word expressing the idea of prevention (shown in Appendix 2), the originally negative evaluation of a concordance would be reversed to positive (cf. (3b)). Finally, if the chunk after the node word started with a concessive conjunction, e.g., dan/danshi 'but', and there were negative collocate(s) at the same time, the concordance would be classified as a negative evaluation (cf. (3c)).

(3)
a. 快速上車離去，幸好沒 (釀成) 更火爆的衝突。
kuaisu shangche liqu, xinghao mei (niangcheng) geng huobao de chongtu.
Quickly go-on-car leave, fortunate not cause more hot ATTR conflict.
'Get on the car quickly, and fortunately no more serious conflicts were caused.'

b. 有通鼻、預防鼻子因溫度變化 (產生) 過敏反應的功效,
you tongbi, yufang bizi yin wendu bianhua (chansheng) guomin fanying de gongxiao,
have opening-nose, prevent nose because temperature change cause allergic reaction ATTR effect,
'It has the effects of opening nose and preventing the allergic reaction of the nose due to temperature changes.'

c. 他們也希望藉由歌曲 (促成) 兩韓統一，但擔心留在北韓的家人會遭到當局報復。
tamen ye xiwang jieyou gequ (cucheng) lianghan tongyi, dan danxin liuzai beihan de jiaren hui zaodao dangju baofu.
3-PL also hope through song cause two-Koreas unification, but worry stay North Korea ATTR family will suffer authority retaliate
'They also hope that the two Koreas will be united through songs, but they fear that the family members who stay in North Korea will be retaliated by the authorities.'

[8] In (3), collocates tagged as negative were underlined and those tagged as positive are in bold, and the node words were in parentheses.

In our first analysis, the rule-based concordance line analysis, the SPs of the node words were determined based on the distribution of the three evaluative meanings of the concordances across different topics. In particular, with a cut-off rate of 50 percent (Li & Jiao, 2017), the SP of a node word was determined if one of the evaluative meanings exceeded 50 percent. If the proportions of the three evaluative meanings of a node word were all below 50 percent, the node word was claimed to have a mixed SP.

To evaluate the credibility of our rule-based concordance line analysis, we first manually annotated one hundred random concordances of each of the three node words. The number of these concordances under a topic was proportional to the share of that topic's concordances within the corpus. Then, we used Cohen's kappa coefficient (κ) to test the interrater agreement between the rule-based and the human-annotated results. For each of the three node words, Cohen's kappa statistics showed moderate agreement between the two types of annotated results, as shown in Table 6.

Table 6
Cohen's kappa statistics on interrater agreement in the three node words

| Node word | z | p | kappa |
| --- | --- | --- | --- |
| chansheng | 6.6 | <.001 | 0.51 |
| niangcheng | 4.85 | <.001 | 0.52 |
| cucheng | 4.14 | <.001 | 0.46 |

3.3 Semantic Network Analysis

3.3.1 Attracted Collocates Extraction

Our second analysis, the semantic network analysis, was built upon the collocates attracted to a given node word. For all the content words (i.e., nouns, verbs, adjectives, and adverbs) co-occurring with the node words within the 1:1 chunk-based window, we used Delta P to assess the directional association between a node word and its collocates under each topic. It measured how faithful a collocate was toward a node word (Gablasova, Brezina, & McEnery, 2017; Gries, 2013). If the Delta P value of a collocate was greater than 0, it was attracted by the node word; conversely, if Delta P was lower than 0, the collocate was repulsed by the node word. We defined as significant collocates those collocates with Delta P > 0 and, at the same time, with a joint frequency > 1, in order to remove hapax legomena (i.e., collocates co-occurring with the node word only once in the corpus).

Let us consider an example of the node word chansheng and its collocate cuojue 'illusion' under the topic society in our corpus. The computation of the Delta P value relied on the distributional information of the node word and its collocate, often cross-tabulated as a contingency table as shown in Table 7. Here, a was the joint frequency of chansheng 'cause' and cuojue 'illusion'; b was the frequency of the node word co-occurring with collocates other than cuojue; c was the frequency of the collocate co-occurring with the other node items. Finally, d was the corpus size N – (a+b+c), where N in our analysis was defined as the total number of words tagged as verbs since our target node item was on the clause level (Stefanowitsch & Gries, 2003, p. 218). Based on the formula provided in (4), the Delta P value between the node word chansheng 'cause' and cuojue 'illusion' was 0.05.

(4) Delta P Formula

$$\Delta P = P(O \mid C) - P(O \mid \neg C) = \frac{a}{a+b} - \frac{c}{c+d}$$

Table 7
An example of the four types of frequencies necessary for calculating the Delta P value between the node word chansheng and cuojue 'illusion'

| | Outcome: cuojue | No Outcome: not cuojue |
| --- | --- | --- |
| Cue: chansheng | a = 40 | b = 758 |
| No Cue: other node words | c = 191 | d = 2,794,678 |

Note. N = 2,795,667

The type/token frequencies of the co-occurring words and attracted collocates of the node words chansheng, niangcheng, and cucheng are shown in Table 8, Table 9, and Table 10 respectively. Attracted collocates refer to those which passed the cut-offs of Delta P and joint frequency.

Table 8
Number of collocates of chansheng across each topic before/after the filtering

| Topic | Society | Politics | Enter. | Inter. | Sports | Finance | Lifestyle |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Type number (co-occurring) | 4389 | 3689 | 6281 | 8309 | 4659 | 9624 | 13487 |
| Type number (attracted) | 375 (8.5%) | 176 (4.8%) | 459 (7.3%) | 290 (3.5%) | 277 (5.9%) | 241 (2.5%) | 626 (4.6%) |
| Token number (co-occurring) | 11656 | 11173 | 20533 | 28507 | 15590 | 58887 | 88678 |
| Token number (attracted) | 2518 (21.6%) | 1123 (10.1%) | 3995 (19.5%) | 2474 (8.7%) | 1497 (9.6%) | 2986 (5.1%) | 15277 (17.2%) |

Note. The percentages of the type/token frequencies of the collocates remaining after the filtering are given in parentheses.

Table 9
Number of collocates of niangcheng across each topic before/after filtering

| Topic | Society | Politics | Enter. | Inter. | Sports | Finance | Lifestyle |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Type number (co-occurring) | 1954 | 314 | 746 | 2071 | 508 | 487 | 723 |
| Type number (attracted) | 138 (7.1%) | 24 (7.6%) | 71 (9.5%) | 167 (8.1%) | 66 (13%) | 41 (8.4%) | 62 (8.6%) |
| Token number (co-occurring) | 4863 | 414 | 1068 | 4533 | 946 | 695 | 1104 |
| Token number (attracted) | 1323 (27.2%) | 78 (18.8%) | 250 (23.4%) | 1211 (26.7%) | 296 (31.2%) | 155 (17.8%) | 218 (19.7%) |

Table 10
Number of collocates of cucheng across each topic before/after filtering

| Topic | Society | Politics | Enter. | Inter. | Sports | Finance | Lifestyle |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Type number (co-occurring) | 257 | 1740 | 2425 | 2511 | 1035 | 3078 | 1013 |
| Type number (attracted) | 15 (5.8%) | 103 (5.9%) | 150 (6.2%) | 162 (6.5%) | 77 (7.4%) | 75 (2.4%) | 67 (6.6%) |
| Token number (co-occurring) | 286 | 3749 | 5092 | 5219 | 1770 | 7659 | 1443 |
| Token number (attracted) | 13 (12.2%) | 513 (13.7%) | 690 (13.6%) | 767 (14.7%) | 229 (12.9%) | 445 (5.8%) | 203 (14.1%) |
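As a worked illustration of the Delta P computation in (4), the following Python sketch reproduces the chansheng and cuojue example from the four cells of Table 7 and applies the two cut-offs used for attracted collocates. The function names `delta_p` and `is_attracted` are hypothetical and introduced only for this example.

```python
def delta_p(a, b, c, d):
    """Delta P = P(outcome | cue) - P(outcome | no cue) = a/(a+b) - c/(c+d)."""
    return a / (a + b) - c / (c + d)


def is_attracted(delta_p_value, joint_frequency):
    """Cut-offs described in Section 3.3.1: Delta P > 0 and joint frequency > 1."""
    return delta_p_value > 0 and joint_frequency > 1


if __name__ == "__main__":
    # Cells of Table 7 (node word chansheng, collocate cuojue, topic society).
    a, b, c = 40, 758, 191
    N = 2_795_667          # number of verb tokens, as defined in the text
    d = N - (a + b + c)    # 2,794,678
    value = delta_p(a, b, c, d)
    print(round(value, 2))         # 0.05
    print(is_attracted(value, a))  # True
```

The first term, 40/798, dominates the result because cuojue rarely follows the other verbs in the corpus, which is why the second term is close to zero.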

3.3.2 Semantic Network Construction

Semantic network analysis allows us to find the collocates with prototypical semantics among the selected collocates of a node word. Delta P gave us a lexicon list consisting of the collocates statistically attracted to the three node words. These attracted collocates were often semantically interrelated, and thus formed a semantic network in which some collocates were more prototypical. A prototypical collocate tends to encapsulate most of the semantic features shared by most collocates included in the network. Therefore, of particular importance to the construction of a semantic network was the similarity between the collocates.

In the analysis of VACs, Ellis et al. (2016, pp. 82-86) used WordNet, an English lexical database built on an ontological framework where semantically similar words are grouped together, to measure the distance between each pair of verbs in the semantic space. Based on these distances, they built up a semantic network of the verbs in a VAC. However, due to the limited number of Chinese tokens in Chinese Wordnet (Huang & Hsieh, 2010), we instead utilized GloVe (Pennington, Socher, & Manning, 2014), an effective computational learning algorithm, to transform words into vectors, and obtained the semantic distance between each pair of words via cosine similarity.

GloVe is an unsupervised algorithm that learns the semantic relations between words from a corpus and generates a representation for each word in the vector space. The training is initialized by converting a given raw corpus into a word-word co-occurrence matrix, in which rows correspond to words and columns correspond to context words (i.e., collocates within a window). The semantic relation between two words is captured by the ratio of the co-occurrence probabilities of the two words with each context word, which is mathematically encoded by vector differences. The objective of the model is to minimize a weighted least squares function of the difference between the dot product of the vectors of each pair of words and the logarithm of their co-occurrence count in the corpus.

During the training process, there were three main parameters for users to set: context window size, symmetric/asymmetric window, and dimension. Context window size refers to the range within which collocates are considered, and the symmetric/asymmetric setting refers to whether the range to the left and right of a given word should be symmetric. Lastly, dimension is the dimensionality of the vector of each word. In our case, we used the ADN corpus as the training corpus to obtain the word representations in the vector space. We set the context window size to 5 with a symmetric context window and set the dimension to 200. Then, based on the word representations in the vector space, we calculated the distance between each pair of words using cosine similarity, based on (5), where A and B refer to the vectors of the two words respectively.

(5) Collocate Similarity (Cosine Similarity)

$$\text{cosine similarity} = \cos\theta = \frac{\vec{A} \cdot \vec{B}}{\lVert \vec{A} \rVert \, \lVert \vec{B} \rVert}$$

Next, we constructed a separate semantic network for each topic based on the cosine similarity between each pair of collocates under that topic. A semantic network was composed of vertexes (word types) and edges (connections between vertexes). The length of the edge between each pair of words was based on their cosine similarity value: the higher the cosine similarity value shared between two vertexes, the closer the two vertexes. We set a cosine similarity value of 0.6 as the threshold for a connection between two vertexes: only pairs of collocates with a cosine similarity value greater than or equal to 0.6 were connected. The details of building a semantic network can be found in Ellis et al. (2016, pp. 82-86).
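The cosine similarity in (5) and the 0.6 connection threshold can be illustrated with a minimal Python sketch. The toy two-dimensional vectors and the words `w1`, `w2`, and `w3` are invented for the example (the vectors in this study have 200 dimensions), and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def build_edges(vectors, threshold=0.6):
    """Connect every pair of words whose cosine similarity is >= threshold.

    vectors: dict mapping a word to its embedding (a list of floats).
    Returns a list of (word_1, word_2, similarity) tuples.
    """
    edges = []
    for word_1, word_2 in combinations(sorted(vectors), 2):
        similarity = cosine_similarity(vectors[word_1], vectors[word_2])
        if similarity >= threshold:
            edges.append((word_1, word_2, similarity))
    return edges


if __name__ == "__main__":
    # Toy embeddings (illustrative assumption, not taken from the ADN corpus).
    toy_vectors = {"w1": [1.0, 0.0], "w2": [1.0, 1.0], "w3": [0.0, 1.0]}
    for word_1, word_2, similarity in build_edges(toy_vectors):
        print(word_1, word_2, round(similarity, 3))
```

In the toy data, `w1` and `w3` are orthogonal and therefore stay unconnected, while each of them is linked to `w2` with a similarity of about 0.707.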

After obtaining the cosine similarity between each pair of collocates in a network, we measured the semantic prototypicality of each collocate in the network by calculating its degree of betweenness centrality (Ellis et al., 2016, pp. 82-86). The betweenness centrality of a vertex is computed as follows:

(6)

\[
g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
\]

where \(\sigma_{st}\) is the total number of shortest paths from vertex s to vertex t and \(\sigma_{st}(v)\) is the number of those paths that pass through v.

We used the visNetwork package (Almende, Benoit, & Titouan, 2018) to plot semantic networks based on the parameters computed above, i.e., word embedding, cosine similarity, and degree of betweenness centrality. In the visual representation of the network, each vertex referred to a collocate type, and the thickness of the edge between each pair of vertices represented their mutual distance as measured by cosine similarity. The size of a vertex was proportional to its semantic prototypicality, which was calculated as the degree of betweenness centrality of that collocate in the network. The more semantically prototypical a collocate, the larger its vertex. The evaluative meaning of a collocate was indicated by one of three distinctive colors: blue for positive, grey for neutral, and red for negative.

The results from the SNA allow us to objectively identify the prototypical collocates of a node word in terms of their semantic commonalities. Given that the semantic features shared by collocates may indicate the typical contexts a node word is often associated with, we can therefore extrapolate the possible sources of the topic-dependent positive/negative SP for each node word.
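The betweenness centrality in (6) can be made concrete with a small Python sketch. It rests on assumptions not stated in the text: edges are treated as unweighted, the graph is undirected, and each unordered pair (s, t) is counted once. The function names and the vertex labels `w1` to `w5` are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return hop distances and shortest-path counts from source to reachable vertices."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph):
    """Sum, over pairs (s, t), the share of shortest s-t paths passing through v."""
    info = {v: bfs_counts(graph, v) for v in graph}
    g = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:  # s and t are not connected
            continue
        for v in graph:
            if v == s or v == t or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v is on a shortest s-t path iff the two legs add up to d(s, t)
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                g[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return g


# Hypothetical network: w1 is a hub, and w5 hangs off w4.
toy_graph = {
    "w1": {"w2", "w3", "w4"},
    "w2": {"w1"},
    "w3": {"w1"},
    "w4": {"w1", "w5"},
    "w5": {"w4"},
}
centrality = betweenness(toy_graph)
print(centrality)
```

In this toy network the hub `w1` receives the highest score and would therefore be drawn as the largest, most prototypical vertex.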

4 Results

Our research question is whether topic has an effect on the SPs of both the mixed-SP node word, chansheng, and the strong-SP node words, niangcheng and cucheng, in Chinese. There were two main analyses for each node word. The first was the rule-based concordance line analysis, which gave us the SP distribution of each node word under each topic. Based on the results, a chi-square test was applied to test the association between TOPIC and SEMANTIC PROSODY. In the chi-square test, the standardized residuals (SR) for each combination of TOPIC and SEMANTIC PROSODY were computed, which allowed us to identify which topics had a greater impact on the SP. An absolute SR value greater than 1.96 for a topic in one of the three meaning groups suggests that this topic-evaluative meaning pair contributes greatly to the rejection of the null hypothesis that there is no interaction between TOPIC and SEMANTIC PROSODY.

The follow-up was the network analysis. We used the network analysis to find the prototypical collocates of each node word under the topics that significantly enhanced either the positive or the negative SP tendency of that node word. The figures of the semantic networks of these topics are also provided. Furthermore, the semantic features of the prototypical collocates in these topics were generalized so that we could grasp the topical effect on the content with which a node word emerges.

4.1 Semantic Prosody of Chansheng

4.1.1 Concordance Semantic Prosody Analysis

Figure 2 shows the overall distribution of the evaluative meanings of the concordances of the node word chansheng in the entire ADN corpus. The concordances with positive evaluation account for 29.1%, those with neutral evaluation for 10.6%, and those with negative evaluation for 60.3% of all the data. Our data suggest that chansheng has a primary negative SP with a less common positive SP.

Figure 2. Distribution of the evaluative meanings of the collocate types of the node word chansheng in the ADN

Table 11 shows the SP distribution (i.e., positive, neutral, and negative) of the concordances of the node word chansheng across the seven topics. A chi-square test shows that there is a very significant association between TOPIC and SEMANTIC PROSODY (χ2 = 403.48, df = 12, p < .001, Cramér's V = 0.26).
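The quantities reported for this test can be sketched in Python as follows. This is an illustration under assumptions, not the analysis script of this study: the counts below are hypothetical and are not the values in Table 11, the function name is invented, and the residuals are computed as adjusted standardized residuals, whereas some analyses use Pearson residuals, (O − E)/√E, instead.

```python
import math


def chi_square_summary(table):
    """Return chi-square, df, standardized residuals, and Cramér's V for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Assumption: adjusted standardized residual, which rescales the
            # raw difference by the row and column proportions.
            spread = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            res_row.append((observed - expected) / math.sqrt(spread))
        residuals.append(res_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    k = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, df, residuals, cramers_v


# Hypothetical counts (positive, neutral, negative) for three made-up topics.
toy_counts = {
    "topic_a": [30, 10, 60],
    "topic_b": [50, 10, 40],
    "topic_c": [20, 10, 70],
}
chi2, df, residuals, cramers_v = chi_square_summary(list(toy_counts.values()))

# Flag the topic-prosody cells whose |SR| exceeds 1.96.
labels = ["positive", "neutral", "negative"]
flagged = [
    (topic, labels[j], round(sr, 2))
    for topic, row in zip(toy_counts, residuals)
    for j, sr in enumerate(row)
    if abs(sr) > 1.96
]
print(round(chi2, 2), df, round(cramers_v, 2), flagged)
```

With seven topics and three evaluative meanings, the same formula gives df = (7 − 1)(3 − 1) = 12, which agrees with the degrees of freedom reported above.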