
A Multi-layered Resolution for Disambiguation: Insight from Mandarin Verbs (歧義現象的多層次分析架構:由中文動詞出發)


Academic year: 2021


National Chiao Tung University (國立交通大學)
Graduate Institute of Foreign Literatures and Linguistics (外國文學與語言學研究所)
Master's Thesis

歧義現象的多層次分析架構:由中文動詞出發
A Multi-layered Resolution for Disambiguation: Insight from Mandarin Verbs

Student: Ya-Ling Hsu (徐雅苓)
Advisor: Prof. Mei-Chun Liu (劉美君)

A Thesis Submitted to the Graduate Institute of Foreign Literatures and Linguistics, College of Humanity and Social Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Arts.

June 2006
Hsinchu, Taiwan, Republic of China

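Chinese Abstract (摘要)

A Multi-layered Analytic Framework for Polysemy: Insight from Mandarin Verbs (多義現象的多層分析架構:由中文動詞出發)
Student: Ya-Ling Hsu (徐雅苓). Advisor: Mei-Chun Liu (劉美君). Institute of Language and Culture (語言與文化研究所), National Chiao Tung University.

This thesis proposes a corpus-based, multi-layered framework for investigating the multiple senses of polysemous words, with the further aim of building an automatic tagging system for sense discrimination. Drawing on linguistic theories such as Frame Semantics (see Fillmore and Atkins 1992), Construction Grammar (see Goldberg 1996), and discourse analysis (see Hopper and Thompson 1980), it seeks to provide a linguistically motivated mechanism for retrieving meaning. As an inherent property of the lexicon, polysemy is a key to understanding the interaction among syntax, semantics, and pragmatics. Previous work has approached polysemy from many directions, including categorial feature analysis, prototype theory, frame theory, and relational theory, but a systematic and workable method is still lacking. More recently, Liu and Wu (2004) examined polysemy from the perspective of semantic frames, holding that the senses of a word are defined under different frames, as in Fillmore and Atkins (1992). Relying on differences in frame elements and their grammatical realizations, and following the ‘one sense, one frame’ hypothesis, Liu and Wu (2004) showed that differences in meaning can be attributed to the different frames a verb belongs to. However, this way of delimiting senses appears unable to distinguish the senses of a polysemous verb when they belong to different frames yet share the same frame elements and grammatical realization. Take the Mandarin action verb 拿: two of its senses carry the same frame elements and the same grammatical realization, as in (1).

(1) Agent > V > Theme:
a. …病人[Agent] 拿 著健保卡[Theme]上門… (sense 1 ‘holding’ 持)
b. …我[Agent]可不可以順道 拿 個研究學位[Theme]?... (sense 2 ‘obtaining’ 得/取)

Example (1) therefore suggests that frame theory alone is not sufficient for distinguishing the senses of a polysemous word. When frame elements cannot supply enough information to determine the sense, what else has been left out of consideration? The framework proposed here brings in two further variables: colloconstruction (配搭組合) and contextual dependency (語境依存). The main goal is to propose a multi-layered analytic framework that identifies the appropriate sense of a polysemous word in its various grammatical realizations. The approach serves as a sense-discrimination model in three steps: (1) frame-based distinction, (2) colloconstruction-based distinction, and (3) context-dependence-based distinction.

The data come mainly from naturally occurring language in the Academia Sinica Balanced Corpus of Mandarin Chinese. All case studies involve high-frequency words, but only 200 entries per case are annotated in detail. Corpus data are used mainly because they reveal important syntactic and semantic distributional tendencies that native speakers' intuition cannot detect.

First, following FrameNet, step one of the model assigns the corpus occurrences of a polysemous word to different senses according to their semantic frames; the distinction rests on the different frame elements and their main grammatical realizations, which sort the data into sense groups. When step one fails, that is, when different senses carry the same frame elements and the same grammatical realization, we move to step two, colloconstruction. Here attention is paid to the collocating expressions among the non-core arguments, which can be further classified by part of speech, such as adverbs, adjectives, and aspect markers. We then find that the different senses of a polysemous word enter into different fixed collocational relations with these non-core arguments. When colloconstruction also fails to supply further information, we move to step three, contextual dependency, in which we search the cross-clausal context for words related to the different senses of the polysemous word. The link between a polysemous word and its different senses rests mainly on semantic or pragmatic relatedness, and such links can indeed be found in SUMO. Four Mandarin monosyllabic verbs, 走, 拿, 聽, and 看, are used to demonstrate the model.

With the proposed mechanism, this thesis not only redefines polysemy but also provides computational disambiguation systems with an effective, linguistically grounded model for sense discrimination.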

A Hybrid Resolution for Polysemy: Insight from Mandarin Verbs
Student: Ya-Ling Hsu. Advisor: Dr. Mei-Chun Liu.
Graduate Institute of Foreign Literatures and Linguistics, National Chiao Tung University

Abstract

This study explores how the multiple senses of polysemous words can be distinguished. It proposes a hybrid, corpus-based linguistic model and specifies the procedures for building an automatic tagger for sense disambiguation based on Mandarin verbs. It seeks to provide a linguistically motivated solution for detecting meaning with the aid of linguistic theories such as Frame Semantics (Fillmore and Atkins 1992), Construction Grammar (Goldberg 1996), and discourse analysis (Hopper and Thompson 1980). As an essential property of the lexicon, polysemy is the key to understanding the interplay between syntax, semantics, and pragmatics. Although polysemy has been investigated under a number of approaches, including classical feature analysis, prototype theory, the frame-based approach, and the relational approach, a systematic and applicable solution is still lacking. Recently, working on Mandarin lexical semantics, Liu and Wu (2004) proposed a frame-based perspective that views the senses of a polyseme as belonging to different ‘frames’, as defined by Fillmore and Atkins (1992). Making use of the distinctions in frame elements and their grammatical realizations, and following the ‘one sense, one frame’ hypothesis, Liu and Wu (2004) are able to show that semantic differences may be attributed to the different semantic frames the verb belongs to. However, there are cases where two separate meanings of the same verb show exactly the same surface patterns with the same sets of frame elements. For example, in the case of the motion verb NA 拿, two separate senses may end up with the same number and pattern of frame elements, as shown in (1):

(1) Agent < V < Theme:
a. …病人[Agent] 拿 著 健保卡[Theme] 上門… (sense 1 ‘carrying’)
   bing ren na zhe jian bao ka shang men
   patient take ZHE health insurance card up door
   ‘The patient carried the health insurance card to the counter.’
b. …我[Agent] 可不可以 順道 拿 個 研究學位[Theme]? (sense 2 ‘getting’)
   wo ke bu ke yi shun dao na ge yan jiu xue wei
   I can not can by the way take CL research academic degree
   ‘By the way, can I get an academic research degree?’

It is therefore apparent that a purely frame-based approach may be insufficient in dealing with polysemes. When frame elements fail to provide determining clues, what else should be taken into consideration? The model proposed in this study calls for two other variables: colloconstructions and contextual dependencies. This study aims to propose a hybrid multi-module solution that identifies the most appropriate lexical sense in the various expressions of a polyseme. The hybrid approach can be viewed as a sense-disambiguating model based on three steps: 1) frame-based distinction, 2) colloconstruction distinction, and 3) contextual dependence distinction.

The study is based on naturally occurring data extracted from the Sinica Balanced Corpus, which was established by the CKIP (Chinese Knowledge and Information Processing) group at Academia Sinica and is open to the public at http://www.sinica.edu.tw/ftms-bin/kiwi.sh/. Given the high frequency of the target words, only 200 entries are examined closely for the discussion. Corpus data provide explicit and implicit distributional tendencies that may go beyond a native speaker's intuition.

Using corpus data as the input, the first step of the proposed model is to identify the senses of a polysemous word that correspond to distinctions in semantic frames, following FrameNet. The data extracted from the Sinica Corpus can be roughly classified into several frames by their basic patterns of expressing the core frame elements (arguments). When distinctions in frame elements and their basic patterns fail, senses are further identified by the second module, colloconstruction. In this step, attention is paid to the collocational patterns of non-core arguments. These non-core arguments can be classified into various syntactic categories, such as adverbials, adjectives, aspectual markers, and so forth. Frequent collocates, whether grammatical or lexical, are then identified with each individual sense. When colloconstruction fails to indicate any decisive cues, the third module, contextual information, is called upon. In this module, the relevant contextual elements are thoroughly searched to establish a relational link within or across clausal boundaries. The relational link may be established by any semantic or pragmatic association between the polyseme and the contextual element that can be found in a larger semantic taxonomy, such as SUMO synsets (translated in BOW). To demonstrate the model, four sets of verbs (zou 走, na 拿, ting 聽, kan 看) are used as illustrations. By redefining polysemy with operational mechanisms, this study provides a linguistic model with theoretical validity for developing a computational system for sense disambiguation.

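Acknowledgements (致謝)

This thesis could not have been completed without the many people who supported and encouraged me along the way. First, I thank my advisor, Professor Mei-Chun Liu (劉美君). From the moment I entered the institute, I took part in the "Excellence Project" (卓越計畫) that she carried out jointly with the Institute of Communication Engineering. Contact and communication with people from different fields gave me a new understanding of how linguistics can be applied. She always led us to think about, discover, and solve problems from different directions and then to work out a sound method; she never gave us the answers in advance but set my own thinking in motion and kindled my enthusiasm for different issues in linguistics. I am also grateful to her for the opportunity to join the National Science Council project on Mandarin verbal semantics (中文動詞語義研究), which not only exposed me to other fields but also strengthened my background knowledge and research methods in linguistics proper. Taking part in these two projects is what inspired this thesis. She was tireless in supervising it, constantly reminding me of important points of argumentation and writing logic, and encouraging me at the right moments to submit my work so that I could learn from others' criticism and not lose my way. Her patient teaching made it possible for me to complete the thesis.

I also thank my oral defense committee members, Professor 連金發 and Professor 劉辰生. Professor 連 pointed out blind spots in my writing, offered valuable comments, and led me to think in other directions, which made the thesis more thorough. Professor 劉辰生's fine-grained observations on Chinese reminded me of the caution required in data analysis and of the need to attend to both positive and negative evidence. I owe him special thanks because he first introduced me to linguistics, and his passion for the field showed me its wonders. I am sincerely grateful for my teachers' suggestions and guidance, which sharpened my writing. I am likewise deeply grateful to the teachers at National Chiao Tung University and National Tsing Hua University who taught me (許慧娟, 林若望, 潘荷仙, 曹逢甫); through your teaching I gained knowledge from different areas of linguistics.

I also thank the friends who cheered me on. Alice found time in her busy schedule to read my qualifying paper; she is not only my coach on the court but also my private tutor in my studies. 馥綾, 珮婷, 明芹, 英翰, and 小花 encouraged me again and again and listened when I needed to let off steam. I thank 佩儀, who is not only a good friend in everyday life but also an academic role model; what she shared inspired much of my linguistic analysis. And of course I thank my classmates and juniors at NCTU, especially 亭儀, 姿君, 明輝, 佳音, 子玲, and 璦羽, who worked with me on the National Science Council project; I learned a great deal every time we discussed and debated linguistic questions.

Finally, I thank my family. I thank my most respected grandfather for his loving kindness; although he did not live to share all of this with me, he is the first person I wish to tell now that I have completed my studies. I thank my father and mother, whose selfless devotion let me pursue my studies without worry and whose encouragement has always been my greatest support. To my dear sisters: your cheering has always been the most heartwarming and moving of all. Thank you.

I dedicate this master's thesis to my beloved parents, Mr. 徐慶旺 and Ms. 何麗燕, with a daughter's heartfelt gratitude. Dad, Mom, thank you!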

Table of Contents

Chinese abstract
English abstract
Acknowledgement
Table of contents

1. Introduction
   1.1 Polysemy in general
   1.2 Polysemy in Mandarin
   1.3 Question and solution
   1.4 Sense disambiguating model
   1.5 Organization of the study
2. Literature review
   2.1 Four major analytic approaches that involve polysemy
       2.1.1 Classical approach
       2.1.2 Prototype approach
       2.1.3 Frame-based approach
       2.1.4 Relational approach
   2.2 Corpus-based approach on polysemy
   2.3 Computational linguistics
   2.4 Summary of previous works
3. Methodology and data
   3.1 Methodology
   3.2 Data
4. Case study of motion verb ZOU 走
   4.1 Frame-based sense identification
   4.2 Colloconstruction distinction
       4.2.1 Distinctive collocates: sense 1 vs. sense 3
       4.2.2 Distinctive collocates: sense 1 vs. sense 4
   4.3 Contextual dependence distinction
5. Case study of motion verb NA 拿
   5.1 Frame-based sense identification
   5.2 Colloconstruction distinction: sense 1 vs. sense 2
   5.3 Contextual dependence distinction
6. Case study of perception verb TING 聽
   6.1 Frame-based sense identification
   6.2 Colloconstruction distinction: sense 1 vs. sense 2
   6.3 Contextual dependence distinction
7. Case study of perception verb KAN 看
   7.1 Frame-based sense identification
   7.2 Colloconstruction distinction: sense 1 vs. sense 2
   7.3 Contextual dependence distinction
8. Conclusion

References
Appendix I
Appendix II
Appendix III

1. Introduction

This paper presents a corpus-based hybrid-analysis model for sense disambiguation via case studies of ambiguous Mandarin verbs. It seeks a solution that allows a complete investigation of the behavior of ambiguous words in a corpus. To explore ambiguity, a related issue discussed in linguistics, polysemy, needs to be considered. As an essential part of the lexicon, polysemy provides the key to understanding the syntactic and semantic properties of the lexicon. Previous research has proposed many different perspectives on this issue, but these studies still provide insufficient explanations. The issue may seem extremely complicated and thorny, but a complete investigation of polysemy is needed. An overall probe into the behavior of ambiguous words will help advance research on Mandarin linguistics in general and provide a practical solution for application in computational systems. Consequently, this study aims to provide a hybrid-analysis module that identifies the various expressions of ambiguous words, as a basis for an in-depth reconsideration of ambiguity. The foundation of this research follows what is claimed in Fillmore (1992:76): “… a word’s meaning can be understood only with reference to a structured background of experience, beliefs, or practices, constituting a kind of conceptual prerequisite for understanding the meaning.”

Following Fillmore’s conceptual schema, Liu and Wu (2004) distinguished the senses of ambiguous words on the basis of different cognitive structures (or “frames”), each of which contains particular categories (or “core frame elements”) and specific “lexical-syntactic patterns” in the sense of Fillmore (1992). However, the essential problem remains: is frame-based analysis sufficient to account for all the expressions of ambiguous words? This is the central question that this paper seeks to answer.
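The central question can be made concrete with a minimal sketch. It takes the two NA 拿 sentences from example (1) of the abstract, groups them by basic pattern, and reports any pattern that is attested with more than one sense. The tuple layout and the function names `senses_by_pattern` and `ambiguous_patterns` are assumptions made for illustration; they are not the annotation format or tooling used in the thesis.

```python
from collections import defaultdict

# Two corpus sentences from example (1) of the abstract.
# The (sentence, basic pattern, sense) layout is a hypothetical format.
ANNOTATED = [
    ("病人拿著健保卡上門", ("Agent", "V", "Theme"), "carrying"),
    ("我可不可以順道拿個研究學位", ("Agent", "V", "Theme"), "getting"),
]


def senses_by_pattern(rows):
    """Collect the set of senses observed under each basic pattern."""
    groups = defaultdict(set)
    for _sentence, pattern, sense in rows:
        groups[pattern].add(sense)
    return dict(groups)


def ambiguous_patterns(rows):
    """Return only the patterns that occur with more than one sense."""
    return {
        pattern: senses
        for pattern, senses in senses_by_pattern(rows).items()
        if len(senses) > 1
    }


if __name__ == "__main__":
    # The single pattern Agent < V < Theme maps to both senses, so a
    # purely pattern-based lookup cannot choose between them.
    print(ambiguous_patterns(ANNOTATED))
```

A pattern that appears in the result is exactly the kind of case where frame elements and basic patterns alone leave the sense undetermined.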

1.1 Polysemy in general

In previous studies, lexical ambiguity has been treated as a heterogeneous phenomenon. That is, lexical ambiguity arises from at least the following three factors (Pustejovsky and Boguraev 1996: 6):

- Contrastive ambiguity, which is normally resolved by contextual and discourse knowledge;
- Complementary ambiguity (or logical polysemy), as resolved by co-composition in the syntactic context of the sentence; and
- Sense extensions, as mediated by lexical rules and specific conditions relating to the speaker and context.

In general, the first factor gives rise to homonymy, such as the two interpretations of bank in ‘river bank’ and ‘financial bank’, and to vagueness, for instance news in ‘I read the news this morning’ (news as press communiqué) and in ‘I haven’t heard any news about him since he left’ (news as information about him). Traditionally, polysemy has been distinguished from homonymy and vagueness. Generally speaking, homographs are unrelated words that are represented by the same word form, while polysemous words have semantically related meanings. Another distinction is made between polysemy and vagueness. Despite a huge number of discussions, the differences between polysemy and vagueness are still controversial (Cruse 1986; Lakoff 1987; Wierzbicka 1990; Geeraerts 1993). Geeraerts (1993) claimed that the failure to reach a consensus could be attributed to the question: what is our conception of meaning and of lexical semantics? Recent developments in semantic research have generated wide interest in the polysemy caused by the third factor above, sense extension, rather than in homonymy or vagueness. Sense extension is basically divided into two types: (1) extension by various syntactic behaviors, and (2) extension via metaphor or metonymy (Lien 2000). The polysemy considered in this study involves syntactic variation.

In the study of polysemy attributable to sense extension, researchers have tried to define and establish relations among the different meanings of polysemous words. In the early semantic paradigm, the major concern was how to define a polysemous lexical item or how to determine the number of senses of a polysemous word, but these issues remain unresolved (Ravin 1990; Jackendoff 1985; Fellbaum 1998).

In linguistics, there are four major approaches to polysemy: the classical approach of semantic analysis, the prototype approach, the frame-based approach, and the relational approach. Within each approach, there are still controversies and variations in analyzing polysemy.

One point worth emphasizing in the classical approach is that meaning is viewed as a set of decomposed semantic features defined by necessary and sufficient conditions (Katz 1972). However, there is a danger that the senses of a polysemous word multiply without limit, since they are identified by an unlimited set of semantic features.

The assumption of the prototype approach is that meaning exhibits family resemblance and is linked to mental representations, cognitive models, and bodily experience. Rosch (1977) demonstrated that people categorize objects on the basis of their resemblance to a prototypical member of the category. The problem is that, without constraints, meanings can be related to each other indefinitely through resembling features, so that the senses of unrelated polysemous words may end up linked to each other.

More recently, Fillmore (1992) proposed a cognitive analysis based on frame semantics. In this theory, a word’s meaning is understood within structured background knowledge. Thus, word senses are not directly related to each other but are defined by common background frames. Further, he investigated words’ meanings through their realization in different syntactic patterns with different highlighted categories (the core frame elements) in a large corpus, and built a frame-based online dictionary (FrameNet). Following Fillmore’s Frame Semantics (1992), sense identification is not defined by a speaker’s intuition alone but comes from the real use of natural language. The concept of FRAME refines the notion of polysemy, as Fillmore reconsiders polysemy as extension from one semantic frame into a new domain. However, sense identification is still open to question when different senses of a polysemous word occur in the same realization with the same core frame elements.

More recently still, Fellbaum (1998) has proposed that words are organized according to their meanings, by means of their semantic relations or the semantic network to which they belong. However, the senses of a polysemous word may lie very far from each other in the semantic network. For example, in WordNet there are three senses of the noun ash: (1) the residue, (2) timber trees, and (3) ash trees. The first is placed in the structure under plant material, while the other two fall under woody plant (Yale and Claudia 2000). These senses are distant because their semantic relation cannot be linked by proximity in WordNet.

In sum, from the classical approach (Katz 1972), to the prototype approach (Jackendoff 1985, Lakoff 1987, Taylor 1989), to the frame-based approach (Fillmore and Atkins 1992), and finally to the relational approach, polysemy has a long history and has been studied from different perspectives. In contrast, Mandarin polysemy still awaits detailed discussion. Research on Mandarin polysemy is introduced below.

1.2 Polysemy in Mandarin

Due to different theoretical interests, the focus of previous studies on Mandarin polysemy can be characterized as follows:

- Grammaticalization: polysemy happens when “lexical items come in certain linguistic context to serve grammatical functions, and once grammaticalized, continue to develop new grammatical functions.” (Su 2002) The various functions then contribute to different meanings of the lexical items (e.g. Liu 1994, Su 2002, Lai 2004).
- Metaphorical extension: metaphorical extension is the supporting evidence for meaning change through links between the abstract and the concrete (Lin and Ahrens 2005; Cao, Cai and Liu 2001).

In general, earlier discussions encounter two problems. First, the identification of the different senses of polysemous words in natural occurrences significantly influences applications in natural language processing and Chinese teaching, yet polysemous words have not been sufficiently investigated in naturally occurring data. Second, the crucial factor causing polysemy that is often taken into consideration pertains to the association between meaning and its syntactic behavior. These problems are the most important issues in sense disambiguation. What is lacking and needs to be explored is how to identify the senses of ambiguous words through their realization in natural occurrences.

More recently, following Fillmore et al. (1992), Liu and Wu (2004) provided one of the earliest studies of Mandarin polysemy with respect to the frame-based approach. Instead of explaining how meaning extends, they shifted the focus to distinguishing the different senses of an ambiguous word in a corpus. Based on frame semantics, they first define the senses of a polysemous word via different syntactic behaviors corresponding to basic patterns (BP) in FrameNet. In addition, Liu and Wu (2004) provided further evidence to support this distinction, for example collocational association and the semantic attributes of core frame elements, such as co-occurrence with manner expressions, aspectual markers, and negatives. A more detailed introduction to their study is presented in Chapter 2 of this paper.

In sum, studies of Mandarin polysemy are at a preliminary stage. More comprehensive and extensive research is needed to unveil the sense distinctions.

1.3 Questions and solutions

Following the study of Liu and Wu (2004), this paper aims to provide a clearer and more complete discussion of sense disambiguation. Traditionally, monosyllabic characters are one crucial source of polysemy in Mandarin. Verbs have been viewed as the category that carries various meanings in different syntactic expressions; as Pustejovsky and Boguraev (1996) mention, polysemous verbs sometimes lead to complicated problems for lexical semantics. Moreover, monosyllabic words are more complicated in sense disambiguation. Therefore, the case studies in this research focus on monosyllabic verbs. First of all, the data extracted from the corpus are defined on the basis of frame semantics, as in Liu and Wu’s study (2004). However, some corpus data remain problematic, as in (1) and (2), where the sense of the two ZOUs cannot be identified solely via core frame elements and basic patterns.

(1) a. 我[Self-mover] 走 在 大安 森林 公園[Area] (Sense 1 ‘walking’)
       wo zou zai da an sen lin gong yuan
       I walk in Da An forest park
       ‘I walked in Da An Forest Park.’
    b. 我[Self-mover] 走 一趟 大安森林 公園[Area] (Sense 3 ‘visiting’)
       wo zou yitang da an sen lin gong yuan
       I go once Da An forest park
       ‘I visited Da An Forest Park.’

(2) a. 我 腳 好痠, 我[Self-mover] 沒辦法 走 了 (Sense 1 ‘walking’)
       wo jiao hao suan, wo mei ban fa zou le
       my feet so sore, I cannot walk LE
       ‘My feet are so sore that I cannot walk anymore.’
    b. 火車 早 就 開走 了, 我們[Self-mover] 沒辦法 走 了 (Sense 4 ‘leaving’)
       huo che zao jiu kai zou le, wo men mei ban fa zou le
       train already drive away LE, we cannot walk LE
       ‘The train has already driven away, and we can't leave.’

The senses of the verb ZOU in (1) are similar to walking and visiting in English, and those in (2) are similar to walking and leaving. In FrameNet, according to their basic patterns with core frame elements, they should be classified into the same domains, because both ZOUs in (1a) and (1b) share the basic pattern Self-mover < ZOU < Area, and both ZOUs in (2a) and (2b) share the basic pattern Self-mover < ZOU. However, by native speakers’ intuition and according to other components in the context, these ZOUs should be identified as having different meanings. What cannot be explained is why these different expressions of ZOU 走 belong to the same frame but denote different meanings, contradicting the frame-based approach. Frame-based analysis may therefore be insufficient on its own. As Liu and Wu (2004) showed in their study, collocational association can provide more information for the further distinction of ambiguous words. This is similar to the second module, colloconstruction, proposed in this paper. Colloconstruction refers to a specific lexical item, categorized by syntactic characteristics and bearing certain semantic properties, that frequently co-occurs with the target sense. What remains unclear in Liu and Wu’s research (2004), however, is that there are no explicit definition of and criteria for their collocational association. In addition, some cases are found to denote different senses while sharing the same collocational association. Further investigation is therefore needed in order to distinguish ambiguous words more completely. In previous studies, linguists have pointed out that word meaning does not fully reside within lexical items; some meaning components are generated by lexical relations in context (Saeed 1997:53; Cruse 2004). Further support comes from psycholinguistics: Swinney’s experiment (1979), for example, shows that relevant lexical items in context can facilitate lexical decision on an ambiguous word. Therefore, information from the context is thought to be significant for further distinction, and each sense is believed to display certain features linking it to the relevant lexical items in context.

1.4 Model for sense disambiguation

A hybrid sense-disambiguating model is proposed in this study to provide a complete sense distinction for Mandarin ambiguity. The resolution formula is schematically represented in the model diagram in (3) below. In view of the need to investigate real language use, the analysis in this paper adopts a corpus-based approach. Following the study of Liu and Wu (2004), the first module of sense distinction is based on frame semantics. Following the “one sense, one frame” hypothesis, the multiple senses of a polysemous word are identified via different core frame elements and basic patterns. The senses of an ambiguous word either can be distinguished in this module or cannot, in which case they are disambiguated in the second module, colloconstruction. Colloconstruction is different from both collocation and construction: colloconstruction refers to co-occurring patterns, which may be collocates or possible constructions, whereas collocation refers to co-occurring lexical items and construction refers to a meaning unit. With information about co-occurring categorical collocations that carry specific semantic meaning, distinctive colloconstructions may be found that further distinguish multiple senses. If colloconstruction fails, further distinctive features should be investigated.
That is, in order to further disambiguate senses that cannot be identified in module 1 and module 2, a third step, module 3 (contextual dependency), is necessary. In previous work, semantic contexts are divided into two types: word context and sentential context (Johnson-Laird et al. 1989). Word context refers to the background knowledge within the lexical system itself, while sentential context means information from outside the lexicon. The third module, contextual dependency, refers to sentential context; that is, we extract from the context relevant information that lies beyond lexical background knowledge. In this module, we look for semantically or pragmatically relevant lexical items across clausal boundaries to support further distinction. What is presented in this paper can be viewed as a complete and detailed analysis building on recent efforts and advances in research on Mandarin ambiguity.

(3) The model of sense disambiguation
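As a rough illustration of the three-module cascade described in Section 1.4, the sketch below tries each module in order and stops at the first one that returns a sense. Everything in it is an assumption made for illustration rather than the thesis's implementation: the function names, the token-list input format, and above all the cue tables, which are simply read off examples (1) and (2) in Section 1.3 and are not corpus-derived collocates or SUMO links.

```python
from typing import Callable, Optional, Sequence, Tuple

# A module looks at the target clause and the surrounding context (both as
# token lists) and returns a sense label, or None when it cannot decide.
Module = Callable[[Sequence[str], Sequence[str]], Optional[str]]

# Hypothetical cue tables, read off examples (1) and (2) for ZOU.
COLLOCATE_CUES = {"一趟": "visiting", "在": "walking"}
CONTEXT_CUES = {"火車": "leaving", "腳": "walking"}


def frame_module(clause: Sequence[str], context: Sequence[str]) -> Optional[str]:
    """Module 1: in the problem cases the senses share one basic pattern,
    so this toy version always reports 'undecided'."""
    return None


def colloconstruction_module(clause: Sequence[str], context: Sequence[str]) -> Optional[str]:
    """Module 2: look for a sense-specific collocate inside the clause."""
    for token in clause:
        if token in COLLOCATE_CUES:
            return COLLOCATE_CUES[token]
    return None


def context_module(clause: Sequence[str], context: Sequence[str]) -> Optional[str]:
    """Module 3: look for a related lexical item across clause boundaries."""
    for token in context:
        if token in CONTEXT_CUES:
            return CONTEXT_CUES[token]
    return None


MODULES = [frame_module, colloconstruction_module, context_module]


def disambiguate(
    clause: Sequence[str],
    context: Sequence[str],
    modules: Sequence[Module] = MODULES,
) -> Tuple[Optional[str], Optional[int]]:
    """Return (sense, number of the module that decided), or (None, None)."""
    for step, module in enumerate(modules, start=1):
        sense = module(clause, context)
        if sense is not None:
            return sense, step
    return None, None


if __name__ == "__main__":
    # (1b): decided by the collocate 一趟 in module 2.
    print(disambiguate(["我", "走", "一趟", "大安森林公園"], []))
    # (2b): no collocate cue, so the preceding clause decides in module 3.
    print(disambiguate(["我們", "沒辦法", "走", "了"], ["火車", "早就", "開走", "了"]))
```

Returning the module number alongside the sense makes it easy to count how many corpus entries each layer resolves, which is the kind of breakdown the case studies rely on.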

1.5 Organization of the study

To provide a more detailed analysis of ambiguity, this research is composed of eight sections. The first section gives a brief introduction to the study. The second section is an overview of the background, the methods, and the theoretical perspectives on ambiguity (or polysemy) from both analytic linguistics and computational linguistics. The third section gives an overall introduction to the three analytic steps (also the three modules) and proposes a model of a sense disambiguator. Sections four and five illustrate the three analytic steps with two case studies of motion verbs, ZOU 走 and NA 拿; sections six and seven provide another kind of verb, the perception verbs TING 聽 and KAN 看, as illustrations. The final part wraps up the study with a conclusion and invites more academic discussion of and interest in ambiguity in Mandarin.

2. Literature review

In analytic (or theoretical) approaches to semantics, polysemy has been discussed for a long time, but it is still puzzling today. The issue concerns not only semantics but also an unsolved problem in computational linguistics, sense disambiguation. In this section, four major analytic approaches to polysemy are presented: the classical approach, the prototype approach, the frame-based approach, and the relational approach. Since the corpus-based approach has gradually become the major approach in linguistic analysis, it is also adopted in this paper. After the presentation of the analytic approaches, a brief introduction to the corpus-based approach is given. Further, in order to apply this hybrid analytic disambiguating model in a computational system, an overview of previous studies involving polysemy in computational linguistics is also included. Finally, a brief summary of previous research is presented with some critiques.

2.1 Four major analytic approaches that involve polysemy

A lexical item is represented in terms of a process of cognitive abstraction. In explaining this process, semantic approaches have been led by two principles with sometimes opposite viewpoints: first, generalization (or reduction) of polysemy, and second, distinction (or increase) of polysemy. Under generalization, linguists try to generalize the discussion of polysemy in order to make the explanation offered by a theory more convincing. Under distinction, which aims at an account of the semantic details of polysemy, researchers try to find as many distinctions as possible (Yale and Claudia 2000). Under these diverging principles, the classical approach, the prototype approach, the frame-based approach, and the relational approach provide four major perspectives on polysemy, which are introduced in the following sub-sections.

2.1.1 Classical approach

According to the classical approach, it is traditionally assumed that an individual entity is composed of a set of cognitive categories. For example, in the sentence John is a man, if John possesses the necessary and sufficient properties (the “features” of this approach) that define the category man, he is a man. Following this concept, a new semantic explanation within the classical approach was developed by Katz (1972). He claims that, when giving the sense of a word, a conceptual schema should be provided rather than a discussion of the relationship of the meaning to the word. In this schema, word knowledge is decomposed into numerous meaning features (which Katz called “conceptual categories”) by necessary and sufficient conditions. In his principle, a lexicon consists of semantic components, and related senses might share some semantic features. For instance, chair might be decomposed into object and physical; besides, chair, bottle, and window may share the same semantic marker, object. Moreover, in this schema, even a single distinctive semantic feature could be a significant hint for distinguishing different senses of polysemous words.

However, Katz’s claim brings about some problems for polysemy. First, infinite semantic features may generate infinite senses. Further investigating Katz’s theory, Ravin (1990) proposes that “there are no clear criteria for which aspects of a real world situation are relevant to the semantics of a particular verb, but there is a methodology for determining which aspects ought to be semantically represented.” Second, the classical approach does not explain how the semantic components can help us disambiguate polysemous words when different senses are realized in the same expression; that is, the approach makes no mention of the syntactic behaviors of lexical items.

Following Ravin’s statement, a methodology is necessary and will be given in the following section to define the senses of polysemous words.
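Feature decomposition of the kind described in this subsection can be pictured as set operations over feature inventories. The following is a minimal sketch rather than Katz's own formalism: only the markers object and physical (for chair) come from the text above, while `container`, `opening`, and both function names are hypothetical labels added for illustration.

```python
# Toy feature inventories. Only "object" and "physical" (for chair) are taken
# from the text; "container" and "opening" are hypothetical markers.
FEATURES = {
    "chair": {"object", "physical"},
    "bottle": {"object", "container"},
    "window": {"object", "opening"},
}


def shared_markers(*words):
    """Return the semantic markers common to every word given."""
    return set.intersection(*(FEATURES[w] for w in words))


def distinctive_markers(word, others):
    """Return the markers of `word` that none of `others` carries."""
    seen_elsewhere = set().union(*(FEATURES[w] for w in others))
    return FEATURES[word] - seen_elsewhere


print(shared_markers("chair", "bottle", "window"))
print(distinctive_markers("bottle", ["chair", "window"]))
```

A lookup of this kind lists what entries share and where they differ, but it leaves open the two problems raised above: which aspects of a situation deserve to be features, and how syntactic behavior enters the picture.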

2.1.2. Prototype approach

The classical view of word meanings as consisting of necessary and sufficient conditions has been questioned, especially in the philosophy of language. For example, Wittgenstein (1958) claims:

“The idea that in order to get clear about the meaning of a general term one had to find the common element in all its applications has shackled philosophical investigation for it has not only led to no result, but also made the philosopher dismiss as irrelevant the concrete cases, which alone could have helped him to understand the usage of the general term.”

In Wittgenstein’s (1953) famous discussion of the meaning of the word game, he concluded that the members of a category are related by family resemblance. This approach was further introduced into psychology by Rosch (1977). She demonstrated that people categorize objects not by necessary and sufficient conditions, but by relying on the resemblance of these objects to the prototypical members. In her studies, Rosch found that people categorize objects by the concept of prototypes. For example, the Dani culture has only two color categories: one represents all light, warm colors and the other represents all dark, cool colors. Rosch found that in most conditions, they recognized prototypical red, yellow and white as being in the first category, and prototypical blue and black in the second category. Rosch also claimed that there are two prototypical models: in the first, a single prototype contains the largest number of characteristic features, and in the second, several prototypes each contain a different set of characteristic features. Linguists have adopted the second one to deal with polysemy.

With the concept of prototypes, Fillmore (1982) proposed one of the earliest discussions of prototypical meanings. He defined a word’s meaning by the components it resembles. When the meaning encompasses all of the components,

the use of the word is the most prototypical. When the meaning has none of the components, the use of the word is not prototypical. And when the meaning holds some of the components, the use of the word is peripherally prototypical.

Taylor (1989) places a more direct emphasis on connecting polysemy with the concept of prototypes: ‘if different uses of a lexical item require, for their explication, reference to two different domains, or two different sets of domains, this is a strong indication that the lexical item in question is polysemous.’ For example, school can be understood as the education of children as well as the administrative structure of a university, which can be classified in different domains; thus, it can be viewed as a polysemous word. Further, Taylor adds another type of prototypical category: one without a central meaning. For example, over can express a static relation of being vertical without contact with the reference, as in “the apple is over the table”; or a dynamic relation of being vertical without contact with the reference, as in “the plane flew over the country.” In walk over the blocks, it expresses a dynamic relation without involving contact, and so on (Ravin and Leacock 2000). With these prototypical categories, word meaning is defined by resemblance to the prototype. However, in this approach there is no clear discussion of how to distinguish the meanings of polysemous words.

2.1.3. Frame-based approach

In addition to prototype concepts, Fillmore et al (1992) proposed frame semantics, in which a word’s meaning is defined by a cognitive frame: when a word’s expression is compatible with a frame, it denotes the meaning of that frame. A frame is determined by our background knowledge and experience with the lexicon. That is, a lexical meaning is identified by a structured cognitive schema in our mind.

Based on this notion, Fillmore built a frame-based online dictionary in which different senses of polysemous words are linked to various cognitive structures (or “frames”),

and the knowledge of the frame is encoded by the words. In his “frame-based” approach, the concept of “frames” is helpful in reconsidering polysemy. For example, the verb RISK is known to have two senses: RISK as “put at risk” and RISK as “face the risk of.” These two senses occur in different contexts, where they are found in very different syntactic structures; thus, the two senses differ from each other in their specific syntactic behaviors. Therefore, examining the different usages of the verb RISK may be necessary to identify its specific sense. The interrelations between Frame and Syntax thus become a very important issue in Fillmore’s studies, and in this way the distinct concept behind each sense helps distinguish polysemous words. Nevertheless, this approach still cannot account for the situation in which different senses of a polysemous word appear in the same expression.

2.1.4. Relational approach

Relational models are widely used to form a semantic network. In these models, words are organized according to the semantic relations between their meanings. Similar to the prototype approach, the relational approach also deals with semantic fields. Word relations in this approach include Synonymy, Antonymy, Hypernymy, Hyponymy, and so forth. Synonymy holds when two words can be substituted for each other in a context without changing the meaning of the clause. Antonymy holds when two words substituted for each other in a context have opposite meanings. Hypernymy (the superordinate relation), also called the IS-A relation, is the linkage between lexical items in a specific-general relationship. Hyponymy, the opposite relation to Hypernymy, is the association between lexical items in a general-specific relationship. The relational approach is ideal for inference, especially given the transitive properties of word relations. For example, the hypernym of book is publication, and the hypernym of publication is piece of work. Because of

the transitive relation, it can be concluded that piece of work is also a hypernym of book.

Based on the relational approach, the online dictionary WordNet (Fellbaum, 1998) was developed by George Miller and his colleagues at Princeton University. As a source of related words for the target sense in queries, WordNet indeed provides an improved solution. For example, looking up the word board in the noun hierarchy of WordNet, the ‘lumber’ sense of board can be detected through its related words nail, hammer, and carpenter. For polysemous verbs, however, this network gives no information about syntactic relations. As Ravin and Leacock (2000) stated, “…most relational approaches maintain the classical division of sense distinction for polysemous words but they do not decompose the meaning of concepts”. Further, the meanings of a polysemous word might be only distantly related, so that their relation cannot be defined by the semantic relations in the semantic network.

2.2 Corpus-based approach on polysemy

According to Douglas et al., four essential characteristics define the corpus-based approach (cf. Douglas, Susan, Randi, 1998:3-4):

• It is empirical, analyzing the actual patterns of use in natural texts;
• It utilizes a large and principled collection of natural texts, known as a “corpus,” as the basis for analysis;
• It makes extensive use of computers for analysis, using both automatic and interactive techniques;
• It depends on both quantitative and qualitative analytical techniques.

With these features, corpus data can expose the general distribution information of lexical items that native speakers cannot readily ascertain by intuition. Rather than discuss what is theoretically possible in language, the significance of the

corpus-based approach is that it is more concerned with exactly how language is used in daily life. Naturally occurring data from a corpus give previous approaches a brand-new perspective from which to re-investigate research questions. The advantage of the corpus-based approach is that it provides a large database of naturally occurring data, from which observational generalizations and statistical analyses can be more convincing. The range of variation in language is more honestly represented in the corpus. Further, naturally occurring data show distributional tendencies for linguistic analysis. As the target issue, polysemy can be more effectively addressed by looking at the corpus distribution of polysemous words. The studies of Fillmore et al. (1992) and Liu (2004) convincingly showed the merit of the corpus-based approach in analyzing polysemous words.

From the corpus data, Fillmore et al. (1992) generalized numerous semantic concepts (frames) according to their different sets of categories (core frame elements) realized in syntactic behaviors (basic patterns). For example, in the frame of risk, the core frame elements include Chance, Harm, Victim, Valued Object, (Risky) Situation, Deed, and Actor. These core frame elements are realized as different syntactic patterns; for instance, the core frame elements Valued Object (VO) and Situation (Sit) are realized as:

VO {NP}   Sit {Prep NP}

for the example (Fillmore et al 1992:87):

He was being asked to risk.   VO {he}   Sit {being asked to risk}

In reference to polysemy, Fillmore et al (1992) claimed that, in addition to sense extension by metaphor and so on, if the verb risk is realized as “put at risk” in one context but as “face the risk of” in another, these must be taken as evidence for different senses of the verb. Combining grammatical characteristics with semantic

properties, Fillmore et al (1992) presented two kinds of polysemy: one kind “resulting from a transfer of a semantic frame to a new domain (through metonymy or metaphor, for example) and the kind that reflects merely the accommodation of a word to different syntactic patterns” (Fillmore et al 1992). Both are discussed in this study.

Following Fillmore et al (1992), Liu and Wu (2004) provide one of the earliest studies applying the frame-based approach to Mandarin polysemy. The goal of their paper was to investigate how meanings differ from or relate to each other through a case study of the encoding verb biao3 shi4 表示. Through this case study, Liu and Wu (2004) showed that differences among the identical forms of a lexical item (biao3 shi4 表示, for example) can be explained by a systematic matching via a “conceptual schema”.

According to HowNet¹, Liu and Wu (2004) found that there are three definitions of biao3 shi4 表示: express, expression, and show emotion. In WordNet, among the 7 senses of express listed, only 4 are linked to Mandarin biao3 shi4 表示. They are express as in She showed her disappointment, verbalize as in She expressed her anger, state as in Could you express this distance in kilometers, and convey as in His voice carried a lot of anger. Data from the corpus show that biao3 shi4 表示 can be similar to English say, point out, state, add, describe, explain, note, affirm, chuckle, mutter, tell, express, refer to, show, indicate, mean, and represent.

¹ HowNet is an on-line common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoting in lexicons of the Chinese and their English equivalents.

The problem is: how many meanings does biao3 shi4 表示 have, and what principles are used to distinguish them? Based on syntactic behaviors, they classified the data into three groups (Liu and Wu, 2004):

Group 1: biao3 shi4 表示 1

(4) 李先生表示:「這不過是做好分內的事」。
‘Mr. Li says, “I just did what I’m supposed to do.”’
(related English equivalents: say, point out, state, add, explain, note, affirm, chuckle, mutter, tell and so on.)

Group 2: biao3 shi4 表示 2

(5) 李先生表示同情。
‘Mr. Li expressed his sympathy.’
(related English equivalents: express, show and so on.)

Group 3: biao3 shi4 表示 3

(6) a. 我這麼說並不表示我不重視可能的弊端
‘Saying so doesn’t mean that I am taking lightly the possibility of creating abuses.’
b. 鮮花表示愛情。
‘Fresh flowers represent love.’
c. 一支國旗表示一萬尾烏魚,
‘One flag represents 10,000 grey mullet.’
(related English equivalents: mean, show, represent, indicate, carry and so on.)

Each group can correspond to a different frame in FrameNet by linking it to its meaning in English. That is, biao3 shi4 表示 1, corresponding to say in English, is in the Statement frame, which includes the core frame elements Speaker and Message; biao3 shi4 表示 2, corresponding to express, is in the Encoding frame, which includes the core frame elements Speaker and Message; biao3 shi4 表示 3, corresponding to represent, is in the Evidence frame, which includes the core frame elements Sign and Message. In

addition to frame information, Liu and Wu (2004) also provide further evidence to support this classification from four kinds of collocational association: the semantic attributes of core frame elements, the combination with manner, the aspectual markers, and the negatives. With these criteria, polysemous words can, to some degree, be explained and defined well. However, as discussed in this paper, although these three senses of biao3 shi4 表示 belong to three different frames, they have similar sets of core frame elements. For example, biao3 shi4 表示 1 and biao3 shi4 表示 2 both contain Speaker and Message. The problem is: when these two core frame elements are realized in similar syntactic behavior (the same patterns), how can the classification be completed? This is the central concern of my study.

2.3 Computational linguistics

Computer applications that handle the content of natural language need to be concerned with the issue of polysemy. In recent work, the main focus in Natural Language Processing (NLP) has been on collocations, i.e. the co-occurrence of a target lexical item with the lexical items it prefers. However, searching only for collocations raises the problem that the co-occurring lexemes found in corpus data are often unexpected. There are two considerable problems with collocation based on statistical methods: first, “low precision”, and second, difficulty in dealing with “rare collocations” (Li et al., 2005). Moreover, collocational patterns lack adequate grammatical description. To extend collocational analysis, collostructions were proposed (Anatol and Gries 2003). Collostructional analysis looks for the lexemes that are attracted to a particular construction; the combination of a collexeme and a collostruct is referred to as a collostruction. However, this device faces the problem that the extracted collexeme and collostruct might be unexpected as well. Moreover, if no collexeme and collostruct association is found, how do we

disambiguate polysemous words? As mentioned above, therefore, determining the correct sense of a query word by detecting a collexeme or collostruct is unlikely to be successful.

More recently, another important method that can help disambiguation is the identification of topical and local components. The study ‘Disambiguating Highly Ambiguous Words’ (Towell and Voorhees 1998) explores contextual representations by using neural networks to extract both topical and local contexts and combining the results of the two networks into a single word sense classifier. The topical component refers to the words that frequently co-occur with a specific sense of a target word, while the local component contains the syntactic information of the sense. This method has concerns similar to those of theoretical approaches that combine semantic information and syntactic realization. Although utilizing topical and local components to help identify word senses seems more accurate, there is another perspective that Towell and Voorhees (1998) did not consider.

None of the previous studies alone provides sufficient information for a highly accurate disambiguator with a convincing theoretical foundation. In this paper, a hybrid model based on the frame-semantic approach combined with syntactic and pragmatic (discourse) properties is provided. In searching for the most effective way of investigating polysemy, a hybrid analysis could provide a theoretically grounded module on which to build a sense disambiguator.

2.4 Summary of previous works

The studies reviewed in this section all deal with polysemy from different points of view. Among these approaches, Katz’s classical schema (1972) and Fellbaum’s WordNet (1998) give a clear explanation of polysemy. However, they established the relationship between word meanings without investigating naturally occurring language. In addition, they were not concerned with the effect of syntactic

behavior on lexical meaning. In contrast, by combining semantic categories and syntactic behavior, the frame-based approach (Fillmore et al, 1992), with its investigation of corpus data, provides a more convincing discussion of polysemy. Still unsolved, however, are the further distinctions that require collocational and pragmatic associations. In Mandarin, besides the study of Liu and Wu (2004), few studies touch upon the issue of polysemy by investigating corpus data. The remaining problem is how to disambiguate polysemous words completely; this is the major focus of this paper.

3. Methodology and data

3.1. Data

The data in this study are extracted from the Academia Sinica Balanced Corpus (Sinica Corpus). The Sinica Corpus, containing a total of 5 million words with part-of-speech tagging (Huang et al 1996), is the largest balanced corpus containing both written and spoken contemporary Mandarin data. This corpus was established by the Chinese Knowledge and Information Processing (CKIP) group at Academia Sinica, Taiwan, and it is open to research through the Internet: http://www.sinica.edu.tw/ftms-bin/kiwi.sh/. In this corpus, over 200 entries are found for each case study, but due to limited time only 200 entries are tagged in detail in this paper. Other websites utilized in this study include FrameNet (by Fillmore 1992): http://framenet.icsi.berkeley.edu/, and Sinica BOW (The Academia Sinica Bilingual Ontological Wordnet, by the Sinica research group): http://bow.sinica.edu.tw/.

3.2 Methodology

In searching for a method of investigating ambiguity, the corpus-based approach convincingly shows the advantage of looking at the corpus distribution of ambiguous words. In addition to the obvious syntactic variations, which can be easily detected by native speakers’ intuition, corpus data contain some implicit distributional differences that are not directly recognized by speaker intuition. Therefore, relying on the corpus-based approach, this paper intends to further explore the semantic and syntactic relations among the senses of an ambiguous word in the corpus.

Following the approach adopted by Liu et al (2004), the first step of the proposed model is to identify the senses of an ambiguous word by reference to FrameNet. In this step, the data extracted from Sinica Corpus are roughly classified into several

groups by their various collocations of core arguments. Using the Chinese-English translations in the BOW online dictionary, these different groups are related to the English senses corresponding to the Chinese. It is found that each sense of the ambiguous word extracted from the corpus relates to a different frame in FrameNet. The core argument collocations of each sense are also similar to the basic patterns with core frame elements of the corresponding frames in FrameNet. A small tagged corpus with core frame elements for the four case studies is established for this study².

Sense is further identified by the second step: colloconstruction. In this module, a search for categorical collocations in Sinica Corpus is first executed. The range of the collocations is set to 5 lexical units preceding and following the target verb. Then, only the co-occurring categorical types of non-core arguments are addressed. These non-core arguments are grouped into various categorical collocates. Within each categorical collocate, some lexical items with specific meanings are found to appear frequently with different senses of the target verb.

To define the sense of examples that remain problematic after the second step, the third module, contextual information, is necessary. In this step, the relevant lexical items are scrutinized in the context in which the target word occurs. Relevant lexical items are sought in the preceding or following clauses; that is, the relevant items are found across clausal boundaries (usually within one clause before or after the target clause). The relation between the target word and the relevant items is established by their semantic similarity. The semantic similarities are established by reference to the related WordNet synsets (sets of near-synonyms) in BOW.

² See appendix I.
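The collocation search of the second step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually run on Sinica Corpus: the input format (sense-labelled lists of `(word, tag)` pairs), the function names, and the toy examples are all hypothetical. The labels `D` and `Di` are the corpus categories for adverbs and aspectual markers; the other tags are placeholders that may not match the corpus tagging.

```python
from collections import Counter

WINDOW = 5  # lexical units on each side of the target verb (Section 3.2)


def window_collocates(tokens, target, window=WINDOW):
    """Yield the (word, pos) pairs found within `window` units of `target`."""
    for i, (word, _pos) in enumerate(tokens):
        if word != target:
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield tokens[j]


def categorical_profile(tagged_examples, target, keep_pos):
    """Count collocates per sense, keeping only the categories in keep_pos."""
    profile = {}
    for sense, tokens in tagged_examples:
        counts = profile.setdefault(sense, Counter())
        for word, pos in window_collocates(tokens, target):
            if pos in keep_pos:
                counts[(pos, word)] += 1
    return profile


# Hand-made toy data; the sense labels and POS tags are assumptions.
EXAMPLES = [
    ("walking", [("我們", "Nh"), ("疲憊", "VH"), ("地", "DE"), ("走", "VA"), ("著", "Di")]),
    ("leaving", [("她", "Nh"), ("果真", "D"), ("要", "D"), ("走", "VA"), ("了", "Di")]),
]

profile = categorical_profile(EXAMPLES, "走", keep_pos={"D", "Di"})
for sense, counts in profile.items():
    print(sense, dict(counts))
```

Restricting the count to selected categories mirrors the decision to look only at non-core collocates; with real data, the per-sense counters would show which adverbs or aspectual markers favor which sense.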

4. Case study of the motion verb ZOU (走)

The motion verb ZOU 走 is an ambiguous word with high frequency. Liu and Lien (2004) mentioned that in Taiwanese (which is considered to be a dialect of Mandarin) ZOU 走 has multiple meanings varying in its “conceptual structure” and “semantic structure”. Therefore, ZOU 走 can be utilized as a representative case for research on polysemy. In Sinica Corpus, there are more than 1000 entries for ZOU 走. However, for the purpose of economy, only 200 entries are tagged in detail, but all the data are discussed and investigated in this case study.

4.1 Frame-based Sense identification

According to the model proposed in this paper, in the first step, most examples of ZOU 走 from Sinica Corpus can be tentatively classified into four major groups based on their different collocations with core arguments. Adopting the Chinese-English translations in the BOW online dictionary, these four groups can be related to the English senses corresponding to the Chinese: sense 1 as ‘walking’ zoulu (走路), sense 2 as ‘moving’ yidong (移動), sense 3 as ‘visiting’ canfang (參訪), and sense 4 as ‘leaving’ likai (離開). The distribution percentage for each sense is presented in the table below.

(8) Percentage of 200 Entries of ZOU

Sense                 Percentage (%)
Sense 1: walking      49.5
Sense 2: moving       13.5
Sense 3: visiting     5
Sense 4: leaving      32

As can be seen, sense 1 ‘walking’ occurs most frequently (as shown in Table (8))

and denotes a specific physical action, so it is assumed to be cognitively salient and prototypical. The other senses, according to Fillmore’s Frame Semantics (1992), are transferred from the SELF-MOTION domain (Frame) to other domains (Frames). The process of transfer from the source domain to the target domain is not discussed in this paper. It is more important to investigate how the different frames, with their varying basic patterns consisting of core frame elements, can help distinguish senses.

In this investigation, each sense of ZOU 走 relates to a different frame, with core argument collocations corresponding to the basic patterns with core frame elements in FrameNet³. For example, sense 1 ‘walking’ is contained in the SELF-MOTION frame, sense 2 ‘moving’ is in the MOTION frame, sense 3 ‘visiting’ is included in the ARRIVING frame, and sense 4 ‘leaving’ is in the DEPARTING frame. The classification of ZOU 走 by basic patterns with core frame elements is shown in Tables (9)-(12).

(9) Basic Patterns with Core Frame Elements of Sense 1 ‘Walking’

Sense: Sense 1, ZOU LU (走路) ‘walking/going’
Frame: SELF-MOTION
Frame elements: Area, Goal, Path, Source, Self-mover, Duration, Direction

No.    Basic pattern with core frame elements    (%)
       Example

BP1    Self-mover < *⁴ < Path                    33.02
       羊男和姊妺倆[Self-mover]一齊走在林間小路上[Path]。

BP2    Self-mover < *                            25.47
       我們[Self-mover]紛亂,疲憊地走著

BP3    Self-mover < Direction < *                12.26
       …一個酒鬼,他[Self-mover]半醒半睡地往上[Direction]走。…

BP4    Self-mover < * < Area                     7.6
       ..你[Self-mover]一個人走在威尼斯聖馬可廣場上[Area],…

BP5    Self-mover < path < *                     6.6
       [Self-mover]帶球滿街[path]走。

BP6    Self-mover < * < Goal                     3.77
       …[Self-mover]再走著就到了遊塞納河的遊船碼頭 PontdeL [Goal]…

BP7    Self-mover < * < Direction                2.83
       七人[Self-mover]這時所走的方向[Direction],早已不是李文秀平日去師父居所的途徑。

BP8    Self-mover < Area < *                     2.83
       我[Self-mover]在滿街水兵和軍官們中間[Area]走著

BP9    Self-mover < * < Duration                 2.83
       [CNI/Self-mover]又走了十來分鐘[Duration],終於到了小敏的家 (CNI: co-referential Null identity)

BP10   Self-mover < Path < *                     1.89
       我[Self-mover]..挑了僻靜的街道[Path]慢慢地走。

BP11   Self-mover < Path < Goal                  0.9
       如果從倫敦清晨出發,[Self-mover]走M1A1公路[Path]下午3點左右便可抵達愛丁堡[Goal]。

³ In Mandarin, VerbNet is the only frame-based search engine, but it is still under construction. Therefore, in this paper sense identification is done via FrameNet through Chinese-English translations.
⁴ The asterisk ‘*’ represents the target verb of each case study.

(10) Basic Patterns with Core Frame Elements of Sense 2 ‘Moving’

Sense: Sense 2, YI DONG (移動) ‘moving’
Frame: MOTION
Frame elements: Goal, Source, Theme, Direction, Path

No.    Basic pattern with core frame elements    (%)
       Example

BP12   Theme < *                                 40
       他與電腦對奕,…電腦[Theme]每走一步,聲彥就得全盤摸一遍…

BP13   Theme < Path < *                          25
       這個車隊[Mover]要沿著峭壁中間的公路[Path]往前走三公里…

BP14   Theme < Direction < *                     25
       …整個時代[Theme]要往哪裡[Direction]走才有希望…

BP16   Theme < Source < Goal < *                 5
       車隊[Theme]從臺北[Source]往宜蘭[Goal]走…

BP17   Theme < Area < Path < *                   5
       西方式的民主政治[self-mover],在中國大陸[Area]還有極其長遠的路[Path]要走

(11) Basic Patterns with Core Frame Elements of Sense 3 ‘Visiting’

Sense: Sense 3, CAN GUAN (參觀) ‘visiting’
Frame: ARRIVING
Frame elements: Area, Goal, Self-mover, Source

No.    Basic pattern with core frame elements    (%)
       Example

BP5    Self-mover < * < Area                     50
       我[Self-mover]今天其實打算走一趟〝金洋村〞[Area],看看有沒有機會一訪〝神秘湖〞。

BP8    Self-mover < Area < *                     30
       再來我們[Self-mover]到南橫[Area]走一趟

BP18   Area < Self-mover < *                     20
       塔克金溪縱谷與司馬庫斯部落[Area]遙遙相望。我[Self-mover]希望下次有機會去走一趟。
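The inventories in Tables (9)-(11) can be compared mechanically to see where frame information alone leaves a reading open. The sketch below copies the basic patterns from those tables and lists any pattern attested under more than one sense. The function names and the case-insensitive normalization (which treats "path" and "Path" as one element) are assumptions made for this illustration, not part of the procedure described in the text.

```python
from collections import defaultdict

# Basic patterns copied from Tables (9)-(11); "*" marks the target verb.
PATTERNS = {
    "sense 1 walking (SELF-MOTION)": [
        "Self-mover < * < Path", "Self-mover < *", "Self-mover < Direction < *",
        "Self-mover < * < Area", "Self-mover < path < *", "Self-mover < * < Goal",
        "Self-mover < * < Direction", "Self-mover < Area < *",
        "Self-mover < * < Duration", "Self-mover < Path < Goal",
    ],
    "sense 2 moving (MOTION)": [
        "Theme < *", "Theme < Path < *", "Theme < Direction < *",
        "Theme < Source < Goal < *", "Theme < Area < Path < *",
    ],
    "sense 3 visiting (ARRIVING)": [
        "Self-mover < * < Area", "Self-mover < Area < *", "Area < Self-mover < *",
    ],
}


def normalize(pattern):
    """Split a pattern on '<' and lower-case each element."""
    return tuple(part.strip().lower() for part in pattern.split("<"))


def senses_by_pattern(patterns):
    """Map each normalized pattern to the set of senses that list it."""
    index = defaultdict(set)
    for sense, pattern_list in patterns.items():
        for pattern in pattern_list:
            index[normalize(pattern)].add(sense)
    return index


def ambiguous_patterns(patterns):
    """Return only the patterns that occur under more than one sense."""
    return {p: s for p, s in senses_by_pattern(patterns).items() if len(s) > 1}


for pattern, senses in ambiguous_patterns(PATTERNS).items():
    print(" < ".join(pattern), "->", sorted(senses))
```

Any pattern returned by `ambiguous_patterns` marks a configuration of core frame elements that the first step cannot resolve, which is where additional evidence is needed.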

(12) Basic Patterns with Core Frame Elements of Sense 4 ‘Leaving’

Sense: Sense 4, LI KAI (離開) ‘leaving’
Frame: DEPARTING
Frame elements: Self-mover, Source, Area

No.    Basic pattern with core frame elements    (%)
       Example

BP2    Self-mover < *                            95.3
       我突然驚醒,她在與大家辭別,[CNI/Self-mover: 辭別<S>]果真要走了!

BP19   Source < * < Self-mover                   3.1
       確實,幾年前香港[Source]移民走了一批高級職員[Self-mover]

BP20   Self-mover < * < Area                     1.6
       回頭一望家三遠,[CNI/Self-mover]不知何事走他鄉[Area]

As shown in Tables (9)-(12), sense 1 ‘walking’ is defined by the basic patterns in the SELF-MOTION frame: Self-mover < * < Path, Self-mover < *, Self-mover < Direction < *, Self-mover < path < *, Self-mover < * < Area, Self-mover < * < Goal, Self-mover < * < Direction, Self-mover < Area < *, Self-mover < * < Duration, Self-mover < Path < *, and Self-mover < Path < Goal; sense 2 ‘moving’ is identified by the basic patterns in the MOTION frame: Theme < *, Theme < path < *, Theme < Direction < *, Theme < Source < Goal < *, and Theme < Area < Path < *; the meaning of ‘visiting’ (sense 3) is determined by the basic patterns in the ARRIVING frame: Area < Self-mover < *, Self-mover < Area < *, and Self-mover < * < Area; sense 4, linked to the meaning of ‘leaving’, is defined by the basic patterns in the DEPARTING frame: Self-mover < *, Source < * < Self-mover, and Self-mover < * < Area.

However, Tables (9)-(12) also show the problem that some cases cannot be disambiguated by frame-based distinction. That is, different frames may have similar basic patterns with core frame elements. For illustration, the following instances are presented:

(13) a. 我[Self-mover]在紅樹林裡[Area]走 (Sense 1 ‘walking’)
wo zai hong shu lin li zou
I in the mangrove inside walk
‘I walked in the mangrove.’

b. 他[Self-mover]到紅樹林[Area]走一趟 (Sense 3 ‘visiting’)
ta dao hong shu lin zou yi tang
he goes to the mangrove walk once
‘He visited the mangrove.’

(14) a. 我[Self-mover]走在大安森林公園[Area] (Sense 1 ‘walking’)
wo zou zai da an sen lin gong yuan
I walk in Da An forest park
‘I walked in Da An forest park.’

b. 我[Self-mover]走一趟大安森林公園[Area] (Sense 3 ‘visiting’)
wo zou yitang da an sen lin gong yuan
I go once Da An forest park
‘I visited Da An forest park.’

(15) a. 我腳好痠,我[Self-mover]沒辦法走了 (Sense 1 ‘walking’)
wo jiao hao suan, wo mei ban fa zou le
my feet so sore, I cannot walk LE
‘My feet are so sore that I cannot walk anymore.’

b. 火車早就開走了,我們[Self-mover]沒辦法走了 (Sense 4 ‘leaving’)
huo che zao jiu kai zou le, wo men mei ban fa zou le
train already drive away LE we cannot walk LE
‘The train has already driven away, and we can’t leave.’

As can be seen in examples (13a) and (13b), sense 1 ‘walking’ and sense 3 ‘visiting’ share the same pattern, Self-mover < Area < *, and in example (14) they

share the same pattern, Self-mover < * < Area⁵. In examples (15a) and (15b), sense 1 ‘walking’ and sense 4 ‘leaving’ show a similar problem: they share the pattern Self-mover < *.

⁵ The same basic patterns are also represented in the shaded areas in Tables (9), (11) and (12).

Why would different senses belonging to different frames share the same basic pattern? The reason is that, in frame semantics, ambiguity is caused by the transfer from one domain to another, but the two domains may not have totally different sets of core frame elements. That is, some basic elements may be shared across domains. As a consequence, in the realization of core frame elements, related frames may have similar expressions. In such a situation, how to distinguish different senses that carry similar basic patterns needs to be further explained. The next section provides a solution to this problem.

4.2 Colloconstructional distinction

When frame semantic information is insufficient, word senses can only be defined by a careful examination of colloconstruction. As Liu and Wu (2004) have mentioned, collocational association is also an important anchor for sense disambiguation. Adopting their findings, the second module, colloconstruction, is proposed in this paper for further sense distinction. In this module, first, a search for categorical collocation from Sinica Corpus is executed (see 16A)⁶.

⁶ The statistics and the categories in Table (16A) are adopted from Sinica Corpus. The first letter of the categorical label represents the traditional syntactic category (V(erb), N(oun), P(reposition), and so on; see appendix IV for all category abbreviations), except for D and Di, which stand for adverbs and aspectual markers, respectively. The second letter of the categorical label specifies the subcategory.

Then, various categorical collocates of ZOU 走 are found. Several categorical collocates with high frequency are found in Table (16A) (the shaded statistic data present
