Chapter 5 Conclusions and Future Research Directions

5.2 Future Research Directions

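Patterns; the extracted Patterns were then decomposed into verb and noun candidate words, which were separated by domain and part of speech. The candidate words, together with the vocabulary set formed by their intersection, served as two experimental samples, S(D) and S(D*), which were fed into the multi-indicator keyword analysis model. Because each indicator shows a different trend, the words were ranked by indicator value, and for each indicator we selected, from the highest value downward, a number of words equal to the size of the AWL or a multiple of it. Each selected list is a set of words that carries distinctive meaning in that domain. In line with the research objectives, the word lists ranked by the different indicators were then further intersected.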


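However, academic vocabulary that suits a single domain does not apply to the full range of academic writing. To achieve generality, we adopted a statistical method commonly used to measure homogeneity: each word was tested individually with the Chi-Square Measure, using the word's frequency of occurrence in each domain as the sample data, to compute how a single word is distributed over the intersection set of the three domains. A low chi-square value indicates that the word is distributed fairly evenly across the domains, that is, its homogeneity is high. Scoring homogeneity alone, however, can bias the final list toward words that are highly homogeneous but have low frequency in every domain, so an auxiliary Threshold is needed to correct the experimental results.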
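The homogeneity check with a frequency Threshold can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the toy frequencies, and the assumption that the expected count is equal in every domain (or proportional to optional domain sizes) are all hypothetical. Only the threshold value of 140 comes from the text.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def chi_square(counts: Sequence[float],
               domain_sizes: Optional[Sequence[float]] = None) -> float:
    """Chi-square statistic of one word's per-domain counts.

    Assumption: expected counts are equal across domains unless
    domain_sizes is given, in which case they are proportional to it.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("the word does not occur in any domain")
    if domain_sizes is None:
        expected = [total / len(counts)] * len(counts)
    else:
        size_total = sum(domain_sizes)
        expected = [total * s / size_total for s in domain_sizes]
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))


def rank_by_homogeneity(freqs: Dict[str, Sequence[float]],
                        min_freq: float = 140) -> List[Tuple[str, float]]:
    """Keep words whose frequency exceeds min_freq in every domain,
    then sort them by chi-square ascending (most homogeneous first)."""
    kept = [(word, chi_square(counts))
            for word, counts in freqs.items()
            if all(c > min_freq for c in counts)]
    return sorted(kept, key=lambda pair: pair[1])


# Toy per-domain frequencies (hypothetical, three domains).
toy_freqs = {
    "provide": (300, 310, 290),  # frequent and evenly spread
    "cell": (900, 150, 145),     # frequent but concentrated in one domain
    "rare": (5, 5, 5),           # perfectly even but below the threshold
}
ranked = rank_by_homogeneity(toy_freqs)
for word, value in ranked:
    print(f"{word}\t{value:.2f}")
```

In this toy data, "rare" has a chi-square value of zero yet is excluded by the Threshold, which is exactly the bias the Threshold is meant to correct.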

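The final list is built mainly from the S(D*) candidate words that occur more than 140 times in each domain and are highly homogeneous: for each indicator, a list 1.5 times the size of the AWL is selected, and the lists representing the individual indicators are intersected. To recover words that are frequent but less homogeneous, yet may still be common in academic writing, the S(D) candidate words are selected under the same conditions, and the result is unioned with the S(D*) result to obtain the final vocabulary list. The list contains 246 nouns and 147 verbs. These words can serve as a supplement to the AWL for academic writing and English learning. We also provide common collocations centered on these words, so that users can learn how to use them more quickly.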
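The selection step described above (top-ranked words per indicator, intersection across indicators, then union of the S(D*) and S(D) results) can be sketched with plain set operations. The indicator names, scores, and the toy AWL size below are hypothetical; only the factor of 1.5 comes from the text.

```python
from typing import Dict, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k words with the highest indicator value."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def select_vocabulary(indicator_scores: Dict[str, Dict[str, float]],
                      k: int) -> Set[str]:
    """Intersect the top-k word lists of all indicators."""
    tops = [top_k(scores, k) for scores in indicator_scores.values()]
    return set.intersection(*tops) if tops else set()


# Hypothetical indicator scores; "a" and "b" stand in for real indicators.
sd_star_scores = {
    "a": {"act": 0.9, "vary": 0.8, "form": 0.7, "view": 0.1},
    "b": {"vary": 0.9, "act": 0.7, "view": 0.6, "form": 0.2},
}
sd_scores = {
    "a": {"study": 0.9, "act": 0.8, "limit": 0.7, "fail": 0.1},
    "b": {"study": 0.8, "limit": 0.7, "fail": 0.6, "act": 0.1},
}

toy_awl_size = 2              # placeholder size for the toy data only
k = int(1.5 * toy_awl_size)   # 1.5 times the list size, as in the text

from_sd_star = select_vocabulary(sd_star_scores, k)
from_sd = select_vocabulary(sd_scores, k)
final_list = from_sd_star | from_sd
print(sorted(final_list))
```

The union at the end is what lets frequent S(D) words re-enter the list even when they were not selected from S(D*).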

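This study analyzes a multi-domain corpus of academic papers by combining keyword extraction techniques with an indicator analysis model. The keyword extraction step targets PoS Tag Patterns and extracts the two most frequent combinations, noun plus verb and verb plus noun. English sentences, however, admit many more part-of-speech combinations, nor is their composition limited to short collocations of up to three words. Given these two factors, "N-gram Patterns" and "multi-part-of-speech relation combinations" are our directions for future research.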

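I. N-gram Patterns: An N-gram is a segment made up of N words, where N is a positive integer; N=1 is called a unigram, N=2 a bigram, N=3 a trigram, and so on. In this study, N ranges from 2 to 3. As N grows, the number of possible word combinations increases and the relations among the words become more complex. Nevertheless, by analyzing how parts of speech modify one another and combine grammatically, word segments centered on academic vocabulary can be extracted precisely. Besides N-grams, similar units from natural language processing, such as Chunks and Noun Phrases, are also commonly used in keyword extraction.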

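II. Multi-part-of-speech relation combinations: In English sentences, prepositions mainly express and connect the relations between other parts of speech. Even the most common verb-plus-noun collocation still needs a following preposition to connect it to what comes next, and collocations such as preposition plus noun plus preposition are also numerous. Besides prepositions, adverbs commonly modify verbs and adjectives commonly modify nouns, and such modifiers are also frequent in academic writing. For example, of the 570 word families that make up the AWL, 101 consist of such subsidiary words. Incorporating these part-of-speech relations would not only expand the stock of common academic writing vocabulary but also allow different part-of-speech combinations to be used flexibly, which would further benefit academic writing.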
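The two directions above can be combined in one small sketch: extracting N-grams of any length whose part-of-speech sequence matches a given pattern, including patterns with prepositions. The input is assumed to be already tagged with Penn Treebank tags; the coarse label mapping, the function names, and the example sentence are illustrative assumptions, not part of this study.

```python
from typing import List, Sequence, Tuple

Tagged = Tuple[str, str]  # (word, Penn Treebank tag)


def coarse(tag: str) -> str:
    """Map a Penn Treebank tag to a coarse label (assumed mapping):
    N noun, V verb, A adjective, R adverb, P preposition, O other."""
    for prefix, label in (("NN", "N"), ("VB", "V"), ("JJ", "A"), ("RB", "R")):
        if tag.startswith(prefix):
            return label
    if tag in ("IN", "TO"):
        return "P"
    return "O"


def extract_patterns(tagged: Sequence[Tagged],
                     patterns: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Return every N-gram whose coarse label sequence equals a pattern."""
    labels = [coarse(tag) for _, tag in tagged]
    hits = []
    for start in range(len(tagged)):
        for pattern in patterns:
            end = start + len(pattern)
            if tuple(labels[start:end]) == tuple(pattern):
                hits.append(tuple(word.lower() for word, _ in tagged[start:end]))
    return hits


# Hand-tagged example sentence (hypothetical).
sentence = [("We", "PRP"), ("conduct", "VBP"), ("experiments", "NNS"),
            ("on", "IN"), ("corpora", "NNS"), ("in", "IN"), ("detail", "NN")]

# Verb+noun, verb+noun+preposition, preposition+noun+preposition.
patterns = [("V", "N"), ("V", "N", "P"), ("P", "N", "P")]
hits = extract_patterns(sentence, patterns)
for hit in hits:
    print(" ".join(hit))
```

Because a pattern is just a tuple of labels, longer N-grams or adjective and adverb modifiers can be covered by adding patterns such as ("A", "N") or ("R", "V") without changing the extraction loop.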



complete blind blind blind blind access vary act act foresee act achieve address burden sign burden complete affect

close complete complete complete root analyze type root sum address address approach sum address address charge close assume reach charge charge trace trace benefit lower trace trace type type bias purpose type list fire sum clarify

list sum root sign promise code code promise promise advise survey conclude occur survey survey lower sign consist research sign burden cease force construct

step force force bridge lower contact parallel lower lower intervene list contrast

cover list type code balance contribute repeat balance balance visit code cooperate conclude code code invert occur coordinate

form research research reserve research create view visit project parallel substitute demonstrate separate substitute substitute decay step denote

benefit step step subdivide parallel derive employ reserve reserve click cover design

direct parallel travel attack form detect function dot dot view attack devote

rate form pertain separate view discriminate summarize click click escape project dominate

produce attack attack waste constrain emerge fail view view transfer separate enable study project visit regress benefit encounter divide constrain constrain travel employ ensure

index separate separate function exhibit establish limit waste waste trust transfer estimate maintain benefit benefit pertain function evaluate

link exhibit exhibit rate pertain exclude access transfer transfer strike rate exhibit correct travel parallel boost drive extract

leave function function float summarize facilitate position trust trust delay fail focus

object pertain form score study function remain rate rate contact divide generate transform drive drive analyse yield identify experiment study study duplicate count illustrate

space yield yield link index image carry index index access limit imply attempt limit limit correct delay impose

ensure delay delay plan score incorporate

signal score score fight duplicate induce integrate contact contact smooth link input

cross analyse analyse position access involve approach duplicate duplicate supervise correct isolate

target link link counter plan label replace access access experiment smooth link provide correct correct grant position maintain

receive plan plan space locate maximize analyze smooth smooth segment demand mediate

follow position position exchange transform minimize equal demand demand contract experiment modify avoid counter counter slide picture obtain continue experiment experiment feed tie participate

lead picture picture signal space pose support tie tie display bias precede contrast space space inscribe segment predict

derive bias bias probe attempt process approximate segment segment approach exchange project

meet attempt attempt target miss promote establish exchange exchange equal plot publish

prove contract contract support sort range contribute slide slide manifest signal register

block plot plot approximate display rely introduce sort sort block integrate remove develop signal signal bend cross require incorporate display display request trade research

grow inscribe inscribe register approach restrict size trade trade incorporate target reveal pass approach approach belong notice reverse image target target load fall revise

input notice notice size provide select adopt fall fall image analyze shift control equal equal input follow site illustrate support support control equal survey

fix judge judge design lead target design contrast contrast change support utilize change aim aim test judge accumulate

test manifest manifest shape contrast aggregate suppose approximate approximate subject aim attribute subject span span rank manifest automate expect guide guide site approximate challenge record block block care meet comment involve request request train span contract

ignore track track play guide diminish connect register register weight block display

decide incorporate incorporate interview request eliminate eliminate belong belong scan track exceed

start load load enable register output improve size size release incorporate phase

speed image image comment join proceed affect input input figure belong release return review review watch load resolve

hold control control query size simulate accord design design attribute pass substitute

play change change question image sum weight test test tune input suspend observe shape shape schedule review terminate express subject subject sound control trace

enable rank rank model fix transfer verify record record base design advocate identify site site measure change aid

consist care care process test conceive deal speed speed estimate shape conform figure return return suggest suppose dispose

note train train fear subject impact regard accord accord doubt rank invest depend play play label record pursue prevent weight weight log adjust restore

check interview interview power site tape investigate scan scan shear care violate capture enable enable trail start assist search release release profit speed chart demonstrate comment comment overlie return conduct modify deal deal host train conflict

serve figure figure tip accord debate tend note note strip play decline evaluate regard regard mail weight differentiate

desire check check wire scan expose refer watch watch defect enable grade concern search search corrupt release implicate account query query coach deal interpret

extend attribute attribute scramble figure monitor draw desire desire blow note persist question concern concern kick regard reject

relate account account cheat check seek combine outperform outperform shake investigate stress simplify question question bid capture survive

sense sense sense ship search sustain bind bind bind allot query undertake remove range range love attribute accommodate

range hope hope orient desire adapt move challenge challenge overbid concern compensate challenge mark mark slim account constrain mention tune tune abort outperform evolve

add guarantee guarantee clang draw format minimize schedule schedule whitewash question grant

operate sound sound spar sense guarantee discuss focus focus whistle bind interact

focus rest rest skim range justify represent model model slack hope manipulate

extract claim claim animate challenge parallel assign base base smart mention quote

rest measure measure prime mark react exist answer answer anneal tune schedule model offer offer dangle guarantee bond create process process welch schedule confer satisfy estimate estimate brush sound consent

base store store personify focus cycle construct suggest suggest wedge represent deduce

result increase power counsel assign draft call power log blast rest feature optimize log label chase model fund calculate label box reprint claim institute

select box mind subsidize base issue measure mind fear cull result layer

explain fear output relinquish call panel determine output host flatter measure purchase

handle host stage dwarf determine recover examine stage phase deflect handle regulate

include phase trail negative examine style match trail mail sift include transport implement mail profit forge match allocate correspond profit wire patch implement converse

describe wire strip shot correspond credit build strip corrupt brake describe deviate assume corrupt copy slice answer exploit

apply bid format speech offer index require coach kick bundle define invoke

begin light inspect grab process offset choose orient abort pace estimate random

define copy negative tile store route process format smart mesh decrease trend compare kick prime pitch suggest unify estimate abort overbid flood compute amend

distribute negative animate trap increase appreciate achieve smart slim sprite power aspect

reduce prime anneal hook log assemble bound overbid slack conceal label aware denote animate route pad box corporate obtain slim speech parse mind data

set anneal patch crease bid discrete solve slack stream route light diverse store route proof misbehave corrupt equate decrease speech broadcast credit copy erode

perform patch clock browse orient explicit suggest stream array curate format finite

update proof mesh download bite send broadcast skim tag output generate clock parse lift stage

run array slice hurt phase propose mesh tag dispatch host

associate skim credit flag negative compute parse download clock smart increase slice prune broadcast abort understand tag clip encrypt prime

learn credit slope cascade route write download sift clip proof power prune pitch slope stream respect clip flag prune array organize slope tile relay speech

output sift shot stream anneal stage pitch browse conquer animate

bite flag bid spike broadcast degree tile coach array clock random shot light warp slack negative browse orient proof patch

‧ after) Homogeneity distribution under the Threshold


Frequency > 140; lists intersected per indicator within the same interval; words ordered by homogeneity from high to low

Experimental results for the three-domain intersection candidate words S(D*) (124 verbs in total)

act, complete, vary, address, enter, reach, occur, conclude, form, view, employ, exhibit, produce, fail, study, limit, maintain, predict, leave, characterize, utilize, remain, lose, attempt, provide, receive, analyze, follow, explore, avoid, continue, lead, support, derive, meet, prove, contribute, imply, introduce, develop, grow, perceive, control, illustrate, fix, design, change,

