Related-Term Recommendation for Chinese News (中文新聞之關聯詞推薦)


Academic year: 2021



໳ᖰ໩ ම࿡ర

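Associate Professor, Graduate Institute of Information Management, Chinese Culture University; Graduate Student, Graduate Institute of Information Management, Chinese Culture University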

cshwang@faculty.pccu.edu.tw

92707157@scenet.pccu.edu.tw

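Abstract

As Internet use becomes ever more widespread and technology advances, information is easier to transmit and its volume grows daily, so users find it increasingly difficult to locate useful information. Most current information retrieval relies on full-text search, which returns the material matching the keywords a user enters. Queries are very slow, and a user who enters keywords that are too broad or simply wrong still misses much information. To reduce this problem, one has to start from the associations between terms and build a related-term dictionary automatically. This study first uses a neural network algorithm to automatically extract keywords that fit the meaning of the text, then uses a term-frequency inverse-frequency (TF×IDF) weighting formula to compute the association weights between keywords, building a dictionary of direct and indirect associations that users can consult when browsing or searching.

We randomly selected 500 news documents from the United Daily News website (聯合新聞網). The experiments show that, on average, 3 extracted keywords per news article match the manually defined keywords. The remaining extracted keywords do not match the manual definitions, but about 60% of them are acceptable to users. For each article we take the top 20 extracted keywords as representative terms and compute their associations.

Keywords: keyword, neural network, back-propagation model, indirectly related term.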

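1. Introduction

With the rapid growth of the Internet and steady advances in information technology, more and more information is transmitted over the network. Obtaining information has become easy and convenient for users, but extracting the useful part from such a large volume is an important problem.

Most search facilities offered by major websites today are keyword based. For example, a user looking for information on the automobile industry enters the keyword "汽車" (automobile) and receives the articles that contain it. If one of the articles looks interesting, the user clicks and reads it; if it turns out not to be what was wanted, the user has to query again with other keywords. This approach spares users from browsing articles one by one, but when the keyword entered is too broad or wrong, it still cannot help users find potentially useful information.

An article is composed of many terms. Some important terms in this set can be extracted to form a short summary or the index terms of the article. These important terms are generally called "keywords", that is, the terms that carry the key points of the full text.

The association between keywords is judged mainly by how often two terms occur together. For example, the keywords "裕隆" (Yulon) and "RFID" co-occur in documents very frequently, so they can be judged to be directly associated. "永豐餘" (Yuen Foong Yu) and "RFID" also co-occur frequently and are likewise judged to be directly associated. The three keywords are therefore related, and "裕隆" and "永豐餘" are indirectly associated. Through these different associations, users can find information in the same domain but in a different direction.

Within a large body of text, however, extracting keywords and related terms by hand costs a great deal of time and labor. This study therefore proposes to extract keywords automatically and to build a related-term lexicon automatically. When a user searches, the system provides the keywords that fit the meaning of each article and also recommends directly or indirectly related terms.

This study uses a neural network algorithm to build a trained model for automatic keyword extraction, then uses a TF×IDF weighting formula to compute the association weights between keywords and obtain the directly and indirectly related terms.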

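2. Literature Review

2.1 Keyword Extraction

A keyword is the smallest meaningful unit of an article. Most automated document processing, such as automatic summarization, automatic indexing and automatic classification, first extracts keywords and then carries out the later steps. Keyword extraction can thus be regarded as the foundation and core technique of all automated document processing.

Keyword extraction methods can be roughly divided into statistical methods, lexicon methods, rule-based methods, or a combination of the three. In the earlier literature, keyword extraction techniques are generally grouped into three major types [13][14].

The first is lexicon matching: an existing lexicon is matched against the input document (or sentence), and the phrases in the document that appear in the lexicon are extracted.

The second is grammatical parsing: a parser based on natural language processing identifies the noun phrases in the document, and further methods and criteria filter out unsuitable terms.

The third is statistical analysis: the document is analyzed until enough statistical parameters have been accumulated, and the phrases whose parameters meet a threshold are extracted.

Other approaches combine the methods above or use different algorithms. For example, Krulwich, B. and Burkey, C. [3] used a heuristic algorithm to extract keywords from articles as features for automatic article classification, but their experiments extracted a large number of low-precision keywords.

Muñoz, A. [4] proposed an unsupervised learning method for extracting two-word keywords using an Adaptive Resonance Theory (ART) network; it also extracted a large number of low-precision keywords. Steier, A. M. and Belew, R. K. [8] used a mutual information function to compute keyword feature values, but their method only handles two-word keywords.

Turney, P.D. [9][10] proposed the Genex framework, which extracts keywords mainly with a genetic algorithm (GA); on average it extracted two keywords per article. Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C. and Nevill-Manning, C.G. [12] proposed the Kea implementation framework, which uses a Bayesian algorithm. The experiments in Turney, P.D. [11] showed that Kea and Genex perform about equally well.

2.2 Term Weighting

In information storage and retrieval, a thesaurus records the hierarchical or semantic relations between terms, so that when users search for material it can recommend words or terms with similar concepts.

A thesaurus generally records synonyms as well as antonyms, broader terms, narrower terms and related terms, which are used to widen or narrow the topical scope of the search terms. A semantic thesaurus has to be maintained by hand, and as the number of articles grows, more labor and time are needed to maintain it. To build a thesaurus effectively and automatically, we use the co-occurrence of terms as the association between them.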

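Aas, K. and Eikvil, L. [1] compiled a range of term-weighting formulas, which use term weights to assess the similarity of articles. They are described below, where N is the total number of documents, M is the total number of terms after word segmentation, n_i is the number of documents in which term i occurs, and f_ik is the number of times term i occurs in article k.

(1) Boolean: the simplest scheme. If the term occurs in the article the weight is 1, otherwise it is 0.

$$w_{ik} = \begin{cases} 1 & \text{if } f_{ik} > 0 \\ 0 & \text{otherwise} \end{cases}$$

(2) Word frequency weighting: the number of times term i occurs in article k.

$$w_{ik} = f_{ik}$$

(3) TF×IDF weighting: term frequency times inverse document frequency.

$$w_{ik} = f_{ik} \cdot \log\left(\frac{N}{n_i}\right)$$

(4) tfc-weighting: takes differing article lengths into account by computing the proportion that the term's TF×IDF value represents within the article.

$$w_{ik} = \frac{f_{ik} \cdot \log(N/n_i)}{\sqrt{\sum_{j=1}^{M} \left[ f_{jk} \cdot \log(N/n_j) \right]^2}}$$

(5) ltc-weighting: similar to tfc-weighting, but adjusts the term frequency to reduce the influence of very high frequencies.

$$w_{ik} = \frac{\log(f_{ik} + 1.0) \cdot \log(N/n_i)}{\sqrt{\sum_{j=1}^{M} \left[ \log(f_{jk} + 1.0) \cdot \log(N/n_j) \right]^2}}$$

(6) Entropy: entropy weighting, a more complex scheme.

$$w_{ik} = \log(f_{ik} + 1.0) \cdot \left( 1 + \frac{1}{\log N} \sum_{j=1}^{N} \left[ \frac{f_{ij}}{n_i} \log\left(\frac{f_{ij}}{n_i}\right) \right] \right)$$

These term-weighting formulas were proposed mainly for computing article similarity in article classification. This study uses them to compute the similarity between terms in Chinese news articles.

2.3 Neural Networks

A neural network is a processing system that imitates a biological nervous system. A biological nervous system consists of many interconnected neurons, each of which has input and output signals that connect it to other neurons and pass messages. Neural networks are currently divided into the following three types.

The first type is the supervised learning network. Training examples from the problem domain contain both input data and output data, and the network learns the underlying mapping rules between them. It is commonly applied to prediction or classification, for example bond rating [2] and bankruptcy prediction [6]. The Back-Propagation Network used in this study belongs to this type.

The second type is the unsupervised learning network. Training examples from the problem domain contain only input data, and the network learns the underlying clustering rules of the input data so that they can be applied to new cases. Examples are the Self-Organizing Map (SOM) and the Adaptive Resonance Theory (ART) network.

The third type is the associative learning network. It takes the values of state variables as training examples and learns the memory rules in them, then applies these to new cases where only incomplete state values are available and the complete state has to be inferred. Such networks can be applied to retrieval applications and noise filtering. Examples are the Hopfield Neural Network and Bi-directional Associative Memory.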
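As an illustration of the supervised type, the following is a minimal sketch of a back-propagation network with one hidden layer, written in plain Python. It is not the authors' implementation: the network size, learning rate, number of epochs and the four training samples are all hypothetical, chosen only to show the forward pass, the error deltas and the weight updates.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, output). The last weight of each unit is its bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1]) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, out


def train_bpn(samples, n_hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train with per-sample gradient descent on squared error; returns a predictor."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, out = forward(x, w_hidden, w_out)
            # Error terms are propagated backwards from the output to the hidden layer.
            delta_out = (out - target) * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
    return lambda x: forward(x, w_hidden, w_out)[1]


# Hypothetical samples: three feature values scaled to [0, 1], label 1 = keyword.
SAMPLES = [
    ((0.9, 0.2, 0.8), 1),
    ((0.8, 0.5, 0.9), 1),
    ((0.1, 0.3, 0.2), 0),
    ((0.2, 0.1, 0.1), 0),
]
predict = train_bpn(SAMPLES)
```

The returned `predict` function maps a feature vector to a score between 0 and 1, which could be thresholded or used to rank candidate terms.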

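3. System Architecture

The system is divided into two modules. The first module extracts keywords automatically. The second module uses the keywords from the first to compute the similarity between terms and build the related-term dictionary.

3.1 Automatic Keyword Extraction

Keyword extraction proceeds in the following steps:

(1) Build the training model: the keywords of the training documents are first defined manually, and these keywords are then used to build a training model. See Figure 1 for the process.

(2) Extract keywords: the model from the previous step is used to extract the keywords of the test documents. See Figure 2 for the process.

Table 1: Part-of-speech filtering rules
1. Noun
2. Adjective + noun
3. Noun + noun
4. Noun + verb
5. Verb + noun

Chinese word segmentation is performed with the Academia Sinica Chinese word segmentation system [5]. The segmented terms are then filtered by the part-of-speech rules in Table 1 to obtain candidate terms.

For each candidate term, three feature values are then computed and used as the features of the training data, as listed in Table 2. The three features are:

(1) Position weight of the term: this factor reflects that a term has different importance depending on where it appears in an article, so different positions are given different weights. A term in the headline has weight w1, a term in the first paragraph has weight w2, and a term anywhere else has weight w3. The calculation is shown in formula (1), where f^1_ik is the frequency of term i in the headline of document k, f^2_ik is its frequency in the first paragraph, and f^3_ik is its frequency in other positions.

$$PW_{ik} = f^{1}_{ik} \times w_1 + f^{2}_{ik} \times w_2 + f^{3}_{ik} \times w_3 \qquad (1)$$

(2) Relative term length: the length of the term divided by the average length of all terms in the article.

(3) TF×IDF: term frequency times inverse document frequency, which rests on two basic assumptions. The more often a term occurs in one document, the more important it is; the more often it occurs across all collected documents, the less important it is, because the term then cannot represent what is specific to this document. The calculation is shown in formula (2), where f_ik is the frequency of term i in document k, N is the total number of documents, and n_i is the number of documents in which term i occurs at least once.

$$TFIDF_{ik} = f_{ik} \cdot \log\left(\frac{N}{n_i}\right) \qquad (2)$$

Figure 1: Building the training model

Figure 2: Extracting keywords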
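The three features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, term length is taken as the number of characters, and the natural logarithm is used because the text does not state the base. The default position weights 2, 1.5 and 1 are the values reported in Section 4.

```python
from math import log

# Weights for headline, first paragraph and other positions (values from Section 4).
POSITION_WEIGHTS = (2.0, 1.5, 1.0)


def position_weight(f_head, f_first, f_other, weights=POSITION_WEIGHTS):
    """Formula (1): PW_ik = f1_ik * w1 + f2_ik * w2 + f3_ik * w3."""
    w1, w2, w3 = weights
    return f_head * w1 + f_first * w2 + f_other * w3


def relative_length(term, all_terms):
    """Term length divided by the mean length of all terms in the article."""
    mean_len = sum(len(t) for t in all_terms) / len(all_terms)
    return len(term) / mean_len


def tf_idf(f_ik, n_docs, n_i):
    """Formula (2): TFIDF_ik = f_ik * log(N / n_i), natural log assumed."""
    return f_ik * log(n_docs / n_i)


# Hypothetical candidate: once in the headline, twice in the first paragraph,
# three times elsewhere, occurring in 10 of 100 documents.
feature_vector = (
    position_weight(1, 2, 3),
    relative_length("石化業", ["石化業", "乙烯", "投資計劃"]),
    tf_idf(6, 100, 10),
)
```

Each candidate term then yields one such three-value vector, which together with the manual keyword label forms one training record.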

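Table 2: Features of the training data

Feature | Description
Term frequency | Weighted sum of the term's frequencies in the headline, the first paragraph and other positions.
Relative term length | Length of the term divided by the average length of all terms in the article.
TF×IDF | The TF×IDF value of the term.

Experts also define the keywords of each news article by hand. From the candidate terms produced by the Chinese word segmentation system, terms shorter than 2 characters are filtered out, and the training features above are computed. Together with a label stating whether the term is a keyword, these form the training data, which is used to train a neural network by back-propagation and build the keyword model.

The second step tests on other news documents. The Chinese word segmentation system first extracts the candidate terms, the three features above are computed for each term, and the keyword model from the first step is used to extract the keywords.

3.2 Related-Term Recommendation

Related-term recommendation proceeds in the following steps:

(1) Compute term weights: for each extracted keyword, compute its weight in every article and build a term-weight file.

(2) Compute term similarity: from the term-weight file of the previous step, compute the pairwise similarity of terms across the test articles and extract the terms that are related.

See Figure 3 for the process.

Figure 3: Keyword recommendation

News articles are short and their term frequencies are low, so this study adopts the tfc-weighting method proposed by Aas, K. and Eikvil, L. [1] to compute the weight of each term in the test news articles. The calculation is shown in formula (3), where N is the total number of documents, M is the total number of terms after word segmentation, and n_i is the number of documents in which term i occurs.

$$w_{ik} = \frac{f_{ik} \cdot \log(N/n_i)}{\sqrt{\sum_{j=1}^{M} \left[ f_{jk} \cdot \log(N/n_j) \right]^2}} \qquad (3)$$

Once the term weights have been computed, this study uses a vector space to represent the two-dimensional space formed by articles and terms. For example, with n news articles and a total of m distinct terms across all articles, an n*m two-dimensional space is built, as in Figure 4.

The association between terms is computed in this two-dimensional space, using the vector association formula proposed by Salton, Gerard [7], shown in formula (4), where w_ik is the weight of term i in article k.

$$sim(T_i, T_j) = \sum_{k=1}^{n} w_{ik} \times w_{jk} \qquad (4)$$

Figure 4: Vector space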
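Formulas (3) and (4) can be illustrated with a short Python sketch. The article-by-term count matrix below is made up for the example, the function names are hypothetical, and the natural logarithm is assumed because the text does not state the base.

```python
from math import log, sqrt


def tfc_weights(freqs):
    """freqs[k][i] is the count of term i in article k; returns w[k][i] per formula (3)."""
    n_docs = len(freqs)
    n_terms = len(freqs[0])
    # n_i: number of articles in which term i occurs at least once
    doc_freq = [sum(1 for row in freqs if row[i] > 0) for i in range(n_terms)]
    weights = []
    for row in freqs:
        raw = [f * log(n_docs / doc_freq[i]) if doc_freq[i] else 0.0
               for i, f in enumerate(row)]
        norm = sqrt(sum(x * x for x in raw))
        # Dividing by the norm compensates for differing article lengths.
        weights.append([x / norm if norm else 0.0 for x in raw])
    return weights


def similarity(weights, i, j):
    """Formula (4): sim(T_i, T_j) = sum over articles k of w_ik * w_jk."""
    return sum(row[i] * row[j] for row in weights)


# Hypothetical counts: 4 articles (rows) by 3 terms (columns).
FREQS = [
    [2, 1, 0],
    [1, 1, 0],
    [0, 0, 3],
    [0, 0, 1],
]
W = tfc_weights(FREQS)
sim_01 = similarity(W, 0, 1)  # terms 0 and 1 co-occur in two articles
sim_02 = similarity(W, 0, 2)  # terms 0 and 2 never co-occur
```

Terms that never appear in the same article get a similarity of zero, which is why co-occurrence drives the association weights.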

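4. Implementation Results

This study takes the financial news published on the United Daily News website (聯合新聞網) in November and December as its material. 500 articles were selected at random and split into 400 training documents and 100 test documents.

For the first training feature, the position weight of a term, keywords in the headline or first paragraph are weighted more heavily: the weight is 2 for the headline, 1.5 for the first paragraph, and 1 elsewhere.

Two methods are used to evaluate automatic keyword extraction. The first is the method proposed by Ian H. Witten et al. for the Kea model: keywords are extracted from each article and the number that match the manually defined keywords is counted. The main reasons for using it are:

(1) It is easier for users to understand than precision and recall.

(2) Precision and recall can mislead users, since recall may be sacrificed for high precision, or precision for high recall.

(3) It matches the way users commonly judge results, by the number of keywords extracted from an article.

Table 3 takes the top 5, 10, 15 and 20 keywords extracted from each test article and reports how many match the manually defined keywords, as an average per article. The Kea model was evaluated on English journal articles, where the top 5, 10, 15 and 20 extracted terms contained on average 0.93, 1.39, 1.68 and 1.88 keywords matching the manual definitions. Because the language and content differ from the test documents of this study, an objective comparison is not possible, but the experimental results of this study's model are better than those of Kea.

Table 3: Number of keywords

Keywords extracted | 5 | 10 | 15 | 20
Average matching the manual definition | 1.98 | 2.7 | 2.99 | 3.1

The second method uses the most common measures, precision and recall. The results are shown in Figure 5, which reports the precision and recall of the extracted keywords against the manually defined ones. The more terms are extracted, the higher the recall and the lower the precision.

Figure 5: Precision and recall of automatic keyword extraction

The 500 news articles were also given to 30 graduate students with a background in finance for evaluation; about 58% of the keywords were judged acceptable.

The recall of the trained model over the top 20 extracted keywords reaches 0.97, so this study uses the top 20 keywords of each article to compute associations.

Based on the association between terms, directly and indirectly related terms with an association weight above 0.5 are selected, as shown in Table 4. In Table 4, "乙烯" (ethylene) is a production raw material and is directly related to "石化業" (petrochemical industry), "石化產業" (petrochemical sector) and "大陸石化" (mainland petrochemicals). The term "石化業" in turn corresponds to "輕油裂解廠" (naphtha cracker), "投資計劃" (investment plan) and "年產能" (annual capacity). One can therefore say that "乙烯" and "輕油裂解廠" are indirectly related in Chinese news.

Table 4: Directly and indirectly related terms

Query term | Direct related term | Indirect related term | Direct weight | Indirect weight
乙烯 | 石化業 | 輕油裂解廠 | 0.89 | 0.99
乙烯 | 石化業 | 投資計劃 | 0.89 | 0.73
乙烯 | 石化業 | 年產能 | 0.89 | 0.73
乙烯 | 石化產業 | 聯合採購 | 0.75 | 0.89
乙烯 | 大陸石化 | 兩岸石化業 | 0.63 | 0.75
人民幣升值預期 | 外匯存底 | 熱錢流入 | 0.66 | 0.72
人民幣貸款 | 台塑寧波三期 | 法國巴黎銀行 | 1.00 | 1.00
人民幣貸款 | 台塑寧波三期 | 聯貸案 | 1.00 | 0.87
人民幣貸款 | 台塑寧波三期 | 融資 | 1.00 | 0.57
人民幣貸款 | 中資銀行 | 台商融資 | 0.67 | 0.67
人民幣貸款 | 中資銀行 | 吸收外匯存款 | 0.67 | 0.67
人民幣貸款 | 中資銀行 | 外匯存款準備金 | 0.67 | 0.54
人民幣貸款 | 中資銀行 | 聯貸案 | 0.67 | 0.59
中華電信 | 通訊業者 | 道路使用費 | 0.58 | 0.67
中華電信 | 高速公路 | 遠東電子 | 0.57 | 0.75
中華電信 | 高速公路 | 電子收費系統 | 0.57 | 0.75
台西區 | 台塑集團 | 雲林離島新興區 | 0.76 | 0.59
台西區 | 台塑集團 | 新日鐵 | 0.76 | 0.55
台西區 | 台塑集團 | 嘉義縣政府 | 0.76 | 0.77
台西區 | 台塑集團 | 養生文化村 | 0.76 | 0.78
台西區 | 煉鋼廠 | 彰化大城海埔地 | 0.55 | 0.83
台塑集團 | 新日鐵 | 鋼鐵廠 | 0.55 | 0.96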
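The selection of direct and indirect related terms can be sketched as a two-step lookup over the pairwise association weights. The paper does not spell out the exact procedure, so the following is an assumption: a direct term is any term whose weight with the query exceeds the threshold, and an indirect term is any term whose weight with a direct term exceeds the threshold and that is not itself directly related to the query. The function name is hypothetical, and the sample weights are the first two rows of Table 4 (with English glosses) plus one made-up pair below the threshold.

```python
def related_terms(sim, query, threshold=0.5):
    """sim maps frozenset({term_a, term_b}) to an association weight.

    Returns (direct, indirect): direct maps each direct term to its weight,
    indirect is a list of (query, direct term, indirect term,
    direct weight, indirect weight) rows.
    """
    def neighbours(term):
        found = {}
        for pair, weight in sim.items():
            if term in pair and weight > threshold:
                (other,) = pair - {term}
                found[other] = weight
        return found

    direct = neighbours(query)
    indirect = []
    for d_term, d_weight in direct.items():
        for i_term, i_weight in neighbours(d_term).items():
            # Skip the query itself and terms already directly related to it.
            if i_term != query and i_term not in direct:
                indirect.append((query, d_term, i_term, d_weight, i_weight))
    return direct, indirect


SIM = {
    frozenset({"ethylene", "petrochemical industry"}): 0.89,
    frozenset({"petrochemical industry", "naphtha cracker"}): 0.99,
    frozenset({"petrochemical industry", "investment plan"}): 0.73,
    frozenset({"ethylene", "stock market"}): 0.20,  # made-up pair below the threshold
}
direct, indirect = related_terms(SIM, "ethylene")
```

Each row of `indirect` has the same five columns as Table 4, so the output could be written out directly as a related-term dictionary entry.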

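5. Conclusions

The implementation results lead to the following conclusions:

(1) Chinese news articles are short and the features of their keywords are not distinctive, so keywords are harder to extract. This study achieves fairly high recall but low precision.

(2) Not all of the terms extracted in this study match the manually defined keywords, but at least 60% of the terms for each article are acceptable to users.

(3) This study computes term association from the frequency with which terms co-occur, so that users can quickly browse a simple index of Chinese news and its associations. Terms that do not co-occur with the query term in any article, but do co-occur with a directly related term of the query, can also serve as a reference for the user's query.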

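6. Future Work

Future work will continue to study how the keywords and related terms extracted here can be applied to news document retrieval: building a keyword index of news documents and displaying the relations between keywords graphically, to help users quickly find the news documents they need.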

References

[1] Aas, K. and Eikvil, L., Text Categorisation: A Survey. Norwegian Computing Center, Oslo, 1999.

[2] Dutta, S. and Shekhar, S., Bond rating: A non-conservative application of neural networks, IEEE International Conference on Neural Networks, San Diego, Vol. 2, pp. 443-450, 1988.

[3] Krulwich, B. and Burkey, C., Learning user information interests through the extraction of semantically significant phrases. In M. Hearst and H. Hirsh, editors, AAAI 1996 Spring Symposium on Machine Learning in Information Access. California: AAAI Press.

[4] Muñoz, A., Compound key word generation from document databases using a hierarchical clustering ART model. Intelligent Data Analysis, 1 (1), Amsterdam: Elsevier, 1996.

[5] Ma, Wei-Yun and Keh-Jiann Chen, Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff, Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 168-171, 2003.

[6] Odom, M. and Sharda, R., A neural network model for bankruptcy prediction, IEEE INNS IJCNN, Vol. 2, pp. 163-168, 1990.

[7] Salton, Gerard, Automatic text processing: the transformation, analysis, and retrieval of information by computer, Addison-Wesley Publishing Company, Inc., 1989.

[8] Steier, A. M. and Belew, R. K., Exporting phrases: A statistical analysis of topical language. In R. Casey and B. Croft, editors, Second Symposium on Document Analysis and Information Retrieval, pp. 179-190, 1993.

[9] Turney, P.D., Extraction of Keyphrases from Text: Evaluation of Four Algorithms. National Research Council, Institute for Information Technology, Technical Report ERB-1051, 1997.

[10] Turney, P.D., Learning to Extract Keyphrases from Text. National Research Council, Institute for Information Technology, Technical Report ERB-1057, 1999.

[11] Turney, P.D., Learning algorithms for keyphrase extraction. Information Retrieval, 2, pp. 303-336, 2000.

[12] Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C. and Nevill-Manning, C.G., KEA: Practical automatic keyphrase extraction. Proceedings of Digital Libraries 99 (DL'99), pp. 254-256. ACM Press, 1999.

[13] Tseng, Yuen-Hsien (曾元顯), Automatic keyword extraction techniques and related-term feedback (關鍵詞自動擷取技術與相關詞回饋), 中國圖書館學會會報, No. 59, pp. 59-64, December 1997.

[14] Tseng, Yuen-Hsien (曾元顯), A study of automatic keyword extraction techniques (關鍵詞自動擷取技術之探討), 中國圖書館學會會訊, No. 106, pp. 26-29, September 1997.
